HOW TO SIGNIFICANTLY REDUCE RUNTIME IN CUBE VOYAGER USING CUBE CLUSTER

The Cube Voyager “Matrix” program is a very powerful calculator, useful not only for converting and processing trip or skim matrices and for discrete-choice modeling but also, more generally, for handling any type of record-processing task efficiently.

Nevertheless, looping through complex operations or processing large data files can be computationally expensive. To reduce the runtime of these processes, a previous How To post suggested using AUTOMDARRAY to handle matrices in memory.

An additional solution to drastically reduce runtime is to use Cube Cluster to split the computational task across multiple computing nodes, where a computing node consists of a single computer processor.

 

CUBE CLUSTER: INTRASTEP VS. MULTISTEP

Cube Cluster, used together with Cube Voyager, allows you to distribute the workload of selected Voyager programs across multiple computing nodes. Because each computing node is a single computer processor, you can assemble your own computer cluster by:

  • Networking several computers together on a local area network
  • Using a dedicated multiprocessor machine, with each processor acting as a cluster node

There are two forms of distributed processing that can be implemented in Cube Voyager:

  • Intrastep distributed processing (IDP)
    This type of distributed processing works by breaking up zone-based processing within a single Cube Voyager program into groups of zones. These groups are run at the same time on multiple computing nodes. (The benefits of this type of Cube Cluster implementation are highlighted below).
  • Multistep distributed processing (MDP)
    This type of distributed processing works by breaking up two or more blocks of one or more Cube Voyager programs and distributing them to multiple computing nodes to run simultaneously. When implementing this process, the user needs to carefully design it such that the distributed blocks and the mainline process are logically independent of each other. Cube’s Application Manager flow chart view of the model provides a great tool for identifying the model steps that can be distributed using MDP.

 

HERE’S HOW:

To illustrate this solution, we will use the same triple indexing application we used in our previous post.

Setting up a simple Cluster intrastep distributed process (IDP) for a Matrix program on a multiprocessor machine is straightforward. Follow the 3-step process below (further options are available but are not illustrated here, for simplicity):

    1. Include a Pilot program in your application to automatically start the Cluster nodes (Reference Guide: Cube Cluster > Utilities > Cluster executable > Running Cluster from the Command Line)
      *Cluster ParkAndRide 1-8 Start Exit

      Where, for the specific example:

      • “ParkAndRide” is the Process ID (ID of the process, same name as used in the Matrix step below)
      • “1-8” is the list of sub-processes (sequence of cluster nodes to start)

 

    2. Add the DistributeINTRASTEP statement in the appropriate Matrix script (Reference Guide: Cube Cluster > Using Cube Cluster > Working with Cube Cluster > Intrastep distributed processing (IDP))
      DISTRIBUTEINTRASTEP PROCESSID=ParkAndRide PROCESSLIST=1-8

    3. Finally, include a Pilot program to close all the processing Cluster nodes
      *Cluster ParkAndRide 1-8 Close Exit
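
Put together, the three steps above might look like the following skeleton within one application script. This is a minimal sketch only: the Cluster and DISTRIBUTEINTRASTEP lines are the ones shown above, while the comment lines are placeholders (not actual script content from the Park&Ride example) for the file declarations and zone-based calculations of your own Matrix step.

```
; Step 1 (Pilot): start cluster nodes 1 to 8 for process ID "ParkAndRide"
*Cluster ParkAndRide 1-8 Start Exit

; Step 2 (Matrix): distribute the zone-based processing across the started nodes
RUN PGM=MATRIX
; placeholder: input/output file declarations of your own step go here
DISTRIBUTEINTRASTEP PROCESSID=ParkAndRide PROCESSLIST=1-8
; placeholder: zone-based calculations of your own step go here
ENDRUN

; Step 3 (Pilot): close the same cluster nodes once the Matrix step has finished
*Cluster ParkAndRide 1-8 Close Exit
```

Note that the Process ID and the node list (here “ParkAndRide” and “1-8”) are the same in all three places, so that the Matrix step addresses exactly the nodes that the Pilot programs start and close.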

 

In the Triple Indexing application, two programs use Cube Cluster: Matrix #3 and Matrix #8. Pilot programs before and after them open and close the corresponding cluster nodes.

 

ORIGINAL SCRIPT BEFORE IMPROVEMENT:

As reported in our previous post, the original script for the Park&Ride skim matrix used MATVAL to access the matrix cells and had a runtime of 2 minutes.

IMPROVEMENT STEP #1:
Using the AUTOMDARRAY_1 improvement, our example had a runtime of 1 minute and 22 seconds. With the addition of Cluster with eight (8) extra nodes, runtime was reduced to 28 seconds.

IMPROVEMENT STEP #2:
AUTOMDARRAY_2 further reduced the runtime to 1 minute and 6 seconds by splitting the process into 2 steps.
Using Cluster with eight (8) extra nodes for the second step (Matrix #8) reduces the runtime of the two steps to 25 seconds, roughly one fifth of the original 2-minute runtime.
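
As a quick check on the runtimes reported above (2 minutes = 120 s for the original script, 82 s and 66 s with the two AUTOMDARRAY variants, 28 s and 25 s once Cluster is added), the speed-up factors work out as follows. The symbol S is introduced here only for illustration and simply denotes the ratio of two runtimes.

```latex
\begin{align*}
% Overall speed-up relative to the original 120 s script
S_{\text{step 1, total}} &= \frac{120\ \text{s}}{28\ \text{s}} \approx 4.3
& S_{\text{step 2, total}} &= \frac{120\ \text{s}}{25\ \text{s}} = 4.8 \\
% Contribution of Cluster alone, on top of AUTOMDARRAY
S_{\text{step 1, Cluster}} &= \frac{82\ \text{s}}{28\ \text{s}} \approx 2.9
& S_{\text{step 2, Cluster}} &= \frac{66\ \text{s}}{25\ \text{s}} \approx 2.6
\end{align*}
```

In this example, Cluster alone accounts for a factor of roughly 2.6 to 2.9 on top of the AUTOMDARRAY gains.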

Using Cluster (Intrastep or Multistep) with more time-intensive processes will show even greater improvements in runtime.

 

Want to learn more?

Do you need more information on how to use Cube Cluster to optimize your process? Chapter 15 of our Cube Voyager Reference Guide will provide you with full details on how to use Cluster.

Furthermore, the following how-to articles will provide examples and further details for setting up Intrastep or Multistep Cluster processes.

Our Citilabs experts are here to help.

Reach out to us if you need a license activation code to test Cluster in your modeling frameworks. And let us know what improvements you are able to gain using these tips. We’d like to hear from you!

 


FILIPPO CONTIERO
SENIOR TRANSPORT MODELLER

Filippo is an experienced and passionate transport modeller based in München, Germany. As Senior Transport Modeller at Citilabs, he assists users in optimizing their use of Cube based on their individual needs and goals. In addition, he oversees Cube user support, training courses and “one-to-one” coaching. Prior to Citilabs, he worked as a consultant and transport modeller on a number of different projects, developing transportation models for highway and multimodal studies. He gained a Master's degree in Civil Engineering from Padova University (Italy) and an MSc in Transport Planning and Engineering from the ITS at Leeds University (UK).

 
