Vikas Singhal

Reducing Processes for Distributed Task Processing Using Beanstalk



The goal is to reduce the number of processes and release system resources that are held unnecessarily, while still achieving parallel processing and optimal system performance without creating too many processes.

Old mechanism:
In our application (a Beanstalk-based messaging system), whenever a new job related to a shortcode is received (for example, an MO (mobile-originated) or MT (mobile-terminated) processing task), it is put into a corresponding tube, which is essentially a queue. Tubes are named mo-<short-code> or mt-<short-code> (e.g., mo-1010). In production there are 50 shortcodes, so there were 50 tubes for MT tasks and 50 tubes for MO tasks. Each tube had a dedicated script that monitored it in the background; when a job arrived, the script processed it and deleted the job from the tube.

The Problem:
Under this mechanism, more than 150 processes (daemons) are always running in the background, consuming or holding server resources such as MySQL (database) connections, bandwidth and memory. Another 50 processes monitor the schedules of tasks attributed to the shortcodes.

What was needed was a mechanism that brings down the total number of processes while still keeping every tube served at all times.

New mechanism:
We created a PHP main script that runs in the background as a daemon and monitors a specific set of tubes. When a job arrives, the script fetches it and, based on its details, starts another process for the corresponding tube. That process handles all the jobs in the tube and then terminates (these are the same scripts that previously handled the shortcode-specific tubes). After starting the process, the main script ignores that tube and continues to monitor the remaining tubes. At a fixed interval, it checks the process list to see whether the processes it started have finished. If they have (i.e., they are no longer in the process list), their tubes are added back to the watch list and monitored again.

With this approach, only a few instances of the main script run as supervisor processes, so fewer resources are consumed when the system is idle, and resources such as MySQL connections are released when they are not needed. Overall, the system is idle when no jobs exist. The number of schedule-monitoring processes will also be reduced, with each such process monitoring the scheduled tasks of multiple shortcodes.
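The watch-list bookkeeping described above can be sketched as follows. This is a minimal Python model under stated assumptions, not the actual PHP implementation: `TubeSupervisor`, `launch` and `is_running` are hypothetical names, with `launch` standing in for starting the per-tube script and `is_running` standing in for the process-list check. The Beanstalk client calls themselves are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Set


@dataclass
class TubeSupervisor:
    """Hypothetical model of the main script's watch-list bookkeeping.

    launch(tube) is assumed to start the per-tube worker and return a handle;
    is_running(handle) is assumed to report whether that worker is still alive.
    """
    watch_list: Set[str]
    launch: Callable[[str], object]
    is_running: Callable[[object], bool]
    running: Dict[str, object] = field(default_factory=dict)

    def on_ready_tubes(self, ready: Iterable[str]) -> None:
        # A job arrived on these tubes: start one worker per tube and
        # stop watching the tube while its worker drains it.
        for tube in ready:
            if tube in self.watch_list:
                self.running[tube] = self.launch(tube)
                self.watch_list.discard(tube)

    def reap(self) -> None:
        # Periodic check: any worker that has exited gets its tube
        # added back to the watch list.
        for tube, handle in list(self.running.items()):
            if not self.is_running(handle):
                del self.running[tube]
                self.watch_list.add(tube)
```

In a real deployment, `launch` might wrap `subprocess.Popen([...])` and `is_running` might be `lambda p: p.poll() is None`; how the tube name is passed to the triggered script is not specified in the original and would be an assumption.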

The main script for MO/MT handling can be configured with seven arguments (arguments 3 to 7 are optional), which lets the same main script manage all types of tubes. For the campaign-handling script, the 1st and 3rd arguments below do not apply, and all remaining parameters are optional as well:

  1. Tube name prefix (e.g., mt- or mo-)
  2. Name of the PHP script to trigger when a job arrives, relative to the path given in argument 7
  3. A flag (1 to dynamically pick up newly added matching tubes while idle, or 0 to monitor only the tubes fetched at startup), followed by ‘-‘ and an interval value (a number of iterations in which no jobs are found). Default value: 0-10
  4. Remaining part of the tube names (comma-separated shortcodes) to include in the watch list (‘all’ means all tubes). Default value: all
  5. Remaining part of the tube names (comma-separated shortcodes) to exclude from the watch list (‘none’ means nothing is excluded). Default value: none
  6. Instance number/name, used to build the log file name. Default value: 1
  7. Directory containing the triggered PHP script (argument 2). Default value: /var/www/html
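A minimal sketch of how the seven arguments and their defaults could be interpreted, written in Python for illustration only. `WatchConfig` and `parse_args` are hypothetical names; `argv` excludes the script name itself, and treating a bare flag such as `0` (as in the third sample command) as using the default interval of 10 is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class WatchConfig:
    prefix: str
    triggered_script: str
    dynamic: bool
    idle_interval: int
    include: Optional[List[str]]  # None means 'all'
    exclude: List[str]            # empty means 'none'
    instance: str
    script_path: str


def parse_args(argv: Sequence[str]) -> WatchConfig:
    """Parse the positional arguments; arguments 3-7 fall back to defaults.

    Assumption: a flag without a '-interval' part uses the default interval 10.
    """
    if len(argv) < 2:
        raise ValueError("tube prefix and triggered script are required")
    defaults = ["0-10", "all", "none", "1", "/var/www/html"]
    given = list(argv[2:7])
    rest = given + defaults[len(given):]  # fill missing optional arguments
    flag, _, interval = rest[0].partition("-")
    include = None if rest[1] == "all" else [s for s in rest[1].split(",") if s]
    exclude = [] if rest[2] == "none" else [s for s in rest[2].split(",") if s]
    return WatchConfig(
        prefix=argv[0],
        triggered_script=argv[1],
        dynamic=flag == "1",
        idle_interval=int(interval) if interval else 10,
        include=include,
        exclude=exclude,
        instance=rest[3],
        script_path=rest[4],
    )
```

Called as `parse_args(sys.argv[1:])`, this mirrors how the sample commands pass their arguments positionally.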

In addition, two global variables defined at the start of the main script provide further control:

  1. Total number of jobs allowed per instance: a value of -1 means only a single instance is started for each tube; otherwise the value (e.g., 1000) means a new instance is started for every 1000 jobs. For example, if there are approximately 5000 jobs at the start, then ceil(5000/1000) = 5 PHP instances are started to handle them.
  2. Tubes included in the watch list by default: a list of default tubes (comma-separated shortcodes) that the script must always watch.
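The instance-count rule in item 1 can be sketched as below. `instances_needed` is a hypothetical helper; the backlog size could come, for example, from the `current-jobs-ready` field reported by beanstalkd's `stats-tube` command, and returning 0 for an empty tube is an assumption.

```python
def instances_needed(ready_jobs: int, jobs_per_instance: int) -> int:
    """Number of triggered-script instances to start for one tube.

    -1 means a single instance per tube regardless of backlog;
    otherwise one instance per jobs_per_instance jobs, rounded up.
    """
    if ready_jobs <= 0:
        return 0  # assumption: nothing to start for an empty tube
    if jobs_per_instance == -1:
        return 1
    if jobs_per_instance <= 0:
        raise ValueError("jobs_per_instance must be -1 or positive")
    # Integer ceiling division, i.e. ceil(ready_jobs / jobs_per_instance).
    return (ready_jobs + jobs_per_instance - 1) // jobs_per_instance
```

Integer ceiling division avoids floating-point rounding for large backlogs while giving the same result as `ceil(5000/1000)`.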

Sample commands to start the main process:

  1. sudo php -q watch_process_tubes.php mt- mttriggered.php
  2. sudo php -q watch_process_tubes.php mo- motriggered.php 1-10 all none 1 triggeredfilepath
  3. sudo php -q watch_process_tubes.php mo- motriggered.php 0 1010,1012 none 2 triggeredfilepath
  4. sudo php -q watch_process_tubes.php mt- mttriggered.php 0 all 1010,1012
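To make the include and exclude arguments of these commands concrete, here is a sketch of how one main-script instance might resolve its watch list. `resolve_watch_list` is a hypothetical function; the full tube list would come from beanstalkd's `list-tubes` command, and the order in which default tubes and exclusions are applied is an assumption.

```python
from typing import Iterable, List, Optional, Set


def resolve_watch_list(
    all_tubes: Iterable[str],
    prefix: str,
    include: Optional[List[str]],
    exclude: List[str],
    default_include: Iterable[str] = (),
) -> Set[str]:
    """Select the tubes one main-script instance should watch.

    include=None means 'all'; an empty exclude list means 'none'.
    Assumption: default tubes are added first, then exclusions are applied.
    """
    # Only tubes with the requested prefix (e.g. 'mo-') are candidates.
    selected = {t for t in all_tubes if t.startswith(prefix)}
    if include is not None:
        selected &= {prefix + code for code in include}
    selected |= {prefix + code for code in default_include}
    selected -= {prefix + code for code in exclude}
    return selected
```

With the dynamic flag set to 1, the same function could simply be re-run after the configured number of idle iterations to pick up newly created tubes.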