Processing Model

InSilicoLab Processing Model

Many use cases analysed during the work on the InSilicoLab framework show that scientific computations need not only large computational resources, but also immediate response and interactivity. Therefore, a comprehensive computing model targeted at research communities has to, on the one hand, enable or facilitate access to large computer infrastructures and, on the other, treat interactivity and responsiveness as key functionality.
The processing model used within the InSilicoLab framework was built to enable the following types of computation:

Batch operations – they usually take a significant amount of time to complete, so their result is not needed immediately. These operations often involve managing large datasets. Two features of such computations are usually neglected by traditional computational systems:

  1. Interaction with the application – the computational task run on an infrastructure might execute an application that, at some point, requires user input or a simple decision.
  2. Continuous preview of the application results – partial results should be available for preview during the computation (if the application produces such partial results before it completes).

Immediate operations – they require an immediate response from the system. They are usually used jointly with other types of processing – e.g. to test parameters for a larger batch computation or to deploy a service.

On-demand deployment of services – operations that deploy services to be later used by other components. The service endpoints will be distributed across the user infrastructure. The use of such services could enable processing data streams – which could not be handled by any other mechanism.

Implementing this concept of performing computations brings us closer to the main aim that a modern processing model for computing infrastructures should follow, i.e. preserving the same functionalities and usage patterns that users have on their personal computers. Such a model is already partially available through contemporary grid and cloud technologies (IaaS, PaaS); however, it is not yet accessible in a unified way.

Implementation

To enable all the aforementioned types of computation, the architecture of the InSilicoLab framework was built in a loosely coupled manner, with the use of a message broker. The message broker is the central component of the system and coordinates the requests of the other actors: the requestor and the worker. A requestor is an entity (usually a portal interfacing with the user) that defines the computation to be performed, whereas a worker is the actual program that performs the computation. Both the requestor and the worker can be deployed in multiple instances, either providing different functionality or duplicating an existing component – the latter providing a means for load balancing. The message broker ensures reliable asynchronous communication between all the actors.

Schema of the communication in the system: Requestor and Worker instances interacting through a Message Broker.

Workers start their execution by subscribing to the queue and declaring what type of task they are able to process (e.g., a specific program, a dynamically deployed visualization service, etc.). As the workers communicate with the rest of the system only via the message queue, they can be launched on any computer capable of running the specific task – regardless of whether it is a node of an advanced distributed infrastructure or just a personal computer. Likewise, they can be implemented in any software technology or programming language. Workers of an adequate type (able to process a certain task type) can be spawned or terminated according to the system load.
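The subscription step can be illustrated with a small in-memory sketch. It is not the InSilicoLab implementation: the `ToyBroker` class, its method names and the round-robin choice among capable workers are all assumptions made for illustration, and a real deployment would use an actual message broker instead.

```python
from collections import defaultdict
from itertools import count


class ToyBroker:
    """In-memory stand-in for a message broker (hypothetical, for illustration)."""

    def __init__(self):
        self._workers = defaultdict(list)  # task type -> subscribed worker callables
        self._turn = count()               # shared counter used for round-robin

    def subscribe(self, task_type, worker):
        """A worker declares which task type it is able to process."""
        self._workers[task_type].append(worker)

    def unsubscribe(self, task_type, worker):
        """Remove a worker, e.g. when it is terminated because the load dropped."""
        self._workers[task_type].remove(worker)

    def dispatch(self, task_type, payload):
        """Hand the task to exactly one worker subscribed for this task type."""
        capable = self._workers.get(task_type, [])
        if not capable:
            raise LookupError(f"no worker subscribed for {task_type!r}")
        worker = capable[next(self._turn) % len(capable)]
        return worker(payload)


def make_worker(name):
    """Build a trivial worker that only reports what it was given."""
    def worker(payload):
        return f"{name} processed {payload}"
    return worker


broker = ToyBroker()
broker.subscribe("visualization", make_worker("w1"))
broker.subscribe("visualization", make_worker("w2"))
results = [broker.dispatch("visualization", n) for n in range(3)]
```

With two workers subscribed for the same task type, consecutive tasks alternate between them, which is the load-balancing effect of duplicating a component.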
The requestor is responsible for creating task definitions, based on the user input, and submitting them to the message queue. The tasks are portions of work that have to be delegated from the requestor (e.g. a portal) to workers. They are usually a specific set of parameters and/or input files for a concrete program run by the worker. When the requestor submits a task of a certain type to the queue, the message broker sends it to one of the workers that are capable of processing it. The workers are usually built for a concrete application and, therefore, need only a limited configuration and the input files passed with the task description to start the computation. Such an approach limits their use to computations of a specified kind; however, the behavior of such workers is much more predictable and can be more easily handled and communicated to the user. Furthermore, it is also possible to implement a general-purpose worker able to execute an arbitrary script, to complement the system functionality.
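A task definition of this kind can be sketched as a small serializable record. The field names, the JSON encoding and the example task type below are assumptions chosen for illustration; the source does not specify the actual message format.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field


@dataclass
class TaskDefinition:
    """Hypothetical task description submitted by a requestor."""
    task_type: str                                   # selects which workers may take it
    parameters: dict = field(default_factory=dict)   # settings for the program
    input_files: list = field(default_factory=list)  # names of input files
    task_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def to_message(self) -> str:
        """Serialize the task so it can travel through a message queue."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_message(cls, message: str) -> "TaskDefinition":
        """Rebuild the task on the worker side."""
        return cls(**json.loads(message))


# Illustrative task: the type name, parameter and file name are made up.
task = TaskDefinition("geometry-optimization", {"max_steps": 50}, ["molecule.xyz"])
restored = TaskDefinition.from_message(task.to_message())
```

Because a worker built for one application already knows how to run it, the message only needs to carry the parameters and input files, which keeps the task description small.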
At any time, the worker may send messages back to the requestor – also through the message queue. Such messages can include e.g. task results (also partial results), execution status or error information. What is more, when receiving a task from the queue, the worker subscribes to a new, temporary queue which serves as a channel for additional message exchange with the requestor (concerning only the given task). In this way two-way communication is possible, in which both sides may initiate the interaction and both sides await messages – e.g. the requestor can send additional data required by the task during its execution, or the worker may notify the requestor that it needs input from the user. If a worker is deploying a service with an external interface, this communication channel remains active, allowing the worker to communicate with the requestor in the usual manner (e.g. enabling the requestor to send a shutdown message), while external entities access the service according to its protocol and interface.
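The two-way, task-scoped exchange can be sketched with two in-process queues standing in for the temporary channel. This is only an illustration under stated assumptions: the message kinds (`status`, `input_request`, `input`, `result`) and the function names are hypothetical, and a thread plays the role of a remote worker.

```python
import queue
import threading


def worker(task, to_requestor, to_worker):
    """Hypothetical worker: reports status, asks for input, then returns a result."""
    to_requestor.put({"kind": "status", "value": "started"})
    to_requestor.put({"kind": "input_request", "value": "scale factor?"})
    reply = to_worker.get(timeout=5)  # block until the requestor answers
    to_requestor.put({"kind": "result", "value": task["value"] * reply["value"]})


def run_task(task, answer):
    """Requestor side: open a per-task channel and react to worker messages."""
    to_requestor, to_worker = queue.Queue(), queue.Queue()  # temporary, task-scoped
    thread = threading.Thread(target=worker, args=(task, to_requestor, to_worker))
    thread.start()
    log = []
    while True:
        message = to_requestor.get(timeout=5)
        log.append(message["kind"])
        if message["kind"] == "input_request":
            # In a portal, this is where the user would be asked for a decision.
            to_worker.put({"kind": "input", "value": answer})
        elif message["kind"] == "result":
            thread.join()
            return message["value"], log


value, log = run_task({"value": 21}, answer=2)
```

Both sides send and await messages on the same channel, so either of them can initiate an interaction while the task is running.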

Relying on asynchronous message passing as the communication layer adds an important quality to the system, namely persistence. Appropriate message broker policies ensure that task descriptions are not lost in case of a worker failure or connection problems. Similarly, should the requestor side fail, all messages from the workers are persisted by the message broker and delivered as soon as the requestor restarts or reconnects. The message broker policies also ensure the integrity of the system – e.g. that one task request will not be consumed by many workers at a time.
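One common way for brokers to provide such guarantees is explicit acknowledgement: a delivered message is kept until the consumer confirms it, and is made available again if the consumer fails. The sketch below illustrates that idea in memory; the `AckQueue` class and its methods are hypothetical, and the source does not state which broker policies InSilicoLab actually configures.

```python
from collections import deque


class AckQueue:
    """Toy queue with explicit acknowledgements (illustrative only)."""

    def __init__(self):
        self._ready = deque()   # messages waiting for a consumer
        self._in_flight = {}    # delivery tag -> message awaiting acknowledgement
        self._next_tag = 0

    def publish(self, message):
        self._ready.append(message)

    def receive(self):
        """Deliver the next message to one consumer and remember it until acked."""
        message = self._ready.popleft()
        self._next_tag += 1
        self._in_flight[self._next_tag] = message
        return self._next_tag, message

    def ack(self, tag):
        """The consumer finished, so the message can be forgotten."""
        del self._in_flight[tag]

    def requeue(self, tag):
        """The consumer failed, so the message becomes available again."""
        self._ready.appendleft(self._in_flight.pop(tag))

    def pending(self):
        return len(self._ready) + len(self._in_flight)


def consume(q, handler):
    """Process one message, acknowledging it only if the handler succeeds."""
    tag, message = q.receive()
    try:
        result = handler(message)
    except Exception:
        q.requeue(tag)
        return None
    q.ack(tag)
    return result


def crashing_worker(message):
    raise RuntimeError("worker died")


q = AckQueue()
q.publish("task-1")
first = consume(q, crashing_worker)  # fails, the task goes back to the queue
second = consume(q, str.upper)       # a healthy worker picks it up
```

Since `receive` removes the message from the ready list before handing it out, a task is never given to two consumers at once, yet it is not lost when the first consumer fails.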
The latency added by communication through a queue is minimal, so an instant response of the system is assured as long as there is a free worker to perform the requested task. That, in turn, can be managed by an appropriate worker deployment and spawning policy.

The workers are owned by users who run them either manually – by executing them directly or starting them through a dedicated interface – or automatically, by triggering a computation of a certain type. The owner can also set the worker policy to allow other users to utilize it. In this way, light, permanent workers can be deployed on dedicated resources to carry out small computations and be available to all the users. Such workers can perform immediate operations, as they are constantly ready to receive and perform their tasks. Once a task completes, the result is immediately put on the queue and dispatched to the requestor waiting for it.
All workers are managed and monitored by a separate component, the Worker Manager, which is aware of how many workers of each kind are alive, which users they belong to, and what state they are in (e.g. busy, waiting for reply, unoccupied). The manager would automatically adjust the pool of permanent workers to ensure that immediate operations are performed fast, and trigger notifications to the user whenever the user-managed pool of workers is overloaded.
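The bookkeeping described for the Worker Manager can be sketched as follows. Everything here is an assumption for illustration: the class and field names, the rule of keeping at least one permanent worker free, and the definition of "overloaded" as an owner having no unoccupied user-managed worker.

```python
from collections import Counter
from dataclasses import dataclass

STATES = {"busy", "waiting_for_reply", "unoccupied"}


@dataclass
class WorkerInfo:
    worker_id: str
    kind: str                 # task type the worker can process
    owner: str
    permanent: bool = False   # True for shared, always-on workers
    state: str = "unoccupied"


class WorkerManager:
    """Hypothetical registry of live workers and their states."""

    def __init__(self):
        self.workers = {}

    def register(self, info):
        self.workers[info.worker_id] = info

    def set_state(self, worker_id, state):
        if state not in STATES:
            raise ValueError(state)
        self.workers[worker_id].state = state

    def summary(self):
        """Count live workers per (owner, kind, state)."""
        return Counter((w.owner, w.kind, w.state) for w in self.workers.values())

    def permanent_to_spawn(self, kind, min_free=1):
        """Number of permanent workers to add so that min_free stay unoccupied."""
        free = sum(1 for w in self.workers.values()
                   if w.permanent and w.kind == kind and w.state == "unoccupied")
        return max(0, min_free - free)

    def overloaded_owners(self):
        """Owners whose user-managed workers are all occupied."""
        by_owner = {}
        for w in self.workers.values():
            if not w.permanent:
                by_owner.setdefault(w.owner, []).append(w.state)
        return sorted(o for o, states in by_owner.items() if "unoccupied" not in states)


manager = WorkerManager()
manager.register(WorkerInfo("p1", "preview", "shared", permanent=True))
manager.register(WorkerInfo("u1", "simulation", "alice"))
manager.set_state("p1", "busy")
manager.set_state("u1", "busy")
to_spawn = manager.permanent_to_spawn("preview")
overloaded = manager.overloaded_owners()
```

In this example the only permanent worker is busy, so one more should be started, and the owner of the single busy user-managed worker would receive a notification.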

 

This is based on an excerpt from the publication J. Kocot, T. Szepieniec, P. Wójcik, M. Trzeciak, M. Golik, T. Grabarczyk, H. Siejkowski, M. Sterzel: A Framework for Domain-Specific Science Gateways. In: M. Bubak, J. Kitowski, K. Wiatr (eds.) eScience on Distributed Computing Infrastructure, Achievements of PLGrid Plus. LNCS, vol. 8500, pp. 130-146, Springer, http://link.springer.com/chapter/10.1007%2F978-3-319-10894-0_10. See more in the papers section.