Protein Molecule Simulation - Desktop Grid version
AutoDock is a suite of automated docking tools, designed to predict how small molecules, such as substrates or drug candidates, bind to a receptor of known 3D structure. The suite consists of two main programs: AutoDock performs the docking of the ligand to a set of grids describing the target protein, while AutoGrid pre-calculates these grids.
Deploying these applications on a Grid computing infrastructure, utilising hundreds of machines at the same time, makes it possible to harness enough computational power to undertake the simulations on a larger scale and in a much shorter timeframe. Running the simulations and analysing the results on the Grid provides the substantial computational power required.
AutoDock and AutoGrid are sequential applications and exploit no parallelism. Therefore, porting them to a Desktop Grid (DG) requires a new execution policy.
To create a new execution policy, we must first understand how the current version of the application works. Only one of the applications in the AutoDock software suite requires substantial computational power: AutoDock itself. AutoGrid runs for only a few minutes, so parallelising it is unnecessary. The user first runs AutoGrid, which generates the grid map files. With the grid map files and the other necessary input files in hand, the user is ready to start AutoDock. Running AutoDock is a much more computation-intensive task. One run of AutoDock finishes on a single machine in a reasonable time, but thousands of scenarios have to be simulated and analysed to obtain stable and meaningful results. That is, AutoDock has to be run multiple times with the same input files.
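The two-step workflow above can be sketched as follows. This is a minimal illustration, assuming the AutoDock 4 command-line tools (`autogrid4`, `autodock4`) and illustrative file names (`receptor.gpf`, `ligand.dpf`) that are not taken from the source; the sketch only builds the command lines rather than executing them.

```python
def autogrid_cmd(gpf="receptor.gpf", log="receptor.glg"):
    # AutoGrid runs once, for a few minutes, and pre-calculates
    # the grid map files from a grid parameter file (.gpf).
    return ["autogrid4", "-p", gpf, "-l", log]

def autodock_cmds(n_runs, dpf="ligand.dpf"):
    # AutoDock is the computation-intensive step: it is run many
    # times on the same docking parameter file (.dpf), each run
    # writing its own log so the results can be analysed together.
    return [["autodock4", "-p", dpf, "-l", f"run_{i}.dlg"]
            for i in range(n_runs)]

# A campaign of 1000 identical-input AutoDock runs:
campaign = autodock_cmds(1000)
```

In the sequential version all of these command lines execute one after another on a single PC; the Desktop Grid version distributes them instead.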
The necessity of multiple AutoDock runs on the same set of input files suggests a simple parallelisation technique. Without modifying the original applications at all, we can simply distribute the same set of input files to all of the worker nodes. The nodes will then all work on the same files but, due to the random factors in the search, will each produce different outputs. The figure below outlines the basic idea of this parallelisation.
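The distribution scheme can be sketched in a few lines. This is a hypothetical simulation of the idea, not the actual BOINC/Desktop Grid master code: every work unit carries an identical copy of the inputs, and a random "docking energy" stands in for the non-deterministic result each node produces.

```python
import random

def make_work_units(input_files, n_units):
    # Every work unit carries the same input files; only the unit
    # id differs. No per-unit parameter sweep is needed.
    return [{"id": i, "inputs": list(input_files)} for i in range(n_units)]

def worker(unit):
    # Stand-in for one AutoDock run: each node's independently
    # seeded stochastic search yields a different result even on
    # identical inputs. The energy range here is illustrative.
    rng = random.Random()
    return {"id": unit["id"], "energy": rng.uniform(-12.0, -5.0)}

units = make_work_units(["ligand.pdbq", "receptor.maps"], 5)
results = [worker(u) for u in units]
best = min(results, key=lambda r: r["energy"])
```

Collecting the results and selecting the best-scoring conformations then happens once all work units have returned, exactly as in the sequential multi-run analysis.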
The WS P-GRADE portal serves as a high-level user interface for the application. Users can start the AutoDock simulations from the portal by specifying the input files and the number of iterations. The work units are then sent to the Westminster Local Desktop Grid and also to EGEE via the EDGeS DG-to-EGEE bridge, as illustrated in the figure below.
Comparing the performance of the two versions (sequential and DG) and measuring their effectiveness is not straightforward. While the sequential solution can use a dedicated processor, the Desktop Grid solution utilises the free processing power of otherwise busy laboratory PCs, which could differ in every single experiment. Instead of a precise comparison, which would be practically impossible and unrealistic, we were interested in measuring the speed-up in some realistic but non-deterministic DG scenarios.
The single-processor performance of a test PC was compared to a much larger but non-deterministic number of computers (about 1500 PCs) used in the Desktop Grid experiment. The following table summarises the results of the test runs. Each test case was run several times, both on the Desktop Grid and on the test PC. The table shows the execution times of different runs for the same parameter values (each work unit uses the same set of input files; only the number of created work units was changed from run to run). It also shows the speedup achieved by the ported version of the application compared to the sequential one. The execution times in the table are rounded to the nearest minute, and the times for the sequential version represent the same number of runs on a single PC (for example, 1000 work units correspond to 1000*3 = 3000 runs on the test PC).
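The speedup calculation behind the table is simple and can be written down explicitly. The factor of 3 runs per work unit comes from the example in the text; the execution times in the sketch below are hypothetical placeholders, not measured values from the table.

```python
def sequential_runs(n_work_units, runs_per_unit=3):
    # As in the text's example: 1000 work units, at 3 AutoDock runs
    # per work unit, correspond to 3000 runs on the single test PC.
    return n_work_units * runs_per_unit

def speedup(t_sequential_min, t_dg_min):
    # Speedup of the Desktop Grid version over the sequential one,
    # both times expressed in minutes.
    return t_sequential_min / t_dg_min

# Hypothetical illustration: if the 3000 sequential runs took
# 6000 minutes and the DG run finished in 200 minutes, the
# achieved speedup would be 30x.
example = speedup(6000, 200)
```

Because the pool of laboratory PCs varies between experiments, the measured speedup is itself non-deterministic, which is why each test case was repeated several times.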
The following figure shows how the speedup increases with the number of work units.
This application was ported to the Westminster Local Desktop Grid within the framework of the European EDGeS Project.