My final project is "Parallelization of molecular dynamics", supervised by Dr. Joan Adler. I thank
P. Pine and J. Halupovich for their assistance, and RBNI for funding the
computer resources.
This project originated from
the need for a more efficient code for simulating the vibrations of nanotubes (for more details, please see the papers by Polina Pine). Thus the scalar Molecular Dynamics
code used by P. Pine was partly parallelized with MPI and run for unclamped nanotubes.
Molecular dynamics is a way
to simulate physical behavior at the atomic level through a computer program.
For the simulation to yield correct results, it is important to choose
an appropriate interatomic potential. This simulation is based on the potential
published by Donald W. Brenner in Physical Review B.
Here is a short summary of the Brenner
potential.
The first thing to know about MPI is that
it is not a programming language, but rather an interface that controls the distribution of
some (computational) work among a group of processors. Through this interface
each processor is assigned specific tasks and communicates with the other
processors when necessary.
The core idea behind parallel computation is that each processor
receives a copy of the whole code. The code itself contains
instructions telling each processor what to do, and how and when to communicate with
the other processors. This is much like the screenplay of a show: each actor
(processor) gets the whole screenplay, but knows exactly what to say at each
step of the show.
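As a minimal illustration of this idea (a sketch written for this page, not part of the project code), here is a Fortran MPI program in which every processor runs the same source, but each one learns its own identity (its "rank") and can act on it:

      program hello
c     Every processor executes this same code.
      implicit none
      include 'mpif.h'
      integer rank, nprocs, ierr
c     Start the MPI environment.
      call MPI_INIT(ierr)
c     Ask MPI: which processor am I, and how many are there in total?
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
      write(*,*) 'I am processor', rank, 'of', nprocs
c     Shut MPI down before exiting.
      call MPI_FINALIZE(ierr)
      end

Each processor prints a different line, because each one holds a different value of rank, even though all of them ran exactly the same "script".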
TAMNUN is a computer cluster
of 1056 cores that was purchased by the Technion from the
SGI-TNN company (it arrived at the end of March 2012) and is available
for the use of all researchers at the Technion. This new cluster was funded by the Russell Berrie Nanotechnology Institute and the Minerva Foundation.
For "Fortran"
source code, the compilation command is "ifort"
for a serial code and "mpiifort" for
parallel code.
Here is an example of the
command that I used to compile the parallel code: compile.
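(The exact command is in the linked file; its general form was along the lines of the sketch below, where the executable name and the include path are illustrative placeholders, not the project's actual settings.)

      mpiifort -o md.exe Main.f pred_corr.f -I/path/to/includes 2> log.txt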
Notes:
i.
The addition of "2> log.txt" at the end
of the command is very useful when the compiler generates a long list of errors:
all the errors are written into a single text file.
ii.
The "-I" operator is used to link other
relevant code in other libraries.
Running a job (serial or
parallel) on an HPC cluster is carried out through a queuing system. A job is
submitted with the "qsub" command, followed
by the name of a shell file. The shell file is a script that
specifies everything needed for a successful run of the program (a sketch of
such a file is given after the examples below).
For example:
·
Running a serial job:
qsub scalar_job.sh
·
Running a parallel job:
qsub parallel_job.sh
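For illustration, a shell file for a parallel job on a PBS-style queuing system (such as the one on TAMNUN) might look like the sketch below; the job name, core count and executable name are assumptions made for this example, not the project's actual settings:

#!/bin/sh
# Illustrative PBS script - names and resource requests are placeholders.
#PBS -N nanotube_md
#PBS -l select=1:ncpus=8
# Run from the directory the job was submitted from.
cd $PBS_O_WORKDIR
# Launch 8 MPI processes.
mpirun -np 8 ./md.exe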
These are the parts of the code that
I modified (see the text files):
·
Main.f
·
Pred
(inside pred_corr.f)
The other parts of the code
were not changed; they are represented in a general
flowchart of the serial code.
Here is a schematic diagram of how the data was
distributed and gathered.
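In outline, the pattern is the one sketched below (a simplified sketch, not the project code itself: the array names and sizes are invented for illustration). The master processor scatters an equal slice of the atom data to every processor, each processor works on its own slice, and the slices are then gathered back:

      program sketch
c     Illustrative distribute/compute/gather pattern (not the actual
c     project code; names and sizes are placeholders).
      implicit none
      include 'mpif.h'
      integer natoms, nloc, rank, nprocs, ierr
      parameter (natoms = 1400)
      double precision x(natoms), xloc(natoms)
      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
c     Each processor handles natoms/nprocs atoms (assumed to divide
c     evenly).
      nloc = natoms/nprocs
c     The master (rank 0) scatters one slice of the data to everyone.
      call MPI_SCATTER(x, nloc, MPI_DOUBLE_PRECISION,
     &                 xloc, nloc, MPI_DOUBLE_PRECISION,
     &                 0, MPI_COMM_WORLD, ierr)
c     ... each processor advances its own nloc atoms here ...
c     The updated slices are gathered back on the master.
      call MPI_GATHER(xloc, nloc, MPI_DOUBLE_PRECISION,
     &                x, nloc, MPI_DOUBLE_PRECISION,
     &                0, MPI_COMM_WORLD, ierr)
      call MPI_FINALIZE(ierr)
      end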
The aim of my project was to
parallelize a specific part of the serial code, which calculates the first part
of the predictor-corrector algorithm.
The predictor-corrector
algorithm is used to solve the classical Newtonian equation of motion (which is
a second-order differential equation). This algorithm is divided into three
parts:
·
Prediction
·
Evaluation
·
Correction
In the first two steps of the
algorithm, the positions and velocities of the atoms are predicted and evaluated
at a future time (t+Δt). The
calculation is based on the positions and the velocities at past times
(t−i∙Δt), where i = 0,...,k−2, and k is the
order of the predictor part (here we use a third-order predictor).
In the third part of the algorithm, the predicted values are corrected using
information based on the force (via Newton's second law of motion) at the future
time (t+Δt).
*For more information about the predictor-corrector algorithm, see Polina Pine's site.
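To make the prediction step concrete, here is a sketch of a third-order Taylor-series predictor for a single coordinate (an illustrative sketch only, not the project's actual PRED routine; the variable names are invented):

      subroutine predict(x, v, a, b, dt)
c     Illustrative third-order Taylor predictor (not the project's
c     PRED routine).  x, v, a and b hold the position, velocity,
c     acceleration and third time derivative at time t, and are
c     overwritten with their predicted values at time t + dt.
      implicit none
      double precision x, v, a, b, dt
      x = x + v*dt + a*dt**2/2.0d0 + b*dt**3/6.0d0
      v = v + a*dt + b*dt**2/2.0d0
      a = a + b*dt
      end

The corrector step then adjusts these predicted values using the force computed at t + dt.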
Four samples were chosen and generated with a computer program called "Chrial".
Two samples were generated according to the chiral
vector (m,n)=(7,7) and consisted of 50 or 100 periods (splines) of 28 atoms each, thus making two samples of 1400 and 2800 atoms.
The other two samples were generated according to the chiral vector (m,n)=(7,0) and consisted of 40 or 80 periods (splines) of 28 atoms each, thus making two samples of 1120 and 2240 atoms.
Click here to see the wall time distribution for subroutine PRED.
Updated: Nov 2012