cluster Archives - Microway
https://www.microway.com/tag/cluster/

Running GPU Benchmarks of HOOMD-blue on a Tesla K40 GPU-Accelerated Cluster
https://www.microway.com/hpc-tech-tips/running-gpu-benchmarks-hoomd-blue-tesla-k40-gpu-accelerated-cluster/
Tue, 14 Oct 2014

Cropped shot of a HOOMD-blue micellar crystals simulation (visualized with VMD)

This short tutorial explains how to use the GPU-accelerated HOOMD-blue particle simulation toolkit on our GPU-accelerated HPC cluster. Microway allows you to quickly test your codes on the latest high-performance systems – you are free to upload and run your own software, although we also provide a variety of pre-compiled applications with built-in GPU acceleration. Our GPU Test Drive Cluster is a useful resource for benchmarking the performance gains that can be achieved with NVIDIA Tesla GPUs.

This post demonstrates HOOMD-blue, which comes out of the Glotzer group at the University of Michigan. HOOMD-blue supports a wide variety of integrators and potentials, as well as the ability to scale runs to thousands of GPUs. We'll show a single server with dual NVIDIA® Tesla® K40 GPUs delivering speedups of over 13X!

Before continuing, please note that successful use of HOOMD-blue requires some familiarity with Python; if that's new to you, their excellent Quick Start Tutorial is a good place to begin. If you're already familiar with a different software package, read through our list of pre-installed applications. There may be no need for you to learn a new tool.

Access a Tesla GPU-accelerated Compute Node

Getting started on our GPU system is fast and easy – complete this short form to sign up for HOOMD-blue benchmarking. We will send you an e-mail with a general list of commands when your request is accepted, but this post provides guidelines specific to HOOMD-blue tests.

First, you'll need to use SSH to access our GPU cluster. Don't worry if you're unfamiliar with SSH – we will send you step-by-step login instructions. Windows users have one extra step, but SSH is built-in on Linux and macOS.

Run CPU and GPU-accelerated HOOMD-blue

Once you’re logged in, it’s easy to compare CPU and GPU performance: enter the HOOMD-blue directory and run the benchmark batch script which we have pre-written for you:

cd hoomd-blue
sbatch run-hoomd-on-TeslaK40.sh

Waiting for your HOOMD-blue job to finish

Our cluster uses SLURM to manage computational tasks. You should use the squeue command to check the status of your jobs. To watch as your job runs, use: watch squeue (hit CTRL+c to exit). Alternatively, the cluster can e-mail you when your job has finished if you update the HOOMD batch script file (although this must be done before submitting your job). Run:

nano run-hoomd-on-TeslaK40.sh

Within this file, add the following lines to the #SBATCH section (changing the e-mail address to your own):

#SBATCH --mail-user=yourname@example.com
#SBATCH --mail-type=END

If you would like to closely monitor the compute node which is executing your job, run squeue to check which compute node your job is running on. Log into that node via SSH and use one of the following tools to monitor the GPU and system status:

ssh node2
nvidia-smi
htop

(hit q to exit htop)
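
For a continuously refreshing view of the GPUs, the standard watch utility can wrap nvidia-smi (this assumes watch is installed on the compute node, which is typical for Linux HPC images):

# Refresh the GPU status display every second
watch -n 1 nvidia-smi

(hit CTRL+c to exit watch)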

Check the speedup of HOOMD-blue on GPUs vs. CPUs

The results from the HOOMD-blue benchmark script will be placed in an output file named hoomd-K40.xxxx.output.log – below is a sample of the output running on CPUs:

======================================================
= Run CPU only lj_liquid_bmark_512K
======================================================
Average TPS: 21.90750

and with HOOMD-blue running on two GPUs (demonstrating a 13X speed-up):

======================================================
= Run Tesla_K40m GPU-accelerated lj_liquid_bmark_512K
======================================================
Average TPS: 290.27084

If you would like to examine the full execution sequence of a particular input, you will see that a log file has been created for each of the inputs (e.g., lj_liquid_bmark_512K.20_cpu_cores.output). If the HOOMD-blue job has any problems, the errors will be logged to the file hoomd-K40.xxxx.output.errors.
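
A quick way to pull these results out from the command line is shown below; the wildcards simply follow the file-naming patterns described above, with your SLURM job number in place of the xxxx:

# Headline results gathered in the main log:
grep "Average TPS:" hoomd-K40.*.output.log
# Full execution log for one particular input:
less lj_liquid_bmark_512K.20_cpu_cores.output
# Any errors (this file may well be empty):
cat hoomd-K40.*.output.errors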

The chart below compares the performance of a CPU-only HOOMD-blue run (on two 10-core Ivy Bridge Intel Xeon CPUs) with a GPU-accelerated HOOMD-blue run (on two NVIDIA Tesla K40 GPUs):

Plot of HOOMD-blue performance results on Xeon CPUs and Tesla GPUs

Running your own HOOMD-blue inputs on GPUs

If you’re comfortable with shell scripts you can write your own batch script from scratch, but we recommend using the run-hoomd-your-files.sh file as a template when you’d like to try your own simulations. For most HOOMD-blue runs, the batch script will only reference a single Python script as input (e.g., the lj_liquid_bmark_512K.hoomd script). Reference the HOOMD-blue Quick Start Tutorial.

Once your script is in place in your hoomd-blue/ directory, you’ll need to ensure that the batch script is referencing the correct .hoomd input file. The relevant lines of the run-hoomd-your-files.sh file are:

echo "==============================================================="
echo "= Run CPU-only"
echo "==============================================================="

srun --mpi=pmi2 hoomd input_file.hoomd --mode=cpu > hoomd_output__cpu_run.txt
grep "Average TPS:" hoomd_output__cpu_run.txt

and for execution on GPUs:

echo "==============================================================="
echo "= Run GPU-Accelerated"
echo "==============================================================="

srun --mpi=pmi2 -n $GPUS_PER_NODE hoomd input_file.hoomd > hoomd_output__gpu_run.txt
grep "Average TPS:" hoomd_output__gpu_run.txt

As shown above, both the CPU and GPU runs use the same input file (input_file.hoomd). They will each save their output to a separate text file (hoomd_output__cpu_run.txt and hoomd_output__gpu_run.txt). The final line of each section uses the grep tool to print the performance of that run. HOOMD-blue reports performance in time steps per second (TPS), where a higher number indicates better performance.
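
Putting these pieces together, a complete custom batch script might look like the following sketch. The #SBATCH values, the GPU count, and the input filename are placeholders for illustration; run-hoomd-your-files.sh on the cluster remains the authoritative template (it also loads the required environment modules):

#!/bin/bash
# Illustrative directives; copy the real ones from run-hoomd-your-files.sh
#SBATCH --job-name=my-hoomd-run
#SBATCH --output=my-hoomd-run.%j.log
#SBATCH --mail-user=yourname@example.com
#SBATCH --mail-type=END

GPUS_PER_NODE=2    # two Tesla K40 GPUs per node on this cluster

# CPU-only run
srun --mpi=pmi2 hoomd my_input.hoomd --mode=cpu > hoomd_output__cpu_run.txt
grep "Average TPS:" hoomd_output__cpu_run.txt

# GPU-accelerated run (one MPI rank per GPU)
srun --mpi=pmi2 -n $GPUS_PER_NODE hoomd my_input.hoomd > hoomd_output__gpu_run.txt
grep "Average TPS:" hoomd_output__gpu_run.txt

Submit it with sbatch, exactly as you did with the pre-written benchmark script.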

VMD visualization of micellar crystals

Will GPU acceleration speed up your research?

With our pre-configured GPU cluster, running HOOMD-blue across an HPC cluster isn’t much more difficult than running it on your own workstation. This makes it easy to compare HOOMD-blue simulations running on CPUs and GPUs. If you’d like to give it a try, contact one of our experts or sign up for a GPU Test Drive today!


Citation for HOOMD-blue:

Joshua A. Anderson, Chris D. Lorenz, and Alex Travesset – ‘General Purpose Molecular Dynamics Fully Implemented on Graphics Processing Units’, Journal of Computational Physics 227 (2008) 5342-5359
https://glotzerlab.engin.umich.edu/hoomd-blue/

Featured Illustration:

“Micellar crystals in solution from molecular dynamics simulations”, J. Chem. Phys. 128, 184906 (2008); DOI:10.1063/1.2913522
https://doi.org/10.1063/1.2913522

Citation for VMD:

Humphrey, W., Dalke, A. and Schulten, K., “VMD – Visual Molecular Dynamics” J. Molec. Graphics 1996, 14.1, 33-38
https://www.ks.uiuc.edu/Research/vmd/

Benchmarking NAMD on a GPU-Accelerated HPC Cluster with NVIDIA Tesla K40
https://www.microway.com/hpc-tech-tips/benchmarking-namd-gpu-accelerated-hpc-cluster-nvidia-tesla-k40/
Fri, 10 Oct 2014

Cropped shot of a NAMD stmv simulation (visualized with VMD)

This is a tutorial on using GPU-accelerated NAMD for molecular dynamics simulations. We make it simple to test your codes on the latest high-performance systems – you are free to use your own applications on our cluster and we also provide a variety of pre-installed applications with built-in GPU support. Our GPU Test Drive Cluster is a useful resource for demonstrating the increased application performance that can be achieved with NVIDIA Tesla GPUs.

This post describes the scalable molecular dynamics software NAMD, which comes out of the Theoretical and Computational Biophysics Group at the University of Illinois at Urbana-Champaign. NAMD supports a variety of operational modes, including GPU-accelerated runs across large numbers of compute nodes. We'll demonstrate how a single server with NVIDIA® Tesla® K40 GPUs can deliver speedups of over 4X!

Before continuing, please note that this post assumes you are familiar with NAMD. If you prefer a different molecular dynamics package (e.g., AMBER), read through the list of applications we have pre-installed. There may be no need for you to learn a new tool. If all of these tools are new to you, you will find a number of NAMD tutorials online.

Access the Tesla GPU-accelerated Cluster

Getting started with our GPU Benchmark cluster is fast and easy – fill out this short form to sign up for GPU benchmarking. Although we will send you an e-mail with a general list of commands when your request is accepted, this post goes into further detail.

First, you need to log in to the GPU cluster using SSH. Don’t worry if you haven’t used SSH before – we will send you step-by-step login instructions. Windows users have to perform one additional step, but SSH is built-in on Linux and MacOS.

Run CPU and GPU-accelerated versions of NAMD

Once you’re logged in, it’s easy to compare CPU and GPU performance: enter the NAMD directory and run the NAMD batch script which we have pre-written for you:

cd namd
sbatch run-namd-on-TeslaK40.sh

Waiting for your NAMD job to finish

Our cluster uses SLURM to manage users’ jobs. You can use the squeue command to keep track of your jobs. For real-time information on your job, run: watch squeue (hit CTRL+c to exit). Alternatively, the cluster can e-mail you when your job is finished if you update the NAMD batch script file (although this must be done before submitting your job). Run:

nano run-namd-on-TeslaK40.sh

Within this file, add the following two lines to the #SBATCH section (changing the e-mail address to your own):

#SBATCH --mail-user=yourname@example.com
#SBATCH --mail-type=END

If you would like to closely monitor the compute node which is running your job, check the output of squeue and take note of which compute node your job is running on. Log into that node with SSH and then use one of the following tools to keep an eye on GPU and system status:

ssh node2
nvidia-smi
htop

(hit q to exit htop)

Check the speedup of NAMD on GPUs vs. CPUs

The results from the NAMD batch script will be placed in an output file named namd-K40.xxxx.output.log – below is a sample of the output running on CPUs:

======================================================
= Run CPU only stmv
======================================================
Info: Benchmark time: 20 CPUs 0.531318 s/step 6.14951 days/ns 4769.63 MB memory

and with NAMD running on two GPUs (demonstrating a speed-up of over 4X):

======================================================
= Run Tesla_K40m GPU-accelerated stmv
======================================================
Info: Benchmark time: 18 CPUs 0.112677 s/step 1.30413 days/ns 2475.9 MB memory

Should you require further details on a particular run, you will see that a separate log file has been created for each of the inputs (e.g., stmv.20_cpu_cores.output). The NAMD output files are available in the benchmarks/ directory (with a separate subdirectory for each test case). If your job has any problems, the errors will be logged to the file namd-K40.xxxx.output.errors.

The following chart compares the performance of a CPU-only NAMD run (on two 10-core Ivy Bridge Intel Xeon CPUs) with a GPU-accelerated NAMD run (on two NVIDIA Tesla K40 GPUs):

Plot comparing NAMD performance on Xeon CPUs and NVIDIA Tesla K40 GPUs

Running your own NAMD inputs on GPUs

If you’re familiar with BASH you can write your own batch script from scratch, but we recommend using the run-namd-your-files.sh file as a template when you’d like to try your own simulations. For most NAMD runs, the batch script will only reference a single input file (e.g., the stmv.namd script). This input script will reference any other input files which NAMD might require:

  • Structure file (e.g., stmv.psf)
  • Coordinates file (e.g., stmv.pdb)
  • Input parameters file (e.g., par_all27_prot_na.inp)

You can upload existing inputs from your own workstation/laptop or you can assemble an input job on the cluster. If you opt for the latter, you need to load the appropriate software packages by running:

module load cuda gcc namd
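
If you choose to upload instead, a standard scp transfer from your own machine is all that's needed; the username and hostname below are placeholders for the values in your login e-mail:

# Run this on your own workstation/laptop, not on the cluster
scp stmv.namd stmv.psf stmv.pdb par_all27_prot_na.inp \
    username@gpu-testdrive.example.com:namd/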

Once your files are in place in your namd/ directory, you’ll need to ensure that the batch script is referencing the correct .namd input file. The relevant lines of the run-namd-your-files.sh file are:

echo "==============================================================="
echo "= Run CPU-only"
echo "==============================================================="

namd2 +p $num_cores_cpu input_file.namd > namd_output__cpu_run.txt
grep Benchmark namd_output__cpu_run.txt

and for execution on GPUs:

echo "==============================================================="
echo "= Run GPU-Accelerated"
echo "==============================================================="

namd2 +p $num_cores_gpu +devices $CUDA_VISIBLE_DEVICES +idlepoll input_file.namd > namd_output__gpu_run.txt
grep Benchmark namd_output__gpu_run.txt

As shown above, both the CPU and GPU runs use the same input file (input_file.namd). They will each write to a separate log file (namd_output__cpu_run.txt and namd_output__gpu_run.txt). The final line of each section uses the grep utility to print the performance of each run in days per nanosecond (where a lower number indicates better performance).

If you’d like to visualize your results, you will need an SSH client which properly forwards your X-session. You are welcome to contact us if you’re uncertain of this step. Once that’s done, the VMD visualization tool can be run:

module load vmd
vmd
VMD visualization of the Satellite Tobacco Mosaic Virus
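
If you're not sure whether your SSH client forwards X, on Linux and macOS it is usually just a matter of adding the -X flag when you connect (the username and hostname here are placeholders):

# Request X11 forwarding so that the VMD window is displayed locally
ssh -X username@gpu-testdrive.example.com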

Ready to try GPUs?

Once properly configured (which we’ve already done for you), running NAMD on a GPU cluster isn’t much more difficult than running it on your own workstation. This makes it easy to compare NAMD simulations running on CPUs and GPUs. If you’d like to give it a try, contact one of our experts or sign up for a GPU Test Drive today!


Citations for NAMD:

“NAMD was developed by the Theoretical and Computational Biophysics Group in the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign.”

James C. Phillips, Rosemary Braun, Wei Wang, James Gumbart, Emad Tajkhorshid, Elizabeth Villa, Christophe Chipot, Robert D. Skeel, Laxmikant Kale, and Klaus Schulten. Scalable molecular dynamics with NAMD. Journal of Computational Chemistry, 26:1781-1802, 2005.
https://www.ks.uiuc.edu/Research/namd/

Featured Illustration:

Molecular Dynamics of Viruses – Satellite Tobacco Mosaic Virus (STMV)

Citation for VMD:

Humphrey, W., Dalke, A. and Schulten, K., “VMD – Visual Molecular Dynamics” J. Molec. Graphics 1996, 14.1, 33-38
https://www.ks.uiuc.edu/Research/vmd/

Running AMBER on a GPU Cluster
https://www.microway.com/hpc-tech-tips/running-amber-gpus/
Mon, 06 Oct 2014

Cropped shot of an AMBER nucleosome simulation (visualized with VMD)

Welcome to our tutorial on GPU-accelerated AMBER! We make it easy to benchmark your applications and problem sets on the latest hardware. Our GPU Test Drive Cluster provides developers, scientists, academics, and anyone else interested in GPU computing with the opportunity to test their code. While Test Drive users are given free rein to use their own applications on the cluster, Microway also provides a variety of pre-installed GPU-accelerated applications.

In this post, we will look at the molecular dynamics package AMBER. Collaboratively developed by professors at a variety of university labs, the latest versions of AMBER natively support GPU acceleration. We'll demonstrate how NVIDIA® Tesla® K40 GPUs can deliver a speedup of up to 86X!

Before we jump in, we should mention that this post assumes you are familiar with AMBER and/or AmberTools. If you are more familiar with another molecular dynamics package (e.g., GROMACS), check to see what we already have pre-installed on our cluster. There may be no need for you to learn a new tool. If you’re new to these tools in general, you can find quite a large number of AMBER tutorials online.

Access our GPU-accelerated Test Cluster

Getting access to the Microway Test Drive cluster is fast and easy – fill out a short form to sign up for a GPU Test Drive. Although our approval e-mail includes a list of commands to help you get your benchmark running, we’ll go over the steps in more detail below.

First, you need to log in to the Microway Test Drive cluster using SSH. Don’t worry if you’re unfamiliar with SSH – we include a step-by-step instruction manual for logging in. SSH is built-in on Linux and MacOS; Windows users need to install one application.

Run CPU and GPU versions of AMBER

This is one of the easiest steps. Just enter the AMBER directory and run the default benchmark script which we have pre-written for you:

cd amber
sbatch run-amber-on-TeslaK40.sh

Waiting for jobs to complete

Our cluster uses SLURM for resource management. Keeping track of your job is easy using the squeue command. For real-time information on your job, run: watch squeue (hit CTRL+c to exit). Alternatively, you can tell the cluster to e-mail you when your job is finished by editing the AMBER batch script file (although this must be done before submitting jobs with sbatch). Run:

nano run-amber-on-TeslaK40.sh

Within this file, add the following two lines to the #SBATCH section (specifying your own e-mail address):

#SBATCH --mail-user=yourname@example.com
#SBATCH --mail-type=END

If you would like to monitor the compute node which is running your job, examine the output of squeue and take note of which node your job is running on. Log into that node using SSH and then use the tools of your choice to monitor it. For example:

ssh node2
nvidia-smi
htop

(hit q to exit htop)

See the speedup of GPUs vs. CPUs

The results from our benchmark script will be placed in an output file called amber-K40.xxxx.output.log – below is a sample of the output running on CPUs:

===============================================================
= Run CPU-only: JAC_PRODUCTION_NVE - 23,558 atoms PME
===============================================================
|         ns/day =      25.95   seconds/ns =    3329.90

and with AMBER running on GPUs (demonstrating a 6X speed-up):

========================================================================
= Run Tesla_K40m GPU-accelerated: JAC_PRODUCTION_NVE - 23,558 atoms PME
========================================================================
|         ns/day =     157.24   seconds/ns =     549.47

Should you require more information on a particular run, it's available in the benchmarks/ directory (with a separate subdirectory for each test case). If your job has any problems, the errors will be logged to the file amber-K40.xxxx.output.errors.

The chart below compares the performance of a CPU-only AMBER run (on two 10-core Ivy Bridge Intel Xeon CPUs) with a GPU-accelerated AMBER run (on two NVIDIA Tesla K40 GPUs):

AMBER Speedups on NVIDIA Tesla K40 GPUs

Running your own AMBER inputs on GPUs

If you’re familiar with BASH, you can of course create your own batch script, but we recommend using the run-amber-your-files.sh file as a template for when you want to run you own simulations. For AMBER, the key files are the prmtop, inpcrd, and mdin files. You can upload these files yourself or you can build them. If you opt for the latter, you need to load the appropriate software packages by running:

module load cuda gcc mvapich2-cuda amber
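
If you choose to build the inputs on the cluster, the tleap program from AmberTools can generate the prmtop and inpcrd files from a structure. The sketch below is purely illustrative: the PDB filename is a placeholder, and the force-field and solvation choices must be adapted to your own system:

# Build topology (prmtop) and coordinates (inpcrd) with tleap
# my_protein.pdb is a placeholder for your own structure file
cat > leap.in << 'EOF'
source leaprc.ff12SB
mol = loadpdb my_protein.pdb
solvateOct mol TIP3PBOX 10.0
addIons mol Na+ 0
saveAmberParm mol prmtop inpcrd
quit
EOF
tleap -f leap.in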

Once your files are either created or uploaded, you’ll need to ensure that the batch script is referencing the correct input files. The relevant parts of the run-amber-your-files.sh file are:

echo "==============================================================="
echo "= Run CPU-only"
echo "==============================================================="

srun -n $NPROCS pmemd.MPI -O -i mdin -o mdout.cpu -p prmtop -inf mdinfo.cpu -c inpcrd -r restrt.cpu -x mdcrd.cpu
grep "ns/day" mdinfo.cpu | tail -n1

and for execution on GPUs:

echo "==============================================================="
echo "= Run GPU-Accelerated"
echo "==============================================================="

srun -n $GPUS_PER_NODE pmemd.cuda.MPI -O -i mdin -o mdout.gpu -p prmtop -inf mdinfo.gpu -c inpcrd -r restrt.gpu -x mdcrd.gpu
grep "ns/day" mdinfo.gpu | tail -n1

The above script assumes that mdin (control data: variables and simulation options), prmtop (topology: the molecular topology and force field parameters), and inpcrd (coordinates: the atom coordinates, velocities, and box dimensions) are the main input files, but you are free to add additional levels of complexity as well. The output files (mdout, mdinfo, restrt, mdcrd) are labeled with the suffixes .cpu and .gpu. The grep command populates the amber-K40.xxxx.output.log output file with the ns/day benchmark times (just as shown in the sample output listed above).
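
For reference, a minimal mdin file for a short explicit-solvent production run might look like the sketch below. The settings are illustrative only (10,000 steps of 2 fs with SHAKE on hydrogen-containing bonds, a Langevin thermostat at 300 K, and an 8 Angstrom cutoff); consult the AMBER manual before using them for real work:

# Write an illustrative mdin control file (values are examples only)
cat > mdin << 'EOF'
Short NVT production run (illustrative settings)
 &cntrl
  imin=0, irest=0, ntx=1,
  nstlim=10000, dt=0.002,
  ntc=2, ntf=2, cut=8.0,
  ntb=1, ntt=3, gamma_ln=2.0,
  tempi=300.0, temp0=300.0,
  ntpr=500, ntwx=500,
 /
EOF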

If you’d like to visualize your results, you will need an SSH client which properly forwards your X-session. You are welcome to contact us if you’re uncertain of this step. Once that’s done, the VMD visualization tool can be accessed by running:

module load vmd
vmd
VMD visualization of a nucleosome

What’s next?

With the right setup (which we’ve already done for you), running AMBER on a GPU cluster isn’t much more difficult than running it on your own workstation. We also make it easy to compare benchmark results between CPUs and GPUs. If you’d like to learn more, contact one of our experts or sign up for a GPU Test Drive today!


Citations for AMBER and AmberTools:

D.A. Case, T.A. Darden, T.E. Cheatham, III, C.L. Simmerling, J. Wang, R.E. Duke, R. Luo, R.C. Walker, W. Zhang, K.M. Merz, B. Roberts, S. Hayik, A. Roitberg, G. Seabra, J. Swails, A.W. Goetz, I. Kolossváry, K.F. Wong, F. Paesani, J. Vanicek, R.M. Wolf, J. Liu, X. Wu, S.R. Brozell, T. Steinbrecher, H. Gohlke, Q. Cai, X. Ye, J. Wang, M.-J. Hsieh, G. Cui, D.R. Roe, D.H. Mathews, M.G. Seetin, R. Salomon-Ferrer, C. Sagui, V. Babin, T. Luchko, S. Gusarov, A. Kovalenko, and P.A. Kollman (2012), AMBER 12, University of California, San Francisco.

PME: Romelia Salomon-Ferrer; Andreas W. Goetz; Duncan Poole; Scott Le Grand; & Ross C. Walker* “Routine microsecond molecular dynamics simulations with AMBER – Part II: Particle Mesh Ewald” , J. Chem. Theory Comput., 2013, 9 (9), pp 3878-3888, DOI: 10.1021/ct400314y

GB: Andreas W. Goetz; Mark J. Williamson; Dong Xu; Duncan Poole; Scott Le Grand; & Ross C. Walker* “Routine microsecond molecular dynamics simulations with AMBER – Part I: Generalized Born”, J. Chem. Theory Comput., (2012), 8 (5), pp 1542-1555, DOI: 10.1021/ct200909j

https://ambermd.org/

Citation for VMD:

Humphrey, W., Dalke, A. and Schulten, K., “VMD – Visual Molecular Dynamics” J. Molec. Graphics 1996, 14.1, 33-38

https://www.ks.uiuc.edu/Research/vmd/
