The debut of NVIDIA’s Kepler architecture in 2012 marked a significant milestone in the evolution of general-purpose GPU computing. In particular, Kepler GK110 (compute capability 3.5) brought unrivaled compute power and introduced a number of new features…
This post is Topic #3 (post 3) in our series Parallel Code: Maximizing your Performance Potential. Many applications contain algorithms which make use of multi-dimensional arrays (or matrices). For cases where threads need to index the higher…
This post is Topic #3 (post 2) in our series Parallel Code: Maximizing your Performance Potential. In my previous post, I provided an introduction to the various types of memory available for use in a CUDA application.…
With the introduction of Intel’s new Xeon E5-2600v2 processors, there are exciting new choices for HPC users. Overall, the Xeon E5-2600 series processors have provided the highest cost-effective HPC performance available to date. This new set of…
NVIDIA’s Tesla K20 GPU is currently the de facto standard for high-performance heterogeneous computing. Based upon the Kepler GK110 architecture, these are the GPUs you want if you’ll be taking advantage of the latest advancements available in…
This post is Topic #3 (part 1) in our series Parallel Code: Maximizing your Performance Potential. CUDA devices have several different memory spaces: Global, local, texture, constant, shared and register memory. Each type of memory on the…
This post is Topic #2 (part 2) in our series Parallel Code: Maximizing your Performance Potential. In my previous post, CUDA Host/Device Transfers and Data Movement, I provided an introduction into the bottlenecks associated with host/device transfers…
This post is Topic #2 (part 1) in our series Parallel Code: Maximizing your Performance Potential. In post #1, I discussed a few ways to optimize the performance of your application via controlling your threads and provided…
This post is Topic #1 in our series Parallel Code: Maximizing your Performance Potential. Regardless of the environment or architecture you are using, one thing is certain: you must properly manage the threads running in your application to…
No matter what the purpose of your application is, one thing is certain. You want to get the most bang for your buck. You see research papers being published and presented making claims of tremendous speed increases…