This post is Topic #2 (part 2) in our series Parallel Code: Maximizing your Performance Potential. In my previous post, CUDA Host/Device Transfers and Data Movement, I provided an introduction to the bottlenecks associated with host/device transfers…
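The linked post covers these bottlenecks in depth; as a quick illustration (my own sketch, not code from that post), the snippet below contrasts a pageable host buffer with a page-locked (pinned) one, since pinned memory is the usual first step toward faster host/device copies.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 64 << 20;           // 64 MB payload, arbitrary size for illustration
    float *d_buf = nullptr, *h_pinned = nullptr;
    float *h_pageable = (float *)malloc(bytes);

    cudaMalloc(&d_buf, bytes);               // device-side destination
    cudaMallocHost(&h_pinned, bytes);        // page-locked (pinned) host buffer

    // Pageable transfer: the driver must stage the copy through an internal pinned buffer.
    cudaMemcpy(d_buf, h_pageable, bytes, cudaMemcpyHostToDevice);

    // Pinned transfer: the GPU's DMA engine can read the host buffer directly,
    // which is typically noticeably faster for large copies.
    cudaMemcpy(d_buf, h_pinned, bytes, cudaMemcpyHostToDevice);

    cudaFreeHost(h_pinned);
    cudaFree(d_buf);
    free(h_pageable);
    return 0;
}
```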
This post is Topic #2 (part 1) in our series Parallel Code: Maximizing your Performance Potential. In post #1, I discussed a few ways to optimize the performance of your application by controlling your threads and provided…
This post is Topic #1 in our series Parallel Code: Maximizing your Performance Potential. Regardless of the environment or architecture you are using, one thing is certain: you must properly manage the threads running in your application to…
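As a concrete reference point for the kind of thread management that post discusses, here is a minimal sketch (my own illustration, not code from the post) of the basic CUDA launch configuration: choosing a block size and launching enough blocks to cover the data.

```cuda
#include <cuda_runtime.h>

// Each thread handles one element; the guard protects the final, partially filled block.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));

    // Thread management starts here: pick a block size (a multiple of the
    // 32-thread warp) and launch enough blocks to cover all n elements.
    const int threadsPerBlock = 256;
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    scale<<<blocks, threadsPerBlock>>>(d_data, 2.0f, n);

    cudaDeviceSynchronize();
    cudaFree(d_data);
    return 0;
}
```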
No matter the purpose of your application, one thing is certain: you want to get the most bang for your buck. You see research papers published and presented that claim tremendous speed increases…
This week NVIDIA provided a tutorial outlining first steps for GPU acceleration using OpenACC and CUDA. This was offered as part of the “GPUs Accelerating Research” week at Northeastern University and Boston University. After attending, it seemed…
NVIDIA is now shipping GPUs that deliver 4.58 TFLOPS of single-precision floating-point performance. The Tesla K10 GPU Accelerators, based on the Kepler GK104 architecture, are the first Teslas available from this new generation of products. They are designed for…
This post was last updated on 2018-11-05. Most users know how to check the status of their CPUs, see how much system memory is free, or find out how much disk space is free. In contrast, keeping…
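The full post focuses on the command-line tooling for this; as a small aside (not taken from the post), the same kind of check can also be done programmatically from CUDA code with cudaMemGetInfo, which reports free and total device memory per GPU.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);

    // Report free vs. total memory on each visible GPU: the device-side
    // analogue of checking free system memory or disk space.
    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaSetDevice(dev);
        size_t freeBytes = 0, totalBytes = 0;
        cudaMemGetInfo(&freeBytes, &totalBytes);
        printf("GPU %d: %.1f MiB free of %.1f MiB\n",
               dev, freeBytes / 1048576.0, totalBytes / 1048576.0);
    }
    return 0;
}
```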