Comments on: NVIDIA Tesla P100 NVLink 16GB GPU Accelerator (Pascal GP100 SXM2) Up Close
https://www.microway.com/hpc-tech-tips/nvidia-tesla-p100-nvlink-16gb-gpu-accelerator-pascal-gp100-sxm2-close/

By: Eliot Eshelman (Fri, 17 Feb 2017 14:16:51 +0000)
In reply to Plyskeen.

The amount of shared memory per SM is indeed 64 KB on GP100. However, each thread block can use at most 48 KB. That is why deviceQuery reports 49152 bytes: it shows the per-block limit, not the per-SM total.
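If you want to check both figures yourself, here is a minimal sketch using the CUDA runtime API (cudaDeviceGetAttribute with the per-block and per-multiprocessor shared memory attributes); the filename and device index 0 are just placeholders:

// query_shmem.cu -- print the per-block and per-SM shared memory limits.
// Build with: nvcc query_shmem.cu -o query_shmem
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int perBlock = 0, perSM = 0;
    // Per-thread-block limit: the 49152-byte figure deviceQuery prints.
    cudaDeviceGetAttribute(&perBlock, cudaDevAttrMaxSharedMemoryPerBlock, 0);
    // Per-SM total: 65536 bytes (64 KB) on GP100.
    cudaDeviceGetAttribute(&perSM, cudaDevAttrMaxSharedMemoryPerMultiprocessor, 0);
    printf("Shared memory per block: %d bytes\n", perBlock);
    printf("Shared memory per SM:    %d bytes\n", perSM);
    return 0;
}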

We keep a short table of this data here:
https://www.microway.com/knowledge-center-articles/in-depth-comparison-of-nvidia-tesla-pascal-gpu-accelerators/

NVIDIA keeps the full table here:
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#features-and-technical-specifications__technical-specifications-per-compute-capability

By: Plyskeen (Fri, 17 Feb 2017 12:13:21 +0000)

Silly question – I was under the impression that the amount of shared memory available per SM was raised to 64 KiB on the GP100, yet the deviceQuery output you list here says "Total amount of shared memory per block: 49152 bytes" (so 48 KiB, the same as Kepler, if I remember correctly). Which one is correct? Thanks!
