qpi Archives - Microway
https://www.microway.com/tag/qpi/
We Speak HPC & AI
Tue, 28 May 2024 04:22:20 +0000

Detailed Specifications of the Intel Xeon E5-2600v4 “Broadwell-EP” Processors
https://www.microway.com/knowledge-center-articles/detailed-specifications-of-the-intel-xeon-e5-2600v4-broadwell-ep-processors/
Thu, 31 Mar 2016 16:30:59 +0000

The post Detailed Specifications of the Intel Xeon E5-2600v4 “Broadwell-EP” Processors appeared first on Microway.

This article provides in-depth discussion and analysis of the 14nm Xeon E5-2600v4 series processors (formerly codenamed “Broadwell-EP”). “Broadwell” processors replace the previous 22nm “Haswell” microarchitecture and are available for sale as of March 31, 2016. For an introduction, read our blog post Intel Xeon E5-2600 v4 “Broadwell” Processor Review. Note: these have since been superseded by the Intel Xeon Processor Scalable Family CPUs.

Important changes available in E5-2600v4 “Broadwell-EP” include:

  • Up to 22 processor cores per socket (with options for 4-, 6-, 8-, 10-, 12-, 14-, 16-, 18-, and 20-cores)
  • Support for DDR4 memory speeds up to 2400MHz
  • Floating Point Instruction performance improvements:
    • Faster floating point multiplier completes operations in 3 cycles (down from 5 cycles)
    • 1024 Radix divider for reduced latency
    • Split Scalar divides for increased parallelism/bandwidth
    • Faster vector Gather
    • As introduced with Haswell, Broadwell continues to support AVX2 and FMA3 instructions for significant speedups of floating-point multiplication and addition operations
  • Extract more parallelism in scheduling micro-operations:
    • Reduced instruction latencies on ADC, CMOV and PCLMULQDQ
    • Larger out-of-order scheduler, with 64 entries (up from 60 entries)
    • Improved address prediction for branches and returns, with an expanded 10-way Branch Prediction Unit Target Array (up from 8-way)
  • Improved performance on large data sets:
    • Larger L2 Translation Lookaside Buffer (TLB), with 1.5K entries (up from 1K entries)
    • A new L2 TLB for 1GB pages (with 16 entries)
    • Addition of a second TLB page miss handler for parallel page walks
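To put the TLB changes in perspective, the memory "reach" of a TLB can be estimated as the number of entries multiplied by the page size. The sketch below is a rough illustration only: the entry counts come from the list above, while the 4KB page size assumed for the main L2 TLB and the helper name `tlb_reach_bytes` are our own assumptions.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def tlb_reach_bytes(entries: int, page_size_bytes: int) -> int:
    """Memory covered by a fully populated TLB: entries x page size."""
    return entries * page_size_bytes


# Entry counts are taken from the article. The 4 KiB page size for the
# main L2 TLB is an assumption (the usual x86-64 default page size).
previous_l2 = tlb_reach_bytes(1024, 4 * KIB)   # "1K entries"
broadwell_l2 = tlb_reach_bytes(1536, 4 * KIB)  # "1.5K entries"
huge_page_l2 = tlb_reach_bytes(16, 1 * GIB)    # 16 entries for 1GB pages

print(f"L2 TLB reach, previous generation: {previous_l2 / MIB:.0f} MiB")
print(f"L2 TLB reach, Broadwell:           {broadwell_l2 / MIB:.0f} MiB")
print(f"1GB-page L2 TLB reach:             {huge_page_l2 / GIB:.0f} GiB")
```

Under these assumptions, the 1GB-page entries are what extend translation coverage to the multi-gigabyte working sets common in HPC.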

With a product this complex, it’s very difficult to cover every aspect of the design. Here, we concentrate primarily on the performance of the processors for HPC applications.

Exceptional Computational Performance

The Xeon E5-2600v4 processors provide the highest performance available to date in a socketed CPU. Many of the higher-end models provide well over 500 GFLOPS (more than half a TFLOPS). Much of this performance is made possible through the use of AVX2 with FMA3 instructions. The plot below compares the peak performance of these CPUs with and without FMA instructions:

Plot of Xeon E5-2600v4 Theoretical Peak Performance (GFLOPS)

The colored bars indicate performance using only AVX instructions; the grey bars indicate theoretical peak performance when using AVX with FMA. Note that only a small set of codes will be capable of issuing almost exclusively FMA instructions (e.g., LINPACK). Most applications will issue a variety of instructions, which will result in lower than peak FLOPS. Expect the achieved performance for well-parallelized & optimized applications to fall between the grey and colored bars.
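The theoretical peak values in a plot like this can be approximated as cores × clock speed × FLOPS per clock. The following is a minimal sketch, not the exact method used to produce the plot: the 8 and 16 FLOPS per clock figures are the ones this article quotes for AVX without and with FMA, while the core count and clock speed passed in are illustrative inputs rather than the specification of any particular model.

```python
FLOPS_PER_CYCLE_AVX = 8       # per core, AVX without FMA (figure quoted in this article)
FLOPS_PER_CYCLE_AVX_FMA = 16  # per core, AVX with FMA (figure quoted in this article)


def peak_gflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in GFLOPS = cores x GHz x FLOPS per clock."""
    return cores * clock_ghz * flops_per_cycle


# Illustrative inputs only: substitute the core count and the AVX clock
# speed of the model being evaluated.
cores, clock_ghz = 22, 2.2

without_fma = peak_gflops(cores, clock_ghz, FLOPS_PER_CYCLE_AVX)
with_fma = peak_gflops(cores, clock_ghz, FLOPS_PER_CYCLE_AVX_FMA)

print(f"AVX only:     {without_fma:.1f} GFLOPS")
print(f"AVX with FMA: {with_fma:.1f} GFLOPS")
```

Because AVX workloads may run below the advertised base frequency, using the AVX clock speed rather than the Non-AVX base gives the more realistic ceiling.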

Intel Xeon E5-2600v4 Series Specifications

The tabs below compare the features and specifications of the new model line. Intel has divided the CPUs into several groups:

  • Standard: cost-effective CPUs with moderate performance
  • Advanced: CPUs offering the highest performance for most applications
  • High Core Count: ideal for highly multi-threaded applications; CPUs providing the highest number of processor cores (sometimes sacrificing clock frequency in favor of core count)
  • Frequency Optimized: ideal for non-parallel/single-threaded applications; CPUs with the highest clock speeds (sacrificing number of cores in order to provide the highest frequencies)

Although these processors introduce significant performance increases, technical readers will see that many of the changes are incremental: increased core counts, improved DDR memory speed, etc. However, processor clock speeds/frequencies have not seen significant improvements.

In fact, in some cases the CPU frequency has been lowered from the previous models. Processor frequency and Turbo Boost behavior have changed fairly significantly in the last two CPU releases (“Haswell” and “Broadwell”). Those metrics are discussed in further detail in the next section.

Clock Speeds & Turbo Boost in Xeon E5-2600v4 series “Broadwell” processors

With each new processor line, Intel introduces new architecture optimizations. The design of the “Broadwell” architecture acknowledges that highly-parallel/vectorized applications place the highest load on the processor cores (requiring more power and thus generating more heat). While a CPU core is executing intensive vector tasks (AVX instructions), the clock speed may be reduced to keep the processor within its power limits (TDP).

In effect, this may result in the processor running at a lower frequency than the “base” clock speed advertised for each model. For that reason, each “Broadwell” processor is assigned two “base” frequencies:

  1. AVX mode: due to the higher power requirements of AVX instructions, clock speeds may be somewhat lower while executing AVX instructions *
  2. Non-AVX mode: while not executing AVX instructions, the processor will operate at what would traditionally be considered the “stock” frequency

* a CPU core will return to Non-AVX mode 1 millisecond after AVX instructions complete

It is worth noting that these modes are isolated to each core. Within a given CPU, some cores may be operating in AVX mode while others are operating in Non-AVX mode. In the previous generation, AVX instructions running on a single core would cause all cores to run in AVX mode.
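The difference between per-core and socket-wide AVX modes can be illustrated with a toy model. This is only a sketch of the behavior described above, not Intel's actual power-management logic: the function names are hypothetical, and the only numbers taken from the article are the 1 millisecond return delay and the per-core versus all-core distinction.

```python
AVX_HOLD_MS = 1.0  # a core returns to Non-AVX mode 1 ms after AVX work completes


def core_in_avx_mode(now_ms, last_avx_ms):
    """True if this core finished AVX instructions less than 1 ms ago."""
    if last_avx_ms is None:  # the core has not executed AVX instructions
        return False
    return (now_ms - last_avx_ms) < AVX_HOLD_MS


def modes_per_core(now_ms, last_avx_by_core):
    """Broadwell-style: each core's mode is decided independently."""
    return [core_in_avx_mode(now_ms, t) for t in last_avx_by_core]


def modes_socket_wide(now_ms, last_avx_by_core):
    """Previous-generation style: one core in AVX mode puts all cores in AVX mode."""
    any_avx = any(modes_per_core(now_ms, last_avx_by_core))
    return [any_avx] * len(last_avx_by_core)


# Four cores: core 0 ran AVX 0.2 ms ago, core 2 ran AVX 5 ms ago,
# cores 1 and 3 have run no AVX instructions.
last_avx = [9.8, None, 5.0, None]
print("per-core:   ", modes_per_core(10.0, last_avx))
print("socket-wide:", modes_socket_wide(10.0, last_avx))
```

In the per-core model only core 0 pays the AVX frequency penalty, which is why mixed workloads benefit from the “Broadwell” behavior.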

AVX and Non-AVX Turbo Boost

Just as in previous architectures, “Broadwell” CPUs include the Turbo Boost feature which allows each processor core to operate well above the “base” clock speed during most operations. The precise clock speed increase depends upon the number & intensity of tasks running on each CPU. However, Turbo Boost speed increases also depend upon the types of instructions (AVX vs. Non-AVX).

The two plots below show that processor clock speeds can be categorized as:

  1. All cores on the CPU actively running Non-AVX instructions
  2. All cores on the CPU actively running AVX instructions
  3. A single active core running Non-AVX instructions (all other cores on the CPU must be idle)
  4. A single active core running AVX instructions (all other cores on the CPU must be idle)

Note that despite the clear rules stated above, each value is still a range of clock speeds. Because workloads are so diverse, Intel is unable to guarantee one specific clock speed for AVX or Non-AVX instructions. Users are guaranteed that cores will run within a specific range, but each application will have to be benchmarked to determine which frequencies a CPU will operate at.
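One simple way to observe which frequencies the cores actually reach is to sample them while the application runs. The sketch below assumes a Linux system that exposes the cpufreq `scaling_cur_freq` files in sysfs (values in kHz); on systems without that interface it simply reports that no data was found. The helper names are our own, and this is a starting point for benchmarking rather than a complete methodology.

```python
import glob
import statistics

# Assumption: Linux with the cpufreq sysfs interface enabled.
SYSFS_PATTERN = "/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_cur_freq"


def read_current_khz(pattern=SYSFS_PATTERN):
    """Return the current frequency (kHz) of every core that reports one."""
    values = []
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as fh:
                values.append(int(fh.read().strip()))
        except (OSError, ValueError):
            continue  # skip cores whose file is unreadable or malformed
    return values


def summarize_mhz(khz_values):
    """Reduce one sample of per-core frequencies to min/mean/max in MHz."""
    mhz = [v / 1000 for v in khz_values]
    return {"min": min(mhz), "mean": statistics.fmean(mhz), "max": max(mhz)}


if __name__ == "__main__":
    sample = read_current_khz()
    if sample:
        print(summarize_mhz(sample))
    else:
        print("No cpufreq data found on this system")
```

Sampling repeatedly during a representative run, once with an AVX-heavy build and once without, shows where in the advertised range a given application lands.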

When examining the differences between AVX and Non-AVX instructions, notice that Non-AVX instructions typically result in no more than a 100MHz to 200MHz increase in the highest clock speed. However, AVX instructions may cause clock speeds to drop by 300MHz to 400MHz if they are particularly intensive.

Recall that AVX2 introduces support for both integer and floating-point instructions, which means any compute-intensive application will be using such instructions (if it has been properly designed and compiled). HPC users should expect their processors to be running in AVX mode most of the time.
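Before tuning a build for these instructions, it can be useful to confirm that the operating system reports them. The following sketch assumes a Linux x86 system, where `/proc/cpuinfo` lists feature flags such as `avx`, `avx2`, `fma` and `f16c`; the function names are hypothetical helpers written for this example.

```python
WANTED_FLAGS = ("avx", "avx2", "fma", "f16c")


def parse_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def report(flags, wanted=WANTED_FLAGS):
    """Map each wanted flag to True/False depending on whether it is present."""
    return {name: name in flags for name in wanted}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as fh:  # Linux-only assumption
            text = fh.read()
    except OSError:
        text = ""
    print(report(parse_flags(text)))
```

The presence of a flag only shows that the hardware supports the instructions; whether an application uses them still depends on how it was written and compiled.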

Top Clock Speeds for Specific Core Counts

When workloads leave some CPU cores idle, the Xeon E5-2600v4 processors are able to use that headroom to increase the clock speed of the cores which are performing work. Just as with other Turbo Boost scenarios, the precise speed increase will depend upon the CPU model. It will also depend upon how many CPU cores are active.

We advise users to consider how many CPU cores their application is able to saturate. The tabs below detail the peak Turbo Boost frequencies for each CPU model, sorted by the number of active cores:

All of the above plots show CPU frequencies for applications utilizing AVX instructions. The colored bars indicate the worst-case scenario – CPUs will run at least this fast. The grey bars indicate the expected clock speeds for most workloads.

Cost-Effectiveness and Power Efficiency of Xeon E5-2600v4 CPUs

The “Broadwell-EP” processors have nearly the same price structure and power requirements as earlier Xeon E5-2600 products, so their cost-effectiveness and power-efficiency should be quite attractive to HPC users. Savvy readers may find the following facts useful:

  • HPC applications run best on the Advanced CPU models; they typically do not scale well on the High-Core-Count models.
  • The High-Core-Count models are more common in Enterprise and Finance – these carry higher prices than other E5-2600 models.
  • The following graphs depict the cost-effectiveness and power-efficiency of only the CPU itself. In many cases, HPC users will find that once they’ve taken the full platform and cluster design into account, the cost-effectiveness of an Advanced CPU may be higher than these plots demonstrate.
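CPU-only cost-effectiveness and power-efficiency figures of this kind are typically ratios of theoretical peak performance to price and to TDP. The sketch below shows the arithmetic under loud assumptions: the two entries are hypothetical placeholders, not real models, prices or TDP values, and 16 FLOPS per clock is the AVX-with-FMA figure quoted in this article.

```python
FLOPS_PER_CYCLE_FMA = 16  # per core, AVX with FMA (figure quoted in this article)


def peak_gflops(cores, clock_ghz):
    """Theoretical peak in GFLOPS for one CPU."""
    return cores * clock_ghz * FLOPS_PER_CYCLE_FMA


def efficiency(cores, clock_ghz, price_usd, tdp_watts):
    """CPU-only metrics: peak GFLOPS per dollar and per watt."""
    gflops = peak_gflops(cores, clock_ghz)
    return {
        "gflops": gflops,
        "gflops_per_dollar": gflops / price_usd,
        "gflops_per_watt": gflops / tdp_watts,
    }


# Hypothetical placeholder values for illustration only.
candidates = {
    "hypothetical-advanced": dict(cores=14, clock_ghz=2.0, price_usd=2000, tdp_watts=120),
    "hypothetical-high-core-count": dict(cores=22, clock_ghz=2.0, price_usd=4000, tdp_watts=145),
}

for name, spec in candidates.items():
    result = efficiency(**spec)
    print(f"{name}: {result['gflops']:.0f} GFLOPS, "
          f"{result['gflops_per_dollar']:.3f} GFLOPS/$, "
          f"{result['gflops_per_watt']:.2f} GFLOPS/W")
```

With placeholder numbers like these, the higher core count part leads on GFLOPS per watt while the cheaper part leads on GFLOPS per dollar, which is the kind of trade-off the plots are meant to expose. Adding chassis, memory and interconnect costs to `price_usd` approximates the full-platform view mentioned above.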

Summary of features in Xeon E5-2600v4 “Broadwell-EP” processors

In addition to the capabilities mentioned at the top of this article, these processors include many of the successful features from earlier Xeon designs. The list below provides a summary of relevant technology features:

  • Up to 22 processor cores per socket (with options for 4-, 6-, 8-, 10-, 12-, 14-, 16-, 18-, and 20-cores)
  • Support for Quad-channel ECC DDR4 memory speeds up to 2400MHz
  • Direct PCI-Express (generation 3.0) connections between each CPU and peripheral devices such as network adapters, GPUs and coprocessors (40 PCI-E lanes per socket)
  • Floating Point Instruction performance improvements:
    • Faster floating point multiplier completes operations in 3 cycles (down from 5 cycles)
    • 1024 Radix divider for reduced latency
    • Split Scalar divides for increased parallelism/bandwidth
    • Faster vector Gather
  • As introduced with “Haswell”, “Broadwell” continues to support Advanced Vector Extensions (AVX 2.0):
    • effectively double the throughput of integer and floating-point operations with math units expanded from 128-bits to 256-bits
    • introduce Fused Multiply Add (FMA3) instructions which allow a multiply and an accumulate instruction to be completed in a single cycle (effectively doubling the FLOPS/clock from 8 to 16 for each core of a CPU)
    • add support for additional instructions, including Gather and vector shift
    • F16C 16-bit Floating-Point conversion instructions accelerate data conversion between 16-bit and 32-bit floating point formats
  • Turbo Boost technology improves performance under peak loads by increasing processor clock speeds. With version 2.0 (introduced in “Sandy Bridge”), clock speeds are boosted more frequently, to higher speeds and for longer periods of time. With “Haswell” and “Broadwell”, top clock speeds depend upon the type of instructions (AVX vs. Non-AVX).
  • Extract more parallelism in scheduling micro-operations:
    • Reduced instruction latencies on ADC, CMOV and PCLMULQDQ
    • Larger out-of-order scheduler, with 64 entries (up from 60 entries)
    • Introduction of the ADCX and ADOX instructions to speed up cryptography
    • Improved address prediction for branches and returns, with an expanded 10-way Branch Prediction Unit Target Array (up from 8-way)
  • Improved performance on large data sets:
    • Larger L2 Translation Lookaside Buffer (TLB), with 1.5K entries (up from 1K entries)
    • A new L2 TLB for 1GB pages (with 16 entries)
    • Addition of a second TLB page miss handler for parallel page walks
  • Dual Quick Path Interconnect (QPI) links between processor sockets improve communication speeds for multi-threaded applications
  • Intel Data Direct I/O Technology increases performance and reduces latency by allowing Intel ethernet controllers and adapters to talk directly with the processor cache
  • Transactional Synchronization Extensions (TSX) improve the parallelism of multi-threaded applications with synchronization locks
  • Introduction of the RDSEED instruction for high-quality, non-deterministic, random seed values
  • Advanced Encryption Standard New Instructions (AES-NI) accelerate encryption and decryption for fast, affordable data protection and security
  • 32-bit & 64-bit Intel Virtualization Technology (VT/VT-x) for Directed I/O (VT-d) and Connectivity (VT-c) deliver faster performance for core virtualization processes and provide built-in hardware support for I/O virtualization.
  • Intel APIC Virtualization (APICv) provides increased virtualization performance
  • Hyper-Threading technology allows two threads to “share” a processor core for improved resource usage. Although useful for some workloads, it is not recommended for HPC applications.
  • Improved energy efficiency with Per Core P-States and independent uncore frequency control
  • Hardware Controlled Power Management for more rapid and efficient decisions on optimal P- and C-State operating point
  • DDR4 CRC provides better memory reliability and data integrity by detecting memory bus faults during write
  • ECRC for PCI-Express provides optional data integrity protection for systems using PCI-Express switches or bridges
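The memory figures in this list translate into a theoretical peak bandwidth per socket of channels × transfer rate × bytes per transfer. The sketch below makes two assumptions that the article does not state explicitly: that the quoted "2400MHz" corresponds to 2400 million transfers per second, and that each DDR4 channel moves 8 bytes (64 bits) of data per transfer.

```python
BYTES_PER_TRANSFER = 8  # assumption: 64-bit data path per DDR4 channel


def peak_bandwidth_gbs(channels: int, megatransfers_per_s: int) -> float:
    """Theoretical peak memory bandwidth per socket in GB/s (10^9 bytes/s)."""
    return channels * megatransfers_per_s * BYTES_PER_TRANSFER / 1000


ddr4_2400 = peak_bandwidth_gbs(4, 2400)  # quad-channel DDR4 at 2400
ddr4_2133 = peak_bandwidth_gbs(4, 2133)  # quad-channel DDR4 at 2133, for comparison

print(f"Quad-channel DDR4-2400: {ddr4_2400:.1f} GB/s per socket")
print(f"Quad-channel DDR4-2133: {ddr4_2133:.1f} GB/s per socket")
```

Achieved bandwidth will be lower than this ceiling and depends on how many DIMMs are populated per channel and the speed they actually run at.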

Detailed Specifications of the Intel Xeon E5-4600 v3 “Haswell-EP” Processors
https://www.microway.com/knowledge-center-articles/detailed-specifications-intel-xeon-e5-4600-v3-haswell-ep-processors/
Mon, 15 Jun 2015 22:12:22 +0000

The post Detailed Specifications of the Intel Xeon E5-4600 v3 “Haswell-EP” Processors appeared first on Microway.

This article provides in-depth discussion and analysis of the 22nm Xeon E5-4600 v3 series processors (formerly codenamed “Haswell-EP”). “Haswell” processors replace the previous 22nm “Ivy Bridge” microarchitecture and are available for sale as of June 1, 2015. For an introduction, read our blog post Xeon E5-4600v3 4-socket CPU Review.

Important changes available in E5-4600 v3 “Haswell-EP” include:

  • Up to 18 processor cores per socket (with options for 4-, 6-, 8-, 10-, 12-, 14- and 16-cores)
  • Support for DDR4 memory speeds up to 2133MHz
  • Advanced Vector Extensions version 2.0 (AVX2 instructions):
    • allow 256-bit wide operations for both integer and floating-point numbers (the older AVX instructions supported only floating-point operations)
    • introduce Fused Multiply Add FMA3 instructions, which allow a multiply and an accumulate instruction to be completed in a single cycle (potentially doubling throughput for floating-point applications – up to 16 FLOPS per cycle)
    • add support for additional instructions, including Gather and vector shift
  • Improved energy efficiency with Per Core P-States and independent uncore frequency control

With a product this complex, it’s very difficult to cover every aspect of the design. Here, we concentrate primarily on the performance of the processors for HPC applications.

Exceptional Computational Performance

The Xeon E5-4600 v3 processors provide some of the highest performance available to date in a socketed CPU (similar to their dual-socket “Haswell-EP” counterparts). For the first time, this architecture offers a single CPU capable of more than half a TeraFLOPS (500 GFLOPS) and total system performance over 2 TFLOPS. This is made possible through the use of AVX2 with FMA3 instructions. The plot below compares the peak performance of a single CPU with and without FMA instructions:
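As a rough check on the half-TeraFLOPS and 2 TFLOPS figures, one can ask what clock speed a CPU needs to reach a given theoretical peak, and how per-socket peak scales to a 4-socket system. The sketch below assumes 16 FLOPS per clock per core (the AVX2 with FMA3 figure quoted in this article); the function names are ours, and the results are theoretical ceilings rather than measured performance.

```python
FLOPS_PER_CYCLE_FMA = 16  # per core, AVX2 with FMA3 (figure quoted in this article)


def min_clock_ghz(target_gflops, cores, flops_per_cycle=FLOPS_PER_CYCLE_FMA):
    """Clock speed (GHz) at which `cores` cores reach `target_gflops` peak."""
    return target_gflops / (cores * flops_per_cycle)


def system_gflops(sockets, gflops_per_socket):
    """Theoretical peak of a multi-socket system, assuming identical CPUs."""
    return sockets * gflops_per_socket


needed = min_clock_ghz(500, cores=18)
print(f"An 18-core CPU needs about {needed:.2f} GHz to exceed 500 GFLOPS")
print(f"Four 500-GFLOPS sockets: {system_gflops(4, 500) / 1000:.1f} TFLOPS")
```

The clock speed that matters here is the one sustained while running AVX instructions, which may be lower than the advertised base frequency.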

Chart of Xeon E5-4600 v3 Theoretical Peak Performance in GigaFLOPS

The colored bars indicate performance using only AVX instructions; the grey bars indicate theoretical peak performance when using AVX with FMA. Note that only a small set of codes will be capable of issuing almost exclusively FMA instructions (e.g., LINPACK). Most applications will issue a variety of instructions, which will result in lower than peak FLOPS. Expect the achieved performance for well-parallelized & optimized applications to fall between the grey and colored bars.

Intel Xeon E5-4600 v3 Series Specifications

The tabs below compare the features and specifications of the new model line. Intel has divided the CPUs into several groups:

  • Standard: cost-effective CPUs with moderate performance
  • Advanced: CPUs offering the highest performance for most applications
  • High Core Count: ideal for well-parallelized applications; CPUs providing the highest number of processor cores (sometimes sacrificing clock frequency in favor of core count)
  • Frequency Optimized: ideal for non-parallel/single-threaded applications; CPUs with the highest clock speeds (sacrificing number of cores in order to provide the highest frequencies)

Although these processors introduce significant performance increases, technical readers will see that many of the changes are incremental: increased core counts, improved DDR memory speed, etc. However, processor clock speeds/frequencies have not seen significant improvements.

In fact, in some cases the CPU frequency has been lowered from the previous models. Processor frequency and Turbo Boost behavior have changed significantly with this release. Those metrics are discussed in further detail in the next section.

Clock Speeds & Turbo Boost in Xeon E5-4600 v3 series “Haswell” processors

With each new processor line, Intel introduces new architecture optimizations. The design of the “Haswell” architecture acknowledges that highly-parallel/vectorized applications place the highest load on the processor cores (requiring more power and thus generating more heat). While a CPU core is executing intensive vector tasks (AVX instructions), the clock speed may be reduced to keep the processor within its power limits (TDP).

In effect, this may result in the processor running at a lower frequency than the “base” clock speed advertised for each model. For that reason, each “Haswell” processor model is assigned two “base” frequencies:

  1. AVX mode: due to the higher power requirements of AVX instructions, clock speeds may be somewhat lower while executing AVX instructions *
  2. Non-AVX mode: while not executing AVX instructions, the processor will operate at what would traditionally be considered the “stock” frequency

* a CPU core will return to Non-AVX mode 1 millisecond after AVX instructions complete

AVX and Non-AVX Turbo Boost

Just as in previous architectures, “Haswell” CPUs include the Turbo Boost feature which causes each processor core to operate well above the “base” clock speed during most operations. The precise clock speed increase depends upon the number & intensity of tasks running on each CPU. With the “Haswell” architecture, Turbo Boost speed increases also depend upon the types of instructions (AVX vs. Non-AVX).

The two plots below show that processor clock speeds can be categorized as:

  1. All cores on the CPU actively running Non-AVX instructions
  2. All cores on the CPU actively running AVX instructions
  3. A single active core running Non-AVX instructions (all other cores on the CPU must be idle)
  4. A single active core running AVX instructions (all other cores on the CPU must be idle)

Note that despite the clear rules stated above, each value is still a range of clock speeds. Because workloads are so diverse, Intel is unable to guarantee one specific clock speed for AVX or Non-AVX instructions. Users are guaranteed that cores will run within a specific range, but each application will have to be benchmarked to determine which frequencies a CPU will operate at.

When examining the differences between AVX and Non-AVX instructions, notice that Non-AVX instructions do not result in dramatically higher Turbo Boost speeds. With the exception of the E5-4620 v3, none of the grey bars rises any higher than the colored bars. Thus, for most CPUs the maximum possible Turbo Boost speed is the same when using AVX and Non-AVX instructions. However, heavy usage of AVX instructions may reduce the clock speed by as much as 300MHz.

Recall that AVX2 introduces support for both integer and floating-point instructions, which means any compute-intensive application will be using such instructions (if it has been properly designed and compiled). HPC users should expect their processors to be running in AVX mode most of the time.

Of course, it is worth remembering that the usage of AVX instructions can result in as much as a 100% increase in performance. It is much better to leverage AVX instructions – gaining the 100% increase in instruction throughput and suffering the small 5% to 15% CPU clock speed penalty. It would be unwise to turn off AVX with the expectation that overall performance would increase.
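The trade-off above can be worked through numerically: the net gain is the throughput multiplier reduced by the clock speed penalty. This is a simplified best-case model that assumes performance scales linearly with both instruction throughput and clock speed, using the 100% and 5% to 15% figures from the paragraph above.

```python
def net_speedup(throughput_gain: float, clock_penalty: float) -> float:
    """Net speedup = (1 + throughput gain) x (1 - clock speed penalty)."""
    return (1.0 + throughput_gain) * (1.0 - clock_penalty)


best_case = net_speedup(1.00, 0.05)   # 100% more throughput, 5% lower clock
worst_case = net_speedup(1.00, 0.15)  # 100% more throughput, 15% lower clock

print(f"Net speedup with a 5% clock penalty:  {best_case:.2f}x")
print(f"Net speedup with a 15% clock penalty: {worst_case:.2f}x")
```

Even at the larger penalty, the model still shows a substantial net gain, which is why disabling AVX is unlikely to help.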

Top Clock Speeds for Specific Core Counts

When workloads leave some CPU cores idle, the Xeon E5-4600 v3 processors are able to use that headroom to increase the clock speed of the cores which are performing work. Just as with other Turbo Boost scenarios, the precise speed increase will depend upon the CPU model. It will also depend upon how many CPU cores are active.

We advise users to consider how many CPU cores their application is able to saturate. The tabs below detail the peak Turbo Boost frequencies for each CPU model, sorted by the number of active cores:

All of the above plots show CPU frequencies for applications utilizing AVX instructions. The colored bars indicate the worst-case scenario – CPUs will run at least this fast. The grey bars indicate the expected clock speeds for most workloads.

Cost-Effectiveness and Power Efficiency of Xeon E5-4600 v3 CPUs

The “Haswell-EP” processors have nearly the same price structure and power requirements as earlier Xeon E5-4600 products, so their cost-effectiveness and power-efficiency should be quite attractive to HPC users. Savvy readers may find the following facts useful:

  • The Xeon E5-4627 v3 CPUs are typically optimized for HPC workloads. Additionally, they feature pricing attractive to HPC groups.
  • The power requirement (TDP) for each model has increased by 5 Watts over the previous generation. This is due to integration of the Voltage Regulator Modules (VRMs) which were previously placed on the motherboard. Thus, CPU TDP increases 5W and motherboard TDP decreases 5W.
  • The following graphs depict the cost-effectiveness and power-efficiency of only the CPU itself. In many cases, HPC users will find that once they’ve taken the full platform and cluster design into account, the cost-effectiveness of a higher core count CPU may be more beneficial than these plots demonstrate.

Summary of features in Xeon E5-4600 v3 “Haswell-EP” processors

In addition to the capabilities mentioned at the top of this article, these processors include many of the successful features from earlier Xeon designs. The list below provides a summary of relevant technology features:

  • Up to 18 processor cores per socket (with options for 4-, 6-, 8-, 10-, 12-, 14- and 16-cores)
  • Support for Quad-channel ECC DDR4 memory speeds up to 2133MHz
  • Direct PCI-Express (generation 3.0) connections between each CPU and peripheral devices such as network adapters, GPUs and coprocessors (40 PCI-E lanes per socket)
  • Advanced Vector Extensions (AVX 2.0):
    • effectively double the throughput of integer and floating-point operations with math units expanded from 128-bits to 256-bits
    • introduce Fused Multiply Add (FMA3) instructions which allow a multiply and an accumulate instruction to be completed in a single cycle (effectively doubling the FLOPS/clock from 8 to 16 for each core of a CPU)
    • add support for additional instructions, including Gather and vector shift
    • F16C 16-bit Floating-Point conversion instructions accelerate data conversion between 16-bit and 32-bit floating point formats
  • Turbo Boost technology improves performance under peak loads by increasing processor clock speeds. With version 2.0 (introduced in “Sandy Bridge”), clock speeds are boosted more frequently, to higher speeds and for longer periods of time. With “Haswell”, top clock speeds depend upon the type of instructions (AVX vs. Non-AVX).
  • Faster Quick Path Interconnect (QPI) links between processor sockets improve communication speeds for multi-threaded applications
  • Improved energy efficiency with Per Core P-States and independent uncore frequency control
  • Intel Data Direct I/O Technology increases performance and reduces latency by allowing Intel ethernet controllers and adapters to talk directly with the processor cache
  • Advanced Encryption Standard New Instructions (AES-NI) accelerate encryption and decryption for fast, affordable data protection and security
  • 32-bit & 64-bit Intel Virtualization Technology (VT/VT-x) for Directed I/O (VT-d) and Connectivity (VT-c) deliver faster performance for core virtualization processes and provide built-in hardware support for I/O virtualization.
  • Intel APIC Virtualization (APICv) provides increased virtualization performance
  • Hyper-Threading technology allows two threads to “share” a processor core for improved resource usage. Although useful for some workloads, it is not recommended for HPC applications.

Detailed Specifications of the Intel Xeon E5-2600v3 “Haswell-EP” Processors
https://www.microway.com/knowledge-center-articles/detailed-specifications-intel-xeon-e5-2600v3-haswell-ep-processors/
Mon, 08 Sep 2014 17:00:21 +0000

The post Detailed Specifications of the Intel Xeon E5-2600v3 “Haswell-EP” Processors appeared first on Microway.

This article provides in-depth discussion and analysis of the 22nm Xeon E5-2600v3 series processors (formerly codenamed “Haswell-EP”). “Haswell” processors replace the previous 22nm “Ivy Bridge” microarchitecture and are available for sale as of September 8, 2014. Note: these have since been superseded by Xeon E5-2600v4 Broadwell-EP Processors.

Important changes available in E5-2600v3 “Haswell-EP” include:

  • Up to 18 processor cores per socket (with options for 4-, 6-, 8-, 10-, 12-, 14- and 16-cores)
  • Support for DDR4 memory speeds up to 2133MHz
  • Advanced Vector Extensions version 2.0 (AVX2 instructions):
    • allow 256-bit wide operations for both integer and floating-point numbers (the older AVX instructions supported only floating-point operations)
    • introduce Fused Multiply Add FMA3 instructions, which allow a multiply and an accumulate instruction to be completed in a single cycle (potentially doubling throughput for floating-point applications – up to 16 FLOPS per cycle)
    • add support for additional instructions, including Gather and vector shift
  • Improved energy efficiency with Per Core P-States and independent uncore frequency control

With a product this complex, it’s very difficult to cover every aspect of the design. Here, we concentrate primarily on the performance of the processors for HPC applications.

Exceptional Computational Performance

The Xeon E5-2600v3 processors introduce the highest performance available to date in a socketed CPU. For the first time, a single CPU is capable of more than half a TeraFLOPS (500 GFLOPS). This is made possible through the use of AVX2 with FMA3 instructions. The plot below compares the peak performance of these CPUs with and without FMA instructions:

Plot of Xeon E5-2600v3 Theoretical Peak Performance (GFLOPS)

The colored bars indicate performance using only AVX instructions; the grey bars indicate theoretical peak performance when using AVX with FMA. Note that only a small set of codes will be capable of issuing almost exclusively FMA instructions (e.g., LINPACK). Most applications will issue a variety of instructions, which will result in lower than peak FLOPS. Expect the achieved performance for well-parallelized & optimized applications to fall between the grey and colored bars.

Intel Xeon E5-2600v3 Series Specifications

The tabs below compare the features and specifications of the new model line. Intel has divided the CPUs into several groups:

  • Standard: cost-effective CPUs with moderate performance
  • Advanced: CPUs offering the highest performance for most applications
  • High Core Count: ideal for well-parallelized applications; CPUs providing the highest number of processor cores (sometimes sacrificing clock frequency in favor of core count)
  • Frequency Optimized: ideal for non-parallel/single-threaded applications; CPUs with the highest clock speeds (sacrificing number of cores in order to provide the highest frequencies)

Although these processors introduce significant performance increases, technical readers will see that many of the changes are incremental: increased core counts, improved DDR memory speed, etc. However, processor clock speeds/frequencies have not seen significant improvements.

In fact, in some cases the CPU frequency has been lowered from the previous models. Processor frequency and Turbo Boost behavior have changed significantly with this release. Those metrics are discussed in further detail in the next section.

Clock Speeds & Turbo Boost in Xeon E5-2600v3 series “Haswell” processors

With each new processor line, Intel introduces new architecture optimizations. The design of the “Haswell” architecture acknowledges that highly-parallel/vectorized applications place the highest load on the processor cores (requiring more power and thus generating more heat). While a CPU core is executing intensive vector tasks (AVX instructions), the clock speed may be reduced to keep the processor within its power limits (TDP).

In effect, this may result in the processor running at a lower frequency than the “base” clock speed advertised for each model. For that reason, each “Haswell” processor model is assigned two “base” frequencies:

  1. AVX mode: due to the higher power requirements of AVX instructions, clock speeds may be somewhat lower while executing AVX instructions *
  2. Non-AVX mode: while not executing AVX instructions, the processor will operate at what would traditionally be considered the “stock” frequency

* a CPU core will return to Non-AVX mode 1 millisecond after AVX instructions complete

AVX and Non-AVX Turbo Boost

Just as in previous architectures, “Haswell” CPUs include the Turbo Boost feature which causes each processor core to operate well above the “base” clock speed during most operations. The precise clock speed increase depends upon the number & intensity of tasks running on each CPU. With the “Haswell” architecture, Turbo Boost speed increases also depend upon the types of instructions (AVX vs. Non-AVX).

The two plots below show that processor clock speeds can be categorized as:

  1. All cores on the CPU actively running Non-AVX instructions
  2. All cores on the CPU actively running AVX instructions
  3. A single active core running Non-AVX instructions (all other cores on the CPU must be idle)
  4. A single active core running AVX instructions (all other cores on the CPU must be idle)

Note that despite the clear rules stated above, each value is still a range of clock speeds. Because workloads are so diverse, Intel is unable to guarantee one specific clock speed for AVX or Non-AVX instructions. Users are guaranteed that cores will run within a specific range, but each application will have to be benchmarked to determine which frequencies a CPU will operate at.

When examining the differences between AVX and Non-AVX instructions, notice that Non-AVX instructions typically result in no more than a 100MHz to 200MHz increase in the highest clock speed. However, AVX instructions may cause clock speeds to drop by 300MHz to 400MHz if they are particularly intensive.

Recall that AVX2 introduces support for both integer and floating-point instructions, which means any compute-intensive application will be using such instructions (if it has been properly designed and compiled). HPC users should expect their processors to be running in AVX mode most of the time.

Top Clock Speeds for Specific Core Counts

When workloads leave some CPU cores idle, the Xeon E5-2600v3 processors are able to use that headroom to increase the clock speed of the cores which are performing work. Just as with other Turbo Boost scenarios, the precise speed increase will depend upon the CPU model. It will also depend upon how many CPU cores are active.

We advise users to consider how many CPU cores their application is able to saturate. The tabs below detail the peak Turbo Boost frequencies for each CPU model, sorted by the number of active cores:

All of the above plots show CPU frequencies for applications utilizing AVX instructions. The colored bars indicate the worst-case scenario – CPUs will run at least this fast. The grey bars indicate the expected clock speeds for most workloads.

Cost-Effectiveness and Power Efficiency of Xeon E5-2600v3 CPUs

The “Haswell-EP” processors have nearly the same price structure and power requirements as earlier Xeon E5-2600 products, so their cost-effectiveness and power-efficiency should be quite attractive to HPC users. Savvy readers may find the following facts useful:

  • Although v3 Xeons follow the same price steps as their v2 counterparts, three High-Core-Count models were late additions. These models are higher performing and carry higher prices than previous E5-2600 models.
  • The power requirement (TDP) for each model has increased by 5 Watts over the previous generation. This is due to integration of the Voltage Regulator Modules (VRMs) which were previously placed on the motherboard. Thus, CPU TDP increases 5W and motherboard TDP decreases 5W.
  • The following graphs depict the cost-effectiveness and power-efficiency of only the CPU itself. In many cases, HPC users will find that once they’ve taken the full platform and cluster design into account, the cost-effectiveness of a higher core count CPU may be more beneficial than these plots demonstrate.

Summary of features in Xeon E5-2600v3 “Haswell-EP” processors

In addition to the capabilities mentioned at the top of this article, these processors include many of the successful features from earlier Xeon designs. The list below provides a summary of relevant technology features:

  • Up to 18 processor cores per socket (with options for 4-, 6-, 8-, 10-, 12-, 14- and 16-cores)
  • Support for Quad-channel ECC DDR4 memory speeds up to 2133MHz
  • Direct PCI-Express (generation 3.0) connections between each CPU and peripheral devices such as network adapters, GPUs and coprocessors (40 PCI-E lanes per socket)
  • Advanced Vector Extensions (AVX 2.0):
    • effectively double the throughput of integer and floating-point operations with math units expanded from 128-bits to 256-bits
    • introduce Fused Multiply Add (FMA3) instructions which combine a multiply and an add into a single instruction (effectively doubling the FLOPS/clock from 8 to 16 for each core of a CPU)
    • add support for additional instructions, including Gather and vector shift
    • F16C 16-bit Floating-Point conversion instructions accelerate data conversion between 16-bit and 32-bit floating point formats
  • Turbo Boost technology improves performance under peak loads by increasing processor clock speeds. With version 2.0 (introduced in “Sandy Bridge”), clock speeds are boosted more frequently, to higher speeds and for longer periods of time. With “Haswell”, top clock speeds depend upon the type of instructions (AVX vs. Non-AVX).
  • Dual Quick Path Interconnect (QPI) links between processor sockets improve communication speeds for multi-threaded applications
  • Improved energy efficiency with Per Core P-States and independent uncore frequency control
  • Intel Data Direct I/O Technology increases performance and reduces latency by allowing Intel Ethernet controllers and adapters to talk directly with the processor cache
  • Advanced Encryption Standard New Instructions (AES-NI) accelerate encryption and decryption for fast, affordable data protection and security
  • 32-bit & 64-bit Intel Virtualization Technology (VT/VT-x) for Directed I/O (VT-d) and Connectivity (VT-c) deliver faster performance for core virtualization processes and provide built-in hardware support for I/O virtualization.
  • Intel APIC Virtualization (APICv) provides increased virtualization performance
  • Hyper-Threading technology allows two threads to “share” a processor core for improved resource usage. Although useful for some workloads, it is not recommended for HPC applications.

The post Detailed Specifications of the Intel Xeon E5-2600v3 “Haswell-EP” Processors appeared first on Microway.

]]>
https://www.microway.com/knowledge-center-articles/detailed-specifications-intel-xeon-e5-2600v3-haswell-ep-processors/feed/ 0
In-Depth Comparison of Intel Xeon E5-4600v2 “Ivy Bridge” Processors https://www.microway.com/knowledge-center-articles/depth-comparison-intel-xeon-e5-4600v2-ivy-bridge-processors/ https://www.microway.com/knowledge-center-articles/depth-comparison-intel-xeon-e5-4600v2-ivy-bridge-processors/#respond Wed, 05 Mar 2014 16:18:50 +0000 http://https://www.microway.com/?post_type=incsub_wiki&p=3628 This article provides in-depth discussion and analysis of the 22nm Xeon E5-4600v2 series processors (formerly codenamed “Ivy Bridge”). These “Ivy Bridge” processors improve upon the previous 32nm “Sandy Bridge” microarchitecture and are available for sale as of March 3, 2014. For an introduction, read our blog post reviewing E5-4600v2. Important changes available in E5-4600v2 “Ivy […]

The post In-Depth Comparison of Intel Xeon E5-4600v2 “Ivy Bridge” Processors appeared first on Microway.

]]>
This article provides in-depth discussion and analysis of the 22nm Xeon E5-4600v2 series processors (formerly codenamed “Ivy Bridge”). These “Ivy Bridge” processors improve upon the previous 32nm “Sandy Bridge” microarchitecture and are available for sale as of March 3, 2014. For an introduction, read our blog post reviewing E5-4600v2.

Important changes available in E5-4600v2 “Ivy Bridge” include:

  • Up to 12 processor cores per socket (with options for 4-, 6-, 8- and 10-cores)
  • Support for DDR3 memory speeds up to 1866MHz
  • Improved PCI-Express generation 3.0 support with improved compatibility and new features: atomics, x16 non-transparent bridge & quadrupled read buffers for P2P transfers
  • AVX has been extended to support F16C (16-bit Floating-Point conversion instructions) to accelerate data conversion between 16-bit and 32-bit floating point formats
  • Intel APIC Virtualization (APICv) provides increased virtualization performance

With a product this complex, it’s very difficult to cover every aspect of the design. Here, we concentrate primarily on the performance of the processors for HPC applications.

Intel Xeon E5-4600v2 Series Specifications

Intel Turbo Boost in Xeon E5-4600v2 series “Ivy Bridge” processors

Summary of features in Xeon E5-4600v2 “Ivy Bridge” processors

In addition to the capabilities mentioned at the top of this article, these processors include many of the successful features from earlier Xeon designs. The list below provides a summary of relevant technology features:

  • Up to 12 processor cores per socket (with options for 4-, 6-, 8- and 10-cores)
  • Support for Quad-channel ECC DDR3 memory speeds up to 1866MHz
  • Direct PCI-Express (generation 3.0) connections between each CPU and peripheral devices such as network adapters, GPUs and coprocessors (40 PCI-E lanes per socket). Improved PCI-Express generation 3.0 support with improved compatibility and new features: atomics, x16 non-transparent bridge & quadrupled read buffers for P2P transfers
  • Advanced Vector Extensions (AVX) accelerate floating point operations used in HPC & technical computing applications. This technology expands the math unit from 128-bits to 256-bits, effectively doubling throughput. AVX has been extended to support F16C (16-bit Floating-Point conversion instructions) to accelerate data conversion between 16-bit and 32-bit floating point formats
  • Turbo Boost technology improves performance under peak loads by increasing processor clock speeds. With version 2.0 (introduced in “Sandy Bridge”), clock speeds are boosted more frequently, to higher speeds and for longer periods of time.
  • Dual Quick Path Interconnect (QPI) links between processor sockets improve communication speeds for multi-threaded applications
  • Intel Intelligent Power Technology reduces individual idling cores to near-zero power. Power gates adjust processors and memory to the lowest available power state to meet workload requirements without impacting performance.
  • Intel Data Direct I/O Technology increases performance and reduces latency by allowing Intel Ethernet controllers and adapters to talk directly with the processor cache
  • Advanced Encryption Standard New Instructions (AES-NI) accelerate encryption and decryption for fast, affordable data protection and security
  • 32-bit & 64-bit Intel Virtualization Technology (VT/VT-x) for Directed I/O (VT-d) and Connectivity (VT-c) deliver faster performance for core virtualization processes and provide built-in hardware support for I/O virtualization.
  • Intel APIC Virtualization (APICv) provides increased virtualization performance
  • Hyper-Threading technology allows two threads to “share” a processor core for improved resource usage. Although useful for some workloads, it is not recommended for HPC applications.

More information is available in Intel’s Xeon E5-4600v2 Product Brief.


]]>
https://www.microway.com/knowledge-center-articles/depth-comparison-intel-xeon-e5-4600v2-ivy-bridge-processors/feed/ 0
In-Depth Comparison of Intel Xeon E5-2600v2 “Ivy Bridge” Processors https://www.microway.com/knowledge-center-articles/in-depth-comparison-and-analysis-intel-xeon-e5-2600v2-ivy-bridge-processor/ https://www.microway.com/knowledge-center-articles/in-depth-comparison-and-analysis-intel-xeon-e5-2600v2-ivy-bridge-processor/#respond Tue, 10 Sep 2013 16:01:56 +0000 http://https://www.microway.com/?post_type=incsub_wiki&p=3056 This article provides in-depth discussion and analysis of the 22nm Xeon E5-2600v2 series processors (formerly codenamed “Ivy Bridge”). “Ivy Bridge” processors improve upon the previous 32nm “Sandy Bridge” microarchitecture and are available for sale as of September 10, 2013. For an introduction, read our blog post Intel Xeon E5-2600v2 “Ivy Bridge” Processor Review Important changes […]

The post In-Depth Comparison of Intel Xeon E5-2600v2 “Ivy Bridge” Processors appeared first on Microway.

]]>
This article provides in-depth discussion and analysis of the 22nm Xeon E5-2600v2 series processors (formerly codenamed “Ivy Bridge”). “Ivy Bridge” processors improve upon the previous 32nm “Sandy Bridge” microarchitecture and are available for sale as of September 10, 2013. For an introduction, read our blog post Intel Xeon E5-2600v2 “Ivy Bridge” Processor Review.

Important changes available in E5-2600v2 “Ivy Bridge” include:

  • Up to 12 processor cores per socket (with options for 4-, 6-, 8- and 10-cores)
  • Support for DDR3 memory speeds up to 1866MHz
  • Improved PCI-Express generation 3.0 support with improved compatibility and new features: atomics, x16 non-transparent bridge & quadrupled read buffers for P2P transfers
  • AVX has been extended to support F16C (16-bit Floating-Point conversion instructions) to accelerate data conversion between 16-bit and 32-bit floating point formats
  • Intel APIC Virtualization (APICv) provides increased virtualization performance

With a product this complex, it’s very difficult to cover every aspect of the design. Here, we concentrate primarily on the performance of the processors for HPC applications.

Intel Xeon E5-2600v2 Series Specifications

Intel Turbo Boost in Xeon E5-2600v2 series “Ivy Bridge” processors

Summary of features in Xeon E5-2600v2 “Ivy Bridge” processors

In addition to the capabilities mentioned at the top of this article, these processors include many of the successful features from earlier Xeon designs. The list below provides a summary of relevant technology features:

  • Up to 12 processor cores per socket (with options for 4-, 6-, 8- and 10-cores)
  • Support for Quad-channel ECC DDR3 memory speeds up to 1866MHz
  • Direct PCI-Express (generation 3.0) connections between each CPU and peripheral devices such as network adapters, GPUs and coprocessors (40 PCI-E lanes per socket). Improved PCI-Express generation 3.0 support with improved compatibility and new features: atomics, x16 non-transparent bridge & quadrupled read buffers for P2P transfers
  • Advanced Vector Extensions (AVX) accelerate floating point operations used in HPC & technical computing applications. This technology expands the math unit from 128-bits to 256-bits, effectively doubling throughput. AVX has been extended to support F16C (16-bit Floating-Point conversion instructions) to accelerate data conversion between 16-bit and 32-bit floating point formats
  • Turbo Boost technology improves performance under peak loads by increasing processor clock speeds. With version 2.0 (introduced in “Sandy Bridge”), clock speeds are boosted more frequently, to higher speeds and for longer periods of time.
  • Dual Quick Path Interconnect (QPI) links between processor sockets improve communication speeds for multi-threaded applications
  • Intel Intelligent Power Technology reduces individual idling cores to near-zero power. Power gates adjust processors and memory to the lowest available power state to meet workload requirements without impacting performance.
  • Intel Data Direct I/O Technology increases performance and reduces latency by allowing Intel Ethernet controllers and adapters to talk directly with the processor cache
  • Advanced Encryption Standard New Instructions (AES-NI) accelerate encryption and decryption for fast, affordable data protection and security
  • 32-bit & 64-bit Intel Virtualization Technology (VT/VT-x) for Directed I/O (VT-d) and Connectivity (VT-c) deliver faster performance for core virtualization processes and provide built-in hardware support for I/O virtualization.
  • Intel APIC Virtualization (APICv) provides increased virtualization performance
  • Hyper-Threading technology allows two threads to “share” a processor core for improved resource usage. Although useful for some workloads, it is not recommended for HPC applications.


]]>
https://www.microway.com/knowledge-center-articles/in-depth-comparison-and-analysis-intel-xeon-e5-2600v2-ivy-bridge-processor/feed/ 0
Performance Characteristics of Common Transports and Buses https://www.microway.com/knowledge-center-articles/performance-characteristics-of-common-transports-buses/ https://www.microway.com/knowledge-center-articles/performance-characteristics-of-common-transports-buses/#respond Fri, 19 Jul 2013 16:45:25 +0000 http://https://www.microway.com/?post_type=incsub_wiki&p=1817 Memory The following values are measured per CPU socket. They must be doubled or quadrupled to calculate the total memory bandwidth of a multiprocessor workstation or server. For dual-processor systems, multiply by two. For quad-processor systems, multiply by four. Type # Channels Theoretical Bandwidth (unidirectional) Typical Bandwidth(in Practice) DDR4 3200MHz Eight-Channel 204.8 GB/s 171.5 GB/s […]

The post Performance Characteristics of Common Transports and Buses appeared first on Microway.

]]>
Memory

The following values are measured per CPU socket. They must be doubled or quadrupled to calculate the total memory bandwidth of a multiprocessor workstation or server. For dual-processor systems, multiply by two. For quad-processor systems, multiply by four.

Type | # Channels | Theoretical Bandwidth (unidirectional) | Typical Bandwidth (in Practice)
DDR4 3200MHz | Eight-Channel | 204.8 GB/s | 171.5 GB/s
DDR4 2933MHz | Six-Channel | 140.8 GB/s | 98 GB/s
DDR4 2666MHz | Six-Channel | 128 GB/s | 90 GB/s
DDR4 2400MHz | Quad-Channel | 76.8 GB/s | 64 GB/s
DDR4 2133MHz | Quad-Channel | 68.2 GB/s | 55.5 GB/s
DDR3 1866MHz | Quad-Channel | 59.7 GB/s | 42.8 GB/s
DDR3 1600MHz | Quad-Channel | 51.2 GB/s |
DDR3 1333MHz | Quad-Channel | 42.7 GB/s |
DDR3 1066MHz | Quad-Channel | 34.1 GB/s |
DDR3 1333MHz | Triple-Channel | 32.0 GB/s |
DDR3 1066MHz | Triple-Channel | 25.6 GB/s |
DDR3 800MHz | Triple-Channel | 19.2 GB/s |
DDR3 1866MHz | Dual-Channel | 29.9 GB/s |
DDR3 1600MHz | Dual-Channel | 25.6 GB/s |
DDR3 1333MHz | Dual-Channel | 21.3 GB/s |
DDR3 1066MHz | Dual-Channel | 17.0 GB/s |

Theoretical memory bandwidths are calculated with: 64 bits/transfer * DDR transfers/s * number of memory channels
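The formula above can be sketched in a few lines of Python. This is a minimal illustration, assuming that the "MHz" figures in the table are transfer rates in millions of transfers per second and that the result is converted from bits to bytes. The function and parameter names are hypothetical; the optional `sockets` argument applies the per-socket multiplication described at the top of this section.

```python
def memory_bandwidth_gbs(transfer_rate_mts: float, channels: int, sockets: int = 1) -> float:
    """Theoretical memory bandwidth in GB/s.

    transfer_rate_mts: DDR transfer rate in millions of transfers per second
                       (assumed to match the "MHz" label in the table).
    channels:          memory channels per CPU socket.
    sockets:           CPU sockets in the system.
    """
    bytes_per_transfer = 64 / 8  # 64 bits per transfer on each channel
    bytes_per_second = bytes_per_transfer * transfer_rate_mts * 1e6 * channels * sockets
    return bytes_per_second / 1e9


per_socket = memory_bandwidth_gbs(2400, channels=4)
dual_socket = memory_bandwidth_gbs(2400, channels=4, sockets=2)
print(f"DDR4 2400MHz quad-channel, one socket:  {per_socket:.1f} GB/s")
print(f"DDR4 2400MHz quad-channel, two sockets: {dual_socket:.1f} GB/s")
```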


PCI-Express

PCI-E Generation | Lanes | Theoretical Bandwidth (unidirectional) | Typical Bandwidth (in Practice)
Gen 1 | x4 | 1,000 MB/s | 880 MB/s
Gen 1 | x8 | 2,000 MB/s | 1,760 MB/s
Gen 1 | x16 | 4,000 MB/s | 3,520 MB/s
Gen 2 | x4 | 2,000 MB/s | 1,600 MB/s
Gen 2 | x8 | 4,000 MB/s | 3,200 MB/s
Gen 2 | x16 | 8,000 MB/s | 6,400 MB/s
Gen 3 | x4 | 4,000 MB/s | 2,800 MB/s
Gen 3 | x8 | 8,000 MB/s | 5,600 MB/s
Gen 3 | x16 | 16,000 MB/s | 12,100 MB/s
Gen 4 | x16 | 32,000 MB/s | 26,200 MB/s
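One way to read the PCI-Express table is as an efficiency ratio: typical bandwidth divided by theoretical bandwidth. The sketch below computes that ratio for the x16 rows, with the values copied from the table. The dictionary layout and the `efficiency` helper are illustrative choices, not part of any tool.

```python
# (theoretical MB/s, typical MB/s) for x16 links, copied from the table above.
pcie_x16 = {
    "Gen 1": (4000, 3520),
    "Gen 2": (8000, 6400),
    "Gen 3": (16000, 12100),
    "Gen 4": (32000, 26200),
}


def efficiency(theoretical_mb_s: float, typical_mb_s: float) -> float:
    """Fraction of the theoretical bandwidth typically achieved in practice."""
    return typical_mb_s / theoretical_mb_s


for generation, (theoretical, typical) in pcie_x16.items():
    print(f"{generation} x16: {efficiency(theoretical, typical):.1%} of theoretical")
```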

NVIDIA GPU NVLink

The NVLink connectivity on a GPU can be split in different ways depending upon the system platform design. Most NVLink 1.0 configurations split the connectivity two ways or four ways (20GB/s on each of four links). NVLink 2.0 configurations can split connectivity two, three, or six ways (25GB/s on each of six links). NVLink 3.0 supports up to twelve links (25GB/s per link).

NVLink Generation | Theoretical Bandwidth* (unidirectional) | Typical Bandwidth (in Practice)
NVLink 1.0 (4 bricks) | 80 GB/s | 73.4 GB/s
NVLink 2.0 (6 bricks) | 150 GB/s | 143.5 GB/s
NVLink 3.0 (12 bricks) | 300 GB/s | 276 GB/s
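The theoretical figures in the NVLink table follow from multiplying the number of links by the per-link bandwidth quoted in the paragraph above. A minimal sketch, with a hypothetical dictionary layout and function name:

```python
# (number of links, GB/s per link) as stated in the text above.
nvlink_generations = {
    "NVLink 1.0": (4, 20),
    "NVLink 2.0": (6, 25),
    "NVLink 3.0": (12, 25),
}


def nvlink_total_gbs(links: int, gbs_per_link: float) -> float:
    """Aggregate unidirectional bandwidth when all links are used together."""
    return links * gbs_per_link


for name, (links, per_link) in nvlink_generations.items():
    print(f"{name}: {nvlink_total_gbs(links, per_link)} GB/s")
```

A platform that splits the links across several peers would divide this total accordingly.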

SAS and SATA

Generation | Theoretical Bandwidth (unidirectional), 4x wide port | Typical Bandwidth (in Practice), SAS / SATA
1.5Gbps (SAS/SATA I) | 600 MB/s | 520 / 450 MB/s
3Gbps (SAS/SATA II) | 1,200 MB/s | 1,140 / 990 MB/s
6Gbps (SAS II/SATA III) | 2,400 MB/s | 2,280 / 1,975 MB/s
12Gbps SAS | 4,800 MB/s | 3,107 MB/s (SAS only)
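The theoretical column of the SAS/SATA table can be reproduced from the line rate. The sketch below assumes 8b/10b encoding (8 data bits carried in every 10 line bits) and four lanes per wide port. That assumption is consistent with the table values but is not stated in the table itself, and the function name is hypothetical.

```python
def wide_port_mb_s(line_rate_gbps: float, lanes: int = 4,
                   encoding_efficiency: float = 8 / 10) -> float:
    """Theoretical unidirectional bandwidth of a wide port in MB/s.

    encoding_efficiency defaults to 8b/10b, an assumption that matches
    the generations listed in the table above.
    """
    data_bits_per_second = line_rate_gbps * 1e9 * encoding_efficiency * lanes
    return data_bits_per_second / 8 / 1e6  # bits -> bytes -> MB


for rate_gbps in (1.5, 3, 6, 12):
    print(f"{rate_gbps} Gbps x4: {wide_port_mb_s(rate_gbps):,.0f} MB/s")
```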

Hard Drives and SSDs

Drive Type | Random IOPS | Sustained Sequential I/O
SAS/SATA 7,200RPM | 70 – 175 | 100 – 230 MB/s
SAS 10,000RPM | 275 – 300 | 125 – 200 MB/s
SAS 15,000RPM | 350 – 450 | 125 – 200 MB/s
SAS/SATA Solid State Drives (SSD) | 15,000 – 100,000 | 110 – 500 MB/s
PCI-E Solid State Drives (NVMe SSD) | 70,000 – 625,000 | 1,100 – 3,200 MB/s

Intel QuickPath Interconnect (QPI) and UltraPath Interconnect (UPI)

The values listed below describe a single QPI/UPI link on an Intel Xeon processor. There are typically two to three UPI links between CPU sockets, but this will vary by platform. Note that the Xeon product lines are segmented. Within a given processor series (e.g., Xeon Scalable “Cascade Lake-SP”), transfer speeds will vary from model to model.

Interconnect | Transfer Speed | Theoretical Bandwidth (unidirectional)
QPI | 4.8 GT/s | 9.6 GB/s
QPI | 5.6 GT/s | 11.2 GB/s
QPI | 6.4 GT/s | 12.8 GB/s
QPI | 7.2 GT/s | 14.4 GB/s
QPI | 8.0 GT/s | 16.0 GB/s
QPI | 9.6 GT/s | 19.2 GB/s
UPI | 10.4 GT/s | 20.8 GB/s
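Every row in the table above corresponds to two bytes of data per transfer in each direction. The sketch below encodes that relationship. The 2-byte payload width is inferred from the table values and should be treated as an assumption, and the function name is hypothetical.

```python
BYTES_PER_TRANSFER = 2  # assumed data payload per transfer, per direction


def link_bandwidth_gbs(transfer_speed_gts: float) -> float:
    """Theoretical unidirectional bandwidth of one QPI/UPI link in GB/s."""
    return transfer_speed_gts * BYTES_PER_TRANSFER


for speed_gts in (4.8, 6.4, 8.0, 9.6, 10.4):
    print(f"{speed_gts} GT/s: {link_bandwidth_gbs(speed_gts):.1f} GB/s per link")
```

For a platform with two or three links between sockets, multiply the per-link value by the link count.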

AMD Infinity Fabric

The values listed below describe a single Infinity Fabric link on an AMD EPYC processor. In dual-socket EPYC systems, there are typically three or four links between the CPU sockets. Within each EPYC CPU, each of the eight dies on the chip is connected to the I/O die via one Infinity Fabric link.

Generation | Transfer Speed | Theoretical Bandwidth (unidirectional)
Zen2/Zen3 | 18GT/s | 72 GB/s
Zen1 | 10.6GT/s | 42.667 GB/s

Note that links between EPYC sockets include CRC overhead, which results in 8/9ths of the bandwidth values shown above (e.g., 37.9GB/s rather than 42.6GB/s).
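The CRC adjustment described in the note can be checked with a short calculation. This sketch applies the 8/9 factor to the two table values; the function name is hypothetical.

```python
CRC_FACTOR = 8 / 9  # fraction of link bandwidth left after CRC overhead


def socket_to_socket_gbs(link_bandwidth_gbs: float) -> float:
    """Usable bandwidth of one Infinity Fabric link between EPYC sockets."""
    return link_bandwidth_gbs * CRC_FACTOR


zen1 = socket_to_socket_gbs(42.667)
zen2_zen3 = socket_to_socket_gbs(72)
print(f"Zen1:      {zen1:.1f} GB/s")
print(f"Zen2/Zen3: {zen2_zen3:.1f} GB/s")
```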


AMD HyperTransport Link

The values listed below describe a single HyperTransport link on an AMD Opteron processor. In many systems, there were dual HyperTransport links between the CPUs.

Generation | Transfers | Theoretical Bandwidth (unidirectional)
3.1 (Socket G34) | 6.4 GT/s (16-bit) | 12.8 GB/s

Fibre Channel (FC)

FC Rate | Theoretical Bandwidth (unidirectional)
2Gb | 200 MB/s
4Gb | 400 MB/s
8Gb | 800 MB/s
16Gb | 1600 MB/s
32Gb | 3200 MB/s

See also: Performance Characteristics of Common Network Fabrics


]]>
https://www.microway.com/knowledge-center-articles/performance-characteristics-of-common-transports-buses/feed/ 0