cpu Archives - Microway
https://www.microway.com/tag/cpu/

2nd Gen AMD EPYC “Rome” CPU Review: A Groundbreaking Leap for HPC
https://www.microway.com/hpc-tech-tips/amd-epyc-rome-cpu-review/ (published August 7, 2019)


The 2nd Generation AMD EPYC “Rome” CPUs are here! Rome brings greater core counts, faster memory, and PCI-E Gen4 all to deliver what really matters: up to a 2X increase in HPC application performance. We’re excited to present our thoughts on this advancement, and the return of x86 server CPU competition, in our detailed AMD EPYC Rome review. AMD is unquestionably back to compete for the performance crown in HPC.

2nd Generation AMD EPYC “Rome” CPUs are offered with 8-64 cores and base clock speeds from 2.0-3.2GHz. They are available in dual-socket SKUs as well as a select number of single-socket-only SKUs.

Important changes in AMD EPYC “Rome” CPUs include:

  • Up to 64 cores, 2X the maximum of the previous generation, for a massive advancement in aggregate throughput
  • PCI-E Gen 4 support, a first for an x86 server CPU, delivering 2X the I/O bandwidth of the x86 competition
  • 2X the FLOPS per core of the previous generation EPYC CPUs with the new Zen2 architecture
  • DDR4-3200 support for improved memory bandwidth across 8 channels, reaching approximately 205GB/sec per socket
  • Next Generation Infinity Fabric with higher bandwidth for intra- and inter-die connections, with roots in PCI-E Gen4
  • New 14nm + 7nm chiplet architecture that separates the 14nm I/O die from the 7nm compute dies to yield the performance-per-watt benefits of the new TSMC 7nm process node

Leadership HPC Performance

There’s no other way to say it: the 2nd Generation AMD EPYC “Rome” CPUs (EPYC 7xx2) break new ground for HPC performance. In our experience, we haven’t seen this kind of leap in CPU performance in many years, short of exotic architectural changes. The leap applies across both floating point and integer applications.

Note: This article focuses on SPEC benchmark performance (which is rooted in real integer and floating point applications). If you’re hunting for a more raw FLOPS/dollar calculation, please visit our Knowledge Center Article on AMD EPYC 7xx2 “Rome” CPUs.

Floating Point Benchmark Performance

In short: at the top bin, you may see up to 2.12X the performance of the competition. This is compared to the top-bin Xeon Gold processor (Xeon Gold 6252) on SPECrate2017_fp_base.

Compared to the top Xeon Platinum 8200 series SKU (Xeon Platinum 8280), it delivers up to 1.79X the performance.
AMD Rome SPECfp 2017 vs Xeon CPUs - Top Bin

Integer Benchmark Performance

Integer performance largely mirrors the same story. At the top bin, you may see up to 2.49X the performance of the competition. This is compared to the top-bin Xeon Gold processor (Xeon Gold 6252) on SPECrate2017_int_base.

Compared to the top Xeon Platinum 8200 series SKU (Xeon Platinum 8280), it delivers up to 1.90X the performance.
AMD Rome SPECint 2017 vs Xeon CPUs - Top Bin

What Makes EPYC 7xx2 Series Perform Strongly?

Contributions towards this leap in performance come from a combination of:

  • 2X the FLOPS per core available in the new architecture (see the worked example below)
  • The improved performance of the Zen2 microarchitecture
  • Moderate increases in clock speeds
  • Most importantly, dramatic increases in core count

These last 2 items are facilitated by the new 7nm process node and the chiplet architecture of EPYC. Couple that with the advantages in memory bandwidth, and you have a recipe for HPC performance.
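To put the first item in the list above in perspective, here is a back-of-envelope peak-FLOPS calculation of our own (a theoretical figure, not a benchmark result). Zen2 provides two 256-bit FMA pipes per core, or 16 double-precision FLOPS per clock, so for the 64-core EPYC 7742 at its 2.25GHz base clock:

\[ 64\ \text{cores} \times 2.25\ \text{GHz} \times 16\ \tfrac{\text{FLOPS}}{\text{clock}} \approx 2300\ \text{GFLOPS per socket} \]

The equivalent calculation for a 32-core first-generation EPYC (8 FLOPS per clock per core) lands near a quarter of that figure; sustained application performance will, of course, be lower than either theoretical peak.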

Performance Outlook


The dramatic increase in core count coupled with Zen2 means we predict that most of the 32-core and higher models, about half of AMD’s SKU stack, are likely to outperform the top Xeon Platinum 8200 series SKU. Stay tuned for the SPEC benchmarks that confirm this assertion.

If you’re comparing against more modest Xeon Gold 62xx or Silver 52xx/42xx SKUs, we predict an even more dramatic performance uplift. This is the first time in many years we’ve seen such an incredibly competitive product from the AMD Server Group.

Class Leading Price/Performance

AMD EPYC 7xx2 series isn’t just impressive from an absolute performance perspective. It’s also a price performance machine.

Examine these same two top-bin SKUs once again:
AMD Rome SPECfp 2017 vs Xeon CPUs - Price Performance

The top-bin AMD SKU delivers 1.79X the floating point performance of the Xeon Platinum 8280 at approximately 2/3 the price. It also delivers 2.13X the floating point performance of the Xeon Gold 6252, at roughly comparable price/performance.

Should you be willing to accept more modest core counts with the lower-cost SKUs, these comparisons only get better.

Finally, if you’re looking to roughly match or exceed the performance of the top-bin Xeon Gold 6252 SKU, we predict you’ll be able to do so with the 24-core EPYC 7352. This will be at just over 1/3 the price of the Xeon socket.

This much more typical comparison is emblematic of the price-performance advantage AMD has delivered in the new generation of CPUs. Stay tuned for more benchmark results and charts to support the prediction.

A Few Caveats: Performance Tuning & Out of the Box

Application Performance Engineers have spent years optimizing applications for the most widely available x86 server CPU. For a number of years now, that has meant Intel’s Xeon processors. The benchmarks presented here represent performance-tuned results.

We don’t yet have great data on how easy it is to achieve optimized performance with these new AMD “Rome” CPUs. Those of us who have been in HPC for some time know that out-of-the-box performance and optimized performance can mean very different things.

AMD does recommend specific compilers (AOCC, GCC, LLVM) and libraries (BLIS over BLAS and FLAME over LAPACK) to achieve optimized results with all EPYC CPUs. We don’t yet have a complete understanding of how much these help end users achieve superior results, or how much tuning is required for the most exceptional performance.

AMD has, however, released a new Compiler Options Quick Reference Guide for the new CPUs. We strongly recommend using these flags and options when tuning your application.
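To make that advice concrete, below is a minimal sketch of our own (not taken from AMD's guide) showing the sort of compiler targeting it describes. The flags are typical suggestions and may differ between compiler versions; `-march=znver2` is the GCC/LLVM target for Zen 2, and AOCC, AMD's clang-based compiler, accepts similar options.

```c
/* saxpy.c -- a simple bandwidth/FMA-bound kernel used here only to
 * illustrate compiler targeting for EPYC "Rome" (Zen 2).
 *
 * Example builds (typical flags, not an official recipe):
 *   gcc   -O3 -march=znver2 -fopenmp saxpy.c -o saxpy
 *   clang -O3 -march=znver2 -fopenmp saxpy.c -o saxpy   (AOCC is clang-based)
 */
#include <stdio.h>
#include <stdlib.h>

void saxpy(long n, float a, const float *x, float *y)
{
    /* With -O3 -march=znver2 the compiler should vectorize this loop
     * using 256-bit AVX2 FMA instructions. */
    #pragma omp parallel for
    for (long i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

int main(void)
{
    long n = 1 << 24;
    float *x = malloc(n * sizeof *x);
    float *y = malloc(n * sizeof *y);
    for (long i = 0; i < n; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    saxpy(n, 3.0f, x, y);
    printf("y[0] = %f\n", y[0]);  /* expect 5.0 */

    free(x); free(y);
    return 0;
}
```

The kernel itself is deliberately trivial; the point is that the same source, rebuilt with a Zen 2 target, lets the compiler emit the AVX2/FMA code paths these CPUs are designed for.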

Chiplet and Multi-Die Architecture: IO and Compute Dies

AMD EPYC Rome Die

One of the chief innovations in the 2nd Generation AMD EPYC CPUs is in the evolution of the multi-die architecture pioneered in the first EPYC CPUs.

Rather than create one monolithic, hard-to-yield die, AMD has opted to lash together “chiplets” in a single socket with Infinity Fabric technology.

Compute Dies (now in 7nm)

Up to 8 compute chiplets (formally, Core Complex Dies or CCDs) are brought together to create a single socket. These CCDs take advantage of the latest 7nm TSMC process node. By using 7nm for the compute cores in 2nd Generation EPYC, AMD takes advantage of the space and power efficiencies of the latest process—without the yield issues of a single monolithic die.

What does it mean for you? More cores than anticipated in a single socket, a reasonable power efficiency for the core count, and a less costly CPU.

The 14nm IO Die

In 2nd Generation EPYC CPUs, AMD has gone a step further with the chiplet architecture. The chiplets are now complemented by a separate I/O die. The I/O die contains the memory controllers, PCI-Express controllers, and the Infinity Fabric connection to the remote socket. This design also resolves the NUMA affinity quirks of the 1st generation EPYC processors.

Moreover, the I/O die is fabricated on the established 14nm process node; it is less important for this die to capture the 7nm power efficiencies.

DDR4-3200 and Improved Memory Bandwidth

AMD EPYC 7xx2 series improves its theoretical memory bandwidth when compared to both its predecessor and the competition.

DDR4-3200 DIMMs are supported, and they are clocked 20% faster than DDR4-2666 and 9% faster than DDR4-2933.
In summary, the platform offers:

  • Compared to Cascade Lake-SP (Xeon Platinum/Gold 82xx, 62xx): Up to a 45% improvement in memory bandwidth
  • Compared to Skylake-SP (Xeon Platinum/Gold 81xx, 61xx): Up to a 60% improvement in memory bandwidth
  • Compared to AMD EPYC 7xx1 Series (Naples): Up to a 20% improvement in memory bandwidth



These comparisons are created for a system where only the first DIMM per channel is populated. Part of this memory bandwidth advantage is derived from the increase in DIMM speeds (DDR4-3200 vs 2933/2666); part of it is derived from EPYC’s 8 memory channels (vs 6 on Xeon Skylake/Cascade Lake-SP).
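As a rough sanity check, the theoretical per-socket bandwidth behind these percentages can be derived from channel count and DIMM transfer rate (our own arithmetic, not a vendor figure):

\[ 8\ \text{channels} \times 3200\ \text{MT/s} \times 8\ \text{B} \approx 204.8\ \text{GB/s (Rome)} \qquad 6 \times 2933 \times 8\ \text{B} \approx 140.8\ \text{GB/s (Cascade Lake-SP)} \]

The ratio of those two figures is roughly 1.45, which is where the “up to 45%” number comes from; the Skylake-SP (6 channels at 2666) and EPYC “Naples” (8 channels at 2666) comparisons work out the same way.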

While we’ve yet to see final STREAM testing numbers for the new CPUs, we do anticipate them largely reflecting the changes in theoretical memory bandwidth.

PCI-E Gen4 Support: 2X the I/O bandwidth

EPYC “Rome” CPUs have an integrated PCI-E generation 4.0 controller on the I/O die. Each PCI-E lane doubles in maximum theoretical bandwidth to 4GB/sec (bidirectional).

A 16 lane connection (PCI-E x16 4.0 slot) can now deliver up to 64GB/sec of bidirectional bandwidth (32GB/sec unidirectional). That’s 2X the bandwidth compared to first generation EPYC and the x86 competition.
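For reference, here is where those rounded numbers come from (our own back-of-envelope arithmetic based on the PCI-E 4.0 signaling rate):

\[ 16\ \text{GT/s} \times \tfrac{128}{130} \div 8 \approx 1.97\ \text{GB/s per lane, per direction} \;\Rightarrow\; 16\ \text{lanes} \approx 31.5\ \text{GB/s each way} \approx 63\ \text{GB/s bidirectional} \]

which the industry conventionally rounds to 2GB/sec per lane per direction (4GB/sec bidirectional) and 32/64GB/sec for an x16 slot.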

Broadening Support for High Bandwidth I/O Devices

Mellanox ConnectX-6 Adapter
The new support allows for higher-bandwidth connections to InfiniBand and other fabric adapters, storage adapters, NVMe SSDs, and, in the future, GPU accelerators and FPGAs.

Some of these devices, like Mellanox ConnectX-6 200Gb HDR InfiniBand adapters, were unable to realize their maximum bandwidth in a PCI-E Gen3 x16 slot. Their performance should improve in a PCI-E Gen4 x16 slot with 2nd Generation AMD EPYC Processors.

2nd Generation AMD EPYC “Rome” is the only x86 server CPU with PCI-E Gen4 support at its launch in 3Q 2019. However, we have seen PCI-E Gen4 support before in the POWER9 platform.

System Support for PCI-E Gen4

Unlike in the previous generation AMD EPYC “Naples” CPUs, there is no strong affinity of PCI-E lanes to a particular chiplet inside the processor. In Rome, all I/O traffic routes through the I/O die, and all chiplets reach PCI-E devices through this die.

In order to support PCI-E Gen4, server and motherboard manufacturers are producing brand new versions of their platforms. Not every Rome-ready platform supports Gen4, so if this is a requirement be sure to specify this to your hardware vendor. Our team can help you select a server with full Gen4 capability.

Infinity Fabric

AMD Infinity Fabric Diagram

Deeply interrelated with PCI-Express Gen4, AMD has also improved the Infinity Fabric link between chiplets and sockets in the new generation of EPYC CPUs.

AMD’s Infinity Fabric has many commonalities with PCI-Express used to connect I/O devices. With 2nd Generation AMD EPYC “Rome” CPUs, the link speed of Infinity Fabric has doubled. This allows for higher bandwidth communication between dies on the same socket and to dies on remote sockets.

The result should be improved application performance for NUMA-aware and especially non-NUMA-aware applications. The increased bandwidth should also help hide any transport bandwidth issues to I/O devices on a remote socket. The overall result is “smoother” performance when applications scale across multiple chiplets and sockets.

SKUs and Strategies to Consider for HPC Clusters

Here is the complete list of SKUs and 1KU (1,000-unit) prices (Source: AMD). Please note that these are prices for CPUs sold to channel integrators, not for fully integrated systems built with these CPUs.

Dual Socket SKUs

| SKU  | Cores | Base Clock | Boost Clock | L3 Cache | TDP  | Price |
|------|-------|------------|-------------|----------|------|-------|
| 7742 | 64    | 2.25 GHz   | 3.4 GHz     | 256MB    | 225W | $6950 |
| 7702 | 64    | 2.0 GHz    | 3.35 GHz    | 256MB    | 200W | $6450 |
| 7642 | 48    | 2.3 GHz    | 3.3 GHz     | 256MB    | 225W | $4775 |
| 7552 | 48    | 2.2 GHz    | 3.3 GHz     | 192MB    | 200W | $4025 |
| 7542 | 32    | 2.9 GHz    | 3.4 GHz     | 128MB    | 225W | $3400 |
| 7502 | 32    | 2.5 GHz    | 3.35 GHz    | 128MB    | 180W | $2600 |
| 7452 | 32    | 2.35 GHz   | 3.35 GHz    | 128MB    | 155W | $2025 |
| 7402 | 24    | 2.8 GHz    | 3.35 GHz    | 128MB    | 180W | $1783 |
| 7352 | 24    | 2.3 GHz    | 3.2 GHz     | 128MB    | 155W | $1350 |
| 7302 | 16    | 3.0 GHz    | 3.3 GHz     | 128MB    | 155W | $978  |
| 7282 | 16    | 2.8 GHz    | 3.2 GHz     | 64MB     | 120W | $650  |
| 7272 | 12    | 2.9 GHz    | 3.2 GHz     | 64MB     | 120W | $625  |
| 7262 | 8     | 3.2 GHz    | 3.4 GHz     | 128MB    | 155W | $575  |
| 7252 | 8     | 3.1 GHz    | 3.2 GHz     | 64MB     | 120W | $475  |

EPYC 7742 or 7702 (64c): Select a High-End SKU, yield up to 2X the performance

Assuming your application scales with core count and maximum performance at a premium cost fits within your budget, you can’t beat the top 64-core EPYC 7742 or 7702 SKUs. These will deliver greater throughput on a wide variety of multi-threaded applications.

Anything above EPYC 7452 (32c, 48c): Select a Mid-High Level SKU, reach new performance heights

While these SKUs aren’t inexpensive, they take application performance to new heights and break new benchmark ground. If your application is multi-threaded, you can take full advantage of that performance. From a price/performance perspective, these SKUs may also be attractive.

EPYC 7452 (32c): Select a Mid Level SKU, improve price performance vs previous generation EPYC

Previous generation AMD EPYC 7xx1 Series CPUs also featured 32 cores. However, the 32 core entrant in the new 7xx2 stack is far less costly than the prior generation while delivering greater memory bandwidth and 2X the FLOPS per core.

EPYC 7452 (32c): Select a Mid Level SKU, match top Xeon Gold and Platinum with far better price/performance

If you’re optimizing for price/performance compared to the top Intel Xeon Platinum 8200 or Xeon Gold 6200 series SKUs, consider this SKU or ones near it. We predict this to be at or near the price/performance sweet-spot for the new platform.

EPYC 7402 (24c): Select a Mid Level SKU, come close to top Xeon Gold and Platinum SKUs

The relatively high base clock speed of this 24-core SKU (2.8GHz) also makes it well suited to applications that don’t scale perfectly across many cores.

EPYC 7272-7402 (12, 16, 24c): Select an affordable SKU, yield better performance and price/performance

Treat these SKUs as much more affordable alternatives to most Xeon Gold or Silver CPUs. We’ll await further benchmarks to see exactly where the sweet-spots are compared to these SKUs. They also compare favorably from a price/performance standpoint to 1st Generation EPYC 7xx1 processors with 12, 16, or 24 cores. Same performance, fewer dollars!

Single Socket Performance

As with the previous generation, AMD is heavily promoting the concept of replacing dual-socket Intel Xeon servers with single sockets of 2nd Generation AMD EPYC “Rome.” AMD is producing “P” SKUs, which support only single-socket platforms, at reduced prices to further boost the price/performance advantage of these systems.

Single Socket SKUs

| SKU   | Cores | Base Clock | Boost Clock | L3 Cache | TDP  | Price |
|-------|-------|------------|-------------|----------|------|-------|
| 7702P | 64    | 2.0 GHz    | 3.35 GHz    | 256MB    | 200W | $4425 |
| 7502P | 32    | 2.5 GHz    | 3.35 GHz    | 128MB    | 180W | $2300 |
| 7402P | 24    | 2.8 GHz    | 3.35 GHz    | 128MB    | 180W | $1250 |
| 7302P | 16    | 3.0 GHz    | 3.3 GHz     | 128MB    | 155W | $825  |
| 7232P | 8     | 3.1 GHz    | 3.2 GHz     | 32MB     | 120W | $450  |

Due to the boosted capability of the new CPUs, a single-socket configuration may be an increasingly viable alternative to a dual-socket Xeon platform for many workloads.

Next Steps: get started today!

Read More

If you’d like to read more speeds and feeds about these new processors, check out our article with detailed specifications of the 2nd Gen AMD EPYC “Rome” CPUs. We summarize and compare the specifications of each model, and provide guidance over and beyond what you’ve seen here.

Try 2nd Gen AMD EPYC CPUs for Yourself

Groups which prefer to verify performance before making a decision are encouraged to sign up for a Test Drive, which will provide you with access to bare-metal hardware with AMD EPYC CPUs, large memory, and more.

Browse Our Navion AMD EPYC Product Line

  • WhisperStation – Ultra-Quiet AMD EPYC workstations
  • Servers – High performance AMD EPYC rackmount servers
  • Clusters – Leadership performance clusters from 5-500 nodes

Intel Xeon Scalable “Cascade Lake SP” Processor Review
https://www.microway.com/hpc-tech-tips/intel-xeon-scalable-cascade-lake-sp-processor-review/ (published April 2, 2019)

With the launch of the latest Intel Xeon Scalable processors (previously code-named “Cascade Lake SP”), a new standard is set for high performance computing hardware. These latest Xeon CPUs bring increased core counts, faster memory, and faster clock speeds. They are compatible with the existing workstation and server platforms that have been shipping since mid-2017. Starting today, Microway is shipping these new CPUs across our entire line of turn-key Xeon workstations, systems, and clusters.

Important changes in Intel Xeon Scalable “Cascade Lake SP” Processors include:

  • Higher CPU core counts for many SKUs in the product stack
  • Improved CPU clock speeds (with Turbo Boost up to 4.4GHz)
  • Introduction of the new AVX-512 VNNI instruction for Intel Deep Learning Boost (VNNI), which provides
    significantly more efficient deep learning inference acceleration (see the sketch after this list)
  • Higher memory capacity & performance:
    • Most CPU models provide increased memory speeds
    • Support for DDR4 memory speeds up to 2933MHz
    • Large-memory capabilities with Intel Optane DC Persistent Memory
    • Support for up to 4.5TB-per-socket system memory
  • Integrated hardware-based security mitigations against side-channel attacks
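To make the VNNI bullet above concrete, here is a minimal sketch of the fused 8-bit dot-product operation behind Intel Deep Learning Boost. This is our own illustration rather than Intel sample code, and it assumes a Cascade Lake (or newer) CPU plus a compiler with AVX-512 VNNI support:

```c
/* vnni_demo.c -- 64 unsigned 8-bit activations dotted with 64 signed
 * 8-bit weights, accumulated into 16 int32 lanes by one VNNI instruction.
 *
 * Typical build (flags may vary by compiler version):
 *   gcc -O3 -mavx512f -mavx512vnni vnni_demo.c -o vnni_demo
 */
#include <immintrin.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint8_t act[64];
    int8_t  wgt[64];
    for (int i = 0; i < 64; i++) { act[i] = 2; wgt[i] = 3; }

    __m512i a   = _mm512_loadu_si512(act);
    __m512i w   = _mm512_loadu_si512(wgt);
    __m512i acc = _mm512_setzero_si512();

    /* VPDPBUSD: multiplies unsigned bytes by signed bytes and sums each
     * group of four adjacent products into a 32-bit accumulator lane. */
    acc = _mm512_dpbusd_epi32(acc, a, w);

    int32_t out[16];
    _mm512_storeu_si512(out, acc);
    printf("lane0 = %d (expect 2*3*4 = 24)\n", out[0]);
    return 0;
}
```

In practice, deep learning frameworks reach this instruction through optimized libraries rather than hand-written intrinsics; the point is simply that four 8-bit multiply-accumulates per 32-bit lane collapse into a single instruction.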

More for Your Dollar: performance uplift

With an increase in core counts, clock speeds, and memory speeds, applications will achieve better performance across the board. Particularly in the lower-end Xeon 4200- and 5200-series CPUs, the cost-effectiveness of the processors has increased considerably. The plot below compares the price of each processor against its performance. Both the current “Cascade Lake SP” and previous-generation “Skylake-SP” are shown:

Comparison chart of Intel Xeon Cascade Lake SP cost-effectiveness vs Skylake-SP for applications with AVX-512 instructions
In the diagram above, the wide colored bars indicate the price performance of these new Xeon CPUs. The dots indicate the price performance of the previous generation, which allows us to compare the two generations SKU by SKU (though a few of the newer models do not have previous-generation counterparts). In this comparison, lower values are better and indicate a higher quantity of computation per dollar spent.

Same SKU – More Performance

As shown above, many models offer more performance than their previous-generation counterpart. Here we highlight models which are showing particularly substantial improvements:

  • Xeon 4210 is 34% more price-performant than Xeon 4110
  • Xeon 4214 is 30% more price-performant than Xeon 4114
  • Xeon 4216 is 25% more price-performant than Xeon 4116
  • Xeon 5218 is 40% more price-performant than Xeon 5118
  • Xeon 5220 is 34% more price-performant than Xeon 5120
  • Xeon 6242 saw an 8% increase in clock speed and ~10% reduction in price
  • Xeon 8270 is 28% more price-performant than Xeon 8170

To summarize: this latest generation will provide more performance for the same cost if you stick with the model numbers you’ve been using. In the next section, we’ll review opportunities for cost reduction.

More for Less: Select a more modest Cascade Lake SKU for the same core count or performance

With generational improvements, it’s not unusual for a new CPU to replace a higher-end version of the older generation. There are many cases where this is true in the Cascade Lake Xeon CPUs, so be sure to consider if you can leverage such savings.

Guaranteed savings

  • Xeon 4208 replaces the Xeon 4110: providing the same 8 cores for a lower price
  • Xeon 4210 replaces the Xeon 4114: providing the same 10 cores for a lower price
  • Xeon 4214 surpasses the Xeon 4116: providing the same 12 cores at higher clock speeds
  • Xeon 5218 surpasses the Xeon 5120: providing more cores, higher clock speeds, and faster memory speeds

Worthy of consideration

  • Xeon 4216 may replace most of the 5100-series: Xeon 5115, 5118 and 5120
    Nearly all specifications are equivalent, but the UPI speed of the Xeon 4216 is 9.6GT/s rather than 10.4GT/s
  • Xeon 6230 likely replaces the Xeon 6130, 6138, 6140: providing the same or more cores for a lower price
  • Xeon 6240 competes with every Xeon 6100-series model
    with the exception that it does not provide 3+GHz processor frequencies

Greater Memory Bandwidth

For computationally-intensive applications, rapid access to data is critical. Thus, memory speed increases are valuable improvements. This generation of CPUs brings a 10% improvement to the Xeon 5200-series (2666MHz; up from 2400MHz) and the Xeon 6200-/8200-series (2933MHz; up from 2666MHz). This means that the Xeon 5200-series CPUs are more competitive (they’re running memory at the same speed as last generation’s Xeon 6100- and 8100-series processors). And the higher-end Xeon 6200-/8200-series CPUs have a 10% memory performance advantage over all others.

While a 10% improvement may seem to be only a modest improvement, keep in mind that it’s essentially a free upgrade. Combined with the other features and improvements discussed above, you can be confident you’re making the right choice by upgrading to these newest Intel Xeon Scalable CPUs.

Enabling Very Large Memory Capacity

With the official launch of Intel Optane DC Persistent Memory, it is now possible to deploy systems with multiple terabytes of system memory. Well-equipped systems provide each Xeon CPU with six Optane memory modules (alongside six standard memory modules). This results in up to 3TB of Optane memory and 1.5TB of standard DRAM per CPU! Look for more information on these possibilities as HPC sites begin adopting and exploring this new technology.
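The arithmetic behind those capacities, assuming the largest modules shipping at launch (our assumption, not an Intel statement: 512GB Optane DCPMM modules and 256GB DDR4 LRDIMMs):

\[ 6 \times 512\ \text{GB} = 3\ \text{TB Optane} \qquad 6 \times 256\ \text{GB} = 1.5\ \text{TB DRAM} \]

for a combined 4.5TB per socket, which matches the 4.5TB-per-socket figure listed above.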

Transitioning from the “Skylake-SP” Intel Xeon Scalable CPUs

Because the new “Cascade Lake SP” CPUs are socket-compatible with the previous-generation “Skylake SP” CPUs, the upgrade path is simple. All existing platforms that support the earlier CPUs can also accept these new CPUs. This also simplifies the choice for those considering a new system: the new CPUs use existing, proven platforms. There’s little risk in selecting the latest and highest-performance components. HPC sites adding to existing clusters will find they have a choice: spend the same for increased performance or spend less for the same performance. Below are peak performance comparisons of the previous generation CPUs with the new generation:

The wider/colored bars indicate peak performance for the new Xeon CPUs. The slim grey bars indicate peak performance for the previous-generation Xeon CPUs. Without exception, the new CPUs are expected to outperform their predecessors. The widest margins of improvement are in the lower-end Xeon 4200- and 5200-series.

Standout performance in a single socket

This generation introduces three CPU models designed for single-socket systems (providing very high throughput at relatively low-cost). They provide 20+ CPU cores at prices as much as $2,000 less than their multi-socket counterparts. If your workload performs well with a single CPU, these SKUs will be incredibly valuable:

  • Xeon 6209U outperforms nearly all of last generation’s Xeon Gold 6100-series CPUs
  • Xeon 6210U outperforms all Xeon 6100-series and many 6200-series CPUs
  • Xeon 6212U outperforms several of the Xeon 8100-series CPUs

The only exception to the above would be for applications which require very high clock speeds, as these single-socket CPU models do not provide base processor frequencies higher than 2.5GHz. The strength of these single-socket processors is in high throughput (via high core count) and decent clock speeds.

Next Steps: get started today!

Read More

If you’d like to read more about these new processors, check out our article with detailed specifications of the Intel Xeon “Cascade Lake SP” CPUs. We summarize and compare the specifications of each model, and provide guidance on which models are likely to be best suited to computationally-intensive HPC & Deep Learning applications.

Try Intel Xeon Scalable CPUs for Yourself

Groups which prefer to verify performance before making a decision are encouraged to sign up for a Test Drive, which will provide you with access to bare-metal hardware with Intel Xeon Scalable CPUs, large memory, and more.

Speak with an Expert

If you’re expecting to be upgrading or deploying new systems in the coming months, our experts would be happy to help you consider your options and design a custom cluster optimized to your workloads. We also help groups writing budget proposals to ensure they’re requesting the correct resources. Please get in touch!

Intel Xeon E5-2600 v4 “Broadwell” Processor Review
https://www.microway.com/hpc-tech-tips/intel-xeon-e5-2600-v4-broadwell-processor-review/ (published March 31, 2016)

Today we begin shipping Intel’s new Xeon E5-2600 v4 processors. They provide more CPU cores, more cache, faster memory access and more efficient operation. These are based upon the Intel microarchitecture code-named “Broadwell” – we expect them to be the HPC processors of choice.

Important changes in Xeon E5-2600 v4 include:

  • Up to 22 processor cores per CPU
  • Support for DDR4 memory speeds up to 2400MHz
  • Faster Floating Point Instruction performance
  • Improved parallelism in scheduling micro-operations
  • Improved performance for large data sets

Move faster with Xeon E5-2600 v4

Expect these new processors to be more nimble than their predecessors. A variety of microarchitecture improvements have been added to increase parallelism, speed up processing time, and strip out inefficiencies from previous models. Broadwell reduces the time to complete a multiplication by 40% (division operations also complete more quickly). Each core’s ability to optimize instruction ordering has been improved by ~6%. The tables which manage on-die L2 cache have been expanded to speed up memory operations. Several CPU instruction latencies have been reduced. Overall, Intel expects these new CPUs to complete at least 5% more instructions on every clock cycle.

For complete details, please see our Detailed Analysis of the Intel Xeon E5-2600v4 “Broadwell-EP” Processors

Transitioning from “Haswell” E5-2600 v3 Series Xeons

Because the new “Broadwell” CPUs are socket-compatible with the previous-generation “Haswell” CPUs, the upgrade path is simple. All existing platforms that support v3 CPUs can also accept v4 CPUs. This also simplifies the choice for those considering a new system: the new CPUs use existing, proven platforms. There’s little risk in selecting the latest and highest-performance components. Those who are adding to existing HPC clusters will find they have a choice: spend the same for increased performance or spend less for the same performance. Here is a comparison of the older generation with this new generation:

Comparison between Xeon E5-2600v4 vs Xeon E5-2600v3 Theoretical Peak Performance when using FMA3 and AVX Instructions

Get more for less – improved cost-effectiveness

Because each CPU core offers increased performance, and many models offer a higher core count, a lower-end CPU model can match performance with many of the older CPU models. Here are a few comparisons of note:

  • Xeon E5-2630v4 offers performance equivalent to the E5-2640v3 (and can even challenge the E5-2650v3)
  • Xeon E5-2640v4 matches the E5-2650v3 in nearly every case
  • Xeon E5-2650v4 matches the E5-2660v3 in nearly every case (and challenges the E5-2670v3)
  • Xeon E5-2660v4 will beat the E5-2670v3 and E5-2680v3 on well-parallelized applications
  • Xeon E5-2680v4 and Xeon E5-2690v4 best almost every E5-2600v3 CPU

Notable adjustments

Note that the E5-2670 CPU model has been removed from the line-up. This simplifies choice and did not come as a surprise to us: the majority of our customers had been selecting the E5-2680 and E5-2690 over the E5-2670. As noted above, the E5-2650 v4 or E5-2660 v4 can easily stand in for the older E5-2670 v3.

The E5-2623 CPU model has been modified in such a way that it isn’t ideal for the same workloads. Previously, it was a relatively high-clock-speed model available at a low price. However, the base clock speed has been adjusted downwards by 18%.

Next Steps – Putting Xeon E5-2600 v4 into Production

All of our Xeon workstations, servers & clusters are immediately available with these new CPUs. They are socket-compatible with all Xeon E5-2600 v3 platforms, so your existing systems can also be upgraded. Our most popular products which leverage these new Xeon processors are:

Intel Xeon E5-4600v3 “Haswell” 4-socket CPU Review
https://www.microway.com/hpc-tech-tips/intel-xeon-e5-4600v3-cpu-review/ (published June 1, 2015)

Intel has launched new 4-socket Xeon E5-4600v3 CPUs. They are the perfect choice for “just beyond dual socket” system scaling. Leverage them for larger memory capacity, faster memory bandwidth, and higher core-count when you aren’t ready for a multi-system purchase.

Here are a few of the main technical improvements:

  • DDR4-2133 memory support, for increased memory bandwidth
  • Up to 18 cores per socket, faster QPI links up to 9.6GT/sec between sockets
  • Up to 48 DIMMs per server, for a maximum of 3TB memory
  • Haswell core microarchitecture with new instructions

Why pick a 4-socket Xeon E5-4600v3 CPU over a 2 socket solution?

Increased memory space vs 2 socket

Dual socket systems max out at 512GB affordably (1TB at cost); however, many HPC users have models that outgrow that memory space. Xeon E5-4600v3 systems double the DIMM count for up to 1.5TB affordably (3TB at higher cost).

For applications like ANSYS, COMSOL, and other CAE, multiphysics, and CFD suites, this can be a game changer. Traditionally, achieving these types of memory capacities required large multi-node cluster installations. Usage of such a cluster to run simulations is almost always more effort. The Xeon E5-4600v3 permits larger models to run on a single system with a familiar single OS instance. Don’t underestimate the power of ease-of-use.

Increased core count vs 2 socket

Hand-in-hand with the memory space comes core count. What good is loading up big models if you can’t scale compute throughput to run the simulations? The Xeon E5-4600v3 CPUs mean systems deliver up to 72 cores. Executing at that scale means a faster time to solution for you and more work accomplished.

Increased aggregate memory bandwidth

One overlooked aspect of 4P systems is superior memory bandwidth. Intel integrates the same memory controller found in the Xeon E5-2600v3 CPUs into each Xeon E5-4600v3 socket. However, there are twice as many CPUs in each system: the net result is 2X the aggregate memory bandwidth per system.

Increased memory bandwidth per core (by selecting 4 sockets but fewer cores per socket)

Users might be concerned about memory bandwidth per CPU core. We find that CFD and multiphysics applications are especially sensitive. But a 4-socket system presents unique opportunities: you may select fewer cores per socket while achieving the same core count.

If you select smartly, you will have 2X the memory bandwidth per core available in your system vs. a 2-socket solution. This strategy can also be used to maximize throughput for a software license with a hard core-count ceiling.
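A quick worked example of that trade-off, using the ~68GB/sec-per-socket figure discussed below (our own arithmetic, for illustration only): for a fixed total core count \(N\),

\[ \frac{\text{BW}}{\text{core}} = \frac{\text{sockets} \times 68\ \text{GB/s}}{N} \]

so 40 cores configured as four 10-core E5-4627v3 CPUs enjoy roughly 6.8GB/sec per core, twice what those same 40 cores would see packed onto two sockets.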

Detailed Technical Improvements

You’ve heard the why, but the nuts and bolts generation-to-generation improvements matter too. Let’s review in detail:

DDR4-2133 memory support- bandwidth and efficiency

Memory bandwidth is critical for HPC users. CFD, CAE/simulation, life-sciences and custom coded applications benefit most. With the new CPUs, you’ll see the following improvements over Xeon E5-4600v2:

  • Entry-level “Basic” CPUs operate memory at 1600MHz (an increase of 20%)
  • Mid-level “Standard” CPUs now operate memory at 1866MHz (an increase of 16%)
  • Higher-end “Advanced”, “High Core Count” & “Frequency Optimized” CPUs now support up to 4 DIMMs per socket at 2133MHz (an increase of 14%), or 8 DIMMs per socket with LR-DIMMs

The increase in memory clocks means Xeon E5-4600v3 delivers more memory bandwidth per socket, up to 68GB/sec. Moreover, DDR4 DIMMs operate at 1.2V, resulting in a substantial power-efficiency gain.

Increased core counts – more for your money

Throughout the stack, core counts are increasing:

  • Xeon E5-4610v3 and E5-4620v3: 10 cores per socket, a 25% core count increase over the previous generation
  • Xeon E5-4640v3, E5-4650v3: 12 cores per socket, a 50% core count increase over the previous generation
  • E5-4669v3: 18 cores per socket, a 33% core count increase over the previous generation
  • New E5-4660v3 SKU delivers 14 cores per socket with a reasonable 120W TDP

Increased core counts mean deploying larger jobs, scheduling more HPC users on the same system, and deploying more virtual machines. They also help increase the aggregate throughput of your systems. You can do far more work with Xeon E5-4600v3.

Memory latency and DIMM size

DDR4 doesn’t just mean faster clocks – it also brings with it support for fewer compromises and larger DIMM sizes. 32GB DIMMs are now available as registered as well as load reduced (32GB DDR4-2133 RDIMMs vs. 32GB DDR4-2133 LRDIMMs) modules. The shift to a traditional register in an RDIMM from a specialty buffer in an LRDIMM means a substantial latency decrease.

Advances in manufacturing for DDR4 also mean larger DIMM sizes. 64GB LRDIMMs are now being manufactured to help support that outstanding 3TB memory capacity.

Haswell microarchitecture and AVX2

AVX2 is an advanced CPU instruction set that debuted in the Haswell architecture and has shown strong benefits:

  • New floating point FMA, with up to 2X the FLOPS per core (16 FLOPS/clock)
  • 256-bit wide integer vector instructions

These new instructions are extremely consequential. We encourage you to learn more about these improvements, and how to compile for the new instructions, with our post on AVX2 Optimization.

Intel Xeon E5-4600v3 Series Specifications

| Model     | Frequency | Frequency (AVX) | Turbo Boost | Core Count | L3 Cache | QPI Speed | Memory Speed | TDP (Watts) |
|-----------|-----------|-----------------|-------------|------------|----------|-----------|--------------|-------------|
| E5-4669v3 | 2.10 GHz  | 1.80 GHz        | 2.90 GHz    | 18         | 45MB     | 9.6 GT/s  | 2133 MHz     | 135W        |
| E5-4667v3 | 2.00 GHz  | 1.70 GHz        | 2.90 GHz    | 16         | 40MB     | 9.6 GT/s  | 2133 MHz     | 135W        |
| E5-4660v3 | 2.10 GHz  | 1.80 GHz        | 2.90 GHz    | 14         | 35MB     | 9.6 GT/s  | 2133 MHz     | 120W        |
| E5-4650v3 | 2.10 GHz  | 1.80 GHz        | 2.80 GHz    | 12         | 30MB     | 9.6 GT/s  | 2133 MHz     | 105W        |
| E5-4640v3 | 1.90 GHz  | 1.60 GHz        | 2.60 GHz    | 12         | 30MB     | 8.0 GT/s  | 1866 MHz     | 105W        |
| E5-4620v3 | 2.00 GHz  | 1.70 GHz        | 2.60 GHz    | 10         | 25MB     | 8.0 GT/s  | 1866 MHz     | 105W        |
| E5-4610v3 | 1.70 GHz  | 1.70 GHz        | None        | 10         | 25MB     | 6.4 GT/s  | 1600 MHz     | 105W        |

HPC groups do not typically choose Intel’s “Basic” models (e.g., E5-4610v3)

Intel Xeon E5-4600v3 Frequency Optimized SKUs

| Model     | Frequency | Frequency (AVX) | Turbo Boost | Core Count | L3 Cache | QPI Speed | Memory Speed | TDP (Watts) |
|-----------|-----------|-----------------|-------------|------------|----------|-----------|--------------|-------------|
| E5-4655v3 | 2.90 GHz  | 2.60 GHz        | 3.20 GHz    | 6          | 30MB     | 9.6 GT/s  | 2133 MHz     | 135W        |
| E5-4627v3 | 2.60 GHz  | 2.30 GHz        | 3.20 GHz    | 10         | 25MB     | 9.6 GT/s  | 2133 MHz     | 135W        |

The above SKUs offer better memory bandwidth per core

Next steps

We think the improvements in the Xeon E5-4600v3 CPUs make them a unique alternative to far more complicated HPC installations and a worthwhile upgrade from their predecessors. Want to learn more about the Xeon E5-4600v3 CPUs? Talk with an expert and assess how they might fit your HPC needs.

AVX2 Optimization and Haswell-EP (Xeon E5-2600v3) CPU Features
https://www.microway.com/hpc-tech-tips/avx2-optimization-and-haswell-ep-cpu-features/ (published October 3, 2014)

We’re very excited to be delivering systems with the new Xeon E5-2600v3 and E5-1600v3 CPUs. If you are the type who loves microarchitecture details and compiler optimization, there’s a lot to gain. If you haven’t explored the latest techniques and instructions for optimization, it’s never a bad time to start.

Many end users don’t always see instruction changes as consequential. However, they can be absolutely critical to achieving optimal application performance. Here’s a comparison of Theoretical Peak Performance of the latest CPUs with and without FMA3:
Plot of Xeon E5-2600v3 Theoretical Peak Performance (GFLOPS)

Only a small set of codes will be capable of issuing almost exclusively FMA instructions (e.g., LINPACK). Achieved performance for well-parallelized & optimized applications is likely to fall between the grey and colored bars. Still, without employing a compiler optimized for FMA3 instructions, you are leaving significant potential performance of your Xeon E5-2600v3-based hardware purchase on the table.
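For reference, the peak numbers in charts like this follow from a simple formula: each Haswell core has two 256-bit FMA units, and an FMA counts as two floating-point operations, for 16 double-precision FLOPS per clock. A worked example of our own, using the 12-core 2.6GHz Xeon E5-2690 v3:

\[ 12\ \text{cores} \times 2.6\ \text{GHz} \times 16\ \tfrac{\text{FLOPS}}{\text{clock}} \approx 499\ \text{GFLOPS per socket} \]

Without FMA (multiplies and adds issued separately through the AVX units), the per-clock figure is halved.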

Know your CPUs, know your instructions

With that in mind, we would like to summarize and link to these new resources from Intel:

Intel: Xeon E5-2600v3 Technical Overview

  • A brief summary of Haswell-NI (Haswell New Instructions) that add dedicated instructions for signal processing, encryption, and math functions
  • Summary of power improvements in the Haswell architecture
  • Detailed comparison of C600 and C610 series chipsets
  • Virtualization improvements and new security features

Intel: How AVX2 Improves Performance on Server Applications

  • Instructions on how to recompile your code for AVX2 instructions, and the supported compilers (see the example after this list)
  • Other methods of employing AVX2: Intel MKL, coding with intrinsic instructions, and assembly
  • Summary of LINPACK performance gains delivered simply by using AVX2
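As a small, hedged illustration of the “recompile for AVX2” advice above (our own example, not Intel's), the loop below is the kind of code a compiler will turn into FMA3 instructions when built for Haswell; the flags in the comment are typical for GCC and the Intel compiler, though exact options vary by version.

```c
/* fma_axpy.c -- illustrative only: a loop the compiler can turn into
 * vfmadd (FMA3) instructions when targeting Haswell.
 *
 * Typical builds:
 *   gcc -O3 -march=haswell fma_axpy.c -o fma_axpy   (enables AVX2 + FMA)
 *   icc -O3 -xCORE-AVX2    fma_axpy.c -o fma_axpy
 */
#include <stdio.h>

#define N 1024

int main(void)
{
    static double x[N], y[N];
    double a = 3.0;

    for (int i = 0; i < N; i++) { x[i] = 1.0; y[i] = 2.0; }

    /* a*x + y maps directly onto a fused multiply-add */
    for (int i = 0; i < N; i++)
        y[i] = a * x[i] + y[i];

    printf("y[0] = %f\n", y[0]);   /* expect 5.0 */
    return 0;
}
```

Nothing about the source changes between builds; the performance difference comes entirely from letting the compiler emit the wider, fused instructions.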

Deliver the highest performance for your applications by taking advantage of the latest Intel architecture. For more information, contact a Microway HPC expert.

 

Intel Xeon E5-2600 v3 “Haswell” Processor Review
https://www.microway.com/hardware/intel-xeon-e5-2600-v3-haswell-processor-review/ (published September 8, 2014)

Update:

As of March 31, 2016 we recommend version four of these Intel Xeon CPUs. Please see our new post Intel Xeon E5-2600 v4 “Broadwell” Processor Review

Intel has launched brand new Xeon E5-2600 v3 CPUs with groundbreaking new features. These CPUs build upon the leading performance of their predecessors with a more robust microarchitecture, faster memory, wider buses, and increased core counts and clock speeds. The result is dramatically improved performance for HPC.

Important changes available in E5-2600 v3 “Haswell” include:

  • Support for brand new DDR4-2133 memory
  • Up to 18 processor cores per socket (with options for 6- to 16-cores)
  • Improved AVX 2.0 Instructions with:
    • New floating point FMA, with up to 2X the FLOPS per core (16 FLOPS/clock)
    • 256-bit wide integer vector instructions
  • A revised C610 Series Chipset delivering substantially improved I/O for every server (SATA, USB 3.0)
  • Increased L1, L2 cache bandwidth and faster QPI links
  • Slightly tweaked “Grantley” socket (Socket R3) and platforms

DDR4: Memory Architecture for the Present and Future

Xeon E5-2600 v3 is one of the first server CPUs to support DDR4 memory. DDR4 is big news: it takes advantage of a new design with fewer chips on each module, lower voltages, and superior power efficiency (20% less power per module). Apart from the benefits today, these changes ensure DDR4 DIMMs are primed to accept ever-higher chip densities and clocks that exceed those of today’s DDR4-2133 modules. Physical characteristics of the DIMMs themselves have changed too: a slight curvature for easier seating and more pins on each module.

Memory Performance

On top of the new JEDEC standard for the DIMMs themselves, Intel has increased the memory speed stepping for all Xeon E5-2600 v3 CPU SKUs. The result is a 13-20% increase in memory performance:

  • Entry-level “Basic” CPUs now support 1600MHz memory (a 20% increase)
  • Mid-level “Standard” CPUs now support 1866MHz memory (a 16% increase)
  • Higher-end “Advanced”, “High Core Count” & “Frequency Optimized” CPUs now support up to 4 DIMMs per socket at 2133MHz (a 14% increase)

Finally, it’s worth noting that configurations that populate 3 DIMMs per channel (up to a 40% performance penalty with older Xeons) or use LR-DIMMs (a 14-40% penalty on the previous generation, depending on population) see far higher frequencies than they did on those earlier CPUs.

In short, DDR4 means even higher memory bandwidth today – a critical driver of HPC performance. It pairs nicely with the increased core counts of the new CPUs.

New Instructions – AVX 2.0

One of the primary drivers of the Xeon E5-2600 CPUs’ robust performance has been wider instructions, termed AVX (Advanced Vector Instructions). Intel has made its largest improvement to AVX in 3 years with Haswell’s addition of AVX 2.0:

256-bit integer instructions

Sandy-Bridge and Ivy Bridge CPUs delivered class leading floating point performance due to a 256-bit floating point unit in each core. This unit was twice as wide as that in previous Xeon CPUs and enabled twice the FLOPS of competing CPUs.

The integer unit remained at 128-bit (identical in Sandy Bridge and Ivy Bridge), but integer performance was buttressed with comparatively high clock speeds and Turbo Boost features.

With Xeon E5-2600 v3, Intel has widened the integer unit to the same 256 bits. The result is faster performance on many integer codes, even on CPUs with lower clock speeds. For example, the integer performance of the 12-core 2.7GHz Ivy Bridge E5-2697 v2 lies roughly between that of the two Haswell processors E5-2660 v3 (10-core, 2.6GHz) and E5-2670 v3 (12-core, 2.3GHz).

FMA

AVX 2.0 also features a new fused multiply-add (FMA) instruction. For codes that perform multiply and add instructions in short succession, FMA cuts the number of cycles in half. 2X the FLOPS for areas of code leveraging these instructions proves extremely consequential for math and science algorithms. Since floating point performance is most important to our customers, we discuss these improvements in more detail below.

Performance – Faster in Nearly Every Metric

Much like with the Sandy-Bridge generation of Xeons, Intel has plugged in a new architecture, improved memory performance, and increased core counts and clock speeds all at once.

Users generally should expect at least a 10% increase in performance per core, excluding the new instructions. Coupled with the memory change and new instructions, this means dramatic changes (SPEC CPU2006 benchmarks):

  • Xeon E5-2620 v2 to v3: 18% performance improvement
  • Xeon E5-2630 – E5-2697 v2 to E5-2630 – E5-2697 v3: between 22% and 29% performance improvement ¹
  • Xeon E5-2697 v2 to Xeon E5-2698 v3/E5-2699 v3: between 27% and 32% performance improvement ²

¹ Transitioning from the same-numbered v2 SKU to the v3 SKU for these models (ex: Xeon E5-2640 v2 to Xeon E5-2640 v3, 2.0 vs. 2.6GHz) often bundles an increase in core count, clock speed, memory performance, and the architecture improvements. The performance increase stated represents the net gain of these factors. DDR4 memory might result in a higher system cost.

² These two new high-end Haswell processors have no equivalent IvyBridge SKU and thus enjoy the largest performance deltas.

Theoretical Performance and LINPACK

Below is a chart with the theoretical peak performance (FLOPS) of the new Haswell-EP (Xeon E5-2600v3) CPUs with the new instructions. If you look at the graph below, you’ll see that the Haswell E5-2630 v3 is roughly equivalent to the flagship IvyBridge E5-2697 v2 (whose performance suffers without the new instruction support).

Comparison between Xeon E5-2600 v3 vs Xeon E5-2600 v2 Theoretical Peak Performance when using FMA3 and AVX Instructions

 

Keep in mind, however, that that these are peak theoretical numbers; depending upon how much your applications can take advantage of FMA, the performance gains could be far lower (see our Detailed Specifications). The 20% – 30% increases mentioned earlier come from the SPEC CPU2006 benchmarks, which execute a suite of real world applications.

Another dramatic comparison is the Xeon E5-2697 v2 (2.7GHz, 12-core) to the new Xeon E5-2699 v3 (2.3GHz, 18-core) on LINPACK. The new model represents a 91% increase in performance. The main reason for this substantial improvement is the new AVX 2.0 instruction set, specifically FMA. The increase in core count also contributes.

Should you prefer the most apples-to-apples architecture comparison, Xeon E5-2697 v2 (2.7GHz, 12-core) to Xeon E5-2690 v3 (2.6GHz, 12-core), there is a 54% increase in LINPACK performance.

 

Transitioning from “Ivy Bridge” E5-2600 v2 Series Xeons

Xeon E5-2600 v3 and Xeon E5-2600 v2 CPUs do not use the same CPU socket, and DDR4 does come with a cost premium. Some large installations may still find a price/performance argument for the Ivy Bridge CPUs, and a few platforms (e.g., complex Phi- & GPU-accelerated servers) will take time to transition to the new CPU socket.

However, end users who are willing to invest slightly more will find attractive new SKUs to leverage in their clusters, servers, and workstations. All new CPUs offer faster memory speeds and QPI transfers. Applications which effectively leverage the new FMA instructions should be able to achieve higher performance than flagship v2 CPUs using almost any of the v3 CPUs.

Comparisons of note (providing increased value for your dollar):

  • Xeon E5-2640 v2 transition to Xeon E5-2630 v3: same core count, faster clock speeds, faster memory; lower price
  • Xeon E5-2650 v2 transition to Xeon E5-2640 v3: identical core count, clock speed, and turbo boost speed yet costs are also lower
  • Xeon E5-2695 v2 and E5-2697 v2 transition to Xeon E5-2690 v3: provides similar base and turbo speeds at a lower price
  • Xeon E5-2695v2 and E5-2697 v2 transition to Xeon E5-2683 v3: for well-threaded applications able to accept a lower clock speed, the two extra cores in Xeon E5-2683 v3 will outperform at a much lower price

Nearly all processor transitions come at similar or lower costs on the CPU-side. Customers may choose to apply the savings towards their DDR4 memory capacity.

Further Grantley Platform Improvements

C610 Series Chipset

Some end-users found the earlier C600 chipset needed to be supplemented to meet their needs. Intel has added features that address many of these situations:

  1. SATA: Increase from 2 SATA3 + 4 SATA2 to at least 6 SATA3 ports
  2. USB: USB 3.0 support now native to the chipset, rather than board manufacturers adding a supplemental chip
  3. Ethernet: More common deployment of RJ45-based 10GigE; a new 40GigE controller (Fortville)

QPI Links

Intel’s Quick Path Interconnect link between the two CPU sockets now features faster speeds for every SKU:

  • Entry-level “Basic” CPUs at 7.2 GT/sec
  • Mid-level “Standard” CPUs at 8.0 GT/sec
  • Higher-end “Advanced”“High Core Count” & “Frequency Optimized” CPUs at 9.6 GT/sec

QPI allows for rapid access to memory on the non-local CPU socket.

Next Steps – Putting Xeon E5-2600 v3 into Production

As always, please contact an HPC expert if you would like to discuss in further detail. You may also wish to review our products which leverage these new Xeon processors:

For more analysis of the Xeon E5-2600 v3 processor series, please read:

Detailed Specifications of the Intel Xeon E5-2600v3 “Haswell-EP” Processors

Intel’s Xeon E5 Resource Page

Summary of Intel Xeon E5-2600 v3 Series Specifications

| Model       | Stock Frequency | Max Turbo Boost | Core Count | Memory Speed | L3 Cache | QPI Speed | TDP (Watts) |
|-------------|-----------------|-----------------|------------|--------------|----------|-----------|-------------|
| E5-2699 v3  | 2.30 GHz        | 3.60 GHz        | 18         | 2133 MHz     | 45MB     | 9.6 GT/s  | 145W        |
| E5-2698 v3  | 2.30 GHz        | 3.60 GHz        | 16         | 2133 MHz     | 40MB     | 9.6 GT/s  | 135W        |
| E5-2697 v3  | 2.60 GHz        | 3.60 GHz        | 14         | 2133 MHz     | 35MB     | 9.6 GT/s  | 145W        |
| E5-2695 v3  | 2.30 GHz        | 3.30 GHz        | 14         | 2133 MHz     | 35MB     | 9.6 GT/s  | 120W        |
| E5-2683 v3  | 2.00 GHz        | 3.00 GHz        | 14         | 2133 MHz     | 35MB     | 9.6 GT/s  | 120W        |
| E5-2690 v3  | 2.60 GHz        | 3.50 GHz        | 12         | 2133 MHz     | 30MB     | 9.6 GT/s  | 135W        |
| E5-2680 v3  | 2.50 GHz        | 3.30 GHz        | 12         | 2133 MHz     | 30MB     | 9.6 GT/s  | 120W        |
| E5-2670 v3  | 2.30 GHz        | 3.10 GHz        | 12         | 2133 MHz     | 30MB     | 9.6 GT/s  | 120W        |
| E5-2687W v3 | 3.10 GHz        | 3.50 GHz        | 10         | 2133 MHz     | 25MB     | 9.6 GT/s  | 160W        |
| E5-2660 v3  | 2.60 GHz        | 3.30 GHz        | 10         | 2133 MHz     | 25MB     | 9.6 GT/s  | 105W        |
| E5-2650 v3  | 2.30 GHz        | 3.00 GHz        | 10         | 2133 MHz     | 25MB     | 9.6 GT/s  | 105W        |
| E5-2667 v3  | 3.20 GHz        | 3.60 GHz        | 8          | 2133 MHz     | 20MB     | 9.6 GT/s  | 135W        |
| E5-2640 v3  | 2.60 GHz        | 3.40 GHz        | 8          | 1866 MHz     | 20MB     | 8 GT/s    | 90W         |
| E5-2630 v3  | 2.40 GHz        | 3.20 GHz        | 8          | 1866 MHz     | 20MB     | 8 GT/s    | 85W         |
| E5-2643 v3  | 3.40 GHz        | 3.70 GHz        | 6          | 2133 MHz     | 20MB     | 9.6 GT/s  | 135W        |
| E5-2620 v3  | 2.40 GHz        | 3.20 GHz        | 6          | 1866 MHz     | 15MB     | 8 GT/s    | 85W         |
| E5-2637 v3  | 3.50 GHz        | 3.70 GHz        | 4          | 2133 MHz     | 15MB     | 9.6 GT/s  | 135W        |
| E5-2623 v3  | 3.00 GHz        | 3.50 GHz        | 4          | 1866 MHz     | 10MB     | 8 GT/s    | 105W        |

HPC groups do not typically choose Intel’s “Basic” and “Low Power” models – those skus are not shown.

Intel Xeon E5-4600 v2 “Ivy Bridge” Processor Review
https://www.microway.com/hpc-tech-tips/intel-xeon-e5-4600v2-ivy-bridge-processor-review/ (published March 4, 2014)

Many within the HPC community have been eagerly awaiting the new Intel Xeon E5-4600 v2 CPUs. To those already familiar with the “Ivy Bridge” architecture in the Xeon E5-2600 v2 processors, many of the updated features of these 4-socket Xeon E5-4600 v2 “Ivy-Bridge” CPUs should seem very familiar. Read on to learn the details.

Important changes available in the Xeon E5-4600 v2 “Ivy Bridge” CPUs include:

  • Up to 12 processor cores per socket (with options for 4-, 6-, 8- and 10-cores)
  • Support for DDR3 memory speeds up to 1866MHz
  • AVX has been extended to support F16C (16-bit Floating-Point conversion instructions) to accelerate data conversion between 16-bit and 32-bit floating point formats. These operations are of particular importance to graphics and image processing applications.
  • Intel APIC Virtualization (APICv) provides increased virtualization performance
  • Improved PCI-Express generation 3.0 support with superior compatibility and new features: atomics, x16 non-transparent bridge & quadrupled read buffers for point-to-point transfers

Intel Xeon E5-4600 v2 Series Specifications

| Model       | Frequency | Turbo Boost | Core Count | Memory Speed | L3 Cache | QPI Speed | TDP (Watts) |
|-------------|-----------|-------------|------------|--------------|----------|-----------|-------------|
| E5-4657L v2 | 2.40 GHz  | 2.90 GHz    | 12         | 1866 MHz     | 30MB     | 8 GT/s    | 115W        |
| E5-4650 v2  | 2.40 GHz  | 2.90 GHz    | 10         | 1866 MHz     | 25MB     | 8 GT/s    | 95W         |
| E5-4640 v2  | 2.20 GHz  | 2.70 GHz    | 10         | 1866 MHz     | 20MB     | 8 GT/s    | 95W         |
| E5-4627 v2  | 3.30 GHz  | 3.60 GHz    | 8          | 1866 MHz     | 16MB     | 7.2 GT/s  | 130W        |
| E5-4620 v2  | 2.60 GHz  | 3.00 GHz    | 8          | 1600 MHz     | 20MB     | 7.2 GT/s  | 95W         |
| E5-4610 v2  | 2.30 GHz  | 2.70 GHz    | 8          | 1600 MHz     | 16MB     | 7.2 GT/s  | 95W         |

HPC groups do not typically choose Intel’s “Basic” and “Low Power” models – those skus are not shown.

More for Your Dollar – Performance Uplift

With an increase in core count, clock speed and memory speed, HPC applications will achieve better performance on these new Xeons. Depending on the choice of SKU, users should expect to see 10% to 30% performance improvement for floating-point applications (model-to-model) without spending more. Even greater speedups are possible by upgrading to the new 12-core Xeon E5-4657L v2:

  • Xeon E5-4620 transition to Xeon E5-4657L v2: 63% performance improvement
  • Xeon E5-4640 transition to Xeon E5-4657L v2: 50% performance improvement
  • Xeon E5-4650 transition to Xeon E5-4657L v2: 33% performance improvement

More for Less – Switch SKUs without a Performance Penalty

Rather than spending the same amount for more performance, some users may prefer to spend less to achieve the same performance they’re seeing today. Given the microarchitecture improvements in “Ivy Bridge,” you’re still likely to come out at least a few percent ahead at the same core count and clock speed.

Replacing Old Servers & Clusters

If your systems are a few years old, you may be able to replace several with a single new server. The AVX instruction set, introduced with the previous generation of Xeons, provides a solid 2X performance improvement by increasing the width of the math units from 128-bits to 256-bits. Combined with other improvements in Xeon E5-4600 v2, you will be able to achieve the performance of older systems using just a single core from the “Ivy Bridge” architecture.

Transitioning from “Sandy Bridge” E5-4600 series Xeons

Given the increased core counts & higher memory speeds, lower-end Xeon E5-4600 v2 processors may replace older Xeon E5-4600 processors with improved aggregate performance.

Rather than increasing clock speeds for all “Ivy Bridge” SKUs, Intel has decided to offer some of the E5-4600 v2 processors with more cores but at a slightly slower clock speed. Specifically, the eight-core, 2.3GHz E5-4610 v2 (vs the six-core, 2.4GHz E5-4610) and the ten-core, 2.2GHz E5-4640 v2 (vs the eight-core, 2.4GHz E5-4640). Increased core counts, improved memory speeds, and Turbo Boost capabilities nearly always result in superior server performance.

Comparisons of note include:

  • Xeon E5-4610 v2 delivers additional value over Xeon E5-4610:  the new CPU has 33% more cores and faster memory at the same cost, but also a slower clock speed (a disadvantage only for poorly-threaded applications)
  • Xeon E5-4620 transitions to Xeon E5-4610 v2: same core count, but a faster clock speed and faster memory at a lower cost
  • Xeon E5-4640 v2 delivers additional value over E5-4640: 25% more cores and faster memory at the same cost, but also a slower clock speed (a disadvantage only for poorly-threaded applications)
  • Xeon E5-4650 transitions to Xeon E5-4627 v2: Same physical core count, but faster memory and clock speed at a much lower price point. Caveats include a slower QPI speed and no hyperthreading, the latter of which is of lesser importance to HPC.

Intel’s strategy with these CPUs makes sense, since four-socket systems tend to run software that takes advantage of higher core counts more than higher clock speeds. Sacrificing 100MHz or 200MHz for two extra cores is almost always a favorable exchange.

Surprising Benchmark Performance Results

Despite some of the caveats mentioned above, the performance results achieved so far have been quite impressive. Benchmark numbers for the industry-standard floating-point SPEC fp_rate2006 suggest that even the modest Xeon E5-4610 v2 CPUs will stand up against the best of the dual-socket “Ivy Bridge” Xeon CPUs and the best of the previous-generation “Sandy Bridge” quad-socket CPUs.

[Chart: SPEC fp_rate2006 results for Xeon E5-4600 v2 compared with Xeon E5-4600 and E5-2600 v2]

Considering that a quad-socket server equipped with E5-4610 v2 CPUs is comparable in price to a dual-socket server with E5-2697 v2 CPUs, and considerably less expensive than a server with E5-4650 CPUs, we expect great success for this product line.

Greater Memory Bandwidth

Similar to what Intel did with the Xeon E5-2600 v2 series, memory performance is boosted across the board with Xeon E5-4600 v2 (Ivy Bridge):

  • Entry-level “Basic” CPUs now support 1333MHz memory
  • Mid-level “Standard” CPUs now support 1600MHz memory
  • Higher-end “Advanced”, “High Performance” & “Frequency Optimized” CPUs now support up to 4 DIMMs per socket at 1866MHz (in select configurations)

This 16-20% memory performance uplift for Xeon E5-4600 v2 is a critical performance boost for memory-intensive applications.
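If you want to measure that uplift on your own hardware, a STREAM-style triad loop is the usual yardstick. The sketch below is a simplified illustration (it assumes OpenMP; the official STREAM benchmark is the rigorous option) and can be built with something like gcc -O3 -fopenmp triad.c -o triad:

    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>

    #define N (64 * 1024 * 1024)   /* 64M doubles per array, 512MB each */

    int main(void)
    {
        double *a = malloc(N * sizeof(double));
        double *b = malloc(N * sizeof(double));
        double *c = malloc(N * sizeof(double));
        if (!a || !b || !c) return 1;

        #pragma omp parallel for
        for (long i = 0; i < N; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

        double t0 = omp_get_wtime();
        #pragma omp parallel for
        for (long i = 0; i < N; i++)
            a[i] = b[i] + 3.0 * c[i];          /* triad: read b, read c, write a */
        double t1 = omp_get_wtime();

        double gbytes = 3.0 * N * sizeof(double) / 1e9;
        printf("Triad bandwidth: %.1f GB/s\n", gbytes / (t1 - t0));

        free(a); free(b); free(c);
        return 0;
    }

Run it with one thread per physical core and with all memory channels populated to see how close a given DIMM configuration comes to the platform's peak.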

Special Note, Xeon E5-4627 v2 for CFD, FEA, and Multiphysics

Many CFD, FEA, and Multiphysics applications demand high clock speed (for the least-threaded portions of a run), high core count (for the well-threaded solvers), and, above all, memory bandwidth. The Xeon E5-4627 v2 pairs the memory bandwidth and core count of a 4-socket system with a high base clock speed; previously, customers had to sacrifice one for the other.

Microway thinks this will be a winning combination for users whose models exceed the memory capacity of a 2-socket system, and we expect this SKU to come up often in conversations with users of these applications.

Conclusion

As always, please contact an HPC expert if you would like to discuss in further detail. Intel has produced an Intel Xeon E5-4600 v2 Product Brief that’s available on our Knowledge Base. You may also wish to review our products which leverage these new Xeon processors.

For more analysis of the Xeon E5-4600 v2 processor series, please read:
In-Depth Comparison of Intel Xeon E5-4600v2 “Ivy Bridge” Processors

The post Intel Xeon E5-4600 v2 “Ivy Bridge” Processor Review appeared first on Microway.

Intel Xeon E5-2600v2 “Ivy Bridge” Processor Review https://www.microway.com/hpc-tech-tips/intel-xeon-e5-2600v2-ivy-bridge-processor-review/ https://www.microway.com/hpc-tech-tips/intel-xeon-e5-2600v2-ivy-bridge-processor-review/#respond Tue, 10 Sep 2013 16:01:48 +0000 http://https://www.microway.com/?p=3122 With the introduction of Intel’s new Xeon E5-2600v2 processors, there are exciting new choices for HPC users. Overall, the Xeon E5-2600 series processors have provided the highest cost-effective HPC performance available to date. This new set of models builds upon that success to offer higher core counts and faster performance. Important changes available in E5-2600v2 […]

With the introduction of Intel’s new Xeon E5-2600v2 processors, there are exciting new choices for HPC users. Overall, the Xeon E5-2600 series processors have provided the highest cost-effective HPC performance available to date. This new set of models builds upon that success to offer higher core counts and faster performance.

Important changes available in E5-2600v2 “Ivy Bridge” include:

  • Up to 12 processor cores per socket (with options for 4, 6, 8, and 10 cores)
  • Support for DDR3 memory speeds up to 1866MHz
  • Improved PCI-Express generation 3.0 support with superior compatibility and new features: atomics, x16 non-transparent bridge & quadrupled read buffers for point-to-point transfers
  • AVX has been extended to support F16C (16-bit Floating-Point conversion instructions) to accelerate data conversion between 16-bit and 32-bit floating point formats. These operations are of particular importance to graphics and image processing applications.
  • Intel APIC Virtualization (APICv) provides increased virtualization performance

More for Your Dollar – Performance Uplift

With increases in core count, clock speed, and memory speed, HPC applications will achieve better performance. The exact gain depends upon the CPU model, but users should expect a 13% to 22% performance improvement for floating-point applications (model-to-model). Even greater speedups are possible with the 12-core models:

  • Xeon E5-2670 transition to Xeon E5-2695v2: 32% performance improvement
  • Xeon E5-2680 transition to Xeon E5-2695v2: 30% performance improvement
  • Xeon E5-2690 transition to Xeon E5-2695v2: 24% performance improvement

The single exception is the Xeon E5-2620v2, which appears to offer only a 1% improvement above the E5-2620.

More for Less – Switch SKUs without a Performance Penalty

Rather than spending the same amount for more performance, some users may prefer to spend less to achieve the same performance they’re seeing today. Given the microarchitecture improvements in “Ivy Bridge”, you’re still likely to come out at least a few percent ahead.

Replacing Old Workstations, Servers & Clusters

If your computers are a few years old, you may be able to replace several with a single new computer. The AVX instruction set, introduced with the previous generation of Xeons, provides a solid 2X performance improvement by increasing the width of the math units from 128-bits to 256-bits. Combined with other improvements in Xeon E5-2600v2, you will be able to achieve the performance of older systems using just a single core from the “Ivy Bridge” architecture.

Transitioning from “Sandy Bridge” E5-2600 series Xeons

Given the increased core counts & memory speeds in this latest series, lower-end Xeon E5-2600v2 processors may be swapped in for older Xeon E5-2600 processors. Comparisons of note include:

  • Xeon E5-2640 transition to Xeon E5-2630v2: same core count and clock speed; faster memory
  • Xeon E5-2650 transition to Xeon E5-2640v2: identical core count, clock speed and memory speed
  • Xeon E5-2660 and E5-2665 transition to Xeon E5-2650v2: same core count, but a faster clock speed
  • Xeon E5-2670 transition to Xeon E5-2650v2: identical core count and clock speed, but faster memory
  • Xeon E5-2680 transition to Xeon E5-2670v2, and Xeon E5-2690 transition to Xeon E5-2680v2: substantial performance improvements along with useful reductions in wattage

Nearly all these processor transitions come at similar or lower costs. We often recommend customers apply the savings towards more nodes or a desired upgrade, such as additional memory and storage.

Larger Memory Bandwidth

Intel builds upon the excellent memory performance of Xeon E5-2600 series CPUs with Xeon E5-2600v2 (Ivy Bridge). Memory performance is up for every CPU SKU:

  • Entry-level “Basic” CPUs now support 1333MHz memory
  • Mid-level “Standard” CPUs now support 1600MHz memory
  • Higher-end “Advanced”, “High Core Count” & “Frequency Optimized” CPUs now support up to 4 DIMMs per socket at 1866MHz (in select configurations)

That’s a 16-20% memory performance uplift for Xeon E5-2600v2, and it’s a serious bump for memory-intensive applications.

Improvements to PCI-Express generation 3.0

Although the “Sandy Bridge” architecture provided support for PCI-E gen 3, not all devices were supported. Certain network/interconnect cards and GPUs did support full-speed transfers, but a few exhibited compatibility issues. Additionally, some vendors held back their gen 3 devices until the wrinkles were smoothed out.

Now that the server “Ivy Bridge” products are launching, we can expect to see much broader adoption. This will be extremely beneficial for intensive HPC applications, as the performance boost from PCI-E gen 2 to gen 3 is typically 2X. In practice, that’s a jump from 5.6GB/s to 11.2GB/s (at the application level) for PCI-Express x16 devices.
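The arithmetic behind those numbers: a PCI-E gen 2 lane signals at 5 GT/s with 8b/10b encoding, or roughly 500MB/s per lane and 8GB/s raw for an x16 slot; gen 3 raises the signaling rate to 8 GT/s and switches to the much leaner 128b/130b encoding, or roughly 985MB/s per lane and close to 16GB/s raw for x16. Once protocol overhead is subtracted, the ~5.6GB/s and ~11.2GB/s application-level figures quoted above are what devices typically sustain.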

Intel Xeon E5-2600v2 Series Specifications

Model      | Frequency | Turbo Boost | Core Count | Memory Speed | L3 Cache | QPI Speed | TDP (Watts)
E5-2697v2  | 2.70 GHz  | 3.50 GHz    | 12         | 1866 MHz     | 30MB     | 8 GT/s    | 130
E5-2695v2  | 2.40 GHz  | 3.20 GHz    | 12         | 1866 MHz     | 30MB     | 8 GT/s    | 115
E5-2690v2  | 3.00 GHz  | 3.60 GHz    | 10         | 1866 MHz     | 25MB     | 8 GT/s    | 130
E5-2680v2  | 2.80 GHz  | 3.60 GHz    | 10         | 1866 MHz     | 25MB     | 8 GT/s    | 115
E5-2670v2  | 2.50 GHz  | 3.30 GHz    | 10         | 1866 MHz     | 25MB     | 8 GT/s    | 115
E5-2660v2  | 2.20 GHz  | 3.00 GHz    | 10         | 1866 MHz     | 25MB     | 8 GT/s    | 95
E5-2650v2  | 2.60 GHz  | 3.40 GHz    | 8          | 1866 MHz     | 20MB     | 8 GT/s    | 95
E5-2640v2  | 2.00 GHz  | 2.50 GHz    | 8          | 1600 MHz     | 20MB     | 7.2 GT/s  | 95
E5-2687Wv2 | 3.40 GHz  | 4.00 GHz    | 8          | 1866 MHz     | 25MB     | 8 GT/s    | 150
E5-2667v2  | 3.30 GHz  | 4.00 GHz    | 8          | 1866 MHz     | 25MB     | 8 GT/s    | 130
E5-2630v2  | 2.60 GHz  | 3.10 GHz    | 6          | 1600 MHz     | 15MB     | 7.2 GT/s  | 80
E5-2620v2  | 2.10 GHz  | 2.60 GHz    | 6          | 1600 MHz     | 15MB     | 7.2 GT/s  | 80
E5-2643v2  | 3.50 GHz  | 3.80 GHz    | 6          | 1866 MHz     | 25MB     | 8 GT/s    | 130
E5-2637v2  | 3.50 GHz  | 3.80 GHz    | 4          | 1866 MHz     | 15MB     | 8 GT/s    | 130

HPC groups do not typically choose Intel’s “Basic” and “Low Power” models – those SKUs are not shown.

Conclusion

As always, please contact an HPC expert if you would like to discuss in further detail. You may also wish to review our products which leverage these new Xeon processors.

For more analysis of the Xeon E5-2600v2 processor series, please read:
In-Depth Comparison of Intel Xeon E5-2600v2 “Ivy Bridge” Processors

The post Intel Xeon E5-2600v2 “Ivy Bridge” Processor Review appeared first on Microway.

Achieve the Best Performance: Intel Xeon E5-2600 “Sandy Bridge” https://www.microway.com/hpc-tech-tips/achieve-the-best-performance-intel-xeon-e5-2600-sandy-bridge/ https://www.microway.com/hpc-tech-tips/achieve-the-best-performance-intel-xeon-e5-2600-sandy-bridge/#comments Fri, 13 Apr 2012 17:45:13 +0000 http://https://www.microway.com/hpc-tech-tips/?p=136 Intel has once again done an excellent job designing a high-performance processor. The new Xeon E5-2600 “Sandy Bridge EP” processors run as much as 2.2 times faster than the previous-generation Xeon 5600 “Westmere” processors. Combined with new Xeon server/workstation platforms, they will be extremely attractive to anyone with computationally-intensive needs. The new Intel architecture provides […]

The post Achieve the Best Performance: Intel Xeon E5-2600 “Sandy Bridge” appeared first on Microway.

]]>
Intel has once again done an excellent job designing a high-performance processor. The new Xeon E5-2600 “Sandy Bridge EP” processors run as much as 2.2 times faster than the previous-generation Xeon 5600 “Westmere” processors. Combined with new Xeon server/workstation platforms, they will be extremely attractive to anyone with computationally-intensive needs.

The new Intel architecture provides many benefits right out of the box, while others may require changes on your end. Read on to make sure you’re achieving the best performance.

Intel Advanced Vector Extensions (AVX) Instructions

One of the largest performance improvements, as far as HPC is concerned, is AVX. Intel AVX accelerates vector and floating point computations by increasing maximum vector size from 128 to 256 bits. Essentially, the floating point capability of Intel processors has been doubled. Very exciting, but some work is required to take advantage of this improvement.

Your current applications will run on the new processors, but they will only use 128-bit vector instructions. In most cases, all that’s required is re-compiling your application(s) with a compiler which supports the new AVX instructions. Additionally, the operating system needs support for the 256-bit wide vector unit.

For the operating system, you’ll need Linux kernel version 2.6.30 or later (or a vendor who has backported the features to their kernel, such as Red Hat). Windows users will need Windows 7 SP1 or Windows Server 2008 R2 SP1.

These are the best compiler options currently available:

  • Intel Composer XE (or the older Intel Compiler Suite version 11.1)
  • GCC version 4.6 or later
  • The Portland Group compiler 2011 version 11.6 (newer versions include further enhancements)
  • Microsoft Visual Studio 2010
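To illustrate the recompile step (a sketch under our assumptions, not a tuning guide), consider a simple loop the compiler can auto-vectorize. Built without extra flags it will use 128-bit SSE instructions; rebuilding with an AVX flag (for example gcc -O3 -std=c99 -mavx, or icc -O3 -xAVX with the Intel compiler) lets the compiler emit 256-bit AVX instructions for the same source:

    #include <stddef.h>

    /* SAXPY-style loop; with -mavx (gcc) or -xAVX (icc) the compiler can
     * vectorize this using 256-bit AVX instructions instead of 128-bit SSE. */
    void saxpy(size_t n, float alpha, const float *restrict x, float *restrict y)
    {
        for (size_t i = 0; i < n; i++)
            y[i] = alpha * x[i] + y[i];
    }

No source changes are required; the restrict qualifiers simply reassure the compiler that the arrays do not overlap, so it can vectorize without runtime aliasing checks.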

PCI-Express generation 3.0 (Integrated I/O)

This is a major feature which comes for free on all of Microway’s new Intel Xeon systems. Having support for gen 3 PCI-Express will be highly desirable when the new Intel MIC and NVIDIA Tesla compute processor products are released. There is a ~2X bandwidth improvement between PCI-E generations 2 and 3, so anyone installing a gen 3 device in a gen 2 platform will be sacrificing significant performance.

Furthermore, Intel built the PCI-Express controller into the CPU itself. This removes one hop between the host and the PCI-E device, reducing latency by ~30%. Initial reports suggest that this change improves application performance by 10+%, even for PCI-E gen 2 devices!

Memory Speed and Capacity

HPC experts know that getting data to the processor is one of the most common bottlenecks. Improvements to memory are always welcome, and there are several in the new architecture. First, peak memory clock speeds have been increased to 1600MHz. Second, the older triple-channel controller has been replaced with a quad-channel controller. This allows for faster access to memory and a larger number of DIMMs (up to 24, depending upon the platform). Third, L3 cache sizes have been increased to as much as 20MB.
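A quick back-of-the-envelope on the first two points: four channels of DDR3-1600 give 4 x 1600 MT/s x 8 bytes, or about 51.2GB/s of theoretical peak bandwidth per socket, versus roughly 32GB/s for the previous generation's three channels of DDR3-1333. Sustained application bandwidth will be lower, but the headroom for memory-bound codes is substantially larger.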

However, not all of the new processors feature the fastest options so you have to make a choice of which model to purchase. There are three distinct performance levels:

  • Basic @ 1066MHz (10MB L3 cache)
  • Standard @ 1333MHz (15MB L3 cache)
  • Advanced @ 1600MHz (20MB L3 cache)

Given the slower performance, we do not recommend Basic models.

Turbo Boost 2.0

Turbo Boost allows the processor frequency to be temporarily increased as long as the processor is running within its power and thermal envelopes. This capability is enabled by default and is managed automatically by the CPU hardware. You don’t have to do anything to take advantage of the speedup, but understanding Turbo Boost behavior is useful.

When only a few cores of a multi-core chip are in use, the clock speeds of those cores are boosted significantly. When more cores are in use, the clock is still boosted, but there is less margin for increases. With all cores in use, it’s still possible to see a boost, but the increases will be smaller. Each boost level is 100MHz, but total Turbo Boost capacity varies from model to model. For the Standard and Advanced processor models, the boost levels range from 300MHz or 400MHz (when all cores are in use) to as high as 800MHz or 900MHz (when only a single core is in use). The Basic models have essentially no boost capability.
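As a concrete example, a 2.7GHz Advanced-series part with 800MHz of maximum boost would peak at 3.5GHz when only a single core is busy, while a 400MHz all-core boost would put all eight cores at 3.1GHz under a fully-threaded load, assuming the chip stays within its power and thermal envelopes.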

According to Intel, processors with Turbo Boost 2.0 enter boost mode more frequently and stay there longer than previous models. Note that processors with Turbo Boost 2.0 may operate above TDP for short periods of time to maximize performance.

Quick Path Interconnect (QPI)

QPI provides communication between the processor sockets. In addition to higher clock speeds, the new Xeon platforms introduce a second link between sockets (more than doubling the potential communication between processor sockets). This provides significant benefits for multi-threaded/parallel applications which send large quantities of data between threads.

Additionally, there are other conditions during which the QPI links are used. For example, the second CPU may require access to memory or a PCI-Express device which is physically connected to the first CPU. All this communication will also pass across the QPI links – two fast buses reduce the likelihood of bottlenecks.
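This is also why process and memory placement matter in practice: pinning each MPI rank or thread group to the socket that owns its memory (for example with numactl, or with the binding options most MPI libraries provide) keeps most accesses on the local memory controller and reserves the QPI links for the traffic that genuinely has to cross sockets.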

Much like the memory speed improvements, QPI speed varies by processor model. There are three distinct performance levels:

  • Basic @ 6.4 GT/s
  • Standard @ 7.2 GT/s
  • Advanced @ 8.0 GT/s

Refer to the processor SKU table above for complete details.

Hyperthreading

Hyperthreading has long been a part of Intel processor designs. However, it has rarely shown benefit for computationally intensive applications. It doesn’t provide faster access to data or a larger number of math units; it simply allows additional threads to be in flight at the same time.

You will have to test your application to be certain it offers any benefit. With Hyperthreading enabled, the operating system will see twice as many processor cores as are actually in the hardware. You’ll want to run test jobs on both the real and virtual numbers of cores. Then disable Hyperthreading and run a test again using one thread for each real/physical processor core (Hyperthreading may be disabled from the BIOS). Typically, we do not see dramatic performance differences.
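On a dual-socket system with 16 physical cores, for example, that means comparing an OpenMP run at OMP_NUM_THREADS=32 and at OMP_NUM_THREADS=16 with Hyperthreading enabled, then repeating the 16-thread run with Hyperthreading disabled in the BIOS, and deploying whichever configuration wins for your application.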

Conclusion

Overall, Microway’s Intel Xeon E5-2600 based workstations, servers and clusters provide many benefits out-of-the-box. Improvements to memory bandwidth, cache and QPI speeds don’t require any special changes on the part of the users, but careful analysis must be made during the purchasing process to choose the best option. HPC users will need to recompile their applications to take advantage of the 2X performance boost made possible by the AVX extensions. Those planning to use high-performance add-on cards, such as GPUs and MIC, should choose these new Xeon platforms to ensure the lowest-latency, highest-bandwidth path between the compute units and the host.

The post Achieve the Best Performance: Intel Xeon E5-2600 “Sandy Bridge” appeared first on Microway.
