64 Cores of Rendering: The AMD Threadripper Pro 3995WX Review 2021



Knowing your market is a key fundamental of product planning, marketing, and distribution. There is no point creating a product with no market, or finding you have something amazing but are offering it to the wrong sort of customers. When AMD started offering high core count Threadripper processors, the one market that took as many as it could get was the graphics design business: visual effects companies and those focused on rendering loved the core count, the memory support, all the PCIe lanes, and the price. But if there is one thing more performance brings, it is the desire for even more performance. Enter Threadripper Pro.


Computational Graphics Goes Brrrr

There are a number of industries where, looking in from the outside, an enthusiast might assume that using CPUs is old-fashioned, and the question is asked as to why that industry hasn't moved fully to GPU accelerators. One of the big ones is machine learning: despite the push to dedicated machine learning hardware and lots of big businesses doing ML on GPUs, most machine learning today is still done on CPUs. The same is still true with graphics and visual effects.


The reason behind this typically comes down to the software packages in use, and the programmers in charge.


Developing software for CPUs is easy, because that is what most people are trained on. Optimization packages for CPUs are well established, and even for upcoming specialist instructions, these can be developed in simulated environments. A CPU is designed to handle almost anything thrown at it, even really bad code.


By contrast, GPU compute is harder. It isn't as difficult as it used to be, as there are wide arrays of libraries that enable GPU compilation without having to know too much about how to program for a GPU; however, the difficulty lies in architecting the workload to take advantage of what a GPU has to offer. A GPU is a massive engine that performs the same operation on hundreds of parallel threads at the same time. It also has very small caches, and accesses to GPU memory are long, so that latency is hidden by having even more threads in flight at once. If the compute part of the software is not amenable to that sort of workload, such as being structurally more linear, then spending six months redeveloping it for GPUs is wasted effort. And even if the math works out better on a GPU, trying to rebuild a 20-year-old (or older) codebase for GPUs still requires a substantial undertaking by a group of experts.
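To put a rough number on why a structurally linear workload makes a GPU port poor value, one common back-of-the-envelope model is Amdahl's law. It is not something cited in this review, just a standard way to reason about the trade-off. The sketch below assumes a hypothetical job where only a fraction `parallel_fraction` of the runtime can be spread across `workers` execution units; the function name and the example values are illustrative assumptions, not measurements.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup when only part of a job can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical workloads: half serial, mostly parallel, almost all parallel.
    for fraction in (0.50, 0.95, 0.99):
        # Hypothetical worker counts: a 64-core CPU vs. thousands of GPU threads.
        for workers in (64, 4096):
            bound = amdahl_speedup(fraction, workers)
            print(f"parallel={fraction:.2f} workers={workers:5d} "
                  f"speedup bound={bound:.1f}x")
```

Under this model, a job with half of its runtime stuck in serial code can never reach a 2x gain however many threads are thrown at it, which is the situation where a six-month port is hard to justify.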


GPU compute has been coming on in leaps and bounds ever since I did it in the late 2000s. But the fact remains that there are still a number of industries that run a mix of CPU and GPU throughput. These include machine learning, oil and gas, financial, medical, and the one we are focusing on today: visual effects.

A visual effects design and rendering workload is a complex mix of dedicated software platforms and plugins: software like Cinema4D, Blender, Maya, and others. These rely on the GPU to showcase a partially rendered scene for the artists to work on in real time, while also relying on strong single core performance, but the bulk of the compute for the final render will depend on what plugins are being used for that particular product. Some plugins are GPU accelerated, such as Blender Cycles, and the move to more GPU accelerated workloads is taking its time. Ray tracing accelerated design is an area that is getting a lot of GPU attention, for example.

There is always a question as to which method produces the best image: there is no point using a GPU to accelerate rendering time if it adds additional noise or reduces the quality. A film studio is more than likely to prioritize a slow, higher quality render on CPUs over a fast, noisy one on GPUs, or alternatively, render a lower resolution image and then upscale it with trained AI. Based on our conversations with OEMs that supply the industry, we have been told that a number of studios will outright say that rendering their workflow on CPUs is the only way they do it. The other angle is memory, as the right CPU can have 256 GB to 4 TB of DRAM available, whereas the best GPU can only supply 80 GB, and those are the super expensive ones.

The point being made here is that VFX studios still prefer CPU compute, and the more of it the better. When AMD launched its new Zen-based processors, particularly the 32 and 64 core count models, these were immediately earmarked as potential replacements for the Xeons being used in these VFX studios. AMD's parts prioritized FP compute, a key element in VFX design, and having double the cores per socket was also a winner, combined with the large amount of cache per core. This latter part meant that even though the first high core count parts had a non-uniform memory architecture, it wasn't as much of an issue as with some other compute processes.

A number of VFX companies, as far as we understand, focused on AMD's Threadripper platform over the corresponding EPYC. When both parts first arrived on the market, it was very easy for VFX studios to invest in under-the-desk workstations built on Threadripper, while EPYC was more for server rack installations and not so much for workstations. Roll around to Threadripper 3000 and EPYC 7002, and now there are 64 cores, 64 PCIe 4.0 lanes, and lots of choice. VFX studios still went for Threadripper, mostly due to it offering the higher 280 W power in something that could easily be sourced by system integrators like Armari that specialize in high-compute under-desk systems. They also asked AMD for more.



AMD has now rolled out its Threadripper Pro platform, addressing some of these requirements. While VFX is always core compute focused, TR Pro now gives double the PCIe lanes, double the memory bandwidth, support for up to 2 TB of memory, and Pro-level admin support. These PCIe lanes can be extended to local storage (always important in VFX) as well as large RAMDisks, and the admin support through DASH helps keep a company's systems managed together appropriately. AMD Memory Guard is also in its Pro line of parts, which is designed to enable full memory encryption.


Beyond VFX, AMD has cited world-leadership compute with TR Pro for product engineering with Creo, 3D visualization with KeyShot, model design in architecture with Autodesk Revit, and data science, such as oil and gas dataset analysis, where the datasets are growing into the hundreds of GBs and require substantial compute support.


The Threadripper Pro vs Workstation EPYC (WEPYC)

Looking at the benefits that the new processors provide, it's clear to see that they are more workstation-style EPYC parts than 'enhanced' Threadrippers. Here is a breakdown:

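AMD Zen 2 High-End Comparison

| AnandTech     | Threadripper      | Threadripper Pro   | Enterprise EPYC |
|---------------|-------------------|--------------------|-----------------|
| Cores         | 32-64             | 12-64              | 8-64            |
| 1P Flagship   | TR 3990X          | TR Pro 3995WX      | EPYC 7702P      |
| MSRP          | $3990             | $5490              | $4425           |
| TDP           | 280 W             | 280 W              | 200 W           |
| Base Freq     | 2900 MHz          | 2700 MHz           | 2000 MHz        |
| Turbo Freq    | 4300 MHz          | 4200 MHz           | 3350 MHz        |
| Socket        | sTRX40            | sTRX4: WRX80       | SP3             |
| L3 Cache      | 256 MB            | 256 MB             | 256 MB          |
| DRAM          | 4 x DDR4-3200     | 8 x DDR4-3200      | 8 x DDR4-3200   |
| DRAM Capacity | 256 GB            | 2 TB, ECC          | 4 TB, ECC       |
| PCIe          | 4.0 x56 + chipset | 4.0 x120 + chipset | 4.0 x128        |
| Pro Features  | No                | Yes                | Yes             |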
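As a rough check on the DRAM row above, peak theoretical memory bandwidth can be estimated from the channel count and the DDR4 transfer rate. The sketch below assumes a standard 64-bit (8-byte) data path per DDR4 channel; the function name is illustrative, and sustained bandwidth in real workloads will be lower than these peak figures.

```python
BYTES_PER_TRANSFER = 8  # assumption: 64-bit data bus per DDR4 channel


def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: int) -> float:
    """Peak theoretical DRAM bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return channels * mega_transfers_per_s * 1e6 * BYTES_PER_TRANSFER / 1e9


# Channel counts and DDR4-3200 speed taken from the comparison table.
PLATFORMS = {
    "Threadripper (4 x DDR4-3200)": (4, 3200),
    "Threadripper Pro (8 x DDR4-3200)": (8, 3200),
    "EPYC (8 x DDR4-3200)": (8, 3200),
}

if __name__ == "__main__":
    for name, (channels, rate) in PLATFORMS.items():
        print(f"{name}: {peak_bandwidth_gbs(channels, rate):.1f} GB/s peak")
```

Under that assumption, eight channels give twice the peak bandwidth of four, which lines up with the "double the memory bandwidth" claim for TR Pro over regular Threadripper.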

To get to these new parts starting from EPYC, all AMD had to do was raise the TDP to 280 W and cut the DRAM support. If we start from a Threadripper base, there are three or four substantial changes. So why is this called Threadripper Pro, and not Workstation EPYC?


We come back to VFX studios again. With these customers having already bought into the Threadripper branding and way of thinking, keeping these parts as Threadripper helps smooth that transition. This vertical had kind of already said it preferred Threadripper over EPYC, from what we are told, and so keeping the name consistent means that there is no real re-education to do.

The other element is that the EPYC processor line is somewhat fractured: there are standard versions, high performance H models, high frequency F models, and then a series of custom designs under B, V, and others for specific customers. Keeping this new line as Threadripper Pro keeps it all under one umbrella.

Threadripper Pro Offerings: 12 cores to 64 cores

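AMD Ryzen Threadripper Pro

| AnandTech | Cores / Threads | Base Freq (MHz) | Turbo Freq (MHz) | Chiplets | L3 Cache | TDP   | Price (SEP) |
|-----------|-----------------|-----------------|------------------|----------|----------|-------|-------------|
| 3995WX    | 64 / 128        | 2700            | 4200             | 8 + 1    | 256 MB   | 280 W | $5490       |
| 3975WX    | 32 / 64         | 3500            | 4200             | 4 + 1    | 128 MB   | 280 W | $2750       |
| 3955WX    | 16 / 32         | 3900            | 4300             | 2 + 1    | 64 MB    | 280 W | $1150       |
| 3945WX    | 12 / 24         | 4000            | 4300             | 2 + 1    | 64 MB    | 280 W | *           |

*Unsure if this is a special OEM model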

AMD announced these processors in the middle of last year, along with the Lenovo Thinkstation P620 as the launch platform. From my experience, the Thinkstation line is very well designed, and we are testing our 3995WX in a P620 today.

When TR Pro was announced with Lenovo, we were not sure if any other OEM would have access to Threadripper Pro. When we asked OEMs about it earlier in that year, before we even knew if TR Pro was a real thing, they stated that AMD had not even marked the platform on their roadmap, which we reported at the time. We have since learned that Lenovo had a six-month exclusive, and information was only supplied to the other vendors (ASUS, GIGABYTE, Supermicro) after it had been announced.



To that end, AMD has since announced that Threadripper Pro is coming to retail, both for other OEMs to design systems around and for end users to build their own. Despite using the same LGA-4094 socket as other Threadripper and EPYC processors, TR Pro will be locked down to WRX80 motherboards. We currently know of three, including Supermicro and GIGABYTE models, plus we have also had the ASUS Pro WS WRX80E-SAGE SE Wi-Fi model in house for a short hands-on, although we were not able to test it.



Of the four processors listed above, the top three are going on sale. It's worth noting that only the 64 core comes with 256 MB of L3 cache, while the 32 core comes with 128 MB of L3. AMD has kept it so that these chiplet designs only use as many chiplets as absolutely necessary, keeping the L3 cache per core consistent as well as the 8 cores per chiplet (the EPYC product line varies this a bit).


The fourth processor, the 12 core, would appear to be an OEM-only processor for prebuilt systems.
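The cache-per-core and cores-per-chiplet consistency can be checked with a little arithmetic on the Threadripper Pro table. The sketch below reads the first number in the Chiplets column as the count of compute chiplets and the "+ 1" as the separate I/O die; the dictionary and function names are illustrative.

```python
from typing import Tuple

# (cores, compute chiplets, L3 cache in MB) from the Threadripper Pro table.
PARTS = {
    "3995WX": (64, 8, 256),
    "3975WX": (32, 4, 128),
    "3955WX": (16, 2, 64),
    "3945WX": (12, 2, 64),
}


def per_core_stats(cores: int, chiplets: int, l3_mb: int) -> Tuple[float, float]:
    """Return (L3 MB per core, cores per compute chiplet)."""
    return l3_mb / cores, cores / chiplets


if __name__ == "__main__":
    for name, spec in PARTS.items():
        l3_per_core, cores_per_chiplet = per_core_stats(*spec)
        print(f"{name}: {l3_per_core:.2f} MB L3 per core, "
              f"{cores_per_chiplet:.0f} cores per chiplet")
```

The three retail parts all work out to 4 MB of L3 per core and 8 cores per chiplet; the 12-core part is the exception, with 6 cores active on each of its two chiplets.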


Threadripper Pro versus The World

These Threadripper Pro offerings are designed to compete against two segments. The first is AMD itself, showcasing to anyone who is using a high-end professional system built on first generation Zen hardware that there is a lot of performance to be had. The second is Intel workstation customers, either using a single socket Xeon W (which tops out at 28 cores), or a dual socket Xeon system that costs more or uses a lot more power, just because it is dual socket, and also has a non-uniform memory architecture.


We have almost all of these in this test (we do not have the 7702P, but we do have the 7742), and realistically these are the only processors that should be considered if the 3995WX is an option for you:

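3995WX Comparison Offerings

| AnandTech     | Core | SEP   | 1P / 2P | TDP  | Base Freq | Peak Freq | DDR    | PCIe     | DDR Cap |
|---------------|------|-------|---------|------|-----------|-----------|--------|----------|---------|
| TR Pro 3995WX | 64C  | $5490 | 1P      | 280W | 2700      | 4200      | 8x3200 | 128x 4.0 | 2 TB    |
| TR 3990X      | 64C  | $3990 | 1P      | 280W | 2900      | 4300      | 4x3200 | 64x 4.0  | ¼ TB    |
| EPYC 7702P    | 64C  | $4425 | 1P      | 200W | 2000      | 3350      | 8x3200 | 128x 4.0 | 4 TB    |
| EPYC 7742     | 64C  | $6950 | 2P      | 225W | 2250      | 3400      | 8x3200 | 128x 4.0 | 4 TB    |
| Xeon 6258R    | 28C  | $3950 | 2P      | 205W | 2700      | 4000      | 6x2933 | 48x 3.0  | 1 TB    |
| Xeon W-3175X  | 28C  | $2999 | 1P      | 255W | 3100      | 4300      | 6x2933 | 48x 3.0  | ½ TB    |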

Intel tops out at 28 cores, and there is no getting around that. Technically Intel has the AP processor line that goes up to 56 cores; however, these are for specialist systems and we have not had one physically sent to us for testing. Those are also $20k+ per CPU, and are essentially two CPUs bolted together under one package.

The AMD comparison points are the best available Threadripper option and the best available EPYC processor, albeit the 2P version. The best comparison here would be the 7702P, the single socket variant that is much more price competitive; however, we have not had this in for testing. Instead we have AMD's EPYC 7742, which is the dual socket version but with slightly higher performance.
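One simple way to put the list prices in the comparison table side by side is dollars per core. The sketch below is only that: it uses the SEP figures per CPU as listed, treats 2P-capable parts as a single CPU, and deliberately ignores frequency, memory channels, PCIe, and platform cost, so it should not be read as a value ranking. The names are illustrative.

```python
# (cores, SEP in USD per CPU) copied from the comparison table.
CPUS = {
    "TR Pro 3995WX": (64, 5490),
    "TR 3990X": (64, 3990),
    "EPYC 7702P": (64, 4425),
    "EPYC 7742": (64, 6950),
    "Xeon 6258R": (28, 3950),
    "Xeon W-3175X": (28, 2999),
}


def usd_per_core(cores: int, price_usd: int) -> float:
    """List price divided by core count."""
    return price_usd / cores


def rank_by_usd_per_core(cpus: dict) -> list:
    """Return (name, USD per core) pairs sorted from cheapest per core."""
    pairs = [(name, usd_per_core(*spec)) for name, spec in cpus.items()]
    return sorted(pairs, key=lambda pair: pair[1])


if __name__ == "__main__":
    for name, cost in rank_by_usd_per_core(CPUS):
        print(f"{name:14s} ${cost:7.2f} per core")
```

On list price alone, the 64-core AMD single socket parts sit below both 28-core Xeons per core, with the 3995WX carrying a premium over the 3990X and 7702P.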

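Test Setup

| Platform   | CPU             | Motherboard              | BIOS     | Cooler                 | Memory                          |
|------------|-----------------|--------------------------|----------|------------------------|---------------------------------|
| AMD TR Pro | TR Pro 3995WX   | Lenovo Thinkstation P620 | S07KT0EA | Lenovo Custom          | Kingston 8x16 GB DDR4-3200 ECC  |
| AMD TR     | TR 3990X        | MSI Creator TRX40        | 1.50     | Thermaltake 280mm AIO  | Corsair 4x8 GB DDR4-3200        |
| AMD EPYC   | EPYC 7742       | Supermicro H11DSI        | 2.1      | Noctua NH-U14S TR4-SP3 | SK Hynix 16x32 GB DDR4-3200 ECC |
| Intel Xeon | Xeon Gold 6258R | ASUS ROG Dominus Extreme | 0601     | Asetek 690LX-PN        | SK Hynix 6x32 GB DDR4-2933 ECC  |
| Intel Xeon | Xeon W-3175X    | ASUS ROG Dominus Extreme | 0601     | Asetek 690LX-PN        | DDR4-2666 ECC                   |

GPU: Sapphire RX 460 2GB (CPU Tests)
PSU: Various (inc. Corsair AX860i)
SSD: Crucial MX500 2TB

Silverstone SST-FHP141-VF 173 CFM fans also used. Nice and loud.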

We must thank the following companies for kindly providing hardware for our multiple test beds. Some of the hardware is not in this test bed specifically, but is used in other testing.



