

 

New Mac Studio M2 - benchmarking vs my old Mac Pro 2013 12-core D700 (and beta 26 vs 28 side note)

(@xthestreams)
Topic starter  

I was fortunate enough to get my hands on a Mac Studio, so I thought I would test it against my old Mac Pro 2013 to get a sense of the new speed. The results were interesting, especially the impact of the move from beta 26 to beta 28.

First up - the data source I chose was the NGC 292 training data set supplied by APP, so everyone can play the same game and we can compare results. The only change I made to the data set was to move all the flats into one directory to speed up loading them into APP. I left all settings at their defaults - but unless there's an easy way to share my setup details, I can't easily enable others to replicate them, so this is something that would have to be worked through for a proper benchmark.

Test environment - I tested each system using both my NAS and local SSD, comparing "optimal" (SSD) against my normal operating mode (NAS) to see how much of a difference it makes. In the case of the Mac Pro I thought I'd also test using a RAM drive with beta 28 to see how much of the performance was CPU bound versus I/O bound (with some interesting results).

The Mac Pro is a 2013 with a 12-core CPU, 64GB RAM and a 2TB Samsung NVMe drive - this is my "daily driver" and had beta 26 installed (something I didn't notice until later). The Mac Studio is the M2 Max (the base 32GB RAM) model. The NAS is a Synology DS1621+ with 6 drives in SHR, dual 100MB/sec NICs and dual M.2 SSD read-write cache in RAID1.

First - the NAS benchmark

  • MacPro 2013 - 6:09
  • Mac Studio -    4:05 - 33% reduction, not too shabby!

Next the SSD benchmark

  • Mac Pro  - 4:18
  • Mac Studio 2:50 - 23% reduction in time?!

Hang on, this doesn't make sense - surely we're I/O bound? Time to quickly benchmark I/O (using the Blackmagic write speed test - all figures in GB/sec)

Write speed (GB/sec)    MP2013    M2Max
NAS                     0.02      0.10
SSD                     1.30      5.10
RAM                     3.60      N/A (did not test)
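For anyone without the Blackmagic tool, a rough sequential write test can be sketched in Python. This is only an approximation under stated assumptions: the function name, the 256 MB default size and the 8 MB chunk size are all hypothetical choices, and the figures it produces will not match Blackmagic's numbers exactly.

```python
import os
import tempfile
import time


def write_throughput_gb_per_s(directory: str, total_mb: int = 256, chunk_mb: int = 8) -> float:
    """Write about total_mb of random data into `directory` and return GB/sec.

    Hypothetical helper: sizes are illustrative, and 1 GB is taken as 1024 MB.
    """
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    n_chunks = max(total_mb // chunk_mb, 1)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(n_chunks):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # make sure the data really left the OS cache
        elapsed = max(time.perf_counter() - start, 1e-9)
    finally:
        os.remove(path)  # clean up the scratch file
    written_gb = (n_chunks * chunk_mb) / 1024
    return written_gb / elapsed


if __name__ == "__main__":
    # Point this at a folder on the NAS, the SSD or a RAM drive to compare them.
    target = tempfile.gettempdir()
    print(f"{target}: {write_throughput_gb_per_s(target):.2f} GB/sec")
```

Running it once per storage location (NAS mount, local SSD, RAM drive) gives numbers that can be compared against each other, which is all the table above needs.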

WOW, the Mac Pro seems to suck at NAS I/O, which surprises me as I am certain I've managed to get line-rate from it before. I will do some digging to work out what the problem might be, but it helps explain why the relative speed increase going to the Studio was higher with NAS than SSD. I thought I'd try to level the playing field a little by adding a RAM drive to more closely match the Studio's raw I/O - the results were surprising (see below).

It's also at this point in the process that I realised that, in my hurry to test the new machine, I had installed the latest Apple Silicon version of APP (beta 28) while the Mac Pro was running beta 26 - so I also re-ran the benchmarks with beta 28 all 'round. Here are the (surprising) results:

NAS

  • MacPro 2013 - 7:04 <- roughly 15% slower than on beta 26
  • Mac Studio - 4:05

SSD 

  • Mac Pro 2013 - 4:14  <- approx 13% slower vs beta 26
  • Mac Studio    -  2:50 <- 33% less time compared to Mac Pro 
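The percentages quoted in these lists come from simple arithmetic on the stopwatch times. A minimal sketch of that arithmetic follows; the function names are hypothetical and the times are the ones listed in this post.

```python
def to_seconds(mmss: str) -> int:
    """Convert an 'm:ss' stopwatch reading to whole seconds."""
    minutes, seconds = mmss.strip().split(":")
    return int(minutes) * 60 + int(seconds)


def percent_reduction(old: str, new: str) -> float:
    """Percentage of run time saved going from `old` to `new`."""
    old_s, new_s = to_seconds(old), to_seconds(new)
    return 100.0 * (old_s - new_s) / old_s


def percent_slower(baseline: str, other: str) -> float:
    """How much longer `other` took, as a percentage of `baseline`."""
    base_s, other_s = to_seconds(baseline), to_seconds(other)
    return 100.0 * (other_s - base_s) / base_s


if __name__ == "__main__":
    # NAS: Mac Pro on beta 26 (6:09) vs Mac Studio (4:05)
    print(f"{percent_reduction('6:09', '4:05'):.1f}% less time")  # 33.6% less time
    # NAS on the Mac Pro: beta 26 (6:09) vs beta 28 (7:04)
    print(f"{percent_slower('6:09', '7:04'):.1f}% slower")  # 14.9% slower
```

Note that "less time" and "faster" are different measures: a run that takes 33% less time is about 1.5x as fast.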

RAM Drive - just for fun I also ran the Mac Pro 2013 with a 15GB RAM drive to see whether, with I/O removed as a bottleneck, the Mac Pro 2013 might actually be able to compete with its younger cousin. No performance gain - which leads me to conclude the 2013 has become CPU bound. A quick check of RAM disk read/write performance on the Studio (16GB/sec and 5GB/sec respectively) made me think it's probably not worth bothering to test the M2 Max.

In all cases the D700 GPUs sat around doing nothing (as far as I can tell). Is there a reason why we don't see them being used? The M2 Studio's GPU does get used, albeit at just 50%. Could the Mac Pro 2013 have dormant performance?
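The post does not say how the RAM drive was created. One common approach on macOS, shown here purely as an assumption, is `hdiutil attach -nomount ram://<sectors>` followed by `diskutil erasevolume`, where the size is given in 512-byte sectors. The sketch below only builds the command string for a given size; the helper names and the volume name "RAMDisk" are hypothetical.

```python
SECTOR_BYTES = 512  # hdiutil's ram:// size is expressed in 512-byte sectors


def ram_disk_sectors(size_gb: float) -> int:
    """Number of 512-byte sectors needed for a RAM disk of size_gb GiB."""
    return int(size_gb * 1024**3) // SECTOR_BYTES


def ram_disk_command(size_gb: float, volume_name: str = "RAMDisk") -> str:
    """Build (but do not run) the macOS shell command that creates the disk."""
    sectors = ram_disk_sectors(size_gb)
    return (
        f'diskutil erasevolume HFS+ "{volume_name}" '
        f"$(hdiutil attach -nomount ram://{sectors})"
    )


if __name__ == "__main__":
    # A 15GB drive, matching the size used in the test above.
    print(ram_disk_command(15))
```

The command is printed rather than executed so it can be reviewed first; the RAM it claims is taken away from the rest of the system until the volume is ejected.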

I'm going to continue to do some digging around, but I would love to see some comparisons with other machines on the same data

Long story short, the Apple Silicon Mac Studio offers a nice performance bump over a machine 10 years older than it, but if I am honest, a 33% speed improvement is a little disappointing considering that one costs less than $800 these days and the other is $4000. Factor in that the internal storage is 3x as fast on the Studio and you have to wonder if Apple Silicon is the speed demon Apple makes it out to be. (Don't get me wrong, for 4K video editing work the Mac Pro falls to pieces by comparison, but my goal was to benchmark APP performance.)

Now if you're like me and have DEEP integrations with hundreds of frames of data and you're committed to using a NAS for your source data, then the nearly 50% reduction in time is huge, and I imagine it will be even better when I move to a 10GbE NIC in the Synology.

The other really big surprise was the 13-15% performance hit moving to beta 28 - I am sure the APP team can shed more light on this, but it also serves to remind me that you can't always assume newer software will be faster/better!

I'd love to know how these results compare to others using the same data set, to see whether my results are outliers or the result of my settings. My dream machine is/was always a dual 16-core (32-core) HP Z820/640/820/840 - I know a few folks on here have them, and getting a comparison with the same data would be illuminating (with OpenGL both on and off to remove GPU effects).

Clear skies!

 


   
(@mabula-admin)
 

Hi Paul @xthestreams,

Wow, that has been some extensive testing... 🙂

Related to another topic, yesterday I ran benchmarks with beta28 on my Windows 10 main development computer to compare its performance with older betas like beta26. On my Windows 10 system beta28 is exactly as fast as beta26. And I have no reason to suspect otherwise: the internal memory controls are the same, and between beta26 and beta28 there has not been a change where we optimized a part of APP to speed it up.

Maybe you can test again between beta26 and beta28 and double check that all the settings are equal while processing? If you still get different results, it must be something in macOS I would think, but that would be very weird. Which macOS version are you using?

Regarding the RAM drive, what I can say is that if I use a RAM drive inside our development platform with appropriate read/write code, things are much faster when data is integrated, which is a clearly I/O-bound task. If you let APP do its work on an external RAM disk using the read/write operations we have implemented for regular hard disks, then you will still be limited by our code for normal hard disks, which does not really utilize the RAM disk. It is very technical, but I know this from testing myself.

I am also surprised that you get so little performance gain with an M2 Mac compared to your old Mac Pro 2013... How many CPU threads does APP show for the Mac Pro 2013 and how many for the M2 Mac? The number of available threads is still a huge factor in performance.

My 2017 AMD Threadripper 1950X machine on which I develop has 16 cores/32 threads and is overall still quite a bit faster than a 2021 M1 Mac with 8 cores/16 threads when looking at multi-core operations. But if we look at single-core operations, the M1 Mac is much faster, as you would expect. Oh, I must point out, this AMD Threadripper was so easy to overclock... the factory setting is 3400 MHz per core, but I can run it stable on air cooling without issue at 3800 MHz, which is a boost of more than 10%... which I have been doing since I purchased the machine.

My point here is that the number of CPU threads weighs heavily in these comparisons.

The SSD speed also influences things, but it weighs less heavily when you are comparing SSDs; the critical part is the latency of all the I/O operations. So better take an SSD with the lowest latency 😉 This is the reason why processing over the NAS will be much slower: the latency over the network is the cause.
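To get a feel for the latency point, one could time many small reads from a file on each storage location. The following is a rough sketch, not how APP itself reads data: the function name, the 4 KB block size and the 200-read count are hypothetical choices, and the operating system's file cache can hide the real latency if the same file was read recently.

```python
import os
import statistics
import time


def small_read_latency_ms(path: str, reads: int = 200, block: int = 4096) -> float:
    """Median time in milliseconds for one small read at a scattered offset."""
    size = os.path.getsize(path)
    span = max(size - block, 1)  # keep every read inside the file
    samples = []
    with open(path, "rb", buffering=0) as f:  # unbuffered on the Python side
        for i in range(reads):
            offset = (i * 7919 * block) % span  # scatter reads across the file
            start = time.perf_counter()
            f.seek(offset)
            f.read(block)
            samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


if __name__ == "__main__":
    import sys

    # Usage: pass a large file on the NAS, then one on the SSD, and compare.
    if len(sys.argv) > 1:
        print(f"{small_read_latency_ms(sys.argv[1]):.3f} ms per 4 KB read")
```

Comparing the median for a file on the NAS against one on the local SSD shows the per-operation cost that raw GB/sec figures do not capture.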

Hope this clarifies things a bit?

Maybe, tomorrow I can share results from my system? I have another dataset that I use for benchmarking between APP versions, maybe I can share that as well for you to test?

Mabula

 

 


   
(@xthestreams)
Topic starter  

Thank you for the considered reply Mabula - I have to admit that once I got started, it's quite addictive.

Let me see if I can handle these in rough order:

Beta 26 vs beta 28 - I will re-run to confirm my numbers, and I will also try the same test on the M2 Mac Studio.

RAM Drive - this is probably the most surprising, I am not going to waste too much time on it, but it does make me scratch my head!

Intel 12-core vs M2 Max - personally I am not super surprised; I have long suspected that raw CPU performance hasn't increased THAT much over the last 10 years. Certainly there are more cores, dedicated silicon for handling SIMD tasks and encoding/decoding, GPUs etc, but given that clock speeds and per-core IPC seem to have hit an upper boundary on modern CPUs, the idea that a modern 8 P-core + 4 E-core CPU would clock in at roughly 30% less time than an older 12-core (24-thread) one doesn't completely shock me. Again, if this was a 4K encoding job, we'd be looking at a 10x-100x difference - but this is CPU-bound stuff.

And in answer to your question, the CPUs on the 2013 are all running at near 100% for most of the integration.

Threadripper - I have been thinking about getting one just for APP, VERY cool. In my ideal world, APP would be able to "detect" APP compute servers on the LAN and then send the job to the compute server, leaving me to work on my macOS system for setup, viewing, etc - I know I could always VNC into the Linux box, but I like the idea of distributed computing (I am an old man and still believe in THAT dream).

Interesting to learn about the I/O penalty for latency. You have inspired me to run more tests using different NAS protocols (such as iSCSI) and SFP versus 10GbT, as the latency on SFP is marginally better. That's for another day.

I would LOVE to see your results, and if you have a benchmarking data set and procedure/protocol you use, I think that would be awesome. A thread inside the forum for benchmarks would be interesting/helpful to me and to others shopping around for a machine, I would think.

GPU - this wasn't addressed, why are the GPUs on the 2013 sitting idle when the GPU on the Studio is working at 50% (in fact, why isn't the GPU doing ALL the work?)

 Take care Mabula and APP team, really love where the product is going, you are doing wonderful work!


   