M1 Ultra Geekbench Score (geekbench.com)
205 points by carlycue on March 8, 2022 | 173 comments



Impressive numbers. If you can't load the page, it scores 1793 single-core / 24055 multi-core.

For perspective, AMD's top consumer part is a 5950X, which scores 1686 single-core / 16565 multi-core (reference https://browser.geekbench.com/processors/amd-ryzen-9-5950x).

However, the 5950X really is a consumer CPU and can be purchased for $600 right now, so you could build 2-3 5950X PCs for the price of a single M1 Ultra Mac Studio.

A better comparison would be the Zen 3 based Threadripper Pro parts that were announced today: https://www.anandtech.com/show/17296/amd-announces-ryzen-thr...

There is supposedly a leaked AMD 5975WX (32-core Zen 3) score on Geekbench from a few months ago. It performs very similarly to the M1 Ultra: https://browser.geekbench.com/v5/cpu/compare/10531340?baseli...

The M1 Ultra is an impressive part, but the $4K price of entry is steep. On the other hand, if you need a Mac for Mac software then none of this matters and you're going to buy whatever Apple offers, so it's great to have something like this available.


And here are the numbers for the i9-12900k - another comparison people are likely to be interested in.

edit: It appears the numbers I posted below are an outlier?

A better average result might be 1990 single-core / 17595 multi-core.

See: https://i.imgur.com/FebpFR7.png

2740 single-core / 25906 multi-core

https://browser.geekbench.com/v5/cpu/compare/10820302?baseli...


Something is wrong with that result. It might be heavily overclocked.

The i9-12900K does 1893 / 17299 in more average results. Geekbench's site is down but several websites have large tables of the Geekbench V5 scores of all of the Alder Lake CPUs: https://videocardz.com/newz/intel-core-i9-12900k-is-12-faste...

But yes, the i9-12900K does outperform the 5950X I mentioned above.


Thanks for letting me know. I'm not sure why the numbers for the one I happened to click on are so far above the norm, but I've now made an edit with a screenshot of more average numbers.


How can a stock clocked i9-12900K outperform M1 Ultra on single core?

It seems I have been misinformed about Intel.


> It seems I have been misinformed about Intel.

Intel's latest desktop parts are fantastic. They're the undisputed performance champions, the best value for gaming and consumer CPUs, and AMD has yet to catch up. The mid-level parts are even relatively power efficient (for desktop parts).

The pro-AMD anti-Intel commentary on the internet got completely out of control for a while. Intel is still very good at what they do, despite a few stumbles in recent history. It's just not fashionable to say good things about Intel right now.


The Intel i9-12900k has a TDP of something like 240 watts (I don't remember the exact number).

The AMD 5950x has a TDP of 105 watts.

The 5950x does roughly 10-15% worse single core and 10% worse all-core.

Intel is far behind AMD.

Overclock the 5950 a bit and you're back to where the i9 is stock, but still with less power. (I haven't actually done any research into that last sentence, but that seems plausible)

The AMD customer experience is also much better. I can upgrade - and soon will be upgrading - my 2 year old 3900x system to a 5950x. I don't have to buy a new motherboard like I would have had to with Intel. Plus my system has had PCIe 4.0 for a few years (which I have been utilizing with a super fast SSD) and also supports ECC RAM - Intel will price gouge you for those by making you go to their enterprise targeted CPUs/motherboards (to be fair, they do now support PCIe 4 on the i9).

Plus Intel withholding ECC from their consumer CPUs - even the high end ones - is just a major dick move and hurts all of the people like us on HN who want to have good hardware. And considering all the scummy anticompetitive stuff Intel has done over the last few decades, I'd say it's better to go with AMD even if their chips had a slight perf/$ disadvantage compared to Intel (but they don't). Of course, I'm not saying AMD is amazing and wouldn't do evil stuff if they got the opportunity, like Intel had, but AMD is still the underdog for just a bit longer, and in duopolies it's generally good to support the underdog if it doesn't cost you much (and here it doesn't really cost anything).


> The Intel i9-12900k has a TDP of something like 240 watts (I don't remember the exact number).

> The AMD 5950x has a TDP of 105 watts.

I have experience with both. The 5950X has significantly higher idle power consumption. It's a known downside of AMD parts. Unless you're running the CPUs at 100% all the time, the Intel platform will probably consume less power overall.

I know it defies all of the headlines and such, but it's true. Idle power consumption matters more than peak power consumption for most of us whose CPUs sit idle most of the time.

If power is a concern, you get the 12600K instead of the hot rodded 12900K.

> Overclock the 5950 a bit and you're back to where the i9 is stock, but still with less power. (I haven't actually done any research into that last sentence, but that seems plausible)

I can tell you haven't done any research because this isn't true. The 5950X doesn't overclock well at all. You can try to force higher all-core speeds with a lot of voltage, but it's going to become a power hungry monster of a CPU for very little gain.

The Intel really is the superior CPU.

> Plus Intel withholding ECC from their consumer CPUs - even the high end ones -

Also incorrect. Look up the 12900K page and scroll down to "ECC Memory Supported": https://ark.intel.com/content/www/us/en/ark/products/134599/...

This is what I was talking about when I said the pro-AMD anti-Intel rhetoric was out of control. It's like basic facts don't matter any more. People just want to hate Intel.


Interesting, thanks for the info. Unfortunately a bit too late to edit/delete my previous post.


The AMD parts are already running close to max. You can squeeze a few percent more out of them; overclocking like in the old days is out of the question.


If you set Intel and AMD to the same power limit Intel will probably be faster.

Intel just gave ECC back, although you have to use a workstation chipset.


>It's just not fashionable to say good things about Intel right now.

Well, being single-handedly responsible for the nearly complete stagnation in processor performance for 10 out of the last 12 years probably has something to do with that.

That said, Intel's products and their performance are indeed quite underrated, especially once you consider that they're being manufactured on a relatively old process node; it will be interesting to see what kind of performance they'll be able to wring out at 5 or 7nm.


Something to recognize is that TSMC's 5nm can do 171M/mm2, whereas Intel's 7nm can do 200-259M/mm2.

M = millions of transistors


I've always assumed that Intel's terrible corporate culture was responsible for that stagnation (told to me by former Intel employees). What upset me was the pointlessness of having their employees suffer that culture and then the rest of the world suffer the lack of progress.


...that and the lack of any real competition

Nothing focusses the mind like a real competitor.


It would be interesting to see how x86 fares against ARM on the same node size.


>The pro-AMD anti-Intel commentary on the internet got completely out of control for a while.

Not really. It was entirely justified, but things have changed. Prior to Alder Lake Intel really was just plain worse than AMD in most respects.

Processor vendors have leapfrogged each other many, many times before. A new architecture is supposed to beat everything else on the market. Recent years have been weird because Intel bungled 10nm at about the same time AMD bungled Bulldozer. We are getting back to normal.


It is clocked 1.6x higher than M1 chips (single-core), but also the CPU architecture isn't from 2015 like Intel's desktop chips were from generation 6 through 10.

A single core at that 5.2GHz turbo can consume ~40W alone, however.


A single 12900K core achieves that score using as much power as all 10 M1 cores combined.


Because Golden Cove is just better than Firestorm.


Geekbench doesn't load ATM. Are these numbers legit (both for the 5950X and 12900K)? Does Intel really outperform AMD single-core by 62%? Sniff test says that's way too much, but what do I know. I also doubt the multi-core difference. Yes, the 12900K seems to be a well performing CPU, but then again 5950X is a 16 "P" core part...


Intel has always had a huge lead over AMD on single-core. AMD did manage to compete on multi-core with the Zen chips but still hasn't gotten close on single-core.


I don't know why you're being downvoted. What you're saying is, in my experience, true. AMD is the multi core king. Intel is the single core king.

Multi core CPU performance benchmarks push your CPU and all its cores to the limit, but that doesn't reflect the typical real-world use case because the typical real-world use case involves programs that aren't able to effectively utilize all CPU cores. On top of that, gaming is the only typical use case where you are going to be pushing your CPU to the limit (i.e. the place where you actually need your CPU to be fast) and games don't CPU-parallelize well, meaning higher single core performance is generally best in the case of gaming.

Macs aren't made for gaming. They're made for productivity, multitasking, and creative work (the one major area where a multi core CPU can be fully utilized) so it makes complete sense for Apple to go the multi core route, but I can't say the same for AMD and their 8+ core gaming CPUs.


Intel had a dwindling lead in single-core perf over the past few years, but when AMD came out with Zen 3, AMD actually had a non-negligible lead over Intel. But Intel is now winning again with the 12th gen chips. So they are trading blows these days.


Gotcha. Is the cost efficiency comparable as well? Could I buy an AMD gaming CPU with similar single core performance to an Intel gaming CPU for about the same price?


I believe this is a result from an overclocked system.


Out of pure curiosity, this is the comparison of that M1 Ultra Geekbench score against my setup, which is VERY CLOSE to comparable: an AMD 5950/3090/128GB workstation. Same version of Geekbench. What I would describe as a well-thought-out hardware configuration (no bling or neon, built with a vague sense of budget/performance in mind for work) and only "reasonable" stock settings (i.e. no manual overclocking, but XMP enabled, RAM chosen for Ryzen).

See: https://browser.geekbench.com/v5/cpu/compare/13330272?baseli...

I think you'd expect the M1 to outperform a bit on the multicore because it has more cores, but the comparison suggests a lot of the current benchmark score is being driven by exceptionally high scores in a handful of benchmark sub-tests.

On single-core I'd describe them as very very comparable at the moment.

There's another issue that was made apparent when I benchmarked my setup, which suggests that a significant portion of people running 5950s are leaving some performance on the table (I presume that's some mix of not setting things up right or not having the right mix of hardware).

I'd say this is an issue that doesn't happen with Macs, but... I'm aware that their past laptops have run really really hot (and dropped performance below their apparent specs due to insufficient cooling?), although everything I've seen shows the M1 family to be a relatively great performer in this regard (to the point where I recommended my wife pick up the new M1 MacBook Air).


> my setup which is VERY CLOSE to comparable

How much did your setup cost all finished?


It depends a bit on how you account for other machines and peripherals (given that you can reuse/replace parts when doing it yourself), but the general price off the top of my head was:

- CPU $1100 AUD

- GPU $3000 AUD

- RAM $1000 AUD

- Motherboard $279 AUD

- NVMe Drive $300 AUD

- SSDs $150 - 300 each AUD

- SATA drives $100 - 300 each AUD

- Power supply $240 AUD

- Case $250 AUD

- CPU cooler $170 AUD

That prices the full thing as new at about $6000 to $7000 AUD, which is vaguely in line with about what I recall my budget being, though I've had it for a year or so now.

That would put it at roughly 4k - 5k USD by my napkin math, but as a general rule US consumer tech prices receive a bit of a discount compared to us over here, so you could probably come in on the lower end if you could physically get everything (I don't know what part supplies are like at the moment in the US).

Taking the AU Apple store price, accepting that their CPU probably outperforms mine in some multithreaded performance, and going for the top GPU option, I'd price the Apple M1 Ultra rough equivalent between $7000 to $10,000 AUD depending on where you want to cut it.

So like for like, I'd say 4k-5k USD for mine vs 5k-7k USD for the M1 in my locality.


Not a bad price.

The thing that kills me with PC builds though is longevity and when they don’t work. I’m getting too old to play RMA when something doesn’t work. I want one single point of contact. I moved from a Ryzen 3700X running windows in 2020 to a bottom end Mac mini after spending two months debugging a random crash.


What is the performance comparison? On RAM, bottom end M1 Mini is a paltry 8GB non upgradeable.


Well, it was mostly used for transcoding and compiling Go. The bottom-end 8GB Mac mini was significantly faster than the Ryzen on both.


Also consider the value of your specialist labor and expertise in designing and building your rig, as well as warranty support comparison.


> The M1 Ultra is an impressive part, but the $4K price of entry is steep.

Some estimations of M1 Max and M1 Ultra price:

M1 MAX + 64 GB RAM + 1 TB SSD is $2600

M1 Ultra with 48 GPU cores + 64 GB RAM + 1 TB SSD is $4000

M1 Ultra with 64 GPU cores + 64 GB RAM + 1 TB SSD is $5000

So Apple puts a hefty price on its M1 Ultra CPU. You have to pay $1400 on top of the M1 Max price to get it and $2400 to get the best one.

I wonder if it's greed or production issues. Silicon is cheap.


The price of Apple products probably has no relation whatsoever to what it costs to produce them.

It costs whatever the target customer is ready to pay to acquire it.


Do you really believe that companies should price their products below market rate?

The only likely consequence of that scenario is that they will be resold to the people willing to pay more.

Apple is not a charity. They don't owe you anything.


Yes, I hold a communistic point of view in that regard, and I do believe that people must earn a reasonable wage and not charge anything over that. Extracting super-profits is criminal, even if you're making the best computer in the world.


All products in the world are priced to maximize profit.


Talking of the new chips, it seems interesting to me to note the difference in cache sizes. I know it is not a 1:1 comparison because of the architecture differences, but the 16-core 5955WX has

L1: 512 KB, L2: 8 MB, L3: 64 MB

compared to M1 Ultra's from the Geekbench data

L1: 128 KB instruction, 64 KB data, L2: 4 MB

I have mostly forgotten my microprocessor architecture lectures but it seems interesting that even after being able to cache more data near a core, AMD is not gaining much. Maybe packing too much cache increases latency of access or the gains simply go away beyond a certain size.

Edit: Maybe even it is coming down to the cache layout. Does anyone know if the cache fetch times for the named levels are roughly the same across architectures?

For x86, I remember L1 being a single cycle fetch and L2 being 10-20x slower than L1


I'm almost certain that the L1 caches on the M1 ultra are being reported incorrectly. That 512 KiB on the AMD CPU is the sum of data and instruction cache across all cores. M1 performance cores have 192 KiB of instruction cache and 128 KiB of data cache per core; and the efficiency cores have 128/64 KiB of instruction and data caches, respectively.

16*(192+128)+4*(128+64) = 5888 KiB of L1 cache on the M1 ultra.
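
If anyone wants to redo the arithmetic, here's a quick Python sketch (the core counts and per-core cache sizes are just the figures quoted above, nothing reported by Geekbench itself):

    # Assumed M1 Ultra configuration: 16 performance cores + 4 efficiency cores,
    # with the per-core L1 sizes quoted above (all values in KiB).
    P_CORES, E_CORES = 16, 4
    p_l1_i, p_l1_d = 192, 128  # per performance core
    e_l1_i, e_l1_d = 128, 64   # per efficiency core

    l1_i = P_CORES * p_l1_i + E_CORES * e_l1_i  # 3584 KiB of instruction cache
    l1_d = P_CORES * p_l1_d + E_CORES * e_l1_d  # 2304 KiB of data cache
    print(l1_i, l1_d, l1_i + l1_d)              # 3584 2304 5888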


Wow, if that's true then that is such a massive advantage. L1 is a single cycle fetch cache if it is like x86. So, individual cores can do compute so much better by fitting more data at once.


L1 hasn't been a single cycle for a long time, like decades.


Hmm, thanks; that seems interesting. I guess I need to read up more. On searching in Google someone[1] was quoting Xeon L1 fetch as approximately 4 cycles. I don't know if this is an average across branch prediction hit/miss, I will look for source of these numbers and try to read what has changed.

[1] https://stackoverflow.com/a/4087331


Yes, 4-5 cycles is to be expected. Also, larger caches generally trade off latency, so I would not be surprised if apple's chips were slower still.


The cycle count is largely irrelevant in a speculative, pipelined core, as long as it can speculate far enough ahead that the delay is masked by other activity. 4-5 cycles is nothing in the scheme of things.


Emphasis on 'largely'. If you have e.g. multiply indirect pointers, then you care. That said, I didn't mean to imply that moving to bigger, slower caches was the wrong tradeoff.


This is part of what makes Apple's design so incredible. Normally, increasing the cache size dramatically increases latency, but they paid the cost and their 128KB D-cache and 192KB I-cache still retain 2-3 cycles of latency.


Yes, my understanding that one of the main reasons that Apple CPUs are so power-efficient is (probably) that they have absolutely huge caches.


> compared to M1 Ultra's L1: 128 KB instruction, 64 KB data, L2: 4 MB

That doesn't quite match up with what we know about the two M1 Max chips that make up the Ultra.

Here are the resources per M1 Max, so multiply by two.

>On the core and L2 side of things, there haven’t been any changes and we consequently don’t see much alterations in terms of the results – it’s still a 3.2GHz peak core with 128KB of L1D at 3 cycles load-load latencies, [192k L1 instruction cache], and a 12MB L2 cache.

> Where things are quite different is when we enter the system cache, instead of 8MB, on the M1 Max it's now 48MB large

https://www.anandtech.com/show/17024/apple-m1-max-performanc...


The M1 Ultra has a significant memory bandwidth and latency advantage due to the type of memory used and the way the memory is connected. They use the same memory for the CPU and GPU, so the memory interface was optimized more like a GPU's, and the CPU benefits in a few memory-constrained benchmarks (machine learning, AES-XTS streaming).

The flip side is that you're limited to 128GB combined memory for the CPU and GPU on the M1 Ultra, whereas a comparable Threadripper Pro system will take up to 16 times as much (2TB) and you can upgrade it whenever you feel like.

It will be interesting to see how much RAM Apple offers on the upcoming M1 Mac Pro parts.


Bandwidth yes, latency no.

Wiring out 32 channels of DDR5 to 16 slots might not be feasible, but latency-wise Anandtech's measurements suggest the M1 Max actually has a bit higher latency to memory than e.g. Icelake-SP


> significant memory bandwidth and latency advantage

This is to cache or the main memory?

I remember x86 based PCs taking 100ns for RAM access. Is it faster in ARM?

> They use the same memory for the CPU and GPU, so the memory interface was optimized more like a GPU and the CPU benefits in a few memory-constrained benchmarks (machine learning, AES-XT streaming)

This would mean each core having some dedicated RAM section that is better connected. Wouldn't this be more like the L3 in x86, but bigger? Maybe this is the advantage of having everything in the same package.

Beyond a certain size, gains should also flatten out, no?


> Is it faster in ARM?

Not in the general case, but specifically with how Apple has brought the RAM chips so physically close to the CPU cores.


The physical distance confers no latency advantage.


Is this because signal speed is already close to the speed of light anyway?


Maybe 70% of the speed of light. That's a handful of cycles at a few GHz for any SDRAM channel. The latency limitation is baked into the scanning, strobed nature of SDRAM. It's dense and cheap, but it will never respond faster than ~20 ns (100+ cycles).
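
Back-of-the-envelope, with an assumed ~5 cm trace length and a ~5 GHz core clock (made-up but representative numbers):

    # Rough illustration: wire propagation delay vs. SDRAM access latency.
    # The 5 cm trace length and 5 GHz clock are assumptions, not measurements.
    c = 3.0e8                          # speed of light, m/s
    signal_speed = 0.7 * c             # ~70% of c on a PCB trace
    trace_m = 0.05                     # assumed 5 cm to the DRAM package
    clock_ghz = 5.0

    prop_ns = trace_m / signal_speed * 1e9
    dram_ns = 20.0                     # order-of-magnitude SDRAM latency from above

    print(f"propagation: {prop_ns:.2f} ns (~{prop_ns * clock_ghz:.1f} cycles)")  # ~0.24 ns, ~1 cycle
    print(f"DRAM access: {dram_ns:.0f} ns (~{dram_ns * clock_ghz:.0f} cycles)")  # ~20 ns, ~100 cycles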


I wonder what stops AMD and Intel from trying this, other than the loss of modularity. It's not like the architecture would change by bringing the memory onto the package. As far as I remember, the CPU is the only thing that does RAM access even in x86.


I would be surprised if they’re not evaluating this. The only difference I can see is that it complicates the number of SKUs they’d need to provide to their customers. It’s a little easier for Apple in this regard because they’re also building the final machines, whereas Intel/AMD are making chips that are going in a wider range of devices.


>L1: 128 KB instruction, 64 KB data, L2: 4 MB

Reporting it that way would make no sense, knowing how many (more) transistors the M1 has. In very simple terms, one bit of SRAM is 6 transistors (+ some extra for addressing).

Also, comparing the M1 Ultra to consumer-grade chips is quite pointless; consumer-grade x86-64 parts have just two memory channels.


M1 Max P-cores have 128KB L1 D-cache, 192KB L1 I-cache, and 24MB of L2. E-cores have 64KB L1 D-cache, 128KB L1 I-cache, and 4MB of L2. There is also a shared 48MB system-level cache that serves as L3.

AMD Zen 3 has 32KB D-cache and 32KB I-cache per core.

Here's the real chart

                    M1 Ultra    AMD 5950X    AMD 5995WX
    L1-D             2.25MB      512KB         2MB
    L1-I             3.5MB       512KB         2MB
    L1-total         5.75MB      1MB           4MB
    L2-total         56MB        8MB          32MB
    L3/SLC-total     96MB       64MB         256MB

    Total L2+L3     152MB       72MB         288MB
    Total Cache     158MB       73MB         292MB


Looks like you're comparing per-chip numbers vs. per-core numbers.


Geekbench listed it under the processor as a whole, not per core. 4 MB of per-core L2 sounds like too much, tbh.


In addition to what the other comments point out, there is an SLC (system-level cache) in the M1 that does the job of the L3 cache.


The interesting thing here is how little difference DRAM access speed makes.

AMD is probably getting 70GB/s, while the M1's on-package DRAM is 3 to 6x the speed.


How little difference it makes… to GeekBench’s CPU speed test. They’re not testing memory speed. Not that interesting, no.


That'll depend on the usecase and benchmark. The M1 does really unreasonably well on SPECfp, for instance, which is heavily memory bandwidth dependent.


How do wattages compare? (I'm probably overlooking it and don't see where to find it.)


Anandtech measured peak M1 Max power at 43W, but that still included some GPU and other SoC usage. Doubling that gives 86W as an absolute max.

Anandtech measured the 5950X's peak actual power consumption at 142W.

Anandtech measured the 12900K's peak actual power consumption at 259W.

AMD lists the TDP for the 5995WX at 280W. The 3990X has an identical TDP, and its actual power consumption would go over 300W IIRC.


It also has an RTX 3090-level GPU, which costs over $3000. Taking this into account, it isn't that expensive comparatively.


I'd take the GPU comparisons with a huge grain of salt.

If it's anything like the M1 Max and M1 Pro, the real-world GPU performance doesn't quite extrapolate the same way as the synthetic benchmarks.

From what I've gathered, the performance per watt of the M1 GPU isn't actually that much different from Nvidia's 3000-series performance per watt in real-world applications. I'd love to be wrong and discover that Apple has also cracked the code on making GPUs more efficient than established industry players, but I think it's not leaps and bounds better like they've claimed. At least not in real-world applications.


Maybe they have cracked the interconnect?


It barely hits a 3060 Ti, performance-wise. That's a $400 GPU.

Also, while the 3090 is $3000 because of shitcoin miners and scalpers profiting off the demand, the MSRP is half that (and you can still get it at this price provided a bit of patience, which, admittedly, sucks to have to do).


You got a source on $400 3060ti’s?

I can’t find ‘em for under $750.


Also, the M1 Ultra's UMA architecture means almost 128GB of VRAM for the GPU.

It would be interesting to see how it performs in VRAM-intensive tasks compared to the RTX 3090.


The score was 1793 single core / 24055 multi-core. This puts it in the realm of Threadrippers/EPYCs with 24-40 cores, or some of the similar core count Xeon Gold processors.

I'd love to see if Apple can post solid numbers that compete with 128 core EPYC CPUs in a Mac Pro, if they truly go for broke on it.


The interesting thing is that the 24-core threadripper is posting a score of 20k and the 64-core one is posting 25k. Whatever Geekbench is doing doesn't appear to be something you'd normally run on one of these monster core count CPUs.


Or the overhead of co-ordinating multiple cores is starting to matter more and more the more cores you have.


They shouldn't be coordinating often unless the benchmark is poorly suited.


Only for supercomputer tasks like weather forecasting. Not for near-embarrassingly parallel jobs like high traffic web servers and render farms.


High-end Threadrippers are severely limited by their thermal envelope. Looking at the latest 5000-series products, the 64-core CPU has the same TDP (280W) as the 24-core CPU. Guess which one can sustain higher clock speeds for longer before it gets throttled.


Indeed, it is a little faster single-core and a little slower multi-core on average compared to my 24-core Zen 3 EPYC (https://browser.geekbench.com/v5/cpu/compare/13330272?baseli... - which I expect to be slightly slower than the new 24-core Threadripper Pro 5965WX) - certainly comparable in terms of performance. But no word yet on whether the M1 Ultra supports ECC RAM or how many PCIe lanes it has.


Are we making any progress on $/op? These chips are just gigantic.

We seem to have been stuck at ~$50-$100/core for years.

(Obviously Ultra is just twice the cost/size of Max)


Sadly, the M1 Ultra chip is far more than double the price of the Max for only double the performance.


What? You can't buy the chips separately, but the base system price for a Mac Studio with an M1 Max is $1999, and the version with the M1 Ultra is $3999. You also get 2x the RAM and 2x the disk space when upgrading.


Keeping other specs the same, the Max -> Ultra upgrade is $2,200 which implies that the chip alone is... very expensive.


It's $1400 for the CPU upgrade (which forces a $400 32->64GB RAM upgrade). (And $200 to match 1TB SSD).

So that leaves $600 for the case, mobo, fans, and 512GB base SSD, which seems about right, depending on where you put Apple's fat margins.


Slight difference on the ports between the Max and Ultra, though still a hefty increase in price.


Single-thread performance is barely better than a M1 MacBook Air. It takes apps that are heavily parallelized or need the memory bandwidth to actually make use of the Ultra. Few creative pro apps are that optimized, unfortunately.


Of course it has the same single core performance, they use the same cores. We’re not gonna see single core move until the A15 based Mx comes out, presumably later this year.

Honestly says a lot more about the Air than it does the Studio.


That says how good the M1 air is more than anything.


The m1 air is still plenty zippy. Most people don't need these giant multithreaded chips all these companies keep churning out.


Unfortunately the M1 is limited in professional usage by the lack of a 32 GB option, and the fact that the laptops can only handle one external display. The M1 Pro is really a better option since it provides a lot of flexibility that the M1 doesn't have, without being obscene like the M1 Ultra.


I'm really hoping that the M2 air refresh allows 32GB. I bump the limits with 16GB a lot of the time. I could buy the 14", but I like the portability of the air.


The people who want M1 Ultra know who they are and have workloads that can parallelize or exercise the memory bandwidth.


Is there any M1 Ultra GPU benchmark out in the wild? If anyone knows, please share, I'm very curious.

I have a feeling the Ultra could cater to some niches really well.

For now we just have the typical Apple nebulous claims (they have a graph of "relative performance" vs. "highest-end discrete GPU"[1]; what does that even mean, LOL).

[1] https://www.apple.com/newsroom/2022/03/apple-unveils-m1-ultr...


I think when the new M1 Pro/Max based laptops came out it was deemed that much of the performance Apple was claiming was due to the media engine helping accelerate the hard stuff in FCPX (and Premiere, IIRC). So it’s really good compared to the 3080 for per watt performance in creative work, but if you aren’t transcoding and especially if you aren’t optimized for using Metal, it’s not as great (but still good, 1050 ish performance with no fan is magical).


If you read the very fine print on those graphs, they compared with an i9-12900K and an RTX 3090. "Relative performance" is extremely questionable though I agree. I guess I'll wait for the Anandtech review to get graphs with actual units.


I saw the RTX 3090 configuration, but that is in footnote no. 5, which is referenced from another paragraph, whereas the paragraph about the GPU is more towards the beginning of that page. I'm not clear if that same configuration was used as the comparison for all the performance claims on the page. If the GPU used for that graph is an RTX 3090, the result is impressive, especially considering power consumption. But yeah, "relative performance" means nothing. Luckily, I think we will only have to wait a few days for "real world" benchmarking from Anandtech etc.


If it's real, it does make sense: same single-core score and double the multi-core compared to the M1 Max.


Doesn't that make sense? As I understand it, the M1 Ultra is two M1 Maxes fused together... so wouldn't that be roughly what to expect?


I think so. They talked about some of the engineering around the fusing of the two but who knows how that would translate to Geekbench points, lol.


> PDF Rendering - 16699 - 906.2 Mpixels/sec

Well, that's me sold


This is one thing I really love about M1 macs actually. I spend a lot of my time with my face in service manuals in PDFs and the experience is spot on with Preview. Adobe Reader on windows is absolutely horrible to use in comparison.


Does anybody actually use that overbloated piece of crap? I would expect only the corporate sphere, where you can't really install anything yourself.

I use Foxit Reader, for example, on my desktop, which has a fraction of the performance (and cost) of these new chips, and it's a perfectly smooth experience.


Does that support document signatures, as that's a reason I've needed to use the bloatware in the past? That was a while back though. All online IME nowadays.


I still have PDFs stutter even today and it aggravates me.


Single core is where the hard stuff is. In my dreams, I'd love at least 1-2 cores with insanely high single-core perf.


Would you settle for 16 insane high perf cores instead?


Well, it wouldn't be single core performance if there are 16 of them, right?


Is that even possible?


Is there a standard mechanism to measure power consumption for applications on processors?


Not really. Exact power consumption will vary wildly based on cache state, the instructions executed, the state of the branch predictor, state of the pipeline, and so forth.

The best you could do is track power during the time an application is scheduled to run on the processor but even that is an approximation because the power consumption of a single core depends on the state of other cores as well. Then you get into arbitrary territory (do you count the CPU time spent inside the filesystem driver or network minifilter as application time? is all called kernel code, which may execute on another core, taken into account? what about data transfers between the CPU and GPU? what about the power spent on cooling the CPU while the code runs?) and you end up with a very arbitrary and difficult approximation.

I think simpler approximations are more than enough, but it's near impossible to track power usage for a single application. Windows has such an approximation based on various factors, not just the CPU, and that's not much more than a "low-medium-high" scale.


That makes sense, but I see a lot of places where they estimate power consumption in W or mW. How do they do it? Is there maybe a window of estimation possible in a constrained environment? No network or file I/O is, I guess, all the restriction we can put on a process without compromising functionality.

Alternatively, how do I measure the power consumption of a random process? Assuming I only want to get a number and nothing else.


I honestly don't know how you'd get such a number.

What I'd do is measure the total energy consumption for a while and divide that by the time the process had control over the CPU.

Perhaps the CPU and GPU can be measured separately.


The processor measures it and you can use RAPL or powermetrics to read the measurement.
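
On Linux the package-level counters are exposed through the powercap sysfs interface (on macOS, powermetrics prints similar numbers). A minimal sketch of reading whole-package power; the path below is the usual one for package 0, but it varies by machine and usually needs root:

    # Minimal sketch: average package power from the Linux RAPL powercap interface.
    # Assumes the first CPU package is "intel-rapl:0"; adjust for your machine.
    import time

    RAPL = "/sys/class/powercap/intel-rapl:0"

    def read_uj(name):
        with open(f"{RAPL}/{name}") as f:
            return int(f.read())

    def package_watts(interval=1.0):
        e0 = read_uj("energy_uj")              # cumulative energy in microjoules
        time.sleep(interval)
        e1 = read_uj("energy_uj")
        wrap = read_uj("max_energy_range_uj")  # counter wraps around at this value
        return ((e1 - e0) % wrap) / 1e6 / interval

    print(f"package power: {package_watts():.1f} W")

That still gives whole-package power, so attributing it to a single process is the approximation described above: divide by the share of CPU time the process actually got.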


Wow! Thanks. Reading through it and it makes a lot of sense. I had no idea such a thing existed till now.


So single-core performance is inferior to a $700 CPU (i9-12900K). For multi-core, the M1 Ultra will be beaten within 6 months, this summer, by consumer-grade AMD/Intel CPUs at a reasonable price.

It's impressive performance, but for desktops, thermals / perf per watt are less relevant.


Funny, one of the reasons I’m considering the new Mac Studio is specifically perf/watt, and I’ve been tethered-laptop for about a decade. I want a good fast computer that also stays cool without an absurd power draw. In that, the specs I have in mind are the same between the Studio and MBP (lowest CPU to get 64 GB RAM), and the Studio is about $1000 cheaper for that config.


The form factor is nice too. Desktop PCs really are comically large, even if you go out of your way to make one as small as possible.


NUCs are smaller than the Mac Studio. Desktop PCs are comically large only if you want that comically large amount of extension capability. Which you can't get with any Macs.


If I‘d need an always-on white noise generator that heats up my apartment in the summer, then a way higher TDP is something I‘d aim for, too.


You seem to be getting a cold response here but I feel much the same way. I'm tired of loud hot chunky computers. I want a computer where at idle not a single fan is spinning and hardly any power is causing coil whine. I want something where long term I can re-use the box as a neat little server.

All this I/O and a nice bit of speed in this tiny quiet power-efficient little box? Sign me up!


All this can be achieved with... well, any other computer if you build it properly, for a fraction of the cost.


Nope, not if you want to run something like 4x4k and you're extremely sensitive to noise and want to run in SILENCE which I did. My desk setup is like the people in Apple's press conference, I'm the target market for this computer.

The primary issue is that no matter what you do, the GPU you use is never going to truly idle and will constantly be running a bit hot. You can choose to use a professional card (cooler, quieter) or a consumer one (flexibility, resale, cost). I chose the latter. As for the build, I had 3 options:

1: Air-cooled - What I went with. High end fans, thick oversized heatsinks on every component, I went for a low airflow low-noise case to minimize high pitched noises (although this increases fan speed requirements and fan hum). I tried to get the fans to stop completely but I ran into too many issues with the card just not staying quite cool enough and the fans suddenly ramping and now let the GPU fans spin at about 900rpm while the case fans run around ~750.

2: Water-cooled - With very careful building, you can create a custom loop with large thick rads and slow quiet fans and a quiet pump, but this setup is hardly cheaper than the mac studio, or as quiet, or as cool, or as small.

3: Passive - Probably the best option was something like a Compulab Airtop3, which is probably the closest equivalent to a Mac Studio before the Mac Studio, aside from it being a bit worse in performance. It uses a 9900K and is a bit obsolete now, and was pretty pricey back in the day, but it's pretty cool engineering.

Ultimately though, regardless of the options, none of these offer everything I need in anywhere close to the power envelope of the Mac Studio, and more power usage = more heat, more size, more noise. The power efficiency of the M1 GPU is bonkers, being built on 5nm and all. I have plenty of PC builds under my belt, I spent hours of research, I did an okay job, but I will never do as good a job as a 3 trillion dollar company can. Why would I ever fuss with any of that ever again when I can just pay a little more to grab an M1 Studio off the shelf that's a better computer?


Don't even need to go Intel. If one only needs single core performance, the M1 Mini should be enough.


Single core score is not that much better than the existing M1 models, but multi core is off the charts as would be expected.


And the single core shouldn’t be much different anyway, it’s the same core with some more airflow.


Is it just me, or have all the main computer vendors plateaued with regard to single-core performance?


Everything M1 is using the same core, so it won’t move until they go from M1 to something else.

They have managed to increase it pretty consistently with every iOS A series chip though: https://browser.geekbench.com/ios-benchmarks

So hopefully the A16 will have at least a small improvement over the A15/M1 and that’ll show in the M2 range, if that’s what they end up doing next.


Almost all current CPUs, x86 or ARM seem to be stuck at 1.6-1.7k range in GeekBench.

Still waiting for something that breaks the 2k or 3k benchmark.


Apple still has a lot left if they wanted.

Intel clocks up to 5GHz to achieve numbers similar to Apple at 3GHz.

If Apple increased speeds to 5GHz, their theoretical scores would be closer to 2400.


Firestorm can't clock any higher and if they redesigned the core for higher frequency it would lower IPC, possibly leaving performance the same.


Intel Alder Lake delivered around a 15% improvement to single-core performance gen-on-gen so it's just you.


It's not the 40% per year of Moore's law...


Moore's law is about transistor density, not single-core performance. Even if a new generation of chips obeys the "law", there is no requirement that the designers dedicate the improvement to single-core performance. Alternatives include multi-core performance and miniaturisation.


I compared the numbers in this topic with those of the CPU I most respect for energy efficiency while retaining desktop-like power:

the single core ratio is 2.5x, but the multicore ratio is 15x.


Who is actually getting to take advantage of the parallelism here? I still buy for single core.


Most people? If a given application isn't good at multithreading/multicore, it doesn't stop the OS from scheduling other applications on other cores, freeing up cycles. I think most people have multiple processes running at a time on their machines. Even just looking at software devs, the plugins serving your code editor probably run in their own processes.


> Most people?

Absolutely not. Take a look at your CPU usage right now. Pretty much all background tasks can be done on a core or two. I only have 8 cores, and the only time they're all facing high load is when I'm compiling code. Basically the M1 ultra is only useful for professional graphics/video editors, and developers compiling large codebases or training ml models.


Well... yeah. It has a price of entry of $4000; who did you think it was aimed at?


Going from 2 to 8 cores made a massive difference to me (although admittedly I also almost doubled single core perf in that switch). Probably for the somewhat obvious reason that I usually have far more than 2 programs open at once, so having each program be able to run on a separate core is a huge win. With my previous machine, if I was doing something intensive (like say compiling Xcode), then the performance of other tasks would be noticably slower. On my new machine I can't even tell it's doing it.


In my workflow, everything I'm actually stuck waiting for is either parallelized (code compile, media encode) or I/O (typically ethernet).


At work I easily use all my cores compiling code.


Is SPEC or Geekbench more relevant for this type of device?


Those benchmarks are similar, with a pretty high correlation in overall scores. Looking at the individual subscores would be more useful for either benchmark.


Almost exactly 2x my MacBook Pro M1 Max that showed up TODAY after a 2.5 month wait.

2x increase in performance (I know I know it’s just a benchmark test) is unprecedented in successive Mac releases in my recollection here.


It's only for multi-core performance.

I would have instantly bought it if it were twice the single-core performance.


I just bought an M1 MBP. Will the M1 Ultra be available in MBPs in the (near) future?


Apple has a terrible history when it comes to cooling down their laptops. I don't think they'll be able to cool this chip without a massive design overhaul. They'd have to make something the size of those "gaming" laptops to get it cool enough and even then it would probably exceed the noise and thermal limit they've set for their brand.


The thermals suggest no. Twice the CPU heat is probably not in the current MacBook Pro design.


Besides the thermals of the MBP probably not being suited, and the form factor of the chip itself, the Mac Studio has a 370W power supply, which is also way too much for a laptop to even legally have.


Even if they wanted to work around the thermals, they’d just be inviting the same criticism they’ve gotten of laptops running hot and throttling


No.


It has an RTX 3090-level GPU; how is Nvidia that bad? At 200 watts less? That's insane... Why doesn't Apple just produce a discrete GPU with 200 watts more power and straight up destroy the GPU market?


I'd take the RTX 3090 comparisons with a huge grain of salt for the moment.

First of all, Apple's performance comparisons are against the mobile Nvidia cards, not the desktop ones. For example, the M1 Max comparisons with the "RTX 3080" were against the 100W mobile part, not the 320W desktop part. The mobile version is about 40% slower than the desktop one (something that I do find irritating about Nvidia's marketing). The desktop gaming cards are typically trading power consumption for performance. They're clocked well beyond the optimal efficiency point to squeeze everything possible out of them. Cutting the power target in half may only cost you 20% of your performance.

Another thing to note is that Nvidia pretty much holds all the high-efficiency binned dies for the enterprise cards. In terms of GPGPU, the data center cards are nearly twice the performance per watt of the consumer cards by going wider and dropping the clock rate.

Small edit:

The best benchmarks I could find suggested that the M1 Max was drawing ~105 watts with the GPU under a large synthetic load and doing about 10 TFLOP/s of FP32. Let's assume the M1 Ultra is exactly double this and can do 20 TFLOP/s at 200 watts. The Nvidia A100 GPUs are the same performance but at 250 watts. The RTX 3090 is upwards of 36 TFLOP/s but approaching 400W of power draw.
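
Putting just those quoted figures side by side (all estimates, not measurements):

    # Rough FP32 perf-per-watt from the figures quoted above (all estimates).
    m1_ultra = (20.0, 200)   # TFLOP/s, watts (double the M1 Max estimate)
    rtx_3090 = (36.0, 400)

    for name, (tflops, watts) in {"M1 Ultra": m1_ultra, "RTX 3090": rtx_3090}.items():
        print(f"{name}: {tflops / watts * 1000:.0f} GFLOP/s per watt")
    # M1 Ultra ~100 vs RTX 3090 ~90 GFLOP/s per watt at peak throughput

By that crude measure the efficiency edge is real but a lot smaller than the keynote charts suggest, which is roughly the point.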


By comparison, the fastest supercomputer in the year 2000 was ASCI White, which did 4.9 TFLOP/s. It cost $168M in 2020 dollars, took up 12,000 sqft, weighed 106 tons, and used 3 megawatts (+ another 3MW for cooling... = 6,000,000 watts). Granted, it had 6TB of memory and 160TB storage...but with 7,000 disk drives. (https://www.top500.org/resources/top-systems/asci-white-lawr...)

The M1 Max itself is the size of a postage stamp, 10 TFLOP/s, costs ~$200 to make, and uses ~100 watts (= a single incandescent lightbulb).

Completely bananas.


Aren’t the datacenter GPUs a totally different arch these days?


Not really. They just fuse parts of the chip off.


They would have to dedicate a software team to writing drivers so you could run games on it. They don't value gaming though.


And that's assuming their "GPU" is actually performant in those scenarios. All the examples they post are for GPGPU, but how does something like Unreal perform?


I think the sweet spot of Apple's SoC is that they now control the CPU, GPU, and memory, so they can optimize the price. If they made a discrete GPU, the margin would not be that high, and it would be less attractive.


Short answer: immediate mode architecture versus tile-based deferred rendering architecture. The latter is designed for power saving.

It's not an apples-to-apples comparison; to get the most out of each of those two, you'll have to write two different rendering paths in code.


They don't want to destroy the GPU market. They want to destroy the PC market. The gap is only going to widen now that they're on the new architecture. I'm sure AMD has more up their sleeve though. Who said Moore's law is dead?


Unless Apple is sunsetting or pausing the Mac Pro, WWDC must produce the kind of GPU power to upend Nvidia's dominance.


The GPU is nice, but software needs to support Metal and be ARM native before it will run at full speed.


Some of the fastest memory available for the entire system with zero-copy to the GPU.

A lot of what a discrete GPU needs to do is move bits backwards and forwards with the main CPU/RAM.


Hugged to death.


Seems fine.


[flagged]


Yeah, but it's doing that at 3.2 GHz versus 4.9+ GHz that the top-end Intel CPUs need to run at to get equivalent single-core performance.


And we've now come full circle back to the PowerPC days.


Except in the PPC days they didn’t even have a power advantage, which is why there never was a PowerBook G5.


They actually did, to some extent, right up to the G4. The G5 is where things really went off the rails.


Insane performance, while keeping energy consumption very low

They have 0 competition

Buying a laptop that is not M1 = you waste money and you buy tech junk that promotes global warming

Now they replicate the same thing on the desktop; it is just crazy...

We truly are in a new era



