Do you know how much your computer can do in a second? (2015) (computers-are-fast.github.io)
281 points by surprisetalk on June 22, 2023 | 142 comments



I like to test people’s intuition of computer power by asking them how many multiplications their phone or PC can make in the time it takes light to cross the room.

The way to estimate this: light takes about 3 ns to travel 1 m. If the room is 10 m wide, that's 30 ns. At typical clock frequencies, that's about 100 cycles. With, say, 8 cores, that's 800 core-cycles. In each cycle a core can retire about 8 multiplies using AVX vector instructions. The resulting estimate of 6,400 multiplications is actually very conservative, because you also have the GPU! Even low-end models can put out hundreds of thousands of multiplications in 30 ns. High-end models can do about a million.
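
A minimal sketch of that back-of-the-envelope arithmetic in Python, assuming a 3 GHz clock, 8 cores, and 8 multiplies retired per core per cycle (round numbers for illustration, not measurements):

    # rough estimate: multiplications while light crosses a 10 m room
    room_m = 10
    light_m_per_ns = 0.3                 # light covers ~0.3 m per nanosecond
    window_ns = room_m / light_m_per_ns  # ~33 ns
    clock_ghz = 3                        # cycles per nanosecond
    cores = 8
    mults_per_core_cycle = 8             # e.g. one 8-wide AVX multiply per cycle
    print(window_ns * clock_ghz * cores * mults_per_core_cycle)  # ~6,400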

When asked, most people estimate “1 to 10”. Even programmers and IT professionals do not have a mental model of how fast computers really are.


To be honest I think this is almost a trick question, because you're combining two very different things (speed of light and performance of computers), and people are not very good at combining these sorts of contexts intuitively (i.e. you need to do the calculations).

The other day I was watching The Twilight Zone, and one episode involves a crew "655 million miles" from Earth. I know what miles are, I know the rough layout and scale of the solar system, galaxy, and universe, but I had no idea how far away that actually is because it mixes these two units from very different contexts and intuitively it just doesn't make sense to me; is it still in the solar system? No idea. Converting it to AU showed it's about 7 AU, which put it in the appropriate context: a bit beyond Jupiter.


I don't even know the scale of the earth, or even the state I live in, in a way that makes any physical or intuitive sense to me. It's almost purely a mathematical understanding.

It's always fun to see the repeated scale analogies as you build your way to enormous sizes and distances, starting with something familiar, zooming to something 100 or 1000 times bigger, doing it again and again. It engages our physical understanding while counting zeroes 2 or 3 at a time, but I don't think it actually gives us any physical understanding. It really is just counting zeroes.

I just don't think there's any chance you or anyone else has any idea how far Jupiter is except as a calculation.


>or even the state I live in

I can wake up in my hometown, drive west for 12 hours, and still be in the same state. I know how big my state is in those terms. How many miles is that? How many football fields is that? How many times does light cross the room during that travel distance?


The traffic can be that bad where I am too.


Depends on your average speed.


> I just don't think there's any chance you or anyone else has any idea how far Jupiter is except as a calculation.

I agree, it's very hard – almost impossible – to really "understand" things at scales that are so far outside of our human experience and intuition. However, "near Jupiter" does put things into relative proportion, whereas "X million miles" doesn't.


> because you're combining two very different things (speed of light and performance of computers), and people are not very good at combining these sort of contexts

I'm not saying I like GP's puzzle, but I could kinda answer that question (my estimate was a bit higher, assuming GP's answer is exactly correct) only because he combined these contexts. Ask me how many multiplications a computer can do in 30 ns and I'll say, well, I dunno, 10? Well, ok, I'm kinda used to log-times in μs, so maybe I'll say 100. I have no idea how long 30 ns is, but it sure seems like a short time.

Also, I don't remember what the speed of light is. I can probably figure it out if I try, but I can't just say it off the top of my head.

What I can say instantly off the top of my head, though, is that individual computations are performed at the speed of light, literally, and that a single transistor in my PC is about a billion times smaller than my room. So, assuming a single multiplication takes about a hundred logic-gate operations (again, I can't really tell without thinking, but seems legit), I'll say: well, about 10 million, I guess? And it doesn't matter that I don't know how long it takes light to travel across my room.


[flagged]


Hard disagree - to me it made a lot of sense.


I agree.

I don’t think understanding the speed of light is something that can be considered intuitive by most people.

There is no generally observable way to understand the speed of light; to most people it's just instant.

So without physics training, people just have no idea.

I always like the comparison of more computing power than it took to put a man on the moon!

Which used to be along the lines of a phone or car having more computing power than NASA had in 1969.

But these days it’s probably a toothbrush ;-)

So it’s not as useful anymore.

I also have no idea what level of computation converts to something practical. A million calculations a second means nothing on its own; what can that do? Draw a picture on the screen? Print a PDF?

So it has to convert to practical understanding.

Still, it’s a nerdy and fun comparison but one that isn’t understandable or relatable to the layperson.


The speed of light is a physics topic, but I’d argue it’s relevant to most people working in IT: it’s the maximum speed information can travel at.

Anything involving networks is very much affected by it. The internet runs on fiber, where the speed of light sets a hard floor on latency! Even electricity in copper obeys the same limit.



Similarly, anything involving staying alive is affected by the distance from earth to the sun. Doesn't mean that it's relevant for anyone interested in staying alive to know it.


However, unless you work in high-frequency trading or are trying to lay out components for a GPU, you can generally ignore speed-of-light delays below a couple hundred miles. You might develop a feel for the rough light-speed travel time from New York to San Francisco, or Frankfurt to Sydney, but that doesn't translate into a good understanding of how long light takes to get from one end of the room to the other.


I think that's mostly theoretical. There's a lot of hardware on long connections that adds latency. The signal may also take fairly large detours (in some cases, it may even hop via a geosynchronous satellite, which adds about 70,000 km to a trip and another 70,000 km for the return signal; for most users that is a thing of the past, though).

For example, when I ping microsoft.com, I get something around 100ms (Aside: what does that time mean? Round-trip or one way? I read a few man pages for ping, but couldn’t find out)

There's also the weirdness that, when pinging apple.com or google.com, I get about 6 ms. They must be playing tricks with DNS.
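
For a rough sense of how much of that 100 ms is physics versus equipment, here's a sketch of the theoretical floor for a fibre round trip, assuming light in glass travels at roughly 2/3 c and a hypothetical 9,000 km one-way path (both numbers are assumptions for illustration):

    # lower bound on round-trip time over fibre, ignoring routers and detours
    c_km_per_ms = 300.0                     # speed of light in vacuum, km per millisecond
    fibre_km_per_ms = c_km_per_ms * 2 / 3   # ~200 km/ms in glass
    one_way_km = 9000                       # hypothetical distance, e.g. across an ocean
    rtt_ms = 2 * one_way_km / fibre_km_per_ms
    print(f"{rtt_ms:.0f} ms floor")         # ~90 ms before any equipment or detours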


It mainly acts as a lower bound. If your pings return 5 ns, you know that answer cannot possibly be correct. But I agree that's a rather contrived use case.


I think it captures something magical - that computers can do so much while a photon is in transit. This is something that to us is the definition of instantaneous (intuitively - not actually).


How is it very different? The speed of light is based on a length and so is a second. Clocks per second and light speed are directly related to distance.


Agree but to be fair I wouldn't have known the answer either if they said 30ns.

Then again, I do know as a dev that unless it's happening every frame (ie for games), as long as you're not introducing non-scalable complexity, the computation time of a bit of reasoning is often negligible.


The real question is, "how many Twilight Zone episodes away from ubiquitous quantum computing are we?" (is joke)

Units definitely matter, but anything outside of our day-to-day experience is difficult to communicate widely. I think most people don't know much more than that Jupiter is actually really far away and really, really big compared to the Earth. 7 AU means nothing to my brain, just like 655 million miles. It's a computed unit based on the mean distance from the center of the Earth to the center of the Sun; I also have no idea how far we are from the Sun, because my brain just experiences a hot circle in the sky. The only reason these things seem to make sense is that we have seen really vague maps of the Solar System.

Other widely accepted but unhelpful measures include using state sizes to describe other regions. People readily accept it despite never having been to said state or potentially any other state ... or even across their own state.

I think these are accepted because of a false confidence in understanding (think Dunning-Kruger) based on the fact that we have some information.

Personal anecdote: A friend flew from the East coast to Arizona and asked if we could just drive to the other side of the state to see the Grand Canyon in the evening. I had to explain that it's 6 hours one-way despite being in the same state. Had I instead said "Arizona is the size of 22 Connecticuts," the idea of how long it would take to get there would still not make any sense.

PS: I said "vague maps of the Solar System" because generally nothing is drawn to a consistent scale. For pragmatic reasons, either the planetary distances are to scale or the planet sizes are to scale, rarely both. Sometimes neither is scaled properly. Some animations attempt to show things correctly, but those are still difficult to comprehend because of the non-linear velocities required to keep the viewer's attention.


655 million miles means literally nothing to me beyond "really really far." Probably further than the moon, I think? Definitely closer than the nearest non-sun star.

7 AU has at least some meaning. Further than the sun. Probably closer than Jupiter if I had to guess? Definitely closer than Pluto. Would take on the order of 100 minutes for light to travel that distance. We could send a space vehicle there in a single-digit number of years.

My brain can't comprehend either number in the context of my morning commute, but at least it can comprehend 7 AU in the context that that number was given. I can't tell you how long it would take to walk that distance, but that doesn't matter since nobody will ever walk that distance. What I am able to do is make inferences about the implications of that distance (assuming no fiction/sci-fi stuff): we could have a conversation with a few back and forth messages over the course of the day, and they aren't getting home any time soon but they probably won't die of old age.

That's the point being made about combining contexts.


What units do you think you know the scale of the solar system in? The speed of light is approximately 300 million meters per second, so call it 300 thousand kilometres per second, or very roughly 200 thousand miles per second.

Also, if I were asking that sort of question in an interview I would be fine with a candidate getting a conversion wrong as long as they explain their thought process, and I'd help them correct the error early. I think this sort of question isn't meant to be a trick; it's about whether you can connect different variables which may not appear connected at first glance.


> it's about

It's about absolutely nothing at all. You can project whatever you want on pointless questions like that. Whatever you think you're measuring in a candidate by asking them, well, you're not, lol.


> if I were asking that sort of question in an interview

The impression I had was that the question was asked in a social context: over coffee, beers, or something like that.

> What units do you think you know the scale of the solar system in?

AU, as I mentioned: https://en.wikipedia.org/wiki/Astronomical_unit


> What units do you think you know the scale of the solar system in?

AU


* Terms and conditions apply

What you are saying is technically true but with HUGE caveats. At this point, the issue isn't one of clock cycles but rather memory bandwidth and latency. Even assuming the best case, a hit in L1 cache, you are looking at +1ns to move the data from memory into the registers. But now talk about L2, L3, or worse system memory? Oof. And then you mention GPUs, but that problem is compounded 10x due to the bandwidth constraints of PCIe.

I mean, sure, if your CPU is doing nothing other than multiplying the same value by 2 over and over again, it can do that wickedly fast. However, once you start talking about large swaths of data to operate against, gigs even, those numbers start to take a precipitous drop.

Heck, one of the benefits of AVX isn't so much that you can do a whole vector of multiplications at once, but rather that you can tell the CPU to grab a whole 64-byte vector's worth of memory to work against at once.

This is why, when people talk about next-gen math machines, it's not the FLOPS but the memory fabric that gets all the attention.


Typical PC system memory bandwidths are on the order of 64GB/s with DDR5.

That’s 64 bytes per nanosecond, or 16 floats per ns, and hence 480 in the time needed for light to go 10 meters.
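
Spelled out, using the same assumptions as above (64 GB/s sustained, 4-byte floats, ~30 ns for light to cross 10 meters):

    # floats streamed from DDR5 while light crosses a 10 m room
    bytes_per_ns = 64        # 64 GB/s is conveniently ~64 bytes per nanosecond
    float_bytes = 4
    window_ns = 30
    print(bytes_per_ns / float_bytes * window_ns)  # ~480 floats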

That’s the “base” performance. Caches and registers would improve on this significantly in realistic scenarios.

Again, this is why I like this thought experiment! It’s not a trick question. It’s testing intuitions.

People just think it’s a trick because their intuitions are so ridiculously out of whack with reality that they’re looking for the sleight of hand instead of reexamining their own notions.


> That’s 64 bytes per nanosecond, or 16 floats per ns, and hence 480 in the time needed for light to go 10 meters.

Again, terms and conditions.

The slow part of memory access isn't how much memory can be piped over the wire, but rather how fast a request for the next chunk of memory can be issued. So while it's possible to get those 480 multiplications, that relies on the cache being warm and the memory access pattern being predictable to the CPU. If either of those constraints is violated, the number of multiplications drops severely.

In other words, the more frequently you have to tell the memory controller to get more memory, the less work you can do.

To put it in perspective, it costs around 100 cycles to load something from main memory. Imagine something like an n-body simulation that may be accessing memory all over the heap. In the worst case, you are looking at 4 multiplications (if multiplying by a scalar), and perhaps even just 2 multiplications if you need 2 memory pulls per multiplication.

Caches don't just help, they are necessary to come anywhere near achieving that 480 number.

Just a little history lesson.

Once upon a time, the memory controller didn't exist on the CPU; it was on a separate chip (the north bridge). Whenever the CPU wanted to load memory, it'd send a request to the north bridge, and the north bridge would send that request on to main memory.

One of the huge improvements to CPU performance for both Intel and AMD was integrating the memory controller onto the CPU. Why? Because it significantly shortened the distance that a request for memory, and the memory itself, had to travel. Electrical signals travel at about 2/3 the speed of light, and if your memory is 0.1 meters from the CPU, you can see how lots of requests for new chunks of memory can really slow everything down.

This is fundamentally why the big GPGPU processors Nvidia sells to datacenters come chock full of memory: they want to limit how often, and for how long, the GPGPU has to reach back into main memory to process stuff. It's far quicker to load up the 20GB of RAM on the card in one go than to constantly pull that data over the PCIe link.
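
To put rough numbers on that, here's a sketch assuming a 3 GHz clock and ~100 cycles per load that misses all the way to DRAM (real latencies vary a lot):

    # how many cache-missing loads fit in the ~30 ns light-crossing window
    clock_ghz = 3
    window_ns = 30
    cycles_per_dram_load = 100
    cycles_in_window = window_ns * clock_ghz        # ~90 cycles
    print(cycles_in_window / cycles_per_dram_load)  # <1 uncached load per window

In other words, a single pointer chase that misses every cache eats more than the whole window by itself.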


To clarify my scenario: I'm looking for the rate, not the total from a standing start.

To use an analogy: I'm not asking how far a car can go on the highway if accelerating from zero, but how far it can move if moving at highway speeds.

Nonetheless, the latency to main memory is also a great example of how bad programmers' intuitions are! A processor can perform thousands of operations in the time it spends waiting for just one memory access.

RAM is the new disk, and disk is the new tape.

PS: There was an article on here just a couple of days ago about how "handles are better than references". I didn't see many comments pointing out that this doubles the number of memory requests required when chasing pointers, halving the overall performance!


> The clarify my scenario: I'm looking for the rate, not the total from a standing start.

> To use an analogy: I'm not asking how far a car can go on the highway if accelerating from zero, but how far it can move if moving at highway speeds.

The terms and conditions keep applying. The everyday real world conditions implied by "highway speed" include regular cache misses and branches.

It's not about "start" or "acceleration". This massively impacts the steady state speed.

If you want a car analogy, tight vector packing compares to normal code about as well as looking at the speed and width of the flame front across a cylinder and using that to measure fuel consumption.


Not sure what you mean by testing intuitions. Which intuitions?

It sounds more like testing whether you know the speed of DDR5 without looking it up, and doing a bunch of basic unit conversions.

And your analysis doesn't even cover latency (which GP mentioned), which apparently according to my first google result is ~90ns. So unless your data is already in a cache it might not even be fetched within the requisite time.

Your question sounds like a trick question because there are so many nuances and gotchas, and if you're asked this question without a concrete context and the problem one's trying to solve, it's basically a check of whether the interviewee's mental model of computation matches the interviewer's. And as this thread shows, even among (presumably) competent people, the factors to consider can differ quite a bit.


Precisely my point.

What I've seen, intuition-wise, is that people locate the slowness in computation along the lines of "how many FLOPS can this do?" or "how much bandwidth does this have?". But that's a tiny fraction of the story. Very little of computation is the boring math operations.

It’s true, CPUs are absurdly fast at multiplying 2 numbers together. That’s not what trips them up. What trips them up is getting those two numbers and determining when those 2 numbers should be multiplied. Those two problems are so hard that CPUs are often guessing, running the calculation, and then backtracking if they guessed wrong.

And that guessing has led to side-channel attacks (Spectre, Meltdown) which can leak data to malicious software.

This is why such a question doesn’t have either an intuitive or non-intuitive answer. It’s very strictly “it depends”. And if you boiled it all the way down to “no, what’s the absolute max it can do” you are now talking about software problems that don’t exist/aren’t valuable.

Consider: which executes an add instruction faster, a 3.4 GHz CPU from 2004 or a 3.4 GHz CPU from 2023? The answer is that they are the same; both do the add in 1 cycle. Did you learn anything about the performance of a 2004 CPU vs a 2023 CPU? Heck no. That's because how fast the CPU can execute instructions has almost nothing to do with how fast it can process data. (Yes, I'm aware that even the above statement is somewhat incorrect, as x86 CPUs do weird things with adds where they'll run multiple additions per cycle with some clever reordering… not even that example can be simple. Assume they don't.)

The trivia question is a boring question. The actual interesting discussion is "so WHY aren't we doing multiplications that fast?". That is what can actually lead to deeper understanding of what slows things down in computing. There's a reason the 10 GHz CPU never surfaced: it's not because we couldn't make it, it's because we couldn't keep it busy.


> Even programmers and IT professionals do not have a mental model of how fast computers really are.

Or of how slow light is!


I'm an Australian. A lot of the internet I interact with is in America, and a lot of the rest is in Europe. These are all hundreds of milliseconds away by fibre-optic cable, and that latency is ultimately set by the speed of light.

I’ve been to America a few times. The internet is so fast there: not because of bandwidth differences, but because of latency, and especially sites carelessly loading chains of resources, which amplify the effects.

As soon as you deal in intercontinental stuff, you realise just how slow light is.

(As for studies in the effects of page load abandonment, those are never in the slightest bit relatable—at the time those studies suggest half the people are giving up, most sites still haven’t rendered anything at all, here.)


The speed of light is the lower bound for network latency, but actual internet latency is also significantly affected by delays added by network equipment (routers/switches). Typically, the bigger the distance, the more routers there are in between, so this delay is also roughly proportional to the distance. And sometimes channels are nearly saturated, so there is significant queuing delay for packets waiting to be sent. RTT AU <-> US will be higher than RTT for most pairs inside the US, but RTT for the same distance within the US can vary by more than 3x depending on the exact network infrastructure along the way.


Using fibre-optic cable, where light zig-zags through the glass and travels at roughly two-thirds of c, instead of line-of-sight lasers increases latency too.


This site[0] helped me improve my mental model. You can scroll through the solar system at scale (the moon being one pixel). It takes a while!!

Then you notice a small button in the bottom-right corner that lets you auto-scroll at the speed of light. Wow, now that's slow!

[0]https://joshworth.com/dev/pixelspace/pixelspace_solarsystem....



Makes you realize that space travel isn't going to happen with current physics.


The faster you go, the shorter the distance you have to travel, due to length contraction. So travelling really long distances is still possible, for example under constant acceleration [1]. It's just that everyone you left behind will have been dead for billions of years by the time you arrive.

[1] https://en.wikipedia.org/wiki/Space_travel_under_constant_ac...


> for example, by constant acceleration

Yes, and this is why it won't happen: with our current understanding of physics, the practical limitations that implies, and the resources we have. ;)


Five minutes after you take off, you hit a pebble that obliterates your entire ship, because you're sweeping through billions of cubic meters of volume every second (in your time).


To illustrate the problem of accelerating any relevant amount of mass to relativistic speeds, observe how the kinetic energy increases towards infinity as the object approaches the speed of light:

https://files.mtstatic.com/site_4539/12414/0/webview?Expires...

This means it would take infinite energy for any mass to reach c.

And c itself is quite slow for space travel.

Just for the fun of it, let's imagine humanity has assembled, in space, an aircraft-carrier sized vessel, fully equipped to function and nurture the little humans living inside of it.

Weight: 100 ktons.

Power is infinite ok, because badly rewarded nerds discovered new physics. Good for them they are now immortals of human history.

With the relativistic kinetic energy equation, one can estimate the kinetic energy this vessel would have while traveling at, say, 1% of the speed of light.

Energy: 4.5*10^20 J.

This is about two thirds of the total energy Earth receives from the sun in one hour. Or close enough to the energy the world consumed in 2017.

Now we have humans inside a vessel hurtling through space at 0.01 c, in addition to the pre-existing humans on a planet swirling through space. But there is a problem! It would take 424 years to reach the nearest star system. So we need to go faster and maybe break things. Hopefully not the hull, though.

F*** it, let's go 0.5 c and reach the neighboring star system in about 9 years - long enough to write a book.

Energy: 1.4*10^24 J.

That's 3x the energy released by the Chicxulub meteor impact. Or 30+ times the world's total fossil fuel reserves as of 2003.

Which raises the question: what fuel is being used?

Doesn't matter, ok, because new physics: we are literally transforming mass into energy, no constraints, 100% efficiency, lol.

By E=mc^2, that fuel would weigh - at least - 14.9 ktons.

There is margin for error, since the vessel would be shedding mass and getting lighter. That would allow engineering to run the overall process at 80% efficiency, which is a very realistic metric, and maybe miss a turn or two on the way to the neighboring star.

Returns not included.
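
For anyone who wants to check the arithmetic above, here's a small sketch using the relativistic kinetic-energy formula KE = (gamma - 1) * m * c^2, with the 100 kton (1e8 kg) vessel assumed above:

    import math

    c = 3e8   # m/s
    m = 1e8   # kg, roughly 100 ktons

    def kinetic_energy(v_frac):
        gamma = 1 / math.sqrt(1 - v_frac ** 2)
        return (gamma - 1) * m * c ** 2

    print(f"{kinetic_energy(0.01):.1e} J")  # ~4.5e20 J at 0.01 c
    print(f"{kinetic_energy(0.5):.1e} J")   # ~1.4e24 J at 0.5 c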


> It's just that everyone you left behind will have been dead for billions of years by the time you arrive.

sad piano plinks

https://www.youtube.com/watch?v=kEVZ3NIoEAE


We’ve already been to space, so it has happened with current physics.

I suppose the question is where do you want to go and how much luggage do you want to take with you?


I recently encountered this from both sides in my work. I was trying to get a particular single-threaded CPU program running in less than 1 microsecond. Light goes about 300m in that time. I remember reading around the same time that the Eiffel Tower is about 300m tall. So it stuck in my head that if someone switched a flashlight on at the bottom of the Eiffel Tower at the same time as my program starts, the program should complete before someone at the top of the tower sees the light.


Exactly.

Speed of light is not exactly a relatable point of reference.


It ought to be intuitive for everyone working in software development!

The performance of everything is limited in one way or another by ‘c’.

Not just WAN links, but the data centre Ethernet as well. The distance to the disks matters. The physical size of the motherboard. The placement of caches, etc…

When people say things like “putting the compute near the data” they’re implicitly talking about overcoming the limits imposed by the speed of light.

When you hear about an N+1 performance issue in some ORM, that’s bad because of the speed of light.

When you test your LAN with “ping”, you’re lying to yourself because it won’t show measurements below 1 ms, which is an eternity.

I just told you how much compute can occur in just 10 nanoseconds.

Go ping something. Look at the “1 ms” in the output. Go back to my post and work out what can occur in 1,000,000 nanoseconds. Go look at the 1 ms again.

Repeat until you have an epiphany about your zone redundant Kubernetes-hosted cloud native microservices architecture.


FYI, the ping utility on Linux shows 3 significant figures. On my LAN it's reporting 0.156 ms avg between two hosts.


150 microseconds is typical, but for reference, Azure gets about 85µs between VMs in proximity placement groups, and I've heard of single-digit microsecond latencies with high-end RDMA NICs.

Your ping is 156,000 nanoseconds. You just saw that you can in principle do about 6,400 computations per 30ns, so... that's about 33 million arithmetic calculations per round-trip, within the data centre.

I hope this makes you see every unnecessary network hop in a different light.

PS: Establishing a TCP connection costs a round trip before any data flows, and a TLS handshake adds another one or two round trips on top. A database connection over TLS needs a few more. And then you have load balancers, firewalls, proxies, envoy, ingress, dapr, and, and, and...
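
The arithmetic, reusing the ~6,400-multiplications-per-30 ns figure from upthread (an upper bound on paper, not a measurement):

    # arithmetic capacity "idled" during one in-datacentre round trip
    rtt_ns = 156_000
    mults_per_30ns = 6_400
    print(int(rtt_ns / 30 * mults_per_30ns))  # ~33 million multiplications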


Speed of light between datacenters is important.

Datacenter Ethernet has too much overhead to make a difference in almost all cases. Disks have an even higher overhead:distance ratio. The size of a motherboard only matters for signal integrity, not that half a nanosecond extra. Cache location inside a chip can matter, but even then size is a significantly bigger factor than location.


Except we know nothing is faster than light, so that's a reference people know


Unless you don’t measure it, then it’s as fast as it needs to be!

Oh group velocity.


Sounds like most people estimate surprisingly well. Fermi-wise, the obscure term in your numerator is 10^10 and the obscure term in your denominator is 10^8. The more obvious term (room size) is 10^1. So the answer is 10^3, and people are guessing 10^0 or 10^1. So some people are only off by a couple of orders of magnitude, which is explained by a rather small estimation error in either or both obscure terms.


1-10 is pretty accurate for a single CPU with random memory access. Sure, the GPU can physically perform many more in parallel, but then you get into the exact size of the GPU and how fast you can copy memory to/from it, which isn't the point of the question.


You should bring up Admiral Grace Hopper’s description of this on the Letterman show: https://m.youtube.com/watch?v=oE2uls6iIEU


What is the relative velocity of the computer? Is it moving at 99.999% of the speed of light relative to the observer? In what reference frame is the observation being done?


Maybe something like "starting from 2, how many sequential prime numbers can your computer generate mathematically, in the time it takes light to cross the diameter of the Earth" would be a more interesting measure?


Anyone else having a weird deja-vu moment while reading this?


Isn't this kinda like a convoluted version of a FLOP?


And yet: Electron apps.


It is amazing how much of that incredible performance we can eat up and then end up with something slow anyway

And also the fact that we eat up just enough of that performance for it to be slow sometimes but usually no more, because we can’t


> Do you know how much your computer can do in a second?

Make about 25% progress toward the start menu popping up after the click on the button.


I fixed that on my computer by turning off start menu web search, IIRC something like this: https://pureinfotech.com/disable-search-web-results-windows-...


I really dislike the `fill_array.c` vs `fill_array_out_of_order.c` example. While it's showing the effects of spatial locality, it's misattributing the performance delta to the CPU L1/L2 cache.

The problem is that it's filling an array in freshly allocated memory. While you might expect `malloc(NUMBER)` to give your process a crap ton of space in your RAM, that's far from the truth. First of all, for an allocation that large glibc will just translate it into an `mmap`, and even if it didn't, the Linux kernel still wouldn't allocate the whole buffer, due to "optimistic memory allocation." Instead, you receive ownership of that chunk of virtual memory, but nothing is physically allocated at first. Only when you dirty each page does the kernel actually allocate the backing physical memory. And even then, the kernel has various heuristics to preemptively page in memory that it thinks you're going to use soon.

I'm sure the author was aware that the example was more nuanced than just "muh CPU cache", but reducing spatial locality to just "muh CPU cache" for the article's sake does a disservice to the reader.


The doubling of j as a means of striding across the array also gives me some concern. While it is cache-related, it is cache-related for sneaky reasons. j is going to end up, in binary, with a ton of 0s at the end after being doubled over and over. Six iterations after each reset, the position within the cache line is guaranteed to be 0. A standard 32K L1 D$ would be 512 64-byte lines. Assuming 8-way associativity (decently common as far as I know), those 512 lines are organized into 64 set indexes, each with 8 lines. So after the next 6 post-reset iterations you are guaranteed to only be hitting the 0th set index in the cache, effectively reducing the L1 cache size to 8 lines.

(edit: Not the 0th index, but the same index as the base of the array.)


Do the caches really just use the last few bits and not any kind of small hash function?


I mean you'd get the same number of page faults with both examples, so I think the discrepancy is explainable by caches/CPU prefetching.


Yes. I know it can render 100 frames of a high quality 3d scene along with related networking, audio, logic, physics, etc. operations.

I don't need a website to tell me my computer is fast. What I need is a website that spoonfeeds "how to make a webapp feel like Q3A" to the average developer and somehow keeps them on objective.

You really should be able to constantly shame yourself into focus on this. "Is rendering this report table more complicated than a scene from Overwatch?" You will likely answer "no", hang your head in shame for a moment, and then admit to yourself you need to throw away your stack of 20+ 3rd party js libs and break out the MDN bible.


> "Is rendering this report table more complicated than a scene from Overwatch?" You will likely answer "no", hang your head in shame for a moment

Hell, even just asking if it is more complicated than DOOM should elicit an even more appropriate level of shame. DOOM ran on a 386 with 4MB of RAM and 12MB of disk.


Discussed at the time:

Do you know how much your computer can do in a second? - https://news.ycombinator.com/item?id=10445927 - Oct 2015 (174 comments)


It's interesting to read the comment I made there almost 8 years ago and think "computers certainly don't feel like they've gotten any faster since then --- if anything, they've become even slower." Software has certainly gotten slower faster than hardware has gotten faster.


What surprised me the most is how fast grepping is. It can search more bytes from memory than iterations in a simple for loop?


It provided the easiest possible case to grep: a simple literal. So all grep needs to do is substring search. And that in turn is accelerated by the use of memchr. And that in turn is written in assembly using AVX2 vector instructions, courtesy of GNU libc.
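
To get a feel for how fast a plain literal scan can be, even from a scripting language, here's a rough sketch; `bytes.find` drops into optimized C, so this mostly measures the underlying substring search rather than the interpreter:

    import time

    haystack = (b"x" * 1023 + b"\n") * 100_000  # ~100 MB with no match anywhere
    needle = b"needle"

    start = time.perf_counter()
    pos = haystack.find(needle)                 # -1: scans the entire buffer
    elapsed = time.perf_counter() - start
    print(pos, f"{len(haystack) / elapsed / 1e9:.1f} GB/s")

It won't match a hand-tuned memchr/AVX2 loop, but it's usually well into the gigabytes per second on a haystack like this.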


(For those unaware, burntsushi is the author of ripgrep)


source?

(edit: was supposed to be a pun)



The Loki logging software takes advantage of how fast grepping is. Instead of having a big index that lets you locate the exact line you are looking for, it has a much smaller index of tags.

The tags (something like env=prod, app=auth) will narrow things down to, say, 1% of the data in the time period. The software then just greps through a few gigabytes to find the exact lines you are after.


Probably a pretty bad test of the _actual_ speed of grep. A more realistic test would have printable characters and frequent newlines. I wouldn't be surprised if all those bytes were just taking a fast-path shortcut somewhere.


Depending on many factors (like details of the patterns used and the input), some regex engines (like Hyperscan) can match tens of gigabytes per second per core. Shockingly fast!


Grep is fast. Like obviously in this case you're 'just' measuring how fast you can read from a pipe, but there are plenty of ways grep could have been implemented that would have been slower. Generally, I think grep will convert queries into a form that can be searched for reasonably efficiently (e.g. KMP for longer strings (bit of a guess – not sure how good it is on modern hardware), and obviously no backtracking for regular expressions).


I don't think KMP has been used in any practical substring implementation in ages. At least I'm not aware of one. I believe GNU grep uses Boyer-Moore, but that's not really the key here. The key is using memchr in BM's skip loop.


grep is the beginning, not the end. it’s a great performance baseline to meet, and then beat[1]. computers are insanely fast!

the startups using grep on aws are undercutting those doing slower things on aws.

i wonder why aws architects never talk about grep.

1. https://github.com/nathants/bsv


Especially ripgrep! All hail ripgrep!

I will never go back to not using it.


This is awesome. It was fun to see where my intuition was way off. Had no idea indexing/not-indexing a column would make a 25,000x difference (makes sense though, as log2(N) on a 10-million-row table is about 23, which is a lot less than a linear scan).
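
The rough arithmetic behind that (a sketch that treats the index lookup as a binary search and ignores B-tree branching, caching, and other constant factors):

    import math

    rows = 10_000_000
    index_steps = math.log2(rows)           # ~23 comparisons
    print(index_steps, rows / index_steps)  # a full scan touches ~430,000x more rows

The measured 25,000x is smaller than that ratio, presumably because each indexed query also pays fixed overhead beyond the row comparisons.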

Was also very surprised to see how slow Bcrypt is.

edit: Yes, bcrypt is designed to be slow. I just had no idea how slow. I assumed it was 100-1000 times faster than it is/can be.


> Was also very surprised to see how slow Bcrypt is.

Though unlike all the other examples here, this one is actually intentional. If/when computers become considerably faster, the cost factor of typical bcrypt implementations will be raised so that it stays slow, to keep it difficult to throw brute-force attacks at it.


For sure, that's an important piece. But when I hear "slow hashing" I assumed it meant 1k-10k times per second, not <10 times per second. That's practically human speed.


“Practically human speed” is exactly what they’re aiming for. It’s meant to be used in human-speed applications, and to be as slow as possible in those use-cases without causing perceptible lag.


The bcrypt example is the only one that's off the mark. "How fast is this function that we've explicitly designed to be arbitrarily slow?" Well, it depends, how slowly did you configure it to run?

Basically the important argument to bcrypt is literally the amount of time you want it to take to run on your hardware (ie the number of rounds).


This one threw me off as well. I didn't see any configuration so I just assumed that it was only 1 iteration by default. After being off by several orders of magnitude, I looked it up and apparently it defaults to 4096 iterations.
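
If you want to see the knob directly, here's a sketch using the `bcrypt` package (assuming it's installed); the cost parameter is the log2 of the iteration count, so the common default of 12 means 2^12 = 4096 rounds:

    import time
    import bcrypt

    password = b"correct horse battery staple"
    for cost in (4, 8, 12):
        start = time.perf_counter()
        bcrypt.hashpw(password, bcrypt.gensalt(rounds=cost))
        elapsed = time.perf_counter() - start
        print(f"cost {cost} ({2 ** cost} rounds): {elapsed:.3f}s")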


As a game engine builder, you come to appreciate that your budget to draw millions of polygons, calculate physics on hundreds of objects, render the UX, collect network traffic, send network traffic, compute pathing, implement game logic, and do everything else that needs to be done is 16 milliseconds max.

So I would guess in C++ you could make 240 million calculations in a second.


That is a really bad guess...


What's bad about it?


Did you say something ridiculous then reply to yourself three minutes later?


Indeed, once I realized the error of my train of thought!


You can edit comments. An explanatory P.S. is much clearer than a reply to yourself.


A large modern consumer GPU is several to a few dozen teraflops.


Load half a webpage sadly...

The only thing I learned is that apparently the default Python JSON lib sucks speed-wise.


50MB/s was my guess for a JSON parser written in a direct way in a moderately fast language that doesn't read into some optimised representation (e.g. I assume the Python one constructs Python data structures). The JSON file is 64k (which may have been what threw you off), so that comes to ~750 per second. But my guess of 50MB/s could be way off (I might be imagining a slower implementation, but then the computer for the tests is also a bit slow, I think).
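
That guess as arithmetic (both the 50 MB/s throughput and the 64 KB document size are the assumptions from above):

    # parses per second at an assumed parser throughput
    throughput_mb_s = 50
    doc_kb = 64
    print(throughput_mb_s * 1000 / doc_kb)  # roughly 750-800 parses per second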


For comparison, using Rust's serde_json library on my desktop (Intel 11700K, with the computer doing other things), benchmarked with Criterion:

    Benchmarking json_read_struct: Collecting 100 samples in estimated 5.2012 s (81k iterations)
    json_read_struct        time:   [64.126 µs 64.376 µs 64.660 µs]
                            change: [+0.2646% +0.7525% +1.2538%] (p = 0.00 < 0.05)
                            Change within noise threshold.
    Found 7 outliers among 100 measurements (7.00%)
      6 (6.00%) high mild
      1 (1.00%) high severe
My benchmark code[0] takes advantage of knowing the shape of the data, and also knowing that I can avoid allocations for the strings so we're not measuring the allocator performance. With 64 microseconds per iteration, that comes out to about 15,600 parses per second.

[0] https://gist.github.com/Measter/acbae474ba8e1451946630da2a2c...


I don’t really understand what this is trying to prove:

- you don’t seem to specify the size of the input. This is the most important omission

- you are constructing an optimised representation (in this case, a struct with fields in the right places) instead of a generic 'dumb' representation that is more like a tree of Python dicts

- rust is not a ‘moderately fast language’ imo (though this is not a very important point. It’s more about how optimised the parser is, and I suspect that serde_json is written in an optimised way, but I didn’t look very hard).

I found [1], which gives serde_json parsing to a DOM at 300-400MB/s on a somewhat old laptop CPU. A simpler implementation runs at 100-200, a very optimised implementation gets 400-800. But I don't think this does much to confirm what I said in the comment you replied to. The numbers for simd-json are a bit lower than I expected (maybe due to the 'DOM' part). I think my 50MB/s number was probably a bit off, but maybe the Python implementation converts the JSON to some C object and then converts that C object to Python objects. That might halve your throughput (my guess is that this is roughly what the 'strict parse' case for rustc_serialize is doing).

[1] https://github.com/serde-rs/json-benchmark


I probably underestimated how much it takes to convert from a "fast JSON decoder written in C" (which many interpreted languages use in the backend) into the actual language's data structures, but in Perl it's pretty fast:

    -> ᛯ ls -s -h big.json
    101M big.json
    [11:53:30] ^ [/tmp] 
    -> ᛯ cat /tmp/1.pl
    #!/usr/bin/perl
    use v5.24;
    use JSON::XS;
    use File::Slurp;
    my $j = decode_json(read_file('/tmp/big.json'));
    say scalar @{$j->{'a'}} 
    [11:53:39] ^ [/tmp] 
    -> ᛯ time /tmp/1.pl
    34952534
    /tmp/1.pl  0,81s user 0,13s system 99% cpu 0,939 total
so just around 100MB/s


The test computer may be a little slow-ish but it doesn't seem "commodity laptop picked up from local supermarket" slow. It has an SSD, for example.


I think it’s ‘thermally limited and a few years old’ which makes it a bit slower than I expected. I’ll accept that ‘old’ was a bad characterisation.


Or displaying 1/2 a character in the Visual Studio editor.


Modified the C example to avoid the compiler being too clever and taking zero time.

I get over 3.3 billion additions per second. 550 million is way too low.


I clicked 1 in the first example because I thought that's the number of iterations it would do with -O2.


Yes. Exactly what it could do in 2004 but 10x slower with 1000x the computing power. #bloat


>Do you know how much your computer can do in a second?

Not much if you are running Electron apps.


I just wrote a blog post [0] on a project that I did with audio-fingerprinting and to my surprise, my desktop could do 369 billion bit-comparisons per second (popcnt using AVX2 instructions, 4 simultaneous processes).

[0] https://kenschutte.com/phingerprint/


One second is a really long time UX-wise.


It seriously feels like most UI designers don't know that.


And most web developers don't know that. You want to add 1 second of page load time to print the time in the correct timezone? Noooo! Of course, it's never framed like that. People just pull in Moment or whatever because that's what Stack Overflow recommends. Moment pulls in the global timezone database. You don't notice your JavaScript bundle grow, and surprise! Your website sucks.


IME many web developers do know, but don't do anything because of incentives. Leadership rarely uses the surfaces you make slow, and people don't generally report an extra half second of latency. So their incentive structure looks like:

1. They will not be rewarded for going the extra mile

2. They will not be punished for making the site slower

3. They will be punished for shipping the feature slower

Point out "X will be slow" to a lot of devs and they go "I know, but I don't have bandwidth to prioritize the extra work"

Part of why it's so important to make the right thing easy.


A few, maybe.

Most just want to get paid and don't give a shit. Which is normal.


> A few, maybe

IME >90% of non-junior engineers understand most of the slowdowns they're introducing and choose to add them anyway.

> Most just want to get paid and dont give a shit.

My point is that you can't get anyone to give a shit about doing work for which they're not only not rewarded, but actively punished. Expecting people to actively hurt their careers for the sake of shipping good product is not a reasonable expectation.

You need some mechanism to fix those incentives.


Unfortunately, for the longest time ever Moment was the only sane date and time library for Javascript.

If you needed more than just adding a single second, you were stuck with it.

But size-wise it was ginormous.


I toured my eventual college’s CS dept as a HS senior (this was a LONG time ago. Punch cards, people.)

One machine counted clock cycles between two key presses using the same finger. From what I recall the smallest answers were in the … thousands?


I had a similar thought from this video counting CPU interrupts in one key press on the 6502

https://youtu.be/DlEa8kd7n3Q?t=12m52s

In one second I can type about 7 ASCII characters, read maybe 20, or aim and click like twice. Each typed character is worth like 600 million CPU instructions of a single core of a 4GHz CPU. It's actually this fat, precious thing, the product of a giant computation from a "computer" (that was optimized for something else) that we aren't harnessing at anywhere near its real value. Instead, we type semicolons.


SPOILER ALERT

Am I the only one who thought "What's the problem?

- The basic loop with xrange in Python tops at 68,000,000 (loop.py),

- but write_to_disk.py tops at 342,000,000,

- and write_to_memory.py tops at 2,000,000,000"?

And then thought "Ahah, but it is while loops instead of iterators! Iterators are that bad for performance! XD"


I think there's a bug after completing the first question. After I selected an answer for how many iterations of an empty loop Python could do, it said the answer was incorrect and showed that 68 million was the correct answer, but proceeded to explain that Python can do 68k iterations in an empty loop. This means the answer shown and the explanation are contradictory and likely one of them is wrong, probably the explanation.

Edit: Apparently I forgot to account for the milliseconds part. My bad.


"68,000 iterations of an empty loop in a millisecond" The factor of 1000 comes from the conversion from seconds in the question, to milliseconds in the blurb.


The 68k is per millisecond, which lines up with 68M per second.


Some previous discussion from 2020:

https://news.ycombinator.com/item?id=23804373


I still have the "nanosecond" (11.8 inches of copper strand) Admiral Hopper gave out when I saw her lecture back in the 80s.


I have Xfinity, so I know exactly how much my computer can do in a second: load 1/30 of the Reddit homepage on a 300Mbps connection.


I was surprised that I got every single one but one. I just know about performance and networking. I never considered the speed of light, but... I did point out to the docent at the CHM what one of Grace Hopper's nanoseconds was. I saw one of those in a Time-Life book in the 1960s. I bet she would have done well at this quiz, and it was very good.


Most of this is not really "computers are fast" but rather "algorithms are cool". That's fine, of course.


Is there a bug with write_to_memory.py? (SPOILER BELOW)

loop.py does 68 million operations per second, and the text below says "we know about the most we can expect from Python (100 million things/s)", but write_to_memory.py writes 2 billion bytes per second writing one byte at a time.

How can writing to memory be faster than doing nothing?



I wasn’t sure what the units were on that one, since the chunk size is more than one byte. So NUMBER is the iterations, but I guess they’re looking for an answer in bytes, and it’s much less than 100M iterations?


Render 1/4th of a webpage?


Just a couple of days ago we programmed an RP2040 to drive a WS2812B LED using assembler, counting clock cycles, with the kids in the IT club I run. Not that much harder than playing Shenzhen I/O. You should give it a try. :]


I can grasp it about as well as I can grasp the size of the universe.


Why are so many of these written in python? That seems against the spirit of this quiz. Should change the name of the site to "how fast is python really?"


I presume it's because Python has a reputation for being a slow language, and this quiz wanted to show that even "slow" Python code can be fast on modern hardware, unless you touch IO.

That, and Python is probably the most accessible language for the most number of people.


I imagine it might be because Python is a much easier-to-read language. If someone with a non-programming background comes across this because the question (and answer) is interesting to them, they can get an idea of what's going on.


I don't find Python easier or harder to read, compared to Ruby, JavaScript (or even OCaml), for example.

Most people who are tech-adjacent learn declarative langs like SQL and HTML first. Python doesn't look like that at all.

It's like when people say "Go is so easy to read!" I learned to program in SML, then OCaml, __insert other FP langs__, then Ruby and ES6; Go looked pretty alien to me the first time I saw it.


That's cool and all, but then you're not "most people". Personally I found Go super easy to read the first time I saw it, but my journey through the programming languages was also a different one.


I think the point I'm making is that there's nothing "universal" about a language that makes it easier to learn than other languages. It's whatever your previous experience is.

My point is a rebuttal to the idea that non-programmers would find Python "easy to read": that's kinda not true.

I then use my exposure to Go as anecdotal evidence of the above observation.


10/18

Could have been worse. Only a few were off by more than one order of magnitude (mostly I underestimated just how fast MD5 is).


I've been studying and writing about this exact issue recently, for a big side project of mine.



