I'm the author of the post. You raise a good point about relative savings. Based on last week's data, our change reduced the task time by 40ms from an average of 3440ms, and this task runs 11 million times daily. This translates to a saving of about 1% on compute.
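For a rough sense of scale, here is that arithmetic spelled out. It's a quick sketch using only the figures above (per-task time saved, average task time, and daily task volume); nothing else is assumed:

```python
# Back-of-envelope check of the numbers quoted above.
MS_SAVED_PER_TASK = 40          # ms shaved off each task
AVG_TASK_MS = 3440              # average task duration before the change
TASKS_PER_DAY = 11_000_000      # daily task volume

relative_saving = MS_SAVED_PER_TASK / AVG_TASK_MS
saved_seconds_per_day = MS_SAVED_PER_TASK * TASKS_PER_DAY / 1000
saved_hours_per_day = saved_seconds_per_day / 3600

print(f"relative saving: {relative_saving:.1%}")                 # ~1.2%
print(f"saved per day:   {saved_hours_per_day:.0f} compute-hours")  # ~122
```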
> This translates to a saving of about 1% on compute.
Does this translate to any tangible savings? I'm not sure what the Checkly backend looks like, but if tasks run on a cluster of hosts rather than being invoked per task, it seems hard to realize savings. Even per task, 40 ms can only be realized on a service like Lambda; the ECS minimum billing unit is 1 second afaik.
I think that's a flawed analysis. If you're running FaaS, then sure, you can fail to see a benefit from small improvements in runtime (AWS Lambda changed its billing resolution a few years back, but before that the Go services didn't save much money despite being faster). But if you're running thousands of requests and speeding them all up, you should be able to realize tangible compute savings whatever your platform.
Help me understand, then. If this work runs on an autoscaling cluster, I can see it, but if you're just running everything on an always-on box, for instance, it's less clear to me.
edit: Do you have an affiliation with the blog? I ask because you have submitted several Checkly articles in the past.
Hey, Checkly founder here. We've changed our infra quite a bit over the last ~1 year, but it's still mostly ephemeral compute. We actually started on AWS Lambda; we're now on a mix of AWS EC2 and EKS, all autoscaled per region (we run 20+ of them).
It seems tiny, but in aggregate this will have an impact on our COGS. You're correct that if we had a fixed fleet of instances, the impact would not have been very interesting.
Still, for a couple of hours spent, this saves us a few thousand dollars a year.
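As a back-of-envelope illustration of how those saved compute-hours could add up to a few thousand dollars a year, here is a sketch. The per-task numbers come from earlier in the thread; the price per vCPU-hour is a purely hypothetical placeholder (not Checkly's actual rate), and it assumes the 40 ms reduction maps one-to-one onto billed compute time:

```python
# Very rough annual-cost sketch. Only MS_SAVED_PER_TASK and TASKS_PER_DAY
# come from the thread; the price below is an illustrative assumption.
MS_SAVED_PER_TASK = 40
TASKS_PER_DAY = 11_000_000
ASSUMED_PRICE_PER_VCPU_HOUR = 0.05   # hypothetical blended EC2/EKS rate, USD

saved_hours_per_day = MS_SAVED_PER_TASK * TASKS_PER_DAY / 1000 / 3600
annual_saving = saved_hours_per_day * 365 * ASSUMED_PRICE_PER_VCPU_HOUR

print(f"~{saved_hours_per_day:.0f} compute-hours/day "
      f"-> ~${annual_saving:,.0f}/year at ${ASSUMED_PRICE_PER_VCPU_HOUR}/vCPU-hour")
```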