GPT-3 was trained at roughly a 4:1 ratio of training tokens to parameters, and for GPT-4 the ratio was closer to 10:1. Extrapolating that trend, GPT-5 should be around 25:1. The parameter count jumped from 175B to 1.3T between those generations, which would put GPT-5 at roughly 10T parameters and 250T training tokens. There is zero chance OpenAI has a training set of 250T high-quality tokens.
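Here is the back-of-envelope version of that extrapolation, as a rough sketch. The 4:1 and 10:1 ratios and the 1.3T parameter figure are the assumptions stated above, not confirmed numbers.

```python
# Back-of-envelope scaling extrapolation using the assumed ratios above (not confirmed figures).
gpt3_params, gpt3_ratio = 175e9, 4      # ~175B params, ~4 training tokens per parameter (assumed)
gpt4_params, gpt4_ratio = 1.3e12, 10    # ~1.3T params, ~10 tokens per parameter (assumed)

# Continue both trends one more generation: ratio 4 -> 10 -> ~25, params 175B -> 1.3T -> ~10T.
gpt5_ratio = gpt4_ratio * (gpt4_ratio / gpt3_ratio)      # ~25 tokens per parameter
gpt5_params = gpt4_params * (gpt4_params / gpt3_params)  # ~9.7e12, call it 10T
gpt5_tokens = gpt5_params * gpt5_ratio                   # ~2.4e14, i.e. ~250T tokens

print(f"GPT-5 (extrapolated): {gpt5_params:.1e} params, {gpt5_tokens:.1e} tokens")
```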
If I had to guess, they trained a model somewhere in the 3-4T parameter range on 30-50T high-quality tokens, plus maybe another 10-30T of medium- and low-quality ones.
There is only one company in the world that stores the data that could get us past the wall.
The training cost of the scaled-out GPT-5 above is roughly 150x GPT-4 (about 7.7x the parameters times about 19x the tokens), and GPT-4 was trained on 25k A100s for about 90 days, with poor MFU.
Even assuming they doubled MFU, that would still mean something like 1M H100s. Factor in algorithmic improvements on top of that, and maybe it comes down to 250-500k H100s.
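A rough sketch of where those numbers come from. The GPT-4 token count follows from the 10:1 ratio assumed above, and the H100-vs-A100 throughput factor is my own assumption, not a confirmed figure.

```python
# Back-of-envelope compute estimate, reusing the assumed figures above.
# Assumptions (mine, not confirmed): GPT-4 at ~1.3T params on ~13T tokens (the 10:1 ratio),
# trained on 25k A100s for 90 days; an H100 delivers roughly 2x the effective training
# throughput of an A100 at the same MFU.
gpt4_params, gpt4_tokens = 1.3e12, 13e12
gpt5_params, gpt5_tokens = 10e12, 250e12

compute_multiplier = (gpt5_params / gpt4_params) * (gpt5_tokens / gpt4_tokens)  # ~150x

a100_equiv = 25_000 * compute_multiplier   # ~3.7M A100s for the same 90-day run
h100_equiv = a100_equiv / 2                # ~1.85M H100s at the same MFU
h100_with_2x_mfu = h100_equiv / 2          # ~0.9M H100s if MFU doubles, i.e. ~1M

print(f"Compute multiplier: ~{compute_multiplier:.0f}x")
print(f"H100s for a 90-day run with 2x MFU: ~{h100_with_2x_mfu:,.0f}")
```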
But the actual training cluster was around 100k GPUs, later growing to roughly 150k, and a cluster of that size is suggestive of a smaller model and less data.
But ultimately data is the bottleneck.