Getting the results is nice, but that's "shareware", not "free software" (or, for a more modern example, it's like companies submitting binary firmware blobs to mainline Linux).

Free software means you have to be able to build the final binary from source. Having 10 TB of text is no problem, but having a data center of GPUs is. Until the training cost comes down there is no way to make it free software.




If I publish a massive quantity of source code — to the point that it’s very expensive to compile — it’s still open source.

If the training data and model training code are available, then it should be considered open, even if it’s hard to train.


If it were only feasible for a giant corporation to compile the code, I would consider it less than open source.


> the training data

This will never be fully open.


Maybe not for some closed models. That doesn’t mean truly open models can’t exist.


I doubt you’d say that if one run of compiling the code would cost you $400M.


Free software means that you have the ability, both legal and practical, to customize the tool for your needs. For software, that means you have to be able to build the final binary from source (so you can adapt the source and rebuild). For ML models, that means you need the code and the model weights, which allow you to fine-tune the model and adapt it to different purposes even without spending the compute cost of a full retrain.



