Hacker News

If you want to destroy open source completely, the more models the better. Microsoft's co-opting and infiltration of OSS projects will serve in MBA programs as a textbook example of eliminating competition.

And people still support it by uploading to GitHub.




Yes. Thank you for saying it. We're watching Microsoft et al. defeat open source.

Large language models are used to aggregate and interpolate intellectual property.

This is performed with no acknowledgement of authorship or lineage, with no attribution or citation.

In effect, the intellectual property used to train such models becomes anonymous common property.

The social rewards (e.g., credit, respect) that often motivate open source work are undermined.

Embrace, extend, extinguish.


> This is performed with no acknowledgement of authorship or lineage, with no attribution or citation.

GitHub hosts a lot of source code, presumably including the code Copilot was trained on. So they satisfy any license that requires sharing the code and license, such as GPL 3. Not sure what the problem is.


> The social rewards (e.g., credit, respect) that often motivate open source work are undermined.

You mean people making contributions to solve problems and scratch each other's itches got displaced by people seeking social status and/or a do-at-your-own-pace accreditation outside of formal structures, to show to prospective employers? And now that LLMs are starting to let people solve their own coding problems, sidestepping the whole social game, the credit seekers complain because large corps did something they couldn't possibly have done?

I mean, sure, their contributions were a critical piece in aggregate; individually, any single piece of OSS code contributes approximately zero value to LLM training. But they're somehow entitled to the reward for a vastly greater value someone else is providing, just because they retroactively feel they contributed.

Or, looking at it from a different angle: what the complainers are saying is that they're sad they can't extract rent now that their past work became valuable for reasons they had no part in, and that if they could turn back time, they'd happily rent-seek the shit out of their code, to the point of destroying LLMs as a possibility and denying the world the value LLMs provide?

I have little sympathy for that argument. We've been calling out "copyright laundering" way before GPT-3 was a thing - those who don't like to contribute without capturing all the value for themselves should've moved off GitHub years ago. It's not like GitHub has any hold over OSS other than plain inertia (and the egos in the community - social signalling games create a network effect).


> individually, any single piece of OSS code contributes approximately 0 value to LLM training. But they're somehow entitled to the reward for a vastly greater value someone is providing, just because they retroactively feel they contributed.

You are attributing arguments to people which they never made. Even the most lenient of open source licenses require simple attribution, which the "A.I." never provides. Your tone comes off as pretty condescending, in my opinion. My summary of what you wrote: "I know they violated your license, but too bad! You're not as important as you think!"


>Or, looking from a different angle: what the complainers are saying is, they're sad they can't extract rent now that their past work became valuable for reasons they had no part in, and if they could turn back time, they'd happily rent-seek the shit out of their code,

Wrong and completely unfair/bitter accusation. The only people rent seeking are the corporations.

What kind of world do you want to live in? The one with "social games" or the one with corporate games? The one with corporate games seems to have less and less room for artists, musicians, language graduates, programmers...


A photocopier provides "vastly greater value" than the people who wrote the books?

> they can't extract rent now that their past work became valuable for reasons they had no part in.

That is not the case at all. If I donate food to Africa, I'm happy if it goes to humans. I'm not happy if a Mafia organization steals the food in transit, repackages it and sells it.


Can you name a company with more OSS projects and contributors? Stop with the hyperbole...


Embrace, extend...


Sure, let me know when they extinguish Kubernetes, Helm, VS Code, LSP, Playwright, PowerShell, TypeScript, npm, or the other 6000 projects/repos sitting on GitHub.


That literally has no bearing on the issue.


It literally does, even if you don't like it.


You're free to explain how you think it's linked to the issue if you want to, but it just isn't. Yes, Microsoft contributes some open source software. That obviously does not preclude them from being exploitative toward other creators of open source software.


Some? So you're not even aware. They're the largest OSS company in the history of OSS even if you don't like it. That's the link you're obviously missing.


I'm going to pretend for my sanity's sake that you're joking.


Who's bigger, I'll wait.


I deleted my GitHub account two weeks ago, as much about AI as about them forcing 2FA. Before AI, it was SaaS taking more than it was giving. I miss the 'helping each other' feel of these code-sharing sites. I wonder where we are heading with all this. All competition and no collaboration; no wonder the planet is burning.


> And people still support it by uploading to GitHub.

It’s slowly but noticeably moving from GitHub to other sites.

The network effect is hard to work against.


Migration is on my todo list, but it’s non-trivial enough that I’m not sure when I’ll ever have the cycles to even figure out the best option. GitLab? Self-hosted Git? Go back to SVN? A totally different platform?

Truth be told, Git is a major pain in the ass anyway and I’m very open to something else.


Mercurial was better than git IMO, at least for smaller projects.


A classic case of the perfect being the enemy of the good. The answers are GitLab and jj, cheers.


It doesn't matter whether it is uploaded to GitHub or not. They would siphon it from GitLab, self-hosted servers, or SourceForge just as well, using crawlers.


> If you want to destroy open source completely

The irony is of course that open source is what they used to train their models with.


That was the point. They are laundering IP. It's the long way around the GPL, allowing them to steal.


Maybe there will be legal precedent set at some point around derived work in terms of the set of data used to train the AI? I'm not hopeful though.


How many OSS repositories do I personally have to read through for my own code to be considered stolen property?

That line of thought would get thrown out of court faster than an AI would generate it.


I assume you're not an AI model, but a real human being (I hope). The analogy "AI == human" just... doesn't work, really.


I think in this regard it works just fine. If the laws move to say that "learning from data" while not reproducing it is "stealing", then yes, you reading others' code and learning from it is also stealing.

If I can't feed a news article into a classifier to teach it whether or not I would like that article, that's not a world I want to live in. And yes, it's exactly the same thing as what you are accusing LLMs of.
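For what it's worth, the scenario described is about as simple as machine learning gets. A minimal, self-contained sketch (the toy training data and the `train`/`classify` names are my own illustration, not anything from the thread): a bag-of-words Naive Bayes classifier that "learns" from articles you've labeled, without storing or reproducing any of them.

```python
# Toy sketch of "teach a classifier whether I'd like an article".
# Hypothetical names and data; uniform class prior assumed for brevity.
import math
from collections import Counter

def train(examples):
    """examples: list of (text, label) pairs. Returns per-label word counts."""
    counts = {}
    for text, label in examples:
        counts.setdefault(label, Counter()).update(text.lower().split())
    return counts

def classify(counts, text):
    """Pick the label whose word distribution best explains the text."""
    vocab = {w for c in counts.values() for w in c}
    best, best_score = None, float("-inf")
    for label, c in counts.items():
        total = sum(c.values())
        score = 0.0
        for w in text.lower().split():
            # Laplace smoothing so unseen words do not zero out a label.
            score += math.log((c[w] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best
```

The model keeps only aggregate word counts, which is the crux of the legal question being argued: statistics about the input versus copies of it.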

They should be subject to laws the same way humans are. If they substantially reproduce code they had access to then it's a copyright violation. Just like it would be for a human doing the same. But highly derived code is not "stolen" code, neither for AI nor for humans.


That’s beside the point.

Me teaching my brain someone’s way of syntactically expressing procedures is analogous to AI developers teaching their model that same mode of expression.


No, a program that copies files is quite different from a human who writes that data down and recalls it.


It's not your reading that would be illegal, but your copying. This is a well-documented area of the law, and there are concrete answers to your questions.


Are you saying that if I see a nice programming pattern in someone else’s code, I am not allowed to use that pattern in my code?


Can I copy you or provide you as a service?

To me, the argument is that an LLM learning from GPL code == creating a derivative of the GPL code, just "compressed" within the LLM. The LLM then goes on to create more derivatives, or it's being distributed (with the embedded GPL code).


Yes, I provide it as a service to my employer. It's called a job. Guess what? When I read code I learn from it and my brain doesn't care what license that code is under.


That’s what my employers keep asking.


If the product is the result of compiling all the open source code out in the wild into an LLM, it can be argued that the derived product, the LLM itself, must follow the licensing requirements of the source code used.

The AI companies don't care much about this. When the time comes, they will open their models or stop using sources that don't meet the appropriate licensing. Their current concern is learning how to build the best models and winning the race to become the dominant AI provider; who cares if they need to use polluted sources to reach their goal? They will fix it later.


This seems a bit nihilistic. You can't be automated. You can't process repos at scale.


Yet.


It'll be okay. We "destroyed" photography by uploading to places like Instagram and Facebook but photography as a whole is still alive. It turns out even though there is lots of stealing, the world still spins and people still seek out original creators.


I don't understand the case being made here at all. AI is violating FOSS licenses, I totally agree. But you can write more FOSS using AI. It's totally unfair, because these companies are not sharing their source and are extracting all of the value from FOSS that they can. Fine. But when it comes to OSI Open Source, all anyone usually had to do to do the same thing was include a text file somewhere mentioning that they used it, and when it comes to Free Software, they could just lie about stealing it and/or fly under the radar.

Free software needs more user-facing software, and it needs people other than coders to drive development (think UI people, subject matter specialists, etc.), and AI will help that. While I think what the AI companies are doing is tortious, and that they either should be stopped from doing it or the entire idea of software copyright should be re-examined, I also think that AI will be massively beneficial for Free Software.

I also suspect that this could result in a grand bargain in some court (which favors the billionaires of course) where the AI companies have to pay into a fund of some sort that will be used to pay for FOSS to be created and maintained.

Lastly, maybe Free Software developers should start zipping up all of the OSI licenses that only require a license be included in the distribution, and shipping that zipfile with their software written in collaboration with AI copilots. That, and the latest GPL for the rest (and for your own code), puts you in as safe a place as you could possibly be legally. You'll still be hit by all of the "don't do evil"-style FOSS-esque licenses out there, but you'll at least be safer than all of the proprietary software being written with AI.
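The bundling step described above is trivially scriptable. A minimal sketch using only the standard library (the directory layout, file names, and `bundle_licenses` helper are assumptions for illustration, not anything the commenter specified):

```python
# Hypothetical helper: collect the permissive license texts your project
# relies on into a single archive to ship alongside the software.
import zipfile
from pathlib import Path

def bundle_licenses(license_dir, out_path="licenses.zip"):
    """Zip every *.txt license file found in license_dir; return the names added."""
    added = []
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in sorted(Path(license_dir).glob("*.txt")):
            zf.write(f, arcname=f.name)  # store at archive root, no dir prefix
            added.append(f.name)
    return added
```

Run once at release time and drop the resulting archive next to your binaries; whether that satisfies each license's attribution clause is of course a legal question, not a technical one.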

I don't know what textbook directs you to eliminate all of your competition by lowering your competition's costs, narrowing your moat of expertise, and not even owning a piece of that.

edit: that being said, I'm obviously talking about Free Software here, and not Open Source. Wasn't Open Source only protected by spirits anyway?



