> What’s the incentive for people to allow the crawler at all?
So that LLMs can learn from it? Profit is not the only thing that motivates people. I’ve spent years contributing to Stack Overflow to help people solve their problems, with the understanding that they had an open data policy and anybody could access the data dump easily to build things with it. It pisses me off that they are now trying to lock that information away where LLMs can’t access it. The whole reason to contribute is to help people. Locking that information away instead of exploiting this new channel to help people more effectively is antithetical to the reason I contributed in the first place.
Thank you for your contribution! I think that has to be a strategic decision. If everyone starts using ChatGPT for everything, what value is left for SO? From their perspective, they weren't going to sit and watch it happen. I would also add that citation is a big deal.
I meant the LLM weights are not publicly available in the case of OpenAI, so whatever you contribute to it will be locked up, just like SO locked up their user-generated data.
With Stack Overflow, everybody contributed to their data set. This data set is centrally managed by Stack Overflow and access is whatever they choose to allow. When they block access to that data set, it effectively takes it away from the public.
With OpenAI, they aren’t locking anything away. They are analysing the data and adjusting the weights in their model. They haven’t stopped people from accessing the data they are training upon.
What Stack Overflow are doing is stopping the free flow of information. What OpenAI are doing is providing an additional channel for it to flow through.