Hacker News

If they implemented this properly, they would retroactively filter out all content that is now disallowed by robots.txt or carries the #NoAI tag.



My understanding is that it's not easy to untrain a model on data that has already been fed into it.

Regarding noai tags: are these actually respected, or is that just wishful thinking?

    <meta name="robots" content="noai, noimageai">
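A crawler that honors this convention would have to parse the page and check the meta directives before using the content. Here's a minimal sketch of what that check might look like, using only Python's standard library; the `noai` directive itself is an informal convention, not an enforced standard:

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tags."""
    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name", "").lower() == "robots":
            self.directives |= {d.strip().lower()
                                for d in a.get("content", "").split(",")}

def allows_ai_training(html: str) -> bool:
    """True unless the page opts out via a 'noai' directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noai" not in parser.directives

page = '<html><head><meta name="robots" content="noai, noimageai"></head></html>'
print(allows_ai_training(page))  # False: page opts out of AI training
```

Nothing forces a crawler to run a check like this, which is exactly the point of the comments below.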


As with anything you put on the internet, you depend on the crawler's good will to respect these tags or robots.txt. OpenAI can simply decide to ignore them. It's wishful thinking.
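For what it's worth, OpenAI does document a dedicated user agent, GPTBot, for its crawler, so a site owner who wants to opt out can disallow it explicitly in robots.txt:

```
User-agent: GPTBot
Disallow: /
```

This still relies entirely on the crawler choosing to honor robots.txt; it's a request, not an enforcement mechanism.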


The next version of GPT might have better citations, and they could just refuse to cite things they were not allowed to crawl.

However, it's trivial to tell from your server logs whether the bot crawled your site or stopped at robots.txt.
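A well-behaved crawler fetches /robots.txt and then stops; further requests under its user agent show the content was actually crawled. Here's a rough sketch of that log check, with hypothetical access-log lines in the common "combined" format (the GPTBot user-agent string is the one OpenAI documents):

```python
import re

# Hypothetical log lines for illustration only.
LOG_LINES = [
    '203.0.113.7 - - [10/Jan/2024:12:00:01 +0000] "GET /robots.txt HTTP/1.1" 200 121 "-" '
    '"Mozilla/5.0; compatible; GPTBot/1.0; +https://openai.com/gptbot"',
    '203.0.113.7 - - [10/Jan/2024:12:00:02 +0000] "GET /article HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0; compatible; GPTBot/1.0; +https://openai.com/gptbot"',
    '198.51.100.4 - - [10/Jan/2024:12:01:00 +0000] "GET / HTTP/1.1" 200 2048 "-" '
    '"Mozilla/5.0 (Windows NT 10.0)"',
]

def gptbot_requests(lines):
    """Return the request paths fetched by the GPTBot user agent."""
    hits = []
    for line in lines:
        if "GPTBot" in line:
            m = re.search(r'"(?:GET|POST|HEAD) (\S+)', line)
            if m:
                hits.append(m.group(1))
    return hits

print(gptbot_requests(LOG_LINES))  # ['/robots.txt', '/article']
```

If the only GPTBot hit is /robots.txt, the bot stopped at your rules; anything beyond that means your pages were fetched.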


Untraining may be difficult, but will they only ever keep improving the current model? Won't they eventually want to change its architecture or parameters (I'm not too into the jargon) and train a fresh, improved version?

I'm not sure this first reasonably working chatbot is going to be the last version we ever need, and afaik this sort of thing is as hard to port as it is to untrain; the problem in both cases is that it's a big black box.


It's easy for them to delete the model and start from scratch though.



