
How is Google search breaking copyright?



The text preview underneath the search result is one thing I remember being contentious (some news websites in France took Google to court and won, if I recall correctly).

For mostly the same reasons people are against AI. If you read that text, sometimes there’s no reason to visit the website, which ‘deprives’ website owners and the content creators of ad revenue they would have gotten if Google hadn’t copied the text from their website.

After all, the news is the same regardless of whether it’s written on the Google result preview or on the news website itself.


The exact same way OpenAI does: by scraping data, ingesting it, processing it, incorporating it into its proprietary system, and using it to serve responses to queries.

This is not to say that I think that any of this is wrong. I think that if what Google or OpenAI do is illegal, then the law is wrong, not Google or OpenAI.


Search engines are an index to a first approximation. I realize they sometimes hoist a thesis sentence or two up to their page, but it feels substantially different than automating derivative works for sale.


what is the substantive difference between googling "filter records in dataframe by regular expression in Julia" vs asking ChatGPT/Grok/Claude the same thing (to name a totally non-random example I've been working with lately)?


Not much difference for uncopyrightable facts, but ask for a portrait, report, or whole program, and you get closer to a problem. There is precedent around derivative works, under the doctrine called "fair use," that can maybe give some guidance.

Oracle lost the API copyright case, which might also be an important precedent.


ChatGPT is a substitute for the original work; Google redirects to the original work.

(and yes, the synthesis on the top of the page is a problem, I agree)


Last I checked, Google is not buying or pirating books for Google Search; they just grab free data that has been provided.


What do you mean by “provided”, exactly? Just because something can be accessed via an HTTP GET request doesn’t mean it’s legal to fetch it, and it does not give you an implicit license to do with it whatever you want. Google, in fact, will happily scrape, index, and serve queries from PDFs of illegally pirated copyrighted books.


You might want to check again: https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,....

> For works still under copyright, Google scanned and entered the whole work into their searchable database, but only provided "snippet views" of the scanned pages in search results to users.


The same way we can't process what a trillion dollars looks like, we can't actually process what large scale theft looks like. For shits and giggles, these people also have a trillion dollars.


Is it the exact same though?


Sorry, do you have an actual point, or are you just trying to be pedantic? Strictly speaking, it’s not the exact same, because no two different things are the exact same, by definition. However, my point is that the same principle should apply to both.


No, he just means to say they are very different and if you disagree, you should think about it again.


I don’t think it’s even that close to the same thing tbh.


Even before they added the AI preview, they would take sentences that looked like answers to your query and display them at the top of the page verbatim.



