Contrary to popular belief, there’s nothing shady or illicit about web scraping itself. That does not mean that any kind of web scraping is legal. Like all human activity, it needs to remain within certain boundaries. In web scraping, the most important boundaries are personal data and intellectual property regulations, but other factors, such as the website’s terms of service, can play a role as well.
If you want to learn more about the legality of web scraping, read Is Web Scraping legal by Apify's legal expert Ondra Urban. What follows are some snippets from that article.
Scraping personal data
Since regulations differ around the world, you need to think carefully about where you scrape from and whose data you scrape. In some countries, scraping personal data might be completely fine, while in others you should avoid it altogether. If you want to learn more, here's a great comparison of the GDPR and CCPA.
Scraping copyrighted content
If a piece of content is copyrighted, it means, among other things, that you cannot make copies of it without the author's consent (license) or legal permission. Since the very definition of scraping is copying of content, and you almost never have the author's explicit consent, legal permissions are your best bet. As always, the laws around the world vary.
Data mining in the European Union
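In the EU, the scraping of copyrighted content is permitted by Articles 3 and 4 of Directive (EU) 2019/790 on copyright and related rights in the Digital Single Market (DSM Directive). The DSM Directive permits text and data mining, which means any automated analytical technique aimed at analysing text and data in digital form in order to generate information which includes, but is not limited to, patterns, trends and correlations. Article 3 covers text and data mining by research organisations and cultural heritage institutions for scientific research, while Article 4 covers other purposes, including commercial ones, unless the rightsholder has expressly reserved that use.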
Fair use in the United States
In the US, the scraping of copyrighted content can be permitted under the fair use doctrine. The rules are somewhat similar to the European ones, but they do not make a sharp distinction between scientific research and for-profit scraping. The fundamental case law for the application of fair use to scraping is Authors Guild v. Google (the Google Books case). In that case, the court found that making digital copies of copyrighted content, including whole books, was permitted under fair use.
Terms of use and web scraping
Can websites contractually limit scraping in their terms of use? Yes, they can. This may change in the future, but nothing currently prevents a website owner from adding provisions that ban scraping or automated access. The real question is whether those provisions are enforceable. The legal theory behind contract enforceability is rather complex, but when it comes to web scraping, the number one thing to check is how the contract was formed.
You can find out more about the enforceability of terms of use in Are website terms of use enforced.
Further reading
If you still haven't had enough, here's a list of a few forward-looking texts that explain the topic in more detail.