Opinions on web scraping are often divided – some people enjoy it, while others despise it. Proponents argue that web data can improve the world and increase productivity, while opponents claim that web scraping is harmful. However, the legality of web scraping is typically the crux of the matter.
Given the heightened attention on this issue due to high-profile cases like hiQ Labs v. LinkedIn, we’ve created a guide to separate fact from emotion and clarify when web scraping is legal or illegal in 2022. It’s important to note that we are not lawyers, and the information presented here is based solely on our experience working with thousands of clients who scrape the web. If you have doubts about your own project, please seek legal advice.
Is Web Scraping Legal?
Often, individuals make sweeping claims about the legality of web scraping based on their own interests. Web scrapers may argue that web scraping is always legal, while corporate lawyers and anti-bot companies may argue the opposite.
The reality is that there is no straightforward answer to this question. The legality of web scraping depends on the specific situation and the definition of web scraping being used. In this context, we define web scraping as the process of gathering data from the internet. Scraping data from other websites is an essential component of many legitimate data analysis operations. Web scraping itself is not inherently illegal, but it can become illegal or fall into a gray area depending on the following three factors:
- The type of data being scraped
- How the scraped data will be used
- The method used to extract the data from the website
The first two factors are more clear-cut, so we’ll address them before delving into the trickier third factor.
Which types of data are prohibited from being scraped?
Be it e-commerce, personal or article data, the type of data you are scraping and how you plan to use it can have a huge bearing on its legality.
Unbeknown to many, the end use of the data often determines whether scraping it is legal. Scraping a website can be perfectly legal in itself, while the way you intend to use the data makes it illegal.
There are two types of data we need to worry about:
- Personal Data
- Copyrighted Data
If the data you are scraping falls into neither of these categories, then you are generally safe.
Data Type #1: Personal Data
Personal data, or personally identifiable information (PII) as it is often called, is any data that could be used to directly or indirectly identify a specific individual.
With the introduction of GDPR in 2018, the California Consumer Privacy Act and outrage that accompanied scandals such as Cambridge Analytica’s interference in the 2016 US Presidential Election, the issue of personal data has become a hot topic and one that every web scraper must be cognisant of.
Every legal jurisdiction has different regulations governing personal data. In general, however, in jurisdictions with the latest consumer privacy legislation (the EU, California, etc.), it is illegal for companies to obtain, store and/or use someone’s personal data without their consent or without having a lawful reason for doing so.
Types of personal data include:
- Name
- Phone Number
- Address
- User Name
- IP Address
- Date of Birth
- Employment Info
- Bank or Credit Card Info
- Medical Data
- Biometric Data
In the vast majority of cases (lead generation, sales intelligence, etc.), when you scrape personal data from a website you don’t have the consent of the data subject (the person whose data you are scraping), and it’s very hard to argue that you have one of these lawful reasons to do so:
- Consent – the data subject consented to us having their data.
- Contract – the personal data is required for performance of a contract with the data subject.
- Compliance – necessary for compliance with a legal obligation.
- Vital Interest, Public Interest, or Official Authority – typically only applicable for state-run bodies where access to personal data is in the public’s interest.
- Legitimate Interest – necessary for our legitimate interests.
As a result, in most cases scraping the personal data of a resident of the EU or California could result in your web scraping being deemed illegal.
If you’re not extracting any personal data, or only the personal data of people outside the EU and California, then you are likely safe to keep scraping.
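One practical way to act on the list of personal data types above is to flag and strip those fields before a scraped record is stored. The following is a minimal sketch, not a compliance tool: the function name `split_personal_data` and the field names in `PERSONAL_DATA_FIELDS` are hypothetical and assume your scraper produces dictionaries keyed by field name. It only matches on key names, so personal data embedded in free text would not be caught.

```python
# Hypothetical field names based on the list above; adjust them to the keys
# your own scraper actually produces.
PERSONAL_DATA_FIELDS = {
    "name", "phone_number", "address", "user_name", "ip_address",
    "date_of_birth", "employment_info", "bank_info", "credit_card_info",
    "medical_data", "biometric_data",
}


def split_personal_data(record):
    """Return (safe_fields, flagged_keys) for one scraped record.

    Keys are compared case-insensitively, with spaces and hyphens
    normalised to underscores.
    """
    safe = {}
    flagged = []
    for key, value in record.items():
        normalised = key.strip().lower().replace(" ", "_").replace("-", "_")
        if normalised in PERSONAL_DATA_FIELDS:
            flagged.append(key)
        else:
            safe[key] = value
    return safe, sorted(flagged)


record = {"Name": "Jane Doe", "IP Address": "203.0.113.7", "price": 19.99, "sku": "A-100"}
safe, flagged = split_personal_data(record)
print(safe)     # {'price': 19.99, 'sku': 'A-100'}
print(flagged)  # ['IP Address', 'Name']
```

A non-empty `flagged` list is a signal to stop and check whether you have consent or another lawful reason before keeping those fields.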
Data Type #2: Copyrighted Data
The second type of data you need to be careful of scraping is copyrighted data.
Copyrighted data is data owned by businesses or individuals who have explicit control over its reproduction and capture.
Like the use of copyrighted images and songs, just because the data is publicly available on the internet doesn’t mean it is legal for it to be scraped without the owner’s consent. You could be infringing the owner’s copyright by scraping their data.
This generally applies to the following types of web data:
- Articles
- Videos
- Pictures
- Stories
- Music
- Databases
Scraping copyrighted data isn’t illegal in itself; it’s what you plan to do with the copyrighted data that could make it illegal.
One person could scrape a copyrighted article perfectly legally, while someone else could scrape the same article and be found to have breached the owner’s copyright.
It really depends on how you plan to use the data after you’ve scraped it.
- Can you argue fair use? Instead of replicating the article in full, you plan to use snippets of the original article.
- Can you argue that the data is factual and therefore not copyrightable? Facts like product names, prices, features, etc. aren’t covered by copyright laws, so can you argue that the data you plan to scrape is factual in nature?
A trickier aspect of copyright law, however, is the issue of database rights. A database is an organized collection of materials that permits a user to search for and access individual pieces of information contained within the materials.
Because these rights can protect the collection as a whole, it can be illegal to scrape a full database from the web and then reproduce it exactly for your own purposes.
Again, the US and the EU have different regulations around what constitutes a database and what legal protections they give to the database owner, so it is important to understand the rules and regulations for the legal jurisdictions you are scraping in.
The risks of infringing someone’s database rights can be mitigated by altering how the data is scraped and used. These two tips help ensure you’re conducting ethical data scraping with copyrighted data:
- Only scrape some of the available data
- Do not replicate the organisational structure of the original database
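The two tips above can be sketched in code. This is only an illustration under stated assumptions: the function `reshape_subset`, the mapping `FIELD_MAP` and the sample rows are all hypothetical, and what counts as "some" of a database is a legal question that code cannot answer.

```python
# Hypothetical source-to-target field mapping: keep only the fields we need,
# under our own names, rather than mirroring the source layout.
FIELD_MAP = {"title": "product_name", "cost": "price"}


def reshape_subset(rows, field_map, limit):
    """Keep at most `limit` rows and only mapped fields, renamed to our own schema."""
    reshaped = []
    for row in rows[:limit]:
        reshaped.append({new: row[old] for old, new in field_map.items() if old in row})
    return reshaped


source_rows = [
    {"title": "Kettle", "cost": 25.0, "category": "Kitchen", "rank": 1},
    {"title": "Toaster", "cost": 30.0, "category": "Kitchen", "rank": 2},
    {"title": "Lamp", "cost": 12.5, "category": "Lighting", "rank": 3},
]
print(reshape_subset(source_rows, FIELD_MAP, limit=2))
# [{'product_name': 'Kettle', 'price': 25.0}, {'product_name': 'Toaster', 'price': 30.0}]
```

The output holds fewer rows and fewer fields than the source, and the source's `category` and `rank` organisation is not carried over.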
Okay, so far we’ve covered which types of data can be illegal to scrape, and seen how the way you plan to use the scraped data can affect its legality.
Next, we’re going to address the most contentious issue about the legality of web scraping: how you extract the data from the website.
Ensure Your Web Scraping is Legal with These Sanity Checks
So there you go: we’ve discussed all the main issues that determine the legality of your web scraping. In the majority of cases we see, what companies want to scrape is perfectly legal.
However, we always advise them to double-check their plans to ensure they’re conducting both legal and ethical web scraping with these three simple checks:
- Am I scraping personal data?
- Am I scraping copyrighted data?
- Am I scraping data from behind a login?
If your answer to all three of these questions is “No”, then your web scraping is most likely legal.
However, if you answer “Yes” to any of them, then you should take a step back and do a full legal review of your web scraping to ensure you’re not scraping the web illegally.
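The three checks can be written down as a small pre-flight routine that a team runs before starting a scraping project. This is a minimal sketch: the `ScrapingPlan` class and the `checks_needing_review` function are hypothetical names, and the answers still have to come from a person who has looked at the target site and the data.

```python
from dataclasses import dataclass


@dataclass
class ScrapingPlan:
    scrapes_personal_data: bool
    scrapes_copyrighted_data: bool
    scrapes_behind_login: bool


def checks_needing_review(plan):
    """Return the checks answered "Yes"; an empty list means all three were "No"."""
    answers = {
        "personal data": plan.scrapes_personal_data,
        "copyrighted data": plan.scrapes_copyrighted_data,
        "data behind a login": plan.scrapes_behind_login,
    }
    return [name for name, answer in answers.items() if answer]


plan = ScrapingPlan(
    scrapes_personal_data=False,
    scrapes_copyrighted_data=True,
    scrapes_behind_login=False,
)
flagged = checks_needing_review(plan)
if flagged:
    print("Full legal review advised for:", ", ".join(flagged))
else:
    print("All three checks answered No")
# prints: Full legal review advised for: copyrighted data
```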