Search Engine Results Page Data Extraction: Law and Ethics

Internet data extraction, also known as web scraping, has long existed in something of a gray area, both legally and morally. While the process can be extremely useful for the user and benign to the target websites, there’s no denying that scrapers and bots in general can be a huge nuisance too.

The subject of data extraction may seem uncertain to some, maybe shady, but it’s actually very straightforward. Let us show you:

Extracting data — legal or not?

The action of extracting data in itself is legal. After all, it’s almost the same as a normal user looking at that data and storing the information in their brain. It’s just more resistant to memory lapses when stored on a computer.

The crux of the problem is the type of data you gather and what you do with it afterward. We’ll go over both subjects and define what’s nice and what’s naughty.

Declassifying Data

When a person or company creates content, be it text, software, images, music, or anything else, they own it by default. That makes it copyrighted material, and it’s generally illegal for someone else to take it and use it for commercial purposes without permission.

If it’s posted publicly, meaning you’re allowed to access it, it’s generally ok to extract the information. What isn’t legal is to repost it under your name or brand, sell it to others, incorporate it into a product or service that makes you money, or exploit it in similar ways.

Check the copyright laws in your country if you’re unsure whether the actions you want to take would violate them.

Another big subject is personal information, the type of data that has led to GDPR and a whole load of companies bugging their audience for consent to continue sending them emails. While GDPR only covers the personal data of people in the EU, countries in the rest of the world have their own rules, some more outdated than others.

Personal information is anything that can be used to identify that person, so it covers:

As far as the GDPR is concerned, the general rule for extracting personal data is to not do it. There are exceptions, though: if the person gives their consent, which is rare and complicated to obtain in bulk, or if the scraper has a legitimate interest in the data, which is difficult to prove.

Some Terms of Service may apply

After squaring things away with the law, there’s the concern of what the search engines want. Service providers like Google, Bing, or Yandex all have their own Terms of Service that users agree to in order to use the engines.

This may be news to you since no one really reads those, but rest assured, if you have a Gmail account, you’ve explicitly accepted Google’s ToS.

Anyway, like different countries, each business is free to define its own rules and guidelines. What you can definitely expect from each one is something along the lines of “don’t disrupt our business and don’t cause harm.”

For some companies, any sort of scraping may be viewed as a malicious action, even though you’d have to send a vast number of requests in quick succession to make a dent in a search engine’s processing power.

In other cases, the search engine developers may recognize the need to quickly extract and store data in JSON format, so they may even capitalize on the opportunity by releasing their own APIs and billing methods.
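If you want to stay well clear of the “vast number of requests in quick succession” scenario, one option is to enforce a minimum pause between requests. The snippet below is a minimal sketch of that idea, not a rule taken from any search engine’s Terms of Service: the Throttle class, the fetch_all helper, and the fetch callback are hypothetical names made up for illustration, and the right interval depends on the service you’re querying.

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive calls to wait()."""

    def __init__(self, min_interval_seconds, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval_seconds
        self._clock = clock    # injectable so the class can be tested without real waiting
        self._sleep = sleep
        self._last_call = None  # time of the previous request, if any

    def wait(self):
        """Pause until min_interval has passed since the last call.

        Returns the number of seconds slept.
        """
        delay = 0.0
        if self._last_call is not None:
            elapsed = self._clock() - self._last_call
            delay = max(0.0, self.min_interval - elapsed)
        if delay > 0:
            self._sleep(delay)
        self._last_call = self._clock()
        return delay


def fetch_all(queries, fetch, throttle):
    """Call fetch(query) for each query, pausing between calls.

    `fetch` is whatever function you use to retrieve one results page
    (hypothetical here; it is supplied by the caller).
    """
    results = []
    for query in queries:
        throttle.wait()
        results.append(fetch(query))
    return results
```

With something like Throttle(2.0), no two calls to fetch start less than two seconds apart, no matter how fast the rest of your script runs.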

Just because something is against the Terms of Service doesn’t mean it’s against the law. What it does mean is that if a search engine explicitly states that it may block users who do certain things, you can expect to get blocked for doing those things.

The definition of “harmful actions” can be rather vague, so it’s recommended to use proxies whenever you’re extracting SERP data. You’re not dealing with the developers but rather with their scripts, which are designed to protect against malicious bots. Even if your extraction software is benign, those scripts can’t really stop to ask it, so it’s a bit of a wild west environment, where bots shoot first and ask questions never.

Come to think of it, that’s an apt description for the Internet in general. Data extraction is considered by many a legal gray area because it’s largely unaddressed in legislation and there’s no consensus between different states on a universal policy on web scraping. GDPR is the closest thing, and that only applies to the European Union.

How to get valuable data without stepping on any toes

Leaving regulations and search engine policies aside, it’s pretty clear that all anyone wants is to be treated nicely and fairly. You can do that while also getting the insightful data you want from SERPs.

Here are a few basic rules to follow so that you don’t accidentally cause harm:

This article has mainly covered whether you should extract data from SERPs and how to make sure you do it safely. But there’s also the matter of how to actually go about it. That part is a lot less work.

Here’s our advice: use the SearchData REST API! You even get 100 free searches to see for yourself how much a piece of software can change your strategy.