LinkedIn Scraping Legal Case: Is It Legal to Scrape Data?
Many people still consider web scraping to be in a gray area. Are you within your legal rights to scrape data from a web page? Where is the line between legal and illegal when you are using a web scraper? The recent LinkedIn legal case made scraping a little less gray, but what does this mean to you?
Web Scraping Is a Big Part of Many Businesses
Web scraping is a method of harvesting data from the Internet. This can be done manually by visiting web pages, then copying and pasting data into a spreadsheet. However, most web scraping is done in an automated fashion. Software is set up to access a website, a list of websites, or a list of pages, and retrieve data from them, which is then stored in a database for later analysis.
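The automated workflow described above (access a list of pages, retrieve data from them, store it in a database) can be sketched with Python's standard library alone. This is a minimal illustration, not any particular company's scraper. The `class="title"` and `class="price"` markup, the `pages` table, and the helper names `fetch`, `parse`, `store`, and `scrape` are hypothetical assumptions chosen for the example.

```python
import sqlite3
import urllib.request
from html.parser import HTMLParser


class TitleAndPriceParser(HTMLParser):
    """Collects text from elements with class="title" or class="price".

    The class names are hypothetical; every real site uses its own markup.
    """

    def __init__(self):
        super().__init__()
        self._current = None  # field we are currently inside, if any
        self.record = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for field in ("title", "price"):
            if field in classes:
                self._current = field

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current field, so
        # nested markup inside a field is only partly captured.
        self._current = None

    def handle_data(self, data):
        if self._current and data.strip():
            self.record[self._current] = self.record.get(self._current, "") + data.strip()


def fetch(url):
    """Download a page as text (no retries or error handling in this sketch)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def parse(html):
    parser = TitleAndPriceParser()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.record


def store(conn, url, record):
    """Save one scraped record so it can be analyzed later."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, title TEXT, price TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO pages VALUES (?, ?, ?)",
        (url, record.get("title"), record.get("price")),
    )
    conn.commit()


def scrape(urls, conn, fetch_page=fetch):
    """Visit each URL in the list, extract the fields, and store them."""
    for url in urls:
        store(conn, url, parse(fetch_page(url)))
```

Passing `fetch_page` as a parameter lets the same pipeline run against saved HTML (for testing) instead of the live network.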
The truth is that Google is itself a web scraper. Google has software that crawls the Internet, parses web pages, and stores the results so they can be served when someone makes a Google search. This type of web scraping is often overlooked because it usually benefits the owners of the websites being scraped by bringing them more web traffic, though Google has faced lawsuits of its own.
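The crawl-parse-store loop a search engine runs can be reduced to a toy version: start from one URL, follow the links found on each page breadth-first, and record which pages contain which words so a "search" is just a lookup. This is a greatly simplified sketch, not how Google actually works; `crawl`, `search`, and `LinkAndTextParser` are hypothetical names, and the `fetch_page` function is assumed to return a page's HTML or `None`.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkAndTextParser(HTMLParser):
    """Gathers outgoing links and lowercase words from one page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        # Simplification: script and style text is indexed too.
        self.words.extend(w.lower() for w in data.split())


def crawl(start_url, fetch_page, max_pages=100):
    """Breadth-first crawl; returns an inverted index of word -> set of URLs."""
    index = {}
    seen = {start_url}
    queue = deque([start_url])
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        html = fetch_page(url)
        fetched += 1
        if html is None:
            continue
        parser = LinkAndTextParser()
        parser.feed(html)
        parser.close()
        for word in parser.words:
            index.setdefault(word, set()).add(url)
        for href in parser.links:
            # Resolve relative links and drop #fragments before queueing.
            link, _fragment = urldefrag(urljoin(url, href))
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


def search(index, word):
    """Return the URLs containing word, in sorted order."""
    return sorted(index.get(word.lower(), set()))
```

Building the index once and answering queries from it is what lets results be "served" instantly instead of re-scraping the web on every search.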
In fact, many companies depend on the data they scrape from the Internet, though most do not have the clout of a search engine giant like Google. Here are some examples of businesses that use web scraping:
- Real estate companies scrape MLS listings to populate their own websites.
- Comparison shopping sites scrape product information and pricing from e-commerce websites.
- Social media sentiment analysis tools scrape data about products and services from a variety of social media sites.
- Many companies in various industries use web scraping to generate sales leads.
HiQ Labs didn't do much differently than these other companies. They just happened to scrape a large part of their data from the LinkedIn website, which resulted in a legal case.
Details of the HiQ Labs, Inc. v. LinkedIn Corp. Legal Case
This court case may have changed what is legal when it comes to web scraping. The court ruled in favor of HiQ Labs, a company whose business model depends on web scraping in general and scraping LinkedIn in particular. Let's look at the details to see how this came about.
Introduction
This story starts in 2017, when LinkedIn served HiQ Labs with a cease-and-desist letter. The reason? LinkedIn wanted to stop HiQ from accessing and copying data from LinkedIn servers. HiQ Labs responded by filing suit against LinkedIn, arguing that the information it was accessing was available to the public. HiQ wanted to prevent LinkedIn from invoking the Computer Fraud and Abuse Act (CFAA), the Digital Millennium Copyright Act (DMCA), California Penal Code § 502(c), or the California common law of trespass against it. HiQ also pointed out that LinkedIn was actively blocking its access to the site.
Who Is HiQ Labs?
To understand why this ruling is important, it helps to know about HiQ Labs. HiQ is a data analytics company that helps companies retain their employees by applying data analytics and machine learning to employee data. A big part of its business involved using web scraping to collect publicly available data on LinkedIn users. It fed this data into its "people analytics" predictive algorithms, which help companies identify skill gaps and retain their best employees.
LinkedIn claimed that HiQ's actions violated its user agreement and put its users’ privacy at risk. The Ninth Circuit Court of Appeals issued its ruling on September 9, 2019.
The Court's Ruling
The court sided with HiQ Labs. It upheld a preliminary injunction requiring LinkedIn to withdraw its cease-and-desist letter and stop blocking HiQ Labs from accessing the LinkedIn website.
Here are the key points to this ruling:
- HiQ did not violate the Computer Fraud and Abuse Act. This was because the data that HiQ Labs was scraping was available to anyone who browsed the LinkedIn website. In other words, HiQ did not have to log in with a username and password to access the data.
- Companies should not be able to revoke authorization where no authorization is needed in the first place.
- Companies like LinkedIn don't have the right to decide who can and can't access publicly available information, because that would be contrary to the public interest.
- LinkedIn's blocking of HiQ could cause irreparable harm to HiQ, because access to LinkedIn data was key to HiQ's business model.
- HiQ's actions did not threaten LinkedIn users' privacy or jeopardize their trust enough to outweigh HiQ's interest in running its business.
- Because LinkedIn was planning to offer an analytics tool similar to HiQ's, the case raised concerns about whether LinkedIn was competing fairly.
- The court concluded that the method HiQ Labs used to collect data from LinkedIn is the same method that researchers and academics use, and that ruling in favor of HiQ served the public interest.
Is It Legal to Scrape Data from the Web?
In this LinkedIn case, scraping wasn't illegal because the data that HiQ Labs was scraping was information that anyone could access on the LinkedIn website without logging in. So web scraping in itself is not illegal, but you can still run into issues if you don't do your due diligence. Here are some tips to ensure that your web scraping activities are legal:
- Consult the site's terms of service: You may agree to a site's terms of service simply by browsing the site. If that website forbids automated data collection, scraping the site will violate the terms of service and may result in legal action.
- Only scrape publicly available information: HiQ Labs was in the right because the data it scraped was publicly available. If you have to log in to a website to scrape data, you are accessing private information and could be breaking the law.
- Watch out for copyrighted content: Copyrighted content is protected and the sites you are scraping may have rights to that content that you don't have. This data can include designs, images, videos, and articles that could be considered creative work.
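Of these tips, the login rule is the one a scraper can partly check on its own. The sketch below is a rough heuristic, assumed rather than standard: it flags a response as probably login-gated if the server refused access, redirected to a URL whose path looks like a login page, or returned a page containing a password field. The function name `looks_login_gated` and the `LOGIN_PATH_HINTS` values are hypothetical. It cannot read a site's terms of service or judge copyright, so those checks remain manual.

```python
from urllib.parse import urlparse

# Hypothetical path fragments that often indicate a login page; tune per site.
LOGIN_PATH_HINTS = ("login", "signin", "sign-in", "authwall")
# Status codes that mean the server refused to serve the page anonymously.
LOGIN_STATUS_CODES = {401, 403}


def looks_login_gated(status, requested_url, final_url, html):
    """Heuristic guess at whether a page sits behind a login wall.

    A False result does NOT prove the data is public, or that scraping it
    is permitted by the site's terms of service.
    """
    if status in LOGIN_STATUS_CODES:
        return True
    if final_url != requested_url:
        # We were redirected; check whether we landed on a login-like path.
        path = urlparse(final_url).path.lower()
        if any(hint in path for hint in LOGIN_PATH_HINTS):
            return True
    # A password input on the page is a strong sign of a login form.
    return 'type="password"' in html.lower()
```

A scraper would call this after each request and skip any page for which it returns `True`.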
How to Scrape Data More Effectively
Even when you have a legal right to scrape a site, that doesn't mean the owners of that site won't make it hard for you to do so. For example, Google's platform is legal to scrape since you don't have to log in to use the service, but Google and many other websites have protections in place to prevent scraping. They may use CAPTCHAs, block your IP address, or do both if they detect a lot of traffic coming from the same IP.
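Because blocking is usually triggered by a burst of traffic from one IP, a simple complement to any other measure is to space out requests to each site. The sketch below enforces a minimum delay between requests to the same host. The class name `HostThrottle` and the 2-second default are assumptions; sites do not publish the threshold at which they start serving CAPTCHAs or blocking addresses. The clock and sleep functions are injectable so the timing logic can be tested without actually waiting.

```python
import time
from urllib.parse import urlparse


class HostThrottle:
    """Enforces a minimum delay between requests to the same host.

    min_interval is an assumed value, not a documented site limit.
    """

    def __init__(self, min_interval=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = {}  # host -> time of the most recent request

    def wait(self, url):
        """Block until it is polite to send another request to url's host."""
        host = urlparse(url).netloc
        now = self._clock()
        last = self._last.get(host)
        if last is not None:
            remaining = self.min_interval - (now - last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last[host] = now
```

Calling `throttle.wait(url)` just before each fetch keeps per-host traffic under the chosen rate, while requests to different hosts are not delayed by one another.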
The best way to prevent websites from blacklisting your IP address when you want to scrape data is by using residential proxies. For enterprises that do large-scale data harvesting and collection, a proxy service provider is a necessity because it reduces failure rates, blocks, and throttling. Residential proxy services rotate the IP you are using automatically, so each new scraping attempt looks like it is coming from a different user.
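When a provider hands you a list of proxy endpoints rather than rotating behind a single gateway, the rotation can be done on the client side. The sketch below cycles through a pool round-robin using the standard library's `urllib.request.ProxyHandler` and `build_opener`. The `ProxyRotator` class and `fetch_via_proxy` helper are hypothetical, and the proxy URLs you pass in are placeholders for whatever endpoints and credentials your provider actually supplies.

```python
import itertools
import urllib.request


class ProxyRotator:
    """Round-robin rotation over a pool of proxy URLs (placeholders)."""

    def __init__(self, proxy_urls):
        if not proxy_urls:
            raise ValueError("need at least one proxy")
        self._cycle = itertools.cycle(proxy_urls)

    def next_proxy(self):
        """Return the next proxy URL in the pool, wrapping around at the end."""
        return next(self._cycle)

    def next_opener(self):
        """Build a urllib opener that routes HTTP and HTTPS through the next proxy."""
        proxy = self.next_proxy()
        handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        return urllib.request.build_opener(handler)


def fetch_via_proxy(rotator, url, timeout=10):
    """Fetch url through the next proxy, so consecutive requests use different IPs."""
    opener = rotator.next_opener()
    with opener.open(url, timeout=timeout) as resp:
        return resp.read()
```

Round-robin is the simplest policy; a production setup might also drop proxies that start failing, but that is beyond this sketch.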
Summary
Web scraping has frequently operated in the gray areas of technology. In the past, it was hard to tell whether it was legal. The LinkedIn case drew a much more distinct line between legal and illegal web scraping. If you are scraping data that is publicly available to anyone who browses the Internet, you are not breaking the law. But that does not mean the websites you scrape won't try to block you, which is why you need to use a residential proxy provider to scrape more effectively.