The legal status of web scraping in 2026 is less scary than the internet would have you believe, and more nuanced than a single paragraph can capture. The good news: scraping public web pages, at reasonable rates, for your own use, remains broadly lawful in the United States and the European Union. The bad news: the edges are sharp, the case law is still moving, and the wrong headline can make a cautious legal team kill a product that would have been fine.
This is a practical primer. What is settled, what is still live, and what you should actually do on Monday morning.
Not legal advice. This post is written by engineers, not lawyers. It summarises publicly reported rulings and statutes as of April 2026. If you are running a scraping operation at any real scale, or touching authenticated data, or operating in a regulated sector, talk to a lawyer in the relevant jurisdiction before you write the first line of code.
The case that set the tone: hiQ Labs v LinkedIn
The single most cited scraping ruling is still hiQ Labs, Inc. v LinkedIn Corp. The Ninth Circuit first ruled for hiQ in 2019, the Supreme Court vacated that decision in 2021 and sent the case back for reconsideration in light of Van Buren v United States, and the Ninth Circuit re-affirmed in April 2022. The headline finding survived the remand: scraping publicly accessible pages does not, by itself, constitute unauthorised access under the US Computer Fraud and Abuse Act. You cannot hack something that is already open to the public.
In November 2022, hiQ lost the rest of its case on different grounds — breach of LinkedIn's terms of service. That distinction matters. The CFAA piece says scraping a public page is not a federal crime. The contract piece says that if you clicked through terms of service while signing up, those terms can still bind you civilly.
Two practical consequences for 2026:
- If you are scraping a page that does not require a login, the CFAA is not the argument that stops you. It rarely was.
- If you created an account on a site, you probably agreed to a ToS, and that ToS probably restricts automated access. Scraping from that account is a civil contract question, not a criminal one, but "not criminal" is a lower bar than your legal team will accept.
Van Buren narrowed "exceeds authorised access"
Van Buren v United States (2021) was about a police officer who ran a license plate through a database he was allowed to use, but for a reason he was not allowed to use it for. The Supreme Court held he did not "exceed authorised access" under the CFAA. The Court rejected the broad theory that any policy violation could become a federal crime.
Translated to scraping: a site's written policy ("no automated access") does not convert normal browsing into a federal offence. It can still be a breach of contract. It can still get your IP blocked. It is not, on its own, a crime.
The EU picture: GDPR is the main constraint
In the European Union, the Computer Fraud question barely registers. The live constraint is the General Data Protection Regulation. Any time you scrape a page that contains personal data — a name, an email, a photo, a LinkedIn profile, a company "About" page with a managing director — GDPR treats you as a data controller. That pulls in three duties:
- A lawful basis for processing. For most commercial scraping, this is Article 6(1)(f), legitimate interest, and it requires a documented balancing test.
- Transparency. If you store personal data, you have to tell the people whose data it is, unless the notification is impossible or disproportionately burdensome (Article 14(5)(b)). Clay, Apollo, and ZoomInfo all have public privacy notices that attempt to thread this needle.
- Rights. Subject access, rectification, erasure. You have to be able to find a given person in your dataset and delete them on request.
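The rights duty is the one that most directly shapes how a scraped dataset is stored. As a minimal sketch, assuming records live in memory and that an email address is the only identifier you match on (both are simplifications; the `Record` type and the function names are hypothetical, not part of any product described here), lookup and erasure could look like this:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Record:
    """One scraped row containing personal data (hypothetical schema)."""
    source_url: str
    name: str
    email: str


def _normalise(value: str) -> str:
    # Compare identifiers case-insensitively and ignore stray whitespace.
    return value.strip().casefold()


def find_subject(records: List[Record], email: str) -> List[Record]:
    """Subject access: return every record that matches the given email."""
    key = _normalise(email)
    return [r for r in records if _normalise(r.email) == key]


def erase_subject(records: List[Record], email: str) -> Tuple[List[Record], int]:
    """Erasure: return the records to keep and the number removed."""
    key = _normalise(email)
    kept = [r for r in records if _normalise(r.email) != key]
    return kept, len(records) - len(kept)
```

A real dataset would likely need to match on more than one identifier and to propagate the deletion to backups and derived tables, which this sketch does not attempt.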
The 2022 Data Governance Act and the 2024 Data Act layered new rules on top of GDPR, mostly about sharing data between companies and with regulators. Neither one bans scraping; both assume it exists and regulate what you do with the result.
In September 2024, the Dutch DPA announced a fine of 30.5 million euros against Clearview AI for scraping faces without consent. That case is not really a scraping case; it is a biometrics and lawful-basis case. It matters because it shows how EU regulators think about the difference between collecting public data and using it as a data controller. Collection is rarely the problem. Storage, combination, and sale are where the cases land.
The Bright Data litigation and what it means
The Meta v Bright Data litigation in 2023 and 2024 ended up reinforcing the hiQ line. In January 2024, a US district court held that scraping public Facebook pages without logging in did not breach Meta's terms, because those terms only bind logged-in users. Meta then dropped its remaining claim rather than pursue an appeal, and the headline stuck.
In parallel, the Cloudflare v Bright Data matter worked its way through the California courts in 2025 — a commercial dispute about whether Bright Data's proxy network was being used to evade Cloudflare's bot protections in a way that interfered with Cloudflare's customers' contracts. That case is about proxy networks and interference with contract, not about scraping as such. It is worth watching because a ruling either way will affect the unit economics of residential-proxy scraping at scale. It will not change whether you can scrape a public news site from your own datacentre.
What is settled, as of April 2026
- Scraping public, unauthenticated pages at reasonable rates is broadly lawful in the US and the EU.
- The US CFAA is not the argument that stops you from scraping a public page. The Ninth Circuit's 2022 decision on remand in hiQ and the Supreme Court's 2021 Van Buren ruling both point the same way.
- Site terms of service can still bind you civilly, especially if you signed up for an account. They do not make scraping a crime. They can get you sued.
- Under GDPR, personal data pulled from public pages is still personal data. You need a lawful basis, a privacy notice, and a deletion mechanism.
- Biometric data is a different animal. Clearview, PimEyes, and every facial-recognition scraper face regulatory action in multiple jurisdictions.
What is still live
- Whether scraping at scale using residential proxies constitutes interference with the proxy target's contracts with its own CDN. See the Cloudflare v Bright Data litigation.
- Whether AI training data pulled from the open web crosses a copyright line. The US Authors Guild v OpenAI and the UK Getty v Stability AI cases are unresolved as of April 2026 and the outcomes will reshape training-data pipelines, not general-purpose scraping.
- How EU member states apply the Data Act's "access to data generated by connected products" regime to scraping-adjacent use cases.
Practical rules we follow at Stekpad
This is not legal advice, but it is the operational checklist we use for our own platform and our own customers.
- Scrape public pages first. If a URL returns 200 with no auth, you are in the easy part of the map. `POST /v1/scrape` on Stekpad targets public URLs by default.
- Respect `robots.txt`. Our `crawl` verb has `respect_robots: true` as the default. Disabling it is a paid-plan feature, gated behind a checkbox that explains the implications. We are not interested in customers who want to ignore robots at scale.
- Rate-limit yourself. Hitting a site every twenty milliseconds is how you turn a legal scraping operation into a tortious interference case. We cap concurrent requests per host and back off on 429s.
- Treat authenticated pages as the user's own data. This is the entire reason our cookie bridge exists. When you scrape `linkedin.com/in/me`, that is your LinkedIn account looking at your profile from your browser. Stekpad's extension fetches the page in your Chrome, attaches your cookies locally, and posts the HTML back to our backend. Your session cookies never touch our servers. There is no credential pool, no proxy account, no shared login. See `/docs/cookie-bridge` for the architecture.
- Never store other people's cookies. We do not hold session tokens for any site, for any user, ever. If our database is breached tomorrow, the attacker gets scrape results, not logins. That is a design property, not a policy.
- Give users a deletion button. Any dataset in Stekpad can be permanently deleted by its owner. We honour deletion requests on our own infrastructure within the windows GDPR requires.
- Do not scrape biometric data. Faces, voices, and iris patterns are a different legal regime. Our default templates refuse to land them in a dataset.
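If you run your own scraper rather than a hosted one, the `robots.txt` and rate-limit rules above can be approximated with the Python standard library. This is a rough sketch under stated assumptions, not a description of how Stekpad implements them: `USER_AGENT`, `HostThrottle`, and `backoff_delay` are hypothetical names, and the throttle spaces requests per host instead of capping concurrency, which is a simpler stand-in for the same idea.

```python
import time
from typing import Callable, Dict, Optional
from urllib import robotparser
from urllib.parse import urlsplit

USER_AGENT = "example-bot"  # hypothetical user agent string


def load_robots(robots_txt: str) -> robotparser.RobotFileParser:
    """Parse the text of a robots.txt file that was already downloaded."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: robotparser.RobotFileParser, url: str) -> bool:
    """True if robots.txt lets USER_AGENT fetch this URL."""
    return parser.can_fetch(USER_AGENT, url)


class HostThrottle:
    """Space out requests so each host is hit at most once per min_interval."""

    def __init__(self, min_interval: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.min_interval = min_interval
        self.clock = clock
        self._next_slot: Dict[str, float] = {}

    def wait_time(self, url: str) -> float:
        """Reserve the next slot for the URL's host; return seconds to sleep first."""
        host = urlsplit(url).netloc
        now = self.clock()
        slot = max(now, self._next_slot.get(host, now))
        self._next_slot[host] = slot + self.min_interval
        return slot - now


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait after a 429 response before retry number `attempt` (0-based)."""
    # Honour a Retry-After header given in seconds; the HTTP-date form is
    # not handled here and falls through to exponential backoff.
    if retry_after is not None and retry_after.strip().isdigit():
        return float(retry_after.strip())
    return min(base * (2 ** attempt), cap)
```

In a fetch loop you would skip any URL where `is_allowed` returns false, call `time.sleep(throttle.wait_time(url))` before each request, and sleep for `backoff_delay(attempt, response_headers.get("Retry-After"))` whenever the server answers 429.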
What to tell your legal team
If you are pitching a scraping project to an internal legal reviewer, these are the three questions they will ask. Be ready.
- Where is the data coming from? Public pages or authenticated pages? If authenticated, whose account is doing the fetch?
- What personal data are you storing, and under what lawful basis? You need a one-paragraph answer referencing either consent, contract, or legitimate interest. If you cannot name one of the six Article 6 bases, you are not ready.
- How does a data subject ask for deletion, and how long does it take? If the answer is "we do not have a process", you are not ready.
If you can answer all three in writing, most legal teams will sign off on public-page scraping in the US and the EU. The boring answer turns out to be the right answer.
Next steps
- Read the cookie bridge architecture doc to see why authenticated scraping with your own browser is the legally cleanest path.
- See the authenticated scraping guide for how our enrichment catalogue handles personal data.
- Review the pricing and retention table — retention windows matter for GDPR compliance.