
Why Stekpad runs authenticated scrapes in your browser

The cookie bridge architecture — why we chose to fetch pages in the user's Chrome instead of proxying cookies to a server.

Stekpad Team · 7 min read

Every scraping vendor that supports authenticated pages has the same problem. To fetch linkedin.com/in/whoever on behalf of a user, you need that user's LinkedIn session. There are two ways to get it. One is to ask the user for their cookies and store them on your servers. The other is to have the user's own browser fetch the page and hand you back the HTML.

The first approach is what almost every vendor does. It is the wrong approach. This post is about why we picked the second one, what it costs, and what you get in return.

The two architectures, side by side

The first architecture — the common one — looks like this.

```text
User --> Vendor backend --> Target site
             [cookie store]
```

The vendor backend holds a database of session cookies, one per (user, site) pair. When the user calls scrape, the backend pulls the matching cookies out of storage, attaches them to an outbound request, and fetches the page from the vendor's own IPs. It works. It is also a walking breach incident. Every cookie in that database is a fully usable login. If the database is exfiltrated, the attacker has every user's session for every site Stekpad supports.
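To make the risk concrete, here is a minimal sketch of what that cookie store amounts to. The names (`CookieRow`, `cookieHeaderFor`) are illustrative, not any vendor's actual schema — the point is that every row is a ready-to-replay login.

```typescript
// Hypothetical sketch of the server-side cookie-store architecture.
// Every row in this table is a fully usable session, at rest.

interface CookieRow {
  userId: string;
  site: string;                      // e.g. "linkedin.com"
  cookies: Record<string, string>;   // name -> value, exactly as the browser sent them
}

// Turn a stored row into a Cookie header for the outbound request.
function cookieHeaderFor(row: CookieRow): string {
  return Object.entries(row.cookies)
    .map(([name, value]) => `${name}=${value}`)
    .join("; ");
}

// The vendor backend's fetch on the user's behalf would look roughly like:
//   fetch(url, { headers: { Cookie: cookieHeaderFor(row) } });
// Anyone who can read the table can perform the same fetch.
```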

The second architecture — the cookie bridge — looks like this.

```text
User's Chrome                        Stekpad backend
+---------------+        WS         +---------------+
|   Extension   | <---------------- | Bridge worker |
|   + cookies   |                   |               |
+---------------+                   +---------------+
    |      ^
    |      |   1. Receive fetch job
    |      |   2. Fetch with own cookies
    |      +-- 3. Post back HTML only
    |
    v
Target site
```

The backend never sees the cookies. Not during the fetch, not before it, not after it. The only thing that travels from the user's browser to the Stekpad backend is the rendered HTML of the page. That is the entire architectural difference, and it changes everything downstream.

Step by step, what actually happens

Here is the full path of a `POST /v1/scrape` call with `use_session: "linkedin.com"`, written out honestly.

  1. You call the API. `POST /v1/scrape` hits api.stekpad.com with `{"url": "https://linkedin.com/in/someone", "use_session": "linkedin.com"}`. Your API key goes in the Authorization header. No cookies in the request.
  2. The backend sees `use_session` and switches mode. Instead of fetching the page from a Stekpad server, it looks up the websocket for your workspace's active Chrome extension. If no extension is connected, the call returns `session_unavailable` with a `next_action` telling you to open Chrome.
  3. The backend pushes a fetch job down the websocket. The job contains the URL, the list of browser actions (scroll, click, wait), and a run id. Zero credentials. Zero cookies. Nothing secret.
  4. The extension runs the fetch inside your own tab. It uses the browser's `fetch()` API, which attaches your existing LinkedIn cookies automatically because you are logged into LinkedIn in that browser. LinkedIn sees the request as coming from your own IP, your own User-Agent, your own session. Cloudflare, if LinkedIn has it in front, sees an entirely normal user request from a browser it already trusts.
  5. The extension posts the HTML back up the websocket. Just the HTML, plus a small envelope with the run id and any screenshots the actions asked for. The cookies stay in the browser's own cookie store, behind the standard Same-Origin Policy.
  6. The backend stores the HTML in a dataset and returns the normal scrape response to the original caller. Markdown, JSON, metadata, everything. From the caller's perspective the call looks identical to a non-authenticated scrape; the only visible difference is that the page worked where it would otherwise have hit a login wall.

One fetch. One round trip through your browser. Zero cookies on our servers.
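The extension's half of that round trip can be sketched in a few lines. The message shapes below (`FetchJob`, `FetchResult`) are assumptions based on the steps above, not the documented wire protocol — what matters is what the result envelope contains, and what it does not.

```typescript
// Sketch of the extension side of the cookie bridge, under assumed message shapes.

interface FetchJob {
  run_id: string;
  url: string;
  actions?: Array<{ type: "scroll" | "click" | "wait"; selector?: string; ms?: number }>;
}

interface FetchResult {
  run_id: string;
  html: string;
  // Note what is absent: no cookies, no credentials of any kind.
}

// Build the envelope posted back up the websocket: HTML only.
function buildResult(job: FetchJob, html: string): FetchResult {
  return { run_id: job.run_id, html };
}

// Inside the extension, the fetch itself would look roughly like:
//   const res = await fetch(job.url, { credentials: "include" }); // browser attaches your cookies
//   ws.send(JSON.stringify(buildResult(job, await res.text())));
// The cookies ride along inside the browser and never enter the envelope.
```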

The security argument: zero stored cookies is zero breach surface

This is the first-order reason we picked the architecture. If Stekpad's database were fully exfiltrated tomorrow — every row, every encrypted blob, every backup — the attacker would get scrape results and API keys. They would not get a single session cookie for a single third-party site, because there are none to get.

You cannot leak what you do not store. That is not a policy. It is a wiring diagram. The storage layer does not have a column for session cookies. There is no bucket labelled "linkedin_sessions". There is no service account with read access to a cookie jar, because there is no cookie jar.

Compare this to a vendor that runs fetches on their own servers. Their security model has to include credential storage, rotation, encryption at rest, access control on the decryption key, and an audit trail on who pulled which cookie when. All of that is work they have to do correctly, forever. A single mistake in any of those layers costs every customer their session on every integrated site.

We do not do any of that work, because we do not hold the thing.

The legal argument: your session, your browser

When your own Chrome browser fetches a page you were already logged into, the legal question is about as simple as it ever gets for scraping: you are looking at your page with your session. You consented to LinkedIn's terms when you signed up. You consented to Stekpad's extension when you installed it. No third party is impersonating you.

Compare this to a vendor that holds your cookies and fetches pages from a data centre on your behalf. Now there is a third party sitting in the middle, using credentials you handed over, hitting the target site from IPs that are not yours. Depending on the jurisdiction and the target site's ToS, that can look like unauthorised automated access performed by the vendor, using credentials they were not supposed to persist. We read the hiQ and Van Buren line of cases (see `/blog/web-scraping-legality-2026`) and decided we did not want to be the third party.

Your session, fetched in your browser, is the cleanest possible legal shape for authenticated scraping. It is not a loophole. It is a straight line.

The performance argument: you look like yourself

Cloudflare, Akamai, DataDome, PerimeterX, Kasada — every bot-detection vendor on earth is in an arms race against scrapers that rotate IPs and forge fingerprints. The way they win is by noticing patterns that a normal user does not produce. A data centre IP making ten LinkedIn requests per second from a fresh User-Agent with no browser history is not a normal user.

Your own Chrome, on your own home or office network, with your own cookies and your own TLS fingerprint, is the most normal user on earth. You are you. There is no ruse to catch.

This is why authenticated scrapes through the cookie bridge work on sites that block server-side scrapers entirely. It is not because we have a trick. It is because we are not trying to pretend to be a real user. The browser fetching the page is the real user, because you are the one logged in. The detection vendors have nothing to flag.

The request from the cookie bridge is indistinguishable from a human opening a tab. That is the whole point. Not because we spoof it well. Because it is a human opening a tab, and the machine on the other end is the same machine the human was already using.

The honest tradeoff: you need the extension open

There is no free lunch. The cookie bridge only works when your Chrome is running and the Stekpad extension is connected to the backend websocket. If you close Chrome, your authenticated scrapes fail with session_unavailable.

We considered every way around this. A headless copy of your browser running on our servers — same security footprint as storing cookies. A "persistent session token" that can be replayed later — same thing, dressed up. A worker that runs on your own laptop even when Chrome is closed — doable, but crosses a line between "browser extension" and "local agent" that we did not want to cross on the first version.

So we ship the trade honestly. Public scrapes work 24/7 on our servers. Authenticated scrapes need your browser to be open. For most teams, that is fine — you are at your machine during the day, the scheduled crawls run while you are there, and the few that fail outside business hours get retried the next morning. For a 24/7 automated workflow against a logged-in site, the cookie bridge is not the right tool and we tell you that.
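In practice the failure mode is easy to handle on the caller's side: retry, and surface the backend's hint when the browser stays offline. The `scrape()` signature and error shape below are assumptions drawn from this post, not a documented SDK.

```typescript
// Hedged sketch of caller-side handling for session_unavailable.

interface ScrapeResponse {
  error?: "session_unavailable";
  next_action?: string;   // e.g. a hint to open Chrome
  markdown?: string;
}

// Retry a scrape a few times in case the user's Chrome reconnects;
// otherwise return the last response so the caller can show next_action.
async function scrapeWithRetry(
  scrape: () => Promise<ScrapeResponse>,
  attempts = 3,
): Promise<ScrapeResponse> {
  let last: ScrapeResponse = { error: "session_unavailable" };
  for (let i = 0; i < attempts; i++) {
    last = await scrape();
    if (last.error !== "session_unavailable") return last;
  }
  return last; // still unavailable: the browser is closed, tell the user
}
```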

The trade, summarised

| Property | Server-side cookies | Stekpad cookie bridge |
|---|---|---|
| Stored cookies on vendor infra | Yes | Zero |
| Breach blast radius | Every user's sessions | None |
| Works when you are asleep | Yes | No |
| Fingerprint / IP match to a real user | Usually not | Always |
| Legal "whose session is this" answer | The vendor's, on your behalf | Yours |
| Supports Cloudflare-protected sites | Sometimes | Yes |

We picked the right-hand column on every row except one. We think that is the correct trade for a 2026 product.
