You have a list of URLs. You want the scraped rows to land in a Google Sheet your team already opens every morning. Not a CSV download, not a Notion page, not a JSON file in an S3 bucket. A Sheet. This post walks through three ways to do that with Stekpad, the code for each, the credit cost, and when to pick which one.
The three paths are:
- The Stekpad web app's native Sheets sync (two clicks, no code).
- A `POST /v1/scrape` call plus the Google Sheets API, run from your own script.
- An MCP conversation with Claude where the model calls `scrape` on Stekpad and `sheets.append` on a Sheets MCP server.
All three end up at the same place: rows in a tab. The difference is who drives the loop, how often it runs, and whether you want to touch a keyboard at all.
Path A — The web app's native Sheets sync
If you already use the Stekpad web app, this is the shortest path. Every dataset in the app has an Export dropdown in the toolbar, and one of the options is Google Sheets. The flow:
- Run a scrape or crawl. Every run lands in a dataset. Let's say you crawled `https://news.ycombinator.com` with `limits.max_pages: 50` and the rows are sitting in a dataset called HN Front Page.
- Open the dataset, click Export, pick Google Sheets.
- The first time, you consent to a Google OAuth scope that only includes `https://www.googleapis.com/auth/drive.file` — which means Stekpad can only create files and edit the files it created. It cannot read your existing Drive.
- Pick New sheet or Append to existing. If you pick append, paste the Sheet URL and the tab name.
- Hit Sync. Every column on the dataset becomes a column in the Sheet. Enricher columns are included. Hidden columns are skipped.
The sync has two modes. One-shot writes the current state of the dataset and stops. Live sync re-pushes the dataset every time it changes — new row scraped, new enricher run, row edited by hand. Live sync is throttled to one push per minute so you do not burn your Google Sheets API quota on a 10k-row dataset.
Credit cost: the sync itself is free. You only pay for the scrape credits that populated the dataset. A 50-page crawl of Hacker News is 50 credits. The Sheets push is 0.
When Path A is the right answer: you are a non-developer, or you are a developer who wants the result in a Sheet without writing a second service. The web app is doing the work, including the OAuth dance. You can keep the sync live, kill it from the toolbar, or disconnect the Google account from Settings → Integrations.
Pros: no code, no OAuth setup, live sync for free, uses the drive.file scope so Stekpad never sees any other file in your Drive. Cons: you have to run scrapes from the web app, not from a cron script on your own box.
Path B — The REST API plus the Google Sheets API
If you are building a pipeline — something that runs on a schedule, lives in your own repo, and writes to a specific Sheet you already own — you want Path B. You call `POST /v1/scrape` on Stekpad, you call `spreadsheets.values.append` on Google, you glue them together.
Here is the full example in Python. It scrapes ten product URLs, extracts name and price with a schema, and appends the rows to a tab called Products.
```python
import os

import requests
from google.oauth2.credentials import Credentials
from googleapiclient.discovery import build

STEKPAD_API_KEY = os.environ["STEKPAD_API_KEY"]
SHEET_ID = "1AbCd...yourSheetId"
TAB_NAME = "Products"

URLS = [
    "https://example.com/p/1",
    "https://example.com/p/2",
    # ...eight more
]

PRODUCT_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
        "sku": {"type": "string"},
    },
    "required": ["name", "price"],
}

def scrape_one(url: str) -> dict:
    r = requests.post(
        "https://api.stekpad.com/v1/scrape",
        headers={"Authorization": f"Bearer {STEKPAD_API_KEY}"},
        json={
            "url": url,
            "formats": ["json"],
            "schema": PRODUCT_SCHEMA,
            "persist": False,
        },
        timeout=60,
    )
    r.raise_for_status()
    return r.json()["json"]

rows = [
    [row["name"], row["price"], row.get("sku", "")]
    for url in URLS
    for row in [scrape_one(url)]
]

creds = Credentials.from_authorized_user_file("token.json")
sheets = build("sheets", "v4", credentials=creds)

sheets.spreadsheets().values().append(
    spreadsheetId=SHEET_ID,
    range=f"{TAB_NAME}!A:C",
    valueInputOption="USER_ENTERED",
    body={"values": rows},
).execute()

print(f"Appended {len(rows)} rows to {TAB_NAME}.")
```

Ten URLs with an extract schema cost 5 credits each, so 50 credits total. If you drop the schema and take the markdown yourself, it drops to 1 credit per URL, or 10 credits total. Pick the trade-off you want.
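If you want to budget a batch before running it, the per-URL prices stated above reduce to a one-line helper. This is a sketch based on the pricing in this post, not an official Stekpad function:

```python
def scrape_credits(n_urls: int, use_schema: bool) -> int:
    """Credit cost for a batch of scrapes: 5 credits per URL with an
    extract schema, 1 credit per URL for plain markdown."""
    return n_urls * (5 if use_schema else 1)

print(scrape_credits(10, use_schema=True))   # the script above: 50 credits
print(scrape_credits(10, use_schema=False))  # markdown-only variant: 10 credits
```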
A note on `persist: false`. The call above sets `persist: false`, which means Stekpad returns the scrape by value and does not store a row in a dataset. If you want the data in a Stekpad dataset and in the Sheet, drop that flag and the row lands in both places.
The Sheets half of the script uses the standard `google-api-python-client` flow. If you have not set up OAuth before, the Google quickstart at developers.google.com/sheets/api/quickstart/python gets you a `token.json` in about five minutes. After that, every subsequent run of the script reuses the token.
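If it helps to see that token reuse in code, here is a minimal sketch of the quickstart's caching pattern. It assumes `google-auth-oauthlib` is installed and a `credentials.json` downloaded from the Google Cloud console, as the quickstart describes — those names are Google's, not Stekpad's:

```python
from pathlib import Path

# Scope for reading and writing spreadsheets.
SCOPES = ["https://www.googleapis.com/auth/spreadsheets"]

def get_sheets_creds(token_path: str = "token.json"):
    """Reuse token.json if the quickstart already wrote one;
    otherwise run the one-time browser consent flow and cache it."""
    p = Path(token_path)
    if p.exists():
        # Subsequent runs: load the cached token, no browser needed.
        from google.oauth2.credentials import Credentials
        return Credentials.from_authorized_user_file(token_path, SCOPES)
    # First run: launch the consent flow from credentials.json.
    from google_auth_oauthlib.flow import InstalledAppFlow
    flow = InstalledAppFlow.from_client_secrets_file("credentials.json", SCOPES)
    creds = flow.run_local_server(port=0)
    p.write_text(creds.to_json())  # cache for next time
    return creds
```

In the script above you would replace the `Credentials.from_authorized_user_file("token.json")` line with `creds = get_sheets_creds()`.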
Pros: runs on your machine, runs on a cron, lives in your repo, writes to a Sheet you own with your own OAuth credentials. Cons: two APIs to manage, two rate limits, two sets of error handling. If the Sheet is deleted, the script breaks — Stekpad cannot recover it for you.
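One cheap way to tame the two sets of error handling is a single retry wrapper shared by both halves of the pipeline. A sketch — in the real script the exception tuple would be `requests.HTTPError` for the Stekpad call and `googleapiclient.errors.HttpError` for the Sheets append:

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0, retry_on=(Exception,)):
    """Call fn(), retrying on retry_on with exponential backoff.
    Re-raises after the final attempt so failures still surface
    in your own error logs."""
    for i in range(attempts):
        try:
            return fn()
        except retry_on:
            if i == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** i)  # 1s, 2s, 4s, ...

# Usage sketch against the script above:
# row = with_retries(lambda: scrape_one(url), retry_on=(requests.HTTPError,))
# with_retries(lambda: append_request.execute())
```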
When to reach for the TypeScript SDK
If you are in a Node codebase, the TypeScript SDK shape is identical. Swap `requests.post` for `await sp.scrape(...)`, swap the Google client for `googleapis`. The contract — scrape returns a row, you push the row to Sheets — stays the same.
Path C — Ask Claude to do it over MCP
This is the shortest path for an agent. If you have already installed the Stekpad MCP server in Claude Desktop (see /docs/mcp), and you have also installed one of the Google Sheets MCP servers floating around npm, Claude can compose them. You type one sentence, the model picks the tools.
Your Claude Desktop config looks like this:
```json
{
  "mcpServers": {
    "stekpad": {
      "command": "npx",
      "args": ["-y", "@stekpad/mcp"],
      "env": { "STEKPAD_API_KEY": "stkpd_live_..." }
    },
    "sheets": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-google-sheets"],
      "env": { "GOOGLE_OAUTH_TOKEN": "ya29..." }
    }
  }
}
```

Restart Claude Desktop. You now have both servers connected. Then you say, in the chat:
> Scrape the first ten results from `https://news.ycombinator.com`, extract the title and URL of each story, and append them as rows to the `HN` tab of my sheet at `1AbCd...yourSheetId`.
Claude calls `stekpad.scrape` with a schema, gets back ten rows, calls `sheets.append` with the values, and reports the credit cost and the Sheet link in the chat. The whole thing runs in your Claude window. If you want to do it again tomorrow with a different URL, you type one more sentence.
Credit cost: 1 per scrape for a basic fetch, 5 per URL if you ask for an extract schema. Reads against existing Stekpad datasets (`list_datasets`, `get_dataset`, `query_dataset`) are free — so if you already scraped the page yesterday, Claude can query the existing dataset and push those rows to a Sheet for zero Stekpad credits.
Pros: zero code, works from inside any chat, composes with other MCP servers (Sheets, Notion, Slack, whatever you have installed). Cons: you need the Claude window open, and the chat context is the control plane, so auditing becomes a matter of scrolling back through the conversation.
How to pick
Short version:
- Path A if you are in the Stekpad app. The sync button already exists. Use it. You get live updates for free.
- Path B if you are building a pipeline. You want cron, you want error logs in your own stack, you want one Sheet wired to one script in your repo.
- Path C if you are an agent or you talk to agents. The composition is the whole point. Claude picks the tools, and the next request is a different sentence, not a different script.
The three paths all hit the same POST /v1/scrape endpoint under the hood. The only difference is who holds the API key and how the result reaches the Sheet. If you change your mind later — you start in the app, then you graduate to a pipeline, then you hand the pipeline to an agent — no data moves, because the Stekpad dataset is the same object in all three worlds.
Next steps
- Read the REST API reference for scrape to see every field on the request and response.
- Install the MCP server for Claude, Cursor, and Claude Code if you want to drive Path C.
- See the full pricing table for credit costs on `scrape`, `extract`, and `crawl`.