Why SiteRows? Query any website with SQL, with no scraping code required.
How it works: Enter URL(s) → write a query → get structured data → automate via API when needed.
How it’s useful: The simplest “page → table” query: get link text and destination URLs in one shot.
URL
SQL
SELECT "TEXT", "RESOLVEDHREF"
FROM @a
WHERE length(trim(coalesce("TEXT", ''))) > 0
ORDER BY "INDEX";
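For a sense of what this query replaces, here is a rough hand-written equivalent in Python. It is a sketch, not a description of how SiteRows works internally: the `LinkTable` class and the `link_rows` function are hypothetical names, the HTML is passed in as a string, and the code assumes that "RESOLVEDHREF" means the href resolved against the page URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkTable(HTMLParser):
    """Collect one (text, resolved_href) row per <a href> element, in page order."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.rows = []       # finished rows
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            if text:  # same idea as the WHERE clause: skip links with no visible text
                self.rows.append((text, urljoin(self.base_url, self._href)))
            self._href = None


def link_rows(html, base_url):
    """Return link rows for an HTML string (hypothetical helper, not a SiteRows API)."""
    parser = LinkTable(base_url)
    parser.feed(html)
    parser.close()
    return parser.rows
```

Even this small version has to track open tags, join text fragments, and resolve relative URLs, which is the work the four-line query above takes care of.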
How it’s useful: An instant “what’s on this page?” summary that counts links, images, headings, list items, and meta tags. Useful for QA, monitoring, and quick comparisons across pages.
URL
SQL
SELECT 'Number of links' AS description, count(*) AS count FROM @a
UNION ALL
SELECT 'Number of images' AS description, count(*) AS count FROM @img
UNION ALL
SELECT 'Number of headings' AS description, count(*) AS count FROM @headings
UNION ALL
SELECT 'Number of list items' AS description, count(*) AS count FROM @li
UNION ALL
SELECT 'Number of meta tags' AS description, count(*) AS count FROM @meta
ORDER BY description;
How it’s useful: Pull out links from a trusted page to find good sources to crawl, save, and use later (including for RAG).
URL
SQL
SELECT "URL", "TEXT", "RESOLVEDHREF"
FROM @a
WHERE "RESOLVEDHREF" LIKE 'http%'
ORDER BY "URL", "RESOLVEDHREF"
LIMIT 200;
How it’s useful: On a busy reference page, the same destination is often linked many times (nav, “see also,” specs). This query ranks destinations by how often they are linked.
URL
SQL
SELECT "RESOLVEDHREF", count(*) AS n
FROM @a
WHERE "RESOLVEDHREF" LIKE 'http%'
GROUP BY "RESOLVEDHREF"
ORDER BY n DESC
LIMIT 50;
How it’s useful: Turn a page into a clean outline so you can split content by section (great for embeddings / RAG).
URL
SQL
SELECT "URL", "TAG", "TEXT"
FROM @headings
WHERE "TAG" IN ('h1','h2','h3')
ORDER BY "URL"
LIMIT 200;
How it’s useful: Pull the external sites people cite in answers (docs, blog posts, tools). Good for research or RAG.
URL
SQL
SELECT "URL", "TEXT", "RESOLVEDHREF"
FROM @a
WHERE "RESOLVEDHREF" LIKE 'http%'
AND "RESOLVEDHREF" NOT LIKE 'https://stackoverflow.com/%'
AND length(trim(coalesce("TEXT", ''))) >= 8
ORDER BY length("TEXT") DESC
LIMIT 50;
How it’s useful: Pull Open Graph fields (title, image, type, …) from meta tags. Handy for link previews, SEO checks, or structured hints for RAG.
URL
SQL
SELECT "URL", "property", "content"
FROM @meta
WHERE coalesce("property", '') LIKE 'og:%'
ORDER BY "URL", "property"
LIMIT 100;
How it’s useful: Get an image inventory and spot missing/weak alt text for accessibility and QA.
URL
SQL
SELECT "URL", "ALT", length(coalesce("ALT", '')) AS "ALTLENGTH"
FROM @img
ORDER BY ("ALT" IS NULL) DESC,
length(coalesce("ALT", '')) ASC
LIMIT 200;
How it’s useful: Standard HTML ties a label to a field by matching the label’s for attribute to the field’s id. Joining the label and input tables lists those pairs for a forms inventory or accessibility checks.
URL
SQL
SELECT l."URL",
       l."TEXT" AS label_text,
       l."for" AS label_for_attr,
       i."id" AS input_id,
       i."type" AS input_type,
       i."name" AS input_name
FROM @label l
JOIN @input i
  ON l."URL" = i."URL"
 AND trim(coalesce(l."for", '')) != ''
 AND trim(coalesce(l."for", '')) = trim(coalesce(i."id", ''))
ORDER BY l."URL", l."for"
LIMIT 200;
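The same pairing can be sketched by hand in Python, which may help when checking what the join should return. This is an illustration only: `FormFields` and `label_input_pairs` are hypothetical names, the HTML is passed in as a string, and, like the query, it skips labels that have no for attribute.

```python
from html.parser import HTMLParser


class FormFields(HTMLParser):
    """Collect <label> and <input> elements from one HTML document."""

    def __init__(self):
        super().__init__()
        self.labels = []     # dicts with "for" and "text"
        self.inputs = []     # dicts with "id", "type", "name"
        self._label = None   # label currently open, if any

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "label":
            self._label = {"for": (a.get("for") or "").strip(), "text": ""}
        elif tag == "input":
            self.inputs.append({
                "id": (a.get("id") or "").strip(),
                "type": a.get("type"),
                "name": a.get("name"),
            })

    def handle_data(self, data):
        if self._label is not None:
            self._label["text"] += data

    def handle_endtag(self, tag):
        if tag == "label" and self._label is not None:
            self._label["text"] = self._label["text"].strip()
            self.labels.append(self._label)
            self._label = None


def label_input_pairs(html):
    """Return (label_text, label_for, input_type, input_name) rows, sorted by label_for."""
    parser = FormFields()
    parser.feed(html)
    parser.close()
    pairs = [
        (lab["text"], lab["for"], inp["type"], inp["name"])
        for lab in parser.labels if lab["for"]            # non-empty for, as in the join
        for inp in parser.inputs if inp["id"] == lab["for"]
    ]
    return sorted(pairs, key=lambda row: row[1])
```

Inputs with no matching label do not appear in the result, which matches the inner join in the query; an accessibility audit would usually want to list those separately.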
How it’s useful: See how “deep” a doc is (H1/H2/H3/H4) so you can choose chunk sizes and structure for RAG.
URL
SQL
SELECT "TAG" AS heading_level, count(*) AS n
FROM @headings
WHERE "TAG" IN ('h1','h2','h3','h4')
GROUP BY "TAG"
ORDER BY heading_level;
How it’s useful: Build a clean list of posts to crawl and ingest (easy way to grow a RAG corpus).
URL
SQL
SELECT "URL", "TEXT", "RESOLVEDHREF"
FROM @a
WHERE "RESOLVEDHREF" LIKE 'http%'
AND length(trim(coalesce("TEXT", ''))) >= 12
AND "RESOLVEDHREF" NOT LIKE '%/tag/%'
AND "RESOLVEDHREF" NOT LIKE '%/authors/%'
ORDER BY "RESOLVEDHREF"
LIMIT 200;
How it’s useful: Auto-detect the “main” list on a page (the one with the most items), then pull its items as clean rows. Great for checklists, directories, or “all items” pages.
URL
SQL
WITH li AS (
  SELECT "URL",
         "PARENT",
         "INDEX",
         trim(coalesce("TEXT", '')) AS item_text
  FROM @li
  WHERE length(trim(coalesce("TEXT", ''))) > 0
),
top_list AS (
  SELECT "URL", "PARENT", count(*) AS list_size
  FROM li
  GROUP BY "URL", "PARENT"
  ORDER BY list_size DESC
  LIMIT 1
)
SELECT li."URL",
       li.item_text,
       top_list.list_size
FROM li
JOIN top_list
  ON li."URL" = top_list."URL"
 AND li."PARENT" = top_list."PARENT"
ORDER BY li."INDEX"
LIMIT 200;
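The two-step logic of this query (count non-empty items per parent, then keep only the largest group) can be sketched in plain Python. The `main_list` function and the (parent, index, text) row shape are assumptions made for illustration, not actual SiteRows output, and the sketch handles a single page rather than grouping by URL.

```python
from collections import Counter


def main_list(rows):
    """Given (parent, index, text) rows for list items on one page,
    return (item_text, list_size) rows for the largest list, in page order."""
    # Step 0: trim text and drop empty items, like the first CTE.
    cleaned = [(parent, index, text.strip())
               for parent, index, text in rows
               if text and text.strip()]
    if not cleaned:
        return []
    # Step 1: count items per parent and pick the biggest group, like top_list.
    top_parent, list_size = Counter(parent for parent, _, _ in cleaned).most_common(1)[0]
    # Step 2: keep only that group's items, ordered by their position on the page.
    return [(text, list_size)
            for parent, _, text in sorted(cleaned, key=lambda row: row[1])
            if parent == top_parent]
```

As in the query, a tie between two equally large lists is broken arbitrarily, so pages with several lists of the same size may need an extra rule.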