Use Cases — SiteRows

Instant dataset: pull all links from a page

How it’s useful: The simplest “page → table” query: get link text and destination URLs in one shot.

URL

SQL

SELECT "TEXT", "RESOLVEDHREF" 
FROM @a 
WHERE length(trim(coalesce("TEXT", ''))) > 0 
ORDER BY "INDEX";
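
To see what the filter does, the same query can be tried locally. The following is a minimal sketch, not SiteRows itself: it uses Python's built-in sqlite3 module with a stand-in table named a in place of @a, and the sample rows are invented for illustration. It assumes the columns behave like ordinary SQL columns.

```python
import sqlite3

# Stand-in for SiteRows' @a table; the rows below are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE a ("INDEX" INTEGER, "TEXT" TEXT, "RESOLVEDHREF" TEXT)')
conn.executemany(
    "INSERT INTO a VALUES (?, ?, ?)",
    [
        (2, "Docs", "https://example.com/docs"),
        (0, "Home", "https://example.com/"),
        (1, "   ", "https://example.com/icon-only"),  # whitespace-only link text
        (3, None, "https://example.com/no-text"),     # link with no text at all
    ],
)

# Same query as above, with @a swapped for the local table name.
link_rows = conn.execute(
    '''
    SELECT "TEXT", "RESOLVEDHREF"
    FROM a
    WHERE length(trim(coalesce("TEXT", ''))) > 0
    ORDER BY "INDEX"
    '''
).fetchall()

for text, href in link_rows:
    print(text, href)
```

In this sketch only the "Home" and "Docs" rows survive, in page order: coalesce turns a missing text into an empty string, trim removes whitespace-only text, and the length check drops both.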

Quick summary: count things on a page (UNION ALL)

How it’s useful: An instant “what’s on this page?” dashboard, useful for QA, monitoring, and quick comparisons across pages.

URL

SQL

SELECT 'Number of links' AS description, count(*) AS count FROM @a
UNION ALL
SELECT 'Number of images' AS description, count(*) AS count FROM @img
UNION ALL
SELECT 'Number of headings' AS description, count(*) AS count FROM @headings
UNION ALL
SELECT 'Number of list items' AS description, count(*) AS count FROM @li
UNION ALL
SELECT 'Number of meta tags' AS description, count(*) AS count FROM @meta
ORDER BY description;

Wikipedia: grab links and link text

How it’s useful: Pull out links from a trusted page to find good sources to crawl, save, and use later (including for RAG).

URL

SQL

SELECT "URL", "TEXT", "RESOLVEDHREF"
FROM @a
WHERE "RESOLVEDHREF" LIKE 'http%'
ORDER BY "URL", "RESOLVEDHREF"
LIMIT 200;

MDN docs: most common outgoing links

How it’s useful: On a busy reference page, the same destination is often linked many times (nav, “see also,” specs). This query counts links per destination and surfaces the most frequently repeated ones.

URL

SQL

SELECT "RESOLVEDHREF", count(*) AS n
FROM @a
WHERE "RESOLVEDHREF" LIKE 'http%'
GROUP BY "RESOLVEDHREF"
ORDER BY n DESC
LIMIT 50;

Article page: get the heading outline

How it’s useful: Turn a page into a clean outline so you can split content by section (great for embeddings / RAG).

URL

SQL

SELECT "URL", "TAG", "TEXT"
FROM @headings
WHERE "TAG" IN ('h1','h2','h3')
ORDER BY "URL"
LIMIT 200;

Stack Overflow: outbound links from answers

How it’s useful: Pull the external sites people cite in answers (docs, blog posts, tools). Good for research or RAG.

URL

SQL

SELECT "URL", "TEXT", "RESOLVEDHREF"
FROM @a
WHERE "RESOLVEDHREF" LIKE 'http%'
  AND "RESOLVEDHREF" NOT LIKE 'https://stackoverflow.com/%'
  AND length(trim(coalesce("TEXT", ''))) >= 8
ORDER BY length("TEXT") DESC
LIMIT 50;

Open Graph docs: read og: meta tags

How it’s useful: Pull Open Graph fields (title, image, type, …) from meta tags - handy for link previews, SEO checks, or structured hints for RAG.

URL

SQL

SELECT "URL", "property", "content"
FROM @meta
WHERE coalesce("property", '') LIKE 'og:%'
ORDER BY "URL", "property"
LIMIT 100;

Product page: list images + missing alt text

How it’s useful: Get an image inventory and spot missing/weak alt text for accessibility and QA.

URL

SQL

SELECT "URL", "ALT", length(coalesce("ALT", '')) AS "ALTLENGTH"
FROM @img
ORDER BY ("ALT" IS NULL) DESC,
         length(coalesce("ALT", '')) ASC
LIMIT 200;
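
The two-part ORDER BY is what puts the worst offenders first. As a rough illustration only, here is the same ordering run with Python's sqlite3 module against a stand-in table named img; the table contents are invented, and the sketch assumes a missing alt attribute shows up as NULL while an empty alt shows up as an empty string.

```python
import sqlite3

# Stand-in for SiteRows' @img table with invented sample rows.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE img ("URL" TEXT, "ALT" TEXT)')
page = "https://example.com/product/1"
conn.executemany(
    "INSERT INTO img VALUES (?, ?)",
    [
        (page, "Red running shoe, side view"),
        (page, None),    # alt attribute missing entirely (assumed to be NULL)
        (page, ""),      # alt present but empty
        (page, "shoe"),  # alt present but weak
    ],
)

# Same ordering as the query above: NULL alts first, then shortest alt text.
alt_rows = conn.execute(
    '''
    SELECT "ALT", length(coalesce("ALT", '')) AS "ALTLENGTH"
    FROM img
    ORDER BY ("ALT" IS NULL) DESC,
             length(coalesce("ALT", '')) ASC
    '''
).fetchall()

for alt, alt_length in alt_rows:
    print(repr(alt), alt_length)
```

In this sketch the missing alt comes first, then the empty one, then "shoe", then the descriptive text, so the rows that most need attention are at the top of the result.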

Forms tutorial: join labels to their inputs

How it’s useful: Standard HTML ties a label to a field by matching the label’s for attribute to the field’s id. Joining the label and input tables on that match lists those pairs for forms inventory or accessibility checks.

URL

SQL

SELECT l."URL",
       l."TEXT" AS label_text,
       l."for" AS label_for_attr,
       i."id" AS input_id,
       i."type" AS input_type,
       i."name" AS input_name
FROM @label l
JOIN @input i
  ON l."URL" = i."URL"
 AND trim(coalesce(l."for", '')) != ''
 AND trim(coalesce(l."for", '')) = trim(coalesce(i."id", ''))
ORDER BY l."URL", l."for"
LIMIT 200;
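
The join conditions can be checked on a small example. This is a sketch under stated assumptions rather than SiteRows behavior: it uses Python's sqlite3 module, stand-in tables named label and input in place of @label and @input, and invented rows. It shows why the non-empty check on "for" matters: without it, a label with no for attribute would pair with any input that has no id.

```python
import sqlite3

# Stand-ins for SiteRows' @label and @input tables; all rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE label ("URL" TEXT, "TEXT" TEXT, "for" TEXT)')
conn.execute('CREATE TABLE input ("URL" TEXT, "id" TEXT, "type" TEXT, "name" TEXT)')
page = "https://example.com/signup"
conn.executemany(
    "INSERT INTO label VALUES (?, ?, ?)",
    [
        (page, "Email", "email"),
        (page, "Password", "pwd "),    # stray trailing space in the attribute
        (page, "Wrapped label", None), # no for attribute
    ],
)
conn.executemany(
    "INSERT INTO input VALUES (?, ?, ?, ?)",
    [
        (page, "email", "email", "user_email"),
        (page, "pwd", "password", "user_pwd"),
        (page, None, "hidden", "csrf"),  # input without an id
    ],
)

# Same join as above, with fewer output columns for brevity.
pairs = conn.execute(
    '''
    SELECT l."TEXT" AS label_text,
           i."id" AS input_id,
           i."type" AS input_type
    FROM label l
    JOIN input i
      ON l."URL" = i."URL"
     AND trim(coalesce(l."for", '')) != ''
     AND trim(coalesce(l."for", '')) = trim(coalesce(i."id", ''))
    ORDER BY l."URL", l."for"
    '''
).fetchall()

for label_text, input_id, input_type in pairs:
    print(label_text, "->", input_id, f"({input_type})")
```

Only the Email and Password labels are paired here. The wrapped label and the hidden input are left out, which is the intended result for an inventory of explicit for/id associations.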

Docs page: count headings by level

How it’s useful: See how “deep” a doc is (H1/H2/H3/H4) so you can choose chunk sizes and structure for RAG.

URL

SQL

SELECT "TAG" AS heading_level, count(*) AS n
FROM @headings
WHERE "TAG" IN ('h1','h2','h3','h4')
GROUP BY "TAG"
ORDER BY heading_level;

Blog index: find likely article links

How it’s useful: Build a clean list of posts to crawl and ingest (easy way to grow a RAG corpus).

URL

SQL

SELECT "URL", "TEXT", "RESOLVEDHREF"
FROM @a
WHERE "RESOLVEDHREF" LIKE 'http%'
  AND length(trim(coalesce("TEXT", ''))) >= 12
  AND "RESOLVEDHREF" NOT LIKE '%/tag/%'
  AND "RESOLVEDHREF" NOT LIKE '%/authors/%'
ORDER BY "RESOLVEDHREF"
LIMIT 200;

Docs page: find the biggest bullet list (ul/li) and extract it

How it’s useful: Auto-detect the “main” list on a page (by list size), then pull its items as clean rows - great for checklists, directories, or “all items” pages.

URL

SQL

WITH li AS (
  SELECT "URL",
         "PARENT",
         "INDEX",
         trim(coalesce("TEXT", '')) AS item_text
  FROM @li
  WHERE length(trim(coalesce("TEXT", ''))) > 0
),
top_list AS (
  SELECT "URL", "PARENT", count(*) AS list_size
  FROM li
  GROUP BY "URL", "PARENT"
  ORDER BY list_size DESC, "URL", "PARENT"
  LIMIT 1
)
SELECT li."URL",
       li.item_text,
       top_list.list_size
FROM li
JOIN top_list
  ON li."URL" = top_list."URL"
 AND li."PARENT" = top_list."PARENT"
ORDER BY li."INDEX"
LIMIT 200;
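
The two-step shape of this query (count items per parent, keep the largest group, then join back to fetch its items) can be tried on toy data. The sketch below uses Python's sqlite3 module and makes several assumptions: the stand-in table is called li_rows, the CTE is renamed to items so that it does not clash with a table name in SQLite, and "PARENT" is treated as an opaque identifier whose sample values are invented.

```python
import sqlite3

# Stand-in for SiteRows' @li table; PARENT values are hypothetical identifiers.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE li_rows ("URL" TEXT, "PARENT" TEXT, "INDEX" INTEGER, "TEXT" TEXT)'
)
page = "https://example.com/docs"
conn.executemany(
    "INSERT INTO li_rows VALUES (?, ?, ?, ?)",
    [
        (page, "nav-list", 0, "Home"),
        (page, "nav-list", 1, "About"),
        (page, "main-list", 2, "Install"),
        (page, "main-list", 3, "Configure"),
        (page, "main-list", 4, "   "),  # whitespace-only item, filtered out
        (page, "main-list", 5, "Deploy"),
    ],
)

biggest_list = conn.execute(
    '''
    WITH items AS (
      SELECT "URL", "PARENT", "INDEX",
             trim(coalesce("TEXT", '')) AS item_text
      FROM li_rows
      WHERE length(trim(coalesce("TEXT", ''))) > 0
    ),
    top_list AS (
      -- Largest group of non-empty items sharing one parent.
      SELECT "URL", "PARENT", count(*) AS list_size
      FROM items
      GROUP BY "URL", "PARENT"
      ORDER BY list_size DESC
      LIMIT 1
    )
    SELECT items.item_text, top_list.list_size
    FROM items
    JOIN top_list
      ON items."URL" = top_list."URL"
     AND items."PARENT" = top_list."PARENT"
    ORDER BY items."INDEX"
    '''
).fetchall()

for item_text, list_size in biggest_list:
    print(item_text, list_size)
```

With this sample data the three non-empty items under the hypothetical main-list parent are returned in page order, each carrying a list_size of 3, while the two navigation items are dropped.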