Requests table

httparchive.crawl.requests is a partitioned and clustered table containing one row per request per page tested in the HTTP Archive. Pages are tested on a monthly basis and as of April 2022, both the root page and one secondary page are tested.

| Field name | Type | Description |
| --- | --- | --- |
| date | DATE | YYYY-MM-DD format of the HTTP Archive monthly crawl |
| client | STRING | Test environment: 'desktop' or 'mobile' |
| page | STRING | The URL of the page being tested |
| is_root_page | BOOLEAN | Whether the page is the root of the origin |
| root_page | STRING | The URL of the root page being tested, the origin followed by / |
| url | STRING | The URL of the request |
| is_main_document | BOOLEAN | Whether this request corresponds with the main HTML document of the page, which is the first HTML request after redirects |
| type | STRING | Simplified description of the type of resource (script, html, css, text, other, etc) |
| index | INTEGER | The sequential 0-based index of the request |
| payload | JSON | JSON-encoded WebPageTest result data for this request |
| summary | JSON | JSON-encoded summarization of request data |
| request_headers | ARRAY<RECORD> | Request headers |
| response_headers | ARRAY<RECORD> | Response headers |
| response_body | STRING | Text-based response body |

date

This field is required for all queries over the requests table.

YYYY-MM-DD format of the HTTP Archive monthly crawl.

Example: date = '2023-06-01'

client

Test environment: 'desktop' or 'mobile'.

page

The URL of the page being tested.

Example: page = 'https://har.fyi/'

is_root_page

Whether the page is the root of the origin.

root_page

The URL of the root page being tested, the origin followed by /.

Example: root_page = 'https://har.fyi/'
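
To make "the origin followed by /" concrete, here is a minimal Python sketch that derives a root page from a page URL. The helper names `root_page_of` and `is_root_page_of` and the secondary page URL are hypothetical, and the crawl's own derivation may treat edge cases (such as unusual ports or credentials in the URL) differently.

```python
from urllib.parse import urlsplit


def root_page_of(page: str) -> str:
    """Return the scheme and host (plus port, if any) of `page`, followed by '/'."""
    parts = urlsplit(page)
    return f"{parts.scheme}://{parts.netloc}/"


def is_root_page_of(page: str) -> bool:
    """A page is treated as a root page here when it equals its own root page URL."""
    return page == root_page_of(page)


# Hypothetical secondary page on the origin used in the examples above.
secondary = "https://har.fyi/some/secondary/page"
print(root_page_of(secondary))              # https://har.fyi/
print(is_root_page_of(secondary))           # False
print(is_root_page_of("https://har.fyi/"))  # True
```

Under these assumptions, a root page and its secondary page share the same root_page value, which is what lets you group the two together in a query.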

url

The URL of the request.

is_main_document

Whether this request corresponds with the main HTML document of the page, which is the first HTML request after redirects.

type

Simplified description of the type of resource (script, image, css, html, other, font, text, video, xml, audio, wasm, etc).

index

The sequential 0-based index of the request.

payload

JSON-encoded WebPageTest result data for this request.

See the Request payload reference for more details.

summary

JSON-encoded summarization of request data.

See the Request summary reference for more details.

request_headers

Request headers.

See the Header reference for more details.

response_headers

Response headers.

See the Header reference for more details.

response_body

Text-based response body.

Here are some common operations you can perform with the requests table.

/* This query will process 85 GB when run. */
SELECT
  client,
  is_root_page,
  COUNT(0) AS requests_total
FROM `httparchive.crawl.requests`
WHERE date = '2024-05-01'
GROUP BY client, is_root_page

Let’s check the size of individual responses served from websites in the dataset. To do this, we’ll use the respBodySize summary metric, which represents the size of the response payload in bytes. Since 1 byte is very granular, we’ll divide by 1024 to convert to KB and then by 100 so that each unit corresponds to 100 KB. We’ll wrap this in a CEIL() function to round up to a whole number and then multiply the result by 100 to get back to KB. Using this technique, 1234567 bytes (about 1206 KB) would be rounded up to the 1300 KB bin. To keep the query cheap, the example samples 1 percent of the table and only looks at desktop root pages from a single crawl.

/* This query will process 26 GB when run. */
WITH requests AS (
  SELECT
    CEIL(INT64(summary.respBodySize) / 1024 / 100) * 100 AS responseSize100KB,
    COUNT(0) OVER () AS total_requests
  FROM `httparchive.crawl.requests` TABLESAMPLE SYSTEM (1 PERCENT)
  WHERE
    date = '2024-06-01' AND
    client = 'desktop' AND
    is_root_page AND
    INT64(summary.respBodySize) > 0
)
SELECT
  responseSize100KB,
  COUNT(0) AS requests,
  COUNT(0) / ANY_VALUE(total_requests) AS pct_requests
FROM requests
GROUP BY responseSize100KB
ORDER BY responseSize100KB ASC
LIMIT 10

We can see that 91% of requests fall into the first bin, meaning a response size of at most 100KB. Try repeating this with 10KB bin sizes and you’ll be able to see the spread of response sizes with more granularity.
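
If you want to sanity-check the binning arithmetic before spending bytes in BigQuery, the same calculation can be sketched in a few lines of Python. The function name `size_bin_kb` is made up for this illustration; it mirrors the CEIL(bytes / 1024 / bin) * bin expression from the query and takes the bin width as a parameter, so the 10KB variant is a one-argument change.

```python
import math


def size_bin_kb(resp_body_size: int, bin_kb: int = 100) -> int:
    """Return the upper edge, in KB, of the bin that a response of this size falls into."""
    # Bytes -> KB, KB -> number of bins, round up, then scale back to KB.
    return math.ceil(resp_body_size / 1024 / bin_kb) * bin_kb


print(size_bin_kb(1234567))      # 1300 (100KB bins, as in the query above)
print(size_bin_kb(1234567, 10))  # 1210 (10KB bins)
print(size_bin_kb(102400))       # 100 (exactly 100 KB stays in the first bin)
```

Because the value is rounded up, each bin is labelled by its upper edge: the 100 bin holds responses from 1 byte up to and including 100 KB.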

Let’s filter out all of the non-image content and examine the popularity of various image formats. For example, how often are jpg, gif, webp, etc. used?

/* This query will process 8 GB when run. */
WITH requests AS (
  SELECT
    STRING(summary.format) AS format,
    page,
    COUNT(0) OVER () AS total_requests,
    COUNT(DISTINCT page) OVER () AS total_pages
  FROM `httparchive.crawl.requests` TABLESAMPLE SYSTEM (1 PERCENT)
  WHERE
    date = '2024-06-01' AND
    client = 'desktop' AND
    is_root_page AND
    type = 'image'
)
SELECT
  format,
  COUNT(0) AS requests,
  COUNT(DISTINCT page) AS pages,
  ROUND(COUNT(0) / ANY_VALUE(total_requests), 2) AS percent_image_requests,
  ROUND(COUNT(DISTINCT page) / ANY_VALUE(total_pages), 2) AS percent_pages
FROM requests
GROUP BY format
ORDER BY requests DESC
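
The two percentage columns in this query answer different questions: percent_image_requests divides by all image requests, while percent_pages divides by the number of distinct pages, so the page percentages can add up to more than 1 when a page uses several formats. The Python sketch below reproduces that calculation on a handful of made-up rows; the URLs, the data, and the `format_shares` helper are illustrative assumptions, not HTTP Archive results.

```python
from collections import Counter

# Made-up (page, format) rows standing in for the filtered image requests.
rows = [
    ("https://a.example/", "jpg"),
    ("https://a.example/", "jpg"),
    ("https://a.example/", "png"),
    ("https://b.example/", "jpg"),
    ("https://b.example/", "webp"),
]


def format_shares(rows):
    """Map each format to (share of image requests, share of pages using it)."""
    total_requests = len(rows)
    total_pages = len({page for page, _ in rows})
    # One count per request.
    requests = Counter(fmt for _, fmt in rows)
    # One count per distinct (page, format) pair, like COUNT(DISTINCT page).
    pages = Counter(fmt for _, fmt in set(rows))
    return {
        fmt: (
            round(requests[fmt] / total_requests, 2),
            round(pages[fmt] / total_pages, 2),
        )
        for fmt in requests
    }


shares = format_shares(rows)
print(shares["jpg"])   # (0.6, 1.0): 3 of 5 requests, used on both pages
print(shares["webp"])  # (0.2, 0.5): 1 of 5 requests, used on 1 of 2 pages
```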