
Requests table

httparchive.crawl.requests is a partitioned and clustered table containing one row per request per page tested in the HTTP Archive. Pages are tested on a monthly basis and as of April 2022, both the root page and one secondary page are tested.

Schema

| Field name | Type | Description |
| --- | --- | --- |
| date | DATE | YYYY-MM-DD format of the HTTP Archive monthly crawl |
| client | STRING | Test environment: 'desktop' or 'mobile' |
| page | STRING | The URL of the page being tested |
| is_root_page | BOOLEAN | Whether the page is the root of the origin |
| root_page | STRING | The URL of the root page being tested, the origin followed by / |
| url | STRING | The URL of the request |
| is_main_document | BOOLEAN | Whether this request corresponds with the main HTML document of the page, which is the first HTML request after redirects |
| type | STRING | Simplified description of the type of resource (script, html, css, text, other, etc) |
| index | INTEGER | The sequential 0-based index of the request |
| payload | JSON | JSON-encoded WebPageTest result data for this request |
| summary | JSON | JSON-encoded summarization of request data |
| request_headers | ARRAY<RECORD> | Request headers |
| response_headers | ARRAY<RECORD> | Response headers |
| response_body | STRING | Text-based response body |

date

This field is required for all queries over the requests table.

YYYY-MM-DD format of the HTTP Archive monthly crawl.

Example: date = '2023-06-01'

client

Test environment: 'desktop' or 'mobile'.

page

The URL of the page being tested.

Example: page = 'https://har.fyi/'
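As a sketch of how the date, client, and page fields combine in a filter, the query below lists the requests made while testing a single page. The date and page values are copied from the examples above and are only illustrative; the query returns no rows if that page was not part of that crawl.

```sql
/* Sketch: list the requests for one page in one crawl.
   The date and page literals are illustrative values. */
SELECT
  `index`,
  type,
  url
FROM `httparchive.crawl.requests`
WHERE
  date = '2023-06-01' AND
  client = 'desktop' AND
  page = 'https://har.fyi/'
ORDER BY `index`
```

The backticks around index are a precaution against keyword clashes and can be dropped if the bare name parses in your environment.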

is_root_page

Whether the page is the root of the origin.

root_page

The URL of the root page being tested, the origin followed by /.

Example: root_page = 'https://har.fyi/'

url

The URL of the request

is_main_document

Whether this request corresponds with the main HTML document of the page, which is the first HTML request after redirects

type

Simplified description of the type of resource (script, image, css, html, other, font, text, video, xml, audio, wasm, etc)
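To see which type values actually occur in a given crawl, a query along these lines may help. It is a sketch that borrows the 1 percent sampling and the filters used in the example queries on this page; the date is an arbitrary choice.

```sql
/* Sketch: count sampled requests per resource type. */
SELECT
  type,
  COUNT(0) AS requests
FROM `httparchive.crawl.requests` TABLESAMPLE SYSTEM (1 PERCENT)
WHERE
  date = '2024-06-01' AND
  client = 'desktop' AND
  is_root_page
GROUP BY type
ORDER BY requests DESC
```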

index

The sequential 0-based index of the request

payload

JSON-encoded WebPageTest result data for this request

See the Request payload reference for more details.

summary

JSON-encoded summarization of request data

See the Request summary reference for more details.

request_headers

Request headers

See the Header reference for more details.

response_headers

Response headers

See the Header reference for more details.
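Because the header fields are arrays of records, they generally need to be flattened with UNNEST before filtering or grouping. The sketch below counts the most common response header names on sampled main documents. It assumes each header record exposes a name field, which should be confirmed against the Header reference; the date and sampling rate are illustrative.

```sql
/* Sketch: most common response header names on main documents.
   Assumes each header record has a `name` field. */
SELECT
  LOWER(header.name) AS header_name,
  COUNT(0) AS occurrences
FROM `httparchive.crawl.requests` TABLESAMPLE SYSTEM (1 PERCENT)
CROSS JOIN UNNEST(response_headers) AS header
WHERE
  date = '2024-06-01' AND
  client = 'desktop' AND
  is_root_page AND
  is_main_document
GROUP BY header_name
ORDER BY occurrences DESC
LIMIT 10
```

The count is of header occurrences rather than requests, since a response can repeat the same header name.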

response_body

Text-based response body

Example queries

Here are some common operations you can perform with the requests table.

Count the requests crawled

/* This query will process 85 GB when run. */
SELECT
  client,
  is_root_page,
  COUNT(0) AS requests_total
FROM `httparchive.crawl.requests`
WHERE date = '2024-05-01'
GROUP BY client, is_root_page

Size of requests served

Let’s check the size of individual responses served by websites in the crawl. To do this, we’ll use the respBodySize summary metric, which represents the size of the response payload in bytes. Since 1 byte is very granular, we’ll divide by 1024 to convert to KB and then by 100 to group the data into bins of 100 KB. We’ll also wrap this in a CEIL() function to round up to a whole number and then multiply the result by 100, so each bin is labeled with its upper bound in KB. Using this technique, 1234567 bytes would be rounded to a bin of 1300 KB.
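The binning arithmetic can be checked on its own before touching the table. The sketch below applies the same expression to a few literal byte counts; the body_bytes name and the sample values are made up for illustration.

```sql
/* Sketch: apply the 100 KB binning expression to literal byte counts.
   No table is referenced, so no table data is scanned. */
SELECT
  body_bytes,
  CEIL(body_bytes / 1024 / 100) * 100 AS responseSize100KB
FROM UNNEST([1, 102400, 102401, 1234567]) AS body_bytes
ORDER BY body_bytes
/* Bins: 1 -> 100, 102400 -> 100, 102401 -> 200, 1234567 -> 1300 */
```

A response of exactly 100 KB (102400 bytes) lands in the 100 bin, while one byte more moves it to the 200 bin, which is the effect of rounding up with CEIL().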

/* This query will process 26 GB when run. */
WITH requests AS (
  SELECT
    CEIL(INT64(summary.respBodySize) / 1024 / 100) * 100 AS responseSize100KB,
    COUNT(0) OVER () AS total_requests
  FROM `httparchive.crawl.requests` TABLESAMPLE SYSTEM (1 PERCENT)
  WHERE
    date = '2024-06-01' AND
    client = 'desktop' AND
    is_root_page AND
    INT64(summary.respBodySize) > 0
)
SELECT
  responseSize100KB,
  COUNT(0) AS requests,
  COUNT(0) / ANY_VALUE(total_requests) AS pct_requests
FROM requests
GROUP BY responseSize100KB
ORDER BY responseSize100KB ASC
LIMIT 10

We can see that 91% of requests have a response size less than 100KB. Try repeating this with 10KB bin sizes and you’ll be able to see the spread of response sizes with more granularity.
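One possible version of the 10 KB variant is sketched here. Only the divisor, the multiplier, and the column name change; responseSize10KB is a name chosen for this example. With LIMIT 10, the output would typically cover the bins up to 100 KB, assuming each of those bins contains at least one request.

```sql
/* Sketch: the same histogram with 10 KB bins. */
WITH requests AS (
  SELECT
    CEIL(INT64(summary.respBodySize) / 1024 / 10) * 10 AS responseSize10KB,
    COUNT(0) OVER () AS total_requests
  FROM `httparchive.crawl.requests` TABLESAMPLE SYSTEM (1 PERCENT)
  WHERE
    date = '2024-06-01' AND
    client = 'desktop' AND
    is_root_page AND
    INT64(summary.respBodySize) > 0
)
SELECT
  responseSize10KB,
  COUNT(0) AS requests,
  COUNT(0) / ANY_VALUE(total_requests) AS pct_requests
FROM requests
GROUP BY responseSize10KB
ORDER BY responseSize10KB ASC
LIMIT 10
```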

Popularity of various image formats

Let’s filter out all of the non-image content and examine the popularity of various image formats, for example how often jpg, gif, webp, and other formats are used.

/* This query will process 8 GB when run. */
WITH requests AS (
  SELECT
    STRING(summary.format) AS format,
    page,
    COUNT(0) OVER () AS total_requests,
    COUNT(DISTINCT page) OVER () AS total_pages
  FROM `httparchive.crawl.requests` TABLESAMPLE SYSTEM (1 PERCENT)
  WHERE
    date = '2024-06-01' AND
    client = 'desktop' AND
    is_root_page AND
    type = 'image'
)
SELECT
  format,
  COUNT(0) AS requests,
  COUNT(DISTINCT page) AS pages,
  ROUND(COUNT(0) / ANY_VALUE(total_requests), 2) AS percent_image_requests,
  ROUND(COUNT(DISTINCT page) / ANY_VALUE(total_pages), 2) AS percent_pages
FROM requests
GROUP BY format
ORDER BY requests DESC