httparchive.all.pages is a partitioned and clustered table containing one row per page tested in the HTTP Archive. Pages are tested on a monthly basis and as of April 2022, both the root page and one secondary page are tested.
Here are some common operations you can perform with the
Get the median page weight
/* This query will process 1.12 GB when run. */ WITH pages AS ( SELECT client, CAST(JSON_VALUE(summary, '$.bytesTotal') AS INT64) AS page_weight FROM `httparchive.all.pages` TABLESAMPLE SYSTEM (1 PERCENT) WHERE date = '2023-05-01' ) SELECT client, APPROX_QUANTILES(page_weight, 1000)[OFFSET(500)] AS median_page_weight FROM pages GROUP BY client
The median mobile page weighs 1.78 MB and the median desktop page weighs 2.03 MB.
This query uses the
APPROX_QUANTILES function to calculate the median page weight for each client type as of May 2023.
bytesTotal property of the
summary object represents the total number of bytes loaded on the page. This value is stored as a JSON-encoded string, so we use
JSON_VALUE to extract it and
CAST to convert it to an integer.
We’re also using the
WITH clause here to create a temporary table called
pages, which is then fed into the main query below. This makes the query a bit easier to read.
Also note that for demonstration purposes, this query processes a 1% sample of the
httparchive.all.pages table. This reduces the amount of data processed by the query, which can help reduce costs. But note that the results will be less accurate than if you ran the query on the full table.
|YYYY-MM-DD format of the HTTP Archive monthly crawl|
|Test environment: |
|The URL of the page being tested|
|Whether the page is the root of the origin|
|The URL of the root page being tested, the origin followed by |
|Site popularity rank, from CrUX|
|ID of the WebPageTest results|
|JSON-encoded WebPageTest results for the page|
|JSON-encoded summarization of the page-level data|
|JSON-encoded test results of the custom metrics|
|JSON-encoded Lighthouse report|
|Blink features detected at runtime|
|Technologies detected at runtime|
|Additional metadata about the test|
This field is required for all queries over the
YYYY-MM-DD format of the HTTP Archive monthly crawl.
date = '2023-06-01'
The URL of the page being tested.
page = 'https://har.fyi/'
Whether the page is the root of the origin.
The URL of the root page being tested, the origin followed by
root_page = 'https://har.fyi/'
Site popularity rank, from CrUX
ID of the WebPageTest results, for example
wptid = '230509_Dx20W_FMHK5'.
The ID encodes the date of the test at the start in YYMMDD format. The date is followed by an underscore and a
M character indicating whether it was a desktop or mobile test. The rest of the ID is randomly generated. In the example above we can tell that the page was tested on May 9, 2023, and that it was a desktop test.
You can view the WebPageTest results in the browser by visiting
https://webpagetest.httparchive.org/result/<wptid>/, eg https://webpagetest.httparchive.org/result/230509_Dx20W_FMHK5/. This is HTTP Archive’s own private instance of WebPageTest, which is required to view any of the results.
JSON-encoded WebPageTest results for the page.
For a full example value, see payload.json.
har reference for more details.
JSON-encoded summarization of the page-level data
JSON-encoded test results of the custom metrics
JSON-encoded Lighthouse report.
lighthouse reference for more details.
Blink features detected at runtime (see https://chromestatus.com/features)
Technologies detected at runtime (see https://www.wappalyzer.com/)
technology reference for more details.
Additional metadata about the test
metadata reference for more details.