Requests table
httparchive.crawl.requests is a partitioned and clustered table containing one row per request per page tested in the HTTP Archive. Pages are tested monthly and, as of April 2022, both the root page and one secondary page are tested.
Schema
Field name | Type | Description |
---|---|---|
date | DATE | YYYY-MM-DD format of the HTTP Archive monthly crawl |
client | STRING | Test environment: 'desktop' or 'mobile' |
page | STRING | The URL of the page being tested |
is_root_page | BOOLEAN | Whether the page is the root of the origin |
root_page | STRING | The URL of the root page being tested, the origin followed by / |
url | STRING | The URL of the request |
is_main_document | BOOLEAN | Whether this request corresponds with the main HTML document of the page, which is the first HTML request after redirects |
type | STRING | Simplified description of the type of resource (script, html, css, text, other, etc) |
index | INTEGER | The sequential 0-based index of the request |
payload | JSON | JSON-encoded WebPageTest result data for this request |
summary | JSON | JSON-encoded summarization of request data |
request_headers | ARRAY<RECORD> | Request headers |
response_headers | ARRAY<RECORD> | Response headers |
response_body | STRING | Text-based response body |
date
This field is required for all queries over the requests table.
YYYY-MM-DD format of the HTTP Archive monthly crawl.
Example: date = '2023-06-01'
client
Test environment: 'desktop' or 'mobile'.
page
The URL of the page being tested.
Example: page = 'https://har.fyi/'
is_root_page
Whether the page is the root of the origin.
root_page
The URL of the root page being tested, the origin followed by /.
Example: root_page = 'https://har.fyi/'
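The relationship between page, root_page, and is_root_page can be illustrated with a minimal Python sketch. This is only an illustration of "the origin followed by /", not the crawl's own implementation; the helper names root_page_of and is_root_page and the secondary-page URL are hypothetical.

```python
from urllib.parse import urlsplit


def root_page_of(page_url: str) -> str:
    """Return the scheme and host of page_url followed by '/' (hypothetical helper)."""
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/"


def is_root_page(page_url: str) -> bool:
    """A page counts as a root page here when it equals its own root_page (assumption)."""
    return page_url == root_page_of(page_url)


# Illustrative secondary-page URL, not taken from the dataset.
print(root_page_of("https://har.fyi/reference/tables/requests/"))  # https://har.fyi/
print(is_root_page("https://har.fyi/"))  # True
```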
url
The URL of the request
is_main_document
Whether this request corresponds with the main HTML document of the page, which is the first HTML request after redirects
type
Simplified description of the type of resource (script, image, css, html, other, font, text, video, xml, audio, wasm, etc)
index
The sequential 0-based index of the request
payload
JSON-encoded WebPageTest result data for this request
See the Request payload reference for more details.
summary
JSON-encoded summarization of request data
See the Request summary reference for more details.
request_headers
Request headers
See the Header reference for more details.
response_headers
Response headers
See the Header reference for more details.
response_body
Text-based response body
Example queries
Here are some common operations you can perform with the requests table.
Count the requests crawled
client | is_root_page | requests_total |
---|---|---|
mobile | true | 1517364094 |
desktop | true | 1299394354 |
mobile | false | 1216156430 |
desktop | false | 1093804725 |
Size of requests served
Let’s check the size of individual requests served from websites across the entire dataset. To do this, we’ll use the respBodySize summary metric, which represents the size of the response payload in bytes. Since a single byte is too granular, we’ll divide by 1024 to convert to KB and then by 100 so that each bin covers 100 KB. We’ll wrap this in a CEIL() function to round up to a whole number of bins and then multiply the result by 100 to get back to KB. Using this technique, 1234567 bytes (about 1205.6 KB) would be rounded up to a bin of 1300 KB.
responseSize100KB | requests | pct_requests |
---|---|---|
100.0 | 10113115 | 0.90864138408777051 |
200.0 | 486257 | 0.043689133714228209 |
300.0 | 188335 | 0.016921490072264605 |
400.0 | 87127 | 0.0078281714260556891 |
500.0 | 54134 | 0.004863822144433972 |
600.0 | 37443 | 0.0033641721017113315 |
700.0 | 26985 | 0.0024245435505883687 |
800.0 | 19817 | 0.0017805143428575023 |
900.0 | 24519 | 0.0022029788147814046 |
1000.0 | 11787 | 0.0010590363102014118 |
We can see that about 91% of requests fall into the first bin, with a response size of 100 KB or less. Try repeating this with 10 KB bin sizes and you’ll be able to see the spread of response sizes with more granularity.
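To make the binning arithmetic concrete, here is a small Python sketch of the same calculation outside of SQL. The function name size_bin_kb and its bin_kb parameter are hypothetical and exist only for this illustration; the formula mirrors the CEIL-based steps described above.

```python
import math


def size_bin_kb(resp_body_size: int, bin_kb: int = 100) -> int:
    """Return the upper edge, in KB, of the bin that a size in bytes falls into."""
    # bytes -> KB -> number of bins, rounded up, then back to KB
    return math.ceil(resp_body_size / 1024 / bin_kb) * bin_kb


print(size_bin_kb(1234567))      # 1300
print(size_bin_kb(1234567, 10))  # 1210
```

With bin_kb set to 10, the same response lands in the 1210 KB bin, which shows how the smaller bin size gives a finer view of the distribution.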
Popularity of various image formats
Let’s filter out all of the non-image content and examine the popularity of various image formats. For example, how often are jpg, gif, webp, and other formats used?
format | requests | pages | percent_image_requests | percent_pages |
---|---|---|---|---|
jpg | 1644804 | 1310081 | 0.38 | 0.43 |
png | 1328825 | 1151809 | 0.31 | 0.38 |
gif | 793541 | 495055 | 0.18 | 0.16 |
svg | 250130 | 227550 | 0.06 | 0.08 |
webp | 223783 | 191184 | 0.05 | 0.06 |
ico | 64468 | 64016 | 0.01 | 0.02 |
avif | 29226 | 25794 | 0.01 | 0.01 |
(empty) | 4405 | 3938 | 0.0 | 0.0 |
heic | 395 | 382 | 0.0 | 0.0 |