Migrate queries to `crawl` dataset
New tables that are more efficient and easier to use have been introduced in the HTTP Archive dataset. The `crawl` dataset contains all the data from the previous `pages`, `requests`, and other datasets. This guide helps you migrate your queries to the new dataset.
Migrating to crawl.pages
Page data schemas comparison
Previously | crawl.pages |
---|---|
date in the table name | date |
client as _TABLE_SUFFIX | client |
url in pages.YYYY_MM_DD_client | page |
not available | is_root_page |
not available | root_page |
not available | rank |
$.testID within the payload column in pages.YYYY_MM_DD_client, or the wptid column in summary_pages.YYYY_MM_DD_client | wptid |
payload in pages.YYYY_MM_DD_client | payload |
req*, resp*, and other columns in summary_pages.YYYY_MM_DD_client | summary |
$.CUSTOM_METRIC_NAME within payload column in pages.YYYY_MM_DD_client | custom_metrics |
report in lighthouse.YYYY_MM_DD_client | lighthouse |
feature, type, id in blink_features.features | feature, type, id in features |
category, app, info in technologies.YYYY_MM_DD_client | categories, technology, info in technologies |
not available | metadata |
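The first two rows mean that the crawl date and client move out of the table name and into ordinary columns. The sketch below shows that translation for a legacy table reference such as `pages.2024_06_01_mobile`. It assumes the fully qualified table name `httparchive.crawl.pages`, and it adds an `is_root_page` filter on the assumption that the old per-crawl tables covered only root pages. Drop that filter if you also want secondary pages. The helper names are hypothetical.

```python
import re
from datetime import date

# Hypothetical helper. Legacy tables encode the crawl date and client in the
# table name, e.g. "pages.2024_06_01_mobile" or "summary_pages.2024_06_01_desktop".
LEGACY_TABLE = re.compile(
    r"^(?P<dataset>\w+)\.(?P<y>\d{4})_(?P<m>\d{2})_(?P<d>\d{2})_(?P<client>desktop|mobile)$"
)


def parse_legacy_table(name: str) -> tuple[str, date, str]:
    """Split a legacy table name into (dataset, crawl date, client)."""
    match = LEGACY_TABLE.match(name)
    if match is None:
        raise ValueError(f"not a legacy per-crawl table name: {name!r}")
    crawl_date = date(int(match["y"]), int(match["m"]), int(match["d"]))
    return match["dataset"], crawl_date, match["client"]


def crawl_pages_query(legacy_table: str, columns: list[str], root_pages_only: bool = True) -> str:
    """Build a crawl.pages query for the same crawl date and client as `legacy_table`."""
    _, crawl_date, client = parse_legacy_table(legacy_table)
    # The date and client are now column filters instead of parts of the table name.
    filters = [f"date = '{crawl_date.isoformat()}'", f"client = '{client}'"]
    if root_pages_only:
        # Assumption: the legacy tables held root pages only.
        filters.append("is_root_page")
    return (
        f"SELECT {', '.join(columns)}\n"
        "FROM `httparchive.crawl.pages`\n"
        f"WHERE {' AND '.join(filters)}"
    )


# Example: the legacy `SELECT url, payload FROM pages.2024_06_01_mobile`
# (url is now called page).
print(crawl_pages_query("pages.2024_06_01_mobile", ["page", "payload"]))
# SELECT page, payload
# FROM `httparchive.crawl.pages`
# WHERE date = '2024-06-01' AND client = 'mobile' AND is_root_page
```

For anything beyond ad hoc exploration, BigQuery query parameters are a safer choice than formatting values into the SQL string.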
Page query updates
- Migrate custom metrics
- Migrate summary metrics queries
- Migrate detected technologies metrics
- Migrate Lighthouse insights
- Migrate Blink features metrics
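The updates listed above mostly follow one pattern: values that used to be separate columns or separate tables are now read from a JSON column (`summary`, `custom_metrics`) or unnested from a repeated column (`technologies`, `features`). The sketch below emits BigQuery SQL fragments for these cases. It assumes that keys inside `summary` match the old `summary_pages` column names, that custom metrics keep the same key inside `custom_metrics`, that the table is `httparchive.crawl.pages`, and that `technologies` and `features` are repeated records with the fields named in the table. Check these against the actual schema. All function names are hypothetical.

```python
# Hypothetical helpers that emit BigQuery SQL fragments for crawl.pages.


def summary_metric(name: str) -> str:
    """Legacy summary_pages.<name> column -> value read from the summary JSON column."""
    # JSON_VALUE returns a STRING; wrap it in SAFE_CAST(... AS INT64) for numbers.
    return f"JSON_VALUE(summary, '$.{name}')"


def custom_metric(name: str) -> str:
    """Legacy $.<name> inside payload -> the same key inside custom_metrics (assumed)."""
    return f"JSON_QUERY(custom_metrics, '$.{name}')"


def technologies_query(crawl_date: str, client: str) -> str:
    """One row per page, technology and category, similar to the old technologies tables."""
    # is_root_page is an assumption to match the legacy root-page-only tables.
    return (
        "SELECT page, t.technology, category\n"
        "FROM `httparchive.crawl.pages`,\n"
        "  UNNEST(technologies) AS t,\n"
        "  UNNEST(t.categories) AS category\n"
        f"WHERE date = '{crawl_date}' AND client = '{client}' AND is_root_page"
    )


def features_query(crawl_date: str, client: str) -> str:
    """One row per page and Blink feature, similar to blink_features.features."""
    return (
        "SELECT page, f.feature, f.type, f.id\n"
        "FROM `httparchive.crawl.pages`,\n"
        "  UNNEST(features) AS f\n"
        f"WHERE date = '{crawl_date}' AND client = '{client}' AND is_root_page"
    )
```

For example, the legacy `reqTotal` column from `summary_pages` would become `SAFE_CAST(JSON_VALUE(summary, '$.reqTotal') AS INT64)` under these assumptions.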
Migrating to crawl.requests
Request data schemas comparison
Previously | crawl.requests |
---|---|
date in the table name | date |
client as _TABLE_SUFFIX | client |
page in requests.YYYY_MM_DD_client | page |
not available | is_root_page |
not available | root_page |
url in requests.YYYY_MM_DD_client | url |
firstHtml in summary_requests.YYYY_MM_DD_client | is_main_document |
type in summary_requests.YYYY_MM_DD_client | type |
$._index within payload in requests.YYYY_MM_DD_client | index |
payload column in requests.YYYY_MM_DD_client | payload |
req*, resp*, and other columns in summary_requests.YYYY_MM_DD_client | summary |
req_* and reqOtherHeaders in summary_requests.YYYY_MM_DD_client | request_headers |
resp_* and respOtherHeaders in summary_requests.YYYY_MM_DD_client | response_headers |
body in response_bodies.YYYY_MM_DD_client | response_body |
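Several request columns are simple renames, and because `response_body` now sits in the same table, a query that used to join `requests`, `summary_requests`, and `response_bodies` can read from one table. The sketch below translates a list of legacy column names using only the one-to-one renames from the table, and turns the old `firstHtml` filter into `is_main_document`. It assumes the table name `httparchive.crawl.requests` and an `is_root_page` filter to match the legacy root-page-only crawl. The mapping and function names are hypothetical.

```python
# Hypothetical sketch. Only columns with a one-to-one rename in the comparison
# table are supported; req*/resp* fields and headers moved into summary,
# request_headers and response_headers instead.
RENAMED_REQUEST_COLUMNS = {
    "page": "page",
    "url": "url",
    "firstHtml": "is_main_document",
    "type": "type",
    "payload": "payload",
    "body": "response_body",
}


def crawl_requests_query(
    crawl_date: str,
    client: str,
    legacy_columns: list[str],
    main_document_only: bool = False,
) -> str:
    """Translate a legacy per-crawl requests query into a crawl.requests query."""
    unknown = [c for c in legacy_columns if c not in RENAMED_REQUEST_COLUMNS]
    if unknown:
        raise ValueError(f"no direct rename for {unknown}")
    columns = [RENAMED_REQUEST_COLUMNS[c] for c in legacy_columns]
    # Date and client were encoded in the legacy table name; here they are filters.
    # is_root_page is an assumption to match the legacy root-page-only crawl.
    filters = [f"date = '{crawl_date}'", f"client = '{client}'", "is_root_page"]
    if main_document_only:
        # Replaces the legacy `WHERE firstHtml` condition.
        filters.append("is_main_document")
    return (
        f"SELECT {', '.join(columns)}\n"
        "FROM `httparchive.crawl.requests`\n"
        f"WHERE {' AND '.join(filters)}"
    )
```

Raising on unknown columns is deliberate: a column such as `respSize` has no direct equivalent and needs to be read from `summary` instead.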
Request query updates
- Migrate header metrics
- Migrate summary metrics
- Migrate response body queries
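For header metrics, each legacy `req_*` or `resp_*` column corresponded to one header, while `request_headers` and `response_headers` hold all headers, including those previously collected in `reqOtherHeaders` and `respOtherHeaders`. The sketch below maps a legacy column name to a header name and builds a scalar subquery that reads it. It assumes that legacy column names are header names with hyphens replaced by underscores, and that the header columns are repeated records with `name` and `value` fields. Names are compared in lowercase because header name casing can vary. The function names are hypothetical.

```python
# Hypothetical helpers for migrating legacy req_*/resp_* header columns.


def legacy_header_column(column: str) -> tuple[str, str]:
    """Map e.g. resp_content_type -> ("response_headers", "content-type")."""
    for prefix, target in (("req_", "request_headers"), ("resp_", "response_headers")):
        if column.startswith(prefix):
            # Assumption: legacy names are header names with "-" replaced by "_".
            header = column[len(prefix):].replace("_", "-").lower()
            return target, header
    raise ValueError(f"not a legacy header column: {column!r}")


def header_value_sql(column: str) -> str:
    """SQL expression returning the first value of the header behind a legacy column."""
    target, header = legacy_header_column(column)
    # A header can repeat, so LIMIT 1 keeps the subquery scalar.
    return (
        f"(SELECT value FROM UNNEST({target}) "
        f"WHERE LOWER(name) = '{header}' LIMIT 1)"
    )
```

Taking the first matching value mirrors a single legacy column. If repeated headers matter for your analysis, aggregate the values instead, for example with `STRING_AGG`.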