Skip to content
har.fyi 🧪

Migrate queries to `all` dataset

New tables have been introduced in the HTTP Archive dataset, which are more efficient and easier to use. The all dataset contains all the data from the previous pages, requests, and other datasets. This guide will help you migrate your queries to the new dataset.

Migrating to all.pages

Page data schemas comparison

previouslyall.pages
date in a table namedate
client as _TABLE_SUFFIXclient
url in pages.YYYY_MM_DD_clientpage
not availableis_root_page
not availableroot_page
not availablerank
$.testID within payload column in pages.YYYY_MM_DD_client, wptid column in summary_pages.YYYY_MM_DD_clientwptid
payload in pages.YYYY_MM_DD_clientpayload
req*, resp* and other in summary_pages.YYYY_MM_DD_clientsummary
$.CUSTOM_METRIC_NAME within payload column in pages.YYYY_MM_DD_clientcustom_metrics
report in lighthouse.YYYY_MM_DD_clientlighthouse
feature, type, id in blink_features.featuresfeature, type, id in features
category, app, info in technologies.YYYY_MM_DD_clientcategories, technology, info in technologies
not availablemetadata

Page query updates

  • Migrate from blink_features.features
SELECT
  url,
  feature,
  type,
  id
FROM `httparchive.blink_features.features`
WHERE yyyymmdd = DATE('2024-05-01')
  AND client = 'desktop'
  • Migrate from lighthouse.YYYY_MM_DD_client
SELECT
  url,
  JSON_QUERY(report, '$.audits.largest-contentful-paint.numericValue') AS LCP,
FROM `httparchive.lighthouse.2024_06_01_desktop`
  • Migrate from pages.YYYY_MM_DD_client
SELECT
  url,
  _TABLE_SUFFIX AS client,
  JSON_QUERY(payload, '$.testID') AS testID,
-- JSON with the results of the custom metrics,
  JSON_QUERY(payload, '$._privacy') AS custom_metrics,
FROM `httparchive.pages.2022_06_01_*`
  • Migrate from summary_pages.YYYY_MM_DD_client
SELECT
  numDomains,
  COUNT(0) pages,
  ROUND(AVG(reqTotal),2) avg_requests,
FROM `httparchive.summary_pages.2024_06_01_desktop`
GROUP BY
  numDomains
HAVING
  pages > 1000
ORDER BY
  numDomains ASC
  • Migrate from technologies.YYYY_MM_DD_client
SELECT
  url,
  category,
  app,
  info
FROM `httparchive.technologies.2024_06_01_desktop`

Migrating to all.requests

Request data schemas comparison

previouslyall.requests
date in a table namedate
client as _TABLE_SUFFIXclient
page in requests.YYYY_MM_DD_clientpage
not availableis_root_page
not availableroot_page
url in requests.YYYY_MM_DD_clienturl
firstHtml in summary_requests.YYYY_MM_DD_clientis_main_document
type in summary_requests.YYYY_MM_DD_clienttype
$._index within payload in requests.YYYY_MM_DD_clientindex
payload column in requests.YYYY_MM_DD_clientpayload
req*, resp* and other in summary_requests.YYYY_MM_DD_clientsummary
req_* and reqOtherHeaders in almanac.requestsrequest_headers
resp_* and respOtherHeaders in almanac.requestsresponse_headers
body in response_bodies.YYYY_MM_DD_clientresponse_body

Request query updates

  • Migrate from almanac.requests
SELECT
  LOWER(JSON_VALUE(request_headers, '$.name')) AS header_name,
  JSON_VALUE(request_headers, '$.value') AS header_value,
FROM `httparchive.almanac.requests`,
UNNEST(JSON_QUERY_ARRAY(request_headers)) AS request_headers
WHERE date = '2024-06-01'
  AND client = 'desktop'
  AND firstHtml
  • Migrate from requests.YYYY_MM_DD_client
SELECT
  page,
  url,
  JSON_VALUE(payload, '$.response.content.mimeType') AS mimeType,
  CAST(JSON_VALUE(payload, '$.response.bodySize') AS INT64) AS respBodySize,
FROM `httparchive.requests.2024_06_01_desktop`
  • Migrate from response_bodies.YYYY_MM_DD_client
SELECT
  page,
  url,
  BYTE_LENGTH(response_body) AS bodySize
FROM `httparchive.response_bodies.2024_06_01_desktop`
  • Migrate from summary_requests.YYYY_MM_DD_client
SELECT
  ROUND(respBodySize/1024/100)*100 AS responseSize100KB,
  COUNT(0) requests,
FROM `httparchive.summary_requests.2024_06_01_desktop`
GROUP BY responseSize100KB
HAVING responseSize100KB > 0
ORDER BY responseSize100KB ASC