CAPO function

The httparchive.fn.CAPO function takes an HTML response body and returns an array of objects containing the relative performance weighting for each element in the static HTML <head>.

Learn more about using Capo on BigQuery in the capo.js docs.

Input

html

The HTML response body.

Type: STRING

Response bodies can be sourced from the response_body field of the requests table, or the body field of legacy response_bodies tables.

The HTML does not need to be complete; it can contain only the <head> element. For faster results, and to avoid hitting the memory limits of BigQuery functions, it's recommended to extract everything before the opening <body> tag. REGEXP_EXTRACT accepts at most one capturing group and returns NULL when nothing matches, so use a single-group, non-greedy regular expression like this:

REGEXP_EXTRACT(response_body, r'(?is)^(.*?)<body[\s>]')
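As a minimal sketch of how the trimming step might be combined with the function call, the query below passes only the markup before the opening <body> tag to CAPO. It uses a single-capture-group pattern, since BigQuery's REGEXP_EXTRACT rejects patterns with more than one capturing group. The COALESCE fallback to the full body is an assumption about how you may want to treat documents with no <body> tag, not part of the documented usage.

```sql
SELECT
  page,
  httparchive.fn.CAPO(
    COALESCE(
      -- Keep everything before the first opening <body> tag
      -- (case-insensitive; "." also matches newlines).
      REGEXP_EXTRACT(response_body, r'(?is)^(.*?)<body[\s>]'),
      -- Assumed fallback: REGEXP_EXTRACT returns NULL when there is
      -- no <body> tag, so pass the whole response body instead.
      response_body
    )
  ) AS capo
FROM `httparchive.crawl.requests` TABLESAMPLE SYSTEM (0.001 PERCENT)
WHERE
  date = '2023-05-01' AND
  client = 'desktop' AND
  is_main_document
LIMIT 1
```

Dropping the COALESCE yields a NULL input for pages without a <body> tag, which may be preferable if you would rather skip those pages than process their full bodies.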

Output

An array of Capo objects, one for each element in the static HTML <head>.

Type: ARRAY<STRUCT<vizWeight STRING, weight INT64, element STRING>>
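Because the result is an array of structs, it can be flattened into one row per <head> element with UNNEST. The following is a sketch under that assumption only; the alias c and the position column are illustrative names, and position is simply the index of each struct within the returned array.

```sql
SELECT
  position,     -- index of the struct within the returned array
  c.weight,
  c.vizWeight,
  c.element
FROM UNNEST(httparchive.fn.CAPO('''
<head>
<title>Example</title>
<meta charset="utf-8">
</head>
''')) AS c WITH OFFSET AS position
ORDER BY position
```

Flattening this way makes it easier to filter or aggregate on weight across many pages, for example by joining the unnested rows back to the page column of a crawl table.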

Example usage

Static input

SELECT httparchive.fn.CAPO('''
<html>
<head>
<title>Example</title>
<link rel="manifest" href="/manifest.json">
<style></style>
<script defer src="script.js"></script>
<meta charset="utf-8">
</head>
</html>
''')

Live input

SELECT
  page,
  httparchive.fn.CAPO(response_body) AS capo
FROM `httparchive.crawl.requests` TABLESAMPLE SYSTEM (0.001 PERCENT)
WHERE
  date = '2023-05-01' AND
  client = 'desktop' AND
  is_main_document
LIMIT 1