Technology struct
Appears in: pages table
As: technologies
Technologies are detected by Wappalyzer. Refer to HTTP Archive’s fork of the Wappalyzer repository on GitHub to request a new technology detection or to browse the source code of existing detections.
Schema
Field name | Type | Description |
---|---|---|
technology | STRING | Name of the detected technology |
categories | ARRAY<STRING> | List of categories to which this technology belongs |
info | ARRAY<STRING> | Additional metadata about the detected technology, e.g. version number |
technology
Type: STRING
Name of the detected technology
categories
Type: ARRAY<STRING>
List of categories to which this technology belongs
info
Type: ARRAY<STRING>
Additional metadata about the detected technology, e.g. version number
Example queries
Pages using WordPress in the top 5k
Because the technologies field is a repeated struct, we need to use UNNEST to query it.
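A minimal sketch of such a query follows. It assumes the pages table is addressable as `httparchive.crawl.pages` and exposes date, client, rank, is_root_page, and page columns. The table path, crawl date, and client value are illustrative assumptions rather than values taken from this page.

```sql
-- Sketch: root pages in the top 5k that were detected as using WordPress.
-- Assumptions: the table path, crawl date, and client below are illustrative.
SELECT
  page
FROM
  `httparchive.crawl.pages`,
  UNNEST(technologies) AS t  -- yields one row per detected technology on each page
WHERE
  date = '2024-06-01' AND
  client = 'mobile' AND
  is_root_page AND
  rank <= 5000 AND
  t.technology = 'WordPress'
```

Filtering on date and client keeps the scan to a single crawl, which matters for query cost on a table of this size.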
Top 10 CMSs
Within the technologies field, the categories field is also repeated. We can use UNNEST to query it as well.
It’s straightforward to detect whether a single page uses a technology. However, to generalize that to an entire website (or origin), we check whether either its root page or a secondary page uses it. To handle this in the query, we count the distinct root_page values of the matching pages, so each site is counted once no matter how many of its pages match.
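The approach described above might look like the following sketch. It assumes the same `httparchive.crawl.pages` table path, crawl date, and client as illustrative values, and assumes the category label is the string 'CMS'.

```sql
-- Sketch: the 10 most common CMSs, counted per site rather than per page.
-- Assumptions: table path, crawl date, client, and the 'CMS' label are illustrative.
SELECT
  t.technology,
  COUNT(DISTINCT root_page) AS sites  -- one count per site, even if several pages match
FROM
  `httparchive.crawl.pages`,
  UNNEST(technologies) AS t,
  UNNEST(t.categories) AS category    -- second UNNEST for the nested repeated field
WHERE
  date = '2024-06-01' AND
  client = 'mobile' AND
  category = 'CMS'
GROUP BY
  t.technology
ORDER BY
  sites DESC
LIMIT 10
```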
Top 5 WordPress versions
There is usually only one version of a technology on a given page, but in some cases a page uses the same technology more than once. For example, multiple widgets may each load a different version of jQuery.
To account for these edge cases, the info field is also repeated, so we need to use UNNEST to query it as well.
Also note that some pages omit version numbers, so you may see empty or null values in the results.
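A sketch of a version breakdown under the same assumptions (illustrative table path, crawl date, and client) could be:

```sql
-- Sketch: the 5 most common WordPress versions, counted per site.
-- Assumptions: the table path, crawl date, and client below are illustrative.
SELECT
  version,
  COUNT(DISTINCT root_page) AS sites
FROM
  `httparchive.crawl.pages`,
  UNNEST(technologies) AS t,
  UNNEST(t.info) AS version  -- one row per info entry for each detection
WHERE
  date = '2024-06-01' AND
  client = 'mobile' AND
  t.technology = 'WordPress'
GROUP BY
  version
ORDER BY
  sites DESC
LIMIT 5
```

Note that a comma join against UNNEST drops detections whose info array is empty. If those rows should be kept, replacing the last comma join with LEFT JOIN UNNEST(t.info) AS version is one option.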
Regular expressions can be used to parse major version numbers, for example REGEXP_EXTRACT(version, r'^(\d+)'). Beware of garbage values, as the version info is extracted from the source HTML. For example, you may encounter a subset of pages with a version number that hasn’t even been released yet.
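As a sketch, the major-version extraction can be combined with the version query. The table path, crawl date, and client are again illustrative assumptions.

```sql
-- Sketch: WordPress sites grouped by major version.
-- Assumptions: the table path, crawl date, and client below are illustrative.
SELECT
  REGEXP_EXTRACT(version, r'^(\d+)') AS major_version,  -- NULL when there are no leading digits
  COUNT(DISTINCT root_page) AS sites
FROM
  `httparchive.crawl.pages`,
  UNNEST(technologies) AS t,
  UNNEST(t.info) AS version
WHERE
  date = '2024-06-01' AND
  client = 'mobile' AND
  t.technology = 'WordPress'
GROUP BY
  major_version
ORDER BY
  sites DESC
```

Rows with an empty or non-numeric version string fall into the NULL bucket. Implausibly high major versions in the output are likely the garbage values mentioned above.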