Description of Methodology
The lack of clarity around how metrics are actually measured, including their methodologies, advantages, and limitations, has led to immense confusion and has frequently stalled the adoption of measurements such as viewability. This opacity has allowed some publishers and agencies to each choose the vendor data that spins the best story, regardless of accuracy, and then fight over who's right. Instead, we believe the market should exchange value based on data designed for trust and transparency rather than spin emanating from a black box.
We've spent the last five years building the next generation of web metrics: our attention and engagement metrics. We want to be crystal clear about how they work and, more importantly, how you put them to work.
We hope that by sharing our methodology we will encourage all the other data companies to share their thinking and technologies as well. Let's build faster, better, trusted tools together.
This document is our Description of Methodology, from which future companies can build, knowing it gives them the best starting point in seeking accreditation on attention metrics such as time and viewability.
We believe that much of the internet is best understood by looking at engagement. Rather than forcing others to stumble while trying to replicate these already ingrained ideas, we want to make it easy to build on and improve this core concept.
There's a better internet out there and we hope that by working from this shared starting point we can build it together.
Engaged Time Window
Events Indicating Engagement
Listed with the event name, DOM target, and description:
|focus|window/document|Captures when a page is brought to the foreground either by tabbed browsing, or by bringing the browser window with the active page to the foreground.|
|scroll|window|Captures any time the window/document is scrolled by any means (arrow keys, scroll bar, mousewheel).|
|resize|window|Captures any time the browser window dimensions are changed.|
|mousemove|document.body|Captures mouse movement over the document body.|
|mousedown|document.body|Captures any time a mouse button is depressed in the document.body.|
|keydown|document.body|Captures any time a non-scroll keyboard key is pressed. Fires continuously if held down in almost every browser.|
Page load and page focus are treated as acts of initial engagement, representing either loading or returning to a page, respectively. Scroll, mousemove, mousedown, and keydown all serve as proxies for ongoing engagement as these actions capture ways the user can navigate or interact with the page. Resize is also treated as an act of engagement as the user is specifically manipulating their view of the page.
Note on What is not Tracked
Other possible events are ignored because they are redundant with the ones currently monitored and tracking them would impose unnecessary additional load on a page.
keyup is not needed when keydown fires continuously.
mouseup could fire absent a proximate mousedown or mousemove, such as when a user holds the mouse button down for a long period of time before releasing it. But Chartbeat believes such an event is an edge case and not worth the extra event listener.
pointer events are not included, as adding listeners for them can affect the presentation, functionality, and/or performance of client sites in iOS Safari.
keypress is redundant when listening to mousedown and keydown.
mousewheel could potentially fire when scroll does not, such as when scrolling the mousewheel over an un-scrollable element, but Chartbeat believes that this type of scenario is an atypical way to engage with a page, and the user is likely to immediately adjust their behavior in a way that triggers one of the other engaged events (scroll, mousemove, etc.). Additionally, Chartbeat has observed that mousemove is usually triggered simultaneously with mousewheel events.
Note on Mobile Activity
Essentially all mobile browsers fire desktop mouse events at the end of a touch tap to better accommodate legacy web pages that do not take advantage of the touch events. We do not listen to mobile-specific touch events as they can have negative effects on client sites.
In-Focus Browser Window
Chartbeat considers both whether a given browser tab is the active tab in that window ('Tab Focus') and whether the window is the active application ('Browser/Application Focus') before deciding that the page is visible.
By requiring both Tab Focus and Browser/Application Focus, Chartbeat does not consider some technically visible scenarios as viewable, such as a page in a window behind another window, but still on the screen, as there are inconsistencies in how these scenarios are measured.
Auto-Refresh and virtualPage
If a page is auto-refreshed while the page has both tab and application focus (see In-Focus Browser Window above), a viewable impression and active exposure time will likely be registered for any ads that are in view in the new session. During daily data processing, Chartbeat uses page referrer data to detect impressions generated via a full page refresh by checking to see if the referring URL matches the page URL.
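The referrer check described above can be sketched as follows. This is a minimal illustration, not Chartbeat's processing code: the function names and the normalization rules (case-insensitive scheme and host, trailing slash and fragment ignored) are assumptions made for the example.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to the parts compared here (an assumed normalization)."""
    parts = urlsplit(url.strip())
    # Treat "/story" and "/story/" as the same page; the fragment is dropped.
    path = parts.path.rstrip("/") or "/"
    return (parts.scheme.lower(), parts.netloc.lower(), path, parts.query)


def is_full_page_refresh(referrer, page_url):
    """Return True when the referring URL matches the page URL."""
    if not referrer:
        # No referrer recorded, so there is nothing to match against.
        return False
    return normalize(referrer) == normalize(page_url)
```

A stricter or looser normalization (for example, ignoring query parameters) would change which impressions are flagged, so that choice should be made deliberately.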
Chartbeat offers a public API method called virtualPage that enables publishers to signal a new page session even when there was no actual page load. This is used on AJAX layouts where scrolling or other interaction causes new content to be loaded dynamically into the page. A new URL is usually pushed and new ads are loaded, making this effectively a new page load, and Chartbeat handles it accordingly by resetting all aspects of its tracking. A virtualPage call therefore records viewable impressions and active exposure time for any ads that meet the criteria afterwards.
In the event that an ad unit becomes viewable or the campaign data within an ad changes, Chartbeat immediately sends a "force ping" back to its servers to confirm the event. A "force ping" is identical to a "standard ping"; it simply occurs outside the usual ping cycle, when an event of interest has occurred.
External: Chartbeat hooks into the beforeunload event for any users exiting the page (except via an internal anchor element click, which is captured by the internal mechanism detailed below) to attempt one final ping. Because this is an asynchronous event, it is not guaranteed to succeed, but Chartbeat has found that, when combined with the internal unload pinging, there is a 68% rate of success in capturing the final state of the pinger. Chartbeat does not attempt the ping in the unload event, as that would make exiting a page synchronous and in turn block the loading of the next page. This would contribute to users' perception of slowness across Chartbeat's client network and is an anti-pattern.
Internal: Chartbeat hooks into the unload event to store the current state of the pinger in the client for an hour after the user leaves a page. If the user returns to any page on that domain with Chartbeat's pinger within the hour their final state will be sent.
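The store-then-send-within-an-hour behavior of the internal mechanism can be sketched as below. This is only an illustration of the retention logic: a plain dictionary stands in for client-side storage, and the function names and the shape of the stored state are hypothetical.

```python
import time

STATE_TTL_SECONDS = 60 * 60  # the one-hour retention described above


def store_final_state(storage, domain, state, now=None):
    """Save the pinger's final state at unload, stamped with the current time."""
    now = time.time() if now is None else now
    storage[domain] = {"state": state, "stored_at": now}


def pop_pending_state(storage, domain, now=None):
    """On a later page load for the same domain, return the saved state if it
    is still within the retention period; otherwise return None.

    The entry is removed either way so it is sent at most once.
    """
    now = time.time() if now is None else now
    entry = storage.pop(domain, None)
    if entry is None:
        return None
    if now - entry["stored_at"] > STATE_TTL_SECONDS:
        return None  # the user came back too late; the state is discarded
    return entry["state"]
```

Passing `now` explicitly keeps the sketch deterministic for testing; a real client would read the clock at unload and at the next page load.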
As with any tracking philosophy, Chartbeat suffers from certain technological limitations that prevent complete measurement coverage. Below is a full list of weaknesses unique to Chartbeat, in addition to those shared by all analytics providers.
- If a user leaves a page in-between ping intervals, there is a chance that activity after the last ping will not be recorded. This will only affect time metrics as Chartbeat sends a Force Ping when viewable impressions occur.
- Chartbeat stores certain information on the client side in localStorage or cookies. If the user has disabled this type of storage, certain information cannot be persisted between sessions.
- Chartbeat may be specifically targeted in Do Not Track software, blocking script loading or functionality.
- Chartbeat infers that viewability measurements are taken of an ad appearing both a) in its container and b) in its intended form.
- While Chartbeat follows industry best practices, not all robot traffic can be identified and it is possible that a certain number of impressions are still triggered by non-humans.
- Due to inconsistencies in browser location detection Chartbeat assumes a browser is always fully on screen if not collapsed or otherwise disabled.
- Chartbeat does not have a system to identify or estimate unique audience metrics in situations where cookies cannot produce an accurate measure. Chartbeat has found that among US customers fewer than 0.1% of impressions occur with cookies disabled while fewer than 0.3% of impressions have cookies disabled among European customers.
General Measurement Limitations
- Chartbeat may not detect ad blocking software unless it affects the geometry, visibility, or labeling of the ad.
- Chartbeat relies on image beacons which can be disabled by a user resulting in no information being collected.
- Not all caching can be eliminated so it is possible that certain users will run old versions of Chartbeat code.
Engagement Detection Methodology
- The page must be in the in-focus tab in an in-focus browser window.
- The user must have exhibited some act of engagement in the past five seconds.
This five-second window was chosen so that we have a high degree of confidence that the reader is actively looking at the page. To derive this window, we performed a large behavioral experiment, the details of which are described below.
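As a rough illustration of how the two conditions above combine, the sketch below counts engaged time one second at a time. It assumes integer-second timestamps, that the caller records page load and page focus as interactions, and a hypothetical `focused_at` callback that reports whether the page had both tab and application focus at a given second. None of these names come from Chartbeat's code.

```python
ENGAGEMENT_WINDOW_SECONDS = 5


def engaged_seconds(duration, interaction_times, focused_at=lambda t: True,
                    window=ENGAGEMENT_WINDOW_SECONDS):
    """Count the seconds in 1..duration that qualify as engaged.

    A second t counts when the page is in focus at t and the most recent
    interaction at or before t happened no more than `window` seconds ago.
    """
    events = sorted(interaction_times)
    total = 0
    idx = 0
    last = None  # time of the most recent interaction seen so far
    for t in range(1, duration + 1):
        while idx < len(events) and events[idx] <= t:
            last = events[idx]
            idx += 1
        if last is not None and t - last <= window and focused_at(t):
            total += 1
    return total
```

For example, with interactions at seconds 0, 4, and 8 on a 20-second visit, seconds 1 through 13 count as engaged; once the last interaction is more than five seconds old, counting stops until the next interaction.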
The goal of this experiment was to determine a time window w such that we can be highly confident that a person with the page open and in focus who has interacted with their computer's console in the last w seconds is likely to be still looking at the page.
If we select too small a w, we risk declaring a visitor idle when she is, in fact, looking at a page, and hence undercounting the time that she spent engaging with the page and exposed to ads. On the other hand, if we select too large a w, we risk declaring a visitor engaged when she is in fact idle, resulting in overcounting the time spent engaging with the page and exposed to ads. We seek a w that balances these concerns: one that correctly marks the vast majority of visitors' true engaged time while avoiding overcounting. In practice, we prefer to undercount, such that time that is counted can be considered, roughly, guaranteed to have occurred.
We solicited a group of 150 participants and asked each participant to read one of three articles on a computer—participants were randomly assigned the article they read. One article was a news article, one a magazine-type article, and one a light blog post. Participants were instructed to read as much or as little of the article as they liked, and to read however they wanted, with two exceptions:
- That they spend their time continuously reading—i.e. not to go idle during their time on page.
- That they leave the page immediately after the completion of their reading.
The data thus collected constitutes a set of time series where we can determine the frequency of console interaction and the true amount of time spent reading. For example, in the figure above, the visitor interacted with the console nearly every second, with the notable exception of a roughly 8 second gap starting at second 22; total reading time was about 62 seconds. Across all participants, the vast majority of gaps were very small (1s or less) and the vast majority of time was spent with very small gaps in engagement. A histogram showing the distribution of the amount of time spent with various gaps in engagement can be seen below:
Given this dataset, we can define a hard optimization criterion for w: seek the minimal w such that, for each participant, at each second of their reading, a console interaction had occurred within the previous w seconds. This can be expressed as the following optimization problem:

$$\min w \quad \text{subject to} \quad t - i_t \le w \quad \text{for all } p \in P \text{ and all } t \in \{1, \dots, t_p\}$$

where P is the set of participants, t_p is the total amount of time participant p spent on the page, and i_t is the most recent console interaction before time t.
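Under the hard criterion, the minimal w is simply the largest gap between any second of reading and the interaction that preceded it, taken over all participants. The sketch below shows that computation under stated assumptions: timestamps are integer seconds, each participant is given as a hypothetical `(total_seconds, interaction_times)` pair, and page load at second 0 counts as the first interaction.

```python
def max_gap(total_seconds, interaction_times):
    """Largest value of t - i_t over t = 1..total_seconds for one participant."""
    events = sorted(interaction_times)
    worst = 0
    idx = 0
    last = 0  # assumption: page load at t = 0 counts as an interaction
    for t in range(1, total_seconds + 1):
        while idx < len(events) and events[idx] <= t:
            last = events[idx]
            idx += 1
        worst = max(worst, t - last)
    return worst


def hard_window(participants):
    """Minimal w with t - i_t <= w for every participant at every second."""
    return max(max_gap(total, events) for total, events in participants)
```

Because a single long pause by a single participant sets the result, this criterion is very sensitive to outliers.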
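To handle outliers, we can express a relaxed optimization problem, which finds the minimal w such that at least 95% of total time was spent with gaps of size w or less for at least 95% of participants:

$$\min w \quad \text{subject to} \quad \frac{1}{t_p} \sum_{t=1}^{t_p} \mathbf{1}\left[\, t - i_t \le w \,\right] \ge 0.95$$

for at least 95% of people p in set P.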
The minimum of the relaxed optimization is reached at w = 4.8 seconds. That is, for at least 95% of participants, at least 95% of their active reading time was spent while having interacted with the console of their computer in the past 4.8 seconds, and 4.8 seconds is the smallest window such that this is true (i.e. it is the window that has the least risk of overcounting).
In practice, because we record time in integer increments, we round this window to 5 seconds.
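The relaxed criterion can be computed as a percentile of percentiles: for each participant, take the smallest window covering 95% of their seconds, then take the smallest window that suffices for 95% of participants. The sketch below is one way to do this and is not the analysis code used in the experiment. It assumes integer-second sampling (so it returns whole seconds rather than a value such as 4.8), participants given as hypothetical `(total_seconds, interaction_times)` pairs with at least one second each, and page load at second 0 counting as an interaction.

```python
def gaps_per_second(total_seconds, interaction_times):
    """For each second t = 1..total_seconds, the time since the last interaction."""
    events = sorted(interaction_times)
    gaps = []
    idx = 0
    last = 0  # assumption: page load at t = 0 counts as an interaction
    for t in range(1, total_seconds + 1):
        while idx < len(events) and events[idx] <= t:
            last = events[idx]
            idx += 1
        gaps.append(t - last)
    return gaps


def smallest_covering(values, percent):
    """Smallest value v such that at least `percent`% of values are <= v."""
    ordered = sorted(values)
    needed = (percent * len(ordered) + 99) // 100  # integer ceiling
    return ordered[needed - 1]


def relaxed_window(participants, time_percent=95, people_percent=95):
    """Minimal w covering time_percent of time for people_percent of participants."""
    per_person = [
        smallest_covering(gaps_per_second(total, events), time_percent)
        for total, events in participants
    ]
    return smallest_covering(per_person, people_percent)
```

Integer percentages are used instead of floating-point fractions so that the ceiling calculation is exact.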