January 18, 2020
Google Analytics Platform Principles – Lesson 4.4 Report sampling

Report sampling is an analytics practice that generates reports based on a small, random subset of your data instead of using all of your available data. Sampling lets programs, including Google Analytics, calculate the data for your reports faster than if every piece of data were included during the generation process.

During processing, Google Analytics prepares the data for your standard reports by precalculating it and then storing it in aggregate tables. This lets Google Analytics quickly retrieve the data you request without sampling.

However, there might be times when you want to modify one of the standard reports in Google Analytics by adding a segment, secondary dimension, or another customization. Or, you might want to create a custom report with a completely new combination of dimensions and metrics. When you make any of these kinds of custom requests, either through the reporting interface or the reporting APIs, Google Analytics inspects the set of aggregate tables to see if the request can be met using data that’s already processed and stored in the tables. If it can’t, Google Analytics goes back to the raw session data to process your request on the fly.

When this happens, Google Analytics checks how many sessions should be included in your request. If the number of sessions is small enough, Google Analytics can calculate the data for your request using all of the sessions. If the number of sessions is too large, Google Analytics uses a sample to fulfill the request. For example, let’s say you create a Custom Report with the dimensions City and Campaign and the metrics Sessions and Conversion Rate. This combination of metrics and dimensions is not already precalculated in any of the aggregate tables. So, if you choose a date range for the report that includes a very large number of sessions, your report will be calculated from a sampled set of data.

The number of sessions used to calculate the report is called the “sample size.” You can adjust the sample size using a control in the reporting interface or by specifying the size when you query the API. If you increase the sample size, you’ll include more sessions in your calculation, but it’ll take longer to generate your report. If you decrease the sample size, you’ll include fewer sessions in your calculation, but your report will be generated faster.

Google Analytics sets a maximum number of sessions that can be used to calculate your reports. If you go over that limit, your data gets sampled. One way to stay below the limit is to shorten the date range in your report, which reduces the number of sessions Google Analytics needs to process for your request. Google Analytics Premium also offers an Unsampled Reporting feature that will pull unsampled data for custom requests, even for large reports that exceed the sampling limit.

Session sampling is an effective way to reduce latency while maintaining a high level of accuracy for your reports. It helps Google Analytics process your custom data requests efficiently, so you get timely answers to your business questions.
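
The request flow described in the lesson (check the aggregate tables first, fall back to raw sessions, and sample only when there are too many of them) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Google Analytics is actually implemented: the names `AGGREGATE_TABLES`, `MAX_SESSIONS`, and `answer_request` are hypothetical, the threshold value is arbitrary, and the sessions are made-up test data.

```python
import random
from collections import defaultdict

MAX_SESSIONS = 1_000  # hypothetical threshold; the real limit is set by Google Analytics

# Hypothetical precalculated tables, keyed by (dimensions, metrics).
AGGREGATE_TABLES = {
    (("city",), ("sessions",)): {("London",): {"sessions": 1200}},
}


def answer_request(dimensions, metrics, raw_sessions, sample_size=MAX_SESSIONS, seed=0):
    """Serve from an aggregate table if one matches, else compute from raw sessions."""
    key = (tuple(dimensions), tuple(metrics))
    if key in AGGREGATE_TABLES:
        # Already precalculated: no sampling needed.
        return {"sampled": False, "rows": AGGREGATE_TABLES[key]}

    # No matching table: go back to the raw sessions and sample if there are too many.
    sampled = len(raw_sessions) > sample_size
    used = random.Random(seed).sample(raw_sessions, sample_size) if sampled else raw_sessions
    scale = len(raw_sessions) / len(used) if used else 1.0

    counts = defaultdict(lambda: {"sessions": 0, "conversions": 0})
    for session in used:
        row = counts[tuple(session[d] for d in dimensions)]
        row["sessions"] += 1
        row["conversions"] += session["converted"]

    rows = {}
    for group, c in counts.items():
        rows[group] = {
            # Counts are scaled back up to estimate the full total.
            "sessions": round(c["sessions"] * scale),
            # Rates are ratios, so they are taken from the sample as-is.
            "conversion_rate": c["conversions"] / c["sessions"],
        }
    return {"sampled": sampled, "rows": rows}


# Made-up sessions for the City x Campaign example from the lesson.
rng = random.Random(42)
sessions = [
    {
        "city": rng.choice(["London", "Paris"]),
        "campaign": rng.choice(["spring", "summer"]),
        "converted": rng.random() < 0.1,
    }
    for _ in range(5000)
]

report = answer_request(["city", "campaign"], ["sessions", "conversion_rate"], sessions)
small_report = answer_request(["city", "campaign"], ["sessions", "conversion_rate"], sessions[:500])
print(report["sampled"], small_report["sampled"])
```

Raising `sample_size` in this sketch means more sessions are read and the estimates get closer to the full-data values, at the cost of more work per request, which mirrors the trade-off the lesson describes.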
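
The advice to shorten the date range can also be applied mechanically: split a long range into consecutive windows that each stay under the session limit, then request each window separately. The sketch below assumes you already know the number of sessions per day; `split_date_range` and `max_sessions` are hypothetical names, and the limit used in the example is arbitrary rather than the real Google Analytics value.

```python
from datetime import date, timedelta


def split_date_range(daily_sessions, max_sessions):
    """Greedily group consecutive days so each window's total is at most max_sessions.

    daily_sessions: list of (date, session_count) pairs in chronological order.
    A single day that exceeds the limit on its own still becomes a window
    (and a request for it would still be sampled).
    Returns a list of (start_date, end_date, total_sessions) tuples.
    """
    windows = []
    start = end = None
    total = 0
    for day, count in daily_sessions:
        # Close the current window if adding this day would cross the limit.
        if start is not None and total + count > max_sessions:
            windows.append((start, end, total))
            start, total = None, 0
        if start is None:
            start = day
        end = day
        total += count
    if start is not None:
        windows.append((start, end, total))
    return windows


# One week with 400 sessions per day and a hypothetical limit of 1,000.
week = [(date(2020, 1, 1) + timedelta(days=i), 400) for i in range(7)]
windows = split_date_range(week, max_sessions=1000)
for start, end, total in windows:
    print(start, end, total)
```

Results from separate windows can be summed for additive counts such as sessions; ratios such as conversion rate need to be recomputed from the summed numerators and denominators rather than averaged.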
