Integrating Google Analytics with BigQuery is an extremely powerful solution for analyzing user behavior in a mobile or web application. In this article, we will give you a brief introduction into both platforms, explain how to connect and use them, and more.
A quick intro to Google Analytics for developers
As it says on the official website, Google Analytics is a free, unlimited app measurement solution that provides insight on app usage and user engagement, which enables you to optimize your app accordingly. Analytics integrates across Firebase features and provides you with unlimited reporting for up to 500 distinct events that you can define using the Firebase SDK.
From the mobile side of developing, applications should be connected and configured on a Firebase project. All requests which will record data to Google Analytics for Firebase will be called by using the Analytics SDK in native (Android or iOS) or Flutter project. Every possible event can be tracked – every screen opening, every user interaction inside the app. Every event can have custom data and attributes to provide the maximum amount of data that needs to be collected.
public void trackUserClickOnPush(String pushClickState, String pushTag) {
Bundle bundle = new Bundle();
bundle.putString(ATTR_LABEL, pushClickState);
bundle.putString(ATTR_ACTION, EVENT_PUSH_CLICK);
bundle.putString(ATTR_CONTENT_TYPE, pushTag);
mFirebaseAnalytics.logEvent(FirebaseAnalytics.Event.SELECT_CONTENT, bundle);
}
What is BigQuery?
BigQuery is a member of the Google Cloud Platform. It is Google’s serverless data warehouse which is an extremely powerful tool for analyzing massive sets of data. It is highly scalable, serverless, and it integrates with the most popular data visualization tools – for example, Data Studio, Looker, or Tableau.
BigQuery is designed to query massive datasets using an SQL-like language. Google Analytics users found BigQuery interesting (especially big corporations which have some special requirements) because data from Google Analytics can be dumped into BigQuery daily.
Why integrate BigQuery with Google Analytics
BigQuery receives raw sets of data which can be analyzed, filtered and processed much better and with more accuracy than in Google Analytics itself. It also provides many types of analysis that you can’t find in other analytics platforms.
One of the most significant benefits of BigQuery is that it is easy to set up and use. Data from Google Analytics is collected automatically, and you just pay for what you use.
It also solves the scaling problem because all scaling work is done for you. BigQuery separates storing and computing data using a process which enables elastic scaling, which helps you scale at a higher performance rate.
With collected data, you can analyze how users interact with specific parts of the app on a very granular level (every second, filtering by dimensions). Applying the right data mining algorithms on the collected raw data can help you discover patterns of user’s behavior that are really hard to notice in standard reporting.
Example of BigQuery integration with Google Analytics
Google Analytics has a method setUserId which defines a unique user ID. It allows saving the user ID for the individual using the application. It is generally used when we want to use Analytics together with BigQuery in order to associate analytics data for the same user across multiple applications, multiple devices or multiple analytics providers.
To illustrate this further, let’s use a specific example for which we proposed a BigQuery integration:
We worked on a project where we needed to track all screen openings (around 30 screens) and roughly around 90 additional events – users clicks, changes and some other settings data. The customer wanted to track all those events with the possibility to view and analyze collected data by users, platforms (we had iOS and Android mobile application and web) and some other parameters.
As developers, we were also responsible to ensure that used user IDs are in accordance with Google Analytics Terms of Service. This includes avoiding the use of impermissible personally identifiable information and providing appropriate notice of the usage of identifiers in application’s Privacy Policy. User ID must not contain information that a third party could use to determine the identity of the user. For example, a user’s email address or social security number can not be used as a user ID.
On this project, every user had a unique number which was used to identify every registered user. We used that unique user’s number, but we have hashed it before setting it to the user ID. In that way, we ensured that nobody knew and could not find out the user’s personal data from events that were tracked.
mFirebaseAnalytics.setUserId(hashedUserNumber);
After setting a user ID, all future events are automatically tagged with this value, and all collected events can be accessed by querying for the user_id value in BigQuery. Adding a user ID will not affect any events previously recorded by Google Analytics.
Linking other data sources with BigQuery
The BigQuery Data Transfer Service allows transfers of data from external sources automatically. With BigQuery Data Transfer Service, data from multiple sources can be taken and imported on a scheduled and fully managed basis. These platforms are supported:
- Google Marketing Platform
- Google Ads
- YouTube
- Amazon S3
- Teradata
- Partner Saas applications to BigQuery
Because of that feature, BigQuery is a great tool to merge data from different tools into one place in order to collect and analyze all combined data.
How to connect a Firebase project to BigQuery
Linking the Firebase project to BigQuery gives the opportunity to access raw, unsampled event data along with all collected parameters and user properties. When the Firebase is connected to BigQuery, there is an option to select which data will be exported on a daily basis to BigQuery. There are Google Analytics for Firebase, Crashlytics, Predictions, Cloud Messaging and Performance Monitoring data. All of selected data will be exported to corresponding datasets in BigQuery.
In order to access, modify or analyze the data, users must have permission and a given role to the project. There are three types of roles in access control:
- Basic roles represent the access control and they are: owner, editor and viewer.
- Predefined roles provide granular access for a specific service and are managed by Google Cloud. They are used to support common use cases and access control patterns.
- Custom roles represent granular access according to a user-specified list of permissions.
To grant access to a BigQuery resource, one or more roles should be assigned to a user, group, or service account. There are three BigQuery resource level:
- organization or Google Cloud project level
- dataset level
- table or view level
How much does BigQuery cost?
BigQuery offers scalable, flexible pricing options to meet your technical needs and your budget. You only pay for the resources you use. Whether it is storage or computing resources, you will be charged solely based on how much of the tool you use. Copying and exporting data comes at no charge. You will be charged separately for storage and streaming inserts.
Storage costs are based on the amount of data stored in BigQuery. Storage charges can be:
- Active – A monthly charge for data stored in tables or in partitions that have been modified in the last 90 days.
- Long-term – A lower monthly charge for data stored in tables or in partitions that have not been modified in the last 90 days.
For query costs, you can choose between two pricing models:
- On-demand – This is the most flexible option. On-demand pricing is based on the amount of data processed by each query you run.
- Flat-rate – This pricing option is best for customers who desire cost predictability. Flat-rate customers purchase dedicated resources for query processing and are not charged for individual queries.
Every created project has a billing account attached to it. Any charges incurred by BigQuery jobs run in the project are billed to the attached billing account. BigQuery storage charges are also billed to the attached billing account.
All costs from BigQuery can be seen with the Cloud Billing reports page in the Cloud Console. Also, inside Cloud Console cost trends by period are also visible.
BigQuery has operations that are free of charge in every location, some of them are:
- Loading data
- Copying data
- Exporting data
- Deleting datasets
- Deleting tables, views, partitions, and functions
- Metadata operations
- Reading pseudo columns
- Reading meta tables
- Creating, replacing, or calling UDFs
If you want to find out more about the costs, you can check the official BigQuery pricing documentation.
Conclusion – Why BigQuery is a great solution for data analysis
To sum up, the main reasons why we recommend using BigQuery for analyzing data collected with Google Analytics are:
- It’s easy to set up
- Data is protected
- You can combine different scopes in one report
- You can use an unlimited amount of dimensions
- You can add third-party data sources
- You can calculate goal completions, and build your own Channel Grouping
- It has an affordable billing plan
Unlike Google Analytics, BigQuery has the capability to allow not only session-based information, but user specific details. All previous behavior of the user in earlier sessions is visible and accessible. This is great for e-commerce companies, but also for all other companies that need similar result format from analytics data.
BigQuery is hosted on Google’s cloud-based server architecture (Google Cloud Platform) which is extremely fast – queries on billions of rows of data are done in seconds. There is no setup required, just need to connect with Google Analytics and all data is transferred to BigQuery. Price is acceptable and realistic to the service it provides.
Have some thoughts on the topic? If you need help with analyzing your own application data using Google Analytics and BigQuery, feel free to contact us.