This blog post is geared toward novice Google Analytics users who feel overwhelmed by concepts like filters, segments and custom reports. My hope is that by the end of this post, I will have provided enough context on these concepts to give you the courage to use these features in your day-to-day work.
The “Filter” feature in Google Analytics is used to limit or modify collected tracking data before it is processed and made available in a view's reporting interface or through the API. It is the first line of defense against dirty data, both malicious (spam) and accidental, and also a way to shape the presentation of your data to your liking. I can’t stress enough the importance of fixing problems at their source rather than manipulating data after it is stored, and filters are Google Analytics’ way of doing just that. You can choose from a large variety of reportable fields and either include only, or exclude, the data that matches the pattern you give the filter. This simple approach to cleaning data gives you easy control over your data collection environment, which means more time spent on reporting and less time worrying about the integrity of the data being processed at all hours of the day.
If you can’t immediately think of a use case for this feature based on my description, consider the following scenarios; I’m sure you will see how useful filters can be.
Imagine that your website has seen an unusually high (outlier) amount of traffic isolated to one specific city, which happens to be the same city in which your business operates. Furthermore, you just launched a page that hasn’t been indexed by search engines, yet it is rivaling long-established pages for the most organic pageviews. This is a clear red flag that a lot of the traffic you are analyzing is coming from internal users (i.e. people within your company). The most reliable way to get rid of this traffic is to add an “Exclude” filter to your view that excludes your office’s IP address.
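As a rough illustration of what an “Exclude” filter does, the Python sketch below drops every hit whose IP address matches an office pattern. The office range (203.0.113.0 to 203.0.113.255, a block reserved for documentation) and the hit records are hypothetical; the regex is simply the kind of pattern you might enter in a custom Exclude filter on the IP address field, so check your own office’s addresses before using anything like it.

```python
import re

# Hypothetical office range: 203.0.113.0-203.0.113.255 (a documentation-only block).
# A regex like this is the kind of pattern you might give an Exclude filter
# on the IP address field.
OFFICE_IP_PATTERN = re.compile(r"^203\.0\.113\.\d{1,3}$")


def exclude_internal(hits):
    """Drop hits whose IP matches the office pattern, as an Exclude filter would."""
    return [hit for hit in hits if not OFFICE_IP_PATTERN.match(hit["ip"])]


# Hypothetical hits reaching the view.
hits = [
    {"ip": "203.0.113.42", "page": "/new-launch"},   # internal visitor
    {"ip": "198.51.100.7", "page": "/new-launch"},   # external visitor
    {"ip": "203.0.113.200", "page": "/pet-stores"},  # internal visitor
]

kept = exclude_internal(hits)
print([hit["ip"] for hit in kept])  # ['198.51.100.7']
```

Only the external visitor survives, so the unindexed page’s pageviews would no longer be inflated by your own staff.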
In another scenario, you are fed up with cleaning up URL paths that contain hashes (#) and query parameters (?). Whenever you want a distinct count for "/pet-stores", you have to export the page report and clean the rows that contain hashes (/pet-stores#dogs) and query parameters (/pet-stores?internal=true) to get a true read on how much traffic this page has organically generated. Instead of repeating this process each time you pull the report, you can use a “Search and Replace” filter with a regular expression that matches hashes and query parameters and strips them from every page path.
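Here is a minimal Python sketch of that clean-up, assuming a custom filter on the Request URI field that searches for "[?#].*" and replaces it with an empty string. The page rows and pageview counts are made up; the point is that once the suffixes are stripped, all the variants roll up into a single "/pet-stores" row.

```python
import re
from collections import defaultdict

# Matches the first "?" or "#" and everything after it, i.e. the query string
# and/or hash fragment. Replacing the match with "" leaves just the page path.
STRIP_SUFFIX = re.compile(r"[?#].*")


def clean_path(path):
    """Strip query parameters and hashes from a page path."""
    return STRIP_SUFFIX.sub("", path)


# Hypothetical rows from an exported page report: (page, pageviews).
rows = [
    ("/pet-stores", 120),
    ("/pet-stores#dogs", 15),
    ("/pet-stores?internal=true", 5),
    ("/about?ref=nav#team", 40),
]

# Roll up pageviews by the cleaned path.
pageviews_by_page = defaultdict(int)
for page, views in rows:
    pageviews_by_page[clean_path(page)] += views

print(dict(pageviews_by_page))  # {'/pet-stores': 140, '/about': 40}
```

Because a filter applies this rewrite before the data is processed, the report itself would show one "/pet-stores" row, with no export-and-clean step needed.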
As you can see from the examples above, your Google Analytics data very likely contains issues that challenge its integrity and that could easily be alleviated with filters.
Having pre-built reports at your disposal makes it easy to coast through the Google Analytics interface and get a general understanding of what is happening on your website, but to dig deeper into the data you will have to make use of “Custom Reports”. Custom reporting extends GA’s primary reporting interface by letting users generate reports based on business-related criteria. Instead of relying on the dimension/metric combinations provided in the default reports, users can mix and match whatever dimensions and metrics they please. This feature also allows users to create flat tables or drillable reports (the same format as the built-in reports) with up to five dimensions. And if that isn’t enough to sell the feature, the ability to filter on fields that are not present in the report itself should settle the decision to start using custom reports. No built-in report comes anywhere near this kind of flexibility.
Note: I highly recommend that you understand dimension/metric scopes before you start building custom reports. Not every combination you can build is meaningful, and mixing dimensions and metrics from incompatible scopes can return data that is incomprehensible. For more information: Google Analytics Scope
Perhaps the most difficult feature and analytics concept for a novice user to grasp is the “Segment”. In general, and within the context of this feature, a segment is a subset of the analytics data: the portion of the data set that matches attributes defined by the analyst. For example, if I asked, “How much of your overall website traffic is coming from a Google AdWords campaign?”, then in technical terms I am asking to see the segment of your website traffic that is AdWords traffic. Isolating that traffic helps me better understand how AdWords users interact with the website compared to other visitors.
Simple, right? It certainly sounds that way in general terms, but in the context of Google Analytics there is one nuance that makes this request a little more complicated: whether you segment based on users or on sessions. Depending on which option is selected, the data being displayed can tell completely different stories.
Using a “users” segment, you are saying: within the data being presented, include any user who has at least one session in which they arrived at the website through an AdWords ad. Those users may also have visited organically or through other channels within the same time frame, and those sessions are included too; the only requirement is that at least one of their sessions started from an AdWords ad.
Using a “sessions” segment, you focus on individual sessions and isolate only those that came from Google AdWords. The same user described above may have visited the website on multiple occasions, but only the sessions that started from an AdWords ad are included; every session that does not match this criterion is removed.
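The difference is easier to see with a few sessions laid out side by side. In this minimal Python sketch the session log is hypothetical, and AdWords traffic is simplified to sessions whose source / medium is "google / cpc" (what auto-tagged AdWords clicks typically report as). A sessions segment keeps only matching sessions, while a users segment keeps every session belonging to a user who has at least one matching session.

```python
# Hypothetical session log: (user_id, session_id, source / medium).
SESSIONS = [
    ("u1", "s1", "google / cpc"),
    ("u1", "s2", "google / organic"),
    ("u2", "s3", "google / organic"),
    ("u3", "s4", "google / cpc"),
    ("u3", "s5", "google / cpc"),
]


def is_adwords(source_medium):
    """Simplifying assumption: AdWords sessions report as google / cpc."""
    return source_medium == "google / cpc"


def sessions_segment(sessions):
    """Keep only the sessions that themselves came from AdWords."""
    return [s for s in sessions if is_adwords(s[2])]


def users_segment(sessions):
    """Keep every session of any user with at least one AdWords session."""
    adwords_users = {user for user, _, source in sessions if is_adwords(source)}
    return [s for s in sessions if s[0] in adwords_users]


print([sid for _, sid, _ in sessions_segment(SESSIONS)])  # ['s1', 's4', 's5']
print([sid for _, sid, _ in users_segment(SESSIONS)])     # ['s1', 's2', 's4', 's5']
```

User u1’s organic session s2 survives the users segment but not the sessions segment, which is exactly why the two options can tell different stories about the same traffic.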
This should give you a better understanding of the difference between the two segment types, but if it doesn’t, I suggest taking a look at an older post I wrote that dives into examples of the differences: Segments - Users vs Sessions