--- stage: Growth group: Product Intelligence info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments --- # Snowplow Guide This guide provides an overview of how Snowplow works, and implementation details. For more information about Product Intelligence, see: - [Product Intelligence Guide](https://about.gitlab.com/handbook/product/product-intelligence-guide/) - [Service Ping Guide](../service_ping/index.md) More useful links: - [Product Intelligence Direction](https://about.gitlab.com/direction/product-intelligence/) - [Data Analysis Process](https://about.gitlab.com/handbook/business-technology/data-team/#data-analysis-process/) - [Data for Product Managers](https://about.gitlab.com/handbook/business-technology/data-team/programs/data-for-product-managers/) - [Data Infrastructure](https://about.gitlab.com/handbook/business-technology/data-team/platform/infrastructure/) ## What is Snowplow Snowplow is an enterprise-grade marketing and Product Intelligence platform which helps track the way users engage with our website and application. [Snowplow](https://snowplowanalytics.com) consists of the following loosely-coupled sub-systems: - **Trackers** fire Snowplow events. Snowplow has 12 trackers, covering web, mobile, desktop, server, and IoT. - **Collectors** receive Snowplow events from trackers. We have three different event collectors, synchronizing events either to Amazon S3, Apache Kafka, or Amazon Kinesis. - **Enrich** cleans up the raw Snowplow events, enriches them and puts them into storage. We have an Hadoop-based enrichment process, and a Kinesis-based or Kafka-based process. - **Storage** is where the Snowplow events live. We store the Snowplow events in a flat file structure on S3, and in the Redshift and PostgreSQL databases. - **Data modeling** is where event-level data is joined with other data sets and aggregated into smaller data sets, and business logic is applied. This produces a clean set of tables which make it easier to perform analysis on the data. We have data models for Redshift and Looker. - **Analytics** are performed on the Snowplow events or on the aggregate tables. ![snowplow_flow](../img/snowplow_flow.png) ### Useful links - [Understanding the structure of Snowplow data](https://docs.snowplowanalytics.com/docs/understanding-your-pipeline/canonical-event/) - [Our Iglu schema registry](https://gitlab.com/gitlab-org/iglu) - [List of events used in our codebase (Event Dictionary)](dictionary.md) ## Enable Snowplow tracking Tracking can be enabled at: - The instance level, which enables tracking on both the frontend and backend layers. - The user level, though user tracking can be disabled on a per-user basis. GitLab respects the [Do Not Track](https://www.eff.org/issues/do-not-track) standard, so any user who has enabled the Do Not Track option in their browser is not tracked at a user level. Snowplow tracking is enabled on GitLab.com, and we use it for most of our tracking strategy. To enable Snowplow tracking on a self-managed instance: 1. On the top bar, select **Menu > Admin**, then select **Settings > General**. Alternatively, go to `admin/application_settings/general` in your browser. 1. Expand **Snowplow**. 1. Select **Enable Snowplow tracking** and enter your Snowplow configuration information. For example: | Name | Value | |--------------------|-------------------------------| | Collector hostname | `your-snowplow-collector.net` | | App ID | `gitlab` | | Cookie domain | `.your-gitlab-instance.com` | 1. Select **Save changes**. ## Snowplow request flow The following example shows a basic request/response flow between the following components: - Snowplow JS / Ruby Trackers on GitLab.com - [GitLab.com Snowplow Collector](https://gitlab.com/gitlab-com/gl-infra/readiness/-/blob/master/library/snowplow/index.md) - The GitLab S3 Bucket - The GitLab Snowflake Data Warehouse - Sisense: ```mermaid sequenceDiagram participant Snowplow JS (Frontend) participant Snowplow Ruby (Backend) participant GitLab.com Snowplow Collector participant S3 Bucket participant Snowflake DW participant Sisense Dashboards Snowplow JS (Frontend) ->> GitLab.com Snowplow Collector: FE Tracking event Snowplow Ruby (Backend) ->> GitLab.com Snowplow Collector: BE Tracking event loop Process using Kinesis Stream GitLab.com Snowplow Collector ->> GitLab.com Snowplow Collector: Log raw events GitLab.com Snowplow Collector ->> GitLab.com Snowplow Collector: Enrich events GitLab.com Snowplow Collector ->> GitLab.com Snowplow Collector: Write to disk end GitLab.com Snowplow Collector ->> S3 Bucket: Kinesis Firehose S3 Bucket->>Snowflake DW: Import data Snowflake DW->>Snowflake DW: Transform data using dbt Snowflake DW->>Sisense Dashboards: Data available for querying ``` ## Structured event taxonomy When adding new click events, we should add them in a way that's internally consistent. If we don't, it is difficult to perform analysis across features because each feature captures events differently. The current method provides several attributes that are sent on each click event. Please try to follow these guidelines when specifying events to capture: | attribute | type | required | description | | --------- | ------- | -------- | ----------- | | category | text | true | The page or backend area of the application. Unless infeasible, please use the Rails page attribute by default in the frontend, and namespace + class name on the backend. | | action | text | true | The action the user is taking, or aspect that's being instrumented. The first word should always describe the action or aspect: clicks should be `click`, activations should be `activate`, creations should be `create`, etc. Use underscores to describe what was acted on; for example, activating a form field would be `activate_form_input`. An interface action like clicking on a dropdown would be `click_dropdown`, while a behavior like creating a project record from the backend would be `create_project` | | label | text | false | The specific element or object to act on. This can be one of the following: the label of the element (for example, a tab labeled 'Create from template' for `create_from_template`), a unique identifier if no text is available (for example, `groups_dropdown_close` for closing the Groups dropdown in the top bar), or the name or title attribute of a record being created. | | property | text | false | Any additional property of the element, or object being acted on. | | value | decimal | false | Describes a numeric value or something directly related to the event. This could be the value of an input. For example, `10` when clicking `internal` visibility. | ### Examples | category* | label | action | property** | value | |-------------|------------------|-----------------------|----------|:-----:| | `[root:index]` | `main_navigation` | `click_navigation_link` | `[link_label]` | - | | `[groups:boards:show]` | `toggle_swimlanes` | `click_toggle_button` | - | `[is_active]` | | `[projects:registry:index]` | `registry_delete` | `click_button` | - | - | | `[projects:registry:index]` | `registry_delete` | `confirm_deletion` | - | - | | `[projects:blob:show]` | `congratulate_first_pipeline` | `click_button` | `[human_access]` | - | | `[projects:clusters:new]` | `chart_options` | `generate_link` | `[chart_link]` | - | | `[projects:clusters:new]` | `chart_options` | `click_add_label_button` | `[label_id]` | - | _* It's ok to omit the category, and use the default._
_** Property is usually the best place for variable strings._ ### Reference SQL #### Last 20 `reply_comment_button` events ```sql SELECT session_id, event_id, event_label, event_action, event_property, event_value, event_category, contexts FROM legacy.snowplow_structured_events_all WHERE event_label = 'reply_comment_button' AND event_action = 'click_button' -- AND event_category = 'projects:issues:show' -- AND event_value = 1 ORDER BY collector_tstamp DESC LIMIT 20 ``` ### Web-specific parameters Snowplow JS adds many [web-specific parameters](https://docs.snowplowanalytics.com/docs/collecting-data/collecting-from-own-applications/snowplow-tracker-protocol/#Web-specific_parameters) to all web events by default. ## Implement Snowplow JS (Frontend) tracking GitLab provides `Tracking`, an interface that wraps the [Snowplow JavaScript Tracker](https://docs.snowplowanalytics.com/docs/collecting-data/collecting-from-own-applications/javascript-trackers/) for tracking custom events. The simplest way to use it is to add `data-` attributes to clickable elements and dropdowns. There is also a Vue mixin (exposing a `track` method), and the static method `Tracking.event`. Each of these requires at minimum a `category` and an `action`. You can provide additional [Structured event taxonomy](#structured-event-taxonomy) properties along with an `extra` object that accepts key-value pairs. | field | type | default value | description | |:-----------|:-------|:---------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | `category` | string | `document.body.dataset.page` | Page or subsection of a page that events are being captured within. | | `action` | string | generic | Action the user is taking. Clicks should be `click` and activations should be `activate`, so for example, focusing a form field would be `activate_form_input`, and clicking a button would be `click_button`. | | `data` | object | `{}` | Additional data such as `label`, `property`, `value`, `context` (as described in our [Structured event taxonomy](#structured-event-taxonomy)), and `extra` (key-value pairs object). | ### Usage recommendations - Use [data attributes](#tracking-with-data-attributes) on HTML elements that emits either the `click`, `show.bs.dropdown`, or `hide.bs.dropdown` events. - Use the [Vue mixin](#tracking-within-vue-components) when tracking custom events, or if the supported events for data attributes are not propagating. - Use the [Tracking class directly](#tracking-in-raw-javascript) when tracking on raw JS files. ### Tracking with data attributes When working in HAML (or Vue templates) we can add `data-track-*` attributes to elements of interest. All elements that have a `data-track-action` attribute automatically have event tracking bound on clicks. You can provide extra data as a valid JSON string using `data-track-extra`. Below is an example of `data-track-*` attributes assigned to a button: ```haml %button.btn{ data: { track: { action: "click_button", label: "template_preview", property: "my-template" } } } ``` ```html