---
stage: Growth
group: Product Intelligence
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---

# Snowplow

Snowplow is an enterprise-grade marketing and Product Intelligence platform that tracks how users engage with our website and application.

[Snowplow](https://snowplowanalytics.com) consists of several loosely-coupled sub-systems:

- **Trackers** fire Snowplow events. Snowplow has twelve trackers that cover web, mobile, desktop, server, and IoT.
- **Collectors** receive Snowplow events from trackers. We use different event collectors that synchronize events to Amazon S3, Apache Kafka, or Amazon Kinesis.
- **Enrich** cleans raw Snowplow events, enriches them, and puts them into storage. There is a Hadoop-based enrichment process, and a Kinesis-based or Kafka-based process.
- **Storage** stores Snowplow events. We store the Snowplow events in a flat file structure on S3, and in the Redshift and PostgreSQL databases.
- **Data modeling** joins event-level data with other data sets, aggregates them into smaller data sets, and applies business logic. This produces a clean set of tables for data analysis. We use data models for Redshift and Looker.
- **Analytics** are performed on Snowplow events or on aggregate tables.

![snowplow_flow](../img/snowplow_flow.png)

## Enable Snowplow tracking

Tracking can be enabled at:

- The instance level, which enables tracking on both the frontend and backend layers.
- The user level. User tracking can be disabled on a per-user basis.
  GitLab respects the [Do Not Track](https://www.eff.org/issues/do-not-track) standard, so any user who has enabled the Do Not Track option in their browser is not tracked at a user level.

Snowplow tracking is enabled on GitLab.com, and we use it for most of our tracking strategy.

To enable Snowplow tracking on a self-managed instance:

1. On the top bar, select **Menu > Admin**, then select **Settings > General**.
   Alternatively, go to `admin/application_settings/general` in your browser.

1. Expand **Snowplow**.

1. Select **Enable Snowplow tracking** and enter your Snowplow configuration information. For example:

   | Name               | Value                         |
   |--------------------|-------------------------------|
   | Collector hostname | `your-snowplow-collector.net` |
   | App ID             | `gitlab`                      |
   | Cookie domain      | `.your-gitlab-instance.com`   |

1. Select **Save changes**.

## Snowplow request flow

The following example shows a basic request/response flow between these components:

- Snowplow JS / Ruby Trackers on GitLab.com
- [GitLab.com Snowplow Collector](https://gitlab.com/gitlab-com/gl-infra/readiness/-/blob/master/library/snowplow/index.md)
- The GitLab S3 Bucket
- The GitLab Snowflake Data Warehouse
- Sisense

```mermaid
sequenceDiagram
    participant Snowplow JS (Frontend)
    participant Snowplow Ruby (Backend)
    participant GitLab.com Snowplow Collector
    participant S3 Bucket
    participant Snowflake DW
    participant Sisense Dashboards
    Snowplow JS (Frontend) ->> GitLab.com Snowplow Collector: FE Tracking event
    Snowplow Ruby (Backend) ->> GitLab.com Snowplow Collector: BE Tracking event
    loop Process using Kinesis Stream
      GitLab.com Snowplow Collector ->> GitLab.com Snowplow Collector: Log raw events
      GitLab.com Snowplow Collector ->> GitLab.com Snowplow Collector: Enrich events
      GitLab.com Snowplow Collector ->> GitLab.com Snowplow Collector: Write to disk
    end
    GitLab.com Snowplow Collector ->> S3 Bucket: Kinesis Firehose
    S3 Bucket->>Snowflake DW: Import data
    Snowflake DW->>Snowflake DW: Transform data using dbt
    Snowflake DW->>Sisense Dashboards: Data available for querying
```

## Structured event taxonomy

Click events must be consistent. If each feature captures events differently, it can be difficult to perform analysis.

Each click event provides attributes that describe the event.
| Attribute | Type    | Required | Description |
| --------- | ------- | -------- | ----------- |
| category  | text    | true     | The page or backend section of the application. Unless infeasible, use the Rails page attribute by default in the frontend, and namespace + class name on the backend. |
| action    | text    | true     | The action the user takes, or aspect that's being instrumented. The first word must describe the action or aspect. For example, clicks must be `click`, activations must be `activate`, creations must be `create`. Use underscores to describe what was acted on. For example, activating a form field is `activate_form_input`, an interface action like clicking on a dropdown is `click_dropdown`, a behavior like creating a project record from the backend is `create_project`. |
| label     | text    | false    | The specific element or object to act on. This can be one of the following: the label of the element, for example, a tab labeled 'Create from template' for `create_from_template`; a unique identifier if no text is available, for example, `groups_dropdown_close` for closing the Groups dropdown in the top bar; or the name or title attribute of a record being created. |
| property  | text    | false    | Any additional property of the element, or object being acted on. |
| value     | decimal | false    | Describes a numeric value (decimal) directly related to the event. This could be the value of an input. For example, `10` when clicking `internal` visibility. |

### Examples

| Category* | Label | Action | Property** | Value |
|-------------|------------------|-----------------------|----------|:-----:|
| `[root:index]` | `main_navigation` | `click_navigation_link` | `[link_label]` | - |
| `[groups:boards:show]` | `toggle_swimlanes` | `click_toggle_button` | - | `[is_active]` |
| `[projects:registry:index]` | `registry_delete` | `click_button` | - | - |
| `[projects:registry:index]` | `registry_delete` | `confirm_deletion` | - | - |
| `[projects:blob:show]` | `congratulate_first_pipeline` | `click_button` | `[human_access]` | - |
| `[projects:clusters:new]` | `chart_options` | `generate_link` | `[chart_link]` | - |
| `[projects:clusters:new]` | `chart_options` | `click_add_label_button` | `[label_id]` | - |

_* If you choose to omit the category you can use the default._
_** Use property for variable strings._

### Reference SQL

#### Last 20 `reply_comment_button` events

```sql
SELECT
  session_id,
  event_id,
  event_label,
  event_action,
  event_property,
  event_value,
  event_category,
  contexts
FROM legacy.snowplow_structured_events_all
WHERE
  event_label = 'reply_comment_button'
  AND event_action = 'click_button'
  -- AND event_category = 'projects:issues:show'
  -- AND event_value = 1
ORDER BY collector_tstamp DESC
LIMIT 20
```

#### Last 100 page view events

```sql
SELECT
  -- page_url,
  -- page_title,
  -- referer_url,
  -- marketing_medium,
  -- marketing_source,
  -- marketing_campaign,
  -- browser_window_width,
  -- device_is_mobile
  *
FROM legacy.snowplow_page_views_30
ORDER BY page_view_start DESC
LIMIT 100
```

#### Query JSON formatted data

```sql
SELECT
  derived_tstamp,
  contexts:data[0]:data:extra:old_format AS CURRENT_FORMAT,
  contexts:data[0]:data:extra:value AS UPDATED_FORMAT
FROM legacy.snowplow_structured_events_all
WHERE event_action IN ('wiki_format_updated')
ORDER BY derived_tstamp DESC
LIMIT 100
```

### Web-specific parameters

Snowplow JavaScript adds [web-specific parameters](https://docs.snowplowanalytics.com/docs/collecting-data/collecting-from-own-applications/snowplow-tracker-protocol/#Web-specific_parameters) to all web events by default.
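Returning to the attribute table in the structured event taxonomy, the following is a minimal sketch of how those rules could be checked before an event is sent. It is not part of GitLab or of any Snowplow tracker: the `StructuredEvent` class, the `validate` function, and the `ACTION_PATTERN` regular expression are hypothetical names introduced only for illustration, and the pattern is just one possible reading of "use underscores to describe what was acted on".

```python
import re
from dataclasses import dataclass
from decimal import Decimal
from typing import List, Optional, Union

# Assumption: an action is lowercase words joined by underscores,
# for example "click_dropdown" or "create_project".
ACTION_PATTERN = re.compile(r"^[a-z]+(_[a-z0-9]+)*$")


@dataclass(frozen=True)
class StructuredEvent:
    """Hypothetical container for the five taxonomy attributes."""

    category: str                      # required
    action: str                        # required
    label: Optional[str] = None        # optional
    property: Optional[str] = None     # optional
    value: Optional[Union[int, float, Decimal]] = None  # optional, numeric


def validate(event: StructuredEvent) -> List[str]:
    """Return a list of taxonomy problems; an empty list means none were found."""
    problems = []
    if not event.category:
        problems.append("category is required")
    if not event.action:
        problems.append("action is required")
    elif not ACTION_PATTERN.match(event.action):
        problems.append("action must be lowercase words joined by underscores")
    if event.value is not None and not isinstance(event.value, (int, float, Decimal)):
        problems.append("value must be numeric")
    return problems


# Usage: one of the rows from the examples table passes the checks.
event = StructuredEvent(
    category="projects:registry:index",
    action="click_button",
    label="registry_delete",
)
print(validate(event))  # prints []
```

A check like this only covers the shape of an event. Whether a given `label` or `property` is meaningful still depends on the feature being instrumented.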
## Related topics

- [Snowplow data structure](https://docs.snowplowanalytics.com/docs/understanding-your-pipeline/canonical-event/)
- [Our Iglu schema registry](https://gitlab.com/gitlab-org/iglu)
- [List of events used in our codebase (Event Dictionary)](https://metrics.gitlab.com/snowplow.html)
- [Product Intelligence Guide](https://about.gitlab.com/handbook/product/product-intelligence-guide/)
- [Service Ping Guide](../service_ping/index.md)
- [Product Intelligence Direction](https://about.gitlab.com/direction/product-intelligence/)
- [Data Analysis Process](https://about.gitlab.com/handbook/business-technology/data-team/#data-analysis-process/)
- [Data for Product Managers](https://about.gitlab.com/handbook/business-technology/data-team/programs/data-for-product-managers/)
- [Data Infrastructure](https://about.gitlab.com/handbook/business-technology/data-team/platform/infrastructure/)