debian-mirror-gitlab/doc/development/feature_flags/controls.md
2021-03-08 18:12:59 +05:30


type: reference, dev
stage: none
group: Development
info: See the Technical Writers assigned to Development Guidelines: https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments-to-development-guidelines

Feature flag controls

Access

To turn features behind feature flags on or off in any of the environments provided by GitLab Inc., such as staging and production, you need access to the Chatops bot. The Chatops bot currently runs on the ops instance, which is different from https://gitlab.com or https://dev.gitlab.org.

Follow the Chatops document to request access.

Once you are added to the project, test whether your access has propagated by running:

/chatops run feature --help

Rolling out changes

When the changes are deployed to the environments, it is time to start rolling out the feature to our users. The exact procedure for rolling out a change is unspecified, as it can vary from change to change. In general, however, we recommend rolling out changes incrementally instead of enabling them for everybody right away. We also recommend that you do not enable a feature before its code has been deployed. This lets you separate rolling out a feature from a deploy, making it easier to measure the impact of each separately.

The GitLab feature library (using Flipper, and covered in the Feature Flags process guide) supports rolling out changes for a percentage of time or for a percentage of actors, such as users. This in turn can be controlled using GitLab Chatops.

For an up-to-date list of feature flag commands, see the source code. Note that all the examples in that file must be preceded by /chatops run.

If you get the error "Whoops! This action is not allowed. This incident will be reported.", your Slack account is not allowed to change feature flags or you do not have access.

Enabling a feature for preproduction testing

As a first step in a feature rollout, you should enable the feature on https://about.staging.gitlab.com and https://dev.gitlab.org.

These two environments have different scopes. dev.gitlab.org is a production CE environment that has internal GitLab Inc. traffic and is used for some development and other related work. staging.gitlab.com has a smaller subset of the GitLab.com database and repositories and does not have regular traffic. Staging is an EE instance and can give you a (very) rough estimate of how your feature will look and behave on GitLab.com. Both of these instances are connected to Sentry, so after enabling the feature flag, check the projects there for any exceptions while testing your feature.

For these preproduction environments, the commands should be run in a Slack channel for the stage the feature is relevant to. For example, use the #s_monitor channel for features developed by the Monitor stage, Health group.

To enable a feature for 25% of the time, run the following in Slack:

/chatops run feature set new_navigation_bar 25 --dev
/chatops run feature set new_navigation_bar 25 --staging

Enabling a feature for GitLab.com

When a feature has successfully been enabled on a preproduction environment and verified as safe and working, you can roll out the change to GitLab.com (production).

Communicate the change

Some feature flag changes on GitLab.com should be communicated with parts of the company. The developer responsible needs to determine whether this is necessary and the appropriate level of communication. This depends on the feature and what sort of impact it might have.

Guidelines:

  1. If the feature meets the requirements for creating a Change Management issue, create a Change Management issue per criticality guidelines.
  2. For simple, low-risk, easily reverted features, proceed and enable the feature in #production.
  3. For features that impact the user experience, consider notifying #support_gitlab-com beforehand.

Process

Before toggling any feature flag, check that there are no ongoing significant incidents on GitLab.com. You can do this by checking the #production and #incident-management Slack channels, or by looking for open incident issues (check their dates and times to confirm they are still current).

We do not want to introduce changes during an incident, because doing so can make the incident much harder to diagnose and resolve. It also largely invalidates your rollout process, as you cannot tell whether any problems were caused by the rollout or by the incident.

If there is any doubt, ask in #production.

The following /chatops commands should be performed in the Slack #production channel.

When you begin to enable the feature, please link to the relevant Feature Flag Rollout Issue within a Slack thread of the first /chatops command you make so people can understand the change if they need to.

To enable a feature for 25% of the time, run the following in Slack:

/chatops run feature set new_navigation_bar 25

This sets a feature flag to true based on the following formula:

feature_flag_state = rand < (25 / 100.0)

This will enable the feature for GitLab.com, with new_navigation_bar being the name of the feature. This command does not enable the feature for 25% of the total users. Instead, when the feature is checked with enabled?, it will return true 25% of the time.
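
The percentage-of-time behavior can be illustrated with a minimal Ruby sketch. The method name percentage_of_time_enabled? and the injectable rng argument are hypothetical and are not part of the GitLab Feature API. Only the rand < (percentage / 100.0) comparison is taken from the formula above.

```ruby
# Hypothetical helper mirroring the formula above; not GitLab's implementation.
# rng is injectable only so the sketch can be made repeatable.
def percentage_of_time_enabled?(percentage, rng: Random.new)
  rng.rand < (percentage / 100.0)
end

# No actor takes part in the decision, so the same caller can get a
# different answer on every check.
rng = Random.new(42) # fixed seed, for repeatability of the sketch only
checks = Array.new(10_000) { percentage_of_time_enabled?(25, rng: rng) }
share_enabled = checks.count(true) / checks.size.to_f
puts "enabled in #{(share_enabled * 100).round(1)}% of checks"
```

Because the result is drawn again on each check, a single user may see the feature appear and disappear between requests.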

To enable a feature for 25% of actors such as users, projects, or groups, run the following in Slack:

/chatops run feature set some_feature 25 --actors

This sets a feature flag to true based on the following formula:

feature_flag_state = Zlib.crc32("some_feature<Actor>:#{actor.id}") % (100 * 1_000) < 25 * 1_000
# where <Actor> is `User`, `Group`, or `Project`, and actor is an instance of that class
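
As a minimal sketch of why this formula gives a stable result per actor, the following Ruby snippet applies it directly. The helper name percentage_of_actors_enabled? and its parameters are hypothetical, and the user ID is made up. Only the Zlib.crc32 expression comes from the formula above.

```ruby
require 'zlib'

# Hypothetical helper mirroring the formula above; not GitLab's implementation.
# actor_type is the class name ("User", "Group", or "Project").
def percentage_of_actors_enabled?(feature_name, actor_type, actor_id, percentage)
  key = "#{feature_name}#{actor_type}:#{actor_id}"
  Zlib.crc32(key) % (100 * 1_000) < percentage * 1_000
end

# The result depends only on the flag name and the actor, so repeated
# checks for the same actor always agree.
first  = percentage_of_actors_enabled?('some_feature', 'User', 42, 25)
second = percentage_of_actors_enabled?('some_feature', 'User', 42, 25)
puts first == second # prints "true"
```

Unlike the percentage-of-time gate, a given actor either has the feature or does not until the percentage changes.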

During development, based on the nature of the feature, an actor choice should be made.

For user focused features:

Feature.enabled?(:feature_cool_avatars, current_user)

For group or namespace level features:

Feature.enabled?(:feature_cooler_groups, group)

For project level features:

Feature.enabled?(:feature_ice_cold_projects, project)

If you are not certain which percentages to use, use the following steps:

  1. 25%
  2. 50%
  3. 75%
  4. 100%

Between steps, wait a little while and monitor the appropriate graphs on https://dashboards.gitlab.net. The exact time to wait may differ. For some features a few minutes is enough, while for others you may want to wait several hours or even days. This is entirely up to you. Just make sure it is clearly communicated to your team, and to the Production team if you anticipate any potential problems.
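
When these steps are applied with --actors, the percentage-of-actors formula given earlier implies that each step only adds actors: anyone enabled at 25% remains enabled at 50%, 75%, and 100%. The following Ruby sketch checks this property. The helper name actor_in_rollout? and the user IDs are hypothetical and for illustration only.

```ruby
require 'zlib'

# Hypothetical helper based on the percentage-of-actors formula in this guide;
# not GitLab's implementation.
def actor_in_rollout?(feature_name, actor_type, actor_id, percentage)
  key = "#{feature_name}#{actor_type}:#{actor_id}"
  Zlib.crc32(key) % (100 * 1_000) < percentage * 1_000
end

STEPS = [25, 50, 75, 100].freeze
user_ids = (1..1_000).to_a # made-up IDs for illustration

previous = []
STEPS.each do |percentage|
  enabled = user_ids.select { |id| actor_in_rollout?('some_feature', 'User', id, percentage) }
  # Every actor enabled at the previous step must still be enabled now.
  raise 'rollout is not cumulative' unless (previous - enabled).empty?
  puts "#{percentage}%: #{enabled.size} of #{user_ids.size} users enabled"
  previous = enabled
end
```

This holds because raising the percentage only raises the threshold that each actor's fixed hash value is compared against.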

Feature gates can also be actor based, for example a feature could first be enabled for only the gitlab project. The project is passed by supplying a --project flag:

/chatops run feature set --project=gitlab-org/gitlab some_feature true

For groups the --group flag is available:

/chatops run feature set --group=gitlab-org some_feature true

Note that actor-based gates are applied before percentages. For example, considering the group/project as gitlab-org/gitlab and a given example feature as some_feature, if you run these 2 commands:

/chatops run feature set --project=gitlab-org/gitlab some_feature true
/chatops run feature set some_feature 25 --actors

Then some_feature is enabled for 25% of actors, and is always enabled when interacting with gitlab-org/gitlab. This is a good idea if the feature flag checks in the code use group actors:

Feature.enabled?(:some_feature, group)
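
How the two gates combine can be sketched in Ruby as follows. FlagSketch, SketchActor, and their methods are hypothetical names invented for this illustration, the numeric IDs are made up, and the class is only a simplified model of the behavior described above, not how GitLab or Flipper implement it.

```ruby
require 'set'
require 'zlib'

# Hypothetical actor: type is "User", "Group", or "Project".
SketchActor = Struct.new(:type, :id)

# Hypothetical, simplified model of a flag with an actor gate and a
# percentage-of-actors gate.
class FlagSketch
  def initialize(name)
    @name = name
    @actor_gate = Set.new
    @percentage_of_actors = 0
  end

  def enable_for(actor)
    @actor_gate << actor
  end

  def enable_percentage_of_actors(percentage)
    @percentage_of_actors = percentage
  end

  def enabled?(actor)
    # The actor gate is consulted before the percentage gate.
    return true if @actor_gate.include?(actor)

    key = "#{@name}#{actor.type}:#{actor.id}"
    Zlib.crc32(key) % (100 * 1_000) < @percentage_of_actors * 1_000
  end
end

flag = FlagSketch.new('some_feature')
gitlab_project = SketchActor.new('Project', 1) # made-up ID
flag.enable_for(gitlab_project)
flag.enable_percentage_of_actors(0)
puts flag.enabled?(gitlab_project) # prints "true": the actor gate applies even at 0%
```

The percentage is set to 0 here only to make the effect of the actor gate visible. With 25, as in the commands above, other actors would also be enabled according to the percentage-of-actors formula.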

A percentage of time rollout is not a good idea if you want a feature to be consistently on or off for a given user. In that case, a percentage of actors rollout is a better method.

Lastly, to verify that the feature is stable in as many cases as possible, fully roll out the feature by enabling the flag globally:

/chatops run feature set some_feature true

This changes the feature flag state to be enabled always, which overrides the existing gates (e.g. --group=gitlab-org) in the above processes.

Feature flag change logging

Chatops level

Any feature flag change that affects GitLab.com (production) via Chatops is automatically logged in an issue.

The issue is created in the gl-infra/feature-flag-log project. At a minimum, it logs the Slack handle of the person enabling the feature flag, the time, and the name of the flag being changed.

The issue is then also posted to the GitLab internal Grafana dashboard as an annotation marker to make the change even more visible.

Changes to the issue format can be submitted in the Chatops project.

Instance level

Any feature flag change that affects any GitLab instance is automatically logged in features_json.log. You can search the change history in Kibana. You can access the feature flag change history for GitLab.com here.

Cleaning up

A feature flag should be removed as soon as it is no longer needed. Each additional feature flag in the codebase increases the complexity of the application and reduces confidence in our testing suite covering all possible combinations. Additionally, a feature flag overwritten in some of the environments can result in undefined and untested system behavior.

To remove a feature flag:

  1. Open a new merge request with the ~"feature flag" label so release managers are aware the changes are hidden behind a feature flag.
  2. If the merge request has to be picked into a stable branch, add the appropriate ~"Pick into X.Y" label, for example ~"Pick into 13.0". See the feature flag process for further details.
  3. Remove all references to the feature flag from the codebase.
  4. Remove the YAML definition for the feature from the repository.
  5. Clean up the feature flag from all environments with /chatops run feature delete some_feature.
  6. Close the rollout issue for the feature flag after the feature flag is removed from the codebase.

Cleanup ChatOps

When a feature gate has been removed from the codebase, the feature record still exists in the database of each environment the flag was deployed to. The record can be deleted once the MR is deployed to each environment:

/chatops run feature delete some_feature --dev
/chatops run feature delete some_feature --staging

Then, you can delete it from production after the MR is deployed to prod:

/chatops run feature delete some_feature