# Cycle Analytics development guide

Cycle analytics calculates the time between two arbitrary events recorded on domain objects and provides aggregated statistics about the duration.

## Stage

During development, events occur that move issues and merge requests through different stages of progress until they are considered finished. These stages can be expressed with the `Stage` model.

Example stage:

- Name: Development
- Start event: Issue created
- End event: Issue first mentioned in commit
- Parent: `Group: gitlab-org`

### Events

Events are the smallest building blocks of the cycle analytics feature. A stage consists of two events:

- Start
- End

These events play a key role in the duration calculation.

Formula: `duration = end_event_time - start_event_time`

To make the duration calculation flexible, each `Event` is implemented as a separate class. They're responsible for defining a timestamp expression that will be used in the calculation query.

#### Implementing an `Event` class

There are a few methods that are required to be implemented, the `StageEvent` base class describes them in great detail. The most important ones are:

- `object_type`
- `timestamp_projection`

The `object_type` method defines which domain object will be queried for the calculation. Currently two models are allowed:

- `Issue`
- `MergeRequest`

For the duration calculation the `timestamp_projection` method will be used.

```ruby
def timestamp_projection
  # your timestamp expression comes here
end

# event will use the issue creation time in the duration calculation
def timestamp_projection
  Issue.arel_table[:created_at]
end
```

NOTE: **Note:**
More complex expressions are also possible (e.g. using `COALESCE`). Look at the existing event classes for examples.

In some cases, defining the `timestamp_projection` method is not enough. The calculation query should know which table contains the timestamp expression. Each `Event` class is responsible for making modifications to the calculation query to make the `timestamp_projection` work. This usually means joining an additional table.

Example for joining the `issue_metrics` table and using the `first_mentioned_in_commit_at` column as the timestamp expression:

```ruby
def object_type
  Issue
end

def timestamp_projection
  IssueMetrics.arel_table[:first_mentioned_in_commit_at]
end

def apply_query_customization(query)
  # in this case the query attribute will be based on the Issue model: `Issue.where(...)`
  query.joins(:metrics)
end
```

### Validating start and end events

Some start/end event pairs are not "compatible" with each other. For example:

- "Issue created" to "Merge Request created": The event classes are defined on different  domain models, the `object_type` method is different.
- "Issue closed" to "Issue created": Issue must be created first before it can be  closed.
- "Issue closed" to "Issue closed": Duration is always 0.

The `StageEvents` module describes the allowed `start_event` and `end_event` pairings (`PAIRING_RULES` constant). If a new event is added, it needs to be registered in this module.
​To add a new event:​

1. Add an entry in `ENUM_MAPPING` with a unique number, it'll be used in the `Stage` model as `enum`.
1. Define which events are compatible with the event in the `PAIRING_RULES` hash.

Supported start/end event pairings:

```mermaid
graph LR;
  IssueCreated --> IssueClosed;
  IssueCreated --> IssueFirstAddedToBoard;
  IssueCreated --> IssueFirstAssociatedWithMilestone;
  IssueCreated --> IssueFirstMentionedInCommit;
  IssueCreated --> IssueLastEdited;
  IssueCreated --> IssueLabelAdded;
  IssueCreated --> IssueLabelRemoved;
  MergeRequestCreated --> MergeRequestMerged;
  MergeRequestCreated --> MergeRequestClosed;
  MergeRequestCreated --> MergeRequestFirstDeployedToProduction;
  MergeRequestCreated --> MergeRequestLastBuildStarted;
  MergeRequestCreated --> MergeRequestLastBuildFinished;
  MergeRequestCreated --> MergeRequestLastEdited;
  MergeRequestCreated --> MergeRequestLabelAdded;
  MergeRequestCreated --> MergeRequestLabelRemoved;
  MergeRequestLastBuildStarted --> MergeRequestLastBuildFinished;
  MergeRequestLastBuildStarted --> MergeRequestClosed;
  MergeRequestLastBuildStarted --> MergeRequestFirstDeployedToProduction;
  MergeRequestLastBuildStarted --> MergeRequestLastEdited;
  MergeRequestLastBuildStarted --> MergeRequestMerged;
  MergeRequestLastBuildStarted --> MergeRequestLabelAdded;
  MergeRequestLastBuildStarted --> MergeRequestLabelRemoved;
  MergeRequestMerged --> MergeRequestFirstDeployedToProduction;
  MergeRequestMerged --> MergeRequestClosed;
  MergeRequestMerged --> MergeRequestFirstDeployedToProduction;
  MergeRequestMerged --> MergeRequestLastEdited;
  MergeRequestMerged --> MergeRequestLabelAdded;
  MergeRequestMerged --> MergeRequestLabelRemoved;
  IssueLabelAdded --> IssueLabelAdded;
  IssueLabelAdded --> IssueLabelRemoved;
  IssueLabelAdded --> IssueClosed;
  IssueLabelRemoved --> IssueClosed;
  IssueFirstAddedToBoard --> IssueClosed;
  IssueFirstAddedToBoard --> IssueFirstAssociatedWithMilestone;
  IssueFirstAddedToBoard --> IssueFirstMentionedInCommit;
  IssueFirstAddedToBoard --> IssueLastEdited;
  IssueFirstAddedToBoard --> IssueLabelAdded;
  IssueFirstAddedToBoard --> IssueLabelRemoved;
  IssueFirstAssociatedWithMilestone --> IssueClosed;
  IssueFirstAssociatedWithMilestone --> IssueFirstAddedToBoard;
  IssueFirstAssociatedWithMilestone --> IssueFirstMentionedInCommit;
  IssueFirstAssociatedWithMilestone --> IssueLastEdited;
  IssueFirstAssociatedWithMilestone --> IssueLabelAdded;
  IssueFirstAssociatedWithMilestone --> IssueLabelRemoved;
  IssueFirstMentionedInCommit --> IssueClosed;
  IssueFirstMentionedInCommit --> IssueFirstAssociatedWithMilestone;
  IssueFirstMentionedInCommit --> IssueFirstAddedToBoard;
  IssueFirstMentionedInCommit --> IssueLastEdited;
  IssueFirstMentionedInCommit --> IssueLabelAdded;
  IssueFirstMentionedInCommit --> IssueLabelRemoved;
  IssueClosed --> IssueLastEdited;
  IssueClosed --> IssueLabelAdded;
  IssueClosed --> IssueLabelRemoved;
  MergeRequestClosed --> MergeRequestFirstDeployedToProduction;
  MergeRequestClosed --> MergeRequestLastEdited;
  MergeRequestClosed --> MergeRequestLabelAdded;
  MergeRequestClosed --> MergeRequestLabelRemoved;
  MergeRequestFirstDeployedToProduction --> MergeRequestLastEdited;
  MergeRequestFirstDeployedToProduction --> MergeRequestLabelAdded;
  MergeRequestFirstDeployedToProduction --> MergeRequestLabelRemoved;
  MergeRequestLastBuildFinished --> MergeRequestClosed;
  MergeRequestLastBuildFinished --> MergeRequestFirstDeployedToProduction;
  MergeRequestLastBuildFinished --> MergeRequestLastEdited;
  MergeRequestLastBuildFinished --> MergeRequestMerged;
  MergeRequestLastBuildFinished --> MergeRequestLabelAdded;
  MergeRequestLastBuildFinished --> MergeRequestLabelRemoved;
  MergeRequestLabelAdded --> MergeRequestLabelAdded;
  MergeRequestLabelAdded --> MergeRequestLabelRemoved;
  MergeRequestLabelRemoved --> MergeRequestLabelAdded;
  MergeRequestLabelRemoved --> MergeRequestLabelRemoved;
```

### Parent

Teams and organizations might define their own way of building software, thus stages can be completely different. For each stage, a parent object needs to be defined.

Currently supported parents:

- `Project`
- `Group`

#### How parent relationship it work

1. User navigates to the cycle analytics page.
1. User selects a group.
1. Backend loads the defined stages for the selected group.
1. Additions and modifications to the stages will be persisted within the selected group only.

### Default stages

The [original implementation](https://gitlab.com/gitlab-org/gitlab/issues/847) of cycle analytics defined 7 stages. These stages are always available for each parent, however altering these stages is not possible.
​
To make things efficient and reduce the number of records created, the default stages are expressed as in-memory objects (not persisted). When the user creates a custom stage for the first time, all the stages will be persisted. This behaviour is implemented in the cycle analytics service objects.
​
The reason for this was that we'd like to add the abilities to hide and order stages later on.

## Data Collector

`DataCollector` is the central point where the data will be queried from the database. The class always operates on a single stage and consists of the following components:

- `BaseQueryBuilder`:
  - Responsible for composing the initial query.
  - Deals with `Stage` specific configuration: events and their query customizations.
  - Parameters coming from the UI: date ranges.
- `Median`: Calculates the median duration for a stage using the query from  `BaseQueryBuilder`.
- `RecordsFetcher`: Loads relevant records for a stage using the query from  `BaseQueryBuilder` and specific `Finder` classes to apply visibility rules.
- `DataForDurationChart`: Loads calculated durations with the finish time (end event timestamp) for the scatterplot chart.

For a new calculation or a query, implement it as a new method call in the `DataCollector` class.

## Database query

Structure of the database query:

```sql
SELECT (customized by: Median or RecordsFetcher or DataForDurationChart)
FROM OBJECT_TYPE (Issue or MergeRequest)
INNER JOIN (several JOIN statements, depending on the events)
WHERE
  (Filter by the PARENT model, example: filter Issues from Project A)
  (Date range filter based on the OBJECT_TYPE.created_at)
  (Check if the START_EVENT is earlier than END_EVENT, preventing negative duration)
```

Structure of the `SELECT` statement for `Median`:

```sql
SELECT (calculate median from START_EVENT_TIME-END_EVENT_TIME)
```

Structure of the `SELECT` statement for `DataForDurationChart`:

```sql
SELECT (START_EVENT_TIME-END_EVENT_TIME) as duration, END_EVENT.timestamp
```

## High-level overview

- Rails Controller (`Analytics::CycleAnalytics` module): Cycle analytics exposes its data via JSON endpoints, implemented within the `analytics` workspace. Configuring the stages are also implements JSON endpoints (CRUD).
- Services (`Analytics::CycleAnalytics` module): All `Stage` related actions will be delegated to respective service objects.
- Models (`Analytics::CycleAnalytics` module): Models are used to persist the `Stage` objects `ProjectStage` and `GroupStage`.
- Feature classes (`Gitlab::Analytics::CycleAnalytics` module):
  - Responsible for composing queries and define feature specific busines logic.
  - `DataCollector`, `Event`, `StageEvents`, etc.

## Testing

Since we have a lots of events and possible pairings, testing each pairing is not possible. The rule is to have at least one test case using an `Event` class.

Writing a test case for a stage using a new `Event` can be challenging since data must be created for both events. To make this a bit simpler, each test case must be implemented in the `data_collector_spec.rb` where the stage is tested through the `DataCollector`. Each test case will be turned into multiple tests, covering the following cases:

- Different parents: `Group` or `Project`
- Different calculations: `Median`, `RecordsFetcher` or `DataForDurationChart`