debian-mirror-gitlab/doc/development/merge_request_performance_guidelines.md

# Merge Request Performance Guidelines

To ensure a merge request does not negatively impact performance of GitLab
_every_ merge request **must** adhere to the guidelines outlined in this
document. There are no exceptions to this rule unless specifically discussed
with and agreed upon by backend maintainers and performance specialists.

To measure the impact of a merge request you can use
[Sherlock](profiling.md#sherlock). It's also highly recommended that you read
the following guides:

* [Performance Guidelines](performance.md)
* [What requires downtime?](what_requires_downtime.md)

## Impact Analysis

**Summary:** think about the impact your merge request may have on performance
and those maintaining a GitLab setup.

Any change submitted can have an impact not only on the application itself but
also those maintaining it and those keeping it up and running (e.g. production
engineers). As a result you should think carefully about the impact of your
merge request on not only the application but also on the people keeping it up
and running.

Can the queries used potentially take down any critical services and result in
engineers being woken up in the night? Can a malicious user abuse the code to
take down a GitLab instance? Will my changes simply make loading a certain page
slower? Will execution time grow exponentially given enough load or data in the
database?

These are all questions one should ask themselves before submitting a merge
request. It may sometimes be difficult to assess the impact, in which case you
should ask a performance specialist to review your code. See the "Reviewing"
section below for more information.

## Performance Review

**Summary:** ask performance specialists to review your code if you're not sure
about the impact.

Sometimes it's hard to assess the impact of a merge request. In this case you
should ask one of the merge request reviewers to review your changes. You can
find a list of these reviewers at <https://about.gitlab.com/team/>. A reviewer
in turn can request a performance specialist to review the changes.

## Query Counts

**Summary:** a merge request **should not** increase the number of executed SQL
queries unless absolutely necessary.

The number of queries executed by the code modified or added by a merge request
must not increase unless absolutely necessary. When building features it's
entirely possible you will need some extra queries, but you should try to keep
this at a minimum.

As an example, say you introduce a feature that updates a number of database
rows with the same value. It may be very tempting (and easy) to write this using
the following pseudo code:

```ruby
objects_to_update.each do |object|
  object.some_field = some_value
  object.save
end
```

This will end up running one query for every object to update. This code can
easily overload a database given enough rows to update or many instances of this
code running in parallel. This particular problem is known as the
["N+1 query problem"](http://guides.rubyonrails.org/active_record_querying.html#eager-loading-associations). You can write a test with [QueryRecoder](query_recorder.md) to detect this and prevent regressions.

In this particular case the workaround is fairly easy:

```ruby
objects_to_update.update_all(some_field: some_value)
```

This uses ActiveRecord's `update_all` method to update all rows in a single
query. This in turn makes it much harder for this code to overload a database.

## Executing Queries in Loops

**Summary:** SQL queries **must not** be executed in a loop unless absolutely
necessary.

Executing SQL queries in a loop can result in many queries being executed
depending on the number of iterations in a loop. This may work fine for a
development environment with little data, but in a production environment this
can quickly spiral out of control.

There are some cases where this may be needed. If this is the case this should
be clearly mentioned in the merge request description.

## Eager Loading

**Summary:** always eager load associations when retrieving more than one row.

When retrieving multiple database records for which you need to use any
associations you **must** eager load these associations. For example, if you're
retrieving a list of blog posts and you want to display their authors you
**must** eager load the author associations.

In other words, instead of this:

```ruby
Post.all.each do |post|
  puts post.author.name
end
```

You should use this:

```ruby
Post.all.includes(:author).each do |post|
  puts post.author.name
end
```

Also consider using [QueryRecoder tests](query_recorder.md) to prevent a regression when eager loading.

## Memory Usage

**Summary:** merge requests **must not** increase memory usage unless absolutely
necessary.

A merge request must not increase the memory usage of GitLab by more than the
absolute bare minimum required by the code. This means that if you have to parse
some large document (e.g. an HTML document) it's best to parse it as a stream
whenever possible, instead of loading the entire input into memory. Sometimes
this isn't possible, in that case this should be stated explicitly in the merge
request.

## Lazy Rendering of UI Elements

**Summary:** only render UI elements when they're actually needed.

Certain UI elements may not always be needed. For example, when hovering over a
diff line there's a small icon displayed that can be used to create a new
comment. Instead of always rendering these kind of elements they should only be
rendered when actually needed. This ensures we don't spend time generating
Haml/HTML when it's not going to be used.

## Instrumenting New Code

**Summary:** always add instrumentation for new classes, modules, and methods.

Newly added classes, modules, and methods must be instrumented. This ensures
we can track the performance of this code over time.

For more information see [Instrumentation](instrumentation.md). This guide
describes how to add instrumentation and where to add it.

## Use of Caching

**Summary:** cache data in memory or in Redis when it's needed multiple times in
a transaction or has to be kept around for a certain time period.

Sometimes certain bits of data have to be re-used in different places during a
transaction. In these cases this data should be cached in memory to remove the
need for running complex operations to fetch the data. You should use Redis if
data should be cached for a certain time period instead of the duration of the
transaction.

For example, say you process multiple snippets of text containing username
mentions (e.g. `Hello @alice` and `How are you doing @alice?`). By caching the
user objects for every username we can remove the need for running the same
query for every mention of `@alice`.

Caching data per transaction can be done using
[RequestStore](https://github.com/steveklabnik/request_store) (use
`Gitlab::SafeRequestStore` to avoid having to remember to check
`RequestStore.active?`). Caching data in Redis can be done using [Rails' caching
system](http://guides.rubyonrails.org/caching_with_rails.html).
New upstream version 8.12.1+dfsg1 2016-09-29 09:46:39 +05:30			`# Merge Request Performance Guidelines`

			`To ensure a merge request does not negatively impact performance of GitLab`
			`_every_ merge request must adhere to the guidelines outlined in this`
			`document. There are no exceptions to this rule unless specifically discussed`
New upstream version 9.2.10+dfsg 2017-08-17 22:00:37 +05:30			`with and agreed upon by backend maintainers and performance specialists.`
New upstream version 8.12.1+dfsg1 2016-09-29 09:46:39 +05:30
			`To measure the impact of a merge request you can use`
			`[Sherlock](profiling.md#sherlock). It's also highly recommended that you read`
			`the following guides:`

			`* [Performance Guidelines](performance.md)`
			`* [What requires downtime?](what_requires_downtime.md)`

			`## Impact Analysis`

			`Summary: think about the impact your merge request may have on performance`
			`and those maintaining a GitLab setup.`

			`Any change submitted can have an impact not only on the application itself but`
			`also those maintaining it and those keeping it up and running (e.g. production`
			`engineers). As a result you should think carefully about the impact of your`
			`merge request on not only the application but also on the people keeping it up`
			`and running.`

			`Can the queries used potentially take down any critical services and result in`
			`engineers being woken up in the night? Can a malicious user abuse the code to`
			`take down a GitLab instance? Will my changes simply make loading a certain page`
			`slower? Will execution time grow exponentially given enough load or data in the`
			`database?`

			`These are all questions one should ask themselves before submitting a merge`
			`request. It may sometimes be difficult to assess the impact, in which case you`
			`should ask a performance specialist to review your code. See the "Reviewing"`
			`section below for more information.`

			`## Performance Review`

			`Summary: ask performance specialists to review your code if you're not sure`
			`about the impact.`

			`Sometimes it's hard to assess the impact of a merge request. In this case you`
New upstream version 9.2.10+dfsg 2017-08-17 22:00:37 +05:30			`should ask one of the merge request reviewers to review your changes. You can`
			`find a list of these reviewers at <https://about.gitlab.com/team/>. A reviewer`
			`in turn can request a performance specialist to review the changes.`
New upstream version 8.12.1+dfsg1 2016-09-29 09:46:39 +05:30
			`## Query Counts`

			`Summary: a merge request should not increase the number of executed SQL`
			`queries unless absolutely necessary.`

			`The number of queries executed by the code modified or added by a merge request`
			`must not increase unless absolutely necessary. When building features it's`
			`entirely possible you will need some extra queries, but you should try to keep`
			`this at a minimum.`

			`As an example, say you introduce a feature that updates a number of database`
			`rows with the same value. It may be very tempting (and easy) to write this using`
			`the following pseudo code:`

			```ruby
			`objects_to_update.each do \|object\|`
			`object.some_field = some_value`
			`object.save`
			`end`
			```

			`This will end up running one query for every object to update. This code can`
			`easily overload a database given enough rows to update or many instances of this`
			`code running in parallel. This particular problem is known as the`
New upstream version 9.2.10+dfsg 2017-08-17 22:00:37 +05:30			`["N+1 query problem"](http://guides.rubyonrails.org/active_record_querying.html#eager-loading-associations). You can write a test with [QueryRecoder](query_recorder.md) to detect this and prevent regressions.`
New upstream version 8.12.1+dfsg1 2016-09-29 09:46:39 +05:30
			`In this particular case the workaround is fairly easy:`

			```ruby
			`objects_to_update.update_all(some_field: some_value)`
			```

			This uses ActiveRecord's `update_all` method to update all rows in a single
			`query. This in turn makes it much harder for this code to overload a database.`

			`## Executing Queries in Loops`

			`Summary: SQL queries must not be executed in a loop unless absolutely`
			`necessary.`

			`Executing SQL queries in a loop can result in many queries being executed`
			`depending on the number of iterations in a loop. This may work fine for a`
			`development environment with little data, but in a production environment this`
			`can quickly spiral out of control.`

			`There are some cases where this may be needed. If this is the case this should`
			`be clearly mentioned in the merge request description.`

			`## Eager Loading`

			`Summary: always eager load associations when retrieving more than one row.`

			`When retrieving multiple database records for which you need to use any`
			`associations you must eager load these associations. For example, if you're`
			`retrieving a list of blog posts and you want to display their authors you`
			`must eager load the author associations.`

			`In other words, instead of this:`

			```ruby
			`Post.all.each do \|post\|`
			`puts post.author.name`
			`end`
			```

			`You should use this:`

			```ruby
			`Post.all.includes(:author).each do \|post\|`
			`puts post.author.name`
			`end`
			```

New upstream version 9.2.10+dfsg 2017-08-17 22:00:37 +05:30			`Also consider using [QueryRecoder tests](query_recorder.md) to prevent a regression when eager loading.`

New upstream version 8.12.1+dfsg1 2016-09-29 09:46:39 +05:30			`## Memory Usage`

			`Summary: merge requests must not increase memory usage unless absolutely`
			`necessary.`

			`A merge request must not increase the memory usage of GitLab by more than the`
			`absolute bare minimum required by the code. This means that if you have to parse`
			`some large document (e.g. an HTML document) it's best to parse it as a stream`
			`whenever possible, instead of loading the entire input into memory. Sometimes`
			`this isn't possible, in that case this should be stated explicitly in the merge`
			`request.`

			`## Lazy Rendering of UI Elements`

			`Summary: only render UI elements when they're actually needed.`

			`Certain UI elements may not always be needed. For example, when hovering over a`
			`diff line there's a small icon displayed that can be used to create a new`
			`comment. Instead of always rendering these kind of elements they should only be`
			`rendered when actually needed. This ensures we don't spend time generating`
			`Haml/HTML when it's not going to be used.`

			`## Instrumenting New Code`

			`Summary: always add instrumentation for new classes, modules, and methods.`

			`Newly added classes, modules, and methods must be instrumented. This ensures`
			`we can track the performance of this code over time.`

			`For more information see [Instrumentation](instrumentation.md). This guide`
			`describes how to add instrumentation and where to add it.`

			`## Use of Caching`

			`Summary: cache data in memory or in Redis when it's needed multiple times in`
			`a transaction or has to be kept around for a certain time period.`

			`Sometimes certain bits of data have to be re-used in different places during a`
			`transaction. In these cases this data should be cached in memory to remove the`
			`need for running complex operations to fetch the data. You should use Redis if`
			`data should be cached for a certain time period instead of the duration of the`
			`transaction.`

New upstream version 10.8.7+dfsg 2018-10-15 14:42:47 +05:30			`For example, say you process multiple snippets of text containing username`
New upstream version 8.12.1+dfsg1 2016-09-29 09:46:39 +05:30			mentions (e.g. `Hello @alice` and `How are you doing @alice?`). By caching the
			`user objects for every username we can remove the need for running the same`
			query for every mention of `@alice`.

			`Caching data per transaction can be done using`
New upstream version 11.4.9+dfsg 2018-12-05 23:21:45 +05:30			`[RequestStore](https://github.com/steveklabnik/request_store) (use`
			`Gitlab::SafeRequestStore` to avoid having to remember to check
			`RequestStore.active?`). Caching data in Redis can be done using [Rails' caching
New upstream version 8.12.1+dfsg1 2016-09-29 09:46:39 +05:30			`system](http://guides.rubyonrails.org/caching_with_rails.html).`