---
stage: Data Stores
group: Database
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---
# Transaction guidelines
This document gives a few examples of using database transactions in application code.

For further reference, see the PostgreSQL documentation about [transactions](https://www.postgresql.org/docs/current/tutorial-transactions.html).
## Database decomposition and sharding
The [sharding group](https://about.gitlab.com/handbook/engineering/development/enablement/sharding/) plans
to split the main GitLab database and move some of the database tables to other database servers.

We start decomposing the `ci_*`-related database tables first. To maintain the current application
development experience, we add tooling and static analyzers to the codebase to ensure correct
data access and data modification methods. By using the correct form for defining database transactions,
we can save significant refactoring work in the future.
## The transaction block
The `ActiveRecord` library provides a convenient way to group database statements into a transaction:
```ruby
issue = Issue.find(10)
project = issue.project

ApplicationRecord.transaction do
  issue.update!(title: 'updated title')
  project.update!(last_update_at: Time.now)
end
```
This transaction involves two database tables. If an error occurs, both `UPDATE`
statements are rolled back and the data returns to its previous consistent state.

NOTE:
Avoid referencing the `ActiveRecord::Base` class and use `ApplicationRecord` instead.
## Transaction and database locks
When statements run inside a transaction block, the database tries to acquire the necessary
locks on the affected resources. The type of lock depends on the actual database statements.

Consider a concurrent update scenario where the following code is executed at the
same time from two different processes:
```ruby
issue = Issue.find(10)
project = issue.project

ApplicationRecord.transaction do
  issue.update!(title: 'updated title')
  project.update!(last_update_at: Time.now)
end
```
The database tries to acquire the `FOR UPDATE` lock for the referenced `issue` and
`project` records. In our case, we have two competing transactions for these locks,
and only one of them successfully acquires them. The other transaction has
to wait in the lock queue until the first transaction finishes. The execution of the
second transaction is blocked at this point.
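The waiting behavior can be imitated without a database. The following is a minimal sketch in plain Ruby, assuming a `Mutex` as a stand-in for the row lock and two threads as stand-ins for the two processes. None of these names come from the GitLab codebase, and real lock handling happens inside PostgreSQL, not in application code.

```ruby
# Simulation only: a Mutex plays the role of the row lock on the `issue`
# record, and each thread plays the role of one competing transaction.
row_lock = Mutex.new
log = Queue.new               # thread-safe event log
first_has_lock = Queue.new    # signals that the first "transaction" holds the lock
second_is_waiting = Queue.new # signals that the second one is about to request it

first = Thread.new do
  row_lock.synchronize do
    log << 'first: lock acquired'
    first_has_lock << true
    # Keep the "transaction" open until the second one has shown up.
    second_is_waiting.pop
    log << 'first: commit'
  end # the lock is released only here, when the "transaction" finishes
end

second = Thread.new do
  first_has_lock.pop
  log << 'second: requesting lock'
  second_is_waiting << true
  row_lock.synchronize do
    # Reached only after the first "transaction" has released the lock.
    log << 'second: lock acquired'
  end
end

[first, second].each(&:join)
events = Array.new(log.size) { log.pop }
puts events
```

The second thread cannot enter its `synchronize` block before the first one leaves its own, which mirrors how the second transaction waits until the first one finishes.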
## Transaction speed
To prevent lock contention and maintain stable application performance, the transaction
block should finish as fast as possible. When a transaction acquires locks, it holds
on to them until the transaction finishes.

Apart from application performance, long-running transactions can also affect application
upgrade processes by blocking database migrations.
### Dangerous example: third-party API calls
Consider the following example:
```ruby
member = Member.find(5)

Member.transaction do
  member.update!(notification_email_sent: true)
  member.send_notification_email
end
```
Here, we ensure that the `notification_email_sent` column is updated only when the
`send_notification_email` method succeeds. The `send_notification_email` method
executes a network request to an email sending service. If the underlying infrastructure
does not specify timeouts or the network call takes too long, the database transaction
stays open for the whole duration of the call.

Ideally, a transaction should only contain database statements.
Avoid doing the following in a `transaction` block:
- External network requests such as:
  - Triggering Sidekiq jobs.
  - Sending emails.
  - HTTP API calls.
  - Running database statements using a different connection.
- File system operations.
- Long, CPU intensive computation.
- Calling `sleep(n)`.
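One possible way to follow this advice for the email example is to make the network call before opening the transaction, so the transaction contains only the database statement. The following is a minimal, self-contained sketch: `FakeMember` and `EVENTS` are hypothetical stand-ins invented here so the snippet runs without Rails, and they only record the order of operations.

```ruby
# Hypothetical stand-in for the `Member` model. It records what happens
# instead of talking to a database or an email service.
EVENTS = []

class FakeMember
  attr_reader :notification_email_sent

  def initialize
    @notification_email_sent = false
  end

  # Mimics `Member.transaction`: marks where the transaction opens and closes.
  def self.transaction
    EVENTS << :begin
    yield
    EVENTS << :commit
  end

  def update!(notification_email_sent:)
    @notification_email_sent = notification_email_sent
    EVENTS << :update
  end

  # Stands in for the slow network request to the email sending service.
  def send_notification_email
    EVENTS << :send_email
    true
  end
end

member = FakeMember.new

# The slow external call runs first, outside of any transaction.
sent = member.send_notification_email

# The transaction is opened only for the database statement, and only
# when the email call succeeded.
if sent
  FakeMember.transaction do
    member.update!(notification_email_sent: true)
  end
end

p EVENTS
```

In this ordering the column is still updated only after the email call succeeds, but no lock is held while waiting on the network. The trade-off is that a failed update after a successful email leaves the flag unset, so whether this ordering is acceptable depends on the feature.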
## Explicit model referencing
If a transaction modifies records from the same database table, we recommend using the
`Model.transaction` block:
```ruby
build_1 = Ci::Build.find(1)
build_2 = Ci::Build.find(2)

Ci::Build.transaction do
  build_1.touch
  build_2.touch
end
```
The transaction above uses the same database connection as the models
in the `transaction` block. In a multi-database environment, the following example is dangerous:
```ruby
# `ci_builds` table is located on another database
class Ci::Build < CiDatabase
end

build_1 = Ci::Build.find(1)
build_2 = Ci::Build.find(2)

ActiveRecord::Base.transaction do
  build_1.touch
  build_2.touch
end
```
The `ActiveRecord::Base` class uses a different database connection than the `Ci::Build` records.
The two statements in the transaction block are not part of the transaction and are
not rolled back in case something goes wrong. They act as third-party calls.
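The effect can be illustrated with a toy model. The following sketch assumes a hypothetical `FakeConnection` class, invented here, that keeps rows in a hash and restores them when its `transaction` block raises. It is not how ActiveRecord is implemented; it only shows why a transaction opened on one connection cannot undo writes made through another.

```ruby
# Hypothetical stand-in for a database connection.
class FakeConnection
  attr_reader :rows

  def initialize(rows)
    @rows = rows
  end

  # Restores this connection's rows if the block raises. The error is
  # swallowed to keep the sketch short; ActiveRecord would re-raise it.
  def transaction
    snapshot = @rows.dup
    yield
  rescue StandardError
    @rows = snapshot
  end

  def update(id, value)
    @rows[id] = value
  end
end

main_db = FakeConnection.new({})                         # plays `ActiveRecord::Base`
ci_db   = FakeConnection.new({ 1 => 'old', 2 => 'old' }) # plays `Ci::Build`

# Dangerous form: the transaction is opened on the main connection,
# but the write goes through the CI connection.
main_db.transaction do
  ci_db.update(1, 'new')
  raise 'something went wrong'
end
after_wrong_connection = ci_db.rows.dup

# Explicit form: the transaction and the write use the same connection.
ci_db.transaction do
  ci_db.update(2, 'new')
  raise 'something went wrong'
end
after_same_connection = ci_db.rows.dup

p after_wrong_connection # row 1 keeps the new value despite the error
p after_same_connection  # row 2 is back to its old value
```

Row 1 keeps its new value because the failing transaction belonged to a different connection, while row 2 is restored because the transaction and the write shared one.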