debian-mirror-gitlab/doc/architecture/blueprints/cells/cells-feature-database-sequences.md
2023-05-27 22:25:52 +05:30

94 lines
4.8 KiB
Markdown

---
stage: enablement
group: Tenant Scale
comments: false
description: 'Cells: Database Sequences'
---
DISCLAIMER:
This page may contain information related to upcoming products, features and
functionality. It is important to note that the information presented is for
informational purposes only, so please do not rely on the information for
purchasing or planning purposes. Just like with all projects, the items
mentioned on the page are subject to change or delay, and the development,
release, and timing of any products, features, or functionality remain at the
sole discretion of GitLab Inc.
This document is a work-in-progress and represents a very early state of the
Cells design. Significant aspects are not documented, though we expect to add
them in the future. This is one possible architecture for Cells, and we intend to
contrast this with alternatives before deciding which approach to implement.
This documentation will be kept even if we decide not to implement this so that
we can document the reasons for not choosing this approach.
# Cells: Database Sequences
GitLab today ensures that every database row create has unique ID, allowing
to access Merge Request, CI Job or Project by a known global ID.
Cells will use many distinct and not connected databases, each of them having
a separate IDs for most of entities.
It might be desirable to retain globally unique IDs for all database rows
to allow migrating resources between Cells in the future.
## 1. Definition
## 2. Data flow
## 3. Proposal
This are some preliminary ideas how we can retain unique IDs across the system.
### 3.1. UUID
Instead of using incremental sequences use UUID (128 bit) that is stored in database.
- This might break existing IDs and requires adding UUID column for all existing tables.
- This makes all indexes larger as it requires storing 128 bit instead of 32/64 bit in index.
### 3.2. Use Cell index encoded in ID
Since significant number of tables already use 64 bit ID numbers we could use MSB to encode
Cell ID effectively enabling
- This might limit amount of Cells that can be enabled in system, as we might decide to only
allocate 1024 possible Cell numbers.
- This might make IDs to be migratable between Cells, since even if entity from Cell 1 is migrated to Cell 100
this ID would still be unique.
- If resources are migrated the ID itself will not be enough to decode Cell number and we would need
lookup table.
- This requires updating all IDs to 32 bits.
### 3.3. Allocate sequence ranges from central place
Each Cell might receive its own range of the sequences as they are consumed from a centrally managed place.
Once Cell consumes all IDs assigned for a given table it would be replenished and a next range would be allocated.
Ranges would be tracked to provide a faster lookup table if a random access pattern is required.
- This might make IDs to be migratable between Cells, since even if entity from Cell 1 is migrated to Cell 100
this ID would still be unique.
- If resources are migrated the ID itself will not be enough to decode Cell number and we would need
much more robust lookup table as we could be breaking previously assigned sequence ranges.
- This does not require updating all IDs to 64 bits.
- This adds some performance penalty to all `INSERT` statements in Postgres or at least from Rails as we need to check for the sequence number and potentially wait for our range to be refreshed from the ID server
- The available range will need to be stored and incremented in a centralized place so that concurrent transactions cannot possibly get the same value.
### 3.4. Define only some tables to require unique IDs
Maybe this is acceptable only for some tables to have a globally unique IDs. It could be projects, groups
and other top-level entities. All other tables like `merge_requests` would only offer Cell-local ID,
but when referenced outside it would rather use IID (an ID that is monotonic in context of a given resource, like project).
- This makes the ID 10000 for `merge_requests` be present on all Cells, which might be sometimes confusing
as for uniqueness of the resource.
- This might make random access by ID (if ever needed) be impossible without using composite key, like: `project_id+merge_request_id`.
- This would require us to implement a transformation/generation of new ID if we need to migrate records to another cell. This can lead to very difficult migration processes when these IDs are also used as foreign keys for other records being migrated.
- If IDs need to change when moving between cells this means that any links to records by ID would no longer work even if those links included the `project_id`.
- If we plan to allow these ids to not be unique and change the unique constraint to be based on a composite key then we'd need to update all foreign key references to be based on the composite key
## 4. Evaluation
## 4.1. Pros
## 4.2. Cons