---
stage: Systems
group: Distribution
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/product/ux/technical-writing/#assignments
---
# Processing specific job classes
WARNING:
These are advanced settings. While they are used on GitLab.com, most GitLab
instances should add more processes that all listen to all queues. This is the
same approach we take in our [Reference Architectures](../reference_architectures/index.md).
GitLab has two options for creating Sidekiq processes that only handle specific
job classes:
1. [Routing rules](#routing-rules) are used on GitLab.com. They direct jobs
inside the application to queue names configured by administrators. This
lowers the load on Redis, which is important on very large-scale deployments.
1. [Queue selectors](#queue-selectors) perform the job selection outside the
application, when starting the Sidekiq process. This was used on GitLab.com
until September 2021, and is retained for compatibility reasons.
Both of these use the same [worker matching query](#worker-matching-query)
syntax. While they can technically be used together, most deployments should
choose one or the other; there is no particular benefit in combining them.
Routing rules must be the same across all GitLab nodes as they are part of the
application configuration. Queue selectors can be different across GitLab nodes
because they only change the arguments to the launched Sidekiq process.
## Routing rules
> - [Introduced](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/59604) in GitLab 13.12.
> - [Default routing rule value](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/97908) added in GitLab 15.4.
NOTE:
Mailer jobs cannot be routed by routing rules, and always go to the
`mailers` queue. When using routing rules, ensure that at least one process is
listening to the `mailers` queue. Typically this can be placed alongside the
`default` queue.
We recommend that most GitLab instances use routing rules to manage their Sidekiq
queues. Routing rules let administrators choose a single queue name for each group of
job classes based on their attributes. The syntax is an ordered array of `[query, queue]` pairs:
1. The query is a [worker matching query](#worker-matching-query).
1. The queue name must be a valid Sidekiq queue name. If the queue name
is `nil` or an empty string, the worker is routed to the queue generated
from the worker name instead. For more information, see the
[list of available job classes](#list-of-available-job-classes).
The queue name does not have to match any existing queue name in the
list of available job classes.
1. The first query matching a worker is chosen for that worker; later rules are
ignored.
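The first-match behavior can be pictured with a short Ruby sketch. It is an illustration only, not GitLab's implementation: `route`, `simple_match?`, and the sample worker hash are hypothetical, and the matcher here only understands `*` and a single `attribute=value` term.

```ruby
# Illustrative only: a simplified model of first-match routing.
# `route`, `simple_match?`, and the worker hash layout are hypothetical.

# Very small matcher: supports `*` and a single `attribute=value` term.
def simple_match?(query, worker)
  return true if query == '*'

  attribute, value = query.split('=', 2)
  worker[attribute.to_sym].to_s == value
end

# Returns the queue for a worker. The first matching rule wins, and a `nil`
# or empty queue name falls back to the queue generated from the worker name.
# When no rule matches, this sketch also falls back to the generated name
# (an assumption made for the example).
def route(rules, worker)
  _query, queue = rules.find { |query, _queue| simple_match?(query, worker) }
  (queue.nil? || queue.empty?) ? worker[:name] : queue
end

rules = [
  ['urgency=high', 'high-urgency'],
  ['feature_category=pages', nil],
  ['*', 'default']
]

worker = { name: 'authorized_projects', urgency: 'high', feature_category: 'permissions' }
puts route(rules, worker) # the first rule matches, so later rules are ignored
```

Because the wildcard rule is last, it only catches workers that no earlier rule matched.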
### Routing rules migration
After the Sidekiq routing rules are changed, administrators must take care with
the migration to avoid losing jobs entirely, especially in a system with long
queues of jobs. The migration can be done by following the migration steps
mentioned in [Sidekiq job migration](sidekiq_job_migration.md).
### Detailed example
This is a comprehensive example intended to show different possibilities. It is
not a recommendation.
1. Edit `/etc/gitlab/gitlab.rb`:
```ruby
sidekiq['routing_rules'] = [
  # Route all non-CPU-bound workers that are high urgency to `high-urgency` queue
  ['resource_boundary!=cpu&urgency=high', 'high-urgency'],
  # Route all database, gitaly and global search workers that are throttled to `throttled` queue
  ['feature_category=database,gitaly,global_search&urgency=throttled', 'throttled'],
  # Route all workers having contact with outside world to a `network-intensive` queue
  ['has_external_dependencies=true|feature_category=hooks|tags=network', 'network-intensive'],
  # Route all import workers to the `import` queue
  ['feature_category=import', 'import'],
  # Wildcard matching, route the rest to `default` queue
  ['*', 'default']
]
```
The `queue_groups` can then be set to match these generated queue names. For
instance:
```ruby
sidekiq['queue_selector'] = false
sidekiq['queue_groups'] = [
# Run two high-urgency processes
'high-urgency',
'high-urgency',
# Run one process for throttled, network-intensive, import
'throttled,network-intensive,import',
# Run one 'catchall' process on the default and mailers queues
'default,mailers'
]
```
1. Save the file and reconfigure GitLab:
```shell
sudo gitlab-ctl reconfigure
```
## Queue selectors
> - [Introduced](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/45) in GitLab 12.8.
> - [Sidekiq cluster, including queue selector, moved](https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/181) to GitLab Free in 12.10.
> - [Renamed from `experimental_queue_selector` to `queue_selector`](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/147) in GitLab 13.6.
The `queue_selector` option allows queue groups to be selected in a more general
way using a [worker matching query](#worker-matching-query). After
`queue_selector` is set, all `queue_groups` must follow the aforementioned
syntax.
### Using queue selectors
1. Edit `/etc/gitlab/gitlab.rb`:
```ruby
sidekiq['enable'] = true
sidekiq['routing_rules'] = [['*', nil]]
sidekiq['queue_selector'] = true
sidekiq['queue_groups'] = [
# Run all non-CPU-bound queues that are high urgency
'resource_boundary!=cpu&urgency=high',
# Run all continuous integration and pages queues that are not high urgency
'feature_category=continuous_integration,pages&urgency!=high',
# Run all queues
'*'
]
```
1. Save the file and reconfigure GitLab:
```shell
sudo gitlab-ctl reconfigure
```
### Negate settings
The `negate` setting makes the Sidekiq process work on every queue **except** the
ones you list. This is generally only used when there are multiple Sidekiq
nodes. In this example, we exclude all import-related jobs from a Sidekiq node.
1. Edit `/etc/gitlab/gitlab.rb`:
```ruby
sidekiq['routing_rules'] = [['*', nil]]
sidekiq['negate'] = true
sidekiq['queue_selector'] = true
sidekiq['queue_groups'] = [
"feature_category=importers"
]
```
1. Save the file and reconfigure GitLab:
```shell
sudo gitlab-ctl reconfigure
```
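Conceptually, `negate` turns the selected queues into their complement. The following Ruby sketch models that idea under stated assumptions: `queues_for_process` is a hypothetical helper, and the queue names are made-up sample data rather than a real GitLab queue list.

```ruby
# Illustrative only: models `negate` as a set complement over queue names.
# `queues_for_process` and the sample queue names are hypothetical.
def queues_for_process(all_queues, selected, negate: false)
  negate ? all_queues - selected : selected
end

all_queues = %w[default mailers jira_import github_importer]
selected   = %w[jira_import github_importer] # queues matched by the selector

# With negate enabled, the process listens to everything except the selection.
p queues_for_process(all_queues, selected, negate: true)
```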
### Migrating from queue selectors to routing rules
We recommend GitLab deployments add more Sidekiq processes listening to all queues, as in the
[Reference Architectures](../reference_architectures/index.md). For very large-scale deployments, we recommend
[routing rules](#routing-rules) instead of [queue selectors](#queue-selectors). We use routing rules on GitLab.com because
they help to lower the load on Redis.
To migrate from queue selectors to routing rules:
1. Open `/etc/gitlab/gitlab.rb`.
1. Set `sidekiq['queue_selector']` to `false`.
1. Take all queue `selector`s in the `sidekiq['queue_groups']`.
1. Give each `selector` a `queue_name` and put them in `[selector, queue_name]` format.
1. Replace `sidekiq['routing_rules']` with an array of `[selector, queue_name]` entries.
1. Add a wildcard match of `['*', 'default']` as the last entry in `sidekiq['routing_rules']`. This "catchall" queue must
be named `default`.
1. Replace `sidekiq['queue_groups']` with `queue_name`s.
1. Add at least one `default` queue and at least one `mailers` queue to the `sidekiq['queue_groups']`.
1. Save the file and reconfigure GitLab:
```shell
sudo gitlab-ctl reconfigure
```
1. Run the Rake task to [migrate existing jobs](sidekiq_job_migration.md):
```shell
sudo gitlab-rake gitlab:sidekiq:migrate_jobs:retry gitlab:sidekiq:migrate_jobs:schedule gitlab:sidekiq:migrate_jobs:queued
```
NOTE:
It is important to run the Rake task immediately after reconfiguring GitLab.
After reconfiguring GitLab, existing jobs are not processed until the Rake task starts to migrate the jobs.
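The configuration steps above are mechanical enough to sketch in Ruby. The helper below is hypothetical and not part of GitLab: `selectors_to_routing` and the `names` mapping are assumptions made for illustration, and the queue names are chosen by the administrator.

```ruby
# Illustrative helper, not part of GitLab: converts queue-selector style
# `queue_groups` into routing rules plus the matching queue groups.
# `names` maps each selector to a queue name chosen by the administrator.
def selectors_to_routing(selectors, names)
  # The wildcard selector is replaced by the mandatory `default` catchall.
  specific = selectors.reject { |selector| selector == '*' }

  routing_rules = specific.map { |selector| [selector, names.fetch(selector)] }
  routing_rules << ['*', 'default'] # catchall goes last and is named `default`

  queue_groups = specific.map { |selector| names.fetch(selector) }
  queue_groups << 'default,mailers' # one process for the default and mailers queues

  { routing_rules: routing_rules, queue_groups: queue_groups }
end

names = {
  'urgency=high' => 'high_urgency',
  'urgency=low' => 'low_urgency',
  'urgency=throttled' => 'throttled_urgency'
}

result = selectors_to_routing(['urgency=high', 'urgency=low', 'urgency=throttled', '*'], names)
p result[:routing_rules]
p result[:queue_groups]
```

Routing rules are first-match, so selectors that overlap need to be ordered deliberately before they are converted.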
The following example illustrates the migration process:
1. Start with the following content of `/etc/gitlab/gitlab.rb`:
```ruby
sidekiq['routing_rules'] = []
sidekiq['queue_selector'] = true
sidekiq['queue_groups'] = [
'urgency=high',
'urgency=low',
'urgency=throttled',
'*'
]
```
1. Update `/etc/gitlab/gitlab.rb` to use routing rules:
```ruby
sidekiq['min_concurrency'] = 20
sidekiq['max_concurrency'] = 20
sidekiq['routing_rules'] = [
['urgency=high', 'high_urgency'],
['urgency=low', 'low_urgency'],
['urgency=throttled', 'throttled_urgency'],
# Wildcard matching, route the rest to `default` queue
['*', 'default']
]
sidekiq['queue_selector'] = false
sidekiq['queue_groups'] = [
'high_urgency',
'low_urgency',
'throttled_urgency',
'default,mailers'
]
```
1. Save the file and reconfigure GitLab:
```shell
sudo gitlab-ctl reconfigure
```
1. Run the Rake task to [migrate existing jobs](sidekiq_job_migration.md):
```shell
sudo gitlab-rake gitlab:sidekiq:migrate_jobs:retry gitlab:sidekiq:migrate_jobs:schedule gitlab:sidekiq:migrate_jobs:queued
```
WARNING:
As described in [the concurrency section](extra_sidekiq_processes.md#manage-thread-counts-explicitly), we
recommend setting `min_concurrency` and `max_concurrency` to the same value. For example, if the number of queues
in a queue group entry is 1, while `min_concurrency` is set to `0`, and `max_concurrency` is set to `20`, the resulting
concurrency is set to `2` instead. A concurrency of `2` might be too low in most cases, except for very highly CPU-bound
tasks.
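The arithmetic behind this warning can be sketched as follows. This is an assumption chosen to reproduce the numbers above (one queue, `min_concurrency` of `0`, `max_concurrency` of `20`, resulting concurrency of `2`): the sketch takes the number of queues plus one and clamps it between the two settings. `resulting_concurrency` is a hypothetical name, and the actual calculation in GitLab may differ.

```ruby
# Illustrative only: one calculation consistent with the numbers in the
# warning above. The "+ 1" and the clamping are assumptions of this sketch.
def resulting_concurrency(queue_count, min_concurrency, max_concurrency)
  from_queues = queue_count + 1
  from_queues.clamp(min_concurrency, max_concurrency)
end

puts resulting_concurrency(1, 0, 20)  # one queue, min 0, max 20
puts resulting_concurrency(1, 20, 20) # min and max set to the same value
```

Setting both values to the same number removes the dependence on how many queues a group entry lists.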
## Worker matching query
GitLab provides a query syntax to match a worker based on its attributes. This
query syntax is employed by both [routing rules](#routing-rules) and
[queue selectors](#queue-selectors). A query includes two components:
- Attributes that can be selected.
- Operators used to construct a query.
### Available attributes
> [Introduced](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/261) in GitLab 13.1 (`tags`).
The worker matching query operates on worker attributes, described in the
[Sidekiq style guide](../../development/sidekiq/index.md). We support querying
based on a subset of worker attributes:
- `feature_category` - the
[GitLab feature category](https://about.gitlab.com/direction/maturity/#category-maturity) the
queue belongs to. For example, the `merge` queue belongs to the
`source_code_management` category.
- `has_external_dependencies` - whether or not the queue connects to external
services. For example, all importers have this set to `true`.
- `urgency` - how important it is that this queue's jobs run
quickly. Can be `high`, `low`, or `throttled`. For example, the
`authorized_projects` queue is used to refresh user permissions, and
is `high` urgency.
- `worker_name` - the worker name. Use this attribute to select a specific worker. Find all available names in [the job classes lists](#list-of-available-job-classes) below.
- `name` - the queue name generated from the worker name. Use this attribute to select a specific queue. Because this is generated from
the worker name, it does not change based on the result of other routing
rules.
- `resource_boundary` - whether the queue is bound by `cpu` or `memory`, or is
`unknown`. For example, the `ProjectExportWorker` is memory bound as it has
to load data in memory before saving it for export.
- `tags` - short-lived annotations for queues. These are expected to frequently
change from release to release, and may be removed entirely.
`has_external_dependencies` is a boolean attribute: only the exact
string `true` is considered true, and everything else is considered
false.
`tags` is a set, which means that `=` checks for intersecting sets, and
`!=` checks for disjoint sets. For example, `tags=a,b` selects queues
that have tags `a`, `b`, or both. `tags!=a,b` selects queues that have
neither of those tags.
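The set and boolean semantics described above can be modeled in a few lines of Ruby. This is a sketch for illustration, not GitLab's code: `tags_match?` and `boolean_value` are hypothetical helpers, and tags are represented as plain arrays of strings.

```ruby
require 'set'

# Illustrative only: models the set semantics described for `tags`.
# `=` checks that the sets intersect, `!=` checks that they are disjoint.
def tags_match?(worker_tags, operator, values)
  wanted = values.split(',').to_set
  case operator
  when '='  then worker_tags.to_set.intersect?(wanted)
  when '!=' then worker_tags.to_set.disjoint?(wanted)
  else raise ArgumentError, "unknown operator: #{operator}"
  end
end

# Only the exact string `true` counts as true for a boolean attribute.
def boolean_value(string)
  string == 'true'
end

puts tags_match?(%w[a c], '=', 'a,b')  # shares tag `a` with the query
puts tags_match?(%w[c], '!=', 'a,b')   # has neither `a` nor `b`
puts boolean_value('yes')              # anything other than `true` is false
```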
### Available operators
Routing rules and queue selectors support the following operators, listed from
highest to lowest precedence:
- `|` - the logical `OR` operator. For example, `query_a|query_b` (where `query_a`
and `query_b` are queries made up of the other operators here) includes
queues that match either query.
- `&` - the logical `AND` operator. For example, `query_a&query_b` (where
`query_a` and `query_b` are queries made up of the other operators here)
includes only queues that match both queries.
- `!=` - the `NOT IN` operator. For example, `feature_category!=issue_tracking`
excludes all queues from the `issue_tracking` feature category.
- `=` - the `IN` operator. For example, `resource_boundary=cpu` includes all
queues that are CPU bound.
- `,` - the concatenate set operator. For example,
`feature_category=continuous_integration,pages` includes all queues from
either the `continuous_integration` category or the `pages` category. The same
selection is possible with the OR operator, but the comma form is shorter and
has lower precedence.
The operator precedence for this syntax is fixed: it's not possible to make `AND`
have higher precedence than `OR`.
As with the standard queue group syntax above, a single `*` as the
entire queue group selects all queues.
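To make the interplay of the operators concrete, here is a simplified evaluator in Ruby. It is a sketch under assumptions, not GitLab's implementation: `term_match?` and `query_match?` are hypothetical names, workers are plain hashes, every attribute is treated as a single value (so the set semantics of `tags` are not modeled), and the query is assumed to be split on `|` first and on `&` second.

```ruby
# Illustrative only: a simplified evaluator for the query syntax above.
# Workers are plain hashes and every attribute is treated as a scalar.

# Evaluates one `attribute=value1,value2` or `attribute!=value1,value2` term.
def term_match?(term, worker)
  if term.include?('!=')
    attribute, values = term.split('!=', 2)
    !values.split(',').include?(worker[attribute.to_sym].to_s)
  else
    attribute, values = term.split('=', 2)
    values.split(',').include?(worker[attribute.to_sym].to_s)
  end
end

# A single `*` matches everything. Otherwise the query is split on `|`
# first and then on `&` (an assumption of this sketch): any OR clause may
# match, and within a clause every AND term must match.
def query_match?(query, worker)
  return true if query == '*'

  query.split('|').any? do |clause|
    clause.split('&').all? { |term| term_match?(term, worker) }
  end
end

worker = { feature_category: 'pages', urgency: 'low', resource_boundary: 'cpu' }

puts query_match?('feature_category=continuous_integration,pages&urgency!=high', worker)
puts query_match?('resource_boundary!=cpu&urgency=high', worker)
```

Under these assumptions, the first query matches the sample worker and the second does not, because the worker is CPU bound and not high urgency.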
### List of available job classes
For a list of the existing Sidekiq job classes and queues, check the following
files:
- [Queues for all GitLab editions](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/workers/all_queues.yml)
- [Queues for GitLab Enterprise Editions only](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/app/workers/all_queues.yml)