---
stage: none
group: unassigned
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/product/ux/technical-writing/#assignments
---

# `ReactiveCaching`

> This doc refers to [`reactive_caching.rb`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/models/concerns/reactive_caching.rb).

The `ReactiveCaching` concern fetches data in the background and stores it
in the Rails cache, keeping it up to date for as long as it is being requested. If the
data hasn't been requested for `reactive_cache_lifetime`, it stops being refreshed
and is removed.

## Examples

```ruby
class Foo < ApplicationRecord
  include ReactiveCaching

  after_save :clear_reactive_cache!

  def calculate_reactive_cache(param1, param2)
    # Expensive operation here. The return value of this method is cached
  end

  def result
    # Any arguments can be passed to `with_reactive_cache`. `calculate_reactive_cache`
    # will be called with the same arguments.
    with_reactive_cache(param1, param2) do |data|
      # ...
    end
  end
end
```

In this example, the first time `#result` is called, it returns `nil`. However,
it enqueues a background worker to call `#calculate_reactive_cache` and set an
initial cache lifetime of 10 minutes.

## How it works

The first time `#with_reactive_cache` is called, a background job is enqueued and
`with_reactive_cache` returns `nil`. The background job calls `#calculate_reactive_cache`
and stores its return value. It also re-enqueues the background job to run again after
`reactive_cache_refresh_interval`. Therefore, it keeps the stored value up to date.
Calculations never run concurrently.

Calling `#with_reactive_cache` while a value is cached calls the block given to
`#with_reactive_cache`, yielding the cached value. It also extends the lifetime
of the cache by the `reactive_cache_lifetime` value.

After the lifetime has expired, no more background jobs are enqueued and calling
`#with_reactive_cache` again returns `nil`, starting the process all over again.

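The miss-then-hit behavior described in this section can be sketched with a toy, in-memory model. This is not the GitLab implementation: `ToyReactiveCache`, its `jobs` queue, and `run_jobs` are hypothetical stand-ins for the Rails cache and the background worker, and the sketch deliberately leaves out the refresh interval, the lifetime, and the exclusive lease.

```ruby
# Toy model of the reactive cache lifecycle (hypothetical, not GitLab code).
class ToyReactiveCache
  attr_reader :jobs

  def initialize(&calculation)
    @calculation = calculation # stands in for `calculate_reactive_cache`
    @store = {}                # stands in for the Rails cache
    @jobs = []                 # stands in for the background job queue
  end

  # On a miss: enqueue a job and return nil.
  # On a hit: yield the cached value and return the block's result.
  def with_reactive_cache(*args)
    if @store.key?(args)
      yield @store[args]
    else
      @jobs << args unless @jobs.include?(args)
      nil
    end
  end

  # Runs the queued jobs, as a background worker would.
  def run_jobs
    until @jobs.empty?
      args = @jobs.shift
      @store[args] = @calculation.call(*args)
    end
  end
end

cache = ToyReactiveCache.new { |a, b| a + b }

first = cache.with_reactive_cache(1, 2) { |data| data * 10 }  # miss: nil, job enqueued
cache.run_jobs                                                # the "worker" fills the cache
second = cache.with_reactive_cache(1, 2) { |data| data * 10 } # hit: block receives 3
```

In this sketch, `first` is `nil` and `second` is `30`, which mirrors why callers must always handle a `nil` result from `with_reactive_cache`.
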
### Set a hard limit for ReactiveCaching

To preserve performance, you should set a hard caching limit in the class that includes
`ReactiveCaching`. See the example of [how to set it up](#selfreactive_cache_hard_limit).

For more information, read the internal issue
[Redis (or ReactiveCache) soft and hard limits](https://gitlab.com/gitlab-org/gitlab/-/issues/14015).

## When to use

- If we need to make a request to an external API (for example, requests to the Kubernetes API).
  It is not advisable to keep the application server worker blocked for the duration of
  the external request.
- If a model needs to perform many database calls or other time-consuming
  calculations.

## How to use

### In models and integrations

The `ReactiveCaching` concern can be used in models as well as integrations
(`app/models/integrations`).

1. Include the concern in your model or integration.

   To include the concern in a model:

   ```ruby
   include ReactiveCaching
   ```

   To include the concern in an integration:

   ```ruby
   include Integrations::ReactivelyCached
   ```

1. Implement the `calculate_reactive_cache` method in your model or integration.
1. Call `with_reactive_cache` in your model or integration where the cached value is needed.
1. Set the [`reactive_cache_work_type` accordingly](#selfreactive_cache_work_type).

### In controllers

Controller endpoints that call a model or service method that uses `ReactiveCaching` should
not wait until the background worker completes.

- An API that calls a model or service method that uses `ReactiveCaching` should return
  `202 Accepted` when the cache is being calculated (when `#with_reactive_cache` returns `nil`).
- It should also
  [set the polling interval header](fe_guide/performance.md#real-time-components) with
  `Gitlab::PollingInterval.set_header`.
- The consumer of the API is expected to poll the API.
- You can also consider implementing [ETag caching](polling.md) to reduce the server
  load caused by polling.

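The status decision described in the list above can be sketched in plain Ruby. `reactive_cache_response` is a hypothetical helper, not part of GitLab; in a real controller you would render the response and set the polling interval header with `Gitlab::PollingInterval.set_header` rather than return a hash.

```ruby
# Hypothetical helper: maps the result of `with_reactive_cache`
# to the HTTP status the endpoint should return.
def reactive_cache_response(cached)
  if cached.nil?
    # The cache is still being calculated; the client should poll again.
    { status: 202, body: nil }
  else
    { status: 200, body: cached }
  end
end

pending = reactive_cache_response(nil)
ready = reactive_cache_response({ 'pods' => 3 })
```

The sketch assumes that a `nil` result always means "not cached yet", which is the case this section describes.
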
### Methods to implement in a model or service

These are methods that should be implemented in the model/service that includes `ReactiveCaching`.

#### `#calculate_reactive_cache` (required)

- This method must be implemented. Its return value is cached.
- It is called by `ReactiveCaching` when it needs to populate the cache.
- Any arguments passed to `with_reactive_cache` are also passed to `calculate_reactive_cache`.

#### `#reactive_cache_updated` (optional)

- This method can be implemented if needed.
- It is called by the `ReactiveCaching` concern whenever the cache is updated.
  If the cache is being refreshed and the new cache value is the same as the old cache
  value, this method is not called. It is only called if a new value is stored in
  the cache.
- It can be used to perform an action whenever the cache is updated.

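The "only called when the value changes" rule for `#reactive_cache_updated` can be illustrated with a toy model. `ToyRefresher` and its `refresh` method are hypothetical and only imitate the behavior described above; they are not how the concern is implemented.

```ruby
# Toy model of the update callback (hypothetical, not GitLab code).
class ToyRefresher
  attr_reader :updates

  def initialize
    @value = nil
    @updates = 0
  end

  # Stands in for a background refresh that produced `new_value`.
  def refresh(new_value)
    return if new_value == @value # unchanged value: no callback

    @value = new_value
    reactive_cache_updated
  end

  # Stands in for the optional hook a model would implement.
  def reactive_cache_updated
    @updates += 1
  end
end

refresher = ToyRefresher.new
refresher.refresh('a') # new value: callback runs
refresher.refresh('a') # same value: callback skipped
refresher.refresh('b') # changed value: callback runs
```

After these three refreshes, `refresher.updates` is `2`.
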
### Methods called by a model or service

These are methods provided by `ReactiveCaching` and should be called in
the model/service.

#### `#with_reactive_cache` (required)

- `with_reactive_cache` must be called where the result of `calculate_reactive_cache`
  is required.
- A block can be given to `with_reactive_cache`. `with_reactive_cache` can also take
  any number of arguments. Any arguments passed to `with_reactive_cache` are
  passed to `calculate_reactive_cache`. The arguments passed to `with_reactive_cache`
  are appended to the cache key name.
- If `with_reactive_cache` is called when the result has already been cached, the
  block is called, yielding the cached value, and the return value of the block
  is returned by `with_reactive_cache`. It also resets the timeout of the
  cache to the `reactive_cache_lifetime` value.
- If the result has not been cached yet, `with_reactive_cache` returns `nil`.
  It also enqueues a background job, which calls `calculate_reactive_cache`
  and caches the result.
- After the background job has completed and the result is cached, the next call
  to `with_reactive_cache` picks up the cached value.
- In the example below, `data` is the cached value which is yielded to the block
  given to `with_reactive_cache`.

  ```ruby
  class Foo < ApplicationRecord
    include ReactiveCaching

    def calculate_reactive_cache(param1, param2)
      # Expensive operation here. The return value of this method is cached
    end

    def result
      with_reactive_cache(param1, param2) do |data|
        # ...
      end
    end
  end
  ```

#### `#clear_reactive_cache!` (optional)

- This method can be called when the cache needs to be expired/cleared. For example,
  it can be called in an `after_save` callback in a model so that the cache is
  cleared after the model is modified.
- This method should be called with the same parameters that are passed to
  `with_reactive_cache` because the parameters are part of the cache key.

#### `#without_reactive_cache` (optional)

- This is a convenience method that can be used for debugging purposes.
- This method calls `calculate_reactive_cache` in the current process instead of
  in a background worker.

### Configurable options

There are some `class_attribute` options which can be tweaked.

#### `self.reactive_cache_key`

- The value of this attribute is the prefix to the `data` and `alive` cache key names.
  The parameters passed to `with_reactive_cache` form the rest of the cache key names.
- By default, this key uses the model's name and the ID of the record.

  ```ruby
  self.reactive_cache_key = -> (record) { [model_name.singular, record.id] }
  ```

- The `data` and `alive` cache keys in this case are `"ExampleModel:1:arg1:arg2"`
  and `"ExampleModel:1:arg1:arg2:alive"` respectively, where `ExampleModel` is the
  name of the model, `1` is the ID of the record, `arg1` and `arg2` are parameters
  passed to `with_reactive_cache`.
- If you're including this concern in an integration (`app/models/integrations/`) instead, you must override
  the default by adding the following to your integration:

  ```ruby
  self.reactive_cache_key = ->(integration) { [integration.class.model_name.singular, integration.project_id] }
  ```

  If your `reactive_cache_key` is exactly like the above, you can use the existing
  `Integrations::ReactivelyCached` concern instead.

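The key format described in this section can be reproduced with a few lines of plain Ruby. `reactive_cache_keys` is a hypothetical helper written only to show how the prefix and the `with_reactive_cache` arguments combine; it is not the concern's API, and the prefix values are taken from the example above.

```ruby
# Hypothetical helper that builds the `data` and `alive` key names
# from a prefix and the arguments passed to `with_reactive_cache`.
def reactive_cache_keys(prefix, *args)
  data_key = [*prefix, *args].join(':')
  { data: data_key, alive: "#{data_key}:alive" }
end

keys = reactive_cache_keys(['ExampleModel', 1], 'arg1', 'arg2')
# keys[:data]  is "ExampleModel:1:arg1:arg2"
# keys[:alive] is "ExampleModel:1:arg1:arg2:alive"

# Different arguments produce a different key, which is why
# `clear_reactive_cache!` needs the same arguments as `with_reactive_cache`.
other = reactive_cache_keys(['ExampleModel', 1], 'arg1')
```
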
#### `self.reactive_cache_lease_timeout`

- `ReactiveCaching` uses `Gitlab::ExclusiveLease` to ensure that the cache calculation
  is never run concurrently by multiple workers.
- This attribute is the timeout for the `Gitlab::ExclusiveLease`.
- It defaults to 2 minutes, but can be overridden if a different timeout is required.

```ruby
self.reactive_cache_lease_timeout = 2.minutes
```

#### `self.reactive_cache_refresh_interval`

- This is the interval at which the cache is refreshed.
- It defaults to 1 minute.

```ruby
self.reactive_cache_refresh_interval = 1.minute
```

#### `self.reactive_cache_lifetime`

- This is the duration after which the cache is cleared if there are no requests.
- The default is 10 minutes. If there are no requests for this cache value for 10 minutes,
  the cache expires.
- If the cache value is requested before it expires, the timeout of the cache is
  reset to `reactive_cache_lifetime`.

```ruby
self.reactive_cache_lifetime = 10.minutes
```
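
The sliding expiry described above can be sketched with an explicit clock. `ToyLifetime` is a hypothetical model, with time passed in as plain seconds so the example stays deterministic; the real concern relies on cache expiry rather than a class like this.

```ruby
# Toy model of the lifetime reset (hypothetical, not GitLab code).
class ToyLifetime
  def initialize(lifetime)
    @lifetime = lifetime
    @expires_at = nil
  end

  # Called on every request for the cached value: resets the timeout.
  def touch(now)
    @expires_at = now + @lifetime
  end

  # True while the most recent request is less than `lifetime` seconds old.
  def alive?(now)
    !@expires_at.nil? && now < @expires_at
  end
end

tracker = ToyLifetime.new(600)              # 10 minutes, in seconds
tracker.touch(0)                            # first request at t = 0
alive_before_touch = tracker.alive?(599)    # still within the lifetime
tracker.touch(599)                          # a request at t = 599 resets the timeout
alive_after_touch = tracker.alive?(1000)    # alive only because of the reset
alive_after_expiry = tracker.alive?(1199)   # 600 seconds after the last request
```

Here `alive_before_touch` and `alive_after_touch` are `true`, and `alive_after_expiry` is `false`.
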

#### `self.reactive_cache_hard_limit`

- This is the maximum data size that `ReactiveCaching` allows to be cached.
- The default is 1 megabyte. Data that goes over this value is not cached,
  and a `ReactiveCaching::ExceededReactiveCacheLimit` error is reported silently to Sentry.

```ruby
self.reactive_cache_hard_limit = 5.megabytes
```

#### `self.reactive_cache_work_type`

- This is the type of work performed by the `calculate_reactive_cache` method. Based on this attribute,
  `ReactiveCaching` can pick the right worker to process the caching job. Make sure to
  set it to `:external_dependency` if the work performs any external request
  (for example, Kubernetes, Sentry); otherwise set it to `:no_dependency`.

#### `self.reactive_cache_worker_finder`

- This is the method used by the background worker to find or generate the object on
  which `calculate_reactive_cache` can be called.
- By default it uses the model primary key to find the object:

  ```ruby
  self.reactive_cache_worker_finder = ->(id, *_args) do
    find_by(primary_key => id)
  end
  ```

- The default behavior can be overridden by defining a custom `reactive_cache_worker_finder`.

  ```ruby
  class Foo < ApplicationRecord
    include ReactiveCaching

    self.reactive_cache_worker_finder = ->(_id, *args) { from_cache(*args) }

    def self.from_cache(var1, var2)
      # This method will be called by the background worker with "bar1" and
      # "bar2" as arguments.
      new(var1, var2)
    end

    def initialize(var1, var2)
      # ...
    end

    def calculate_reactive_cache(var1, var2)
      # Expensive operation here. The return value of this method is cached
    end

    def result
      with_reactive_cache("bar1", "bar2") do |data|
        # ...
      end
    end
  end
  ```

  - In this example, the primary key ID is passed to `reactive_cache_worker_finder`
    along with the parameters passed to `with_reactive_cache`.
  - The custom `reactive_cache_worker_finder` calls `.from_cache` with the parameters
    passed to `with_reactive_cache`.