This process can also directly upload to object storage.
## The problem description
The following graph explains machine boundaries in a scalable GitLab installation. Without any workhorse optimization in place, we can expect incoming requests to follow the numbers on the arrows.
Rails process are expensive in terms of both CPU and memory. Ruby [global interpreter lock](https://en.wikipedia.org/wiki/Global_interpreter_lock) adds to cost too because the Ruby process will spend time on I/O operations on step 3 causing incoming requests to pile up.
In order to improve this, [disk buffered upload](#disk-buffered-upload) was implemented. With this, Rails no longer deals with writing uploaded files to disk.
All features that require Sidekiq and do not use direct upload won't work without NFS. In Kubernetes, machine boundaries translate to PODs, and in this case the uploaded file will be written into the POD private disk. Since Sidekiq POD cannot reach into other pods, the operation will fail to read it.
1.**storage:** if we are uploading for storing a file (i.e. artifacts, packages, discussion attachments). In this case [direct upload](#direct-upload) is the proper level as it's the less resource-intensive operation. Additional information can be found on [File Storage in GitLab](file_storage.md).
1.**in-controller/synchronous processing:** if we allow processing **small files** synchronously, using [disk buffered upload](#disk-buffered-upload) may speed up development.
1.**Sidekiq/asynchronous processing:** Asynchronous processing must implement [direct upload](#direct-upload), the reason being that it's the only way to support Cloud Native deployments without a shared NFS.
By upload encoding we mean how the file is included within the incoming request.
We have three kinds of file encoding in our uploads:
1.<iclass="fa fa-check-circle"></i>**multipart**: `multipart/form-data` is the most common, a file is encoded as a part of a multipart encoded request.
1.<iclass="fa fa-check-circle"></i>**body**: some APIs uploads files as the whole request body.
1.<iclass="fa fa-times-circle"></i>**JSON**: some JSON API uploads files as base64 encoded strings. This will require a change to GitLab Workhorse, which [is planned](https://gitlab.com/gitlab-org/gitlab-workhorse/-/issues/226).
By uploading technologies we mean how all the involved services interact with each other.
GitLab supports 3 kinds of uploading technologies, here follows a brief description with a sequence diagram for each one. Diagrams are not meant to be exhaustive.
When a multipart request reaches the rails application, `Rack::Multipart` leaves behind temporary files in `/tmp` and uses valuable Ruby process time to copy files around.
This will fallback to _disk buffered upload_ when `direct_upload` is disabled inside the [object storage setting](../administration/uploads.md#object-storage-settings).
In this section, we'll describe how to add a new upload route [accelerated](#uploading-technologies) by Workhorse for [body and multipart](#upload-encodings) encoded uploads.
Uploads routes belong to one of these categories:
1. Rails controllers: uploads handled by Rails controllers.
1. Grape API: uploads handled by a Grape API endpoint.
1. GraphQL API: uploads handled by a GraphQL resolve function. In these cases, there is nothing else
to do apart from implementing the actual upload.
### Update Workhorse for the new route
For both the Rails controller and Grape API uploads, Workhorse has to be updated in order to get the
support for the new upload route.
1. Open an new issue in the [Workhorse tracker](https://gitlab.com/gitlab-org/gitlab-workhorse/-/issues/new) describing precisely the new upload route:
- The route's URL.
- The [upload encoding](#upload-encodings).
- If possible, provide a dump of the upload request.
1. Implement and get the MR merged for this issue above.
1. Ask the Maintainers of [Workhorse](https://gitlab.com/gitlab-org/gitlab-workhorse) to create a new release. You can do that in the MR
directly during the maintainer review or ask for it in the `#workhorse` Slack channel.
1. Bump the [Workhorse version file](https://gitlab.com/gitlab-org/gitlab/-/blob/master/GITLAB_WORKHORSE_VERSION)
to the version you have from the previous points, or bump it in the same merge request that contains
the Rails changes (see [Implementing the new route with a Rails controller](#implementing-the-new-route-with-a-rails-controller) or [Implementing the new route with a Grape API endpoint](#implementing-the-new-route-with-a-grape-api-endpoint) below).
### Implementing the new route with a Rails controller
For a Rails controller upload, we usually have a [multipart](#upload-encodings) upload and there are a
few things to do:
1. The upload is available under the parameter name you're using. For example, it could be an `artifact`
or a nested parameter such as `user[avatar]`. Let's say that we have the upload under the
`file` parameter, reading `params[:file]` should get you an [`UploadedFile`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/uploaded_file.rb) instance.
1. Generally speaking, it's a good idea to check if the instance is from the [`UploadedFile`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/uploaded_file.rb) class. For example, see how we checked
[that the parameter is indeed an `UploadedFile`](https://gitlab.com/gitlab-org/gitlab/-/commit/ea30fe8a71bf16ba07f1050ab4820607b5658719#51c0cc7a17b7f12c32bc41cfab3649ff2739b0eb_79_77).
CAUTION: **Caution:**
**Do not** call `UploadedFile#from_params` directly! Do not build an [`UploadedFile`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/uploaded_file.rb)
instance using `UploadedFile#from_params`! This method can be unsafe to use depending on the `params`
passed. Instead, use the [`UploadedFile`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/uploaded_file.rb)
instance that [`multipart.rb`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/middleware/multipart.rb)
builds automatically for you.
### Implementing the new route with a Grape API endpoint
For a Grape API upload, we can have [body or a multipart](#upload-encodings) upload. Things are slightly more complicated: two endpoints are needed. One for the
Workhorse pre-upload authorization and one for accepting the upload metadata from Workhorse:
1. Implement an endpoint with the URL + `/authorize` suffix that will:
- Check that the request is coming from Workhorse with the `require_gitlab_workhorse!` from the [API helpers](https://gitlab.com/gitlab-org/gitlab/blob/master/lib/api/helpers.rb).
- Check user permissions.
- Set the status to `200` with `status 200`.
- Set the content type with `content_type Gitlab::Workhorse::INTERNAL_API_CONTENT_TYPE`.
- Use your dedicated `Uploader` class (let's say that it's `FileUploader`) to build the response with `FileUploader.workhorse_authorize(params)`.
1. Implement the endpoint for the upload request that will:
- Require all the `UploadedFile` objects as parameters.
- For example, if we expect a single parameter `file` to be an [`UploadedFile`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/uploaded_file.rb) instance,
use `requires :file, type: ::API::Validations::Types::WorkhorseFile`.
- Body upload requests have their upload available under the parameter `file`.
- Check that the request is coming from Workhorse with the `require_gitlab_workhorse!` from the
- The remaining code of the processing. This is where the code must be reading the parameter (for
our example, it would be `params[:file]`).
CAUTION: **Caution:**
**Do not** call `UploadedFile#from_params` directly! Do not build an [`UploadedFile`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/uploaded_file.rb)
object using `UploadedFile#from_params`! This method can be unsafe to use depending on the `params`
passed. Instead, use the [`UploadedFile`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/uploaded_file.rb)
object that [`multipart.rb`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/middleware/multipart.rb)