[GitLab Workhorse](https://gitlab.com/gitlab-org/gitlab-workhorse) has special rules for handling uploads.
To prevent occupying a ruby process on I/O operations, we process the upload in workhorse, where is cheaper.
This process can also directly upload to object storage.
## The problem description
The following graph explains machine boundaries in a scalable GitLab installation. Without any workhorse optimization in place, we can expect incoming requests to follow the numbers on the arrows.
```mermaid
graph TB
subgraph "load balancers"
LB(HA Proxy)
end
subgraph "Shared storage"
nfs(NFS)
end
subgraph "redis cluster"
r(persisted redis)
end
LB-- 1 -->workhorse
subgraph "web or API fleet"
workhorse-- 2 -->rails
end
rails-- "3 (write files)" -->nfs
rails-- "4 (schedule a job)" -->r
subgraph sidekiq
s(sidekiq)
end
s-- "5 (fetch a job)" -->r
s-- "6 (read files)" -->nfs
```
We have three challenges here: performance, availability, and scalability.
### Performance
Rails process are expensive in terms of both CPU and memory. Ruby [global interpreter lock](https://en.wikipedia.org/wiki/Global_interpreter_lock) adds to cost too because the ruby process will spend time on I/O operations on step 3 causing incoming requests to pile up.
In order to improve this, [disk buffered upload](#disk-buffered-upload) was implemented. With this, Rails no longer deals with writing uploaded files to disk.
All features that require Sidekiq and do not use direct upload won't work without NFS. In Kubernetes, machine boundaries translate to PODs, and in this case the uploaded file will be written into the POD private disk. Since Sidekiq POD cannot reach into other pods, the operation will fail to read it.
1.**storage:** if we are uploading for storing a file (i.e. artifacts, packages, discussion attachments). In this case [direct upload](#direct-upload) is the proper level as it's the less resource-intensive operation. Additional information can be found on [File Storage in GitLab](file_storage.md).
1.**in-controller/synchronous processing:** if we allow processing **small files** synchronously, using [disk buffered upload](#disk-buffered-upload) may speed up development.
1.**Sidekiq/asynchronous processing:** Async processing must implement [direct upload](#direct-upload), the reason being that it's the only way to support Cloud Native deployments without a shared NFS.
By upload encoding we mean how the file is included within the incoming request.
We have three kinds of file encoding in our uploads:
1.<iclass="fa fa-check-circle"></i>**multipart**: `multipart/form-data` is the most common, a file is encoded as a part of a multipart encoded request.
1.<iclass="fa fa-check-circle"></i>**body**: some APIs uploads files as the whole request body.
1.<iclass="fa fa-times-circle"></i>**JSON**: some JSON API uploads files as base64 encoded strings. This will require a change to GitLab Workhorse, which [is planned](https://gitlab.com/gitlab-org/gitlab-workhorse/issues/226).
By uploading technologies we mean how all the involved services interact with each other.
GitLab supports 3 kinds of uploading technologies, here follows a brief description with a sequence diagram for each one. Diagrams are not meant to be exhaustive.
This is the default kind of upload, and it's most expensive in terms of resources.
In this case, workhorse is unaware of files being uploaded and acts as a regular proxy.
When a multipart request reaches the rails application, `Rack::Multipart` leaves behind tempfiles in `/tmp` and uses valuable Ruby process time to copy files around.
and [its routes](https://gitlab.com/gitlab-org/gitlab/blob/cc723071ad337573e0360a879cbf99bc4fb7adb9/config/routes/git_http.rb#L31-32).
- [API endpoints for uploading packages](packages.md#file-uploads).
**note:** this will fallback to _disk buffered upload_ when `direct_upload` is disabled inside the [object storage setting](../administration/uploads.md#object-storage-settings).
The answer to the `/authorize` call will only contain a file system path.