# Benchmark Report

Benchmarks were run at various stages of development to keep track of
performance. Tech stacks were changed and the implementation was
optimized to increase throughput. This report summarizes the findings of
the benchmarks.

Ultimately, we were able to identify a bottleneck that was previously
hidden in mCaptcha (hidden because a different bottleneck, like DB
access, eclipsed it :p) [and were able to increase the performance of the
critical path by ~147 times](https://git.batsense.net/mCaptcha/dcache/pulls/3)
through a trivial optimization.

## Environment

These benchmarks were run on a noisy development laptop and should be
used for guidance only.

- CPU: AMD Ryzen 5 5600U with Radeon Graphics (12) @ 4.289GHz
- Memory: 22849MiB
- OS: Arch Linux x86_64
- Kernel: 6.6.7-arch1-1
- rustc: 1.73.0 (cc66ad468 2023-10-03)

## Baseline: Tech stack version 1

Actix Web-based networking with JSON as the message format. This stack
was chosen for prototyping and was later used to set a baseline.

## Without connection pooling in server-to-server communications

### Single requests (no batching)

<details>

<summary>Peak throughput observed was 1117 requests/second (please click
to see charts)</summary>

#### Total number of requests vs time

data:image/s3,"s3://crabby-images/6a1b3/6a1b3d63539898ef569cddb4e3da559fbe236527" alt="number of requests"

#### Response times (ms) vs time

data:image/s3,"s3://crabby-images/b2c98/b2c98266dcabc994fc2382d0ea8864a0c65e4644" alt="response times (ms)"

#### Number of concurrent users vs time

data:image/s3,"s3://crabby-images/65285/6528547380d01cbaa05e228c562fce5174b0f23c" alt="number of concurrent users"

</details>

### Batched requests

<details>

<summary>

Each network request contained 1,000 application requests; counted in
application requests, peak throughput observed was 1,800
requests/second.

Please click to see charts.

</summary>

#### Total number of requests vs time

data:image/s3,"s3://crabby-images/f893b/f893bea5b7ddee40209addcb9d020b4b8c7d9772" alt="number of requests"

#### Response times (ms) vs time

data:image/s3,"s3://crabby-images/69d8a/69d8a0925dd7d69adb463b331297b627b3d6e9f3" alt="response times (ms)"

#### Number of concurrent users vs time

data:image/s3,"s3://crabby-images/32af7/32af70a33b4ae5556baa799d1054ea5d311d1138" alt="number of concurrent users"

</details>

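The batched runs pack many application requests into a single network request. The following is a minimal sketch of that idea, not dcache's actual wire protocol: `AppRequest`, `BatchRequest`, `into_batches` and `app_requests_per_second` are hypothetical names, and the throughput helper assumes the reported figures equal network requests per second multiplied by the batch size of 1,000.

```rust
/// Hypothetical application-level request; not a dcache type.
#[derive(Debug, Clone, PartialEq)]
struct AppRequest {
    captcha_key: String,
}

/// Hypothetical network message that carries a whole batch.
#[derive(Debug)]
struct BatchRequest {
    requests: Vec<AppRequest>,
}

/// Split application requests into batches of at most `batch_size` items.
fn into_batches(reqs: Vec<AppRequest>, batch_size: usize) -> Vec<BatchRequest> {
    assert!(batch_size > 0, "batch size must be non-zero");
    reqs.chunks(batch_size)
        .map(|chunk| BatchRequest { requests: chunk.to_vec() })
        .collect()
}

/// Assumption: reported throughput = network requests/s * requests per batch.
fn app_requests_per_second(network_rps: f64, batch_size: usize) -> f64 {
    network_rps * batch_size as f64
}

fn main() {
    let reqs: Vec<AppRequest> = (0..2_500)
        .map(|i| AppRequest { captcha_key: format!("key-{i}") })
        .collect();
    let batches = into_batches(reqs, 1_000);

    let sizes: Vec<usize> = batches.iter().map(|b| b.requests.len()).collect();
    println!("batch sizes: {sizes:?}"); // [1000, 1000, 500]
    println!("first key of last batch: {}", batches[2].requests[0].captcha_key); // key-2000

    // 1.8 network requests/s, each carrying 1,000 application requests.
    println!("{}", app_requests_per_second(1.8, 1_000));
}
```

The usual reason batching helps is that per-network-request overhead (connection handling, framing, request dispatch) is paid once per batch rather than once per application request.
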
## With connection pooling in server-to-server communications

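Connection pooling lets a node reuse connections it has already established to its peers instead of opening a new one for every server-to-server request. A minimal, std-only sketch of the idea follows; `Pool`, `Conn`, `checkout` and `checkin` are hypothetical names, the "connection" is a stand-in type rather than a real network client, and nothing here describes which pooling mechanism the benchmarked code actually used.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

/// Stand-in for an expensive-to-open connection (hypothetical, not a real client).
#[derive(Debug)]
struct Conn {
    id: usize,
}

/// Minimal pool: idle connections are kept in a Mutex-protected Vec and reused.
struct Pool {
    idle: Mutex<Vec<Conn>>,
    opened: AtomicUsize, // how many connections were ever created
}

impl Pool {
    fn new() -> Self {
        Pool { idle: Mutex::new(Vec::new()), opened: AtomicUsize::new(0) }
    }

    /// Reuse an idle connection if one exists, otherwise "open" a new one.
    fn checkout(&self) -> Conn {
        if let Some(conn) = self.idle.lock().unwrap().pop() {
            return conn;
        }
        let id = self.opened.fetch_add(1, Ordering::Relaxed);
        Conn { id }
    }

    /// Hand a connection back so later requests can reuse it.
    fn checkin(&self, conn: Conn) {
        self.idle.lock().unwrap().push(conn);
    }

    fn opened(&self) -> usize {
        self.opened.load(Ordering::Relaxed)
    }
}

fn main() {
    let pool = Pool::new();
    // 10,000 sequential requests, each borrowing a connection and returning it.
    for _ in 0..10_000 {
        let conn = pool.checkout();
        let _ = conn.id; // pretend to send a request over `conn`
        pool.checkin(conn);
    }
    // Only one connection is ever opened; without a pool there would be 10,000.
    println!("connections opened: {}", pool.opened()); // 1
}
```
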
### Single requests (no batching)

<details>

<summary>

Peak throughput observed was 3904 requests/second.

Please click to see charts.

</summary>

#### Total number of requests vs time

data:image/s3,"s3://crabby-images/903a0/903a057c6af5df30e7911f31f2a84387772db21c" alt="number of requests"

#### Response times (ms) vs time

data:image/s3,"s3://crabby-images/fb2cf/fb2cf6c3ea19b69bc9de7907ca3cc4e4e00fef6e" alt="response times (ms)"

#### Number of concurrent users vs time

data:image/s3,"s3://crabby-images/9439f/9439fb1e97d28cc48a6b68ee7e71790325c64757" alt="number of concurrent users"

</details>

### Batched requests

<details>

<summary>

Each network request contained 1,000 application requests; counted in
application requests, peak throughput observed was 15,800
requests/second.

Please click to see charts.

</summary>

#### Total number of requests vs time

data:image/s3,"s3://crabby-images/f893b/f893bea5b7ddee40209addcb9d020b4b8c7d9772" alt="number of requests"

#### Response times (ms) vs time

data:image/s3,"s3://crabby-images/69d8a/69d8a0925dd7d69adb463b331297b627b3d6e9f3" alt="response times (ms)"

#### Number of concurrent users vs time

data:image/s3,"s3://crabby-images/32af7/32af70a33b4ae5556baa799d1054ea5d311d1138" alt="number of concurrent users"

</details>

## Tech stack version 2

Tonic for the network stack and gRPC for the wire format. We ran over a
dozen benchmarks with this tech stack. The trend was similar to the one
observed above: throughput was higher when a connection pool was used
and even higher when requests were batched. _But_ the throughput of all
of these benchmarks was lower than that of the baseline benchmarks!

The CPU was also busier. We profiled the server with
[flamegraph](https://github.com/flamegraph-rs/flamegraph) while hitting
it with the same test suite to identify compute-heavy areas. The result
was unexpected:

data:image/s3,"s3://crabby-images/63c60/63c60574bc7b385241c4ecbc9da2b0588f05f1f6" alt="flamegraph indicating libmcaptcha being slow"

libmCaptcha's [AddVisitor
handler](https://github.com/mCaptcha/libmcaptcha/blob/e3f456f35b2c9e55e0475b01b3e05d48b21fd51f/src/master/embedded/counter.rs#L124)
was taking up 59% of the CPU time of the entire test run. This is a
critical part of the variable difficulty factor PoW algorithm that
mCaptcha uses. We never ran into this bottleneck before because in other
cache implementations, it was always preceded by a database request. It
surfaced here because dcache uses in-memory data sources.

libmCaptcha uses an actor-based approach with message passing for clean
concurrent state management. Message passing is faster in most cases,
but in our case, sharing memory using the CPU's concurrency primitives
turned out to be significantly faster:

data:image/s3,"s3://crabby-images/53d0d/53d0dae47394d37a72e69c0cb0282638e975a707" alt="flamegraph after the libmcaptcha fix"

CPU time was reduced from 59% to 0.4%, a roughly 147-fold reduction!

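To make the difference between the two styles concrete, here is a simplified visitor counter written both ways. It is a sketch, not libmCaptcha's code: `ActorCounter`, `AtomicCounter` and `Msg` are hypothetical, and the real AddVisitor handler also maps the visitor count to a difficulty factor, which is omitted. The actor version funnels every increment through a channel to a dedicated thread and waits for a reply; the shared-memory version performs one atomic `fetch_add` on the caller's thread.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::mpsc;
use std::sync::Arc;
use std::thread;

/// Messages understood by the hypothetical actor.
enum Msg {
    AddVisitor { reply: mpsc::Sender<u32> },
}

/// Actor style: state is owned by one thread and changed only via messages.
struct ActorCounter {
    tx: mpsc::Sender<Msg>,
}

impl ActorCounter {
    fn spawn() -> Self {
        let (tx, rx) = mpsc::channel::<Msg>();
        thread::spawn(move || {
            let mut visitors: u32 = 0;
            // The loop ends once every Sender has been dropped.
            for msg in rx {
                match msg {
                    Msg::AddVisitor { reply } => {
                        visitors += 1;
                        let _ = reply.send(visitors);
                    }
                }
            }
        });
        ActorCounter { tx }
    }

    /// One round trip per visitor: send a message, block until the reply.
    fn add_visitor(&self) -> u32 {
        let (reply_tx, reply_rx) = mpsc::channel();
        self.tx.send(Msg::AddVisitor { reply: reply_tx }).expect("actor stopped");
        reply_rx.recv().expect("actor dropped the reply")
    }
}

/// Shared-memory style: any thread bumps the count with a single atomic op.
struct AtomicCounter {
    visitors: AtomicU32,
}

impl AtomicCounter {
    fn new() -> Self {
        AtomicCounter { visitors: AtomicU32::new(0) }
    }

    /// fetch_add returns the previous value, so add 1 to get the new count.
    fn add_visitor(&self) -> u32 {
        self.visitors.fetch_add(1, Ordering::Relaxed) + 1
    }
}

fn main() {
    let actor = ActorCounter::spawn();
    for _ in 0..1_000 {
        actor.add_visitor();
    }

    // Four threads share one atomic counter, 1,000 visitors each.
    let atomic = Arc::new(AtomicCounter::new());
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let c = Arc::clone(&atomic);
            thread::spawn(move || {
                for _ in 0..1_000 {
                    c.add_visitor();
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }

    println!("actor: {}", actor.add_visitor()); // 1001
    println!("atomic: {}", atomic.visitors.load(Ordering::Relaxed)); // 4000
}
```

Both versions count correctly; the difference is cost per call. The actor path pays for two channel operations and a thread hand-off per visitor, while the atomic path is a single CPU instruction sequence, which is the kind of saving the flamegraphs above reflect.
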
With this fix in place:

### Connection pooled server-to-server communications, single requests (no batching)

Peak throughput observed was 4816 requests/second, about 900
requests/second more than the corresponding baseline (3904
requests/second).

#### Total number of requests vs time

data:image/s3,"s3://crabby-images/96c8a/96c8a8dca312c534629035f788dee71083817728" alt="number of requests"

#### Response times (ms) vs time

data:image/s3,"s3://crabby-images/0ddb7/0ddb7196e6fd4cc7785e138245b64723f2960656" alt="response times (ms)"

#### Number of concurrent users vs time

data:image/s3,"s3://crabby-images/9f2c5/9f2c51a432c82dd8416b745323875038acf2c7b7" alt="number of concurrent users"

### Connection pooled server-to-server communications, batched requests

<details>

<summary>

Each network request contained 1,000 application requests; counted in
application requests, peak throughput observed was 95,700
requests/second. This is six times higher than the corresponding
baseline (15,800 requests/second).

Please click to see charts.

</summary>

#### Total number of requests vs time

data:image/s3,"s3://crabby-images/4c536/4c536bb11a6c84b2b70d6439006cbfc9ec3d3298" alt="number of requests"

#### Response times (ms) vs time

data:image/s3,"s3://crabby-images/fb4eb/fb4eb1d0bafa735b39b63d9b8324391ce5ec678b" alt="response times (ms)"

#### Number of concurrent users vs time

data:image/s3,"s3://crabby-images/07e0e/07e0e34dafc01b597e83dc7a0911f6102fb88f85" alt="number of concurrent users"

</details>

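The headline ratios quoted in this report can be re-derived from the peak figures listed above. The small program below does only that arithmetic; the numbers are copied from this report, and `ratio` is a helper introduced here purely for illustration.

```rust
/// Ratio of one measurement to another.
fn ratio(new: f64, old: f64) -> f64 {
    new / old
}

fn main() {
    // AddVisitor share of total CPU time, before and after the fix (percent).
    let cpu_reduction = ratio(59.0, 0.4);
    // Peak throughput with connection pooling, in application requests/second.
    let single_gain = 4816.0 - 3904.0; // tech stack v2 vs baseline, no batching
    let batched_gain = ratio(95_700.0, 15_800.0); // tech stack v2 vs baseline, batched

    println!("AddVisitor CPU time reduced ~{cpu_reduction:.1}x"); // ~147.5x
    println!("single requests: +{single_gain} requests/second"); // +912
    println!("batched requests: {batched_gain:.2}x baseline"); // 6.06x
}
```
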