diff --git a/bench/results/README.md b/bench/results/README.md
new file mode 100644
index 0000000..9118e4a
--- /dev/null
+++ b/bench/results/README.md
@@ -0,0 +1,212 @@
# Benchmark Report

Benchmarks were run at various stages of development to keep track of
performance. Tech stacks were changed and the implementation optimized
to increase throughput. This report summarizes the findings of those
benchmarks.

Ultimately, we identified a bottleneck that had previously gone unnoticed
in mCaptcha (unnoticed because other bottlenecks, such as DB access,
eclipsed it :p) [and increased the performance of the critical path by
~147 times](https://git.batsense.net/mCaptcha/dcache/pulls/3) through a
trivial optimization.

## Environment

These benchmarks were run on a noisy development laptop and should be
used for guidance only.

- CPU: AMD Ryzen 5 5600U with Radeon Graphics (12) @ 4.289GHz
- Memory: 22849MiB
- OS: Arch Linux x86_64
- Kernel: 6.6.7-arch1-1
- rustc: 1.73.0 (cc66ad468 2023-10-03)

## Baseline: Tech stack version 1

Actix Web for networking and JSON as the message format. This stack was
chosen for prototyping and was later used to set the baseline.

## Without connection pooling in server-to-server communications

### Single requests (no batching)
Peak throughput observed was 1,117 requests/second.

#### Total number of requests vs time

![number of requests](./v1/nopooling/nopipelining/total_requests_per_second_1703969194.png)

#### Response times (ms) vs time

![response times (ms)](<./v1/nopooling/nopipelining/response_times_(ms)_1703969194.png>)

#### Number of concurrent users vs time

![number of concurrent users](./v1/nopooling/nopipelining/number_of_users_1703969194.png)
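For context, "no connection pooling" means every server-to-server call paid for a
fresh connection. The v1 client code is not shown in this report, so the following
is only a rough sketch of that pattern using `reqwest`; the function name, URL, and
payload are made up for illustration.

```rust
use reqwest::Client;

/// Hypothetical server-to-server forward *without* pooling: a new client
/// (and therefore a new TCP connection) is created for every call.
async fn forward_without_pooling(peer_url: &str) -> Result<(), reqwest::Error> {
    let client = Client::new(); // fresh client per request: no connection reuse
    client
        .post(peer_url)
        .json(&serde_json::json!({ "key": "demo_captcha", "visitors": 1 }))
        .send()
        .await?
        .error_for_status()?;
    Ok(())
}
```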
### Batched requests
Each network request carried 1,000 application requests; peak throughput
observed was 1,800 requests/second.

#### Total number of requests vs time

![number of requests](./v1/nopooling/pipelining/total_requests_per_second_1703969381.png)

#### Response times (ms) vs time

![response times (ms)](<./v1/nopooling/pipelining/response_times_(ms)_1703969381.png>)

#### Number of concurrent users vs time

![number of concurrent users](./v1/nopooling/pipelining/number_of_users_1703969381.png)
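The batching wire format is not spelled out in this report; as a rough sketch only,
with hypothetical serde types rather than dcache's actual schema, batching means one
JSON network payload carrying a vector of application requests.

```rust
use serde::{Deserialize, Serialize};

/// Hypothetical wire type: one application-level counter update.
#[derive(Clone, Serialize, Deserialize)]
struct CounterRequest {
    key: String,
    visitors: u32,
}

/// Hypothetical batch envelope: one network request carries many
/// application requests (1,000 per request in these benchmarks).
#[derive(Serialize, Deserialize)]
struct BatchRequest {
    requests: Vec<CounterRequest>,
}

/// Pack pending application requests into batches of at most `batch_size`.
fn into_batches(pending: Vec<CounterRequest>, batch_size: usize) -> Vec<BatchRequest> {
    pending
        .chunks(batch_size)
        .map(|chunk| BatchRequest {
            requests: chunk.to_vec(),
        })
        .collect()
}
```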
## With connection pooling in server-to-server communications

### Single requests (no batching)
Peak throughput observed was 3,904 requests/second.

#### Total number of requests vs time

![number of requests](./v1/pooling/nopipelining/total_requests_per_second_1703968214.png)

#### Response times (ms) vs time

![response times (ms)](<./v1/pooling/nopipelining/response_times_(ms)_1703968215.png>)

#### Number of concurrent users vs time

![number of concurrent users](./v1/pooling/nopipelining/number_of_users_1703968215.png)
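Again as a sketch rather than dcache's actual code: with pooling, one client is
built once and reused for every server-to-server call. `reqwest::Client` keeps an
internal connection pool, so reusing it means established connections are reused
instead of being re-opened. The type and field names below are made up.

```rust
use reqwest::Client;

/// Hypothetical pooled peer handle: the `Client` is created once and shared,
/// so repeated calls to the same peer reuse already-established connections.
struct PeerClient {
    http: Client,
    peer_url: String,
}

impl PeerClient {
    fn new(peer_url: String) -> Self {
        Self {
            http: Client::new(),
            peer_url,
        }
    }

    /// Forward one counter update to the peer over a pooled connection.
    async fn forward(&self, key: &str, visitors: u32) -> Result<(), reqwest::Error> {
        self.http
            .post(&self.peer_url)
            .json(&serde_json::json!({ "key": key, "visitors": visitors }))
            .send()
            .await?
            .error_for_status()?;
        Ok(())
    }
}
```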
### Batched requests
Each network request carried 1,000 application requests; peak throughput
observed was 15,800 requests/second.

#### Total number of requests vs time

![number of requests](./v1/pooling/pipelining/total_requests_per_second_1703968582.png)

#### Response times (ms) vs time

![response times (ms)](<./v1/pooling/pipelining/response_times_(ms)_1703968582.png>)

#### Number of concurrent users vs time

![number of concurrent users](./v1/pooling/pipelining/number_of_users_1703968582.png)
## Tech stack version 2

Tonic for the network stack and gRPC as the wire format. We ran over a
dozen benchmarks with this tech stack. The trend was similar to the one
observed above: throughput was higher when a connection pool was used and
higher still when requests were batched. _But_ throughput in all of these
benchmarks was lower than in the baseline benchmarks!

The CPU was busier. We profiled the server with
[flamegraph](https://github.com/flamegraph-rs/flamegraph) while running
the same test suite to identify compute-heavy areas. The result was
unexpected:

![flamegraph indicating libmcaptcha being slow](./v2/libmcaptcha-bottleneck/problem/flamegraph.svg)

libmCaptcha's [AddVisitor
handler](https://github.com/mCaptcha/libmcaptcha/blob/e3f456f35b2c9e55e0475b01b3e05d48b21fd51f/src/master/embedded/counter.rs#L124)
was taking up 59% of the CPU time of the entire test run. It is a
critical part of the variable-difficulty-factor PoW algorithm that
mCaptcha uses. We had never run into this bottleneck before because, in
other cache implementations, it was always preceded by a database
request. It surfaced here because dcache uses in-memory data sources.

libmCaptcha uses an actor-based approach with message passing for clean
concurrent state management. Message passing works well in many cases,
but in ours, sharing memory through the CPU's concurrency primitives
turned out to be significantly faster:

![flamegraph after the libmcaptcha fix](./v2/libmcaptcha-bottleneck/solution/flamegraph.svg)

The handler's share of CPU time dropped from 59% to 0.4%, roughly a 147x
reduction!

With this fix in place:

### Connection pooled server-to-server communications, single requests (no batching)

Peak throughput observed was 4,816 requests/second, roughly 1,000
requests/second more than the baseline.

#### Total number of requests vs time

![number of requests](./v2/grpc-conn-pool-post-bottleneck/single/total_requests_per_second_1703970940.png)

#### Response times (ms) vs time

![response times (ms)](<./v2/grpc-conn-pool-post-bottleneck/single/response_times_(ms)_1703970940.png>)

#### Number of concurrent users vs time

![number of concurrent users](./v2/grpc-conn-pool-post-bottleneck/single/number_of_users_1703970940.png)

### Connection pooled server-to-server communications, batched requests

Each network request carried 1,000 application requests; peak throughput
observed was 95,700 requests/second, six times higher than the baseline.
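That gain traces back to the counter fix described above. As a minimal sketch of
the idea (replacing an actor mailbox with state shared behind `std::sync`
primitives), with all type and function names hypothetical rather than
libmCaptcha's real API:

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, RwLock};

/// Hypothetical shared visitor-counter state: captcha IDs mapped to atomic
/// counters behind a read-write lock, instead of an actor mailbox.
#[derive(Clone, Default)]
struct VisitorCounters {
    inner: Arc<RwLock<HashMap<String, AtomicU64>>>,
}

impl VisitorCounters {
    /// Record a visitor and return the updated count for `captcha_id`.
    fn add_visitor(&self, captcha_id: &str) -> u64 {
        // Fast path: the captcha is already tracked, so a shared read lock
        // plus an atomic increment is enough. No message passing, no queueing.
        {
            let map = self.inner.read().unwrap();
            if let Some(counter) = map.get(captcha_id) {
                return counter.fetch_add(1, Ordering::Relaxed) + 1;
            }
        }
        // Slow path: first sighting of this captcha, take the write lock.
        let mut map = self.inner.write().unwrap();
        map.entry(captcha_id.to_string())
            .or_insert_with(|| AtomicU64::new(0))
            .fetch_add(1, Ordering::Relaxed)
            + 1
    }
}
```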
#### Total number of requests vs time

![number of requests](./v2/grpc-conn-pool-post-bottleneck/pipeline/total_requests_per_second_1703971082.png)

#### Response times (ms) vs time

![response times (ms)](<./v2/grpc-conn-pool-post-bottleneck/pipeline/response_times_(ms)_1703971082.png>)

#### Number of concurrent users vs time

![number of concurrent users](./v2/grpc-conn-pool-post-bottleneck/pipeline/number_of_users_1703971082.png)

diff --git a/bench/results/v1/nopooling/nopipelining/number_of_users_1703969194.png b/bench/results/v1/nopooling/nopipelining/number_of_users_1703969194.png new file mode 100644 index 0000000..7119f16 Binary files /dev/null and b/bench/results/v1/nopooling/nopipelining/number_of_users_1703969194.png differ diff --git a/bench/results/v1/nopooling/nopipelining/response_times_(ms)_1703969194.png b/bench/results/v1/nopooling/nopipelining/response_times_(ms)_1703969194.png new file mode 100644 index 0000000..aac7b92 Binary files /dev/null and b/bench/results/v1/nopooling/nopipelining/response_times_(ms)_1703969194.png differ diff --git a/bench/results/v1/nopooling/nopipelining/total_requests_per_second_1703969194.png b/bench/results/v1/nopooling/nopipelining/total_requests_per_second_1703969194.png new file mode 100644 index 0000000..cbd8efe Binary files /dev/null and b/bench/results/v1/nopooling/nopipelining/total_requests_per_second_1703969194.png differ diff --git a/bench/results/v1/nopooling/pipelining/number_of_users_1703969381.png b/bench/results/v1/nopooling/pipelining/number_of_users_1703969381.png new file mode 100644 index 0000000..19de878 Binary files /dev/null and b/bench/results/v1/nopooling/pipelining/number_of_users_1703969381.png differ diff --git a/bench/results/v1/nopooling/pipelining/response_times_(ms)_1703969381.png b/bench/results/v1/nopooling/pipelining/response_times_(ms)_1703969381.png new file mode 100644 index 0000000..cebf22c Binary files /dev/null and b/bench/results/v1/nopooling/pipelining/response_times_(ms)_1703969381.png differ diff --git a/bench/results/v1/nopooling/pipelining/total_requests_per_second_1703969381.png b/bench/results/v1/nopooling/pipelining/total_requests_per_second_1703969381.png new file mode 100644 index 0000000..688d77d Binary files /dev/null and b/bench/results/v1/nopooling/pipelining/total_requests_per_second_1703969381.png differ diff --git a/bench/results/v1/pooling/nopipelining/number_of_users_1703968215.png b/bench/results/v1/pooling/nopipelining/number_of_users_1703968215.png new file mode 100644 index 0000000..2164d43 Binary files /dev/null and b/bench/results/v1/pooling/nopipelining/number_of_users_1703968215.png differ diff --git a/bench/results/v1/pooling/nopipelining/response_times_(ms)_1703968215.png b/bench/results/v1/pooling/nopipelining/response_times_(ms)_1703968215.png new file mode 100644 index 0000000..2d0ed1e Binary files /dev/null and b/bench/results/v1/pooling/nopipelining/response_times_(ms)_1703968215.png differ diff --git a/bench/results/v1/pooling/nopipelining/total_requests_per_second_1703968214.png b/bench/results/v1/pooling/nopipelining/total_requests_per_second_1703968214.png new file mode 100644 index 0000000..042ab65 Binary files /dev/null and b/bench/results/v1/pooling/nopipelining/total_requests_per_second_1703968214.png differ diff --git a/bench/results/v1/pooling/pipelining/number_of_users_1703968582.png b/bench/results/v1/pooling/pipelining/number_of_users_1703968582.png new file mode 100644 index 0000000..28543ca Binary files /dev/null and
b/bench/results/v1/pooling/pipelining/number_of_users_1703968582.png differ diff --git a/bench/results/v1/pooling/pipelining/response_times_(ms)_1703968582.png b/bench/results/v1/pooling/pipelining/response_times_(ms)_1703968582.png new file mode 100644 index 0000000..a08221d Binary files /dev/null and b/bench/results/v1/pooling/pipelining/response_times_(ms)_1703968582.png differ diff --git a/bench/results/v1/pooling/pipelining/total_requests_per_second_1703968582.png b/bench/results/v1/pooling/pipelining/total_requests_per_second_1703968582.png new file mode 100644 index 0000000..2c6dfeb Binary files /dev/null and b/bench/results/v1/pooling/pipelining/total_requests_per_second_1703968582.png differ diff --git a/bench/results/v2/grpc-conn-pool-post-bottleneck/pipeline/number_of_users_1703971082.png b/bench/results/v2/grpc-conn-pool-post-bottleneck/pipeline/number_of_users_1703971082.png new file mode 100644 index 0000000..c253b7d Binary files /dev/null and b/bench/results/v2/grpc-conn-pool-post-bottleneck/pipeline/number_of_users_1703971082.png differ diff --git a/bench/results/v2/grpc-conn-pool-post-bottleneck/pipeline/response_times_(ms)_1703971082.png b/bench/results/v2/grpc-conn-pool-post-bottleneck/pipeline/response_times_(ms)_1703971082.png new file mode 100644 index 0000000..3b32e88 Binary files /dev/null and b/bench/results/v2/grpc-conn-pool-post-bottleneck/pipeline/response_times_(ms)_1703971082.png differ diff --git a/bench/results/v2/grpc-conn-pool-post-bottleneck/pipeline/total_requests_per_second_1703971082.png b/bench/results/v2/grpc-conn-pool-post-bottleneck/pipeline/total_requests_per_second_1703971082.png new file mode 100644 index 0000000..100fcdd Binary files /dev/null and b/bench/results/v2/grpc-conn-pool-post-bottleneck/pipeline/total_requests_per_second_1703971082.png differ diff --git a/bench/results/v2/grpc-conn-pool-post-bottleneck/single/number_of_users_1703970940.png b/bench/results/v2/grpc-conn-pool-post-bottleneck/single/number_of_users_1703970940.png new file mode 100644 index 0000000..132bdae Binary files /dev/null and b/bench/results/v2/grpc-conn-pool-post-bottleneck/single/number_of_users_1703970940.png differ diff --git a/bench/results/v2/grpc-conn-pool-post-bottleneck/single/response_times_(ms)_1703970940.png b/bench/results/v2/grpc-conn-pool-post-bottleneck/single/response_times_(ms)_1703970940.png new file mode 100644 index 0000000..d299f0a Binary files /dev/null and b/bench/results/v2/grpc-conn-pool-post-bottleneck/single/response_times_(ms)_1703970940.png differ diff --git a/bench/results/v2/grpc-conn-pool-post-bottleneck/single/total_requests_per_second_1703970940.png b/bench/results/v2/grpc-conn-pool-post-bottleneck/single/total_requests_per_second_1703970940.png new file mode 100644 index 0000000..2787484 Binary files /dev/null and b/bench/results/v2/grpc-conn-pool-post-bottleneck/single/total_requests_per_second_1703970940.png differ