Periodically publish results from mCaptcha/survey. #15

Merged
realaravinth merged 5 commits from publish-results into master 2023-03-12 21:15:42 +05:30
Owner

Periodically publish benchmark results

closes mCaptcha/2023-NLnet#4

Motivation

Proof-of-Work (PoW) difficulty is configurable in mCaptcha. It is important that webmasters choose difficulty factors that are effective at containing DoS attacks while still allowing the majority of legitimate visitors to generate proofs.

mCaptcha/survey runs benchmark campaigns and aggregates performance statistics from its visitors. This data can guide webmasters in choosing effective PoW difficulty factors, provided it is publicly available and downloadable.

Also, mCaptcha/mCaptcha will soon be able to pull benchmark results from mCaptcha/survey instances to make effective and user-friendly Proof-of-Work difficulty recommendations.

So, this patch publishes campaign results periodically.

Description

Results are published periodically. The publication period is configured by the mCaptcha/survey admin.

Results are published at http://{instance-hostname}/download in the following structure:

└── <campaign ID>
    ├── <publication timestamp>
    │   ├── benchmark.csv
    │   └── campaign.json
    └── <publication timestamp>
        ├── benchmark.csv
        └── campaign.json
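
As a rough illustration of how a consumer might navigate this layout, the sketch below walks a local copy of the published tree and picks the most recent publication for each campaign. It assumes the publication directories are named with UNIX timestamps (as the commit message in this PR describes); the function name `latest_publications` and the idea of a local mirror are hypothetical and not part of mCaptcha/survey.

```python
from pathlib import Path
from typing import Dict


def latest_publications(archive_root: Path) -> Dict[str, Path]:
    """Map each campaign ID to its most recent publication directory."""
    latest: Dict[str, Path] = {}
    for campaign_dir in archive_root.iterdir():
        if not campaign_dir.is_dir():
            continue
        # Assumption: publication directories are named after UNIX
        # timestamps, so anything non-numeric is skipped.
        stamped = [
            d for d in campaign_dir.iterdir()
            if d.is_dir() and d.name.isdigit()
        ]
        if stamped:
            # Compare numerically rather than lexically.
            latest[campaign_dir.name] = max(stamped, key=lambda d: int(d.name))
    return latest
```

Each returned path would then be expected to contain the two files shown in the tree above.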

Data formats

Campaign information

Each campaign runs benchmarks with a different set of PoW configurations. Therefore, the benchmarks produced by each campaign are unique and closely tied to that campaign's PoW configuration.

The campaign metadata is published in campaign.json:

{
	"id":"4e951e01-71ee-4a18-9b97-782965495ae3", ## campaign ID
	"name":"lab.b.1", ## campaign name
	"difficulties":[11,12], ## campaign difficulties
	"created_at":1674650743 ## campaign creation timestamp
}
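
A minimal sketch of reading this metadata, assuming the published file is plain JSON without the `##` annotations shown above (those appear to be explanatory notes rather than file content). The `Campaign` dataclass and `parse_campaign` helper are hypothetical names chosen for illustration.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Campaign:
    id: str
    name: str
    difficulties: List[int]
    created_at: int  # campaign creation timestamp


def parse_campaign(text: str) -> Campaign:
    """Parse the contents of a campaign.json file."""
    raw = json.loads(text)
    return Campaign(
        id=raw["id"],
        name=raw["name"],
        difficulties=[int(d) for d in raw["difficulties"]],
        created_at=int(raw["created_at"]),
    )


sample = (
    '{"id":"4e951e01-71ee-4a18-9b97-782965495ae3","name":"lab.b.1",'
    '"difficulties":[11,12],"created_at":1674650743}'
)
campaign = parse_campaign(sample)
# Difficulty columns to look for in the matching benchmark.csv.
columns = [f"Difficulty {d}" for d in campaign.difficulties]
```

Deriving the column names from `difficulties` is one way to tie a `benchmark.csv` back to the campaign configuration that produced it.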

Benchmark results

Each campaign participant gets an ephemeral user ID so that if they run the campaign multiple times, their benchmarks can be aggregated.

The campaign result includes the following data:

  1. Benchmark ID
  2. User identifier
  3. Device identifier as provided by the user (e.g. pinephone pro)
  4. User agent
  5. Number of threads on the device, identified through client-side JavaScript code
  6. Submission timestamp
  7. Type of benchmark suite executed: wasm/js
  8. Execution time in milliseconds for each difficulty level. Each difficulty factor gets its own column in the CSV file; for example, difficulty factor 5000 is reported in a column named Difficulty 5000, which contains its execution times.

Example:

CSV converted to markdown for convenience:

| ID | user | device_user_provided | device_software_recognised | threads | submitted_at | submission_type | Difficulty 11 | Difficulty 12 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 27 | 17b31d3d-9b22-4005-8309-c1a5bf444476 | lab.b.1 | Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/109.0 | 12 | 1675244542 | wasm | 0 | 0 |
| 28 | 17b31d3d-9b22-4005-8309-c1a5bf444476 | pativu | Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/109.0 | 12 | 1675244542 | wasm | - | - |

Raw CSV data:

ID,user,device_user_provided,device_software_recognised,threads,submitted_at,submission_type,Difficulty 11,Difficulty 12
27,17b31d3d-9b22-4005-8309-c1a5bf444476,lab.b.1,Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/109.0,12,1675244542,wasm,0,0
28,17b31d3d-9b22-4005-8309-c1a5bf444476,pativu,Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/109.0,12,1675244542,wasm,-,-
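
The following sketch shows one way a consumer could load this CSV and collect the per-difficulty timings. It rests on two assumptions not confirmed by this PR: that a `-` cell means no timing was recorded for that difficulty, and that fields containing commas (some user agents do) are quoted by the exporter. `parse_benchmarks` is a hypothetical helper name.

```python
import csv
import io
from typing import Dict, List, Optional

DIFFICULTY_PREFIX = "Difficulty "


def parse_benchmarks(text: str) -> List[Dict[str, object]]:
    """Parse benchmark.csv contents into one dict per submission."""
    rows: List[Dict[str, object]] = []
    for record in csv.DictReader(io.StringIO(text)):
        timings: Dict[int, Optional[float]] = {}
        for column, value in record.items():
            if column.startswith(DIFFICULTY_PREFIX):
                factor = int(column[len(DIFFICULTY_PREFIX):])
                # Assumption: "-" marks a missing measurement.
                timings[factor] = None if value == "-" else float(value)
        rows.append({
            "id": int(record["ID"]),
            "user": record["user"],
            "device": record["device_user_provided"],
            "threads": int(record["threads"]),
            "submission_type": record["submission_type"],
            "timings_ms": timings,
        })
    return rows


UA = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/109.0"
sample = "\n".join([
    "ID,user,device_user_provided,device_software_recognised,threads,"
    "submitted_at,submission_type,Difficulty 11,Difficulty 12",
    f"27,17b31d3d-9b22-4005-8309-c1a5bf444476,lab.b.1,{UA},12,1675244542,wasm,0,0",
    f"28,17b31d3d-9b22-4005-8309-c1a5bf444476,pativu,{UA},12,1675244542,wasm,-,-",
])
rows = parse_benchmarks(sample)
```

Because both sample rows share the same user identifier, grouping `rows` by `user` would aggregate repeated runs from one participant.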

realaravinth added 5 commits 2023-03-12 20:32:06 +05:30
DESCRIPTION
    FORMATS
	- Campaign configuration is stored in JSON format
	- Benchmark data is stored in CSV format

    DIRECTORY STRUCTURE
	Each campaign gets a separate directory. A campaign can have
	multiple archives. Archives are stored in directories whose names
	would be the same as the UNIX timestamp of when they were
	recorded.

	EXAMPLE
	    The example below shows three campaigns with one archive
	    each. Each archive is stored in a directory denoting the
	    moment in which the archive was generated. Each archive
	    includes campaign configuration and benchmark.

	    ```bash
		14:53 atm@lab archive → tree
		.
		├── 4e951e01-71ee-4a18-9b97-782965495ae3
		│   └── 1675329764
		│       ├── benchmark.csv
		│       └── challenge.json
		├── 9d16df08-dffc-484e-bbe2-10c00b431c7e
		│   └── 1675329764
		│       ├── benchmark.csv
		│       └── challenge.json
		└── fa9f7c24-afec-4505-adb9-8e0c3ce54d37
		    └── 1675329764
			├── benchmark.csv
			└── challenge.json

		7 directories, 6 files
		```
feat: publish benchmark data periodically (configurable)
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
ci/woodpecker/pr/woodpecker Pipeline failed
a3f2c3632e
realaravinth self-assigned this 2023-03-12 20:37:19 +05:30
realaravinth changed title from WIP: Periodically publish results from mCaptcha/survey. to Periodically publish results from mCaptcha/survey. 2023-03-12 21:15:36 +05:30
realaravinth merged commit 6657dba05b into master 2023-03-12 21:15:42 +05:30