mCaptcha/survey

Periodically publish results from mCaptcha/survey. #15

Merged

realaravinth merged 5 commits from publish-results into master

2023-03-12 21:15:42 +05:30

realaravinth commented

2023-03-12 20:32:05 +05:30

Owner

Periodically publish benchmark results

closes mCaptcha/2023-NLnet#4

Motivation

Proof-of-Work (PoW) difficulty is configurable in mCaptcha. Webmasters need to choose difficulty factors that are effective at containing DoS attacks while still allowing the majority of legitimate visitors to generate proofs.

mCaptcha/survey runs benchmark campaigns and aggregates performance statistics from its visitors. This data can guide webmasters in choosing effective PoW difficulty factors, provided it is publicly available and downloadable.

Also, mCaptcha/mCaptcha will soon be able to pull benchmark results from mCaptcha/survey instances to make effective and user-friendly Proof-of-Work difficulty recommendations.

So, this patch publishes campaign results periodically.

Description

Results are published periodically, at an interval configured by the mCaptcha/survey admin.

Results are published at http://{instance-hostname}/download in the following structure:

└── <campaign ID>
    ├── <publication timestamp>
    │   ├── benchmark.csv
    │   └── campaign.json
    └── <publication timestamp>
        ├── benchmark.csv
        └── campaign.json
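As a rough illustration of the layout above, the sketch below builds the URLs of the two files belonging to one publication. The helper name `archive_urls`, the hostname `survey.example.org`, and the assumption that the campaign ID and publication timestamp appear as plain path segments under `/download` are all illustrative and inferred from the tree, not a documented API.

```python
from urllib.parse import quote


def archive_urls(instance_hostname: str, campaign_id: str, published_at: int) -> dict:
    """Return the assumed URLs of the two files in one publication.

    Hypothetical helper: the path layout mirrors the directory tree in the
    description and has not been checked against a running instance.
    """
    base = f"http://{instance_hostname}/download/{quote(campaign_id)}/{published_at}"
    return {
        "benchmark": f"{base}/benchmark.csv",
        "campaign": f"{base}/campaign.json",
    }


# Example values: the hostname is made up, the ID and timestamp are sample data.
urls = archive_urls(
    "survey.example.org",
    "4e951e01-71ee-4a18-9b97-782965495ae3",
    1675329764,
)
print(urls["benchmark"])
print(urls["campaign"])
```

A client that wants the most recent data would still need some way to discover which publication timestamps exist for a campaign; the description does not say how, so that step is left out here.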

Data formats

Campaign information

Each campaign runs benchmarks with a different set of PoW configurations. The benchmarks produced by each campaign are therefore unique and closely tied to that campaign's PoW configuration.

The campaign metadata is published in campaign.json:

{
	"id":"4e951e01-71ee-4a18-9b97-782965495ae3",
	"name":"lab.b.1",
	"difficulties":[11,12],
	"created_at":1674650743
}

Here id is the campaign ID, name is the campaign name, difficulties lists the campaign's difficulty factors, and created_at is the campaign creation time as a UNIX timestamp.
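A consumer could load this metadata along the following lines. This is a minimal sketch, assuming the published file is plain JSON without inline annotations and that `created_at` is a UNIX timestamp in seconds; the `Campaign` class and `parse_campaign` function are hypothetical names, not part of mCaptcha/survey.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Campaign:
    """Hypothetical in-memory form of campaign.json."""
    id: str
    name: str
    difficulties: list
    created_at: datetime


def parse_campaign(text: str) -> Campaign:
    """Parse the contents of campaign.json into a Campaign."""
    raw = json.loads(text)
    return Campaign(
        id=raw["id"],
        name=raw["name"],
        difficulties=[int(d) for d in raw["difficulties"]],
        # Assumption: seconds since the UNIX epoch, interpreted as UTC.
        created_at=datetime.fromtimestamp(raw["created_at"], tz=timezone.utc),
    )


sample = (
    '{"id":"4e951e01-71ee-4a18-9b97-782965495ae3","name":"lab.b.1",'
    '"difficulties":[11,12],"created_at":1674650743}'
)
campaign = parse_campaign(sample)
print(campaign.name, campaign.difficulties, campaign.created_at.isoformat())
```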

Benchmark results

Each campaign participant gets an ephemeral user ID so that if they run the campaign multiple times, their benchmarks can be aggregated.

The campaign result includes the following data:

Benchmark ID
User identifier
Device identifier as provided by the user (e.g. pinephone pro)
User agent
Number of threads on the device, detected by client-side JavaScript code
Submission timestamp
Type of benchmark suite executed: wasm/js
Execution time in milliseconds for each difficulty factor. Every factor gets its own column in the CSV file: difficulty factor 5000, for example, appears as the column Difficulty 5000, which contains its execution times.
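Because the difficulty columns depend on the campaign, the CSV header can be predicted from the campaign's `difficulties` list. The sketch below shows one way to do that; the fixed column names are copied from the CSV example in this description, while `FIXED_COLUMNS` and `expected_header` are hypothetical names introduced for illustration.

```python
# Fixed columns, in the order used by the CSV example in this description.
FIXED_COLUMNS = [
    "ID",
    "user",
    "device_user_provided",
    "device_software_recognised",
    "threads",
    "submitted_at",
    "submission_type",
]


def expected_header(difficulties):
    """Return the header a campaign's benchmark.csv is assumed to have."""
    return FIXED_COLUMNS + [f"Difficulty {d}" for d in difficulties]


header = expected_header([11, 12])
print(",".join(header))
```

Comparing this against the first line of a downloaded `benchmark.csv` is a cheap sanity check that the file matches the `campaign.json` published next to it.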

Example:

CSV converted to markdown for convenience:

| ID | user | device_user_provided | device_software_recognised | threads | submitted_at | submission_type | Difficulty 11 | Difficulty 12 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 27 | 17b31d3d-9b22-4005-8309-c1a5bf444476 | lab.b.1 | Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/109.0 | 12 | 1675244542 | wasm | 0 | 0 |
| 28 | 17b31d3d-9b22-4005-8309-c1a5bf444476 | pativu | Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/109.0 | 12 | 1675244542 | wasm | - | - |

Raw CSV data:

ID,user,device_user_provided,device_software_recognised,threads,submitted_at,submission_type,Difficulty 11,Difficulty 12
27,17b31d3d-9b22-4005-8309-c1a5bf444476,lab.b.1,Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/109.0,12,1675244542,wasm,0,0
28,17b31d3d-9b22-4005-8309-c1a5bf444476,pativu,Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/109.0,12,1675244542,wasm,-,-
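The ephemeral user ID makes it possible to group repeated runs by the same participant. The following is a minimal sketch of that aggregation over the raw CSV above. It assumes that `-` means no measurement was recorded and that execution times are integer milliseconds; neither is stated explicitly in this description, and `load_times` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict

RAW = """\
ID,user,device_user_provided,device_software_recognised,threads,submitted_at,submission_type,Difficulty 11,Difficulty 12
27,17b31d3d-9b22-4005-8309-c1a5bf444476,lab.b.1,Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/109.0,12,1675244542,wasm,0,0
28,17b31d3d-9b22-4005-8309-c1a5bf444476,pativu,Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/109.0,12,1675244542,wasm,-,-
"""


def load_times(text):
    """Map each user ID to {difficulty factor: [execution times in ms]}."""
    per_user = defaultdict(lambda: defaultdict(list))
    for row in csv.DictReader(io.StringIO(text)):
        for column, value in row.items():
            if not column.startswith("Difficulty "):
                continue  # not a per-difficulty timing column
            if value == "-":
                continue  # assumption: "-" marks a missing measurement
            difficulty = int(column.split(" ", 1)[1])
            per_user[row["user"]][difficulty].append(int(value))
    return per_user


times = load_times(RAW)
for user, by_difficulty in times.items():
    for difficulty, samples in sorted(by_difficulty.items()):
        print(user, difficulty, samples)
```

Using `csv.DictReader` rather than splitting on commas keeps the parser correct if a user agent or device name is ever quoted because it contains a comma.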

realaravinth added 5 commits

2023-03-12 20:32:06 +05:30

feat: add archive base_path in settings a44f6f1748

feat: archive campaign and benchmark data 604fca0a62

DESCRIPTION
    FORMATS
	- Campaign configuration is stored in JSON format
	- Benchmark data is stored in CSV format

    DIRECTORY STRUCTURE
	Each campaign gets a separate directory. A campaign can have
	multiple archives. Archives are stored in directories whose names
	would be the same as the UNIX timestamp of when they were
	recorded.

	EXAMPLE
	    The example below shows three campaigns with one archive
	    each. Each archive is stored in a directory denoting the
	    moment in which the archive was generated. Each archive
	    includes campaign configuration and benchmark.

	    ```bash
		14:53 atm@lab archive → tree
		.
		├── 4e951e01-71ee-4a18-9b97-782965495ae3
		│   └── 1675329764
		│       ├── benchmark.csv
		│       └── challenge.json
		├── 9d16df08-dffc-484e-bbe2-10c00b431c7e
		│   └── 1675329764
		│       ├── benchmark.csv
		│       └── challenge.json
		└── fa9f7c24-afec-4505-adb9-8e0c3ce54d37
		    └── 1675329764
			├── benchmark.csv
			└── challenge.json

		7 directories, 6 files
		```

feat: cleanup archiver and include tests d2c52cc62c

feat: read publication dir config and serve it 9411c2ba9f

feat: publish benchmark data periodically (configurable) a3f2c3632e

ci/woodpecker/push/woodpecker Pipeline failed

ci/woodpecker/pr/woodpecker Pipeline failed

realaravinth self-assigned this

2023-03-12 20:37:19 +05:30

realaravinth changed title from ~~WIP: Periodically publish results from mCaptcha/survey.~~ to Periodically publish results from mCaptcha/survey.

2023-03-12 21:15:36 +05:30

realaravinth merged commit 6657dba05b into master

2023-03-12 21:15:42 +05:30

realaravinth referenced this pull request from a commit

2023-03-12 21:15:43 +05:30

Merge pull request 'Periodically publish results from mCaptcha/survey.' (#15) from publish-results into master

realaravinth referenced this pull request from mCaptcha/2023-NLnet

2023-03-14 20:26:26 +05:30

Periodically publish results from mCaptcha/survey #4