feat: document craweled data format and dir structure
This commit is contained in:
parent
8f51205d64
commit
664f03d6eb
1 changed files with 80 additions and 0 deletions
80
docs/published-crwaled-data.md
Normal file
80
docs/published-crwaled-data.md
Normal file
|
@ -0,0 +1,80 @@
|
||||||
|
# Published Crawled Data
|
||||||
|
|
||||||
|
Starchart publishes all crawled data. This document explains the
|
||||||
|
format(s) and the directory structure of the published data.
|
||||||
|
|
||||||
|
## Directory Structure
|
||||||
|
|
||||||
|
```bash
|
||||||
|
(lab)➜ starchart tree data
|
||||||
|
data
|
||||||
|
└── git.batsense.net
|
||||||
|
├── instance.yml
|
||||||
|
└── realaravinth
|
||||||
|
├── analysis-of-captcha-systems
|
||||||
|
│ └── publiccode.yml
|
||||||
|
└── user.yml
|
||||||
|
```
|
||||||
|
|
||||||
|
> Snippet of data crawled from git.batsense.net
|
||||||
|
|
||||||
|
## Forge
|
||||||
|
|
||||||
|
Each forge instance gets its own directory in the repository root path
|
||||||
|
specified in the [configuration](../config/default.toml). All data
|
||||||
|
crawled from an instance will be stored in the instance's directory
|
||||||
|
only.
|
||||||
|
|
||||||
|
Each forge instance directory contains an `instance.yml` file that
|
||||||
|
describes the instance. The schema of `instance.yml` might change as
|
||||||
|
starchart is currently under development.
|
||||||
|
|
||||||
|
```yml
|
||||||
|
---
|
||||||
|
hostname: git.batsense.net
|
||||||
|
forge_type: gitea
|
||||||
|
```
|
||||||
|
|
||||||
|
> example instance.yml
|
||||||
|
|
||||||
|
## User
|
||||||
|
|
||||||
|
A forge instance's user gets their own subdirectory in starchart and an
|
||||||
|
`user.yml` to describe them. Information on all their repositories will be stored under
|
||||||
|
this subdirectory.
|
||||||
|
|
||||||
|
Like `instance.yml`, `user.yml` schema is not finalized too.
|
||||||
|
|
||||||
|
```yml
|
||||||
|
---
|
||||||
|
hostname: git.batsense.net
|
||||||
|
username: realaravinth
|
||||||
|
html_link: "https://git.batsense.net/realaravinth"
|
||||||
|
profile_photo: "https://git.batsense.net/avatars/bc11e95d9356ac4bdc035964be00ff0d"
|
||||||
|
```
|
||||||
|
|
||||||
|
> example user.yml
|
||||||
|
|
||||||
|
## Repository
|
||||||
|
|
||||||
|
Repository information is stored under the owner's subdirectory.
|
||||||
|
Currently, partial support for
|
||||||
|
[publiccodeyml](https://yml.publiccode.tools/) is implemented. So all
|
||||||
|
repository information is stored in `publiccode.yml` under the
|
||||||
|
repository subdirectory.
|
||||||
|
|
||||||
|
```yml
|
||||||
|
---
|
||||||
|
publiccodeYmlVersion: "0.2"
|
||||||
|
name: git.batsense.net
|
||||||
|
url: "https://git.batsense.net/realaravinth/git.batsense.net"
|
||||||
|
description:
|
||||||
|
en:
|
||||||
|
shortDescription: "Instance administration logs and discussions pertaining to this Gitea instance. Have a question about git.batsense.net? Please create an issue on this repository! :)"
|
||||||
|
```
|
||||||
|
|
||||||
|
> example publiccode.yml implemented by starchart
|
||||||
|
|
||||||
|
See
|
||||||
|
[forgeflux-org/starchart#3](https://github.com/forgeflux-org/starchart/issues/3) and
|
||||||
|
[publiccodeyml/publiccodeyml/discussions](https://github.com/publiccodeyml/publiccode.yml/discussions/157) for more information.
|
Loading…
Reference in a new issue