
Published Crawled Data

Starchart publishes all crawled data. This document explains the format(s) and the directory structure of the published data.

Directory Structure

```
(lab)➜  starchart tree data
data
└── git.batsense.net
    ├── instance.yml
    └── realaravinth
        ├── analysis-of-captcha-systems
        │   └── publiccode.yml
        └── user.yml
```

Snippet of data crawled from git.batsense.net
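Here is a rough illustration of how a consumer might traverse this layout. The Python sketch below walks a local copy of the published data and yields one `(hostname, username, repository)` tuple per crawled repository. The function name `iter_repositories` is hypothetical. It assumes that forge directories can be recognized by their `instance.yml` file and repository directories by their `publiccode.yml` file, as in the tree above.

```python
from pathlib import Path
from typing import Iterator, Tuple


def iter_repositories(root: Path) -> Iterator[Tuple[str, str, str]]:
    """Yield (hostname, username, repository) for each crawled repository.

    Hypothetical helper assuming the layout
    <root>/<hostname>/<username>/<repository>/publiccode.yml
    """
    for instance_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Assumption: only directories holding an instance.yml are forges.
        if not (instance_dir / "instance.yml").is_file():
            continue
        for user_dir in sorted(p for p in instance_dir.iterdir() if p.is_dir()):
            # user.yml is a file, so only subdirectories are considered here.
            for repo_dir in sorted(p for p in user_dir.iterdir() if p.is_dir()):
                # Assumption: a repository directory contains publiccode.yml.
                if (repo_dir / "publiccode.yml").is_file():
                    yield instance_dir.name, user_dir.name, repo_dir.name
```

Run against the `data` directory shown above, it would yield a single tuple: `("git.batsense.net", "realaravinth", "analysis-of-captcha-systems")`.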

Forge

Each forge instance gets its own directory in the repository root path specified in the configuration. All data crawled from an instance will be stored in the instance's directory only.

Each forge instance directory contains an instance.yml file that describes the instance. The instance.yml schema may change, as starchart is still under development.

```yaml
---
hostname: git.batsense.net
forge_type: gitea
```

Example instance.yml
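Because instance.yml is currently a flat mapping of scalar fields, a consumer can read it with very little code. The sketch below is a deliberately minimal, hypothetical `load_flat_yaml` parser. It handles only top-level `key: value` lines, the optional `---` marker, full-line comments, and simple double-quoted values. The schema may change, so a full YAML library such as PyYAML is the safer choice for real use.

```python
from typing import Dict


def load_flat_yaml(text: str) -> Dict[str, str]:
    """Parse a flat `key: value` YAML document such as instance.yml.

    Simplified sketch: no nesting, lists, or escape sequences are supported.
    """
    fields: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, full-line comments and the document-start marker.
        if not line or line.startswith("#") or line == "---":
            continue
        # Split on the first colon only, so URLs in values stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a key: value line: {raw!r}")
        value = value.strip()
        # Drop surrounding double quotes, e.g. "https://...".
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]
        fields[key.strip()] = value
    return fields
```

Applied to the example above, it returns `{"hostname": "git.batsense.net", "forge_type": "gitea"}`.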

User

Each user of a forge instance gets their own subdirectory inside the instance's directory, along with a user.yml file that describes them. Information on all of their repositories is stored under this subdirectory.

Like the instance.yml schema, the user.yml schema is not yet finalized.

```yaml
---
hostname: git.batsense.net
username: realaravinth
html_link: "https://git.batsense.net/realaravinth"
profile_photo: "https://git.batsense.net/avatars/bc11e95d9356ac4bdc035964be00ff0d"
```

Example user.yml
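The sketch below models the example user.yml as a Python dataclass and renders it back to YAML. The `User` class and its `path` and `to_yaml` methods are hypothetical. The field names come from the example above, and because the schema is not finalized, they may not match what starchart actually writes. Every value is emitted as a double-quoted scalar. This differs cosmetically from the example but should parse to the same data.

```python
import json
from dataclasses import dataclass, fields
from pathlib import Path


@dataclass
class User:
    """Hypothetical model of user.yml; field names follow the example above."""

    hostname: str
    username: str
    html_link: str
    profile_photo: str

    def path(self, root: Path) -> Path:
        # user.yml lives in the user's subdirectory inside the instance directory.
        return root / self.hostname / self.username / "user.yml"

    def to_yaml(self) -> str:
        lines = ["---"]
        for field in fields(self):
            # json.dumps returns a double-quoted, escaped string; for plain
            # strings like these, YAML reads it as a double-quoted scalar.
            value = json.dumps(getattr(self, field.name))
            lines.append(f"{field.name}: {value}")
        return "\n".join(lines) + "\n"
```

For the example user, `path(Path("data"))` points to `data/git.batsense.net/realaravinth/user.yml`, which matches the directory tree at the top of this document.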

Repository

Repository information is stored under the owner's subdirectory, in a subdirectory named after the repository. Starchart currently implements only partial support for publiccode.yml, so all repository information is stored in a publiccode.yml file in that repository subdirectory.

```yaml
---
publiccodeYmlVersion: "0.2"
name: git.batsense.net
url: "https://git.batsense.net/realaravinth/git.batsense.net"
description:
    en:
        shortDescription: "Instance administration logs and discussions pertaining to this Gitea instance. Have a question about git.batsense.net? Please create an issue on this repository! :)"
```

Example publiccode.yml implemented by starchart
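The following sketch connects the repository layout to the file contents. It writes the publiccode.yml subset shown above to `<root>/<hostname>/<owner>/<repo>/publiccode.yml`. `write_publiccode` and its parameters are hypothetical. Based on the example, it assumes the `name` field holds the repository name. This is an illustration of the published format, not starchart's own implementation.

```python
import json
from pathlib import Path


def write_publiccode(root: Path, hostname: str, owner: str, repo: str,
                     url: str, short_description: str) -> Path:
    """Write the publiccode.yml subset shown above for one repository.

    Hypothetical helper; returns the path of the written file.
    """
    # Repository directory sits under the owner's subdirectory.
    repo_dir = root / hostname / owner / repo
    repo_dir.mkdir(parents=True, exist_ok=True)
    # json.dumps gives a double-quoted scalar that YAML accepts for plain strings.
    q = json.dumps
    text = (
        "---\n"
        'publiccodeYmlVersion: "0.2"\n'
        f"name: {q(repo)}\n"  # assumption: name is the repository name
        f"url: {q(url)}\n"
        "description:\n"
        "    en:\n"
        f"        shortDescription: {q(short_description)}\n"
    )
    path = repo_dir / "publiccode.yml"
    path.write_text(text, encoding="utf-8")
    return path
```

Nesting `shortDescription` under a language key (`en`) mirrors the example. This allows descriptions in other languages to be added alongside it.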

See forgeflux-org/starchart#3 and publiccodeyml/publiccodeyml/discussions for more information.