forgejo-federation/modules
Peter Gardfjäll e28cc79c92
Improve sync performance for pull-mirrors (#19125)
This addresses https://github.com/go-gitea/gitea/issues/18352

It aims to improve performance (and resource use) of the `SyncReleasesWithTags` operation for pull-mirrors.

For large repositories with many tags, `SyncReleasesWithTags` can be a costly operation (taking several minutes to complete). The reason is two-fold:
    
1. on sync, every upstream repo tag is compared (for changes) against existing local entries in the release table to ensure that they are up-to-date.
    
2. the procedure for getting _each tag_ involves a series of git operations:
   ```bash
   git show-ref --tags -- v8.2.4477
   git cat-file -t 29ab6ce9f36660cffaad3c8789e71162e5db5d2f
   git cat-file -p 29ab6ce9f36660cffaad3c8789e71162e5db5d2f
   git rev-list --count 29ab6ce9f36660cffaad3c8789e71162e5db5d2f
   ```

   of which the `git rev-list --count` can be particularly heavy.
    
This PR optimizes performance for pull-mirrors. Because a pull-mirror is always identical to its upstream, we can rebuild the entire release table on every sync, and we use a single batch `git for-each-ref .. refs/tags` call to retrieve all tags in one go.
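
To illustrate the batch approach, here is a minimal Go sketch of fetching every tag with one `git for-each-ref` process instead of several git calls per tag. It is not the code from this PR: the helper names `tagListArgs` and `listTags` and the chosen field list are assumptions made for the example. It relies only on `for-each-ref`'s documented `%(fieldname)` placeholders and `%00` hex escape.

```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
	"strings"
)

// tagListArgs builds the arguments for a single batch for-each-ref call.
// The field list is illustrative, not the set used by the PR.
func tagListArgs(fields []string) []string {
	parts := make([]string, len(fields))
	for i, f := range fields {
		parts[i] = "%(" + f + ")"
	}
	// %00 is for-each-ref's hex escape for a NUL byte; it separates fields.
	return []string{"for-each-ref", "--format=" + strings.Join(parts, "%00"), "refs/tags"}
}

// listTags runs the command once in repoPath and returns one slice of
// field values per tag. Splitting on newlines is only safe because the
// fields requested here never contain a newline.
func listTags(repoPath string, fields []string) ([][]string, error) {
	cmd := exec.Command("git", tagListArgs(fields)...)
	cmd.Dir = repoPath
	out, err := cmd.Output()
	if err != nil {
		return nil, fmt.Errorf("git for-each-ref: %w", err)
	}
	var tags [][]string
	for _, line := range bytes.Split(bytes.TrimRight(out, "\n"), []byte("\n")) {
		if len(line) == 0 {
			continue // repository without tags
		}
		tags = append(tags, strings.Split(string(line), "\x00"))
	}
	return tags, nil
}

func main() {
	tags, err := listTags(".", []string{"refname:short", "objectname", "objecttype"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%d tags\n", len(tags))
}
```

The cost of this call is one process spawn regardless of the number of tags, which is where the savings over the per-tag sequence come from.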
    
For large mirror repos, with hundreds of annotated tags, this brings down the duration of the sync operation from several minutes to a few seconds. A few unscientific examples run on my local machine:

- https://github.com/spring-projects/spring-boot (223 tags)
  - before: `0m28,673s`
  - after: `0m2,244s`
- https://github.com/kubernetes/kubernetes (890 tags)
  - before: `8m00s`
  - after: `0m8,520s`
- https://github.com/vim/vim (13954 tags)
  - before: `14m20,383s`
  - after: `0m35,467s`

 

I added a `foreachref` package, which provides a flexible way of specifying which reference fields are of interest (see `git-for-each-ref(1)`) and of producing a parser for the expected output. These could be reused in other places where `for-each-ref` is used. I'll add unit tests for those if the overall PR looks promising.
2022-03-31 14:30:40 +02:00
activitypub format with gofumpt (#18184) 2022-01-20 18:46:10 +01:00
analyze Use git attributes to determine generated and vendored status for language stats and diffs (#16773) 2021-09-09 21:13:36 +01:00
appstate format with gofumpt (#18184) 2022-01-20 18:46:10 +01:00
auth RSS/Atom support for Repos (#19055) 2022-03-13 17:40:47 +01:00
avatar format with gofumpt (#18184) 2022-01-20 18:46:10 +01:00
base format with gofumpt (#18184) 2022-01-20 18:46:10 +01:00
cache format with gofumpt (#18184) 2022-01-20 18:46:10 +01:00
charset Don't treat BOM escape sequence as hidden character. (#18909) 2022-02-26 16:48:23 +00:00
container Move reaction to models/issues/ (#19264) 2022-03-31 17:20:39 +08:00
context Add Package Registry (#16510) 2022-03-30 16:42:47 +08:00
convert Add Package Registry (#16510) 2022-03-30 16:42:47 +08:00
csv format with gofumpt (#18184) 2022-01-20 18:46:10 +01:00
doctor Make git.OpenRepository accept Context (#19260) 2022-03-30 03:13:41 +08:00
emoji format with gofumpt (#18184) 2022-01-20 18:46:10 +01:00
eventsource Simplify parameter types (#18006) 2021-12-20 04:41:31 +00:00
generate Use base32 for 2FA scratch token (#18384) 2022-01-26 12:10:10 +08:00
git Improve sync performance for pull-mirrors (#19125) 2022-03-31 14:30:40 +02:00
gitgraph Make git.OpenRepository accept Context (#19260) 2022-03-30 03:13:41 +08:00
graceful Add pprof labels in processes and for lifecycles (#19202) 2022-03-25 12:47:12 +00:00
hcaptcha hCaptcha Support (#12594) 2020-10-02 23:37:53 -04:00
highlight format with gofumpt (#18184) 2022-01-20 18:46:10 +01:00
hostmatcher remove not needed (#19128) 2022-03-18 20:17:57 +01:00
httpcache format with gofumpt (#18184) 2022-01-20 18:46:10 +01:00
httplib refactor httplib (#18338) 2022-01-19 19:31:39 -05:00
indexer Make git.OpenRepository accept Context (#19260) 2022-03-30 03:13:41 +08:00
json Make gitea, gitea-vet future-proof (#18361) 2022-01-22 21:59:34 +00:00
lfs Update HTTP status codes to modern codes (#18063) 2022-03-23 12:54:07 +08:00
log Add auto logging of goroutine pid label (#19212) 2022-03-26 20:04:36 +00:00
markup Make git.OpenRepository accept Context (#19260) 2022-03-30 03:13:41 +08:00
metrics format with gofumpt (#18184) 2022-01-20 18:46:10 +01:00
migration Store the foreign ID of issues during migration (#18446) 2022-03-17 18:08:35 +01:00
nosql Add Redis Sentinel Authentication Support (#19213) 2022-03-30 21:12:02 +02:00
notification Add Package Registry (#16510) 2022-03-30 16:42:47 +08:00
options format with gofumpt (#18184) 2022-01-20 18:46:10 +01:00
packages Add Package Registry (#16510) 2022-03-30 16:42:47 +08:00
password Fixed assert statements. (#16089) 2021-06-07 07:27:09 +02:00
pprof refactor: move from io/ioutil to io and os package (#17109) 2021-09-22 13:38:34 +08:00
private Update HTTP status codes to modern codes (#18063) 2022-03-23 12:54:07 +08:00
process Refactor git.Command.Run*, introduce RunWithContextString and RunWithContextBytes (#19266) 2022-03-31 13:56:22 +02:00
proxy Return nil proxy function if proxy not enabled (#16742) 2021-08-19 16:41:20 -04:00
public Fix mime-type detection for HTTP server (#18370) 2022-01-23 20:19:49 +08:00
queue Add number in queue status to monitor page (#18712) 2022-02-12 13:31:26 +08:00
recaptcha refactor: move from io/ioutil to io and os package (#17109) 2021-09-22 13:38:34 +08:00
references format with gofumpt (#18184) 2022-01-20 18:46:10 +01:00
repository Improve sync performance for pull-mirrors (#19125) 2022-03-31 14:30:40 +02:00
secret Use CryptoRandomBytes instead of CryptoRandomString (#18439) 2022-02-04 18:03:15 +01:00
session format with gofumpt (#18184) 2022-01-20 18:46:10 +01:00
setting Add Package Registry (#16510) 2022-03-30 16:42:47 +08:00
ssh Update golang.org/x/crypto (#19097) 2022-03-16 02:59:53 +01:00
storage Add Package Registry (#16510) 2022-03-30 16:42:47 +08:00
structs Add Package Registry (#16510) 2022-03-30 16:42:47 +08:00
svg refactor: move from io/ioutil to io and os package (#17109) 2021-09-22 13:38:34 +08:00
sync Fix missing unlock in uniquequeue (#9790) 2020-01-15 23:58:33 +02:00
templates Add Package Registry (#16510) 2022-03-30 16:42:47 +08:00
test Make git.OpenRepository accept Context (#19260) 2022-03-30 03:13:41 +08:00
timeutil format with gofumpt (#18184) 2022-01-20 18:46:10 +01:00
translation Refactor i18n, use Locale to provide i18n/translation related functions (#18648) 2022-02-08 11:02:30 +08:00
typesniffer format with gofumpt (#18184) 2022-01-20 18:46:10 +01:00
updatechecker format with gofumpt (#18184) 2022-01-20 18:46:10 +01:00
upload Simplify parameter types (#18006) 2021-12-20 04:41:31 +00:00
uri Prevent NPE if gitea uploader fails to open url (#18080) 2021-12-23 16:27:33 +00:00
user Add gitea-vet (#10948) 2020-04-05 07:20:50 +01:00
util Use a more general (and faster) method to sanitize URLs with credentials (#19239) 2022-03-31 10:25:40 +08:00
validation format with gofumpt (#18184) 2022-01-20 18:46:10 +01:00
web Update HTTP status codes to modern codes (#18063) 2022-03-23 12:54:07 +08:00