This addresses https://github.com/go-gitea/gitea/issues/18352
It aims to improve performance (and resource use) of the `SyncReleasesWithTags` operation for pull-mirrors.
For large repositories with many tags, `SyncReleasesWithTags` can be a costly operation (taking several minutes to complete). The reason is two-fold:
1. on sync, every upstream repo tag is compared (for changes) against existing local entries in the release table to ensure that they are up-to-date.
2. the procedure for getting _each tag_ involves a series of git operations
```bash
git show-ref --tags -- v8.2.4477
git cat-file -t 29ab6ce9f36660cffaad3c8789e71162e5db5d2f
git cat-file -p 29ab6ce9f36660cffaad3c8789e71162e5db5d2f
git rev-list --count 29ab6ce9f36660cffaad3c8789e71162e5db5d2f
```
of which the `git rev-list --count` can be particularly heavy.
This PR optimizes performance for pull-mirrors. We utilize the fact that a pull-mirror is always identical to its upstream and rebuild the entire release table on every sync and use a batch `git for-each-ref .. refs/tags` call to retrieve all tags in one go.
For large mirror repos, with hundreds of annotated tags, this brings down the duration of the sync operation from several minutes to a few seconds. A few unscientific examples run on my local machine:
- https://github.com/spring-projects/spring-boot (223 tags)
- before: `0m28,673s`
- after: `0m2,244s`
- https://github.com/kubernetes/kubernetes (890 tags)
- before: `8m00s`
- after: `0m8,520s`
- https://github.com/vim/vim (13954 tags)
- before: `14m20,383s`
- after: `0m35,467s`
I added a `foreachref` package which contains a flexible way of specifying which reference fields are of interest (`git-for-each-ref(1)`) and to produce a parser for the expected output. These could be reused in other places where `for-each-ref` is used. I'll add unit tests for those if the overall PR looks promising.
This follows
* https://github.com/go-gitea/gitea/issues/18553
Introduce `RunWithContextString` and `RunWithContextBytes` to help the refactoring. Add related unit tests. They keep the same behavior to save stderr into err.Error() as `RunInXxx` before.
Remove `RunInDirTimeoutPipeline` `RunInDirTimeoutFullPipeline` `RunInDirTimeout` `RunInDirTimeoutEnv` `RunInDirPipeline` `RunInDirFullPipeline` `RunTimeout`, `RunInDirTimeoutEnvPipeline`, `RunInDirTimeoutEnvFullPipeline`, `RunInDirTimeoutEnvFullPipelineFunc`.
Then remaining `RunInDir` `RunInDirBytes` `RunInDirWithEnv` can be easily refactored in next PR with a simple search & replace:
* before: `stdout, err := RunInDir(path)`
* next: `stdout, _, err := RunWithContextString(&git.RunContext{Dir:path})`
Other changes:
1. When `timeout <= 0`, use default. Because `timeout==0` is meaningless and could cause bugs. And now many functions becomes more simple, eg: `GitGcRepos` 9 lines to 1 line. `Fsck` 6 lines to 1 line.
2. Only set defaultCommandExecutionTimeout when the option `setting.Git.Timeout.Default > 0`
Strangely #19038 appears to relate to an issue whereby a tag appears to
be listed in `git show-ref --tags` but then does not appear when `git
show-ref --tags -- short_name` is called.
As a solution though I propose to stop the second call as it is
unnecessary and only likely to cause problems.
I've also noticed that the tags calls are wildly inefficient and aren't using the common cat-files - so these have been added.
I've also noticed that the git commit-graph is not being written on mirroring - so I've also added writing this to the migration which should improve mirror rendering somewhat.
Fix #19038
Signed-off-by: Andrew Thornton <art27@cantab.net>
Co-authored-by: 6543 <6543@obermui.de>
The git command by default adds a number of global arguments. These are not
helpful to be displayed in the process manager and so should be skipped for
default process descriptions.
Signed-off-by: Andrew Thornton <art27@cantab.net>
Make SKIP_TLS_VERIFY apply to git data migrations too through adding the `-c http.sslVerify=false` option to the git clone command.
Fix #18998
Signed-off-by: Andrew Thornton <art27@cantab.net>
It appears possible that there could be a hang due to unread data from the
repo-attribute command pipes. This PR simply closes these during the defer.
Signed-off-by: Andrew Thornton <art27@cantab.net>
This code adds a simple endpoint to apply patches to repositories and
branches on gitea. This is then used along with the conflicting checking
code in #18004 to provide a basic implementation of cherry-pick revert.
Now because the buttons necessary for cherry-pick and revert have
required us to create a dropdown next to the Browse Source button
I've also implemented Create Branch and Create Tag operations.
Fix #3880
Fix #17986
Signed-off-by: Andrew Thornton <art27@cantab.net>
* Ensure git tag tests and other create test repos in tmpdir
There are a few places where tests appear to reuse testing repos which
causes random CI failures.
This PR simply changes these tests to ensure that cloning always happens
into new temporary directories.
Fix #18444
* Change log root for integration tests to use the REPO_TEST_DIR
There is a potential race in the drone integration tests whereby test-mysql etc
will start writing to log files causing make test-check fail.
Fix #18077
Signed-off-by: Andrew Thornton <art27@cantab.net>
- Pass the Global command args into serviceRPC.
- Fixes error with partial cloning.
- Add partial clone test
- Include diff
Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
This PR continues the work in #17125 by progressively ensuring that git
commands run within the request context.
This now means that the if there is a git repo already open in the context it will be used instead of reopening it.
Signed-off-by: Andrew Thornton <art27@cantab.net>
Git will and can pack references into packfiles and therefore if you write/read the
files directly you will get false results. Instead you should use update-ref and
show-ref. To that end I have created three new functions in git/repo_commit.go that
will do this correctly.
Related #17191
Signed-off-by: Andrew Thornton <art27@cantab.net>
There are repeated panics in tests due to TestRepository_GetTag failing
to run properly. This happens when we attempt to reset the internal
repo for a tag which has failed to load. The problem is - the panic that
this is causing is preventing us from finding what the real error is.
This PR simply moves the failure out so we have a chance to see what
really is failing.
Signed-off-by: Andrew Thornton <art27@cantab.net>
Co-authored-by: 6543 <6543@obermui.de>
Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
* Prevent off-by-one error on comments on newly appended lines
There was a bug in CutDiffAroundLine whereby if a file without a terminal new line
has a patch which appends lines to it and a comment is placed on one of those lines
the comment diff will be a line out of place.
This fixes CutDiffAroundLine to simply ignore the missing terminal newline - however,
we should really improve this rendering to add a marker to say that there was a
previously missing terminal newline.
Fix #17875
Signed-off-by: Andrew Thornton <art27@cantab.net>
The current TestPatch conflict code uses a plain git apply which does not properly
account for 3-way merging. However, we can improve things using `git read-tree -m` to
do a three-way merge then follow the algorithm used in merge-one-file. We can also use
`--patience` and/or `--histogram` to generate a nicer diff for applying patches too.
Fix #13679
Fix #6417
Signed-off-by: Andrew Thornton <art27@cantab.net>
This PR contains multiple fixes. The most important of which is:
* Prevent hang in git cat-file if the repository is not a valid repository
Unfortunately it appears that if git cat-file is run in an invalid
repository it will hang until stdin is closed. This will result in
deadlocked /pulls pages and dangling git cat-file calls if a broken
repository is tried to be reviewed or pulls exists for a broken
repository.
Fix #14734
Fix #9271
Fix #16113
Otherwise there are a few small other fixes included which this PR was initially intending to fix:
* Fix panic on partial compares due to missing PullRequestWorkInProgressPrefixes
* Fix links on pulls pages due to regression from #17551 - by making most /issues routes match /pulls too - Fix #17983
* Fix links on feeds pages due to another regression from #17551 but also fix issue with syncing tags - Fix #17943
* Add missing locale entries for oauth group claims
* Prevent NPEs if ColorFormat is called on nil users, repos or teams.
The current implementation of checkBranchName is highly inefficient
involving opening the repository, the listing all of the branch names
checking them individually before then using using opened repo to get
the tags.
This PR avoids this by simply walking the references from show-ref
instead of opening the repository (in the nogogit case).
Signed-off-by: Andrew Thornton <art27@cantab.net>
This PR registers requests with the process manager and manages hierarchy within the processes.
Git repos are then associated with a context, (usually the request's context) - with sub commands using this context as their base context.
Signed-off-by: Andrew Thornton <art27@cantab.net>
Use check attribute code to check the assigned language of a file and send that in to
chroma as a hint for the language of the file.
Signed-off-by: Andrew Thornton <art27@cantab.net>
There are multiple places where Gitea does not properly escape URLs that it is building and there are multiple places where it builds urls when there is already a simpler function available to use this.
This is an extensive PR attempting to fix these issues.
1. The first commit in this PR looks through all href, src and links in the Gitea codebase and has attempted to catch all the places where there is potentially incomplete escaping.
2. Whilst doing this we will prefer to use functions that create URLs over recreating them by hand.
3. All uses of strings should be directly escaped - even if they are not currently expected to contain escaping characters. The main benefit to doing this will be that we can consider relaxing the constraints on user names and reponames in future.
4. The next commit looks at escaping in the wiki and re-considers the urls that are used there. Using the improved escaping here wiki files containing '/'. (This implementation will currently still place all of the wiki files the root directory of the repo but this would not be difficult to change.)
5. The title generation in feeds is now properly escaped.
6. EscapePound is no longer needed - urls should be PathEscaped / QueryEscaped as necessary but then re-escaped with Escape when creating html with locales Signed-off-by: Andrew Thornton <art27@cantab.net>
Signed-off-by: Andrew Thornton <art27@cantab.net>
There is a slight race in checking of a context deadline exceed in #16467
which leads to a 500 on the repository page.
The solution is to check the error coming back from `*LogNameStatusRepoParser.Next()`
and if it is the `ContextDeadlineExceeded` break from the loop.
Fix #17314
Signed-off-by: Andrew Thornton <art27@cantab.net>
core.protectNTFS protects NTFS from files which may be difficult to remove or interact
with using the win32 api, however, it also appears to prevent such files from
being entered into the git indexes - fundamentally causing breakages with PRs that
affect these files. However, deliberately setting this to false may cause security
issues due to the remain sparse checkout of files in the merge pipeline.
The only sensible option therefore is to provide an optional setting which admins
could set which would forcibly switch this off if they are affected by this issue.
Fix #17092
Signed-off-by: Andrew Thornton <art27@cantab.net>
- Update default branch if needed
- Update protected branch if needed
- Update all not merged pull request base branch name
- Rename git branch
- Record this rename work and auto redirect for old branch on ui
Signed-off-by: a1012112796 <1012112796@qq.com>
Co-authored-by: delvh <dev.lh@web.de>
One of the biggest reasons for slow repository browsing is that we wait
until last commit information has been generated for all files in the
repository.
This PR proposes deferring this generation to a new POST endpoint that
does the look up outside of the main page request.
Signed-off-by: Andrew Thornton <art27@cantab.net>