As recognised in #21841 the rendering of plain text files is somewhat
incorrect when there are ambiguous characters as the html code is double
escaped. In fact there are several more problems here.
We have a residual isRenderedHTML which is actually simply escaping the
file - not rendering it. This is badly named and gives the wrong
impression.
There is also unusual behaviour whether the file is called a Readme or
not and there is no way to get to the source code if the file is called
README.
In reality what should happen is different depending on whether the file
is being rendered a README at the bottom of the directory view or not.
1. If it is rendered as a README on a directory - it should simply be
escaped and rendered as `<pre>` text.
2. If it is rendered as a file then it should be rendered as source
code.
This PR therefore does:
1. Rename IsRenderedHTML to IsPlainText
2. Readme files rendered at the bottom of the directory are rendered
without line numbers
3. Otherwise plain text files are rendered as source code.
Replace #21841
Signed-off-by: Andrew Thornton <art27@cantab.net>
Signed-off-by: Andrew Thornton <art27@cantab.net>
Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
Although there are per-locale fallbacks for ambiguity the locale names
for Chinese do not quite match our locales. This PR simply maps zh-CN on
to zh-hans and other zh variants on to zh-hant.
Ref #20999
Signed-off-by: Andrew Thornton <art27@cantab.net>
Change all license headers to comply with REUSE specification.
Fix #16132
Co-authored-by: flynnnnnnnnnn <flynnnnnnnnnn@github>
Co-authored-by: John Olheiser <john.olheiser@gmail.com>
The `go-licenses` make task introduced in #21034 is being run on make vendor
and occasionally causes an empty go-licenses file if the vendors need to
change. This should be moved to the generate task as it is a generated file.
Now because of this change we also need to split generation into two separate
steps:
1. `generate-backend`
2. `generate-frontend`
In the future it would probably be useful to make `generate-swagger` part of `generate-frontend` but it's not tolerated with our .drone.yml
Ref #21034
Signed-off-by: Andrew Thornton <art27@cantab.net>
Signed-off-by: Andrew Thornton <art27@cantab.net>
Co-authored-by: delvh <dev.lh@web.de>
The recovery, API, Web and package frameworks all create their own HTML
Renderers. This increases the memory requirements of Gitea
unnecessarily with duplicate templates being kept in memory.
Further the reloading framework in dev mode for these involves locking
and recompiling all of the templates on each load. This will potentially
hide concurrency issues and it is inefficient.
This PR stores the templates renderer in the context and stores this
context in the NormalRoutes, it then creates a fsnotify.Watcher
framework to watch files.
The watching framework is then extended to the mailer templates which
were previously not being reloaded in dev.
Then the locales are simplified to a similar structure.
Fix #20210
Fix #20211
Fix #20217
Signed-off-by: Andrew Thornton <art27@cantab.net>
This PR rewrites the invisible unicode detection algorithm to more
closely match that of the Monaco editor on the system. It provides a
technique for detecting ambiguous characters and relaxes the detection
of combining marks.
Control characters are in addition detected as invisible in this
implementation whereas they are not on monaco but this is related to
font issues.
Close #19913
Signed-off-by: Andrew Thornton <art27@cantab.net>
Our character detection algorithm can potentially incorrectly detect utf-8 as iso-8859-x
if there is a truncated character at the end of the partially read file.
This PR changes the detection algorithm to truncated utf8 characters at the end of the
buffer.
Fix #19743
Signed-off-by: Andrew Thornton <art27@cantab.net>
* Don't treat BOM escape sequence as hidden character.
- BOM sequence is a common non-harmfull escape sequence, it shouldn't be
shown as hidden character.
- Follows GitHub's behavior.
- Resolves #18837
Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
There is a potential panic due to a mistaken resetting of the length parameter when
multibyte characters go over a read boundary.
Signed-off-by: Andrew Thornton <art27@cantab.net>
Fix #17514
Given the comments I've adjusted this somewhat. The numbers of characters detected are increased and include things like the use of U+300 to make à instead of à and non-breaking spaces.
There is a button which can be used to escape the content to show it.
Signed-off-by: Andrew Thornton <art27@cantab.net>
Co-authored-by: Gwyneth Morgan <gwymor@tilde.club>
Co-authored-by: silverwind <me@silverwind.io>
Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
The io/ioutil package has been deprecated as of Go 1.16, see
https://golang.org/doc/go1.16#ioutil. This commit replaces the existing
io/ioutil functions with their new definitions in io and os packages.
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
Co-authored-by: techknowlogick <techknowlogick@gitea.io>
TestToUTF8WithFallback is the cause of recurrent spurious test failures
even despite code to set the detected charset order.
The reason why this happens is because the preferred detected charset order
is not being initialised for these tests.
This PR simply ensures that this is set at the start of each test and would
allow different tests to be written to allow differing orders.
Replaces #12571
Close #12571
Signed-off-by: Andrew Thornton <art27@cantab.net>
* Fix chardet test and add ordering option
Signed-off-by: Andrew Thornton <art27@cantab.net>
* minor fixes
Signed-off-by: Andrew Thornton <art27@cantab.net>
* remove log
Signed-off-by: Andrew Thornton <art27@cantab.net>
* remove log2
Signed-off-by: Andrew Thornton <art27@cantab.net>
* only iterate through top results
Signed-off-by: Andrew Thornton <art27@cantab.net>
* Update docs/content/doc/advanced/config-cheat-sheet.en-us.md
* slight restructure of for loop
Signed-off-by: Andrew Thornton <art27@cantab.net>
Co-authored-by: techknowlogick <techknowlogick@gitea.io>
* Convert files to utf-8 for indexing
* Move utf8 functions to modules/base
* Bump repoIndexerLatestVersion to 3
* Add tests for base/encoding.go
* Changes to pass gosimple
* Move UTF8 funcs into new modules/charset package