gitlab_git 10.6.4 relies on Rugged marking blobs as binary or not,
instead of relying on Linguist. Linguist in turn would mark text blobs
as binary whenever they would contain byte sequences that could not be
encoded using UTF-8.
However, marking such blobs as binary is not correct. If one pushes a
Markdown document with invalid character sequences it's still a text
based Markdown document and not some random binary blob.
This commit overwrites Blob#data so it automatically converts text-based
content to UTF-8 (the encoding we use everywhere else) while taking care
of replacing any invalid sequences with the UTF-8 replacement character.
The data of binary blobs is left as-is.
This ensures that SVGs greater than 2 megabytes are not scrubbed and
rendered. This in turn prevents requests from timing out due to
reading/scrubbing large SVGs potentially taking a lot of time (and
memory). The use of 2 megabytes is completely arbitrary.
Fixesgitlab-org/gitlab-ce#1435
Here was the problem:
1. When determining whether a given blob is viewable text, gitlab_git reads the first 1024 bytes and checks with Linguist whether it is a text or binary file.
2. If the blob is text, GitLab will attempt to display it.
3. However, if the text has binary characters after the first 1024 bytes, then GitLab will attempt to load the entire contents, but the encoding will be ASCII-8BIT since there are binary characters.
4. The Error 500 results when GitLab attempts to display a mix UTF-8 and ASCII-8BIT.
To fix this, we load as much data as we are willing to display so that the detection will work properly. Requires
an update to gitlab_git: gitlab-org/gitlab_git!86
Closes#13826
This allows us to take advantage of Rails' `to_partial_path` to render
the correct partial based on the Blob type, rather than cluttering the
view with conditionals.
It also allows (and will allow in the future) better encapsulation for
Blob-related logic which makes sense for our Rails app but might not
make as much sense for the core `gitlab_git` library, such as detecting
if the blob is an SVG.