
Stage: none
Group: unassigned
Info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers

Storing SHA1 Hashes As Binary

Storing SHA1 hashes as strings is not very space efficient. A SHA1 stored as a hexadecimal string requires at least 40 bytes, plus at least one byte of per-value overhead, and possibly more depending on the internals of PostgreSQL.

Storing a SHA1 as binary, on the other hand, requires only 20 bytes for the hash itself, plus 1 or 4 bytes of additional space (again depending on database internals). In the best case, this roughly halves the space used.
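As a rough illustration of where the savings come from, the following sketch uses Ruby's standard Digest library to compute a SHA1 and compares the byte size of its hexadecimal string form with its packed binary form. It measures only the raw payload, not PostgreSQL's per-value overhead, and the input string is arbitrary.

```ruby
require 'digest'

# Hexadecimal string form: 40 characters, one byte each.
hex_sha = Digest::SHA1.hexdigest('example content')

# Packed binary form: every two hex digits become one byte, so 20 bytes.
binary_sha = [hex_sha].pack('H*')

puts hex_sha.bytesize    # prints 40
puts binary_sha.bytesize # prints 20
```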

To make this easier to work with, include the ShaAttribute concern in a model and define a SHA attribute using the sha_attribute class method. For example:

```ruby
class Commit < ActiveRecord::Base
  include ShaAttribute

  sha_attribute :sha
end
```

You can then use the value of the sha attribute as if it were a string, while it is stored as binary. For example, you can do the following without converting data to the right binary format yourself:

```ruby
commit = Commit.find_by(sha: '88c60307bd1f215095834f09a1a5cb18701ac8ad')
commit.sha = '971604de4cfa324d91c41650fabc129420c8d1cc'
commit.save
```
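Conceptually, the conversion that ShaAttribute hides from the model is a hex-to-binary round trip. The following is a minimal sketch in plain Ruby, not GitLab's actual ShaAttribute implementation: the HypotheticalShaConverter module and its serialize and deserialize methods are illustrative names, assuming the attribute is exposed to application code as a 40-character hexadecimal string and stored in the column as 20 raw bytes.

```ruby
# Hypothetical helper illustrating the conversion; not GitLab's real code.
module HypotheticalShaConverter
  # A SHA1 in hex form is exactly 40 hexadecimal digits.
  HEX_SHA1 = /\A\h{40}\z/

  # Convert a 40-character hex SHA1 into the 20-byte binary form
  # that would be written to a binary column.
  def self.serialize(hex)
    raise ArgumentError, "not a hex SHA1: #{hex.inspect}" unless hex.match?(HEX_SHA1)

    [hex].pack('H*')
  end

  # Convert the 20-byte binary value read from the column back into
  # a lowercase hex string for application code.
  def self.deserialize(binary)
    binary.unpack1('H*')
  end
end

stored = HypotheticalShaConverter.serialize('88c60307bd1f215095834f09a1a5cb18701ac8ad')
puts stored.bytesize                             # prints 20
puts HypotheticalShaConverter.deserialize(stored) # prints 88c60307bd1f215095834f09a1a5cb18701ac8ad
```

Validating the input before packing matters because pack('H*') silently accepts strings of any length, so a malformed value would otherwise be stored without complaint.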

There is, however, one requirement: the column used to store the SHA hash must be a binary type. In Rails, this means using the :binary type instead of :text or :string.