gitlab-org--gitlab-foss/doc/development/secure_coding_guidelines.md


Secure Coding Guidelines

This document contains descriptions and guidelines for addressing security vulnerabilities commonly identified in the GitLab codebase. They are intended to help developers identify potential security vulnerabilities early, with the goal of reducing the number of vulnerabilities released over time.

Contributing

If you would like to contribute to one of the existing documents, or add guidelines for a new vulnerability type, please open an MR! Please try to include links to examples of the vulnerability found, and link to any resources used in defined mitigations. If you have questions, or when you are ready for a review, please ping gitlab-com/gl-security/appsec.

Permissions

Description

Application permissions are used to determine who can access what and what actions they can perform. For more information about the permission model at GitLab, please see the GitLab permissions guide or the EE docs on permissions.

Impact

Improper permission handling can have significant impacts on the security of an application. Some situations may reveal sensitive data or allow a malicious actor to perform harmful actions. The overall impact depends heavily on what resources can be accessed or modified improperly.

A common vulnerability that arises when permission checks are missing is an Insecure Direct Object Reference (IDOR).

When to Consider

Each time you implement a new feature/endpoint, whether it is at UI, API or GraphQL level.

Mitigations

Start by writing tests around permissions: unit and feature specs should both include tests based around permissions

  • Fine-grained, nitty-gritty specs for permissions are good: it is ok to be verbose here
    • Make assertions based on the actors and objects involved: can a user or group or XYZ perform this action on this object?
    • Consider defining them upfront with stakeholders, particularly for the edge cases
  • Do not forget abuse cases: write specs that make sure certain things can't happen
    • Many specs only verify that things do happen, and coverage percentage does not reflect permissions, because the same piece of code is exercised either way.
    • Make assertions that certain actors cannot perform actions
  • Naming convention to ease auditability: to be defined, for example, a subfolder containing those specific permission tests or a #permissions block

Be careful to also test visibility levels and not only project access rights.
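The points above can be illustrated with a minimal, plain-Ruby sketch. The `Project` and `Actor` structs and the `can_read_project?` method are hypothetical stand-ins, not GitLab's real policy classes. The point is only that the checks cover actors who must be denied and different visibility levels, not just the happy path.

```ruby
# Hypothetical, simplified model for illustration only.
Project = Struct.new(:visibility, :member_ids)
Actor   = Struct.new(:id, :admin)

# Returns true when the actor (nil means anonymous) may read the project.
def can_read_project?(actor, project)
  return true if project.visibility == :public
  return false if actor.nil?
  return true if actor.admin || project.visibility == :internal

  project.member_ids.include?(actor.id)
end

member   = Actor.new(1, false)
outsider = Actor.new(2, false)
private_project  = Project.new(:private, [1])
internal_project = Project.new(:internal, [1])

# Each entry pairs a description with whether the expectation held.
checks = {
  "member can read private project"          => can_read_project?(member, private_project) == true,
  "outsider cannot read private project"     => can_read_project?(outsider, private_project) == false,
  "anonymous cannot read private project"    => can_read_project?(nil, private_project) == false,
  "outsider can read internal project"       => can_read_project?(outsider, internal_project) == true,
  "anonymous cannot read internal project"   => can_read_project?(nil, internal_project) == false
}

failed = checks.reject { |_description, ok| ok }.keys
puts failed.empty? ? "all permission checks passed" : "failed: #{failed.join(', ')}"
```

In a real spec suite, each entry would be its own example, so that a failing abuse case is visible by name.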

Some examples of well-implemented access controls and tests:

  1. example1
  2. example2
  3. example3

NB: any input from development team is welcome, for example, about Rubocop rules.

Regular Expressions guidelines

Anchors / Multi line

Unlike in some other programming languages (for example, Perl or Python), regular expressions in Ruby match multi-line by default. Consider the following example in Python:

import re
text = "foo\nbar"
matches = re.findall("^bar$",text)
print(matches)

The Python example outputs an empty list ([]) because the matcher considers the whole string foo\nbar, including the newline (\n). Ruby's regular expression engine behaves differently:

text = "foo\nbar"
p text.match(/^bar$/)

The output of this example is #<MatchData "bar">, as Ruby treats the input text line by line. In order to match the whole string the Regex anchors \A and \z should be used.

Impact

This Ruby-specific behavior can have a security impact, because regular expressions are often used for validation or to impose restrictions on user input.

Examples

GitLab-specific examples can be found in the following path traversal and open redirect issues.

Another example would be this fictional Ruby on Rails controller:

class PingController < ApplicationController
  def ping
    if params[:ip] =~ /^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$/
      render :text => `ping -c 4 #{params[:ip]}`
    else
      render :text => "Invalid IP"
    end
  end
end

Here params[:ip] should contain nothing but digits and dots. However, this restriction is easily bypassed because the anchors ^ and $ match at line boundaries rather than at the start and end of the whole string. A value that consists of a valid IP address followed by a newline and a shell command passes the check, which ultimately leads to a shell command injection in ping -c 4 #{params[:ip]}.

Mitigation

In most cases the anchors \A for beginning of text and \z for end of text should be used instead of ^ and $.
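As a small illustration of the difference, the following sketch runs the pattern from the fictional controller with both kinds of anchors. The constant names and the payload are chosen for this example only.

```ruby
# Same pattern as the fictional controller, anchored two different ways.
LINE_ANCHORED   = /^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$/
STRING_ANCHORED = /\A\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\z/

# A valid-looking first line followed by an injected shell command.
payload = "127.0.0.1\ncat /etc/passwd"

# ^ and $ are satisfied as soon as any single line matches.
line_anchored_result = payload.match?(LINE_ANCHORED)

# \A and \z require the pattern to span the whole string.
string_anchored_result = payload.match?(STRING_ANCHORED)

# A plain IP-shaped string is still accepted.
plain_ip_result = "127.0.0.1".match?(STRING_ANCHORED)

# Lowercase \z also rejects a trailing newline, which uppercase \Z would allow.
trailing_newline_result = "127.0.0.1\n".match?(STRING_ANCHORED)

puts line_anchored_result    # true
puts string_anchored_result  # false
puts plain_ip_result         # true
puts trailing_newline_result # false
```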

Denial of Service (ReDoS) / Catastrophic Backtracking

When a regular expression (regex) is used to search for a string and can't find a match, it may then backtrack to try other possibilities.

For example when the regex .*!$ matches the string hello!, the .* first matches the entire string but then the ! from the regex is unable to match because the character has already been used. In that case, the Ruby regex engine backtracks one character to allow the ! to match.

ReDoS is an attack in which the attacker knows or controls the regular expression used. The attacker may be able to enter user input that triggers this backtracking behavior in a way that increases execution time by several orders of magnitude.

Impact

The resource, for example Puma or Sidekiq, can be made to hang while it evaluates the bad regex match. The evaluation can take long enough that the resource has to be terminated manually.

Examples

Here are some GitLab-specific examples.

User inputs used to create regular expressions:

Hardcoded regular expressions with backtracking issues:

Consider the following example application, which defines a check using a regular expression. A user entering user@aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!.com as the email on a form will hang the web server.

class Email < ApplicationRecord
  # The nested quantifiers make this pattern prone to catastrophic backtracking.
  DOMAIN_MATCH = Regexp.new('([a-zA-Z0-9]+)+\.com')

  validate :domain_matches

  private

  def domain_matches
    errors.add(:email, 'does not match') unless email =~ DOMAIN_MATCH
  end
end

Mitigation

Ruby

GitLab has Gitlab::UntrustedRegexp which internally uses the re2 library. re2 does not support backtracking, so execution time grows linearly with the size of the input, at the cost of a smaller subset of available regex features.

All user-provided regular expressions should use Gitlab::UntrustedRegexp.

For other regular expressions, here are a few guidelines:

  • If there's a clean non-regex solution, such as String#start_with?, consider using it
  • Ruby supports some advanced regex features like atomic groups and possessive quantifiers that eliminate backtracking
  • Avoid nested quantifiers if possible (for example (a+)+)
  • Try to be as precise as possible in your regex and avoid the . if there's an alternative
    • For example, use _[^_]+_ instead of _.*_ to match _text here_
  • Use reasonable ranges (for example, {1,10}) for repeating patterns instead of unbounded * and + matchers
  • When possible, perform simple input validation such as maximum string length checks before using regular expressions
  • If in doubt, don't hesitate to ping @gitlab-com/gl-security/appsec
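Several of these guidelines can be combined in one place. The following is a minimal sketch, not the fix applied to any real GitLab code: the length limit, the constant names, and the decision to anchor the pattern to the whole string are assumptions made for illustration.

```ruby
# Assumed upper bound for this sketch; choose a limit that fits the field.
MAX_DOMAIN_LENGTH = 254

# One bounded quantifier, no nested repetition, anchored to the whole string.
BOUNDED_DOMAIN_MATCH = /\A[a-zA-Z0-9]{1,63}\.com\z/

# Alternative: an atomic group (?>...) never gives back characters it has
# consumed, so the engine cannot retry different splits of the same run.
ATOMIC_DOMAIN_MATCH = /\A(?>[a-zA-Z0-9]+)\.com\z/

def valid_domain?(domain)
  return false unless domain.is_a?(String)
  # Cheap length check before any regex work is done.
  return false if domain.length > MAX_DOMAIN_LENGTH

  BOUNDED_DOMAIN_MATCH.match?(domain)
end

puts valid_domain?("example.com")           # true
puts valid_domain?("#{'a' * 40}!.com")      # false, and returns quickly
puts valid_domain?("a" * 5_000)             # false, rejected by the length check
puts ATOMIC_DOMAIN_MATCH.match?("aaaa!.com") # false
```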

Go

Go's regexp package uses re2 and isn't vulnerable to backtracking issues.

Server Side Request Forgery (SSRF)

Description

A Server-side Request Forgery (SSRF) is an attack in which an attacker is able to coerce an application into making an outbound request to an unintended resource. This resource is usually internal. In GitLab, the connection most commonly uses HTTP, but an SSRF can be performed with any protocol, such as Redis or SSH.

With an SSRF attack, the UI may or may not show the response. The latter is called a Blind SSRF. While the impact is reduced, it can still be useful for attackers, especially for mapping internal network services as part of recon.

Impact

The impact of an SSRF can vary, depending on what the application server can communicate with, how much the attacker can control of the payload, and if the response is returned back to the attacker. Examples of impact that have been reported to GitLab include:

  • Network mapping of internal services
    • This can help an attacker gather information about internal services that could be used in further attacks. More details.
  • Reading internal services, including cloud service metadata.
    • The latter can be a serious problem, because an attacker can obtain keys that allow control of the victim's cloud infrastructure. (This is also a good reason to give only necessary privileges to the token.) More details.
  • When combined with CRLF vulnerability, remote code execution. More details.

When to Consider

  • When the application makes any outbound connection

Mitigations

In order to mitigate SSRF vulnerabilities, it is necessary to validate the destination of the outgoing request, especially if it includes user-supplied information.

The preferred SSRF mitigations within GitLab are:

  1. Only connect to known, trusted domains/IP addresses.
  2. Use the Gitlab::HTTP library
  3. Implement feature-specific mitigations

GitLab HTTP Library

The Gitlab::HTTP wrapper library has grown to include mitigations for all of the GitLab-known SSRF vectors. It is also configured to respect the Outbound requests options that allow instance administrators to block all internal connections, or limit the networks to which connections can be made.

In some cases, it has been possible to configure Gitlab::HTTP as the HTTP connection library for 3rd-party gems. This is preferable over re-implementing the mitigations for a new feature.

URL blocker & validation libraries

Gitlab::UrlBlocker can be used to validate that a provided URL meets a set of constraints. Importantly, when dns_rebind_protection is true, the method returns a known-safe URI where the hostname has been replaced with an IP address. This prevents DNS rebinding attacks, because the DNS record has been resolved. However, if we ignore this returned value, we will not be protected against DNS rebinding.

This is the case with validators such as the AddressableUrlValidator (called with validates :url, addressable_url: {opts} or public_url: {opts}). Validation errors are only raised when validations are called, for example when a record is created or saved. If we ignore the value returned by the validation when persisting the record, we need to recheck its validity before using it. You can learn more about Time of Check to Time of Use bugs in a later section of these guidelines.

Feature-specific mitigations

There are many tricks to bypass common SSRF validations. If feature-specific mitigations are necessary, they should be reviewed by the AppSec team, or a developer who has worked on SSRF mitigations previously.

For situations in which you can't use an allowlist or Gitlab::HTTP, you must implement mitigations directly in the feature. It's best to validate the destination IP addresses themselves, not just domain names, as the attacker can control DNS. Below is a list of mitigations that you should implement.

  • Block connections to all localhost addresses
    • 127.0.0.0/8 (IPv4 - note the subnet mask)
    • ::1 (IPv6)
  • Block connections to networks with private addressing (RFC 1918)
    • 10.0.0.0/8
    • 172.16.0.0/12
    • 192.168.0.0/16
  • Block connections to link-local addresses (RFC 3927)
    • 169.254.0.0/16
    • In particular, for GCP: metadata.google.internal -> 169.254.169.254
  • For HTTP connections: Disable redirects or validate the redirect destination
  • To mitigate DNS rebinding attacks, validate and use the first IP address received.
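
The IP-based checks in this list can be sketched with Ruby's standard `ipaddr` library. This is only an illustration of the idea, not a replacement for Gitlab::UrlBlocker: the `BLOCKED_NETWORKS` constant and the `blocked_destination?` helper are hypothetical names, the private ranges are the standard RFC 1918 blocks, and the helper expects an already-resolved IP address rather than a hostname.

```ruby
require "ipaddr"

# Loopback, RFC 1918 private ranges, and link-local addresses.
BLOCKED_NETWORKS = %w[
  127.0.0.0/8
  ::1/128
  10.0.0.0/8
  172.16.0.0/12
  192.168.0.0/16
  169.254.0.0/16
].map { |cidr| IPAddr.new(cidr) }.freeze

# Returns true when a connection to the given IP address should be refused.
def blocked_destination?(ip_string)
  ip = IPAddr.new(ip_string)
  # Treat IPv4-mapped IPv6 addresses such as ::ffff:10.0.0.1 as plain IPv4.
  ip = ip.native if ip.ipv4_mapped?

  BLOCKED_NETWORKS.any? { |network| network.include?(ip) }
rescue IPAddr::Error
  # Fail closed: anything that does not parse as an IP address is refused.
  true
end

puts blocked_destination?("169.254.169.254") # true (cloud metadata address)
puts blocked_destination?("192.168.44.7")    # true
puts blocked_destination?("93.184.216.34")   # false
```

To stay protected against DNS rebinding, the connection would then have to be made to the same IP address that was checked, not to a freshly resolved hostname.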

See url_blocker_spec.rb for examples of SSRF payloads. See time of check to time of use bugs to learn more about DNS rebinding's class of bug.

Don't rely on methods like .start_with? when validating a URL, or make assumptions about which part of a string maps to which part of a URL. Use the URI class to parse the string, and validate each component (scheme, host, port, path, and so on). Attackers can create valid URLs which look safe, but lead to malicious locations.

user_supplied_url = "https://my-safe-site.com@my-evil-site.com" # Content before an @ in a URL is usually for basic authentication
user_supplied_url.start_with?("https://my-safe-site.com")       # Don't trust with start_with? for URLs!
=> true
URI.parse(user_supplied_url).host
=> "my-evil-site.com"

user_supplied_url = "https://my-safe-site.com-my-evil-site.com"
user_supplied_url.start_with?("https://my-safe-site.com")      # Don't trust with start_with? for URLs!
=> true
URI.parse(user_supplied_url).host
=> "my-safe-site.com-my-evil-site.com"

# Here's an example where we unsafely attempt to validate a host while allowing for
# subdomains
user_supplied_url = "https://my-evil-site-my-safe-site.com"
user_supplied_host = URI.parse(user_supplied_url).host
=> "my-evil-site-my-safe-site.com"
user_supplied_host.end_with?("my-safe-site.com")      # Don't trust with end_with?
=> true
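
By contrast, a component-wise check might look like the following sketch. The `ALLOWED_HOST` constant and the `allowed_url?` helper are hypothetical, and the policy (HTTPS only, no embedded credentials, exact host or a true subdomain) is an assumption for this example rather than a GitLab rule.

```ruby
require "uri"

# Hypothetical trusted host for this example.
ALLOWED_HOST = "my-safe-site.com"

def allowed_url?(url)
  uri = URI.parse(url)

  # Validate each component separately instead of matching on the raw string.
  return false unless uri.is_a?(URI::HTTPS)
  return false unless uri.userinfo.nil?

  host = uri.host.to_s.downcase
  # Exact match, or a subdomain separated by a dot from the allowed host.
  host == ALLOWED_HOST || host.end_with?(".#{ALLOWED_HOST}")
rescue URI::InvalidURIError
  false
end

puts allowed_url?("https://my-safe-site.com/path")             # true
puts allowed_url?("https://docs.my-safe-site.com")             # true
puts allowed_url?("https://my-safe-site.com@my-evil-site.com") # false
puts allowed_url?("https://my-evil-site-my-safe-site.com")     # false
```

The leading dot in the subdomain check is what separates docs.my-safe-site.com from my-evil-site-my-safe-site.com.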

XSS guidelines

Description

Cross site scripting (XSS) is an issue where malicious JavaScript code gets injected into a trusted web application and executed in a client's browser. The input is intended to be data, but instead gets treated as code by the browser.

XSS issues are commonly classified in three categories, by their delivery method:

Impact

The injected client-side code is executed on the victim's browser in the context of their current session. This means the attacker could perform any action the victim would normally be able to do through a browser. The attacker would also have the ability to:

Much of the impact is contingent upon the function of the application and the capabilities of the victim's session. For further impact possibilities, please check out the beef project.

For a demonstration of the impact on GitLab with a realistic attack scenario, see this video on the GitLab Unfiltered channel (internal, it requires being logged in with the GitLab Unfiltered account).

When to consider?

When user submitted data is included in responses to end users, which is just about anywhere.

Mitigation

In most situations, a two-step solution can be used: input validation and output encoding in the appropriate context.

Input validation

Setting expectations

For any and all input fields, define expectations on the type/format of input, the contents, size limits, and the context in which it will be output. It's important to work with both security and product teams to determine what is considered acceptable input.

Validate input
  • Treat all user input as untrusted.
  • Based on the expectations you defined above:
    • Validate the input size limits.
    • Validate the input using an allowlist approach to only allow characters through which you are expecting to receive for the field.
      • Input which fails validation should be rejected, and not sanitized.
  • When adding redirects or links to a user-controlled URL, ensure that the scheme is HTTP or HTTPS. Allowing other schemes like javascript:// can lead to XSS and other security issues.

Note that denylists should be avoided, as it is near impossible to block all variations of XSS.
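
The scheme check for user-controlled links can be sketched with Ruby's standard `uri` library. The `ALLOWED_SCHEMES` constant and the `safe_link?` helper are hypothetical names for this example. The sketch follows the allowlist approach: anything that is not explicitly HTTP or HTTPS is rejected, including relative URLs and input that does not parse.

```ruby
require "uri"

ALLOWED_SCHEMES = %w[http https].freeze

# Returns true only for URLs whose scheme is on the allowlist.
def safe_link?(url)
  uri = URI.parse(url.to_s.strip)
  ALLOWED_SCHEMES.include?(uri.scheme.to_s.downcase)
rescue URI::InvalidURIError
  # Reject input that does not parse, rather than trying to repair it.
  false
end

puts safe_link?("https://gitlab.com/help") # true
puts safe_link?("javascript:alert(1)")     # false
puts safe_link?("/relative/path")          # false
```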

Output encoding

Once you've determined when and where the user submitted data will be output, it's important to encode it based on the appropriate context. For example:
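
As one illustration of context-dependent encoding, the following sketch uses only Ruby's standard `erb` library, not GitLab's own helpers. The same untrusted value is encoded differently depending on whether it ends up in an HTML body or in a URL query string. The variable names are chosen for this example.

```ruby
require "erb"

user_input = %q(<img src=x onerror="alert('xss')">)

# HTML context: angle brackets, quotes and ampersands become entities.
html_body = ERB::Util.html_escape(user_input)

# URL context: reserved characters are percent-encoded instead.
query_value = ERB::Util.url_encode(user_input)

puts html_body
puts "https://example.com/search?q=#{query_value}"
```

Using the HTML encoding inside a URL, or the other way around, would not give the intended protection, which is why the output context has to be known first.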

Additional information

XSS mitigation and prevention in Rails

By default, Rails automatically escapes strings when they are inserted into HTML templates. Avoid the methods used to keep Rails from escaping strings, especially those related to user-controlled values. Specifically, the following options are dangerous because they mark strings as trusted and safe:

Method Avoid these options
HAML templates html_safe, raw, !=
Embedded Ruby (ERB) html_safe, raw, <%== %>

In case you want to sanitize user-controlled values against XSS vulnerabilities, you can use ActionView::Helpers::SanitizeHelper. Calling link_to and redirect_to with user-controlled parameters can also lead to cross-site scripting.

Do also sanitize and validate URL schemes.

References:

XSS mitigation and prevention in JavaScript and Vue

  • When updating the content of an HTML element using JavaScript, mark user-controlled values as textContent or nodeValue instead of innerHTML.
  • Avoid using v-html with user-controlled data, use v-safe-html instead.
  • Render unsafe or unsanitized content using dompurify.
  • Consider using gl-sprintf to interpolate translated strings securely.
  • Avoid __() with translations that contain user-controlled values.
  • When working with postMessage, ensure the origin of the message is allowlisted.
  • Consider using the Safe Link Directive to generate secure hyperlinks by default.

GitLab specific libraries for mitigating XSS

Vue

Content Security Policy

Free form input field

Select examples of past XSS issues affecting GitLab

Internal Developer Training

Path Traversal guidelines

Description

Path Traversal vulnerabilities grant attackers access to arbitrary directories and files on the server that is executing an application, including data, code or credentials.

Impact

Path Traversal attacks can lead to multiple critical and high severity issues, like arbitrary file read, remote code execution or information disclosure.

When to consider

When working with user-controlled filenames/paths and file system APIs.

Mitigation and prevention

In order to prevent Path Traversal vulnerabilities, user-controlled filenames or paths should be validated before being processed.

  • Comparing user input against an allowlist of allowed values or verifying that it only contains allowed characters.
  • After validating the user supplied input, it should be appended to the base directory and the path should be canonicalized using the file system API.
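
These two steps can be sketched as follows. The base directory, the filename pattern, and the `resolve_upload_path` helper are hypothetical. The sketch uses File.expand_path, which normalizes the path without touching the file system; when the target already exists, File.realpath additionally resolves symbolic links.

```ruby
# Hypothetical base directory for this example.
BASE_DIR = "/srv/app/uploads"

# Allowlist: letters, digits, underscore and hyphen, with an optional extension.
FILENAME_ALLOWLIST = /\A[a-zA-Z0-9_\-]+(\.[a-zA-Z0-9]+)?\z/

def resolve_upload_path(filename)
  # Step 1: validate the user-supplied value against the allowlist.
  raise ArgumentError, "invalid filename" unless FILENAME_ALLOWLIST.match?(filename.to_s)

  # Step 2: append to the base directory and normalize the result.
  full_path = File.expand_path(filename, BASE_DIR)

  # Defense in depth: the normalized path must still be inside the base directory.
  unless full_path.start_with?(BASE_DIR + File::SEPARATOR)
    raise ArgumentError, "path escapes base directory"
  end

  full_path
end

puts resolve_upload_path("report.pdf") # /srv/app/uploads/report.pdf
```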

GitLab specific validations

The methods Gitlab::Utils.check_path_traversal!() and Gitlab::Utils.check_allowed_absolute_path!() can be used to validate user-supplied paths and prevent vulnerabilities. check_path_traversal!() will detect Path Traversal payloads and accepts URL-encoded paths. check_allowed_absolute_path!() will check if a path is absolute and whether it is inside the allowed path list. By default, absolute paths are not allowed, so you need to pass a list of allowed absolute paths to the path_allowlist parameter when using check_allowed_absolute_path!().

To use a combination of both checks, follow the example below:

Gitlab::Utils.check_allowed_absolute_path_and_path_traversal!(path, path_allowlist)

In the REST API, we have the FilePath validator that can be used to perform the checking on any file path argument the endpoints have. It can be used as follows:

requires :file_path, type: String, file_path: { allowlist: ['/foo/bar/', '/home/foo/', '/app/home'] }

The Path Traversal check can also be used to forbid any absolute path:

requires :file_path, type: String, file_path: true

Absolute paths are not allowed by default. If allowing an absolute path is required, you need to provide an array of paths to the parameter allowlist.

OS command injection guidelines

Command injection is an issue in which an attacker is able to execute arbitrary commands on the host operating system through a vulnerable application. Such attacks don't always provide feedback to a user, but the attacker can use simple commands like curl to obtain an answer.

Impact

The impact of command injection greatly depends on the user context running the commands, as well as how data is validated and sanitized. It can vary from low impact because the user running the injected commands has limited rights, to critical impact if running as the root user.

Potential impacts include:

  • Execution of arbitrary commands on the host machine.
  • Unauthorized access to sensitive data, including passwords and tokens in secrets or configuration files.
  • Exposure of sensitive system files on the host machine, such as /etc/passwd or /etc/shadow.
  • Compromise of related systems and services gained through access to the host machine.

You should be aware of and take steps to prevent command injection when working with user-controlled data that are used to run OS commands.

Mitigation and prevention

To prevent OS command injections, user-supplied data shouldn't be used within OS commands. In cases where you can't avoid this:

  • Validate user-supplied data against an allowlist.
  • Ensure that user-supplied data only contains alphanumeric characters (and no syntax or whitespace characters, for example).
  • Always use -- to separate options from arguments.

Ruby

Consider using system("command", "arg0", "arg1", ...) whenever you can. This prevents an attacker from concatenating commands.

For more examples on how to use shell commands securely, consult Guidelines for shell commands in the GitLab codebase. It contains various examples on how to securely call OS commands.
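
As a minimal sketch of the multi-argument form, the following uses `Open3.capture2` from Ruby's standard library, which follows the same argument convention as `system`. It assumes an `echo` executable is available on the PATH, and the payload is made up for this example.

```ruby
require "open3"

# Attacker-style input that would run a second command if a shell parsed it.
untrusted = "1; cat /etc/passwd"

# With the program and its arguments passed separately, no shell is involved:
# the whole string reaches echo as a single argument.
output, status = Open3.capture2("echo", untrusted)

puts output         # 1; cat /etc/passwd
puts status.success?
```

The single-string form, such as `Open3.capture2("echo #{untrusted}")`, would hand the string to a shell and is the pattern to avoid.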

Go

Go has built-in protections that usually prevent an attacker from successfully injecting OS commands.

Consider the following example:

package main

import (
  "fmt"
  "os/exec"
)

func main() {
  cmd := exec.Command("echo", "1; cat /etc/passwd")
  out, _ := cmd.Output()
  fmt.Printf("%s", out)
}

This echoes "1; cat /etc/passwd".

Do not use sh, as it bypasses internal protections:

out, _ = exec.Command("sh", "-c", "echo 1 | cat /etc/passwd").Output()

This outputs 1 followed by the content of /etc/passwd.

General recommendations

As we have moved away from supporting TLS 1.0 and 1.1, you must use TLS 1.2 and above.

Ciphers

We recommend using the ciphers that Mozilla is providing in their recommended SSL configuration generator for TLS 1.2:

  • ECDHE-ECDSA-AES128-GCM-SHA256
  • ECDHE-RSA-AES128-GCM-SHA256
  • ECDHE-ECDSA-AES256-GCM-SHA384
  • ECDHE-RSA-AES256-GCM-SHA384
  • ECDHE-ECDSA-CHACHA20-POLY1305
  • ECDHE-RSA-CHACHA20-POLY1305

And the following cipher suites (according to the RFC 8446) for TLS 1.3:

  • TLS_AES_128_GCM_SHA256
  • TLS_AES_256_GCM_SHA384
  • TLS_CHACHA20_POLY1305_SHA256

Note: Golang does not support all cipher suites with TLS 1.3.

Implementation examples

TLS 1.3

For TLS 1.3, Golang only supports 3 cipher suites, as such we only need to set the TLS version:

cfg := &tls.Config{
    MinVersion: tls.VersionTLS13,
}

For Ruby, you can use HTTParty and specify TLS 1.3 version as well as ciphers:

Whenever possible this example should be avoided for security purposes:

response = HTTParty.get('https://gitlab.com', ssl_version: :TLSv1_3, ciphers: ['TLS_AES_128_GCM_SHA256', 'TLS_AES_256_GCM_SHA384', 'TLS_CHACHA20_POLY1305_SHA256'])

When using Gitlab::HTTP, the code looks like:

This is the recommended implementation to avoid security issues such as SSRF:

response = Gitlab::HTTP.perform_request(Net::HTTP::Get, 'https://gitlab.com', ssl_version: :TLSv1_3, ciphers: ['TLS_AES_128_GCM_SHA256', 'TLS_AES_256_GCM_SHA384', 'TLS_CHACHA20_POLY1305_SHA256'])

TLS 1.2

Golang does support multiple cipher suites that we do not want to use with TLS 1.2. We need to explicitly list authorized ciphers:

func secureCipherSuites() []uint16 {
  return []uint16{
    tls.TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,
    tls.TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,
    tls.TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,
    tls.TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,
    tls.TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,
    tls.TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,
  }
}

And then use secureCipherSuites() in tls.Config:

tls.Config{
  (...),
  CipherSuites: secureCipherSuites(),
  MinVersion:   tls.VersionTLS12,
  (...),
}

This example was taken here.

For Ruby, you can again specify the TLS version, this time 1.2, together with the recommended ciphers:

response = Gitlab::HTTP.perform_request(Net::HTTP::Get, 'https://gitlab.com', ssl_version: :TLSv1_2, ciphers: ['ECDHE-ECDSA-AES128-GCM-SHA256', 'ECDHE-RSA-AES128-GCM-SHA256', 'ECDHE-ECDSA-AES256-GCM-SHA384', 'ECDHE-RSA-AES256-GCM-SHA384', 'ECDHE-ECDSA-CHACHA20-POLY1305', 'ECDHE-RSA-CHACHA20-POLY1305'])
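
For code that has to configure TLS at a lower level, the same settings can be expressed with Ruby's standard `openssl` library. This is only a sketch: the `TLS12_CIPHERS` constant and the `strict_tls_context` helper are hypothetical names, and a bare SSL context provides none of the SSRF protections of the GitLab HTTP wrapper.

```ruby
require "openssl"

# The TLS 1.2 cipher list recommended above.
TLS12_CIPHERS = %w[
  ECDHE-ECDSA-AES128-GCM-SHA256
  ECDHE-RSA-AES128-GCM-SHA256
  ECDHE-ECDSA-AES256-GCM-SHA384
  ECDHE-RSA-AES256-GCM-SHA384
  ECDHE-ECDSA-CHACHA20-POLY1305
  ECDHE-RSA-CHACHA20-POLY1305
].freeze

def strict_tls_context
  context = OpenSSL::SSL::SSLContext.new
  # Refuse anything older than TLS 1.2.
  context.min_version = OpenSSL::SSL::TLS1_2_VERSION
  # Restrict the TLS 1.2 ciphers to the recommended list.
  context.ciphers = TLS12_CIPHERS.join(":")
  # Keep certificate verification enabled.
  context.verify_mode = OpenSSL::SSL::VERIFY_PEER
  context
end

# Each entry returned by #ciphers is an array whose first element is the name.
enabled_names = strict_tls_context.ciphers.map(&:first)
puts enabled_names
```

Depending on the OpenSSL build, the printed list can also contain the TLS 1.3 cipher suites, which are configured separately from the TLS 1.2 cipher string.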

GitLab Internal Authorization

Introduction

There are some cases where the user object passed around in the code actually refers to a DeployToken/DeployKey entity instead of a real User, because of the code below in /lib/api/api_guard.rb:

      def find_user_from_sources
        strong_memoize(:find_user_from_sources) do
          deploy_token_from_request ||
            find_user_from_bearer_token ||
            find_user_from_job_token ||
            user_from_warden
        end
      end

Past Vulnerable Code

In some scenarios such as this one, user impersonation is possible because a DeployToken ID can be used in place of a User ID. This happened because there was no check on the line with Gitlab::Auth::CurrentUserMode.bypass_session!(user.id). In this case, the id is actually a DeployToken ID instead of a User ID.

      def find_current_user!
        user = find_user_from_sources
        return unless user

        # Sessions are enforced to be unavailable for API calls, so ignore them for admin mode
        Gitlab::Auth::CurrentUserMode.bypass_session!(user.id) if Gitlab::CurrentSettings.admin_mode

        unless api_access_allowed?(user)
          forbidden!(api_access_denied_message(user))
        end

Best Practices

To prevent this from happening, check that user.is_a?(User) returns true whenever the code expects to deal with a User object. This prevents the ID confusion that can arise from the find_user_from_sources method mentioned above. The code snippet below shows the vulnerable code after applying this best practice.

      def find_current_user!
        user = find_user_from_sources
        return unless user

        if user.is_a?(User) && Gitlab::CurrentSettings.admin_mode
          # Sessions are enforced to be unavailable for API calls, so ignore them for admin mode
          Gitlab::Auth::CurrentUserMode.bypass_session!(user.id)
        end

        unless api_access_allowed?(user)
          forbidden!(api_access_denied_message(user))
        end
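
The effect of the type check can be shown in isolation with a plain-Ruby sketch. The two structs and the `admin_mode_user_id` helper are hypothetical stand-ins for the real models, used only to show that two different record types can share the same numeric ID.

```ruby
# Hypothetical stand-ins: both record types expose an integer id.
User        = Struct.new(:id)
DeployToken = Struct.new(:id)

# Returns the ID to use for admin mode, or nil when the actor is not a User.
def admin_mode_user_id(actor)
  return nil unless actor.is_a?(User)

  actor.id
end

puts admin_mode_user_id(User.new(42)).inspect        # 42
puts admin_mode_user_id(DeployToken.new(42)).inspect # nil
```

Without the `is_a?` check, the deploy token with ID 42 would be treated as the user with ID 42.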

Guidelines when defining missing methods with metaprogramming

Metaprogramming is a way to define methods at runtime, instead of at the time of writing and deploying the code. It is a powerful tool, but can be dangerous if we allow untrusted actors (like users) to define their own arbitrary methods. For example, imagine we accidentally let an attacker overwrite an access control method to always return true! It can lead to many classes of vulnerabilities such as access control bypass, information disclosure, arbitrary file reads, and remote code execution.

Key methods to watch out for are method_missing, define_method, delegate, and similar methods.

Insecure metaprogramming example

This example is adapted from an example submitted by @jobert through our HackerOne bug bounty program. Thank you for your contribution!

Before Ruby 2.5.1, you could implement delegators using the delegate or method_missing methods. For example:

class User
  def initialize(attributes)
    @options = OpenStruct.new(attributes)
  end

  def is_admin?
    name.eql?("Sid") # Note - never do this!
  end

  def method_missing(method, *args)
    @options.send(method, *args)
  end
end

When a method that didn't exist was called on a User instance, the call was passed along to the @options instance variable.

User.new({name: "Jeeves"}).is_admin?
# => false

User.new(name: "Sid").is_admin?
# => true

User.new(name: "Jeeves", "is_admin?" => true).is_admin?
# => false

Because the is_admin? method is already defined on the class, its behavior is not overridden when passing is_admin? to the initializer.

This class can be refactored to use the Forwardable module and def_delegators:

class User
  extend Forwardable

  def initialize(attributes)
    @options = OpenStruct.new(attributes)

    self.class.instance_eval do
      def_delegators :@options, *attributes.keys
    end
  end

  def is_admin?
    name.eql?("Sid") # Note - never do this!
  end
end

It might seem like this example has the same behavior as the first code example. However, there's one crucial difference: because the delegators are meta-programmed after the class is loaded, it can overwrite existing methods:

User.new({name: "Jeeves"}).is_admin?
# => false

User.new(name: "Sid").is_admin?
# => true

User.new(name: "Jeeves", "is_admin?" => true).is_admin?
# => true
#     ^------------------ The method is overwritten! Sneaky Jeeves!

In the example above, the is_admin? method is overwritten when passing it to the initializer.

Best practices

  • Never pass user-provided details into method-defining metaprogramming methods.
    • If you must, be very confident that you've sanitized the values correctly. Consider creating an allowlist of values, and validating the user input against that.
  • When extending classes that use metaprogramming, make sure you don't inadvertently override any method definition safety checks.
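
One way to apply the allowlist advice is sketched below. The `Account` class and its `ALLOWED_ATTRIBUTES` list are hypothetical. Readers are generated once from a fixed list when the class is loaded, and user-supplied keys are only ever compared against that list, never passed to a method-defining call.

```ruby
class Account
  # Fixed allowlist of attributes that may be exposed as reader methods.
  ALLOWED_ATTRIBUTES = %i[name email].freeze

  # Readers come from the allowlist only, never from user-supplied keys.
  ALLOWED_ATTRIBUTES.each do |attribute|
    define_method(attribute) { @options[attribute] }
  end

  def initialize(attributes)
    normalized = attributes.transform_keys(&:to_sym)
    unknown = normalized.keys - ALLOWED_ATTRIBUTES
    # Reject unexpected keys instead of silently turning them into methods.
    raise ArgumentError, "unexpected attributes: #{unknown.inspect}" unless unknown.empty?

    @options = normalized.freeze
  end

  def is_admin?
    false # placeholder for a real access control check
  end
end

puts Account.new(name: "Jeeves").name      # Jeeves
puts Account.new(name: "Jeeves").is_admin? # false

begin
  Account.new(name: "Jeeves", "is_admin?" => true)
rescue ArgumentError => e
  puts e.message # the attempt to override is_admin? is rejected
end
```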

Working with archive files

Working with archive files like zip, tar, jar, war, cpio, apk, rar and 7z presents an area where potentially critical security vulnerabilities can sneak into an application.

Zip Slip

In 2018, the security company Snyk released a blog post describing research into a widespread and critical vulnerability present in many libraries and applications which allows an attacker to overwrite arbitrary files on the server file system which, in many cases, can be leveraged to achieve remote code execution. The vulnerability was dubbed Zip Slip.

A Zip Slip vulnerability happens when an application extracts an archive without validating and sanitizing the filenames inside the archive for directory traversal sequences that change the file location when the file is extracted.

Example malicious file names:

  • ../../etc/passwd
  • ../../root/.ssh/authorized_keys
  • ../../etc/gitlab/gitlab.rb

If a vulnerable application extracts an archive file with any of these file names, the attacker can overwrite these files with arbitrary content.

Insecure archive extraction examples

Ruby

For zip files, the rubyzip Ruby gem is already patched against the Zip Slip vulnerability and will refuse to extract files that try to perform directory traversal, so for this vulnerable example we will extract a tar.gz file with Gem::Package::TarReader:

# Vulnerable tar.gz extraction example!

begin
  tar_extract = Gem::Package::TarReader.new(Zlib::GzipReader.open("/tmp/uploaded.tar.gz"))
rescue Errno::ENOENT
  STDERR.puts("archive file does not exist or is not readable")
  exit(false)
end
tar_extract.rewind

tar_extract.each do |entry|
  next unless entry.file? # Only process files in this example for simplicity.

  destination = "/tmp/extracted/#{entry.full_name}" # Oops! We blindly use the entry file name for the destination.
  File.open(destination, "wb") do |out|
    out.write(entry.read)
  end
end

Go

// unzip INSECURELY extracts source zip file to destination.
func unzip(src, dest string) error {
  r, err := zip.OpenReader(src)
  if err != nil {
    return err
  }
  defer r.Close()

  os.MkdirAll(dest, 0750)

  for _, f := range r.File {
    if f.FileInfo().IsDir() { // Skip directories in this example for simplicity.
      continue
    }

    rc, err := f.Open()
    if err != nil {
      return err
    }
    defer rc.Close()

    path := filepath.Join(dest, f.Name) // Oops! We blindly use the entry file name for the destination.
    os.MkdirAll(filepath.Dir(path), f.Mode())
    f, err := os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, f.Mode())
    if err != nil {
      return err
    }
    defer f.Close()

    if _, err := io.Copy(f, rc); err != nil {
      return err
    }
  }

  return nil
}

Best practices

Always expand the destination file path by resolving all potential directory traversals and other sequences that can alter the path and refuse extraction if the final destination path does not start with the intended destination directory.

Ruby

# tar.gz extraction example with protection against Zip Slip attacks.
require 'rubygems/package'
require 'zlib'

# safe_destination raises an exception in case of Zip Slip / directory traversal.
# It is defined before the extraction loop because a top-level Ruby script
# can only call methods that have already been defined.
def safe_destination(filename, destination_dir)
  raise "filename cannot start with '/'" if filename.start_with?("/")

  destination_dir = File.realpath(destination_dir)
  destination = File.expand_path(filename, destination_dir)

  raise "filename is outside of destination directory" unless
    destination.start_with?(destination_dir + "/")

  destination
end

begin
  tar_extract = Gem::Package::TarReader.new(Zlib::GzipReader.open("/tmp/uploaded.tar.gz"))
rescue Errno::ENOENT
  STDERR.puts("archive file does not exist or is not readable")
  exit(false)
end
tar_extract.rewind

tar_extract.each do |entry|
  next unless entry.file? # Only process files in this example for simplicity.

  # safe_destination will raise an exception in case of Zip Slip / directory traversal.
  destination = safe_destination(entry.full_name, "/tmp/extracted")

  File.open(destination, "wb") do |out|
    out.write(entry.read)
  end
end

# zip extraction example using rubyzip with built-in protection against Zip Slip attacks.
require 'zip'

Zip::File.open("/tmp/uploaded.zip") do |zip_file|
  zip_file.each do |entry|
    # Extract entry to /tmp/extracted directory.
    entry.extract("/tmp/extracted")
  end
end

Go

You are encouraged to use the secure archive utilities provided by LabSec which will handle Zip Slip and other types of vulnerabilities for you. The LabSec utilities are also context aware which makes it possible to cancel or timeout extractions:

package main

import (
  "context"
  "os"

  "gitlab-com/gl-security/appsec/labsec/archive/zip"
)

func main() {
  f, err := os.Open("/tmp/uploaded.zip")
  if err != nil {
    panic(err)
  }
  defer f.Close()

  fi, err := f.Stat()
  if err != nil {
    panic(err)
  }

  if err := zip.Extract(context.Background(), f, fi.Size(), "/tmp/extracted"); err != nil {
    panic(err)
  }
}

In case the LabSec utilities do not fit your needs, here is an example for extracting a zip file with protection against Zip Slip attacks:

// unzip extracts source zip file to destination with protection against Zip Slip attacks.
func unzip(src, dest string) error {
  r, err := zip.OpenReader(src)
  if err != nil {
    return err
  }
  defer r.Close()

  if err := os.MkdirAll(dest, 0750); err != nil {
    return err
  }

  for _, f := range r.File {
    if f.FileInfo().IsDir() { // Skip directories in this example for simplicity.
      continue
    }

    rc, err := f.Open()
    if err != nil {
      return err
    }
    defer rc.Close()

    path := filepath.Join(dest, f.Name)

    // Check for Zip Slip / directory traversal
    if !strings.HasPrefix(path, filepath.Clean(dest) + string(os.PathSeparator)) {
      return fmt.Errorf("illegal file path: %s", path)
    }

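    // Directories need the execute bit, so do not reuse the file's own mode here.
    if err := os.MkdirAll(filepath.Dir(path), 0750); err != nil {
      return err
    }
    out, err := os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, f.Mode())
    if err != nil {
      return err
    }
    defer out.Close()

    if _, err := io.Copy(out, rc); err != nil {
      return err
    }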
  }

  return nil
}

Symlink attacks

Symlink attacks make it possible for an attacker to read the contents of arbitrary files on the server of a vulnerable application. While this is a high-severity vulnerability that can often lead to remote code execution and other critical vulnerabilities, it is only exploitable in scenarios where a vulnerable application accepts archive files from the attacker and somehow displays the extracted contents back to the attacker without any validation or sanitization of symbolic links inside the archive.

Ruby

For zip files, the rubyzip Ruby gem is already patched against symlink attacks as it simply ignores symbolic links, so for this vulnerable example we will extract a tar.gz file with Gem::Package::TarReader:

# Vulnerable tar.gz extraction example!

begin
  tar_extract = Gem::Package::TarReader.new(Zlib::GzipReader.open("/tmp/uploaded.tar.gz"))
rescue Errno::ENOENT
  STDERR.puts("archive file does not exist or is not readable")
  exit(false)
end
tar_extract.rewind

# Loop over each entry and output file contents
tar_extract.each do |entry|
  next if entry.directory?

  # Oops! We don't check if the file is actually a symbolic link to a potentially sensitive file.
  puts entry.read
end

Go

// printZipContents INSECURELY prints contents of files in a zip file.
func printZipContents(src string) error {
  r, err := zip.OpenReader(src)
  if err != nil {
    return err
  }
  defer r.Close()

  // Loop over each entry and output file contents
  for _, f := range r.File {
    if f.FileInfo().IsDir() {
      continue
    }

    rc, err := f.Open()
    if err != nil {
      return err
    }
    defer rc.Close()

    // Oops! We don't check if the file is actually a symbolic link to a potentially sensitive file.
    buf, err := ioutil.ReadAll(rc)
    if err != nil {
      return err
    }

    fmt.Println(string(buf)) // ReadAll returns a []byte, which has no String method.
  }

  return nil
}

Best practices

Always check the type of the archive entry before reading the contents and ignore entries that are not plain files. If you absolutely must support symbolic links, ensure that they only point to files inside the archive and nowhere else.

Ruby

# tar.gz extraction example with protection against symlink attacks.

begin
  tar_extract = Gem::Package::TarReader.new(Zlib::GzipReader.open("/tmp/uploaded.tar.gz"))
rescue Errno::ENOENT
  STDERR.puts("archive file does not exist or is not readable")
  exit(false)
end
tar_extract.rewind

# Loop over each entry and output file contents
tar_extract.each do |entry|
  next if entry.directory?

  # By skipping symbolic links entirely, we are sure they can't cause any trouble!
  next if entry.symlink?

  puts entry.read
end
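
If symbolic links must be supported, the link target has to be validated as well as the entry name. The following is a minimal sketch of a purely lexical check; `symlink_inside_destination?` is a hypothetical helper, not part of GitLab or RubyGems. The link target would come from the archive entry (for `Gem::Package::TarReader`, `entry.header.linkname`). Because the check never touches the filesystem, it assumes extraction into an empty, application-controlled directory and does not detect chains of links that only escape when followed one after another.

```ruby
# Returns true when a symlink stored at entry_name, pointing at linkname,
# would itself live inside destination_dir and resolve to a path inside it.
def symlink_inside_destination?(entry_name, linkname, destination_dir)
  return false if linkname.start_with?("/") # Absolute targets always escape.

  root = File.expand_path(destination_dir)

  # File.join produces an absolute path, so File.expand_path only normalizes
  # "." and ".." segments here and performs no "~" expansion.
  link_path = File.expand_path(File.join(root, entry_name))
  return false unless link_path.start_with?(root + "/")

  # A relative link target is resolved against the directory holding the link.
  target = File.expand_path(File.join(File.dirname(link_path), linkname))
  target.start_with?(root + "/")
end
```

For example, a link `docs/latest` pointing at `v1/readme.txt` passes the check, while the same link pointing at `../../../etc/passwd` is rejected.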

Go

You are encouraged to use the secure archive utilities provided by LabSec which will handle Zip Slip and symlink vulnerabilities for you. The LabSec utilities are also context aware which makes it possible to cancel or timeout extractions.

In case the LabSec utilities do not fit your needs, here is an example for extracting a zip file with protection against symlink attacks:

// printZipContents prints contents of files in a zip file with protection against symlink attacks.
func printZipContents(src string) error {
  r, err := zip.OpenReader(src)
  if err != nil {
    return err
  }
  defer r.Close()

  // Loop over each entry and output file contents
  for _, f := range r.File {
    if f.FileInfo().IsDir() {
      continue
    }

    // By skipping all irregular file types (including symbolic links), we are sure they can't cause any trouble!
    if !f.Mode().IsRegular() {
      continue
    }

    rc, err := f.Open()
    if err != nil {
      return err
    }
    defer rc.Close()

    buf, err := ioutil.ReadAll(rc)
    if err != nil {
      return err
    }

    fmt.Println(string(buf)) // ReadAll returns a []byte, which has no String method.
  }

  return nil
}

Time of check to time of use bugs

Time of check to time of use, or TOCTOU, is a class of errors that occur when the state of something changes unexpectedly partway through a process. More specifically, it's when the property you checked and validated has changed by the time you finally get around to using that property.

These types of bugs are often seen in environments which allow multi-threading and concurrency, like filesystems and distributed web applications; these are a type of race condition. TOCTOU also occurs when state is checked and stored, then after a period of time that state is relied on without re-checking its accuracy and/or validity.

Examples

Example 1: you have a model which accepts a URL as input. When the model is created you verify that the URL's host resolves to a public IP address, to prevent attackers making internal network calls. But DNS records can change (DNS rebinding). An attacker updates the DNS record to 127.0.0.1, and when your code resolves that URL's host again, it sends a potentially malicious request to a server on the internal network. The property was valid at the "time of check", but invalid and malicious at "time of use".

A GitLab-specific example can be found in this issue where, although Gitlab::UrlBlocker.validate! was called, the returned value was not used. This made it vulnerable to a TOCTOU bug and an SSRF protection bypass through DNS rebinding. The fix was to use the validated IP address.
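
The idea of "use the validated IP address" can be sketched as follows. This is not how Gitlab::UrlBlocker is implemented; `validated_ip_for` and its `resolver:` parameter are hypothetical names used only for illustration. The host is resolved exactly once, the result is checked, and the checked address is returned so the caller connects to it directly instead of resolving the host name a second time.

```ruby
require "ipaddr"
require "resolv"

# Resolves host once, rejects non-public addresses, and returns the validated
# IP as a string. The resolver is injectable so it can be replaced in tests.
def validated_ip_for(host, resolver: ->(name) { Resolv.getaddress(name) })
  ip = IPAddr.new(resolver.call(host))

  if ip.loopback? || ip.private? || ip.link_local?
    raise ArgumentError, "#{host} resolves to a non-public address: #{ip}"
  end

  ip.to_s
end
```

The caller would then open the connection to the returned address (sending the original host name only in the `Host` header or for TLS verification), so a later DNS change cannot redirect the request.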

Example 2: you have a feature which schedules jobs. When the user schedules the job, they have permission to do so. But imagine if, between the time they schedule the job and the time it is run, their permissions are restricted. Unless you re-check permissions at time of use, you could inadvertently allow unauthorized activity.
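
A minimal sketch of re-checking permissions at time of use is shown below. The `PERMISSIONS` table and the `allowed?`, `schedule_job`, and `run_next_job` helpers are hypothetical stand-ins for a real policy check and job queue.

```ruby
# Hypothetical in-memory permission table standing in for a real policy check.
PERMISSIONS = { "alice" => [:run_job] }

def allowed?(user, action)
  PERMISSIONS.fetch(user, []).include?(action)
end

# Time of check: the user must be allowed to schedule the job.
def schedule_job(user, queue)
  raise "not allowed to schedule" unless allowed?(user, :run_job)

  queue << user
end

# Time of use: check again right before the job actually runs.
def run_next_job(queue)
  user = queue.shift
  return :skipped unless allowed?(user, :run_job)

  :executed
end

queue = []
schedule_job("alice", queue)
PERMISSIONS["alice"] = [] # Permissions are restricted before the job runs.
result = run_next_job(queue)
```

Here `result` is `:skipped`, because the second check observes the restricted permissions instead of trusting the check made at scheduling time.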

Example 3: you need to fetch a remote file, and perform a HEAD request to get and validate the content length and content type. When you subsequently make a GET request, though, the file delivered is a different size or different file type. (This is stretching the definition of TOCTOU, but things have changed between time of check and time of use).
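
One way to avoid trusting the earlier HEAD response is to enforce the size limit while reading the GET body itself. The sketch below uses a hypothetical `read_with_limit` helper and works on any IO-like object; a `StringIO` stands in for the response body.

```ruby
require "stringio"

# Reads from io in chunks and aborts as soon as more than max_bytes arrive,
# regardless of what an earlier HEAD response claimed.
def read_with_limit(io, max_bytes, chunk_size: 4096)
  body = String.new # Binary string, matching what IO#read(length) returns.

  while (chunk = io.read(chunk_size))
    body << chunk
    raise IOError, "body exceeds #{max_bytes} bytes" if body.bytesize > max_bytes
  end

  body
end
```

The same principle applies to the content type: validate the bytes that were actually received rather than the metadata from a previous request.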

Example 4: you allow users to upvote a comment if they haven't already. The server is multi-threaded, and you aren't using transactions or an applicable database index. By repeatedly clicking upvote in quick succession a malicious user is able to add multiple upvotes: the requests arrive at the same time, the checks run in parallel and confirm that no upvote exists yet, and so each upvote is written to the database.

Here's some pseudocode showing an example of a potential TOCTOU bug:

def upvote(comment, user)
  # The time between calling .exists? and .create can lead to TOCTOU,
  # particularly if .create is a slow method, or runs in a background job
  if Upvote.exists?(comment: comment, user: user)
    return
  else
    Upvote.create(comment: comment, user: user)
  end
end

Prevention & defense

  • Assume values will change between the time you validate them and the time you use them.
  • Perform checks as close to execution time as possible.
  • Perform checks after your operation completes.
  • Use your framework's validations and database features to impose constraints and atomic reads and writes.
  • Read about Server Side Request Forgery (SSRF) and DNS rebinding.
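
The advice on atomic reads and writes can be illustrated with the upvote example. In the sketch below, the hypothetical `UpvoteStore` class is an in-memory stand-in for a table with a unique index on the comment and user: the existence check and the write happen as one atomic step, so concurrent requests cannot both pass the check.

```ruby
require "set"

# In-memory stand-in (assumption) for a table with a unique index on
# (comment_id, user_id).
class UpvoteStore
  def initialize
    @votes = Set.new
    @lock = Mutex.new
  end

  # The check and the write run inside one critical section, so no other
  # thread can slip in between them. Returns true if the vote was recorded
  # and false if it already existed.
  def upvote(comment_id, user_id)
    @lock.synchronize { !@votes.add?([comment_id, user_id]).nil? }
  end

  def count
    @lock.synchronize { @votes.size }
  end
end

store = UpvoteStore.new
threads = 20.times.map { Thread.new { store.upvote(1, 42) } }
results = threads.map(&:value)
```

Of the 20 concurrent attempts, exactly one returns `true` and the store holds a single upvote. In a database-backed application, a unique index gives the same guarantee across processes, which an in-process lock cannot.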

An example of a well-implemented Gitlab::UrlBlocker.validate! call that prevents a TOCTOU bug:

  1. Preventing DNS rebinding in Gitea importer

Resources