gitlab-org--gitlab-foss/app/services/projects/detect_repository_languages_service.rb

# frozen_string_literal: true

module Projects
  class DetectRepositoryLanguagesService < BaseService
    attr_reader :programming_languages

    # rubocop: disable CodeReuse/ActiveRecord
    def execute
      repository_languages = project.repository_languages
      detection = Gitlab::LanguageDetection.new(repository, repository_languages)

      matching_programming_languages = ensure_programming_languages(detection)

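      # Apply the detected changes atomically: drop languages that are no longer
      # present, update changed shares, and insert newly detected languages.
      RepositoryLanguage.transaction do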
        project.repository_languages.where(programming_language_id: detection.deletions).delete_all

        detection.updates.each do |update|
          RepositoryLanguage
            .where(project_id: project.id)
            .where(programming_language_id: update[:programming_language_id])
            .update_all(share: update[:share])
        end

        Gitlab::Database.bulk_insert(
          RepositoryLanguage.table_name,
          detection.insertions(matching_programming_languages)
        )

        set_detected_repository_languages
      end

      project.repository_languages.reload
    end
    # rubocop: enable CodeReuse/ActiveRecord

    private

    # rubocop: disable CodeReuse/ActiveRecord
    def ensure_programming_languages(detection)
      existing_languages = ProgrammingLanguage.where(name: detection.languages)
      return existing_languages if detection.languages.size == existing_languages.size

      missing_languages = detection.languages - existing_languages.map(&:name)
      created_languages = missing_languages.map do |name|
        create_language(name, detection.language_color(name))
      end

      existing_languages + created_languages
    end
    # rubocop: enable CodeReuse/ActiveRecord

    # rubocop: disable CodeReuse/ActiveRecord
    def create_language(name, color)
      ProgrammingLanguage.transaction do
        ProgrammingLanguage.where(name: name).first_or_create(color: color)
      end
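    rescue ActiveRecord::RecordNotUnique
      # Another process inserted the same language concurrently; retry so that
      # first_or_create finds the existing row.
      retry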
    end
    # rubocop: enable CodeReuse/ActiveRecord

    def set_detected_repository_languages
      return if project.detected_repository_languages?

      project.update_column(:detected_repository_languages, true)
    end
  end
end
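
The `execute` method above applies three kinds of change (deletions, updates, insertions) to the stored language shares. A minimal sketch of how such a three-way diff could be computed follows. It assumes plain hashes in place of database rows and detector output; `diff_language_shares` is a hypothetical helper for illustration, not the `Gitlab::LanguageDetection` implementation.

```ruby
# Hypothetical sketch: hashes of { language name => share } stand in for the
# stored RepositoryLanguage rows and for the freshly detected shares.
def diff_language_shares(stored, detected)
  # Languages stored earlier that the new scan no longer reports.
  deletions = stored.keys - detected.keys

  # Languages the scan reports that were not stored before.
  insertions = detected.reject { |name, _share| stored.key?(name) }

  # Languages present in both whose share has changed.
  updates = detected.select do |name, share|
    stored.key?(name) && stored[name] != share
  end

  { deletions: deletions, updates: updates, insertions: insertions }
end

stored   = { "Ruby" => 80.0, "HTML" => 15.0, "Perl" => 5.0 }
detected = { "Ruby" => 70.0, "HTML" => 15.0, "Go" => 15.0 }

diff = diff_language_shares(stored, detected)
```

Unchanged languages (here `HTML`) appear in none of the three sets, so a rescan that finds nothing new would issue no writes under this scheme.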
Enable frozen string in vestigial app files Partially addresses #47424. 2018-08-11 03:00:39 -04:00			`# frozen_string_literal: true`

Add repository languages for projects. GitHub has shown the programming languages of a repository for a long time, and inspired by that, this commit creates roughly the same functionality. Language detection is done through Linguist, as before; the difference is that we cache the result in the database. Also, Gitaly can incrementally scan a repository. This is done through a shell out, which creates an overhead of about 3s per run. For now this won't be improved. Scans are triggered by pushes to the default branch, usually `master`. One exception to this rule is the charts page: if we're requesting this expensive data anyway, we just cache it in the database. Edge cases where there is no repository, or it's empty, are caught in the Repository model. This makes use of Redis caching, which is probably already loaded. The added model is called RepositoryLanguage, which will make it harder if/when GitLab supports multiple repositories per project. However, for now I think this shouldn't be a concern. Also, Language could be confused with the i18n languages, and the current name felt suitable too. Design of the Project#Show page is done with help from @dimitrieh. This change is not visible to the end user unless detections are done. 2018-06-06 07:10:59 -04:00			`module Projects`
			`class DetectRepositoryLanguagesService < BaseService`
Return cached languages if they've been detected before 2019-03-20 13:23:23 -04:00			`attr_reader :programming_languages`
Add repository languages for projects 2018-06-06 07:10:59 -04:00
Disable existing offenses for the CodeReuse cops This whitelists all existing offenses for the various CodeReuse cops, of which most are triggered by the CodeReuse/ActiveRecord cop. 2018-08-27 11:31:01 -04:00			`# rubocop: disable CodeReuse/ActiveRecord`
Add repository languages for projects 2018-06-06 07:10:59 -04:00			`def execute`
			`repository_languages = project.repository_languages`
			`detection = Gitlab::LanguageDetection.new(repository, repository_languages)`

			`matching_programming_languages = ensure_programming_languages(detection)`

			`RepositoryLanguage.transaction do`
			`project.repository_languages.where(programming_language_id: detection.deletions).delete_all`

			`detection.updates.each do |update|`
			`RepositoryLanguage`
			`.where(project_id: project.id)`
			`.where(programming_language_id: update[:programming_language_id])`
Update query simplification Rails 5 didn't like the arel usage, see: https://gitlab.com/gitlab-org/gitlab-ce/issues/49873#note_92040225 This change makes that right, but also makes the query nicer. I'm not sure anymore why it didn't work before, however there were issues with it that have been resolved. 2018-08-02 08:09:49 -04:00			`.update_all(share: update[:share])`
Add repository languages for projects 2018-06-06 07:10:59 -04:00			`end`

			`Gitlab::Database.bulk_insert(`
			`RepositoryLanguage.table_name,`
			`detection.insertions(matching_programming_languages)`
			`)`
Return cached languages if they've been detected before 2019-03-20 13:23:23 -04:00
			`set_detected_repository_languages`
Add repository languages for projects 2018-06-06 07:10:59 -04:00			`end`

			`project.repository_languages.reload`
			`end`
Disable existing offenses for the CodeReuse cops This whitelists all existing offenses for the various CodeReuse cops, of which most are triggered by the CodeReuse/ActiveRecord cop. 2018-08-27 11:31:01 -04:00			`# rubocop: enable CodeReuse/ActiveRecord`
Add repository languages for projects 2018-06-06 07:10:59 -04:00
			`private`

Disable existing offenses for the CodeReuse cops This whitelists all existing offenses for the various CodeReuse cops, of which most are triggered by the CodeReuse/ActiveRecord cop. 2018-08-27 11:31:01 -04:00			`# rubocop: disable CodeReuse/ActiveRecord`
Add repository languages for projects 2018-06-06 07:10:59 -04:00			`def ensure_programming_languages(detection)`
			`existing_languages = ProgrammingLanguage.where(name: detection.languages)`
			`return existing_languages if detection.languages.size == existing_languages.size`

			`missing_languages = detection.languages - existing_languages.map(&:name)`
			`created_languages = missing_languages.map do |name|`
			`create_language(name, detection.language_color(name))`
			`end`

			`existing_languages + created_languages`
			`end`
Disable existing offenses for the CodeReuse cops This whitelists all existing offenses for the various CodeReuse cops, of which most are triggered by the CodeReuse/ActiveRecord cop. 2018-08-27 11:31:01 -04:00			`# rubocop: enable CodeReuse/ActiveRecord`
Add repository languages for projects 2018-06-06 07:10:59 -04:00
Disable existing offenses for the CodeReuse cops This whitelists all existing offenses for the various CodeReuse cops, of which most are triggered by the CodeReuse/ActiveRecord cop. 2018-08-27 11:31:01 -04:00			`# rubocop: disable CodeReuse/ActiveRecord`
Add repository languages for projects 2018-06-06 07:10:59 -04:00			`def create_language(name, color)`
			`ProgrammingLanguage.transaction do`
			`ProgrammingLanguage.where(name: name).first_or_create(color: color)`
			`end`
			`rescue ActiveRecord::RecordNotUnique`
			`retry`
			`end`
Disable existing offenses for the CodeReuse cops This whitelists all existing offenses for the various CodeReuse cops, of which most are triggered by the CodeReuse/ActiveRecord cop. 2018-08-27 11:31:01 -04:00			`# rubocop: enable CodeReuse/ActiveRecord`
Return cached languages if they've been detected before 2019-03-20 13:23:23 -04:00
			`def set_detected_repository_languages`
			`return if project.detected_repository_languages?`

			`project.update_column(:detected_repository_languages, true)`
			`end`
Add repository languages for projects 2018-06-06 07:10:59 -04:00			`end`
			`end`
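
The `create_language` method above rescues `ActiveRecord::RecordNotUnique` and retries, which presumably relies on a unique constraint on the language name (an assumption; the constraint is not shown here). The sketch below illustrates the same find-or-create-with-retry pattern without a database. `DuplicateKeyError`, `InMemoryLanguages`, `find_or_create_language`, and the `before_insert` hook are all hypothetical names introduced only to simulate a concurrent writer.

```ruby
# Hypothetical stand-in for ActiveRecord::RecordNotUnique.
class DuplicateKeyError < StandardError; end

# Hypothetical stand-in for a table with a unique constraint on the name.
class InMemoryLanguages
  def initialize
    @rows = {}
  end

  def find(name)
    @rows[name]
  end

  # Raises when the name already exists, as a unique index would.
  def insert(name, color)
    raise DuplicateKeyError, name if @rows.key?(name)

    @rows[name] = { name: name, color: color }
  end
end

def find_or_create_language(table, name, color, before_insert: nil)
  row = table.find(name)
  return row if row

  # Test-only hook: lets a simulated competing writer insert first.
  before_insert&.call
  table.insert(name, color)
rescue DuplicateKeyError
  # Someone else created the row between our lookup and our insert.
  # Drop the hook and run the method body again; the lookup now succeeds.
  before_insert = nil
  retry
end

table = InMemoryLanguages.new
racer = -> { table.insert("Ruby", "#701516") }

# The racer wins the insert, so the retry returns the racer's row.
row = find_or_create_language(table, "Ruby", "#000000", before_insert: racer)
```

After the retry, the lookup returns the row written by the competing writer, so the caller's own `color` argument is ignored for a name that already exists. Note that a retry without a bound, as in this sketch, only terminates because the second lookup is expected to succeed.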