Commit graph

23 commits

Author SHA1 Message Date
Clement Ho
71c948d637 Replace $.post in importer status with axios 2018-02-09 11:14:48 +00:00
Yorick Peterse
4dfe26cd8b
Rewrite the GitHub importer from scratch
Prior to this MR there were two GitHub related importers:

* Github::Import: the main importer used for GitHub projects
* Gitlab::GithubImport: importer that's somewhat confusingly used for
  importing Gitea projects (apparently they have a compatible API)

This MR renames the Gitea importer to Gitlab::LegacyGithubImport and
introduces a new GitHub importer in the Gitlab::GithubImport namespace.
This new GitHub importer uses Sidekiq for importing multiple resources
in parallel, though it also has the ability to import data sequentially
should this be necessary.

The new code is spread across the following directories:

* lib/gitlab/github_import: this directory contains most of the importer
  code such as the classes used for importing resources.
* app/workers/gitlab/github_import: this directory contains the Sidekiq
  workers, most of which simply use the code from the directory above.
* app/workers/concerns/gitlab/github_import: this directory provides a
  few modules that are included in every GitHub importer worker.

== Stages

The import work is divided into separate stages, with each stage
importing a specific set of data. Stages will schedule the work that
needs to be performed, followed by scheduling a job for the
"AdvanceStageWorker" worker. This worker will periodically check if all
work is completed and schedule the next stage if this is the case. If
work is not yet completed this worker will reschedule itself.

Using this approach we don't have to block threads by calling `sleep()`,
as doing so for large projects could block the thread from doing any
work for many hours.
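
A minimal sketch of this check-and-reschedule pattern, assuming an
in-memory queue in place of Sidekiq. The class names, the stage names
and the `pending` counter hash are hypothetical and are not taken from
the importer's code:

```ruby
# Hypothetical in-memory stand-in for a job queue; this is not the Sidekiq API.
class FakeQueue
  attr_reader :jobs

  def initialize
    @jobs = []
  end

  # Records a job as [name, args] instead of running it.
  def push(name, *args)
    @jobs << [name, args]
  end
end

# Sketch of a worker that advances to the next stage only once the
# current stage has no remaining work.
class AdvanceStageSketch
  # Illustrative stage names (assumption).
  STAGES = %i[issues pull_requests finish].freeze

  # pending: Hash of stage name => number of jobs still running.
  def initialize(queue, pending)
    @queue = queue
    @pending = pending
  end

  def perform(stage)
    if @pending.fetch(stage, 0).zero?
      # Stage finished: schedule the next one, if there is one.
      next_stage = STAGES[STAGES.index(stage) + 1]
      @queue.push(:start_stage, next_stage) if next_stage
    else
      # Work remains: enqueue this check again rather than sleeping.
      @queue.push(:advance_stage, stage)
    end
  end
end
```

Because the check is itself a job, the thread is released between
checks and can pick up other work in the meantime.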

== Retrying Work

Workers will reschedule themselves whenever necessary. For example,
hitting the GitHub API's rate limit will result in jobs rescheduling
themselves. These jobs are not processed until the rate limit has been
reset.
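
The rescheduling behaviour could look roughly like the sketch below.
`RateLimitError`, `reset_in` and the `scheduled` list are assumptions
made for illustration; a real worker would hand the delay to its job
scheduler instead of recording it in an array:

```ruby
# Hypothetical error carrying the number of seconds until the limit resets.
class RateLimitError < StandardError
  attr_reader :reset_in

  def initialize(reset_in)
    super("rate limit reached")
    @reset_in = reset_in
  end
end

# Sketch of a worker that re-enqueues its own job when rate limited.
class ReschedulingWorkerSketch
  # Each entry is [delay_in_seconds, job_id].
  attr_reader :scheduled

  def initialize
    @scheduled = []
  end

  # Runs the given block for the job. If the block hits the rate limit,
  # the job is recorded to run again after the reset delay.
  def perform(id)
    yield id
    :done
  rescue RateLimitError => e
    @scheduled << [e.reset_in, id]
    :rescheduled
  end
end
```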

== User Lookups

Part of the importing process involves looking up user details in the
GitHub API so we can map them to GitLab users. The old importer used
an in-memory cache, but this obviously doesn't work when the work is
spread across different threads.

The new importer uses a Redis cache and makes sure we only perform
API/database calls if absolutely necessary. Frequently used keys are
refreshed, and lookup misses are also cached, so no API/database calls
are needed when we already know the data we're looking for doesn't
exist.
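
Caching lookup misses can be sketched as follows. This is a simplified
illustration that uses a plain Hash where the importer uses Redis and
leaves out key expiry and refreshing; the class name, the `MISS`
sentinel and the `remote_calls` counter are hypothetical:

```ruby
# Sketch of a lookup cache that also remembers misses.
class UserLookupCacheSketch
  # Sentinel stored for logins that are known to have no match.
  MISS = :miss

  # Number of times the fetcher block was actually called.
  attr_reader :remote_calls

  # store: any Hash-like object; the block performs the expensive
  # API/database lookup and returns nil when nothing is found.
  def initialize(store = {}, &fetcher)
    @store = store
    @fetcher = fetcher
    @remote_calls = 0
  end

  def find(login)
    if @store.key?(login)
      cached = @store[login]
      # A cached miss is reported as nil without calling the fetcher.
      return cached.equal?(MISS) ? nil : cached
    end

    @remote_calls += 1
    value = @fetcher.call(login)
    @store[login] = value.nil? ? MISS : value
    value
  end
end
```

With this shape, repeated lookups for a user that doesn't exist cost
one remote call in total rather than one per lookup.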

== Performance & Models

In various places the new importer uses raw INSERT statements (as
generated by `Gitlab::Database.bulk_insert`) instead of Rails models.
This allows us to bypass any validations and callbacks, drastically
reducing the number of SQL queries and Gitaly RPC calls necessary to
import projects.

To ensure the code produces valid data the corresponding tests check if
the produced rows are valid according to the model validation rules.
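
To illustrate why this saves queries, the sketch below builds a single
multi-row INSERT with bind placeholders from a list of row hashes. It
is not the implementation or signature of
`Gitlab::Database.bulk_insert`; `bulk_insert_sql` is a hypothetical
helper, and it assumes every row has the same keys and that the table
and column names come from trusted code:

```ruby
# Builds one parameterised INSERT covering all rows.
# Returns [sql, binds], or nil when there is nothing to insert.
def bulk_insert_sql(table, rows)
  return nil if rows.empty?

  columns = rows.first.keys
  # One "(?, ?, ...)" group per row.
  group = "(#{(['?'] * columns.size).join(', ')})"
  placeholders = rows.map { group }.join(', ')

  sql = "INSERT INTO #{table} (#{columns.join(', ')}) VALUES #{placeholders}"
  # Flatten the values in column order so they line up with the placeholders.
  binds = rows.flat_map { |row| columns.map { |column| row.fetch(column) } }

  [sql, binds]
end
```

No model objects are instantiated here, so no validations or callbacks
run, which is why the tests have to check validity separately.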
2017-11-07 23:24:59 +01:00
Robert Speicher
260c8da060 Whitelist or fix additional Gitlab/PublicSend cop violations
An upcoming update to rubocop-gitlab-security added additional
violations.
2017-08-14 12:14:11 -04:00
Brian Neel
9770c57fab Re-enable SqlInjection and CommandInjection 2017-08-08 10:50:54 -04:00
Rémy Coutable
e046e4c14d Namespace access token session key in Import::GithubController
Signed-off-by: Rémy Coutable <remy@rymai.me>
2016-12-19 17:35:51 +01:00
Rémy Coutable
8fc63d1f64 Improve Gitlab::ImportSources
Signed-off-by: Rémy Coutable <remy@rymai.me>
2016-12-19 17:35:51 +01:00
Rémy Coutable
103114e3d7 Rename Gogs to Gitea, DRY the controller and improve views
Signed-off-by: Rémy Coutable <remy@rymai.me>
2016-12-19 17:35:51 +01:00
James Lopez
0c65112da7 modify github import JS and controller so we can now specify a namespace and/or name for a project.
- Fixed and added specs.
- Added different namespace options depending on user privileges
- Updated docs.
2016-09-20 10:14:39 +02:00
Douglas Barbosa Alexandre
e293ffd48f Refactoring Import::BaseController#find_or_create_namespace 2016-08-31 16:54:15 -03:00
Douglas Barbosa Alexandre
325de662ce Don't create groups for unallowed users when importing projects 2016-08-31 12:55:45 -03:00
Rémy Coutable
ce6635406c Make GH one-off auth the default again for importing GH projects
Advertise the PAT as an alternative unless GH import is not configured.

Signed-off-by: Rémy Coutable <remy@rymai.me>
2016-06-30 18:48:17 +02:00
Eric K Idema
12aa1f898d Import from GitHub using Personal Access Tokens.
This stands as an alternative to using OAuth to access a user's GitHub
repositories. This is set up in such a way that it can be used without OAuth
configuration.

From a UI perspective, the "how to import" modal has been replaced by a full
page, which includes a form for posting a personal access token back to the
Import::GithubController.

If the user has logged in via GitHub, skip the Personal Access Token and go
directly to GitHub for an access token via OAuth.
2016-06-30 18:48:17 +02:00
Stan Hu
4ad64ab3f4 Fix duplicate repositories in GitHub import page
By default, all the current user's repositories are accessible via the
/users endpoint. There's no need to traverse all the organization
repositories as well.

See:

* http://www.rubydoc.info/github/pengwynn/octokit/Octokit/Client/Repositories#repositories-instance_method
* https://developer.github.com/v3/repos/#list-your-repositories

Closes #2523
2015-10-19 10:39:59 -07:00
Valery Sizov
8346dde052 Only render 404 page from /public 2015-10-13 20:12:34 +03:00
Stan Hu
ed1d4fa477 Remove user OAuth tokens stored in database for Bitbucket, GitHub, and GitLab
and request them each session. Pass these tokens to the project import data.

This prevents the need to encrypt these tokens and clear them in case they
expire or get revoked.

For example, if you deleted and re-created OAuth2 keys for Bitbucket, you would get
an Error 500 with no way to recover:

```
Started GET "/import/bitbucket/status" for x.x.x.x at 2015-08-07 05:24:10 +0000
Processing by Import::BitbucketController#status as HTML
Completed 500 Internal Server Error in 607ms (ActiveRecord: 2.3ms)

NameError (uninitialized constant Import::BitbucketController::Unauthorized):
  app/controllers/import/bitbucket_controller.rb:77:in `rescue in go_to_bitbucket_for_permissions'
  app/controllers/import/bitbucket_controller.rb:74:in `go_to_bitbucket_for_permissions'
  app/controllers/import/bitbucket_controller.rb:86:in `bitbucket_unauthorized'
```

Closes #1871
2015-08-23 09:23:44 -07:00
Jeroen van Baarsen
5a4ebfb47a Fixed the Rails/ActionFilter cop
Signed-off-by: Jeroen van Baarsen <jeroenvanbaarsen@gmail.com>
2015-04-20 15:39:37 +02:00
Douwe Maan
737f322e41 Import GitHub, Bitbucket or GitLab.com projects owned by authenticated user into current namespace. 2015-03-31 16:34:13 +02:00
Douwe Maan
3175438f02 Fix missing GitHub organisation repositories on import page. 2015-03-12 13:47:15 +01:00
Douwe Maan
448817c4de Load public key in initializer. 2015-02-24 15:07:24 +01:00
Valery Sizov
b3c90dd514 GitHub importer refactoring 2015-02-05 21:48:21 -08:00
Valery Sizov
1ac20698a5 gitlab.com importer: refactoring 2015-02-05 17:03:43 -08:00
Valery Sizov
592ed8738c Gitlab.com integration: code folding 2015-02-05 12:50:34 -08:00
Valery Sizov
33349dd549 GitLab.com integration: refactoring 2015-02-05 12:50:34 -08:00
Renamed from app/controllers/importers/githubs_controller.rb