gitlab-org--gitlab-foss/spec/services
Stan Hu 919e3ed273 Merge branch 'fair-usage-of-shared-runners' into 'master'
Fair usage of Shared Runners

## What does this MR do?

Introduces a fair usage scheduler for shared runners.

It tries to assign builds to shared runner from projects that have the lowest number of builds currently running on shared runners.

**Example 1**:
```
We have following builds in queue:
build 1 for project 1
build 2 for project 1
build 3 for project 1
build 4 for project 2
build 5 for project 2
build 6 for project 3

With the new algorithm we will assign builds in following order:
- We choose build 1, because project 1 doesn't run currently any builds and has the lowest build number from projects that doesn't run builds,
- We choose build 4, because project 2 doesn't run currently any builds and has the lowest build number from projects that doesn't run builds,
- We choose build 6, because project 3 doesn't run currently any builds and has the lowest build number from projects that doesn't run builds,
- We choose build 2, because project 1 as other it runs 1 build,
- We choose build 5, because project 2 runs 1 build, where project 1 runs 2 builds now,
- We choose build 3, because project 1 and runs 2 builds.
```

 
**Example 2**:
```
We have following builds in queue:
build 1 for project 1
build 2 for project 1
build 3 for project 1
build 4 for project 2
build 5 for project 2
build 6 for project 3

With the new algorithm we will assign builds in following order:
- We choose build 1, because project 1 doesn't run currently any builds and has the lowest build number from projects that doesn't run builds,
- We finish build 1,
- We choose build 2, because project 1 doesn't run currently any builds and has the lowest build number from projects that doesn't run builds,
- We choose build 4, because project 2 doesn't run currently any builds and has the lowest build number from projects that doesn't run builds,
- We finish build 4,
- We choose build 5, because project 2 doesn't run currently any builds and has the lowest build number from projects that doesn't run builds,
- We choose build 6, because project 3 doesn't run currently any builds,
- We choose build 3, because project 1, 2 and 3 runs exactly one build now,
```

## Why was this MR needed?

Currently, we are scheduling builds using FIFO. This is catastrophic if there are projects that create a 100-300 jobs, this basically eats most of available shared runners.

## Performance

All this logic is implemented with the help of SQL queries, because this is the fastest way to process 1k-2k pending builds in queue.
It's not the fastest SQL query, because it sorts based on number of running_builds, and this forces to calculate a number of running builds for all dependent projects. However, since we have one/two shared runners that asks every few seconds for builds this should have minimal impact on DB performance.

```
 explain analyze SELECT "ci_builds".* FROM "ci_builds" JOIN (SELECT "ci_builds"."gl_project_id", count(case when status = 'running' AND runner_id = (SELECT "ci_runners"."id" FROM "ci_runners" WHERE "ci_runners"."is_shared" = 't') then 1 end) as running_builds FROM "ci_builds" INNER JOIN "projects" ON "projects"."id" = "ci_builds"."gl_project_id" AND "projects"."pending_delete" = 'f' WHERE "ci_builds"."type" IN ('Ci::Build') AND "ci_builds"."status" IN ('running', 'pending') AND "projects"."builds_enabled" = 't' AND "projects"."shared_runners_enabled" = 't' GROUP BY "ci_builds"."gl_project_id") AS projects ON ci_builds.gl_project_id=projects.gl_project_id WHERE "ci_builds"."type" IN ('Ci::Build') AND "ci_builds"."status" = 'pending' AND "ci_builds"."runner_id" IS NULL  ORDER BY projects.running_builds ASC, ci_builds.id ASC;
                                                                                  QUERY PLAN                                                                        
           
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------
 Sort  (cost=64777.28..64777.29 rows=1 width=1010) (actual time=301.794..302.535 rows=1537 loops=1)
   Sort Key: (count(CASE WHEN (((public.ci_builds.status)::text = 'running'::text) AND (public.ci_builds.runner_id = $0)) THEN 1 ELSE NULL::integer END)), public.ci
_builds.id
   Sort Method: quicksort  Memory: 1423kB
   ->  Nested Loop  (cost=63279.78..64777.27 rows=1 width=1010) (actual time=66.384..298.724 rows=1537 loops=1)
         ->  HashAggregate  (cost=63177.15..63177.30 rows=15 width=15) (actual time=65.641..65.851 rows=187 loops=1)
               InitPlan 1 (returns $0)
                 ->  Seq Scan on ci_runners  (cost=0.00..26963.66 rows=1 width=4) (actual time=1.145..34.381 rows=1 loops=1)
                       Filter: is_shared
                       Rows Removed by Filter: 6965
               ->  Nested Loop  (cost=0.00..36186.34 rows=2715 width=15) (actual time=0.065..29.717 rows=1710 loops=1)
                     ->  Index Scan using index_ci_builds_on_status on ci_builds  (cost=0.00..8913.95 rows=3577 width=15) (actual time=0.051..12.012 rows=2583 loops
=1)
                           Index Cond: ((status)::text = ANY ('{running,pending}'::text[]))
                           Filter: ((type)::text = 'Ci::Build'::text)
                           Rows Removed by Filter: 1219
                     ->  Index Scan using projects_pkey on projects  (cost=0.00..7.61 rows=1 width=4) (actual time=0.003..0.004 rows=1 loops=2583)
                           Index Cond: (id = public.ci_builds.gl_project_id)
                           Filter: ((NOT pending_delete) AND builds_enabled AND shared_runners_enabled)
                           Rows Removed by Filter: 0
         ->  Bitmap Heap Scan on ci_builds  (cost=102.63..106.64 rows=1 width=1002) (actual time=1.216..1.231 rows=8 loops=187)
               Recheck Cond: ((gl_project_id = public.ci_builds.gl_project_id) AND ((status)::text = 'pending'::text))
               Filter: ((runner_id IS NULL) AND ((type)::text = 'Ci::Build'::text))
               ->  BitmapAnd  (cost=102.63..102.63 rows=1 width=0) (actual time=1.201..1.201 rows=0 loops=187)
                     ->  Bitmap Index Scan on index_ci_builds_on_gl_project_id  (cost=0.00..10.52 rows=241 width=0) (actual time=0.406..0.406 rows=1944 loops=187)
                           Index Cond: (gl_project_id = public.ci_builds.gl_project_id)
                     ->  Bitmap Index Scan on index_ci_builds_on_status  (cost=0.00..91.78 rows=3089 width=0) (actual time=0.652..0.652 rows=3362 loops=187)
                           Index Cond: ((status)::text = 'pending'::text)
 Total runtime: 303.832 ms
```

## Specific runners

It doesn't affect the specific runners which still serve builds FIFO.

@stanhu @markpundsack @yorickpeterse What do you think?


See merge request !4634
2016-06-15 19:00:32 +00:00
..
auth Fix rubocop offenses 2016-05-31 15:38:42 +02:00
ci Make sure that we test RegisterBuildService behavior for deleted projects 2016-06-15 16:48:42 +02:00
groups Update tests for the current_application_settings request store changes 2016-05-27 19:05:52 -07:00
issues Fix spec 2016-06-03 17:14:23 -05:00
merge_requests Rename all `[ci_]commit` to `[ci_]pipeline` in specs and features 2016-06-03 16:22:26 +02:00
milestones Tag service specs 2015-12-09 11:55:49 +01:00
notes Fix latests concerns 2016-05-18 08:47:15 -05:00
projects Project members with guest role can't access confidential issues 2016-06-13 19:32:00 -03:00
create_commit_builds_service_spec.rb Fix remaining test offenses 2016-06-07 13:14:37 +02:00
create_deployment_service_spec.rb Make `project_id` and `environment_id` nullable 2016-06-15 13:00:13 +02:00
create_release_service_spec.rb Tag service specs 2015-12-09 11:55:49 +01:00
create_snippet_service_spec.rb Fix more specs 2016-03-20 23:09:33 +01:00
create_tag_service_spec.rb Use a similar approach to branch creation for tag creation 2016-05-04 17:19:13 +02:00
delete_tag_service_spec.rb Delete tags via rugged 2016-04-15 16:05:04 +02:00
delete_user_service_spec.rb Specs for the DeleteUserService 2016-03-15 22:19:51 +01:00
destroy_group_service_spec.rb Tag service specs 2015-12-09 11:55:49 +01:00
event_create_service_spec.rb Tag service specs 2015-12-09 11:55:49 +01:00
git_hooks_service_spec.rb Tag service specs 2015-12-09 11:55:49 +01:00
git_push_service_spec.rb Remove main_language and tests 2016-05-25 17:16:42 -04:00
git_tag_push_service_spec.rb Refactor GitTagPushService and fig tags_push system event hook 2016-04-19 11:00:30 +02:00
notification_service_spec.rb Project members with guest role can't access confidential issues 2016-06-13 19:32:00 -03:00
repair_ldap_blocked_user_service_spec.rb Code style fixes and some code simplified 2016-01-08 16:26:04 -02:00
search_service_spec.rb Tag service specs 2015-12-09 11:55:49 +01:00
system_hooks_service_spec.rb Merge branch 'add_username_to_user_system_hooks' of https://github.com/tbeadle/gitlabhq 2016-01-13 10:12:30 +01:00
system_note_service_spec.rb Rename all `[ci_]commit` to `[ci_]pipeline` in specs and features 2016-06-03 16:22:26 +02:00
test_hook_service_spec.rb Tag service specs 2015-12-09 11:55:49 +01:00
todo_service_spec.rb Merge branch 'manual-todos-issuable-sidebar' into 'master' 2016-06-14 22:01:43 +00:00
update_release_service_spec.rb Tag service specs 2015-12-09 11:55:49 +01:00
update_snippet_service_spec.rb Fix more specs 2016-03-20 23:09:33 +01:00