Add latest changes from gitlab-org/gitlab@master

GitLab Bot 2022-06-08 00:08:12 +00:00
parent 9a4d2a38dc
commit 6df0351836
17 changed files with 113 additions and 160 deletions

View File

@ -107,7 +107,7 @@ perspective, but that definition is volatile and not actionable.
Finally, a characteristic that further differentiates time-decay data into sub-categories with
slightly different approaches available is **whether we want to keep the old data or not**
(for example, retention policy) and/or
**whether old data will be accessible by users through the application**.
**whether old data is accessible by users through the application**.
#### (optional) Extended definition of time-decay data
@ -148,7 +148,7 @@ factors:
would include too many unnecessary records in each partition, as is the case for `web_hook_logs`.
1. **How large are the partitions created?**
The major purpose of partitioning is accessing tables that are as small as possible. If they get too
large by themselves, queries will start underperforming. We may have to re-partition (split) them
large by themselves, queries start underperforming. We may have to re-partition (split) them
in even smaller partitions.
The perfect partitioning scheme keeps **all queries over a dataset almost always over a single partition**,
@ -194,7 +194,7 @@ The disadvantage of such a solution over large, non-partitioned tables is that w
access and delete all the records that are no longer considered relevant. That is a very
expensive operation, due to multi-version concurrency control in PostgreSQL. It also leads to the
pruning worker not being able to catch up with new records being created, if that rate exceeds a
threshold, as is the case of [web_hook_logs](https://gitlab.com/gitlab-org/gitlab/-/issues/256088)
threshold, as is the case of [`web_hook_logs`](https://gitlab.com/gitlab-org/gitlab/-/issues/256088)
at the time of writing this document.
For the aforementioned reasons, our proposal is that
@ -315,7 +315,7 @@ The process required follows:
1. After the non-partitioned table is dropped, we can add a worker to implement the
pruning strategy by dropping past partitions.
In this case, the worker will make sure that only 4 partitions are always active (as the
In this case, the worker makes sure that only 4 partitions are always active (as the
retention policy is 90 days) and drop any partitions older than four months. We have to keep 4
months of partitions while the current month is still active, as going 90 days back takes you to
the fourth oldest partition.
@ -325,7 +325,7 @@ The process required follows:
Related epic: [Partitioning: Design and implement partitioning strategy for Audit Events](https://gitlab.com/groups/gitlab-org/-/epics/3206)
The `audit_events` table shares a lot of characteristics with the `web_hook_logs` table discussed
in the previous sub-section, so we are going to focus on the points they differ.
in the previous sub-section, so we focus on the points where they differ.
The consensus was that
[partitioning could solve most of the performance issues](https://gitlab.com/groups/gitlab-org/-/epics/3206#note_338157248).

View File

@ -55,7 +55,7 @@ To use this feature, define a [CI/CD variable](../../../ci/variables/index.md#cu
`CI_PRE_CLONE_SCRIPT` that contains a bash script.
NOTE:
The `CI_PRE_CLONE_SCRIPT` variable does not work on Windows runners.
The `CI_PRE_CLONE_SCRIPT` variable does not work on GitLab SaaS Windows or macOS Runners.
### Pre-clone script example

View File

@ -179,7 +179,7 @@ The collected Dependency Scanning report uploads to GitLab as an artifact.
GitLab can display the results of one or more reports in:
- The merge request [dependency scanning widget](../../user/application_security/dependency_scanning/index.md#overview).
- The merge request [dependency scanning widget](../../user/application_security/dependency_scanning/index.md).
- The pipeline [**Security** tab](../../user/application_security/security_dashboard/index.md#view-vulnerabilities-in-a-pipeline).
- The [security dashboard](../../user/application_security/security_dashboard/index.md).
- The [Project Vulnerability report](../../user/application_security/vulnerability_report/index.md).

View File

@ -178,7 +178,7 @@ To make keyset pagination work, we must configure custom order objects, to do so
collect information about the order columns (a sketch of the resulting ordering follows the list):
- `relative_position` can have duplicated values because no unique index is present.
- `relative_position` can have null values because we don't have a not null constraint on the column. For this, we must determine where we see NULL values, at the beginning of the result set, or the end (`NULLS LAST`).
- `relative_position` can have null values because we don't have a not null constraint on the column. For this, we must determine where we see `NULL` values, at the beginning of the result set, or the end (`NULLS LAST`).
- Keyset pagination requires distinct order columns, so we must add the primary key (`id`) to make the order distinct.
- Jumping to the last page and paginating backwards actually reverses the `ORDER BY` clause. For this, we must provide the reversed `ORDER BY` clause.
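For illustration, a minimal sketch of the ordering these rules describe, assuming a plain ActiveRecord scope (the actual keyset order object configuration is not shown in this excerpt):

```ruby
# Illustrative only: the ORDER BY clause the custom order definition must
# express: NULLS LAST for the nullable column, plus the primary key as a
# tie-breaker to make the ordering distinct.
scope = Issue.where(project_id: 1)
             .order(Arel.sql('relative_position DESC NULLS LAST, id DESC'))
```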

View File

@ -28,9 +28,9 @@ We have two options for rendering the content:
Rendering long lists can significantly affect both the frontend and backend performance:
- The database will need to read a lot of data from the disk.
- The result of the query (records) will eventually be transformed to Ruby objects which increases memory allocation.
- Large responses will take more time to send over the wire, to the user's browser.
- The database reads a lot of data from the disk.
- The result of the query (records) is eventually transformed to Ruby objects which increases memory allocation.
- Large responses take more time to send over the wire, to the user's browser.
- Rendering long lists might freeze the browser (bad user experience).
With pagination, the data is split into equal pieces (pages). On the first visit, the user receives only a limited number of items (page size). The user can see more items by paginating forward which results in a new HTTP request and a new database query.
@ -127,17 +127,17 @@ We can produce the same query in Rails:
Issue.where(project_id: 1).page(1).per(20)
```
The SQL query will return a maximum of 20 rows from the database. However, it doesn't mean that the database will only read 20 rows from the disk to produce the result.
The SQL query returns a maximum of 20 rows from the database. However, it doesn't mean that the database only reads 20 rows from the disk to produce the result.
This is what will happen:
This is what happens:
1. The database will try to plan the execution in the most efficient way possible based on the table statistics and the available indexes.
1. The database tries to plan the execution in the most efficient way possible based on the table statistics and the available indexes.
1. The planner knows that we have an index covering the `project_id` column.
1. The database will read all rows using the index on `project_id`.
1. The rows at this point are not sorted, so the database will need to sort the rows.
1. The database reads all rows using the index on `project_id`.
1. The rows at this point are not sorted, so the database sorts the rows.
1. The database returns the first 20 rows.
In case the project has 10_000 rows, the database will read 10_000 rows and sort them in memory (or on disk). This is not going to scale well in the long term.
If the project has 10,000 rows, the database reads 10,000 rows and sorts them in memory (or on disk). This does not scale well in the long term.
To fix this we need the following index:
@ -145,16 +145,16 @@ To fix this we need the following index:
CREATE INDEX index_on_issues_project_id ON issues (project_id, id);
```
By making the `id` column part of the index, the previous query will read maximum 20 rows. The query will perform well regardless of the number of issues within a project. So with this change, we've also improved the initial page load (when the user loads the issue page).
By making the `id` column part of the index, the previous query reads a maximum of 20 rows. The query performs well regardless of the number of issues within a project. So with this change, we've also improved the initial page load (when the user loads the issue page).
NOTE:
Here we're leveraging the ordered property of the b-tree database index. Values in the index are sorted so reading 20 rows will not require further sorting.
Here we're leveraging the ordered property of the b-tree database index. Values in the index are sorted so reading 20 rows does not require further sorting.
#### Limitations
##### `COUNT(*)` on a large dataset
Kaminari by default executes a count query to determine the number of pages for rendering the page links. Count queries can be quite expensive for a large table, in an unfortunate scenario the queries will simply time out.
Kaminari by default executes a count query to determine the number of pages for rendering the page links. Count queries can be quite expensive for a large table. In an unfortunate scenario the queries simply time out.
To work around this, we can run Kaminari without invoking the count SQL query.
@ -162,11 +162,11 @@ To work around this, we can run Kaminari without invoking the count SQL query.
Issue.where(project_id: 1).page(1).per(20).without_count
```
In this case, the count query will not be executed and the pagination will no longer render the page numbers. We'll see only the next and previous links.
In this case, the count query is not executed and the pagination no longer renders the page numbers. We see only the next and previous links.
##### `OFFSET` on a large dataset
When we paginate over a large dataset, we might notice that the response time will get slower and slower. This is due to the `OFFSET` clause that seeks through the rows and skips N rows.
When we paginate over a large dataset, we might notice that the response time gets slower and slower. This is due to the `OFFSET` clause that seeks through the rows and skips N rows.
From the user's point of view, this might not always be noticeable. As the user paginates forward, the previous rows might still be in the buffer cache of the database. If the user shares the link with someone else and it's opened after a few minutes or hours, the response time might be significantly higher or the request might even time out.
@ -214,7 +214,7 @@ Limit (cost=137878.89..137881.65 rows=20 width=1309) (actual time=5523.588..552
(8 rows)
```
We can argue that a normal user will not be going to visit these pages, however, API users could easily navigate to very high page numbers (scraping, collecting data).
We can argue that a normal user does not visit these pages; however, API users could easily navigate to very high page numbers (scraping, collecting data).
### Keyset pagination
@ -279,7 +279,7 @@ eyJpZCI6Ijk0NzMzNTk0IiwidXBkYXRlZF9hdCI6IjIwMjEtMDQtMDkgMDg6NTA6MDUuODA1ODg0MDAw
```
NOTE:
Pagination parameters will be visible to the user, so we need to be careful about which columns we order by.
Pagination parameters are visible to the user, so be careful about which columns you order by.
Keyset pagination can only provide the next, previous, first, and last pages.
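As a minimal sketch of the underlying idea (column and values are illustrative, not the actual keyset implementation), the next page is requested with a `WHERE` condition instead of an `OFFSET`:

```ruby
# Continue from the last record of the previous page instead of skipping rows.
last_id = 20 # `id` of the last issue on the previous page (illustrative)

Issue.where(project_id: 1)
     .where('issues.id > ?', last_id)
     .order(id: :asc)
     .limit(20)
```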

View File

@ -55,13 +55,13 @@ LIMIT 20
OFFSET 0
```
With PostgreSQL version 11, the planner will first look up all issues matching the `project_id` filter and then join all `issue_metrics` rows. The ordering of rows will happen in memory. In case the joined relation is always present (1:1 relationship), the database will read `N * 2` rows where N is the number of rows matching the `project_id` filter.
With PostgreSQL version 11, the planner first looks up all issues matching the `project_id` filter and then joins all `issue_metrics` rows. The ordering of rows happens in memory. If the joined relation is always present (1:1 relationship), the database reads `N * 2` rows where N is the number of rows matching the `project_id` filter.
For performance reasons, we should avoid mixing columns from different tables when specifying the `ORDER BY` clause.
In this particular case there is no simple way (like index creation) to improve the query. We might think that changing the `issues.id` column to `issue_metrics.issue_id` will help, however, this will likely make the query perform worse because it might force the database to process all rows in the `issue_metrics` table.
In this particular case there is no simple way (like index creation) to improve the query. We might think that changing the `issues.id` column to `issue_metrics.issue_id` helps; however, this likely makes the query perform worse because it might force the database to process all rows in the `issue_metrics` table.
One idea to address this problem is denormalization. Adding the `project_id` column to the `issue_metrics` table will make the filtering and sorting efficient:
One idea to address this problem is denormalization. Adding the `project_id` column to the `issue_metrics` table makes the filtering and sorting efficient:
```sql
SELECT issues.* FROM issues
@ -73,7 +73,7 @@ OFFSET 0
```
NOTE:
The query will require an index on `issue_metrics` table with the following column configuration: `(project_id, first_mentioned_in_commit_at DESC, issue_id DESC)`.
The query requires an index on the `issue_metrics` table with the following column configuration: `(project_id, first_mentioned_in_commit_at DESC, issue_id DESC)`.
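A hedged migration sketch for adding such an index (the class name, migration version, and index name are illustrative):

```ruby
class AddOrderingIndexToIssueMetrics < Gitlab::Database::Migration[2.0]
  disable_ddl_transaction!

  INDEX_NAME = 'index_issue_metrics_on_project_mentioned_at_and_issue_id'

  def up
    # Matches the ordering used by the query: project filter first, then the
    # descending timestamp and the tie-breaker issue_id.
    add_concurrent_index :issue_metrics,
      [:project_id, :first_mentioned_in_commit_at, :issue_id],
      order: { first_mentioned_in_commit_at: :desc, issue_id: :desc },
      name: INDEX_NAME
  end

  def down
    remove_concurrent_index_by_name :issue_metrics, INDEX_NAME
  end
end
```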
## Filtering
@ -81,7 +81,7 @@ The query will require an index on `issue_metrics` table with the following colu
Filtering by a project is a very common use case since we have many features on the project level. Examples: merge requests, issues, boards, iterations.
These features will have a filter on `project_id` in their base query. Loading issues for a project:
These features have a filter on `project_id` in their base query. Loading issues for a project:
```ruby
project = Project.find(5)
@ -108,9 +108,9 @@ This index fully covers the database query and the pagination.
### By group
Unfortunately, there is no efficient way to sort and paginate on the group level. The database query execution time will increase based on the number of records in the group.
Unfortunately, there is no efficient way to sort and paginate on the group level. The database query execution time increases based on the number of records in the group.
Things get worse when group level actually means group and its subgroups. To load the first page, the database needs to look up the group hierarchy, find all projects and then look up all issues.
Things get worse when group level actually means group and its subgroups. To load the first page, the database looks up the group hierarchy, finds all projects, and then looks up all issues.
The main reason behind the inefficient queries on the group level is the way our database schema is designed; our core domain models are associated with a project, and projects are associated with groups. This doesn't mean that the database structure is bad; it's just in a well-normalized form that is not optimized for efficient group level queries. We might need to look into denormalization in the long term.
@ -184,7 +184,7 @@ LIMIT 20
OFFSET 0
```
Keep in mind that the index above will not support the following project level query:
The index above does not support the following project level query:
```sql
SELECT "issues".*
@ -213,7 +213,7 @@ OFFSET 0
We might be tempted to add an index on `project_id`, `confidential`, and `iid` to improve the database query; however, in this case it's probably unnecessary. Based on the data distribution in the table, confidential issues are rare. Filtering them out does not make the database query significantly slower. The database might read a few extra rows, and the performance difference might not even be visible to the end user.
On the other hand, if we would implement a special filter where we only show confidential issues, we will surely need the index. Finding 20 confidential issues might require the database to scan hundreds of rows or in the worst case, all issues in the project.
On the other hand, if we implemented a special filter where we only show confidential issues, we need the index. Finding 20 confidential issues might require the database to scan hundreds of rows or, in the worst case, all issues in the project.
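If such a filter were added, a hedged sketch of a partial index that would support it (used inside a migration that calls `disable_ddl_transaction!`; the index name is illustrative):

```ruby
# Index only the rare confidential rows so the filter and pagination stay cheap.
add_concurrent_index :issues, [:project_id, :iid],
  where: 'confidential = true',
  name: 'index_issues_on_project_id_and_iid_confidential_only'
```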
NOTE:
Be aware of the data distribution and the table access patterns (how features work) when introducing a new database index. Sampling production data might be necessary to make the right decision.
@ -253,7 +253,7 @@ Example database (oversimplified) execution plan:
- `SELECT "issues".* FROM "issues" WHERE "issues"."project_id" = 5`
1. The database estimates the number of rows and the costs to run these queries.
1. The database executes the cheapest query first.
1. Using the query result, load the rows from the other table (from the other query) using the JOIN column and filter the rows further.
1. Using the query result, load the rows from the other table (from the other query) using the `JOIN` column and filter the rows further.
In this particular example, the `issue_assignees` query would likely be executed first.
@ -276,17 +276,17 @@ Running the query in production for the GitLab project produces the following ex
(13 rows)
```
The query looks up the `assignees` first, filtered by the `user_id` (`user_id = 4156052`) and it finds 215 rows. Using that 215 rows, the database will look up the 215 associated issue rows by the primary key. Notice that the filter on the `project_id` column is not backed by an index.
The query looks up the `assignees` first, filtered by the `user_id` (`user_id = 4156052`) and it finds 215 rows. Using those 215 rows, the database looks up the 215 associated issue rows by the primary key. Notice that the filter on the `project_id` column is not backed by an index.
In most cases, we are lucky that the joined relation will not be going to return too many rows, therefore, we will end up with a relatively efficient database query that accesses low number of rows. As the database grows, these queries might start to behave differently. Let's say the number `issue_assignees` records for a particular user is very high (millions), then this join query will not perform well, and it will likely time out.
In most cases, we are lucky that the joined relation does not return too many rows; therefore, we end up with a relatively efficient database query that accesses a small number of rows. As the database grows, these queries might start to behave differently. Let's say the number of `issue_assignees` records for a particular user is very high, in the millions. This join query does not perform well, and it likely times out.
A similar problem could be a double join, where the filter exists in the 2nd JOIN query. Example: `Issue -> LabelLink -> Label(name=bug)`.
A similar problem could be a double join, where the filter exists in the 2nd `JOIN` query. Example: `Issue -> LabelLink -> Label(name=bug)`.
There is no easy way to fix these problems. Denormalization of data could help significantly; however, it also has negative effects (data duplication and keeping the data up to date).
Ideas for improving the `issue_assignees` filter:
- Add `project_id` column to the `issue_assignees` table so when JOIN-ing, the extra `project_id` filter will further filter the rows. The sorting will likely happen in memory:
- Add `project_id` column to the `issue_assignees` table so when performing the `JOIN`, the extra `project_id` filter further filters the rows. The sorting likely happens in memory:
```sql
SELECT "issues".*

View File

@ -15,8 +15,8 @@ For further reference, check PostgreSQL documentation about [transactions](https
The [sharding group](https://about.gitlab.com/handbook/engineering/development/enablement/sharding/) plans
to split the main GitLab database and move some of the database tables to other database servers.
We'll start decomposing the `ci_*`-related database tables first. To maintain the current application
development experience, we'll add tooling and static analyzers to the codebase to ensure correct
We start decomposing the `ci_*`-related database tables first. To maintain the current application
development experience, we add tooling and static analyzers to the codebase to ensure correct
data access and data modification methods. By using the correct form for defining database transactions,
we can save significant refactoring work in the future.
@ -60,7 +60,7 @@ end
The database tries to acquire the `FOR UPDATE` lock for the referenced `issue` and
`project` records. In our case, we have two competing transactions for these locks,
and only one of them will successfully acquire them. The other transaction will have
and only one of them successfully acquires them. The other transaction has
to wait in the lock queue until the first transaction finishes. The execution of the
second transaction is blocked at this point.
@ -139,5 +139,5 @@ end
```
The `ActiveRecord::Base` class uses a different database connection than the `Ci::Build` records.
The two statements in the transaction block will not be part of the transaction and will not be
The two statements in the transaction block are not part of the transaction and are not
rolled back in case something goes wrong. They act as third-party calls.
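A hedged sketch of the fix, assuming the CI models inherit from a `Ci::ApplicationRecord` class that owns the `ci` connection (the statements inside the block are illustrative):

```ruby
# Open the transaction on the connection Ci::Build actually uses, so both
# statements participate in the transaction and roll back together on failure.
Ci::ApplicationRecord.transaction do
  build = Ci::Build.find(1)
  build.update!(updated_at: Time.current)
end
```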

View File

@ -39,7 +39,7 @@ migration only.
### Required
You must provide the following artifacts when you request a ~database review.
If your merge request description does not include these items, the review will be reassigned back to the author.
If your merge request description does not include these items, the review is reassigned back to the author.
#### Migrations
@ -96,7 +96,7 @@ A database **maintainer**'s role is to:
Review workload is distributed using [reviewer roulette](code_review.md#reviewer-roulette)
([example](https://gitlab.com/gitlab-org/gitlab-foss/-/merge_requests/25181#note_147551725)).
The MR author should request a review from the suggested database
**reviewer**. When they give their sign-off, they will hand over to
**reviewer**. When they sign off, they hand over to
the suggested database **maintainer**.
If reviewer roulette didn't suggest a database reviewer & maintainer,
@ -124,10 +124,10 @@ the following preparations into account.
- When adding an index to a [large table](https://gitlab.com/gitlab-org/gitlab/-/blob/master/rubocop/rubocop-migrations.yml#L3),
test its execution using `CREATE INDEX CONCURRENTLY` in the `#database-lab` Slack channel and add the execution time to the MR description:
- Execution time largely varies between `#database-lab` and GitLab.com, but an elevated execution time from `#database-lab`
can give a hint that the execution on GitLab.com will also be considerably high.
can give a hint that the execution on GitLab.com might also be considerably high.
- If the execution from `#database-lab` is longer than `1h`, the index should be moved to a [post-migration](database/post_deployment_migrations.md).
Keep in mind that in this case you may need to split the migration and the application changes in separate releases to ensure the index
will be in place when the code that needs it will be deployed.
is in place when the code that needs it is deployed.
- Manually trigger the [database testing](database/database_migration_pipeline.md) job (`db:gitlabcom-database-testing`) in the `test` stage.
- This job runs migrations in a production-like environment (similar to `#database_lab`) and posts to the MR its findings (queries, runtime, size change).
- Review migration runtimes and any warnings.
@ -142,7 +142,7 @@ Include in the MR description:
- If the migration itself is not reversible, details of how data changes could be reverted in the event of an incident. For example, in the case of a migration that deletes records (an operation that most of the time is not automatically reversible), how _could_ the deleted records be recovered.
- If the migration deletes data, apply the label `~data-deletion`.
- Concise descriptions of possible user experience impact of an error; for example, "Issues would unexpectedly go missing from Epics".
- Relevant data from the [query plans](#query-plans) that indicate the query works as expected; such as the approximate number of records that will be modified/deleted.
- Relevant data from the [query plans](#query-plans) that indicate the query works as expected; such as the approximate number of records that are modified or deleted.
#### Preparation when adding or modifying queries
@ -151,9 +151,9 @@ Include in the MR description:
- Write the raw SQL in the MR description. Preferably formatted
nicely with [pgFormatter](https://sqlformat.darold.net) or
[paste.depesz.com](https://paste.depesz.com) and using regular quotes
<!-- vale off -->
<!-- vale gitlab.NonStandardQuotes = NO -->
(for example, `"projects"."id"`) and avoiding smart quotes (for example, `“projects”.“id”`).
<!-- vale on -->
<!-- vale gitlab.NonStandardQuotes = YES -->
- In case of queries generated dynamically by using parameters, there should be one raw SQL query for each variation.
For example, a finder for issues that may take as a parameter an optional filter on projects,

View File

@ -15,7 +15,7 @@ class User < ActiveRecord::Base
end
```
Here you will need to add a foreign key on column `posts.user_id`. This ensures
Here, add a foreign key on the `posts.user_id` column. This ensures
that data consistency is enforced on database level. Foreign keys also mean that
the database can very quickly remove associated data (for example, when removing a
user), instead of Rails having to do this.
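A hedged migration sketch using the `add_concurrent_foreign_key` helper mentioned below (the class name, migration version, and `on_delete` choice are illustrative):

```ruby
class AddForeignKeyToPostsUserId < Gitlab::Database::Migration[2.0]
  disable_ddl_transaction!

  def up
    # Cascade so the database removes a user's posts when the user is removed.
    add_concurrent_foreign_key :posts, :users, column: :user_id, on_delete: :cascade
  end

  def down
    remove_foreign_key_if_exists :posts, column: :user_id
  end
end
```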
@ -28,7 +28,7 @@ Guide](migration_style_guide.md) for more information.
Keep in mind that you can only safely add foreign keys to existing tables after
you have removed any orphaned rows. The method `add_concurrent_foreign_key`
does not take care of this so you'll need to do so manually. See
does not take care of this, so you need to do so manually. See
[adding foreign key constraint to an existing column](database/add_foreign_key_to_existing_column.md).
## Cascading Deletes
@ -39,7 +39,7 @@ this should be set to `CASCADE`.
## Indexes
When adding a foreign key in PostgreSQL the column is not indexed automatically,
thus you must also add a concurrent index. Not doing so will result in cascading
thus you must also add a concurrent index. Not doing so results in cascading
deletes being very slow.
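For the `posts.user_id` example above, a hedged sketch (again inside a migration that calls `disable_ddl_transaction!`; the index name is illustrative):

```ruby
# Add the index concurrently so writes to the table are not blocked.
add_concurrent_index :posts, :user_id, name: 'index_posts_on_user_id'
```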
## Naming foreign keys
@ -48,7 +48,7 @@ By default Ruby on Rails uses the `_id` suffix for foreign keys. So we should
only use this suffix for associations between two tables. If you want to
reference an ID on a third party platform the `_xid` suffix is recommended.
The spec `spec/db/schema_spec.rb` will test if all columns with the `_id` suffix
The spec `spec/db/schema_spec.rb` tests if all columns with the `_id` suffix
have a foreign key constraint. So if that spec fails, don't add the column to
`IGNORED_FK_COLUMNS`, but instead add the FK constraint, or consider naming it
differently.
@ -56,7 +56,7 @@ differently.
## Dependent Removals
Don't define options such as `dependent: :destroy` or `dependent: :delete` when
defining an association. Defining these options means Rails will handle the
defining an association. Defining these options means Rails handles the
removal of data, instead of letting the database handle this in the most
efficient way possible.
@ -80,13 +80,13 @@ foreign keys to remove the data as this would result in the file system data
being left behind. In such a case you should use a service class instead that
takes care of removing non-database data.
In cases where the relation spans multiple databases you will have even
In cases where the relation spans multiple databases, you have even
further problems using `dependent: :destroy` or the above hooks. You can
read more about alternatives at [Avoid `dependent: :nullify` and
`dependent: :destroy` across
databases](database/multiple_databases.md#avoid-dependent-nullify-and-dependent-destroy-across-databases).
## Alternative primary keys with has_one associations
## Alternative primary keys with `has_one` associations
Sometimes a `has_one` association is used to create a one-to-one relationship:
@ -112,9 +112,9 @@ create_table :user_configs, id: false do |t|
end
```
Setting `default: nil` will ensure a primary key sequence is not created, and since the primary key
will automatically get an index, we set `index: false` to avoid creating a duplicate.
You will also need to add the new primary key to the model:
Setting `default: nil` ensures a primary key sequence is not created, and since the primary key
automatically gets an index, we set `index: false` to avoid creating a duplicate.
You also need to add the new primary key to the model:
```ruby
class UserConfig < ActiveRecord::Base
@ -126,4 +126,4 @@ end
Using a foreign key as primary key saves space but can make
[batch counting](service_ping/implement.md#batch-counters) in [Service Ping](service_ping/index.md) less efficient.
Consider using a regular `id` column if the table will be relevant for Service Ping.
Consider using a regular `id` column if the table is relevant for Service Ping.

View File

@ -20,7 +20,7 @@ When implementing a new endpoint we should aim to minimise the number of SQL que
Batch loading is useful when a series of queries for inputs `Qα, Qβ, ... Qω` can be combined to a single query for `Q[α, β, ... ω]`. An example of this is lookups by ID, where we can find two users by usernames as cheaply as one, but real-world examples can be more complex.
Batch loading is not suitable when the result sets have different sort-orders, grouping, aggregation or other non-composable features.
Batch loading is not suitable when the result sets have different sort orders, grouping, aggregation, or other non-composable features.
There are two ways to use the batch-loader in your code. For simple ID lookups, use `::Gitlab::Graphql::Loaders::BatchModelLoader.new(model, id).find`. For more complex cases, you can use the batch API directly.
@ -69,7 +69,7 @@ end
## How does it work exactly?
Each lazy object knows which data it needs to load and how to batch the query. When we need to use the lazy objects (which we announce by calling `#sync`), they will be loaded along with all other similar objects in the current batch.
Each lazy object knows which data it needs to load and how to batch the query. When we need to use the lazy objects (which we announce by calling `#sync`), they are loaded along with all other similar objects in the current batch.
Inside the block we execute a batch query for our items (`User`). After that, all we have to do is to call loader by passing an item which was used in `BatchLoader::GraphQL.for` method (`usernames`) and the loaded object itself (`user`):
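```ruby
# Sketch reconstructed from the description above; the exact lookup scope
# (here a hypothetical `User.by_username`) may differ in the real resolver.
# `username` is the value the field or resolver receives.
BatchLoader::GraphQL.for(username).batch do |usernames, loader|
  User.by_username(usernames).each do |user|
    loader.call(user.username, user)
  end
end
```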

View File

@ -31,7 +31,7 @@ User.each_batch(of: 10) do |relation|
end
```
This will end up producing queries such as:
This produces queries such as:
```plaintext
User Load (0.7ms) SELECT "users"."id" FROM "users" WHERE ("users"."id" >= 41654) ORDER BY "users"."id" ASC LIMIT 1 OFFSET 1000
@ -46,7 +46,7 @@ all of the arguments that `in_batches` supports. You should always use
One should proceed with extra caution, and possibly avoid iterating over a column that can contain
duplicate values. When you iterate over an attribute that is not unique, even with the applied max
batch size, there is no guarantee that the resulting batches will not surpass it. The following
batch size, there is no guarantee that the resulting batches do not surpass it. The following
snippet demonstrates this situation: when one attempts to select `Ci::Build` entries for users with
`id` between `1` and `10,000`, the database returns `1,215,178` matching rows.
@ -67,7 +67,7 @@ SELECT "ci_builds".* FROM "ci_builds" WHERE "ci_builds"."type" = 'Ci::Build' AND
even though the range size is limited to a certain threshold (`10,000` in the previous example) this
threshold does not translate to the size of the returned dataset. That happens because when taking
`n` possible values of attributes, one can't tell for sure that the number of records that contains
them will be less than `n`.
them is less than `n`.
## Column definition
@ -99,8 +99,8 @@ determines the data ranges (slices) and schedules the background jobs uses `each
## Efficient usage of `each_batch`
`EachBatch` helps to iterate over large tables. It's important to highlight that `EachBatch` is
not going to magically solve all iteration related performance problems and it might not help at
`EachBatch` helps to iterate over large tables. It's important to highlight that `EachBatch`
does not magically solve all iteration-related performance problems, and it might not help at
all in some scenarios. From the database point of view, correctly configured database indexes are
also necessary to make `EachBatch` perform well.
@ -108,7 +108,7 @@ also necessary to make `EachBatch` perform well.
Let's consider that we want to iterate over the `users` table and print the `User` records to the
standard output. The `users` table contains millions of records, thus running one query to fetch
the users will likely time out.
the users likely times out.
![Users table overview](img/each_batch_users_table_v13_7.png)
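A minimal sketch of the batched iteration (the batch size and printed attribute are illustrative):

```ruby
# Process the users in small batches instead of loading them all at once.
User.each_batch(of: 5) do |relation|
  relation.each { |user| puts user.id }
end
```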
@ -171,7 +171,7 @@ SELECT "users".* FROM "users" WHERE "users"."id" >= 1 AND "users"."id" < 302
![Reading the rows from the `users` table](img/each_batch_users_table_iteration_3_v13_7.png)
Notice the `<` sign. Previously six items were read from the index and in this query, the last
value is "excluded". The query will look at the index to get the location of the five `user`
value is "excluded". The query looks at the index to get the location of the five `user`
rows on the disk and reads the rows from the table. The returned array is processed in Ruby.
The first iteration is done. For the next iteration, the last `id` value is reused from the
@ -204,13 +204,13 @@ users.each_batch(of: 5) do |relation|
end
```
`each_batch` will produce the following SQL query for the start `id` value:
`each_batch` produces the following SQL query for the start `id` value:
```sql
SELECT "users"."id" FROM "users" WHERE "users"."sign_in_count" = 0 ORDER BY "users"."id" ASC LIMIT 1
```
Selecting only the `id` column and ordering by `id` is going to "force" the database to use the
Selecting only the `id` column and ordering by `id` forces the database to use the
index on the `id` (primary key index) column. However, we also have an extra condition on the
`sign_in_count` column. The column is not part of the index, so the database needs to look into
the actual table to find the first matching row.
@ -225,7 +225,7 @@ The number of scanned rows depends on the data distribution in the table.
In this particular example, the database had to read 10 rows (regardless of our batch size setting)
to determine the first `id` value. In a "real-world" application it's hard to predict whether the
filtering is going to cause problems or not. In the case of GitLab, verifying the data on a
filtering causes problems or not. In the case of GitLab, verifying the data on a
production replica is a good start, but keep in mind that data distribution on GitLab.com can be
different from self-managed instances.
@ -289,7 +289,7 @@ CREATE INDEX index_on_users_never_logged_in ON users (sign_in_count, id)
![Reading a good index](img/each_batch_users_table_good_index_v13_7.png)
The following index definition is not going to work well with `each_batch` (avoid).
The following index definition does not work well with `each_batch` (avoid).
```sql
CREATE INDEX index_on_users_never_logged_in ON users (sign_in_count)

View File

@ -178,7 +178,8 @@ To enable DAST to run automatically, either:
#### Include the DAST template
> This template was [updated](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/62597) to DAST_VERSION: 2 in GitLab 14.0.
> - This template was [updated](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/62597) to DAST_VERSION: 2 in GitLab 14.0.
> - This template was [updated](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/87183) to DAST_VERSION: 3 in GitLab 15.0.
If you want to manually add DAST to your application, the DAST job is defined
in a CI/CD template file. Updates to the template are provided with GitLab

View File

@ -1021,6 +1021,19 @@ variables:
  DAST_API_EXCLUDE_PATHS: /auth*;/v1/*
```
To exclude one or more nested levels within a path, we use `**`. In this example, we are testing API endpoints. We are testing `/api/v1/` and `/api/v2/` of a data query requesting `mass`, `brightness`, and `coordinates` data for `planet`, `moon`, `star`, and `satellite` objects. Example paths that could be scanned include, but are not limited to:
- `/api/v2/planet/coordinates`
- `/api/v1/star/mass`
- `/api/v2/satellite/brightness`
In this example, we test only the `brightness` endpoint:
```yaml
variables:
  DAST_API_EXCLUDE_PATHS: /api/**/mass;/api/**/coordinates
```
### Exclude parameters
> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/292196) in GitLab 14.10.

View File

@ -17,6 +17,24 @@ aspects of inspecting the items your code uses. These items typically include ap
dependencies that are almost always imported from external sources, rather than sourced from items
you wrote yourself.
If you're using [GitLab CI/CD](../../../ci/index.md), you can use dependency scanning to analyze
your dependencies for known vulnerabilities. GitLab scans all dependencies, including transitive
dependencies (also known as nested dependencies). You can take advantage of dependency scanning by
either:
- [Including the dependency scanning template](#configuration)
in your existing `.gitlab-ci.yml` file.
- Implicitly using
the [auto dependency scanning](../../../topics/autodevops/stages.md#auto-dependency-scanning)
provided by [Auto DevOps](../../../topics/autodevops/index.md).
GitLab checks the dependency scanning report, compares the found vulnerabilities
between the source and target branches, and shows the information on the
merge request. The results are sorted by the [severity](../vulnerabilities/severities.md) of the
vulnerability.
![Dependency scanning Widget](img/dependency_scanning_v13_2.png)
## Dependency Scanning compared to Container Scanning
GitLab offers both Dependency Scanning and Container Scanning
@ -58,26 +76,6 @@ The following table summarizes which types of dependencies each scanning tool ca
1. Lock file must be present in the image to be detected.
1. Binary file must be present in the image to be detected.
## Overview
If you're using [GitLab CI/CD](../../../ci/index.md), you can use dependency scanning to analyze
your dependencies for known vulnerabilities. GitLab scans all dependencies, including transitive
dependencies (also known as nested dependencies). You can take advantage of dependency scanning by
either:
- [Including the dependency scanning template](#configuration)
in your existing `.gitlab-ci.yml` file.
- Implicitly using
the [auto dependency scanning](../../../topics/autodevops/stages.md#auto-dependency-scanning)
provided by [Auto DevOps](../../../topics/autodevops/index.md).
GitLab checks the dependency scanning report, compares the found vulnerabilities
between the source and target branches, and shows the information on the
merge request. The results are sorted by the [severity](../vulnerabilities/severities.md) of the
vulnerability.
![Dependency scanning Widget](img/dependency_scanning_v13_2.png)
## Requirements
Dependency Scanning runs in the `test` stage, which is available by default. If you redefine the
@ -110,6 +108,8 @@ maximum of two directory levels from the repository's root. For example, the
`gemnasium-dependency_scanning` job is enabled if a repository contains either `Gemfile`,
`api/Gemfile`, or `api/client/Gemfile`, but not if the only supported dependency file is `api/v1/client/Gemfile`.
When a supported dependency file is detected, all dependencies, including transitive dependencies, are analyzed. There is no limit to the depth of nested or transitive dependencies that are analyzed.
The following languages and dependency managers are supported:
<style>
@ -420,6 +420,8 @@ GitLab relies on [`rules:exists`](../../../ci/yaml/index.md#rulesexists) to star
The current detection logic limits the maximum search depth to two levels. For example, the `gemnasium-dependency_scanning` job is enabled if
a repository contains either a `Gemfile.lock`, `api/Gemfile.lock`, or `api/client/Gemfile.lock`, but not if the only supported dependency file is `api/v1/client/Gemfile.lock`.
When a supported dependency file is detected, all dependencies, including transitive dependencies, are analyzed. There is no limit to the depth of nested or transitive dependencies that are analyzed.
### How multiple files are processed
NOTE:

View File

@ -1243,9 +1243,6 @@ msgid_plural "+%d more"
msgstr[0] ""
msgstr[1] ""
msgid "+%{extra} more"
msgstr ""
msgid "+%{more_assignees_count}"
msgstr ""
@ -10498,9 +10495,6 @@ msgstr ""
msgid "Create group label"
msgstr ""
msgid "Create incident"
msgstr ""
msgid "Create issue"
msgstr ""
@ -25183,15 +25177,6 @@ msgstr ""
msgid "Network:"
msgstr ""
msgid "NetworkPolicy|Policy"
msgstr ""
msgid "NetworkPolicy|Search by policy name"
msgstr ""
msgid "NetworkPolicy|Status"
msgstr ""
msgid "Never"
msgstr ""
@ -38475,9 +38460,6 @@ msgstr ""
msgid "There was an error fetching configuration for charts"
msgstr ""
msgid "There was an error fetching content, please refresh the page"
msgstr ""
msgid "There was an error fetching data for the selected stage"
msgstr ""
@ -39225,63 +39207,21 @@ msgstr ""
msgid "ThreatMonitoring|Alert Details"
msgstr ""
msgid "ThreatMonitoring|Alerts"
msgstr ""
msgid "ThreatMonitoring|Date and time"
msgstr ""
msgid "ThreatMonitoring|Dismissed"
msgstr ""
msgid "ThreatMonitoring|Events"
msgstr ""
msgid "ThreatMonitoring|Failed to create incident, please try again."
msgstr ""
msgid "ThreatMonitoring|Hide dismissed alerts"
msgstr ""
msgid "ThreatMonitoring|In review"
msgstr ""
msgid "ThreatMonitoring|Incident"
msgstr ""
msgid "ThreatMonitoring|Name"
msgstr ""
msgid "ThreatMonitoring|No alerts available to display. See %{linkStart}enabling threat alerts%{linkEnd} for more information on adding alerts to the list."
msgstr ""
msgid "ThreatMonitoring|No alerts to display."
msgstr ""
msgid "ThreatMonitoring|Resolved"
msgstr ""
msgid "ThreatMonitoring|Status"
msgstr ""
msgid "ThreatMonitoring|There was an error displaying the alerts. Confirm your endpoint's configuration details to ensure alerts appear."
msgstr ""
msgid "ThreatMonitoring|There was an error while updating the status of the alert. Please try again."
msgstr ""
msgid "ThreatMonitoring|Threat Monitoring"
msgstr ""
msgid "ThreatMonitoring|Threat Monitoring help page link"
msgstr ""
msgid "ThreatMonitoring|Unreviewed"
msgstr ""
msgid "ThreatMonitoring|View documentation"
msgstr ""
msgid "Threshold in bytes at which to compress Sidekiq job arguments."
msgstr ""

View File

@ -47,6 +47,7 @@ COPY ./config/initializers/0_inject_enterprise_edition_module.rb /home/gitlab/co
# The [b] part makes ./ee/app/models/license.r[b] a pattern that is allowed to return no files (which is the case in FOSS)
COPY VERSION ./ee/app/models/license.r[b] /home/gitlab/ee/app/models/
COPY ./config/bundler_setup.rb /home/gitlab/config/
COPY ./config/feature_flags /home/gitlab/config/feature_flags
COPY ./lib/gitlab_edition.rb /home/gitlab/lib/
COPY ./lib/gitlab/utils.rb /home/gitlab/lib/gitlab/
COPY ./INSTALLATION_TYPE ./VERSION /home/gitlab/

View File

@ -1,14 +1,10 @@
# frozen_string_literal: true
module QA
RSpec.describe 'Manage', quarantine: { issue: 'https://gitlab.com/gitlab-org/gitlab/-/issues/212145', type: :stale } do
RSpec.describe 'Manage' do
describe 'Check for broken images', :requires_admin do
before(:context) do
admin = QA::Resource::User.init do |user|
user.username = QA::Runtime::User.admin_username
user.password = QA::Runtime::User.admin_password
end
@api_client = Runtime::API::Client.new(:gitlab, user: admin)
@api_client = Runtime::API::Client.as_admin
@new_user = Resource::User.fabricate_via_api! do |user|
user.api_client = @api_client
end