# Cycle Analytics development guide

Cycle analytics calculates the time between two arbitrary events recorded on domain objects and provides aggregated statistics about the duration.

## Stage

During development, events occur that move issues and merge requests through different stages of progress until they are considered finished. These stages can be expressed with the `Stage` model.

Example stage:

- Name: Development
- Start event: Issue created
- End event: Issue first mentioned in commit
- Parent: `Group: gitlab-org`

### Events

Events are the smallest building blocks of the cycle analytics feature. A stage consists of two events:

- Start
- End

These events play a key role in the duration calculation.

Formula: `duration = end_event_time - start_event_time`

To make the duration calculation flexible, each `Event` is implemented as a separate class. Each class is responsible for defining a timestamp expression that will be used in the calculation query.

#### Implementing an `Event` class

A few methods must be implemented; the `StageEvent` base class describes them in great detail. The most important ones are:

- `object_type`
- `timestamp_projection`

The `object_type` method defines which domain object will be queried for the calculation. Currently two models are allowed:

- `Issue`
- `MergeRequest`

The `timestamp_projection` method provides the timestamp used in the duration calculation.

```ruby
# Template: return the timestamp expression for the event
def timestamp_projection
  # your timestamp expression comes here
end

# Example: the event will use the issue creation time in the duration calculation
def timestamp_projection
  Issue.arel_table[:created_at]
end
```

NOTE: **Note:**
More complex expressions are also possible (e.g. using `COALESCE`).
Look at the existing event classes for examples.

In some cases, defining the `timestamp_projection` method is not enough. The calculation query should know which table contains the timestamp expression.
Each `Event` class is responsible for making modifications to the calculation query to make the `timestamp_projection` work. This usually means joining an additional table.

Example for joining the `issue_metrics` table and using the `first_mentioned_in_commit_at` column as the timestamp expression:

```ruby
def object_type
  Issue
end

def timestamp_projection
  # `Issue::Metrics` is the model backed by the `issue_metrics` table
  Issue::Metrics.arel_table[:first_mentioned_in_commit_at]
end

def apply_query_customization(query)
  # in this case the query attribute will be based on the Issue model: `Issue.where(...)`
  query.joins(:metrics)
end
```

### Validating start and end events

Some start/end event pairs are not "compatible" with each other. For example:

- "Issue created" to "Merge Request created": The event classes are defined on different domain models, so their `object_type` methods return different models.
- "Issue closed" to "Issue created": An issue must be created before it can be closed.
- "Issue closed" to "Issue closed": Duration is always 0.

The `StageEvents` module describes the allowed `start_event` and `end_event` pairings (`PAIRING_RULES` constant). If a new event is added, it needs to be registered in this module.

To add a new event:

1. Add an entry in `ENUM_MAPPING` with a unique number. It will be used in the `Stage` model as `enum`.
1. Define which events are compatible with the event in the `PAIRING_RULES` hash.

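
The registration steps above can be sketched in plain Ruby. This is a minimal illustration, not the actual `StageEvents` module: the module name `StageEventsSketch`, the symbol-based event names, the enum numbers, and the `pairing_allowed?` helper are all assumptions made for the example, and only a small subset of the supported pairings is listed.

```ruby
# Hypothetical, simplified stand-in for the StageEvents module.
# Events are plain symbols here; the real module works with Event classes.
module StageEventsSketch
  # Step 1: every event gets a unique number (illustrative values only).
  ENUM_MAPPING = {
    issue_created: 1,
    issue_first_mentioned_in_commit: 2,
    issue_closed: 3,
    issue_last_edited: 4,
    merge_request_created: 100,
    merge_request_merged: 101
  }.freeze

  # Step 2: start event => list of compatible end events (small subset).
  PAIRING_RULES = {
    issue_created: %i[issue_first_mentioned_in_commit issue_closed],
    issue_first_mentioned_in_commit: %i[issue_closed],
    issue_closed: %i[issue_last_edited],
    merge_request_created: %i[merge_request_merged]
  }.freeze

  # A pairing is valid only if the end event is listed for the start event.
  def self.pairing_allowed?(start_event, end_event)
    PAIRING_RULES.fetch(start_event, []).include?(end_event)
  end
end

StageEventsSketch.pairing_allowed?(:issue_created, :issue_closed)          # allowed
StageEventsSketch.pairing_allowed?(:issue_closed, :issue_created)          # wrong order
StageEventsSketch.pairing_allowed?(:issue_created, :merge_request_created) # different models
```

Keeping the rules in a single hash keyed by start event means a new event only has to be added in one place to become selectable.
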
Supported start/end event pairings:

```mermaid
graph LR;
  IssueCreated --> IssueClosed;
  IssueCreated --> IssueFirstAddedToBoard;
  IssueCreated --> IssueFirstAssociatedWithMilestone;
  IssueCreated --> IssueFirstMentionedInCommit;
  IssueCreated --> IssueLastEdited;
  IssueCreated --> IssueLabelAdded;
  IssueCreated --> IssueLabelRemoved;
  MergeRequestCreated --> MergeRequestMerged;
  MergeRequestCreated --> MergeRequestClosed;
  MergeRequestCreated --> MergeRequestFirstDeployedToProduction;
  MergeRequestCreated --> MergeRequestLastBuildStarted;
  MergeRequestCreated --> MergeRequestLastBuildFinished;
  MergeRequestCreated --> MergeRequestLastEdited;
  MergeRequestCreated --> MergeRequestLabelAdded;
  MergeRequestCreated --> MergeRequestLabelRemoved;
  MergeRequestLastBuildStarted --> MergeRequestLastBuildFinished;
  MergeRequestLastBuildStarted --> MergeRequestClosed;
  MergeRequestLastBuildStarted --> MergeRequestFirstDeployedToProduction;
  MergeRequestLastBuildStarted --> MergeRequestLastEdited;
  MergeRequestLastBuildStarted --> MergeRequestMerged;
  MergeRequestLastBuildStarted --> MergeRequestLabelAdded;
  MergeRequestLastBuildStarted --> MergeRequestLabelRemoved;
  MergeRequestMerged --> MergeRequestFirstDeployedToProduction;
  MergeRequestMerged --> MergeRequestClosed;
  MergeRequestMerged --> MergeRequestLastEdited;
  MergeRequestMerged --> MergeRequestLabelAdded;
  MergeRequestMerged --> MergeRequestLabelRemoved;
  IssueLabelAdded --> IssueLabelAdded;
  IssueLabelAdded --> IssueLabelRemoved;
  IssueLabelAdded --> IssueClosed;
  IssueLabelRemoved --> IssueClosed;
  IssueFirstAddedToBoard --> IssueClosed;
  IssueFirstAddedToBoard --> IssueFirstAssociatedWithMilestone;
  IssueFirstAddedToBoard --> IssueFirstMentionedInCommit;
  IssueFirstAddedToBoard --> IssueLastEdited;
  IssueFirstAddedToBoard --> IssueLabelAdded;
  IssueFirstAddedToBoard --> IssueLabelRemoved;
  IssueFirstAssociatedWithMilestone --> IssueClosed;
  IssueFirstAssociatedWithMilestone --> IssueFirstAddedToBoard;
  IssueFirstAssociatedWithMilestone --> IssueFirstMentionedInCommit;
  IssueFirstAssociatedWithMilestone --> IssueLastEdited;
  IssueFirstAssociatedWithMilestone --> IssueLabelAdded;
  IssueFirstAssociatedWithMilestone --> IssueLabelRemoved;
  IssueFirstMentionedInCommit --> IssueClosed;
  IssueFirstMentionedInCommit --> IssueFirstAssociatedWithMilestone;
  IssueFirstMentionedInCommit --> IssueFirstAddedToBoard;
  IssueFirstMentionedInCommit --> IssueLastEdited;
  IssueFirstMentionedInCommit --> IssueLabelAdded;
  IssueFirstMentionedInCommit --> IssueLabelRemoved;
  IssueClosed --> IssueLastEdited;
  IssueClosed --> IssueLabelAdded;
  IssueClosed --> IssueLabelRemoved;
  MergeRequestClosed --> MergeRequestFirstDeployedToProduction;
  MergeRequestClosed --> MergeRequestLastEdited;
  MergeRequestClosed --> MergeRequestLabelAdded;
  MergeRequestClosed --> MergeRequestLabelRemoved;
  MergeRequestFirstDeployedToProduction --> MergeRequestLastEdited;
  MergeRequestFirstDeployedToProduction --> MergeRequestLabelAdded;
  MergeRequestFirstDeployedToProduction --> MergeRequestLabelRemoved;
  MergeRequestLastBuildFinished --> MergeRequestClosed;
  MergeRequestLastBuildFinished --> MergeRequestFirstDeployedToProduction;
  MergeRequestLastBuildFinished --> MergeRequestLastEdited;
  MergeRequestLastBuildFinished --> MergeRequestMerged;
  MergeRequestLastBuildFinished --> MergeRequestLabelAdded;
  MergeRequestLastBuildFinished --> MergeRequestLabelRemoved;
  MergeRequestLabelAdded --> MergeRequestLabelAdded;
  MergeRequestLabelAdded --> MergeRequestLabelRemoved;
  MergeRequestLabelRemoved --> MergeRequestLabelAdded;
  MergeRequestLabelRemoved --> MergeRequestLabelRemoved;
```

### Parent

Teams and organizations might define their own way of building software, so their stages can be completely different. For each stage, a parent object needs to be defined.

Currently supported parents:

- `Project`
- `Group`

#### How the parent relationship works

1. User navigates to the cycle analytics page.
1. User selects a group.
1. Backend loads the defined stages for the selected group.
1. Additions and modifications to the stages will be persisted within the selected group only.

### Default stages

The [original implementation](https://gitlab.com/gitlab-org/gitlab/issues/847) of cycle analytics defined 7 stages. These stages are always available for each parent; however, altering them is not possible.

To make things efficient and reduce the number of records created, the default stages are expressed as in-memory objects (not persisted). When the user creates a custom stage for the first time, all the stages will be persisted. This behaviour is implemented in the cycle analytics service objects.

The reason for this is that we'd like to add the ability to hide and order stages later on.

## Data Collector

`DataCollector` is the central point where the data will be queried from the database. The class always operates on a single stage and consists of the following components:

- `BaseQueryBuilder`:
  - Responsible for composing the initial query.
  - Deals with `Stage` specific configuration: events and their query customizations.
  - Parameters coming from the UI: date ranges.
- `Median`: Calculates the median duration for a stage using the query from `BaseQueryBuilder`.
- `RecordsFetcher`: Loads relevant records for a stage using the query from `BaseQueryBuilder` and specific `Finder` classes to apply visibility rules.
- `DataForDurationChart`: Loads calculated durations with the finish time (end event timestamp) for the scatterplot chart.

For a new calculation or a query, implement it as a new method call in the `DataCollector` class.

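
The division of work between these components can be sketched with in-memory data. The example below is only an illustration under stated assumptions: `StageRecord` and `DataCollectorSketch` are hypothetical names, an array of structs stands in for the database query, and visibility rules are left out entirely.

```ruby
# Hypothetical in-memory stand-in for DataCollector; not the GitLab implementation.
StageRecord = Struct.new(:id, :start_event_time, :end_event_time)

class DataCollectorSketch
  def initialize(records)
    # Plays the role of BaseQueryBuilder: keep only records where both events
    # happened and the start event is earlier than the end event.
    @records = records.select do |r|
      r.start_event_time && r.end_event_time && r.start_event_time < r.end_event_time
    end
  end

  # Durations in seconds: end_event_time - start_event_time.
  def durations
    @records.map { |r| r.end_event_time - r.start_event_time }
  end

  # Median: the middle value, or the mean of the two middle values.
  def median
    sorted = durations.sort
    return nil if sorted.empty?

    mid = sorted.size / 2
    sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
  end

  # DataForDurationChart: [finish time, duration] pairs for a scatterplot.
  def duration_chart
    @records.map { |r| [r.end_event_time, r.end_event_time - r.start_event_time] }
  end
end
```

Each calculation is a separate method that reuses the same filtered record set, which mirrors how the components above share the query built by `BaseQueryBuilder`.
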
## Database query

Structure of the database query:

```sql
SELECT (customized by: Median or RecordsFetcher or DataForDurationChart)
FROM OBJECT_TYPE (Issue or MergeRequest)
INNER JOIN (several JOIN statements, depending on the events)
WHERE
  (Filter by the PARENT model, example: filter Issues from Project A)
  (Date range filter based on the OBJECT_TYPE.created_at)
  (Check if the START_EVENT is earlier than END_EVENT, preventing negative duration)
```

Structure of the `SELECT` statement for `Median`:

```sql
SELECT (calculate median from END_EVENT_TIME-START_EVENT_TIME)
```

Structure of the `SELECT` statement for `DataForDurationChart`:

```sql
SELECT (END_EVENT_TIME-START_EVENT_TIME) as duration, END_EVENT.timestamp
```

## High-level overview

- Rails Controller (`Analytics::CycleAnalytics` module): Cycle analytics exposes its data via JSON endpoints, implemented within the `analytics` workspace. Stage configuration is also implemented as JSON endpoints (CRUD).
- Services (`Analytics::CycleAnalytics` module): All `Stage` related actions will be delegated to respective service objects.
- Models (`Analytics::CycleAnalytics` module): Models are used to persist the `Stage` objects `ProjectStage` and `GroupStage`.
- Feature classes (`Gitlab::Analytics::CycleAnalytics` module):
  - Responsible for composing queries and defining feature-specific business logic.
  - `DataCollector`, `Event`, `StageEvents`, etc.

## Testing

Since we have a lot of events and possible pairings, testing each pairing is not feasible. The rule is to have at least one test case using each `Event` class.

Writing a test case for a stage using a new `Event` can be challenging since data must be created for both events. To make this a bit simpler, each test case must be implemented in `data_collector_spec.rb`, where the stage is tested through the `DataCollector`.
Each test case will be turned into multiple tests, covering the following cases:

- Different parents: `Group` or `Project`
- Different calculations: `Median`, `RecordsFetcher` or `DataForDurationChart`
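
The expansion of one test case into several tests can be pictured as a cross product of the two lists above. The sketch below is plain Ruby, not the actual spec code: the constant names and the `expand_test_case` helper are hypothetical and only illustrate how many combinations a single case produces.

```ruby
# Hypothetical names; the real spec generates these cases with RSpec helpers.
PARENTS = %i[group project].freeze
CALCULATIONS = %i[median records_fetcher data_for_duration_chart].freeze

# One declared test case expands into every parent/calculation combination.
def expand_test_case(name)
  PARENTS.product(CALCULATIONS).map do |parent, calculation|
    "#{name} (parent: #{parent}, calculation: #{calculation})"
  end
end

expand_test_case("issue created to issue closed").each { |title| puts title }
```

With two parents and three calculations, each declared test case yields six tests.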