gitlab-org--gitlab-foss/app/models/event.rb
Yorick Peterse 0395c47193
Migrate events into a new format
This commit migrates events data in such a way that push events are
stored much more efficiently. This is done by creating a shadow table
called "events_for_migration", and a table called "push_event_payloads"
which is used for storing push data of push events. The background
migration in this commit will copy events from the "events" table into
the "events_for_migration" table, push events in will also have a row
created in "push_event_payloads".

This approach allows us to reclaim space in the next release by simply
swapping the "events" and "events_for_migration" tables, then dropping
the old events (now "events_for_migration") table.

The new table structure is also optimised for storage space, and does
not include the unused "title" column nor the "data" column (since this
data is moved to "push_event_payloads").

== Newly Created Events

Newly created events are inserted into both "events" and
"events_for_migration", both using the exact same primary key value. The
table "push_event_payloads" in turn has a foreign key to the _shadow_
table. This removes the need for recreating and validating the foreign
key after swapping the tables. Since the shadow table also has a foreign
key to "projects.id" we also don't have to worry about orphaned rows.

This approach however does require some additional storage as we're
duplicating a portion of the events data for at least 1 release. The
exact amount is hard to estimate, but for GitLab.com this is expected to
be between 10 and 20 GB at most. The background migration in this commit
deliberately does _not_ update the "events" table as doing so would put
a lot of pressure on PostgreSQL's auto vacuuming system.

== Supporting Both Old And New Events

Application code has also been adjusted to support push events using
both the old and new data formats. This is done by creating a PushEvent
class which extends the regular Event class. Using Rails' Single Table
Inheritance system we can ensure the right class is used for the right
data, which in this case is based on the value of `events.action`. To
support displaying old and new data at the same time the PushEvent class
re-defines a few methods of the Event class, falling back to their
original implementations for push events in the old format.

Once all existing events have been migrated the various push event
related methods can be removed from the Event model, and the calls to
`super` can be removed from the methods in the PushEvent model.

The UI and event atom feed have also been slightly changed to better
handle this new setup, fortunately only a few changes were necessary to
make this work.

== API Changes

The API only displays push data of events in the new format. Supporting
both formats in the API is a bit more difficult compared to the UI.
Since the old push data was not really well documented (apart from one
example that used an incorrect "action" nmae) I decided that supporting
both was not worth the effort, especially since events will be migrated
in a few days _and_ new events are created in the correct format.
2017-08-10 17:45:44 +02:00

446 lines
9 KiB
Ruby

class Event < ActiveRecord::Base
include Sortable
default_scope { reorder(nil).where.not(author_id: nil) }
CREATED = 1
UPDATED = 2
CLOSED = 3
REOPENED = 4
PUSHED = 5
COMMENTED = 6
MERGED = 7
JOINED = 8 # User joined project
LEFT = 9 # User left project
DESTROYED = 10
EXPIRED = 11 # User left project due to expiry
ACTIONS = HashWithIndifferentAccess.new(
created: CREATED,
updated: UPDATED,
closed: CLOSED,
reopened: REOPENED,
pushed: PUSHED,
commented: COMMENTED,
merged: MERGED,
joined: JOINED,
left: LEFT,
destroyed: DESTROYED,
expired: EXPIRED
).freeze
TARGET_TYPES = HashWithIndifferentAccess.new(
issue: Issue,
milestone: Milestone,
merge_request: MergeRequest,
note: Note,
project: Project,
snippet: Snippet,
user: User
).freeze
RESET_PROJECT_ACTIVITY_INTERVAL = 1.hour
delegate :name, :email, :public_email, :username, to: :author, prefix: true, allow_nil: true
delegate :title, to: :issue, prefix: true, allow_nil: true
delegate :title, to: :merge_request, prefix: true, allow_nil: true
delegate :title, to: :note, prefix: true, allow_nil: true
belongs_to :author, class_name: "User"
belongs_to :project
belongs_to :target, polymorphic: true # rubocop:disable Cop/PolymorphicAssociations
has_one :push_event_payload, foreign_key: :event_id
# For Hash only
serialize :data # rubocop:disable Cop/ActiveRecordSerialize
# Callbacks
after_create :reset_project_activity
after_create :set_last_repository_updated_at, if: :push?
after_create :replicate_event_for_push_events_migration
# Scopes
scope :recent, -> { reorder(id: :desc) }
scope :code_push, -> { where(action: PUSHED) }
scope :in_projects, ->(projects) do
where(project_id: projects.pluck(:id)).recent
end
scope :with_associations, -> do
# We're using preload for "push_event_payload" as otherwise the association
# is not always available (depending on the query being built).
includes(:author, :project, project: :namespace)
.preload(:target, :push_event_payload)
end
scope :for_milestone_id, ->(milestone_id) { where(target_type: "Milestone", target_id: milestone_id) }
self.inheritance_column = 'action'
class << self
def find_sti_class(action)
if action.to_i == PUSHED
PushEvent
else
Event
end
end
def subclass_from_attributes(attrs)
# Without this Rails will keep calling this method on the returned class,
# resulting in an infinite loop.
return unless self == Event
action = attrs.with_indifferent_access[inheritance_column].to_i
PushEvent if action == PUSHED
end
# Update Gitlab::ContributionsCalendar#activity_dates if this changes
def contributions
where("action = ? OR (target_type IN (?) AND action IN (?)) OR (target_type = ? AND action = ?)",
Event::PUSHED,
%w(MergeRequest Issue), [Event::CREATED, Event::CLOSED, Event::MERGED],
"Note", Event::COMMENTED)
end
def limit_recent(limit = 20, offset = nil)
recent.limit(limit).offset(offset)
end
def actions
ACTIONS.keys
end
def target_types
TARGET_TYPES.keys
end
end
def visible_to_user?(user = nil)
if push? || commit_note?
Ability.allowed?(user, :download_code, project)
elsif membership_changed?
true
elsif created_project?
true
elsif issue? || issue_note?
Ability.allowed?(user, :read_issue, note? ? note_target : target)
elsif merge_request? || merge_request_note?
Ability.allowed?(user, :read_merge_request, note? ? note_target : target)
else
milestone?
end
end
def project_name
if project
project.name_with_namespace
else
"(deleted project)"
end
end
def target_title
target.try(:title)
end
def created?
action == CREATED
end
def push?
action == PUSHED && valid_push?
end
def merged?
action == MERGED
end
def closed?
action == CLOSED
end
def reopened?
action == REOPENED
end
def joined?
action == JOINED
end
def left?
action == LEFT
end
def expired?
action == EXPIRED
end
def destroyed?
action == DESTROYED
end
def commented?
action == COMMENTED
end
def membership_changed?
joined? || left? || expired?
end
def created_project?
created? && !target && target_type.nil?
end
def created_target?
created? && target
end
def milestone?
target_type == "Milestone"
end
def note?
target.is_a?(Note)
end
def issue?
target_type == "Issue"
end
def merge_request?
target_type == "MergeRequest"
end
def milestone
target if milestone?
end
def issue
target if issue?
end
def merge_request
target if merge_request?
end
def note
target if note?
end
def action_name
if push?
if new_ref?
"pushed new"
elsif rm_ref?
"deleted"
else
"pushed to"
end
elsif closed?
"closed"
elsif merged?
"accepted"
elsif joined?
'joined'
elsif left?
'left'
elsif expired?
'removed due to membership expiration from'
elsif destroyed?
'destroyed'
elsif commented?
"commented on"
elsif created_project?
if project.external_import?
"imported"
else
"created"
end
else
"opened"
end
end
def valid_push?
data[:ref] && ref_name.present?
rescue
false
end
def tag?
Gitlab::Git.tag_ref?(data[:ref])
end
def branch?
Gitlab::Git.branch_ref?(data[:ref])
end
def new_ref?
Gitlab::Git.blank_ref?(commit_from)
end
def rm_ref?
Gitlab::Git.blank_ref?(commit_to)
end
def md_ref?
!(rm_ref? || new_ref?)
end
def commit_from
data[:before]
end
def commit_to
data[:after]
end
def ref_name
if tag?
tag_name
else
branch_name
end
end
def branch_name
@branch_name ||= Gitlab::Git.ref_name(data[:ref])
end
def tag_name
@tag_name ||= Gitlab::Git.ref_name(data[:ref])
end
# Max 20 commits from push DESC
def commits
@commits ||= (data[:commits] || []).reverse
end
def commit_title
commit = commits.last
commit[:message] if commit
end
def commit_id
commit_to || commit_from
end
def commits_count
data[:total_commits_count] || commits.count || 0
end
def ref_type
tag? ? "tag" : "branch"
end
def push_with_commits?
!commits.empty? && commit_from && commit_to
end
def last_push_to_non_root?
branch? && project.default_branch != branch_name
end
def target_iid
target.respond_to?(:iid) ? target.iid : target_id
end
def commit_note?
note? && target && target.for_commit?
end
def issue_note?
note? && target && target.for_issue?
end
def merge_request_note?
note? && target && target.for_merge_request?
end
def project_snippet_note?
note? && target && target.for_snippet?
end
def note_target
target.noteable
end
def note_target_id
if commit_note?
target.commit_id
else
target.noteable_id.to_s
end
end
def note_target_reference
return unless note_target
# Commit#to_reference returns the full SHA, but we want the short one here
if commit_note?
note_target.short_id
else
note_target.to_reference
end
end
def note_target_type
if target.noteable_type.present?
target.noteable_type.titleize
else
"Wall"
end.downcase
end
def body?
if push?
push_with_commits? || rm_ref?
elsif note?
true
else
target.respond_to? :title
end
end
def reset_project_activity
return unless project
# Don't bother updating if we know the project was updated recently.
return if recent_update?
# At this point it's possible for multiple threads/processes to try to
# update the project. Only one query should actually perform the update,
# hence we add the extra WHERE clause for last_activity_at.
Project.unscoped.where(id: project_id)
.where('last_activity_at <= ?', RESET_PROJECT_ACTIVITY_INTERVAL.ago)
.update_all(last_activity_at: created_at)
end
def authored_by?(user)
user ? author_id == user.id : false
end
# We're manually replicating data into the new table since database triggers
# are not dumped to db/schema.rb. This could mean that a new installation
# would not have the triggers in place, thus losing events data in GitLab
# 10.0.
def replicate_event_for_push_events_migration
new_attributes = attributes.with_indifferent_access.except(:title, :data)
EventForMigration.create!(new_attributes)
end
private
def recent_update?
project.last_activity_at > RESET_PROJECT_ACTIVITY_INTERVAL.ago
end
def set_last_repository_updated_at
Project.unscoped.where(id: project_id)
.update_all(last_repository_updated_at: created_at)
end
end