Class: Gitlab::GithubImport::UserFinder

Inherits:
Object
Includes:
ExclusiveLeaseHelpers, Utils::StrongMemoize
Defined in:
lib/gitlab/github_import/user_finder.rb

Overview

Class that can be used for finding a GitLab user ID based on a GitHub user ID or username.

Any found user IDs are cached in Redis to reduce the number of SQL queries executed over time. Valid keys are refreshed upon access so frequently used keys stick around.

Lookups are cached even if no ID was found to remove the need for querying the database when most queries are not going to return results anyway.

Constant Summary

ID_CACHE_KEY = 'github-import/user-finder/user-id/%s'

The base cache key to use for caching user IDs for a given GitHub user ID.

ID_FOR_EMAIL_CACHE_KEY = 'github-import/user-finder/id-for-email/%s'

The base cache key to use for caching user IDs for a given GitHub email address.

EMAIL_FOR_USERNAME_CACHE_KEY = 'github-import/user-finder/email-for-username/%s'

The base cache key to use for caching the email addresses of GitHub usernames.

USERNAME_ETAG_CACHE_KEY = 'github-import/user-finder/user-etag/%s'

The base cache key to use for caching the user ETAG response headers.

EMAIL_FETCHED_FOR_PROJECT_CACHE_KEY = 'github-import/user-finder/%{project}/email-fetched/%{username}'

The base cache key to store whether an email has been fetched for a project.

SOURCE_NAME_CACHE_KEY = 'github-import/user-finder/%{project}/source-name/%{username}'

EMAIL_API_CALL_LOGGING_MESSAGE = {
  true => 'Fetching email from GitHub with ETAG header',
  false => 'Fetching email from GitHub'
}.freeze
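The cache keys are Ruby format strings: `%s` templates take a single positional value, while `%{project}`/`%{username}` templates take named values. A minimal sketch of how such keys expand, assuming a numeric project ID is what gets interpolated and assuming the logging messages are selected by whether an ETAG is present (both assumptions; the values below are illustrative):

```ruby
# Constants copied from the documentation above.
ID_CACHE_KEY = 'github-import/user-finder/user-id/%s'
EMAIL_FETCHED_FOR_PROJECT_CACHE_KEY =
  'github-import/user-finder/%{project}/email-fetched/%{username}'
EMAIL_API_CALL_LOGGING_MESSAGE = {
  true => 'Fetching email from GitHub with ETAG header',
  false => 'Fetching email from GitHub'
}.freeze

# Positional %s template: String#% with a single value.
id_key = ID_CACHE_KEY % 12_345

# Named %{...} template: Kernel#format with matching keyword keys.
# Assumption: the project's numeric ID fills the %{project} slot.
fetched_key = format(EMAIL_FETCHED_FOR_PROJECT_CACHE_KEY, project: 42, username: 'octocat')

# Hypothetical usage: pick the log message by whether an ETAG is available.
etag = 'W/"abc123"'
message = EMAIL_API_CALL_LOGGING_MESSAGE[!etag.nil?]

puts id_key      # github-import/user-finder/user-id/12345
puts fetched_key # github-import/user-finder/42/email-fetched/octocat
puts message     # Fetching email from GitHub with ETAG header
```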

Constants included from ExclusiveLeaseHelpers

ExclusiveLeaseHelpers::FailedToObtainLockError

Methods included from ExclusiveLeaseHelpers

#in_lock

Constructor Details

#initialize(project, client) ⇒ UserFinder

project - An instance of Project
client - An instance of Gitlab::GithubImport::Client



# File 'lib/gitlab/github_import/user_finder.rb', line 44

def initialize(project, client)
  @project = project
  @client = client
end

Instance Attribute Details

#client ⇒ Object (readonly)

Returns the value of attribute client.



# File 'lib/gitlab/github_import/user_finder.rb', line 18

def client
  @client
end

#project ⇒ Object (readonly)

Returns the value of attribute project.



# File 'lib/gitlab/github_import/user_finder.rb', line 18

def project
  @project
end

Instance Method Details

#author_id_for(object, author_key: :author) ⇒ Object

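Returns the GitLab user ID of an object's author as a two-element Array: [user_id, true] when an ID was found, or [project.creator_id, false] otherwise.

If the object has no author, we'll use the ID of the GitLab ghost user.

object - An instance of Hash or a Github::Representer
author_key - The key to read the user from: :actor, :review_requester, or :author (the default)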



# File 'lib/gitlab/github_import/user_finder.rb', line 54

def author_id_for(object, author_key: :author)
  user_info = case author_key
              when :actor
                object[:actor]
              when :review_requester
                object[:review_requester]
              else
                object ? object[:author] : nil
              end

  # TODO when improved user mapping is released we can refactor everything below to just
  # user_id_for(user_info)
  id = user_id_for(user_info, ghost: true)

  if id
    [id, true]
  else
    [project.creator_id, false]
  end
end
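The author lookup boils down to choosing which key of the object holds the user, then falling back to the project creator when no GitLab ID comes back. A minimal sketch, assuming plain Ruby hashes for the GitHub object and a lambda standing in for `#user_id_for` (`resolve_author`, `lookup` and `creator_id` are hypothetical names):

```ruby
# Hypothetical stand-alone version of the author_key dispatch in #author_id_for.
# `lookup` stands in for UserFinder#user_id_for(user, ghost: true).
def resolve_author(object, lookup:, creator_id:, author_key: :author)
  user_info =
    case author_key
    when :actor then object[:actor]
    when :review_requester then object[:review_requester]
    else object ? object[:author] : nil
    end

  id = lookup.call(user_info)
  # [id, true] when a GitLab user was resolved, otherwise fall back to the creator.
  id ? [id, true] : [creator_id, false]
end

known = { 'octocat' => 7 } # GitHub login => GitLab user ID (illustrative)
lookup = ->(user) { user && known[user[:login]] }

issue = { author: { id: 1, login: 'octocat' } }
event = { actor: { id: 2, login: 'stranger' } }

p resolve_author(issue, lookup: lookup, creator_id: 99)                     # [7, true]
p resolve_author(event, lookup: lookup, creator_id: 99, author_key: :actor) # [99, false]
```

The boolean in the result lets callers tell a real author apart from the creator fallback.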

#cached_id_for_github_email(email) ⇒ Object



# File 'lib/gitlab/github_import/user_finder.rb', line 233

def cached_id_for_github_email(email)
  read_id_from_cache(ID_FOR_EMAIL_CACHE_KEY % email)
end

#cached_id_for_github_id(id) ⇒ Object



# File 'lib/gitlab/github_import/user_finder.rb', line 229

def cached_id_for_github_id(id)
  read_id_from_cache(ID_CACHE_KEY % id)
end

#email_for_github_username(username) ⇒ String, Nil

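Returns the public email address of a GitHub user. The email is read from the cache when possible; otherwise it is fetched from GitHub under an exclusive lease, using a cached ETAG when one is available.

Parameters:

  • username (String): The username of the GitHub user.

Returns:

  • (String) If a public email is found

  • (Nil) If the public email or the username does not exist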



# File 'lib/gitlab/github_import/user_finder.rb', line 190

def email_for_github_username(username)
  email = read_email_from_cache(username)

  if email.blank? && !email_fetched_for_project?(username)
    in_lock(lease_key(username), sleep_sec: 0.2.seconds, retries: 30) do |retried|
      # when retried, check the cache again as the other process that had the lease may have fetched the email
      if retried
        email = read_email_from_cache(username)

        # early return if the other process fetched a non-empty email. If the email is empty, we'll attempt to
        # fetch it again in the lines below, but using the ETAG cached by the other process which won't count to
        # the rate limit.
        next email if email.present?
      end

      # If an ETAG is available, make an API call with the ETAG.
      # Only make a rate-limited API call if the ETAG is not available and the email is nil.
      etag = read_etag_from_cache(username)
      email = fetch_email_from_github(username, etag: etag) || email

      cache_email!(username, email)
      cache_etag!(username) if email.blank? && etag.nil?

      # If a non-blank email is cached, we don't need the ETAG or project check caches.
      # Otherwise, indicate that the project has been checked.
      if email.present?
        clear_caches!(username)
      else
        set_project_as_checked!(username)
      end
    end
  end

  email.presence
rescue ::Octokit::NotFound
  cache_email!(username, '')
  nil
end
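The ETAG handling pays off on repeated lookups of users without a public email: as the source comments note, a request carrying the cached ETAG does not count toward the rate limit. A minimal sketch of that flow, assuming a Hash in place of Redis and a hypothetical `FakeGithubUsers` class in place of the GitHub client; the exclusive lease and the per-project "already checked" flag are deliberately left out:

```ruby
# Hypothetical stand-in for the GitHub "get a user" endpoint that honours ETAGs.
# Only full responses are counted as rate-limited; "not modified" answers are free.
class FakeGithubUsers
  attr_reader :rate_limited_calls

  def initialize(emails)
    @emails = emails # username => public email, or nil when it is private
    @rate_limited_calls = 0
  end

  # Returns [:not_modified, etag] when the caller's ETAG matches,
  # otherwise [email, etag].
  def user(username, etag: nil)
    current_etag = "etag-#{username}"
    return [:not_modified, current_etag] if etag == current_etag

    @rate_limited_calls += 1
    [@emails[username], current_etag]
  end
end

# Simplified email lookup modelled on #email_for_github_username.
def email_for(username, api:, cache:)
  email = cache[[:email, username]]
  return email unless email.nil? || email.empty?

  etag = cache[[:etag, username]]
  fetched, new_etag = api.user(username, etag: etag)
  email = fetched unless fetched == :not_modified

  cache[[:email, username]] = email.to_s
  # Like the documented method, remember the ETAG only when no email was
  # found and no ETAG had been cached yet.
  cache[[:etag, username]] = new_etag if email.to_s.empty? && etag.nil?

  email.to_s.empty? ? nil : email
end

api = FakeGithubUsers.new('octocat' => 'octocat@example.com', 'private' => nil)
cache = {}

email_for('octocat', api: api, cache: cache) # => "octocat@example.com"
email_for('octocat', api: api, cache: cache) # served from the cache
email_for('private', api: api, cache: cache) # => nil, ETAG cached
email_for('private', api: api, cache: cache) # => nil, conditional request only
p api.rate_limited_calls                     # 2
```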

#fetch_source_name_from_github(username) ⇒ String

Retrieves the name of the user associated with a specified GitHub username.

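To prevent multiple concurrent requests for the same user, an exclusive lock is used. The name is cached to avoid multiple calls to GitHub. If the user does not exist on GitHub, the username is returned instead.

Parameters:

  • username (String): GitHub username

Returns:

  • (String): Name of the user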



# File 'lib/gitlab/github_import/user_finder.rb', line 124

def fetch_source_name_from_github(username)
  in_lock(lease_key(username), sleep_sec: 0.2.seconds, retries: 30) do |retried|
    if retried
      source_name = read_source_name_from_cache(username)

      next source_name if source_name.present?
    end

    begin
      user = client.user(username)
      source_name = user.fetch(:name, username)
    rescue ::Octokit::NotFound => error
      log("GitHub user not found. #{error.message}", username: username)

      source_name = username
    end

    cache_source_name(username, source_name)

    source_name
  end
end
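The method follows a "lock, then re-check the cache" pattern so that concurrent workers asking for the same username make only one GitHub call. A minimal in-process sketch, assuming a per-username `Mutex` in place of the Redis-backed exclusive lease and a Hash in place of the cache; the class name `SourceNameFetcher` and its methods are hypothetical. Unlike `in_lock`, which retries up to 30 times with 0.2 second sleeps and re-reads the cache only after retrying, every caller here re-checks once it holds the mutex:

```ruby
# Hypothetical in-process analogue of #fetch_source_name_from_github.
class SourceNameFetcher
  attr_reader :api_calls

  def initialize(names)
    @names = names # username => display name; missing key means "not found"
    @cache = {}
    @locks = Hash.new { |h, k| h[k] = Mutex.new }
    @locks_guard = Mutex.new
    @counter_guard = Mutex.new
    @api_calls = 0
  end

  def fetch(username)
    lock_for(username).synchronize do
      # Another thread may have filled the cache while we were waiting.
      cached = @cache[username]
      next cached unless cached.nil?

      @cache[username] = call_api(username)
    end
  end

  private

  def lock_for(username)
    @locks_guard.synchronize { @locks[username] }
  end

  def call_api(username)
    @counter_guard.synchronize { @api_calls += 1 }
    sleep 0.01 # simulate network latency
    # Fall back to the username when the user is unknown, as on Octokit::NotFound.
    @names.fetch(username, username)
  end
end

fetcher = SourceNameFetcher.new('octocat' => 'The Octocat')
threads = Array.new(5) { Thread.new { fetcher.fetch('octocat') } }
p threads.map(&:value).uniq   # ["The Octocat"]
p fetcher.api_calls           # 1
p fetcher.fetch('ghost-user') # "ghost-user"
```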

#find(id, username) ⇒ Object

Returns the GitLab ID for the given GitHub ID or username.

id - The ID of the GitHub user.
username - The username of the GitHub user.



# File 'lib/gitlab/github_import/user_finder.rb', line 151

def find(id, username)
  email = email_for_github_username(username)
  cached, found_id = find_from_cache(id, email)

  return found_id if found_id

  # We only want to query the database if necessary. If previous lookups
  # didn't yield a user ID we won't query the database again until the
  # keys expire.
  find_id_from_database(id, email) unless cached
end

#find_from_cache(id, email = nil) ⇒ Object

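Finds a user ID from the cache for a given GitHub ID or email.

Returns an Array in the format of #read_id_from_cache. The email cache is only consulted when no user ID is cached for the GitHub ID; if no email is given either, [false] is returned.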



# File 'lib/gitlab/github_import/user_finder.rb', line 164

def find_from_cache(id, email = nil)
  id_exists, id_for_github_id = cached_id_for_github_id(id)

  return [id_exists, id_for_github_id] if id_for_github_id

  # Just in case no Email address could be retrieved (for whatever reason)
  return [false] unless email

  cached_id_for_github_email(email)
end

#find_id_from_database(id, email) ⇒ Object

Finds a GitLab user ID from the database for a given GitHub user ID or email, trying the GitHub user ID first.



# File 'lib/gitlab/github_import/user_finder.rb', line 177

def find_id_from_database(id, email)
  id_for_github_id(id) || id_for_github_email(email)
end
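Together, #find, #find_from_cache and #find_id_from_database form a cache-aside lookup that also caches misses, so a user with no GitLab account costs database queries only once per cache lifetime. A minimal sketch of that order of operations, assuming a Hash (where a key stored with a nil value means "looked up, nothing found") in place of Redis, and two hypothetical Hash-backed stand-ins, `db_by_github_id` and `db_by_email`, in place of the SQL queries:

```ruby
# Hypothetical in-memory model of UserFinder#find and its helpers.
class CachedUserLookup
  attr_reader :queries

  def initialize(db_by_github_id:, db_by_email:)
    @db_by_github_id = db_by_github_id
    @db_by_email = db_by_email
    @cache = {}
    @queries = 0
  end

  def find(github_id, email)
    cached, found_id = find_from_cache(github_id, email)
    return found_id if found_id
    return nil if cached # a previous lookup already came up empty

    find_id_from_database(github_id, email)
  end

  private

  def find_from_cache(github_id, email)
    id_key = [:id, github_id]
    return [true, @cache[id_key]] if @cache[id_key]
    return [false, nil] unless email

    email_key = [:email, email]
    [@cache.key?(email_key), @cache[email_key]]
  end

  def find_id_from_database(github_id, email)
    query_and_cache([:id, github_id]) { @db_by_github_id[github_id] } ||
      query_and_cache([:email, email]) { @db_by_email[email] }
  end

  def query_and_cache(key)
    @queries += 1
    @cache[key] = yield # nil is stored too, so misses are cached
  end
end

lookup = CachedUserLookup.new(
  db_by_github_id: { 1 => 10 },
  db_by_email: { 'jane@example.com' => 20 }
)

lookup.find(1, nil)                  # => 10 (one query)
lookup.find(1, nil)                  # => 10 (cache hit)
lookup.find(2, 'jane@example.com')   # => 20 (ID miss, then email hit)
lookup.find(3, 'nobody@example.com') # => nil (both misses cached)
lookup.find(3, 'nobody@example.com') # => nil (no new queries)
p lookup.queries                     # 5
```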

#id_for_github_email(email) ⇒ Object

Queries the GitLab user ID for a GitHub email and caches the result. A miss is cached as well, so the database is not queried again for that email until the key expires.



# File 'lib/gitlab/github_import/user_finder.rb', line 255

def id_for_github_email(email)
  gitlab_id = query_id_for_github_email(email) || nil

  Gitlab::Cache::Import::Caching.write(ID_FOR_EMAIL_CACHE_KEY % email, gitlab_id)
end

#id_for_github_id(id) ⇒ Object

If importing from github.com, queries the GitLab user ID for a GitHub user ID and caches the result, including a miss.

When importing from GitHub Enterprise, the user is not queried by GitHub ID, because users' GitHub IDs are only known for github.com; a nil value is cached instead.



# File 'lib/gitlab/github_import/user_finder.rb', line 242

def id_for_github_id(id)
  gitlab_id =
    if project.github_enterprise_import?
      nil
    else
      query_id_for_github_id(id)
    end

  Gitlab::Cache::Import::Caching.write(ID_CACHE_KEY % id, gitlab_id)
end
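The github.com / GitHub Enterprise branch is easiest to see with the collaborators swapped out. A minimal sketch, assuming a lambda `query` in place of the provider/extern_uid SQL lookup and a Hash `cache` in place of Redis (both hypothetical); the key format is copied from ID_CACHE_KEY:

```ruby
ID_CACHE_KEY = 'github-import/user-finder/user-id/%s'

# Hypothetical stand-alone version of #id_for_github_id.
def id_for_github_id(github_id, enterprise:, query:, cache:)
  # GitHub Enterprise IDs cannot be matched against github.com identities,
  # so the database is never queried for them.
  gitlab_id = enterprise ? nil : query.call(github_id)

  # The result is cached either way; nil records a miss.
  cache[ID_CACHE_KEY % github_id] = gitlab_id
end

identities = { 583_231 => 42 } # github.com user ID => GitLab user ID (illustrative)
query = ->(id) { identities[id] }

p id_for_github_id(583_231, enterprise: false, query: query, cache: {}) # 42
p id_for_github_id(583_231, enterprise: true, query: query, cache: {})  # nil
```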

#query_id_for_github_email(email) ⇒ Object



# File 'lib/gitlab/github_import/user_finder.rb', line 265

def query_id_for_github_email(email)
  User.by_any_email(email).pick(:id)
end

#query_id_for_github_id(id) ⇒ Object



# File 'lib/gitlab/github_import/user_finder.rb', line 261

def query_id_for_github_id(id)
  User.by_provider_and_extern_uid(:github, id).select(:id).first&.id
end

#read_id_from_cache(key) ⇒ Object

Reads an ID from the cache.

The return value is an Array with two values:

  1. A boolean indicating if the key was present or not.
  2. The ID as an Integer, or nil in case no ID could be found.


# File 'lib/gitlab/github_import/user_finder.rb', line 275

def read_id_from_cache(key)
  value = Gitlab::Cache::Import::Caching.read(key)
  exists = !value.nil?
  number = value.to_i

  # The cache key may be empty to indicate a previously looked up user for
  # which we couldn't find an ID.
  [exists, number > 0 ? number : nil]
end
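The parsing step distinguishes three cache states. A minimal sketch of the same logic on raw values, assuming the cache hands back strings or nil as a Redis read would (`parse_cached_id` is a hypothetical name):

```ruby
# Same logic as #read_id_from_cache, applied to a raw cache value.
def parse_cached_id(value)
  exists = !value.nil?
  number = value.to_i # nil.to_i and "".to_i are both 0
  [exists, number > 0 ? number : nil]
end

p parse_cached_id(nil)  # [false, nil]  never looked up
p parse_cached_id('')   # [true, nil]   looked up, no GitLab user found
p parse_cached_id('42') # [true, 42]
```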

#source_user(user) ⇒ Object

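Returns the source user record for a GitHub user. The record is looked up by GitHub user ID and, if it does not exist yet, created with the name fetched from GitHub. Its placeholder or reassigned_to user provides the GitLab user ID.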



# File 'lib/gitlab/github_import/user_finder.rb', line 97

def source_user(user)
  source_user = source_user_mapper.find_source_user(user[:id])

  return source_user if source_user

  source_user_mapper.find_or_create_source_user(
    source_name: fetch_source_name_from_github(user[:login]),
    source_username: user[:login],
    source_user_identifier: user[:id]
  )
end
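The lookup is a find-or-create, and the GitHub name request only happens on the create path. A minimal sketch, assuming a Hash-backed mapper; `FakeSourceUserMapper`, `SourceUser` and the `fetch_name` lambda are hypothetical, and only the keyword names passed to `find_or_create_source_user` come from the documented code:

```ruby
SourceUser = Struct.new(:source_user_identifier, :source_username, :source_name, keyword_init: true)

# Hypothetical stand-in for the source user mapper.
class FakeSourceUserMapper
  attr_reader :created

  def initialize
    @records = {}
    @created = 0
  end

  def find_source_user(identifier)
    @records[identifier]
  end

  def find_or_create_source_user(source_name:, source_username:, source_user_identifier:)
    @records[source_user_identifier] ||= begin
      @created += 1
      SourceUser.new(source_user_identifier: source_user_identifier,
                     source_username: source_username,
                     source_name: source_name)
    end
  end
end

# Mirrors #source_user: look up first, fetch the name only when creating.
def source_user(user, mapper:, fetch_name:)
  existing = mapper.find_source_user(user[:id])
  return existing if existing

  mapper.find_or_create_source_user(
    source_name: fetch_name.call(user[:login]),
    source_username: user[:login],
    source_user_identifier: user[:id]
  )
end

mapper = FakeSourceUserMapper.new
name_lookups = 0
fetch_name = ->(login) { name_lookups += 1; "Name of #{login}" }

octocat = { id: 1, login: 'octocat' }
first = source_user(octocat, mapper: mapper, fetch_name: fetch_name)
second = source_user(octocat, mapper: mapper, fetch_name: fetch_name)
p first.equal?(second) # true
p name_lookups         # 1
```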

#source_user_accepted?(user) ⇒ Boolean

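Returns true if user contribution mapping (UCM) is not enabled, if contributions are mapped to the personal namespace owner, or if the GitLab user has accepted their reassignment.

Returns:

  • (Boolean)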



# File 'lib/gitlab/github_import/user_finder.rb', line 110

def source_user_accepted?(user)
  return true unless user_mapping_enabled?
  return true if map_to_personal_namespace_owner?

  source_user(user).accepted_status?
end

#user_id_for(user, ghost: true) ⇒ Integer, NilClass

Returns the GitLab user ID for a GitHub user. Can return nil if ghost is false. The ghost: false argument is used to avoid assigning ghost users as assignees or reviewers.

Parameters:

  • ghost (Boolean) (defaults to: true)

    Determines what to do if user is nil or is the GitHub ghost. If true, the ID of the GitLab ghost is returned. If false, nil is returned.

Returns:

  • (Integer, NilClass)



# File 'lib/gitlab/github_import/user_finder.rb', line 83

def user_id_for(user, ghost: true)
  # user[:login] == 'ghost' here refers to the Github username
  if user.nil? || user[:login].nil? || user[:login] == 'ghost'
    return ghost ? GithubImport.ghost_user_id(project.organization_id) : nil
  end

  return find(user[:id], user[:login]) unless user_mapping_enabled?

  return project.root_ancestor.owner_id if map_to_personal_namespace_owner?

  source_user(user).mapped_user_id
end
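The order of checks in #user_id_for and #source_user_accepted? matters: the ghost check comes first, then the legacy lookup when user mapping is off, then the personal-namespace shortcut, and only then the source user. A minimal sketch with every collaborator replaced by a plain value; `UserIdResolver`, `SourceUserStub`, `legacy_find`, `personal_namespace` and the other constructor arguments are hypothetical stand-ins, and only the two method names come from the documentation:

```ruby
SourceUserStub = Struct.new(:mapped_user_id, :accepted) do
  def accepted_status?
    accepted
  end
end

class UserIdResolver
  def initialize(ghost_id:, owner_id:, mapping_enabled:, personal_namespace:, legacy_find:, source_users:)
    @ghost_id = ghost_id
    @owner_id = owner_id
    @mapping_enabled = mapping_enabled
    @personal_namespace = personal_namespace
    @legacy_find = legacy_find   # stands in for #find(id, login)
    @source_users = source_users # GitHub user ID => SourceUserStub
  end

  def user_id_for(user, ghost: true)
    # A missing user or GitHub's own "ghost" account maps to GitLab's ghost user.
    if user.nil? || user[:login].nil? || user[:login] == 'ghost'
      return ghost ? @ghost_id : nil
    end

    return @legacy_find.call(user[:id], user[:login]) unless @mapping_enabled
    return @owner_id if @personal_namespace

    @source_users.fetch(user[:id]).mapped_user_id
  end

  def source_user_accepted?(user)
    return true unless @mapping_enabled
    return true if @personal_namespace

    @source_users.fetch(user[:id]).accepted_status?
  end
end

resolver = UserIdResolver.new(
  ghost_id: 1, owner_id: 2, mapping_enabled: true, personal_namespace: false,
  legacy_find: ->(_id, _login) { nil },
  source_users: { 100 => SourceUserStub.new(500, false) }
)

p resolver.user_id_for(nil)                              # 1 (GitLab ghost)
p resolver.user_id_for({ login: 'ghost' }, ghost: false) # nil
p resolver.user_id_for({ id: 100, login: 'octocat' })    # 500
p resolver.source_user_accepted?({ id: 100 })            # false
```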