Class: Lentil::InstagramHarvester

Inherits:
Object
Defined in:
lib/lentil/instagram_harvester.rb

Overview

A collection of methods for querying the Instagram API and importing metadata.


Instance Method Details

#configure_comment_connection(access_token = nil) ⇒ Object

Configure the Instagram class in preparation for leaving comments.

Parameters:

  • access_token (String) (defaults to: nil)

    Instagram access token for the writing account; if nil, the instagram_access_token value from the application config is used



# File 'lib/lentil/instagram_harvester.rb', line 40

def configure_comment_connection(access_token = nil)
  access_token ||= Lentil::Engine::APP_CONFIG["instagram_access_token"] || nil
  raise "instagram_access_token must be defined as a parameter or in the application config" unless access_token
  configure_connection({'access_token' => access_token})
end
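
The token lookup above checks the explicit argument first, then the application config, and raises if neither is set. The sketch below reproduces just that resolution order in plain Ruby. `resolve_access_token` and the `config` Hash are hypothetical stand-ins for the method body and `Lentil::Engine::APP_CONFIG`; no Instagram client is configured here.

```ruby
# Hypothetical helper mirroring configure_comment_connection's token lookup.
# `config` stands in for Lentil::Engine::APP_CONFIG (assumed to be a Hash).
def resolve_access_token(access_token, config)
  # An explicitly passed token wins; otherwise fall back to the config value.
  access_token ||= config["instagram_access_token"]
  raise "instagram_access_token must be defined as a parameter or in the application config" unless access_token
  access_token
end

config = { "instagram_access_token" => "token-from-config" }
resolve_access_token(nil, config)        # => "token-from-config"
resolve_access_token("explicit", config) # => "explicit"
```

Because the check is a plain truthiness test, an empty string token would pass it; only nil (or false) triggers the error.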

#configure_connection(opts = {}) ⇒ Object

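Configure the Instagram class in preparation for requests.

Parameters:

  • opts (Hash) (defaults to: {})

    Connection options; any of 'client_id', 'client_secret', or 'access_token' not supplied are read from the application config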



# File 'lib/lentil/instagram_harvester.rb', line 20

def configure_connection(opts = {})
  opts['client_id'] ||= Lentil::Engine::APP_CONFIG["instagram_client_id"]
  opts['client_secret'] ||= Lentil::Engine::APP_CONFIG["instagram_client_secret"]
  opts['access_token'] ||= Lentil::Engine::APP_CONFIG["instagram_access_token"] || nil

  Instagram.configure do |config|
    config.client_id = opts['client_id']
    config.client_secret = opts['client_secret']

    if (opts['access_token'])
      config.access_token = opts['access_token']
    end
  end
end
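
The option handling above fills each missing key from the application config and only sets an access token when one is found. The sketch below models that behaviour without the Instagram gem: `FakeConfig` is a hypothetical Struct standing in for the object `Instagram.configure` yields, and `APP_CONFIG` is a hypothetical Hash standing in for `Lentil::Engine::APP_CONFIG`.

```ruby
# Hypothetical stand-ins: FakeConfig mimics the object yielded by
# Instagram.configure, and APP_CONFIG mimics Lentil::Engine::APP_CONFIG.
FakeConfig = Struct.new(:client_id, :client_secret, :access_token)

APP_CONFIG = {
  "instagram_client_id" => "app-id",
  "instagram_client_secret" => "app-secret"
  # deliberately no "instagram_access_token" entry
}

def configure_connection(opts = {}, config = FakeConfig.new)
  # Fill in any option the caller did not supply from the app config.
  opts['client_id'] ||= APP_CONFIG["instagram_client_id"]
  opts['client_secret'] ||= APP_CONFIG["instagram_client_secret"]
  opts['access_token'] ||= APP_CONFIG["instagram_access_token"]

  config.client_id = opts['client_id']
  config.client_secret = opts['client_secret']
  # Only set a token when one was found; otherwise leave the existing value.
  config.access_token = opts['access_token'] if opts['access_token']
  config
end

cfg = configure_connection
cfg.client_id    # => "app-id"
cfg.access_token # => nil
```

Like the original, this sketch writes the resolved values back into the `opts` Hash the caller passed in.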

#extract_image_data(instagram_metadata) ⇒ Hash

Produce processed image metadata from Instagram metadata. This metadata is accepted by the save_image method.

Parameters:

  • instagram_metadata (Hashie::Mash)

    Metadata for a single image, as returned by the Instagram API

Returns:

  • (Hash)

    processed image metadata



# File 'lib/lentil/instagram_harvester.rb', line 94

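def extract_image_data(instagram_metadata)
  {
    url: instagram_metadata.link,
    external_id: instagram_metadata.id,
    large_url: instagram_metadata.images.standard_resolution.url,
    name: instagram_metadata.caption && instagram_metadata.caption.text,
    tags: instagram_metadata.tags,
    user: instagram_metadata.user,
    original_datetime: Time.at(instagram_metadata.created_time.to_i).to_datetime,
    original_metadata: instagram_metadata,
    media_type: instagram_metadata.type,
    video_url: instagram_metadata.videos && instagram_metadata.videos.standard_resolution.url
  }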
end
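
The mapping above flattens nested API metadata into the Hash that save_image expects, guarding the optional caption and videos fields with `&&` and converting the Unix `created_time` into a DateTime. The sketch below exercises the same mapping against hypothetical Struct stand-ins for the Hashie::Mash objects the Instagram gem returns; the field values and example.com URLs are made up.

```ruby
require 'date'

# Hypothetical Struct stand-ins for the Hashie::Mash objects returned by the
# Instagram gem; only the fields read by extract_image_data are modelled.
Resolution = Struct.new(:url)
Images     = Struct.new(:standard_resolution)
Caption    = Struct.new(:text)
Media      = Struct.new(:link, :id, :images, :caption, :tags, :user,
                        :created_time, :type, :videos)

def extract_image_data(instagram_metadata)
  m = instagram_metadata
  {
    url: m.link,
    external_id: m.id,
    large_url: m.images.standard_resolution.url,
    # caption and videos may be nil, so guard before dereferencing
    name: m.caption && m.caption.text,
    tags: m.tags,
    user: m.user,
    # created_time arrives as a Unix timestamp string
    original_datetime: Time.at(m.created_time.to_i).to_datetime,
    original_metadata: m,
    media_type: m.type,
    video_url: m.videos && m.videos.standard_resolution.url
  }
end

photo = Media.new("https://example.com/p/abc/", "123_456",
                  Images.new(Resolution.new("https://example.com/abc.jpg")),
                  nil, ["lentil"], { "username" => "someone" },
                  "1400000000", "image", nil)
data = extract_image_data(photo)
data[:name]      # => nil (no caption)
data[:video_url] # => nil (not a video)
```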

#fetch_image_by_id(image_id) ⇒ Hashie::Mash

Queries the Instagram API for the image metadata associated with a given ID.

Parameters:

  • image_id (String)

    Instagram image ID

Returns:

  • (Hashie::Mash)

    Data returned by the Instagram API



# File 'lib/lentil/instagram_harvester.rb', line 63

def fetch_image_by_id(image_id)
  configure_connection
  Instagram.media_item(image_id)
end

#fetch_recent_images_by_tag(tag = nil) ⇒ Hashie::Mash

Queries the Instagram API for recent images with a given tag, requesting up to 100 results.

Parameters:

  • tag (String) (defaults to: nil)

    The tag to query by; if nil, the default_image_search_tag value from the application config is used

Returns:

  • (Hashie::Mash)

    The data returned by the Instagram API



# File 'lib/lentil/instagram_harvester.rb', line 51

def fetch_recent_images_by_tag(tag = nil)
  configure_connection
  tag ||= Lentil::Engine::APP_CONFIG["default_image_search_tag"]
  Instagram.tag_recent_media(tag, :count=>100)
end

#harvest_image_data(image) ⇒ String

Retrieve the binary image data for a given Image object.

Parameters:

  • image (Image)

    An Image model object from the Instagram service

Returns:

  • (String)

    Binary image data

Raises:

  • (Exception)

    If there are request problems



# File 'lib/lentil/instagram_harvester.rb', line 206

def harvest_image_data(image)
  response = Typhoeus.get(image.large_url(false), followlocation: true)

  if response.success?
    raise "Invalid content type: #{response.headers['Content-Type']}" unless response.headers['Content-Type'] == 'image/jpeg'
  elsif response.timed_out?
    raise "Request timed out"
  elsif response.code == 0
    raise "Could not get an HTTP response"
  else
    raise "HTTP request failed: " + response.code.to_s
  end

  response.body
end
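
The harvest methods classify a response in a fixed order: success (then check the content type), timeout, no HTTP response at all (code 0), and finally any other HTTP failure. The sketch below reproduces that branch order without making a request. `FakeResponse` is a hypothetical Struct standing in for `Typhoeus::Response`, and `checked_body` is an illustrative name, not part of Lentil. It builds the message with string interpolation, which yields "Invalid content type: " rather than a TypeError when the header is missing.

```ruby
# Hypothetical stand-in for Typhoeus::Response exposing only what the
# harvest methods read.
FakeResponse = Struct.new(:success, :timed_out, :code, :headers, :body) do
  def success?;   success;   end
  def timed_out?; timed_out; end
end

# Mirrors the branch order in harvest_image_data and harvest_video_data.
def checked_body(response, expected_type)
  if response.success?
    content_type = response.headers['Content-Type']
    raise "Invalid content type: #{content_type}" unless content_type == expected_type
  elsif response.timed_out?
    raise "Request timed out"
  elsif response.code == 0
    raise "Could not get an HTTP response"
  else
    raise "HTTP request failed: #{response.code}"
  end
  response.body
end

ok = FakeResponse.new(true, false, 200, { 'Content-Type' => 'image/jpeg' }, "fake-jpeg-bytes")
checked_body(ok, 'image/jpeg') # returns "fake-jpeg-bytes"
```

Passing 'video/mp4' as `expected_type` gives the check used by harvest_video_data.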

#harvest_video_data(image) ⇒ String

Retrieve the binary video data for a given Image object.

Parameters:

  • image (Image)

    An Image model object from the Instagram service

Returns:

  • (String)

    Binary video data

Raises:

  • (Exception)

    If there are request problems



# File 'lib/lentil/instagram_harvester.rb', line 230

def harvest_video_data(image)
  response = Typhoeus.get(image.video_url, followlocation: true)

  if response.success?
    raise "Invalid content type: #{response.headers['Content-Type']}" unless response.headers['Content-Type'] == 'video/mp4'
  elsif response.timed_out?
    raise "Request timed out"
  elsif response.code == 0
    raise "Could not get an HTTP response"
  else
    raise "HTTP request failed: " + response.code.to_s
  end

  response.body
end

#leave_image_comment(image, comment) ⇒ Hashie::Mash

Leave a comment, such as one containing the donor agreement, on an Instagram image.

Parameters:

  • image (Image)

    An Image model object from the Instagram service

  • comment (String)

    The comment text to post

Returns:

  • (Hashie::Mash)

    Instagram response

Raises:

  • (Exception)

    If a comment submission fails



# File 'lib/lentil/instagram_harvester.rb', line 275

def leave_image_comment(image, comment)
  configure_comment_connection
  Instagram.client.create_media_comment(image.external_identifier, comment)
end

#retrieve_oembed_data_from_url(url) ⇒ String

Retrieves an image's oEmbed metadata from its public URL using the Instagram oEmbed service.

Parameters:

  • url (String)

    The public Instagram image URL

Returns:

  • (OEmbed::Response)

    The Instagram image oEmbed data; its fields hash includes the "media_id" used by save_image_from_url



# File 'lib/lentil/instagram_harvester.rb', line 74

def retrieve_oembed_data_from_url(url)
  OEmbed::Providers::Instagram.get(url)
end
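
An oEmbed response is a set of key/value fields (the oEmbed specification requires at least `version` and `type`), and save_image_from_url reads the `media_id` field from it. The sketch below parses a hypothetical JSON payload with Ruby's standard `json` library to show that lookup; the payload values are made up, and the ruby-oembed gem is not called.

```ruby
require 'json'

# Hypothetical oEmbed payload; real field values come from Instagram's
# oEmbed endpoint. save_image_from_url relies on a "media_id" field.
payload = <<~JSON
  {"version": "1.0", "type": "rich", "media_id": "123_456", "author_name": "someone"}
JSON

fields = JSON.parse(payload)
# fetch raises KeyError if the field is absent, instead of returning nil
media_id = fields.fetch("media_id")
```

Using `fetch` rather than `[]` surfaces a missing `media_id` immediately, instead of passing nil on to fetch_image_by_id.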

#save_image(image_data) ⇒ Image

Takes processed image metadata (as produced by extract_image_data) and adds the image, its user, and its tags to the database.

Parameters:

  • image_data (Hash)

    processed Instagram image metadata

Returns:

  • (Image)

    new Image object

Raises:

  • (DuplicateImageError)

    If the user already has an image with the same external identifier



# File 'lib/lentil/instagram_harvester.rb', line 118

def save_image(image_data)

  instagram_service = Lentil::Service.where(:name => "Instagram").first

  user_record = instagram_service.users.where(:user_name => image_data[:user][:username]).
    first_or_create!({:full_name => image_data[:user][:full_name], :bio => image_data[:user][:bio]})

  raise DuplicateImageError, "Duplicate image identifier" unless user_record.
    images.where(:external_identifier => image_data[:external_id]).first.nil?

  image_record = user_record.images.build({
    :external_identifier => image_data[:external_id],
    :description => image_data[:name],
    :url => image_data[:url],
    :long_url => image_data[:large_url],
    :video_url => image_data[:video_url],
    :original_datetime => image_data[:original_datetime],
    :media_type => image_data[:media_type]
  })

  image_record.original_metadata = image_data[:original_metadata].to_hash

  # Default to "All Rights Reserved" until we find out more about licenses
  # FIXME: Set the default license in the app config
  unless image_record.licenses.size > 0
    image_record.licenses << Lentil::License.where(:short_name => "ARR").first
  end

  image_data[:tags].each {|tag| image_record.tags << Lentil::Tag.where(:name => tag).first_or_create}

  user_record.save!
  image_record.save!
  image_record
end
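
The bookkeeping in save_image amounts to three things: find-or-create the user by user name, reject an external identifier the user already has, and find-or-create each tag. A freshly built record has no licenses, so new images always receive the "ARR" default. The sketch below models those steps in memory. `MemoryStore` and its Hash layout are hypothetical stand-ins for the ActiveRecord models, and `DuplicateImageError` is redefined locally. As in the original, the duplicate check is scoped to the user's own images.

```ruby
# Hypothetical stand-in for Lentil's DuplicateImageError.
class DuplicateImageError < StandardError; end

# Hypothetical in-memory model of save_image's bookkeeping; plain Hashes
# stand in for the ActiveRecord associations used by the real method.
class MemoryStore
  attr_reader :users, :tags

  def initialize
    @users = {} # user_name => { profile:, images: { external_id => image } }
    @tags  = {} # tag name  => tag record
  end

  def save_image(image_data)
    u = image_data[:user]
    # first_or_create!: profile fields are only set when the user is new
    user = (@users[u[:username]] ||= {
      profile: { full_name: u[:full_name], bio: u[:bio] }, images: {}
    })

    # The duplicate check only looks at this user's images
    if user[:images].key?(image_data[:external_id])
      raise DuplicateImageError, "Duplicate image identifier"
    end

    image = {
      external_identifier: image_data[:external_id],
      description: image_data[:name],
      licenses: ["ARR"], # a new record has no licenses, so the default applies
      # first_or_create for each tag name
      tags: image_data[:tags].map { |t| @tags[t] ||= { name: t } }
    }
    user[:images][image_data[:external_id]] = image
  end
end
```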

#save_image_from_url(url) ⇒ Array

Retrieves image metadata via the image's public URL (using the media_id from its oEmbed data) and imports it.

Parameters:

  • url (String)

    The public Instagram image URL

Returns:

  • (Array)

    new image objects



# File 'lib/lentil/instagram_harvester.rb', line 83

def save_image_from_url(url)
  save_instagram_load(fetch_image_by_id(retrieve_oembed_data_from_url(url).fields["media_id"]))
end

#save_instagram_load(instagram_load, raise_dupes = false) ⇒ Array

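Takes the data returned by the Instagram API gem and adds all new images, users, and tags to the database. Accepts either a single image or a collection of images; images that fail to import are logged and skipped.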

Parameters:

  • instagram_load (Hashie::Mash)

    The content returned by the Instagram gem

  • raise_dupes (Boolean) (defaults to: false)

    Whether to raise exceptions for duplicate images

Returns:

  • (Array)

    New image objects

Raises:

  • (DuplicateImageError)

    If raise_dupes is true and an image has already been imported



# File 'lib/lentil/instagram_harvester.rb', line 162

def save_instagram_load(instagram_load, raise_dupes=false)
  # Handle collections of images and individual images
  images = instagram_load

  if !images.kind_of?(Array)
    images = [images]
  end

  images.collect {|image|
    begin
      save_image(extract_image_data(image))
    rescue DuplicateImageError => e
      raise e if raise_dupes
      next
    rescue => e
      Rails.logger.error e.message
      puts e.message
      pp image
      next
    end
  }.compact
end
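
The control flow of save_instagram_load is reusable on its own: wrap a single item in an Array, attempt each save, re-raise duplicates only when asked, log and skip other failures, and let `compact` drop the nils that `next` leaves behind. The sketch below isolates that flow. `save_all` is a hypothetical name, the block stands in for `save_image(extract_image_data(image))`, `warn` replaces the original's Rails.logger and stdout output, and `DuplicateImageError` is redefined locally.

```ruby
# Hypothetical stand-in for Lentil's DuplicateImageError.
class DuplicateImageError < StandardError; end

# Hypothetical generic version of save_instagram_load's control flow.
def save_all(batch, raise_dupes = false)
  # Handle both a single item and a collection
  items = batch.kind_of?(Array) ? batch : [batch]

  items.collect { |item|
    begin
      yield item
    rescue DuplicateImageError => e
      raise e if raise_dupes
      next # skipped duplicates become nil
    rescue => e
      warn e.message # the original logs via Rails.logger and prints to stdout
      next
    end
  }.compact
end

saved = save_all([1, 2, 3, 4]) do |n|
  raise DuplicateImageError if n == 2
  raise "boom" if n == 3
  n * 10
end
```

Here `saved` holds the results for items 1 and 4; item 2 is skipped silently and item 3 is skipped after its message is written to standard error.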

#save_instagram_load!(instagram_load) ⇒ Array

Call save_instagram_load, but raise exceptions for duplicates.

Parameters:

  • instagram_load (Hashie::Mash)

    The content returned by the Instagram gem

Returns:

  • (Array)

    New image objects

Raises:

  • (DuplicateImageError)

    If an image has already been imported



# File 'lib/lentil/instagram_harvester.rb', line 194

def save_instagram_load!(instagram_load)
  save_instagram_load(instagram_load, true)
end

#test_remote_image(image) ⇒ Boolean

Test if an image is still available.

Parameters:

  • image (Image)

    An Image model object from the Instagram service

Returns:

  • (Boolean, nil)

    true if the thumbnail request succeeded, false if it failed with an HTTP error, or nil if it timed out or received no response

Raises:

  • (Exception)

    If there are request problems



# File 'lib/lentil/instagram_harvester.rb', line 254

def test_remote_image(image)
  response = Typhoeus.get(image.thumbnail_url(false), followlocation: true)

  if response.success?
    true
  elsif response.timed_out? || (response.code == 0)
    nil
  else
    false
  end
end
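
Because test_remote_image returns nil (not false) when the request times out or gets no response, a caller that only tests truthiness would treat "unreachable" the same as "gone". The sketch below reproduces the branch logic against a hypothetical `PingResponse` Struct standing in for `Typhoeus::Response`, and shows one way to tell the three outcomes apart. `remote_status` and `describe` are illustrative names, not part of Lentil.

```ruby
# Hypothetical stand-in for Typhoeus::Response.
PingResponse = Struct.new(:success, :timed_out, :code) do
  def success?;   success;   end
  def timed_out?; timed_out; end
end

# Same branch order as test_remote_image, minus the HTTP request.
def remote_status(response)
  if response.success?
    true
  elsif response.timed_out? || response.code == 0
    nil   # unknown: the server could not be reached
  else
    false # the server answered with an error, e.g. 404
  end
end

# Distinguish nil from false explicitly rather than by truthiness.
def describe(result)
  case result
  when true  then "available"
  when false then "gone"
  else            "unknown"
  end
end

describe(remote_status(PingResponse.new(false, false, 404))) # => "gone"
```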