Class: Typingpool::Amazon::HIT
- Inherits:
- Object
  - Object
    - Typingpool::Amazon::HIT
- Defined in:
- lib/typingpool/amazon/hit.rb,
lib/typingpool/amazon/hit/full.rb,
lib/typingpool/amazon/hit/assignment.rb,
lib/typingpool/amazon/hit/assignment/empty.rb,
lib/typingpool/amazon/hit/full/fromsearchhits.rb
Overview
Class representing an Amazon Mechanical Turk Human Intelligence Task (HIT).
We go above and beyond RTurk::Hit for several practical reasons:
-
To allow easy serialization. Caching is a very useful way of reducing network calls to Amazon, and thus of speeding up Typingpool. RTurk::Hit objects cannot be dumped via Marshal, apparently due to some Nokogiri objects they contain. Typingpool::Amazon::HIT objects, in contrast, are designed to be easily and compactly serialized. They store the minimal subset of information we need via simple attributes. (Presently we serialize via PStore.)
-
To attach convenience methods. RTurk does not make it easy, for example, to get HITs beyond the first “page” returned by Amazon. This class provides methods that make it easy to get ALL HITs returned by various operations.
-
To attach methods specific to Typingpool. For example, the url and project_id methods read params we’ve embedded in the annotation or in hidden fields on an external question, while the underlying stashed_param method optimizes its lookup of these variables based on how the app is most likely to be used. See also the ours? and cacheable? methods.
-
To simplify. Typingpool HITs are constrained such that we can assume each contains only one assignment and thus at most one answer. Also, once we’ve determined that a HIT does not belong to Typingpool, it is safe to cache it forever and never download it again from Amazon.
-
To clearly partition methods that result in network calls. When you access an attribute under hit.full, like hit.full.status, it is clear you are doing something potentially expensive to obtain your hit status. Same thing with accessing an attribute under hit.assignment, like hit.assignment.worker_id – it is clear an assignment object will need to be created, implying a network call. Calling hit.id, in contrast, is always fast. (Caveat: Accessing partitioned attributes often, but not always, results in a network call. In some cases, hit.full is generated at the same time we create the hit, since we’ve obtained a full HIT serialization from Amazon. In other cases, we only have a HIT id, so accessing anything under hit.full generates a network call.)
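The partitioning described in the last point can be illustrated with a minimal, self-contained sketch. It does not use Typingpool or RTurk: SketchHIT, FakeFull, and the fetch_count counter are hypothetical stand-ins that show only the memoization pattern, in which id is always cheap and the first access to full triggers a (simulated) fetch unless a pre-built object was supplied.

```ruby
# Hypothetical stand-in for the expensive "full" record; this is not the
# real Typingpool::Amazon::HIT::Full class.
FakeFull = Struct.new(:status)

class SketchHIT
  attr_reader :id, :fetch_count

  def initialize(id)
    @id = id
    @full = nil
    @fetch_count = 0
  end

  # Same shape as HIT#full: use a pre-built object if one is supplied,
  # otherwise "fetch" one, and memoize the result either way.
  def full(full_hit = nil)
    @full ||= full_hit || fetch_full
  end

  private

  # Simulates a network call by counting invocations.
  def fetch_full
    @fetch_count += 1
    FakeFull.new('Reviewable')
  end
end

hit = SketchHIT.new('HIT123')
hit.id            # cheap: no fetch
hit.full.status   # first access: one simulated fetch
hit.full.status   # memoized: no further fetch
puts hit.fetch_count
```

Passing a pre-built object on the first call, as in SketchHIT.new('HIT456').full(FakeFull.new('Assignable')), avoids the simulated fetch entirely, which corresponds to the caveat about HITs whose full data arrived with the original response.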
Defined Under Namespace
Classes: Assignment, Full
Constant Summary
- @@cacheable_assignment_status = Set.new %w(Approved Rejected)
Instance Attribute Summary
-
#id ⇒ Object
readonly
Corresponds to the Amazon Mechanical Turk HIT#HITId.
Class Method Summary
-
.all(&filter) ⇒ Object
Returns all HITs associated with your AWS account as an array of Typingpool::Amazon::HIT instances.
-
.all_approved ⇒ Object
Returns all Typingpool HITs that have been approved, as an array of Typingpool::Amazon::HIT instances.
-
.all_for_project(id) ⇒ Object
Takes a Typingpool::Project::Local#id and returns all HITs associated with that project, as an array of Typingpool::Amazon::HIT instances.
-
.all_reviewable(&filter) ⇒ Object
Returns all HITs returned by Amazon’s GetReviewableHITs operation (that is, HITs with status ‘Reviewable’) as an array of Typingpool::Amazon::HIT instances.
- .cache_key(hit_id, id_at = self.id_at, url_at = self.url_at) ⇒ Object
-
.cached_or_new(rturk_hit) ⇒ Object
Constructor.
-
.cached_or_new_from_searchhits(rturk_hit, annotation) ⇒ Object
Constructor.
- .commission_rate ⇒ Object
-
.create(question, config_assign) ⇒ Object
Constructor.
- .delete_cache(hit_id, id_at = self.id_at, url_at = self.url_at) ⇒ Object
- .each_page ⇒ Object
- .filter_ours(hits, &filter) ⇒ Object
- .from_cache(hit_id, id_at = self.id_at, url_at = self.url_at) ⇒ Object
-
.id_at ⇒ Object
Name of the hidden HTML form field used to provide the project_id in an external question or (form-encoded) annotation.
- .minimum_commission ⇒ Object
- .reward_to_total_cost(reward) ⇒ Object
-
.url_at ⇒ Object
Name of the hidden HTML form field used to provide the (audio) url in an external question or (form-encoded) annotation.
-
.with_ids(ids) ⇒ Object
Takes an array of HIT ids, returns Typingpool::Amazon::HIT instances corresponding to those ids.
Instance Method Summary
-
#approved? ⇒ Boolean
Returns true if this HIT has an approved assignment associated with it.
-
#assignment ⇒ Object
Returns the assignment associated with this HIT - a Typingpool::Amazon::HIT::Assignment instance.
-
#at_amazon ⇒ Object
Returns an RTurk::Hit instance corresponding to this HIT.
- #cacheable? ⇒ Boolean
-
#full(full_hit = nil) ⇒ Object
Returns “the full hit” - a Typingpool::Amazon::HIT::Full instance associated with this HIT.
-
#initialize(rturk_hit) ⇒ HIT
constructor
Constructor.
-
#ours? ⇒ Boolean
Returns true if this HIT is associated with Typingpool.
-
#project_id ⇒ Object
The Typingpool::Project::Local#id associated with this HIT.
-
#project_title_from_url(url = self.url) ⇒ Object
Returns the Typingpool::Project#name associated with this HIT by parsing the #url.
-
#rejected? ⇒ Boolean
Returns true if this HIT has a rejected assignment associated with it.
-
#remove_from_amazon ⇒ Object
Deletes the HIT from Amazon’s servers.
-
#stashed_param(param) ⇒ Object
private.
-
#submitted? ⇒ Boolean
Returns true if this HIT has a submitted assignment associated with it.
-
#to_cache ⇒ Object
If this HIT is cacheable, serializes it to the cache file specified in the config passed to Amazon.setup, or specified in the default config file.
-
#transcript ⇒ Object
Returns a Typingpool::Transcript::Chunk instance built using this HIT and its associated assignment.
-
#url ⇒ Object
URL of the audio file associated with this HIT (the audio file to be transcribed).
Constructor Details
#initialize(rturk_hit) ⇒ HIT
Constructor. Takes an RTurk::Hit instance.
    # File 'lib/typingpool/amazon/hit.rb', line 251
    def initialize(rturk_hit)
      @id = rturk_hit.id
      @full = nil
      @assignment = nil
      @ours = nil
    end
Instance Attribute Details
#id ⇒ Object (readonly)
Corresponds to the Amazon Mechanical Turk HIT#HITId
    # File 'lib/typingpool/amazon/hit.rb', line 248
    def id
      @id
    end
Class Method Details
.all(&filter) ⇒ Object
Returns all HITs associated with your AWS account as an array of Typingpool::Amazon::HIT instances. Takes an optional filter block (which should return true for HITs to be included in the final results). If not supplied, will filter so the returned hits are all Typingpool HITs (hit.ours? == true).
    # File 'lib/typingpool/amazon/hit.rb', line 154
    def all(&filter)
      hits = each_page do |page_number|
        page = RTurk::SearchHITs.create(:page_number => page_number)
        raw_hits = page.xml.xpath('//HIT')
        page.hits.map do |rturk_hit|
          annotation = raw_hits.shift.xpath('RequesterAnnotation').inner_text.strip
          full = Amazon::HIT::Full::FromSearchHITs.new(rturk_hit, annotation)
          cached_or_new_from_searchhits(rturk_hit, annotation)
        end
      end
      filter_ours(hits, &filter)
    end
.all_approved ⇒ Object
Returns all Typingpool HITs that have been approved, as an array of Typingpool::Amazon::HIT instances.
    # File 'lib/typingpool/amazon/hit.rb', line 113
    def all_approved
      hits = all_reviewable do |hit|
        begin
          #optimization: we assume it is more common to have an
          #unapproved HIT than an approved HIT that does not
          #belong to this app
          hit.approved? && hit.ours?
        rescue RestClient::ServiceUnavailable => e
          warn "Warning: Service unavailable error, skipped HIT #{hit.id}. (Error: #{e})"
          false
        end
      end
      hits
    end
.all_for_project(id) ⇒ Object
Takes a Typingpool::Project::Local#id and returns all HITs associated with that project, as an array of Typingpool::Amazon::HIT instances.
    # File 'lib/typingpool/amazon/hit.rb', line 144
    def all_for_project(id)
      all{|hit| hit.ours? && hit.project_id == id}
    end
.all_reviewable(&filter) ⇒ Object
Returns all HITs returned by Amazon’s GetReviewableHITs operation (that is, HITs with status ‘Reviewable’) as an array of Typingpool::Amazon::HIT instances. Takes an optional filter block (which should return true for HITs to be included in the final results). If not supplied, will filter so the returned hits are all Typingpool HITs (hit.ours? == true).
    # File 'lib/typingpool/amazon/hit.rb', line 134
    def all_reviewable(&filter)
      hits = each_page do |page_number|
        RTurk.GetReviewableHITs(:page_number => page_number).hit_ids.map{|id| RTurk::Hit.new(id) }.map{|hit| cached_or_new(hit) }
      end
      filter_ours(hits, &filter)
    end
.cache_key(hit_id, id_at = self.id_at, url_at = self.url_at) ⇒ Object
    # File 'lib/typingpool/amazon/hit.rb', line 207
    def cache_key(hit_id, id_at=self.id_at, url_at=self.url_at)
      "RESULT///#{hit_id}///#{url_at}///#{id_at}"
    end
.cached_or_new(rturk_hit) ⇒ Object
Constructor. Takes an RTurk::Hit instance. Returns a Typingpool::Amazon::HIT instance, preferably from the cache.
    # File 'lib/typingpool/amazon/hit.rb', line 171
    def cached_or_new(rturk_hit)
      from_cache(rturk_hit.id) || new(rturk_hit)
    end
.cached_or_new_from_searchhits(rturk_hit, annotation) ⇒ Object
Constructor. Same as cached_or_new, but handles peculiarities of objects returned by RTurk::SearchHITs. Such objects map two Amazon HIT fields to different names than those used by other RTurk HIT instances. They also do not bother to extract the annotation from the Amazon HIT, so we have to do that ourselves (elsewhere) and take it as a param here. Finally, on the bright side, RTurk::SearchHITs already contain a big chunk of hit.full attributes, potentially obviating the need for an additional network call to flesh out the HIT, so this method pre-fleshes-out the HIT.
    # File 'lib/typingpool/amazon/hit.rb', line 185
    def cached_or_new_from_searchhits(rturk_hit, annotation)
      if not (typingpool_hit = from_cache(rturk_hit.id))
        typingpool_hit = new(rturk_hit)
        typingpool_hit.full(Amazon::HIT::Full::FromSearchHITs.new(rturk_hit, annotation))
      end
      typingpool_hit
    end
.commission_rate ⇒ Object
    # File 'lib/typingpool/amazon/hit.rb', line 241
    def commission_rate
      0.2
    end
.create(question, config_assign) ⇒ Object
Constructor. Creates an Amazon Mechanical Turk HIT. ** Warning: This method can spend your money! **
Params
- question
-
Typingpool::Amazon::Question instance, used not only to generate the (external) question but also parsed to provide one or more core HIT attributes. Must include a non-nil annotation attribute. Provides fallback values for HIT title and description.
- config_assign
-
The ‘assign’ attribute of a Typingpool::Config instance (that is, a Typingpool::Config::Root::Assign instance). Must include values for reward, lifetime, deadline (used as the HIT duration), and approval. May include values for keywords and qualify (the HIT qualifications). Preferred source for HIT title and description. See Typingpool::Config documentation for further details.
Returns
Typingpool::Amazon::HIT instance corresponding to the new Mechanical Turk HIT.
    # File 'lib/typingpool/amazon/hit.rb', line 74
    def create(question, config_assign)
      new(RTurk::Hit.create(:title => config_assign.title || question.title) do |hit|
            hit.description = config_assign.description || question.description
            hit.question(question.url)
            hit.annotation = question.annotation or raise Error, "Missing annotation from question"
            hit.reward = config_assign.reward or raise Error, "Missing reward config"
            hit.max_assignments = 1
            hit.lifetime = config_assign.lifetime or raise Error, "Missing lifetime config"
            hit.duration = config_assign.deadline or raise Error, "Missing deadline config"
            hit.auto_approval_delay = config_assign.approval or raise Error, "Missing approval config"
            hit.keywords = config_assign.keywords if config_assign.keywords
            config_assign.qualify.each{|q| hit.qualifications.add(*q.to_arg)} if config_assign.qualify
          end)
    end
.delete_cache(hit_id, id_at = self.id_at, url_at = self.url_at) ⇒ Object
    # File 'lib/typingpool/amazon/hit.rb', line 199
    def delete_cache(hit_id, id_at=self.id_at, url_at=self.url_at)
      Amazon.cache.transaction do
        key = cache_key(hit_id, id_at, url_at)
        cached = Amazon.cache[key]
        Amazon.cache.delete(key) unless cached.nil?
      end
    end
.each_page ⇒ Object
    # File 'lib/typingpool/amazon/hit.rb', line 211
    def each_page
      results = []
      page = 0
      begin
        page += 1
        new_results = yield(page)
        results.push(*new_results)
      end while new_results.count > 0
      results
    end
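The paging loop can be tried in isolation. The sketch below copies each_page as a standalone method and feeds it from a hypothetical in-memory PAGES hash instead of Amazon; the hash and the requested array are assumptions made purely for illustration.

```ruby
# Standalone copy of the paging loop shown above: keep requesting
# pages (1, 2, 3, ...) until a page comes back empty.
def each_page
  results = []
  page = 0
  begin
    page += 1
    new_results = yield(page)
    results.push(*new_results)
  end while new_results.count > 0
  results
end

# Hypothetical paged data source standing in for Amazon.
PAGES = { 1 => %w[hit_a hit_b], 2 => %w[hit_c] }.freeze

requested = []
all_ids = each_page do |page_number|
  requested << page_number
  PAGES.fetch(page_number, [])
end

p all_ids     # ["hit_a", "hit_b", "hit_c"]
p requested   # [1, 2, 3]
```

Note that the loop stops only after receiving an empty page, so it always makes one request beyond the last page that holds results.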
.filter_ours(hits, &filter) ⇒ Object
    # File 'lib/typingpool/amazon/hit.rb', line 222
    def filter_ours(hits, &filter)
      filter ||= lambda{|hit| hit.ours? }
      hits.select do |hit|
        selected = filter.call(hit)
        hit.to_cache
        selected
      end
    end
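A small standalone sketch of the filtering behavior follows. FilterSketchHit is a hypothetical stand-in whose to_cache merely records that it was called (the real to_cache writes only when the HIT is cacheable). The sketch shows the default filter, a custom filter, and the fact that to_cache is invoked on every HIT whether or not it is selected.

```ruby
# Hypothetical stand-in for a HIT: records whether to_cache was called.
FilterSketchHit = Struct.new(:id, :ours, :cache_attempted) do
  def ours?
    ours
  end

  def to_cache
    self.cache_attempted = true
  end
end

# Standalone copy of the method shown above.
def filter_ours(hits, &filter)
  filter ||= lambda { |hit| hit.ours? }
  hits.select do |hit|
    selected = filter.call(hit)
    hit.to_cache
    selected
  end
end

hits = [FilterSketchHit.new('a', true), FilterSketchHit.new('b', false)]

p filter_ours(hits).map(&:id)                          # default filter: ["a"]
p filter_ours(hits) { |hit| hit.id == 'b' }.map(&:id)  # custom filter: ["b"]
p hits.all?(&:cache_attempted)                         # true
```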
.from_cache(hit_id, id_at = self.id_at, url_at = self.url_at) ⇒ Object
    # File 'lib/typingpool/amazon/hit.rb', line 193
    def from_cache(hit_id, id_at=self.id_at, url_at=self.url_at)
      Amazon.cache.transaction do
        Amazon.cache[cache_key(hit_id, id_at, url_at)]
      end
    end
.id_at ⇒ Object
Name of the hidden HTML form field used to provide the project_id in an external question or (form-encoded) annotation. Hard coded to typingpool_project_id but overridable in a subclass.
    # File 'lib/typingpool/amazon/hit.rb', line 93
    def id_at
      @@id_at ||= 'typingpool_project_id'
    end
.minimum_commission ⇒ Object
    # File 'lib/typingpool/amazon/hit.rb', line 237
    def minimum_commission
      0.01
    end
.reward_to_total_cost(reward) ⇒ Object
    # File 'lib/typingpool/amazon/hit.rb', line 231
    def reward_to_total_cost(reward)
      amazon_fee = reward.to_f * commission_rate
      amazon_fee = minimum_commission if amazon_fee < minimum_commission
      reward + amazon_fee
    end
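As a worked example of the arithmetic, the standalone sketch below inlines the two values returned by commission_rate and minimum_commission as constants (the constant names are assumptions for illustration) and assumes a numeric reward.

```ruby
COMMISSION_RATE = 0.2      # value returned by commission_rate above
MINIMUM_COMMISSION = 0.01  # value returned by minimum_commission above

# Standalone copy of the calculation: fee is 20% of the reward, but
# never less than the minimum commission.
def reward_to_total_cost(reward)
  amazon_fee = reward.to_f * COMMISSION_RATE
  amazon_fee = MINIMUM_COMMISSION if amazon_fee < MINIMUM_COMMISSION
  reward + amazon_fee
end

puts reward_to_total_cost(0.75).round(2)  # 0.75 reward + 0.15 fee
puts reward_to_total_cost(0.02).round(2)  # 0.004 fee is raised to the 0.01 minimum
```

For a 0.75 reward the fee is 0.15, giving a total of 0.90. For a 0.02 reward the computed fee of 0.004 falls below the minimum, so the total is 0.03.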
.url_at ⇒ Object
Name of the hidden HTML form field used to provide the (audio) url in an external question or (form-encoded) annotation. Hard coded to typingpool_url but overridable in a subclass.
    # File 'lib/typingpool/amazon/hit.rb', line 101
    def url_at
      @@url_at ||= 'typingpool_url'
    end
.with_ids(ids) ⇒ Object
Takes an array of HIT ids, returns Typingpool::Amazon::HIT instances corresponding to those ids.
    # File 'lib/typingpool/amazon/hit.rb', line 107
    def with_ids(ids)
      ids.map{|id| cached_or_new(RTurk::Hit.new(id)) }
    end
Instance Method Details
#approved? ⇒ Boolean
Returns true if this HIT has an approved assignment associated with it. (Attached to Typingpool::Amazon::HIT rather than Typingpool::Amazon::HIT::Assignment because sometimes we can tell simply from looking at hit.full that there are no approved assignments – hit.full.assignments_completed == 0. This check is only performed when hit.full has already been loaded.)
    # File 'lib/typingpool/amazon/hit.rb', line 286
    def approved?
      if @full
        return false if full.assignments_completed == 0
        return false if full.status != 'Reviewable'
      end
      assignment.status == 'Approved'
    end
#assignment ⇒ Object
Returns the assignment associated with this HIT - a Typingpool::Amazon::HIT::Assignment instance. The first time this is called, an Amazon HTTP request is typically (but not always) sent.
    # File 'lib/typingpool/amazon/hit.rb', line 409
    def assignment
      @assignment ||= Assignment.new(at_amazon) #expensive
      @assignment
    end
#at_amazon ⇒ Object
Returns an RTurk::Hit instance corresponding to this HIT.
    # File 'lib/typingpool/amazon/hit.rb', line 365
    def at_amazon
      Amazon.rturk_hit_full(@id)
    end
#cacheable? ⇒ Boolean
    # File 'lib/typingpool/amazon/hit.rb', line 453
    def cacheable?
      if @ours == false
        return true
      end
      if @full
        return true if full.expired_and_overdue?
      end
      if @assignment && assignment.status
        return true if @@cacheable_assignment_status.include?(assignment.status)
      end
      return false
    end
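The decision can be restated as a standalone sketch. The keyword arguments here are hypothetical stand-ins for state that the real method reads from instance variables (@ours, @full, @assignment); nil means "not yet known or not loaded". This is a simplified illustration under those assumptions, not the library's implementation.

```ruby
require 'set'

# Assignment statuses that can no longer change, mirroring the class
# variable @@cacheable_assignment_status.
CACHEABLE_ASSIGNMENT_STATUS = Set.new(%w[Approved Rejected])

# Hypothetical standalone version of the decision.
def cacheable?(ours:, expired_and_overdue: false, assignment_status: nil)
  return true if ours == false                # not a Typingpool HIT
  return true if expired_and_overdue          # expired, nothing pending
  return true if CACHEABLE_ASSIGNMENT_STATUS.include?(assignment_status)
  false
end

p cacheable?(ours: false)                                  # true
p cacheable?(ours: true, assignment_status: 'Approved')    # true
p cacheable?(ours: true, assignment_status: 'Submitted')   # false
```

In each cacheable case the HIT's state is not expected to change again, which is what makes a stale cache entry harmless.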
#full(full_hit = nil) ⇒ Object
Returns “the full hit” - a Typingpool::Amazon::HIT::Full instance associated with this HIT. If the instance is being created for the first time, this will trigger an HTTP request to Amazon’s servers. “Full” HIT fields are segregated because accessing any one of them is expensive if we only have a HIT id (but after fetching one, all are cheap). Accepts an optional Typingpool::Amazon::HIT::Full (or subclass) instance to set for this attribute, avoiding the need to create one; the supplied instance is used only if no full hit has been set yet. This is useful when extensive HIT data was already returned by an Amazon operation (for example, SearchHITs returns lots of HIT data).
    # File 'lib/typingpool/amazon/hit.rb', line 398
    def full(full_hit=nil)
      if @full.nil?
        @full = full_hit || Full.new(at_amazon)
      end
      @full
    end
#ours? ⇒ Boolean
Returns true if this HIT is associated with Typingpool. One Amazon account can be used for many tasks, so it’s important to check whether the HIT belongs to this software. (Presently, this is determined by checking for a non-empty stashed url param.)
    # File 'lib/typingpool/amazon/hit.rb', line 323
    def ours?
      @ours ||= not(url.to_s.empty?)
    end
#project_id ⇒ Object
The Typingpool::Project::Local#id associated with this HIT. Extracted as described for the url method.
    # File 'lib/typingpool/amazon/hit.rb', line 269
    def project_id
      @project_id ||= stashed_param(self.class.id_at)
    end
#project_title_from_url(url = self.url) ⇒ Object
Returns the Typingpool::Project#name associated with this HIT by parsing the #url. May be dropped in a future release.
    # File 'lib/typingpool/amazon/hit.rb', line 275
    def project_title_from_url(url=self.url)
      matches = Project.url_regex.match(url) or raise Error::Argument::Format, "Unexpected format to url '#{url}'"
      URI.decode_www_form_component(matches[2])
    end
#rejected? ⇒ Boolean
Returns true if this HIT has a rejected assignment associated with it. (For an explanation of why this is not attached to Typingpool::Amazon::HIT::Assignment, see the documentation for approved?.)
    # File 'lib/typingpool/amazon/hit.rb', line 298
    def rejected?
      if @full
        return false if full.assignments_completed == 0
        return false if full.status != 'Reviewable'
      end
      assignment.status == 'Rejected'
    end
#remove_from_amazon ⇒ Object
Deletes the HIT from Amazon’s servers. Examines the HIT and assignment status to determine whether calling the DisposeHIT or DisableHIT operation is most appropriate. If the HIT has been submitted but not approved or rejected, will raise an exception of type Typingpool::Error::Amazon::UnreviewedContent. Catch this exception in your own code if you’d like to automatically approve such HITs before removing them.
    # File 'lib/typingpool/amazon/hit.rb', line 377
    def remove_from_amazon
      if full.status == 'Reviewable'
        if assignment.status == 'Submitted'
          raise Error::Amazon::UnreviewedContent, "There is an unreviewed submission for #{url}"
        end
        at_amazon.dispose!
      else
        at_amazon.disable!
      end
    end
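The "catch and approve" pattern suggested above might look roughly like the following sketch. Everything in it is a hypothetical stand-in so that it runs on its own: SketchUnreviewedContent plays the role of Typingpool::Error::Amazon::UnreviewedContent, RemovableHit imitates only the relevant behavior of a HIT, and approve! is an assumed helper rather than a documented Typingpool method.

```ruby
# Hypothetical stand-in for Typingpool::Error::Amazon::UnreviewedContent.
class SketchUnreviewedContent < StandardError; end

# Hypothetical stand-in for a HIT; records what happened to it.
class RemovableHit
  attr_reader :events

  def initialize(assignment_status)
    @assignment_status = assignment_status
    @events = []
  end

  # Assumed helper: marks the submission as approved.
  def approve!
    @assignment_status = 'Approved'
    @events << :approved
  end

  # Refuses to remove a HIT with an unreviewed submission.
  def remove_from_amazon
    if @assignment_status == 'Submitted'
      raise SketchUnreviewedContent, 'There is an unreviewed submission'
    end
    @events << :removed
  end
end

# Try to remove; if the submission is unreviewed, approve it and retry once.
def approve_then_remove(hit)
  hit.remove_from_amazon
rescue SketchUnreviewedContent
  hit.approve!
  hit.remove_from_amazon
end

hit = RemovableHit.new('Submitted')
approve_then_remove(hit)
p hit.events  # [:approved, :removed]
```

Whether automatic approval is appropriate depends on the application, since approving a submission pays the worker.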
#stashed_param(param) ⇒ Object
private
    # File 'lib/typingpool/amazon/hit.rb', line 417
    def stashed_param(param)
      if @assignment && assignment.answers[param]
        return assignment.answers[param]
      elsif full.annotation[param]
        #A question assigned through this software. May be
        #expensive: May result in HTTP request to fetch HIT
        #fields. We choose to fetch (sometimes) the HIT rather than
        #the assignment on the assumption it will be MORE common to
        #encounter HITs with no answers and LESS common to encounter
        #HITs assigned through the RUI (and thus lacking in an
        #annotation from this software and thus rendering the HTTP
        #request to fetch the HIT fields pointless).
        return full.annotation[param]
      elsif full.assignments_completed.to_i >= 1
        #A question assigned through Amazon's RUI, with an answer
        #submitted. If the HIT belongs to this software, this
        #assignment's answers will include our param. We prefer
        #fetching the assignment to fetching the external question
        #(as below) because fetching the assignment will potentially
        #save us an HTTP request down the line -- for example, if we
        #need other assignment data (e.g. assignment status).
        #Fetching the external question only serves to give us
        #access to params. If the answers do not include our param,
        #we know the HIT does not belong to this software, since we
        #know the param was also not in the annotation. So we are
        #safe returning nil in that case.
        return assignment.answers[param]
      else
        #A question assigned via Amazon's RUI, with no answer
        #submitted. Expensive: Results in HTTP request to fetch
        #external question.
        return full.external_question_param(param)
      end
    end
#submitted? ⇒ Boolean
Returns true if this HIT has a submitted assignment associated with it. (For an explanation of why this is not attached to Typingpool::Amazon::HIT::Assignment, see the documentation for approved?.)
    # File 'lib/typingpool/amazon/hit.rb', line 310
    def submitted?
      if @full
        return false if full.status != 'Reviewable'
      end
      assignment.status == 'Submitted'
    end
#to_cache ⇒ Object
If this HIT is cacheable, serializes it to the cache file specified in the config passed to Amazon.setup, or specified in the default config file. In short, a HIT is cacheable if it does not belong to Typingpool (ours? == false), if it is approved or rejected (approved? || rejected?), or if it is expired (full.expired_and_overdue?). See also cacheable? code.
When available, cached HITs are used by Typingpool::Amazon::HIT.all, Typingpool::Amazon::HIT.all_approved, and all the other class methods that retrieve HITs. These methods call to_cache for you at logical times (after downloading and filtering, when the HIT is most fleshed out), so you should not need to call this yourself. But if you have an operation that makes network calls to further flesh out the HIT, calling to_cache may be worthwhile.
    # File 'lib/typingpool/amazon/hit.rb', line 354
    def to_cache
      #any obj containing a Nokogiri object cannot be stored in pstore - do
      #not forget this (again)
      if cacheable?
        Amazon.cache.transaction do
          Amazon.cache[self.class.cache_key(@id)] = self
        end
      end
    end
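To illustrate the caching round trip, here is a minimal sketch using PStore directly. It does not use Typingpool: CachedHIT is a hypothetical plain-attribute record, the store lives in a temporary directory rather than in the configured cache file, and cache_key is a standalone copy of the key format from .cache_key with the default field names written out.

```ruby
require 'pstore'
require 'tmpdir'

# Hypothetical minimal record: plain attributes only, so Marshal (and
# therefore PStore) can dump it.
CachedHIT = Struct.new(:id, :status)

# Standalone copy of the key format, with default field names inlined.
def cache_key(hit_id, id_at = 'typingpool_project_id', url_at = 'typingpool_url')
  "RESULT///#{hit_id}///#{url_at}///#{id_at}"
end

Dir.mktmpdir do |dir|
  store = PStore.new(File.join(dir, 'cache.pstore'))

  # Write inside a transaction, as to_cache does.
  store.transaction do
    store[cache_key('HIT123')] = CachedHIT.new('HIT123', 'Approved')
  end

  # Read back in a read-only transaction; the block's value is returned.
  restored = store.transaction(true) { store[cache_key('HIT123')] }
  puts restored.status
end
```

Because the key embeds the two field names, changing id_at or url_at in a subclass yields a separate set of cache entries for the same HIT id.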
#transcript ⇒ Object
Returns a Typingpool::Transcript::Chunk instance built using this HIT and its associated assignment.
    # File 'lib/typingpool/amazon/hit.rb', line 329
    def transcript
      transcript = Transcript::Chunk.new(assignment.body)
      transcript.url = url
      transcript.project = project_id
      transcript.worker = assignment.worker_id
      transcript.hit = @id
      transcript
    end
#url ⇒ Object
URL of the audio file associated with this HIT (the audio file to be transcribed). Extracted from the annotation (when the HIT was assigned via Typingpool) or from a hidden field in the HTML form on the external question (when the HIT was assigned via the Amazon Mechanical Turk RUI).
    # File 'lib/typingpool/amazon/hit.rb', line 263
    def url
      @url ||= stashed_param(self.class.url_at)
    end