Class: DoverToCalais::Dover
- Inherits:
- Object (hierarchy: Object → DoverToCalais::Dover)
- Defined in:
- lib/dover_to_calais.rb
Overview
This class extracts text from a data source and sends it to OpenCalais for analysis. The data source is passed to the class constructor and can be a document in almost any format, or a URL. Callers register one or more callbacks with #to_calais; these are called once OpenCalais has processed the data source.
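The register-then-notify flow described above can be sketched without the gem or a network connection. This is a minimal stand-in, not the library's implementation: `SketchDover` and `SketchResult` are hypothetical names, and the OpenCalais request is replaced by a canned result.

```ruby
# Hypothetical stand-in for the result object handed to each callback.
SketchResult = Struct.new(:data, :error)

# Hypothetical stand-in that mimics the register-then-notify flow.
class SketchDover
  attr_reader :data_src, :error

  def initialize(data_src)
    @data_src = data_src
    @callbacks = []
  end

  # Store a block to be called once processing has finished.
  def to_calais(&block)
    @callbacks << block
  end

  # Pretend the source was processed, then notify every callback in
  # registration order. The real class issues an HTTP request here.
  def analyse_this
    result = SketchResult.new("tags for #{@data_src}", nil)
    @callbacks.each { |callback| callback.call(result) }
  end
end

seen = []
dover = SketchDover.new('report.pdf')
dover.to_calais { |result| seen << result.data }
dover.to_calais { |result| seen << result.error.inspect }
dover.analyse_this
```

The real class issues its request through EventMachine::HttpRequest, so the equivalent calls would presumably need to run inside an EventMachine reactor.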
Constant Summary
- CALAIS_SERVICE =
'https://api.opencalais.com/tag/rs/enrich'
Instance Attribute Summary
-
#data_src ⇒ String
readonly
The data source to be processed, either a file path or a URL.
-
#error ⇒ String?
readonly
Any error that occurred during data-source processing, nil if none occurred.
Instance Method Summary
-
#analyse_this(output_format = nil) ⇒ Object
(also: #analyze_this)
Extracts the source text and POSTs it to OpenCalais.
-
#initialize(data_src) ⇒ Dover
constructor
Creates a new Dover object for the named data source.
-
#to_calais(&block) ⇒ Object
Defines the user callbacks.
Constructor Details
#initialize(data_src) ⇒ Dover
Creates a new Dover object for the named data source.
# File 'lib/dover_to_calais.rb', line 480
def initialize(data_src)
  @data_src = data_src
  @callbacks = []
end
Instance Attribute Details
#data_src ⇒ String (readonly)
Returns the data source to be processed, either a file path or a URL.
# File 'lib/dover_to_calais.rb', line 471
class Dover

  CALAIS_SERVICE = 'https://api.opencalais.com/tag/rs/enrich'

  attr_reader :data_src, :error

  # creates a new Dover object, passing the name of the data source to be processed
  #
  # @param data_src [String] the name of the data source to be processed
  def initialize(data_src)
    @data_src = data_src
    @callbacks = []
  end

  # uses the {https://github.com/Erol/yomu yomu} gem to extract text from a number of document formats and URLs.
  # If an exception occurs, it is written to the {@error} instance variable
  #
  # @param [String] src the name of the data source (file-path or URI)
  # @return [String] the extracted text, or the 'ERR: ...' message if an exception was rescued.
  def get_src_data(src)
    begin
      yomu = Yomu.new src
    rescue Exception=>e
      @error = "ERR: #{e}"
    else
      yomu.text
    end
  end

  # Defines the user callbacks. If the data source is successfully read, then this method will store a
  # user-defined block which will be called on completion of the OpenCalais HTTP request. If the data source
  # cannot be read, for whatever reason, then the block will immediately be called with a ResponseData
  # object carrying the error.
  #
  # @param block a user-defined block
  # @return N/A
  def to_calais(&block) #fred rules ok
    if !@error
      @callbacks << block
    else
      result = ResponseData.new nil, @error
      block.call(result)
    end
  end #method

  # Gets the source text parsed. If the parsing is successful, the data source is POSTed to OpenCalais
  # via an EventMachine request and a callback is set to manage the OpenCalais response.
  # All Dover object callbacks are then called with the request result yielded to them.
  #
  # @param N/A
  # @return a {Class ResponseData} object
  def analyse_this(output_format=nil)

    if output_format
      @output_format = 'application/json'
    else
      @output_format = 'Text/Simple'
    end

    @document = get_src_data(@data_src)
    begin
      if @document[0..2].eql?('ERR')
        raise 'Invalid data source'
      else
        response = nil

        # NOTE: the names of the two option hashes were lost in this listing;
        # connection_options and request_options are reconstructed names.
        connection_options = {:inactivity_timeout => 0}

        if DoverToCalais::PROXY &&
            DoverToCalais::PROXY.class.eql?('Hash') &&
            DoverToCalais::PROXY.keys[0].eql?(:proxy)
          connection_options = connection_options.merge(DoverToCalais::PROXY)
        end

        request_options = {
          :body => @document.to_s,
          :head => {
            'x-calais-licenseID' => DoverToCalais::API_KEY,
            :content_type => 'TEXT/RAW',
            :enableMetadataType => 'GenericRelations,SocialTags',
            :outputFormat => @output_format}
        }

        http = EventMachine::HttpRequest.new(CALAIS_SERVICE, connection_options).post request_options

        http.callback do
          if http.response_header.status == 200
            if @output_format == 'Text/Simple'
              http.response.match(/<OpenCalaisSimple>/) do |m|
                response = Nokogiri::XML('<OpenCalaisSimple>' + m.post_match) do |config|
                  #strict xml parsing, disallow network connections
                  config.strict.nonet
                end #block
              end
            else #@output_format == 'application/json'
              response = JSON.parse(http.response) #response should now be a Hash
            end #if

            case response.class.to_s
            when 'NilClass'
              result = ResponseData.new(nil,'ERR: cannot parse response data - source invalid?')
            when 'Nokogiri::XML::Document'
              result = ResponseData.new(response, nil)
            when 'Hash'
              result = ResponseData.new(response, nil)
            else
              result = ResponseData.new(nil,'ERR: cannot parse response data - unrecognized format!')
            end
          else #non-200 response
            result = ResponseData.new nil, "ERR: OpenCalais service responded with #{http.response_header.status} - response body: '#{http.response}'"
          end

          @callbacks.each { |c| c.call(result) }
        end #callback

        http.errback do
          result = ResponseData.new nil, "ERR: #{http.error}"
          @callbacks.each { |c| c.call(result) }
        end #errback
      end #if
    rescue Exception=>e
      #result = ResponseData.new nil, "ERR: #{e}"
      #@callbacks.each { |c| c.call(result) }
      @error = "ERR: #{e}"
    end
  end #method

  alias_method :analyze_this, :analyse_this

  public :to_calais, :analyse_this
  private :get_src_data
end
#error ⇒ String? (readonly)
Returns any error that occurred during data-source processing, nil if none occurred.
Instance Method Details
#analyse_this(output_format = nil) ⇒ Object Also known as: analyze_this
Extracts the text from the data source. If extraction succeeds, the text is POSTed to OpenCalais via an EventMachine request, and a callback is set to handle the OpenCalais response. All callbacks registered on the Dover object are then called with the request result.
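How the `output_format` argument is interpreted is easy to miss. Judging by the source listing for this method, any truthy argument selects JSON, while nil or false selects the simple XML format. The sketch below isolates that branch; `chosen_output_format` is a hypothetical helper, not part of the gem.

```ruby
# Hypothetical helper mirroring the first branch of analyse_this:
# any truthy argument selects JSON, nil or false selects 'Text/Simple'.
def chosen_output_format(output_format = nil)
  output_format ? 'application/json' : 'Text/Simple'
end

default_format = chosen_output_format          # no argument given
json_format    = chosen_output_format(:json)   # a symbol is truthy
also_json      = chosen_output_format('xml')   # so is any string
```

Under this reading the value of the argument is never inspected, only whether it is truthy, so passing 'xml' would still request JSON.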
# File 'lib/dover_to_calais.rb', line 531
def analyse_this(output_format=nil)

  if output_format
    @output_format = 'application/json'
  else
    @output_format = 'Text/Simple'
  end

  @document = get_src_data(@data_src)
  begin
    if @document[0..2].eql?('ERR')
      raise 'Invalid data source'
    else
      response = nil

      # NOTE: the names of the two option hashes were lost in this listing;
      # connection_options and request_options are reconstructed names.
      connection_options = {:inactivity_timeout => 0}

      if DoverToCalais::PROXY &&
          DoverToCalais::PROXY.class.eql?('Hash') &&
          DoverToCalais::PROXY.keys[0].eql?(:proxy)
        connection_options = connection_options.merge(DoverToCalais::PROXY)
      end

      request_options = {
        :body => @document.to_s,
        :head => {
          'x-calais-licenseID' => DoverToCalais::API_KEY,
          :content_type => 'TEXT/RAW',
          :enableMetadataType => 'GenericRelations,SocialTags',
          :outputFormat => @output_format}
      }

      http = EventMachine::HttpRequest.new(CALAIS_SERVICE, connection_options).post request_options

      http.callback do
        if http.response_header.status == 200
          if @output_format == 'Text/Simple'
            http.response.match(/<OpenCalaisSimple>/) do |m|
              response = Nokogiri::XML('<OpenCalaisSimple>' + m.post_match) do |config|
                #strict xml parsing, disallow network connections
                config.strict.nonet
              end #block
            end
          else #@output_format == 'application/json'
            response = JSON.parse(http.response) #response should now be a Hash
          end #if

          case response.class.to_s
          when 'NilClass'
            result = ResponseData.new(nil,'ERR: cannot parse response data - source invalid?')
          when 'Nokogiri::XML::Document'
            result = ResponseData.new(response, nil)
          when 'Hash'
            result = ResponseData.new(response, nil)
          else
            result = ResponseData.new(nil,'ERR: cannot parse response data - unrecognized format!')
          end
        else #non-200 response
          result = ResponseData.new nil, "ERR: OpenCalais service responded with #{http.response_header.status} - response body: '#{http.response}'"
        end

        @callbacks.each { |c| c.call(result) }
      end #callback

      http.errback do
        result = ResponseData.new nil, "ERR: #{http.error}"
        @callbacks.each { |c| c.call(result) }
      end #errback
    end #if
  rescue Exception=>e
    #result = ResponseData.new nil, "ERR: #{e}"
    #@callbacks.each { |c| c.call(result) }
    @error = "ERR: #{e}"
  end
end
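The 'Text/Simple' branch of the listing locates the `<OpenCalaisSimple>` tag in the response body and rebuilds the XML from that point before parsing it. The following sketch shows only that string-handling step in plain Ruby, leaving out Nokogiri; the response bodies are invented for illustration.

```ruby
# Invented response body: some preamble followed by the simple-format XML.
body = '<!-- preamble --><OpenCalaisSimple><Topics><Topic>Tech</Topic></Topics></OpenCalaisSimple>'

xml = nil
# String#match yields the MatchData to the block only when the pattern is found.
body.match(/<OpenCalaisSimple>/) do |m|
  # post_match is everything after the matched tag, so the tag is added back.
  xml = '<OpenCalaisSimple>' + m.post_match
end

# With no matching tag the block never runs and the variable stays nil.
missing = nil
'no calais payload here'.match(/<OpenCalaisSimple>/) do |m|
  missing = '<OpenCalaisSimple>' + m.post_match
end
```

In the listing, a response that stays nil in this way is reported to the callbacks as 'ERR: cannot parse response data - source invalid?'.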
#to_calais(&block) ⇒ Object
Defines the user callbacks. If no error has been recorded, this method stores the user-defined block, which is called on completion of the OpenCalais HTTP request. If the data source could not be read, for whatever reason, the block is called immediately with a ResponseData object carrying the error that describes the read failure.
# File 'lib/dover_to_calais.rb', line 510
def to_calais(&block) #fred rules ok
  if !@error
    @callbacks << block
  else
    result = ResponseData.new nil, @error
    block.call(result)
  end
end
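The two branches of this method can be exercised in isolation. The sketch below is a stand-in under stated assumptions: `CallbackRegistry` and `SketchResponse` are hypothetical names, and the error is injected through the constructor rather than set by a failed read.

```ruby
# Hypothetical stand-in for ResponseData: a payload plus an error message.
SketchResponse = Struct.new(:data, :error)

# Hypothetical class reproducing only the store-or-call-now decision.
class CallbackRegistry
  attr_reader :callbacks

  def initialize(error = nil)
    @error = error
    @callbacks = []
  end

  def to_calais(&block)
    if @error
      # An error is already recorded: report it to the block straight away.
      block.call(SketchResponse.new(nil, @error))
    else
      # No error yet: keep the block for later notification.
      @callbacks << block
    end
  end
end

healthy = CallbackRegistry.new
healthy.to_calais { |response| response }

received = nil
failed = CallbackRegistry.new('ERR: cannot read source')
failed.to_calais { |response| received = response }
```

In the failing case the block is never stored, so it would not be called again by a later request.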