Class: Google::Cloud::Language::Document

Inherits:
Object
Defined in:
lib/google/cloud/language/document.rb

Overview

# Document

Represents a document for the Language service.

Cloud Natural Language API supports UTF-8, UTF-16, and UTF-32 encodings. (Ruby uses UTF-8 natively, which is the default sent to the API, so unless you’re working with text processed on a different platform, you should not need to set the encoding type.)
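If text does come from another platform, one option is to transcode it to UTF-8 with Ruby's standard `String#encode` before creating the document, so the default encoding still fits. A minimal sketch; the UTF-16LE input is an assumed example, and no API call is made:

```ruby
# Simulate text that arrived as UTF-16LE (for example, a file written on another platform).
utf16_text = "Darth Vader is the best villain in Star Wars.".encode("UTF-16LE")

# Transcode to UTF-8, Ruby's native encoding, before passing it to language.document.
utf8_text = utf16_text.encode("UTF-8")

puts utf16_text.encoding # => UTF-16LE
puts utf8_text.encoding  # => UTF-8
```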

Be aware that only English, Spanish, and Japanese language content is supported, and that sentiment analysis supports only English text.

See Project#document.

Examples:

require "google/cloud/language"

language = Google::Cloud::Language.new

content = "Darth Vader is the best villain in Star Wars."
document = language.document content
annotation = document.annotate

annotation.entities.count #=> 2
annotation.sentiment.polarity #=> 1.0
annotation.sentiment.magnitude #=> 0.8999999761581421
annotation.sentences.count #=> 1
annotation.tokens.count #=> 10

Constructor Details

#initialize ⇒ Document

Returns a new instance of Document.



# File 'lib/google/cloud/language/document.rb', line 58

def initialize
  @grpc = nil
  @service = nil
end

Instance Attribute Details

#service ⇒ Object



# File 'lib/google/cloud/language/document.rb', line 54

def service
  @service
end

Class Method Details

.from_grpc(grpc, service) ⇒ Object



# File 'lib/google/cloud/language/document.rb', line 325

def self.from_grpc grpc, service
  new.tap do |i|
    i.instance_variable_set :@grpc, grpc
    i.instance_variable_set :@service, service
  end
end
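`from_grpc` is an alternate constructor: it calls the no-argument `initialize` and then sets `@grpc` and `@service` directly with `instance_variable_set`, so the class needs no public writers for those fields. The same pattern can be sketched in isolation; `WrappedMessage` and the hash standing in for the gRPC message are hypothetical and not part of google-cloud-language:

```ruby
# Hypothetical class showing the from_grpc construction pattern.
class WrappedMessage
  def initialize
    @grpc = nil
    @service = nil
  end

  # Read-only access; there is deliberately no service= writer.
  attr_reader :service

  def to_grpc
    @grpc
  end

  # Alternate constructor: build a blank instance, then populate its
  # internal state without exposing public setters.
  def self.from_grpc grpc, service
    new.tap do |i|
      i.instance_variable_set :@grpc, grpc
      i.instance_variable_set :@service, service
    end
  end
end

wrapped = WrappedMessage.from_grpc({ content: "Hello" }, :fake_service)
puts wrapped.to_grpc[:content] # prints Hello
p wrapped.service              # => :fake_service
```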

.from_source(source, service, format: nil, language: nil) ⇒ Object



# File 'lib/google/cloud/language/document.rb', line 334

def self.from_source source, service, format: nil, language: nil
  source = String source
  grpc = Google::Cloud::Language::V1beta1::Document.new
  if source.start_with? "gs://"
    grpc.gcs_content_uri = source
    format ||= :html if source.end_with? ".html"
  else
    grpc.content = source
  end
  if format.to_s == "html"
    grpc.type = :HTML
  else
    grpc.type = :PLAIN_TEXT
  end
  grpc.language = language.to_s unless language.nil?
  from_grpc grpc, service
end
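Reading `from_source` closely: a `gs://` source is stored as a Cloud Storage URI and defaults to HTML only when it ends in `.html`, while inline content is always plain text unless `format:` says otherwise, so inline HTML is not auto-detected. A sketch that replays that branching on a hypothetical `SourceSketch` struct in place of the V1beta1 gRPC message (this is not the library's code):

```ruby
# Hypothetical stand-in for the V1beta1 Document message: plain Struct
# fields instead of a protobuf oneof.
SourceSketch = Struct.new(:content, :gcs_content_uri, :type, :language)

# Mirrors the decision logic of Document.from_source.
def sketch_from_source source, format: nil, language: nil
  source = String source
  doc = SourceSketch.new
  if source.start_with? "gs://"
    doc.gcs_content_uri = source
    # Cloud Storage objects ending in ".html" default to HTML.
    format ||= :html if source.end_with? ".html"
  else
    doc.content = source
  end
  doc.type = format.to_s == "html" ? :HTML : :PLAIN_TEXT
  doc.language = language.to_s unless language.nil?
  doc
end

p sketch_from_source("gs://bucket/page.html").type                 # => :HTML
p sketch_from_source("<p>Hi</p>").type                              # => :PLAIN_TEXT
p sketch_from_source("gs://bucket/notes.txt", format: :html).type   # => :HTML
```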

Instance Method Details

#annotate(sentiment: false, entities: false, syntax: false, encoding: nil) ⇒ Annotation

Also known as: mark, detect

Analyzes the document and returns sentiment, entity, and syntactic feature results, depending on the option flags. Calling `annotate` with no arguments performs all analysis features. Each feature is priced separately; see [Pricing](cloud.google.com/natural-language/pricing) for details.
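The flag rule (all flags `false` means every feature runs) can be stated as a small function. This sketches the documented behavior only; `requested_features` is a hypothetical helper, not part of the library, and the real selection happens inside the service call:

```ruby
# Hypothetical helper: which analyses annotate would run for a given set of
# flags, following the documented rule that all-false flags mean "run everything".
def requested_features sentiment: false, entities: false, syntax: false
  flags = { sentiment: sentiment, entities: entities, syntax: syntax }
  return flags.keys if flags.values.none?
  flags.select { |_feature, enabled| enabled }.keys
end

p requested_features                               # => [:sentiment, :entities, :syntax]
p requested_features(entities: true, syntax: true) # => [:entities, :syntax]
```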

Examples:

require "google/cloud/language"

language = Google::Cloud::Language.new

content = "Darth Vader is the best villain in Star Wars."
document = language.document content
annotation = document.annotate

annotation.sentiment.polarity #=> 1.0
annotation.sentiment.magnitude #=> 0.8999999761581421
annotation.entities.count #=> 2
annotation.sentences.count #=> 1
annotation.tokens.count #=> 10

With feature flags:

require "google/cloud/language"

language = Google::Cloud::Language.new

content = "Darth Vader is the best villain in Star Wars."
document = language.document content
annotation = document.annotate entities: true, syntax: true

annotation.sentiment #=> nil
annotation.entities.count #=> 2
annotation.sentences.count #=> 1
annotation.tokens.count #=> 10

Parameters:

  • sentiment (Boolean) (defaults to: false)

    Whether to perform sentiment analysis. Optional. The default is `false`. If every feature option is `false`, all features will be performed.

  • entities (Boolean) (defaults to: false)

    Whether to perform entity analysis. Optional. The default is `false`. If every feature option is `false`, all features will be performed.

  • syntax (Boolean) (defaults to: false)

    Whether to perform syntactic analysis. Optional. The default is `false`. If every feature option is `false`, all features will be performed.

  • encoding (String) (defaults to: nil)

    The encoding type used by the API to calculate offsets. Optional.
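The `encoding` choice matters once text contains non-ASCII characters, because the same position is counted differently per encoding. Assuming the API reports offsets in the units of the chosen encoding (UTF-8 bytes, UTF-16 code units, or UTF-32 code points), a standard-library sketch computing all three for one substring; `offsets_for` is a hypothetical helper, and the exact values accepted by `encoding` are not shown here:

```ruby
# Hypothetical helper: offset of +needle+ in +text+, measured in the units
# each Unicode encoding uses.
def offsets_for text, needle
  index = text.index(needle) or raise ArgumentError, "#{needle.inspect} not found"
  prefix = text[0, index]
  {
    utf8: prefix.bytesize,                         # UTF-8 bytes
    utf16: prefix.encode("UTF-16LE").bytesize / 2, # 16-bit code units
    utf32: prefix.length                           # code points
  }
end

# The emoji is outside the Basic Multilingual Plane, and "ñ" is two bytes in UTF-8.
offsets = offsets_for("😈 Señor Vader", "Vader")
offsets.each { |encoding, offset| puts "#{encoding}: #{offset}" }
# utf8: 12
# utf16: 9
# utf32: 8
```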

Returns:

  • (Annotation)

    The results of the content analysis.



# File 'lib/google/cloud/language/document.rb', line 216

def annotate sentiment: false, entities: false, syntax: false,
             encoding: nil
  ensure_service!
  grpc = service.annotate to_grpc, sentiment: sentiment,
                                   entities: entities,
                                   syntax: syntax,
                                   encoding: encoding
  Annotation.from_grpc grpc
end

#content? ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/google/cloud/language/document.rb', line 66

def content?
  @grpc.source == :content
end

#entities(encoding: nil) ⇒ Annotation::Entities

Entity analysis inspects the given text for known entities (proper nouns such as public figures, landmarks, etc.) and returns information about those entities.

Examples:

require "google/cloud/language"

language = Google::Cloud::Language.new

content = "Darth Vader is the best villain in Star Wars."
document = language.document content
entities = document.entities # API call

entities.count #=> 2
entities.first.name #=> "Darth Vader"
entities.first.type #=> :PERSON
entities.last.name #=> "Star Wars"
entities.last.type #=> :WORK_OF_ART

Parameters:

  • encoding (String) (defaults to: nil)

    The encoding type used by the API to calculate offsets. Optional.

Returns:

  • (Annotation::Entities)



# File 'lib/google/cloud/language/document.rb', line 277

def entities encoding: nil
  ensure_service!
  grpc = service.entities to_grpc, encoding: encoding
  Annotation::Entities.from_grpc grpc
end
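Each entity exposes at least `name` and `type`, as the example shows. A sketch of grouping entity names by type; it uses a hypothetical `EntitySketch` struct filled with the example's values instead of a live API call, and assumes the real `Annotation::Entities` result can be iterated like an array:

```ruby
# Hypothetical stand-in for the entity objects returned by document.entities,
# populated with the values from the example (no API call).
EntitySketch = Struct.new(:name, :type)

entities = [
  EntitySketch.new("Darth Vader", :PERSON),
  EntitySketch.new("Star Wars", :WORK_OF_ART)
]

# Group entity names under their entity type, preserving first-seen order.
names_by_type = entities.group_by(&:type).transform_values { |group| group.map(&:name) }

names_by_type.each { |type, names| puts "#{type}: #{names.join(", ")}" }
# PERSON: Darth Vader
# WORK_OF_ART: Star Wars
```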

#format ⇒ Symbol

The document’s format.

Returns:

  • (Symbol)

    `:text` or `:html`



# File 'lib/google/cloud/language/document.rb', line 90

def format
  return :text if text?
  return :html if html?
end

#format=(new_format) ⇒ Object

Sets the document’s format.

Examples:

document = language.document "<p>The Old Man and the Sea</p>"
document.format = :html

Parameters:

  • new_format (Symbol, String)

    Accepted values are `:text` or `:html`.



# File 'lib/google/cloud/language/document.rb', line 105

def format= new_format
  @grpc.type = :PLAIN_TEXT if new_format.to_s == "text"
  @grpc.type = :HTML       if new_format.to_s == "html"
  @grpc.type
end
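Reading `format=` closely: only `"text"` and `"html"` (as Symbol or String) change `@grpc.type`; any other value leaves the type as it was, with no error raised. A sketch that replays that branching on a hypothetical `FormatSketch` struct in place of the gRPC message:

```ruby
# Hypothetical stand-in for the gRPC message; only the +type+ field matters here.
FormatSketch = Struct.new(:type)

# Same branching as Document#format=.
def assign_format grpc, new_format
  grpc.type = :PLAIN_TEXT if new_format.to_s == "text"
  grpc.type = :HTML       if new_format.to_s == "html"
  grpc.type
end

grpc = FormatSketch.new(:PLAIN_TEXT)
assign_format grpc, "html" # returns :HTML
assign_format grpc, :pdf   # unrecognized value: type stays :HTML
p grpc.type                # => :HTML
```

Note also that Ruby setter syntax (`document.format = :html`) always evaluates to the argument, so the `@grpc.type` value returned by the method body is not what callers see.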

#html! ⇒ Object

Sets the document to the `HTML` format.



# File 'lib/google/cloud/language/document.rb', line 139

def html!
  @grpc.type = :HTML
end

#html? ⇒ Boolean

Whether the document is in the `HTML` format.

Returns:

  • (Boolean)


# File 'lib/google/cloud/language/document.rb', line 132

def html?
  @grpc.type == :HTML
end

#inspect ⇒ Object



# File 'lib/google/cloud/language/document.rb', line 311

def inspect
  "#<#{self.class.name} (" \
    "#{(content? ? "\"#{source[0, 16]}...\"" : source)}, " \
    "format: #{format.inspect}, language: #{language.inspect})>"
end

#language ⇒ String

The document’s language. ISO and BCP-47 language codes are supported.

Returns:

  • (String)


# File 'lib/google/cloud/language/document.rb', line 148

def language
  @grpc.language
end

#language=(new_language) ⇒ Object

Sets the document’s language.

Examples:

document = language.document "<p>El viejo y el mar</p>"
document.language = "es"

Parameters:

  • new_language (String, Symbol)

    ISO and BCP-47 language codes are accepted.



# File 'lib/google/cloud/language/document.rb', line 162

def language= new_language
  @grpc.language = new_language.to_s
end

#sentiment ⇒ Annotation::Sentiment

Sentiment analysis inspects the given text and identifies the prevailing emotional opinion within the text, especially to determine a writer’s attitude as positive, negative, or neutral. Currently, only English is supported for sentiment analysis.

Examples:

require "google/cloud/language"

language = Google::Cloud::Language.new

content = "Darth Vader is the best villain in Star Wars."
document = language.document content
sentiment = document.sentiment # API call

sentiment.polarity #=> 1.0
sentiment.magnitude #=> 0.8999999761581421

Returns:

  • (Annotation::Sentiment)



# File 'lib/google/cloud/language/document.rb', line 304

def sentiment
  ensure_service!
  grpc = service.sentiment to_grpc
  Annotation::Sentiment.from_grpc grpc
end
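To turn a polarity score into the positive, negative, or neutral attitude described above, one option is a symmetric cut-off around zero. A sketch under explicit assumptions: polarity is taken to range from -1.0 to 1.0, and the `attitude` helper and its 0.25 cut-off are hypothetical choices, not values defined by the API; magnitude (overall strength) is ignored here:

```ruby
# Hypothetical helper: map a polarity score to a coarse attitude label.
# The cut-off is an illustrative assumption, not an API constant.
def attitude polarity, cutoff: 0.25
  if polarity >= cutoff
    :positive
  elsif polarity <= -cutoff
    :negative
  else
    :neutral
  end
end

p attitude(1.0)  # => :positive (polarity from the example)
p attitude(-0.6) # => :negative
p attitude(0.1)  # => :neutral
```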

#source ⇒ Object



# File 'lib/google/cloud/language/document.rb', line 80

def source
  return @grpc.content if content?
  @grpc.gcs_content_uri
end

#syntax(encoding: nil) ⇒ Annotation

Syntactic analysis extracts linguistic information, breaking up the given text into a series of sentences and tokens (generally, word boundaries), providing further analysis on those tokens.

Examples:

require "google/cloud/language"

language = Google::Cloud::Language.new

document = language.document "Darth Vader is the best villain in Star Wars."

annotation = document.syntax
annotation.sentences.count #=> 1
annotation.tokens.count #=> 10

Parameters:

  • encoding (String) (defaults to: nil)

    The encoding type used by the API to calculate offsets. Optional.

Returns:

  • (Annotation)

    The results of the content analysis.



# File 'lib/google/cloud/language/document.rb', line 248

def syntax encoding: nil
  annotate syntax: true, encoding: encoding
end

#text! ⇒ Object

Sets the document to the plain-text (`PLAIN_TEXT`) format.



# File 'lib/google/cloud/language/document.rb', line 123

def text!
  @grpc.type = :PLAIN_TEXT
end

#text? ⇒ Boolean

Whether the document is in the plain-text (`PLAIN_TEXT`) format.

Returns:

  • (Boolean)


# File 'lib/google/cloud/language/document.rb', line 116

def text?
  @grpc.type == :PLAIN_TEXT
end

#to_grpc ⇒ Object



# File 'lib/google/cloud/language/document.rb', line 319

def to_grpc
  @grpc
end

#url? ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/google/cloud/language/document.rb', line 73

def url?
  @grpc.source == :gcs_content_uri
end