Class: Google::Cloud::Language::Document

Inherits:
Object
Defined in:
lib/google/cloud/language/document.rb

Overview

# Document

Represents a document for the Language service.

The Cloud Natural Language API supports UTF-8, UTF-16, and UTF-32 encodings. (Ruby uses UTF-8 natively, which is the default sent to the API, so unless you’re working with text processed on a different platform, you should not need to set the encoding type.)
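
The encoding matters because the `encoding` option is described as the type the API uses to calculate offsets. As a rough illustration, the sketch below uses only Ruby's built-in string methods (no API call) to show that the same text occupies a different number of code units in each encoding. The sample string is arbitrary, and the assumption that offsets are counted in these code units is mine, not something this page states.

```ruby
# Sample text with two accented letters and one emoji (outside the BMP).
# Escapes are used so the result does not depend on the source file encoding.
text = "na\u00EFve caf\u00E9 \u{1F600}"

# Number of code units the same text occupies in each encoding.
utf8_units  = text.encode("UTF-8").bytesize         # 1 unit = 1 byte
utf16_units = text.encode("UTF-16LE").bytesize / 2  # 1 unit = 2 bytes
utf32_units = text.encode("UTF-32LE").bytesize / 4  # 1 unit = 4 bytes

puts "UTF-8: #{utf8_units}, UTF-16: #{utf16_units}, UTF-32: #{utf32_units}"
# prints "UTF-8: 17, UTF-16: 13, UTF-32: 12"
```

If offsets returned by the API are later used to index into a Ruby string, a mismatch of this kind is the usual reason to set the encoding explicitly.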

Be aware that only English, Spanish, and Japanese content is supported, and that sentiment analysis supports only English text.

See Project#document.

Examples:

require "google/cloud"

gcloud = Google::Cloud.new
language = gcloud.language

content = "Darth Vader is the best villain in Star Wars."
document = language.document content
annotation = document.annotate

annotation.entities.count #=> 2
annotation.sentiment.polarity #=> 1.0
annotation.sentiment.magnitude #=> 0.8999999761581421
annotation.sentences.count #=> 1
annotation.tokens.count #=> 10


Constructor Details

#initialize ⇒ Document

Returns a new instance of Document.



# File 'lib/google/cloud/language/document.rb', line 59

def initialize
  @grpc = nil
  @service = nil
end

Instance Attribute Details

#service ⇒ Object



# File 'lib/google/cloud/language/document.rb', line 55

def service
  @service
end

Class Method Details

.from_grpc(grpc, service) ⇒ Object



# File 'lib/google/cloud/language/document.rb', line 331

def self.from_grpc grpc, service
  new.tap do |i|
    i.instance_variable_set :@grpc, grpc
    i.instance_variable_set :@service, service
  end
end

.from_source(source, service, format: nil, language: nil) ⇒ Object



# File 'lib/google/cloud/language/document.rb', line 340

def self.from_source source, service, format: nil, language: nil
  source = String source
  grpc = Google::Cloud::Language::V1beta1::Document.new
  if source.start_with? "gs://"
    grpc.gcs_content_uri = source
    format ||= :html if source.end_with? ".html"
  else
    grpc.content = source
  end
  if format.to_s == "html"
    grpc.type = :HTML
  else
    grpc.type = :PLAIN_TEXT
  end
  grpc.language = language.to_s unless language.nil?
  from_grpc grpc, service
end
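
The branching in `from_source` can be easier to follow in isolation. The sketch below is a hypothetical stand-in (`describe_source` is not part of the library) that mirrors the same checks without building a gRPC object; the bucket and file names are made up for illustration.

```ruby
# Hypothetical helper mirroring the decisions made in Document.from_source.
# It returns a plain Hash instead of a gRPC document.
def describe_source source, format: nil
  source = String source
  if source.start_with? "gs://"
    kind = :gcs_content_uri
    # Only Cloud Storage URIs get the ".html" extension check.
    format ||= :html if source.end_with? ".html"
  else
    kind = :content
  end
  # Anything other than "html" falls back to plain text.
  type = format.to_s == "html" ? :HTML : :PLAIN_TEXT
  { source: kind, type: type }
end

describe_source "gs://bucket/page.html"      #=> { source: :gcs_content_uri, type: :HTML }
describe_source "gs://bucket/notes.txt"      #=> { source: :gcs_content_uri, type: :PLAIN_TEXT }
describe_source "<p>Hello</p>"               #=> { source: :content, type: :PLAIN_TEXT }
describe_source "<p>Hello</p>", format: :html #=> { source: :content, type: :HTML }
```

Note the third call: inline content is never sniffed for markup, so HTML passed as a string is treated as plain text unless the format is given explicitly.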

Instance Method Details

#annotate(sentiment: false, entities: false, syntax: false, encoding: nil) ⇒ Annotation Also known as: mark, detect

Analyzes the document and returns sentiment, entity, and syntactic feature results, depending on the option flags. Calling `annotate` with no arguments will perform all analysis features. Each feature is priced separately. See [Pricing](cloud.google.com/natural-language/pricing) for details.

Examples:

require "google/cloud"

gcloud = Google::Cloud.new
language = gcloud.language

content = "Darth Vader is the best villain in Star Wars."
document = language.document content
annotation = document.annotate

annotation.sentiment.polarity #=> 1.0
annotation.sentiment.magnitude #=> 0.8999999761581421
annotation.entities.count #=> 2
annotation.sentences.count #=> 1
annotation.tokens.count #=> 10

With feature flags:

require "google/cloud"

gcloud = Google::Cloud.new
language = gcloud.language

content = "Darth Vader is the best villain in Star Wars."
document = language.document content
annotation = document.annotate entities: true, syntax: true

annotation.sentiment #=> nil
annotation.entities.count #=> 2
annotation.sentences.count #=> 1
annotation.tokens.count #=> 10

Parameters:

  • sentiment (Boolean) (defaults to: false)

    Whether to perform sentiment analysis. Optional. The default is `false`. If every feature option is `false`, all features will be performed.

  • entities (Boolean) (defaults to: false)

    Whether to perform the entity analysis. Optional. The default is `false`. If every feature option is `false`, all features will be performed.

  • syntax (Boolean) (defaults to: false)

    Whether to perform syntactic analysis. Optional. The default is `false`. If every feature option is `false`, all features will be performed.

  • encoding (String) (defaults to: nil)

    The encoding type used by the API to calculate offsets. Optional.

Returns:

  • (Annotation)

    The results of the content analysis.



# File 'lib/google/cloud/language/document.rb', line 219

def annotate sentiment: false, entities: false, syntax: false,
             encoding: nil
  ensure_service!
  grpc = service.annotate to_grpc, sentiment: sentiment,
                                   entities: entities,
                                   syntax: syntax,
                                   encoding: encoding
  Annotation.from_grpc grpc
end
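
The parameter descriptions state that when every feature option is `false`, all features are performed. The service code that applies this rule is not shown on this page, so the following is only a sketch of the documented behavior; `resolve_features` is a hypothetical helper, not a library method.

```ruby
# Hypothetical helper: resolves feature flags as the parameter docs describe.
# If no flag is set, every feature is enabled; otherwise flags pass through.
def resolve_features sentiment: false, entities: false, syntax: false
  flags = { sentiment: sentiment, entities: entities, syntax: syntax }
  return flags.transform_values { true } if flags.values.none?
  flags
end

resolve_features                 #=> { sentiment: true, entities: true, syntax: true }
resolve_features entities: true  #=> { sentiment: false, entities: true, syntax: false }
```

Because each feature is priced separately, passing only the flags you need avoids paying for the all-features default.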

#content? ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/google/cloud/language/document.rb', line 67

def content?
  @grpc.source == :content
end

#entities(encoding: nil) ⇒ Annotation::Entities

Entity analysis inspects the given text for known entities (proper nouns such as public figures, landmarks, etc.) and returns information about those entities.

Examples:

require "google/cloud"

gcloud = Google::Cloud.new
language = gcloud.language

content = "Darth Vader is the best villain in Star Wars."
document = language.document content
entities = document.entities # API call

entities.count #=> 2
entities.first.name #=> "Darth Vader"
entities.first.type #=> :PERSON
entities.last.name #=> "Star Wars"
entities.last.type #=> :WORK_OF_ART

Parameters:

  • encoding (String) (defaults to: nil)

    The encoding type used by the API to calculate offsets. Optional.

Returns:

  • (Annotation::Entities)



# File 'lib/google/cloud/language/document.rb', line 282

def entities encoding: nil
  ensure_service!
  grpc = service.entities to_grpc, encoding: encoding
  Annotation::Entities.from_grpc grpc
end

#format ⇒ Symbol

The document’s format.

Returns:

  • (Symbol)

    `:text` or `:html`



# File 'lib/google/cloud/language/document.rb', line 91

def format
  return :text if text?
  return :html if html?
end

#format=(new_format) ⇒ Object

Sets the document’s format.

Examples:

document = language.document "<p>The Old Man and the Sea</p>"
document.format = :html

Parameters:

  • new_format (Symbol, String)

    Accepted values are `:text` or `:html`.



# File 'lib/google/cloud/language/document.rb', line 106

def format= new_format
  @grpc.type = :PLAIN_TEXT if new_format.to_s == "text"
  @grpc.type = :HTML       if new_format.to_s == "html"
  @grpc.type
end
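
One consequence of the method body above is that a value other than `text` or `html` is silently ignored. The sketch below reproduces the same two conditionals against a stand-in object to show this; `FakeGrpc` and `FormatDemo` are hypothetical names used only for illustration, and `:markdown` is just an example of an unrecognized value.

```ruby
# Hypothetical stand-in for the gRPC document; only the type field is modeled.
FakeGrpc = Struct.new :type

class FormatDemo
  def initialize
    @grpc = FakeGrpc.new :PLAIN_TEXT
  end

  # Same body as Document#format= shown above.
  def format= new_format
    @grpc.type = :PLAIN_TEXT if new_format.to_s == "text"
    @grpc.type = :HTML       if new_format.to_s == "html"
    @grpc.type
  end

  def type
    @grpc.type
  end
end

demo = FormatDemo.new
demo.format = "html"       # strings and symbols are both accepted
after_html = demo.type     #=> :HTML
demo.format = :markdown    # not a recognized value, so nothing changes
after_unknown = demo.type  #=> :HTML
```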

#html! ⇒ Object

Sets the document to the `HTML` format.



# File 'lib/google/cloud/language/document.rb', line 140

def html!
  @grpc.type = :HTML
end

#html? ⇒ Boolean

Whether the document is in the `HTML` format.

Returns:

  • (Boolean)


# File 'lib/google/cloud/language/document.rb', line 133

def html?
  @grpc.type == :HTML
end

#inspect ⇒ Object



# File 'lib/google/cloud/language/document.rb', line 317

def inspect
  "#<#{self.class.name} (" \
    "#{(content? ? "\"#{source[0, 16]}...\"" : source)}, " \
    "format: #{format.inspect}, language: #{language.inspect})>"
end

#language ⇒ String

The document’s language. ISO and BCP-47 language codes are supported.

Returns:

  • (String)


# File 'lib/google/cloud/language/document.rb', line 149

def language
  @grpc.language
end

#language=(new_language) ⇒ Object

Sets the document’s language.

Examples:

document = language.document "<p>El viejo y el mar</p>"
document.language = "es"

Parameters:

  • new_language (String, Symbol)

    ISO and BCP-47 language codes are accepted.



# File 'lib/google/cloud/language/document.rb', line 163

def language= new_language
  @grpc.language = new_language.to_s
end

#sentiment ⇒ Annotation::Sentiment

Sentiment analysis inspects the given text and identifies the prevailing emotional opinion within the text, especially to determine a writer’s attitude as positive, negative, or neutral. Currently, only English is supported for sentiment analysis.

Examples:

require "google/cloud"

gcloud = Google::Cloud.new
language = gcloud.language

content = "Darth Vader is the best villain in Star Wars."
document = language.document content
sentiment = document.sentiment # API call

sentiment.polarity #=> 1.0
sentiment.magnitude #=> 0.8999999761581421

Returns:

  • (Annotation::Sentiment)



# File 'lib/google/cloud/language/document.rb', line 310

def sentiment
  ensure_service!
  grpc = service.sentiment to_grpc
  Annotation::Sentiment.from_grpc grpc
end

#source ⇒ Object



# File 'lib/google/cloud/language/document.rb', line 81

def source
  return @grpc.content if content?
  @grpc.gcs_content_uri
end

#syntax(encoding: nil) ⇒ Annotation

Syntactic analysis extracts linguistic information, breaking up the given text into a series of sentences and tokens (generally, word boundaries), providing further analysis on those tokens.

Examples:

require "google/cloud"

gcloud = Google::Cloud.new
language = gcloud.language

document = language.document "Hello world!"

annotation = document.syntax
annotation.sentences # the sentences found in the text
annotation.tokens    # the tokens found in the text

Parameters:

  • encoding (String) (defaults to: nil)

    The encoding type used by the API to calculate offsets. Optional.

Returns:

  • (Annotation)

    The results for the content analysis.



# File 'lib/google/cloud/language/document.rb', line 252

def syntax encoding: nil
  annotate syntax: true, encoding: encoding
end

#text! ⇒ Object

Sets the document to the `TEXT` format.



# File 'lib/google/cloud/language/document.rb', line 124

def text!
  @grpc.type = :PLAIN_TEXT
end

#text? ⇒ Boolean

Whether the document is in the `TEXT` format.

Returns:

  • (Boolean)


# File 'lib/google/cloud/language/document.rb', line 117

def text?
  @grpc.type == :PLAIN_TEXT
end

#to_grpc ⇒ Object



# File 'lib/google/cloud/language/document.rb', line 325

def to_grpc
  @grpc
end

#url? ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/google/cloud/language/document.rb', line 74

def url?
  @grpc.source == :gcs_content_uri
end