Class: Spacy::Language
- Inherits: Object (hierarchy: Object > Spacy::Language)
- Defined in: lib/ruby-spacy.rb
Overview
See also the spaCy Python API documentation for [`Language`](https://spacy.io/api/language).
Instance Attribute Summary
-
#py_nlp ⇒ Object
readonly
A Python `Language` instance accessible via `PyCall`.
-
#spacy_nlp_id ⇒ String
readonly
An identifier string that can be used to refer to the Python `Language` object inside `PyCall::exec` or `PyCall::eval`.
Instance Method Summary
-
#get_lexeme(text) ⇒ Object
A utility method to get a Python `Lexeme` object.
-
#initialize(model = "en_core_web_sm", max_retrial: MAX_RETRIAL, retrial: 0) ⇒ Language
constructor
Creates a language model instance, which is conventionally referred to by a variable named `nlp`.
-
#matcher ⇒ Matcher
Generates a matcher for the current language model.
-
#method_missing(name, *args) ⇒ Object
Methods defined in Python but not wrapped in ruby-spacy can be called by this dynamic method handling mechanism.…
-
#most_similar(vector, num) ⇒ Array<Hash{:key => Integer, :text => String, :best_row => Array<Float>, :score => Float}>
Returns the `num` lexemes whose vector representations are most similar to the given word vector.
-
#pipe(texts, disable: [], batch_size: 50) ⇒ Array<Doc>
Utility function to batch process many texts.
-
#pipe_names ⇒ Array<String>
A utility method to list pipeline components.
-
#read(text) ⇒ Object
Reads and analyzes the given text.
- #respond_to_missing?(sym) ⇒ Boolean
-
#vocab(text) ⇒ Lexeme
Returns a Ruby lexeme object.
-
#vocab_string_lookup(id) ⇒ Object
A utility method to look up the vocabulary item with the given id.
Constructor Details
#initialize(model = "en_core_web_sm", max_retrial: MAX_RETRIAL, retrial: 0) ⇒ Language
Creates a language model instance, which is conventionally referred to by a variable named `nlp`.

    # File 'lib/ruby-spacy.rb', line 358
    def initialize(model = "en_core_web_sm", max_retrial: MAX_RETRIAL, retrial: 0)
      @spacy_nlp_id = "nlp_#{model.object_id}"
      PyCall.exec("import spacy; #{@spacy_nlp_id} = spacy.load('#{model}')")
      @py_nlp = PyCall.eval(@spacy_nlp_id)
    rescue StandardError
      retrial += 1
      raise "Error: Pycall failed to load Spacy" unless retrial <= max_retrial

      sleep 0.5
      initialize(model, max_retrial: max_retrial, retrial: retrial)
    end
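The constructor retries a failed load by calling itself again after a 0.5 second pause, until `retrial` exceeds `max_retrial`. The same bounded-retry idea can be sketched as a loop in plain Ruby. This is only an illustration: `with_retries`, its `wait` parameter, and the default of 5 are hypothetical (the value of `MAX_RETRIAL` is not shown in this section), and the block stands in for the PyCall load.

```ruby
# Hypothetical helper (not part of ruby-spacy): run the block and, on
# StandardError, retry up to max_retrial more times with a short pause.
def with_retries(max_retrial: 5, wait: 0.5)
  retrial = 0
  begin
    yield
  rescue StandardError
    retrial += 1
    raise "Error: gave up after #{max_retrial} retries" unless retrial <= max_retrial

    sleep wait
    retry
  end
end

# Stand-in for a flaky load: fails twice, then succeeds.
attempts = 0
result = with_retries(max_retrial: 3, wait: 0) do
  attempts += 1
  raise IOError, "not ready" if attempts < 3

  :loaded
end
puts "#{result} after #{attempts} attempts"
```

Using `retry` keeps the call stack flat, whereas the recursive form adds one frame per attempt; with a small retry limit either approach behaves the same.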
Dynamic Method Handling
This class handles dynamic methods through the `method_missing` method.
#method_missing(name, *args) ⇒ Object
Methods defined in Python but not wrapped in ruby-spacy can be called by this dynamic method handling mechanism.…

    # File 'lib/ruby-spacy.rb', line 453
    def method_missing(name, *args)
      @py_nlp.send(name, *args)
    end
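The forwarding above can be tried without Python by substituting a plain Ruby object for `@py_nlp`. In this sketch `FakePyNlp` and `ForwardingWrapper` are hypothetical stand-ins, not ruby-spacy classes; `pipe_names` and `has_pipe` are borrowed from spaCy's `Language` API purely as familiar method names.

```ruby
# Stand-in for the wrapped Python object; in ruby-spacy this role is played
# by the PyCall-backed Language instance held in @py_nlp.
class FakePyNlp
  def pipe_names
    %w[tok2vec tagger parser]
  end

  def has_pipe(name)
    pipe_names.include?(name)
  end
end

# Hypothetical wrapper that mirrors the forwarding shown above.
class ForwardingWrapper
  def initialize(target)
    @target = target
  end

  # An explicitly wrapped method.
  def pipe_count
    @target.pipe_names.size
  end

  # Anything not defined on the wrapper is sent to the wrapped object.
  def method_missing(name, *args)
    @target.send(name, *args)
  end
end

nlp = ForwardingWrapper.new(FakePyNlp.new)
puts nlp.pipe_count          # handled by the wrapper itself
puts nlp.has_pipe("tagger")  # forwarded through method_missing
```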
Instance Attribute Details
#py_nlp ⇒ Object (readonly)
Returns a Python `Language` instance accessible via `PyCall`.

    # File 'lib/ruby-spacy.rb', line 354
    def py_nlp
      @py_nlp
    end
#spacy_nlp_id ⇒ String (readonly)
Returns an identifier string that can be used to refer to the Python `Language` object inside `PyCall::exec` or `PyCall::eval`.

    # File 'lib/ruby-spacy.rb', line 351
    def spacy_nlp_id
      @spacy_nlp_id
    end
Instance Method Details
#get_lexeme(text) ⇒ Object
A utility method to get a Python `Lexeme` object.

    # File 'lib/ruby-spacy.rb', line 402
    def get_lexeme(text)
      @py_nlp.vocab[text]
    end
#matcher ⇒ Matcher
Generates a matcher for the current language model.

    # File 'lib/ruby-spacy.rb', line 378
    def matcher
      Matcher.new(@py_nlp)
    end
#most_similar(vector, num) ⇒ Array<Hash{:key => Integer, :text => String, :best_row => Array<Float>, :score => Float}>
Returns the `num` lexemes whose vector representations are most similar to the given word vector.

    # File 'lib/ruby-spacy.rb', line 416
    def most_similar(vector, num)
      vec_array = Numpy.asarray([vector])
      py_result = @py_nlp.vocab.vectors.most_similar(vec_array, n: num)
      key_texts = PyCall.eval("[[str(num), #{@spacy_nlp_id}.vocab[num].text] for num in #{py_result[0][0].tolist}]")
      keys = key_texts.map { |kt| kt[0] }
      texts = key_texts.map { |kt| kt[1] }
      best_rows = PyCall::List.call(py_result[1])[0]
      scores = PyCall::List.call(py_result[2])[0]

      results = []
      num.times do |i|
        result = { key: keys[i].to_i,
                   text: texts[i],
                   best_row: best_rows[i],
                   score: scores[i] }
        result.each_key do |key|
          result.define_singleton_method(key) { result[key] }
        end
        results << result
      end
      results
    end
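Each element of the returned array is a plain Hash that also gets one singleton reader method per key, so a value can be read either as `result[:text]` or as `result.text`. The sketch below reproduces only that step in plain Ruby; `build_result` is a hypothetical helper and the values are made up for illustration.

```ruby
# Hypothetical helper: build one result hash in the same shape as above and
# define a reader method on that hash object for each of its keys.
def build_result(key:, text:, best_row:, score:)
  result = { key: key, text: text, best_row: best_row, score: score }
  result.each_key do |k|
    result.define_singleton_method(k) { result[k] }
  end
  result
end

# Made-up values, for illustration only.
r = build_result(key: 42, text: "tokyo", best_row: 7, score: 0.83)
puts r[:text]  # hash-style access
puts r.text    # method-style access to the same value
puts r.key     # the singleton method shadows Hash#key on this object
```

One consequence of this design is that the singleton `key` method replaces the built-in `Hash#key(value)` on these particular hashes, so reverse lookups by value are no longer available through that name.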
#pipe(texts, disable: [], batch_size: 50) ⇒ Array<Doc>
Utility function to batch process many texts.

    # File 'lib/ruby-spacy.rb', line 444
    def pipe(texts, disable: [], batch_size: 50)
      docs = []
      PyCall::List.call(@py_nlp.pipe(texts, disable: disable, batch_size: batch_size)).each do |py_doc|
        docs << Doc.new(@py_nlp, py_doc: py_doc)
      end
      docs
    end
#pipe_names ⇒ Array<String>
A utility method to list pipeline components.

    # File 'lib/ruby-spacy.rb', line 391
    def pipe_names
      pipe_array = []
      PyCall::List.call(@py_nlp.pipe_names).each do |pipe|
        pipe_array << pipe
      end
      pipe_array
    end
#read(text) ⇒ Object
Reads and analyzes the given text.

    # File 'lib/ruby-spacy.rb', line 372
    def read(text)
      Doc.new(py_nlp, text: text)
    end
#respond_to_missing?(sym) ⇒ Boolean

    # File 'lib/ruby-spacy.rb', line 457
    def respond_to_missing?(sym)
      sym ? true : super
    end
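As listed, `respond_to_missing?` answers true for whatever name it is given, so it cannot tell names the wrapped object supports from names it does not. Ruby's documented signature for this hook is `respond_to_missing?(symbol, include_all)`. A narrower variant, sketched here in plain Ruby, asks the wrapped object instead. `FakeLanguage` and `StrictWrapper` are hypothetical stand-ins, and this is an alternative design rather than what ruby-spacy does.

```ruby
# Stand-in for the wrapped Python object (hypothetical).
class FakeLanguage
  def pipe_names
    %w[tagger parser]
  end
end

# Hypothetical wrapper that only claims to respond to what the target supports.
class StrictWrapper
  def initialize(target)
    @target = target
  end

  # Forward known names; let unknown ones raise NoMethodError as usual.
  def method_missing(name, *args)
    return super unless @target.respond_to?(name)

    @target.send(name, *args)
  end

  # Ruby passes the method name and an include-private flag to this hook.
  def respond_to_missing?(name, include_private = false)
    @target.respond_to?(name, include_private) || super
  end
end

nlp = StrictWrapper.new(FakeLanguage.new)
puts nlp.respond_to?(:pipe_names)    # a name the target supports
puts nlp.respond_to?(:no_such_name)  # a name it does not
```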
#vocab(text) ⇒ Lexeme
Returns a Ruby lexeme object.

    # File 'lib/ruby-spacy.rb', line 409
    def vocab(text)
      Lexeme.new(@py_nlp.vocab[text])
    end
#vocab_string_lookup(id) ⇒ Object
A utility method to look up the vocabulary item with the given id.

    # File 'lib/ruby-spacy.rb', line 385
    def vocab_string_lookup(id)
      PyCall.eval("#{@spacy_nlp_id}.vocab.strings[#{id}]")
    end
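Because `id` is interpolated directly into a Python expression, the form of the argument matters: an Integer yields a valid index expression, while a Ruby String is inserted without quotes. The sketch below only builds the expression string so the difference is visible without Python; `lookup_expression` is a hypothetical helper and `"nlp_60"` is a made-up identifier.

```ruby
# Hypothetical helper: returns the Python expression that would be evaluated,
# built the same way as in the method above.
def lookup_expression(spacy_nlp_id, id)
  "#{spacy_nlp_id}.vocab.strings[#{id}]"
end

# Integer id: a valid Python index expression.
puts lookup_expression("nlp_60", 42)
# => nlp_60.vocab.strings[42]

# String id: interpolated without quotes, so Python would read it as a name.
puts lookup_expression("nlp_60", "apple")
# => nlp_60.vocab.strings[apple]

# Quoting explicitly; String#inspect is used here only for simple ASCII text
# and is not a general-purpose Python escaper.
puts lookup_expression("nlp_60", "apple".inspect)
# => nlp_60.vocab.strings["apple"]
```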