Class: Spacy::Language

Inherits:
Object
Defined in:
lib/ruby-spacy.rb

Overview

See also the spaCy Python API documentation for [`Language`](spacy.io/api/language).

Constructor Details

#initialize(model = "en_core_web_sm", max_retrial: MAX_RETRIAL, retrial: 0) ⇒ Language

Creates a language model instance, which is conventionally referred to by a variable named `nlp`.

Parameters:

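  • model (String) (defaults to: "en_core_web_sm")

    A language model installed in the system

  • max_retrial (Integer) (defaults to: MAX_RETRIAL)

    The maximum number of times loading is retried after a failure

  • retrial (Integer) (defaults to: 0)

    The number of retries already made; used internally by the retry logic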



# File 'lib/ruby-spacy.rb', line 358

def initialize(model = "en_core_web_sm", max_retrial: MAX_RETRIAL, retrial: 0)
  @spacy_nlp_id = "nlp_#{model.object_id}"
  PyCall.exec("import spacy; #{@spacy_nlp_id} = spacy.load('#{model}')")
  @py_nlp = PyCall.eval(@spacy_nlp_id)
rescue StandardError
  retrial += 1
  raise "Error: Pycall failed to load Spacy" unless retrial <= max_retrial

  sleep 0.5
  initialize(model, max_retrial: max_retrial, retrial: retrial)
end
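The constructor above retries by calling `initialize` again from its `rescue` clause. As a rough illustration of the same rescue-and-retry idea, here is a minimal sketch that uses Ruby's `retry` keyword instead of recursion. The helper `load_with_retry`, its default limit, and the failing block are all hypothetical stand-ins and are not part of ruby-spacy.

```ruby
# Hypothetical helper, not part of ruby-spacy: runs the block and retries
# it up to max_retrial times when a StandardError is raised.
def load_with_retry(max_retrial: 5, wait: 0.0)
  retrial = 0
  begin
    yield
  rescue StandardError
    retrial += 1
    # Give up once the retry budget is used; this error leaves the method.
    raise "Error: failed to load after #{max_retrial} retries" if retrial > max_retrial

    sleep wait
    retry # run the begin block again
  end
end

# Stand-in for a loader that fails twice before succeeding.
attempts = 0
result = load_with_retry(max_retrial: 3) do
  attempts += 1
  raise IOError, "not ready" if attempts < 3

  "loaded on attempt #{attempts}"
end

puts result
```

The loop form keeps the retry counter out of the public signature, which is one reason it is sometimes preferred over a recursive call.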

Dynamic Method Handling

This class handles dynamic methods through the `method_missing` method.

#method_missing(name, *args) ⇒ Object

Methods defined in Python but not wrapped in ruby-spacy can be called by this dynamic method handling mechanism.…



# File 'lib/ruby-spacy.rb', line 453

def method_missing(name, *args)
  @py_nlp.send(name, *args)
end
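To show how this kind of forwarding behaves without requiring Python, the following sketch wraps a plain Ruby object in the same way. `FakePyObject` and `Wrapper` are hypothetical names invented for this example; the real class forwards to the object returned by `PyCall`.

```ruby
# Stand-in for the Python object; hypothetical, for illustration only.
class FakePyObject
  def pipe_names
    %w[tok2vec tagger parser]
  end
end

# Hypothetical wrapper that forwards unknown methods to the wrapped object.
class Wrapper
  def initialize(target)
    @target = target
  end

  # Forward the call when the target can handle it; otherwise fall back
  # to the default behavior, which raises NoMethodError.
  def method_missing(name, *args, &block)
    return super unless @target.respond_to?(name)

    @target.public_send(name, *args, &block)
  end

  # Keep respond_to? consistent with what method_missing accepts.
  def respond_to_missing?(name, include_private = false)
    @target.respond_to?(name) || super
  end
end

wrapper = Wrapper.new(FakePyObject.new)
p wrapper.pipe_names
```

Calling a method that the wrapped object does not define raises `NoMethodError` in this sketch, because the call is handed back to `super`.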

Instance Attribute Details

#py_nlp ⇒ Object (readonly)

Returns a Python `Language` instance accessible via `PyCall`.

Returns:

  • (Object)

    a Python `Language` instance accessible via `PyCall`



# File 'lib/ruby-spacy.rb', line 354

def py_nlp
  @py_nlp
end

#spacy_nlp_id ⇒ String (readonly)

Returns an identifier string that can be used to refer to the Python `Language` object inside `PyCall::exec` or `PyCall::eval`.

Returns:

  • (String)

    an identifier string that can be used to refer to the Python `Language` object inside `PyCall::exec` or `PyCall::eval`



# File 'lib/ruby-spacy.rb', line 351

def spacy_nlp_id
  @spacy_nlp_id
end
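As a rough illustration of how such an identifier ends up inside Python source, the sketch below only builds the strings that would be passed to `PyCall.exec` or `PyCall.eval`; it does not call PyCall, so it runs without Python. The naming scheme follows the constructor listing on this page, and the vocabulary id is a made-up value.

```ruby
# Build the same kind of identifier the constructor listing creates.
model = "en_core_web_sm"
spacy_nlp_id = "nlp_#{model.object_id}"

# Hypothetical vocabulary id, used only for illustration.
id = 12345

# Python source that could be handed to PyCall.exec and PyCall.eval.
load_source = "import spacy; #{spacy_nlp_id} = spacy.load('#{model}')"
lookup_source = "#{spacy_nlp_id}.vocab.strings[#{id}]"

puts load_source
puts lookup_source
```

Because the identifier is interpolated into Python code as text, it has to be a valid Python name, which the `nlp_` prefix followed by digits satisfies.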

Instance Method Details

#get_lexeme(text) ⇒ Object

A utility method to get a Python `Lexeme` object.

Parameters:

  • text (String)

    A text string representing a lexeme

Returns:

  • (Object)

    a Python `Lexeme` object



# File 'lib/ruby-spacy.rb', line 402

def get_lexeme(text)
  @py_nlp.vocab[text]
end

#matcher ⇒ Matcher

Generates a matcher for the current language model.

Returns:

  • (Matcher)



# File 'lib/ruby-spacy.rb', line 378

def matcher
  Matcher.new(@py_nlp)
end

#most_similar(vector, num) ⇒ Array<Hash{:key => Integer, :text => String, :best_rows => Array<Float>, :score => Float}>

Returns the `num` lexemes whose vector representations are most similar to the given vector representation of a word.

Parameters:

  • vector (Object)

    A vector representation of a word (whether existing or non-existing)

  • num (Integer)

    The number of lexemes to return

Returns:

  • (Array<Hash{:key => Integer, :text => String, :best_rows => Array<Float>, :score => Float}>)

    An array of hashes, each containing the `key`, `text`, `best_row`, and similarity `score` of a lexeme



# File 'lib/ruby-spacy.rb', line 416

def most_similar(vector, num)
  vec_array = Numpy.asarray([vector])
  py_result = @py_nlp.vocab.vectors.most_similar(vec_array, n: num)
  key_texts = PyCall.eval("[[str(num), #{@spacy_nlp_id}.vocab[num].text] for num in #{py_result[0][0].tolist}]")
  keys = key_texts.map { |kt| kt[0] }
  texts = key_texts.map { |kt| kt[1] }
  best_rows = PyCall::List.call(py_result[1])[0]
  scores = PyCall::List.call(py_result[2])[0]

  results = []
  num.times do |i|
    result = { key: keys[i].to_i,
               text: texts[i],
               best_row: best_rows[i],
               score: scores[i] }
    result.each_key do |key|
      result.define_singleton_method(key) { result[key] }
    end
    results << result
  end
  results
end
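The listing above attaches a reader method to each result hash for every key, so a value can be read either with `result[:text]` or with `result.text`. The following sketch isolates that step with made-up values; it does not touch spaCy or PyCall.

```ruby
# Made-up values standing in for one entry of the results array.
result = { key: 42, text: "apple", score: 0.87 }

# Define one singleton reader per key, as the listing does.
result.each_key do |k|
  result.define_singleton_method(k) { result[k] }
end

puts result.text    # method-style access
puts result[:score] # ordinary hash access still works
```

Note that the reader named `key` replaces the built-in `Hash#key` method on that particular hash object only; other hashes are unaffected.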

#pipe(texts, disable: [], batch_size: 50) ⇒ Array<Doc>

A utility method to batch process many texts.

Parameters:

  • texts (Array<String>)
  • disable (Array<String>) (defaults to: [])
  • batch_size (Integer) (defaults to: 50)

Returns:

  • (Array<Doc>)



# File 'lib/ruby-spacy.rb', line 444

def pipe(texts, disable: [], batch_size: 50)
  docs = []
  PyCall::List.call(@py_nlp.pipe(texts, disable: disable, batch_size: batch_size)).each do |py_doc|
    docs << Doc.new(@py_nlp, py_doc: py_doc)
  end
  docs
end

#pipe_names ⇒ Array<String>

A utility method to list pipeline components.

Returns:

  • (Array<String>)

    An array of text strings representing pipeline components



# File 'lib/ruby-spacy.rb', line 391

def pipe_names
  pipe_array = []
  PyCall::List.call(@py_nlp.pipe_names).each do |pipe|
    pipe_array << pipe
  end
  pipe_array
end

#read(text) ⇒ Object

Reads and analyzes the given text.

Parameters:

  • text (String)

    a text to be read and analyzed



# File 'lib/ruby-spacy.rb', line 372

def read(text)
  Doc.new(py_nlp, text: text)
end

#respond_to_missing?(sym) ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/ruby-spacy.rb', line 457

def respond_to_missing?(sym)
  sym ? true : super
end
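In the listing above, the expression `sym ? true : super` evaluates to `true` for any Symbol, since every Symbol is truthy. For comparison, here is a minimal sketch of the more common idiom, in which `respond_to_missing?` takes the two arguments Ruby documents for it and asks the wrapped object. `Target` and `Delegator` are hypothetical classes written for this example and are not part of ruby-spacy.

```ruby
# Hypothetical wrapped object with a single public method.
class Target
  def greet
    "hello"
  end
end

# Hypothetical delegating wrapper, for illustration only.
class Delegator
  def initialize(target)
    @target = target
  end

  def method_missing(name, *args, &block)
    return super unless @target.respond_to?(name)

    @target.public_send(name, *args, &block)
  end

  # Answer true only for names the wrapped object actually handles.
  def respond_to_missing?(name, include_private = false)
    @target.respond_to?(name) || super
  end
end

delegator = Delegator.new(Target.new)
p delegator.respond_to?(:greet)
p delegator.respond_to?(:unknown_name)
```

With this variant, `respond_to?` and `method_missing` agree on which names are supported.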

#vocab(text) ⇒ Lexeme

Returns a Ruby `Lexeme` object.

Parameters:

  • text (String)

    a text string representing the vocabulary item

Returns:

  • (Lexeme)



# File 'lib/ruby-spacy.rb', line 409

def vocab(text)
  Lexeme.new(@py_nlp.vocab[text])
end

#vocab_string_lookup(id) ⇒ Object

A utility method to look up the vocabulary item with the given id.

Parameters:

  • id (Integer)

    a vocabulary id

Returns:

  • (Object)

    the value of `vocab.strings[id]` as evaluated on the Python side



# File 'lib/ruby-spacy.rb', line 385

def vocab_string_lookup(id)
  PyCall.eval("#{@spacy_nlp_id}.vocab.strings[#{id}]")
end