Class: Myasorubka::MSD

Inherits:
Object
Defined in:
lib/myasorubka/msd.rb

Overview

MSD is a morphosyntactic descriptor model.

This representation, with the concrete applications which display and exemplify the attributes and values and provide their internal constraints and relationships, makes the proposal self-explanatory. Other groups can easily test the specifications on their language, simply by following the method of the applications. The possibility of incorporating idiosyncratic classes and distinctions after the common core features makes the proposal relatively adaptable and flexible, without compromising compatibility.

MSD implementation and documentation are based on MULTEXT-East Morphosyntactic Specifications, Version 4: nl.ijs.si/ME/V4/msd/html/msd.html

You may use Myasorubka::MSD either as a generator or as a parser.

```ruby
msd = Myasorubka::MSD.new(Myasorubka::MSD::Russian)
msd[:pos] = :noun
msd[:type] = :common
msd[:number] = :plural
msd[:case] = :locative
msd.to_s # => "Nc-pl"
```

```ruby
msd = Myasorubka::MSD.new(Myasorubka::MSD::Russian, 'Vmps-snpfel')
msd[:pos] # => :verb
msd[:tense] # => :past
msd[:person] # => nil
msd.grammemes # => {:vform=>:participle, …}
```

Defined Under Namespace

Modules: English, Russian

Classes: InvalidDescriptor

Constant Summary

EMPTY_DESCRIPTOR = '-'

Empty descriptor character.

Constructor Details

#initialize(language, msd = '') ⇒ MSD

Creates a new morphosyntactic descriptor model instance. The `language` module must define `CATEGORIES`; otherwise an ArgumentError is raised.

Optionally, an MSD string passed as the `msd` argument is parsed into the new instance.

Parameters:

  • language (Myasorubka::MSD::Language)

    a language to use.

  • msd (String) (defaults to: '')

    an MSD string to parse into the new instance.



# File 'lib/myasorubka/msd.rb', line 63

def initialize(language, msd = '')
  @language, @pos, @grammemes = language, nil, {}

  unless language.const_defined? 'CATEGORIES'
    raise ArgumentError,
      'given language has no morphosyntactic descriptions'
  end

  parse! msd if msd && !msd.empty?
end
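
The guard in the constructor only checks that the language module defines a `CATEGORIES` constant. The following is a minimal, self-contained sketch of that check. `WithCategories`, `WithoutCategories` and `ensure_language!` are hypothetical names used for illustration and are not part of the library.

```ruby
# Hypothetical language modules, for illustration only.
module WithCategories
  CATEGORIES = {}.freeze
end

module WithoutCategories
end

# Standalone helper that mirrors the constructor's guard.
def ensure_language!(language)
  unless language.const_defined?('CATEGORIES')
    raise ArgumentError, 'given language has no morphosyntactic descriptions'
  end

  language
end

ensure_language!(WithCategories) # returns WithCategories

begin
  ensure_language!(WithoutCategories)
rescue ArgumentError => e
  puts e.message
end
```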

Instance Attribute Details

#grammemes ⇒ Object (readonly)

Returns the value of attribute grammemes.



# File 'lib/myasorubka/msd.rb', line 50

def grammemes
  @grammemes
end

#language ⇒ Object (readonly)

Returns the value of attribute language.



# File 'lib/myasorubka/msd.rb', line 50

def language
  @language
end

#pos ⇒ Object

Returns the value of attribute pos.



# File 'lib/myasorubka/msd.rb', line 51

def pos
  @pos
end

Instance Method Details

#<=>(other) ⇒ Object



# File 'lib/myasorubka/msd.rb', line 104

def <=> other
  to_s <=> other.to_s
end

#==(other) ⇒ Object



# File 'lib/myasorubka/msd.rb', line 109

def == other
  to_s == other.to_s
end
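
Both `<=>` and `==` compare string forms, so the right-hand operand only needs to respond to `to_s`. The sketch below uses a hypothetical stand-in class, `Descriptor`, that copies just these two methods. It is not the real `Myasorubka::MSD` and only illustrates the consequences of string-based comparison.

```ruby
# Hypothetical stand-in that copies only the two comparison methods shown above.
class Descriptor
  def initialize(code)
    @code = code
  end

  def to_s
    @code
  end

  def <=>(other)
    to_s <=> other.to_s
  end

  def ==(other)
    to_s == other.to_s
  end
end

noun = Descriptor.new('Nc-pl')

noun == Descriptor.new('Nc-pl') # true: same string form
noun == 'Nc-pl'                 # true: the right-hand side only needs #to_s
'Nc-pl' == noun                 # false: String#== does not call #to_s on its argument

# Array#sort relies on <=>, so descriptors sort by their string form.
sorted = [Descriptor.new('Vmp'), noun, Descriptor.new('Afp')].sort
sorted.map(&:to_s)              # ["Afp", "Nc-pl", "Vmp"]
```

Note the asymmetry: comparing a plain String on the left against a descriptor on the right does not go through the descriptor's `==`.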

#[](key) ⇒ Symbol

Retrieves the value stored under `key`: the part of speech for `:pos`, otherwise the grammeme with that name. Returns `nil` if no value is set.

Parameters:

  • key (Symbol)

    a key to look at.

Returns:

  • (Symbol)

    the value of `key`.



# File 'lib/myasorubka/msd.rb', line 80

def [] key
  return pos if :pos == key
  grammemes[key]
end

#[]=(key, value) ⇒ Symbol

Assigns `value` to the attribute named by `key`. The `:pos` key sets the part of speech; any other key sets a grammeme and requires `:pos` to be set first.

Parameters:

  • key (Symbol)

    a key to be set.

  • value (Symbol)

    a value to be assigned.

Returns:

  • (Symbol)

    the assigned value.

Raises:

  • (InvalidDescriptor)

    if a grammeme is assigned before `:pos` has been set.



# File 'lib/myasorubka/msd.rb', line 92

def []= key, value
  return @pos = value if :pos == key
  raise InvalidDescriptor, 'category is not set yet' unless pos
  grammemes[key] = value
end

#inspect ⇒ Object



# File 'lib/myasorubka/msd.rb', line 99

def inspect
  '#<%s msd=%s>' % [language.name, to_s.inspect]
end

#merge!(hash) ⇒ MSD

Merges grammemes that are stored in `hash` into the MSD grammemes.

Parameters:

  • hash (Hash<Symbol, Symbol>)

    a hash to be processed.

Returns:

  • (MSD)

    self.



# File 'lib/myasorubka/msd.rb', line 140

def merge! hash
  hash.each do |key, value|
    self[key.to_sym] = value.to_sym
  end

  self
end
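
Since `merge!` delegates every pair to `[]=`, and Ruby hashes iterate in insertion order, the listing suggests that `:pos` has to come before any grammeme in the hash. The sketch below reproduces only `[]=` and `merge!` in a hypothetical stand-in class, `MiniMSD`, to show this. It is an illustration under that assumption, not the library class.

```ruby
# Hypothetical stand-in reproducing only #[]= and #merge! from the listings above.
class MiniMSD
  class InvalidDescriptor < StandardError; end

  attr_reader :pos, :grammemes

  def initialize
    @pos = nil
    @grammemes = {}
  end

  def []=(key, value)
    return @pos = value if :pos == key
    raise InvalidDescriptor, 'category is not set yet' unless pos
    grammemes[key] = value
  end

  def merge!(hash)
    # Keys and values are symbolized, so string input is accepted too.
    hash.each { |key, value| self[key.to_sym] = value.to_sym }
    self
  end
end

ok = MiniMSD.new.merge!('pos' => 'noun', 'number' => 'plural')
ok.pos       # :noun
ok.grammemes # contains :number => :plural

begin
  # :number is visited before :pos, so the assignment is rejected.
  MiniMSD.new.merge!(number: :plural, pos: :noun)
rescue MiniMSD::InvalidDescriptor => e
  puts e.message
end
```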

#to_regexp ⇒ Regexp

Generates a Regexp from the MSD, which is useful for database queries.

```ruby
msd = Myasorubka::MSD.new(Myasorubka::MSD::Russian, 'Vm')
r = msd.to_regexp # => /^Vm.*$/
'Vmp' =~ r # 0
'Nc-pl' =~ r # nil
```

Returns:

  • (Regexp)

    the corresponding regular expression.



# File 'lib/myasorubka/msd.rb', line 125

def to_regexp
  Regexp.new([
    '^',
    self.to_s.gsub(EMPTY_DESCRIPTOR, '.'),
    '.*',
    '$'
  ].join)
end
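
The pattern is built by turning every empty-descriptor character into a single-character wildcard and allowing any suffix. The sketch below applies the same construction to a plain string. `msd_pattern` is a hypothetical helper name, and the sample codes are made up for the demonstration.

```ruby
# Standalone helper mirroring the pattern construction in #to_regexp.
# The `empty` default matches the documented EMPTY_DESCRIPTOR value.
def msd_pattern(msd_string, empty = '-')
  Regexp.new(['^', msd_string.gsub(empty, '.'), '.*', '$'].join)
end

pattern = msd_pattern('Nc-pl')

pattern.source            # "^Nc.pl.*$"
pattern.match?('Ncmpl')   # true: the unset position matches any character
pattern.match?('Ncfplny') # true: trailing attributes are allowed
pattern.match?('Ncmsn')   # false: number and case differ
```

An unset position therefore acts as "any value", and a short descriptor matches every longer descriptor that starts with the same codes.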

#to_s ⇒ Object



# File 'lib/myasorubka/msd.rb', line 149

def to_s
  return '' unless pos

  unless category = language::CATEGORIES[pos]
    raise InvalidDescriptor, "category is nil"
  end

  msd = [category[:code]]

  attrs = category[:attrs]
  grammemes.each do |attr_name, value|
    next unless value

    attr_index = attrs.index { |name, *values| name == attr_name }
    unless attr_index
      raise InvalidDescriptor, 'no such attribute "%s" of category "%s"' %
        [attr_name, pos]
    end

    attr_name, values = attrs[attr_index]

    unless attr_value = values[value]
      raise InvalidDescriptor, 'no such attribute "%s" ' \
        'for attribute "%s" of category "%s"' % [value, attr_name, pos]
    end

    msd[attr_index + 1] = attr_value
  end

  msd.map { |e| e || EMPTY_DESCRIPTOR }.join
end
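
The listing implies a particular shape for `CATEGORIES`: each part of speech maps to a `:code` and an ordered `:attrs` list of name and value-table pairs, and each grammeme is written at the position of its attribute. The sketch below shows that positional encoding with a hypothetical `ToyLanguage` module and a standalone `encode` helper. The table contents are invented for illustration and are not the library's Russian or English data.

```ruby
# Hypothetical language table; the attribute order defines the positions.
module ToyLanguage
  CATEGORIES = {
    noun: {
      code: 'N',
      attrs: [
        [:type,   { common: 'c', proper: 'p' }],
        [:gender, { masculine: 'm', feminine: 'f', neuter: 'n' }],
        [:number, { singular: 's', plural: 'p' }],
        [:case,   { nominative: 'n', locative: 'l' }]
      ]
    }
  }.freeze
end

# Standalone helper following the same steps as #to_s.
# ArgumentError stands in for the library's InvalidDescriptor.
def encode(language, pos, grammemes)
  category = language::CATEGORIES[pos]
  raise ArgumentError, 'category is nil' unless category

  codes = [category[:code]]
  grammemes.each do |name, value|
    index = category[:attrs].index { |attr_name, _values| attr_name == name }
    raise ArgumentError, "no such attribute #{name}" unless index

    code = category[:attrs][index][1][value]
    raise ArgumentError, "no such value #{value} for #{name}" unless code

    codes[index + 1] = code # position 0 holds the category code
  end

  # Gaps left by unset attributes become the empty descriptor character.
  codes.map { |c| c || '-' }.join
end

encode(ToyLanguage, :noun, { type: :common, number: :plural, case: :locative })
# "Nc-pl"
```

Trailing unset attributes are simply omitted, which is why the result stops at the last attribute that has a value.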

#valid? ⇒ true, false

Validates the MSD instance.

Returns:

  • (true, false)

    validation state of the MSD instance.



# File 'lib/myasorubka/msd.rb', line 185

def valid?
  !!to_s
rescue InvalidDescriptor
  false
end
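
The validation above amounts to "serialization did not raise". Because every String is truthy in Ruby, including the empty one returned when `pos` is unset, the listing suggests that a descriptor without a part of speech also counts as valid. The sketch below isolates that pattern. `valid_descriptor?` and the top-level `InvalidDescriptor` class are hypothetical stand-ins.

```ruby
# Stand-in for the library's exception class.
class InvalidDescriptor < StandardError; end

# Same shape as #valid?: truthiness of the block result, false on failure.
def valid_descriptor?
  !!yield
rescue InvalidDescriptor
  false
end

valid_descriptor? { 'Nc-pl' }                                    # true
valid_descriptor? { '' }                                         # true: '' is truthy
valid_descriptor? { raise InvalidDescriptor, 'category is nil' } # false
```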