Class: WordNet::Synset

Inherits:
Object
Includes:
Constants
Defined in:
lib/wordnet/synset.rb

Overview

WordNet synonym-set object class

Instances of this class encapsulate the data for a synonym set (‘synset’) in a WordNet lexical database. A synonym set is a set of words that are interchangeable in some context.

We can either fetch the synset from a connected Lexicon:

lexicon = WordNet::Lexicon.new( 'postgres://localhost/wordnet30' )
ss = lexicon[ :first, 'time' ]
# => #<WordNet::Synset:0x7ffbf2643bb0 {115265518} 'commencement, first,
#       get-go, offset, outset, start, starting time, beginning, kickoff,
#       showtime' (noun): [noun.time] the time at which something is
#       supposed to begin>

or if you’ve already created a Lexicon, use its connection indirectly to look up a Synset by its ID:

ss = WordNet::Synset[ 115265518 ]
# => #<WordNet::Synset:0x7ffbf257e928 {115265518} 'commencement, first,
#       get-go, offset, outset, start, starting time, beginning, kickoff,
#       showtime' (noun): [noun.time] the time at which something is
#       supposed to begin>

You can fetch a list of the lemmas (base forms) of the words included in the synset:

ss.words.map( &:lemma )
# => ["commencement", "first", "get-go", "offset", "outset", "start",
#     "starting time", "beginning", "kickoff", "showtime"]

But the primary reason for a synset is its lexical and semantic links to other words and synsets. For instance, its hypernym is the equivalent of its superclass: it’s the class of things of which the receiving synset is a member.

ss.hypernyms
# => [#<WordNet::Synset:0x7ffbf25c76c8 {115180528} 'point, point in
#        time' (noun): [noun.time] an instant of time>]

The synset’s hyponyms, on the other hand, are kind of like its subclasses:

ss.hyponyms
# => [#<WordNet::Synset:0x7ffbf25d83b0 {115142167} 'birth' (noun):
#       [noun.time] the time when something begins (especially life)>,
#     #<WordNet::Synset:0x7ffbf25d8298 {115268993} 'threshold' (noun):
#       [noun.time] the starting point for a new state or experience>,
#     #<WordNet::Synset:0x7ffbf25d8180 {115143012} 'incipiency,
#       incipience' (noun): [noun.time] beginning to exist or to be
#       apparent>,
#     #<WordNet::Synset:0x7ffbf25d8068 {115266164} 'starting point,
#       terminus a quo' (noun): [noun.time] earliest limiting point>]

Traversal

Synset also provides a few ‘traversal’ methods which provide recursive searching of a Synset’s semantic links:

# Recursively search for more-general terms for the synset, and print out
# each one with indentation according to how distantly it's related.
lexicon[ :fencing, 'sword' ].
    traverse(:hypernyms).with_depth.
    each {|ss, depth| puts "%s%s [%d]" % ['  ' * (depth-1), ss.words.first, ss.synsetid] }
# (outputs:)
play [100041468]
  action [100037396]
    act [100030358]
      event [100029378]
        psychological feature [100023100]
          abstract entity [100002137]
            entity [100001740]
combat [101170962]
  battle [100958896]
    group action [101080366]
      event [100029378]
        psychological feature [100023100]
          abstract entity [100002137]
            entity [100001740]
      act [100030358]
        event [100029378]
          psychological feature [100023100]
            abstract entity [100002137]
              entity [100001740]
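The traversal above is a depth-first walk, and the printed depth is the number of links between the starting synset and the one being printed. Here is a minimal, database-free sketch of that bookkeeping. It assumes a made-up `graph` Hash in place of the semantic-links table and a hypothetical `traverse_with_depth` helper, neither of which is part of Ruby-WordNet. Like #traverse, it keeps a stack of Enumerators and uses the stack height as the depth.

```ruby
# Hypothetical helper: depth-first walk over +graph+ (a Hash of
# name => related names), keeping a stack of external Enumerators.
# The stack height is the depth of the item just yielded (1 = direct link).
def traverse_with_depth(graph, start)
  stack = [graph.fetch(start, []).each]
  until stack.empty?
    begin
      node = stack.last.next
      yield node, stack.length
      stack << graph.fetch(node, []).each   # descend into this node's links
    rescue StopIteration
      stack.pop                             # this level is exhausted
    end
  end
end

# Made-up hypernym links, loosely following the output above.
graph = {
  'fencing' => ['play', 'combat'],
  'play'    => ['action'],
  'combat'  => ['battle'],
}

traverse_with_depth(graph, 'fencing') do |name, depth|
  puts "#{'  ' * (depth - 1)}#{name} [#{depth}]"
end
```

This prints `play [1]`, `  action [2]`, `combat [1]`, `  battle [2]`, one per line, with the same indent-by-depth layout as the real traversal.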

See the traversal methods (#traverse, #search, and #|) below for more details.

Low-Level API

This library is implemented using Sequel::Model, an ORM layer on top of the excellent Sequel database toolkit. This means that in addition to the high-level methods above, you can also make use of a database-oriented API if you need to do something not provided by a high-level method.

In order to make use of this API, you’ll need to be familiar with Sequel, especially Datasets and Model Associations. Most of Ruby-WordNet’s functionality is implemented in terms of one or both of these.

Datasets

The main dataset is available from WordNet::Synset.dataset:

WordNet::Synset.dataset
# => #<Sequel::SQLite::Dataset: "SELECT * FROM `synsets`">

In addition, Synset defines a few other canned datasets for filtering, either by part of speech on the Synset class:

  • WordNet::Synset.nouns

  • WordNet::Synset.verbs

  • WordNet::Synset.adjectives

  • WordNet::Synset.adverbs

  • WordNet::Synset.adjective_satellites

or by the semantic links for a particular Synset:

  • WordNet::Synset#also_see_dataset

  • WordNet::Synset#attributes_dataset

  • WordNet::Synset#causes_dataset

  • WordNet::Synset#domain_categories_dataset

  • WordNet::Synset#domain_member_categories_dataset

  • WordNet::Synset#domain_member_regions_dataset

  • WordNet::Synset#domain_member_usages_dataset

  • WordNet::Synset#domain_regions_dataset

  • WordNet::Synset#domain_usages_dataset

  • WordNet::Synset#entailments_dataset

  • WordNet::Synset#hypernyms_dataset

  • WordNet::Synset#hyponyms_dataset

  • WordNet::Synset#instance_hypernyms_dataset

  • WordNet::Synset#instance_hyponyms_dataset

  • WordNet::Synset#member_holonyms_dataset

  • WordNet::Synset#member_meronyms_dataset

  • WordNet::Synset#part_holonyms_dataset

  • WordNet::Synset#part_meronyms_dataset

  • WordNet::Synset#semlinks_dataset

  • WordNet::Synset#semlinks_to_dataset

  • WordNet::Synset#senses_dataset

  • WordNet::Synset#similar_words_dataset

  • WordNet::Synset#substance_holonyms_dataset

  • WordNet::Synset#substance_meronyms_dataset

  • WordNet::Synset#sumo_terms_dataset

  • WordNet::Synset#verb_groups_dataset

  • WordNet::Synset#words_dataset

Constant Summary

SEMANTIC_TYPEKEYS =

Semantic link type keys: maps what the API calls them (e.g., :hypernyms) to what they are in the DB (e.g., :hypernym) by stripping a trailing 's'.

Hash.new {|h,type| h[type] = type.to_s.chomp('s').to_sym }
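The default block assigns into the Hash, so each lookup both derives the database key and caches it. The following standalone check reuses the same expression under the same constant name. The comments show the return values.

```ruby
# Same expression as SEMANTIC_TYPEKEYS: strip one trailing "s" from the API
# name and cache the derived Symbol under the original key.
SEMANTIC_TYPEKEYS = Hash.new {|h, type| h[type] = type.to_s.chomp('s').to_sym }

SEMANTIC_TYPEKEYS[:hypernyms]    # => :hypernym
SEMANTIC_TYPEKEYS[:verb_groups]  # => :verb_group
SEMANTIC_TYPEKEYS[:also_see]     # => :also_see  (nothing to strip)
SEMANTIC_TYPEKEYS.keys           # => [:hypernyms, :verb_groups, :also_see]
```

`chomp('s')` removes at most one trailing "s", so names that are already singular pass through unchanged.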

Constants included from Constants

Constants::DEFAULT_DB_OPTIONS, Constants::DELIM, Constants::DELIM_RE, Constants::DOMAIN_TYPES, Constants::DomainSymbols, Constants::HOLONYM_SYMBOLS, Constants::HOLONYM_TYPES, Constants::HYPERNYM_SYMBOLS, Constants::HYPERNYM_TYPES, Constants::HYPONYM_SYMBOLS, Constants::HYPONYM_TYPES, Constants::LEXFILES, Constants::MEMBER_SYMBOLS, Constants::MEMBER_TYPES, Constants::MERONYM_SYMBOLS, Constants::MERONYM_TYPES, Constants::POINTER_SUBTYPES, Constants::POINTER_SYMBOLS, Constants::POINTER_TYPES, Constants::SUB_DELIM, Constants::SUB_DELIM_RE, Constants::SYNTACTIC_CATEGORIES, Constants::SYNTACTIC_SYMBOLS, Constants::VERB_SENTS

Class Attribute Details

.semantic_link_methods ⇒ Object

Returns the value of attribute semantic_link_methods: the names (as Symbols) of the link methods generated by .semantic_link.

# File 'lib/wordnet/synset.rb', line 362

def semantic_link_methods
  @semantic_link_methods
end

Class Method Details

.db=(newdb) ⇒ Object

Overridden to reset any lookup tables that may have been loaded from the previous database.

# File 'lib/wordnet/synset.rb', line 289

def self::db=( newdb )
	self.reset_lookup_tables
	super
end

.lexdomain_table ⇒ Object

Return the table of lexical domains, keyed by id.

# File 'lib/wordnet/synset.rb', line 307

def self::lexdomain_table
	@lexdomain_table ||= self.db[:lexdomains].to_hash( :lexdomainid )
end

.lexdomains ⇒ Object

Return the table of lexical domains, keyed by name as a String (e.g., “verb.cognition”).

# File 'lib/wordnet/synset.rb', line 313

def self::lexdomains
	@lexdomains ||= self.lexdomain_table.inject({}) do |hash,(id,domain)|
		hash[ domain[:lexdomainname] ] = domain
		hash
	end
end
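.lexdomains builds a second index over the same row Hashes, so both tables share their row objects instead of copying them. Here is a standalone sketch of the re-keying. It uses made-up rows in place of the lexdomains table and `each_with_object` in place of `inject`.

```ruby
# Made-up rows shaped like .lexdomain_table (keyed by lexdomainid).
lexdomain_table = {
  1 => { lexdomainid: 1, lexdomainname: 'verb.body' },
  2 => { lexdomainid: 2, lexdomainname: 'verb.cognition' },
}

# Re-key the same row Hashes by their name, as .lexdomains does.
lexdomains = lexdomain_table.each_with_object({}) do |(_id, domain), by_name|
  by_name[domain[:lexdomainname]] = domain
end

lexdomains['verb.cognition'][:lexdomainid]  # => 2
```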

.linktype_table ⇒ Object

Return the table of link types, keyed by linkid.

# File 'lib/wordnet/synset.rb', line 322

def self::linktype_table
	@linktype_table ||= self.db[:linktypes].inject({}) do |hash,row|
		hash[ row[:linkid] ] = {
			:id       => row[:linkid],
			:typename => row[:link],
			:type     => row[:link].gsub( /\s+/, '_' ).to_sym,
			:recurses => row[:recurses] && row[:recurses] != 0,
		}
		hash
	end
end
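Each row is normalized in two ways. A database link name such as "instance hypernym" becomes the Symbol :instance_hypernym, which is exactly what SEMANTIC_TYPEKEYS[:instance_hypernyms] produces. The integer recurses column becomes a boolean. Here is a standalone sketch of that per-row transformation, with made-up rows (including the linkid values) standing in for the linktypes table.

```ruby
# Made-up rows shaped like the linktypes table (linkid, link, recurses).
rows = [
  { linkid: 1, link: 'hypernym',          recurses: 1 },
  { linkid: 3, link: 'instance hypernym', recurses: 1 },
  { linkid: 7, link: 'also see',          recurses: 0 },
]

# Same transformation as .linktype_table: keyed by linkid, with a Symbol
# type (whitespace -> underscores) and a boolean recurses flag.
linktype_table = rows.each_with_object({}) do |row, table|
  table[row[:linkid]] = {
    id:       row[:linkid],
    typename: row[:link],
    type:     row[:link].gsub(/\s+/, '_').to_sym,
    recurses: row[:recurses] && row[:recurses] != 0,
  }
end

linktype_table[3][:type]      # => :instance_hypernym
linktype_table[7][:recurses]  # => false
```

A NULL (nil) recurses column stays nil, which is still falsy.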

.linktypes ⇒ Object

Return the table of link types, keyed by type Symbol (the link name with whitespace replaced by underscores, e.g., :hypernym).

# File 'lib/wordnet/synset.rb', line 336

def self::linktypes
	@linktypes ||= self.linktype_table.inject({}) do |hash,(id,link)|
		hash[ link[:type] ] = link
		hash
	end
end

.postype_table ⇒ Object

Return the table of part-of-speech names, keyed by letter identifier (as a Symbol, e.g., :n).

# File 'lib/wordnet/synset.rb', line 345

def self::postype_table
	@postype_table ||= self.db[:postypes].inject({}) do |hash, row|
		hash[ row[:pos].to_sym ] = row[:posname]  # #untaint dropped: a no-op since Ruby 2.7, removed in 3.2
		hash
	end
end

.postypes ⇒ Object

Return the table of part-of-speech names to letter identifiers (the inverse of .postype_table).

# File 'lib/wordnet/synset.rb', line 354

def self::postypes
	@postypes ||= self.postype_table.invert
end
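The two part-of-speech tables are inverses of each other. Here is a standalone sketch with made-up rows in place of the postypes table. As in the code above, nothing converts the names, so in this sketch they stay Strings.

```ruby
# Made-up rows shaped like the postypes table.
rows = [
  { pos: 'n', posname: 'noun' },
  { pos: 'v', posname: 'verb' },
  { pos: 'r', posname: 'adverb' },
]

# As in .postype_table: letter identifier (Symbol) => name.
postype_table = rows.each_with_object({}) {|row, h| h[row[:pos].to_sym] = row[:posname] }

# As in .postypes: the inverse mapping, name => letter identifier.
postypes = postype_table.invert

postype_table[:n]  # => "noun"
postypes['verb']   # => :v
```

#part_of_speech performs the first kind of lookup, using `self.pos.to_sym` as the key.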

.reset_lookup_tables ⇒ Object

Unload all of the cached lookup tables that have been loaded.

# File 'lib/wordnet/synset.rb', line 296

def self::reset_lookup_tables
	@lexdomain_table = nil
	@lexdomains      = nil
	@linktype_table  = nil
	@linktypes       = nil
	@postype_table   = nil
	@postypes        = nil
end
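Every lookup table is memoized with `||=`. If the database changed without clearing them, tables built from the old database would stay in place, which is why .db= calls .reset_lookup_tables before `super`. The sketch below shows that pattern. It uses a hypothetical LookupCache class (not part of Ruby-WordNet) and an Array of pairs in place of a database.

```ruby
# Hypothetical class showing the memoize-and-reset pattern used above.
class LookupCache
  class << self
    attr_reader :builds

    # Like the .db= override: clear cached tables, then switch sources.
    def source=(rows)
      reset_lookup_tables
      @source = rows
    end

    # Like .postype_table: build once, then serve from the cache.
    def table
      @table ||= begin
        @builds = (@builds || 0) + 1
        @source.to_h
      end
    end

    def reset_lookup_tables
      @table = nil
    end
  end
end

LookupCache.source = [[:n, 'noun']]
LookupCache.table                    # built on first use
LookupCache.table                    # served from the cache
LookupCache.source = [[:v, 'verb']]
LookupCache.table[:v]                # => "verb" (rebuilt after the reset)
LookupCache.builds                   # => 2
```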

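.semantic_link(type) ⇒ Object

Generate methods that will return Synsets related by the given semantic pointer type. It defines two methods:

  • a method named after type (e.g., #hypernyms) that returns the related Synsets as an Array
  • a type_dataset method (e.g., #hypernyms_dataset) that returns them as a Sequel::Dataset

The type is also recorded in .semantic_link_methods.
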
# File 'lib/wordnet/synset.rb', line 368

def self::semantic_link( type )
	self.log.debug "Generating a %p method" % [ type ]

	ds_method_body = Proc.new do
		self.semanticlink_dataset( type )
	end
	define_method( "#{type}_dataset", &ds_method_body )

	ss_method_body = Proc.new do
		self.semanticlink_dataset( type ).all
	end
	define_method( type, &ss_method_body )

	self.semantic_link_methods << type.to_sym
end
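.semantic_link is a small metaprogramming helper. Each call defines two instance methods with `define_method` and records the link name. Here is a self-contained sketch of the same pattern. It uses a hypothetical MiniSynset class whose `semanticlink_dataset` returns an Enumerator over an in-memory Hash instead of a Sequel::Dataset, so `.to_a` stands in for `.all`.

```ruby
# Hypothetical stand-in for Synset, used only to show the method generation.
class MiniSynset
  class << self
    def semantic_link_methods
      @semantic_link_methods ||= []
    end

    # Define "<type>_dataset" and "<type>" instance methods for a link type.
    def semantic_link(type)
      define_method("#{type}_dataset") { semanticlink_dataset(type) }
      define_method(type) { semanticlink_dataset(type).to_a }
      semantic_link_methods << type.to_sym
    end
  end

  def initialize(links)
    @links = links
  end

  # Stand-in for the real method, which returns a Sequel::Dataset.
  def semanticlink_dataset(type)
    @links.fetch(type, []).each
  end

  semantic_link :hypernyms
  semantic_link :hyponyms
end

ss = MiniSynset.new(hypernyms: ['point'])
ss.hypernyms                       # => ["point"]
ss.hyponyms                        # => []
MiniSynset.semantic_link_methods   # => [:hypernyms, :hyponyms]
```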

Instance Method Details

#adjective_satellites ⇒ Object

Dataset method, called on the class (WordNet::Synset.adjective_satellites): synsets filtered by part of speech to adjective satellites.

# File 'lib/wordnet/synset.rb', line 282

def_dataset_method( :adjective_satellites ) { filter(pos: 's') }

#adjectives ⇒ Object

Dataset method, called on the class (WordNet::Synset.adjectives): synsets filtered by part of speech to adjectives.

# File 'lib/wordnet/synset.rb', line 272

def_dataset_method( :adjectives ) { filter(pos: 'a') }

#adverbs ⇒ Object

Dataset method, called on the class (WordNet::Synset.adverbs): synsets filtered by part of speech to adverbs.

# File 'lib/wordnet/synset.rb', line 277

def_dataset_method( :adverbs ) { filter(pos: 'r') }

#also_see ⇒ Object

“See Also” synsets

# File 'lib/wordnet/synset.rb', line 459

semantic_link :also_see

#attributes ⇒ Object

Attribute synsets

# File 'lib/wordnet/synset.rb', line 463

semantic_link :attributes

#causes ⇒ Object

Cause synsets

# File 'lib/wordnet/synset.rb', line 467

semantic_link :causes

#domain_categories ⇒ Object

Domain category synsets

# File 'lib/wordnet/synset.rb', line 471

semantic_link :domain_categories

#domain_member_categories ⇒ Object

Domain member category synsets

# File 'lib/wordnet/synset.rb', line 475

semantic_link :domain_member_categories

#domain_member_regions ⇒ Object

Domain member region synsets

# File 'lib/wordnet/synset.rb', line 479

semantic_link :domain_member_regions

#domain_member_usages ⇒ Object

Domain member usage synsets

# File 'lib/wordnet/synset.rb', line 483

semantic_link :domain_member_usages

#domain_regions ⇒ Object

Domain region synsets

# File 'lib/wordnet/synset.rb', line 487

semantic_link :domain_regions

#domain_usages ⇒ Object

Domain usage synsets

# File 'lib/wordnet/synset.rb', line 491

semantic_link :domain_usages

#entailments ⇒ Object

Verb entailment synsets

# File 'lib/wordnet/synset.rb', line 495

semantic_link :entailments

#hypernyms ⇒ Object

Hypernym synsets

# File 'lib/wordnet/synset.rb', line 499

semantic_link :hypernyms

#hyponyms ⇒ Object

Hyponym synsets

# File 'lib/wordnet/synset.rb', line 503

semantic_link :hyponyms

#inspect ⇒ Object

Return a human-readable representation of the object, suitable for debugging.

# File 'lib/wordnet/synset.rb', line 675

def inspect
	return "#<%p:%0#x {%d} '%s' (%s): [%s] %s>" % [
		self.class,
		self.object_id * 2,
		self.synsetid,
		self.words.map(&:to_s).join(', '),
		self.part_of_speech,
		self.lexical_domain,
		self.definition,
	]
end
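The format string packs several directives:

  • %p inspects the class
  • %0#x prints `object_id * 2` in hexadecimal with a `0x` prefix
  • %d prints the synset ID

Here is a standalone sketch with illustrative values. The ExampleSynset placeholder class and the fixed number standing in for `object_id * 2` are assumptions, chosen so the output is reproducible.

```ruby
# Placeholder class so that %p prints a class name, as it does in #inspect.
class ExampleSynset; end

line = "#<%p:%0#x {%d} '%s' (%s): [%s] %s>" % [
  ExampleSynset,
  0x7ffbf2643bb0,               # stands in for object_id * 2
  115265518,                    # synsetid
  'commencement, first, get-go',
  'noun',                       # part_of_speech
  'noun.time',                  # lexical_domain
  'the time at which something is supposed to begin',
]

puts line
# #<ExampleSynset:0x7ffbf2643bb0 {115265518} 'commencement, first, get-go' (noun): [noun.time] the time at which something is supposed to begin>
```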

#instance_hypernyms ⇒ Object

Instance hypernym synsets

# File 'lib/wordnet/synset.rb', line 507

semantic_link :instance_hypernyms

#instance_hyponyms ⇒ Object

Instance hyponym synsets

# File 'lib/wordnet/synset.rb', line 511

semantic_link :instance_hyponyms

#lexical_domain ⇒ Object

Return the name of the lexical domain the synset belongs to; this also corresponds to the lexicographer’s file the synset was originally loaded from.

# File 'lib/wordnet/synset.rb', line 439

def lexical_domain
	return self.class.lexdomain_table[ self.lexdomainid ][ :lexdomainname ]
end

#member_holonyms ⇒ Object

Member holonym synsets

# File 'lib/wordnet/synset.rb', line 515

semantic_link :member_holonyms

#member_meronyms ⇒ Object

Member meronym synsets

# File 'lib/wordnet/synset.rb', line 519

semantic_link :member_meronyms

#nouns ⇒ Object

Dataset method, called on the class (WordNet::Synset.nouns): synsets filtered by part of speech to nouns.

# File 'lib/wordnet/synset.rb', line 262

def_dataset_method( :nouns ) { filter(pos: 'n') }

#part_holonyms ⇒ Object

Part holonym synsets

# File 'lib/wordnet/synset.rb', line 523

semantic_link :part_holonyms

#part_meronyms ⇒ Object

Part meronym synsets

# File 'lib/wordnet/synset.rb', line 527

semantic_link :part_meronyms

#part_of_speech ⇒ Object

Return the name of the Synset’s part of speech (#pos).

# File 'lib/wordnet/synset.rb', line 409

def part_of_speech
	return self.class.postype_table[ self.pos.to_sym ]
end

#samples ⇒ Object

Return the synset’s sample sentences as an Array of Strings, ordered by sampleid.

# File 'lib/wordnet/synset.rb', line 445

def samples
	return self.db[:samples].
		filter( synsetid: self.synsetid ).
		order( :sampleid ).
		map( :sample )
end

#search(type, synset) ⇒ Object

Search for synset in the receiver’s semantic links of the given type (recursively, if that link type recurses). Return the depth at which it was found, or nil if it wasn’t found.

# File 'lib/wordnet/synset.rb', line 664

def search( type, synset )
	found, depth = self.traverse( type ).with_depth.find {|ss,depth| synset == ss }
	return depth
end
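#search amounts to a depth-first search that reports how many links away the target was first seen. Here is a standalone recursive sketch. It assumes a made-up `graph` Hash in place of the semantic links and a hypothetical `search_depth` helper. Like the original, it does no cycle detection.

```ruby
# Hypothetical helper: return the depth (1 = direct link) at which +target+
# is first reached in a depth-first walk of +graph+ from +start+, or nil.
def search_depth(graph, start, target, depth = 1)
  graph.fetch(start, []).each do |related|
    return depth if related == target
    found = search_depth(graph, related, target, depth + 1)
    return found if found
  end
  nil
end

# Made-up hypernym links loosely following the fencing example above.
graph = {
  'fencing' => ['play', 'combat'],
  'play'    => ['action'],
  'action'  => ['act'],
  'combat'  => ['battle'],
}

search_depth(graph, 'fencing', 'act')     # => 3
search_depth(graph, 'fencing', 'battle')  # => 2
search_depth(graph, 'fencing', 'entity')  # => nil
```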

#semanticlink_dataset(type) ⇒ Object

Return a Sequel::Dataset for synsets related to the receiver via the semantic link of the specified type. Raises an ArgumentError if type doesn’t correspond to a known link type.

# File 'lib/wordnet/synset.rb', line 391

def semanticlink_dataset( type )
	typekey  = SEMANTIC_TYPEKEYS[ type ]
	linkinfo = self.class.linktypes[ typekey ] or
		raise ArgumentError, "no such link type %p" % [ typekey ]
	ssids    = self.semlinks_dataset.filter( :linkid => linkinfo[:id] ).select( :synset2id )

	return self.class.filter( :synsetid => ssids )
end

#semanticlink_enum(linktype) ⇒ Object

Return an Enumerator that will iterate over the Synsets related to the receiver via the semantic links of the specified linktype.

# File 'lib/wordnet/synset.rb', line 403

def semanticlink_enum( linktype )
	return self.semanticlink_dataset( linktype ).to_enum
end

#semlinks ⇒ Object

The WordNet::SemanticLinks indicating a relationship with other WordNet::Synsets.

# File 'lib/wordnet/synset.rb', line 199

one_to_many :semlinks,
:class       => :"WordNet::SemanticLink",
:key         => :synset1id,
:primary_key => :synsetid,
:eager       => :target

#semlinks_to ⇒ Object

The WordNet::SemanticLinks pointing to this Synset.

# File 'lib/wordnet/synset.rb', line 209

many_to_one :semlinks_to,
:class       => :"WordNet::SemanticLink",
:key         => :synsetid,
:primary_key => :synset2id

#senses ⇒ Object

The WordNet::Senses associated with the receiver.

# File 'lib/wordnet/synset.rb', line 190

one_to_many :senses,
:key         => :synsetid,
:primary_key => :synsetid

#similar_words ⇒ Object

Similar word synsets

# File 'lib/wordnet/synset.rb', line 531

semantic_link :similar_words

#substance_holonyms ⇒ Object

Substance holonym synsets

# File 'lib/wordnet/synset.rb', line 535

semantic_link :substance_holonyms

#substance_meronyms ⇒ Object

Substance meronym synsets

# File 'lib/wordnet/synset.rb', line 539

semantic_link :substance_meronyms

#sumo_terms ⇒ Object

Terms from the Suggested Upper Merged Ontology (SUMO) mapped to the receiver.

# File 'lib/wordnet/synset.rb', line 218

many_to_many :sumo_terms,
:join_table  => :sumomaps,
:left_key    => :synsetid,
:right_key   => :sumoid

#to_s ⇒ Object

Stringify the synset: its words, part of speech, lexical domain, and definition, followed by a sorted summary of how many semantic links of each type it has.

# File 'lib/wordnet/synset.rb', line 415

def to_s

	# Make a sorted list of the semantic link types from this synset
	semlink_list = self.semlinks_dataset.
		group_and_count( :linkid ).
		to_hash( :linkid, :count ).
		collect do |linkid, count|
			'%s: %d' % [ self.class.linktype_table[linkid][:typename], count ]
		end.
		sort.
		join( ', ' )

	return "%s (%s): [%s] %s (%s)" % [
		self.words.map( &:to_s ).join(', '),
		self.part_of_speech,
		self.lexical_domain,
		self.definition,
		semlink_list
	]
end
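The link summary at the end of #to_s is a count per link type, formatted and then sorted as strings. Here is an in-memory sketch of the same steps. It assumes made-up linkids and type names, with `Array#tally` (Ruby 2.7+) standing in for `group_and_count(:linkid).to_hash(:linkid, :count)`.

```ruby
# Made-up type names for a few linkids, and the linkid of each semantic
# link leaving one synset (stands in for semlinks_dataset).
typenames = { 1 => 'hypernym', 2 => 'hyponym' }
link_ids  = [2, 1, 2, 2]

# Count links per type, format each as "name: count", sort, and join.
semlink_list = link_ids.tally.
  map {|linkid, count| '%s: %d' % [typenames[linkid], count] }.
  sort.
  join(', ')

semlink_list  # => "hypernym: 1, hyponym: 3"
```

Because the whole "name: count" string is sorted, the summary ends up in alphabetical order by link type name.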

#traverse(type, &block) ⇒ Object

With a block, yield a WordNet::Synset related to the receiver via a link of the specified type, recursing depth first into each of its links if the link type is recursive. To exit from the traversal at any depth, throw :stop_traversal.

If no block is given, return an Enumerator that will do the same thing instead.

# Print all the parts of a boot
puts lexicon[:boot].traverse( :part_meronyms ).to_a

You can also have the depth of recursion yielded as an additional argument by calling #with_depth on the Enumerator:

lexicon[:fencing].traverse( :hypernyms ).with_depth.each {|ss,d| puts "%02d: %s" % [d,ss] }
# (outputs:)

01: play, swordplay (noun): [noun.act] the act using a sword (or other weapon) vigorously
  and skillfully (hypernym: 1, hyponym: 1)
02: action (noun): [noun.act] something done (usually as opposed to something said)
  (hypernym: 1, hyponym: 33)
03: act, deed, human action, human activity (noun): [noun.tops] something that people do
  or cause to happen (hypernym: 1, hyponym: 40)
...

# File 'lib/wordnet/synset.rb', line 617

def traverse( type, &block )
	enum = Enumerator.new do |yielder|
		traversals = [ self.semanticlink_enum(type) ]
		syn        = nil
		typekey    = SEMANTIC_TYPEKEYS[ type ]
		recurses   = self.class.linktypes[ typekey ][:recurses]

		self.log.debug "Traversing %s semlinks%s" % [ type, recurses ? " (recursive)" : ''  ]

		catch( :stop_traversal ) do
			until traversals.empty?
				begin
					self.log.debug "  %d traversal/s left" % [ traversals.length ]
					syn = traversals.last.next

					if enum.with_depth?
						yielder.yield( syn, traversals.length )
					else
						yielder.yield( syn )
					end

					traversals << syn.semanticlink_enum( type ) if recurses
				rescue StopIteration
					traversals.pop
				end
			end
		end
	end

	def enum.with_depth?
		@with_depth = false if !defined?( @with_depth )
		return @with_depth
	end

	def enum.with_depth
		@with_depth = true
		self
	end

	return enum.each( &block ) if block
	return enum
end
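#traverse relies on two mechanisms:

  • It hangs its with_depth switch on the Enumerator itself, using singleton methods.
  • It lets callers end a walk early by throwing :stop_traversal, which the catch inside the generator absorbs.

Here is a minimal sketch of just those two mechanisms over a flat list. It uses a hypothetical `numbered` helper in which the list index plays the role of depth.

```ruby
# Hypothetical helper: an Enumerator that yields each item, plus its
# 1-based position when #with_depth was called, and stops on a throw.
def numbered(items)
  enum = Enumerator.new do |yielder|
    catch(:stop_traversal) do
      items.each_with_index do |item, i|
        enum.with_depth? ? yielder.yield(item, i + 1) : yielder.yield(item)
      end
    end
  end

  # Flag stored on this one Enumerator object via singleton methods.
  def enum.with_depth?
    @with_depth ||= false
  end

  def enum.with_depth
    @with_depth = true
    self
  end

  enum
end

numbered(%w[a b c]).to_a             # => ["a", "b", "c"]
numbered(%w[a b c]).with_depth.to_a  # => [["a", 1], ["b", 2], ["c", 3]]

seen = []
numbered(%w[a b c]).each do |item|
  seen << item
  throw :stop_traversal if item == 'b'
end
seen                                  # => ["a", "b"]
```

The block passed to #each runs inside the generator's catch, so throwing :stop_traversal ends the walk cleanly. This is how #| stops once it finds a common hypernym.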

#verb_groups ⇒ Object

Verb group synsets

# File 'lib/wordnet/synset.rb', line 543

semantic_link :verb_groups

#verbs ⇒ Object

Dataset method, called on the class (WordNet::Synset.verbs): synsets filtered by part of speech to verbs.

# File 'lib/wordnet/synset.rb', line 267

def_dataset_method( :verbs ) { filter(pos: 'v') }

#words ⇒ Object

The WordNet::Words associated with the receiver.

# File 'lib/wordnet/synset.rb', line 181

many_to_many :words,
:join_table  => :senses,
:left_key    => :synsetid,
:right_key   => :wordid

#|(othersyn) ⇒ Object

Union: Return the first synset found in a depth-first walk of othersyn’s hypernyms that is also one of the receiver’s hypernyms. This approximates their least general common hypernym. Return nil if they don’t share any.

# File 'lib/wordnet/synset.rb', line 575

def |( othersyn )

	# Find all of this syn's hypernyms
	hypersyns = self.traverse( :hypernyms ).to_a
	commonsyn = nil

	# Now traverse the other synset's hypernyms looking for one of our
	# own hypernyms.
	othersyn.traverse( :hypernyms ) do |syn|
		if hypersyns.include?( syn )
			commonsyn = syn
			throw :stop_traversal
		end
	end

	return commonsyn
end
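Here is a standalone sketch of the same strategy: collect the receiver's hypernyms, then return the first of the other synset's hypernyms, in depth-first order, that appears among them. It assumes a made-up `graph` Hash in place of the semantic links and hypothetical `all_hypernyms` and `common_hypernym` helpers.

```ruby
# Hypothetical helper: all hypernyms reachable from +syn+, depth first.
def all_hypernyms(graph, syn)
  graph.fetch(syn, []).flat_map {|h| [h, *all_hypernyms(graph, h)] }
end

# Like Synset#|: the first of othersyn's hypernyms (depth first) that is
# also a hypernym of syn, or nil if they share none.
def common_hypernym(graph, syn, othersyn)
  ours = all_hypernyms(graph, syn)
  all_hypernyms(graph, othersyn).find {|h| ours.include?(h) }
end

# Made-up links loosely following the fencing traversal output above.
graph = {
  'play'         => ['action'],
  'action'       => ['act'],
  'act'          => ['event'],
  'combat'       => ['battle'],
  'battle'       => ['group action'],
  'group action' => ['event', 'act'],
}

common_hypernym(graph, 'play', 'combat')   # => "event"
common_hypernym(graph, 'play', 'nothing')  # => nil
```

Here "act" is also shared and is less general than "event", but "event" is reached first. A first-found search like this returns the least general common hypernym only when the other synset's hypernyms form a single chain.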