Class: WordNet::Lexicon
- Inherits:
-
Object
- Object
- WordNet::Lexicon
- Extended by:
- Loggability
- Includes:
- Constants
- Defined in:
- lib/wordnet/lexicon.rb
Overview
WordNet lexicon class - provides access to the WordNet lexical database, and provides factory methods for looking up words and synsets.
Creating a Lexicon
To create a Lexicon, point it at a database using Sequel database connection criteria (http://sequel.rubyforge.org/rdoc/files/doc/opening_databases_rdoc.html):
lex = WordNet::Lexicon.new( 'postgres://localhost/wordnet30' )
# => #<WordNet::Lexicon:0x7fd192a76668 postgres://localhost/wordnet30>
# Another way of doing the same thing:
lex = WordNet::Lexicon.new( adapter: 'postgres', database: 'wordnet30', host: 'localhost' )
# => #<WordNet::Lexicon:0x7fd192d374b0 postgres>
Alternatively, if you have the ‘wordnet-defaultdb’ gem installed (it includes an embedded copy of the SQLite WordNet-SQL database), just call ::new without any arguments:
lex = WordNet::Lexicon.new
# => #<WordNet::Lexicon:0x7fdbfac1a358 sqlite:[...]/gems/wordnet-defaultdb-1.0.1
# /data/wordnet-defaultdb/wordnet30.sqlite>
Looking Up Synsets
Once you have a Lexicon created, the main lookup method for Synsets is #[], which will return the first of any Synsets that are found:
synset = lex[ :language ]
# => #<WordNet::Synset:0x7fdbfaa987a0 {105650820} 'language, speech' (noun):
# [noun.cognition] the mental faculty or power of vocal communication>
If you want to look up all matching Synsets, use the #lookup_synsets method:
synsets = lex.lookup_synsets( :language )
# => [#<WordNet::Synset:0x7fdbfaac46c0 {105650820} 'language, speech' (noun):
# [noun.cognition] the mental faculty or power of vocal
# communication>,
# #<WordNet::Synset:0x7fdbfaac45a8 {105808557} 'language, linguistic process'
# (noun): [noun.cognition] the cognitive processes involved
# in producing and understanding linguistic communication>,
# #<WordNet::Synset:0x7fdbfaac4490 {106282651} 'language, linguistic
# communication' (noun): [noun.communication] a systematic means of
# communicating by the use of sounds or conventional symbols>,
# #<WordNet::Synset:0x7fdbfaac4378 {106304059} 'language, nomenclature,
# terminology' (noun): [noun.communication] a system of words used to
# name things in a particular discipline>,
# #<WordNet::Synset:0x7fdbfaac4260 {107051975} 'language, lyric, words'
# (noun): [noun.communication] the text of a popular song or musical-comedy
# number>,
# #<WordNet::Synset:0x7fdbfaac4120 {107109196} 'language, oral communication,
# speech, speech communication, spoken communication, spoken language,
# voice communication' (noun): [noun.communication] (language)
# communication by word of mouth>]
Sometimes the first Synset isn’t the one you want, and you need to look up a particular one. Both #[] and #lookup_synsets provide several ways of filtering or selecting synsets.
The first is the ability to select one by its sense number, which is 1-indexed:
lex[ :language, 2 ]
# => #<WordNet::Synset:0x7ffa78e74d78 {105808557} 'language, linguistic
# process' (noun): [noun.cognition] the cognitive processes involved in
# producing and understanding linguistic communication>
You can also select one with a particular word in its definition:
lex[ :language, 'sounds' ]
# => #<WordNet::Synset:0x7ffa78ee01b8 {106282651} 'linguistic communication,
# language' (noun): [noun.communication] a systematic means of
# communicating by the use of sounds or conventional symbols>
If you’re using a database that supports using regular expressions (e.g., PostgreSQL), you can use that to select one with a matching definition:
lex[ :language, %r:name.*discipline: ]
# => #<WordNet::Synset:0x7ffa78f235a8 {106304059} 'language, nomenclature,
# terminology' (noun): [noun.communication] a system of words used
# to name things in a particular discipline>
You can also select certain parts of speech:
lex[ :right, :noun ]
# => #<WordNet::Synset:0x7ffa78f30b68 {100351000} 'right' (noun):
# [noun.act] a turn toward the side of the body that is on the south
# when the person is facing east>
lex[ :right, :verb ]
# => #<WordNet::Synset:0x7ffa78f09590 {200199659} 'correct, right, rectify'
# (verb): [verb.change] make right or correct>
lex[ :right, :adjective ]
# => #<WordNet::Synset:0x7ffa78ea8060 {300631391} 'correct, right'
# (adjective): [adj.all] free from error; especially conforming to
# fact or truth>
lex[ :right, :adverb ]
# => #<WordNet::Synset:0x7ffa78e5b2d8 {400032299} 'powerful, mightily,
# mighty, right' (adverb): [adv.all] (Southern regional intensive)
# very; to a great degree>
or by lexical domain, which is a more-specific part of speech (see WordNet::Synset.lexdomains.keys for the list of valid ones):
lex.lookup_synsets( :right, 'verb.social' )
# => [#<WordNet::Synset:0x7ffa78d817e0 {202519991} 'redress, compensate,
# correct, right' (verb): [verb.social] make reparations or amends
# for>]
Constant Summary
Constants included from Constants
Constants::DEFAULT_DB_OPTIONS, Constants::DELIM, Constants::DELIM_RE, Constants::DOMAIN_TYPES, Constants::DomainSymbols, Constants::HOLONYM_SYMBOLS, Constants::HOLONYM_TYPES, Constants::HYPERNYM_SYMBOLS, Constants::HYPERNYM_TYPES, Constants::HYPONYM_SYMBOLS, Constants::HYPONYM_TYPES, Constants::LEXFILES, Constants::MEMBER_SYMBOLS, Constants::MEMBER_TYPES, Constants::MERONYM_SYMBOLS, Constants::MERONYM_TYPES, Constants::POINTER_SUBTYPES, Constants::POINTER_SYMBOLS, Constants::POINTER_TYPES, Constants::SUB_DELIM, Constants::SUB_DELIM_RE, Constants::SYNTACTIC_CATEGORIES, Constants::SYNTACTIC_SYMBOLS, Constants::VERB_SENTS
Instance Attribute Summary
-
#db ⇒ Object
readonly
The Sequel::Database object that model tables read from.
-
#uri ⇒ Object
readonly
The database URI the lexicon will use to look up WordNet data.
Class Method Summary
-
.default_db_uri ⇒ Object
Get the Sequel URI of the default database, if it’s installed.
Instance Method Summary
-
#[](word, *args) ⇒ Object
Find a Word or Synset in the WordNet database and return it.
-
#connect(uri, options) ⇒ Object
Connect to the WordNet DB and return a Sequel::Database object.
-
#initialize(*args) ⇒ Lexicon
constructor
Create a new WordNet::Lexicon object that will use the database connection specified by the given dbconfig.
-
#initialize_with_defaultdb(options) ⇒ Object
Connect to the WordNet DB using an optional options hash.
-
#initialize_with_opthash(options) ⇒ Object
Connect to the WordNet DB using a connection options hash.
-
#initialize_with_uri(uri, options = {}) ⇒ Object
Connect to the WordNet DB using a URI and an optional options hash.
-
#inspect ⇒ Object
Return a human-readable string representation of the Lexicon, suitable for debugging.
-
#lookup_synsets(word, *args) ⇒ Object
Look up synsets (WordNet::Synset objects) associated with word, optionally filtered by additional args.
Constructor Details
#initialize(*args) ⇒ Lexicon
Create a new WordNet::Lexicon object that will use the database connection specified by the given dbconfig.
# File 'lib/wordnet/lexicon.rb', line 172
def initialize( *args )
  if args.empty?
    self.initialize_with_defaultdb( args.shift )
  elsif args.first.is_a?( String )
    self.initialize_with_uri( *args )
  else
    self.initialize_with_opthash( args.shift )
  end
  @db.sql_log_level = :debug
  WordNet::Model.db = @db
end
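The way the constructor chooses a connection path can be illustrated in isolation. The following is a minimal sketch rather than the library's code: `constructor_style` is a hypothetical helper that only reports which of the three `initialize_with_*` methods would be chosen for a given argument list.

```ruby
# Hypothetical helper (not part of WordNet::Lexicon): mirrors the branching
# in #initialize and names the path that would be taken.
def constructor_style( *args )
  if args.empty?
    :defaultdb   # no arguments: fall back to the default database
  elsif args.first.is_a?( String )
    :uri         # leading String: treated as a Sequel connection URI
  else
    :opthash     # anything else: treated as a connection options hash
  end
end

p constructor_style()                                               # => :defaultdb
p constructor_style( 'postgres://localhost/wordnet30' )             # => :uri
p constructor_style( adapter: 'postgres', database: 'wordnet30' )   # => :opthash
```

Note that only the class of the first argument is checked, so a URI must be passed as a String for the URI path to be taken.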
Instance Attribute Details
#db ⇒ Object (readonly)
The Sequel::Database object that model tables read from
# File 'lib/wordnet/lexicon.rb', line 231
def db
  @db
end
#uri ⇒ Object (readonly)
The database URI the lexicon will use to look up WordNet data
# File 'lib/wordnet/lexicon.rb', line 228
def uri
  @uri
end
Class Method Details
.default_db_uri ⇒ Object
Get the Sequel URI of the default database, if it’s installed.
# File 'lib/wordnet/lexicon.rb', line 135
def self::default_db_uri
  self.log.debug "Fetching the default db URI"
  # Try to load the default database gem, ignoring it if it's not installed.
  begin
    gem 'wordnet-defaultdb'
  rescue Gem::LoadError
  end
  # Now try the gem datadir first, and fall back to a local installation of the
  # default db
  datadir = nil
  if Gem.datadir( 'wordnet-defaultdb' )
    datadir = Pathname( Gem.datadir('wordnet-defaultdb') )
  else
    self.log.warn " no defaultdb gem; looking for the development database"
    datadir = Pathname( __FILE__ ).dirname.parent.parent +
      'wordnet-defaultdb/data/wordnet-defaultdb'
  end
  dbfile = datadir + 'wordnet30.sqlite'
  self.log.debug " dbfile is: %s" % [ dbfile ]
  if dbfile.exist?
    return "sqlite:#{dbfile}"
  else
    return nil
  end
end
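The final step of this method, turning a data directory into a Sequel SQLite URI only if the database file exists, can be tried without the gem installed. This is a minimal sketch using only the standard library; `sqlite_uri_for` is a hypothetical helper name, and the temporary directory stands in for the gem's data directory.

```ruby
require 'pathname'
require 'tmpdir'

# Hypothetical helper: returns a "sqlite:" URI for the database file in
# +datadir+, or nil if the file is missing. This is the same existence
# check that .default_db_uri performs last.
def sqlite_uri_for( datadir, filename='wordnet30.sqlite' )
  dbfile = Pathname( datadir ) + filename
  return dbfile.exist? ? "sqlite:#{dbfile}" : nil
end

Dir.mktmpdir do |dir|
  p sqlite_uri_for( dir )      # => nil (no database file yet)
  File.write( File.join(dir, 'wordnet30.sqlite'), '' )
  puts sqlite_uri_for( dir )   # prints "sqlite:" followed by the file's path
end
```

Because .default_db_uri returns nil when no database file is found, a caller can check it before calling WordNet::Lexicon.new with no arguments, which otherwise raises WordNet::LexiconError.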
Instance Method Details
#[](word, *args) ⇒ Object
Find a Word or Synset in the WordNet database and return it. In the case of multiple matching Synsets, only the first will be returned. If you want them all, you can use #lookup_synsets instead.
The word can be one of:
- Integer
-
Looks up the corresponding Word or Synset by ID. This assumes that all Synset IDs have 9 or more digits, which is true as of WordNet 3.1. Any additional args are ignored.
- Symbol, String
-
Looks up the Synsets for the word using #lookup_synsets, passing any additional args, and returns the first one that is found.
# File 'lib/wordnet/lexicon.rb', line 246
def []( word, *args )
  if word.is_a?( Integer )
    # :TODO: Assumes Synset IDs are all >= 100_000_000
    if word.to_s.length > 8
      return WordNet::Synset[ word ]
    else
      return WordNet::Word[ word ]
    end
  else
    return self.lookup_synsets( word, 1, *args ).first
  end
end
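The Integer branch decides between a Word and a Synset purely by counting decimal digits. A minimal sketch of that rule, with `id_kind` as a hypothetical helper name that is not part of the library:

```ruby
# Hypothetical helper: classifies an Integer ID the way #[] does, by the
# length of its decimal representation.
def id_kind( id )
  id.to_s.length > 8 ? :synset : :word
end

p id_kind( 105650820 )    # => :synset (9 digits)
p id_kind( 99_999_999 )   # => :word   (8 digits)
```

The boundary therefore sits at 100_000_000, which matches the :TODO: comment in the method body.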
#connect(uri, options) ⇒ Object
Connect to the WordNet DB and return a Sequel::Database object.
# File 'lib/wordnet/lexicon.rb', line 209
def connect( uri, options )
  options = WordNet::DEFAULT_DB_OPTIONS.merge( options || {} )
  if uri
    self.log.debug "Connecting using uri + options style: uri = %s, options = %p" %
      [ uri, options ]
    return Sequel.connect( uri, options )
  else
    self.log.debug "Connecting using hash style connect: options = %p" %
      [ options ]
    return Sequel.connect( options )
  end
end
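Before connecting, the method merges the caller's options over a set of defaults and treats nil as "no options". The sketch below shows that merge on its own. `EXAMPLE_DEFAULTS` and its `timeout` key are placeholders chosen for illustration, since the actual contents of WordNet::DEFAULT_DB_OPTIONS are not shown on this page, and `effective_options` is a hypothetical helper name.

```ruby
# Placeholder defaults for illustration only; these are NOT the real
# contents of WordNet::DEFAULT_DB_OPTIONS.
EXAMPLE_DEFAULTS = { timeout: 5000 }.freeze

# Hypothetical helper: merges caller options over the defaults, accepting
# nil in place of a Hash, as #connect does.
def effective_options( options )
  EXAMPLE_DEFAULTS.merge( options || {} )
end

opts = effective_options( timeout: 100, adapter: 'sqlite' )
p opts[:timeout]                         # => 100 (the caller's value wins)
p opts[:adapter]                         # => "sqlite"
p effective_options( nil )[:timeout]     # => 5000 (the default is kept)
```

Hash#merge returns a new Hash, so the frozen defaults are never modified.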
#initialize_with_defaultdb(options) ⇒ Object
Connect to the WordNet DB using an optional options hash.
# File 'lib/wordnet/lexicon.rb', line 187
def initialize_with_defaultdb( options )
  uri = WordNet::Lexicon.default_db_uri or raise WordNet::LexiconError,
    "No default WordNetSQL database! You can install it via the " +
    "wordnet-defaultdb gem, or download a version yourself from " +
    "http://sourceforge.net/projects/wnsql/"
  @db = self.connect( uri, options )
end
#initialize_with_opthash(options) ⇒ Object
Connect to the WordNet DB using a connection options hash.
# File 'lib/wordnet/lexicon.rb', line 203
def initialize_with_opthash( options )
  @db = self.connect( nil, options )
end
#initialize_with_uri(uri, options = {}) ⇒ Object
Connect to the WordNet DB using a URI and an optional options hash.
# File 'lib/wordnet/lexicon.rb', line 197
def initialize_with_uri( uri, options={} )
  @db = self.connect( uri, options )
end
#inspect ⇒ Object
Return a human-readable string representation of the Lexicon, suitable for debugging.
# File 'lib/wordnet/lexicon.rb', line 330
def inspect
  return "#<%p:%0#x %s>" % [
    self.class,
    self.object_id * 2,
    self.db.url || self.db.adapter_scheme
  ]
end
#lookup_synsets(word, *args) ⇒ Object
Look up synsets (WordNet::Synset objects) associated with word, optionally filtered by additional args.
The args can contain:
- Integer, Range
-
The sense/s of the Word (1-indexed) to use when searching for Synsets. If not specified, all senses of the word are used.
- Regexp
-
The Word’s Synsets are filtered by definition using an RLIKE filter. Note that some databases, including the default one (sqlite3), do not support RLIKE.
- Symbol, String
-
If it matches either a lexical domain (e.g., “verb.motion”) or a part of speech (e.g., “adjective”, :noun, :v), the resulting Synsets are filtered by that criterion. If the argument doesn’t match a lexical domain or part of speech, it’s used to filter by definition using a LIKE query.
# File 'lib/wordnet/lexicon.rb', line 278
def lookup_synsets( word, *args )
  dataset = WordNet::Synset.filter( :words => WordNet::Word.filter(lemma: word.to_s) )
  self.log.debug "Looking up synsets for %p" % [ word.to_s ]
  # Add filters to the dataset for each argument
  args.each do |arg|
    self.log.debug " constraint arg: %p" % [ arg ]
    case arg
    when Integer
      self.log.debug " limiting to sense %d" % [ arg ]
      dataset = dataset.limit( 1, arg-1 )
    when Range
      self.log.debug " limiting to range of senses: %p" % [ arg ]
      dataset = dataset.limit( arg.entries.length, arg.begin - 1 )
    when Regexp
      self.log.debug " filter: definition =~ %p" % [ arg ]
      dataset = dataset.filter( definition: arg )
    when Symbol, String
      # Lexical domain, e.g., "verb.motion"
      if domain = WordNet::Synset.lexdomains[ arg.to_s ]
        self.log.debug " filter: lex domain: %s (%d)" % [ arg, domain[:lexdomainid] ]
        dataset = dataset.filter( lexdomainid: domain[:lexdomainid] )
      # Part of speech symbol, e.g., "v"
      elsif WordNet::Synset.postype_table.key?( arg.to_sym )
        self.log.debug " filter: part of speech: %s" % [ arg ]
        dataset = dataset.filter( pos: arg.to_s )
      # Part of speech name, e.g., "verb"
      elsif pos = WordNet::Synset.postypes[ arg.to_s ]
        self.log.debug " filter: part of speech: %s" % [ pos.to_s ]
        dataset = dataset.filter( pos: pos.to_s )
      # Assume it's a definition match
      else
        pattern = "%%%s%%" % [ arg ]
        self.log.debug " filter: definition LIKE %p" % [ pattern ]
        dataset = dataset.filter { :definition.like(pattern) }
      end
    end
  end
  return dataset.all
end
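Two pieces of arithmetic in this method are easy to misread: how a 1-indexed sense number or range becomes a limit and offset, and how a plain string becomes a LIKE pattern. The following is a minimal sketch of both using plain Ruby; `sense_window` is a hypothetical helper name, and no database is involved.

```ruby
# Hypothetical helper: converts a 1-indexed sense number or range of senses
# into the [limit, offset] pair that #lookup_synsets passes to the
# dataset's #limit method.
def sense_window( arg )
  case arg
  when Integer then [ 1, arg - 1 ]
  when Range   then [ arg.entries.length, arg.begin - 1 ]
  else raise ArgumentError, "expected an Integer or Range, got %p" % [ arg ]
  end
end

p sense_window( 2 )      # => [1, 1] (one row, skipping the first)
p sense_window( 2..4 )   # => [3, 1] (three rows, skipping the first)

# The definition filter wraps a plain string in SQL wildcards. The doubled
# percent signs are literal percent characters in a Ruby format string.
p "%%%s%%" % [ 'sounds' ]   # => "%sounds%"
```

Since each Integer or Range argument sets the limit on the dataset again, the last such argument is the one that determines the window; this is why #[] can pass a default sense of 1 ahead of the caller's own arguments.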