Class: Bio::FlatFile
- Includes:
- Enumerable
- Defined in:
- lib/bio/io/flatfile.rb,
lib/bio/io/flatfile/buffer.rb,
lib/bio/io/flatfile/splitter.rb,
lib/bio/io/flatfile/autodetection.rb
Overview
Bio::FlatFile is a helper and wrapper class for reading biological data files. It behaves like an IO object. It can detect the data format automatically, so users do not need to tell the class what kind of data it is reading.
Defined Under Namespace
Modules: Splitter Classes: AutoDetect, BufferedInputStream, UnknownDataFormatError
Instance Attribute Summary
-
#dbclass ⇒ Object
Returns the database class, which is either detected automatically or given in FlatFile#initialize.
-
#entry ⇒ Object
readonly
Returns the value of attribute entry.
-
#raw ⇒ Object
If true, raw mode.
-
#skip_leader_mode ⇒ Object
The mode that controls when the leader of the data is skipped.
Class Method Summary
-
.auto(*arg, &block) ⇒ Object
Same as Bio::FlatFile.open(nil, filename_or_stream, mode, perm, options).
-
.autodetect(text) ⇒ Object
Detects database class (== file format) of given string.
-
.autodetect_file(filename) ⇒ Object
Detects database class (== file format) of given file.
-
.autodetect_io(io) ⇒ Object
Detects database class (== file format) of given input stream.
-
.autodetect_stream(io) ⇒ Object
Obsolete.
-
.foreach(*arg) ⇒ Object
Executes the block for every entry in the stream.
-
.open(*arg, &block) ⇒ Object
Bio::FlatFile.open(file, *arg) Bio::FlatFile.open(dbclass, file, *arg).
-
.open_file(filename, *arg) ⇒ Object
Same as FlatFile.auto(filename, *arg), except that it accepts only a filename, not an IO object.
-
.open_uri(uri, *arg) ⇒ Object
Opens URI specified as uri.
-
.to_a(*arg) ⇒ Object
Same as FlatFile.auto(filename_or_stream, *arg).to_a.
Instance Method Summary
-
#autodetect(lines = 31, ad = AutoDetect.default) ⇒ Object
Performs determination of database class (file format).
-
#close ⇒ Object
Closes input stream.
-
#each_entry ⇒ Object
(also: #each)
Iterates over each entry in the flatfile.
-
#entry_ended_pos ⇒ Object
(end position of the last entry) + 1.
-
#entry_pos_flag ⇒ Object
a flag to write down entry start and end positions.
-
#entry_pos_flag=(x) ⇒ Object
Sets flag to write down entry start and end positions.
-
#entry_raw ⇒ Object
Returns the last raw entry as a string.
-
#entry_start_pos ⇒ Object
start position of the last entry.
-
#eof? ⇒ Boolean
Returns true if input stream is end-of-file.
-
#gets(*arg) ⇒ Object
Similar to IO#gets.
-
#initialize(dbclass, stream) ⇒ FlatFile
constructor
Same as FlatFile.open, except that ‘stream’ must be an already opened stream object (IO, File, …, anything that has the ‘gets’ method).
-
#io ⇒ Object
(DEPRECATED) IO object in the flatfile object.
-
#next_entry ⇒ Object
Get next entry.
-
#path ⇒ Object
Pathname, filename or URI (or nil).
-
#pos ⇒ Object
Returns current position of input stream.
-
#pos=(p) ⇒ Object
(Not recommended to use it.) Sets position of input stream.
-
#rewind ⇒ Object
Resets file pointer to the start of the flatfile.
-
#to_io ⇒ Object
IO object in the flatfile object.
Constructor Details
#initialize(dbclass, stream) ⇒ FlatFile
Same as FlatFile.open, except that ‘stream’ must be an already opened stream object (IO, File, …, anything that has the ‘gets’ method).
-
Example 1
Bio::FlatFile.new(Bio::GenBank, ARGF)
-
Example 2
Bio::FlatFile.new(Bio::GenBank, IO.popen("gzip -dc nc1101.flat.gz"))
Compatibility Note: You can no longer specify “:raw => true” or “:raw => false”. The styles below are DEPRECATED.
-
Example 3 (deprecated)
# Bio::FlatFile.new(nil, $stdin, :raw=>true) # => ERROR
# Please rewrite as below.
ff = Bio::FlatFile.new(nil, $stdin)
ff.raw = true
-
Example 3 in old style (deprecated)
# Bio::FlatFile.new(nil, $stdin, true) # => ERROR
# Please rewrite as below.
ff = Bio::FlatFile.new(nil, $stdin)
ff.raw = true
# File 'lib/bio/io/flatfile.rb', line 225
def initialize(dbclass, stream)
  # 2nd arg: IO object
  if stream.kind_of?(BufferedInputStream)
    @stream = stream
  else
    @stream = BufferedInputStream.for_io(stream)
  end
  # 1st arg: database class (or file format autodetection)
  if dbclass then
    self.dbclass = dbclass
  else
    autodetect
  end
  #
  @skip_leader_mode = :firsttime
  @firsttime_flag = true
  # default raw mode is false
  self.raw = false
end
Instance Attribute Details
#dbclass ⇒ Object
Returns the database class, which is either detected automatically or given in FlatFile#initialize.
# File 'lib/bio/io/flatfile.rb', line 421
def dbclass
  @dbclass
end
#entry ⇒ Object (readonly)
Returns the value of attribute entry.
# File 'lib/bio/io/flatfile.rb', line 299
def entry
  @entry
end
#raw ⇒ Object
If true, raw mode.
# File 'lib/bio/io/flatfile.rb', line 391
def raw
  @raw
end
#skip_leader_mode ⇒ Object
The mode that controls when the leader of the data is skipped.
- :firsttime (DEFAULT): skip only at the head of the file (the first time an entry is read)
- :everytime: skip every time an entry is read
- nil: never skip
# File 'lib/bio/io/flatfile.rb', line 249
def skip_leader_mode
  @skip_leader_mode
end
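How the three modes combine with the "first read" state can be sketched with a small predicate. This is only an illustration modelled on the condition visible in #next_entry; the method name skip_leader? is hypothetical and is not part of Bio::FlatFile.

```ruby
# Hypothetical helper (not part of BioRuby): decides whether the leader
# should be skipped before the next entry is read.
def skip_leader?(mode, first_time)
  return false unless mode            # nil: never skip
  return true if mode == :everytime   # skip before every entry
  mode == :firsttime && first_time    # skip only before the first entry
end

# Print the decision for each mode, for the first read and for later reads.
[:firsttime, :everytime, nil].each do |mode|
  first = skip_leader?(mode, true)
  later = skip_leader?(mode, false)
  puts "#{mode.inspect}: first read=#{first}, later reads=#{later}"
end
```

With the default :firsttime, the leader is skipped once; #rewind sets the first-read flag again, so the leader is skipped again after rewinding.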
Class Method Details
.auto(*arg, &block) ⇒ Object
# File 'lib/bio/io/flatfile.rb', line 122
def self.auto(*arg, &block)
  self.open(nil, *arg, &block)
end
.autodetect(text) ⇒ Object
Detects the database class (== file format) of the given string. Returns false or nil if the format cannot be determined.
# File 'lib/bio/io/flatfile.rb', line 460
def self.autodetect(text)
  AutoDetect.default.autodetect(text)
end
.autodetect_file(filename) ⇒ Object
Detects the database class (== file format) of the given file. Returns nil if the format cannot be determined.
# File 'lib/bio/io/flatfile.rb', line 440
def self.autodetect_file(filename)
  self.open_file(filename).dbclass
end
.autodetect_io(io) ⇒ Object
Detects the database class (== file format) of the given input stream. Returns nil if the format cannot be determined. Caution: the method reads some data from the input stream, and that data is lost.
# File 'lib/bio/io/flatfile.rb', line 448
def self.autodetect_io(io)
  self.new(nil, io).dbclass
end
.autodetect_stream(io) ⇒ Object
Obsolete. Use autodetect_io(io) instead.
# File 'lib/bio/io/flatfile.rb', line 453
def self.autodetect_stream(io)
  $stderr.print "Bio::FlatFile.autodetect_stream will be deprecated." if $VERBOSE
  self.autodetect_io(io)
end
.foreach(*arg) ⇒ Object
# File 'lib/bio/io/flatfile.rb', line 194
def self.foreach(*arg)
  self.open(*arg) do |flatfileobj|
    flatfileobj.each do |entry|
      yield entry
    end
  end
end
.open(*arg, &block) ⇒ Object
Bio::FlatFile.open(file, *arg)
Bio::FlatFile.open(dbclass, file, *arg)
Creates a new Bio::FlatFile object to read a file or a stream which contains dbclass data.
dbclass should be a class (or module) or nil. e.g. Bio::GenBank, Bio::FastaFormat.
If file is a filename (that is, an object without a gets method), the method opens the local file with File.open(filename, *arg).
When dbclass is omitted or nil, the method tries to determine the database class (file format) automatically. If detection fails, dbclass is set to nil and FlatFile#next_entry raises UnknownDataFormatError. You can still set dbclass afterwards with FlatFile#dbclass=.
-
Example 1
Bio::FlatFile.open(Bio::GenBank, "genbank/gbest40.seq")
-
Example 2
Bio::FlatFile.open(nil, "embl/est_hum17.dat")
-
Example 3
Bio::FlatFile.open("genbank/gbest40.seq")
-
Example 4
Bio::FlatFile.open(Bio::GenBank, $stdin)
If it is called with a block, the block will be executed with a new Bio::FlatFile object. If filename is given, the file is automatically closed when leaving the block.
-
Example 5
Bio::FlatFile.open(nil, 'test4.fst') do |ff|
  ff.each { |e| print e.definition, "\n" }
end
-
Example 6
Bio::FlatFile.open('test4.fst') do |ff|
  ff.each { |e| print e.definition, "\n" }
end
Compatibility Note: *arg is passed unchanged to File.open, so you cannot specify “:raw => true” or “:raw => false”.
# File 'lib/bio/io/flatfile.rb', line 80
def self.open(*arg, &block)
  # FlatFile.open(dbclass, file, mode, perm)
  # FlatFile.open(file, mode, perm)
  if arg.size <= 0
    raise ArgumentError, 'wrong number of arguments (0 for 1)'
  end
  x = arg.shift
  if x.is_a?(Module) then
    # FlatFile.open(dbclass, filename_or_io, ...)
    dbclass = x
  elsif x.nil? then
    # FlatFile.open(nil, filename_or_io, ...)
    dbclass = nil
  else
    # FlatFile.open(filename, ...)
    dbclass = nil
    arg.unshift(x)
  end
  if arg.size <= 0
    raise ArgumentError, 'wrong number of arguments (1 for 2)'
  end
  file = arg.shift
  # check if file is filename or IO object
  unless file.respond_to?(:gets)
    # 'file' is a filename
    _open_file(dbclass, file, *arg, &block)
  else
    # 'file' is a IO object
    ff = self.new(dbclass, file)
    block_given? ? (yield ff) : ff
  end
end
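The way .open interprets its argument list (an optional database class first, then either a filename or an IO-like object) can be tried out in isolation. The sketch below copies only the dispatch logic into a standalone method; classify_open_args and FakeFormat are hypothetical names used for illustration and do not exist in BioRuby.

```ruby
require 'stringio'

# Hypothetical helper (not part of BioRuby): reports how an argument list
# in the style of Bio::FlatFile.open would be interpreted.
def classify_open_args(*arg)
  raise ArgumentError, 'wrong number of arguments (0 for 1)' if arg.empty?

  first = arg.shift
  if first.is_a?(Module)
    dbclass = first          # open(dbclass, filename_or_io, ...)
  elsif first.nil?
    dbclass = nil            # open(nil, filename_or_io, ...)
  else
    dbclass = nil            # open(filename_or_io, ...)
    arg.unshift(first)       # the first argument was the file itself
  end
  raise ArgumentError, 'wrong number of arguments (1 for 2)' if arg.empty?

  file = arg.shift
  # A String does not publicly respond to :gets, so it counts as a filename.
  kind = file.respond_to?(:gets) ? :io : :filename
  { dbclass: dbclass, file: file, kind: kind, rest: arg }
end

FakeFormat = Class.new # stand-in for a class such as Bio::GenBank

p classify_open_args(FakeFormat, 'genbank/gbest40.seq')
p classify_open_args(nil, 'embl/est_hum17.dat')
p classify_open_args('test4.fst', 'r')
p classify_open_args(FakeFormat, StringIO.new(''))
```

Because a class is also a Module, the is_a?(Module) test accepts both classes and modules as dbclass, which matches the description above.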
.open_file(filename, *arg) ⇒ Object
Same as FlatFile.auto(filename, *arg), except that it accepts only a filename, not an IO object. The file format is determined automatically.
It can accept a block. If a block is given, it returns the block’s return value. Otherwise, it returns a new FlatFile object.
# File 'lib/bio/io/flatfile.rb', line 144
def self.open_file(filename, *arg)
  _open_file(nil, filename, *arg)
end
.open_uri(uri, *arg) ⇒ Object
Opens the URI specified as uri. uri must be a String or URI object. *arg is passed to OpenURI.open_uri or URI#open.
Like FlatFile.open, it can accept a block.
Note that you MUST explicitly require ‘open-uri’. It is not required by default because open-uri.rb modifies existing classes.
# File 'lib/bio/io/flatfile.rb', line 177
def self.open_uri(uri, *arg)
  if block_given? then
    BufferedInputStream.open_uri(uri, *arg) do |stream|
      yield self.new(nil, stream)
    end
  else
    stream = BufferedInputStream.open_uri(uri, *arg)
    self.new(nil, stream)
  end
end
.to_a(*arg) ⇒ Object
Same as FlatFile.auto(filename_or_stream, *arg).to_a, except that it raises an error if the file format cannot be determined.
(This method might become obsolete in the future.)
# File 'lib/bio/io/flatfile.rb', line 129
def self.to_a(*arg)
  self.auto(*arg) do |ff|
    raise 'cannot determine file format' unless ff.dbclass
    ff.to_a
  end
end
Instance Method Details
#autodetect(lines = 31, ad = AutoDetect.default) ⇒ Object
Determines the database class (file format). Pre-reads the number of lines given by the lines argument (default 31) for format determination. Returns the database class on success, or nil or false on failure.
The method can be called at any time, although this is not recommended. It might be useful if the input file is a mixture of data in multiple formats.
# File 'lib/bio/io/flatfile.rb', line 429
def autodetect(lines = 31, ad = AutoDetect.default)
  if r = ad.autodetect_flatfile(self, lines)
    self.dbclass = r
  else
    self.dbclass = nil unless self.dbclass
  end
  r
end
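The general idea of guessing a format from the first few lines can be illustrated with a toy detector. This is not how Bio::FlatFile::AutoDetect is implemented; the rule table, the guess_format method and the symbols it returns are assumptions made for this sketch, based only on the well-known leading markers of FASTA (">"), GenBank ("LOCUS") and EMBL ("ID   ") files.

```ruby
# Hypothetical rule table (not BioRuby's): format name => pattern that must
# match at the start of some line within the pre-read text.
FORMAT_RULES = {
  fasta:   /^>/,
  genbank: /^LOCUS\s/,
  embl:    /^ID\s{3}/
}.freeze

# Looks at no more than `lines` leading lines of `text` and returns the first
# matching format name, or nil when no rule matches.
def guess_format(text, lines = 31)
  head = text.each_line.first(lines).join
  FORMAT_RULES.each do |name, pattern|
    return name if head =~ pattern
  end
  nil
end

p guess_format(">seq1 test\nACGT\n")
p guess_format("LOCUS       TEST0001\n")
p guess_format("no recognizable header\n")
```

As in #autodetect, a failed detection is reported through the return value rather than an exception, so the caller decides what to do next.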
#close ⇒ Object
Closes input stream. (similar to IO#close)
# File 'lib/bio/io/flatfile.rb', line 351
def close
  @stream.close
end
#each_entry ⇒ Object Also known as: each
# File 'lib/bio/io/flatfile.rb', line 334
def each_entry
  while e = self.next_entry
    yield e
  end
end
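Because the class includes Enumerable and #each is an alias of #each_entry, methods such as map, select and to_a become available on top of #next_entry. The following standalone sketch shows the same pattern with a toy reader; TinyEntryReader and its "//" record terminator are assumptions made for this example and are not part of BioRuby.

```ruby
require 'stringio'

# Hypothetical toy reader (not part of BioRuby): entries end with a "//" line.
class TinyEntryReader
  include Enumerable

  def initialize(io)
    @io = io
  end

  # Returns the next raw entry, or nil at end of input.
  def next_entry
    @io.gets("//\n")
  end

  # Same loop shape as Bio::FlatFile#each_entry.
  def each_entry
    while (e = next_entry)
      yield e
    end
  end
  alias each each_entry
end

reader = TinyEntryReader.new(StringIO.new("ID A\n//\nID B\n//\n"))
# Enumerable#map is built on #each, so it iterates entry by entry.
ids = reader.map { |e| e[/^ID (\S+)/, 1] }
p ids
```

Iteration consumes the stream, so a second pass over the same reader needs a rewind first.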
#entry_ended_pos ⇒ Object
(end position of the last entry) + 1
# File 'lib/bio/io/flatfile.rb', line 322
def entry_ended_pos
  @splitter.entry_ended_pos
end
#entry_pos_flag ⇒ Object
a flag to write down entry start and end positions
# File 'lib/bio/io/flatfile.rb', line 307
def entry_pos_flag
  @splitter.entry_pos_flag
end
#entry_pos_flag=(x) ⇒ Object
Sets flag to write down entry start and end positions
# File 'lib/bio/io/flatfile.rb', line 312
def entry_pos_flag=(x)
  @splitter.entry_pos_flag = x
end
#entry_raw ⇒ Object
Returns the last raw entry as a string.
# File 'lib/bio/io/flatfile.rb', line 302
def entry_raw
  @splitter.entry
end
#entry_start_pos ⇒ Object
start position of the last entry
# File 'lib/bio/io/flatfile.rb', line 317
def entry_start_pos
  @splitter.entry_start_pos
end
#eof? ⇒ Boolean
Returns true if input stream is end-of-file. Otherwise, returns false. (Similar to IO#eof?, but may not be equal to io.eof?, because FlatFile has its own internal buffer.)
# File 'lib/bio/io/flatfile.rb', line 380
def eof?
  @stream.eof?
end
#gets(*arg) ⇒ Object
Similar to IO#gets. Internal use only. Users should not call it directly.
# File 'lib/bio/io/flatfile.rb', line 395
def gets(*arg)
  @stream.gets(*arg)
end
#io ⇒ Object
(DEPRECATED) IO object in the flatfile object.
Compatibility Note: Bio::FlatFile#io is deprecated. Please use Bio::FlatFile#to_io instead.
# File 'lib/bio/io/flatfile.rb', line 255
def io
  warn "Bio::FlatFile#io is deprecated."
  @stream.to_io
end
#next_entry ⇒ Object
Get next entry.
# File 'lib/bio/io/flatfile.rb', line 277
def next_entry
  raise UnknownDataFormatError,
    'file format auto-detection failed?' unless @dbclass
  if @skip_leader_mode and
      ((@firsttime_flag and @skip_leader_mode == :firsttime) or
       @skip_leader_mode == :everytime)
    @splitter.skip_leader
  end
  if raw then
    r = @splitter.get_entry
  else
    r = @splitter.get_parsed_entry
  end
  @firsttime_flag = false
  return nil unless r
  if raw then
    r
  else
    @entry = r
    @entry
  end
end
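The difference between raw mode and the default parsed mode can be modelled with a toy reader: in raw mode the entry text is returned as a string, otherwise it is wrapped in an object of the record class. The sketch assumes that a parsed entry is built by passing the entry text to the record class constructor; TinyRecord and TinyFlatReader are hypothetical and only mimic the shape of #next_entry.

```ruby
require 'stringio'

# Hypothetical record class (not part of BioRuby) standing in for a dbclass.
TinyRecord = Struct.new(:text) do
  def id
    text[/^ID (\S+)/, 1]
  end
end

# Hypothetical reader (not part of BioRuby) with a raw switch.
class TinyFlatReader
  attr_accessor :raw

  def initialize(record_class, io)
    @record_class = record_class
    @io = io
    @raw = false # like Bio::FlatFile, raw mode is off by default
  end

  # Returns a String in raw mode, a record object otherwise, nil at the end.
  def next_entry
    text = @io.gets("//\n")
    return nil unless text
    raw ? text : @record_class.new(text)
  end
end

reader = TinyFlatReader.new(TinyRecord, StringIO.new("ID A\n//\nID B\n//\n"))
p reader.next_entry.id   # parsed mode: a TinyRecord
reader.raw = true
p reader.next_entry      # raw mode: the entry text itself
p reader.next_entry      # end of input
```

Raw mode is useful when entries only need to be copied or filtered as text, since no parsing work is done.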
#path ⇒ Object
Pathname, filename or URI (or nil).
# File 'lib/bio/io/flatfile.rb', line 268
def path
  @stream.path
end
#pos ⇒ Object
Returns current position of input stream. If the input stream is not a normal file, the result is not guaranteed. It is similar to IO#pos. Note that it will not be equal to io.pos, because FlatFile has its own internal buffer.
# File 'lib/bio/io/flatfile.rb', line 361
def pos
  @stream.pos
end
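Why a buffered reader's position can differ from the position of the underlying IO is easy to see in a toy model: once lines have been read ahead (for example during format detection), the IO has advanced further than the reader's logical position. TinyBufferedReader below is a hypothetical, heavily simplified stand-in and does not reflect the actual implementation of BufferedInputStream.

```ruby
require 'stringio'

# Hypothetical toy model (not BioRuby's BufferedInputStream).
class TinyBufferedReader
  def initialize(io)
    @io = io
    @buffer = [] # lines read ahead but not yet handed to the caller
  end

  # Reads up to n lines ahead without consuming them.
  def prefetch(n)
    n.times do
      line = @io.gets
      break unless line
      @buffer << line
    end
  end

  # Serves buffered lines first, then falls back to the underlying IO.
  def gets
    @buffer.empty? ? @io.gets : @buffer.shift
  end

  # Logical position: the IO position minus the bytes still in the buffer.
  def pos
    @io.pos - @buffer.sum(&:bytesize)
  end

  # End of input only when both the buffer and the IO are exhausted.
  def eof?
    @buffer.empty? && @io.eof?
  end
end

io = StringIO.new("line1\nline2\nline3\n")
reader = TinyBufferedReader.new(io)
reader.prefetch(2)
puts "reader.pos=#{reader.pos} io.pos=#{io.pos}"
reader.gets
puts "reader.pos=#{reader.pos} io.pos=#{io.pos}"
```

The same reasoning applies to #eof?: the underlying IO may already be at end-of-file while unread data is still waiting in the buffer.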
#pos=(p) ⇒ Object
(Not recommended.) Sets the position of the input stream. If the input stream is not a normal file, the result is not guaranteed. It is similar to IO#pos=. Note that the effect differs from io.pos=, because FlatFile has its own internal buffer.
# File 'lib/bio/io/flatfile.rb', line 372
def pos=(p)
  @stream.pos=(p)
end
#rewind ⇒ Object
Resets file pointer to the start of the flatfile. (similar to IO#rewind)
# File 'lib/bio/io/flatfile.rb', line 343
def rewind
  r = (@splitter || @stream).rewind
  @firsttime_flag = true
  r
end
#to_io ⇒ Object
IO object in the flatfile object.
Compatibility Note: Bio::FlatFile#io is deprecated.
# File 'lib/bio/io/flatfile.rb', line 263
def to_io
  @stream.to_io
end