Class: Bio::FlatFile

Inherits:
Object
Includes:
Enumerable
Defined in:
lib/bio/io/flatfile.rb

Overview

Bio::FlatFile is a helper and wrapper class for reading biological data files. It acts like an IO object. It can automatically detect the data format, so users do not need to tell the class what the data is.

Defined Under Namespace

Modules: Splitter
Classes: AutoDetect, BufferedInputStream, UnknownDataFormatError


Constructor Details

#initialize(dbclass, stream) ⇒ FlatFile

Same as FlatFile.open, except that ‘stream’ must be an already opened stream object (IO, File, …, i.e. any object that has a ‘gets’ method).

Compatibility Note: You can no longer specify “:raw => true” or “:raw => false”. The styles below are DEPRECATED.

  • Example 3 (deprecated)

    # Bio::FlatFile.new(nil, $stdin, :raw=>true) # => ERROR
    # Please rewrite as below.
    ff = Bio::FlatFile.new(nil, $stdin)
    ff.raw = true
    
  • Example 3 in old style (deprecated)

    # Bio::FlatFile.new(nil, $stdin, true) # => ERROR
    # Please rewrite as below.
    ff = Bio::FlatFile.new(nil, $stdin)
    ff.raw = true
    


# File 'lib/bio/io/flatfile.rb', line 548

def initialize(dbclass, stream)
  # 2nd arg: IO object
  if stream.kind_of?(BufferedInputStream)
    @stream = stream
  else
    @stream = BufferedInputStream.for_io(stream)
  end
  # 1st arg: database class (or file format autodetection)
  if dbclass then
    self.dbclass = dbclass
  else
    autodetect
  end
  #
  @skip_leader_mode = :firsttime
  @firsttime_flag = true
  # default raw mode is false
  self.raw = false
end
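
The constructor accepts any object that has a gets method and wraps it in a BufferedInputStream unless it already is one. The sketch below illustrates that wrap-once pattern with a hypothetical LineBuffer class standing in for BufferedInputStream; it is an illustration under that assumption, not BioRuby's code.

```ruby
require 'stringio'

# Hypothetical stand-in for Bio::FlatFile::BufferedInputStream (illustration only).
class LineBuffer
  attr_reader :io

  def initialize(io)
    @io = io
  end

  # Wrap any gets-capable object, but reuse an existing LineBuffer,
  # mirroring the kind_of?(BufferedInputStream) check in FlatFile#initialize.
  def self.wrap(stream)
    raise ArgumentError, 'stream must respond to gets' unless stream.respond_to?(:gets)
    stream.kind_of?(LineBuffer) ? stream : new(stream)
  end

  def gets(*arg)
    @io.gets(*arg)
  end
end

plain = StringIO.new(">seq1\nACGT\n")
buf = LineBuffer.wrap(plain)       # a StringIO has gets, so it is accepted
same = LineBuffer.wrap(buf)        # already wrapped, so it is reused
puts same.equal?(buf)              # true
puts buf.gets                      # >seq1
```

Reusing an already wrapped stream avoids stacking buffers on top of each other, which would make position and end-of-file bookkeeping harder.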

Instance Attribute Details

#dbclass ⇒ Object

Returns the database class, either detected automatically or given to FlatFile#initialize.



# File 'lib/bio/io/flatfile.rb', line 735

def dbclass
  @dbclass
end

#entry ⇒ Object (readonly)

Returns the last entry parsed by next_entry. It is not updated in raw mode.



# File 'lib/bio/io/flatfile.rb', line 618

def entry
  @entry
end

#raw ⇒ Object

If true, raw mode is on: next_entry returns each entry as raw text instead of parsing it with dbclass.



# File 'lib/bio/io/flatfile.rb', line 710

def raw
  @raw
end

#skip_leader_mode ⇒ Object

The mode for skipping the leader of the data.

:firsttime

(DEFAULT) skip the leader only at the head of the file (i.e. the first time an entry is read)

:everytime

skip the leader every time an entry is read

nil

never skip



# File 'lib/bio/io/flatfile.rb', line 572

def skip_leader_mode
  @skip_leader_mode
end
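
FlatFile#next_entry decides whether to skip the leader from skip_leader_mode and an internal first-time flag (which rewind resets). Below, that condition is extracted into a hypothetical standalone function, skip_leader?, purely for illustration.

```ruby
# Hypothetical helper reproducing the condition used in FlatFile#next_entry.
#   mode       - :firsttime, :everytime, or nil
#   first_time - true until the first entry has been read
def skip_leader?(mode, first_time)
  return false unless mode
  (first_time && mode == :firsttime) || mode == :everytime
end

p skip_leader?(:firsttime, true)    # true
p skip_leader?(:firsttime, false)   # false
p skip_leader?(:everytime, false)   # true
p skip_leader?(nil, true)           # false
```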

Class Method Details

.auto(*arg, &block) ⇒ Object

Same as Bio::FlatFile.open(nil, filename_or_stream, mode, perm, options).



# File 'lib/bio/io/flatfile.rb', line 445

def self.auto(*arg, &block)
  self.open(nil, *arg, &block)
end

.autodetect(text) ⇒ Object

Detects the database class (== file format) of the given string. Returns false or nil if it fails to determine the format.



# File 'lib/bio/io/flatfile.rb', line 774

def self.autodetect(text)
  AutoDetect.default.autodetect(text)
end
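
The actual rules used by AutoDetect are not documented on this page. As a rough illustration of detecting a format from a string, here is a hypothetical guess_format function with two simplified heuristics (a leading '>' for FASTA, a leading LOCUS line for GenBank). It is not BioRuby's detection logic; like FlatFile.autodetect, it returns nil when it cannot decide.

```ruby
# Hypothetical, greatly simplified format sniffing (not Bio::FlatFile::AutoDetect).
def guess_format(text)
  first = text.each_line.map(&:strip).find { |line| !line.empty? }
  return nil unless first
  case first
  when /\A>/       then :fasta
  when /\ALOCUS\s/ then :genbank
  else nil
  end
end

p guess_format(">gi|123 test\nACGT\n")                 # :fasta
p guess_format("LOCUS       AB000001   1000 bp    DNA\n") # :genbank
p guess_format("hello world\n")                         # nil
```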

.autodetect_file(filename) ⇒ Object

Detects the database class (== file format) of the given file. Returns nil if it fails to determine the format.



# File 'lib/bio/io/flatfile.rb', line 754

def self.autodetect_file(filename)
  self.open_file(filename).dbclass
end

.autodetect_io(io) ⇒ Object

Detects the database class (== file format) of the given input stream. Returns nil if it fails to determine the format. Caution: the method reads some data from the input stream, and that data will be lost.



# File 'lib/bio/io/flatfile.rb', line 762

def self.autodetect_io(io)
  self.new(nil, io).dbclass
end

.autodetect_stream(io) ⇒ Object

This method is obsolete. Please use autodetect_io(io) instead.



# File 'lib/bio/io/flatfile.rb', line 767

def self.autodetect_stream(io)
  $stderr.print "Bio::FlatFile.autodetect_stream will be deprecated." if $VERBOSE
  self.autodetect_io(io)
end

.foreach(*arg) ⇒ Object

Executes the block for every entry in the stream. Same as FlatFile.open(*arg) { |ff| ff.each { |entry| … }}.

  • Example

    Bio::FlatFile.foreach('test.fst') { |e| puts e.definition }
    


# File 'lib/bio/io/flatfile.rb', line 517

def self.foreach(*arg)
  self.open(*arg) do |flatfileobj|
    flatfileobj.each do |entry|
      yield entry
    end
  end
end
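
FlatFile.foreach is FlatFile.open combined with each. A hypothetical FASTA-only analogue built on File.open shows the same open, iterate and close pattern; fasta_foreach and its naive record splitting are assumptions made for this sketch, not BioRuby code.

```ruby
require 'tempfile'

# Hypothetical FASTA-only analogue of Bio::FlatFile.foreach: open the file,
# split it into records, and yield [definition, sequence] for each record.
def fasta_foreach(path)
  File.open(path) do |io|                      # the file is closed when this block exits
    definition, seq = nil, +''
    io.each_line do |line|
      line = line.chomp
      if line.start_with?('>')
        yield definition, seq if definition    # emit the previous record
        definition, seq = line[1..-1], +''
      else
        seq << line
      end
    end
    yield definition, seq if definition        # emit the last record
  end
end

Tempfile.create(['example', '.fst']) do |f|
  f.write(">seq1 first\nACGT\nTT\n>seq2 second\nGGCC\n")
  f.flush
  fasta_foreach(f.path) { |defn, seq| puts "#{defn}: #{seq}" }
end
# seq1 first: ACGTTT
# seq2 second: GGCC
```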

.open(*arg, &block) ⇒ Object

Bio::FlatFile.open(file, *arg)

Bio::FlatFile.open(dbclass, file, *arg)

Creates a new Bio::FlatFile object to read a file or a stream which contains dbclass data.

dbclass should be a class (or module) or nil, e.g. Bio::GenBank or Bio::FastaFormat.

If file is a filename (i.e. it does not have a gets method), the method opens the local file with File.open(filename, *arg).

When dbclass is omitted or nil, the method tries to determine the database class (file format) automatically. If detection fails, dbclass is set to nil and FlatFile#next_entry raises UnknownDataFormatError. You can still set dbclass with the FlatFile#dbclass= method.

If called with a block, the method yields a new Bio::FlatFile object to the block. If a filename is given, the file is closed automatically when the block exits.

  • Example 5

    Bio::FlatFile.open(nil, 'test4.fst') do |ff|
        ff.each { |e| print e.definition, "\n" }
    end
    
  • Example 6

    Bio::FlatFile.open('test4.fst') do |ff|
        ff.each { |e| print e.definition, "\n" }
    end
    

Compatibility Note: *arg is passed as-is to File.open, so you cannot specify “:raw => true” or “:raw => false”.



# File 'lib/bio/io/flatfile.rb', line 403

def self.open(*arg, &block)
  # FlatFile.open(dbclass, file, mode, perm)
  # FlatFile.open(file, mode, perm)
  if arg.size <= 0
    raise ArgumentError, 'wrong number of arguments (0 for 1)'
  end
  x = arg.shift
  if x.is_a?(Module) then
    # FlatFile.open(dbclass, filename_or_io, ...)
    dbclass = x
  elsif x.nil? then
    # FlatFile.open(nil, filename_or_io, ...)
    dbclass = nil
  else
    # FlatFile.open(filename, ...)
    dbclass = nil
    arg.unshift(x)
  end
  if arg.size <= 0
    raise ArgumentError, 'wrong number of arguments (1 for 2)'
  end
  file = arg.shift
  # check if file is filename or IO object
  unless file.respond_to?(:gets)
    # 'file' is a filename
    _open_file(dbclass, file, *arg, &block)
  else
    # 'file' is a IO object
    ff = self.new(dbclass, file)
    block_given? ? (yield ff) : ff
  end
end
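
FlatFile.open tells its call forms apart by inspecting the first argument: a Module is taken as dbclass, nil requests autodetection, and anything else is treated as the file. The hypothetical helper split_open_args reproduces just this dispatch step (plus the gets-based filename check) so the rules can be tried in isolation; it is a sketch, not part of BioRuby.

```ruby
require 'stringio'

# Hypothetical helper mirroring the argument handling at the top of FlatFile.open.
# Returns [dbclass, file, remaining_args, kind], where kind is :io or :filename.
def split_open_args(*arg)
  raise ArgumentError, 'wrong number of arguments (0 for 1)' if arg.empty?
  x = arg.shift
  if x.is_a?(Module)
    dbclass = x                 # open(dbclass, file, ...)
  else
    dbclass = nil               # open(nil, file, ...) or open(file, ...)
    arg.unshift(x) unless x.nil?
  end
  raise ArgumentError, 'wrong number of arguments (1 for 2)' if arg.empty?
  file = arg.shift
  kind = file.respond_to?(:gets) ? :io : :filename
  [dbclass, file, arg, kind]
end

p split_open_args(String, 'a.gb')[0]       # String
p split_open_args(nil, 'a.fst', 'r')[2]    # ["r"]
p split_open_args('a.fst')[3]              # :filename
p split_open_args(StringIO.new(''))[3]     # :io
```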

.open_file(filename, *arg) ⇒ Object

Same as FlatFile.auto(filename, *arg), except that it accepts only a filename, not an IO object. The file format is determined automatically.

It can accept a block. If a block is given, it returns the block’s return value. Otherwise, it returns a new FlatFile object.



# File 'lib/bio/io/flatfile.rb', line 467

def self.open_file(filename, *arg)
  _open_file(nil, filename, *arg)
end

.open_uri(uri, *arg) ⇒ Object

Opens the URI specified by uri, which must be a String or a URI object. *arg is passed to OpenURI.open_uri or URI#open.

Like FlatFile.open, it can accept a block.

Note that you MUST explicitly require ‘open-uri’. Because open-uri.rb modifies existing classes, it is not required by default.



# File 'lib/bio/io/flatfile.rb', line 500

def self.open_uri(uri, *arg)
  if block_given? then
    BufferedInputStream.open_uri(uri, *arg) do |stream|
      yield self.new(nil, stream)
    end
  else
    stream = BufferedInputStream.open_uri(uri, *arg)
    self.new(nil, stream)
  end
end

.to_a(*arg) ⇒ Object

Same as FlatFile.auto(filename_or_stream, *arg).to_a

(This method may become obsolete in the future.)



# File 'lib/bio/io/flatfile.rb', line 452

def self.to_a(*arg)
  self.auto(*arg) do |ff|
    raise 'cannot determine file format' unless ff.dbclass
    ff.to_a
  end
end

Instance Method Details

#autodetect(lines = 31, ad = AutoDetect.default) ⇒ Object

Determines the database class (file format). Pre-reads the number of lines given by lines (default 31) for format detection. Returns the database class on success, or nil or false on failure.

The method can be called at any time (though this is not recommended). This might be useful if the input file mixes data in multiple formats.



# File 'lib/bio/io/flatfile.rb', line 743

def autodetect(lines = 31, ad = AutoDetect.default)
  if r = ad.autodetect_flatfile(self, lines)
    self.dbclass = r
  else
    self.dbclass = nil unless self.dbclass
  end
  r
end
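
For pre-reading to be harmless, the pre-read lines have to remain available to the parser afterwards; a plain IO cannot give them back, which is the data loss FlatFile.autodetect_io warns about. A minimal look-ahead buffer illustrates the idea, using a hypothetical PeekableLines class (not BioRuby's BufferedInputStream API).

```ruby
require 'stringio'

# Hypothetical look-ahead reader: peek at up to n lines, then read them again.
class PeekableLines
  def initialize(io)
    @io = io
    @buffer = []              # lines read from @io but not yet consumed
  end

  # Returns up to n lines without consuming them.
  def peek(n)
    while @buffer.size < n && (line = @io.gets)
      @buffer << line
    end
    @buffer.first(n)
  end

  # Consumes one line, serving buffered lines first.
  def gets
    @buffer.empty? ? @io.gets : @buffer.shift
  end
end

reader = PeekableLines.new(StringIO.new("LOCUS  X\nDEFINITION y\n//\n"))
head = reader.peek(31)        # sniff the format from the first lines
puts head.first               # LOCUS  X
puts reader.gets              # LOCUS  X
```

The second call prints the same line again, showing that peeking did not consume it.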

#close ⇒ Object

Closes the input stream (similar to IO#close).



# File 'lib/bio/io/flatfile.rb', line 670

def close
  @stream.close
end

#each_entry ⇒ Object

Also known as: each

Iterates over each entry in the flatfile.

  • Example

    include Bio
    ff = FlatFile.open(GenBank, "genbank/gbhtg14.seq")
    ff.each_entry do |x|
      puts x.definition
    end
    


# File 'lib/bio/io/flatfile.rb', line 653

def each_entry
  while e = self.next_entry
    yield e
  end
end
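
Because FlatFile includes Enumerable and aliases each to each_entry, Enumerable methods such as map, select, first and to_a work on entries. The sketch below reproduces that design with a hypothetical EntryList class whose next_entry returns nil when exhausted, as FlatFile#next_entry does.

```ruby
# Hypothetical class following the same pattern as Bio::FlatFile:
# next_entry returns nil when exhausted, each_entry loops over it,
# and Enumerable builds on the each alias.
class EntryList
  include Enumerable

  def initialize(entries)
    @entries = entries.dup
  end

  def next_entry
    @entries.shift            # nil once all entries are consumed
  end

  def each_entry
    while (e = next_entry)
      yield e
    end
  end
  alias each each_entry
end

list = EntryList.new(%w[seq1 seq2 seq3])
p list.map(&:upcase)          # ["SEQ1", "SEQ2", "SEQ3"]
p list.to_a                   # []
```

The second call returns an empty array because the first pass consumed every entry. In the same way, a second pass over a FlatFile presumably needs a rewind first.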

#entry_ended_pos ⇒ Object

Returns the end position of the last entry plus 1.



# File 'lib/bio/io/flatfile.rb', line 641

def entry_ended_pos
  @splitter.entry_ended_pos
end

#entry_pos_flag ⇒ Object

Returns the flag that controls whether the start and end positions of entries are recorded.



# File 'lib/bio/io/flatfile.rb', line 626

def entry_pos_flag
  @splitter.entry_pos_flag
end

#entry_pos_flag=(x) ⇒ Object

Sets the flag that controls whether the start and end positions of entries are recorded.



# File 'lib/bio/io/flatfile.rb', line 631

def entry_pos_flag=(x)
  @splitter.entry_pos_flag = x
end

#entry_raw ⇒ Object

Returns the last raw entry as a string.



# File 'lib/bio/io/flatfile.rb', line 621

def entry_raw
  @splitter.entry
end

#entry_start_pos ⇒ Object

Returns the start position of the last entry.



# File 'lib/bio/io/flatfile.rb', line 636

def entry_start_pos
  @splitter.entry_start_pos
end
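
When entry_pos_flag is on, the splitter records where each entry starts and where it ends, with entry_ended_pos being one past the end of the last entry. The hypothetical split_with_positions function records such positions while splitting "//"-terminated records from a StringIO; the terminator and the function are assumptions for this sketch, and BioRuby's splitters track positions in their own way.

```ruby
require 'stringio'

# Hypothetical: split records terminated by a "//" line and return, for each,
# [text, start_pos, ended_pos], where ended_pos is (end position + 1), matching
# the meaning documented for entry_ended_pos. Text after the last "//" is ignored.
def split_with_positions(io)
  records = []
  start, text = io.pos, +''
  while (line = io.gets)
    text << line
    if line.chomp == '//'
      records << [text, start, io.pos]
      start, text = io.pos, +''
    end
  end
  records
end

data = "ID a\n//\nID bb\n//\n"
split_with_positions(StringIO.new(data)).each do |text, s, e|
  p [s, e, data[s...e] == text]
end
# [0, 8, true]
# [8, 17, true]
```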

#eof? ⇒ Boolean

Returns true if the input stream is at end-of-file, and false otherwise. (Similar to IO#eof?, but the result may differ from io.eof?, because FlatFile has its own internal buffer.)

Returns:

  • (Boolean)


# File 'lib/bio/io/flatfile.rb', line 699

def eof?
  @stream.eof?
end

#gets(*arg) ⇒ Object

Similar to IO#gets. Internal use only. Users should not call it directly.



# File 'lib/bio/io/flatfile.rb', line 714

def gets(*arg)
  @stream.gets(*arg)
end

#io ⇒ Object

(DEPRECATED) IO object in the flatfile object.

Compatibility Note: Bio::FlatFile#io is deprecated. Please use Bio::FlatFile#to_io instead.



# File 'lib/bio/io/flatfile.rb', line 578

def io
  warn "Bio::FlatFile#io is deprecated."
  @stream.to_io
end

#next_entry ⇒ Object

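Gets the next entry. Returns nil when no entries remain. In raw mode the entry text is returned as-is; otherwise it is parsed with dbclass.new and also stored in entry. Raises UnknownDataFormatError if dbclass is not set.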



# File 'lib/bio/io/flatfile.rb', line 600

def next_entry
  raise UnknownDataFormatError, 
  'file format auto-detection failed?' unless @dbclass
  if @skip_leader_mode and
      ((@firsttime_flag and @skip_leader_mode == :firsttime) or
         @skip_leader_mode == :everytime)
    @splitter.skip_leader
  end
  r = @splitter.get_entry
  @firsttime_flag = false
  return nil unless r
  if raw then
    r
  else
    @entry = @dbclass.new(r)
    @entry
  end
end
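
next_entry either returns the raw entry text (raw mode) or passes it to dbclass.new, and it refuses to run without a dbclass. A self-contained reduction of that control flow follows; MiniFlatFile, DemoEntry and the "//"-based splitting are hypothetical, and leader skipping is omitted.

```ruby
require 'stringio'

# Hypothetical reduction of FlatFile#next_entry (leader skipping omitted).
class MiniFlatFile
  class UnknownDataFormatError < RuntimeError; end

  attr_accessor :dbclass, :raw
  attr_reader :entry

  def initialize(dbclass, io)
    @dbclass, @io, @raw = dbclass, io, false
  end

  # Naive splitter: one entry per "//"-terminated record.
  def get_entry
    text = +''
    while (line = @io.gets)
      text << line
      return text if line.chomp == '//'
    end
    text.empty? ? nil : text
  end

  def next_entry
    raise UnknownDataFormatError, 'file format auto-detection failed?' unless @dbclass
    r = get_entry
    return nil unless r
    raw ? r : (@entry = @dbclass.new(r))   # parsed entries are also kept in entry
  end
end

# Hypothetical entry class: uses the first line as an identifier.
DemoEntry = Struct.new(:text) do
  def id
    text.lines.first.chomp
  end
end

ff = MiniFlatFile.new(DemoEntry, StringIO.new("ID a\n//\nID b\n//\n"))
p ff.next_entry.id         # "ID a"
ff.raw = true
p ff.next_entry            # "ID b\n//\n"
p ff.next_entry            # nil
```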

#path ⇒ Object

Returns the pathname, filename or URI of the input (or nil).



# File 'lib/bio/io/flatfile.rb', line 591

def path
  @stream.path
end

#pos ⇒ Object

Returns the current position of the input stream, similar to IO#pos. If the input stream is not a regular file, the result is not guaranteed. Note that it will not necessarily equal io.pos, because FlatFile has its own internal buffer.



# File 'lib/bio/io/flatfile.rb', line 680

def pos
  @stream.pos
end
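
Both the eof? and pos caveats come from read-ahead. Once data has been pulled into an internal buffer, the underlying IO can be at end-of-file while unread lines remain, and the logical position is the underlying position minus the bytes still buffered. A hypothetical ReadAhead class (assuming a seekable IO such as a File or StringIO; not BioRuby's implementation) shows both effects.

```ruby
require 'stringio'

# Hypothetical read-ahead reader (not BioRuby's BufferedInputStream).
class ReadAhead
  def initialize(io)
    @io = io
    @buffer = []              # lines already read from @io but not consumed
  end

  # Pre-read up to n lines into the buffer.
  def prefetch(n)
    while @buffer.size < n && (line = @io.gets)
      @buffer << line
    end
  end

  def gets
    @buffer.empty? ? @io.gets : @buffer.shift
  end

  # At EOF only if both the buffer and the underlying IO are exhausted.
  def eof?
    @buffer.empty? && @io.eof?
  end

  # Logical position: underlying position minus bytes still buffered.
  def pos
    @io.pos - @buffer.sum(&:bytesize)
  end
end

io = StringIO.new("line1\nline2\n")
r = ReadAhead.new(io)
r.prefetch(2)
p [io.eof?, r.eof?]   # [true, false]
p [io.pos, r.pos]     # [12, 0]
r.gets
p r.pos               # 6
```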

#pos=(p) ⇒ Object

(Not recommended.) Sets the position of the input stream, similar to IO#pos=. If the input stream is not a regular file, the result is not guaranteed. Note that its effect will not necessarily match io.pos=, because FlatFile has its own internal buffer.



# File 'lib/bio/io/flatfile.rb', line 691

def pos=(p)
  @stream.pos=(p)
end

#rewind ⇒ Object

Resets the file pointer to the start of the flatfile (similar to IO#rewind).



# File 'lib/bio/io/flatfile.rb', line 662

def rewind
  r = @stream.rewind
  @firsttime_flag = true
  r
end

#to_io ⇒ Object

Returns the IO object wrapped by the flatfile object.

Compatibility Note: Bio::FlatFile#io is deprecated.



# File 'lib/bio/io/flatfile.rb', line 586

def to_io
  @stream.to_io
end