Class: Bio::Lasergene
Overview
bio/db/lasergene.rb - Interface for DNAStar Lasergene sequence file format
- Author: Trevor Wennblom
- Copyright: Copyright © 2007 Center for Biomedical Research Informatics, University of Minnesota (cbri.umn.edu)
- License: The Ruby License
Description
Bio::Lasergene reads DNAStar Lasergene formatted sequence files (.seq
files). It expects only one sequence per file.
Usage
require 'bio'
filename = 'MyFile.seq'
lseq = Bio::Lasergene.new( IO.readlines(filename) )
lseq.entry_id # => "Contig 1"
lseq.seq # => ATGACGTATCCAAAGAGGCGTTACC
Comments
I’m only aware of the following three kinds of Lasergene file formats. Feel free to send me other examples that may not currently be accounted for.
File format 1:
## begin ##
"Contig 1" (1,934)
Contig Length: 934 bases
Average Length/Sequence: 467 bases
Total Sequence Length: 1869 bases
Top Strand: 2 sequences
Bottom Strand: 2 sequences
Total: 4 sequences
^^
ATGACGTATCCAAAGAGGCGTTACCGGAGAAGAAGACACCGCCCCCGCAGTCCTCTTGGCCAGATCCTCCGCCGCCGCCCCTGGCTCGTCCACCCCCGCCACAGTTACCGCTGGAGAAGGAAAAATGGCATCTTCAWCACCCGCCTATCCCGCAYCTTCGGAWRTACTATCAAGCGAACCACAGTCAGAACGCCCTCCTGGGCGGTGGACATGATGAGATTCAATATTAATGACTTTCTTCCCCCAGGAGGGGGCTCAAACCCCCGCTCTGTGCCCTTTGAATACTACAGAATAAGAAAGGTTAAGGTTGAATTCTGGCCCTGCTCCCCGATCACCCAGGGTGACAGGGGAATGGGCTCCAGTGCTGWTATTCTAGMTGATRRCTTKGTAACAAAGRCCACAGCCCTCACCTATGACCCCTATGTAAACTTCTCCTCCCGCCATACCATAACCCAGCCCTTCTCCTACCRCTCCCGYTACTTTACCCCCAAACCTGTCCTWGATKCCACTATKGATKACTKCCAACCAAACAACAAAAGAAACCAGCTGTGGSTGAGACTACAWACTGCTGGAAATGTAGACCWCGTAGGCCTSGGCACTGCGTKCGAAAACAGTATATACGACCAGGAATACAATATCCGTGTMACCATGTATGTACAATTCAGAGAATTTAATCTTAAAGACCCCCCRCTTMACCCKTAATGAATAATAAMAACCATTACGAAGTGATAAAAWAGWCTCAGTAATTTATTYCATATGGAAATTCWSGGCATGGGGGGGAAAGGGTGACGAACKKGCCCCCTTCCTCCSTSGMYTKTTCYGTAGCATTCYTCCAMAAYACCWAGGCAGYAMTCCTCCSATCAAGAGcYTSYACAGCTGGGACAGCAGTTGAGGAGGACCATTCAAAGGGGGTCGGATTGCTGGTAATCAGA
## end ##
File format 2:
## begin ##
^^: 350,935
Contig 1 (1,935)
Contig Length: 935 bases
Average Length/Sequence: 580 bases
Total Sequence Length: 2323 bases
Top Strand: 2 sequences
Bottom Strand: 2 sequences
Total: 4 sequences
^^
ATGTCGGGGAAATGCTTGACCGCGGGCTACTGCTCATCATTGCTTTCTTTGTGGTATATCGTGCCGTTCTGTTTTGCTGTGCTCGTCAACGCCAGCGGCGACAGCAGCTCTCATTTTCAGTCGATTTATAACTTGACGTTATGTGAGCTGAATGGCACGAACTGGCTGGCAGACAACTTTAACTGGGCTGTGGAGACTTTTGTCATCTTCCCCGTGTTGACTCACATTGTTTCCTATGGTGCACTCACTACCAGTCATTTTCTTGACACAGTTGGTCTAGTTACTGTGTCTACCGCCGGGTTTTATCACGGGCGGTACGTCTTGAGTAGCATCTACGCGGTCTGTGCTCTGGCTGCGTTGATTTGCTTCGCCATCAGGTTTGCGAAGAACTGCATGTCCTGGCGCTACTCTTGCACTAGATACACCAACTTCCTCCTGGACACCAAGGGCAGACTCTATCGTTGGCGGTCGCCTGTCATCATAGAGAAAGGGGGTAAGGTTGAGGTCGAAGGTCATCTGATCGATCTCAAAAGAGTTGTGCTTGATGGCTCTGTGGCGACACCTTTAACCAGAGTTTCAGCGGAACAATGGGGTCGTCCCTAGACGACTTTTGCCATGATAGTACAGCCCCACAGAAGGTGCTCTTGGCGTTTTCCATCACCTACACGCCAGTGATGATATATGCCCTAAAGGTAAGCCGCGGCCGACTTTTGGGGCTTCTGCACCTTTTGATTTTTTTGAACTGTGCCTTTACTTTCGGGTACATGACATTCGTGCACTTTCGGAGCACGAACAAGGTCGCGCTCACTATGGGAGCAGTAGTCGCACTCCTTTGGGGGGTGTACTCAGCCATAGAAACCTGGAAATTCATCACCTCCAGATGCCGTTGTGCTTGCTAGGCCGCAAGTACATTCTGGCCCCTGCCCACCACGTTG
## end ##
File format 3 (non-standard Lasergene header):
## begin ##
LOCUS PRU87392 15411 bp RNA linear VRL 17-NOV-2000
DEFINITION Porcine reproductive and respiratory syndrome virus strain VR-2332,
complete genome.
ACCESSION U87392 AF030244 U00153
VERSION U87392.3 GI:11192298
[...cut...]
3'UTR 15261..15411
polyA_site 15409
ORIGIN
^^
atgacgtataggtgttggctctatgccttggcatttgtattgtcaggagctgtgaccattggcacagcccaaaacttgctgcacagaaacacccttctgtgatagcctccttcaggggagcttagggtttgtccctagcaccttgcttccggagttgcactgctttacggtctctccacccctttaaccatgtctgggatacttgatcggtgcacgtgtacccccaatgccagggtgtttatggcggagggccaagtctactgcacacgatgcctcagtgcacggtctctccttcccctgaacctccaagtttctgagctcggggtgctaggcctattctacaggcccgaagagccactccggtggacgttgccacgtgcattccccactgttgagtgctcccccgccggggcctgctggctttctgcaatctttccaatcgcacgaatgaccagtggaaacctgaacttccaacaaagaatggtacgggtcgcagctgagctttacagagccggccagctcacccctgcagtcttgaaggctctacaagtttatgaacggggttgccgctggtaccccattgttggacctgtccctggagtggccgttttcgccaattccctacatgtgagtgataaacctttcccgggagcaactcacgtgttgaccaacctgccgctcccgcagagacccaagcctgaagacttttgcccctttgagtgtgctatggctactgtctatgacattggtcatgacgccgtcatgtatgtggccgaaaggaaagtctcctgggcccctcgtggcggggatgaagtgaaatttgaagctgtccccggggagttgaagttgattgcgaaccggctccgcacctccttcccgccccaccacacagtggacatgtctaagttcgccttcacagcccctgggtgtggtgtttctatgcgggtcgaacgccaacacggctgccttcccgctgacactgtccctgaaggcaactgctggtggagcttgtttgacttgcttccactggaagttcagaacaaagaaattcgccatgctaaccaatttggctaccagaccaagcatggtgtctctggcaagtacctacagcggaggctgca[...cut...]
## end ##
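All three layouts follow one rule: the sequence starts after a line that contains only ^^. The sketch below illustrates that rule. split_lasergene is a hypothetical helper written for this page, not BioRuby's internal process method. It assumes that a leading "^^: start,end" line, as in format 2, belongs to neither the header nor the sequence.

```ruby
# Hypothetical helper (not part of BioRuby): split the lines of a Lasergene
# .seq file into header lines and a sequence string, following the three
# example layouts above.
def split_lasergene(lines)
  lines = lines.map(&:chomp)

  # The sequence begins after a line that is exactly "^^" (trailing spaces
  # allowed). A "^^: 350,935" range line does not count as this delimiter.
  delim = lines.index { |l| l =~ /\A\^\^\s*\z/ }
  raise ArgumentError, 'no ^^ delimiter line found' if delim.nil?

  # Header: everything before the delimiter, minus any "^^:" range line.
  header = lines[0...delim].reject { |l| l.start_with?('^^:') }

  # Sequence: all remaining lines joined, with whitespace removed.
  sequence = lines[(delim + 1)..-1].join.gsub(/\s+/, '')

  [header, sequence]
end
```

The helper takes the same kind of input as the Usage example, such as the array returned by IO.readlines(filename). For format 3, the header it returns is the GenBank-style record, so the labelled fields of formats 1 and 2 are absent.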
Constant Summary
- DELIMITER_1 = '^\^\^:'
  Match ‘^^:’ at the beginning of a line
- DELIMITER_2 = '^\^\^'
  Match ‘^^’ at the beginning of a line
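DELIMITER_2 is a prefix of DELIMITER_1, so the ^^ pattern also matches the "^^: 350,935" line at the top of file format 2. The check below shows this overlap. It uses only Ruby's Regexp class and the two constant strings copied from above. A parser therefore has to test DELIMITER_1 first or tell the lines apart in some other way. Whether BioRuby does exactly this is not shown on this page.

```ruby
# The two delimiter strings as documented for Bio::Lasergene.
DELIMITER_1 = '^\^\^:'  # "^^:" at the beginning of a line
DELIMITER_2 = '^\^\^'   # "^^" at the beginning of a line

d1 = Regexp.new(DELIMITER_1)
d2 = Regexp.new(DELIMITER_2)

range_line = '^^: 350,935'  # first line of file format 2
bare_line  = '^^'           # header/sequence separator

puts d1.match?(range_line)  # true
puts d1.match?(bare_line)   # false
puts d2.match?(range_line)  # true: DELIMITER_2 also hits the range line
puts d2.match?(bare_line)   # true
```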
Instance Attribute Summary
- #average_length ⇒ Object (readonly): Average length per sequence. Parsed from standard Lasergene header.
- #bottom_strand_sequences ⇒ Object (readonly): Number of bottom strand sequences. Parsed from standard Lasergene header.
- #comments ⇒ Object (readonly): Entire header before the sequence.
- #contig_length ⇒ Object (readonly): Contig length, length of present sequence. Parsed from standard Lasergene header.
- #name ⇒ Object (readonly): Name of sequence. Parsed from standard Lasergene header.
- #sequence ⇒ Object (readonly): Sequence.
- #top_strand_sequences ⇒ Object (readonly): Number of top strand sequences. Parsed from standard Lasergene header.
- #total_length ⇒ Object (readonly): Length of parent sequence. Parsed from standard Lasergene header.
- #total_sequences ⇒ Object (readonly): Number of sequences. Parsed from standard Lasergene header.
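Most of these attributes correspond to one labelled line of the standard header in file formats 1 and 2. The sketch below is a hypothetical parser that maps each label to the attribute name above and keeps only the integer. parse_standard_header and HEADER_FIELDS are names invented here and are not part of the BioRuby API. The label-to-attribute mapping is an assumption based on the sample files.

```ruby
# Hypothetical sketch (not BioRuby's implementation). Assumed mapping from
# header labels to the attribute names listed above.
HEADER_FIELDS = {
  'Contig Length'           => :contig_length,
  'Average Length/Sequence' => :average_length,
  'Total Sequence Length'   => :total_length,
  'Top Strand'              => :top_strand_sequences,
  'Bottom Strand'           => :bottom_strand_sequences,
  'Total'                   => :total_sequences
}.freeze

def parse_standard_header(header_lines)
  result = {}
  header_lines.each do |line|
    line = line.strip
    # Name line, e.g. "Contig 1" (1,934) or Contig 1 (1,935); quotes optional.
    if result[:name].nil? && (m = line.match(/\A"?(.+?)"?\s+\((\d+),(\d+)\)\z/))
      result[:name] = m[1]
      next
    end
    # Labelled line, e.g. "Top Strand: 2 sequences"; keep the first integer.
    label, value = line.split(':', 2)
    key = HEADER_FIELDS[label]
    result[key] = value[/\d+/].to_i if key && value
  end
  result
end
```

On the header of file format 1, this parser returns name "Contig 1", contig_length 934, average_length 467, total_length 1869, top and bottom strand counts of 2, and total_sequences 4.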
Instance Method Summary
- #entry_id ⇒ Object: Name of sequence. Parsed from standard Lasergene header.
- #initialize(lines) ⇒ Lasergene (constructor): A new instance of Lasergene.
- #seq ⇒ Object: Sequence.
- #standard_comment? ⇒ Boolean: Is the comment header recognized as standard Lasergene format?
Constructor Details
#initialize(lines) ⇒ Lasergene
Returns a new instance of Lasergene.
# File 'lib/bio/db/lasergene.rb', line 124
def initialize(lines)
  process(lines)
end
Instance Attribute Details
#average_length ⇒ Object (readonly)
Average length per sequence.
- Parsed from standard Lasergene header
# File 'lib/bio/db/lasergene.rb', line 103
def average_length
  @average_length
end
#bottom_strand_sequences ⇒ Object (readonly)
Number of bottom strand sequences.
- Parsed from standard Lasergene header
# File 'lib/bio/db/lasergene.rb', line 115
def bottom_strand_sequences
  @bottom_strand_sequences
end
#comments ⇒ Object (readonly)
Entire header before the sequence.
# File 'lib/bio/db/lasergene.rb', line 86
def comments
  @comments
end
#contig_length ⇒ Object (readonly)
Contig length, length of present sequence.
- Parsed from standard Lasergene header
# File 'lib/bio/db/lasergene.rb', line 99
def contig_length
  @contig_length
end
#name ⇒ Object (readonly)
Name of sequence.
- Parsed from standard Lasergene header
# File 'lib/bio/db/lasergene.rb', line 95
def name
  @name
end
#sequence ⇒ Object (readonly)
Sequence, as a Bio::Sequence::NA or Bio::Sequence::AA object.
# File 'lib/bio/db/lasergene.rb', line 91
def sequence
  @sequence
end
#top_strand_sequences ⇒ Object (readonly)
Number of top strand sequences.
- Parsed from standard Lasergene header
# File 'lib/bio/db/lasergene.rb', line 111
def top_strand_sequences
  @top_strand_sequences
end
#total_length ⇒ Object (readonly)
Length of parent sequence.
- Parsed from standard Lasergene header
# File 'lib/bio/db/lasergene.rb', line 107
def total_length
  @total_length
end
#total_sequences ⇒ Object (readonly)
Number of sequences.
- Parsed from standard Lasergene header
# File 'lib/bio/db/lasergene.rb', line 119
def total_sequences
  @total_sequences
end
Instance Method Details
#entry_id ⇒ Object
Name of sequence.
- Parsed from standard Lasergene header
# File 'lib/bio/db/lasergene.rb', line 147
def entry_id
  @name
end
#seq ⇒ Object
Sequence, as a Bio::Sequence::NA or Bio::Sequence::AA object.
# File 'lib/bio/db/lasergene.rb', line 141
def seq
  @sequence
end
#standard_comment? ⇒ Boolean
Is the comment header recognized as standard Lasergene format?
- Arguments: none
- Returns: true or false
# File 'lib/bio/db/lasergene.rb', line 134
def standard_comment?
  @standard_comment
end
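This page does not show how @standard_comment is set. One plausible heuristic treats a header as standard when it has a name line such as "Contig 1" (1,934) plus every labelled summary line from formats 1 and 2. This is offered only as an assumption, and standard_header? and REQUIRED_LABELS are hypothetical names. Under this check, the GenBank-style header of format 3 is not standard.

```ruby
# Hypothetical heuristic (not BioRuby's actual test) for a "standard"
# Lasergene header: a name line plus all labelled summary lines.
REQUIRED_LABELS = [
  'Contig Length', 'Average Length/Sequence', 'Total Sequence Length',
  'Top Strand', 'Bottom Strand', 'Total'
].freeze

def standard_header?(header_lines)
  lines = header_lines.map(&:strip)
  # Name line such as "Contig 1" (1,934); the quotes are optional.
  has_name = lines.any? { |l| l.match?(/\A"?.+?"?\s+\(\d+,\d+\)\z/) }
  # Text before the first colon on each line, e.g. "Top Strand".
  labels = lines.map { |l| l.split(':', 2).first }
  has_name && (REQUIRED_LABELS - labels).empty?
end
```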