bio-dbla-classifier

DBL-alpha tags are small regions of the PfEMP1 protein that can be PCR amplified and are classified into six expression groups depending on the number of cysteines and the presence of certain motifs within the tag region (Bull et al. 2007). This plugin extends BioRuby's amino acid class, Bio::Sequence::AA, by adding methods to analyze DBL-alpha sequence tags. If you use this plugin, please cite Bull et al. (2007), "An approach to classifying sequence tags sampled from Plasmodium falciparum var genes." Molecular and Biochemical Parasitology 154 (1) (July): 98–102. doi:10.1016/j.molbiopara.2007.03.011.

Installation

Ruby must be installed on your system. See rubylang.info/ for information on Ruby and how to install it. Once Ruby 1.9.2 is installed, type the following command in a terminal to install the gem. This also installs the bioruby gem if it is not already present on your system. The plugin has been tested on Ruby 1.9.2-p290.

gem install bio-dbla-classifier

Uninstall

gem uninstall bio-dbla-classifier

Usage

#create an instance of BioRuby's Bio::Sequence::AA class, which this plugin extends with methods to classify and describe DBL-alpha tags

require 'bio-dbla-classifier'

seq = 'DIGDIVRGRDMFKSNPEVEKGLKAVFRKINNGLTPQAKTHYADEDGSGNYVKLREDWWKANRDQVWKAITCKAPQSVHYFIKTSHGTRGFTSHGKCGRNETNVPTNLDYVPQYLR'
dbl_seq = Bio::Sequence::AA.new(seq)

#get the positions of limited variability
puts dbl_seq.polv1 #=> MFKS
puts dbl_seq.polv2 #=> LRED
puts dbl_seq.polv3 #=> KAIT
puts dbl_seq.polv4 #=> PTNL

#get the number of cysteines in the tag
puts dbl_seq.cys_count #=> 2

#get the distinct sequence identifier
puts dbl_seq.dsid #=> MFKS-LRED-KAIT-2-PTNL-115

#get the cyspolv group for this tag
puts dbl_seq.cyspolv_group #=> 1

#get the block sharing group for this tag
puts dbl_seq.bs_group #=> 1

#get the length of the tag
puts dbl_seq.size #=> 115

#determine whether the tag is a var1
dbl_seq.is_var1? #=> false

#is this tag a groupA like sequence tag?
dbl_seq.is_groupA_like?
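
Judging from the example output, the DSID appears to be the four PoLV motifs, the cysteine count and the tag length joined with hyphens. The plain-Ruby sketch below reproduces that string from the example values without loading the plugin. Note that compose_dsid is a hypothetical helper written for illustration, not part of bio-dbla-classifier, and the field order is inferred only from the example.

```ruby
# Hypothetical helper (not part of bio-dbla-classifier): joins the DSID
# fields in the order suggested by the example output in this README.
def compose_dsid(polv1, polv2, polv3, cys_count, polv4, length)
  [polv1, polv2, polv3, cys_count, polv4, length].join('-')
end

tag = 'DIGDIVRGRDMFKSNPEVEKGLKAVFRKINNGLTPQAKTHYADEDGSGNYVKLREDWWKANRDQVWKAITCKAPQSVHYFIKTSHGTRGFTSHGKCGRNETNVPTNLDYVPQYLR'

# Count the cysteine residues (single-letter code C) in the tag.
cys_count = tag.count('C')

# The PoLV motifs are copied from the example output, not computed here.
dsid = compose_dsid('MFKS', 'LRED', 'KAIT', cys_count, 'PTNL', tag.length)
puts dsid #=> MFKS-LRED-KAIT-2-PTNL-115
```

In normal use dbl_seq.dsid returns this value directly; the sketch only shows how the parts relate.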

Finding the Position Specific Polymorphic Blocks (PSPB)

Each pspb method takes two optional arguments: an anchor position and a window length that defines the length of the PSPB. The default anchor position is 0 and the default window length is 14.

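#the pspb methods are called on the Bio::Sequence::AA instance (dbl_seq), not on the plain string

#get pspb1
puts dbl_seq.pspb1 #=> NPEVEKGLKAVFRK

#get pspb2
puts dbl_seq.pspb2 #=> THYADEDGSGNYVK

#get pspb3
puts dbl_seq.pspb3 #=> CKAPQSVHYFIKTS

#get pspb4
puts dbl_seq.pspb4 #=> FTSHGKCGRNETNV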
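
In the example tag, pspb1 and pspb3 coincide with the 14 residues immediately after PoLV1 and PoLV3, and pspb2 and pspb4 with the 14 residues immediately before PoLV2 and PoLV4. The plain-Ruby sketch below reproduces the four example outputs on that basis. It is not the plugin's implementation: window_after and window_before are hypothetical helpers, the motifs are passed in as literal strings rather than detected, and treating the anchor as an offset from the motif boundary is an assumption.

```ruby
# Hypothetical helpers (not part of bio-dbla-classifier).

# Returns win_len residues starting right after the motif, shifted by anchor.
def window_after(seq, motif, anchor = 0, win_len = 14)
  pos = seq.index(motif)
  return nil if pos.nil?
  seq[pos + motif.length + anchor, win_len]
end

# Returns win_len residues ending right before the motif, shifted by anchor.
def window_before(seq, motif, anchor = 0, win_len = 14)
  pos = seq.index(motif)
  return nil if pos.nil?
  start = pos - win_len + anchor
  return nil if start < 0 # the window would run off the start of the tag
  seq[start, win_len]
end

tag = 'DIGDIVRGRDMFKSNPEVEKGLKAVFRKINNGLTPQAKTHYADEDGSGNYVKLREDWWKANRDQVWKAITCKAPQSVHYFIKTSHGTRGFTSHGKCGRNETNVPTNLDYVPQYLR'

puts window_after(tag, 'MFKS')  #=> NPEVEKGLKAVFRK
puts window_before(tag, 'LRED') #=> THYADEDGSGNYVK
puts window_after(tag, 'KAIT')  #=> CKAPQSVHYFIKTS
puts window_before(tag, 'PTNL') #=> FTSHGKCGRNETNV
```

A shorter window can be requested through the last argument, for example window_after(tag, 'MFKS', 0, 6).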

Processing a flatfile, for example FASTA, GenBank or EMBL

seq_file = "sequences.fasta"

#for each entry in the file, print the definition, cyspolv group, DSID and block sharing group as one comma-separated line
Bio::FlatFile.open(seq_file).each do |entry|
  tag = Bio::Sequence::AA.new(entry.seq)
  puts "#{entry.definition},#{tag.cyspolv_group},#{tag.dsid},#{tag.bs_group}"
end
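
Definition lines in sequence files often contain commas, which would break a hand-built comma-separated line. One option, sketched below, is to let Ruby's standard CSV library quote the fields. The row values here are hypothetical stand-ins for entry.definition, tag.cyspolv_group, tag.dsid and tag.bs_group, so the sketch runs without the plugin or an input file.

```ruby
require 'csv'

# Hypothetical values standing in for one processed entry.
row = ['tag_01, isolate A', 1, 'MFKS-LRED-KAIT-2-PTNL-115', 1]

# generate_line quotes any field that contains a comma and appends a newline.
line = CSV.generate_line(row)
puts line #=> "tag_01, isolate A",1,MFKS-LRED-KAIT-2-PTNL-115,1
```

Inside the loop above, the same call could replace the interpolated string passed to puts.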

Copyright

See LICENSE.txt for further details