Module: Quorum::Sequence
- Included in:
- Job
- Defined in:
- lib/quorum/sequence.rb
Defined Under Namespace
Modules: SendSequence
Instance Method Summary collapse
-
#create_hash(sequence) ⇒ Object
Create a unique hash plus timestamp.
-
#discover_input_sequence_type(sequence) ⇒ Object
Discover input sequence type (nucleic acid NA or amino acid AA).
-
#write_input_sequence_to_file(tmp_dir, hash, sequence) ⇒ Object
Write input sequence to file.
Instance Method Details
#create_hash(sequence) ⇒ Object
Create a unique hash plus timestamp.
7 8 9 |
# File 'lib/quorum/sequence.rb', line 7 def create_hash(sequence) Digest::MD5.hexdigest(sequence).to_s + "-" + Time.now.to_f.to_s end |
#discover_input_sequence_type(sequence) ⇒ Object
Discover input sequence type (nucleic acid NA or amino acid AA).
Subtracting all AA single letter chars from NA single letter chars (including ALL ambiguity codes for each!) leaves us with EQILFP. If a sequence contains EQILFP, it’s safe to call it an AA.
See single letter char tables for more information: en.wikipedia.org/wiki/Proteinogenic_amino_acid www.chick.manchester.ac.uk/SiteSeer/IUPAC_codes.html
If a sequence doesn’t contain EQILFP, it could be either an AA or NA. To distinguish the two, count the number of As Ts Gs Cs and Ns, divide by the the length of the sequence and * 100.
If the percentage of A, T, U, G, C or N is >= 15.0, call it a NA. 15% was choosen based on the data in the table “Relative proportions (%) of bases in DNA” (en.wikipedia.org/wiki/Chargaff’s_rules) and the precentage of AAs found in peptides (www.ncbi.nlm.nih.gov/pmc/articles/PMC2590925/).
55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 |
# File 'lib/quorum/sequence.rb', line 55 def discover_input_sequence_type(sequence) # Index of the first newline char. start = sequence.index(/\n/) # Remove the sequence FASTA header. seq = sequence.slice(start..-1).gsub!(/\n/, '') if seq =~ /[EQILFPeqilfp]+/ type = "amino_acid" else # Length of the sequence minus the FASTA header. len = seq.length.to_f na_percent = 15.0 counts = [] counts << (seq.count("Aa").to_f / len) * 100 counts << (seq.count("Tt").to_f / len) * 100 counts << (seq.count("Uu").to_f / len) * 100 counts << (seq.count("Gg").to_f / len) * 100 counts << (seq.count("Cc").to_f / len) * 100 counts << (seq.count("Nn").to_f / len) * 100 counts.reject! { |c| c < na_percent } if (!counts.empty?) type = "nucleic_acid" else type = "amino_acid" end end type end |
#write_input_sequence_to_file(tmp_dir, hash, sequence) ⇒ Object
Write input sequence to file. Pass the raw input data through seqret to ensure FASTA format.
15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 |
# File 'lib/quorum/sequence.rb', line 15 def write_input_sequence_to_file(tmp_dir, hash, sequence) seq = File.join(tmp_dir, hash + ".seq") File.open(seq, "w") do |f| f << sequence end fasta = File.join(tmp_dir, hash + ".fa") # Force FASTA format. cmd = "seqret -filter -sformat pearson -osformat fasta < #{seq} " << "> #{fasta} 2> /dev/null" system(cmd) if $?.exitstatus > 0 raise " - Please enter your sequence(s) in Plain Text as " << "FASTA." end end |