Class: SequenceServer::Doctor
- Inherits:
- Object
- Extended by:
- Forwardable
- Defined in:
- lib/sequenceserver/doctor.rb
Overview
Doctor detects inconsistencies that are likely to cause problems with SequenceServer operation.
Constant Summary
- ERROR_PARSE_SEQIDS = 1
- ERROR_NUMERIC_IDS = 2
- ERROR_PROBLEMATIC_IDS = 3
- AVOID_ID_REGEX = /^(?!gi|bbs)\w+\|\w*\|?/
Instance Attribute Summary
- #all_seqids ⇒ Object (readonly)
  Returns the value of attribute all_seqids.
- #invalids ⇒ Object (readonly)
  Returns the value of attribute invalids.
Class Method Summary
- .all_sequence_ids(ignore) ⇒ Object
  Retrieve sequence ids (blastdbcmd's %i output field) from all databases.
- .bullet_list(values) ⇒ Object
  Format a list of databases as a bulleted string.
- .inspect_parse_seqids(seqids) ⇒ Object
  Databases built from FASTA files without the -parse_seqids option don’t support fetching sequence ids with blastdbcmd’s ‘%i’ output format.
- .inspect_seqids(seqids, &block) ⇒ Object
  Returns an array of the database objects that have at least one sequence id satisfying the block passed to the method.
- .show_message(error, values) ⇒ Object
  Print diagnostic error messages according to the type of error.
Instance Method Summary
- #check_id_format ⇒ Object
  Warn users about sequence identifiers of the form abc|def, because BLAST+ then prepends gnl| (for "general") to such database identifiers.
- #check_numeric_ids ⇒ Object
  Check for the presence of numeric sequence ids within a database.
- #check_parse_seqids ⇒ Object
  Find databases that weren’t formatted with -parse_seqids and add them to the ignore list.
- #diagnose ⇒ Object
  Run the three checks in order, dropping improperly formatted databases after the first one.
- #initialize ⇒ Doctor (constructor)
  A new instance of Doctor.
- #remove_invalid_databases ⇒ Object
  Remove entries that are in the ignore list or not formatted with the -parse_seqids option.
Constructor Details
#initialize ⇒ Doctor
Returns a new instance of Doctor.
    # File 'lib/sequenceserver/doctor.rb', line 98
    def initialize
      @ignore = []
      @all_seqids = Doctor.all_sequence_ids(@ignore)
    end
Instance Attribute Details
#all_seqids ⇒ Object (readonly)
Returns the value of attribute all_seqids.
    # File 'lib/sequenceserver/doctor.rb', line 103
    def all_seqids
      @all_seqids
    end
#invalids ⇒ Object (readonly)
Returns the value of attribute invalids.
    # File 'lib/sequenceserver/doctor.rb', line 103
    def invalids
      @invalids
    end
Class Method Details
.all_sequence_ids(ignore) ⇒ Object
Retrieve sequence ids (blastdbcmd's %i output field) from all databases, skipping any database in the ignore list. Using accession numbers instead would be problematic for several reasons.
    # File 'lib/sequenceserver/doctor.rb', line 31
    def all_sequence_ids(ignore)
      Database.map do |db|
        next if ignore.include? db

        out = `blastdbcmd -entry all -db #{db.name} -outfmt "%i" 2> /dev/null`
        { db: db, seqids: out.to_s.split }
      end.compact
    end
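The parsing step above can be tried without BLAST+ installed. The following is a minimal sketch, assuming canned blastdbcmd output in place of the real shell call; `CANNED_OUTPUT`, `sequence_id_records` and the database names are hypothetical, and databases are plain strings rather than `Database` objects.

```ruby
# Hypothetical captured output of:
#   blastdbcmd -entry all -db <name> -outfmt "%i"
# (the real method shells out once per database instead).
CANNED_OUTPUT = {
  'proteins.fa' => "sp|P12345|ABC_HUMAN\nsp|Q67890|XYZ_MOUSE\n",
  'genome.fa'   => "N/A\nN/A\n",
  'empty.fa'    => nil # e.g. the command failed and printed nothing
}.freeze

# Build the { db:, seqids: } records Doctor works with.
def sequence_id_records(db_names, ignore)
  db_names.map do |name|
    next if ignore.include?(name) # yields nil for ignored databases

    out = CANNED_OUTPUT[name]
    # nil.to_s is "", and String#split with no argument splits on
    # whitespace, so each output line becomes one id.
    { db: name, seqids: out.to_s.split }
  end.compact # drop the nils left by `next`
end

records = sequence_id_records(%w[proteins.fa genome.fa empty.fa], ['genome.fa'])
records.each { |r| puts "#{r[:db]}: #{r[:seqids].inspect}" }
```

The `out.to_s` call is what keeps a failed command from raising: it turns a missing result into an empty id list.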
.bullet_list(values) ⇒ Object
Format a list of databases as a bulleted string, one line per entry. The string is returned, not printed.
    # File 'lib/sequenceserver/doctor.rb', line 54
    def bullet_list(values)
      list = ''
      values.each do |value|
        list << " - #{value}\n"
      end
      list
    end
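An equivalent formulation builds the string with `map` and `join`. This is a sketch, not the project's code; it keeps the same " - value" line format as the version above.

```ruby
# Same output as bullet_list above: one " - value\n" line per entry,
# and an empty string for an empty input.
def bullet_list(values)
  values.map { |value| " - #{value}\n" }.join
end

print bullet_list(%w[proteins.fa genome.fa])
```

One design point: appending with `<<` to a `''` literal raises `FrozenError` in files that declare `# frozen_string_literal: true`, while `map`/`join` never mutates a literal.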
.inspect_parse_seqids(seqids) ⇒ Object
Databases built from FASTA files without the -parse_seqids option don’t support fetching sequence ids with blastdbcmd’s ‘%i’ output format. For such databases blastdbcmd returns ‘N/A’ in place of each id, so this method returns the databases whose id list contains ‘N/A’.
    # File 'lib/sequenceserver/doctor.rb', line 47
    def inspect_parse_seqids(seqids)
      seqids.map do |sq|
        sq[:db] if sq[:seqids].include? 'N/A'
      end.compact
    end
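To see the 'N/A' check in isolation, here is a small sketch on plain hashes; the database names are hypothetical strings standing in for `Database` objects.

```ruby
# Same logic as inspect_parse_seqids above.
def inspect_parse_seqids(seqids)
  seqids.map do |sq|
    # A database formatted without -parse_seqids reports 'N/A' instead of ids.
    sq[:db] if sq[:seqids].include?('N/A')
  end.compact
end

records = [
  { db: 'proteins.fa', seqids: ['sp|P12345|ABC_HUMAN'] },
  { db: 'genome.fa',   seqids: ['N/A', 'N/A'] }
]
p inspect_parse_seqids(records) # prints ["genome.fa"]
```

On Ruby 2.7 and later, `map { ... }.compact` can also be written as a single `filter_map`.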
.inspect_seqids(seqids, &block) ⇒ Object
Returns an array of the database objects that have at least one sequence id satisfying the block passed to the method.
    # File 'lib/sequenceserver/doctor.rb', line 23
    def inspect_seqids(seqids, &block)
      seqids.map do |sq|
        sq[:db] unless sq[:seqids].select(&block).empty?
      end.compact
    end
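The same filter can be expressed with `Enumerable#any?`, which stops at the first matching id instead of collecting all matches with `select`. This is a sketch on hypothetical records, not a proposed replacement for the library code.

```ruby
# Databases with at least one id satisfying the block.
def inspect_seqids(seqids, &block)
  seqids.select { |sq| sq[:seqids].any?(&block) }.map { |sq| sq[:db] }
end

records = [
  { db: 'numeric.fa', seqids: %w[1001 1002] },
  { db: 'named.fa',   seqids: %w[contig_1 contig_2] }
]
named = inspect_seqids(records) { |id| id.start_with?('contig') }
p named # prints ["named.fa"]
```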
.show_message(error, values) ⇒ Object
Print diagnostic error messages according to the type of error.
    # File 'lib/sequenceserver/doctor.rb', line 64
    def show_message(error, values)
      return if values.empty?

      case error
      when ERROR_PARSE_SEQIDS
        puts <<~MSG
          *** Doctor has found improperly formatted database:
          #{bullet_list(values)}
          Please reformat your databases with -parse_seqids switch (or use sequenceserver -m) for using SequenceServer as the current format may cause problems.
          These databases are ignored in further checks.
        MSG

      when ERROR_NUMERIC_IDS
        puts <<~MSG
          *** Doctor has found databases with numeric sequence ids:
          #{bullet_list(values)}
          Note that this may cause problems with sequence retrieval.
        MSG

      when ERROR_PROBLEMATIC_IDS
        puts <<~MSG
          *** Doctor has found databases with problematic sequence ids:
          #{bullet_list(values)}
          This causes some sequence to contain extraneous words like `gnl|` appended to their id string.
        MSG
      end
    end
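The messages in show_message interpolate the multi-line output of bullet_list into a squiggly heredoc. The sketch below shows how the two combine; `problematic_ids_message` is a hypothetical helper that returns the string instead of printing it, and bullet_list is the map/join form with the same line format as above.

```ruby
def bullet_list(values)
  values.map { |value| " - #{value}\n" }.join
end

# Hypothetical helper: builds the ERROR_PROBLEMATIC_IDS text as a string.
def problematic_ids_message(values)
  return if values.empty? # like show_message, report nothing for no findings

  # <<~ strips the common indentation of the literal lines only;
  # text inserted by #{...} is kept exactly as bullet_list produced it.
  <<~MSG
    *** Doctor has found databases with problematic sequence ids:
    #{bullet_list(values)}
    This causes some sequence to contain extraneous words like `gnl|` appended to their id string.
  MSG
end

puts problematic_ids_message(%w[a.fa b.fa])
```

Because the interpolated list already ends in a newline, the heredoc line that holds it produces a blank line before the closing sentence.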
Instance Method Details
#check_id_format ⇒ Object
Warn users about sequence identifiers of the form abc|def, because BLAST+ then prepends gnl| (for "general") to such database identifiers. Only two identifiers of this form need to be excluded from the search: bbs|number and gi|number. Note that although sequence ids could be arbitrary, requiring -parse_seqids reduces the search space substantially.
    # File 'lib/sequenceserver/doctor.rb', line 147
    def check_id_format
      selector = proc { |id| id.match(AVOID_ID_REGEX) }
      Doctor.show_message(ERROR_PROBLEMATIC_IDS,
                          Doctor.inspect_seqids(@all_seqids, &selector))
    end
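To see which identifiers AVOID_ID_REGEX selects, the sketch below runs it over a few sample ids; the ids are illustrative, not taken from any real database.

```ruby
AVOID_ID_REGEX = /^(?!gi|bbs)\w+\|\w*\|?/

ids = ['lcl|contig_1', 'gi|12345', 'bbs|678', 'sp|P12345|ABC_HUMAN', 'contig_1', 'gnl|db|id']
# An id is flagged when it starts with word|... and does not begin with gi or bbs.
flagged = ids.select { |id| id.match?(AVOID_ID_REGEX) }
p flagged # prints ["lcl|contig_1", "sp|P12345|ABC_HUMAN", "gnl|db|id"]

# The lookahead only checks the first characters, so an id such as
# "gill|123" is also skipped, even though it is not a gi identifier.
p 'gill|123'.match?(AVOID_ID_REGEX) # prints false
```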
#check_numeric_ids ⇒ Object
Check for the presence of numeric sequence ids within a database.
    # File 'lib/sequenceserver/doctor.rb', line 133
    def check_numeric_ids
      selector = proc { |id| !id.to_i.zero? }
      Doctor.show_message(ERROR_NUMERIC_IDS,
                          Doctor.inspect_seqids(@all_seqids, &selector))
    end
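The numeric selector relies on `String#to_i`, which parses leading digits and returns 0 when there are none. The sketch below shows how the selector as written treats a few sample ids, including two edge cases: "0" is not flagged, and an id with a numeric prefix such as "123abc" is.

```ruby
numeric = proc { |id| !id.to_i.zero? }

samples = %w[12345 007 0 contig_1 123abc gi|42]
samples.each { |id| puts format('%-10s %s', id, numeric.call(id)) }

flagged = samples.select(&numeric)
p flagged # prints ["12345", "007", "123abc"]
```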
#check_parse_seqids ⇒ Object
Find databases that weren’t formatted with -parse_seqids and add them to the ignore list.
    # File 'lib/sequenceserver/doctor.rb', line 125
    def check_parse_seqids
      without_parse_seqids = Doctor.inspect_parse_seqids(@all_seqids)
      Doctor.show_message(ERROR_PARSE_SEQIDS, without_parse_seqids)

      @ignore.concat(without_parse_seqids)
    end
#diagnose ⇒ Object
    # File 'lib/sequenceserver/doctor.rb', line 105
    def diagnose
      puts "\n1/3 Inspecting databases for proper -parse_seqids formatting.."
      check_parse_seqids
      remove_invalid_databases

      puts "\n2/3 Inspecting databases for numeric sequence ids.."
      check_numeric_ids

      puts "\n3/3 Inspecting databases for problematic sequence ids.."
      check_id_format
    end
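The order of checks in diagnose can be sketched end to end on in-memory data. `MiniDoctor` is a hypothetical stand-in: it takes pre-fetched `{ db:, seqids: }` records instead of calling blastdbcmd, and it collects findings in a hash instead of printing messages.

```ruby
class MiniDoctor
  AVOID_ID_REGEX = /^(?!gi|bbs)\w+\|\w*\|?/

  attr_reader :findings

  def initialize(all_seqids)
    @ignore = []
    @all_seqids = all_seqids
    @findings = {}
  end

  def diagnose
    # 1/3: databases without -parse_seqids report 'N/A'; ignore them afterwards.
    without = dbs_where { |id| id == 'N/A' }
    @findings[:parse_seqids] = without
    @ignore.concat(without)
    @all_seqids.delete_if { |sq| @ignore.include?(sq[:db]) }

    # 2/3 and 3/3 run only on the databases that remain.
    @findings[:numeric] = dbs_where { |id| !id.to_i.zero? }
    @findings[:problematic] = dbs_where { |id| id.match?(AVOID_ID_REGEX) }
    @findings
  end

  private

  # Databases with at least one id satisfying the block.
  def dbs_where(&block)
    @all_seqids.select { |sq| sq[:seqids].any?(&block) }.map { |sq| sq[:db] }
  end
end

doctor = MiniDoctor.new([
  { db: 'genome.fa',   seqids: %w[N/A N/A] },
  { db: 'numbers.fa',  seqids: %w[1001 1002] },
  { db: 'proteins.fa', seqids: %w[sp|P12345|ABC_HUMAN] }
])
p doctor.diagnose
```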
#remove_invalid_databases ⇒ Object
Remove entries that are in the ignore list or not formatted with the -parse_seqids option.
    # File 'lib/sequenceserver/doctor.rb', line 119
    def remove_invalid_databases
      @all_seqids.delete_if { |sq| @ignore.include? sq[:db] }
    end
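remove_invalid_databases uses `Array#delete_if`, which removes matching elements from the receiver in place, whereas `Array#reject` returns a new array and leaves the receiver unchanged. The sketch below contrasts the two on hypothetical records; with `delete_if`, any other reference to the same array, such as the value returned by the all_seqids reader, sees the removal.

```ruby
ignore = ['genome.fa']
all_seqids = [
  { db: 'genome.fa',   seqids: %w[N/A] },
  { db: 'proteins.fa', seqids: %w[sp|P12345|ABC_HUMAN] }
]
alias_ref = all_seqids # a second reference to the same array

kept_copy = all_seqids.reject { |sq| ignore.include?(sq[:db]) } # new array
puts all_seqids.size # 2: reject left the receiver alone

all_seqids.delete_if { |sq| ignore.include?(sq[:db]) } # mutates the receiver
puts alias_ref.size # 1: every reference sees the removal
```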