Class: YouAndMe::RawDataFileLoader
- Inherits:
-
Object
- Object
- YouAndMe::RawDataFileLoader
- Defined in:
- lib/youandme/raw_data_file_loader.rb
Overview
Tool for loading raw data files from 23andme into native Ruby structures
Instance Method Summary collapse
-
#check_file(file_name) ⇒ Object
Returns true if and only if the given file exists and is readable.
-
#load_file(file_name, max = 0) ⇒ Object
Reads the given raw 23andme data file, parses the data, and shoves it into a native Ruby data structure.
-
#markdown_links(snp) ⇒ Object
Returns Markdown-formatted URLs for the given SNP hash.
Instance Method Details
#check_file(file_name) ⇒ Object
Returns true if and only if the given file exists and is readable.
21 22 23 24 25 26 27 |
# File 'lib/youandme/raw_data_file_loader.rb', line 21 def check_file(file_name) valid = false if(file_name != nil && File.file?(file_name) && File.readable?(file_name)) valid = true end valid end |
#load_file(file_name, max = 0) ⇒ Object
Reads the given raw 23andme data file, parses the data, and shoves it into a native Ruby data structure. The returned data is an Array
full of Hashes, where each hash has key/value pairs for the columns in the data file.
31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 |
# File 'lib/youandme/raw_data_file_loader.rb', line 31 def load_file(file_name, max = 0) snps = [] # Manually spliting seems to be faster than the built-in CSV parser for rows = File.read(file_name).split("\n") rows.each do |n| row = n.chomp.split("\t") # CSV.foreach(file_name, :col_sep => "\t") do |row| # Skip if the the line is a comment next if row[0][0] == '#' break if max > 0 && snps.length >= max snp = { :rsid => row[0], :chromosome => row[1], :position => row[2], :genotype => row[3] } if snp[:genotype].match(/[A-Z]/) == nil snp[:genotype] = '-' end # y snp snps << snp end snps end |
#markdown_links(snp) ⇒ Object
Returns Markdown-formatted URLs for the given SNP hash.
9 10 11 12 13 14 15 16 17 18 |
# File 'lib/youandme/raw_data_file_loader.rb', line 9 def markdown_links(snp) s = "" if(!(snp[:rsid][0] == 'i')) #how to break a string across rows? s = "[dbSNP](http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=#{snp[:rsid]}) " + " [HapMap](http://hapmap.ncbi.nlm.nih.gov/cgi-perl/gbrowse/hapmap27_B36/?name=SNP%3A#{snp[:rsid]})" + " [NextBio](http://www.nextbio.com/b/search/details/#{snp[:rsid]}?type=snp&q0=#{snp[:rsid]}&t0=snp#tab=populations)" + " [Ensembl](http://uswest.ensembl.org/Homo_sapiens/Variation/Summary?v=#{snp[:rsid]};vdb=variation)" end s end |