Module: Bio::Jaspar

Defined in:
lib/bio-jaspar/jaspar.rb

Overview

JASPAR 2014 module

Provides read access to a JASPAR5-formatted database.

This module is a direct port of the Bio.motifs.jaspar module in Biopython. The documentation below contains excerpts from the documentation of that module.

Defined Under Namespace

Classes: Motif, Record

Constant Summary

DNA =

Unambiguous DNA bases

Bio::Motifs::Alphabet.new.IUPAC_unambiguous_dna

JASPAR_ORDERED_DNA_LETTERS =

DNA bases in the order used for JASPAR output

["A","C","G","T"]

Class Method Details

.calculate_pseudocounts(motif) ⇒ Object

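Returns the pseudocounts of a given JASPAR motif as a Hash keyed by letter. Each pseudocount is the square root of the average number of instances per column, multiplied by the normalized background frequency of the letter. A uniform background is used when the motif has none.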



# File 'lib/bio-jaspar/jaspar.rb', line 240

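def Jaspar.calculate_pseudocounts(motif)
	alphabet = motif.alphabet
	background = motif.background

	# Sum the counts over every letter and every position
	total = 0
	(0...motif.length).each do |i|
		total += alphabet.letters.map { |letter| motif.counts[letter][i].to_f }.inject(:+)
	end

	# Average number of instances per column, and its square root
	avg_nb_instances = total / motif.length
	sq_nb_instances = Math.sqrt(avg_nb_instances)

	# Copy the motif background, or fall back to a uniform one
	if background
		background = Hash[background]
	else
		background = Hash[alphabet.letters.sort.map { |l| [l, 1.0] }]
	end

	# to_f guards against integer division when the background values are Integers
	total = background.values.inject(:+).to_f
	pseudocounts = {}

	# Normalize the background and scale it by the square root
	alphabet.letters.each do |letter|
		background[letter] /= total
		pseudocounts[letter] = sq_nb_instances * background[letter]
	end

	return pseudocounts
end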
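
As a worked illustration of the arithmetic above, the following standalone sketch repeats the computation on plain Ruby hashes, without loading the library. The helper name pseudocounts_sketch and the sample counts are hypothetical and are not part of Bio::Jaspar; the sketch assumes Ruby 2.6 or later.

```ruby
# Standalone sketch of the pseudocount arithmetic; not the library code itself.
# `counts` maps each letter to its per-position counts (hypothetical sample data).
def pseudocounts_sketch(counts, background = nil)
  letters = counts.keys
  length = counts.values.first.length

  # Average number of instances per column, then its square root.
  total = letters.sum { |l| counts[l].sum.to_f }
  sq_nb_instances = Math.sqrt(total / length)

  # Fall back to a uniform background, then normalize it so it sums to 1.
  background ||= letters.to_h { |l| [l, 1.0] }
  bg_total = background.values.sum.to_f

  letters.to_h { |l| [l, sq_nb_instances * background[l] / bg_total] }
end

# Every column sums to 16 instances, so the square root is 4.0.
counts = {
  "A" => [4, 0, 0, 16],
  "C" => [4, 16, 0, 0],
  "G" => [4, 0, 16, 0],
  "T" => [4, 0, 0, 0]
}

# With the uniform background, each letter receives 4.0 * 0.25 = 1.0.
pseudo = pseudocounts_sketch(counts)
```

Passing a non-uniform background, for example { "A" => 3, "C" => 2, "G" => 2, "T" => 3 }, shifts the pseudocounts toward the more frequent letters while keeping their sum equal to the square root.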

.read(handle, format) ⇒ Object

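Returns the record of PFM(s) read from handle, calling the appropriate parsing routine for the given format: "pfm", "sites", or "jaspar" (case-insensitive). Raises ArgumentError for any other format.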



# File 'lib/bio-jaspar/jaspar.rb', line 190

def Jaspar.read(handle, format)
	format = format.downcase
	if format == "pfm"
		record = _read_pfm(handle)
		return record
	elsif format == "sites"
		record = _read_sites(handle)
		return record
	elsif format == "jaspar"
		record = _read_jaspar(handle)
		return record
	else
		raise ArgumentError, "Unknown JASPAR format #{format}"
	end
end

.split_jaspar_id(id) ⇒ Object

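Splits a JASPAR matrix ID into its components, the base ID and the version number, e.g. 'MA0047.2' is returned as ['MA0047', '2']. The version is returned as a String, and is nil when the ID does not contain exactly one '.'.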



# File 'lib/bio-jaspar/jaspar.rb', line 273

def Jaspar.split_jaspar_id(id)
	id_split = id.split(".")

	base_id = nil
	version = nil

	if id_split.length == 2
		base_id = id_split[0]
		version = id_split[1]
	else
		base_id = id
	end

	return base_id, version
end
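
The behavior of the listing above can be illustrated with a standalone sketch that mirrors its logic without loading the library. The helper name split_id_sketch is hypothetical and is not part of Bio::Jaspar.

```ruby
# Standalone copy of the splitting logic, for illustration only.
def split_id_sketch(id)
  parts = id.split(".")
  parts.length == 2 ? [parts[0], parts[1]] : [id, nil]
end

base_id, version = split_id_sketch("MA0047.2")

# The version comes back as the String "2"; convert it when a number is needed.
version_number = Integer(version)

# An ID that does not split into exactly two parts is returned whole, with a nil version.
unversioned = split_id_sketch("MA0047")
```

Because the version is a String, comparing it directly with an Integer such as 2 is always false; converting with Integer(version) first avoids that pitfall.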

.write(motifs, format) ⇒ Object

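Returns the text representation of motifs in "pfm" or "jaspar" format. The "pfm" format writes only the first motif, one row of counts per letter in A, C, G, T order. The "jaspar" format writes every motif, each preceded by a ">matrix_id name" header line. Unlike .read, the format string is matched case-sensitively; any other value raises ArgumentError.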



# File 'lib/bio-jaspar/jaspar.rb', line 208

def Jaspar.write(motifs, format)
	letters = JASPAR_ORDERED_DNA_LETTERS
	lines = []
	if format == "pfm"
		motif = motifs[0]
		counts = motif.counts
		letters.each do |letter|
			terms = counts[letter].map { |value| "%6.2f" % value }
			line = "#{terms.join(" ")}\n"
			lines << line
		end
	elsif format == "jaspar"
		motifs.each do |m|
			counts = m.counts
			line = ">#{m.matrix_id} #{m.name}\n"
			lines << line

			letters.each do |letter|
				terms = counts[letter].map { |value| "%6.2f" % value }
				line = "#{letter} [#{terms.join(" ")}]\n"
				lines << line
			end
		end
	else
		raise ArgumentError, "Unknown JASPAR format #{format}"
	end

	text = lines.join("")
	return text
end
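
To make the layout of the "jaspar" output concrete, the following standalone sketch reproduces the line formatting of the listing above on a stand-in motif. MotifSketch, jaspar_text_sketch, the matrix ID "MA0000.1", and the counts are all hypothetical and are not part of Bio::Jaspar.

```ruby
# Hypothetical stand-in for a motif: only the fields the "jaspar" branch reads.
MotifSketch = Struct.new(:matrix_id, :name, :counts)

# Builds the same text layout as the "jaspar" branch: a header line per motif,
# then one bracketed row of "%6.2f"-formatted counts per letter.
def jaspar_text_sketch(motifs)
  motifs.map do |m|
    rows = %w[A C G T].map do |letter|
      terms = m.counts[letter].map { |value| "%6.2f" % value }
      "#{letter} [#{terms.join(' ')}]\n"
    end
    ">#{m.matrix_id} #{m.name}\n" + rows.join
  end.join
end

motif = MotifSketch.new(
  "MA0000.1",
  "example",
  { "A" => [3, 0], "C" => [0, 5], "G" => [1, 0], "T" => [0, 12.5] }
)

text = jaspar_text_sketch([motif])
puts text
```

Each count is right-aligned in a six-character field with two decimals, so columns stay aligned as long as no value needs more than six characters.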