bio-biostars-analytics

Data-mining analyses that make use of this gem (newest to oldest):

Installation

Biostars analytics can be installed as a Ruby gem:

gem install bio-biostars-analytics

The statistical analyses require R 2.15.0 or later and the plyr package 2.15.1 or later.

Usage

Data-mining crawls the Biostars forum and retrieves data from the Biostar RESTful API (parameters as of March 2014):

biostars-analytics 96000 54

This creates two files, <date>_api.tsv and <date>_crawled.tsv, where <date> is the date on which the command was run.

Various plots in PNG file format can be generated via:

biostar_api_stats <date>_api.tsv
biostar_crawled_stats <date>_crawled.tsv
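
Because both output files share the same date prefix, the file names to hand to the plotting scripts can be derived up front. The following is a minimal sketch, not code from the gem: `expected_output_files` is a hypothetical helper, and the YYYYMMDD prefix format is an assumption inferred from the example file name 20140328_api.tsv.

```ruby
require 'date'

# Hypothetical helper, not part of the gem.
# Assumption: the <date> prefix is formatted as YYYYMMDD, inferred from the
# example file name 20140328_api.tsv; the gem itself may format it differently.
def expected_output_files(date = Date.today)
  prefix = date.strftime('%Y%m%d')
  { api: "#{prefix}_api.tsv", crawled: "#{prefix}_crawled.tsv" }
end

files = expected_output_files(Date.new(2014, 3, 28))
puts files[:api]      # 20140328_api.tsv
puts files[:crawled]  # 20140328_crawled.tsv
```

Called without an argument, the helper uses today's date, which is useful when the plots are generated on the same day as the mining run.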

Command Line Usage Instructions

Data-Mining

Usage: biostars-analytics max_post_number months_look_back [min_post_number]

Required parameters:
  max_post_number    : highest number (ID) of the post that should
                       be mined for data; the crawler will go over
                       posts min_post_number to max_post_number
  months_look_back   : how many months back should queries to the
                       Biostar API go (1 month = 30 days); default
                       value is 1

Optional parameters:
  min_post_number    : lowest number (ID) of the post that should
                       be mined for data

Output (date matches the script's invocation):
  <date>_crawled.tsv : data mined from crawling over posts
  <date>_api.tsv     : data extracted from the Biostar API

Example: mining Biostars in March 2014:
  biostars-analytics 96000 54
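
To make the parameter semantics concrete, the sketch below works through the example arguments. It is an illustration under stated assumptions rather than the gem's actual argument handling: `interpret_arguments` is a hypothetical helper, and the fallback of 1 for min_post_number is an assumption, since the usage text does not state a default for it.

```ruby
# Hypothetical helper, not part of the gem. It mirrors the usage line
#   biostars-analytics max_post_number months_look_back [min_post_number]
DAYS_PER_MONTH = 30 # stated in the usage text: 1 month = 30 days

def interpret_arguments(argv)
  raise ArgumentError, 'expected 2 or 3 arguments' unless [2, 3].include?(argv.length)

  max_post, months, min_post = argv.map { |arg| Integer(arg) }
  min_post ||= 1 # assumption: start at post 1 when no minimum is given
  raise ArgumentError, 'min_post_number exceeds max_post_number' if min_post > max_post

  { post_range: (min_post..max_post), look_back_days: months * DAYS_PER_MONTH }
end

plan = interpret_arguments(%w[96000 54])
puts plan[:look_back_days]   # 1620
puts plan[:post_range].size  # 96000
```

With the March 2014 example, 54 months correspond to 54 x 30 = 1620 days of API look-back, and the crawler would visit posts up to ID 96000.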

Statistics (based on RESTful API data)

Usage: biostar_api_stats apitsvfile

Example (data provided at http://github.com/joejimbo/bioruby-biostars-analytics):
  biostar_api_stats data/20140328_api.tsv

Statistics (based on forum mining/crawling)

Usage: biostar_crawled_stats crawledtsvfile

Example (data provided at http://github.com/joejimbo/bioruby-biostars-analytics):
  biostar_crawled_stats data/20140328_crawled.tsv
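
Before passing a TSV file to either statistics script, it can help to check that the file is well-formed. The sketch below is a generic tab-separated reader, not part of the gem; `read_tsv` and `tsv_shape` are hypothetical helpers, and no assumption is made about column names or whether the files carry a header row.

```ruby
# Hypothetical helpers, not part of the gem.
# Reads a tab-separated file into an array of rows; the -1 limit keeps
# trailing empty fields so that column counts stay comparable across rows.
def read_tsv(path)
  File.foreach(path).map { |line| line.chomp.split("\t", -1) }
end

# Reports the number of rows and the distinct column counts that occur.
# A well-formed file should show exactly one distinct column count.
def tsv_shape(rows)
  { rows: rows.length, columns: rows.map(&:length).uniq.sort }
end

# Only runs when a readable file path is given on the command line.
if ARGV[0] && File.file?(ARGV[0])
  shape = tsv_shape(read_tsv(ARGV[0]))
  puts "rows: #{shape[:rows]}, distinct column counts: #{shape[:columns].inspect}"
end
```

More than one distinct column count suggests truncated or malformed rows, which would be worth inspecting before generating plots.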

Project Repository

Contributions can be made to the open repository on GitHub:

http://github.com/joejimbo/bioruby-biostars-analytics

The BioRuby community is on IRC server: irc.freenode.org, channel: #bioruby.

Cite

If you use this software, please cite one of

Biogems.info

This Biogem is published at http://biogems.info/index.html#bio-biostars-analytics

Copyright (c) 2014 Joachim Baran. See LICENSE.txt for further details.