Clusta
Clusta is a Ruby gem for network analysis built on top of Wukong.
Wukong lets you write Ruby scripts that run on your laptop as well as on a Hadoop cluster.
Clusta is:
- classes that make describing the geometry of networks easy
- network algorithms written with these classes to use Wukong
- a shim command-line program for running these algorithms
Start with a file containing edges:
Edge 1 2
Edge 2 3
Edge 1 4
Edge 4 5
Edge 5 6
Edge 5 7
Edge 6 8
Edge 7 8
Edge 8 9
Run this through a transformation named edges_to_degrees:
$ clusta --transform=edges_to_degrees /local/edges.tsv -
Degree 1 2
Degree 2 2
Degree 3 1
Degree 4 2
Degree 5 3
Degree 6 2
Degree 7 2
Degree 8 3
Degree 9 1
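To see what this transformation computes, here is a minimal plain-Ruby sketch that counts how many edges touch each node. It does not use Clusta's or Wukong's classes. The method name `edges_to_degrees` is borrowed from the transform name purely for illustration, and whitespace-separated fields are an assumption about the input format.

```ruby
# Plain-Ruby sketch; NOT Clusta's implementation.
# Assumes each input line looks like "Edge <source> <target>".
EDGE_TEXT = <<~TEXT
  Edge 1 2
  Edge 2 3
  Edge 1 4
  Edge 4 5
  Edge 5 6
  Edge 5 7
  Edge 6 8
  Edge 7 8
  Edge 8 9
TEXT

# Hypothetical helper: returns one "Degree" record per node.
def edges_to_degrees(lines)
  degrees = Hash.new(0)
  lines.each do |line|
    label, source, target = line.split
    next unless label == "Edge"
    # An undirected edge adds one to the degree of both endpoints.
    degrees[source] += 1
    degrees[target] += 1
  end
  degrees.sort_by { |node, _| node.to_i }
         .map { |node, degree| "Degree\t#{node}\t#{degree}" }
end

puts edges_to_degrees(EDGE_TEXT.lines)
```

On the sample edges this should give the same per-node counts as the Clusta output above.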
Chain transformations together:
$ clusta --transform=edges_to_neighborhoods /local/edges.tsv - | clusta --transform=neighborhoods_to_degree_pairs - - | clusta --transform=degree_pairs_to_assortativities - -
Assortativity 1 2 1
Assortativity 1 3 1
Assortativity 2 1 1
Assortativity 2 2 4
Assortativity 2 3 5
Assortativity 3 1 1
Assortativity 3 2 5
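The columns in this output appear to be a node's degree, a neighbor's degree, and the number of times that pairing occurs, with each undirected edge counted once from each end. Under that reading, the three chained steps can be sketched in plain Ruby as follows. The method name `degree_pair_counts` is hypothetical, and this is not how Clusta itself is written.

```ruby
# Plain-Ruby sketch; NOT Clusta's implementation.
# Assumed reading of the output: [degree of node, degree of neighbor] => count.
SAMPLE_EDGES = [
  [1, 2], [2, 3], [1, 4], [4, 5], [5, 6],
  [5, 7], [6, 8], [7, 8], [8, 9]
].freeze

def degree_pair_counts(edges)
  # Step 1: build each node's neighborhood (its list of adjacent nodes).
  neighbors = Hash.new { |hash, node| hash[node] = [] }
  edges.each do |a, b|
    neighbors[a] << b
    neighbors[b] << a
  end

  # Steps 2 and 3: for every (node, neighbor) pair, tally the pair of degrees.
  counts = Hash.new(0)
  neighbors.each do |_node, adjacent|
    adjacent.each do |other|
      counts[[adjacent.size, neighbors[other].size]] += 1
    end
  end
  counts.sort.to_h
end

degree_pair_counts(SAMPLE_EDGES).each do |(degree, neighbor_degree), count|
  puts "Assortativity\t#{degree}\t#{neighbor_degree}\t#{count}"
end
```

On the sample edges this should reproduce the counts shown above, which is what suggests the column interpretation is right.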
And then leverage Wukong when you’re ready:
$ clusta --run=hadoop --transform=edges_to_neighborhoods /hdfs/edges.tsv /hdfs/neighborhoods.tsv
I, [2012-03-03T21:00:39.992750 #25835] INFO -- : Launching hadoop!
I, [2012-03-03T21:00:39.992979 #25835] INFO -- : Running
/usr/lib/hadoop/bin/hadoop \
jar /usr/lib/hadoop/contrib/streaming/hadoop-*streaming*.jar \
-D mapred.job.name='clusta---spec/data/edges/undirected.unweighted.tsv----' \
-mapper '/usr/bin/ruby1.9.1 clusta --map --log_interval=10000 --log_seconds=30 --transform=edges_to_degrees' \
-reducer '/usr/bin/ruby1.9.1 clusta --reduce --log_interval=10000 --log_seconds=30 --transform=edges_to_degrees' \
-input 'spec/data/edges/undirected.unweighted.tsv' \
-output '-' \
-file '/home/user/projects/networks/clusta/bin/clusta'
...
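The log shows Hadoop streaming launching the same script once as a mapper (--map) and once as a reducer (--reduce). As a rough illustration of that split, here is a plain-Ruby sketch of a degree count divided into a map step and a reduce step, with an in-memory group-by standing in for Hadoop's shuffle. The names `map_edge`, `reduce_degree`, and `run_locally` are hypothetical; Wukong's real API is not shown here.

```ruby
# Plain-Ruby sketch of the map/reduce split; NOT Wukong's or Clusta's API.

# Map step: emit one [node, 1] record for each endpoint of an edge.
def map_edge(line)
  _label, source, target = line.split
  [[source, 1], [target, 1]]
end

# Reduce step: sum all the counts emitted for a single node.
def reduce_degree(node, counts)
  "Degree\t#{node}\t#{counts.sum}"
end

# Local stand-in for the cluster: map, group by key, then reduce.
def run_locally(lines)
  mapped = lines.flat_map { |line| map_edge(line) }
  grouped = mapped.group_by(&:first) # plays the role of Hadoop's shuffle
  grouped.sort_by { |node, _| node.to_i }
         .map { |node, pairs| reduce_degree(node, pairs.map(&:last)) }
end

puts run_locally(["Edge 1 2", "Edge 2 3", "Edge 1 4"])
```

Because the map step looks at one line at a time and the reduce step looks at one key at a time, the same logic can run on a laptop or be spread across a cluster.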