weblicate

Replicate a website.

Weblicate creates a copy of a web page, complete with third-party assets, that you can serve from your own webserver. When given a HAR file, weblicate writes all of the page's assets to local disk. It appends a domain to the end of every hostname in the page's URLs, so you can simulate external requests to sites like DoubleClick and Google.

HAR (HTTP Archive) files are a great way to pass around information about web pages. Firebug lets you export them.
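
If you want to see what a HAR file contains before handing it to weblicate, the sketch below lists the URL of every request recorded in it. It relies only on the standard HAR layout (log, then entries, then request, then url). The har_urls helper and the inline sample data are made up for illustration and are not part of weblicate.

```ruby
require 'json'

# Return the request URL of every entry in a HAR document (given as a JSON string).
# har_urls is a hypothetical helper, not part of weblicate.
def har_urls(har_json)
  har = JSON.parse(har_json)
  har.fetch('log').fetch('entries').map do |entry|
    entry.fetch('request').fetch('url')
  end
end

# A tiny hand-written HAR fragment; a real export contains many more fields.
sample = <<~JSON
  {"log": {"entries": [
    {"request": {"url": "http://www.cnn.com/"}},
    {"request": {"url": "http://ad.doubleclick.net/ad.js"}}
  ]}}
JSON

puts har_urls(sample)
```

For a real file, pass File.read('www.cnn.com.har') instead of the sample string.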

Installation

You need Ruby and RubyGems installed.

sudo gem install weblicate

Usage

Running the following command:

weblicate www.cnn.com.har

results in the following response:

Now run these commands...

rsync -avz www.cnn.com-files/ localhost:/var/www/weblicate/www.cnn.com/
scp www.cnn.com-vhosts localhost:/etc/apache2/sites-enabled/
cat www.cnn.com-hosts >> /etc/hosts # For access from workstation

OPTIONAL - if you want access from the wider internet
sh www.cnn.com-dns # Create domains on Slicehost

and generates the following output:

www.cnn.com-files  # All files and assets for the page
www.cnn.com-hosts  # Entries that can be added to your local hosts file
www.cnn.com-vhosts # Apache vhosts entries for all domains
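
To give a feel for what hosts entries look like, here is a rough sketch that builds one line per domain. The hosts_entries helper, the 127.0.0.1 address and the exact line format are assumptions for illustration; the file weblicate actually writes may differ.

```ruby
# Build hosts-file lines that point each renamed domain at an IP address.
# hosts_entries is a hypothetical helper; the default IP is an assumption.
def hosts_entries(domains, suffix = 'localhost', ip = '127.0.0.1')
  domains.uniq.map { |domain| "#{ip} #{domain}.#{suffix}" }
end

# Duplicate domains collapse to a single entry.
puts hosts_entries(%w[www.cnn.com ad.doubleclick.net www.cnn.com])
```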

By default, weblicate appends ‘.localhost’ to all domains. You can override this if you want your weblicant to be accessible to the greater internet.

weblicate www.cnn.com.har yourdomain.com
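
The renaming itself can be pictured with a few lines of Ruby. This is only a sketch of the idea using the standard uri library, not weblicate's own code, and localize_url is a made-up name.

```ruby
require 'uri'

# Append a suffix to the hostname of a URL, leaving scheme, path and query alone.
# localize_url is a hypothetical helper, not part of weblicate.
def localize_url(url, suffix = 'localhost')
  uri = URI.parse(url)
  uri.host = "#{uri.host}.#{suffix}"
  uri.to_s
end

puts localize_url('http://www.cnn.com/')
puts localize_url('http://ad.doubleclick.net/ad.js?x=1', 'yourdomain.com')
```

With the default suffix the first call yields a www.cnn.com.localhost URL; the second shows the same rename with a custom domain.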

Weblicate also generates a script that creates DNS entries for the domains on Slicehost. It Works For Me.

www.cnn.com-dns    # A script to create DNS entries (on Slicehost.com)

Copyright © 2010 Mike Bailey. See LICENSE for details.