framework_guesser

Framework guesser tries to detect frameworks and javascript libraries from HTML code and HTTP headers hash. Some extra information like server, server-side programming language, doctype, meta description and keywords are returned as well.

It is used by statscrawler.com to analyze sites and collect statistics about Internet domains. This is a sample (though working and pretty usable) for everyone interested in framework detection on statscrawler.com.

Usage

Requires nokogiri and rspec for tests.

    require 'open-uri'
    require 'openssl'
    require 'framework_guesser'

    for domain in ['rubyonrails.org', 'drupal.org', 'wordpress.org', 'joomla.org']
      begin
        open("http://www." + domain,
            :read_timeout => 10,
            :ssl_verify_mode => OpenSSL::SSL::VERIFY_NONE) do |file|

          url = file.base_uri.to_s

          result = FrameworkGuesser.guess(file.meta, file.read)
          puts "#{domain} => #{url}"
          puts "Description: #{result[:description]}"
          puts "Keywords: #{result[:keywords]}"
          puts "Server: #{result[:server]}"
          puts "Engine: #{result[:engine]}"
          puts "Doctype: #{result[:doctype]}"
          puts "Framework: #{result[:framework]}"
          puts "Features: #{result[:features].join(', ')}"

          puts
        end
      rescue StandardError => err
        puts "#{domain} => #{err.message}"
      end
    end