Method: Daru::DataFrame.from_html

Defined in:
lib/daru/dataframe.rb

.from_html(path, fields = {}) ⇒ Object

Read the table data from a remote html file. Please note that this module works only for static table elements on a HTML page, and won’t work in cases where the data is being loaded into the HTML table by Javascript.

By default - all <th> tag elements in the first proper row are considered as the order, and all the <th> tag elements in the first column are considered as the index.

Arguments

  • path [String] - URL of the target HTML file.

  • fields [Hash] -

    :match - A String to match and choose a particular table(s) from multiple tables of a HTML page.

    :order - An Array which would act as the user-defined order, to override the parsed Daru::DataFrame.

    :index - An Array which would act as the user-defined index, to override the parsed Daru::DataFrame.

    :name - A String that manually assigns a name to the scraped Daru::DataFrame, for user’s preference.

Returns

An Array of Daru::DataFrames, with each dataframe corresponding to a HTML table on that webpage.

Usage

dfs = Daru::DataFrame.from_html("http://www.moneycontrol.com/", match: "Sun Pharma")
dfs.count
# => 4

dfs.first
#
# => <Daru::DataFrame(5x4)>
#          Company      Price     Change Value (Rs
#     0 Sun Pharma     502.60     -65.05   2,117.87
#     1   Reliance    1356.90      19.60     745.10
#     2 Tech Mahin     379.45     -49.70     650.22
#     3        ITC     315.85       6.75     621.12
#     4       HDFC    1598.85      50.95     553.91


162
163
164
# File 'lib/daru/dataframe.rb', line 162

def from_html path, fields={}
  Daru::IO.from_html path, fields
end