Module: RfcReader::Search

Defined in:
lib/rfc_reader/search.rb

Class Method Details

.fetch_by(term:) ⇒ String

Returns the raw HTML of the search results for the given term.

Parameters:

  • term (String)

Returns:

  • (String)

    the raw HTML of the search results for the given term



# File 'lib/rfc_reader/search.rb', line 20

def self.fetch_by(term:)
  ErrorContext.wrap("Fetching RFC search results") do
    Net::HTTP.post_form(RFC_SEARCH_URI, { combo_box: term }).body
  end
end
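
As a rough illustration of what this request carries: `Net::HTTP.post_form` sends its hash as a URL-encoded form body. The sketch below reproduces that encoding offline with `URI.encode_www_form`. The helper name `form_body_for` is hypothetical; only the `combo_box` field name comes from the source above.

```ruby
require "uri"

# Hypothetical helper: builds the URL-encoded body that a form POST
# with a single `combo_box` field would carry. No network access is made.
def form_body_for(term)
  URI.encode_www_form(combo_box: term)
end

puts form_body_for("ftp")           # combo_box=ftp
puts form_body_for("file transfer") # combo_box=file+transfer
```

Spaces in the search term are encoded as `+`, so multi-word terms need no manual escaping before being passed in.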

.parse(html) ⇒ Hash<String, String>

Parses the title and text-file link of each RFC from the search results HTML. Example of the HTML fragment being parsed:

```html
<div class="scrolltable">

<table class='gridtable'>
    <tr>
        <th>
            <a href='rfc_search_detail.php?sortkey=Number&sorting=DESC&page=25&title=ftp&pubstatus[]=Any&pub_date_type=any'>Number</a>
        </th>
        <th>Files</th>
        <th>Title</th>
        <th>Authors</th>
        <th>
            <a href='rfc_search_detail.php?sortkey=Date&sorting=DESC&page=25&title=ftp&pubstatus[]=Any&pub_date_type=any'>Date</a>
        </th>
        <th>More Info</th>
        <th>Status</th>
    </tr>
    <tr>
        <td>
            <a href="https://www.rfc-editor.org/info/rfc114" target="_blank">RFC&nbsp;114</a>
        </td>
        <td>
            <a href="https://www.rfc-editor.org/rfc/rfc114.txt" target="_blank">ASCII</a>
            ,
            <a href="https://www.rfc-editor.org/pdfrfc/rfc114.txt.pdf" target="_blank">PDF</a>
            ,
            <a href="https://www.rfc-editor.org/rfc/rfc114.html" target="_blank">HTML</a>
        </td>
        <td class="title"> File Transfer Protocol </td>
        <td> A.K. Bhushan</td>
        <td>April 1971</td>
        <td>
            Updated by
            <a href="https://www.rfc-editor.org/info/rfc133" target="_blank">RFC&nbsp;133</a>
            ,
            <a href="https://www.rfc-editor.org/info/rfc141" target="_blank">RFC&nbsp;141</a>
            ,
            <a href="https://www.rfc-editor.org/info/rfc171" target="_blank">RFC&nbsp;171</a>
            ,
            <a href="https://www.rfc-editor.org/info/rfc172" target="_blank">RFC&nbsp;172</a>
        </td>
        <td>Unknown</td>
    </tr>

…
```

Parameters:

  • html (String)

    the HTML of the search results

Returns:

  • (Hash<String, String>)

    a mapping from each RFC title to the URL of its text file



# File 'lib/rfc_reader/search.rb', line 75

def self.parse(html)
  ErrorContext.wrap("Parsing RFC search results") do
    # NOTE: The first row of the table is the header row, not a search result, so it is dropped. See example HTML above.
    Nokogiri::HTML(html)
      .xpath("//div[@class='scrolltable']//table[@class='gridtable']//tr")
      .drop(1)
      .to_h do |tr_node|
        td_nodes = tr_node.elements
        title = td_nodes[2]
          .text
          .strip
        url = td_nodes[1]
          .elements
          .map { _1.attribute("href").text.strip }
          .find { _1.end_with?(".txt") }

        [title, url]
      end
  end
end
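
The row-to-pair step can be sketched without Nokogiri by modelling each table row as plain Ruby data. This is a simplified stand-in, not the module's implementation: `ROWS` and `titles_to_text_urls` are hypothetical names, and the sample values are copied from the example HTML above.

```ruby
# Each row is modelled as [number, file_links, title]. The first row stands in
# for the header row, which the real code skips with `drop(1)`.
ROWS = [
  ["Number", [], "Title"],
  [
    "RFC 114",
    [
      "https://www.rfc-editor.org/rfc/rfc114.txt",
      "https://www.rfc-editor.org/pdfrfc/rfc114.txt.pdf",
      "https://www.rfc-editor.org/rfc/rfc114.html",
    ],
    " File Transfer Protocol ",
  ],
].freeze

# Hypothetical helper: maps each stripped title to the first link ending in ".txt".
def titles_to_text_urls(rows)
  rows.drop(1).to_h do |row|
    _number, links, title = row
    [title.strip, links.find { |link| link.end_with?(".txt") }]
  end
end

p titles_to_text_urls(ROWS)
```

Matching on the `.txt` suffix rather than on a substring keeps the PDF link (`rfc114.txt.pdf`) from being selected. In this sketch, a row with no `.txt` link would map its title to `nil`, because `find` returns `nil` when nothing matches.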

.search_by(term:) ⇒ Hash<String, String>

Fetches the search results for the given term and returns a mapping from each RFC title to the URL of its text file.

Parameters:

  • term (String)

Returns:

  • (Hash<String, String>)

    a mapping from each RFC title to the URL of its text file



# File 'lib/rfc_reader/search.rb', line 13

def self.search_by(term:)
  html = fetch_by(term: term)
  parse(html)
end
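
A minimal sketch of consuming the returned hash, assuming it has the documented shape. The `results` literal is an illustrative stand-in for a real `search_by` call (which needs network access); its single entry is taken from the example HTML in the `.parse` section.

```ruby
# Illustrative stand-in for the Hash that `search_by` is documented to return.
results = {
  "File Transfer Protocol" => "https://www.rfc-editor.org/rfc/rfc114.txt",
}

# Format each entry as "title -> url" for display.
lines = results.map { |title, url| "#{title} -> #{url}" }
puts lines
```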