Class: Zenrows::CssExtractor

Inherits:
Object
Defined in:
lib/zenrows/css_extractor.rb

Overview

DSL for building CSS extraction rules

Provides a clean interface for defining CSS selectors to extract data from web pages using the ZenRows API.

Examples:

Basic extraction

extractor = Zenrows::CssExtractor.build do
  extract :title, 'h1'
  extract :description, 'meta[name="description"]', attribute: 'content'
end

With attribute extraction

extractor = Zenrows::CssExtractor.build do
  extract :links, 'a.product-link', attribute: 'href'
  extract :images, 'img.product-image', attribute: 'src'
end

Using with ApiClient

api = Zenrows::ApiClient.new
response = api.get(url, css_extractor: extractor)
response.extracted  # => { "title" => "...", "links" => [...] }
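To make the stored rule format concrete, here is a sketch that runs without the gem installed. It re-declares a condensed copy of the methods documented on this page (an assumption made only so the snippet is self-contained; with the gem available you would `require "zenrows"` instead) and prints the JSON built by the basic example above.

```ruby
require "json"

# Condensed copy of the methods documented on this page, re-declared only so
# the snippet runs without the zenrows gem. With the gem installed, use
# `require "zenrows"` instead of this module.
module Zenrows
  class CssExtractor
    attr_reader :rules

    def self.build(&block)
      new.tap { |e| e.instance_eval(&block) }
    end

    def initialize
      @rules = {}
    end

    def extract(name, selector, attribute: nil)
      # Text content: bare selector. Attribute: "selector @attribute".
      @rules[name.to_sym] = attribute ? "#{selector} @#{attribute}" : selector
      self
    end

    def to_json(*)
      @rules.transform_keys(&:to_s).to_json
    end
  end
end

extractor = Zenrows::CssExtractor.build do
  extract :title, 'h1'
  extract :description, 'meta[name="description"]', attribute: 'content'
end

puts extractor.to_json
# prints {"title":"h1","description":"meta[name=\"description\"] @content"}
```

The attribute form is simply appended to the selector after an `@`, so every rule ends up as a single string value in the hash.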

Since:

  • 0.2.0

Constructor Details

#initialize ⇒ CssExtractor

Creates an extractor with no rules.

Since:

  • 0.2.0

# File 'lib/zenrows/css_extractor.rb', line 44

def initialize
  @rules = {}
end

Instance Attribute Details

#rules ⇒ Hash{Symbol => String} (readonly)

Returns the extraction rules, keyed by rule name.

Returns:

  • (Hash{Symbol => String})

    Extraction rules

Since:

  • 0.2.0

# File 'lib/zenrows/css_extractor.rb', line 33

def rules
  @rules
end

Class Method Details

.build {|extractor| ... } ⇒ CssExtractor

Builds an extractor by evaluating a DSL block.

Yields:

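  • (extractor)

    Block for defining extraction rules. It is evaluated with instance_eval on a new extractor, so extract, links and images can be called without a receiver; the extractor is also passed as the block's argument.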

Returns:

  • (CssExtractor)

    The configured extractor

Since:

  • 0.2.0

# File 'lib/zenrows/css_extractor.rb', line 39

def self.build(&block)
  new.tap { |e| e.instance_eval(&block) }
end
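Because build runs the block with instance_eval, rule methods can be called without a receiver, and Ruby also yields the receiver as the block's only argument, so an explicit |e| parameter works too. The sketch below demonstrates both styles with a hypothetical TinyBuilder class (not part of the gem) that uses the same `new.tap { |e| e.instance_eval(&block) }` one-liner.

```ruby
# Hypothetical stand-in that mirrors CssExtractor.build's implementation.
class TinyBuilder
  attr_reader :rules

  def self.build(&block)
    new.tap { |b| b.instance_eval(&block) }
  end

  def initialize
    @rules = {}
  end

  def extract(name, selector)
    @rules[name.to_sym] = selector
    self
  end
end

heading = "h1" # local variables from the enclosing scope stay visible in the block

# Style 1: receiverless calls; self inside the block is the new builder.
a = TinyBuilder.build { extract :title, heading }

# Style 2: instance_eval also passes the receiver in, so a block parameter works.
b = TinyBuilder.build { |e| e.extract :title, heading }

puts a.rules == b.rules # prints true
```

One consequence of instance_eval is that self inside the block is the extractor, so methods and instance variables of the calling object are not reachable as self there; pass values in through local variables instead.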

Instance Method Details

#empty? ⇒ Boolean

Returns true if no extraction rules have been defined.

Returns:

  • (Boolean)

Since:

  • 0.2.0

# File 'lib/zenrows/css_extractor.rb', line 100

def empty?
  @rules.empty?
end

#extract(name, selector, attribute: nil) ⇒ self

Defines an extraction rule.

Examples:

Extract text content

extract :title, 'h1'

Extract attribute

extract :link, 'a.main', attribute: 'href'

Parameters:

  • name (Symbol, String)

    Key for extracted data

  • selector (String)

    CSS selector

  • attribute (String, nil) (defaults to: nil)

    Attribute to extract (nil for text content)

Returns:

  • (self)

    For chaining

Since:

  • 0.2.0

# File 'lib/zenrows/css_extractor.rb', line 60

def extract(name, selector, attribute: nil)
  @rules[name.to_sym] = attribute ? "#{selector} @#{attribute}" : selector
  self
end
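The method body above stores each rule as a plain string: the bare selector for text content, or "selector @attribute" when an attribute is given. Because names pass through to_sym and are written with []=, defining the same name twice (whether as a String or a Symbol) silently replaces the earlier rule. The sketch below reproduces that one-line logic in a hypothetical standalone add_rule helper on a plain Hash so it runs without the gem.

```ruby
# Hypothetical helper reproducing the body of CssExtractor#extract on a plain Hash.
def add_rule(rules, name, selector, attribute: nil)
  rules[name.to_sym] = attribute ? "#{selector} @#{attribute}" : selector
  rules
end

rules = {}
add_rule(rules, :title, "h1")
add_rule(rules, "link", "a.main", attribute: "href") # String name is stored as :link
add_rule(rules, :title, "h2.headline")               # replaces the earlier :title rule

puts rules[:title] # prints h2.headline
puts rules[:link]  # prints a.main @href
puts rules.size    # prints 2
```

On a real extractor the same calls can be chained, because extract returns self.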

#images(name, selector) ⇒ self

Adds a rule that extracts the src attribute of matching elements (shorthand for extract with attribute: "src").

Parameters:

  • name (Symbol, String)

    Key for extracted data

  • selector (String)

    CSS selector for elements with src

Returns:

  • (self)

    For chaining

Since:

  • 0.2.0

# File 'lib/zenrows/css_extractor.rb', line 79

def images(name, selector)
  extract(name, selector, attribute: "src")
end

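#links(name, selector) ⇒ self

Adds a rule that extracts the href attribute of matching elements (shorthand for extract with attribute: "href").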

Parameters:

  • name (Symbol, String)

    Key for extracted data

  • selector (String)

    CSS selector for anchor elements

Returns:

  • (self)

    For chaining

Since:

  • 0.2.0

# File 'lib/zenrows/css_extractor.rb', line 70

def links(name, selector)
  extract(name, selector, attribute: "href")
end
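Since images and links only forward to extract with a fixed attribute, they produce exactly the same rules as the explicit form. The sketch below checks that equivalence and shows chaining, using a hypothetical ExtractorSketch class that copies the three method bodies documented above so it runs without the gem.

```ruby
# Hypothetical minimal class reproducing #extract, #images and #links as
# documented above, so the comparison runs without the gem.
class ExtractorSketch
  attr_reader :rules

  def initialize
    @rules = {}
  end

  def extract(name, selector, attribute: nil)
    @rules[name.to_sym] = attribute ? "#{selector} @#{attribute}" : selector
    self
  end

  def images(name, selector)
    extract(name, selector, attribute: "src")
  end

  def links(name, selector)
    extract(name, selector, attribute: "href")
  end
end

# Each method returns self, so rules can be chained.
shortcut = ExtractorSketch.new
  .links(:links, "a.product-link")
  .images(:images, "img.product-image")

explicit = ExtractorSketch.new
  .extract(:links, "a.product-link", attribute: "href")
  .extract(:images, "img.product-image", attribute: "src")

puts shortcut.rules == explicit.rules # prints true
puts shortcut.rules[:images]          # prints img.product-image @src
```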

#size ⇒ Integer

Returns the number of extraction rules.

Returns:

  • (Integer)

Since:

  • 0.2.0

# File 'lib/zenrows/css_extractor.rb', line 107

def size
  @rules.size
end

#to_h ⇒ Hash{Symbol => String}

Returns the extraction rules as a hash. This is the extractor's internal hash, not a copy.

Returns:

  • (Hash{Symbol => String})

    Rules hash

Since:

  • 0.2.0

# File 'lib/zenrows/css_extractor.rb', line 86

def to_h
  @rules
end
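Because to_h (like the rules reader) returns the very Hash stored in @rules, changes made to the returned hash also change the extractor. A caller that wants an independent snapshot can dup it. The sketch below shows the difference with a hypothetical RulesHolder class that has the same to_h body.

```ruby
# Hypothetical stand-in with the same #to_h body as CssExtractor.
class RulesHolder
  def initialize
    @rules = { title: "h1" }
  end

  def to_h
    @rules
  end
end

holder = RulesHolder.new

shared = holder.to_h
shared[:extra] = "p.note"       # also mutates the holder's internal hash
puts holder.to_h.key?(:extra)   # prints true

snapshot = holder.to_h.dup
snapshot.delete(:title)         # only the copy changes
puts holder.to_h.key?(:title)   # prints true
```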

#to_json ⇒ String

Serializes the rules to a JSON string for the ZenRows API, converting Symbol keys to strings.

Returns:

  • (String)

    JSON representation

Since:

  • 0.2.0

# File 'lib/zenrows/css_extractor.rb', line 93

def to_json(*)
  @rules.transform_keys(&:to_s).to_json
end
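The expression in to_json can be tried on a literal hash with Ruby's standard json library. The final URL-encoding step is an assumption about how a JSON value might be placed in a query string; it is not a description of what ApiClient does internally.

```ruby
require "json"
require "uri"

rules = { title: "h1", links: "a.product-link @href" }

# Same expression as CssExtractor#to_json.
json = rules.transform_keys(&:to_s).to_json
puts json # prints {"title":"h1","links":"a.product-link @href"}

# Assumption for illustration only: percent-encode the JSON for use as a query parameter value.
puts URI.encode_www_form_component(json)
```

Ruby's JSON generator would also emit Symbol keys as strings on its own, so the transform_keys call mainly makes the key type explicit.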