Class: SecApi::Extractor

Inherits:
Object
Defined in:
lib/sec_api/extractor.rb

Overview

Extractor proxy for document extraction endpoints

All extractor methods return immutable ExtractedData objects (not raw hashes). This ensures thread safety and a consistent API surface.
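Here is a minimal sketch of how an immutable result object can be built in plain Ruby. Every nested value is copied and frozen, and then the object itself is frozen. `ExtractedDataSketch` is a hypothetical stand-in, not the gem's actual `ExtractedData` class. Its field names (`text`, `sections`, `metadata`) are assumptions taken from the examples on this page.

```ruby
# Hypothetical stand-in for ExtractedData, not the gem's real class.
class ExtractedDataSketch
  attr_reader :text, :sections, :metadata

  def initialize(text: nil, sections: {}, metadata: {})
    # Freeze copies of every nested value so no caller can mutate shared
    # state; a fully frozen object can be shared across threads safely.
    @text = text&.dup&.freeze
    @sections = sections.to_h { |name, body| [name.to_sym, body.dup.freeze] }.freeze
    @metadata = metadata.dup.freeze
    freeze
  end

  # Build an instance from a symbol-keyed API response hash.
  def self.from_api(body)
    new(text: body[:text], sections: body[:sections] || {}, metadata: body[:metadata] || {})
  end
end

data = ExtractedDataSketch.from_api(text: "Full text", sections: { mda: "MD&A" })
data.frozen?                 # => true
data.sections[:mda].frozen?  # => true
```

Any attempt to mutate the result, such as `data.sections[:mda] << "..."`, raises `FrozenError` instead of silently changing data that another thread may be reading.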

Examples:

Extract text from filing

extracted = client.extractor.extract(filing_url)
extracted.text              # => "Full extracted text..."
extracted.sections          # => { risk_factors: "...", financials: "..." }
extracted.          # => { source_url: "...", form_type: "10-K" }

Extract specific sections

extracted = client.extractor.extract(filing_url, sections: [:risk_factors, :mda])
extracted.risk_factors      # => "Risk factor content..."
extracted.mda               # => "MD&A content..."
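The section accessors (`extracted.risk_factors`, `extracted.mda`) act like dynamic methods backed by the sections hash. A common way to implement this in Ruby is to pair `method_missing` with `respond_to_missing?`. The sketch below assumes that approach. `SectionAccessorsSketch` is hypothetical and is not taken from the gem's source.

```ruby
# Hypothetical sketch of dynamic section accessors; the real ExtractedData
# may implement this differently.
class SectionAccessorsSketch
  def initialize(sections)
    @sections = sections.transform_keys(&:to_sym).freeze
  end

  # Route zero-argument calls like #risk_factors to the matching entry;
  # anything else falls through to the normal NoMethodError.
  def method_missing(name, *args, &block)
    if args.empty? && @sections.key?(name)
      @sections[name]
    else
      super
    end
  end

  # Keep respond_to? consistent with method_missing.
  def respond_to_missing?(name, include_private = false)
    @sections.key?(name) || super
  end
end

extracted = SectionAccessorsSketch.new(risk_factors: "Risk factors...", mda: "MD&A analysis...")
extracted.mda                         # => "MD&A analysis..."
extracted.respond_to?(:risk_factors)  # => true
```

Defining `respond_to_missing?` alongside `method_missing` means `respond_to?` and `method(:mda)` report the dynamic accessors correctly.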

Constant Summary

SECTION_MAP =

This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or changed in the future.

Maps Ruby section symbols to their SEC Form 10-K item identifiers (for example, :risk_factors to Item 1A).

{
  risk_factors: "1A",
  business: "1",
  mda: "7",
  financials: "8",
  legal_proceedings: "3",
  properties: "2",
  market_risk: "7A"
}.freeze
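Because `SECTION_MAP` is a frozen symbol-to-item hash, resolving a requested section is a single lookup. The sketch below adds validation on the assumption that unknown symbols should raise `ArgumentError`. The `item_id_for` helper is hypothetical, and the gem may handle unsupported sections differently.

```ruby
# Copy of the documented mapping (private in the gem; do not rely on it).
SECTION_MAP = {
  risk_factors: "1A",
  business: "1",
  mda: "7",
  financials: "8",
  legal_proceedings: "3",
  properties: "2",
  market_risk: "7A"
}.freeze

# Hypothetical helper: translate a section name to its 10-K item ID,
# failing fast with a helpful message for unsupported names.
def item_id_for(section)
  SECTION_MAP.fetch(section.to_sym) do
    raise ArgumentError,
          "unsupported section #{section.inspect}; expected one of #{SECTION_MAP.keys.join(', ')}"
  end
end

item_id_for(:mda)           # => "7"
item_id_for("market_risk")  # => "7A"
```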

Instance Method Summary

Constructor Details

#initialize(client) ⇒ SecApi::Extractor

This method is part of a private API. You should avoid using this method if possible, as it may be removed or changed in the future.

Creates a new Extractor proxy instance.

Extractor instances are obtained via Client#extractor and cached for reuse. Direct instantiation is not recommended.

Parameters:



# File 'lib/sec_api/extractor.rb', line 38

def initialize(client)
  @_client = client
end
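The constructor notes say that extractor instances are obtained via `Client#extractor` and cached for reuse. A common Ruby idiom for this kind of caching is `||=` memoization. The sketch below assumes that idiom. `ClientSketch` and `ExtractorSketch` are hypothetical stand-ins for `SecApi::Client` and `SecApi::Extractor`.

```ruby
# Hypothetical stand-ins; the real SecApi::Client may differ.
class ExtractorSketch
  def initialize(client)
    @_client = client # same ivar name as the documented constructor
  end
end

class ClientSketch
  # Memoize the proxy so repeated calls on one client reuse one instance.
  def extractor
    @extractor ||= ExtractorSketch.new(self)
  end
end

client = ClientSketch.new
client.extractor.equal?(client.extractor)  # => true
```

Plain `||=` is not atomic, so two threads racing on a fresh client could each build a proxy. In this sketch that is harmless because the proxy holds nothing but the client reference.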

Instance Method Details

#extract(filing, sections: nil, **options) ⇒ ExtractedData

Note:

When extracting multiple sections, one API call is made per section, so latency and API usage costs grow with the number of sections requested.
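To show how per-section requests scale, here is a sketch of a per-section loop running against a stub connection that records each POST. The gem's private `extract_sections` is not shown on this page, so the request body shape (`url:`, `item:`), the reuse of the `/extractor` path, `extract_sections_sketch`, and `StubConnection` are all assumptions.

```ruby
# Test double: records every POST instead of hitting the network.
class StubConnection
  Response = Struct.new(:body)
  attr_reader :calls

  def initialize
    @calls = []
  end

  def post(path, body)
    @calls << [path, body]
    Response.new("content for item #{body[:item]}")
  end
end

ITEMS = { risk_factors: "1A", mda: "7" }.freeze # subset of SECTION_MAP

# Hypothetical per-section loop: exactly one request per requested section.
def extract_sections_sketch(connection, url, sections, options = {})
  sections.to_h do |section|
    item = ITEMS.fetch(section)
    response = connection.post("/extractor", { url: url, item: item }.merge(options))
    [section, response.body]
  end
end

conn = StubConnection.new
result = extract_sections_sketch(conn, "https://example.com/10k.htm", [:risk_factors, :mda])
conn.calls.size  # => 2 (one POST per section)
result[:mda]     # => "content for item 7"
```

A stub like this also lets tests check the request count without network access or API usage.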

Extracts text and sections from an SEC filing.

Examples:

Extract full filing

extracted = client.extractor.extract(filing_url)
extracted.text  # => "Full filing text..."

Extract specific section (dynamic accessor)

extracted = client.extractor.extract(filing_url, sections: [:risk_factors])
extracted.risk_factors  # => "Risk factors content..."

Extract multiple sections (dynamic accessors)

extracted = client.extractor.extract(filing_url, sections: [:risk_factors, :mda])
extracted.risk_factors  # => "Risk factors..."
extracted.mda           # => "MD&A analysis..."

Parameters:

  • filing (String, Filing)

    The filing URL string or Filing object

  • sections (Array<Symbol>, nil) (defaults to: nil)

    Specific sections to extract (e.g., [:risk_factors, :mda]). When nil, omitted, or empty, extracts the full filing text. Supported sections: :risk_factors, :business, :mda, :financials, :legal_proceedings, :properties, :market_risk

  • options (Hash)

    Additional extraction options passed to the API
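The `filing` and `sections` parameters accept several shapes. The sketch below mirrors the normalization in this method's source listing. A String is used directly as the URL, and any other value must respond to `#url`. `Array()` turns a single Symbol or an Array into a list, and nil or an empty list means the full filing. `FilingStub`, `resolve_url`, and `normalize_sections` are hypothetical names used for illustration.

```ruby
# Hypothetical stand-in for a Filing object; only #url matters here.
FilingStub = Struct.new(:url)

# Same URL resolution as Extractor#extract.
def resolve_url(filing)
  filing.is_a?(String) ? filing : filing.url
end

# nil and [] both mean "full filing"; a bare Symbol becomes a one-element list.
def normalize_sections(sections)
  Array(sections)
end

resolve_url("https://example.com/a.htm")                 # => "https://example.com/a.htm"
resolve_url(FilingStub.new("https://example.com/b.htm"))  # => "https://example.com/b.htm"
normalize_sections(nil)     # => []
normalize_sections(:mda)    # => [:mda]
normalize_sections([:mda])  # => [:mda]
```

Because of this normalization, `sections: :mda` and `sections: [:mda]` should behave the same way.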

Returns:

  • (ExtractedData)

    An immutable object holding the extracted full text or the requested sections

Raises:



# File 'lib/sec_api/extractor.rb', line 68

def extract(filing, sections: nil, **options)
  url = filing.is_a?(String) ? filing : filing.url

  if sections.nil? || sections.empty?
    # Default behavior - extract full filing
    response = @_client.connection.post("/extractor", {url: url}.merge(options))
    ExtractedData.from_api(response.body)
  else
    # Extract specified sections
    section_contents = extract_sections(url, Array(sections), options)
    ExtractedData.from_api({sections: section_contents})
  end
end