Class: Section

Inherits: Object

Includes: Regexen

Defined in: lib/galaxy/section.rb

Overview

A Section is a piece of text that contains data structured in a certain way (possibly with sub-sections). The Section class describes this data structure and provides methods for parsing the text and extracting data records.

By default, Section calls the new_or_update method on the class derived from the Section name. Example: Section(:name=>:battle_groups) calls Group.new_or_update(match, state) on each /Battle_groups_record/ match, unless :record_proc is provided to the Section.

The ‘state’ hash is used to provide context for multi-section parsing.

Direct Known Subclasses

Report

Constant Summary

Constants included from Regexen

Instance Attribute Summary

#footer ⇒ Object

Regex identifying the end of this Section (for a Multisection, the end of EACH contained Section).
#footer_proc ⇒ Object

Proc object to be run on Footer match.
#header ⇒ Object

Regex identifying start of this Section (obligatory!).
#header_proc ⇒ Object

Proc object to be run on Header match.
#mult ⇒ Object

Flag indicating that this is a “Multisection” (several Sections with similar headers one after another).
#name ⇒ Object

Name of this Section (also used to auto-generate properties).
#record ⇒ Object

Regex matching data Record (or an array of such regexen).
#record_proc ⇒ Object

Proc object to be run on each Record match (or an array of Procs).
#sections ⇒ Object

(Sub)sections (possibly) contained inside this Section.
#skip ⇒ Object

Flag indicating that this Section contains no data (should be skipped).
#text ⇒ Object

Source text of this Section (raw material for data extraction).

Instance Method Summary

#copy ⇒ Object

Returns (relatively) deep copy of self.
#find_text(regex, pos = 0) {|match| ... } ⇒ Object

Safely matches the given regex against @text (starting at position pos). Returns the initial offset of the match, or nil if the regex is not found; yields the match to the given block (if any).
#initialize(args) ⇒ Section constructor

A new Section is created using one of the following forms: Section.new :header => header, :footer => footer, :record => [rec1,rec2], :sections => [sec1,sec2,sec3]; Section.new :name => name, which is extracted as {:header => Name_header, :footer => Name_footer, :record => Name_record}; or Section.new :symbol, which is extracted as {:header => Symbol_header, :footer => Symbol_footer, :record => Symbol_record}.
#parse(state = {}) ⇒ Object

Recursively parse Section, extract data records.
#scan_text(regex, pos = 0) ⇒ Object

Scans @text for Data Records matching the given regex pattern. With a block, yields each match object to the block; without a block, returns an array of the matching Data Records, each converted to an array of captured strings.

Constructor Details

#initialize(args) ⇒ `Section`

New Section is created by using the following syntax:

Section.new :header => header, :footer => footer, :record => [rec1,rec2], :sections => [sec1,sec2,sec3]
Section.new :name => name  -> extracted as {:header => Name_header, :footer => Name_footer, :record => Name_record}
Section.new :symbol  -> extracted as {:header => Symbol_header, :footer => Symbol_footer, :record => Symbol_record}
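
The name-based forms above rely on a lookup convention: a constant named after the Section is tried first, then a Default_ constant, and otherwise nothing is set. The following is a minimal sketch of that lookup only. DemoRegexen and resolve_pattern are hypothetical stand-ins for illustration and are not part of the library.

```ruby
# Hypothetical stand-in for the Regexen module (constants are illustrative).
module DemoRegexen
  Races_header   = 'Races'
  Default_footer = '^\s*$'
end

# Resolve a pattern for a Section name: first the name-specific constant,
# then the Default_ constant, otherwise nil.
def resolve_pattern(mod, name, suffix)
  specific = "#{name}_#{suffix}"
  return mod.const_get(specific) if mod.const_defined?(specific)

  default = "Default_#{suffix}"
  mod.const_defined?(default) ? mod.const_get(default) : nil
end

# Section names are normalized the same way the constructor does it.
name = :races.to_s.downcase.capitalize

header = resolve_pattern(DemoRegexen, name, 'header') # name-specific constant
footer = resolve_pattern(DemoRegexen, name, 'footer') # falls back to default
record = resolve_pattern(DemoRegexen, name, 'record') # nothing defined
```

In this sketch a missing constant yields nil rather than raising, which mirrors how the constructor leaves an attribute unset when neither constant exists.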

# File 'lib/galaxy/section.rb', line 97

def initialize args
  case args       # Parsing (named) arguments
    when Symbol, String # Symbol represents Section name, appropriately named Constants MUST be defined in Regexen module
    @name = args.to_s.downcase.capitalize
    when Hash
    @name = args[:name].to_s.downcase.capitalize if args[:name]
    @text = args[:text]
    @skip = args[:skip]
    @mult = args[:mult]
    @sections = args[:sections]
    
    # Header/footer/record patterns and appropriate processing Procs can be: 
    # 1) given as a Constant name (should be defined in Regexen module), 
    # 2) given as a direct value (escaped pattern string literal or Proc, respectively), or
    # 3) not given at all, appropriate values should be inferred from :name argument 
    @header = Regexen.const_get(args[:header]) rescue args[:header]
    @footer = Regexen.const_get(args[:footer]) rescue args[:footer]
    @record = Regexen.const_get(args[:record]) rescue args[:record]
    @header_proc = Regexen.const_get(args[:header_proc]) rescue args[:header_proc]
    @footer_proc = Regexen.const_get(args[:footer_proc]) rescue args[:footer_proc]
    @record_proc = Regexen.const_get(args[:record_proc]) rescue args[:record_proc]
  end #case  
  
  # Try to auto-generate Section's Patterns and Procs from @name (if they are not already given)
  # First we try to find Regexen constants derived from name, if not found then we look for defaults
  @header = @header || Regexen.const_get(@name + '_header') rescue 
  if Regexen.const_defined?('Default_header') then Regexen.const_get('Default_header') end
  @footer = @footer || Regexen.const_get(@name + '_footer') rescue 
  if Regexen.const_defined?('Default_footer') then Regexen.const_get('Default_footer') end  
  @record = @record || Regexen.const_get(@name + '_record') rescue 
  if Regexen.const_defined?('Default_record') then Regexen.const_get('Default_record') end
  @header_proc = @header_proc || Regexen.const_get(@name + '_header_proc') rescue 
  if Regexen.const_defined?('Default_header_proc') then Regexen.const_get('Default_header_proc') end
  @footer_proc = @footer_proc || Regexen.const_get(@name + '_footer_proc') rescue 
  if Regexen.const_defined?('Default_footer_proc') then Regexen.const_get('Default_footer_proc') end  
  @record_proc = @record_proc || Regexen.const_get(@name + '_record_proc') rescue 
  if Regexen.const_defined?('Default_record_proc') then Regexen.const_get('Default_record_proc') end
  
  # This is a G+ specific piece of code overriding general Section functionality (Default_record_proc)
  # Needed to speed up calculations and avoid class evaluations on each record
  # Class name of the Object (described by Record), e.g "Group"
  if @name and not @record_proc 
    klass_name = @name.split("_")[-1][0..-2].capitalize
    if Object.const_defined?(klass_name) 
      klass = Object.const_get(klass_name) 
      @record_proc ||= lambda do |match, state| 
        klass.new_or_update match[1..-1], state
      end 
    end 
  end
end
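
The last block of the constructor derives a class name from the Section name by taking the final underscore-separated word and dropping its last character (a naive singularization). A small sketch of that convention, assuming a hypothetical Group class whose new_or_update simply returns its arguments; default_record_proc is an illustrative helper, not a library method.

```ruby
# Hypothetical record class standing in for a real model such as Group.
class Group
  def self.new_or_update(fields, state)
    { :fields => fields, :section => state[:section] }
  end
end

# Build the default record handler for a Section name, or nil when no
# matching class is defined.
def default_record_proc(section_name)
  name = section_name.to_s.downcase.capitalize         # "Battle_groups"
  klass_name = name.split('_')[-1][0..-2].capitalize   # "groups" -> "Group"
  return nil unless Object.const_defined?(klass_name)

  klass = Object.const_get(klass_name)
  lambda { |match, state| klass.new_or_update(match[1..-1], state) }
end

handler = default_record_proc(:battle_groups)
match   = /(\w+) (\d+)/.match('Alpha 12')
result  = handler.call(match, { :section => 'Battle_groups' })
missing = default_record_proc(:unknown_widgets) # no Widget class defined
```

Because only the last character is dropped, this convention works for plurals ending in a plain "s" (groups, planets) but not for irregular ones.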

Instance Attribute Details

#footer ⇒ `Object`

Regex identifying the end of this Section (for a Multisection, the end of EACH contained Section)


# File 'lib/galaxy/section.rb', line 84

def footer
  @footer
end

#footer_proc ⇒ `Object`

Proc object to be run on Footer match




# File 'lib/galaxy/section.rb', line 87

def footer_proc
  @footer_proc
end

#header ⇒ `Object`

Regex identifying start of this Section (obligatory!)




# File 'lib/galaxy/section.rb', line 83

def header
  @header
end

#header_proc ⇒ `Object`

Proc object to be run on Header match




# File 'lib/galaxy/section.rb', line 86

def header_proc
  @header_proc
end

#mult ⇒ `Object`

Flag indicating that this is a “Multisection” (several Sections with similar headers one after another)




# File 'lib/galaxy/section.rb', line 91

def mult
  @mult
end
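
To illustrate what a Multisection looks like in practice, here is a rough sketch that splits a text into consecutive chunks, each starting at a header match and ending at the next footer match. It uses Ruby's built-in Regexp instead of Oniguruma, and split_multisections is a hypothetical helper. Unlike #parse, it cuts each chunk just before the footer rather than using an inclusive range.

```ruby
# Split text into chunks that each start at a header match and stop just
# before the next footer match (or at the end of the text).
def split_multisections(text, header, footer)
  header_re = Regexp.new(header)
  footer_re = Regexp.new(footer)
  chunks = []
  text.scan(header_re) do
    match = Regexp.last_match                        # capture before index() resets it
    start = match.begin(0)
    footer_at = text.index(footer_re, match.end(0))  # search after the header ends
    chunks << (footer_at ? text[start...footer_at] : text[start..-1])
  end
  chunks
end

report = "Group A\n1\n2\n\nGroup B\n3\n\n"
chunks = split_multisections(report, 'Group \w', '\n\n')
```

Each chunk could then be handed to a copy of the template Section with mult switched off, which is what #parse does for the real class.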

#name ⇒ `Object`

Name of this Section (also used to auto-generate properties)




# File 'lib/galaxy/section.rb', line 81

def name
  @name
end

#record ⇒ `Object`

Regex matching data Record (or an array of such regexen)




# File 'lib/galaxy/section.rb', line 85

def record
  @record
end

#record_proc ⇒ `Object`

Proc object to be run on each Record match (or an array of Procs)




# File 'lib/galaxy/section.rb', line 88

def record_proc
  @record_proc
end

#sections ⇒ `Object`

(Sub)sections (possibly) contained inside this Section




# File 'lib/galaxy/section.rb', line 89

def sections
  @sections
end

#skip ⇒ `Object`

Flag indicating that this Section contains no data (should be skipped)




# File 'lib/galaxy/section.rb', line 90

def skip
  @skip
end

#text ⇒ `Object`

Source text of this Section (raw material for data extraction)




# File 'lib/galaxy/section.rb', line 82

def text
  @text
end

Instance Method Details

#copy ⇒ `Object`

Returns (relatively) deep copy of self

# File 'lib/galaxy/section.rb', line 150

def copy
  secs = @sections ? @sections.map {|s| s.copy} : nil
  Section.new :name=>@name, :header=>@header, :footer=>@footer, :record=>@record, :header_proc=>@header_proc, 
  :footer_proc=>@footer_proc, :record_proc=>@record_proc, :sections=>secs, :skip=>@skip, :mult=>@mult, :text=>@text
end
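
The copy is "relatively" deep in the sense that sub-sections are copied recursively, while the remaining attributes (patterns, procs, text) are passed along by reference. A sketch of the same idea with a hypothetical Node struct in place of Section:

```ruby
# Hypothetical tree node: children are copied recursively, other fields
# are shared by reference.
Node = Struct.new(:name, :text, :children) do
  def copy
    kids = children ? children.map { |c| c.copy } : nil
    Node.new(name, text, kids)
  end
end

shared_text = 'shared text'
child  = Node.new('Child', shared_text, nil)
parent = Node.new('Parent', shared_text, [child])
twin   = parent.copy

children_are_new = !twin.children[0].equal?(child)  # distinct child objects
text_is_shared   = twin.text.equal?(parent.text)    # same String object
```

One consequence is that mutating a shared string in place through one copy would be visible through the other, whereas assigning a new text to a copy is not.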

#find_text(regex, pos = 0) {|match| ... } ⇒ `Object`

Safely matches the given regex against @text (starting at position pos). Returns the initial offset of the match, or nil if the regex is not found; yields the match to the given block (if any)

Yields:

(match)

# File 'lib/galaxy/section.rb', line 202

def find_text regex, pos=0
  return nil if @text == nil
  return nil if regex == nil
  text = pos == 0 ? @text : @text[pos..-1]
  match = Oniguruma::ORegexp.new(regex).match(text)
  return nil unless match 
  yield match if block_given?
  pos + match.begin # Return initial match offset (corrected for position pos)
end
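
The offset correction on the last line is the important detail: the match is found in a substring, so pos has to be added back to get an offset into the full text. A minimal sketch of the same logic using Ruby's built-in Regexp instead of Oniguruma::ORegexp; find_offset is a hypothetical free-standing helper.

```ruby
# Match pattern against text starting at pos; return the offset of the
# match relative to the whole text, or nil when there is no match.
def find_offset(text, pattern, pos = 0)
  return nil if text.nil? || pattern.nil?

  match = Regexp.new(pattern).match(text[pos..-1])
  return nil unless match

  yield match if block_given?
  pos + match.begin(0) # correct the substring offset by pos
end

text   = 'abc def abc'
first  = find_offset(text, 'abc')        # match at the very start
second = find_offset(text, 'abc', 1)     # skips the first occurrence
none   = find_offset(text, 'xyz')        # no match
```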

#parse(state = {}) ⇒ `Object`

Recursively parse Section, extract data records

# File 'lib/galaxy/section.rb', line 157

def parse state={}
  state[:section] = @name
  if @mult
    #puts "Mults: #{self.name} #{self.header}"
    # Multisection: Find out if this Section is actually a collection of sections with similar headers
    # If it is, clone an Array of multisections and call parse on each (data extraction happens downstream)
    scan_text(@header) do |match|
      start = match.begin
      finish = -1 unless finish = find_text(@footer, match.end) # Find end of Section (after Header END)
      s = self.copy # Create a copy of Section (to be used as child multisection template)
      s.mult = false
      s.text = @text[start..finish] # Set text property for found multisection
      s.parse state # Recursively call parse on each found multisection
    end
  else
    # Process Section Header, Records and Footer (if any)
    find_text(@header) {|match| @header_proc.call match, state} if @header and @header_proc
    scan_text(@record) {|match| @record_proc.call match, state} if @record and @record_proc
    find_text(@footer) {|match| @footer_proc.call match, state} if @footer and @footer_proc
    
    if @sections
      #puts "Sections: #{self.name} #{self.header}"
      # Process Sections array against @text, skipping empty/skippable Sections, recursively 
      # calling parse on found Sections and moving forward position cursor pos 
      # TODO Generalize for UNORDERED Sections (position cursor should not work in this case)
      finish = 0
      @sections.each_with_index do |s, i|
        next if s.skip #Skip non-data Section
        if start = find_text(s.header, finish)  # Find Section Header
          finish = nil # Needed for last Section (no next section to find)
          @sections[i+1..-1].each do |sn| # Find finish by cycling through next Section Headers
            break if finish = find_text(sn.header, start) # Find first of next Section Header 
          end 
          finish = -1 unless finish # If finish not found, set it to the end of @text
          #Start and finish defined, assign text to this Section and recursively parse it
          s.text = @text[start..finish]
          s.parse state 
        end 
      end 
    end 
  end   
end
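
For ordered sub-sections, the core of #parse is a cursor walk: each sub-section starts at its own header and ends where the first later header is found, or at the end of the text. The sketch below shows only that slicing step, with built-in Regexp and a hypothetical slice_sections helper. It cuts each slice just before the next header, whereas #parse uses an inclusive range.

```ruby
# Slice text into one chunk per header, in order. Headers that are not
# found after the current cursor position are skipped.
def slice_sections(text, headers)
  slices = {}
  cursor = 0
  headers.each_with_index do |header, i|
    start = text.index(Regexp.new(header), cursor)
    next unless start

    # Look for the first of the following headers to mark the end.
    finish = nil
    headers[(i + 1)..-1].each do |next_header|
      finish = text.index(Regexp.new(next_header), start)
      break if finish
    end

    slices[header] = finish ? text[start...finish] : text[start..-1]
    cursor = finish || text.length
  end
  slices
end

report = "Races\nr1\nr2\nPlanets\np1\n"
slices = slice_sections(report, ['Races', 'Planets', 'Fleets'])
```

As the TODO in #parse notes, this approach depends on the sub-sections appearing in the declared order.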

#scan_text(regex, pos = 0) ⇒ `Object`

Scans @text for Data Records matching the given regex pattern. With a block, yields each match object to the block; without a block, returns an array of the matching Data Records, each converted to an array of captured strings

# File 'lib/galaxy/section.rb', line 214

def scan_text regex, pos=0
  text = pos == 0 ? @text : @text[pos..-1]
  if block_given?
    # Scan Section for regex matches, yield each match to given block, return array of MATCH objects
    Oniguruma::ORegexp.new(regex).scan(text) {|match| yield match }
  else
    # Scan Section for regex matches, return array of matches converted into string arrays
    results=[]
    Oniguruma::ORegexp.new(regex).scan(text) {|match| results << match[1..-1].to_a }
    results
  end
end
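
The two modes differ in what the caller receives: match objects when a block is given, arrays of captured strings otherwise. A sketch of both modes using String#scan with built-in Regexp rather than Oniguruma; scan_records is a hypothetical helper.

```ruby
# With a block: yield each MatchData. Without a block: return an array
# of capture arrays, one per matching record.
def scan_records(text, pattern)
  regex = Regexp.new(pattern)
  if block_given?
    text.scan(regex) { yield Regexp.last_match }
  else
    results = []
    text.scan(regex) { results << Regexp.last_match.captures }
    results
  end
end

data = "id=1 name=a\nid=2 name=b"
rows = scan_records(data, 'id=(\d+) name=(\w+)')

ids = []
scan_records(data, 'id=(\d+) name=(\w+)') { |m| ids << m[1] }
```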
