Class: Section

Inherits:
Object
  • Object
Includes:
Regexen
Defined in:
lib/galaxy/section.rb

Overview

A Section is a piece of text containing data structured in a particular way, possibly with sub-sections. The Section class describes this data structure and provides methods for parsing the text and extracting data records.

By default, a Section calls the new_or_update method on the class derived from the Section name. Example: Section.new(:name => :battle_groups) calls Group.new_or_update(match[1..-1], state) on each Regexen::Battle_groups_record match, unless :record_proc is given to the Section.

The 'state' hash provides context for multi-section parsing.
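The default dispatch described above can be sketched without the library. This is a minimal illustration under stated assumptions, not the library's code: Demo::Group, demo_class_name and demo_record_proc are hypothetical names, the class lookup is scoped to a demo module rather than Object, and a plain Ruby Regexp stands in for the Oniguruma pattern.

```ruby
module Demo
  # Hypothetical stand-in for a model class such as Group.
  class Group
    RECORDS = {}

    # Store the captured fields under their first value, noting which
    # section produced them (read from the shared state hash).
    def self.new_or_update(fields, state)
      RECORDS[fields.first] = { :fields => fields, :section => state[:section] }
    end
  end
end

# Derive a class name from a section name: take the last
# underscore-separated word, drop its final letter, capitalize it.
def demo_class_name(section_name)
  section_name.split("_")[-1][0..-2].capitalize
end

# Build a default record proc for a section name, or nil if the
# namespace has no matching class.
def demo_record_proc(section_name, namespace = Demo)
  klass_name = demo_class_name(section_name)
  return nil unless namespace.const_defined?(klass_name)
  klass = namespace.const_get(klass_name)
  lambda { |match, state| klass.new_or_update(match[1..-1], state) }
end

record_proc = demo_record_proc("Battle_groups")
match = /(\w+)\s+(\d+)/.match("Drones 12")
record_proc.call(match, { :section => "Battle_groups" })
```

Resolving the class once, when the proc is built, avoids a constant lookup on every record.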

Direct Known Subclasses

Report

Constant Summary

Constants included from Regexen

Regexen::Battle_groups_header, Regexen::Battle_groups_record, Regexen::Battle_planets_header, Regexen::Battle_planets_record, Regexen::Bombings_header, Regexen::Bombings_record, Regexen::Default_footer, Regexen::Default_header_proc, Regexen::Fint, Regexen::Fleets_header, Regexen::Fleets_record, Regexen::Fname, Regexen::Fnum, Regexen::Group, Regexen::Groups_header, Regexen::Groups_record, Regexen::Header, Regexen::Incoming_groups_header, Regexen::Incoming_groups_record, Regexen::Int, Regexen::Line, Regexen::Maps_header, Regexen::Name, Regexen::Num, Regexen::Planets_header, Regexen::Planets_record, Regexen::Production_planets_header, Regexen::Production_planets_record, Regexen::Races_header, Regexen::Races_record, Regexen::Reports_header, Regexen::Routes_header, Regexen::Routes_record, Regexen::Scargo, Regexen::Science_products_header, Regexen::Science_products_record, Regexen::Ship_products_header, Regexen::Ship_products_record, Regexen::Sint, Regexen::Sname, Regexen::Snum, Regexen::Sstatus, Regexen::Unidentified_groups_header, Regexen::Unidentified_groups_record, Regexen::Unidentified_planets_header, Regexen::Unidentified_planets_record, Regexen::Uninhabited_planets_header, Regexen::Uninhabited_planets_record, Regexen::Your_groups_header, Regexen::Your_groups_record, Regexen::Your_planets_header, Regexen::Your_planets_record

Instance Attribute Summary

Instance Method Summary

Constructor Details

#initialize(args) ⇒ Section

A new Section is created using one of the following forms:

Section.new(:header => header, :footer => footer, :record => [rec1, rec2], :sections => [sec1, sec2, sec3])
Section.new(:name => name)  -> expanded to {:header => Name_header, :footer => Name_footer, :record => Name_record}
Section.new(:symbol)  -> expanded to {:header => Symbol_header, :footer => Symbol_footer, :record => Symbol_record}
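The name-based expansion can be illustrated with a small standalone sketch. DemoRegexen and demo_pattern are hypothetical names used only here, and the pattern strings are placeholders. The sketch checks for constants explicitly instead of relying on rescue, so it is a variant of the idea rather than the constructor's actual code.

```ruby
# Hypothetical pattern module standing in for Regexen.
module DemoRegexen
  Fleets_header  = '^Fleets'
  Fleets_record  = '^(\w+)\s+(\d+)$'
  Default_footer = '^$'
end

# Look up "<Name>_<part>" in the module; fall back to "Default_<part>",
# and return nil when neither constant exists.
def demo_pattern(name, part, mod = DemoRegexen)
  specific = "#{name.to_s.downcase.capitalize}_#{part}"
  return mod.const_get(specific) if mod.const_defined?(specific)
  default = "Default_#{part}"
  mod.const_defined?(default) ? mod.const_get(default) : nil
end

header = demo_pattern(:fleets, :header)       # found as Fleets_header
footer = demo_pattern(:fleets, :footer)       # falls back to Default_footer
missing = demo_pattern(:fleets, :record_proc) # neither constant is defined
```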


# File 'lib/galaxy/section.rb', line 97

def initialize args
  case args       # Parsing (named) arguments
  when Symbol, String # Symbol represents Section name, appropriately named Constants MUST be defined in Regexen module
    @name = args.to_s.downcase.capitalize
  when Hash
    @name = args[:name].to_s.downcase.capitalize if args[:name]
    @text = args[:text]
    @skip = args[:skip]
    @mult = args[:mult]
    @sections = args[:sections]

    # Header/footer/record patterns and appropriate processing Procs can be:
    # 1) given as a Constant name (should be defined in Regexen module),
    # 2) given as a direct value (escaped pattern string literal or Proc, respectively), or
    # 3) not given at all, appropriate values should be inferred from :name argument
    @header = Regexen.const_get(args[:header]) rescue args[:header]
    @footer = Regexen.const_get(args[:footer]) rescue args[:footer]
    @record = Regexen.const_get(args[:record]) rescue args[:record]
    @header_proc = Regexen.const_get(args[:header_proc]) rescue args[:header_proc]
    @footer_proc = Regexen.const_get(args[:footer_proc]) rescue args[:footer_proc]
    @record_proc = Regexen.const_get(args[:record_proc]) rescue args[:record_proc]
  end # case
  
  # Try to auto-generate Section's Patterns and Procs from @name (if they are not already given)
  # First we try to find Regexen constants derived from name, if not found then we look for defaults
  @header = @header || Regexen.const_get(@name + '_header') rescue 
  if Regexen.const_defined?('Default_header') then Regexen.const_get('Default_header') end
  @footer = @footer || Regexen.const_get(@name + '_footer') rescue 
  if Regexen.const_defined?('Default_footer') then Regexen.const_get('Default_footer') end  
  @record = @record || Regexen.const_get(@name + '_record') rescue 
  if Regexen.const_defined?('Default_record') then Regexen.const_get('Default_record') end
  @header_proc = @header_proc || Regexen.const_get(@name + '_header_proc') rescue 
  if Regexen.const_defined?('Default_header_proc') then Regexen.const_get('Default_header_proc') end
  @footer_proc = @footer_proc || Regexen.const_get(@name + '_footer_proc') rescue 
  if Regexen.const_defined?('Default_footer_proc') then Regexen.const_get('Default_footer_proc') end  
  @record_proc = @record_proc || Regexen.const_get(@name + '_record_proc') rescue 
  if Regexen.const_defined?('Default_record_proc') then Regexen.const_get('Default_record_proc') end
  
  # This is a G+ specific piece of code overriding general Section functionality (Default_record_proc)
  # Needed to speed up calculations and avoid class evaluations on each record
  # Class name of the Object (described by Record), e.g "Group"
  if @name and not @record_proc 
    klass_name = @name.split("_")[-1][0..-2].capitalize
    if Object.const_defined?(klass_name) 
      klass = Object.const_get(klass_name) 
      @record_proc ||= lambda do |match, state| 
        klass.new_or_update match[1..-1], state
      end 
    end 
  end
end

Instance Attribute Details

#footer ⇒ Object

Regex identifying the end of this Section (the end of EACH Multisection)

# File 'lib/galaxy/section.rb', line 84

def footer
  @footer
end

#footer_proc ⇒ Object

Proc object to be run on a Footer match

# File 'lib/galaxy/section.rb', line 87

def footer_proc
  @footer_proc
end

#header ⇒ Object

Regex identifying the start of this Section (obligatory!)

# File 'lib/galaxy/section.rb', line 83

def header
  @header
end

#header_proc ⇒ Object

Proc object to be run on a Header match

# File 'lib/galaxy/section.rb', line 86

def header_proc
  @header_proc
end

#mult ⇒ Object

Flag indicating that this is a "Multisection" (several Sections with similar headers, one after another)

# File 'lib/galaxy/section.rb', line 91

def mult
  @mult
end

#name ⇒ Object

Name of this Section (also used to auto-generate properties)

# File 'lib/galaxy/section.rb', line 81

def name
  @name
end

#record ⇒ Object

Regex matching a data Record (or an array of such regexen)

# File 'lib/galaxy/section.rb', line 85

def record
  @record
end

#record_proc ⇒ Object

Proc object to be run on each Record match (or an array of Procs)

# File 'lib/galaxy/section.rb', line 88

def record_proc
  @record_proc
end

#sections ⇒ Object

(Sub)sections (possibly) contained inside this Section

# File 'lib/galaxy/section.rb', line 89

def sections
  @sections
end

#skip ⇒ Object

Flag indicating that this Section contains no data (and should be skipped)

# File 'lib/galaxy/section.rb', line 90

def skip
  @skip
end

#text ⇒ Object

Source text of this Section (raw material for data extraction)

# File 'lib/galaxy/section.rb', line 82

def text
  @text
end

Instance Method Details

#copy ⇒ Object

Returns a (relatively) deep copy of self

# File 'lib/galaxy/section.rb', line 150

def copy
  secs = @sections ? @sections.map {|s| s.copy} : nil
  Section.new :name=>@name, :header=>@header, :footer=>@footer, :record=>@record, :header_proc=>@header_proc, 
  :footer_proc=>@footer_proc, :record_proc=>@record_proc, :sections=>secs, :skip=>@skip, :mult=>@mult, :text=>@text
end
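What "relatively deep" means here can be shown with a toy structure: sub-sections are copied recursively, while pattern and proc objects are passed to the copy by reference. DemoNode is a hypothetical Struct used only for this illustration and is not part of the library.

```ruby
# Hypothetical miniature of a Section: a name, a pattern and child nodes.
DemoNode = Struct.new(:name, :pattern, :children) do
  # Copy this node and, recursively, its children.
  # The pattern object itself is shared, not duplicated.
  def copy
    kids = children ? children.map { |child| child.copy } : nil
    DemoNode.new(name, pattern, kids)
  end
end

report = DemoNode.new("Report", "^Report", [DemoNode.new("Groups", "^Groups", nil)])
duplicate = report.copy

duplicate.children[0].equal?(report.children[0]) # false: child was copied
duplicate.pattern.equal?(report.pattern)         # true: pattern is shared
```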

#find_text(regex, pos = 0) {|match| ... } ⇒ Object

Safely matches the given regex against @text, starting at position pos. Returns the offset of the start of the match (relative to the whole of @text), or nil if the regex is not found. Yields the match to the given block, if any.

Yields:

  • (match)

# File 'lib/galaxy/section.rb', line 202

def find_text regex, pos=0
  return nil if @text == nil
  return nil if regex == nil
  text = pos == 0 ? @text : @text[pos..-1]
  match = Oniguruma::ORegexp.new(regex).match(text)
  return nil unless match 
  yield match if block_given?
  pos + match.begin # Return initial match offset (corrected for position pos)
end
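The offset correction in find_text can be tried out with Ruby's built-in Regexp in place of Oniguruma. This is a sketch under that assumption. demo_find_text is a hypothetical free function that takes the text as an argument instead of reading @text.

```ruby
# Return the offset, relative to the whole text, of the first match at or
# after pos, or nil when there is nothing to match. Yields the MatchData
# to an optional block.
def demo_find_text(text, regex, pos = 0)
  return nil if text.nil? || regex.nil?
  match = Regexp.new(regex).match(text[pos..-1].to_s)
  return nil unless match
  yield match if block_given?
  pos + match.begin(0) # match offsets are relative to the substring
end

report = "Header\nrow 1\nFooter"
footer_at = demo_find_text(report, "Footer")
second_ab = demo_find_text("ab ab", "ab", 1)
```

Because the search runs on a substring, anchors such as \A refer to position pos rather than to the start of the full text.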

#parse(state = {}) ⇒ Object

Recursively parses the Section and extracts data records

# File 'lib/galaxy/section.rb', line 157

def parse state={}
  state[:section] = @name
  if @mult
    #puts "Mults: #{self.name} #{self.header}"
    # Multisection: Find out if this Section is actually a collection of sections with similar headers
    # If it is, clone an Array of multisections and call parse on each (data extraction happens downstream)
    scan_text(@header) do |match|
      start = match.begin
      finish = -1 unless finish = find_text(@footer, match.end) # Find end of Section (after Header END)
      s = self.copy # Create a copy of Section (to be used as child multisection template)
      s.mult = false
      s.text = @text[start..finish] # Set text property for found multisection
      s.parse state # Recursively call parse on each found multisection
    end
  else
    # Process Section Header, Records and Footer (if any)
    find_text(@header) {|match| @header_proc.call match, state} if @header and @header_proc
    scan_text(@record) {|match| @record_proc.call match, state} if @record and @record_proc
    find_text(@footer) {|match| @footer_proc.call match, state} if @footer and @footer_proc
    
    if @sections
      #puts "Sections: #{self.name} #{self.header}"
      # Process Sections array against @text, skipping empty/skippable Sections, recursively 
      # calling parse on found Sections and moving forward position cursor pos 
      # TODO Generalize for UNORDERED Sections (position cursor should not work in this case)
      finish = 0
      @sections.each_with_index do |s, i|
        next if s.skip #Skip non-data Section
        if start = find_text(s.header, finish)  # Find Section Header
          finish = nil # Needed for last Section (no next section to find)
          @sections[i+1..-1].each do |sn| # Find finish by cycling through next Section Headers
            break if finish = find_text(sn.header, start) # Find first of next Section Header 
          end 
          finish = -1 unless finish # If finish not found, set it to the end of @text
          #Start and finish defined, assign text to this Section and recursively parse it
          s.text = @text[start..finish]
          s.parse state 
        end 
      end 
    end 
  end   
end
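The ordered sub-section pass in parse can be sketched as a standalone function that slices a text into chunks by header position. demo_slice_sections and its [name, header] pairs are hypothetical. Unlike the inclusive range used above, the sketch uses an exclusive range so that each chunk stops just before the next header, and it uses String#index instead of find_text.

```ruby
# Split text into chunks for an ORDERED list of [name, header_regexp] pairs.
# A chunk runs from its own header to the first later header that can be
# found, or to the end of the text.
def demo_slice_sections(text, sections)
  chunks = {}
  cursor = 0
  sections.each_with_index do |(name, header), i|
    start = text.index(header, cursor)
    next unless start # header absent: skip this section
    finish = nil
    sections[i + 1..-1].each do |(_, next_header)|
      break if (finish = text.index(next_header, start))
    end
    finish ||= text.length # last section: run to the end of the text
    chunks[name] = text[start...finish]
    cursor = finish
  end
  chunks
end

text = "Groups\ng1\ng2\nPlanets\np1\nRoutes\nr1\n"
layout = [[:groups, /^Groups/], [:fleets, /^Fleets/],
          [:planets, /^Planets/], [:routes, /^Routes/]]
chunks = demo_slice_sections(text, layout)
```

As the TODO in the source notes, a forward-moving cursor only works when sections appear in the declared order.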

#scan_text(regex, pos = 0) ⇒ Object

Scans @text (starting at position pos) for Data Records matching the given regex pattern. Returns an array of matching Data Records: match objects when a block is given (each match is also yielded to the block), arrays of captured strings otherwise.

# File 'lib/galaxy/section.rb', line 214

def scan_text regex, pos=0
  text = pos == 0 ? @text : @text[pos..-1]
  if block_given?
    # Scan Section for regex matches, yield each match to given block, return array of MATCH objects
    Oniguruma::ORegexp.new(regex).scan(text) {|match| yield match }
  else
    # Scan Section for regex matches, return array of matches converted into string arrays
    results=[]
    Oniguruma::ORegexp.new(regex).scan(text) {|match| results << match[1..-1].to_a }
    results
  end
end
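The two modes of scan_text can be approximated with String#scan from the standard library. This is a sketch, not the Oniguruma-based implementation: demo_scan_text is a hypothetical free function, and in block mode it returns whatever String#scan returns rather than an array of matches.

```ruby
# Scan text (from pos onward) for every match of regex.
# With a block: yield each MatchData. Without: return an array holding
# one array of captured strings per match.
def demo_scan_text(text, regex, pos = 0)
  source = pos == 0 ? text : text[pos..-1]
  pattern = Regexp.new(regex)
  if block_given?
    source.scan(pattern) { yield Regexp.last_match }
  else
    results = []
    source.scan(pattern) { results << Regexp.last_match.captures }
    results
  end
end

rows = demo_scan_text("a=1\nb=2\n", '(\w)=(\d)')
demo_scan_text("a=1\nb=2\n", '(\w)=(\d)') { |m| puts m[0] }
```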