Class: Section
Overview
Section is a piece of text that contains data structured in a certain way (possibly with sub-sections). The Section class describes this data structure and provides methods for parsing the text and extracting data records.
By default, Section calls the new_or_update method on the class derived from the Section name. Example: Section.new(:name => :battle_groups) calls Group.new_or_update(match, state) on each /Battle_groups_record/ match, unless :record_proc is provided to Section. The 'state' hash provides context for multi-section parsing.
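A minimal sketch of the default dispatch described above. The Group class here is a hypothetical stand-in (the real record class is not shown on this page) and the regex is illustrative; only the name derivation follows the constructor source.

```ruby
# Hypothetical stand-in for a record class; it just remembers what it was given.
class Group
  REGISTRY = []

  def self.new_or_update(fields, state)
    REGISTRY << [fields, state[:section]]
  end
end

# Derive the record class name from a Section name:
# :battle_groups -> "Battle_groups" -> "groups" -> "group" -> "Group"
def record_class_name(section_name)
  section_name.to_s.downcase.capitalize.split("_")[-1][0..-2].capitalize
end

klass_name = record_class_name(:battle_groups)
klass = Object.const_get(klass_name) if Object.const_defined?(klass_name)

# Default record proc: pass the captured fields and the shared state hash.
record_proc = lambda { |match, state| klass.new_or_update(match[1..-1], state) }

# Illustrative record pattern (an assumption, not Regexen::Battle_groups_record).
match = /(\w+)\s+(\d+)/.match("Raiders 12")
record_proc.call(match, { :section => "Battle_groups" })
```

Note that the derivation simply drops the last character of the final name component, so it only yields a usable class name for plain plurals ending in "s".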
Constant Summary
Constants included from Regexen
Regexen::Battle_groups_header, Regexen::Battle_groups_record, Regexen::Battle_planets_header, Regexen::Battle_planets_record, Regexen::Bombings_header, Regexen::Bombings_record, Regexen::Default_footer, Regexen::Default_header_proc, Regexen::Fint, Regexen::Fleets_header, Regexen::Fleets_record, Regexen::Fname, Regexen::Fnum, Regexen::Group, Regexen::Groups_header, Regexen::Groups_record, Regexen::Header, Regexen::Incoming_groups_header, Regexen::Incoming_groups_record, Regexen::Int, Regexen::Line, Regexen::Maps_header, Regexen::Name, Regexen::Num, Regexen::Planets_header, Regexen::Planets_record, Regexen::Production_planets_header, Regexen::Production_planets_record, Regexen::Races_header, Regexen::Races_record, Regexen::Reports_header, Regexen::Routes_header, Regexen::Routes_record, Regexen::Scargo, Regexen::Science_products_header, Regexen::Science_products_record, Regexen::Ship_products_header, Regexen::Ship_products_record, Regexen::Sint, Regexen::Sname, Regexen::Snum, Regexen::Sstatus, Regexen::Unidentified_groups_header, Regexen::Unidentified_groups_record, Regexen::Unidentified_planets_header, Regexen::Unidentified_planets_record, Regexen::Uninhabited_planets_header, Regexen::Uninhabited_planets_record, Regexen::Your_groups_header, Regexen::Your_groups_record, Regexen::Your_planets_header, Regexen::Your_planets_record
Instance Attribute Summary
-
#footer ⇒ Object
Regex identifying end of this Section (end of EACH Multisection).
-
#footer_proc ⇒ Object
Proc object to be run on Footer match.
-
#header ⇒ Object
Regex identifying start of this Section (obligatory!).
-
#header_proc ⇒ Object
Proc object to be run on Header match.
-
#mult ⇒ Object
Flag indicating that this is a “Multisection” (several Sections with similar headers one after another).
-
#name ⇒ Object
Name of this Section (also used to auto-generate properties).
-
#record ⇒ Object
Regex matching data Record (or an array of such regexen).
-
#record_proc ⇒ Object
Proc object to be run on each Record match (or an array of Procs).
-
#sections ⇒ Object
(Sub)sections (possibly) contained inside this Section.
-
#skip ⇒ Object
Flag indicating that this Section contains no data (should be skipped).
-
#text ⇒ Object
Source text of this Section (raw material for data extraction).
Instance Method Summary
-
#copy ⇒ Object
Returns (relatively) deep copy of self.
-
#find_text(regex, pos = 0) {|match| ... } ⇒ Object
Safely matches given regex against @text (starting at position pos); returns the initial offset of the match, or nil if the regex is not found; yields the match to the given block (if any).
-
#initialize(args) ⇒ Section
constructor
New Section is created by using the following syntax:
Section.new(:header => header, :footer => footer, :record => [rec1, rec2], :sections => [sec1, sec2, sec3])
Section.new(:name => name) -> extracted as {:header => Name_header, :footer => Name_footer, :record => Name_record}
Section.new(:symbol) -> extracted as {:header => Symbol_header, :footer => Symbol_footer, :record => Symbol_record}
-
#parse(state = {}) ⇒ Object
Recursively parse Section, extract data records.
-
#scan_text(regex, pos = 0) ⇒ Object
Scans @text for Data Records matching given regex pattern, returns array of matching Data Records (as MatchData or String array), yields each found match object to given block (if any).
Constructor Details
#initialize(args) ⇒ Section
New Section is created by using the following syntax (the hash is passed as plain named arguments; a bare {...} after Section.new would be parsed by Ruby as a block):
Section.new(:header => header, :footer => footer, :record => [rec1, rec2], :sections => [sec1, sec2, sec3])
Section.new(:name => name) -> extracted as {:header => Name_header, :footer => Name_footer, :record => Name_record}
Section.new(:symbol) -> extracted as {:header => Symbol_header, :footer => Symbol_footer, :record => Symbol_record}
# File 'lib/galaxy/section.rb', line 97
def initialize args
  case args # Parsing (named) arguments
  when Symbol, String
    # Symbol represents Section name, appropriately named Constants MUST be defined in Regexen module
    @name = args.to_s.downcase.capitalize
  when Hash
    @name = args[:name].to_s.downcase.capitalize if args[:name]
    @text = args[:text]
    @skip = args[:skip]
    @mult = args[:mult]
    @sections = args[:sections]
    # Header/footer/record patterns and appropriate processing Procs can be:
    # 1) given as a Constant name (should be defined in Regexen module),
    # 2) given as a direct value (escaped pattern string literal or Proc, respectively), or
    # 3) not given at all, appropriate values should be inferred from :name argument
    @header = Regexen.const_get(args[:header]) rescue args[:header]
    @footer = Regexen.const_get(args[:footer]) rescue args[:footer]
    @record = Regexen.const_get(args[:record]) rescue args[:record]
    @header_proc = Regexen.const_get(args[:header_proc]) rescue args[:header_proc]
    @footer_proc = Regexen.const_get(args[:footer_proc]) rescue args[:footer_proc]
    @record_proc = Regexen.const_get(args[:record_proc]) rescue args[:record_proc]
  end #case

  # Try to auto-generate Section's Patterns and Procs from @name (if they are not already given)
  # First we try to find Regexen constants derived from name, if not found then we look for defaults
  @header = @header || Regexen.const_get(@name + '_header') rescue if Regexen.const_defined?('Default_header') then Regexen.const_get('Default_header') end
  @footer = @footer || Regexen.const_get(@name + '_footer') rescue if Regexen.const_defined?('Default_footer') then Regexen.const_get('Default_footer') end
  @record = @record || Regexen.const_get(@name + '_record') rescue if Regexen.const_defined?('Default_record') then Regexen.const_get('Default_record') end
  @header_proc = @header_proc || Regexen.const_get(@name + '_header_proc') rescue if Regexen.const_defined?('Default_header_proc') then Regexen.const_get('Default_header_proc') end
  @footer_proc = @footer_proc || Regexen.const_get(@name + '_footer_proc') rescue if Regexen.const_defined?('Default_footer_proc') then Regexen.const_get('Default_footer_proc') end
  @record_proc = @record_proc || Regexen.const_get(@name + '_record_proc') rescue if Regexen.const_defined?('Default_record_proc') then Regexen.const_get('Default_record_proc') end

  # This is a G+ specific piece of code overriding general Section functionality (Default_record_proc)
  # Needed to speed up calculations and avoid class evaluations on each record
  # Class name of the Object (described by Record), e.g "Group"
  if @name and not @record_proc
    klass_name = @name.split("_")[-1][0..-2].capitalize
    if Object.const_defined?(klass_name)
      klass = Object.const_get(klass_name)
      @record_proc ||= lambda do |match, state|
        klass.new_or_update match[1..-1], state
      end
    end
  end
end
Instance Attribute Details
#footer ⇒ Object
Regex identifying end of this Section (end of EACH Multisection)
# File 'lib/galaxy/section.rb', line 84
def footer
  @footer
end
#footer_proc ⇒ Object
Proc object to be run on Footer match
# File 'lib/galaxy/section.rb', line 87
def footer_proc
  @footer_proc
end
#header ⇒ Object
Regex identifying start of this Section (obligatory!)
# File 'lib/galaxy/section.rb', line 83
def header
  @header
end
#header_proc ⇒ Object
Proc object to be run on Header match
# File 'lib/galaxy/section.rb', line 86
def header_proc
  @header_proc
end
#mult ⇒ Object
Flag indicating that this is a “Multisection” (several Sections with similar headers one after another)
# File 'lib/galaxy/section.rb', line 91
def mult
  @mult
end
#name ⇒ Object
Name of this Section (also used to auto-generate properties)
# File 'lib/galaxy/section.rb', line 81
def name
  @name
end
#record ⇒ Object
Regex matching data Record (or an array of such regexen)
# File 'lib/galaxy/section.rb', line 85
def record
  @record
end
#record_proc ⇒ Object
Proc object to be run on each Record match (or an array of Procs)
# File 'lib/galaxy/section.rb', line 88
def record_proc
  @record_proc
end
#sections ⇒ Object
(Sub)sections (possibly) contained inside this Section
# File 'lib/galaxy/section.rb', line 89
def sections
  @sections
end
#skip ⇒ Object
Flag indicating that this Section contains no data (should be skipped)
# File 'lib/galaxy/section.rb', line 90
def skip
  @skip
end
#text ⇒ Object
Source text of this Section (raw material for data extraction)
# File 'lib/galaxy/section.rb', line 82
def text
  @text
end
Instance Method Details
#copy ⇒ Object
Returns (relatively) deep copy of self
# File 'lib/galaxy/section.rb', line 150
def copy
  secs = @sections ? @sections.map {|s| s.copy} : nil
  Section.new :name=>@name, :header=>@header, :footer=>@footer, :record=>@record,
              :header_proc=>@header_proc, :footer_proc=>@footer_proc, :record_proc=>@record_proc,
              :sections=>secs, :skip=>@skip, :mult=>@mult, :text=>@text
end
#find_text(regex, pos = 0) {|match| ... } ⇒ Object
Safely matches given regex against @text (starting at position pos); returns the initial offset of the match, or nil if the regex is not found; yields the match to the given block (if any)
# File 'lib/galaxy/section.rb', line 202
def find_text regex, pos=0
  return nil if @text == nil
  return nil if regex == nil
  text = pos == 0 ? @text : @text[pos..-1]
  match = Oniguruma::ORegexp.new(regex).match(text)
  return nil unless match
  yield match if block_given?
  pos + match.begin # Return initial match offset (corrected for position pos)
end
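The offset correction in find_text can be tried out with a standalone sketch. It assumes Ruby's built-in Regexp instead of Oniguruma::ORegexp and takes the text as an explicit parameter rather than reading @text; the sample text is made up.

```ruby
# Standalone variant of find_text: built-in Regexp, text passed explicitly.
def find_text(text, regex, pos = 0)
  return nil if text.nil? || regex.nil?
  match = Regexp.new(regex).match(text[pos..-1])
  return nil unless match
  yield match if block_given?
  pos + match.begin(0) # offset relative to the start of the full text
end

text = "Header\nalpha 1\nbeta 2\nFooter\n"

first   = find_text(text, /^alpha \d+$/)       # search from the start
second  = find_text(text, /^beta \d+$/, first) # search from an earlier offset
missing = find_text(text, /gamma/)             # no match

captured = nil
find_text(text, /(\w+) (\d+)/, first) { |m| captured = m[1] }
```

Since the search runs on a slice of the text, line anchors such as ^ also match at the slice boundary, so a pos that falls mid-line can produce a match that would not exist in the full text.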
#parse(state = {}) ⇒ Object
Recursively parse Section, extract data records
# File 'lib/galaxy/section.rb', line 157
def parse state={}
  state[:section] = @name
  if @mult
    #puts "Mults: #{self.name} #{self.header}"
    # Multisection: Find out if this Section is actually a collection of sections with similar headers
    # If it is, clone an Array of multisections and call parse on each (data extraction happens downstream)
    scan_text(@header) do |match|
      start = match.begin
      finish = -1 unless finish = find_text(@footer, match.end) # Find end of Section (after Header END)
      s = self.copy # Create a copy of Section (to be used as child multisection template)
      s.mult = false
      s.text = @text[start..finish] # Set text property for found multisection
      s.parse state # Recursively call parse on each found multisection
    end
  else
    # Process Section Header, Records and Footer (if any)
    find_text(@header) {|match| @header_proc.call match, state} if @header and @header_proc
    scan_text(@record) {|match| @record_proc.call match, state} if @record and @record_proc
    find_text(@footer) {|match| @footer_proc.call match, state} if @footer and @footer_proc
    if @sections
      #puts "Sections: #{self.name} #{self.header}"
      # Process Sections array against @text, skipping empty/skippable Sections, recursively
      # calling parse on found Sections and moving forward position cursor pos
      # TODO Generalize for UNORDERED Sections (position cursor should not work in this case)
      finish = 0
      @sections.each_with_index do |s, i|
        next if s.skip #Skip non-data Section
        if start = find_text(s.header, finish) # Find Section Header
          finish = nil # Needed for last Section (no next section to find)
          @sections[i+1..-1].each do |sn| # Find finish by cycling through next Section Headers
            break if finish = find_text(sn.header, start) # Find first of next Section Header
          end
          finish = -1 unless finish # If finish not found, set it to the end of @text
          #Start and finish defined, assign text to this Section and recursively parse it
          s.text = @text[start..finish]
          s.parse state
        end
      end
    end
  end
end
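The cursor logic that parse uses for ordered sub-sections can be sketched on its own. slice_sections is a hypothetical helper, not part of Section; it assumes headers appear in the listed order and, unlike the source (which slices with an inclusive range), it ends each chunk just before the next header.

```ruby
# Split text into chunks, one per header pattern, assuming headers appear in order.
def slice_sections(text, headers)
  chunks = {}
  finish = 0
  names = headers.keys
  names.each_with_index do |name, i|
    start_rel = text[finish..-1] =~ headers[name]
    next unless start_rel # header not found: skip this section
    start = finish + start_rel
    # The chunk ends where the first of the later headers begins.
    finish = nil
    names[(i + 1)..-1].each do |later|
      rel = text[start..-1] =~ headers[later]
      if rel
        finish = start + rel
        break
      end
    end
    finish ||= text.length # last section runs to the end of the text
    chunks[name] = text[start...finish]
  end
  chunks
end

report  = "Planets\nEarth 10\nGroups\nRaiders 3\n"
headers = { :planets => /^Planets$/, :groups => /^Groups$/ }
chunks  = slice_sections(report, headers)
```

As the TODO in the source notes, a forward-moving cursor like this cannot handle sections that appear out of order.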
#scan_text(regex, pos = 0) ⇒ Object
Scans @text for Data Records matching given regex pattern, returns array of matching Data Records (as MatchData or String array), yields each found match object to given block (if any)
# File 'lib/galaxy/section.rb', line 214
def scan_text regex, pos=0
  text = pos == 0 ? @text : @text[pos..-1]
  if block_given?
    # Scan Section for regex matches, yield each match to given block, return array of MATCH objects
    Oniguruma::ORegexp.new(regex).scan(text) {|match| yield match }
  else
    # Scan Section for regex matches, return array of matches converted into string arrays
    results=[]
    Oniguruma::ORegexp.new(regex).scan(text) {|match| results << match[1..-1].to_a }
    results
  end
end
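The two calling modes of scan_text can be illustrated with a standalone sketch. It assumes Ruby's built-in String#scan rather than Oniguruma::ORegexp, takes the text as a parameter, and expects the pattern to contain capture groups; the sample records are made up.

```ruby
# Standalone variant of scan_text built on String#scan.
def scan_text(text, regex, pos = 0)
  source = pos == 0 ? text : text[pos..-1]
  if block_given?
    # Yield a MatchData object for each match.
    source.scan(regex) { yield Regexp.last_match }
  else
    # With capture groups, String#scan returns one array of captures per match.
    source.scan(regex)
  end
end

text    = "Earth 10\nMars 4\n"
pattern = /^(\w+) (\d+)$/

records = scan_text(text, pattern)    # arrays of captured strings
tail    = scan_text(text, pattern, 9) # start after the first line

names = []
scan_text(text, pattern) { |m| names << m[1] }
```

The block form suits record procs that need the full match object, while the array form is convenient for quick inspection of the captured fields.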