Class: Mizuho::Parser

Inherits:
Object
Defined in:
lib/mizuho/parser.rb

Overview

Parses the raw XHTML output generated by Asciidoc and extracts the document title, the raw contents (with the Asciidoc layout stripped), the table of contents, and the individual chapters.

Constructor Details

#initialize(filename) ⇒ Parser

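Parses the Asciidoc-generated XHTML file at filename: reads the file, extracts the title, strips the Asciidoc layout and footer, and builds the table of contents.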



# File 'lib/mizuho/parser.rb', line 21

def initialize(filename)
	@filename = filename
	
	@contents = File.read(filename)
	
	# Extract the title.
	@contents =~ %r{<title>(.*?)</title>}
	@title = CGI::unescapeHTML($1)
	
	# Get rid of the Asciidoc layout and unwanted elements.
	if !@contents.sub!(/\A.*?(<div id="preamble">)/m, '\1')
		# There's no preamble, so strip everything until the
		# end of the TOC div.
		@contents.sub!(%r(\A.*?</noscript>[\r\n\s]*</div>[\r\n\s]*</div>)m, '')
	end
	@contents.sub!(/<div id="footer">.*/m, '')
	@contents.gsub!(%r{<div style="clear:left"></div>}, '')
	
	# Extract table of contents.
	parse_table_of_contents(@contents)
end
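
The title extraction and layout stripping above can be tried on an in-memory string. This is a minimal sketch, not part of Mizuho: the helper name extract_title_and_body and the sample markup are hypothetical, and real Asciidoc output is considerably larger. Unlike the constructor, the sketch returns nil for the title when no <title> element is found.

```ruby
require 'cgi'

# Hypothetical sample input standing in for Asciidoc's XHTML output.
sample_xhtml = <<~HTML
  <html>
  <head><title>Users &amp; Guides</title></head>
  <body>
  <div id="header"><h1>Users &amp; Guides</h1></div>
  <div id="preamble"><p>Intro text.</p></div>
  <div style="clear:left"></div>
  <h2>Chapter 1</h2>
  <div id="footer">Last updated 2009</div>
  </body>
  </html>
HTML

# Hypothetical helper that mirrors the constructor's steps on a string.
def extract_title_and_body(xhtml)
  contents = xhtml.dup

  # Extract the title and decode HTML entities such as &amp;.
  match = contents.match(%r{<title>(.*?)</title>})
  title = match ? CGI.unescapeHTML(match[1]) : nil

  # Drop everything before the preamble div, keeping the div itself.
  unless contents.sub!(/\A.*?(<div id="preamble">)/m, '\1')
    # No preamble: strip everything up to the end of the TOC div.
    contents.sub!(%r{\A.*?</noscript>\s*</div>\s*</div>}m, '')
  end

  # Drop the footer and everything after it, then the clearing divs.
  contents.sub!(/<div id="footer">.*/m, '')
  contents.gsub!(%r{<div style="clear:left"></div>}, '')

  [title, contents]
end

title, body = extract_title_and_body(sample_xhtml)
puts title
puts body
```

The first sub! call returns nil when nothing was replaced, which is why it can double as the "is there a preamble?" test.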

Instance Attribute Details

#contents ⇒ Object (readonly)

The document’s raw contents, without any layout.



# File 'lib/mizuho/parser.rb', line 18

def contents
  @contents
end

#filename ⇒ Object (readonly)

The filename that was passed to the constructor.



# File 'lib/mizuho/parser.rb', line 10

def filename
  @filename
end

#table_of_contents ⇒ Object (readonly)

The document’s table of contents, represented in a tree structure by Heading objects.



# File 'lib/mizuho/parser.rb', line 16

def table_of_contents
  @table_of_contents
end
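
To illustrate what a tree of headings might look like, here is a rough sketch. SketchHeading and its fields (title, level, children) are assumptions made for this example; Mizuho's actual Heading class is not shown on this page and may be structured differently.

```ruby
# Hypothetical node type; Mizuho's real Heading class may differ.
SketchHeading = Struct.new(:title, :level, :children) do
  # Yields each heading in document order together with its depth.
  def each_with_depth(depth = 0, &block)
    block.call(self, depth)
    children.each { |child| child.each_with_depth(depth + 1, &block) }
  end
end

root = SketchHeading.new("Guide", 0, [
  SketchHeading.new("Installation", 1, [
    SketchHeading.new("From source", 2, [])
  ]),
  SketchHeading.new("Usage", 1, [])
])

# Build an indented outline by walking the tree depth-first.
outline = []
root.each_with_depth { |heading, depth| outline << ("  " * depth) + heading.title }
puts outline
```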

#title ⇒ Object (readonly)

The document’s title.



# File 'lib/mizuho/parser.rb', line 13

def title
  @title
end

Instance Method Details

#chapters ⇒ Object

Returns the individual chapters as an array of Chapter objects. The first Chapter object represents the preamble.



# File 'lib/mizuho/parser.rb', line 45

def chapters
	@chapters ||= parse_chapters(@contents)
end
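
The ||= in the method above means the chapters are parsed on the first call and the cached array is returned afterwards. The sketch below shows that pattern in isolation; MemoizedSplitter and split_chunks are hypothetical stand-ins, and splitting before each <h2> is only an assumed placeholder for whatever parse_chapters actually does.

```ruby
# Hypothetical stand-in class; not Mizuho's actual Parser.
class MemoizedSplitter
  attr_reader :parse_count

  def initialize(contents)
    @contents = contents
    @parse_count = 0
  end

  # Computes the chunks on the first call and caches them in @chunks.
  def chunks
    @chunks ||= split_chunks(@contents)
  end

  private

  # Hypothetical parsing step: splits before each <h2> heading, so the
  # first chunk holds whatever precedes the first heading (the preamble).
  def split_chunks(contents)
    @parse_count += 1
    contents.split(/(?=<h2>)/)
  end
end

splitter = MemoizedSplitter.new("<p>Preamble</p><h2>One</h2><h2>Two</h2>")
first = splitter.chunks
second = splitter.chunks
puts first.length
puts first.equal?(second)
puts splitter.parse_count
```

One caveat of this idiom: if the parsing step ever returned nil or false, ||= would rerun it on every call.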