Class: WordToMarkdown::Document
- Inherits: Object
- Inheritance hierarchy: Object, WordToMarkdown::Document
- Defined in: lib/word-to-markdown/document.rb
Defined Under Namespace
Classes: ConversionError, NotFoundError
Instance Attribute Summary
- #path ⇒ Object (readonly): Returns the value of attribute path.
- #tmpdir ⇒ Object (readonly): Returns the value of attribute tmpdir.
Instance Method Summary
- #encoding ⇒ String: Determine the document encoding.
- #extension ⇒ String: The document’s extension.
- #html ⇒ String: The HTML representation of the document.
- #initialize(path, tmpdir = nil) ⇒ Document (constructor): A new instance of Document.
- #markdown ⇒ String (also: #to_s): The Markdown representation of the document.
- #tree ⇒ Nokogiri::Document
Constructor Details
#initialize(path, tmpdir = nil) ⇒ Document
Returns a new instance of Document.
# File 'lib/word-to-markdown/document.rb', line 13
def initialize(path, tmpdir = nil)
  @path = File.expand_path path, Dir.pwd
  @tmpdir = tmpdir || Dir.mktmpdir
  raise NotFoundError, "File #{@path} does not exist" unless File.exist?(@path)
end
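The constructor's behaviour can be sketched without the gem installed. The snippet below is a minimal stand-in, not the library's class: `SketchDocument` and its nested `NotFoundError` are hypothetical names, and it assumes the path is resolved with `File.expand_path` against the current working directory.

```ruby
require 'tmpdir'
require 'tempfile'

# Hypothetical stand-in that mirrors the constructor logic documented above.
class SketchDocument
  class NotFoundError < StandardError; end

  attr_reader :path, :tmpdir

  def initialize(path, tmpdir = nil)
    # Assumption: relative paths are resolved against the working directory.
    @path = File.expand_path(path, Dir.pwd)
    # A temporary directory is created only when the caller does not pass one.
    @tmpdir = tmpdir || Dir.mktmpdir
    raise NotFoundError, "File #{@path} does not exist" unless File.exist?(@path)
  end
end

# An existing file is accepted and its path is stored in absolute form.
file = Tempfile.new(['sketch', '.docx'])
existing = SketchDocument.new(file.path, Dir.tmpdir)

# A missing file raises during construction, before any conversion work.
missing_error = begin
  SketchDocument.new('no-such-file.docx', Dir.tmpdir)
  nil
rescue SketchDocument::NotFoundError => e
  e
end
```

Passing an explicit `tmpdir` avoids creating a new temporary directory for every instance, which may be useful when converting many files in one process.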
Instance Attribute Details
#path ⇒ Object (readonly)
Returns the value of attribute path.
# File 'lib/word-to-markdown/document.rb', line 9
def path
  @path
end
#tmpdir ⇒ Object (readonly)
Returns the value of attribute tmpdir.
# File 'lib/word-to-markdown/document.rb', line 9
def tmpdir
  @tmpdir
end
Instance Method Details
#encoding ⇒ String
Determine the document encoding.
# File 'lib/word-to-markdown/document.rb', line 47
def encoding
  match = raw_html.encode('UTF-8', invalid: :replace, replace: '').match(/charset=([^"]+)/)
  if match
    match[1].sub('macintosh', 'MacRoman')
  else
    'UTF-8'
  end
end
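The charset lookup can be tried on plain strings. This is a minimal sketch: `detect_encoding` is a hypothetical helper name, the sample `<meta>` tags are made up for illustration, and the invalid-byte scrubbing step from the method above is left out for brevity.

```ruby
# Hypothetical helper: applies the same charset regex to a raw HTML string.
def detect_encoding(raw_html)
  match = raw_html.match(/charset=([^"]+)/)
  # As in the documented method, the 'macintosh' label is mapped to 'MacRoman';
  # without a charset declaration the result falls back to 'UTF-8'.
  match ? match[1].sub('macintosh', 'MacRoman') : 'UTF-8'
end

mac_html   = '<meta http-equiv="Content-Type" content="text/html; charset=macintosh">'
win_html   = '<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">'
plain_html = '<p>No charset declared</p>'

puts detect_encoding(mac_html)    # MacRoman
puts detect_encoding(win_html)    # windows-1252
puts detect_encoding(plain_html)  # UTF-8
```

The capture group `[^"]+` stops at the closing quote of the `content` attribute, so only the charset label itself is returned.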
#extension ⇒ String
Returns the document’s extension.
# File 'lib/word-to-markdown/document.rb', line 20
def extension
  File.extname path
end
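Because the method delegates to `File.extname`, the returned string includes the leading dot and covers only the last extension. A short illustration with made-up file names:

```ruby
# File.extname returns the final extension, including the dot.
docx_ext  = File.extname('/docs/report.docx')
multi_ext = File.extname('archive.tar.gz')
# A name without an extension yields an empty string.
no_ext    = File.extname('README')

puts docx_ext        # .docx
puts multi_ext       # .gz
puts no_ext.inspect  # ""
```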
#html ⇒ String
Returns the HTML representation of the document.
# File 'lib/word-to-markdown/document.rb', line 34
def html
  tree.to_html.gsub("</li>\n", '</li>')
end
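The `gsub` call removes the newline that follows each closing `</li>` tag. Its effect can be seen on a plain string; the sample markup below is made up and stands in for the output of `tree.to_html`.

```ruby
# Sample serialized HTML with one list item per line (illustrative input).
serialized = "<ul>\n<li>First</li>\n<li>Second</li>\n</ul>"

# Same substitution as in the method above: drop the newline after </li>.
joined = serialized.gsub("</li>\n", '</li>')

puts joined.inspect  # "<ul>\n<li>First</li><li>Second</li></ul>"
```

Only newlines directly after `</li>` are affected; the newline after the opening `<ul>` stays in place.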
#markdown ⇒ String Also known as: to_s
Returns the Markdown representation of the document.
# File 'lib/word-to-markdown/document.rb', line 39
def markdown
  @markdown ||= scrub_whitespace(ReverseMarkdown.convert(html, WordToMarkdown::REVERSE_MARKDOWN_OPTIONS))
end
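The `@markdown ||=` assignment memoizes the result, so repeated calls to `#markdown` or its alias `#to_s` do not repeat the conversion. The sketch below shows that pattern in isolation: `MemoSketch` and its `convert` method are hypothetical, and `convert` is a trivial placeholder rather than a real HTML-to-Markdown conversion.

```ruby
# Hypothetical class demonstrating the memoize-and-alias pattern.
class MemoSketch
  attr_reader :conversions

  def initialize(html)
    @html = html
    @conversions = 0
  end

  def markdown
    # ||= evaluates the right-hand side only while @markdown is nil or false.
    @markdown ||= convert(@html)
  end
  alias to_s markdown

  private

  # Placeholder conversion that strips <p> tags and counts its invocations.
  def convert(html)
    @conversions += 1
    html.gsub(%r{</?p>}, '').strip
  end
end

doc = MemoSketch.new('<p>Hello</p>')
doc.markdown
doc.to_s

puts doc.conversions  # 1
puts doc.to_s         # Hello
```

One consequence of this pattern is that the cached value is never refreshed for a given instance, so a changed source file would need a new object.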
#tree ⇒ Nokogiri::Document
# File 'lib/word-to-markdown/document.rb', line 25
def tree
  @tree ||= begin
    tree = Nokogiri::HTML(normalized_html)
    tree.css('title').remove
    tree
  end
end