Class: WordToMarkdown::Document

Inherits:
Object
Defined in:
lib/word-to-markdown/document.rb

Defined Under Namespace

Classes: ConversionError, NotFoundError

Constructor Details

#initialize(path, tmpdir = nil) ⇒ Document

Returns a new instance of Document.

Parameters:

  • path (String)

    Path to the Word document

  • tmpdir (String) (defaults to: nil)

    Path to a working directory to use; when nil, a new temporary directory is created

Raises:

  • (NotFoundError)

    if no file exists at the expanded path

# File 'lib/word-to-markdown/document.rb', line 13

def initialize(path, tmpdir = nil)
  @path = File.expand_path path, Dir.pwd
  @tmpdir = tmpdir || Dir.mktmpdir
  raise NotFoundError, "File #{@path} does not exist" unless File.exist?(@path)
end
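
The constructor's two steps, path expansion and the existence check, can be tried out with the standard library alone. The `SketchDocument` class and its nested `NotFoundError` below are hypothetical stand-ins that only mirror the listing above; they are not the gem's own classes.

```ruby
require 'tmpdir'

# Hypothetical stand-in mirroring the constructor listing above;
# it is not the gem's WordToMarkdown::Document.
class SketchDocument
  class NotFoundError < StandardError; end

  attr_reader :path, :tmpdir

  def initialize(path, tmpdir = nil)
    # Relative paths are resolved against the current working directory.
    @path = File.expand_path(path, Dir.pwd)
    # A fresh temporary directory is created only when none is supplied.
    @tmpdir = tmpdir || Dir.mktmpdir
    raise NotFoundError, "File #{@path} does not exist" unless File.exist?(@path)
  end
end

begin
  # Passing an existing directory as tmpdir avoids creating a new one.
  SketchDocument.new('no-such-file.docx', Dir.tmpdir)
rescue SketchDocument::NotFoundError => e
  puts e.message
end
```

Because the working directory is assigned before the existence check, a call that omits `tmpdir` creates a temporary directory even when the check then raises.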

Instance Attribute Details

#path ⇒ Object (readonly)

Returns the value of attribute path.



# File 'lib/word-to-markdown/document.rb', line 9

def path
  @path
end

#tmpdir ⇒ Object (readonly)

Returns the value of attribute tmpdir.



# File 'lib/word-to-markdown/document.rb', line 9

def tmpdir
  @tmpdir
end

Instance Method Details

#encoding ⇒ String

Determines the document encoding from the charset declared in the raw HTML.

Returns:

  • (String)

    the encoding, defaulting to “UTF-8”



# File 'lib/word-to-markdown/document.rb', line 47

def encoding
  match = raw_html.encode('UTF-8', invalid: :replace, replace: '').match(/charset=([^"]+)/)
  if match
    match[1].sub('macintosh', 'MacRoman')
  else
    'UTF-8'
  end
end
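
The charset lookup can be exercised in isolation. This is a minimal sketch: `detect_charset` is a hypothetical helper that takes the HTML as an argument, whereas the real method reads the document's own `raw_html`; the sample strings are made up for illustration.

```ruby
# Hypothetical helper mirroring the #encoding listing above.
def detect_charset(raw_html)
  # Same transcoding call as in the listing.
  clean = raw_html.encode('UTF-8', invalid: :replace, replace: '')
  # Capture everything after "charset=" up to the next double quote.
  match = clean.match(/charset=([^"]+)/)
  # Map the "macintosh" label to "MacRoman", as the listing does.
  match ? match[1].sub('macintosh', 'MacRoman') : 'UTF-8'
end

puts detect_charset('<meta content="text/html; charset=macintosh">')    # MacRoman
puts detect_charset('<meta content="text/html; charset=windows-1252">') # windows-1252
puts detect_charset('<p>no charset declared</p>')                       # UTF-8
```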

#extension ⇒ String

Returns the document’s extension.

Returns:

  • (String)

    the document’s extension



# File 'lib/word-to-markdown/document.rb', line 20

def extension
  File.extname path
end
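
Since the method delegates to `File.extname`, the result includes the leading dot and is empty when the file name has no extension. A short sketch with made-up paths, where `extension_of` is a hypothetical wrapper used only for illustration:

```ruby
# Hypothetical wrapper; the real method calls File.extname on the document's path.
def extension_of(path)
  File.extname(path)
end

puts extension_of('/tmp/report.docx')       # .docx
puts extension_of('/tmp/archive.tar.gz')    # .gz (only the last extension)
puts extension_of('/tmp/README').inspect    # ""
```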

#html ⇒ String

Returns the HTML representation of the document.

Returns:

  • (String)

    the HTML representation of the document



# File 'lib/word-to-markdown/document.rb', line 34

def html
  tree.to_html.gsub("</li>\n", '</li>')
end
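
The `gsub` call removes the newline that follows each closing list-item tag, presumably so that list items sit directly next to each other before the Markdown conversion. A minimal sketch of its effect, using a hypothetical sample string in place of `tree.to_html`:

```ruby
# Hypothetical serialized HTML standing in for tree.to_html.
serialized = "<ul>\n<li>one</li>\n<li>two</li>\n</ul>"

# Only newlines that directly follow a closing </li> tag are removed.
compact = serialized.gsub("</li>\n", '</li>')

puts compact.inspect  # "<ul>\n<li>one</li><li>two</li></ul>"
```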

#markdown ⇒ String Also known as: to_s

Returns the Markdown representation of the document.

Returns:

  • (String)

    the Markdown representation of the document



# File 'lib/word-to-markdown/document.rb', line 39

def markdown
  @markdown ||= scrub_whitespace(ReverseMarkdown.convert(html, WordToMarkdown::REVERSE_MARKDOWN_OPTIONS))
end

#tree ⇒ Nokogiri::HTML::Document

Returns the parsed HTML tree with the title element removed.

Returns:

  • (Nokogiri::HTML::Document)


# File 'lib/word-to-markdown/document.rb', line 25

def tree
  @tree ||= begin
    tree = Nokogiri::HTML(normalized_html)
    tree.css('title').remove
    tree
  end
end