Class: Docx::Document

Inherits:
Object
Defined in:
lib/docx/document.rb

Overview

The Document class wraps around a docx file and provides methods to interface with it.

# get a Docx::Document for a docx file in the local directory
doc = Docx::Document.open("test.docx")

# get the text from the document
puts doc.text

# do the same thing in a block
Docx::Document.open("test.docx") do |d|
  puts d.text
end


Constructor Details

#initialize(path, &block) ⇒ Document

Returns a new instance of Document.



# File 'lib/docx/document.rb', line 23

def initialize(path, &block)
  @replace = {}
  @zip = Zip::File.open(path)
  @document_xml = @zip.read('word/document.xml')
  @doc = Nokogiri::XML(@document_xml)
  @styles_xml = @zip.read('word/styles.xml')
  @styles = Nokogiri::XML(@styles_xml)
  if block_given?
    yield self
    @zip.close
  end
end

Instance Attribute Details

#doc ⇒ Object (readonly)

Returns the value of attribute doc.



# File 'lib/docx/document.rb', line 21

def doc
  @doc
end

#styles ⇒ Object (readonly)

Returns the value of attribute styles.



# File 'lib/docx/document.rb', line 21

def styles
  @styles
end

#xml ⇒ Object (readonly)

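Returns the value of attribute xml. The constructor shown above stores the raw document XML in @document_xml and does not assign @xml, so this reader returns nil unless @xml is assigned elsewhere in the class.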



# File 'lib/docx/document.rb', line 21

def xml
  @xml
end

#zip ⇒ Object (readonly)

Returns the value of attribute zip.



# File 'lib/docx/document.rb', line 21

def zip
  @zip
end

Class Method Details

.open(path, &block) ⇒ Object

With no associated block, Docx::Document.open is a synonym for Docx::Document.new. If the optional code block is given, it is passed the opened document as an argument, and the underlying zip file is closed automatically when the block terminates. Because open delegates directly to new, the call returns the Docx::Document instance in both forms rather than the value of the block.

call-seq:

open(filepath) => file
open(filepath) {|file| block } => obj


# File 'lib/docx/document.rb', line 49

def self.open(path, &block)
  self.new(path, &block)
end
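
A minimal sketch of this pattern, using a hypothetical stand-in class (FakeDocument is not part of the library, and no zip file is involved). It illustrates that forwarding to new makes the call evaluate to the new instance whether or not a block is given, and that the resource is closed only in the block form.

```ruby
# Hypothetical stand-in that mirrors the constructor's control flow.
class FakeDocument
  attr_reader :closed

  def initialize(path)
    @path = path
    @closed = false
    if block_given?
      yield self
      @closed = true # stands in for @zip.close
    end
  end

  # Same shape as Docx::Document.open: forward everything to new.
  def self.open(path, &block)
    new(path, &block)
  end
end

plain = FakeDocument.open("test.docx")
with_block = FakeDocument.open("test.docx") { |_d| :value_from_block }

puts plain.closed      # false
puts with_block.class  # FakeDocument
puts with_block.closed # true
```

The symbol returned by the block is discarded; only the side effects of the block are visible to the caller.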

Instance Method Details

#bookmarks ⇒ Object



# File 'lib/docx/document.rb', line 57

def bookmarks
  bkmrks_hsh = Hash.new
  bkmrks_ary = @doc.xpath('//w:bookmarkStart').map { |b_node| parse_bookmark_from b_node }
  # auto-generated by office 2010
  bkmrks_ary.reject! {|b| b.name == "_GoBack" }
  bkmrks_ary.each {|b| bkmrks_hsh[b.name] = b }
  bkmrks_hsh
end
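
The filtering and indexing step can be sketched without a document. Bookmark here is a hypothetical Struct standing in for whatever parse_bookmark_from returns (only a name method is assumed), and index_bookmarks is an illustrative helper, not a library method.

```ruby
# Hypothetical stand-in for parsed bookmark objects.
Bookmark = Struct.new(:name)

def index_bookmarks(bookmarks)
  bookmarks
    .reject { |b| b.name == "_GoBack" } # skip the bookmark Office 2010 auto-generates
    .each_with_object({}) { |b, hash| hash[b.name] = b }
end

parsed = [Bookmark.new("intro"), Bookmark.new("_GoBack"), Bookmark.new("summary")]
by_name = index_bookmarks(parsed)

puts by_name.keys.inspect # ["intro", "summary"]
```

Since the result is a Hash keyed by bookmark name, a caller of the real method would look up a bookmark with an expression such as doc.bookmarks["intro"].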

#document_properties ⇒ Object

Returns the current global document properties. For now this is only the default font size.



# File 'lib/docx/document.rb', line 38

def document_properties
  {
    font_size: font_size
  }
end

#each_paragraph ⇒ Object

Deprecated

Iterates over the paragraphs within the document.

call-seq:

each_paragraph => Enumerator


# File 'lib/docx/document.rb', line 83

def each_paragraph
  paragraphs.each { |p| yield(p) }
end
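
As listed, the method yields unconditionally, so it expects a block. A common Ruby idiom for also supporting the blockless Enumerator form named in the call-seq is enum_for. The sketch below shows that idiom on a hypothetical stand-in class (ParagraphList, with plain strings as paragraphs); it is not the library's code.

```ruby
# Hypothetical stand-in; paragraphs are plain strings here.
class ParagraphList
  attr_reader :paragraphs

  def initialize(paragraphs)
    @paragraphs = paragraphs
  end

  def each_paragraph
    # Without a block, hand back an Enumerator instead of yielding.
    return enum_for(:each_paragraph) unless block_given?

    paragraphs.each { |p| yield(p) }
  end
end

list = ParagraphList.new(["first", "second"])
list.each_paragraph { |p| puts p }              # prints "first" then "second"
puts list.each_paragraph.map(&:upcase).inspect  # ["FIRST", "SECOND"]
```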

#font_size ⇒ Object

Returns the document's default font size in points, or nil when the styles do not define one. The w:sz value is stored in half-points, so it is divided by 2.



# File 'lib/docx/document.rb', line 72

def font_size
  size_tag = @styles.xpath('//w:docDefaults//w:rPrDefault//w:rPr//w:sz').first
  size_tag ? size_tag.attributes['val'].value.to_i / 2 : nil
end
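
One consequence of the half-point conversion is worth spelling out: the expression uses Integer division, so an odd half-point value loses its half point. The helper names below are hypothetical and exist only to show the arithmetic; the second one is an illustrative variant, not what the library does.

```ruby
# Mirrors the expression in font_size: Integer division by 2.
def half_points_to_points(val)
  val.to_i / 2
end

# Variant for illustration only: Float division keeps the half point.
def half_points_to_points_exact(val)
  val.to_i / 2.0
end

puts half_points_to_points("24")       # 12
puts half_points_to_points("21")       # 10
puts half_points_to_points_exact("21") # 10.5
```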

#paragraphs ⇒ Object



# File 'lib/docx/document.rb', line 53

def paragraphs
  @doc.xpath('//w:document//w:body//w:p').map { |p_node| parse_paragraph_from p_node }
end

#replace_entry(entry_path, file_contents) ⇒ Object



# File 'lib/docx/document.rb', line 119

def replace_entry(entry_path, file_contents)
  @replace[entry_path] = file_contents
end

#save(path) ⇒ Object

Saves the document to the provided path.

call-seq:

save(filepath) => void


# File 'lib/docx/document.rb', line 101

def save(path)
  update
  Zip::OutputStream.open(path) do |out|
    zip.each do |entry|
      out.put_next_entry(entry.name)

      if @replace[entry.name]
        out.write(@replace[entry.name])
      else
        out.write(zip.read(entry.name))
      end
    end
  end
  zip.close
end
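
The loop in save copies every entry of the original archive into the new file, substituting any contents registered through replace_entry. A minimal sketch of that selection logic follows, with plain Hashes standing in for the zip archive and for @replace. The rebuild_archive helper is hypothetical and not part of the library.

```ruby
# Hypothetical in-memory stand-in: both arguments map entry name => contents.
def rebuild_archive(entries, replacements)
  entries.each_with_object({}) do |(name, contents), out|
    # Prefer queued replacement contents; otherwise copy the entry through.
    out[name] = replacements[name] || contents
  end
end

original = {
  "word/document.xml" => "<old/>",
  "word/styles.xml"   => "<styles/>"
}
queued = { "word/document.xml" => "<new/>" }

result = rebuild_archive(original, queued)
puts result["word/document.xml"] # <new/>
puts result["word/styles.xml"]   # <styles/>
```

Only entries already present in the original archive are written, so in this scheme a replacement queued under a name that does not exist in the archive has no effect.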

#tables ⇒ Object



# File 'lib/docx/document.rb', line 66

def tables
  @doc.xpath('//w:document//w:body//w:tbl').map { |t_node| parse_table_from t_node }
end

#to_html ⇒ Object

Output entire document as a String HTML fragment



# File 'lib/docx/document.rb', line 94

def to_html
  paragraphs.map(&:to_html).join('\n')
end
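
Note that the separator in this listing is written with single quotes, whereas to_s uses double quotes. In Ruby, '\n' is a two-character string (a backslash followed by n), not a newline, so the HTML fragments are joined with those literal characters. The sketch below shows the difference on plain strings; the helper names are hypothetical.

```ruby
# Single quotes: the separator is a backslash followed by "n".
def join_single_quoted(parts)
  parts.join('\n')
end

# Double quotes: the separator is a real newline character.
def join_double_quoted(parts)
  parts.join("\n")
end

fragments = ["<p>one</p>", "<p>two</p>"]

puts join_single_quoted(fragments).length         # 22
puts join_double_quoted(fragments).length         # 21
puts join_single_quoted(fragments).include?("\n") # false
```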

#to_s ⇒ Object Also known as: text

call-seq:

to_s -> string


# File 'lib/docx/document.rb', line 89

def to_s
  paragraphs.map(&:to_s).join("\n")
end