Class: Docx::Document
- Inherits: Object
- Includes: SimpleInspect
- Defined in: lib/docx/document.rb
Overview
The Document class wraps around a docx file and provides methods to interface with it.
```ruby
# get a Docx::Document for a docx file in the local directory
doc = Docx::Document.open("test.docx")

# get the text from the document
puts doc.text

# do the same thing in a block
Docx::Document.open("test.docx") do |d|
  puts d.text
end
```
Instance Attribute Summary
- #doc ⇒ Object (readonly)
  Returns the value of attribute doc.
- #styles ⇒ Object (readonly)
  Returns the value of attribute styles.
- #xml ⇒ Object (readonly)
  Returns the value of attribute xml.
- #zip ⇒ Object (readonly)
  Returns the value of attribute zip.
Class Method Summary
- .open(path, &block) ⇒ Object
  With no associated block, Docx::Document.open is a synonym for Docx::Document.new.
Instance Method Summary
- #bookmarks ⇒ Object
- #default_paragraph_style ⇒ Object
- #document_properties ⇒ Object
  Returns the current global document properties (for now, font size and hyperlinks) as a hash.
- #each_paragraph ⇒ Object
  Deprecated.
- #font_size ⇒ Object
  Returns the document's default font size in points, or nil if the document does not set one.
- #hyperlink_relationships ⇒ Object
- #hyperlinks ⇒ Object
  Hyperlink targets are extracted from the document.xml.rels file.
- #initialize(path_or_io, options = {}) ⇒ Document (constructor)
  Returns a new instance of Document.
- #paragraphs ⇒ Object
- #replace_entry(entry_path, file_contents) ⇒ Object
- #save(path) ⇒ Object
  Saves the document to the provided path.
- #stream ⇒ Object
  Outputs the entire document as a StringIO object.
- #style_name_of(style_id) ⇒ Object
- #styles_configuration ⇒ Object
- #tables ⇒ Object
- #to_html ⇒ Object
  Outputs the entire document as a String HTML fragment.
- #to_s ⇒ Object (also: #text)
  Returns the plain text of the document, one paragraph per line.
- #to_xml ⇒ Object
Methods included from SimpleInspect
Constructor Details
#initialize(path_or_io, options = {}) ⇒ Document
Returns a new instance of Document.
```ruby
# File 'lib/docx/document.rb', line 27
def initialize(path_or_io, options = {})
  @replace = {}

  # if path_or_io is a string && does not contain a null byte
  if (path_or_io.instance_of?(String) && !/\u0000/.match?(path_or_io))
    raise Errno::EIO.new('Invalid file format') if !File.extname(path_or_io).eql?('.docx')
    @zip = Zip::File.open(path_or_io)
  else
    @zip = Zip::File.open_buffer(path_or_io)
  end

  document = @zip.glob('word/document*.xml').first
  raise Errno::ENOENT if document.nil?

  @document_xml = document.get_input_stream.read
  @doc = Nokogiri::XML(@document_xml)
  load_styles
  yield(self) if block_given?
ensure
  @zip.close unless @zip.nil?
end
```
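The constructor chooses between `Zip::File.open` (treat the argument as a path) and `Zip::File.open_buffer` (treat it as in-memory data or an IO) by checking for a String that contains no null byte. The plain-Ruby sketch below mirrors only that decision; `source_kind` is a hypothetical helper, not part of the gem, and it returns a symbol instead of calling rubyzip so it runs on its own.

```ruby
# Hypothetical helper mirroring the path-vs-buffer check in Document#initialize.
# It returns a symbol instead of opening a zip, so it runs without rubyzip.
def source_kind(path_or_io)
  # A String with no null byte is treated as a filesystem path...
  if path_or_io.instance_of?(String) && !/\u0000/.match?(path_or_io)
    # ...which, as in the constructor, must have a .docx extension.
    raise Errno::EIO.new('Invalid file format') unless File.extname(path_or_io) == '.docx'

    :path
  else
    # Anything else (an IO, or a String of raw bytes that includes a null byte)
    # would be handed to Zip::File.open_buffer instead.
    :buffer
  end
end

p source_kind('report.docx')          # => :path
p source_kind("PK\u0003\u0004\u0000") # => :buffer
```

Since a zip archive's binary headers practically always contain null bytes, the raw contents of a .docx read into a String should take the buffer branch rather than being mistaken for a path.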
Instance Attribute Details
#doc ⇒ Object (readonly)
Returns the Nokogiri::XML document parsed from the archive's word/document*.xml entry.
```ruby
# File 'lib/docx/document.rb', line 25
def doc
  @doc
end
```
#styles ⇒ Object (readonly)
Returns the value of attribute styles.
```ruby
# File 'lib/docx/document.rb', line 25
def styles
  @styles
end
```
#xml ⇒ Object (readonly)
Returns the value of attribute xml.
```ruby
# File 'lib/docx/document.rb', line 25
def xml
  @xml
end
```
#zip ⇒ Object (readonly)
Returns the underlying Zip::File for the .docx archive.
```ruby
# File 'lib/docx/document.rb', line 25
def zip
  @zip
end
```
Class Method Details
.open(path, &block) ⇒ Object
With no associated block, Docx::Document.open is a synonym for Docx::Document.new. If the optional code block is given, it is passed the opened Docx::Document as an argument, and the underlying zip file is closed when the block terminates. Because open delegates to new, it returns the Docx::Document instance in both forms; the value of the block is not returned. call-seq:
open(filepath) => document
open(filepath) {|document| block } => document
```ruby
# File 'lib/docx/document.rb', line 61
def self.open(path, &block)
  new(path, &block)
end
```
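Because `open` delegates to `new`, and `Class#new` always returns the newly allocated object whatever `initialize` or the block evaluates to, a call with a block hands back the document itself. The toy class below is hypothetical and unrelated to the gem; it copies the same `open`/`new` delegation and the constructor's `ensure` cleanup so the return value can be checked without a .docx file.

```ruby
# Toy class (hypothetical) copying the pattern used by Docx::Document.open.
class ToyDocument
  attr_reader :closed

  def self.open(path, &block)
    new(path, &block) # same delegation as Docx::Document.open
  end

  def initialize(path)
    @path = path
    @closed = false
    yield(self) if block_given?
  ensure
    # Mirrors the constructor's `ensure @zip.close`: runs with or without a block.
    @closed = true
  end
end

result = ToyDocument.open('test.docx') { |_doc| :value_from_block }
p result.class  # => ToyDocument (the block's value is discarded)
p result.closed # => true
```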
Instance Method Details
#bookmarks ⇒ Object
```ruby
# File 'lib/docx/document.rb', line 69
def bookmarks
  bkmrks_hsh = {}
  bkmrks_ary = @doc.xpath('//w:bookmarkStart').map { |b_node| parse_bookmark_from b_node }
  # auto-generated by office 2010
  bkmrks_ary.reject! { |b| b.name == '_GoBack' }
  bkmrks_ary.each { |b| bkmrks_hsh[b.name] = b }
  bkmrks_hsh
end
```
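`#bookmarks` drops the `_GoBack` bookmark that Office 2010 inserts automatically and keys the remaining bookmarks by name, so if two bookmarks share a name, the later one wins. The sketch below reproduces only that filtering and indexing; the `Bookmark` struct is a hypothetical stand-in for the gem's bookmark objects, assumed only to respond to `name`.

```ruby
# Hypothetical stand-in for the gem's bookmark element; only #name matters here.
Bookmark = Struct.new(:name, :position)

def index_bookmarks(bookmarks)
  # Drop the bookmark that Office 2010 inserts automatically.
  kept = bookmarks.reject { |b| b.name == '_GoBack' }
  # Key by name; a later bookmark with the same name replaces an earlier one.
  kept.each_with_object({}) { |b, hash| hash[b.name] = b }
end

marks = [Bookmark.new('intro', 1), Bookmark.new('_GoBack', 2), Bookmark.new('intro', 3)]
index = index_bookmarks(marks)
p index.keys              # => ["intro"]
p index['intro'].position # => 3
```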
#default_paragraph_style ⇒ Object
```ruby
# File 'lib/docx/document.rb', line 174
def default_paragraph_style
  @styles.at_xpath("w:styles/w:style[@w:type='paragraph' and @w:default='1']/w:name/@w:val").value
end
```
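The XPath selects the `w:name` value of the paragraph style flagged `w:default="1"` in styles.xml. Since `at_xpath` returns `nil` when no node matches, the `.value` call raises `NoMethodError` for a styles part without such a style. The sketch below performs the same lookup with Ruby's REXML instead of Nokogiri (a swapped-in parser, used only to keep the example dependency-free) and returns `nil` in that case; the sample XML is invented.

```ruby
require 'rexml/document'

# Invented, minimal styles.xml content for illustration only.
STYLES_XML = <<~XML
  <w:styles xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
    <w:style w:type="character" w:default="1" w:styleId="DefaultParagraphFont">
      <w:name w:val="Default Paragraph Font"/>
    </w:style>
    <w:style w:type="paragraph" w:default="1" w:styleId="Normal">
      <w:name w:val="Normal"/>
    </w:style>
  </w:styles>
XML

# Element children only (skips whitespace text nodes).
def child_elements(element)
  element.children.grep(REXML::Element)
end

# Name of the paragraph style flagged w:default="1", or nil if there is none.
def default_paragraph_style_name(xml)
  root = REXML::Document.new(xml).root
  style = child_elements(root).find do |el|
    el.name == 'style' &&
      el.attributes['w:type'] == 'paragraph' &&
      el.attributes['w:default'] == '1'
  end
  return nil unless style

  name_el = child_elements(style).find { |el| el.name == 'name' }
  name_el && name_el.attributes['w:val']
end

p default_paragraph_style_name(STYLES_XML) # => "Normal"
```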
#document_properties ⇒ Object
Returns the current global document properties as a hash; for now, these are the font size and hyperlinks.
```ruby
# File 'lib/docx/document.rb', line 50
def document_properties
  {
    font_size: font_size,
    hyperlinks: hyperlinks
  }
end
```
#each_paragraph ⇒ Object
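Deprecated.
Iterates over the paragraphs within the document, yielding each one to the given block. The source below yields unconditionally, so a block is required; called without one, it raises LocalJumpError as soon as there is a paragraph to yield, rather than returning an Enumerator. call-seq:
each_paragraph {|paragraph| block }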
```ruby
# File 'lib/docx/document.rb', line 113
def each_paragraph
  paragraphs.each { |p| yield(p) }
end
```
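As written, `each_paragraph` calls `yield` unconditionally, so it cannot be used without a block. A common Ruby idiom for supporting both the block form and the Enumerator form is `return enum_for(__method__) unless block_given?`. The toy class below is hypothetical (paragraphs are plain strings) and only illustrates that idiom; it is not the gem's code.

```ruby
# Hypothetical container showing the block/Enumerator idiom; not part of the gem.
class ToyParagraphs
  def initialize(paragraphs)
    @paragraphs = paragraphs
  end

  def each_paragraph
    # Without a block, hand back an Enumerator instead of raising LocalJumpError.
    return enum_for(__method__) unless block_given?

    @paragraphs.each { |para| yield(para) }
  end
end

doc = ToyParagraphs.new(['First', 'Second'])
doc.each_paragraph { |para| puts para }
p doc.each_paragraph.map(&:upcase) # => ["FIRST", "SECOND"]
```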
#font_size ⇒ Object
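Some documents set a default font size and others don't; when none is set, nil is returned. The stored w:sz value is in half-points, so it is divided by 2 to give points. This is integer division, so a size such as 10.5 pt is reported as 10.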
```ruby
# File 'lib/docx/document.rb', line 88
def font_size
  size_value = @styles&.at_xpath('//w:docDefaults//w:rPrDefault//w:rPr//w:sz/@w:val')&.value

  return nil unless size_value

  size_value.to_i / 2
end
```
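The half-point conversion is easy to check in isolation. The two helpers below are hypothetical: `points_truncated` repeats the `to_i / 2` arithmetic used by `#font_size`, while `points_exact` is an alternative that keeps fractional sizes by dividing by a Float.

```ruby
# Integer conversion, matching `size_value.to_i / 2` in #font_size.
def points_truncated(sz_val)
  sz_val.to_i / 2
end

# Hypothetical alternative that keeps half points (e.g. 10.5).
def points_exact(sz_val)
  sz_val.to_i / 2.0
end

%w[24 21].each do |val|
  puts "w:sz=#{val}: #{points_truncated(val)} pt (truncated), #{points_exact(val)} pt (exact)"
end
```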
#hyperlink_relationships ⇒ Object
```ruby
# File 'lib/docx/document.rb', line 103
def hyperlink_relationships
  @rels.xpath("//xmlns:Relationship[contains(@Type,'hyperlink')]")
end
```
#hyperlinks ⇒ Object
Returns a hash mapping each hyperlink relationship ID to its target. Hyperlink targets are extracted from the document.xml.rels file.
```ruby
# File 'lib/docx/document.rb', line 97
def hyperlinks
  hyperlink_relationships.each_with_object({}) do |rel, hash|
    hash[rel.attributes['Id'].value] = rel.attributes['Target'].value
  end
end
```
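Together, `#hyperlink_relationships` and `#hyperlinks` select the `Relationship` entries whose `Type` contains `hyperlink` and map each `Id` to its `Target`. The sketch below repeats that over an invented relationships part, using REXML in place of Nokogiri so it needs no gems; the ids and URL are made up.

```ruby
require 'rexml/document'

# Invented relationships part: one hyperlink and one non-hyperlink entry.
RELS_XML = <<~XML
  <Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
    <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles" Target="styles.xml"/>
    <Relationship Id="rId5" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink" Target="https://example.com/" TargetMode="External"/>
  </Relationships>
XML

# Maps relationship Id => Target for every relationship whose Type mentions "hyperlink".
def hyperlink_targets(rels_xml)
  root = REXML::Document.new(rels_xml).root
  root.children.grep(REXML::Element).each_with_object({}) do |rel, hash|
    next unless rel.name == 'Relationship' && rel.attributes['Type'].to_s.include?('hyperlink')

    hash[rel.attributes['Id']] = rel.attributes['Target']
  end
end

p hyperlink_targets(RELS_XML)
```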
#paragraphs ⇒ Object
```ruby
# File 'lib/docx/document.rb', line 65
def paragraphs
  @doc.xpath('//w:document//w:body/w:p').map { |p_node| parse_paragraph_from p_node }
end
```
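The `paragraphs` XPath ends in `w:body/w:p`, which matches only paragraphs that are direct children of the body, so paragraphs inside table cells are not returned here (`#tables`, by contrast, uses `//w:tbl` and matches tables at any depth). The REXML sketch below, swapped in for Nokogiri and run over an invented document.xml, counts both kinds of match to illustrate the difference.

```ruby
require 'rexml/document'

W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'

# Invented document.xml: one body paragraph plus one paragraph inside a table cell.
DOC_XML = <<~XML
  <w:document xmlns:w="#{W_NS}">
    <w:body>
      <w:p><w:r><w:t>Top-level</w:t></w:r></w:p>
      <w:tbl><w:tr><w:tc><w:p><w:r><w:t>In a cell</w:t></w:r></w:p></w:tc></w:tr></w:tbl>
    </w:body>
  </w:document>
XML

doc = REXML::Document.new(DOC_XML)
ns = { 'w' => W_NS }

# Same expression as the gem's paragraphs XPath: direct children of w:body only.
body_paragraphs = REXML::XPath.match(doc, '//w:document//w:body/w:p', ns)
# Descendant axis: also picks up paragraphs nested in tables.
all_paragraphs = REXML::XPath.match(doc, '//w:body//w:p', ns)

puts "direct body paragraphs: #{body_paragraphs.size}"
puts "all paragraphs: #{all_paragraphs.size}"
```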
#replace_entry(entry_path, file_contents) ⇒ Object
```ruby
# File 'lib/docx/document.rb', line 170
def replace_entry(entry_path, file_contents)
  @replace[entry_path] = file_contents
end
```
#save(path) ⇒ Object
Saves the document to the provided path. Entries registered through #replace_entry are written in place of the original entries. call-seq:
save(filepath) => void
```ruby
# File 'lib/docx/document.rb', line 131
def save(path)
  update
  Zip::OutputStream.open(path) do |out|
    zip.each do |entry|
      next unless entry.file?

      out.put_next_entry(entry.name)
      value = @replace[entry.name] || zip.read(entry.name)

      out.write(value)
    end
  end
  zip.close
end
```
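`#save` walks the file entries of the original archive and writes either the contents registered through `#replace_entry` or the original bytes. Because the loop only visits entries that already exist, a `replace_entry` call for a path that is not in the archive is never written. The plain-Ruby sketch below models the archive as a hash of entry names to contents (an assumption, so it runs without rubyzip, with `nil` standing in for a directory entry); `merged_entries` is a hypothetical helper.

```ruby
# Archive modeled as { entry_name => contents }; nil marks a directory entry.
ORIGINAL_ENTRIES = {
  'word/' => nil,
  'word/document.xml' => '<w:document>old</w:document>',
  'word/styles.xml' => '<w:styles/>'
}.freeze

# Mirrors `@replace[entry.name] || zip.read(entry.name)`, skipping non-file entries.
def merged_entries(entries, replace)
  entries.each_with_object({}) do |(name, contents), out|
    next if contents.nil? # stands in for `next unless entry.file?`

    out[name] = replace[name] || contents
  end
end

replace = { 'word/document.xml' => '<w:document>new</w:document>' }
merged_entries(ORIGINAL_ENTRIES, replace).each { |name, body| puts "#{name}: #{body}" }
```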
#stream ⇒ Object
Outputs the entire document as a StringIO object, rewound to the beginning.
```ruby
# File 'lib/docx/document.rb', line 148
def stream
  update
  stream = Zip::OutputStream.write_buffer do |out|
    zip.each do |entry|
      next unless entry.file?

      out.put_next_entry(entry.name)

      if @replace[entry.name]
        out.write(@replace[entry.name])
      else
        out.write(zip.read(entry.name))
      end
    end
  end

  stream.rewind
  stream
end
```
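The `stream.rewind` call matters because, after data is written, a StringIO's position sits at the end of its contents, so an immediate `read` returns an empty string. The standard-library snippet below demonstrates this with plain text standing in for zip bytes; that the gem rewinds for exactly this reason is an inference from the code.

```ruby
require 'stringio'

buffer = StringIO.new(+'')
buffer.write('PK...zip bytes would go here')

before = buffer.read # position is at the end, so nothing is left to read
buffer.rewind        # what #stream does before returning the buffer
after = buffer.read

p before # => ""
p after  # => "PK...zip bytes would go here"
```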
#style_name_of(style_id) ⇒ Object
```ruby
# File 'lib/docx/document.rb', line 178
def style_name_of(style_id)
  styles_configuration.style_of(style_id).name
end
```
#styles_configuration ⇒ Object
```ruby
# File 'lib/docx/document.rb', line 182
def styles_configuration
  @styles_configuration ||= Elements::Containers::StylesConfiguration.new(@styles.dup)
end
```
#tables ⇒ Object
```ruby
# File 'lib/docx/document.rb', line 82
def tables
  @doc.xpath('//w:document//w:body//w:tbl').map { |t_node| parse_table_from t_node }
end
```
#to_html ⇒ Object
Outputs the entire document as a String HTML fragment, with each paragraph's HTML joined by newlines.
```ruby
# File 'lib/docx/document.rb', line 124
def to_html
  paragraphs.map(&:to_html).join("\n")
end
```
#to_s ⇒ Object Also known as: text
Returns the plain text of the document, with paragraphs joined by newlines. call-seq:
to_s -> string
```ruby
# File 'lib/docx/document.rb', line 119
def to_s
  paragraphs.map(&:to_s).join("\n")
end
```
#to_xml ⇒ Object
```ruby
# File 'lib/docx/document.rb', line 78
def to_xml
  Nokogiri::XML(@document_xml)
end
```