Class: DerivativeRodeo::Services::PdfSplitter::PagesSummary
- Inherits: Struct
- Inheritance chain: Object, Struct, DerivativeRodeo::Services::PdfSplitter::PagesSummary
- Defined in: lib/derivative_rodeo/services/pdf_splitter/pages_summary.rb
Overview
A simple data structure that summarizes the image properties of the given path.
Constant Summary
- COL_WIDTH = 3
  Class constant column numbers.
- COL_HEIGHT = 4
- COL_COLOR_DESC = 5
- COL_CHANNELS = 6
- COL_BITS = 7
- COL_XPPI = 12
  Only poppler 0.25+ has this column in its output.
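As a rough sketch of how these column numbers are used: each data row of the pdfimages -list output is split on whitespace and the cells are read by zero-based index. The module name ColumnSketch, the parse_row helper, and the sample row are all hypothetical; the row is made up for illustration and is not output from a real PDF.

```ruby
# Hypothetical module; only the index values are taken from the constants above.
module ColumnSketch
  COL_WIDTH = 3
  COL_HEIGHT = 4
  COL_COLOR_DESC = 5
  COL_CHANNELS = 6
  COL_BITS = 7
  COL_XPPI = 12

  # Split one data row on whitespace and pick cells by zero-based index.
  def self.parse_row(line)
    cells = line.strip.split(/\s+/)
    {
      width: cells[COL_WIDTH].to_i,
      height: cells[COL_HEIGHT].to_i,
      color_description: cells[COL_COLOR_DESC],
      channels: cells[COL_CHANNELS].to_i,
      bits_per_channel: cells[COL_BITS].to_i,
      # nil.to_i is 0, so a row without an x-ppi column yields 0 here.
      pixels_per_inch: cells[COL_XPPI].to_i
    }
  end
end

# Made-up row for illustration; not output from a real PDF.
sample_row = '   1     0 image    2475  3228  rgb     3   8  jpeg   no        10  0   300   300  512K 2.2%'
row = ColumnSketch.parse_row(sample_row)
```

A row with fewer than 13 cells, as older poppler versions would produce, leaves pixels_per_inch at 0 in this sketch, which is why a separate fallback is needed for that case.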
Instance Attribute Summary
- #bits_per_channel ⇒ Object (also: #bits)
  Returns the value of attribute bits_per_channel.
- #channels ⇒ Object
  Returns the value of attribute channels.
- #color_description ⇒ Object
  Returns the value of attribute color_description.
- #height ⇒ Object
  Returns the value of attribute height.
- #page_count ⇒ Object
  Returns the value of attribute page_count.
- #path ⇒ Object
  Returns the value of attribute path.
- #pixels_per_inch ⇒ Object (also: #ppi)
  Returns the value of attribute pixels_per_inch.
- #width ⇒ Object
  Returns the value of attribute width.
Class Method Summary
- .extract_from(path:) ⇒ DerivativeRodeo::Services::PdfSplitter::PagesSummary
  Responsible for determining the image properties of the PDF.
Instance Method Summary
- #color ⇒ Array<String, Integer, Integer>
- #valid? ⇒ Boolean
  If the underlying extraction couldn’t set the various properties, we likely have an invalid PDF.
Instance Attribute Details
#bits_per_channel ⇒ Object Also known as: bits
Returns the value of attribute bits_per_channel
# File 'lib/derivative_rodeo/services/pdf_splitter/pages_summary.rb', line 9
def bits_per_channel
  @bits_per_channel
end
#channels ⇒ Object
Returns the value of attribute channels
# File 'lib/derivative_rodeo/services/pdf_splitter/pages_summary.rb', line 9
def channels
  @channels
end
#color_description ⇒ Object
Returns the value of attribute color_description
# File 'lib/derivative_rodeo/services/pdf_splitter/pages_summary.rb', line 9
def color_description
  @color_description
end
#height ⇒ Object
Returns the value of attribute height
# File 'lib/derivative_rodeo/services/pdf_splitter/pages_summary.rb', line 9
def height
  @height
end
#page_count ⇒ Object
Returns the value of attribute page_count
# File 'lib/derivative_rodeo/services/pdf_splitter/pages_summary.rb', line 9
def page_count
  @page_count
end
#path ⇒ Object
Returns the value of attribute path
# File 'lib/derivative_rodeo/services/pdf_splitter/pages_summary.rb', line 9
def path
  @path
end
#pixels_per_inch ⇒ Object Also known as: ppi
Returns the value of attribute pixels_per_inch
# File 'lib/derivative_rodeo/services/pdf_splitter/pages_summary.rb', line 9
def pixels_per_inch
  @pixels_per_inch
end
#width ⇒ Object
Returns the value of attribute width
# File 'lib/derivative_rodeo/services/pdf_splitter/pages_summary.rb', line 9
def width
  @width
end
Class Method Details
.extract_from(path:) ⇒ DerivativeRodeo::Services::PdfSplitter::PagesSummary
Uses the poppler 0.19+ pdfimages command to extract image listing metadata from PDF files, though the implementation is optimized for poppler 0.25 or later.
For DPI extraction, falls back to a calculation using MiniMagick if necessary.
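The fallback arithmetic can be sketched in isolation. PDF page dimensions are measured in points, at 72 points per inch, so dividing the pixel width of the embedded image by the page width in inches gives an approximate resolution. The helper name estimate_ppi and the POINTS_PER_INCH constant are hypothetical, and the sketch takes the page width as a plain argument rather than reading it through MiniMagick as the source does.

```ruby
# Hypothetical helper; mirrors the fallback arithmetic without calling MiniMagick.
POINTS_PER_INCH = 72

def estimate_ppi(width_px:, width_points:)
  # Integer arithmetic: multiply first, then divide, so the result truncates.
  (POINTS_PER_INCH * width_px / width_points).to_i
end

# An 8.25 inch wide page (594 points) holding a 2475 pixel wide image.
ppi = estimate_ppi(width_px: 2475, width_points: 594)
```

With integer inputs the division truncates, so the result is an estimate rather than an exact figure. A page width of zero would raise ZeroDivisionError in this sketch.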
Responsible for determining the image properties of the PDF.
The first two lines of the pdfimages -list output are tabular header information (the column names and a separator row), so they are skipped.
# File 'lib/derivative_rodeo/services/pdf_splitter/pages_summary.rb', line 72
def PagesSummary.extract_from(path:)
  # NOTE: https://github.com/scientist-softserv/iiif_print/pull/223/files for piping warnings
  # to /dev/null
  command = format('pdfimages -list %<path>s 2>/dev/null', path: path)

  page_count = 0
  color_description = 'gray'
  width = 0
  height = 0
  channels = 0
  bits_per_channel = 0
  pixels_per_inch = 0
  Open3.popen3(command) do |_stdin, stdout, _stderr, _wait_thr|
    stdout.read.split("\n").each_with_index do |line, index|
      # Skip the two header lines
      next if index <= 1

      page_count += 1
      cells = line.gsub(/\s+/m, ' ').strip.split(' ')

      color_description = 'rgb' if cells[COL_COLOR_DESC] != 'gray'
      width = cells[COL_WIDTH].to_i if cells[COL_WIDTH].to_i > width
      height = cells[COL_HEIGHT].to_i if cells[COL_HEIGHT].to_i > height
      channels = cells[COL_CHANNELS].to_i if cells[COL_CHANNELS].to_i > channels
      bits_per_channel = cells[COL_BITS].to_i if cells[COL_BITS].to_i > bits_per_channel

      # In the case of poppler version < 0.25, we will have no more than 12 columns. As such,
      # we need to do some alternative magic to calculate this.
      if page_count == 1 && cells.size <= 12
        pdf = MiniMagick::Image.open(path)
        width_points = pdf.width
        width_px = width
        pixels_per_inch = (72 * width_px / width_points).to_i
      elsif cells[COL_XPPI].to_i > pixels_per_inch
        pixels_per_inch = cells[COL_XPPI].to_i
      end
      # By the magic of nil#to_i if we don't have more than 12 columns, we've already set
      # the pixels_per_inch and this line won't do much of anything.
    end
  end

  new(
    path: path,
    page_count: page_count,
    pixels_per_inch: pixels_per_inch,
    width: width,
    height: height,
    color_description: color_description,
    channels: channels,
    bits_per_channel: bits_per_channel
  )
end
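The loop in this method keeps the largest value seen for each numeric property and switches the color description to rgb as soon as any row is not gray. A minimal sketch of that reduction, working on rows that have already been parsed into hashes, might look like the following. The summarize_rows helper and the row data are hypothetical and only cover a subset of the properties.

```ruby
# Hypothetical reduction over parsed rows; mirrors the loop's max-and-flag logic.
def summarize_rows(rows)
  summary = { page_count: 0, color_description: 'gray', width: 0, height: 0 }
  rows.each do |row|
    # One increment per listed image row.
    summary[:page_count] += 1
    # Any non-gray row marks the whole document as rgb.
    summary[:color_description] = 'rgb' if row[:color] != 'gray'
    # Keep the largest dimensions seen so far.
    summary[:width] = [summary[:width], row[:width]].max
    summary[:height] = [summary[:height], row[:height]].max
  end
  summary
end

# Made-up rows for illustration.
rows = [
  { color: 'gray', width: 1200, height: 1800 },
  { color: 'rgb',  width: 2475, height: 3228 },
  { color: 'gray', width: 800,  height: 900 }
]
result = summarize_rows(rows)
```

Because the counter advances once per listed row, page_count reflects the number of image rows in the listing, which can differ from the number of pages when a page holds several images or none.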
Instance Method Details
#color ⇒ Array<String, Integer, Integer>
# File 'lib/derivative_rodeo/services/pdf_splitter/pages_summary.rb', line 24
def color
  [color_description, channels, bits_per_channel]
end
#valid? ⇒ Boolean
If the underlying extraction couldn’t set the various properties, we likely have an invalid PDF.
# File 'lib/derivative_rodeo/services/pdf_splitter/pages_summary.rb', line 32
def valid?
  # Read the attributes directly from this struct.
  return false if color_description.nil?
  return false if channels.nil?
  return false if bits_per_channel.nil?
  return false if height.nil?
  return false if page_count.to_i.zero?

  true
end
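To illustrate the check, here is a minimal sketch using a stand-in struct rather than the gem's class. ValiditySketch is hypothetical and carries only the attributes that the check reads. It assumes the struct accepts keyword arguments, as the call to new in extract_from suggests.

```ruby
# Hypothetical stand-in for PagesSummary with the same nil and zero checks.
ValiditySketch = Struct.new(
  :color_description, :channels, :bits_per_channel, :height, :page_count,
  keyword_init: true
) do
  def valid?
    # Any unset property means the extraction did not complete.
    return false if [color_description, channels, bits_per_channel, height].any?(&:nil?)
    # No listed rows means there was nothing to summarize.
    return false if page_count.to_i.zero?

    true
  end
end

# What the defaults would look like if the listing produced no data rows.
empty = ValiditySketch.new(
  color_description: 'gray', channels: 0, bits_per_channel: 0, height: 0, page_count: 0
)
# Made-up values for a two-row listing.
scanned = ValiditySketch.new(
  color_description: 'rgb', channels: 3, bits_per_channel: 8, height: 3228, page_count: 2
)
```

In this sketch empty.valid? is false because page_count is zero, while scanned.valid? is true. Since extract_from initializes every property to a non-nil default, the page_count check is likely the one that matters most in practice.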