Class: DerivativeRodeo::Services::ExtractWordCoordinatesFromHocrSgmlService::AltoXml

Inherits:
Object
Defined in:
lib/derivative_rodeo/services/extract_word_coordinates_from_hocr_sgml_service.rb


Constructor Details

#initialize(words:, width:, height:, scaling: 1.0) ⇒ AltoXml

Returns a new instance of AltoXml.



# File 'lib/derivative_rodeo/services/extract_word_coordinates_from_hocr_sgml_service.rb', line 249

def initialize(words:, width:, height:, scaling: 1.0)
  @words = words
  @height = height.to_i
  @width = width.to_i
  @scaling = scaling
end
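The constructor coerces `width` and `height` with `to_i`, so the `nil` defaults that `.to_alto` passes through become `0`, and numeric strings are truncated to integers. The following is a minimal sketch of that behavior; `AltoXmlSketch` is a hypothetical stand-in that copies only the constructor shown above and is not the library class itself.

```ruby
# Hypothetical stand-in that reproduces only the documented constructor,
# so the coercion can be observed without loading the gem.
class AltoXmlSketch
  attr_reader :words, :width, :height, :scaling

  def initialize(words:, width:, height:, scaling: 1.0)
    @words = words
    @height = height.to_i # nil.to_i is 0
    @width = width.to_i
    @scaling = scaling
  end
end

# Omitted dimensions (nil) end up as 0.
defaults = AltoXmlSketch.new(words: [], width: nil, height: nil)

# String dimensions are parsed as integers; any fractional part is dropped.
from_strings = AltoXmlSketch.new(words: [], width: '1200', height: '1800.9')
```

Callers who need meaningful page dimensions in the output would therefore want to pass `width` and `height` explicitly.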

Instance Attribute Details

#height ⇒ Object (readonly)

Returns the value of attribute height.



# File 'lib/derivative_rodeo/services/extract_word_coordinates_from_hocr_sgml_service.rb', line 256

def height
  @height
end

#scaling ⇒ Object (readonly)

Returns the value of attribute scaling.



# File 'lib/derivative_rodeo/services/extract_word_coordinates_from_hocr_sgml_service.rb', line 256

def scaling
  @scaling
end

#width ⇒ Object (readonly)

Returns the value of attribute width.



# File 'lib/derivative_rodeo/services/extract_word_coordinates_from_hocr_sgml_service.rb', line 256

def width
  @width
end

#words ⇒ Object (readonly)

Returns the value of attribute words.



# File 'lib/derivative_rodeo/services/extract_word_coordinates_from_hocr_sgml_service.rb', line 256

def words
  @words
end

Class Method Details

.to_alto(words:, width: nil, height: nil) ⇒ String

Returns the ALTO XML representation of the given words and their coordinates.

Parameters:

  • words (Array<Hash>)

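    an array of hashes that have the keys `:word` and `:coordinates`. #to_alto reads `:coordinates` as a four-element array: index 0 becomes HPOS, 1 becomes VPOS, 2 becomes WIDTH, and 3 becomes HEIGHT.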

  • width (Integer, nil) (defaults to: nil)

    the width of the “canvas” on which the words appear.

  • height (Integer, nil) (defaults to: nil)

    the height of the “canvas” on which the words appear.

Returns:

  • (String)

    the ALTO XML representation of the given words and their coordinates.



# File 'lib/derivative_rodeo/services/extract_word_coordinates_from_hocr_sgml_service.rb', line 245

def self.to_alto(words:, width: nil, height: nil)
  new(words: words, width: width, height: height).to_alto
end

Instance Method Details

#to_alto ⇒ String

Outputs the ALTO XML for the words and their coordinates, emitting one String element per word.

Returns:

  • (String)

    ALTO XML representation of the words and their coordinates



# File 'lib/derivative_rodeo/services/extract_word_coordinates_from_hocr_sgml_service.rb', line 261

def to_alto
  page = alto_page(width, height) do |xml|
    words.each do |word|
      xml.String(
        CONTENT: word[:word],
        WIDTH: scale_point(word[:coordinates][2]).to_s,
        HEIGHT: scale_point(word[:coordinates][3]).to_s,
        HPOS: scale_point(word[:coordinates][0]).to_s,
        VPOS: scale_point(word[:coordinates][1]).to_s
      ) { xml.text '' }
    end
  end
  page.to_xml
end
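To make the attribute mapping in `#to_alto` concrete, here is a minimal sketch that builds the attribute hash for a single word. The private `scale_point` helper is not shown on this page, so `scale_point_sketch` below is an assumption (multiply by the scaling factor, then truncate to an integer) and the real helper may round differently. `alto_string_attributes` is likewise a hypothetical name used only for illustration.

```ruby
# Assumed behavior of the private scale_point helper; the actual
# implementation is not documented above and may differ.
def scale_point_sketch(value, scaling)
  (value * scaling).to_i
end

# Mirrors the index-to-attribute mapping used inside #to_alto:
# coordinates[0] is HPOS, [1] is VPOS, [2] is WIDTH, [3] is HEIGHT.
def alto_string_attributes(word, scaling: 1.0)
  hpos, vpos, width, height = word[:coordinates]
  {
    CONTENT: word[:word],
    WIDTH: scale_point_sketch(width, scaling).to_s,
    HEIGHT: scale_point_sketch(height, scaling).to_s,
    HPOS: scale_point_sketch(hpos, scaling).to_s,
    VPOS: scale_point_sketch(vpos, scaling).to_s
  }
end

word = { word: 'Hello', coordinates: [10, 20, 30, 40] }
attrs = alto_string_attributes(word, scaling: 0.5)
```

Under the stated assumption, halving the scale halves every coordinate before it is written out as a string attribute. Note that `.to_alto` does not accept a `scaling:` argument, so calls through the class method use the default of `1.0`.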