embulk-parser-flexml

Parser plugin for Embulk.

Flexible xml parser for embulk. read data using xpath and from attributes

  • Plugin type: parser
  • Load all or nothing: yes
  • Resume supported: no

Configuration

  • type: specify this plugin as flexml .
  • root: root property to start fetching each entries, specify in path/to/node style (string, required)
  • schema: specify the attribute of table and data type (required)
    • name: name of the attribute (string, required)
    • type: type of the attribute (string, required)
    • attribute: if specified, value of this attribute will be the output, otherwise child will be the output (string, optional)
    • xpath: child element to select (string, required)
    • format: timestamp format to parse (string, required)
    • timezone: timestamp will be parsed in this timezone (string, optional)

Example

Configuration

parser:
    type: flexml
    root: Team/Players/Player
    schema:
        - { name: name, type: string, attribute: name }
        - { name: age, type: long, attribute: age }
        - { name: about, type: string, xpath: About }
        - { name: facebook, type: string, xpath: "SocialMedia[@type='facebook']", attribute: url }
        - { name: twitter, type: string, xpath: "SocialMedia[@type='twitter']", attribute: url }

XML

<?xml version="1.0" encoding="utf-8" standalone="no"?>
<Team>
    <Players>
        <Player name="Locatelli" age="23">
            <About>
                Manuel Locatelli Cavaliere OMRI (born 8 January 1998) is an Italian professional footballer who plays as a midfielder for Serie A club Juventus, on loan from Serie A club Sassuolo, and the Italy national team.
            </About>
            <SocialMedia type="facebook" url="https://www.facebook.com/locamanuel73"/>
            <SocialMedia type="twitter" url="https://twitter.com/locamanuel73"/>
        </Player>
    </Players>
</Team>