Syllabify

A Ruby port of the Penn Phonetic Toolkit (P2TK) syllabifier. Unlike the P2TK syllabifier, this implementation works on transcriptions in IPA rather than Arpabet. Given a phonemic transcription in IPA, it automatically segments the phonemes into syllables.

Like the P2TK syllabifier, a phoneme inventory containing the legal consonants, nuclei (typically the language’s vowels), and onsets in the transcribed language must be created. This inventory is specified as plain text in YAML and a default phoneme inventory for English from the P2TK syllabifier is included. If you create inventories for other languages, please submit a pull request (or simply email it to me if you’re not a techie) and I will include it in subsequent releases of the gem.

Full documentation is at RubyDoc.info.

Transcription constraints

Any phonemes represented in IPA by digraphs (such as affricates, doubly-articulated consonants, and diphthongs) must be transcribed using a tie, otherwise there is no way to distinguish them from the phonemes of their individual components and syllabification will be incorrect in some cases.

For example, in English transcriptions the voiceless postalveolar affricate is customarily transcribed without the tie. For the purposes of syllabification, however, this is problematic because in English /t͡ʃ/ is a legal onset but /tʃ/ is not. If the English voiceless postalveolar affricate were to be transcribed without the tie as /tʃ/ then it would have to be included in the inventory of onsets, but doing so would cause the phoneme sequence /t/ followed by /ʃ/ to also be interpreted as an onset—which it isn’t. In this case, without the tie transcriptions where /tʃ/ represents two different phonemes rather than one—such as in nutshell /nʌtʃɛl/—will be incorrectly syllabified as /nʌ.tʃɛl/ rather than /nʌt.ʃɛl/. Similarly, without a tie it’s not possible to determine whether the diphthong /ɔɪ/ in clawing /klɔɪŋ/ is one or two separate phonemes.

In other words, tie glyphs together if they represent the same phoneme. The only digraphs requiring ties in English are the voiced and voiceless postalveolar affricates /t͡ʃ, d͡ʒ/ and the diphthongs /a͡ʊ, a͡ɪ, e͡ɪ, o͡ʊ, ɔ͡ɪ/.

How to enter ties

The tie is represented in Unicode by Combining Double Inverted Breve (U+0361). This character is entered between the two characters to be tied.

Ligatures

Phonemes that require transcription with ties could potentially be alternatively transcribed using their respective ligatures, but glyphs for all the potential ligatures aren’t defined in Unicode and use of the ligatures is no longer official IPA usage in any case.

Example

transcription = CodyRobbins::Syllabify.new(:en, 'dɪˌsɔrgənəˈze͡ɪʃən')

transcription.to_s      #=> 'dɪ.ˌsɔr.gə.nə.ˈze͡ɪ.ʃən'
transcription.syllables #=> [dɪ, ˌsɔr, gə, nə, ˈze͡ɪ, ʃən]

syllable = transcription.syllables[4]
syllable.stress  #=> 'ˈ'
syllable.onset   #=> 'z'
syllable.nucleus #=> 'e͡ɪ'
syllable.coda    #=> ''

syllable = transcription.syllables.last
syllable.stress  #=> nil
syllable.onset   #=> 'ʃ'
syllable.nucleus #=> 'ə'
syllable.coda    #=> 'n'

Colophon

See also

If you like this gem, you may also want to check out transliterate.

Tested with

  • Ruby 1.9.2-p290 — 18 October 2011

Contributing

To send patches, please fork on GitHub and submit a pull request.

Credits

© 2011 Cody Robbins. See LICENSE for details.