Ruby Splitta

Description

Splitta provides tokenization and trained models for high-accuracy sentence boundary detection (English only for now). The models are trained on Wall Street Journal news combined with the Brown Corpus, which is intended to be widely representative of written English. Error rates on test news data are near 0.25%.
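
To see why a trained model helps, here is a minimal sketch of a naive punctuation-based splitter. It is an illustration only: `naive_sentences` is a hypothetical helper, not part of this gem, and it does not reflect how Splitta detects boundaries.

```ruby
# Hypothetical baseline for illustration only; this is NOT how Splitta works.
# It splits wherever ., ! or ? is followed by whitespace.
def naive_sentences(text)
  text.split(/(?<=[.!?])\s+/)
end

text = "He moved to the U.S. in 2009. He likes it."
naive = naive_sentences(text)

# The abbreviation "U.S." is wrongly treated as a sentence end,
# so the two sentences come back as three fragments.
naive.each { |fragment| puts fragment }
```

Abbreviations such as "U.S." are the kind of case that simple rules get wrong and that a model-based detector is meant to handle.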

Installation

gem install ruby-splitta

Requirements

  • Ruby 2.5.1 or higher

Usage

require 'splitta'

Splitta.sentences("Some text goes here.")

License

MIT. See the LICENSE file.

References

Dan Gillick, “Sentence Boundary Detection and the Problem with the U.S.” at NAACL 2009, http://dgillick.com/resource/sbd_naacl_2009.pdf