Module: PragmaticSegmenter::Languages::Arabic
- Includes: Common
- Defined in: lib/pragmatic_segmenter/languages/arabic.rb
Defined Under Namespace
- Modules: Abbreviation
- Classes: AbbreviationReplacer
Constant Summary
- Punctuations =
['?', '!', ':', '.', '؟', '،'].freeze
- SENTENCE_BOUNDARY_REGEX =
/.*?[:\.!\?؟،]|.*?\z|.*?$/
- ReplaceColonBetweenNumbersRule =
Rule.new(/(?<=\d):(?=\d)/, '♭')
Rubular: rubular.com/r/RX5HpdDIyv
- ReplaceNonSentenceBoundaryCommaRule =
Rule.new(/،(?=\s\S+،)/, '♬')
Rubular: rubular.com/r/kPRgApNHUg
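The patterns listed above can be tried in isolation. The following is a minimal sketch that applies the same regular expressions with plain String#gsub and String#scan. It does not go through the library's Rule class or segmenter pipeline, and the sample strings and local variable names are illustrative assumptions, not part of the library.

```ruby
# Patterns copied from the constant summary; the variable names are hypothetical.
colon_between_numbers = /(?<=\d):(?=\d)/
non_boundary_comma    = /،(?=\s\S+،)/
sentence_boundary     = /.*?[:\.!\?؟،]|.*?\z|.*?$/

# A colon between two digits (for example a clock time) is swapped for a
# placeholder so it is not treated as a sentence boundary.
protected_time = "الساعة 10:30".gsub(colon_between_numbers, '♭')
# => "الساعة 10♭30"

# An Arabic comma is swapped for a placeholder only when another comma
# follows the next whitespace-delimited word; the last comma is left alone.
protected_list = "أ، ب، ج".gsub(non_boundary_comma, '♬')
# => "أ♬ ب، ج"

# The boundary regex cuts after each punctuation mark. scan can also yield
# an empty match at the end of the string, so empty pieces are dropped here.
sentences = "هل وصلت؟ نعم!".scan(sentence_boundary).map(&:strip).reject(&:empty?)
# => ["هل وصلت؟", "نعم!"]
```

The placeholders '♭' and '♬' are presumably restored to the original punctuation at a later stage of segmentation; that step is not shown on this page and is not reproduced in the sketch.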
Constants included from Common
Common::BETWEEN_DOUBLE_QUOTES_REGEX, Common::CONTINUOUS_PUNCTUATION_REGEX, Common::ExtraWhiteSpaceRule, Common::FileFormatRule, Common::GeoLocationRule, Common::KommanditgesellschaftRule, Common::MULTI_PERIOD_ABBREVIATION_REGEX, Common::NUMBERED_REFERENCE_REGEX, Common::PARENS_BETWEEN_DOUBLE_QUOTES_REGEX, Common::PossessiveAbbreviationRule, Common::QUOTATION_AT_END_OF_SENTENCE_REGEX, Common::QuestionMarkInQuotationRule, Common::SPLIT_SPACE_QUOTATION_AT_END_OF_SENTENCE_REGEX, Common::SingleNewLineRule, Common::SubSingleQuoteRule