Class: PragmaticSegmenter::Languages::Japanese::Cleaner

Inherits:
Cleaner
  • Object
Defined in:
lib/pragmatic_segmenter/languages/japanese.rb

Constant Summary

NewLineInMiddleOfWordRule =
Rule.new(/(?<=の)\n(?=\S)/, '')
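
To illustrate what this pattern matches, the following is a minimal sketch that applies the same regular expression with plain String#gsub instead of the library's Rule machinery. The constant NEWLINE_AFTER_NO and the helper strip_newline_after_no are hypothetical names introduced only for this example. The pattern removes a newline that directly follows the character の and is directly followed by a non-whitespace character.

```ruby
# Same pattern as NewLineInMiddleOfWordRule, used here without the library.
# NEWLINE_AFTER_NO and strip_newline_after_no are illustrative names only.
NEWLINE_AFTER_NO = /(?<=の)\n(?=\S)/

def strip_newline_after_no(text)
  # The lookbehind and lookahead consume nothing, so only "\n" is removed.
  text.gsub(NEWLINE_AFTER_NO, '')
end

# Newline sits between の and a non-whitespace character: it is removed.
joined = strip_newline_after_no("これは父の\n家です。")

# Newline is followed by an ASCII space, so the lookahead fails: no change.
unchanged = strip_newline_after_no("これは父の\n 家です。")

# Newline does not follow の, so the lookbehind fails: no change.
other = strip_newline_after_no("これは家\nです。")
```

Because both conditions are zero-width assertions, the surrounding characters are left in place and only the line break itself is deleted.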

Constants included from Cleaner::Rules

Cleaner::Rules::ConsecutiveForwardSlashRule, Cleaner::Rules::ConsecutivePeriodsRule, Cleaner::Rules::DoubleNewLineRule, Cleaner::Rules::DoubleNewLineWithSpaceRule, Cleaner::Rules::EscapedCarriageReturnRule, Cleaner::Rules::EscapedNewLineRule, Cleaner::Rules::InlineFormattingRule, Cleaner::Rules::NEWLINE_IN_MIDDLE_OF_SENTENCE_REGEX, Cleaner::Rules::NO_SPACE_BETWEEN_SENTENCES_DIGIT_REGEX, Cleaner::Rules::NO_SPACE_BETWEEN_SENTENCES_REGEX, Cleaner::Rules::NewLineFollowedByBulletRule, Cleaner::Rules::NewLineFollowedByPeriodRule, Cleaner::Rules::NoSpaceBetweenSentencesDigitRule, Cleaner::Rules::NoSpaceBetweenSentencesRule, Cleaner::Rules::QuotationsFirstRule, Cleaner::Rules::QuotationsSecondRule, Cleaner::Rules::ReplaceNewlineWithCarriageReturnRule, Cleaner::Rules::TableOfContentsRule, Cleaner::Rules::TypoEscapedCarriageReturnRule, Cleaner::Rules::TypoEscapedNewLineRule, Cleaner::Rules::URL_EMAIL_KEYWORDS

Instance Attribute Summary

Attributes inherited from Cleaner

#doc_type, #text

Instance Method Summary

Methods inherited from Cleaner

#initialize

Constructor Details

This class inherits a constructor from PragmaticSegmenter::Cleaner

Instance Method Details

#clean ⇒ Object



# File 'lib/pragmatic_segmenter/languages/japanese.rb', line 12

def clean
  super
  remove_newline_in_middle_of_word
end
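
The override above runs the inherited cleaning first and then applies the Japanese-specific step. The following self-contained sketch shows that pattern under stated assumptions: SketchCleaner and SketchJapaneseCleaner are hypothetical stand-ins, the base class applies a single placeholder rule rather than the real inherited rule set, and the body of remove_newline_in_middle_of_word is assumed (it is not shown in this page) to apply NewLineInMiddleOfWordRule to the text.

```ruby
# Hypothetical stand-in for the parent cleaner; not the library's class.
class SketchCleaner
  attr_reader :text

  def initialize(text:)
    @text = text.dup
  end

  def clean
    # Placeholder for the inherited rules: collapse runs of newlines.
    @text.gsub!(/\n{2,}/, "\n")
    @text
  end
end

# Hypothetical stand-in for the Japanese cleaner.
class SketchJapaneseCleaner < SketchCleaner
  NEWLINE_IN_MIDDLE_OF_WORD = /(?<=の)\n(?=\S)/

  def clean
    super                             # run the generic cleaning first
    remove_newline_in_middle_of_word  # then the language-specific step
  end

  private

  # Assumed behavior: strip a newline that follows の and precedes text.
  def remove_newline_in_middle_of_word
    @text.gsub!(NEWLINE_IN_MIDDLE_OF_WORD, '')
    @text # gsub! returns nil when nothing matched, so return the text
  end
end

result = SketchJapaneseCleaner.new(text: "私の\n本です。\n\n\n次の文。").clean
```

Calling super before the language-specific step means the Japanese rule sees text that the generic rules have already normalized, which is the ordering the documented method uses.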