Class: Replace

Inherits: Object

Defined in: lib/replace.rb

Instance Attribute Summary

#scan ⇒ Object readonly

Returns the value of attribute scan.
#string ⇒ Object readonly

Returns the value of attribute string.

Instance Method Summary

#add_line_break ⇒ Object

Adds line breaks where they are needed.
#ancient_literature ⇒ Object
#ascii2 ⇒ Object

Converts fullwidth (double-byte) ASCII characters to single-byte characters (verified, risk level: 0): ！＂＃＄％＆＇（）＊＋，－．／０１２３４５６７８９：；＜＝＞？＠ＡＢＣＤＥＦＧＨＩＪＫＬＭＮＯＰＱＲＳＴＵＶＷＸＹＺ［＼］＾＿｀ａｂｃｄｅｆｇｈｉｊｋｌｍｎｏｐｑｒｓｔｕｖｗｘｙｚ｛｜｝～ become !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~.
#batch_replace(regexps = {}) ⇒ Object

Batch replacement: for each pattern in turn, replaces its first match.
#blank ⇒ Object

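Removes spaces between Han characters and adds spaces between Han characters and digits or Latin letters (verified, risk level: 3). Then calls del_head_blank.del_blank_line.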
#chapter ⇒ Object

Detects chapter and section headings (verified, risk level: 0).
#code ⇒ Object

Puts one space on each side of inline code (not verified, risk level: 4). Then calls jekyll_code.
#del_blank_line ⇒ Object

Removes redundant blank lines (verified, risk level: 0). Then calls del_tail_blank.
#del_head_blank ⇒ Object

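Removes leading whitespace from each line (verified, risk level: 3; the whitespace may be Markdown indentation). Turns lines that merely look blank into truly empty lines.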
#del_italics_and_bold ⇒ Object

Removes bold and italic styling (verified, risk level: 3; the markers may be intentional Markdown emphasis).
#del_line_break ⇒ Object

Removes unnecessary line breaks.
#del_tail_blank ⇒ Object

Removes trailing whitespace from each line (verified, risk level: 0). Turns lines that merely look blank into truly empty lines.
#footnote ⇒ Object
#foreign_literature ⇒ Object
#format_markdown ⇒ Object
#head_foot ⇒ Object

Removes page headers and footers.
#help ⇒ Object
#html2markdown ⇒ Object
#image ⇒ Object

Normalizes image paths (verified, risk level: 0).
#initialize(string) ⇒ Replace constructor

A new instance of Replace.
#jekyll_code ⇒ Object

Converts Jekyll highlight blocks to fenced code blocks (verified, risk level: 0).
#list ⇒ Object
#markdown2html ⇒ Object
#paragraph ⇒ Object

Detects the start of a paragraph (verified, risk level: 0).
#pdftotext ⇒ Object

Cleans up the output of pdftotext (not verified, risk level: 4). Then calls paragraph.blank.del_line_break.chapter.list.punct2.add_line_break.
#post_pandoc_for_latex ⇒ Object
#pre_pandoc_for_latex ⇒ Object
#punct1 ⇒ Object

Converts Chinese punctuation to English punctuation.
#punct2 ⇒ Object

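Converts Chinese punctuation to English punctuation (verified, risk level: 3; some texts may need Chinese punctuation). Keeps some Chinese symbols: 、《》〈〉【】〖〗〔〕. Handled by ascii2: ？！，；：（）.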
#rename ⇒ Object
#scan_image ⇒ Object
#scan_note ⇒ Object

Scans the list of notes and builds a replacement dictionary.
#scan_test ⇒ Object
#scan_url ⇒ Object
#simple ⇒ Object
#standard ⇒ Object

Standardizes a Markdown file and processes the result of an HTML conversion (not verified, risk level: 4). Chain noted in the source comment: code.punct2.blank.
#taiwan ⇒ Object

Converts Taiwanese punctuation to mainland Chinese punctuation (verified, risk level: 0). Then calls ascii2.
#theorem ⇒ Object

Generates theorem environments and LaTeX commands (not verified, risk level: 2).
#title ⇒ Object

Converts the YAML title metadata (verified, risk level: 0).
#tree ⇒ Object

Processes the output of the shell command tree (verified, risk level: 0).
#tw2s ⇒ Object

Converts Taiwanese Traditional Chinese to Simplified Chinese. Requires OpenCC: brew install opencc; sudo gem install ropencc.

Constructor Details

#initialize(string) ⇒ `Replace`

Returns a new instance of Replace.




# File 'lib/replace.rb', line 8

def initialize(string)
  @string = string
end

Instance Attribute Details

#scan ⇒ `Object` (readonly)

Returns the value of attribute scan.




# File 'lib/replace.rb', line 6

def scan
  @scan
end

#string ⇒ `Object` (readonly)

Returns the value of attribute string.




# File 'lib/replace.rb', line 6

def string
  @string
end

Instance Method Details

#add_line_break ⇒ `Object`

Adds line breaks where they are needed.

# File 'lib/replace.rb', line 228

def add_line_break
  replace(@string) do
    s /(\p{Han})[[:blank:]]*([:,])[[:blank:]]*(\p{Han})/, '\1\2 \3'
    s /(\p{Han})[[:blank:]]*([。.!?;])[[:blank:]]*(\p{Han})/, '\1\2'"\n"'\3'
    s /(\p{Han})[[:blank:]]*(\p{Ps})/, '\1 \2'
    s /(\p{Pe})[[:blank:]]*(\p{Han})/, '\1 \2'
  end
  self
end
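
The `replace`/`s` helper used above is not defined on this page, so the following is only a minimal sketch of the first two rules using plain `String#gsub`, assuming `s` performs a global substitution. The method name `break_sentences` is hypothetical.

```ruby
# Hypothetical standalone helper; not part of lib/replace.rb.
def break_sentences(text)
  text
    # Han , Han  ->  one space after the mid-sentence mark
    .gsub(/(\p{Han})[[:blank:]]*([:,])[[:blank:]]*(\p{Han})/, '\1\2 \3')
    # Han . Han  ->  newline after the sentence-final mark
    .gsub(/(\p{Han})[[:blank:]]*([。.!?;])[[:blank:]]*(\p{Han})/, "\\1\\2\n\\3")
end

puts break_sentences("今天下雨,我们在家.明天再去")
```

Each match consumes the Han character on both sides, so in a run such as `我,你,他` only every other separator is rewritten in a single pass.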

#ancient_literature ⇒ `Object`

# File 'lib/replace.rb', line 365

def ancient_literature
  replace(@string) do
    s /_古诗文网/, ''
    s /作者：.*\r?\n/, ''
  end
  del_head_blank
end

#ascii2 ⇒ `Object`

Converts fullwidth (double-byte) ASCII characters to single-byte characters (verified, risk level: 0): ！＂＃＄％＆＇（）＊＋，－．／０１２３４５６７８９：；＜＝＞？＠ＡＢＣＤＥＦＧＨＩＪＫＬＭＮＯＰＱＲＳＴＵＶＷＸＹＺ［＼］＾＿｀ａｂｃｄｅｆｇｈｉｊｋｌｍｎｏｐｑｒｓｔｕｖｗｘｙｚ｛｜｝～ become !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~.

# File 'lib/replace.rb', line 203

def ascii2
  replace(@string) do
    s /([\u{FF01}-\u{FF5E}])/ do
      bytes = $1.bytes
      bytes[1] -= 0xBC
      bytes[2] -= 0x60
      bytes[2] += 64*bytes[1]
      bytes[2..2].pack("c*")
    end
  end
  self
end
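
The byte arithmetic above works because the fullwidth forms U+FF01 to U+FF5E lie exactly 0xFEE0 code points above ASCII U+0021 to U+007E. A minimal sketch of the same mapping done on code points instead of UTF-8 bytes; `fullwidth_to_ascii` is a hypothetical helper name, not part of lib/replace.rb.

```ruby
# Hypothetical standalone helper; not part of lib/replace.rb.
def fullwidth_to_ascii(text)
  # Shift each fullwidth character down by 0xFEE0 to reach its ASCII twin.
  text.gsub(/[\u{FF01}-\u{FF5E}]/) { |ch| (ch.ord - 0xFEE0).chr(Encoding::UTF_8) }
end

puts fullwidth_to_ascii("Ｒｕｂｙ２．０！")
```

Characters outside the fullwidth range, including Han characters and the ideographic space U+3000, are left untouched.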

#batch_replace(regexps = {}) ⇒ `Object`

Batch replacement: for each pattern in turn, replaces its first match.

# File 'lib/replace.rb', line 49

def batch_replace(regexps = {})
  regexps.each do |key, value|
    replace(@string) do
      sub! Regexp.new("\\G(.*?)#{key}", Regexp::MULTILINE), '\1'" ^[#{value}] "
    end
  end
  self
end
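
The replacement text ` ^[...] ` is Pandoc's inline-footnote syntax, so each pattern's first occurrence becomes a footnote. A minimal sketch of that idea with plain `String#sub`; the method name `inline_footnotes` and the sample notes are hypothetical, and the `replace` helper from the source is not used.

```ruby
# Hypothetical standalone helper; not part of lib/replace.rb.
def inline_footnotes(text, notes)
  notes.reduce(text) do |result, (pattern, note)|
    # Only the first match of each pattern is replaced.
    # The block form avoids backslash interpretation in the note text.
    result.sub(Regexp.new(pattern)) { " ^[#{note}] " }
  end
end

notes = { "\\^1\\^" => "First note", "\\^2\\^" => "Second note" }
puts inline_footnotes("Alpha^1^ and beta^2^.", notes)
```

As in the source, a space is added on both sides of the footnote, so a marker that is already followed by a space leaves two spaces behind.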

#blank ⇒ `Object`

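Removes spaces between Han characters and adds spaces between Han characters and digits or Latin letters (verified, risk level: 3). Then calls del_head_blank.del_blank_line.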

# File 'lib/replace.rb', line 241

def blank
  replace(@string) do
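    # Remove blanks between Han characters; cannot handle alternating input such as "无 法 处 理 这 种 情 况"
    s /(\p{Han})[[:blank:]]+(\p{Han})/, '\1\2'
    # Add a space between Han characters and digits or Latin letters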
    s /(\p{Han})(\w)/, '\1 \2'
    s /(\w)(\p{Han})/, '\1 \2'
  end
  del_head_blank.del_blank_line
end
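
A minimal sketch of the three substitutions with plain `String#gsub`, assuming the `s` helper performs a global substitution. `normalize_spacing` is a hypothetical name. In Ruby, `\w` matches only ASCII word characters, so it never matches a Han character here.

```ruby
# Hypothetical standalone helper; not part of lib/replace.rb.
def normalize_spacing(text)
  text
    .gsub(/(\p{Han})[[:blank:]]+(\p{Han})/, '\1\2') # drop blanks between Han characters
    .gsub(/(\p{Han})(\w)/, '\1 \2')                 # Han followed by ASCII letter/digit
    .gsub(/(\w)(\p{Han})/, '\1 \2')                 # ASCII letter/digit followed by Han
end

puts normalize_spacing("中文 排版和English混排")
puts normalize_spacing("无 法 处 理")
```

The second call illustrates the limitation noted in the source comment: matches cannot overlap, so a single pass joins only every other pair and leaves `无法 处理`.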

#chapter ⇒ `Object`

Detects chapter and section headings (verified, risk level: 0).

# File 'lib/replace.rb', line 382

def chapter
  replace(@string) do
    s /^第[一二三四五六七八九十]+[卷部篇]/, 'PART: '
    s /^第[一二三四五六七八九十]+[章]/, '# '
    s /^第[一二三四五六七八九十]+[节]/, '## '
    s /^[一二三四五六七八九十]+、/, '### '
    s /^\([一二三四五六七八九十]+\)/, '#### '
  end
  self
end

#code ⇒ `Object`

Puts one space on each side of inline code (not verified, risk level: 4). Then calls jekyll_code.

# File 'lib/replace.rb', line 300

def code
  replace(@string) do
    # Put one space on each side of inline code
    s /([[:alnum:]])`([^`]+?)`([[:alnum:]])/, '\1 `\2` \3'
  end
  jekyll_code
end

#del_blank_line ⇒ `Object`

Removes redundant blank lines (verified, risk level: 0). Then calls del_tail_blank.

# File 'lib/replace.rb', line 272

def del_blank_line
  replace(@string) do
    s /(^[[:blank:]]*\r?\n){2,}/, "\n"
  end
  del_tail_blank
end
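
A minimal sketch of this method combined with the `del_tail_blank` step it chains into, using plain `String#gsub` and assuming `s` is a global substitution. `squeeze_blank_lines` is a hypothetical name.

```ruby
# Hypothetical standalone helper; not part of lib/replace.rb.
def squeeze_blank_lines(text)
  text
    .gsub(/(^[[:blank:]]*\r?\n){2,}/, "\n") # two or more blank lines -> one
    .gsub(/[[:blank:]]+\r?\n/, "\n")        # strip trailing blanks on each line
end

print squeeze_blank_lines("a\n\n\n  \nb  \nc\n")
```

A line containing only spaces or tabs counts as blank for the first rule, which is why the three-line gap above collapses to a single empty line.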

#del_head_blank ⇒ `Object`

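Removes leading whitespace from each line (verified, risk level: 3; the whitespace may be Markdown indentation). Turns lines that merely look blank into truly empty lines.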

# File 'lib/replace.rb', line 254

def del_head_blank
  replace(@string) do
    s /^[[:blank:]]+/, ''
  end
  self
end

#del_italics_and_bold ⇒ `Object`

Removes bold and italic styling (verified, risk level: 3; the markers may be intentional Markdown emphasis).

# File 'lib/replace.rb', line 345

def del_italics_and_bold
  replace(@string) do
    s /([\W_]|^)(\*\*|__)(?=\S)([^\r]*?\S[\*_]*)\2([\W_]|$)/, '\1\3\4'
    s /([\W_]|^)(\*|_)(?=\S)([^\r\*_]*?\S)\2([\W_]|$)/, '\1\3\4'
  end
  self
end
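
To see what the two patterns do, here is a minimal sketch that applies them with plain `String#gsub`, assuming `s` is a global substitution. `strip_emphasis` is a hypothetical name; the regular expressions are copied from the method above.

```ruby
# Hypothetical standalone helper; not part of lib/replace.rb.
def strip_emphasis(text)
  text
    # **bold** or __bold__: keep the surrounding characters and the inner text
    .gsub(/([\W_]|^)(\*\*|__)(?=\S)([^\r]*?\S[\*_]*)\2([\W_]|$)/, '\1\3\4')
    # *italic* or _italic_
    .gsub(/([\W_]|^)(\*|_)(?=\S)([^\r\*_]*?\S)\2([\W_]|$)/, '\1\3\4')
end

puts strip_emphasis("This is **bold** and *italic* text.")
```

The back-reference `\2` requires the closing marker to be the same as the opening one, so `**bold__` is left alone.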

#del_line_break ⇒ `Object`

Removes unnecessary line breaks.

# File 'lib/replace.rb', line 217

def del_line_break
  replace(@string) do
    # "无\n法\n处\n理\n这\n种\n情\n况"
    s /(\p{Han})\r?\n(\p{Han})/, '\1\2'
    s /(\p{Han})\r?\n([[:punct:]])/, '\1\2'
    s /…{3,}(\r?\n)+/, ''
  end
  self
end

#del_tail_blank ⇒ `Object`

Removes trailing whitespace from each line (verified, risk level: 0). Turns lines that merely look blank into truly empty lines.

# File 'lib/replace.rb', line 263

def del_tail_blank
  replace(@string) do
    s /[[:blank:]]+\r?\n/, "\n"
  end
  self
end

#footnote ⇒ `Object`




# File 'lib/replace.rb', line 58

def footnote
  batch_replace(scan_note)
end

#foreign_literature ⇒ `Object`

# File 'lib/replace.rb', line 353

def foreign_literature
  replace(@string) do
    s /\s*\n/, "\n\n"
    s /\${4,}\s*/, '#### '
    s /[　\u{001A}]/, ''
    s /# [０-９]+．\s*/, '## '
    s /#### 第[^\r\n]+[卷部]\s*(.*)\s*\n/, "PART: "'\1'"\n\n"
    s /#### 第[^\r\n]+[章]\s*(.*)\s*\n/, "# "'\1'"\n\n"
  end
  del_head_blank
end

#format_markdown ⇒ `Object`




# File 'lib/replace.rb', line 401

def format_markdown
  markdown2html.html2markdown
end

#head_foot ⇒ `Object`

Removes page headers and footers.

# File 'lib/replace.rb', line 289

def head_foot
  replace(@string) do
    s /\A(^[^\r\n]*\r?\n){11}\s*/m, ''
    s /^\[«.*?\z/m, ''
    # s /(^.*?\r?\n){4}\z/, ''
  end
  self
end

#help ⇒ `Object`

# File 'lib/replace.rb', line 12

def help
  method_comments = {}
  replace(@string) do
    s /((.*#.*\r?\n)*)\s*def\s+(\w+)/ do
      method_comments[$3.to_sym] = $1
    end
  end
  method_comments
end

#html2markdown ⇒ `Object`

# File 'lib/replace.rb', line 411

def html2markdown
  converter = PandocRuby.new(@string, from: :html, to: :markdown)
  @string = converter.convert('chapters', 'atx-headers', 'normalize', 'no-wrap')
  self
end

#image ⇒ `Object`

Normalizes image paths (verified, risk level: 0).

# File 'lib/replace.rb', line 280

def image
  replace(@string) do
    s /Insert\s(18333fig\d+)\.png\s*\n.*?\d{1,2}-\d{1,2}\. (.*)/, '![\2](\1-tn.png)'
    s /!\[(.*?)\]\(\S*\/(\S*?)( ".*")?\)/, '![\1](\2)'
  end
  self
end

#jekyll_code ⇒ `Object`

Converts Jekyll highlight blocks to fenced code blocks (verified, risk level: 0).

# File 'lib/replace.rb', line 309

def jekyll_code
  replace(@string) do
    s /\s*\{%\s*highlight\s+(\w+)\s*%\}\s*/, "\n\n"'```{.\1}'"\n"
    s /\s*\{%\s*endhighlight\s*%\}\s*/, "\n"'```'"\n\n"
  end
  self
end
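
A minimal sketch of the same conversion with plain `String#gsub`, assuming `s` is a global substitution. `jekyll_to_fenced` and `FENCE` are hypothetical names; the fence string is built at runtime only so that this listing contains no literal fence.

```ruby
# Hypothetical standalone helper; not part of lib/replace.rb.
FENCE = '`' * 3

def jekyll_to_fenced(text)
  text
    # {% highlight LANG %} -> opening fence with a Pandoc-style class attribute
    .gsub(/\s*\{%\s*highlight\s+(\w+)\s*%\}\s*/) { "\n\n#{FENCE}{.#{$1}}\n" }
    # {% endhighlight %} -> closing fence followed by a blank line
    .gsub(/\s*\{%\s*endhighlight\s*%\}\s*/) { "\n#{FENCE}\n\n" }
end

source = "Intro\n{% highlight ruby %}\nputs 1\n{% endhighlight %}\nOutro"
puts jekyll_to_fenced(source)
```

The surrounding `\s*` swallows any whitespace next to the Liquid tags, so the result always has exactly one blank line before the opening fence and one after the closing fence.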

#list ⇒ `Object`

# File 'lib/replace.rb', line 393

def list
  replace(@string) do
    s /^(\d.)\s*/, '\1'"\t"
    s /^[●]\s*/, "-\t"
  end
  self
end

#markdown2html ⇒ `Object`

# File 'lib/replace.rb', line 405

def markdown2html
  converter = PandocRuby.new(@string, from: :markdown, to: :html)
  @string = converter.convert('chapters', 'indented-code-classes' => 'sourceCode')
  self
end

#paragraph ⇒ `Object`

Detects the start of a paragraph (verified, risk level: 0).

# File 'lib/replace.rb', line 374

def paragraph
  replace(@string) do
    s /^[[:blank:]]{2,}/, "\n"
  end
  self
end

#pdftotext ⇒ `Object`

Cleans up the output of pdftotext (not verified, risk level: 4). Then calls paragraph.blank.del_line_break.chapter.list.punct2.add_line_break.

# File 'lib/replace.rb', line 112

def pdftotext
  replace(@string) do
    # Remove page-number lines
    s /^[[:blank:]]*[０-９]+[[:blank:]]*\r?\n/, ''
  end
  paragraph.blank.del_line_break.chapter.list.punct2.add_line_break
end

#post_pandoc_for_latex ⇒ `Object`

# File 'lib/replace.rb', line 94

def post_pandoc_for_latex
  replace(@string) do
    s /\{verbatim\}/, '{Verbatim}'
    s /\\begin\{center\}\\rule\{(.*?)\}\{(.*?)\}\\end\{center\}/, '\newpage'
    s /\s*\\footnote\{(.*?)\}\s*/, '\footnote{\1}'
    s /\\footnote\{(.*?)[:：]\s*(.*?)\}/, '〔{\kaishu \1: \2}〕'
  end
  theorem
end

#pre_pandoc_for_latex ⇒ `Object`




# File 'lib/replace.rb', line 90

def pre_pandoc_for_latex
  title
end

#punct1 ⇒ `Object`

Converts Chinese punctuation to English punctuation.

# File 'lib/replace.rb', line 121

def punct1
  replace(@string) do
    s /，/, ', '
    s /：([^\r\n])/, ":\n"'\1'
    s /；([^\r\n])/, ";\n"'\1'
    s /。([^\r\n])/, ".\n"'\1'
    s /？([^\r\n])/, "?\n"'\1'
    s /！([^\r\n])/, "!\n"'\1'
    s /：\r?\n/, ":\n"
    s /；\r?\n/, ";\n"
    s /。\r?\n/, ".\n"
    s /？\r?\n/, "?\n"
    s /！\r?\n/, "!\n"
    s /（/, ' ('
    s /）/, ') '
    s /\) ([,.])/, ')\1'
  end
  self
end

#punct2 ⇒ `Object`

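Converts Chinese punctuation to English punctuation (verified, risk level: 3; some texts may need Chinese punctuation). Keeps some Chinese symbols: 、《》〈〉【】〖〗〔〕. Handled by ascii2: ？！，；：（）.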

# File 'lib/replace.rb', line 144

def punct2
  replace(@string) do
    # ‐‑‒–—―‖‗‘’‚‛“”„‟
    # †‡•‣․‥…‧
    # ‰‱′″‴‵‶‷‸‹›※‼‽‾‿
    # ⁀⁁⁂⁃
    # ⁅⁆⁇⁈⁉⁊⁋⁌⁍⁎⁏
    # ⁐⁑
    # ⁓⁔⁕⁖⁗⁘⁙⁚⁛⁜⁝⁞
    # ⁽⁾
    # 、。〃
    # 〈〉《》「」『』
    # 【】
    # 〔〕〖〗〘〙〚〛〜〝〞〟
    # 〰
    # 〽
    # \p{S}: $+<=>^`|~⁄⁒
    # \p{Sm}: +<=>|~⁄⁒
    # \p{Sc}: $
    # \p{Sk}: ^`
    # \p{Pi}: ‘‛“‟
    # \p{Pf}: ’”
    # Sentence-final marks: .!?;:
    # Punctuation marks: `$()''""
    # Mid-sentence marks: ,、
    s /。/, '.'
    s /[“”]/, '"'
    s /[‘’]/, "'"
    s /──/, '---'
    s /—/, '--'
  end
  ascii2
end

#rename ⇒ `Object`

# File 'lib/replace.rb', line 80

def rename
  replace(@string) do
    s /!\[\]\(image(\d+)\.jpg\)/ do
      i = $1.to_i - 1
      "![](image%03d.jpg)" % i
    end
  end
  self
end
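
The block form computes each replacement: the image number is decremented by one and zero-padded to three digits. A minimal sketch with plain `String#gsub`, assuming `s` is a global substitution; `renumber_images` is a hypothetical name, and the dot before `jpg` is escaped here so that it matches only a literal dot.

```ruby
# Hypothetical standalone helper; not part of lib/replace.rb.
def renumber_images(text)
  text.gsub(/!\[\]\(image(\d+)\.jpg\)/) do
    # $1 holds the captured number; shift it down by one and pad to 3 digits.
    format('![](image%03d.jpg)', $1.to_i - 1)
  end
end

puts renumber_images("![](image12.jpg) and ![](image1.jpg)")
```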

#scan_image ⇒ `Object`




# File 'lib/replace.rb', line 30

def scan_image
  @scan = @string.scan(/!\[.*?\]\(([^\s]+?)(?:\s+.*?)?\)/)
end
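
`String#scan` with one capture group returns an array of one-element arrays, so `@scan` holds nested arrays. A minimal sketch of the same pattern outside the class; `image_paths` is a hypothetical name and the sample Markdown is made up.

```ruby
# Pattern copied from scan_image: captures the path, ignoring an optional title.
IMAGE_PATTERN = /!\[.*?\]\(([^\s]+?)(?:\s+.*?)?\)/

# Hypothetical standalone helper; not part of lib/replace.rb.
def image_paths(markdown)
  # scan yields [["a"], ["b"]]; flatten gives a plain list of paths.
  markdown.scan(IMAGE_PATTERN).flatten
end

p image_paths('![Logo](img/logo.png "Site logo") and ![](fig1.jpg)')
```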

#scan_note ⇒ `Object`

Scans the list of notes and builds a replacement dictionary.

# File 'lib/replace.rb', line 35

def scan_note
  del_head_blank
  note = {}
  # @string.scan(/^[(（]\d+[）)]\s*(.*?)[:：]\s*(.*?)\\?\r?\n/) do |key, value|
  @string.scan(/^(.*?)〔(.*?〕.*?)\r?\n/) do |key, value|
    # key_stem = key.gsub(/[(（](.*?)[）)]/, '')
    key_stem = "\\^#{key}\\^"
    # note[key_stem] = "#{key}: #{value}"
    note[key_stem] = value.sub(/〕/, ': ')
  end
  note
end
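
Judging from the pattern alone, each note line is expected to look like `key〔term〕text`. A minimal sketch of the dictionary this produces, without the `del_head_blank` step; `build_note_table` is a hypothetical name and the sample notes are made up.

```ruby
# Hypothetical standalone helper; not part of lib/replace.rb.
def build_note_table(text)
  notes = {}
  text.scan(/^(.*?)〔(.*?〕.*?)\r?\n/) do |key, value|
    # The dictionary key is a regex source matching the marker ^key^ in the body text.
    notes["\\^#{key}\\^"] = value.sub(/〕/, ': ')
  end
  notes
end

p build_note_table("1〔Onigmo〕Ruby's regex engine\n2〔YARD〕a documentation tool\n")
```

The resulting hash has the shape that `batch_replace` takes as its `regexps` argument.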

#scan_test ⇒ `Object`




# File 'lib/replace.rb', line 22

def scan_test
  @scan = @string.scan(/\w+/)
end

#scan_url ⇒ `Object`




# File 'lib/replace.rb', line 26

def scan_url
  @scan = @string.scan(/href=['"](.*?)['"]/)
end

#simple ⇒ `Object`

# File 'lib/replace.rb', line 62

def simple
  replace(@string) do
    s /cc/, 'dd'
    s /aa/, 'bb'
  end
  self
end

#standard ⇒ `Object`

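Standardizes a Markdown file and processes the result of an HTML conversion (not verified, risk level: 4). The source comment lists the chain code.punct2.blank, while the method body calls blank.del_line_break.punct2.code.add_line_break.format_markdown.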




# File 'lib/replace.rb', line 106

def standard
  blank.del_line_break.punct2.code.add_line_break.format_markdown
end

#taiwan ⇒ `Object`

Converts Taiwanese punctuation to mainland Chinese punctuation (verified, risk level: 0). Then calls ascii2.

# File 'lib/replace.rb', line 180

def taiwan
  replace(@string) do
    s /「/, '‘'
    s /」/, '’'
    s /『/, '“'
    s /』/, '”'
  end
  ascii2
end

#theorem ⇒ `Object`

Generates theorem environments and LaTeX commands (not verified, risk level: 2).

# File 'lib/replace.rb', line 318

def theorem
  replace(@string) do
    s /^(ASSUMPTION|DEFINITION|CONCLUSION|ALGORITHM|EXPERIMENT|EXAMPLE|REMARK|NNOTE|THEOREM|AXIOM|LEMMA|PROPERTY|COROLLARY|PROPOSITION|CLAIM|PROBLEM|QUESTION|CONJECTURE|PROOF|SOLUTION|ANSWER|ANALYSIS)[.:](.*?)(\n(?=\n)|\Z)/mi do
      css_class = $1.downcase
      "\\begin{#{css_class}}\n#{$2.strip}\n\\end{#{css_class}}\n"
    end
  end
  replace(@string) do
    s /^(PART)[.:](.*?)(\n(?=\n)|\Z)/mi do
      "\\#{$1.downcase}{#{$2.strip}}\n"
    end
  end
  self
end
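
A minimal sketch of the first substitution, reduced to three keywords and written with plain `String#gsub`, assuming `s` is a global substitution. `wrap_environments` is a hypothetical name. A paragraph that starts with a keyword followed by `.` or `:` is wrapped in a LaTeX environment named after the lowercased keyword.

```ruby
# Hypothetical standalone helper; not part of lib/replace.rb.
def wrap_environments(text)
  # /m lets . span newlines; the body ends at a blank line or at end of input.
  text.gsub(/^(THEOREM|LEMMA|PROOF)[.:](.*?)(\n(?=\n)|\Z)/mi) do
    env = $1.downcase
    "\\begin{#{env}}\n#{$2.strip}\n\\end{#{env}}\n"
  end
end

puts wrap_environments("Theorem: A.\n\nProof: B.")
```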

#title ⇒ `Object`

Converts the YAML title metadata (verified, risk level: 0).

# File 'lib/replace.rb', line 334

def title
  replace(@string) do
    s /\A^-{3,}\r?\n(.*?)^-{3,}\r?\n/m do
      doc = YAML::load($1)
      "# #{doc['title']}\n\n" if doc['title']
    end
  end
  self
end
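
A minimal sketch of the same front-matter handling with plain `String#sub`. `front_matter_to_heading` is a hypothetical name, and `YAML.safe_load` is swapped in for `YAML::load` as a cautious choice for untrusted input; the source itself uses `YAML::load`.

```ruby
require 'yaml'

# Hypothetical standalone helper; not part of lib/replace.rb.
def front_matter_to_heading(text)
  text.sub(/\A-{3,}\r?\n(.*?)^-{3,}\r?\n/m) do
    doc = YAML.safe_load($1) || {}
    # Replace the whole front-matter block with a level-1 heading, if a title exists.
    doc['title'] ? "# #{doc['title']}\n\n" : ''
  end
end

puts front_matter_to_heading("---\ntitle: Hello\nauthor: Me\n---\nBody\n")
```

When the front matter has no `title` key, the block is simply removed, which matches the source: its block returns `nil` in that case and the substitution inserts an empty string.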

#tree ⇒ `Object`

Processes the output of the shell command tree (verified, risk level: 0).

# File 'lib/replace.rb', line 71

def tree
  replace(@string) do
    s /[│├]/, '|'
    s /[└]/, '\\'
    s /[─]/, '-'
  end
  self
end
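
A minimal sketch of the same character mapping with plain `String#gsub`, assuming `s` is a global substitution. `asciify_tree` is a hypothetical name; the block form is used for the backslash so that it is inserted literally.

```ruby
# Hypothetical standalone helper; not part of lib/replace.rb.
def asciify_tree(text)
  text
    .gsub(/[│├]/, '|')    # vertical bars and tees
    .gsub(/└/) { '\\' }   # corner -> a single backslash
    .gsub(/─/, '-')       # horizontal bars
end

puts asciify_tree("├── lib\n└── README.md\n")
```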

#tw2s ⇒ `Object`

Converts Taiwanese Traditional Chinese to Simplified Chinese. Requires OpenCC: brew install opencc; sudo gem install ropencc.

# File 'lib/replace.rb', line 420

def tw2s
  converter = Ropencc.open('tw2s.json')
  @string = converter.convert(@string)
  self
end
