Class: CBETA

Inherits:
Object
Defined in:
lib/cbeta.rb

Defined Under Namespace

Classes: BMToText, Canon, CharCount, CharFrequency, Gaiji, HTMLToPDF, HTMLToText, P5aToHTML, P5aToHTMLForEveryEdition, P5aToHTMLForPDF, P5aToSimpleHTML, P5aToText, P5aValidator, UnicodeService, XMLDocument

Constant Summary

CANON =
'CC|DA|GA|GB|LC|TX|ZS|ZW|[A-Z]'
SORT_ORDER =
%w(T X A K S F C D U P J L G M N ZS I ZW B GA GB Y LC TX CC)
VOL3 =
%w[A CC C G GA GB L M P U]
DATA =
File.join(File.dirname(__FILE__), 'data')
PUNCS =
',.()[] 。‧.,、;?!:︰/()「」『』《》<>〈〉〔〕[]【】〖〗〃…—─ ~│┬▆△*+-='

Constructor Details

#initialize ⇒ CBETA

Loads the canon data.



# File 'lib/cbeta.rb', line 212

def initialize()
  @canon_abbr = {}
  @canon_nickname = {}
  fn = File.join(File.dirname(__FILE__), 'data/canons.csv')
  CSV.foreach(fn, :headers => true, encoding: 'utf-8') do |row|
    id = row['id']
    unless row['nickname'].nil?
      @canon_nickname[id] = row['nickname']
    end
    next if row['abbreviation'].nil?
    next if row['abbreviation'].empty?
    @canon_abbr[id] = row['abbreviation']
  end
  
  fn = File.join(File.dirname(__FILE__), 'data/categories.json')
  s = File.read(fn)
  @categories = JSON.parse(s)
end

Class Method Details

.get_canon_from_vol(vol) ⇒ String

Gets the canon ID from a volume number.

Parameters:

  • vol (String)

    Volume number, e.g. “T01” or “GA009”

Returns:

  • (String)

    Canon ID, e.g. “T” or “GA”



# File 'lib/cbeta.rb', line 32

def self.get_canon_from_vol(vol)
  vol.sub(/^(#{CANON}).*$/, '\1')
end
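
The prefix extraction used here (and by the two canon-ID helpers documented next) can be tried in isolation. The following is a minimal standalone sketch, not the library itself: `CANON` is copied from the constant summary, while the module name `CanonPrefixSketch` and the method name `canon_prefix` are hypothetical.

```ruby
# Standalone sketch; CanonPrefixSketch and canon_prefix are hypothetical names.
module CanonPrefixSketch
  # Copied from the constant summary. Two-letter IDs come before [A-Z]
  # so that "GA009" yields "GA" rather than "G".
  CANON = 'CC|DA|GA|GB|LC|TX|ZS|ZW|[A-Z]'

  # Keep only the leading canon ID; everything after it is dropped.
  def self.canon_prefix(s)
    s.sub(/^(#{CANON}).*$/, '\1')
  end
end

p CanonPrefixSketch.canon_prefix('T01')                # "T"
p CanonPrefixSketch.canon_prefix('GA009')              # "GA"
p CanonPrefixSketch.canon_prefix('ZW0001')             # "ZW"
p CanonPrefixSketch.canon_prefix('T01n0001_p0001a01')  # "T"
p CanonPrefixSketch.canon_prefix('123')                # "123" (no match, input returned unchanged)
```

Because `String#sub` returns its receiver unchanged when the pattern does not match, input that does not start with a canon ID comes back as-is rather than as nil.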

.get_canon_id_from_linehead(linehead) ⇒ String

Gets the canon ID from a linehead (line-head reference).

Parameters:

  • linehead (String)

    Linehead, e.g. “T01n0001_p0001a01” or “GA009n0008_p0003a01”

Returns:

  • (String)

    Canon ID, e.g. “T” or “GA”



# File 'lib/cbeta.rb', line 18

def self.get_canon_id_from_linehead(linehead)
  linehead.sub(/^(#{CANON}).*$/, '\1')
end

.get_canon_id_from_work_id(work) ⇒ String

Gets the canon ID from a work ID.

Parameters:

  • work (String)

    Work ID, e.g. “T0001” or “ZW0001”

Returns:

  • (String)

    Canon ID, e.g. “T” or “ZW”



# File 'lib/cbeta.rb', line 25

def self.get_canon_id_from_work_id(work)
  work.sub(/^(#{CANON}).*$/, '\1')
end

.get_linehead(file_basename, lb) ⇒ String

Returns the CBETA linehead, e.g. “T01n0001_p0001a01” or “T25n1510ap0757b29”.

Parameters:

  • file_basename (String)

    XML file basename, e.g. “T01n0001” or “T25n1510a”

  • lb (String)

    e.g. “0001a01” or “0757b29”

Returns:

  • (String)

    CBETA linehead, e.g. “T01n0001_p0001a01” or “T25n1510ap0757b29”



# File 'lib/cbeta.rb', line 39

def self.get_linehead(file_basename, lb)
  if file_basename.match(/^(T\d\dn0220)/)
    r = $1
  else
    r = file_basename
  end
  r += '_' if r.match(/\d$/)
  r += 'p' + lb
  r
end
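
The underscore rule is easy to miss: an underscore is inserted only when the basename ends in a digit, and any T0220 basename is first cut back to its `TnnN0220` prefix. A minimal standalone sketch that mirrors the method body above; the module name `LineheadSketch` is hypothetical, and the input “T05n0220a” is an illustrative basename assumed here, not taken from the documentation.

```ruby
# Standalone sketch; LineheadSketch is a hypothetical name.
module LineheadSketch
  def self.get_linehead(file_basename, lb)
    # Any T0220 basename is reduced to its "TnnN0220" prefix.
    m = file_basename.match(/^(T\d\dn0220)/)
    r = m ? m[1] : file_basename
    # An underscore separates the work number from "p" only when the
    # work number ends in a digit (i.e. has no letter suffix).
    r += '_' if r.match?(/\d$/)
    "#{r}p#{lb}"
  end
end

p LineheadSketch.get_linehead('T01n0001', '0001a01')   # "T01n0001_p0001a01"
p LineheadSketch.get_linehead('T25n1510a', '0757b29')  # "T25n1510ap0757b29"
p LineheadSketch.get_linehead('T05n0220a', '0001a01')  # "T05n0220_p0001a01"
```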

.get_sort_order_from_canon_id(canon) ⇒ String

Gets the sort-order code from a canon ID. For example, “T” returns “A” and “X” returns “B”.

Parameters:

  • canon (String)

    Canon ID

Returns:

  • (String)

    Sort-order code, or nil if the canon ID is unknown



# File 'lib/cbeta.rb', line 138

def self.get_sort_order_from_canon_id(canon)
  # Full-text search ordering table provided by CBETA, as finalized by Venerable Huimin, 2016-06-03
  i = SORT_ORDER.index(canon)
  if i.nil?
    puts "unknown canon id: #{canon}" 
    return nil
  end
  
  (i + 'A'.ord).chr
end
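
Because `SORT_ORDER` has 25 entries, every canon maps to a single letter from “A” to “Y”, so the codes can be used directly as sort keys. A minimal standalone sketch under that reading; `SORT_ORDER` is copied from the constant summary, and the names `SortOrderSketch` and `sort_code` are hypothetical.

```ruby
# Standalone sketch; SortOrderSketch and sort_code are hypothetical names.
module SortOrderSketch
  # Copied from the constant summary.
  SORT_ORDER = %w(T X A K S F C D U P J L G M N ZS I ZW B GA GB Y LC TX CC)

  def self.sort_code(canon)
    i = SORT_ORDER.index(canon)
    return nil if i.nil? # the documented method also prints a message here
    (i + 'A'.ord).chr    # position 0 -> "A", position 1 -> "B", ...
  end
end

p SortOrderSketch.sort_code('T')   # "A"
p SortOrderSketch.sort_code('X')   # "B"
p SortOrderSketch.sort_code('CC')  # "Y"
p SortOrderSketch.sort_code('??')  # nil

# Ordering a list of canon IDs by their sort code.
p %w(GA X T ZW).sort_by { |id| SortOrderSketch.sort_code(id) }  # ["T", "X", "ZW", "GA"]
```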

.get_work_id_from_file_basename(fn) ⇒ String

Gets the work ID from an XML file basename.

Parameters:

  • fn (String)

    File name, e.g. “T01n0001” or “GA009n0008”

Returns:

  • (String)

    Work ID, e.g. “T0001” or “GA0008”



# File 'lib/cbeta.rb', line 129

def self.get_work_id_from_file_basename(fn)
  r = fn.sub(/^(#{CANON})\d{2,3}n(.*)$/, '\1\2')
  r = 'T0220' if r.start_with? 'T0220'
  r
end
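
The substitution drops the two- or three-digit volume number and the `n` separator, and every T0220 file collapses to the single work ID “T0220”. A minimal standalone sketch mirroring the method body above; `WorkIdSketch` and `work_id` are hypothetical names, and “T05n0220a” is an illustrative input assumed here.

```ruby
# Standalone sketch; WorkIdSketch and work_id are hypothetical names.
module WorkIdSketch
  CANON = 'CC|DA|GA|GB|LC|TX|ZS|ZW|[A-Z]' # copied from the constant summary

  def self.work_id(fn)
    # "<canon><2-3 digit volume>n<work number>" -> "<canon><work number>"
    r = fn.sub(/^(#{CANON})\d{2,3}n(.*)$/, '\1\2')
    # Any letter suffix on T0220 is discarded.
    r.start_with?('T0220') ? 'T0220' : r
  end
end

p WorkIdSketch.work_id('T01n0001')    # "T0001"
p WorkIdSketch.work_id('GA009n0008')  # "GA0008"
p WorkIdSketch.work_id('T25n1510a')   # "T1510a"
p WorkIdSketch.work_id('T05n0220a')   # "T0220"
```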

.get_xml_file_from_vol_and_work(vol, work) ⇒ String

Gets the XML file basename from a volume number and a work ID.

Parameters:

  • vol (String)

    Volume number, e.g. “T01” or “GA009”

  • work (String)

    Work ID, e.g. “T0001” or “GA0008”

Returns:

  • (String)

    XML file basename, e.g. “T01n0001” or “GA009n0008”



# File 'lib/cbeta.rb', line 54

def self.get_xml_file_from_vol_and_work(vol, work)
  vol + 'n' + work.sub(/^(#{CANON})(.*)$/, '\2')
end
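
This method is the inverse of `get_work_id_from_file_basename` for ordinary works: the canon prefix is stripped from the work ID and the remainder is appended to the volume number after an `n`. A minimal standalone sketch of that round trip; the module and method names are hypothetical, and the two method bodies are simplified copies of the documented ones (the T0220 special case is left out).

```ruby
# Standalone sketch; XmlBasenameSketch and its method names are hypothetical.
module XmlBasenameSketch
  CANON = 'CC|DA|GA|GB|LC|TX|ZS|ZW|[A-Z]' # copied from the constant summary

  # Volume number + "n" + work number (work ID minus its canon prefix).
  def self.xml_basename(vol, work)
    vol + 'n' + work.sub(/^(#{CANON})(.*)$/, '\2')
  end

  # Reverse direction, without the T0220 special case.
  def self.work_id(fn)
    fn.sub(/^(#{CANON})\d{2,3}n(.*)$/, '\1\2')
  end
end

p XmlBasenameSketch.xml_basename('T01', 'T0001')     # "T01n0001"
p XmlBasenameSketch.xml_basename('GA009', 'GA0008')  # "GA009n0008"

base = XmlBasenameSketch.xml_basename('GA009', 'GA0008')
p XmlBasenameSketch.work_id(base)                    # "GA0008"
```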

.juan_across_vol(vol, work, juan = nil) ⇒ Numeric

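Identifies which half of a juan (fascicle) that spans two volumes a given volume holds.

Returns:

  • (Numeric)

    1: first half of a juan that spans volumes; 2: second half; nil otherwise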



# File 'lib/cbeta.rb', line 76

def self.juan_across_vol(vol, work, juan=nil)
  case work
  when 'GA0037'
    case vol
    when 'GA036'
      1 if juan == 2
    when 'GA037'
      2 if juan.nil? or juan == 2
    end
  when 'L1557'
    case vol
    when 'L130'
      1 if juan == 17 # first half of the juan
    when 'L131'
      case juan
      when 17, nil then 2
      when 34      then 1
      end
    when 'L132'
      case juan
      when 34, nil then 2
      when 51      then 1
      end
    when 'L133'
      2 if juan.nil? or juan == 51
    end
  when 'X0714'
    case vol
    when 'X39'
      1 if juan == 3
    when 'X40'
      2 if juan.nil? or juan == 3
    end
  end
end

.linehead_to_s(linehead) ⇒ String

Converts a linehead into citation format.

Examples:

CBETA.linehead_to_s('T85n2838_p1291a03')
# returns "T85, no. 2838, p. 1291, a03"

Parameters:

  • linehead (String)

    Linehead, e.g. T85n2838_p1291a03

Returns:

  • (String)

    Source reference in citation format, e.g. T85, no. 2838, p. 1291, a03



# File 'lib/cbeta.rb', line 157

def self.linehead_to_s(linehead)
  linehead.match(/^((?:#{CANON})\d+)n(.*)_p(\d+)([a-z]\d+)$/) {
    return "#{$1}, no. #{$2}, p. #{$3}, #{$4}"
  }
  nil
end
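
The pattern requires a literal `_p`, so a linehead whose work number carries a letter suffix (and therefore has no underscore) does not match and yields nil. A minimal standalone sketch that mirrors the regular expression above; `CitationSketch` is a hypothetical module name.

```ruby
# Standalone sketch; CitationSketch is a hypothetical name.
module CitationSketch
  CANON = 'CC|DA|GA|GB|LC|TX|ZS|ZW|[A-Z]' # copied from the constant summary

  def self.linehead_to_s(linehead)
    # Groups: volume, work number, page, column letter + line number.
    m = linehead.match(/^((?:#{CANON})\d+)n(.*)_p(\d+)([a-z]\d+)$/)
    return nil unless m
    "#{m[1]}, no. #{m[2]}, p. #{m[3]}, #{m[4]}"
  end
end

p CitationSketch.linehead_to_s('T85n2838_p1291a03')    # "T85, no. 2838, p. 1291, a03"
p CitationSketch.linehead_to_s('GA009n0008_p0003a01')  # "GA009, no. 0008, p. 0003, a01"
p CitationSketch.linehead_to_s('T25n1510ap0757b29')    # nil (no "_p" in the input)
```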

.linehead_to_xml_file_path(linehead) ⇒ String

Gets the relative path of the XML file from a linehead.

Parameters:

  • linehead (String)

    Linehead, e.g. “GA009n0008_p0003a01” or “J36nB348_p0284c01”

Returns:

  • (String)

    Relative path of the XML file, e.g. “GA/GA009/GA009n0008.xml”



# File 'lib/cbeta.rb', line 116

def self.linehead_to_xml_file_path(linehead)
  # Work number: four digits + a letter, or (as in the Jiaxing Canon) a letter + three digits
  w = '(?:\d+[a-zA-Z]?|[AB]\d{3})'
  if m = linehead.match(/^(?<work>(?<vol>(?<canon>#{CANON})\d+)n#{w}).*$/)
    File.join(m[:canon], m[:vol], m[:work]+'.xml')
  else
    nil
  end
end
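
The named groups nest: `canon` sits inside `vol`, which sits inside `work`, so one match supplies all three path components. A minimal standalone sketch that mirrors the method body above; `XmlPathSketch`, `WORK` and `xml_path` are hypothetical names.

```ruby
# Standalone sketch; XmlPathSketch, WORK and xml_path are hypothetical names.
module XmlPathSketch
  CANON = 'CC|DA|GA|GB|LC|TX|ZS|ZW|[A-Z]' # copied from the constant summary
  # Work number: digits with an optional letter, or A/B plus three digits.
  WORK = '(?:\d+[a-zA-Z]?|[AB]\d{3})'

  def self.xml_path(linehead)
    m = linehead.match(/^(?<work>(?<vol>(?<canon>#{CANON})\d+)n#{WORK}).*$/)
    # canon directory / volume directory / file basename + ".xml"
    m && File.join(m[:canon], m[:vol], m[:work] + '.xml')
  end
end

p XmlPathSketch.xml_path('GA009n0008_p0003a01')  # "GA/GA009/GA009n0008.xml"
p XmlPathSketch.xml_path('J36nB348_p0284c01')    # "J/J36/J36nB348.xml"
p XmlPathSketch.xml_path('T25n1510ap0757b29')    # "T/T25/T25n1510a.xml"
p XmlPathSketch.xml_path('not a linehead')       # nil
```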

.normalize_vol(vol) ⇒ Object



# File 'lib/cbeta.rb', line 164

def self.normalize_vol(vol)
  if vol.match(/^(#{CANON})(.*)$/)
    canon = $1
    vol = $2
  
    if VOL3.include? canon
      # volume numbers in these canons have three digits
      vol_len = 3
    else
      vol_len = 2      
    end
    canon + vol.rjust(vol_len, '0')
  else
    abort "unknown vol format: #{vol}"
  end
end
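
Judging from the source above, `normalize_vol` zero-pads the numeric part of a volume number to three digits for the canons listed in `VOL3` and to two digits for all others. A minimal standalone sketch of that behaviour; `NormalizeVolSketch` is a hypothetical name, and the sketch raises `ArgumentError` where the documented method calls `abort`, so that the failure case can be rescued.

```ruby
# Standalone sketch; NormalizeVolSketch is a hypothetical name.
module NormalizeVolSketch
  CANON = 'CC|DA|GA|GB|LC|TX|ZS|ZW|[A-Z]' # copied from the constant summary
  VOL3  = %w[A CC C G GA GB L M P U]      # canons with three-digit volume numbers

  def self.normalize_vol(vol)
    m = vol.match(/^(#{CANON})(.*)$/)
    # Assumption: raise instead of abort, unlike the documented method.
    raise ArgumentError, "unknown vol format: #{vol}" unless m
    canon, num = m[1], m[2]
    width = VOL3.include?(canon) ? 3 : 2
    canon + num.rjust(width, '0')
  end
end

p NormalizeVolSketch.normalize_vol('T1')    # "T01"
p NormalizeVolSketch.normalize_vol('GA9')   # "GA009"
p NormalizeVolSketch.normalize_vol('L130')  # "L130"
p NormalizeVolSketch.normalize_vol('X39')   # "X39"
```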

.open_xml(fn) ⇒ Object



# File 'lib/cbeta.rb', line 181

def self.open_xml(fn)
  s = File.read(fn)
  doc = Nokogiri::XML(s)
  doc.remove_namespaces!()
  doc
end

.pua(gid) ⇒ Object

Takes a gaiji (missing-character) code and returns the Unicode PUA character.



# File 'lib/cbeta.rb', line 189

def self.pua(gid)
  if gid.start_with? 'SD'
    siddham_pua(gid)
  elsif gid.start_with? 'RJ'
    ranjana_pua(gid)
  else
    [0xf0000 + gid[2..-1].to_i].pack 'U'
  end
end

.ranjana_pua(gid) ⇒ Object

Takes a Ranjana-script gaiji code and returns the Unicode PUA character.



# File 'lib/cbeta.rb', line 200

def self.ranjana_pua(gid)
  i = 0x100000 + gid[-4..-1].hex
  [i].pack("U")
end

.siddham_pua(gid) ⇒ Object

Takes a Siddham-script gaiji code and returns the Unicode PUA character.



# File 'lib/cbeta.rb', line 206

def self.siddham_pua(gid)
  i = 0xFA000 + gid[-4..-1].hex
  [i].pack("U")
end
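
The three PUA helpers differ only in their base code point and in how the numeric part of the ID is read: decimal after the first two characters for ordinary codes, hexadecimal from the last four characters for Siddham and Ranjana codes. A minimal standalone sketch combining them; `PuaSketch` is a hypothetical name, and the IDs “CB01234”, “SD-A440” and “RJ-C947” are illustrative inputs assumed here, not taken from the documentation.

```ruby
# Standalone sketch; PuaSketch is a hypothetical name.
module PuaSketch
  def self.pua(gid)
    if gid.start_with?('SD')
      # Siddham: last four characters read as hexadecimal.
      [0xFA000 + gid[-4..-1].hex].pack('U')
    elsif gid.start_with?('RJ')
      # Ranjana: last four characters read as hexadecimal.
      [0x100000 + gid[-4..-1].hex].pack('U')
    else
      # Everything after the two-letter prefix read as decimal.
      [0xF0000 + gid[2..-1].to_i].pack('U')
    end
  end
end

printf("U+%X\n", PuaSketch.pua('CB01234').ord)  # U+F04D2
printf("U+%X\n", PuaSketch.pua('SD-A440').ord)  # U+104440
printf("U+%X\n", PuaSketch.pua('RJ-C947').ord)  # U+10C947
```

The resulting code points fall in the supplementary private use planes (15 and 16), so the returned strings hold a single character that needs a font with matching private-use glyphs to display.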

.work_juan_vol_range(work, juan) ⇒ Object

Returns the range of volume numbers if the juan spans volumes.



# File 'lib/cbeta.rb', line 59

def self.work_juan_vol_range(work, juan)
  case work
  when 'GA0037'
    (36..37) if juan == 2
  when 'L1557'
    case juan
    when 17 then (130..131)
    when 34 then (131..132)
    when 51 then (132..133)
    end
  when 'X0714'
    (39..40) if juan == 3
  end
end
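
The same lookup can be expressed as data. The following is a table-driven alternative sketched for illustration, not the library's implementation: the entries are copied from the `case` statement above, and the names `JuanRangeSketch`, `ACROSS` and `vol_range` are hypothetical.

```ruby
# Standalone sketch; JuanRangeSketch, ACROSS and vol_range are hypothetical names.
module JuanRangeSketch
  # work ID => { juan number => range of volume numbers }
  ACROSS = {
    'GA0037' => { 2 => (36..37) },
    'L1557'  => { 17 => (130..131), 34 => (131..132), 51 => (132..133) },
    'X0714'  => { 3 => (39..40) }
  }.freeze

  # Returns nil when the work or the juan is not in the table.
  def self.vol_range(work, juan)
    ACROSS.dig(work, juan)
  end
end

p JuanRangeSketch.vol_range('L1557', 34)  # 131..132
p JuanRangeSketch.vol_range('X0714', 3)   # 39..40
p JuanRangeSketch.vol_range('T0001', 1)   # nil

# Turning the numeric range back into volume names for canon L.
p JuanRangeSketch.vol_range('L1557', 17).map { |n| format('L%03d', n) }  # ["L130", "L131"]
```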

Instance Method Details

#get_canon_abbr(id) ⇒ String

Gets the canon abbreviation.

Examples:

cbeta = CBETA.new
cbeta.get_canon_abbr('T') # returns "大"

Parameters:

  • id (String)

    Canon ID, e.g. the ID of the Taishō canon (大正藏) is “T”

Returns:

  • (String)

    Short canon name, e.g. “大”



# File 'lib/cbeta.rb', line 259

def get_canon_abbr(id)
  r = get_canon_symbol(id)
  return nil if r.nil?
  r.sub(/^【(.*?)】$/, '\1')
end
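
The abbreviation is simply the canon symbol with its surrounding 【 】 brackets removed. A minimal standalone sketch of that relationship; the `SYMBOLS` hash is a hypothetical stand-in for the table loaded from data/canons.csv, and its single entry is taken from the documented examples.

```ruby
# Standalone sketch; CanonAbbrSketch and SYMBOLS are hypothetical names.
module CanonAbbrSketch
  # Stand-in for the abbreviation column of data/canons.csv.
  SYMBOLS = { 'T' => '【大】' }.freeze

  def self.symbol(id)
    SYMBOLS[id]
  end

  def self.abbr(id)
    r = symbol(id)
    return nil if r.nil?
    # Strip the enclosing brackets, keeping what is between them.
    r.sub(/^【(.*?)】$/, '\1')
  end
end

p CanonAbbrSketch.symbol('T')  # "【大】"
p CanonAbbrSketch.abbr('T')    # "大"
p CanonAbbrSketch.abbr('Q')    # nil
```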

#get_canon_nickname(id) ⇒ String

Returns the canon nickname, e.g. “大正藏”.

Parameters:

  • id (String)

    Canon ID, e.g. the ID of the Taishō canon (大正藏) is “T”

Returns:

  • (String)

    Canon nickname, e.g. “大正藏”



# File 'lib/cbeta.rb', line 233

def get_canon_nickname(id)
  return nil unless @canon_nickname.key? id
  @canon_nickname[id]
end

#get_canon_symbol(id) ⇒ String

Gets the canon symbol.

Examples:

cbeta = CBETA.new
cbeta.get_canon_symbol('T') # returns "【大】"

Parameters:

  • id (String)

    Canon ID, e.g. the ID of the Taishō canon (大正藏) is “T”

Returns:

  • (String)

    Canon symbol, e.g. “【大】”



# File 'lib/cbeta.rb', line 246

def get_canon_symbol(id)
  return nil unless @canon_abbr.key? id
  @canon_abbr[id]
end

#get_category(book_id) ⇒ String

Gets the category for a given work number.

Examples:

cbeta = CBETA.new
cbeta.get_category('T0220') # returns '般若部類'

Parameters:

  • book_id (String)

    Book ID (work number), e.g. “T0220”

Returns:

  • (String)

    Category name, e.g. “阿含部類”



# File 'lib/cbeta.rb', line 272

def get_category(book_id)
  @categories[book_id]
end