Class: JLDrill::DataFile
- Inherits: Object
  - Object
    - JLDrill::DataFile
- Defined in: lib/jldrill/model/DataFile.rb
Overview
This class represents a data file in JLDrill. It is an abstract class that defines the interface for a file that can be read and parsed in the background.
Direct Known Subclasses
DeinflectionRulesFile, Dictionary, KanaFile, KanjiFile, Quiz, RadicalFile, Tanaka::Reference, Tatoeba::ChineseIndexFile, Tatoeba::JapaneseIndexFile, Tatoeba::LinkFile, Tatoeba::SentenceFile
Instance Attribute Summary
-
#encoding ⇒ Object
readonly
Returns the value of attribute encoding.
-
#file ⇒ Object
Returns the value of attribute file.
-
#lines ⇒ Object
Returns the value of attribute lines.
-
#parsed ⇒ Object
readonly
Returns the value of attribute parsed.
-
#publisher ⇒ Object
readonly
Returns the value of attribute publisher.
-
#stepSize ⇒ Object
Returns the value of attribute stepSize.
Instance Method Summary
-
#createLines(buffer) ⇒ Object
Make sure the encoding is correct and split the lines.
-
#dataSize ⇒ Object
Returns the number of items you have created.
-
#eof? ⇒ Boolean
Returns true if there is no more data to parse.
-
#findEncoding(buffer) ⇒ Object
Try to determine the encoding from the first 999 characters of the string.
-
#finishParsing ⇒ Object
Usually we want to delete the original source lines when we are finished parsing.
-
#fraction ⇒ Object
Returns a float between 0.0 and 1.0 giving the fraction of the file that has been parsed so far.
-
#initialize ⇒ DataFile
constructor
A new instance of DataFile.
-
#load(file) ⇒ Object
Load in the file data, but don’t parse it yet.
-
#loaded? ⇒ Boolean
Returns true if we have completed parsing a file.
-
#parse ⇒ Object
Parse the entire file all at once.
-
#parseChunk(size) ⇒ Object
Parse a chunk of the file.
-
#parseEntry ⇒ Object
Parses one entry from the lines.
-
#parser ⇒ Object
Returns a reference to the object that can parse a line.
-
#readLines ⇒ Object
Read the file into memory.
-
#reset ⇒ Object
Resets the file.
-
#setLoaded(bool) ⇒ Object
Indicate to the outside world that the file is loaded.
-
#shortFilename ⇒ Object
Returns the filename without the path.
Constructor Details
Instance Attribute Details
#encoding ⇒ Object (readonly)
Returns the value of attribute encoding.
# File 'lib/jldrill/model/DataFile.rb', line 11
def encoding
  @encoding
end
#file ⇒ Object
Returns the value of attribute file.
# File 'lib/jldrill/model/DataFile.rb', line 11
def file
  @file
end
#lines ⇒ Object
Returns the value of attribute lines.
# File 'lib/jldrill/model/DataFile.rb', line 11
def lines
  @lines
end
#parsed ⇒ Object (readonly)
Returns the value of attribute parsed.
# File 'lib/jldrill/model/DataFile.rb', line 11
def parsed
  @parsed
end
#publisher ⇒ Object (readonly)
Returns the value of attribute publisher.
# File 'lib/jldrill/model/DataFile.rb', line 11
def publisher
  @publisher
end
#stepSize ⇒ Object
Returns the value of attribute stepSize.
# File 'lib/jldrill/model/DataFile.rb', line 11
def stepSize
  @stepSize
end
Instance Method Details
#createLines(buffer) ⇒ Object
Make sure the encoding is correct and split the lines. The buffer is converted to UTF-8 if its detected encoding is anything else.
# File 'lib/jldrill/model/DataFile.rb', line 90
def createLines(buffer)
  @encoding = findEncoding(buffer)
  if (@encoding != Kconv::UTF8)
    buffer = Kconv.kconv(buffer, Kconv::UTF8, @encoding)
  end
  @lines = buffer.split("\n")
end
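The convert-then-split step can be sketched with Ruby's built-in String#encode instead of Kconv. This is only an illustration under assumptions: the helper name to_utf8_lines is made up, and the source encoding is passed in explicitly rather than guessed, which is not what createLines does.

```ruby
# Hypothetical helper, not part of JLDrill. It swaps Kconv for the
# standard String#encode and takes the source encoding as an argument.
def to_utf8_lines(buffer, source_encoding)
  # Tag the raw bytes with the encoding they are really in.
  text = buffer.dup.force_encoding(source_encoding)
  # Transcode only when the data is not already UTF-8.
  text = text.encode(Encoding::UTF_8) unless source_encoding == Encoding::UTF_8
  text.split("\n")
end

# Simulate bytes read from a Shift_JIS file.
raw = "日本語\n漢字".encode(Encoding::Shift_JIS).b
lines = to_utf8_lines(raw, Encoding::Shift_JIS)
puts lines.size
puts lines.all? { |line| line.encoding == Encoding::UTF_8 }
```

Passing the encoding in keeps the sketch deterministic; guessing it, as findEncoding does, is a separate step.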
#dataSize ⇒ Object
Returns the number of items you have created. The base implementation is empty and must be supplied by the concrete class.
# File 'lib/jldrill/model/DataFile.rb', line 23
def dataSize
  # Please implement this in the concrete class
end
#eof? ⇒ Boolean
Returns true if there is no more data to parse.
# File 'lib/jldrill/model/DataFile.rb', line 59
def eof?
  return @parsed >= @lines.size
end
#findEncoding(buffer) ⇒ Object
Try to determine the encoding from the first 999 characters of the string. By keeping it to a multiple of 3 we avoid splitting a three-byte UTF-8 character (the usual size for kana and kanji). I can’t help but think that this function is prone to failure since UTF-8 characters are variable length, but I can’t think of a better idea. Note that this problem will only manifest itself on Ruby 1.8.
# File 'lib/jldrill/model/DataFile.rb', line 84
def findEncoding(buffer)
  encoding = Kconv.guess(buffer[0..998])
  return encoding
end
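The concern about variable-length characters can be illustrated with a small sketch. It uses String#byteslice to imitate byte-based indexing of the kind Ruby 1.8 performed; the helper name utf8_prefix_valid? is made up for this example and nothing here calls Kconv.

```ruby
# Hypothetical helper: does a byte-based prefix end on a character boundary?
def utf8_prefix_valid?(text, byte_count)
  text.byteslice(0, byte_count).valid_encoding?
end

all_kana = "あ" * 400        # every character is 3 bytes in UTF-8
mixed    = "a" + all_kana    # one 1-byte character shifts the boundaries

puts utf8_prefix_valid?(all_kana, 999)  # 999 bytes = 333 whole characters
puts utf8_prefix_valid?(mixed, 999)     # the cut lands inside a character

# On Ruby 1.9 and later, String#[] indexes characters rather than bytes,
# so the same range never splits a character.
puts mixed[0..998].valid_encoding?
```

A multiple of 3 therefore only helps while every character before the cut is exactly three bytes long.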
#finishParsing ⇒ Object
Usually we want to delete the original source lines when we are finished parsing. But some files are only partially parsed on reading (like Edict). Please redefine this if you want to keep the source lines around for some reason.
# File 'lib/jldrill/model/DataFile.rb', line 160
def finishParsing
  @lines = []
  @parsed = 0
  setLoaded(true)
end
#fraction ⇒ Object
Returns a float between 0.0 and 1.0 giving the fraction of the file that has been parsed so far. Returns 0.0 when there are no lines.
# File 'lib/jldrill/model/DataFile.rb', line 70
def fraction
  retVal = 0.0
  if @lines.size != 0
    retVal = @parsed.to_f / @lines.size.to_f
  end
  return retVal
end
#load(file) ⇒ Object
Load in the file data, but don’t parse it yet.
# File 'lib/jldrill/model/DataFile.rb', line 112
def load(file)
  reset
  @file = file
  readLines
end
#loaded? ⇒ Boolean
Returns true if we have completed parsing a file, that is, when eof? is true and dataSize is greater than zero.
# File 'lib/jldrill/model/DataFile.rb', line 64
def loaded?
  return eof? && (dataSize > 0)
end
#parse ⇒ Object
Parse the entire file all at once.
# File 'lib/jldrill/model/DataFile.rb', line 119
def parse
  parseChunk(@lines.size)
end
#parseChunk(size) ⇒ Object
Parse a chunk of the file. The size argument is the number of entries to parse. Returns true once the whole file has been parsed.
# File 'lib/jldrill/model/DataFile.rb', line 133
def parseChunk(size)
  # We don't want to get updated when we parse a large
  # block of data
  @publisher.block
  last = @parsed + size
  if last > @lines.size
    last = @lines.size
  end
  while @parsed < last do
    parseEntry
  end
  @publisher.unblock

  # If the parsing is finished dispose of the unparsed lines
  finished = self.eof?
  if finished
    finishParsing
  end

  return finished
end
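A caller that wants to load in the background would presumably call parseChunk repeatedly until it returns true, reading fraction in between to report progress. The sketch below shows that loop against a stand-in class. ChunkedLines, parse_chunk and load_in_steps are hypothetical names, and the stand-in has no publisher and does no file I/O.

```ruby
# Stand-in for a DataFile-like object. This is NOT the JLDrill class.
class ChunkedLines
  attr_reader :parsed, :items

  def initialize(lines)
    @lines = lines
    @parsed = 0
    @items = []
  end

  def eof?
    @parsed >= @lines.size
  end

  def fraction
    return 0.0 if @lines.empty?
    @parsed.to_f / @lines.size
  end

  # Parse up to `size` entries; return true once everything is parsed.
  def parse_chunk(size)
    last = [@parsed + size, @lines.size].min
    while @parsed < last
      @items << @lines[@parsed].strip
      @parsed += 1
    end
    eof?
  end
end

# Drive the parse in steps, recording progress after each chunk.
def load_in_steps(file, step_size)
  progress = []
  loop do
    finished = file.parse_chunk(step_size)
    progress << file.fraction
    break if finished
  end
  progress
end

file = ChunkedLines.new(["a", "b", "c", "d", "e"])
puts load_in_steps(file, 2).inspect
```

One difference is worth noting. In the listing above, finishParsing empties @lines and resets @parsed, so unless a subclass redefines it, fraction on the real class would read 0.0 again after the final chunk. The stand-in keeps its lines, so its last progress value is 1.0.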
#parseEntry ⇒ Object
Parses one entry from the lines. The default parses a single line from the lines. You can override this for files whose entries span more than one line.
# File 'lib/jldrill/model/DataFile.rb', line 126
def parseEntry
  parser.parse(@lines[@parsed])
  @parsed += 1
end
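An override for entries that span several lines might look like the following sketch. The format (entries separated by blank lines) and the names BlankSeparatedFile and parse_entry are assumptions made for illustration; none of the known subclasses is claimed to work this way.

```ruby
# Stand-in class with a hypothetical multi-line entry format:
# each entry is a run of non-blank lines ended by a blank line.
class BlankSeparatedFile
  attr_reader :parsed, :entries

  def initialize(lines)
    @lines = lines
    @parsed = 0
    @entries = []
  end

  def eof?
    @parsed >= @lines.size
  end

  # Consume one whole entry and advance @parsed past every line it used.
  def parse_entry
    entry = []
    while !eof? && !@lines[@parsed].strip.empty?
      entry << @lines[@parsed]
      @parsed += 1
    end
    @parsed += 1 unless eof?            # skip the blank separator
    @entries << entry unless entry.empty?
  end
end

file = BlankSeparatedFile.new(["Q: neko", "A: cat", "", "Q: inu", "A: dog"])
file.parse_entry until file.eof?
puts file.entries.size
```

The key point is that the override must advance @parsed by however many lines it consumed. Judging from the parseChunk listing, the loop bound is computed from @parsed, which counts lines, so with multi-line entries a chunk of size n appears to cover about n lines rather than n entries.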
#parser ⇒ Object
Returns a reference to the object that can parse a line.
# File 'lib/jldrill/model/DataFile.rb', line 28
def parser
  # Please implement this in the concrete class unless
  # you modify the parseEntry() method to directly access
  # the parser.
end
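To make the contract concrete, here is a minimal sketch of a concrete class that supplies parser and dataSize. SketchDataFile is a cut-down stand-in for the base class with no publisher and no file reading. WordParser, WordListFile and the tab-separated format are all hypothetical.

```ruby
# Reduced stand-in for the base class. This is NOT JLDrill::DataFile.
class SketchDataFile
  def initialize
    @lines = []
    @parsed = 0
  end

  def lines=(lines)
    @lines = lines
    @parsed = 0
  end

  def eof?
    @parsed >= @lines.size
  end

  def loaded?
    eof? && (dataSize > 0)
  end

  def parseEntry
    parser.parse(@lines[@parsed])
    @parsed += 1
  end

  def parse
    parseEntry until eof?
  end
end

# Hypothetical line parser: one "word<TAB>meaning" pair per line.
class WordParser
  attr_reader :words

  def initialize
    @words = {}
  end

  def parse(line)
    word, meaning = line.split("\t", 2)
    @words[word] = meaning unless meaning.nil?   # ignore malformed lines
  end
end

# The concrete class only has to provide parser and dataSize.
class WordListFile < SketchDataFile
  def initialize
    super
    @parser = WordParser.new
  end

  def parser
    @parser
  end

  def dataSize
    @parser.words.size
  end
end

file = WordListFile.new
file.lines = ["neko\tcat", "inu\tdog", "malformed"]
file.parse
puts file.dataSize
puts file.loaded?
```

Because loaded? also requires dataSize to be greater than zero, a file with no usable entries never reports itself as loaded, even though eof? is true.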
#readLines ⇒ Object
Read the file into memory. This is done before parsing. If the file cannot be read, a warning is logged and the buffer is treated as empty.
# File 'lib/jldrill/model/DataFile.rb', line 99
def readLines
  begin
    buffer = IO.read(@file)
  rescue
    Context::Log::warning("JLDrill::DataFile",
                          "Could not load #{@file}.")
    buffer = ""
  end
  createLines(buffer)
  @parsed = 0
end
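The read-or-fall-back pattern can be sketched on its own. The helper name read_or_empty is made up, and warn stands in for Context::Log::warning. The sketch also narrows the rescue to SystemCallError; that is a variation on the listing, which uses a bare rescue.

```ruby
# Hypothetical helper, not part of JLDrill: read a file, or fall back to "".
def read_or_empty(path)
  File.read(path)
rescue SystemCallError => e
  # Covers Errno::ENOENT, Errno::EACCES and similar OS-level failures.
  warn "Could not load #{path}: #{e.message}"
  ""
end

buffer = read_or_empty("/no/such/dir/no-such-file.txt")
puts buffer.empty?
```

Returning an empty string lets the later steps run unchanged: splitting an empty buffer yields no lines, so eof? is true immediately.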
#reset ⇒ Object
Resets the file.
# File 'lib/jldrill/model/DataFile.rb', line 35
def reset
  @file = ""
  @lines = []
  @parsed = 0
  setLoaded(false)
  # Please define the rest of the method and call super()
  # at the end.
end
#setLoaded(bool) ⇒ Object
Indicate to the outside world that the file is loaded. The publisher is notified only when bool is true.
# File 'lib/jldrill/model/DataFile.rb', line 52
def setLoaded(bool)
  if bool
    @publisher.update("loaded")
  end
end
#shortFilename ⇒ Object
Returns the filename without the path, or "No name" if no file has been set.
# File 'lib/jldrill/model/DataFile.rb', line 167
def shortFilename
  if @file.nil? || @file.empty?
    return "No name"
  end
  return File.basename(file)
end
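The same logic as a standalone function, for reference. The name short_filename and the example path are made up for illustration.

```ruby
# Hypothetical standalone version of the method above.
def short_filename(path)
  return "No name" if path.nil? || path.empty?
  File.basename(path)
end

puts short_filename("/tmp/example/words.txt")
puts short_filename("")
```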