Module: CRFPP
- Defined in:
- lib/crfpp/data.rb,
lib/crfpp/macro.rb,
lib/crfpp/model.rb,
lib/crfpp/token.rb,
lib/crfpp/errors.rb,
lib/crfpp/feature.rb,
lib/crfpp/version.rb,
lib/crfpp/filelike.rb,
lib/crfpp/template.rb,
lib/crfpp/utilities.rb
Defined Under Namespace
Modules: Filelike
Classes: Data, Error, Feature, Macro, Model, NativeError, Template, Token
Constant Summary
- VERSION = '0.0.4'.freeze
Class Method Summary
- .learn(template, data, options = {}) ⇒ Object
  Creates a new Model based on a template and training data.
- .train ⇒ Object
  Creates a new Model based on a template and training data.
Class Method Details
.learn(template, data, options = {}) ⇒ Object
Creates a new Model based on a template and training data.
:threads: False or the number of threads to use (default is 2).
:algorithm: L1 or L2 (default).
:cost: Sets the hyper-parameter C for the CRF (default is 1.0). With a larger
C value, the CRF tends to overfit to the given training corpus. This
parameter trades off overfitting against underfitting, and the results
are significantly influenced by it. You can find an optimal value by using
held-out data or a more general model selection method such as
cross-validation.
:frequency: Sets the cut-off threshold for features. CRF++ uses only the
features that occur no fewer than NUM times in the given training data.
The default value is 1. When you apply CRF++ to large data, the number of
unique features can amount to several million; this option is useful in
such cases.
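To make the effect of the :frequency cut-off concrete, here is a minimal sketch in plain Ruby. It is not CRF++'s implementation; the helper name `frequent_features` and the feature strings are hypothetical and only illustrate the idea of dropping features seen fewer than a threshold number of times.

```ruby
# Hypothetical helper, not part of the crfpp gem: keep only the features
# that occur at least `threshold` times in the observed feature strings.
def frequent_features(features, threshold = 1)
  counts = Hash.new(0)
  features.each { |feature| counts[feature] += 1 }
  counts.select { |_feature, count| count >= threshold }.keys
end

# Made-up feature strings for illustration only.
observed = %w[U00:the U00:cat U00:the U01:sat U00:the U01:sat]

kept = frequent_features(observed, 2)
# "U00:cat" occurs once and is dropped; the other two features are kept.
```

With a threshold of 1 (the default described above) every observed feature is kept, so raising the threshold is what shrinks the feature set.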
# File 'lib/crfpp/utilities.rb', line 23
def learn(template, data, options = {})
  options = { :threads => 2, :algorithm => :L2, :cost => 1.0, :frequency => 1}.merge(options)
  model = Model.new
  arguments = []
  # TODO check algorithm names
  # arguments << "--algorithm=#{options[:algorithm]}"
  arguments << "--cost=#{options[:cost]}"
  arguments << "--thread=#{options[:threads]}"
  arguments << "--freq=#{options[:frequency]}"
  arguments << (template.respond_to?(:path) ? template.path : template)
  arguments << (data.respond_to?(:path) ? data.path : data)
  arguments << model.path
  Native.learn(arguments.join(' '))
  model
rescue => error
  raise NativeError, error.message
end
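The way the options become a single argument string can be checked without the native library. The sketch below is a stand-alone mirror of the argument handling in the source above; `learn_arguments`, `LEARN_DEFAULTS`, and the file names are hypothetical, and the model path is passed in explicitly because no Model object is created here. As in the source, the :algorithm option is merged but not forwarded.

```ruby
# Hypothetical stand-alone mirror of the argument handling shown above.
# It only builds the string and never calls the native library.
LEARN_DEFAULTS = { :threads => 2, :algorithm => :L2, :cost => 1.0, :frequency => 1 }.freeze

def learn_arguments(template, data, model_path, options = {})
  options = LEARN_DEFAULTS.merge(options)
  arguments = []
  arguments << "--cost=#{options[:cost]}"
  arguments << "--thread=#{options[:threads]}"
  arguments << "--freq=#{options[:frequency]}"
  # Accept either path strings or file-like objects that respond to #path.
  arguments << (template.respond_to?(:path) ? template.path : template)
  arguments << (data.respond_to?(:path) ? data.path : data)
  arguments << model_path
  arguments.join(' ')
end

# File names are placeholders for illustration.
puts learn_arguments('template.txt', 'train.data', 'model.bin', :cost => 4.0)
# prints: --cost=4.0 --thread=2 --freq=1 template.txt train.data model.bin
```

Because the pieces are joined with a single space, a path containing spaces would presumably be split into several arguments, so paths without spaces are the safer choice.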
.train ⇒ Object
Creates a new Model based on a template and training data.
:threads: False or the number of threads to use (default is 2).
:algorithm: L1 or L2 (default).
:cost: Sets the hyper-parameter C for the CRF (default is 1.0). With a larger
C value, the CRF tends to overfit to the given training corpus. This
parameter trades off overfitting against underfitting, and the results
are significantly influenced by it. You can find an optimal value by using
held-out data or a more general model selection method such as
cross-validation.
:frequency: Sets the cut-off threshold for features. CRF++ uses only the
features that occur no fewer than NUM times in the given training data.
The default value is 1. When you apply CRF++ to large data, the number of
unique features can amount to several million; this option is useful in
such cases.
# File 'lib/crfpp/utilities.rb', line 47
def learn(template, data, options = {})
  options = { :threads => 2, :algorithm => :L2, :cost => 1.0, :frequency => 1}.merge(options)
  model = Model.new
  arguments = []
  # TODO check algorithm names
  # arguments << "--algorithm=#{options[:algorithm]}"
  arguments << "--cost=#{options[:cost]}"
  arguments << "--thread=#{options[:threads]}"
  arguments << "--freq=#{options[:frequency]}"
  arguments << (template.respond_to?(:path) ? template.path : template)
  arguments << (data.respond_to?(:path) ? data.path : data)
  arguments << model.path
  Native.learn(arguments.join(' '))
  model
rescue => error
  raise NativeError, error.message
end
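The :cost description suggests picking a value with held-out data. One possible shape for that search is sketched below, under stated assumptions: `best_cost` is a hypothetical helper, and the accuracy figures are placeholders rather than measurements. In real use the block would train a model with the given cost and score it on held-out data; how to score a model is not covered on this page and is left to the caller.

```ruby
# Hypothetical helper, not part of the crfpp gem: try each candidate cost and
# return the one for which the block reports the highest held-out score.
def best_cost(candidates)
  candidates.max_by { |cost| yield(cost) }
end

# Placeholder scores standing in for real held-out accuracy measurements.
# A real block might train with `:cost => cost` and evaluate the result.
held_out_accuracy = { 0.5 => 0.81, 1.0 => 0.86, 2.0 => 0.84, 4.0 => 0.79 }

best = best_cost(held_out_accuracy.keys) { |cost| held_out_accuracy[cost] }
# With these placeholder scores, best is 1.0.
```

Passing the scoring step as a block keeps the search independent of how the model is trained or evaluated, so the same helper would also work with cross-validation scores.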