Spanish Stem Gem

Description

This gem reduces Spanish words to their respective roots. It uses an algorithm based on Martin Porter’s specifications.

For more information, visit: snowball.tartarus.org/algorithms/spanish/stemmer.html

Descripción

Esta gema está para reducir las palabras del Español en sus respectivas raíces, para ello ultiliza un algoritmo basado en las especificaciones de Martin Porter

Para más información, visite: snowball.tartarus.org/algorithms/spanish/stemmer.html

Install – Instalar

$ sudo gem install estem

or

$ gem install estem

Usage

As a reminder, take in consideration that the Spanish language has several non US-ASCII characters, and because of that, the same data may varied from one codeset to another.

Please remember to use a UTF-8 compatible encoding while using EStem. Please do not use String#force_encoding to convert from one codeset to another, you may try using String#encode alone but, instead, consider using String#safe_es_stem when handling incompatibles codesets or the codeset type varies.

require 'estem'

puts "albergues".es_stem      # ==> "alberg"
puts "habitaciones".es_stem   # ==> "habit"

# EStem will never make unnecessary changes to your input data.
puts "ALbeRGues".es_stem      # ==> "ALbeRG"
puts "HaBiTaCiOnEs".es_stem   # ==> "HaBiT"
puts "Hacinamiento".es_stem   # ==> "Hacin"

Uso

Como recordatorio, ten en cosideración que el Castellano posee muchos carácteres que están fuera del código ASCII, y por esta razón, los datos pueden variar de un conjunto de codificación a otro.

Por favor recuerda utilizar sistemas de condificación compatibles con UTF-8 cuando se trabaje con EStem. Por favor no use String#force_encoding para convertir de un conjunto de codificación a otro, podría utilizar String#encode solo, pero en su lugar, considere utilizar String#safe_es_stem() si está manejando conjuntos de codificación incompatibles o se desconoce el tipo.

require 'estem'

puts "albergues".es_stem      # ==> "alberg"
puts "habitaciones".es_stem   # ==> "habit"

# EStem nunca hará cambios innecesarios a tus datos.
puts "ALbeRGues".es_stem      # ==> "ALbeRG"
puts "HaBiTaCiOnEs".es_stem   # ==> "HaBiT"
puts "Hacinamiento".es_stem   # ==> "Hacin"

Test

This test is based on the sample input and output text from Martin Porter website. It includes 28390 test words and their expected stem results. To run the test, just type:

rake test

Pruebas

Esta prueba está basada en un archivo de prueba provisto por Martin Porter. Incluye 28390 palabras de prueba con sus resultado esperados. Para realizar la prueba, ejecuta:

rake test

License – Licencia

Copyright © 2012 Manuel A. Güílamo

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.