Class: Pho::Analyzers
- Inherits: Object
- Defined in: lib/pho/field_predicate_map.rb
Overview
Declares URI constants for the various text analyzers supported by the Talis Platform.
Analyzers are configured to operate on specific datatype properties using the FieldPredicateMap.
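The sketch below shows, in plain Ruby, how analyzer URIs might be attached to fields and how the documented default (STANDARD when no analyzer is specified) works out. It does not call the Pho API. The `AnalyzerUris` module is a stand-in copy of two constants from this page, and `FIELDS`, `analyzer_for` and the example predicate URIs are hypothetical.

```ruby
# Stand-in copies of two Pho::Analyzers constants, so this runs without the pho gem.
module AnalyzerUris
  STANDARD = "http://schemas.talis.com/2007/bigfoot/analyzers#standard-en".freeze
  KEYWORD  = "http://schemas.talis.com/2007/bigfoot/analyzers#keyword".freeze
end

# Hypothetical field definitions: field name => [predicate URI, analyzer URI or nil].
FIELDS = {
  "title" => ["http://purl.org/dc/terms/title", nil],
  "isbn"  => ["http://purl.org/ontology/bibo/isbn", AnalyzerUris::KEYWORD]
}.freeze

# Returns the analyzer URI for a field, falling back to STANDARD when none is
# given, as the STANDARD description states.
def analyzer_for(field)
  _predicate, analyzer = FIELDS.fetch(field)
  analyzer || AnalyzerUris::STANDARD
end

puts analyzer_for("title") # standard-en: no analyzer was specified
puts analyzer_for("isbn")  # keyword: identifiers matched as whole values
```

In real code the analyzer is attached through the FieldPredicateMap; check the Pho API for the exact classes and constructor arguments.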
Constant Summary
- STANDARD =
A standard English analyzer and the default if no analyzer is specified. Words are split on punctuation characters, removing the punctuation. Words containing a dot are not split. Words containing both hyphens and numbers are not split. Email addresses and hostnames are not split. Stop words are removed. Searches on fields with this type of analyzer are case insensitive.
The following words are considered to be stop words and will not be indexed: a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, of, on, or, such, that, the, their, then, there, these, they, this, to, was, will, with
"http://schemas.talis.com/2007/bigfoot/analyzers#standard-en".freeze
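A rough Ruby approximation of the STANDARD rules listed above follows. It is not the platform's implementation, which runs server-side. The regular expression and the helper name `standard_tokens` are assumptions, intended only to show how each rule affects a sample string.

```ruby
# Stop words listed for the STANDARD analyzer.
STOP_WORDS = %w[
  a an and are as at be but by for if in into is it no not of on or such
  that the their then there these they this to was will with
].freeze

# Approximation of the STANDARD rules (not the platform's code):
# - lowercase everything, so matching is case-insensitive;
# - take runs of letters/digits, keeping internal '.', '@' and '-' so that
#   dotted words, hostnames and e-mail addresses stay whole;
# - split hyphenated tokens unless they contain a digit;
# - drop stop words.
def standard_tokens(text)
  text.downcase
      .scan(/[[:alnum:]]+(?:[.@-][[:alnum:]]+)*/)
      .flat_map { |tok| tok.include?("-") && tok !~ /\d/ ? tok.split("-") : [tok] }
      .reject { |tok| STOP_WORDS.include?(tok) }
end

p standard_tokens("The Quick-Brown fox mailed bob@example.com about ACME-2007 and v1.2.")
# => ["quick", "brown", "fox", "mailed", "bob@example.com", "about", "acme-2007", "v1.2"]
```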
- GREEK =
A standard Greek language analyzer. Words are split on punctuation characters, removing the punctuation. Words containing a dot are not split. Words containing both hyphens and numbers are not split. Email addresses and hostnames are not split. Stop words are removed. Searches on fields with this type of analyzer are case insensitive.
"http://schemas.talis.com/2007/bigfoot/analyzers#standard-el".freeze
- GERMAN =
The following words are considered to be stop words and will not be indexed: einer, eine, eines, einem, einen, der, die, das, dass, daß, du, er, sie, es, was, wer, wie, wir, und, oder, ohne, mit, am, im, in, aus, auf, ist, sein, war, wird, ihr, ihre, ihres, als, für, von, mit, dich, dir, mich, mir, mein, sein, kein, durch, wegen, wird
"http://schemas.talis.com/2007/bigfoot/analyzers#standard-de".freeze
- FRENCH =
A standard French language analyzer. Words are split on punctuation characters, removing the punctuation. Words containing a dot are not split. Words containing both hyphens and numbers are not split. Email addresses and hostnames are not split. Stop words are removed and any remaining words are stemmed. Searches on fields with this type of analyzer are case insensitive.
The following words are considered to be stop words and will not be indexed: a, afin, ai, ainsi, après, attendu, au, aujourd, auquel, aussi, autre, autres, aux, auxquelles, auxquels, avait, avant, avec, avoir, c, car, ce, ceci, cela, celle, celles, celui, cependant, certain, certaine, certaines, certains, ces, cet, cette, ceux, chez, ci, combien, comme, comment, concernant, contre, d, dans, de, debout, dedans, dehors, delà, depuis, derrière, des, désormais, desquelles, desquels, dessous, dessus, devant, devers, devra, divers, diverse, diverses, doit, donc, dont, du, duquel, durant, dès, elle, elles, en, entre, environ, est, et, etc, etre, eu, eux, excepté, hormis, hors, hélas, hui, il, ils, j, je, jusqu, jusque, l, la, laquelle, le, lequel, les, lesquelles, lesquels, leur, leurs, lorsque, lui, là, ma, mais, malgré, me, merci, mes, mien, mienne, miennes, miens, moi, moins, mon, moyennant, même, mêmes, n, ne, ni, non, nos, notre, nous, néanmoins, nôtre, nôtres, on, ont, ou, outre, où, par, parmi, partant, pas, passé, pendant, plein, plus, plusieurs, pour, pourquoi, proche, près, puisque, qu, quand, que, quel, quelle, quelles, quels, qui, quoi, quoique, revoici, revoilà, s, sa, sans, sauf, se, selon, seront, ses, si, sien, sienne, siennes, siens, sinon, soi, soit, son, sont, sous, suivant, sur, ta, te, tes, tien, tienne, tiennes, tiens, toi, ton, tous, tout, toute, toutes, tu, un, une, va, vers, voici, voilà, vos, votre, vous, vu, vôtre, vôtres, y, à, ça, ès, été, être, ô.
"http://schemas.talis.com/2007/bigfoot/analyzers#standard-fr".freeze
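The single letters in the French stop list (`l`, `d`, `j`, `qu`, ...) make sense once the apostrophe is treated as punctuation: elided forms such as l'analyse are split, and the leftover article is discarded. Even aujourd'hui splits into two stop words, `aujourd` and `hui`. The hypothetical `french_tokens` helper below sketches this with a subset of the list and omits the stemming step the real analyzer applies.

```ruby
# A few entries from the FRENCH stop-word list above (subset, for brevity).
FRENCH_STOP_SUBSET = %w[aujourd hui d l la le les un une et de].freeze

# Approximation only: lowercase, split on punctuation (including the
# apostrophe) and whitespace, then drop stop words. Stemming is omitted.
def french_tokens(text)
  text.downcase
      .scan(/[[:alnum:]]+/)
      .reject { |tok| FRENCH_STOP_SUBSET.include?(tok) }
end

p french_tokens("Aujourd'hui, l'analyse d'un texte")
# => ["analyse", "texte"]
```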
- CJK =
A standard CJK (Chinese, Japanese and Korean) language analyzer.
"http://schemas.talis.com/2007/bigfoot/analyzers#standard-cjk".freeze
- DUTCH =
The following words are considered to be stop words and will not be indexed: de, en, van, ik, te, dat, die, in, een, hij, het, niet, zijn, is, was, op, aan, met, als, voor, had, er, maar, om, hem, dan, zou, of, wat, mijn, men, dit, zo, door, over, ze, zich, bij, ook, tot, je, mij, uit, der, daar, haar, naar, heb, hoe, heeft, hebben, deze, u, want, nog, zal, me, zij, nu, ge, geen, omdat, iets, worden, toch, al, waren, veel, meer, doen, toen, moet, ben, zonder, kan, hun, dus, alles, onder, ja, eens, hier, wie, werd, altijd, doch, wordt, wezen, kunnen, ons, zelf, tegen, na, reeds, wil, kon, niets, uw, iemand, geweest, andere
"http://schemas.talis.com/2007/bigfoot/analyzers#standard-nl".freeze
- CHINESE =
"http://schemas.talis.com/2007/bigfoot/analyzers#standard-cn".freeze
- KEYWORD =
This analyzer does not split the field at all. The entire value of the field is indexed as a single token.
"http://schemas.talis.com/2007/bigfoot/analyzers#keyword".freeze
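Because KEYWORD produces one token per value, only a query for the whole value can match. The sketch below builds a tiny in-memory inverted index to show the effect. The records, `keyword_tokens`, `INDEX` and `lookup` are hypothetical, and since the description does not say whether KEYWORD lowercases, case is left untouched here.

```ruby
# Hypothetical records: id => value of a field analysed with KEYWORD.
RECORDS = { 1 => "New York", 2 => "York", 3 => "New York City" }.freeze

# KEYWORD indexes the entire value as a single token (no splitting).
def keyword_tokens(value)
  [value]
end

# Inverted index: token => ids of the records that produced it.
INDEX = Hash.new { |hash, token| hash[token] = [] }
RECORDS.each do |id, value|
  keyword_tokens(value).each { |tok| INDEX[tok] << id }
end

# fetch avoids the default block, so lookups never add empty entries.
def lookup(query)
  INDEX.fetch(query, [])
end

p lookup("New York") # => [1]
p lookup("New")      # => []
```

A field analysed with STANDARD would instead return records 1 and 3 for the word "new", which is why KEYWORD suits identifiers and codes.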
- NO_STOP_WORD_STANDARD =
English analyzer without stop-word removal. It is identical to the standard English analyzer except that all words, including stop words, are indexed.
"http://schemas.talis.com/2007/bigfoot/analyzers#nostop-en".freeze
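The practical difference from STANDARD shows up with phrases made entirely of stop words. Using a deliberately simplified tokenizer (a hypothetical `tokens` helper that only lowercases and splits on non-alphanumerics), STANDARD-style filtering leaves nothing to index, while the no-stop-word variant keeps every word.

```ruby
# Stop words listed for the STANDARD analyzer.
STOP_WORDS = %w[
  a an and are as at be but by for if in into is it no not of on or such
  that the their then there these they this to was will with
].freeze

# Simplified tokenizer shared by both variants: lowercase, split on
# non-alphanumerics, then drop any word found in stop_words.
def tokens(text, stop_words:)
  text.downcase.scan(/[[:alnum:]]+/).reject { |tok| stop_words.include?(tok) }
end

phrase = "To be, or not to be"
p tokens(phrase, stop_words: STOP_WORDS) # STANDARD-like:     => []
p tokens(phrase, stop_words: [])         # NO_STOP_WORD-like: => ["to", "be", "or", "not", "to", "be"]
```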
- NORMALISE_STANDARD =
English analyzer without stop-word removal and with accent normalization. It is identical to the standard English analyzer except that all words are indexed and any accented characters in the ISO Latin 1 character set are replaced by their unaccented equivalents. See the API documentation at n2.talis.com/wiki/Field_Predicate_Map for details of the replacements.
"http://schemas.talis.com/2007/bigfoot/analyzers#norm-en".freeze
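Accent folding of this kind can be approximated in Ruby by decomposing text to Unicode NFD and deleting the combining marks. This is a substitute technique, not the platform's ISO Latin 1 replacement table: characters without a canonical decomposition, such as `ß` or `æ`, pass through unchanged here and may be handled differently by the platform. `fold_accents` and `norm_tokens` are hypothetical names, and the tokenizer is simplified.

```ruby
# Approximate accent folding: NFD-decompose, then delete combining marks
# (Unicode category Mn). Stand-in only; not the platform's replacement table.
def fold_accents(text)
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
end

# NORMALISE_STANDARD-style tokens (simplified): fold accents, lowercase,
# split on non-alphanumerics, and keep stop words.
def norm_tokens(text)
  fold_accents(text).downcase.scan(/[[:alnum:]]+/)
end

p fold_accents("Café Crème, naïve façade") # => "Cafe Creme, naive facade"
p norm_tokens("Crème brûlée at the Café")  # => ["creme", "brulee", "at", "the", "cafe"]
```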
- PORTER_NORMALIZE_STANDARD =
English analyzer with Porter stemming, case normalization, ISO Latin 1 normalization, and stop-word removal.
"http://schemas.talis.com/2007/bigfoot/analyzers#porter-norm-en".freeze
- PORTER_NO_STOP_WORD_STANDARD =
English analyzer with Porter stemming, case normalization, and ISO Latin 1 normalization. Stop words are not removed.
"http://schemas.talis.com/2007/bigfoot/analyzers#porter-nostop-norm-en".freeze
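The five English analyzers differ along three axes described above: stop-word removal, accent (Latin 1) normalization, and stemming; all of them are case-insensitive. The lookup below encodes that table so an analyzer can be picked by feature set. `ENGLISH_ANALYZERS` and `english_analyzer` are hypothetical names; the URIs are copied from the constants on this page.

```ruby
# Feature table for the English analyzers, taken from the descriptions above.
# Keys: [removes stop words, folds accents, stems]
ENGLISH_ANALYZERS = {
  [true,  false, false] => "http://schemas.talis.com/2007/bigfoot/analyzers#standard-en",
  [false, false, false] => "http://schemas.talis.com/2007/bigfoot/analyzers#nostop-en",
  [false, true,  false] => "http://schemas.talis.com/2007/bigfoot/analyzers#norm-en",
  [true,  true,  true]  => "http://schemas.talis.com/2007/bigfoot/analyzers#porter-norm-en",
  [false, true,  true]  => "http://schemas.talis.com/2007/bigfoot/analyzers#porter-nostop-norm-en"
}.freeze

# Returns the analyzer URI for the requested features, or nil when no
# English analyzer on this page offers that combination.
def english_analyzer(remove_stop_words:, fold_accents:, stem:)
  ENGLISH_ANALYZERS[[remove_stop_words, fold_accents, stem]]
end

puts english_analyzer(remove_stop_words: false, fold_accents: true, stem: true)
# prints the porter-nostop-norm-en URI
p english_analyzer(remove_stop_words: true, fold_accents: false, stem: true) # => nil
```

Returning nil for unsupported combinations, such as stemming without accent normalization, makes the gap explicit rather than silently falling back to STANDARD.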