LSTSRV-L Archives

LISTSERV Site Administrators' Forum

Jeffrey R Kell <JEFF@UTCVM>
Tue, 3 May 88 08:47:08 EDT
There is also the NYSIIS (New York State Identification and Intelligence
System) algorithm, which was published some years ago in the proceedings
of the HP International Users Group.  I'm sure it is still language
sensitive, but it is oriented more toward phonetics than toward simple
consonants.  It is similar to Soundex but first makes several 'substitution'
passes to do things like removing vowels (with special cases for 'Y')
and duplicate consonants, and altering 'CK'->'K', 'QU'->'K', etc.  The writeup
emphasized that the algorithm was better than Soundex for Spanish surnames.
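
As a rough illustration of the difference, the sketch below pairs the
classic four-character Soundex with a simplified key built from the kind
of substitution passes described above.  The second function is NOT the
published NYSIIS rule set: the name substitution_key, the order of the
passes, and the treatment of 'Y' as a vowel everywhere except the first
letter are all assumptions made for the example.

```python
import re

# Standard American Soundex digit classes.
SOUNDEX_CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(name):
    """Return the classic four-character Soundex key for a name."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    key = letters[0]
    prev = SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        code = SOUNDEX_CODES.get(c, "")
        if code and code != prev:
            key += code
        if c not in "HW":  # H and W do not separate letters with equal codes
            prev = code
    return (key + "000")[:4]


def substitution_key(name):
    """Hypothetical key from substitution passes; not the real NYSIIS."""
    s = re.sub(r"[^A-Z]", "", name.upper())
    s = s.replace("CK", "K").replace("QU", "K")  # 'CK'->'K', 'QU'->'K'
    s = re.sub(r"(.)\1+", r"\1", s)              # collapse repeated letters
    if not s:
        return ""
    # Assumption: keep the first letter, then drop vowels and 'Y'.
    return s[0] + re.sub(r"[AEIOUY]", "", s[1:])


print(soundex("Robert"), soundex("Rupert"))    # R163 R163
print(substitution_key("Jackson"))             # JKSN
print(substitution_key("Quinn"))               # KN
```

Soundex keeps only coarse consonant classes, while the substitution
passes normalise spelling variants such as 'CK' and 'K' before the key
is built.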
 
I have a name search routine in use here which uses an index file with
both a Soundex and a NYSIIS key.  When a search key is entered, the
corresponding hash keys are calculated and the *intersection* of the two
is evaluated first (Soundex and NYSIIS both match), followed by the NYSIIS-only
matches and finally the Soundex-only matches.  This gets you much closer to
the target much faster.  It generally works well, but is practically
worthless for Arabic names.
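
The search order described above can be sketched roughly as follows.
This is not the SPL routine itself: the class NameIndex, its in-memory
dictionaries (standing in for the index file), and the two toy key
functions are all hypothetical, chosen only to keep the example short.
Real Soundex and NYSIIS functions would be passed in their place.

```python
from collections import defaultdict


class NameIndex:
    """Index names under two phonetic keys and rank matches by agreement."""

    def __init__(self, soundex_key, nysiis_key):
        self.soundex_key = soundex_key
        self.nysiis_key = nysiis_key
        self.by_soundex = defaultdict(set)
        self.by_nysiis = defaultdict(set)

    def add(self, name):
        self.by_soundex[self.soundex_key(name)].add(name)
        self.by_nysiis[self.nysiis_key(name)].add(name)

    def search(self, query):
        sdx = self.by_soundex.get(self.soundex_key(query), set())
        nys = self.by_nysiis.get(self.nysiis_key(query), set())
        both = sdx & nys
        # Both keys match first, then NYSIIS only, then Soundex only.
        return sorted(both) + sorted(nys - both) + sorted(sdx - both)


# Toy stand-ins for the real key functions (assumptions for the demo only).
def toy_coarse(name):
    return (name[0].upper(), len(name))


def toy_fine(name):
    return "".join(c for c in name.upper() if c not in "AEIOU")


index = NameIndex(soundex_key=toy_coarse, nysiis_key=toy_fine)
for n in ["Kell", "Kall", "Kelle", "Kent", "Smith"]:
    index.add(n)

print(index.search("Kell"))  # ['Kall', 'Kell', 'Kelle', 'Kent']
```

Names matching on both keys are the strongest candidates, so listing
them first is what narrows the result down quickly.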
 
If it would be of any help, I can send the source, but it's written in
HP's SPL (sort of Algol-ish).
