LISTSERV - LSTSRV-L Archives - COMMUNITY.EMAILOGY.COM

I asked a few of my friends about the SOUNDEX issues that were brought
up on this list a while ago, and got the following interesting answers.
I deleted all headers and greetings and stuff in the interests of
brevity, but can provide followup information for anyone who is really
interested.
 
Michael
 
 ---------------------------- Text of forwarded message -----------------------
 
... I don't think it is correct to attribute it to Knuth ... I read
somewhere that the algorithm was devised by a monk in the 16th
century.
 
 ---------------------------- Text of forwarded message -----------------------
 
Although you specifically asked for NON soundex routines, I sent you a
copy of a REXX version that I believe is fairly tailorable as to the
'values' of each letter.   It seems to me that if you use a different
pattern of values for each language (assuming the language is known),
then you can 'stress the importance' of different groups of letters.
 
So if you let
alphabet ="ABCDEFGHIJKLMNOPQRSTUVWXYZ ", and for English, let
alphaval ="01230120022455012673010702", and for French use some other
pattern (say with the 'M' and 'N' more differentiated), perhaps you
can get to where you are going.
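
To make the idea concrete, here is a rough sketch in Python (not the
REXX routine mentioned above, which is not reproduced here).  It
assumes a simplified soundex-style rule: keep the first letter, drop
letters valued '0', and collapse adjacent letters with the same value.
The names make_encoder, encode_english and encode_variant are made up
for illustration, the trailing blank in the quoted alphabet is left
out so both strings have 26 characters, and the variant table (with
'N' moved to its own value '8') is purely hypothetical, not a real
French table.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# English letter values exactly as quoted in the message above.
ENGLISH_VALUES = "01230120022455012673010702"


def make_encoder(alphabet, values, length=4):
    """Build a soundex-style encoder from a per-letter value table."""
    if len(alphabet) != len(values):
        raise ValueError("alphabet and values must have the same length")
    table = dict(zip(alphabet, values))

    def encode(name):
        # Ignore anything that is not in the alphabet (blanks, hyphens, ...).
        letters = [c for c in name.upper() if c in table]
        if not letters:
            return ""
        code = letters[0]
        prev = table[letters[0]]
        for c in letters[1:]:
            v = table[c]
            # Skip '0' letters and repeats of the previous letter's value.
            if v != "0" and v != prev:
                code += v
            prev = v
        # Pad with zeros or truncate to a fixed length.
        return (code + "0" * length)[:length]

    return encode


# Hypothetical variant: give 'N' its own value so it differs from 'M'.
n_pos = ALPHABET.index("N")
VARIANT_VALUES = ENGLISH_VALUES[:n_pos] + "8" + ENGLISH_VALUES[n_pos + 1:]

encode_english = make_encoder(ALPHABET, ENGLISH_VALUES)
encode_variant = make_encoder(ALPHABET, VARIANT_VALUES)

print(encode_english("Robert"), encode_english("Rupert"))
print(encode_english("Aman"), encode_english("Anam"))
print(encode_variant("Aman"), encode_variant("Anam"))
```

With the English table 'Aman' and 'Anam' get the same code, while the
variant table keeps them apart, which is the kind of per-language
tuning being suggested.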
 
 ---------------------------- Text of forwarded message -----------------------
 
I used to do some work with soundex in record linkage work.  Seems to
me, it predates Knuth.  I found the application rules unnecessarily
complicated and inconsistent.  The methods I used in the end were
quite interesting, using actual entropic weights for the identifiers.
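
The message does not say how those weights were computed.  One common
reading, offered here only as an assumption, is that agreement on a
value counts for -log2 of that value's relative frequency, so that
two records agreeing on a rare surname score more than two agreeing
on a common one.  The function names and the sample data below are
made up for illustration.

```python
import math
from collections import Counter


def agreement_weights(values):
    """Map each observed value to -log2(relative frequency), in bits."""
    counts = Counter(values)
    total = len(values)
    return {v: -math.log2(c / total) for v, c in counts.items()}


def match_score(rec_a, rec_b, weights_by_field):
    """Sum the weights of the fields on which two records agree.

    Disagreement and missing values contribute nothing in this sketch;
    a fuller scheme would also penalise disagreement.
    """
    score = 0.0
    for field, weights in weights_by_field.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is not None and a == b:
            score += weights.get(a, 0.0)
    return score


# Made-up surname file: SMITH is common, ZABRISKIE is rare.
surnames = ["SMITH"] * 4 + ["JONES"] * 2 + ["LEE", "ZABRISKIE"]
weights = {"surname": agreement_weights(surnames)}

common = match_score({"surname": "SMITH"}, {"surname": "SMITH"}, weights)
rare = match_score({"surname": "ZABRISKIE"}, {"surname": "ZABRISKIE"}, weights)
print(common, rare)
```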
 
The note from whoever talking about the first 7 voiced consonants made
me laugh -- neither Maaori nor Chinese have too many of those :-)
 
Which 22 languages should be covered?  Is the input sound or text?
Is _name_ really all you have?  What are the costs of false + vs
false -?  Is a unique answer required, or would it be adequate to
produce the n most likely hits for human selection?