Michael Wagner +49 228 8199645 <WAGNER@DBNGMD21>
Tue, 10 May 88 16:29:00 LCL
|
I asked a few of my friends about the SOUNDEX issues that were brought
up on this list a while ago, and got the following interesting answers.
I deleted all headers and greetings and stuff in the interests of
brevity, but can provide followup information for anyone who is really
interested.
Michael
---------------------------- Text of forwarded message -----------------------
... I don't think it is correct to attribute it to Knuth ... I read
somewhere that the algorithm was devised by a monk in the 16th
century.
---------------------------- Text of forwarded message -----------------------
Although you specifically asked for NON soundex routines, I sent you a
copy of a REXX version that I believe is fairly tailorable as to the
'values' of each letter. It seems to me that if you use a different
pattern of values for each language (assuming the language is known),
then you can 'stress the importance' of different groups of letters.
So if you let
alphabet ="ABCDEFGHIJKLMNOPQRSTUVWXYZ ", and for English, let
alphaval ="01230120022455012673010702", and for French some other
pattern (say with the 'M' and 'N' more differentiated) perhaps you
can get to where you are going.
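
A rough sketch of the table-driven idea described above, written in Python
rather than REXX. It is not the REXX routine mentioned in the message, only
an illustration under stated assumptions: the alphabet is used without the
trailing blank, the English values are the ones quoted above, and the
"French" table is purely hypothetical (it just gives 'N' its own value so
that 'M' and 'N' no longer collide). The coding rules are also simplified:
letters valued 0 are dropped and adjacent repeats are collapsed, without the
special H/W handling of classic Soundex.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# Values quoted in the message for English.
ENGLISH_VALUES = "01230120022455012673010702"

# Hypothetical table for illustration only: identical to the English one
# except that 'N' gets its own value (8), separating it from 'M' (5).
_n = ALPHABET.index("N")
HYPOTHETICAL_FRENCH_VALUES = ENGLISH_VALUES[:_n] + "8" + ENGLISH_VALUES[_n + 1:]


def make_table(alphabet, values):
    """Pair each letter with its value; the two strings must line up."""
    if len(alphabet) != len(values):
        raise ValueError("alphabet and values must have the same length")
    return dict(zip(alphabet, values))


def code_name(name, table, length=4):
    """Return a Soundex-style code: first letter plus letter values."""
    letters = [c for c in name.upper() if c in table]
    if not letters:
        return ""
    first = letters[0]
    prev = table[first]
    digits = []
    for c in letters[1:]:
        value = table[c]
        # Skip letters valued 0 and collapse adjacent equal values.
        if value != "0" and value != prev:
            digits.append(value)
        prev = value
    # Pad with zeros, then cut to the requested length.
    return (first + "".join(digits) + "0" * length)[:length]


english = make_table(ALPHABET, ENGLISH_VALUES)
french = make_table(ALPHABET, HYPOTHETICAL_FRENCH_VALUES)

print(code_name("Robert", english))  # R163
print(code_name("Manet", english), code_name("Mamet", english))  # M530 M530
print(code_name("Manet", french), code_name("Mamet", french))    # M830 M530
```

Swapping the table is the only change needed per language; the coding loop
stays the same.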
---------------------------- Text of forwarded message -----------------------
I used to do some work with soundex in record linkage work. Seems to
me, it predates Knuth. I found the application rules unnecessarily
complicated and inconsistent. The methods I used in the end were
quite interesting, using actual entropic weights for the identifiers.
The note from whoever it was, talking about the first 7 voiced consonants, made
me laugh -- neither Maaori nor Chinese have too many of those :-)
Which 22 languages should be covered? Is the input sound or text?
Is _name_ really all you have? What are the costs of false positives
vs. false negatives? Is a unique answer required, or would it be
adequate to produce the n most likely hits for human selection?
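
The message does not say how the "entropic weights" were computed. One
possible reading, sketched below in Python, is a frequency-based weight
where agreeing on a rare value counts for more than agreeing on a common
one (weight = -log2 of the value's relative frequency). Everything here is
an assumption made for illustration: the function names, the record layout,
and the scoring rule are hypothetical, not the correspondent's method. The
second function returns the n best-scoring records for a human to choose
from, as the last question above suggests.

```python
import math
from collections import Counter


def value_weights(values):
    """Weight each distinct value by -log2(relative frequency)."""
    counts = Counter(values)
    total = sum(counts.values())
    return {v: -math.log2(c / total) for v, c in counts.items()}


def top_matches(query, records, weights_by_field, n=3):
    """Score records by summed weights of agreeing fields; return best n."""
    scored = []
    for rec in records:
        score = 0.0
        for field, weights in weights_by_field.items():
            wanted = query.get(field)
            # Only exact agreement on a known value earns weight.
            if wanted is not None and wanted == rec.get(field):
                score += weights.get(wanted, 0.0)
        scored.append((score, rec))
    # Highest score first; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n]


# Hypothetical records, for illustration only.
records = [
    {"surname": "Smith", "town": "Bonn"},
    {"surname": "Smith", "town": "Koeln"},
    {"surname": "Smith", "town": "Bonn"},
    {"surname": "Wagner", "town": "Bonn"},
]
weights_by_field = {
    "surname": value_weights(r["surname"] for r in records),
    "town": value_weights(r["town"] for r in records),
}
hits = top_matches({"surname": "Wagner", "town": "Bonn"}, records,
                   weights_by_field, n=2)
for score, rec in hits:
    print(round(score, 3), rec)
```

Under this scheme a match on the rare surname outweighs a match on the
common one, which is the property the weights are meant to capture.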