LSTSRV-L Archives

LISTSERV Site Administrators' Forum

Michael Wagner +49 228 8199645 <WAGNER@DBNGMD21>
Tue, 10 May 88 16:29:00 LCL
I asked a few of my friends about the SOUNDEX issues that were brought
up on this list a while ago, and got the following interesting answers.
I deleted all headers and greetings and stuff in the interests of
brevity, but can provide followup information for anyone who is really
interested.
 
Michael
 
 ---------------------------- Text of forwarded message -----------------------
 
... I don't think it is correct to attribute it to Knuth ... I read
somewhere that the algorithm was devised by a monk in the 16th
century.
 
 ---------------------------- Text of forwarded message -----------------------
 
Although you specifically asked for NON soundex routines, I sent you a
copy of a REXX version that I believe is fairly tailorable as to the
'values' of each letter.   It seems to me that if you use a different
pattern of values for each language (assuming the language is known),
then you can 'stress the importance' of different groups of letters.
 
So if you let
alphabet ="ABCDEFGHIJKLMNOPQRSTUVWXYZ ", and for English let
alphaval ="01230120022455012673010702", and for French use some other
pattern (say, with the 'M' and 'N' more differentiated), perhaps you
can get to where you are going.
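 
The table-driven idea above can be sketched as follows.  This is not the
REXX routine mentioned in the message (which is not reproduced here) but
a minimal Python illustration under stated assumptions: the trailing
blank in the quoted alphabet is dropped so that both strings have 26
entries, the coding rules are a simplified soundex (keep the first
letter, skip letters valued '0', collapse adjacent repeats, pad to four
characters), and the "French" table is purely hypothetical, differing
from the English one only in giving 'N' its own value.

```python
# Simplified, table-driven soundex sketch (illustrative, not the REXX original).
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"          # trailing blank omitted (assumption)
ENGLISH_VALUES = "01230120022455012673010702"    # values quoted in the message


def make_table(alphabet, values):
    """Pair each letter with its digit; the two strings must align."""
    if len(alphabet) != len(values):
        raise ValueError("alphabet and values must have the same length")
    return dict(zip(alphabet, values))


def phonetic_code(name, table, length=4):
    """Encode a name with the given letter-to-digit table."""
    letters = [c for c in name.upper() if c in table]
    if not letters:
        return ""
    code = letters[0]                 # the first letter is kept as-is
    prev = table[letters[0]]
    for c in letters[1:]:
        digit = table[c]
        # Skip '0' letters and immediate repeats of the same digit.
        if digit != "0" and digit != prev:
            code += digit
        prev = digit
    return (code + "0" * length)[:length]


english = make_table(ALPHABET, ENGLISH_VALUES)
# Hypothetical variant: 'N' gets its own value so it no longer matches 'M'.
french_like = dict(english, N="8")

print(phonetic_code("Robert", english), phonetic_code("Rupert", english))
print(phonetic_code("Dumas", english), phonetic_code("Dunas", english))
print(phonetic_code("Dumas", french_like), phonetic_code("Dunas", french_like))
```

With the English table, "Dumas" and "Dunas" receive the same code; with
the hypothetical variant they do not, which is the sort of per-language
tuning the message suggests.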
 
 ---------------------------- Text of forwarded message -----------------------
 
I used to do some work with soundex in record linkage.  It seems to
me that it predates Knuth.  I found the application rules unnecessarily
complicated and inconsistent.  The methods I used in the end were
quite interesting, using actual entropic weights for the identifiers.
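 
The message does not say how those weights were computed.  One common
reading, offered here only as an assumption, is to weight each
identifier value by its information content, -log2 of its relative
frequency in the file, so that agreement on a rare value counts for
more than agreement on a common one.  A minimal Python sketch, with
made-up sample data:

```python
import math
from collections import Counter


def entropy_weights(values):
    """Map each distinct value to -log2(relative frequency)."""
    counts = Counter(values)
    total = sum(counts.values())
    return {v: -math.log2(n / total) for v, n in counts.items()}


# Hypothetical surname column from a file being linked.
surnames = ["SMITH", "SMITH", "SMITH", "WAGNER"]
weights = entropy_weights(surnames)
for name in sorted(weights):
    print(name, round(weights[name], 3))
```

Here a match on the rarer surname scores higher than a match on the
common one.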
 
The note from whoever was talking about the first 7 voiced consonants
made me laugh -- neither Maaori nor Chinese has too many of those :-)
 
Which 22 languages should be covered?  Is the input sound or text?
Is _name_ really all you have?  What are the costs of false positives
vs. false negatives?  Is a unique answer required, or would it be
adequate to produce the n most likely hits for human selection?
