LSTSRV-L Archives

LISTSERV Site Administrators' Forum

"Eric Thomas (CERN)" <ERIC@CEARN>
Sun, 1 May 88 20:45:00 GVA
As I had promised in the UDS BOF in Cesme, I have implemented a phonetic
search routine in the LISTSERV database functions. I have used the 'SOUNDX'
algorithm, which is used in a lot of commercial database systems. The
implementation was based on a FORTRAN program that Peter Flynn sent me
(thanks!). I would like to state that I do not consider myself responsible for
the behaviour of this algorithm. I strongly believe that there is NO algorithm
in the world that can perform decent phonetic searches compatible with the 22
languages that are presently being used on EARN. You will appreciate, for
example, that my name (THOMAS) does not 'sound like' the way it is pronounced
(TOMA). Also, SPOOL does not sound like POOL, and POOL does not sound like
PULL (it does sound like PILULE, though :-) ). I could spend days listing
similar examples.
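
For readers unfamiliar with this family of algorithms, here is a minimal Python
sketch of the classic American Soundex coding: keep the first letter, turn the
following consonants into digit classes, drop vowels, collapse repeated digits,
then pad or truncate to four characters. It is an assumption that this matches
the 'SOUNDX' routine described above. The FORTRAN program it was ported from is
not shown here, so its codes may differ from LISTSERV's for some words. The
function name `soundex` is hypothetical.

```python
# Digit classes of classic American Soundex (an assumed stand-in for 'SOUNDX').
# Vowels, Y, H and W get no digit.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word: str) -> str:
    """Return the 4-character Soundex code of word, or '' if it has no letters."""
    letters = [c for c in word.upper() if c.isalpha()]  # non-letters are skipped
    if not letters:
        return ""
    result = [letters[0]]              # the first letter is kept as-is...
    prev = _CODES.get(letters[0], "")  # ...but its digit still blocks a repeat
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            result.append(code)
        # A vowel resets the run, so the same digit may be coded again after it;
        # H and W are transparent and leave the previous digit in force.
        if ch not in "HW":
            prev = code
    # Pad with zeros or truncate: only the first four symbols ever count.
    return ("".join(result) + "000")[:4]


if __name__ == "__main__":
    for name in ("THOMAS", "TOMA", "Robert", "Rupert"):
        print(name, soundex(name))
```

With these rules THOMAS codes as T520 and TOMA as T500, so the two do not
match: the same kind of miss described above.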
 
The implementation has been done through a pair of new operators, SOUNDS LIKE
and DOES NOT SOUND LIKE (the latter being there for purely aesthetic
reasons). They work in the same way as, for example, CONTAINS and DOES NOT
CONTAIN. You could therefore do:
 
          Select * in BITEARN where SITE sounds like HEKHOLL
 
(Note that HEKHOLL does sound like ECOLE :-) )
 
The actual implementation of SOUNDS LIKE is a bit more subtle than for the
regular operators:
 
- If the search parameter you specified contains more than one word, a 'dumb'
  phonetic comparison takes place on the two strings. Note that it is
  generally not a good idea to try to match long strings phonetically, as the
  SOUNDX algorithm will only compare the first 4 "phonemes". That is, the
  strings will "sound alike" if, and only if, their first few characters do.
 
- If you specified only one word, it is phonetically compared to all the words
  in the 'source' string, and you have a hit if it matches any of them. In
  other words, that is a "contains something that sounds like" operation.
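
The two cases above can be modelled as follows. This is a hypothetical Python
sketch of the SOUNDS LIKE / DOES NOT SOUND LIKE semantics, not LISTSERV's code.
It uses a standard Soundex coder as an assumed stand-in for 'SOUNDX', and the
names `sounds_like` and `does_not_sound_like` are invented for illustration.

```python
# Assumed stand-in for 'SOUNDX': classic American Soundex digit classes.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(text: str) -> str:
    """4-character Soundex code; non-letters (including spaces) are skipped."""
    letters = [c for c in text.upper() if c.isalpha()]
    if not letters:
        return ""
    result, prev = [letters[0]], _CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            result.append(code)
        if ch not in "HW":  # H and W do not separate equal digits
            prev = code
    return ("".join(result) + "000")[:4]


def sounds_like(source: str, param: str) -> bool:
    """Hypothetical model of 'source SOUNDS LIKE param'."""
    words = param.split()
    if not words:
        return False
    if len(words) > 1:
        # Several words: 'dumb' comparison of the two whole strings; only
        # the first four Soundex symbols of each take part.
        return soundex(source) == soundex(param)
    # One word: a hit if ANY word of the source sounds like it.
    target = soundex(param)
    return any(soundex(w) == target for w in source.split())


def does_not_sound_like(source: str, param: str) -> bool:
    """Hypothetical model of 'source DOES NOT SOUND LIKE param'."""
    return not sounds_like(source, param)
```

Because only the first four symbols are compared in the multi-word case,
`sounds_like("Robert Jones", "Rupert Smyth")` is true in this sketch: both
strings code as R163. That is the pitfall with long parameters noted above.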
 
Eric
