Sender:   Revised LISTSERV forum <LSTSRV-L@DEARN>
Subject:
From:     "Eric Thomas (CERN)" <ERIC@CEARN>
Date:     Sun, 1 May 88 20:45:00 GVA
Reply-To: Revised LISTSERV forum <LSTSRV-L@DEARN>

As I promised at the UDS BOF in Cesme, I have implemented a phonetic search
routine in the LISTSERV database functions. I have used the 'SOUNDX'
algorithm, which is used in a lot of commercial database systems. The
implementation is based on a FORTRAN program that Peter Flynn sent me
(thanks!).

I would like to state that I do not consider myself responsible for the
behaviour of this algorithm. I strongly believe that there is NO algorithm in
the world that can perform decent phonetic searches across the 22 languages
presently in use on EARN. You will appreciate, for example, that my name
(THOMAS) does not 'sound like' the way it is pronounced (TOMA). Also, SPOOL
does not sound like POOL, and POOL does not sound like PULL (it does sound
like PILULE, though :-) ). I could spend days listing similar examples.

The implementation has been done through a pair of new operators, SOUNDS LIKE
and DOES NOT SOUND LIKE (the latter being there for purely aesthetic
reasons). They work in the same way as, for example, CONTAINS and DOES NOT
CONTAIN. You could therefore do:

   Select * in BITEARN where SITE sounds like HEKHOLL

(Note that HEKHOLL does sound like ECOLE :-) )

The actual implementation of SOUNDS LIKE is a bit more subtle than that of the
regular operators:

- If the search parameter you specified contains more than one word, a 'dumb'
  phonetic comparison takes place on the two strings. Note that it is
  generally not a good idea to try to match long strings phonetically, as the
  SOUNDX algorithm will only compare the first 4 "phonemes". That is, the
  strings will "sound alike" if, and only if, their first few characters do.
- If you specified only one word, it is phonetically compared to all the words
  in the 'source' string, and you have a hit if it matches any of them. In
  other words, this is a "contains something that sounds like" operation.
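
For those who want to play with the two rules above, here is a minimal sketch
in Python. It is an assumption-laden illustration, not the LISTSERV code: the
soundex() function is the textbook American Soundex (first letter plus three
digits), which may well differ from the FORTRAN-derived SOUNDX routine, so it
is not guaranteed to reproduce the examples given in this note. The names
soundex() and sounds_like() are hypothetical.

```python
# Textbook Soundex letter classes (an assumption; the SOUNDX variant may differ).
CODES = {}
for digit, letters in enumerate(["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(text):
    """Return the 4-character textbook Soundex key of text ('' if no letters)."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]  # drops blanks, digits
    if not letters:
        return ""
    key = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate two letters of the same class
        code = CODES.get(ch, "")
        if code and code != prev:
            key += code
        prev = code  # a vowel resets prev, so a repeated class is coded again
    return (key + "000")[:4]  # pad or truncate: only the first few sounds count


def sounds_like(source, pattern):
    """Hypothetical SOUNDS LIKE: whole-string or any-word comparison."""
    words = pattern.split()
    if len(words) > 1:
        # More than one word: 'dumb' comparison of the two complete strings.
        key = soundex(pattern)
        return key != "" and soundex(source) == key
    if not words:
        return False
    # One word: hit if any word of the source string has the same key.
    key = soundex(words[0])
    return key != "" and any(soundex(w) == key for w in source.split())


if __name__ == "__main__":
    print(soundex("Robert"), soundex("Rupert"))
    print(sounds_like("University of Rupert", "Robert"))
    print(sounds_like("Robert Smith", "Robert Jones"))
```

The last call shows why long strings are a poor fit: with a multi-word
parameter only the start of each string contributes to the key, so two strings
that begin alike are reported as sounding alike whatever follows.
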
Eric