Here is how searching and indexing work.

 

  1. LISTSERV Classic:
    1. New DBRINDEX files are created in Classic format.
    2. Words whose index record exceeds 4096 bytes are overflowed.
    3. DBRINDEX files imported from an HPO server are supported, but records longer than 4096 bytes are overflowed.
    4. If there are 100,000 matches…

      i. LISTSERV fully processes and prepares 100,000 match results, in the requested sort order.
      ii. LISTSERV returns the first 100 to WA (I think this used to be 50).
      iii. WA returns 100 matches to the browser, with previous/next navigation.

  2. LISTSERV HPO:
    1. New DBRINDEX files are created in HPO format.
    2. The threshold at which words are overflowed adapts to the size of the archives, and is never less than 4096 bytes.
    3. DBRINDEX files in Classic format are supported, but the overflow threshold is pinned at 4096 bytes (there is no way to “bring back” a word that has been overflowed by Classic). You must reindex to gain the benefits of the adaptive overflow algorithm. Reindexing all lists could take over an hour on a large server, so it is not done automatically when installing an HPO LAK.
    4. If there are 100,000 matches…

      i. If possible (this depends on the search), LISTSERV fully processes and prepares only the 100 match results that it is going to return. These are the 100 items that would come out on top if LISTSERV were to process all 100,000 matches in the requested sort order.
      ii. LISTSERV returns 100 matches to WA.
      iii. WA returns 100 matches to the browser, with previous/next navigation.

  3. With either version:
    1. Overflowed search operands initially match every message in the archive.
    2. A search containing only overflowed words is rejected with the message, “Your search contains only "overflow words" - words that occur so frequently that they are not indexed. Please refine your search.” There was a bug between 2013-11-18 and 2017-03-09 causing searches with a mix of overflow and non-overflow words to be incorrectly rejected in some circumstances.
    3. Every matching message is read from the archive and post-processed to confirm that it actually contains the overflow words; messages that do not are removed from the search results.
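
The overflow behaviour described for both versions can be sketched roughly as follows. This is only an illustration, not LISTSERV's actual code: the in-memory index, the 4-byte posting size, and the names `build_index`, `search`, and `OnlyOverflowWords` are all assumptions made for the example. Only the 4096-byte threshold and the three rules (overflowed operands match everything, all-overflow searches are rejected, candidates are re-read and filtered) come from the description above.

```python
from collections import defaultdict

OVERFLOW_THRESHOLD = 4096  # bytes; the Classic threshold mentioned above
BYTES_PER_POSTING = 4      # assumption: illustrative size of one message reference


class OnlyOverflowWords(Exception):
    """Raised when a search contains nothing but overflowed words."""


def build_index(messages, threshold=OVERFLOW_THRESHOLD):
    """Map each word to the set of message ids containing it.

    A word whose (hypothetical) index record would exceed the threshold
    is stored as None, meaning "overflowed, not indexed".
    """
    postings = defaultdict(set)
    for msg_id, text in messages.items():
        for word in set(text.lower().split()):
            postings[word].add(msg_id)
    return {
        word: (None if len(ids) * BYTES_PER_POSTING > threshold else ids)
        for word, ids in postings.items()
    }


def search(index, messages, words):
    words = [w.lower() for w in words]
    overflowed = [w for w in words if w in index and index[w] is None]
    indexed = [w for w in words if w not in overflowed]
    if not indexed:
        raise OnlyOverflowWords("Please refine your search.")

    # Overflowed operands initially match every message, so only the
    # indexed words narrow down the candidate set.
    candidates = set(messages)
    for w in indexed:
        candidates &= index.get(w) or set()

    # Post-processing: read each candidate and keep it only if it really
    # contains every overflowed word.
    return sorted(
        msg_id for msg_id in candidates
        if all(w in messages[msg_id].lower().split() for w in overflowed)
    )
```

With a tiny threshold such as `build_index(msgs, threshold=8)`, any word that appears in three or more messages is overflowed, which makes the rejection and post-filtering paths easy to try out on a handful of test messages.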

 

This applies to 16.5 and 16.0/2017a. Older versions handled ordering differently.

 

Sites like Amazon have gotten people used to searching for just “knife” and finding the nakiri knife they always dreamed of among the first 5 matches. That works because matches are sorted based on past purchases, browsing history, and a host of other private data that we don’t even know about, but that is very effective in predicting what we want to see. LISTSERV collects no such data, so if you could search for “email” in LSTSRV-L, the message you are looking for would be very unlikely to be among the first 5. LISTSERV returns the equivalent of 10 Google pages’ worth of results and you can paginate for more but, if you value your time, you just have to narrow your search.

 

Anyway, there is nothing in the HPO search function that is more limiting than in Classic.
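
To illustrate why returning only the top 100 loses nothing, here is a minimal sketch comparing the two strategies described in the list above: sorting every match and taking the first page, versus selecting only the page that will be returned. The function names `classic_page` and `hpo_page` are hypothetical, and the use of Python's `heapq.nsmallest` is a stand-in for whatever LISTSERV does internally, which is not described here.

```python
import heapq
import random

PAGE_SIZE = 100  # matches returned to WA per page, as described above


def classic_page(matches, key):
    """Fully sort every match, then keep the first page."""
    return sorted(matches, key=key)[:PAGE_SIZE]


def hpo_page(matches, key):
    """Select only the items that would come out on top of a full sort."""
    return heapq.nsmallest(PAGE_SIZE, matches, key=key)


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical matches: (sort value, message number) pairs.
    matches = [(random.random(), n) for n in range(100_000)]
    by_sort_value = lambda m: m[0]
    same = classic_page(matches, by_sort_value) == hpo_page(matches, by_sort_value)
    print("identical first page:", same)
```

Both functions return the same page in the same order; the second simply avoids preparing the matches that would never be shown.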

 

  Eric


