It seems that the corrupted X-DEL and X-LUPD jobs have stopped flowing, but for all we know there might be another outburst tomorrow. The cause of the problem on the FRMOP22-FRORS13-FRORS12 (and back) path has not yet been identified. I have still heard strictly nothing regarding the service level of RSCS on the FRORSxx machines, despite the fact that the EARN Office is located in the same building. Judging from the number of postings on that topic I have seen recently on the EARN-NOG list, I am afraid I am NOT given the impression that this problem is being investigated with the priority that it deserves. Meanwhile, all the X-DEL problems are being blamed on LISTSERV (i.e. me), and if I hadn't lost my spool this morning I would have spent the whole day answering complaints.

Since I have come to the unfortunate conclusion that we simply cannot trust the network to deliver files uncorrupted (nor can we rely on the willingness of staff at key sites to investigate such problems), I spent the day writing and testing code to make LISTSERV checksum the DISTRIBUTE jobs it sends. This is admittedly preposterous - a high-level application written in an interpreted language checksumming its data because it cannot rely on a transport protocol as old and well understood as NJE. But I feel there is simply no other choice, as a number of sites have already "expressed their concern" over the network and CPU resources eaten by the last X-DEL storm. Next time it happens, they will start to be "worried about the continued existence of LISTSERV at UOFXYZ", and after a couple more times they will be "sad to report that management has decided the cost in manpower, system resources and membership dues of BITNET far outweigh the benefits of a direct connection".

Having LISTSERV checksum DISTRIBUTE jobs will:

1. Pinpoint the link(s) causing the corruption, as each server on the DISTRIBUTE path will verify that the checksum is correct.

2. Ensure that problems such as the X-DEL storm cannot happen any longer, as jobs failing the CRC check will not be distributed. Even though there will always be servers that do not run the CRC code, having the "core" of the backbone check CRCs and discard corrupted jobs should considerably decrease the duplication factor and thus the overall impact of such jobs on the network.

3. Have a positive impact on sites which are starting to have second thoughts about the use of LISTSERV and BITNET in general.

4. Considerably reduce the number of corrupted mail files shown to end users (but increase the number of "lost" postings if postmasters do not edit and resubmit the jobs as they ought to).

Corrupted mail files are really BAD press; lost files are common on the Internet, and not quite as visible. This may sound ridiculous, since I'd rather have my mail with some trash appended to it than no mail at all, but that's not the way the users react to corrupted vs lost mail.

The cost is about 200 S/370 instructions per record being checksummed, i.e. some 50ms of CPU time on a 9370-60 for your average job, one tenth of that on a 3090 - plus 2 extra disk I/Os. That is much less than what the performance improvements in 1.7 will save you on DISTRIBUTE jobs, but it's still a pity, especially if you are I/O constrained.

This is a significant change, since it can potentially reject massive numbers of jobs. It must be beta-tested carefully, and such beta-testing will require more time than usual. No data will be lost - if a job is rejected, it ends up in your reader and you can resubmit it after removing the '//CRC DD' card. But you may well end up having to do that for hundreds of jobs (I hope it won't happen, and I've done all I could to test it locally, but we won't know until we try on a larger scale).
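
For readers who want a concrete picture of the per-hop check, here is a minimal sketch in Python (not the language LISTSERV is written in, and not the actual code). Only the '//CRC DD' card name comes from this message. The CRC-32 algorithm, the hexadecimal value after the card name, the placement of the card at the top of the job and all function names are assumptions made for illustration.

```python
import zlib

CRC_CARD = "//CRC DD"  # card name from the text; everything after it is an assumed layout


def checksum(records):
    """CRC-32 over all records, in order (assumed; the real algorithm is not stated)."""
    crc = 0
    for record in records:
        crc = zlib.crc32(record.encode("utf-8"), crc)
    return crc


def add_crc_card(records):
    """Return the job with a hypothetical '//CRC DD <hex>' card prepended."""
    return [f"{CRC_CARD} {checksum(records):08X}"] + list(records)


def strip_crc_card(job):
    """Remove the CRC card, as a postmaster would before resubmitting a rejected job."""
    return [line for line in job if not line.startswith(CRC_CARD)]


def check_job(job):
    """Classify a job as 'ok', 'corrupt', or 'unchecked' (no CRC card present)."""
    cards = [line for line in job if line.startswith(CRC_CARD)]
    if not cards:
        return "unchecked"  # nothing to verify against: process the job as before
    try:
        expected = int(cards[0][len(CRC_CARD):].strip(), 16)
    except ValueError:
        return "corrupt"  # the card itself was damaged in transit
    return "ok" if checksum(strip_crc_card(job)) == expected else "corrupt"


# Usage: a clean job passes, a job with trash appended is rejected,
# and a job whose card was removed by hand is accepted without checking.
job = add_crc_card(["record one", "record two"])
damaged = job[:-1] + [job[-1] + " TRASH"]
print(check_job(job), check_job(damaged), check_job(strip_crc_card(job)))
```

In this sketch a server that finds a mismatch would leave the job in the reader instead of distributing it, and the first server on the path to report a mismatch identifies the link where the damage occurred.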
It is equally important to check that the checksums are propagated (but ignored) by servers not running the CRC code - that means not only 1.6e, but also 1.5o and LISTEARN, which each have a different DISTRIBUTE processor.

Finally, you must be ready to revert to the old code in case of major job rejection, especially as I will be in Copenhagen from Thursday (910613) to the following Friday (910620) and available only during the evening (there is a terminal room on the conference site). Still, the problem is pressing and I would like to begin testing as soon as possible. Ideally, I ought to be able to take care of any "obvious" problem this evening or tomorrow; one can hope that everything else will work fine and that I can safely proceed to release 1.7 when I'm back from Copenhagen.

Once the code is running on the major hub sites, X-DEL filters can be pulled out and the LISTSERV network (not to mention my mailbox) can resume its quiet everyday mode of operation.

Thanks in advance for your cooperation.

  Eric