A serious problem exists when a peered list has a mix of 1.8 and 1.7 peers, and the 1.8 peers are configured to use Internet addresses ("List-address= FQDN", or equivalent default). SIGNOFF and DELETE commands issued with the "(GLOBAL" option may cause an infinite loop which, given enough time, will also fill up the A-disk of every peer involved. Eventually the servers will crash and will not be able to restart without manual intervention.

**************************
* How to restore service *
**************************

If this happens to you, DO NOT DELETE PERMVARS FILE! Use XEDIT or PIPE to eliminate all the lines containing the string X-REQ (ALL/X-REQ followed by DEL * under XEDIT). PERMVARS FILE contains important information, and should be treated like a LIST or FILELIST file, not like LISTSERV NETLOG.

Once you have returned PERMVARS FILE to its normal size, you will be able to start LISTSERV, but the problem will remain until you identify and remove the looping jobs. They will be in mail files called LISTSERV MAIL, coming from the servers whose hostnames were in the X-REQ entries in PERMVARS FILE. The job name will be the same as the name of the list in question. This may or may not be sufficient to identify the files. There is only one job to kill per 1.7 peer (jobs from 1.8 peers don't need to be killed).

If you can't identify the jobs, or if you are not sure, temporarily remove all the peer subscriptions from the list in question. This will stop the loop.

*************************
* Permanent restriction *
*************************

There are three ways to run a peered list with both 1.7 and 1.8 peers:

1. All 1.8 peers use "List-address= NJE", and all peer subscriptions (the subscriptions with the name "Peer distribution list") are in BITNET form. This produces the behaviour the 1.7 servers expect (the only behaviour they supported) and there is no problem. But, of course, the 1.8 peers cannot take advantage of the "List-address=" support added in 1.8a and continue to identify themselves under their BITNET address.

2. The 1.8 peers use "List-address= FQDN", and all peer subscriptions are in BITNET form. In that case there is no risk of a loop, but subscribers will receive an error message about duplicate postings every time they post to the list.

3. The 1.8 peers use "List-address= FQDN", and the corresponding peer subscriptions are changed to their Internet form. This solves the duplicate-posting problem mentioned in option 2, but exposes the list to the loop.

Option 1 is fully supported. Option 2 is supported, but not recommended. Option 3 is not supported.

Note that servers running the base level of 1.8a may experience the symptoms described in option 2. However, they do not suffer from the loop problem. The message about duplicate postings does not necessarily mean the loop problem is present.

**********************
* Can't it be fixed? *
**********************

The problem cannot be fixed by a change to version 1.8. It is caused by the algorithm version 1.7 uses to forward certain requests. When the request comes back to the 1.8a server, the 1.8a server cannot tell whether it is due to the 1.7 restriction or caused by a new command from a user, because the 1.7 server assigns a new ID. The only way to solve this problem in 1.8a would be to disable command forwarding completely.

There are three general strategies to bypass this restriction:

1. Upgrade all participating servers to 1.8.

2. Remove all 1.7 peers.

3. Run the list with "List-address= NJE" until all peers migrate to 1.8.

L-Soft was not aware of this problem until it hit UBVM tonight. This is the reason why it was not mentioned in the release notes, and why we had previously advised people to select the options that expose the list to the loop.
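
For sites that would rather script the cleanup, the X-REQ removal described under "How to restore service" can be sketched as follows. This is only an illustration in Python, not something LISTSERV provides: it assumes PERMVARS FILE has been copied somewhere as plain text with one record per line, the function name is made up, and the sample records are invented placeholders that do not reflect the real file layout. Matching is case-sensitive here, which is also an assumption. Using XEDIT or PIPE on the server itself remains the procedure described above.

```python
def drop_x_req_lines(lines, marker="X-REQ"):
    """Return the records that do NOT contain the marker string.

    Hypothetical helper: mirrors "ALL/X-REQ followed by DEL *" by
    keeping every record except those containing the marker.
    """
    return [line for line in lines if marker not in line]


# Invented placeholder records, NOT the real PERMVARS FILE format.
sample = [
    "SOME-SETTING value-one",
    "X-REQ placeholder entry from a peer",
    "ANOTHER-SETTING value-two",
    "placeholder X-REQ entry again",
]

kept = drop_x_req_lines(sample)
removed = len(sample) - len(kept)
print(f"removed {removed} record(s), kept {len(kept)}")
```

Work on a copy and keep the original until LISTSERV has restarted cleanly, since everything other than the X-REQ records must be preserved.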
As you know, L-Soft offered to deliver 1.8a to all the beneficiaries of the CREN/L-Soft contract who did not manage to push the paperwork past their legal department, at L-Soft's own risk (without written agreement). Because of the use of the word "executed" rather than "agreed" in section 4 of the CREN/L-Soft agreement, L-Soft cannot do this without CREN's written permission.

Regrettably, the last formal response we received from CREN on this topic was on June 29. CREN did not agree to let us deliver 1.8a to everyone, although no reason was stated. Instead, CREN proposed that the deadline be extended by another three months. We turned down this offer because it would not solve the problem. The reason less than a third of the beneficiaries returned usable contracts is that the contracts are too complicated and involve three parties (against our recommendation - we wanted a simple, standalone maintenance agreement with two parties), whereas purchase lawyers are not usually familiar with three-party maintenance agreements where 18 out of 30 pages are totally out of their control and not negotiable. Extending the deadline will not solve this problem.

Furthermore, we do not have three months at our disposal. The backbone must be made LTCP-exploitive by the end of July, and L-Soft does not believe that 100 universities will return their contracts over the next two weeks when only about 50 have done so since March.

What is most unfortunate about this negotiation is that we have still not been told why CREN opposes our proposal. Not knowing what bothers CREN about the proposal, and being faced with an unusable counter-proposal, there is not much we can do in terms of negotiations.

We now have no option but to begin retrofitting LTCP exploitation into 1.7f. Volunteers for beta-testing are invited to contact L-Soft privately.

  Eric