Many people have written to complain about duplicate messages from LSOFT.COM. Here is what happened.

On Thursday night US time, PSUVM began unleashing some 3.5M deliveries onto our machines in the space of a few hours, following the resolution of a routing outage at CICnet, PSU's provider. Our delivery machines are normally configured in redundant mode, where only one machine is actually delivering mail while the other acts as a backup. In this mode we normally have a delivery capacity of around 210k/hour, depending on network outages, route flaps, etc. This is with a normal queue (at most a couple hundred thousand recipients) and input stream. With the kind of queue we are talking about here, performance was reduced to 180k/hour.

On Friday morning we reconfigured the delivery machines in non-redundant mode, increasing overall throughput to around 300k/hour (again, less than our normal throughput due to the sheer size of the queue). Even at 300k/hour, a 3.5M backlog would take around 12h to go through, and naturally we were also getting our regular, daily ~2.5M deliveries from our everyday operations. The bottom line is that we had to deliver 6,361,742 messages on 11/22, instead of the usual 2.5M. So, things were slow, and some of our hosts were unresponsive yesterday.

Unfortunately, some messages were also duplicated Thursday night. This is because, in redundant mode, all the new messages had to be processed by one machine, and the configuration we were using did not allow it to handle a queue of that size on its own. To give you an idea, the second largest outage/backlog we have had to deal with involved delivering an extra 1.1M messages, as compared to a normal day. Here we ended up delivering 3.8M more than on a normal day. The server had been configured with a large file cache, which still left more RAM for LSMTP than it had ever attempted to use (even on the worst outage on record).
The virtual storage quota for LSMTP had been set accordingly (no use allowing it to get into a situation where paging activity would bring the system to its knees). With a queue of this size, LSMTP ran out of storage and was crashing every 30-45 minutes, and any messages that had been in the process of being sent at the time would be resent after the restart. We fixed this on Friday morning by increasing the amount of storage available to LSMTP and its virtual storage quota, and by splitting the queue between the two main delivery machines.

As you may know, we are about to upgrade this setup to a fully redundant configuration based on a VMS cluster. The new server was shipped on Wednesday and should in principle arrive Monday (a bit too late, but it never rains...). This machine will have 512M of RAM and should be able to deliver 550k messages an hour. If we had had it yesterday, we would have been able to clear the backlog in a few hours (our total throughput would have been around 800k/hour), and there would have been no duplicates.

It will take a couple months for the new clustered configuration to go online, as we need to make a few software changes (and test them!). We will migrate the production workload to the new server when it arrives, and use the old one as a test machine until we are ready to go live with the clustered setup. The old server has also been upgraded and we expect that it will be able to handle 400-450k/hour; however, this requires the installation of VMS 7.1, which will be released in a week or two.

At any rate, we apologize for the inconvenience, but there is no need to keep reporting duplicates unless they occurred on Saturday or later (per the time stamp on the "Received:" line for PEACH.EASE.LSOFT.COM).

It is also possible that messages might get duplicated down the line. Since we sent 2.5 times as much traffic as on a regular day, mail servers all over the world have received 2.5 times as much traffic from us as on a regular day.
For most sites, this is not likely to make any difference, but large sites which get large absolute numbers of deliveries from us may have been impacted. For instance, we sent around 650,000 deliveries to AOL yesterday. This is not to say that there has been a problem at AOL, just that the absolute numbers for some sites may have been significant and may have caused problems for the mail servers in question.

  Eric
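
For anyone who wants to check the backlog arithmetic above, here is a minimal sketch in Python. It uses only the figures quoted in this note (3.5M backlog, ~2.5M regular deliveries per day, 300k/hour and 800k/hour throughput). The function name hours_to_clear and the assumption that regular traffic arrives evenly over 24 hours are illustrative only; real traffic is bursty, so treat the results as rough estimates.

```python
def hours_to_clear(backlog, capacity_per_hour, inflow_per_hour=0.0):
    """Hours needed to drain a backlog at a fixed delivery rate
    while new deliveries keep arriving at a steady rate."""
    net = capacity_per_hour - inflow_per_hour
    if net <= 0:
        raise ValueError("queue never drains: inflow meets or exceeds capacity")
    return backlog / net


BACKLOG = 3_500_000        # deliveries released by PSUVM (figure from this note)
DAILY_REGULAR = 2_500_000  # regular daily deliveries (figure from this note)

# Assumption: regular traffic is spread evenly over 24 hours.
inflow = DAILY_REGULAR / 24

# Backlog alone at 300k/hour: roughly the "around 12h" quoted above.
backlog_only = hours_to_clear(BACKLOG, 300_000)

# Same rate, but with regular traffic still coming in.
with_inflow = hours_to_clear(BACKLOG, 300_000, inflow)

# With the ~800k/hour total expected once the new server is in place.
upgraded = hours_to_clear(BACKLOG, 800_000, inflow)

print(f"backlog only at 300k/hour:     {backlog_only:.1f} h")
print(f"with regular inflow, 300k/h:   {with_inflow:.1f} h")
print(f"with regular inflow, 800k/h:   {upgraded:.1f} h")
```

Under these assumptions the 800k/hour case comes out at about five hours, which is in line with "a few hours" above.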