LSTOWN-L Archives

LISTSERV List Owners' Forum

LSTOWN-L

Eric Thomas
Sun, 3 Nov 1996 15:42:08 +0200
This problem  has just  been fixed.  Please let me  know privately  if it
happens again (and if the "Date:" field in the error message is later
than 08:30 Eastern time).
 
PLUM.EASE.LSOFT.COM is the first half of a fully redundant LISTSERV setup
that we are going to build over the next couple of months (the rest of the
hardware will take about one month to arrive because these are very new
models with an order backlog). Currently we have redundant mail delivery
machines; however, the machine running LISTSERV is a single point of
failure. In practice this machine has  proved very reliable, and the only
incident was a  disk crash in February,  which did not cause  any loss of
data since the disk  was mirrored. But there are a lot  of sites for whom
this design  is not acceptable;  they need  to be able  to blow up  a box
completely and  still have a  working service, for instance  because they
are in the business of  sending warnings about natural disasters. Another
issue  is  that  taking  the   system  down  for  maintenance  causes  an
interruption of service. In some businesses, even a 15-minute maintenance
window may just not be acceptable.
 
So, we are going to build the necessary tools to make it possible to have
a truly uninterrupted,  24x365 LISTSERV service. Currently  this can only
be achieved using VMS cluster technology  (there is a cluster product for
NT,  but  in  its  current  version   it  does  not  have  the  necessary
functionality). To give this setup the  level of testing it needs, we are
going to  use it to  take over PEACH's  role in the  DISTRIBUTE backbone.
Currently this is done  by having PEACH forward all its  mail to PLUM, so
that we can back out immediately if there should be a problem, or when we
need to do maintenance on PLUM (which we will have to do as the new
components of the setup arrive).
 
The error that you noticed was a bug in the in-transit spam detector that
we run on PEACH and now on PLUM. This bug had not been detected until now
because of operating-system-specific considerations, but PEACH would probably
have run  into it  within the  next few months,  and its  performance was
already being  impacted by  a side  effect of  the bug.  PLUM is  now the
largest LISTSERV site  in the world and there are  problems that you only
detect with this level of traffic. Some operating systems may hide
certain problems, and I was actually expecting to find a couple of bugs
during the transition. I made the switch late at night and monitored the
system for the next 3-4 hours, and everything worked fine, but obviously this
one decided  to strike  while I was  asleep. At the  rate mail  is coming
through, it shouldn't take very long for other problems to be found.
 
  Eric
