LISTSERV - LSTOWN-L Archives - COMMUNITY.EMAILOGY.COM

On Sep  15, UIUC  reported on  the core operators'  list that  their core
system, a dedicated 4381 running the  UIUCVM42 core node, could no longer
handle the load it was subjected  to. It was quickly established that the
machine is simply out of steam. It is one of the smallest machines on the
core, it is 100% busy 24 hours a day, and it has reached a point where the
LISTSERV input queue never drops below about 3000 files, even in the
middle of the night on Sundays. This machine and the
manpower to operate it are provided by  UIUC on a volunteer basis, and we
can  only  thank  UIUC  for their  continuing  support,  dedication,  and
generosity. However, this problem does  need to be solved. Several L-Soft
customers have complained  about LISTSERV delivery delays of  up to EIGHT
DAYS. Needless to say, this is totally unacceptable to the average user.
 
At first, we told people that  the UIUCVM42 issue was being investigated.
The issue was  being discussed on the core operators'  list, and we hoped
that  a solution  would be  found  shortly. Unfortunately,  this has  not
happened.  What's worse,  nobody seems  to  have taken  ownership of  the
problem.  We can't  even  tell our  customers that  a  solution is  being
actively  implemented and  is expected  to be  ready by  a certain  date,
because,  to  the  best  of  our  knowledge,  nothing  at  all  is  being
implemented. This is intolerable. Our customers are not interested in our
explanations  of the  delicate, volunteer-based  core support  structure.
They pay us good  money for software which happens to  use the core. They
demand service. In their  opinion, if the core needs 8  days to process a
LISTSERV distribution,  the core  should be  either fixed  or terminated,
because  a structure  that needs  8 days  to deliver  mail is  simply not
useful. And they are right.
 
In  order to  ensure that  our  customers do  receive a  decent level  of
service, we have  had no option but to remove  UIUCVM42 from the LISTSERV
backbone (and set UIUCVMD to LOCAL  distribution mode, to avoid having it
attract the workload of UIUCVM42). This will bypass the LISTSERV@UIUCVM42
backlog and restore the expected level of service.
 
This is not a satisfactory solution. In fact, it is a last-resort
solution, and this is why we waited two weeks before making this decision.
There was  simply no other option.  Removing UIUC from the  backbone will
increase  the level  of  traffic  on the  core,  and  break the  INTERBIT
symmetry. We do not expect any major  disaster, and there is no cause for
panic. This  change simply puts  UIUC in  the same situation  as Cornell,
back when it used to be a  core site not running LISTSERV. We expect that
this change will solve the problem for the time being, at the expense of
additional traffic that the core structure can support today. However, we
also expect that other sites will  find themselves in a situation similar
to UIUC's over the next 6 months.  Removing a core site from the backbone
increases traffic in proportion to the  number of remaining core sites on
the backbone. That is, it is bearable  the first few times you do it, but
every additional  removal becomes more  expensive than the  previous one.
And, since each  removal increases traffic and  contributes to saturating
machines  and  requiring  another  removal,  this  is  a  very  dangerous
situation which could get out of control in no time.
 
Again, UIUC is not to blame for this problem. They are not being paid for
this service, which costs them real money. The machine is out of steam;
the traffic simply has to be moved elsewhere. There are several ways to
move SMTP/INTERBIT traffic  from a VM system to a  workstation. These are
not experimental mechanisms. SUNET has been running its SMTP service on a
workstation  since  March  1994.  Others have  started  offloading  their
mainframes in a  similar fashion. The technology is  available, today, to
solve problems such as UIUC's. And, if  it is not deployed today, we will
not have a  core for long. The  only obstacle to this  deployment is that
software  cannot  run on  thin  air,  and  someone  has to  purchase  the
workstations in question. Some core sites are willing to spend $10-20k to
buy  a workstation  for the  core service,  others aren't,  and we  can't
really blame  them for  that. This problem  has to be  solved by  the NJE
connectivity  providers,  who  are  getting  paid  by  the  participating
organizations for the provision of  services that their users find useful
-  when the  turnaround  time is  within reasonable  bounds,  that is.  A
comprehensive solution  would probably  cost around  $200k and  a minimal
solution, $50k. These are mostly one-time charges for the purchase of
equipment. It is estimated that the NJE connectivity providers collect on
the order of 2 million US dollars a year (worldwide). Thus, the
comprehensive solution would cost about 10% of the yearly dues, and again
most of that is a one-time charge. Since the NJE connectivity providers
have a  monopoly, it is  not possible for  other companies to  offer more
competitive  or better  operated NJE  services.  Your only  option, as  a
representative of  your dissatisfied  users, is to  complain to  your NJE
provider,  and  seek  alternate  solutions   if  you  do  not  receive  a
satisfactory answer.
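
For anyone who wants to check the cost figures above, the arithmetic can
be sketched as follows. This is only an illustration using the estimates
quoted in this message ($200k, $50k, and roughly 2 million US dollars a
year in dues); the variable and function names are made up for the
example and do not come from any existing tool.

```python
# Figures quoted in the message, in US dollars. All names are illustrative.
YEARLY_DUES = 2_000_000        # estimated yearly NJE provider dues, worldwide
COMPREHENSIVE_COST = 200_000   # comprehensive solution, mostly one-time
MINIMAL_COST = 50_000          # minimal solution, mostly one-time


def share_of_dues(cost, dues=YEARLY_DUES):
    """Return a cost as a percentage of one year's dues."""
    return 100.0 * cost / dues


comprehensive_pct = share_of_dues(COMPREHENSIVE_COST)
minimal_pct = share_of_dues(MINIMAL_COST)

print(f"comprehensive: {comprehensive_pct:.1f}% of one year's dues")
print(f"minimal:       {minimal_pct:.1f}% of one year's dues")
```

By the same estimates, the minimal solution comes to about 2.5% of one
year's dues.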
 
  Eric