The purpose of this note is to answer the questions asked by several people
about the present delays in getting files DISTRIBUTEd from the US to Europe,
and vice-versa. There are two causes of this problem. The first is that the
BITNIC CPU is too small for the load it receives, and its link to CUNY is too
slow. The second, which today causes the major delays, is in Europe, where the
"hub" LISTSERV backbone node is FRULM11. Although the node is quite stable, and
its staff have never failed to act whenever they were notified of a problem,
the CPU is simply too slow for the number of jobs it receives (over 5000 a
day), and its connection to the rest of the network is a 9.6k line.
LISTSERV@FRULM11 is still processing the backlog caused by an interruption of
service at FRORS12 last weekend, and is probably flirting with the 9900-spoolid
limit as I type these lines.

I have analyzed the situation, and unfortunately there is no satisfactory
solution. Temporarily changing FRULM11 to DISTRIBUTE(NO), or changing the link
weights, would only move the entire load to another EARN server (TREARN,
HEARN or DEARN, depending on the new set of weights). Only one solution would
spread the load across several servers: nullifying the weight of the CUNY-MOP
link (now 75) and setting all the other weights so that all the European hub
servers lie at the same distance from CUNY. There would then be 'N' files sent
from BITNET to EARN rather than 1, but the CPU and outbound-file load would no
longer fall on a single machine. However, this change would have other
implications, such as causing European subscribers to be assigned to a BITNET
peer or vice-versa. You can imagine the political consequences this might
have, and I want to stay clear of that.
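
The argument above is essentially a property of shortest-path routing. As a
rough illustration only, here is a toy Python model, not LISTSERV's actual
DISTRIBUTE algorithm: it assumes a job follows a shortest-path tree rooted at
CUNY, counts each tree edge leaving the BITNET side as one file sent to EARN,
and takes a hub's fan-out (its number of children) as its load. All weights
are invented path costs (only the 75 comes from this note, and here it stands
for the whole CUNY-to-FRULM11 path), and the helper names are hypothetical.

```python
import heapq
from collections import defaultdict


def shortest_path_tree(edges, root, disabled=frozenset()):
    """Dijkstra from `root`; returns {node: parent} for every reachable node."""
    graph = defaultdict(list)
    for a, b, w in edges:
        if a in disabled or b in disabled:
            continue  # model a server set to DISTRIBUTE(NO) by dropping it
        graph[a].append((b, w))
        graph[b].append((a, w))
    dist = {root: 0}
    parent = {root: None}
    heap = [(0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry, a shorter path was already found
        for nxt, w in graph[node]:
            nd = d + w
            if nxt not in dist or nd < dist[nxt]:  # ties keep the first parent
                dist[nxt] = nd
                parent[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    return parent


def summarize(parent, us_side):
    """Return (files crossing BITNET -> EARN, {EARN hub: number of children})."""
    crossing = sum(1 for n, p in parent.items() if p in us_side and n not in us_side)
    fan_out = defaultdict(int)
    for n, p in parent.items():
        if p is not None and p not in us_side:
            fan_out[p] += 1  # this hub relays one copy to each child
    return crossing, dict(fan_out)


# Hypothetical intra-EARN path costs (not the real link weights).
EARN_LINKS = [
    ("FRULM11", "DEARN", 10), ("FRULM11", "HEARN", 10), ("FRULM11", "TREARN", 10),
    ("DEARN", "HEARN", 10), ("DEARN", "TREARN", 10),
]


def backbone(us_weights):
    """Build the toy graph from hypothetical CUNY-to-hub path costs."""
    return [("CUNY", hub, w) for hub, w in us_weights.items()] + EARN_LINKS


US = {"CUNY"}
today = backbone({"FRULM11": 75, "DEARN": 90, "HEARN": 120, "TREARN": 120})
equal = backbone({"FRULM11": 75, "DEARN": 75, "HEARN": 75, "TREARN": 75})

print(summarize(shortest_path_tree(today, "CUNY"), US))  # (1, {'FRULM11': 3})
print(summarize(shortest_path_tree(today, "CUNY", disabled={"FRULM11"}), US))  # (1, {'DEARN': 2})
print(summarize(shortest_path_tree(equal, "CUNY"), US))  # (4, {})
```

With these made-up weights, disabling FRULM11 merely shifts the whole fan-out
to DEARN, while putting all hubs at the same distance produces four
transatlantic files and no relaying hub, which mirrors the trade-off described
above.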

It could be argued that there may be better machines than FRULM11 to take that
load, but unfortunately none is clearly better. The requirements are: a fast
(or at least lightly loaded), non-volume-charged connection to the EARN
backbone; a fast CPU and the willingness to burn a large amount of CPU cycles
for the network; a large spool with no more than 1000-2000 local files at any
time, or a version of VM that can handle more than 9900 spoolids; excellent
availability; and, above all, responsible staff willing to take immediate
action if there is an urgent problem. I know of no node that meets all these
requirements. Moreover, moving the hub to a given node would dramatically
increase the traffic on its international line in order to provide service to
other countries, greatly reducing the bandwidth available to users in the
local country: more political problems. All this leads me to think that the
wisest thing to do is to leave things as they are.

Finally, I would like to remind you that FRULM11 is a "private" computing
centre with no obligation to take that load (unlike DEARN, CEARN, BITNIC,
etc., whose only purpose is to do this kind of thing). Because of their
advantageous topological location, they kindly agreed to take it on to help
the network solve its chronic line-saturation problems. We should all
remember this when we complain about their CPU being slower than we would
like.
Cheers, Eric