LSTSRV-L Archives

LISTSERV Site Administrators' Forum

Eric Thomas <[log in to unmask]>
Mon, 24 Mar 2003 01:32:44 +0100
>    Receive error: Resource temporarily unavailable
> 
> [Our system manager thinks] this error
> message is being generated by the listserv program, but we don't
> know how to prove this. Looking at the source code of sendmail,
> we do not see this message, so we assume it is coming from the
> listserv server.

This error message is simply the strerror() "plain English" text for errno=EAGAIN. The immediate cause of the problem is that a call to recv() to read the response from sendmail returned EAGAIN. This error does not come from sendmail but from AIX. It is not listed in the man page for recv(). EAGAIN normally means that a process quota is exceeded, most commonly the fork() quota, although it should not be relevant here. You are invited to retry, but chances are that you will get the same error again and again, until someone comes in and increases the quota or another process happens to exit. Currently LISTSERV aborts the transmission, closes the connection and starts a new one after a while (one minute if I remember correctly).
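To see the errno-to-text mapping for yourself, here is a minimal sketch. The helper name format_receive_error() is made up for illustration and is not LISTSERV code; the "Receive error: " prefix is simply copied from the quoted log line. The exact strerror() wording depends on the platform's C library.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not LISTSERV code): builds a line in the style of
 * the quoted log message from an errno value. strerror() supplies the
 * "plain English" text; its wording is platform-dependent. */
int format_receive_error(int err, char *out, size_t outlen)
{
    return snprintf(out, outlen, "Receive error: %s", strerror(err));
}
```

Calling format_receive_error(EAGAIN, line, sizeof line) on the affected system and comparing the result with the log line is a quick way to confirm which errno is behind the message.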

What I've been able to find so far is that AIX 4.3 returns EAGAIN where other unixes would return EWOULDBLOCK. This should not matter, since this is a blocking recv() call on a socket not set for non-blocking I/O, so EWOULDBLOCK is in principle impossible; if it should occur anyway, you would probably want to treat it as an unexpected error, which is exactly what happens with EAGAIN. I've also found many reports of similar problems with OpenSSL on AIX, but no published solutions. Finally, I've found isolated theories via a Google search:

- SMP AIX 4.x systems sometimes return EAGAIN instead of EINTR when a signal arrives during the recv() call. If true, this would be relevant to LISTSERV as it receives a signal when new mail has arrived.

- EAGAIN can mean that the recv() timeout has expired with no new data. If true, it would almost certainly indicate that sendmail is not responding, since the timeout is one minute.
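The timeout theory can be checked in isolation with a small test program. The sketch below assumes the timeout is set with the SO_RCVTIMEO socket option, which may not be how LISTSERV implements its one-minute timeout; recv_with_timeout() is a made-up name. On systems that support SO_RCVTIMEO, a blocking recv() that times out with no data returns -1 with errno set to EAGAIN or EWOULDBLOCK; whether AIX 4.3 behaves the same way is something to verify on the machine itself.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Hypothetical helper: blocking recv() with a receive timeout.
 * Assumes SO_RCVTIMEO is available; returns whatever recv() returns,
 * leaving errno untouched so the caller can inspect it. */
ssize_t recv_with_timeout(int fd, void *buf, size_t len, long timeout_ms)
{
    struct timeval tv;

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv) != 0)
        return -1;              /* option not supported or bad socket */

    return recv(fd, buf, len, 0);
}
```

Running this against a peer that never answers, and logging errno when it returns -1, would show whether an expired timeout is reported as EAGAIN on a given system.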

The most credible theory is that EAGAIN is being returned instead of EINTR. This would explain why the only EAGAIN problem reports are for AIX. In that case, there could either be an APAR that fixes it, or we could try to program a limited retry in the code (you wouldn't want an infinite retry as it could end up being a hard loop). As for why it works with 1.8d, it is a smaller executable, so you could happen to be just below a default quota with 1.8d and barely above it with 1.8e. It is typical for quota problems to trigger after a version upgrade, so unfortunately this does not tell us much.

Has anybody experienced the EAGAIN problem on a single-processor system?

  Eric
