Email into/out of SL grid is down |
|
Andrew Linden
Linden staff
Join date: 18 Nov 2002
Posts: 692
|
07-22-2004 17:59
We've been having trouble with email in and out of the SL grid today. We thought we had it fixed but it broke in a new way. We're working on it now but we expect it to be down for a while.
|
Grim Lupis
Dark Wolf
Join date: 11 Jul 2003
Posts: 762
|
07-22-2004 21:14
While you guys are fiddling with the mail server(s), can you make the main mail server (data.agni.lindenlab.com) RFC compliant so that it will actually use my backup exchanger(s) instead of waiting 4 hours to retry?
Oh, and while you're at it, you guys should really install a backup exchanger of your own. You know, so that when the primary server goes belly-up, it doesn't break all of the scripts that use email-RPC (since it's the only way we have to really pass data to a real server.) _____________________
Grim
"God only made a few perfect heads, the rest of them he put hair on." -- Unknown |
Darwin Appleby
I Was Beaten With Satan
Join date: 14 Mar 2003
Posts: 2,779
|
07-22-2004 21:33
Originally posted by Grim Lupis
While you guys are fiddling with the mail server(s), can you make the main mail server (data.agni.lindenlab.com) RFC compliant so that it will actually use my backup exchanger(s) instead of waiting 4 hours to retry?
_____________________
Touche.
|
Mark Linden
Funky Linden Monkey
Join date: 20 Nov 2002
Posts: 179
|
07-23-2004 09:58
Grim: Can you quote the RFC that you believe we are not compliant with, please?
The mail problem that occurred yesterday, and which is being fixed today, has nothing to do with mail exchangers, unfortunately. |
Grim Lupis
Dark Wolf
Join date: 11 Jul 2003
Posts: 762
|
07-23-2004 16:29
Originally posted by Mark Linden
Grim: Can you quote the RFC that you believe we are not compliant with, please?
RFC 974 (MX), page 5
If the list of MX RRs is not empty, the mailer should try to deliver the message to the MXs in order (lowest preference value tried first). The mailer is required to attempt delivery to the lowest valued MX. Implementors are encouraged to write mailers so that they try the MXs in order until one of the MXs accepts the message, or all the MXs have been tried. A somewhat less demanding system, in which a fixed number of MXs is tried, is also reasonable. Note that multiple MXs may have the same preference value. In this case, all MXs at with a given value must be tried before any of a higher value are tried. In addition, in the special case in which there are several MXs with the lowest preference value, all of them should be tried before a message is deemed undeliverable.
_____________________
Grim
"God only made a few perfect heads, the rest of them he put hair on." -- Unknown |
Mark Linden
Funky Linden Monkey
Join date: 20 Nov 2002
Posts: 179
|
07-23-2004 18:10
This is what our MTA does. However, if your primary MX issues a banner response to the SMTP HELO/EHLO verb, our current SMTP software will lock on it and ignore your other MXs, even if it issues an error later in the SMTP transaction.
So, if you're running a busted mailserver, we won't do the right thing, but we are technically not in violation of the RFC. We'll be upgrading to a newer MTA that will deal with this sort of thing better in the future. |
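For anyone following the MX discussion, here is a minimal Python sketch of the two behaviors being compared: trying MXs in preference order as the quoted RFC 974 passage describes, and "locking on" to the first host that issues a banner. It is only an illustration, not Linden Lab's MTA. The outcome constants, the `deliver` function, the `lock_on_banner` flag, and the example.net hostnames are all made up for the example.

```python
from typing import Dict, Iterable, List, Optional, Tuple

MXRecord = Tuple[int, str]  # (preference value, hostname)

# Hypothetical outcomes of one SMTP attempt against a host.
NO_RESPONSE = "no_response"                # host is down, no banner at all
ERROR_AFTER_BANNER = "error_after_banner"  # banner sent, transaction fails later
ACCEPTED = "accepted"                      # message accepted


def delivery_order(mx_records: Iterable[MXRecord]) -> List[str]:
    """Return hostnames with the lowest preference value first."""
    # sorted() is stable, so hosts with equal preference keep their input order.
    return [host for _, host in sorted(mx_records, key=lambda rec: rec[0])]


def deliver(
    mx_records: Iterable[MXRecord],
    outcomes: Dict[str, str],
    lock_on_banner: bool,
) -> Optional[str]:
    """Return the host that accepted the message, or None if none did.

    With lock_on_banner=True, a host that answers with a banner and then
    fails ends the attempt, so lower-priority MXs are never tried.
    """
    for host in delivery_order(mx_records):
        result = outcomes[host]
        if result == ACCEPTED:
            return host
        if result == ERROR_AFTER_BANNER and lock_on_banner:
            return None  # committed to this host; remaining MXs are skipped
        # Otherwise (no response, or fall-through allowed) try the next MX.
    return None
```

In this model a primary that is completely offline still falls through to the backup even with `lock_on_banner=True`; only a primary that answers and then errors blocks the backups.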
Grim Lupis
Dark Wolf
Join date: 11 Jul 2003
Posts: 762
|
07-23-2004 18:41
Actually, Mark, my problem wasn't a busted mail server.
When I first set up my backups, I turned my primary server off for testing, logged into SL, and created an object that would send an email. I then waited about 10 minutes and re-enabled my mail server. The message should have come in from my backup approx 5 minutes later (15 minute retries).
The problem is that I didn't get the e-mail until 4 hours later, and it came direct from data.agni.lindenlab.com. The LL mail server apparently completely ignored my backup exchangers when my primary server was completely offline. And this test was run about 1-1/2 weeks ago, when you weren't experiencing the current problems on the LL mail server.
The 15-minute interval instead of LL's 4-hour retry interval is one of the reasons that I pay for backup exchangers.
Is it possible that the current mail server, once it successfully communicates once with an SMTP server, caches that information and then ignores all backups for some arbitrary period of time? I was running some other functionality tests prior to the backup test, but I'm positive that I didn't shut the server down in the middle of a session.
**Edit: added the italicized text for accuracy.
_____________________
Grim
"God only made a few perfect heads, the rest of them he put hair on." -- Unknown |
Malachi Petunia
Gentle Miscreant
Join date: 21 Sep 2003
Posts: 3,414
|
07-23-2004 18:49
Originally posted by Mark Linden
....So, if you're running a busted mailserver, we won't do the right thing, but we are technically not in violation of the RFC. We'll be upgrading to a newer MTA that will deal with this sort of thing better in the future.
I expect that sort of excuse from Microsoft - congratulations Mark, you've reached a nadir of customer contempt that is unparalleled. |
Mark Linden
Funky Linden Monkey
Join date: 20 Nov 2002
Posts: 179
|
07-24-2004 01:17
Grim:
What was the domain that you were sending mail to, and approx. when did you run this test (or exactly, if you know)? I'd like to check our logs to see what our mailserver actually did while your primary was offline; it certainly should have tried your next-in-line MX if your primary wasn't responding to SMTP at all. If you don't want to reveal your domain info on the forum, send me a private message, or IM me in SL.
Malachi:
I'm sorry you feel that way. However, all projects of Second Life's technological complexity have bugs, and sometimes those bugs are in software that we do not have direct control over. Our current MTA software isn't perfect, but it has worked well for us for the last 2 years up until very recently. Instead of hiding our problems, I am simply telling Grim and our user community that I am aware of the shortcomings of our current system and that a solution to this particular problem is coming. |
Malachi Petunia
Gentle Miscreant
Join date: 21 Sep 2003
Posts: 3,414
|
07-24-2004 08:22
Mark, I actually feel empathetic toward you personally. I incorrectly took out my frustrations at LL on you and for that I am sorry. I had mailed Philip three months ago (and posted extensively on these forums) with a message of "please grow or die".
My apologies, Mark. |
Grim Lupis
Dark Wolf
Join date: 11 Jul 2003
Posts: 762
|
07-24-2004 11:41
Originally posted by Mark Linden
Grim: What was the domain that you were sending mail to, and approx. when did you run this test (or exactly, if you know)? I'd like to check our logs to see what our mailserver actually did while your primary was offline; it certainly should have tried your next-in-line MX if your primary wasn't responding to SMTP at all.
I'm not sure about the date. I believe it was July 12th, but I'm not 100% certain. I do remember that I sent the original messages at approximately 5PM PST (8PM for me), and I didn't receive them until a little after midnight, my time.
I did a retest a little while ago, to see if I could reproduce this. Everything worked exactly like it was supposed to, and the email(s) came in from my backup exchanger about 15 minutes after being sent, as expected. O.o
I did the original test shortly after the backups were configured. I have 5 domains backed up through the same service, and my email-rpc domain was the last one tested, all the others worked fine. However, I suppose that if the backups are working now and weren't then, it's most likely a result of cached, stale DNS information.
_____________________
Grim
"God only made a few perfect heads, the rest of them he put hair on." -- Unknown |
Mark Linden
Funky Linden Monkey
Join date: 20 Nov 2002
Posts: 179
|
07-24-2004 14:20
Grim:
I agree; it does look like a stale DNS cache issue. Thanks for re-running your test. |
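To illustrate how a stale DNS cache could produce what Grim saw, here is a rough Python sketch of a time-limited MX cache. It assumes a sender that keeps a cached MX list until the entry expires; nothing in this thread confirms how the LL mail server actually caches lookups. The `MXCache` class, its parameters, and the example.net names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


class MXCache:
    """Cache MX lookups per domain until a fixed time-to-live expires."""

    def __init__(
        self,
        lookup: Callable[[str], List[str]],  # hypothetical resolver function
        ttl_seconds: float,
        clock: Callable[[], float],          # injected so the demo controls time
    ) -> None:
        self._lookup = lookup
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def resolve(self, domain: str) -> List[str]:
        now = self._clock()
        cached = self._entries.get(domain)
        if cached is not None and now < cached[0]:
            # Entry is still fresh: newly published MX records are not seen yet.
            return cached[1]
        hosts = self._lookup(domain)
        self._entries[domain] = (now + self._ttl, hosts)
        return hosts
```

If the domain was looked up shortly before the backup exchangers were added, a cache like this would keep returning only the primary until the entry expired, which matches a first test that skipped the backups and a later retest that used them.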