Here's what just happened, and it happens fairly often.
The server one of my clients is on (Rhea) had a load spike up to 200 at 1:50pm. It came down quickly, but then went up to 222 at 2:25, came down again, then went back up to 325 at 2:50 and kept climbing. It reached 510 at 3:10, and HTTP crashed on the server. I sent in a support request at 1:50, hoping someone would see it in time to respond and prevent a crash. That didn't happen; an hour and twenty minutes later, at the time of the crash, the ticket still hadn't been assigned to anyone. So there was a window of an hour and twenty minutes between the initial spike and the crash.
Here's the question. Shouldn't there be some sort of load monitoring on your servers to alert techs to this sort of thing? In the past, the techs have said "thanks for letting us know" when I report these load spikes, but why should I have to report them at all? I have a script on my server that checks the load periodically and sends me an email when it goes well above the normal range. If the load stays up, or goes down but goes up again, I submit a support ticket. Why isn't something like this in place on all the servers to alert the techs? That would prevent some of these server crashes from happening.
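
To make the idea concrete, here is a minimal sketch of that kind of check in Python. It is illustrative only and not the actual script mentioned above: the threshold value, the email addresses, and the function names (should_alert, build_alert, check_once) are all assumptions, and it relies on os.getloadavg(), which is only available on Unix-like systems.

```python
import os
import socket
from email.message import EmailMessage

# Assumed value for "well above the normal range"; tune it per server.
THRESHOLD = 20.0


def should_alert(load1, threshold=THRESHOLD):
    """Return True when the 1-minute load average is above the threshold."""
    return load1 > threshold


def build_alert(host, load1, threshold=THRESHOLD, to_addr="admin@example.com"):
    """Build the alert email. Both addresses are placeholders."""
    msg = EmailMessage()
    msg["Subject"] = f"Load alert on {host}: {load1:.2f}"
    msg["From"] = f"loadcheck@{host}"
    msg["To"] = to_addr
    msg.set_content(
        f"1-minute load average is {load1:.2f} (threshold {threshold:.2f})."
    )
    return msg


def check_once(send, threshold=THRESHOLD):
    """Read the current load and hand an alert to send() if it is too high.

    send is any callable that accepts an EmailMessage, so the delivery
    method (local mail server, ticket system, pager) stays swappable.
    """
    load1 = os.getloadavg()[0]  # first value is the 1-minute average
    if should_alert(load1, threshold):
        send(build_alert(socket.gethostname(), load1, threshold))
        return True
    return False
```

Run from cron every few minutes, check_once could be given something like smtplib.SMTP("localhost").send_message as its send argument, assuming a mail server is listening locally. A version aimed at techs rather than one customer would presumably open a ticket or page someone instead of sending email.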