mysterious load spikes crash server (68 posts)

  1. agreda
    Member
    Posted 9 years ago #

    I've searched these forums and elsewhere but am still at a loss for help identifying why our server has crashed multiple times recently. We moved to our own box in preparation for growth of our WPMU install at http://tripawds.com

    All was running well for the past couple months until we went down multiple times in the last week, after no apparent changes to the install -- no new plugins or hacks etc. We just suddenly crashed and burned a couple times.

    The folks at Server Beach will only tell us that "excessive http requests" caused the server load to spike. [sigh] Big help. Our minimal traffic and low number of users, however, simply do not justify such peaks. They blamed it on our PHP freeChat script we had running, so I disabled that and we still went down again.

    I have eliminated the chat script for now. I have reconfigured our Forums plugin to make far fewer db calls. I optimized the database via MyAdmin and will continue to do so regularly. I enabled gzip via PHP. I am considering wp-cache or super cache but don't see how that will help when the majority of our traffic is in the forums from logged in users. I continue to pull out my hair and grind my teeth.

    Below are our server and MU specs. Any help whatsoever helping us identify the cause is greatly appreciated. Thank you.

    WPMU v. 2.8.4a
    plugins

    • All in One SEO **
    • cforms *
    • NextGEN Gallery *
    • Simple:Press Forum *
    • Subscribe To Comments **
    • Text Link Ads *
    • TypePad AntiSpam ***
    • Unfiltered MU **
    • WP-SpamFree **

    *Enabled for Main Blog Only
    ** Enabled for Supporters (WPMU DEV Premium)
    *** Enabled for All

    mu-plugins

    • Admin Ads
    • Avatars
    • Comment Indexer
    • Custom Content Dashboard Widget
    • Dashboard Feeds
    • First Comment
    • Global Header
    • Invites
    • List All
    • Logout Redirect
    • Plugin Manager
    • Personal Welcome
    • Post Indexer
    • Supporters
    • Widget Recent Global Comments
    • Widget Recent Global Comments Feed
    • Widget Recent Global Posts
    • Widget Recent Global Posts Feed
    • xxx
    • xxx

    Dedicated Server Specs:
    AMD Athlon64 3500+
    2 GB RAM
    2 hard drives each - 160 GB SATA II
    OS: CentOS 5 64bit
    cPanel Unlimited Domain License
    1 IP Address
    10 Mbit Port
    2000 GB bandwidth per month
    Apache version: 2.2.11 (Unix)
    PHP version: 5.2.8
    MySQL version: 5.0.81-community
    Architecture: i686
    Operating system: Linux
    Kernel version: 2.6.18-92.1.22.el5

  2. andrea_r
    Moderator
    Posted 9 years ago #

    Could be two things, or both those things - the box, or a plugin (or combination of plugins).

    CentOS can have a memory leak. Some people will argue that cPanel will hog your memory, depending.

    The standard advice pretty much everywhere is turn off the plugins - all of 'em. Yes, even mu-plugins (well, keep the spam ones..)

    My shot in the dark guess is the NextGen gallery pulling in random pics in your forum header. It was chugging hard. I think it was doing a switch to blog 1 as well, so if you're on the main blog anyway.... yeah. that'll do it.

  3. agreda
    Member
    Posted 9 years ago #

    ... - the box ...

    As in the specs are too weak for what we want to do? If so, please provide recommendations. Thanks.

    ... I think it was doing a switch to blog 1 as well ...

    Would you mind clarifying this? The forums are only enabled for blog 1 so I'm not sure how you thought it may be "switching" or how I might address that.

    Thanks for the advice. We can't exactly disable Simple:Press because most members only join for the forums. But I guess it's time to start stripping other features.

  4. Ovidiu
    Member
    Posted 9 years ago #

    I told you this already in the premium forums: install a tool that will graphically show you what is happening on the box. THEN, after the next crash, you can easily find out what happened. Use a tool like Nagios, Munin, or Monit.

    These tools will show you traffic spikes, how many Apache threads were active at a certain time, etc., which is very useful for figuring out what went wrong.

    Right now you can't even say what the problem was except for: excessive http requests

    You need to analyse the problem on your own, if your host won't help.

    What if it's not a plugin, but misconfigured PHP? What are your Apache settings? Do you allow persistent connections? What are your MySQL settings?....

    I'd suggest setting up a monitoring tool, waiting for the next crash, then supplying us with more info.
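
    As a rough illustration of the kind of data worth capturing before the next crash, here is a minimal Python sketch that appends timestamped load averages to a file. It is not a replacement for Munin or Nagios; the file name `load.log`, the 10-second interval, and the spike threshold of 5.0 are all assumptions chosen for the example.

```python
import os
import time
from datetime import datetime


def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into (1, 5, 15 minute) floats."""
    fields = text.split()
    return tuple(float(value) for value in fields[:3])


def format_sample(when, loads, threshold):
    """Build one log line; mark it when the 1-minute load exceeds threshold."""
    flag = " SPIKE" if loads[0] > threshold else ""
    return "%s load=%.2f,%.2f,%.2f%s" % (
        when.strftime("%Y-%m-%d %H:%M:%S"), loads[0], loads[1], loads[2], flag)


def log_samples(path="load.log", samples=6, interval=10, threshold=5.0):
    """Append `samples` load readings to `path`, one every `interval` seconds.

    The path, interval, and threshold are illustrative assumptions.
    os.getloadavg() is only available on Unix-like systems.
    """
    for _ in range(samples):
        loads = os.getloadavg()  # the same three numbers uptime and top show
        with open(path, "a") as log:
            log.write(format_sample(datetime.now(), loads, threshold) + "\n")
        time.sleep(interval)
```

    Calling `log_samples(samples=360)` would record roughly an hour of readings. Lining up the lines marked `SPIKE` against the Apache and MySQL logs for the same minute is what narrows things down.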

  5. andrea_r
    Moderator
    Posted 9 years ago #

    "As in the specs are too weak for what we want to do? "

    No, actually - the specs are fine. I'm helping you narrow down the cause. CentOS sometimes has a memory leak - if that is the cause, then that's what I mean by saying "it's the box you're on".

    You have to figure out what it is exactly by monitoring things like Ovidiu suggested. Otherwise we're all just guessing.

    You could also try eAccelerator to compress the PHP.

    "The forums are only enabled for blog 1 so I'm not sure how you thought it may be "switching" or how I might address that."

    Nevermind, it was late. :D

  6. agreda
    Member
    Posted 9 years ago #

    Thanks again for all the tips. Our investigation shall continue.

  7. parkstreet
    Member
    Posted 9 years ago #

    I used CentOS and cPanel for a little over a month and decided to get rid of it. Nothing against CentOS, but rather cPanel. The issue I found was the logs that were created every time someone did anything on the server. You might want to check there to see if you are generating excessive logs (access and error). But first you should check your error logs to see if this is the case. I wish that I could remember the error message that was generated so that you can compare.

  8. andrea_r
    Moderator
    Posted 9 years ago #

    We've been babysitting a VPS with CentOS & cPanel and are finally moving things today. It goes down on a regular basis. The only thing we can figure is spam (possibly SpamAssassin) and/or spam comment attempts.

    It's really quite maddening.

  9. parkstreet
    Member
    Posted 9 years ago #

    Yeah. That is why I go Ubuntu and ISPConfig3 all the way.

  10. andrea_r
    Moderator
    Posted 9 years ago #

    I'm kinda rockin' Debian Lenny now. You learn these things as you go. :)

  11. agreda
    Member
    Posted 9 years ago #

    More good info, thanks. I just got root access to our WHM account and can now answer the previous questions (below).

    I've had Munin installed, and while it does show lots of pretty graphs, it doesn't seem to get very detailed regarding specific queries, etc. :-\

    Thank you all once again for the assistance.

    Apache Global Config Options
    SSLCipherSuite: default
    TraceEnable: On
    ServerSignature: Off (PCI Recommended)
    ServerTokens: Full
    FileETag: All
    Directory '/' Options:
    X ExecCGI (Selected)
    X FollowSymLinks (Selected)
    X Includes (Selected)
    X IncludesNOEXEC (Selected)
    X Indexes (Selected)
    O MultiViews (Not Selected)
    X SymLinksIfOwnerMatch (Selected)

    PHP and SuExec Configuration
    Default PHP Version (.php files) | 5
    PHP 5 Handler | dso
    PHP 4 Handler | none
    Apache suEXEC | on

    PHP Configuration (Basic Mode)
    upload_max_filesize | 32M
    include_path | .:/usr/lib/php:/usr/local/lib/php
    file_uploads | On
    asp_tags | Off
    memory_limit | 500M
    register_globals | Off
    max_execution_time | 30000
    max_input_time | 6000
    enable_dl | Off
    safe_mode | Off
    session.save_path | /tmp

    what are your mysql settings?....
    MySQL mysql.allow_persistent | Off
    MySQL mysql.connect_timeout | 60
    MySQL mysql.default_host | (empty)
    MySQL mysql.default_password | (empty)
    MySQL mysql.default_port | (empty)
    MySQL mysql.default_socket | (empty)
    MySQL mysql.default_user | (empty)
    MySQL mysql.max_links | -1
    MySQL mysql.max_persistent | -1
    MySQL mysql.trace_mode | Off

  12. agreda
    Member
    Posted 9 years ago #

    UPDATE: Munin doesn't seem to do anything for me other than graph when spikes occur (among many other events). It doesn't seem to show any specific cause for the spikes. It does, however, indicate some seriously slow SQL queries during the spikes, and the timing of the last couple of spikes correlates with times when I was experiencing trouble in the WPMU admin -- widgets not saving, tags not displaying when editing a post, slow load of media insert, etc.

    QUESTION: Could a fresh install/upgrade of wpmu help? Yes, I am running 2.8.4 but I'm wondering if an upload of all new core files might replace something that may have been corrupted.

    I ask because I see a number of GET requests for various core files from ../images rather than their proper location (presumably ../wp-admin/images/) in the Apache Server Status section at the bottom of this load spike alert.

    Admittedly, I am grasping at straws here. All thoughts are greatly appreciated. Many thanks for the continued support.

  13. SteveAtty
    Member
    Posted 9 years ago #

    I'd be tempted to try with a completely clean 2.8.4a install with no plugins just in case.

  14. andrea_r
    Moderator
    Posted 9 years ago #

    "Could a fresh install/upgrade of wpmu help? "

    It very well could, especially if you remove all the files. Any stray ones from previous versions that are no longer in use could cause some unexpected behaviour.

  15. kingkong954
    Member
    Posted 9 years ago #

    What would you keep if trying to 'fresh install' over an existing install?

    I presume you'd leave a lot in the wp-content folders (blogs.dir, themes, etc.), and then perhaps the wp-config? .htaccess? Or would you re-run the setup with the same DB information?

  16. andrea_r
    Moderator
    Posted 9 years ago #

    "What would you keep if trying to 'fresh install' over an existing install?"

    You'd keep wp-content, wp-config.php & .htaccess. Blow away everything else.

    And TAKE A BACKUP. :D

    (actually, rather than blow things away, I make new folders on the server and move things around.)

  17. anointed
    Member
    Posted 9 years ago #

    I'm also using CentOS on my servers. In the past I was using cPanel and noticed the same issues: way too many log entries, and it did not seem to be clearing out the old ones properly.

    I also noticed that my cPanel was being attacked pretty much 24/7 by the bots. In checking some of the 'underground' forums it seems that there are many people always working on new exploits for cPanel to gain access. In my case, changing from cPanel to H-Sphere made a huge difference. YMMV.

  18. andrea_r
    Moderator
    Posted 9 years ago #

    Ooooo, good digging!

  19. agreda
    Member
    Posted 9 years ago #

    Update: My server manager did a complete rebuild of Apache ... "eliminating a bunch of modules that are not needed in order to give you a slimmer, more efficient binary." I also replaced wpmu 2.8.4a with all new core files. I've done my best to rule out any plugin conflicts. All was good, for about a couple days. Then we spiked again.

    Here's the interesting thing. The spikes only appear to happen when I am logged into the admin. Timing of this recent spike coincided with when I was logged in to the WPMU admin and noticed a comment reply taking forever to save. Or so I thought; the save icon just kept spinning, so I jumped to the post and my comment reply was actually there.

    Anyway, if there are any ideas as to why being logged into the wpmu admin might cause the server to bog down, I am all ears. Thanks!

    For anyone willing to review it, and able to decipher it, here is the server load alert from our last spike.

  20. agreda
    Member
    Posted 9 years ago #

    Sorry for the bump, but this is just seriously perplexing and I have nowhere else to turn.

    I opened our root WHM account in one window and navigated our site and forums in another, watching the Load Averages hover around 0.32 - 0.76 ... Simply by visiting the dashboard and clicking on Comments, I watched the Load Averages climb to about 7.6. Upon deleting a comment it jumped to over 20 before I closed the window, when Averages fell back to normal.

    If there are any experts out there looking for a gig to help troubleshoot and resolve, please feel free to contact me.

  21. SteveAtty
    Member
    Posted 9 years ago #

    I assume you've optimised your database tables?

  22. andrea_r
    Moderator
    Posted 9 years ago #

    And please check the database for spam clogging things up. If you've got 10,000 spam comments in there, that certainly won't help.

  23. agreda
    Member
    Posted 9 years ago #

    Well, I have optimized the db tables.

    And...
    delete from wp_1_comments where comment_approved="spam"
    resulted in
    Deleted rows: 32 (Query took 0.0267 sec)
    So we weren't clogged up with spam comments.

    But I do have hundreds of users (without blogs) who we could clean out of the db from long before our wpmu migration. Any help writing a query to do so would be much appreciated, considering there is only one indicator we have of those users who we are sure we want to delete.

    How can I delete from wp_users AND wp_1_sfmembers only those identified by lastvisit='0000-00-00 00:00:00' within wp_1_sfmembers? And in doing so, would I need to drop any related rows from wp_usermeta?

    Admittedly, this is over my head, and a shot in the dark. Many thanks once again for all the help.
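
    One possible shape for the cleanup asked about above, as a hedged sketch rather than a tested procedure. It assumes `wp_1_sfmembers` stores the WordPress user ID in a column named `user_id` (an assumption; check the actual table), and it uses Python's built-in `sqlite3` with throwaway demo tables only so the logic can be run safely; the three DELETE statements are plain SQL and should carry over to MySQL. Rows in `wp_usermeta` are keyed by `user_id`, so they are removed as well; otherwise they would be left orphaned.

```python
import sqlite3

NEVER_VISITED = "0000-00-00 00:00:00"


def delete_never_visited(conn):
    """Delete users whose forum lastvisit is the zero date; return their IDs."""
    cur = conn.cursor()
    # Assumption: wp_1_sfmembers.user_id matches wp_users.ID.
    cur.execute(
        "SELECT user_id FROM wp_1_sfmembers WHERE lastvisit = ?",
        (NEVER_VISITED,),
    )
    ids = [row[0] for row in cur.fetchall()]
    for user_id in ids:
        cur.execute("DELETE FROM wp_usermeta WHERE user_id = ?", (user_id,))
        cur.execute("DELETE FROM wp_users WHERE ID = ?", (user_id,))
        cur.execute("DELETE FROM wp_1_sfmembers WHERE user_id = ?", (user_id,))
    conn.commit()
    return ids


def demo_database():
    """Build a tiny in-memory stand-in for the three tables (demo data only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE wp_users (ID INTEGER PRIMARY KEY, user_login TEXT);
        CREATE TABLE wp_usermeta (umeta_id INTEGER PRIMARY KEY,
                                  user_id INTEGER, meta_key TEXT);
        CREATE TABLE wp_1_sfmembers (user_id INTEGER, lastvisit TEXT);
        INSERT INTO wp_users VALUES (1, 'active'), (2, 'stale');
        INSERT INTO wp_usermeta VALUES (1, 1, 'nickname'), (2, 2, 'nickname');
        INSERT INTO wp_1_sfmembers VALUES (1, '2009-09-01 10:00:00'),
                                          (2, '0000-00-00 00:00:00');
    """)
    return conn
```

    Running the SELECT on its own first shows exactly which accounts would go. Take a full database backup before running any of the DELETEs on the live install, and note that this sketch does not check whether a user owns a blog.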

  24. SteveAtty
    Member
    Posted 9 years ago #

    Look at Power Tools, it does a lot of cleaning for you

  25. agreda
    Member
    Posted 9 years ago #

    Will do, thanks. But here's our latest interesting discovery ... the spikes clearly seem related to my account when logged in from my Mac, on our MotoSat internet connection. When logged in as a different admin from a PC on our Verizon connection, Load Averages hovered around .6± - 1.2± ... when performing the same actions from my machine/account, I watched the load climb to 40+ before closing the window, when it would fall back to normal.

    At least I know what we're dealing with now. As long as I stay out of the admin, we're OK! ;-) Any ideas about what might cause such specific account or IP related spikes are welcome.
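
    To check whether one client address really accounts for the requests during a spike, the Apache access log can be tallied per IP and per minute. The sketch below assumes the log uses Apache's common or combined log format; the location of the log file depends on the server setup and is not shown here.

```python
import re
from collections import Counter

# Matches the start of a common/combined log line:
# client IP, two ignored fields, then the [timestamp].
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\]')


def requests_per_minute(lines):
    """Count requests per (client IP, minute) from access log lines."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        ip, timestamp = match.group(1), match.group(2)
        minute = timestamp[:17]  # e.g. "10/Oct/2009:13:55", seconds dropped
        counts[(ip, minute)] += 1
    return counts
```

    Passing an open log file to `requests_per_minute(...)` and calling `.most_common(10)` on the result lists the busiest IP and minute pairs, which can then be compared with the times the load climbed.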

    Thanks again for all the help!

  26. SteveAtty
    Member
    Posted 9 years ago #

    Does it do it with your Mac on any other connection? Does it do it with different browsers on the Mac?

    Motosat is satellite isn't it?

    Silly question but what is the keep alive set to on Apache?

  27. andrea_r
    Moderator
    Posted 9 years ago #

    ahhhh.. yeah satellite has a latency on it that can get bad sometimes. Weird, but not surprised, as satellite can do some pretty freaky things, and glad you at least narrowed it down.

    We narrowed down our issue too. Somewhere the system was reporting the tmp dir as over 80% when it wasn't. Once this was fixed, the spikes went away.

  28. agreda
    Member
    Posted 9 years ago #

    I cannot replicate the issue connecting via our Verizon MiFi account. Yet it occurs in any browser via satellite. I was thinking it had something to do with the latency, but can't imagine why it would just start happening recently.

    Next, I need to see if issue persists from another machine (PC). Also researching Apache settings now. Thanks again for your thoughts!

  29. andrea_r
    Moderator
    Posted 9 years ago #

    You want me to test it from here? I'm on satellite too. That might help narrow down *something*. Your provider for instance.

    just hit up my contact form.

  30. agreda
    Member
    Posted 9 years ago #

    KeepAlives are on, with a timeout of 5 seconds

    @andrea: Thanks! I'll send login credentials for an admin account after we do some more testing on our end. I'd hate to have you test it without me logged into the root and ready to kill any process spinning out of control to bring down the box.
