Maintenance and management of thousands of schemas/tables (7 posts)

  1. darshankarandikar
    Member
    Posted 13 years ago #

    Hi,

    I am trying to understand the "maintenance" and "manageability" aspects of the wordpress mu database architecture. If I have thousands of database tables/schemas across one or more "databases" to support thousands of users (as done in wordpress.com), and adding more tables/schemas is the only way to scale this solution further, what will be the impact of this architecture on the "maintenance" and "manageability" of the database system? For example:

    1) Assume you have 5000 blogs, and thus 5000 x 9 tables in one database. If a data definition change (e.g. a column name change in one table) is to be rolled out using an automated script, and the change fails on the 1000th table (out of 5000), will the changes already made to the first 999 tables be rolled back? Isn't this a considerably complex and risky "transaction" to manage (it could make my blogging platform unstable if not managed meticulously), given the huge volume of tables?

    2) How does wordpress.com manage this issue?

    3) What is the team size that wordpress.com employs to manage such a huge number of tables and databases? What are the challenges they face?

    4) What is the total number of databases, as well as the size and capacity per database, that wordpress.com uses to support such a huge number of tables?

    5) Once the number of tables in one database grows beyond what the database can handle, the only option is to add another physical database and manage the two databases with middleware like Hyperion. This has a cost implication in terms of database licenses (if not using an open source database). It also increases the number of "breakpoints" in the system.

    6) To implement the desired "multitenancy", is maintaining a "user id" in relevant tables a better solution instead of "one table set per user"? "normalizing database model with tenant id in tables (share everything) instead of one schema/table set per tenant" is the approach that is suggested for most high usage SaaS platforms where "multitenancy" is required. Any specific reason to choose "one table set per tenant/user" approach in wordpress mu over "single schema/table set for all tenants with tenant id in tables (logical partitioning)" approach?

    I would also like to know your experience with maintainability and manageability QoS requirements when it comes to "multi table" database architecture that wordpress mu implements for multitenancy?

    Please help me understand the pros and cons of this database architecture.

    Thanks for your time!

  2. andrea_r
    Moderator
    Posted 13 years ago #

    #1 You try not to make changes like that. Don't mess with the standard table & field names; you'll screw up all kinds of things, 5,000 blogs or not.

    #2 They use this: http://wordpress.org/extend/plugins/hyperdb/

    #3 Automattic has 40+ employees and hundreds of servers. barry is the person to read: http://barry.wordpress.com/

    #4 dunno, but it seems to be a non-issue. In other corporate non-WordPress settings, we've seen single databases with millions of records.

    #5 Databases can grow pretty large. The database itself can handle whatever. The biggest limitation you'll hit is the 32,000 folder limit in Linux. The plugin I linked to above and this one: http://wordpress.org/extend/plugins/shardb/ both handle multiple databases. They do not fill up one and then use the next one; they spread the new blogs out across the databases. You're using a MySQL database here.

    #6 I wouldn't do either. The code & plugins rely on how things are set up now. Study the db.php file.
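
    The idea of spreading new blogs across several databases, rather than filling one and moving to the next, can be sketched roughly as below. This is not how hyperdb or shardb are actually implemented; the database names, the hashing scheme, and both helper functions are assumptions made up for illustration. Only the `wp_345_options` style of table name comes from this thread.

```python
import hashlib

# Hypothetical database names; a real install would configure its own.
DATABASES = ["blogs_db_0", "blogs_db_1", "blogs_db_2", "blogs_db_3"]


def database_for_blog(blog_id, databases=DATABASES):
    """Map a blog id to one database by hashing it (illustrative scheme only)."""
    digest = hashlib.md5(str(blog_id).encode("ascii")).hexdigest()
    return databases[int(digest, 16) % len(databases)]


def blog_table(blog_id, name, prefix="wp_"):
    """Build a per-blog table name such as wp_345_options."""
    return f"{prefix}{int(blog_id)}_{name}"


# Count how many of 5,000 blogs land in each database.
counts = {name: 0 for name in DATABASES}
for blog_id in range(1, 5001):
    counts[database_for_blog(blog_id)] += 1

print(counts)
print(database_for_blog(345), blog_table(345, "options"))
```

    Because the mapping depends only on the blog id, any web server can work out where a blog lives without a lookup table. The trade-off is that changing the number of databases moves most blogs, so a real setup would likely need a more careful scheme.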

  3. darshankarandikar
    Member
    Posted 13 years ago #

    Thanks andrea_r for the response. Here are a few followup queries:

    #1 The query was more from the point of view of maintainability of the 'general database architecture' (multi table sets) followed for mu. i.e. I am not saying I need to change mu data definitions. But if I need to build something like mu on my own with thousands of users and say around 50 to 55 tables per user, and at a certain point I need to change a data definition in one or more tables (say to add a feature), then will mu-like database architecture increase the complexity and risk of such a change as compared to "single table set for all users" model, where the data definition change needs to be done only in one table. What is your opinion on this? Since wordpress.com uses mu database architecture, how does wordpress.com team manage such a change when there's a need to do a data definition change, let's say, to add a new feature to wordpress.com?

    #4 What is the benefit of using the "multi table set" model vs. "single table set and millions of records"? I think "performance" may be better in the "multi table set" model due to less resource-intensive database queries. What do you think?

    #6 ok. So what I understand is that the "multi table set" model is used in mu because, with this model it was easy to extend the basic wordpress database architecture for multiple user blogs without the code/database changes required to introduce userid in relevant queries and tables. There was no specific reason other than this (such as performance or scalability). Is my understanding correct? (I may be absolutely wrong here). I am just trying to understand the reasoning behind employing this 'multi table set' model as compared to 'single table set for all tenants' model.

    Appreciate your help.

  4. darshankarandikar
    Member
    Posted 13 years ago #

    Adding to #6: If developing a multitenant application from scratch with thousands of tenants expected to use it, which of the following is a better database model considering all quality of service requirements such as performance, scalability, maintainability and manageability:
    Option 1: One schema/table set per tenant/user (as done in wordpress.com with the wordpress mu platform), i.e. multiple schemas/table sets for multiple tenants/users.
    OR
    Option 2: Single schema/table set for ALL tenants ("shared everything") with tenant id (userid) as part of the relevant tables to logically partition tenant data.
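
    To make the two options concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The table names (`posts_<id>`, `posts_shared`) and helper functions are hypothetical and are not taken from WordPress. The sketch only shows how a query and a schema change differ between the two models; it says nothing about performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
TENANTS = [1, 2, 3]

# Option 1: one table set per tenant; the tenant id lives in the table name.
for tenant_id in TENANTS:
    conn.execute(
        f"CREATE TABLE posts_{tenant_id} (id INTEGER PRIMARY KEY, title TEXT)"
    )
    conn.execute(
        f"INSERT INTO posts_{tenant_id} (title) VALUES (?)",
        (f"hello from tenant {tenant_id}",),
    )

# Option 2: one shared table; the tenant id is a column on every row.
conn.execute(
    "CREATE TABLE posts_shared ("
    "id INTEGER PRIMARY KEY, tenant_id INTEGER NOT NULL, title TEXT)"
)
conn.execute("CREATE INDEX idx_posts_shared_tenant ON posts_shared (tenant_id)")
for tenant_id in TENANTS:
    conn.execute(
        "INSERT INTO posts_shared (tenant_id, title) VALUES (?, ?)",
        (tenant_id, f"hello from tenant {tenant_id}"),
    )


def titles_option1(tenant_id):
    # Table names cannot be bound as parameters, so the id is cast to int first.
    query = f"SELECT title FROM posts_{int(tenant_id)} ORDER BY id"
    return [row[0] for row in conn.execute(query)]


def titles_option2(tenant_id):
    # Every query must remember the tenant filter, or it leaks other tenants' rows.
    query = "SELECT title FROM posts_shared WHERE tenant_id = ? ORDER BY id"
    return [row[0] for row in conn.execute(query, (tenant_id,))]


def tables_to_alter(option):
    """Count the post tables a schema change would have to touch."""
    names = [
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    ]
    if option == 1:
        return sum(1 for n in names if n.startswith("posts_") and n != "posts_shared")
    return sum(1 for n in names if n == "posts_shared")


print(titles_option1(2), titles_option2(2))
print(tables_to_alter(1), tables_to_alter(2))
```

    With Option 1 a schema change touches one table per tenant, while with Option 2 it touches a single table but every query has to carry the tenant filter.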

  5. andrea_r
    Moderator
    Posted 13 years ago #

    Since wordpress.com uses mu database architecture, how does wordpress.com team manage such a change when there's a need to do a data definition change, let's say, to add a new feature to wordpress.com?

    They very rarely - like never - change the name of an existing field or table. The addition of a new table across wordpress.com has to be fought for incredibly hard.

    And since they are large, they are able to roll out these additions to a section at a time.

    Adding a new feature does not always mean you have to rename a table or a field. A lot of things are stored in each blog's options table - wp_345_options for example.
    Some new features could exist in a global table, or they may not have associated tables at all.

    #4 - six of one, half a dozen of the other. ;) What you gain in one area, you lose somewhere else. It works out even in the end.

    #6 Exactly. To work smoothly with WordPress changes, it had to be based on the existing single WP db structure.

  6. rbaccaro
    Member
    Posted 13 years ago #

    Hi,

    #6 is still not clear to me. I would like to understand why MU creates 9 tables for each new user instead of using the main tables with only a flag, for example. I am setting up an install with 40,000 users on WP 3.0 and trying to find the way to go.

    Regards,

    Ray

  7. andrea_r
    Moderator
    Posted 13 years ago #

    It creates 9 new tables for each *blog*, not each user. If 10,000 users don't have a blog, just a login, those tables do not get created.

    You will have 10K more fields in your user meta tables though.
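
    As a rough illustration of the point that tables scale with blogs rather than users, the sketch below counts per-blog tables. The nine table suffixes are an assumption based on a WordPress 3.0-era install and should be checked against your own database. The helper names are made up.

```python
# Assumed per-blog table suffixes (WordPress 3.0 era); verify against your install.
PER_BLOG_TABLES = [
    "posts", "postmeta", "comments", "commentmeta", "links",
    "options", "terms", "term_taxonomy", "term_relationships",
]


def tables_for_blog(blog_id, prefix="wp_"):
    """List the per-blog table names for one blog, e.g. wp_345_options."""
    return [f"{prefix}{int(blog_id)}_{name}" for name in PER_BLOG_TABLES]


def per_blog_table_count(blog_count):
    """Total per-blog tables; users who only have a login add none."""
    return blog_count * len(PER_BLOG_TABLES)


print(tables_for_blog(345))
print(per_blog_table_count(5000))  # 45000
```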

About this Topic

  • Started 13 years ago by darshankarandikar
  • Latest reply from andrea_r