Make WordPress Core

Opened 13 years ago

Closed 13 years ago

#17998 closed defect (bug) (fixed)

dbDelta uses unrestricted "show all" which doesn't scale or work with DB sharding plugins

Reported by: brianlayman's profile brianlayman Owned by:
Milestone: 3.3 Priority: normal
Severity: major Version: 3.2
Component: Database Keywords:
Focuses: Cc:

Description

The dbdelta function does a straight "show tables" which is not scalable to large multisite installs.

The routine is built to look at and upgrade or create all of the tables in the 'global' table list (blogs, sitemeta & etc) as well as tables for the individual site being upgraded (comments, options & etc). In the current version, it asks MySQL to list all of the tables it knows and then iterates that entire list picking out just those tables listed above. On large sites, this simple "show tables" request can take 5+ minutes to complete.

This patch uses the standard MySQL show tables call with a where clause to limit the query to the just the global tables and the tables that start with the prefix for the site we are looking at.

It should be fully compatible with both standalone and MS WordPress installations.

It requires MySQL 5.0 and therefore will only work with WordPress 3.2+

I use this in the field with two independent clients each with more than 10,000 sites in their MS network. With over 90,000 tables, the one site would have taken 8 days to complete its upgrade. With this patch, it took less than 4 hours to complete the upgrade. The other network was timing out with every new user that signed up. This caused inconsistent issues ranging from tables never being created for the blogs to the default themes and settings never being applied. Sometimes, if another blog was created right away, a blog creation would succeed. I think this was when the query result was still cached by the MySQL server.

Attachments (4)

17998_upgrade_limit_show_tables.diff (695 bytes) - added by brianlayman 13 years ago.
Adds a where clause to the show all command
17998.diff (1.9 KB) - added by nacin 13 years ago.
17998.2.diff (2.2 KB) - added by nacin 13 years ago.
17998.3.diff (2.2 KB) - added by nacin 13 years ago.

Download all attachments as: .zip

Change History (29)

@brianlayman
13 years ago

Adds a where clause to the show all command

#1 @kawauso
13 years ago

  • Keywords has-patch added

#2 @uglyrobot
13 years ago

All for this. We avoid dbDelta in all plugins on Edublogs for this very reason.

#3 @scribu
13 years ago

  • Keywords reporter-feedback added

If you only ugrade the tables for a particular site, that means that you will have to run it for each site in the network. How is that faster?

#4 @brianlayman
13 years ago

Because this routine is already called for each site. Specifying a where clause that restricts the records to only those records likely to match the criteria in the if statement later in this routine, you reduce the possible result set from tens of thousands of records (or more) to perhaps a dozen or two records. That's how it is faster, going minutes to milliseconds. With this routine called thousands of times in an upgrade, the improvement is multiplied many many times.

The if statement is still required as plugins may have added additional per site tables.

Last edited 13 years ago by brianlayman (previous) (diff)

#5 @scribu
13 years ago

  • Keywords reporter-feedback removed
  • Milestone changed from Awaiting Review to 3.3

Ok, that makes sense.

Last edited 13 years ago by scribu (previous) (diff)

#6 @nacin
13 years ago

What part of this requires 5.0?

#7 @dd32
13 years ago

What part of this requires 5.0?

the WHERE (expr) for SHOW TABLES is 5+, IIRC, 4.1+ supported SHOW TABLES LIKE but not WHERE. Docs for 5.0 and 4.1 seem to agree.

#8 @inxilpro
13 years ago

We've had similar issues, but they're due to the fact that our data is sharded, so each table may be in a different database or a different server. This means that just running SHOW TABLES and then looping through them will never work. The only solution that I can think of is to loop through each table in $cqueries and check that table's existence individually. It's not as performant, but it opens a spot for plugins like HyperDB to route that query to the right database.

Thoughts?

#9 @andy
13 years ago

Why does dbDelta bother with SHOW TABLES at all? Why not just loop
over the table names and do DESCRIBE TABLE on each one?

foreach ( $cqueries as $table => $qry ) {
    $tablefields = $wpdb->get_results( "DESCRIBE $table" );
    if ( $tablefields ) {
        // Generate ALTER TABLE statements
    } else {
        // Use CREATE TABLE statement
    }
}

Related:

http://lists.automattic.com/pipermail/hyperdb/2011-October/000228.html
http://lists.automattic.com/pipermail/hyperdb/2011-October/000229.html

#10 @uglyrobot
13 years ago

  • Keywords needs-patch added; has-patch removed
  • Summary changed from New Site & Upgrade process uses unrestricted "show all" causing timeouts to dbDelta uses unrestricted "show all" which doesn't scale or work with DB sharding plugins

Fully agree, it's wasteful an doesn't scale to run a SHOW TABLES. This issue breaks dbDelta with our multidb plugin too (http://premium.wpmudev.org/project/multi-db) for the same reason. Plugins that use it trigger db errors as dbDelta tries to create an already existing table.

If it could be switched to a "DESCRIBE $table" then sharding plugins can intercept that and direct it to the proper DB.

#11 @ryan
13 years ago

A describe loop sounds much better than show tables. Patch it up.

@nacin
13 years ago

#13 @nacin
13 years ago

Took a shot at andy's approach. 17998.diff. Wholly untested. I left stuff improperly indented so as not to muddle the diff.

@nacin
13 years ago

#14 @nacin
13 years ago

Updated it so we actually use the value for the foreach of cqueries, in 17998.2.diff.

#15 @brianlayman
13 years ago

Ack! This original patch wasn't applied to 3.3 yet? It's Milestone 3.3, it really should have been included in beta 1...

I was just going to argue that this ticket shouldn't be changed as it is the most compatible with 3.2 and earlier and it's too late in the game to switch to all new code for this release.

The change in the original is required in order to update large sites. It's been tested already on several large sites I've managed (210,000 blogs on one 12,000 on another), so I know it works and is stable.

Since there's another ticket for the alternative method, I'd really like to see it applied in 3.4 and this one make it in 3.3.

#16 @ryan
13 years ago

17998.2.diff generates some warnings during scratch install, but otherwise works:

WordPress database error: [Table 'wordpress.wp_test_users' doesn't exist]
DESCRIBE wp_test_users;

@nacin
13 years ago

#17 @nacin
13 years ago

17998.3.diff does a simple suppress of potential errors from DESCRIBE.

#18 @brianlayman
13 years ago

Yeah, that looks clean. It should work well.

#19 @ryan
13 years ago

I beefed up the multisite unit tests for blog creation/deletion in [UT460] and ran them with this patch. Looks good there.

#21 @ryan
13 years ago

Scratch install looks good with 17998.3.diff

#22 @inxilpro
13 years ago

17998.3.diff looks good to me.

#23 @nacin
13 years ago

In [19041]:

Undent (by two tabs) a whole lot of code in dbDelta(). Thx. see #17998.

#24 @ocean90
13 years ago

  • Keywords needs-patch removed

#25 @ryan
13 years ago

  • Resolution set to fixed
  • Status changed from new to closed
Note: See TracTickets for help on using tickets.