Opened 4 years ago
Last modified 3 years ago
#50789 new defect (bug)
Improve WPDB logic around information_schema
Reported by: | andy | Owned by: | |
---|---|---|---|
Milestone: | Awaiting Review | Priority: | normal |
Severity: | normal | Version: | 5.5 |
Component: | Database | Keywords: | |
Focuses: | multisite | Cc: |
Description
WooCommerce uses a query on information_schema.table_constraints
to determine whether it needs to add a foreign key constraint to one of its tables.
SELECT COUNT(*) AS fk_count FROM information_schema.TABLE_CONSTRAINTS WHERE CONSTRAINT_SCHEMA = '' AND CONSTRAINT_NAME = 'fk_wp_12345_wc_download_log_permission_id' AND CONSTRAINT_TYPE = 'FOREIGN KEY' AND TABLE_NAME = 'wp_12345_wc_download_log'
$wpdb->get_table_from_query
returns information_schema
which is not a table, it is a database containing the table TABLE_CONSTRAINTS
. So it seems there is room to improve this method to extract the table name in cases where it is preceded by a database name. However, this is not the goal of this ticket. In the specific case of information_schema
tables, we're more interested in the table referenced in the WHERE
clause.
My use case involves a wpdb
drop-in, WordPress.com's hyperdb, which was the original source of this method when it was added in [30345]. We use hyperdb to map queries to database servers using table names as map keys. Given a table like wp_12345_posts
we connect to the right database.
The database information_schema
exists in all database servers. When WooCommerce queries the table table_constraints
it's looking for information about the table wp_12345_wc_download_log
. To route the query to the appropriate database server, we are interested in this table name.
I would propose adding this before the first preg_match
in get_table_from_query
:
// SELECT FROM information_schema.* WHERE TABLE_NAME = 'wp_12345_foo' if ( preg_match('/^\s*' . 'SELECT.*?\s+FROM\s+`?information_schema`?\.' . '.*\s+TABLE_NAME\s*=\s*["\']([\w-]+)["\']/is', $q, $maybe) ) return $maybe[1];
This returns wp_12345_wc_download_log
which allows us to route the query to the correct database server.
I am able to patch WordPress.com's drop-in to check this pattern before calling the parent method in core's wpdb
. So there is no need to rush on our behalf.
Does anyone know of a use case that relies on the existing implementation?
It might be argued that a caller of this function would expect the return to be TABLE_CONSTRAINTS
in this case. However, I was unable to find any tickets requesting a fix for the current behavior. I believe the best fix would be to return the table name from the WHERE
clause.
Attachments (1)
Change History (6)
#3
in reply to:
↑ 1
@
4 years ago
Replying to johnjamesjacoby:
Interesting. (Also, hey @andy!)
Hey, @johnjamesjacoby! :D
It's not common, but querying the
information_schema
database is a valid use ofWPDB::query()
. I expect forget_table_from_query()
to return the first table being queried, not theTABLE_NAME
- it seems too clever.
The original and primary use for get_table_from_query
in hyperdb was to allow hyperdb to route the query to the right database server. Years later, @pento borrowed this code for core in [30345] for a different purpose: to enable get_table_charset()
to write a SHOW COLUMNS
statement ultimately supporting strip_invalid_text_from_query()
. This does not fit with the proposal that I made, i.e. to extract from the TABLE_NAME
value in the WHERE
clause.
In agreement with your assessment, I now think that hyperdb should use a separate or hybrid solution, taking back responsibility for extracting the name that it needs for query routing.
Still, there is a valid issue in the original description: the bug where get_table_from_query
returns the database name information_schema
instead of the table name TABLE_CONSTRAINTS. This should be fixed. In the process, it can be considered whether to return the extracted database name somewhere (e.g.
information_schema`).
I probably wouldn't query
information_schema
directly in this situation for this reason. WooCommerce should be storing a version as an option for each of its custom tables, and the values of those options should dictate what SQL is generated. (This is also the approach I've taken in the BerlinDB project.)
We agree that information_schema
queries should be avoided. I can't speak for the WooCommerce logic of using both a stored version number and a schema check. But WooCommerce got a pull request to replace the information_schema
query with a SHOW CREATE TABLE
: https://github.com/Automattic/woocommerce/pull/84
#4
follow-up:
↓ 5
@
3 years ago
@andy I'm working on an issue in a site that uses LudicrousDB (fork of HyperDB) that ran into a similar issue when using the plugin multisite-clone-duplicator which also seems to query the information schema. The query looks like this:
<?php $wpdb->prepare('SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = \'%s\' AND TABLE_NAME LIKE \'%s\'', $schema, $from_site_prefix_like . '%');
The REGEX in get_table_from_query indeed collects the INFORMATION_SCHEMA.TABLE_NAME as demonstrated here: https://www.phpliveregex.com/p/AIE
Curious if a possible solution here might be to simply add a filter to the REGEX patterns (allowing more specific regex patterns to be injected) so that these queries could be handled in situations like these (edge cases).
#5
in reply to:
↑ 4
@
3 years ago
Replying to rfair404:
Curious if a possible solution here might be to simply add a filter to the REGEX patterns (allowing more specific regex patterns to be injected) so that these queries could be handled in situations like these (edge cases).
I like this idea very much. Although, I would prefer a simpler and more powerful implementation more like the pre_option
pattern.
Interesting. (Also, hey @andy!)
It's not common, but querying the
information_schema
database is a valid use ofWPDB::query()
. I expect forget_table_from_query()
to return the first table being queried, not theTABLE_NAME
- it seems too clever.The
get_table_from_query()
method documentation says:...emphasis first. 😕
I probably wouldn't query
information_schema
directly in this situation for this reason. WooCommerce should be storing a version as an option for each of its custom tables, and the values of those options should dictate what SQL is generated. (This is also the approach I've taken in the BerlinDB project.)If table versions are unknown, I would reverse engineer them by calling:
...and comparing those results to known database table schema changes.
I do think if this idea made its way into
get_table_from_query()
that probably not very many people would notice. But I still don't think it's a good idea. That said, I'm attaching a patch imminently so that others can test it out and draw their own conclusions. ❤️