Make WordPress Core

Opened 3 years ago

Last modified 7 months ago

#54042 new enhancement

Extending wpdb::prepare() to support IN() operator

Reported by: craigfrancis's profile craigfrancis Owned by:
Milestone: Future Release Priority: normal
Severity: normal Version:
Component: Database Keywords: has-patch dev-feedback needs-testing early has-unit-tests changes-requested
Focuses: Cc:

Description (last modified by craigfrancis)

wpdb::prepare() helps avoid SQL Injection vulnerabilities, by escaping most variables correctly.

WP 6.1 added support for Identifiers (table/field names) with %i, in #52506.

But it's also fairly common to make a mistake to include values with the IN() operator, for example:

<?php
$where = 'WHERE id IN (' . implode( ',', $ids ) . ')'; // INSECURE?

Developers need to be sure $ids has come from a trusted source, or use something like wp_parse_id_list() or array_map('intval', $ids).


Maybe support could be added with:

<?php
$wpdb->prepare('WHERE id IN (%...d)', $ids);

Where %...d or %...s would safely (and easily) include a comma separated array of integers or strings - taking the idea of using '...' for variadics in PHP.

https://wiki.php.net/rfc/variadics
https://www.php.net/manual/en/functions.arguments.php#functions.variable-arg-list
https://dev.mysql.com/doc/refman/8.0/en/comparison-operators.html#operator_in

Change History (48)

This ticket was mentioned in PR #1668 on WordPress/wordpress-develop by craigfrancis.


3 years ago
#1

  • Keywords has-patch added

This is an *experimental* patch so that $wpdb->prepare() can support table/field names, and an array of values for the IN() operator.

Initial performance tests indicate it might be slightly faster for existing queries (as it replaces 2 regular expressions), and while the new %...d like syntax should be safer, it is slower than doing the implode/intval manually.

Trac ticket: https://core.trac.wordpress.org/ticket/54042

#2 @ocean90
3 years ago

Hello @craigfrancis, welcome to WordPress Trac!

For escaping table names we already have a ticket at #52506. Would be good to keep the discussion in one place for this. The ticket also notes the potential conflicts with future printf formats.

For the IN operator the coding standards recommend to use a combination of implode() and array_fill(), see these examples.

#3 @craigfrancis
3 years ago

Thank you @ocean90, I completely missed the ticket regarding table names.

In regards to the IN operator, thanks again, I hadn't realised the Coding-Standards suggested using:

$where = $wpdb->prepare(
  sprintf(
    "post_type IN (%s)",
    implode( ',', array_fill( 0, count($post_types), '%s' ) )
  ),
  $post_types
);

But I still think this would be easier/safer:

$where = $wpdb->prepare( 'post_type IN (%...s)', $post_types );

It also means the $query argument to $wpdb->prepare() could use the literal-string type that's now available in Psalm 4.8.0 and PHPStan 0.12.97 (and will hopefully be added to PHP in the future), doing this allows us to avoid unsafe-variable concatenation and escaping mistakes.

#4 @craigfrancis
3 years ago

I've updated my patch so that it passes the existing unit tests.

Some basic checks suggest it might run a bit faster than the original - a 10k loop, on a query with 1 argument went from ~0.029 to ~0.023, and 3 arguments went from ~0.045s to ~0.041s (I think that's due to removing a RegEx).

I am still concerned that "%s / %5s" would quote the first string but not the second, but doing so does preserve backwards compatibility - "frequently used in the middle of longer strings, or as table name placeholders".

[Removed, stats out of date]


3 years ago
#5

This ticket was mentioned in Slack in #core by craigfrancis. View the logs.


2 years ago

#7 @craigfrancis
2 years ago

I used %i for "Identifier"; and because printf() in C uses %d and %i for Integers, whereas PHP only uses %d (so PHP is unlikely to use %i for anything else).

I'd like this to be included in 5.9, as it shouldn't cause any backwards compatibility issues; and I want to focus on the 'WHERE id = %5s' problem in 6.0 (that will require more discussion and testing).

This ticket was mentioned in Slack in #core by sergey. View the logs.


2 years ago

This ticket was mentioned in Slack in #core by craigfrancis. View the logs.


2 years ago

#10 @audrasjb
2 years ago

  • Keywords dev-feedback needs-testing added
  • Milestone changed from Awaiting Review to 5.9
  • Version trunk deleted

Moving for 5.9 consideration

This ticket was mentioned in Slack in #core by audrasjb. View the logs.


2 years ago

#12 @audrasjb
2 years ago

  • Keywords early added
  • Milestone changed from 5.9 to Future Release

As per today's bugscrub, this ticket needs to be addressed early in the release cycle. Moving to Future release with early keyword.

#13 @johnbillion
2 years ago

Thanks for working on this @craigfrancis.

I think this functionality needs to be introduced via a new wpdb method, otherwise there is no means for a plugin to safely determine whether it's able to use the new formats while retaining support for older versions of WordPress. It's not safe to check the WordPress version number because a custom content/db.php driver could be in use which doesn't support the new formats and would return invalid SQL. If a new method was introduced instead then the plugin could safely check for the existence of the method and use the new formats, and fall back to using prepare() without them if it doesn't exist.

#14 @craigfrancis
2 years ago

Thanks @johnbillion; that’s a good point, and I’m happy to look at alternatives (as an aside, there are a couple of things I’d like to do with wpdb, like looking at "%s / %5s" quoting the first string but not the second, and being able to use the literal-string type to prevent Injection vulnerabilities, but that’s going a bit off topic, so maybe a chat on Slack/email?).

As to this proposal, I would like to keep all escaping functionality in one method, so we keep using one nice and simple way to handle all values (rather than having 2-3 different methods everyone needs to know about, even in ~10 years time)… that said, you are right, a plug-in does need to know if it can use these placeholders, so maybe a version number (what does EZSQL_VERSION do? or maybe a simpler $wbdp->version = 2?), or maybe a method that returns an array of capabilities or details about wpdb?

#15 @craigfrancis
2 years ago

I went with $wbdp->version = 2, so plugins can easily "determine whether it's able to use the new formats", as you're right, it's possible a custom content/db.php could provide their own prepare() method.

This ticket was mentioned in PR #2191 on WordPress/wordpress-develop by craigfrancis.


2 years ago
#16

  • Keywords has-unit-tests added

Update $wpdb->prepare() to support table/field names, and an array of values for the IN() operator.

Trac ticket: https://core.trac.wordpress.org/ticket/54042

craigfrancis commented on PR #1668:


2 years ago
#17

Replaced by Pull Request #2191.

I deleted my repo after I made a mess with the master/trunk change... and I've also changed the patch to use preg_split() rather than using the slightly slower preg_match_all() with PREG_OFFSET_CAPTURE.

This ticket was mentioned in Slack in #core by craigfrancis. View the logs.


2 years ago

#19 @craigfrancis
2 years ago

  • Milestone changed from Future Release to 6.0

This ticket was mentioned in Slack in #core by craigfrancis. View the logs.


2 years ago

This ticket was mentioned in Slack in #core by craigfrancis. View the logs.


2 years ago

This ticket was mentioned in Slack in #core by chaion07. View the logs.


2 years ago

#23 @chaion07
2 years ago

Thanks @craigfrancis for reporting this. We reviewed this ticket during a recent bug-scrub session. Here's the highlight from the discussion:

  1. We think that we're getting a bit close to beta to be making changes to $wpdb
  2. There's been a bit of activity in GitHub and the team appreciate @craigfrancis efforts as we're getting some good reviews from Committers.
  3. @peterwilsoncc has volunteered to reach out to people via Slack aiming for this to be done by this Friday.

Props to @peterwilsoncc & @markparnell

Cheers!

#24 @craigfrancis
2 years ago

Yep, the focus has been on #52506, and while I’d like both of these features to be landed at the same time (they work well together), it can wait if there isn’t enough time.

This ticket was mentioned in Slack in #core by chaion07. View the logs.


2 years ago

#26 @peterwilsoncc
2 years ago

  • Milestone changed from 6.0 to Future Release

@craigfrancis I'll move this off the milestone to keep focus on the identifiers ticket.

#27 @apokalyptik
2 years ago

Over at A8C this has come up a number of times in the past month or so for both my team and others...

It would be invaluable to have this functionality in core because tricks such as vsprintf argnum hacks used to get around code sniffers, etc, undermine the integrity and security that those checks are meant to enforce.

I wanted to implement this as a flag: %,d, %,f, and %,s specifically.

So I started looking into how to do this safely. It turns out that this is very complicated and difficult for a number of reasons. I'll list a couple below...

  1. prepare does not actually parse like vsprintf does, but uses a regex which is, i think, imperfect (aren't they all?)
  2. for legacy reasons argnum's aren't escaped (which is why they're used as hacks for IN() statements)
  3. to avoid (possibly infinite) recursion with wpbd::prepare() you want to expand the specifiers and the args but
  4. this changes the order, and number, of the arguments and thus what the argument numbers need to be to stay correct.

The other thing is that it needed to be as fully backwards compatible with prepare() (or at least punt and return to prepare on some sort of error) as possible. This meant that I needed to (and did) run my solutions through the entire wpdb unit testing suite (and extend it to include implosion specifiers.) All tests pass, excepting assertions around doing_it_wrong where the tests don't expect my new doing_it_wrong's in another function. To replicate this you have to edit wpdb::prepare() and comment out the checks which prevent this scanning from happening unnecessarily.

So I ended up having to write a parser to accomplish the task. Which I did. This was mostly very easy except in the argnum case (which is where all of the complication came in.) For this case what I ended up doing is de-referencing the arguments and having them "point" to the newly de-referenced args element. Originally I was just going to change them inline after de-referencing them from, say, %1$s to %s but because the placeholders for those come after the positional arguments that was a no-go because it severely mis-ordered the query. Also, as mentioned earlier, they are not escaped. I would have just interpolated the parameters properly inline if it weren't for that.. Also I wanted prepare to continue handling the actual escaping of the data... So append and re-point is what I went with.

The code is more complex than I would have liked but because of the need to parse argnums and reorder parameters correctly it is what it is...

My code can be found here: https://github.com/apokalyptik/wordpress-develop/compare/trunk...wpdb-implode-specifier?expand=1

#28 follow-up: @craigfrancis
2 years ago

Hi @apokalyptik, I'm not at my computer, but have you checked the patch I've provided (test/download, note the last link includes support for IN()), it uses a slightly different syntax (%...d vs %,d) in a similar way to how variadics works in PHP, but I think it covers all the points you raised, and works with table/field name (identifiers) patch that's had a few reviews already... if I've missed anything, please let me know.

#29 in reply to: ↑ 28 @apokalyptik
2 years ago

Replying to craigfrancis:

Hi @apokalyptik, I'm not at my computer, but have you checked the patch I've provided (test/download, note the last link includes support for IN()), it uses a slightly different syntax (%...d vs %,d) in a similar way to how variadics works in PHP, but I think it covers all the points you raised, and works with table/field name (identifiers) patch that's had a few reviews already... if I've missed anything, please let me know.

Hi! I have to admit. I did all this before even thinking about looking to see if there was already work being done on this. So *after* doing what I did I went and found this ticket, and so decided to post my experience and code. I never considered that it might be taken as a critique on existing work, so please please don't take it that way. I have not reviewed the code... anywhere :D (except my own, obviously)

#30 @apokalyptik
2 years ago

@craigfrancis

I've given the code a little review and a little thought. tested a few things, and it looks pretty good! A lot of streamlining was possible when moving it out of a proof like I had it and into the prepare function itself.

I'm not sure I, personally, like the variadic-alike syntax here. I feel something more terse with a single not-currently-used character is more appropriate with its peers.

I think if you use a single array for the args instead of arg_variadics and arg_identifiers and whatnot and make the value richer and use $arg_id as the key you can take out all of the multiple array searching you're doing at the end of prepare, making the whole thing lighter.

I would also encourage expanding your logic up top with the passed as array checks. There's no need for it to be so terse, and using multiple lines and statements and checks will make it more understandable at a glance.

I especially like your use of PREG_SPLIT_DELIM_CAPTURE, and how you used it to make parsing the specifiers simpler. That was very nice. that substr -1 for the specifier was so good.

Overall very good quality code here, though. Nice work.

#31 follow-up: @craigfrancis
2 years ago

Thank you @apokalyptik, I really appreciate your time and thoughts that have gone into this.

When you say "a single not-currently-used character", can you expand on that? The only reason I went with ... is because it matches variadics in PHP, and because I think it would help someone seeing it for the first time guess what it's doing (i.e. it's an ellipsis, which has meaning in the English language already). And while I appreciate a comma (a single character) is similar to $ for parameter number, and . for precision, if printf was to change in the future (I know, highly unlikely), I suspect it would be more likely to use a comma (to follow their single character pattern), and that would create a compatibility issue.

The $arg_identifiers are separate so I can easily escape them differently, and to trigger the "cannot be used for both String and Identifier escaping" error. As in, if someone tried to use the same value in both a string/number and identifier context (odd, but it might happen), because they cannot be escaped in the same way, I need to reject that, and ask them to provide the value twice, e.g.

<?php
$wpdb->prepare( 'WHERE %i LIKE "%1$s" LIMIT 1', [ 'aaa', false ] ); // Sorry, no can do.
$wpdb->prepare( 'WHERE %i LIKE %s LIMIT 1', [ 'aaa', 'aaa' ] );

Also, some of the choices were to keep the performance/memory impact really low (it's generally about the same, and in some cases can be even faster due to less RegEx work).

If you want to propose any changes, I've got a branch on GitHub, and happy to discuss here and off-list, but keep in mind the diff from ticket-52506 (to add support for identifiers) is quite small (unfortunately GitHub compare seems to be broken, A vs B).

#32 in reply to: ↑ 31 ; follow-up: @apokalyptik
2 years ago

Replying to craigfrancis:

When you say "a single not-currently-used character", can you expand on that?

Sure. The vsprintf format string format goes %[argnum$][flags][width][.precision]specifier. within this format each active piece of the format is exactly one character which is not reused between the hardcoded parts %, %, . and also the non-variable parts (that comprise flags, and specifier) with the only exception being % which the first % is escaping.

This is why I felt that %,d fit better than %...d. ... is both more than one character for an operation and also . is already used as the format signifier that the current version specification has entered defining precision

The $arg_identifiers are separate so I can easily escape them differently, and to trigger the "cannot be used for both String and Identifier escaping" error. As in, if someone tried to use the same value in both a string/number and identifier context (odd, but it might happen), because they cannot be escaped in the same way, I need to reject that, and ask them to provide the value twice, e.g.

Right. I'm suggesting that a single array in the format of something like $index[ $arg_id ] = array( $boolVariadic, $boolIdentifier[, ...] ); would let you pull the data without doing an array value search with in_array (multiple times in some cases.) Just storing the metadata in a better indexed format like this removes the value searching.

Last edited 2 years ago by apokalyptik (previous) (diff)

#33 in reply to: ↑ 32 @craigfrancis
2 years ago

Replying to apokalyptik:

This is why I felt that %,d fit better than %...d.

Some of this is going to be about taste and familiarity... but if I was to try and move towards data, the bit I worry about is future compatibility. Even though printf is very unlikely to change now, if it was to be expanded, I would expect printf to use another single character (where I worry that a comma would be quite likely); in contrast, and as you've noted, it's already using . for precision, so I think ... is less likely to be used.

I'm suggesting that a single array in the format of something like $index[ $arg_id ] = array( $boolVariadic, $boolIdentifier[, ...] );

I've had a quick attempt at this, updating the structure of $args, but it seems to be turning into a bigger change than I was hoping. Maybe I've missed something obvious, but considering 6 people have already reviewed PR 2192, all the unit tests pass, the performance tests are doing very well, and I'm hoping to get it merged soon (the branch for 6.1 was last week), maybe we can try this different array structure later, as a separate task?

This ticket was mentioned in Slack in #core by craigfrancis. View the logs.


22 months ago

#35 @jrf
21 months ago

For what it's worth: I like and support using the ... syntax as it feels intuitive to use with its likeness to the spread operator.

Only question I'd have (though I haven't looked at the patch yet) is how this would interact with other potential modifiers. Like using %04d to get 0 padded numbers in the replacement values.

Note: if that would not be supported, that wouldn't be a blocker IMO, as in that case, people who really need that can still fall back to the "old" recommended code pattern for creating the replacement values.

I look forward to seeing this feature land!

#36 @sc0ttkclark
19 months ago

I submitted this ticket that proposed an alternative solution for the IN/NOT IN syntax, I believe the suggestion for the spread %...s operator placeholder is even better.

I'm going to close https://core.trac.wordpress.org/ticket/56764#ticket as duplicate

#37 @sc0ttkclark
19 months ago

#56764 was marked as a duplicate.

#38 @craigfrancis
18 months ago

  • Description modified (diff)
  • Summary changed from Extending wpdb::prepare() to support table/field names, and IN() operator to Extending wpdb::prepare() to support IN() operator

#39 @craigfrancis
9 months ago

I think I need some direction/help with this one.

My basic patch is fairly simple, but it has problems with argnums, e.g.

<?php
$wpdb->prepare(
    'WHERE id IN (%...d) AND (name = "%3$s" OR name = "%2$s")',
    [
      [1, 2, 3],
      'A',
      'B'
    ] );

// WHERE id IN (1,2,3) AND (name = "3" OR name = "2")

Because %...d can hold many values, the later placeholders need their argnums updated (insert my usual complaints about these placeholders being un-quoted); and it doesn't support modifiers (ref @jrf, "using %04d to get 0 padded numbers in the replacement values"), e.g. %...04d.

I started putting together a second patch, it supports modifiers, and re-indexing argnums, but it's a bit more complex, and has a performance impact (admittedly in the range of fractions of a millisecond, but it can be 20% slower, e.g. 0.00248 ms to 0.00296 ms).

I've got a testing page, where tests 009 and later show the new functionality (patches can be found at the bottom of the page).

Am I being too concerned? is there a simpler way of doing this?

(while I dislike complexity, I'd rather have it here, where it can be well tested, instead of the current approach of developers writing complex code themselves, and we hope they never make a mistake).

This ticket was mentioned in Slack in #core by craigfrancis. View the logs.


9 months ago

#41 @SergeyBiryukov
9 months ago

  • Milestone changed from Future Release to 6.4

Thanks for the ticket! Found it while looking into #56091. Moving to 6.4 to get more eyes on this.

This ticket was mentioned in Slack in #core by hellofromtonya. View the logs.


8 months ago

This ticket was mentioned in PR #4990 on WordPress/wordpress-develop by craigfrancis.


8 months ago
#43

Update $wpdb->prepare() to support an array of values for the IN() operator.

Incomplete, trying to support argnums (e.g. %...2$d) and formatting.

craigfrancis commented on PR #4990:


8 months ago
#44

Thanks @mukeshpanchal27, those are good points, and I'll hopefully have some time this weekend to do those and look at some of the broken tests.

I'm just wondering if this approach is getting a bit, ugly?

Trying to take the provided format string as $query, to then do a fair amount of parsing/editing, while making a new array of escaped values, just to get it ready for vsprintf() which will then need to re-parse.

Looking at the php src for php_formatted_print(), I'm wondering if we could do this ourselves? I know we won't get C performance, but we are already doing a lot of string processing, especially with those RegEx's, and we don't need to worry about the C oddities.

I don't know, I really don't want to re-write it all, I try to avoid that as much as possible, but I just worry that this patch is making it even more complicated (it might be fine though, if it needs to be complicated, I'd rather it be here, rather than the complexity being everywhere else in Core, the plugins, etc).

This ticket was mentioned in Slack in #core by oglekler. View the logs.


7 months ago

#46 @costdev
7 months ago

  • Keywords changes-requested added

This ticket was discussed in the recent bug scrub. As this ticket is early, and we're now out of the early period for the 6.4 milestone, this ticket should be moved to 6.5 as soon as the milestone becomes available in Trac.

I've also left a review requesting some changes to PR 4990. Adding changes-requested.

#47 @craigfrancis
7 months ago

  • Milestone changed from 6.4 to Future Release

Thanks @costdev, I've changed it to Future Release (as I'm not happy with the current approach), I've merged with trunk, and thanks for the suggestions (have been added).

#48 @costdev
7 months ago

Thanks @craigfrancis!

Note: See TracTickets for help on using tickets.