Make WordPress Core

Opened 9 years ago

Last modified 5 years ago

#34525 new enhancement

Use wp_parse_id_list() in the WP_Query class

Reported by: birgire's profile birgire Owned by:
Milestone: Priority: normal
Severity: normal Version: 4.3.1
Component: Query Keywords: has-patch
Focuses: Cc:

Description

We have this handy core function to clean up an array, comma- or space separated list of IDs:

function wp_parse_id_list( $list ) {
    if ( !is_array($list) )
        $list = preg_split('/[\s,]+/', $list);
 
    return array_unique(array_map('absint', $list));
}

Why not use it where we can?

Examples:

Instead of the following lines, in the WP_Query class:

	$q['category__in'] = array_map( 'absint', array_unique( (array) $q['category__in'] ) );
	
	$q['category__not_in'] = array_map( 'absint', array_unique( (array) $q['category__not_in'] ) );
	
	$q['category__and'] = array_map( 'absint', array_unique( (array) $q['category__and'] ) );

	$q['tag__in'] = array_map('absint', array_unique( (array) $q['tag__in'] ) );

	$q['tag__not_in'] = array_map('absint', array_unique( (array) $q['tag__not_in'] ) );

	$q['tag__and'] = array_map('absint', array_unique( (array) $q['tag__and'] ) );


we could simplify it with:

	$q['category__in'] = wp_parse_id_list( $q['category__in'] );

	$q['category__not_in'] = wp_parse_id_list( $q['category__not_in'] );

	$q['category__and'] = wp_parse_id_list( $q['category__and'] );

	$q['tag__in'] = wp_parse_id_list( $q['tag__in'] );

	$q['tag__not_in'] = wp_parse_id_list( $q['tag__not_in'] );

	$q['tag__and'] = wp_parse_id_list( $q['tag__and'] );

Similarly for:

	$author__not_in = implode( ',', array_map( 'absint', array_unique( (array) $q['author__not_in'] ) ) );

	$author__in = implode( ',', array_map( 'absint', array_unique( (array) $q['author__in'] ) ) );


we could use:

	$author__not_in = implode( ',', wp_parse_id_list( $q['author__not_in'] ) );

	$author__in = implode( ',', wp_parse_id_list( $q['author__in'] ) );

I will check it further and soon add the relevant patches.

Attachments (1)

34525.diff (2.7 KB) - added by birgire 9 years ago.

Download all attachments as: .zip

Change History (7)

#1 @DrewAPicture
9 years ago

  • Keywords close added

Testing strictly against an array of ids and benchmarking the average of 5,000 iterations, wp_parse_id_list() is just slightly slower:

$ids = range( 1000, 2000 );
array_map( 'absint', array_unique( $ids )

0.0015718542098999
0.0015127508163452
0.0015818600177765
$ids = range( 1000, 2000 );
wp_parse_id_list( $ids );

0.0016850507736206
0.0016270678043365
0.0016728698253632

In addition to the minor performance hit, the main problem I see with this is that we'd effectively be adding support for comma-separated ID lists for all of these arguments, and I'm not sure that's a rabbit hole we want to go down. Even if we made the comma-separated ID list bit in wp_parse_id_list() optional, it's still slower.

Side note: It's actually a little puzzling that the current code, which runs array_unique() on every array_map() iteration is actually faster than running the array_map() and then array_unique() on the result as in wp_parse_id_list(). PHP, you so silly.

Benchmarked with:

<?php
$ids = range( 1000, 2000 );

$start      = microtime( true );
$iterations = 5000;

for ( $i = 0; $i < $iterations; $i++ ) {
        $list = array_map( 'absint', array_unique( $ids ) );
//      $list = wp_parse_id_list( $ids );
}

$end = microtime( true ) - $start;
print( $average = $end / $iterations );
Last edited 9 years ago by DrewAPicture (previous) (diff)

#2 @dd32
9 years ago

Although micro optimizations can add up, given these are not called repetitively on each pageload I think we can deal with the function call overhead for more standardised code.

If we want to retain the no-comma-seperator behaviour, we simply need to pass the array in, as in: wp_parse_id_list( (array) $q['category__in'] );

#3 @birgire
9 years ago

Thanks @DrewAPicture for these informative tests.

Yes it's a constant battle between maintainable vs optimization ;-)

It's funny to see the array_map and array_unique swap difference ;-)

Are you maybe suggesting to swap it within the wp_parse_id_list() function?

It's probably too much to allow comma separated values as well (even though it would add more consistancy), so we should add the (array) casting part, as suggested by @dd32

ps: here's a little detour regarding microoptimizations

There are some interesting replacements out there for:

  1. is_array() with (array) $var === $var, but the former is more readable. Here's an in-depth discussion by ircmaxell: http://stackoverflow.com/a/3471356/2078474.
  1. array_unique() that seems to be slower than some suggested semi-replacements out there, where some preserve the key order and other don't.


Here are some of these suggested semi-replacements for part 2)

<?php

//----------------------------------
// array_unique() semi-replacements
//----------------------------------

// @link http://php.net/manual/en/function.array-unique.php#98453
function wp_array_unique1( $arr = array() )
{
    $tmp = array(); 
    foreach( $arr as $key => $val ) {    
        $tmp[$val] = true; 
    } 
    return array_keys( $tmp ); 
}	 

// @link http://www.puremango.co.uk/2010/06/fast-php-array_unique-for-removing-duplicates/#comment-202704
function wp_array_unique2( $arr = array() )
{
	$tempArray = array();
	foreach($arr as $key => $val ) {
		if( isset( $tempArray[$val] ) ) continue;
		$tempArray[$val] = $key;
	}
	return array_flip($tempArray);
}

// @link http://stackoverflow.com/questions/8321620/array-unique-vs-array-flip
function wp_array_unique3( $arr = array() )
{
	// Code documented at: http://www.puremango.co.uk/?p=1039
	return array_flip(array_flip(array_reverse($arr,true)));
}

// @link http://stackoverflow.com/questions/8321620/array-unique-vs-array-flip
function wp_array_unique4( $arr = array() )
{
	return array_keys( array_flip( $arr ) );
}

// @link http://stackoverflow.com/questions/8321620/array-unique-vs-array-flip
function wp_array_unique5( $arr = array() )
{
	return array_flip( array_flip( $arr ) );
}

//----------
// Input
//----------
$input = array( 
    0 => 1,
    1 => 2,
    2 => 2,
    3 => 0,
    4 => -1,
    5 => '1',
    'a' => 'c',
    'b' => 'a',
    'c' => 'c'
);

//-------------------
// Test comparisions
//-------------------
echo 'Input:';
print_r( $input );
echo 'array_unique: ';
print_r( array_unique( $input ) );
echo 'Replacement #1: ';
print_r( wp_array_unique1( $input ) );
echo 'Replacement #2: ';
print_r( wp_array_unique2( $input ) );
echo 'Replacement #3: ';
print_r( wp_array_unique3( $input ) );
echo 'Replacement #4: ';
print_r( wp_array_unique4( $input ) );
echo 'Replacement #5: ';
print_r( wp_array_unique5( $input ) );

I added this test here: https://3v4l.org/AaU1X

The output is:

Input:Array
(
    [0] => 1
    [1] => 2
    [2] => 2
    [3] => 0
    [4] => -1
    [5] => 1
    [a] => c
    [b] => a
    [c] => c
)
array_unique: Array
(
    [0] => 1
    [1] => 2
    [3] => 0
    [4] => -1
    [a] => c
    [b] => a
)
Replacement #1: Array
(
    [0] => 1
    [1] => 2
    [2] => 0
    [3] => -1
    [4] => c
    [5] => a
)
Replacement #2: Array
(
    [0] => 1
    [1] => 2
    [3] => 0
    [4] => -1
    [a] => c
    [b] => a
)
Replacement #3: Array
(
    [a] => c
    [b] => a
    [0] => 1
    [4] => -1
    [3] => 0
    [1] => 2
)
Replacement #4: Array
(
    [0] => 1
    [1] => 2
    [2] => 0
    [3] => -1
    [4] => c
    [5] => a
)
Replacement #5: Array
(
    [5] => 1
    [2] => 2
    [3] => 0
    [4] => -1
    [c] => c
    [b] => a
)	

Here's the input extended further, to better understand what happens for non-string/integer input values: https://3v4l.org/ZeLCf

Let's play with the test by @DrewAPicture:

$ids = range( 1000, 2000 );

$start      = microtime( true );
$iterations = 5000;

for ( $i = 0; $i < $iterations; $i++ ) 
{
		
	// test #0
	$list = array_map( 'absint', array_unique( $ids ) );

	// test #1
	// $list = array_map( 'absint', wp_array_unique1( $ids ) );

	// test #2
	// $list = array_map( 'absint', wp_array_unique2( $ids ) );

	// test #3
	// $list = array_map( 'absint', wp_array_unique3( $ids ) );
	
	// test #4
	// $list = array_map( 'absint', wp_array_unique4( $ids ) );
	
	// test #5
	// $list = array_map( 'absint', wp_array_unique5( $ids ) );
	
}

$end = microtime( true ) - $start;
print( $average = $end / $iterations );

The results on my server are:

test #0
	0.0094432024002075
	0.0093391174316406
	0.0093821507930756
		
test #1
	0.0069236095905304
	0.0069987013816833
	0.0068500112056732

test #2
	0.0071335470199585
	0.0070521537780762
	0.0069442203998566

test #3
	0.0066374749660492
	0.0065019884109497
	0.0065506939888
	
test #4
	0.0062625207901001
	0.0062665887832642
	0.0063331448078156
	
test #5
	0.0063409204006195
	0.0063279666423798
	0.0063581778049469

So array_unique() is indeed the slowest one here.

Version 0, edited 9 years ago by birgire (next)

#4 @DrewAPicture
9 years ago

  • Keywords close removed

@birgire Adding an actual patch would be a good next step for continued dev feedback.

#5 @johnbillion
9 years ago

  • Keywords needs-patch added

@birgire
9 years ago

#6 @birgire
9 years ago

  • Keywords has-patch added; needs-patch removed
Note: See TracTickets for help on using tickets.