Make WordPress Core

Opened 13 years ago

Closed 10 years ago

Last modified 10 years ago

#17689 closed defect (bug) (fixed)

Terms should not be sanitized inside term_exists()

Reported by: blepoxp's profile blepoxp Owned by: wonderboymusic's profile wonderboymusic
Milestone: 4.0 Priority: high
Severity: normal Version: 3.2
Component: Taxonomy Keywords: has-patch 4.0-early
Focuses: Cc:

Description

When adding a term to a post, the title of the term is sent through term_exists(). If term_exists finds and returns the ID of an existing term for the passed taxonomy, that ID is added to the post object. If no term is found, it returns false and a new term is created for that taxonomy with the same title that was passed to term_exists().

The problem is that term_exists() uses sanitize_title($term) on line 1457 of wp-includes/taxonomy.php while wp_insert_term uses stripslashes($name) on line 1985 of the same file.

This doesn't cause a problem in many circumstances, but if the term title happens to be something like $$$, that means it will always be added correctly in wp_insert_term() but never found as existing in term_exists(). The result is that every time you add $$$$ to another post it gets added as a new term with a unique slug so that you have several terms with the title $$$$ for the same taxonomy but different IDs.

The attached patch corrects that by passing the term title through stripslashes in term_exists() rather than through sanitize_title().

I haven't found any undesired side effects in testing.

Attachments (14)

17689.diff (388 bytes) - added by blepoxp 13 years ago.
replace sanitize_title with stripslashes
17689-2.diff (374 bytes) - added by blepoxp 13 years ago.
Didn't realize that it was actually sent through stripslashes() 2 lines up. Updated patch just removes sanitize_title()
non-alnum-terms.diff (1.2 KB) - added by wonderboymusic 12 years ago.
17689.2.diff (2.0 KB) - added by wonderboymusic 12 years ago.
ascii-results.txt (3.2 KB) - added by wonderboymusic 12 years ago.
17689-unit-tests-term.diff (798 bytes) - added by moraleida.me 11 years ago.
Unit tests first sketch
17689.3.diff (2.4 KB) - added by wonderboymusic 11 years ago.
17689.4.diff (4.4 KB) - added by wonderboymusic 11 years ago.
17689.5.diff (4.4 KB) - added by wonderboymusic 11 years ago.
17689-noformatting.diff (1.5 KB) - added by wonderboymusic 11 years ago.
17689.get-term-by-name.diff (746 bytes) - added by SergeyBiryukov 11 years ago.
17689.6.diff (2.1 KB) - added by wonderboymusic 11 years ago.
17689.7.diff (3.1 KB) - added by aaroncampbell 10 years ago.
17689.8.diff (3.9 KB) - added by wonderboymusic 10 years ago.

Download all attachments as: .zip

Change History (60)

@blepoxp
13 years ago

replace sanitize_title with stripslashes

#1 @blepoxp
13 years ago

Steps to reproduce if you can't make sense of the book i wrote above:

  1. Create a new post with the tag $$
  2. Create another post with the same tag: $$
  3. Create a third post with the same tag: $$
  4. Check the tag list to find three tags with the title $$ but with separate IDs.

@blepoxp
13 years ago

Didn't realize that it was actually sent through stripslashes() 2 lines up. Updated patch just removes sanitize_title()

#2 @wonderboymusic
12 years ago

This is totally a bug - you can add "$$$" in each and every taxonomy as many times as you want - but I have fixed it! Patch added.

#3 @nacin
12 years ago

  • Keywords needs-unit-tests added

#4 @scribu
12 years ago

  • Cc scribu added

#5 @webord
12 years ago

  • Cc webord.net@… added

#6 @phill_brown
12 years ago

Another scenario this affects:

  1. Create a tag called 'A' - the slug should default to 'a'
  2. Add/edit a post.
  3. In the tags box add 'A' to the post, and also 'A-'
  4. Update the post.

'A' will assign as expected but 'A-' will be discarded instead of added as a new tag.

santitize_title() will strip out the hyphen in 'A-' and match the slug against 'A'. The same behaviour would apply to any use case where a new tag name matched the slug of an existing tag.

#8 @judgej
12 years ago

This caused us a headache trying to trace the cause too.

We are wanting to use grades of music albums as a taxonomy in a shop. The list includes "M", "M+" and "M-" in the grades. Even though we can create all three taxonomies through the admin pages, we cannot assign either "M+" or "M-" to a post. Doing so will link the post to just "M" instead.

#9 @scribu
12 years ago

  • Milestone changed from Awaiting Review to Future Release

#10 @wonderboymusic
12 years ago

  • Keywords needs-refresh added
  • Milestone changed from Future Release to 3.6

My patch still applies cleanly against the original scenario and should be applied for terms that have all non-alpha chars

http://core.trac.wordpress.org/ticket/17689#comment:6 still needs to be addressed.

#11 @wonderboymusic
12 years ago

  • Keywords needs-refresh removed

I attacked sanitize_title_with_dashes() here as well #17648

I refreshed my patch for this ticket. With it, you can no longer add $$$ an endless number of times, and M- and M+ become m-minus and m-plus in slug form. This way be controversial, but my thinking (and RegEx logic) are this:

  • If you enter {Not - or +}{+ or -}{space}, this is an intentional use of "plus" or "minus" e.g. A+ Effort
  • If your term name ENDS in {Not - or +}{+ or -}, same goes = e.g. A+

These satisfy comment:6 and all Unit Tests pass. If we go with this patch, #22023 is no longer blocked.

#12 @SergeyBiryukov
12 years ago

Seems like 17689.2.diff would fix #16567 as well.

#13 @wonderboymusic
12 years ago

I just attached a file that shows the results of some tests I was running.

$slugs = array();

for ( $i = 33; $i < 256; $i++ ) {
	$char = trim( mb_convert_encoding('&#' . intval($i) . ';', 'UTF-8', 'HTML-ENTITIES' ) );

	echo $char, PHP_EOL;
	echo sanitize_title_with_dashes( 'A' . $char ), PHP_EOL;
	$ids = wp_insert_term( 'A' . $char, 'post_tag' );

	$term = get_term( $ids['term_id'], 'post_tag' );

	if ( in_array( $term->slug, $slugs ) )
		exit( 'NOOOOOOOO!' );
	else
		$slugs[] = $term->slug;

	echo $term->slug, PHP_EOL, PHP_EOL;
}

exit();

Prints out the following:

  • The char
  • The result of passing 'A' . $char to sanitize_title_with_dashes()
  • The resulting slug

'A' . $char produces many dupes. All term slugs are unique.

#14 @DrewAPicture
11 years ago

17689.2.diff works as expected for me with @wonderboymusic's tests code. Now I have a whole lot of "awesome" tags in my test build.

#15 @moraleida.me
11 years ago

  • Cc moraleida.me added

@moraleida.me
11 years ago

Unit tests first sketch

#16 @moraleida.me
11 years ago

attachment:17689-unit-tests-term.diff fails under the current codebase, succeeds with attachment:17689.2.diff applied. Needs a more thorough thought on how to create the different strings to cover all cases.

#17 @nacin
11 years ago

  • Keywords 3.7-early added
  • Milestone changed from 3.6 to Future Release
  • Priority changed from normal to high

Would very much like to finally fix this in 3.7, as this is a blocker for potential future taxonomy improvements.

#18 @wonderboymusic
11 years ago

  • Milestone changed from Future Release to 3.7

these are all marked 3.7-early

#19 @wonderboymusic
11 years ago

#24841 was marked as a duplicate.

#20 @buffler
11 years ago

  • Cc jeremy.buller@… added

#21 @wonderboymusic
11 years ago

17689.5.diff​ may seem ridiculous, but it totally fixes 2 issues:

  1. Fixes the problem where currently you can add $$$ repeatedly
  1. Fixes the problem where items with + and - get added repeatedly

*added repeatedly* means many terms with the same name and different slugs, due to sanitize_title_with_dashes().

wp_humanize_plus_minus() is the utility function I added that calls itself recursively to convert +/- to plus/minus in specifically-matched cases.

#22 @nacin
11 years ago

17689.get-term-by-name.diff is incredibly clever, and based on various IRC conversations, it seems like it'll do the trick. We need stronger unit tests here. We also decided waiting until 3.8 would be the prudent thing to do here.

#23 @SergeyBiryukov
11 years ago

  • Milestone changed from 3.7 to 3.8

#24 @goto10
11 years ago

  • Cc dromsey@… added

#25 @wonderboymusic
11 years ago

17689.6.diff has unit tests and sets the trimmed name to a temp variable - there is another unit test that makes sure the name isn't trimmed.

I think this is ready to go in, but wanted to let it simmer in the wild for a bit.

#26 @wonderboymusic
11 years ago

This has been sitting for 7 days - anyone want to chime in?

#27 @wonderboymusic
11 years ago

  • Keywords 3.9-early added; 3.7-early removed
  • Milestone changed from 3.8 to Future Release

This one's really close, but I think multiple terms with the same name is still problematic for the get_term_by( 'name', ... ) call in there

#28 @SergeyBiryukov
11 years ago

#26371 was marked as a duplicate.

This ticket was mentioned in IRC in #wordpress-dev by AaronCampbell. View the logs.


11 years ago

#30 @nacin
11 years ago

  • Milestone changed from Future Release to 3.9

#31 @SergeyBiryukov
11 years ago

#20133 was marked as a duplicate.

#32 @wonderboymusic
11 years ago

  • Keywords 4.0-early added; 3.9-early removed
  • Milestone changed from 3.9 to Future Release

Needs more unit tests - this should just be rolled in with our other Taxonomy plans.

This ticket was mentioned in IRC in #wordpress-dev by nacin. View the logs.


10 years ago

#34 @wonderboymusic
10 years ago

#20850 was marked as a duplicate.

#35 @aaroncampbell
10 years ago

  • Keywords needs-unit-tests removed

I ran against this when doing a large import for a client. I came to the same solution as Sergey, using get_term_by( 'name', '...' ) for my work around.

17689.7.diff is an update of .6 (still using get_term_by( 'name', '...' )), that applies cleanly and has more robust unit tests. I changed the unit tests to use $ instead of + because I think the idea of wp_humanize_plus_minus() is good but it's an enhancement that belongs in it's own ticket (and I want this test to still test our standard special character handling even after the plus/minus changes). I also added tests for terms that have all special characters, and am testing that the returned errors contain the expected duplicate term ID.

#36 @SergeyBiryukov
10 years ago

  • Milestone changed from Future Release to 4.0

This ticket was mentioned in IRC in #wordpress-dev by AaronCampbell. View the logs.


10 years ago

This ticket was mentioned in IRC in #wordpress-dev by helen. View the logs.


10 years ago

This ticket was mentioned in IRC in #wordpress-dev by helen. View the logs.


10 years ago

#40 @wonderboymusic
10 years ago

17689.8.diff refreshes so that it applies against trunk and adds more unit tests. About to rip the band-aid off

#41 @wonderboymusic
10 years ago

  • Owner set to wonderboymusic
  • Resolution set to fixed
  • Status changed from new to closed

In 28733:

In wp_insert_term(), when no slug is provided, check for an existing term by name. If it exists, use that slug instead of calling sanitize_title( $name ).

Prevents creating an endless number of terms like A+ or $$$$ in any given taxonomy.

Props wonderboymusic, SergeyBiryukov, aaroncampbell.
Fixes #17689.

This ticket was mentioned in IRC in #wordpress-dev by helen. View the logs.


10 years ago

This ticket was mentioned in IRC in #wordpress-dev by simonwheatley. View the logs.


10 years ago

This ticket was mentioned in IRC in #wordpress-dev by simonwheatley. View the logs.


10 years ago

This ticket was mentioned in IRC in #wordpress-dev by helen. View the logs.


10 years ago

#46 @boonebgorges
10 years ago

In 31792:

In wp_insert_term(), allow a term with an existing name if a unique $slug has been provided.

wp_insert_term() protects against the creation of terms with duplicate names
at the same level of a taxonomy hierarchy. However, it's historically been
possible to override this protection by explicitly providing a value of $slug
that is unique at the hierarchy tier. This ability was broken in [31734], and
the current changeset restores the original behavior.

A number of unit tests are added and refactored in support of these changes.

See #17689 for discussion of a fix that was superceded by [31734]. This commit
retains the fix for the underlying bug described in that ticket.

See #31328.

Note: See TracTickets for help on using tickets.