#17689 closed defect (bug) (fixed)
Terms should not be sanitized inside term_exists()
Reported by: | blepoxp | Owned by: | wonderboymusic |
---|---|---|---|
Milestone: | 4.0 | Priority: | high |
Severity: | normal | Version: | 3.2 |
Component: | Taxonomy | Keywords: | has-patch 4.0-early |
Focuses: | Cc: |
Description
When adding a term to a post, the title of the term is sent through term_exists(). If term_exists finds and returns the ID of an existing term for the passed taxonomy, that ID is added to the post object. If no term is found, it returns false and a new term is created for that taxonomy with the same title that was passed to term_exists().
The problem is that term_exists() uses sanitize_title($term) on line 1457 of wp-includes/taxonomy.php while wp_insert_term uses stripslashes($name) on line 1985 of the same file.
This doesn't cause a problem in many circumstances, but if the term title happens to be something like $$$, that means it will always be added correctly in wp_insert_term() but never found as existing in term_exists(). The result is that every time you add $$$$ to another post it gets added as a new term with a unique slug so that you have several terms with the title $$$$ for the same taxonomy but different IDs.
The attached patch corrects that by passing the term title through stripslashes in term_exists() rather than through sanitize_title().
I haven't found any undesired side effects in testing.
Attachments (14)
Change History (60)
#1
@
13 years ago
Steps to reproduce if you can't make sense of the book i wrote above:
- Create a new post with the tag $$
- Create another post with the same tag: $$
- Create a third post with the same tag: $$
- Check the tag list to find three tags with the title $$ but with separate IDs.
@
13 years ago
Didn't realize that it was actually sent through stripslashes() 2 lines up. Updated patch just removes sanitize_title()
#2
@
12 years ago
This is totally a bug - you can add "$$$" in each and every taxonomy as many times as you want - but I have fixed it! Patch added.
#6
@
12 years ago
Another scenario this affects:
- Create a tag called 'A' - the slug should default to 'a'
- Add/edit a post.
- In the tags box add 'A' to the post, and also 'A-'
- Update the post.
'A' will assign as expected but 'A-' will be discarded instead of added as a new tag.
santitize_title() will strip out the hyphen in 'A-' and match the slug against 'A'. The same behaviour would apply to any use case where a new tag name matched the slug of an existing tag.
#8
@
12 years ago
This caused us a headache trying to trace the cause too.
We are wanting to use grades of music albums as a taxonomy in a shop. The list includes "M", "M+" and "M-" in the grades. Even though we can create all three taxonomies through the admin pages, we cannot assign either "M+" or "M-" to a post. Doing so will link the post to just "M" instead.
#10
@
12 years ago
- Keywords needs-refresh added
- Milestone changed from Future Release to 3.6
My patch still applies cleanly against the original scenario and should be applied for terms that have all non-alpha chars
http://core.trac.wordpress.org/ticket/17689#comment:6 still needs to be addressed.
#11
@
12 years ago
- Keywords needs-refresh removed
I attacked sanitize_title_with_dashes()
here as well #17648
I refreshed my patch for this ticket. With it, you can no longer add $$$ an endless number of times, and M- and M+ become m-minus and m-plus in slug form. This way be controversial, but my thinking (and RegEx logic) are this:
- If you enter {Not - or +}{+ or -}{space}, this is an intentional use of "plus" or "minus" e.g. A+ Effort
- If your term name ENDS in {Not - or +}{+ or -}, same goes = e.g. A+
These satisfy comment:6 and all Unit Tests pass. If we go with this patch, #22023 is no longer blocked.
#13
@
12 years ago
I just attached a file that shows the results of some tests I was running.
$slugs = array(); for ( $i = 33; $i < 256; $i++ ) { $char = trim( mb_convert_encoding('&#' . intval($i) . ';', 'UTF-8', 'HTML-ENTITIES' ) ); echo $char, PHP_EOL; echo sanitize_title_with_dashes( 'A' . $char ), PHP_EOL; $ids = wp_insert_term( 'A' . $char, 'post_tag' ); $term = get_term( $ids['term_id'], 'post_tag' ); if ( in_array( $term->slug, $slugs ) ) exit( 'NOOOOOOOO!' ); else $slugs[] = $term->slug; echo $term->slug, PHP_EOL, PHP_EOL; } exit();
Prints out the following:
- The char
- The result of passing
'A' . $char
tosanitize_title_with_dashes()
- The resulting slug
'A' . $char
produces many dupes. All term slugs are unique.
#14
@
11 years ago
17689.2.diff works as expected for me with @wonderboymusic's tests code. Now I have a whole lot of "awesome" tags in my test build.
#16
@
11 years ago
attachment:17689-unit-tests-term.diff fails under the current codebase, succeeds with attachment:17689.2.diff applied. Needs a more thorough thought on how to create the different strings to cover all cases.
#17
@
11 years ago
- Keywords 3.7-early added
- Milestone changed from 3.6 to Future Release
- Priority changed from normal to high
Would very much like to finally fix this in 3.7, as this is a blocker for potential future taxonomy improvements.
#21
@
11 years ago
17689.5.diff may seem ridiculous, but it totally fixes 2 issues:
- Fixes the problem where currently you can add
$$$
repeatedly
- Fixes the problem where items with
+
and-
get added repeatedly
*added repeatedly* means many terms with the same name and different slugs, due to sanitize_title_with_dashes()
.
wp_humanize_plus_minus()
is the utility function I added that calls itself recursively to convert +/-
to plus/minus
in specifically-matched cases.
#22
@
11 years ago
17689.get-term-by-name.diff is incredibly clever, and based on various IRC conversations, it seems like it'll do the trick. We need stronger unit tests here. We also decided waiting until 3.8 would be the prudent thing to do here.
#25
@
11 years ago
17689.6.diff has unit tests and sets the trimmed name to a temp variable - there is another unit test that makes sure the name isn't trimmed.
I think this is ready to go in, but wanted to let it simmer in the wild for a bit.
#27
@
11 years ago
- Keywords 3.9-early added; 3.7-early removed
- Milestone changed from 3.8 to Future Release
This one's really close, but I think multiple terms with the same name is still problematic for the get_term_by( 'name', ... )
call in there
This ticket was mentioned in IRC in #wordpress-dev by AaronCampbell. View the logs.
11 years ago
#32
@
11 years ago
- Keywords 4.0-early added; 3.9-early removed
- Milestone changed from 3.9 to Future Release
Needs more unit tests - this should just be rolled in with our other Taxonomy plans.
This ticket was mentioned in IRC in #wordpress-dev by nacin. View the logs.
10 years ago
#35
@
10 years ago
- Keywords needs-unit-tests removed
I ran against this when doing a large import for a client. I came to the same solution as Sergey, using get_term_by( 'name', '...' )
for my work around.
17689.7.diff is an update of .6 (still using get_term_by( 'name', '...' )
), that applies cleanly and has more robust unit tests. I changed the unit tests to use $ instead of + because I think the idea of wp_humanize_plus_minus()
is good but it's an enhancement that belongs in it's own ticket (and I want this test to still test our standard special character handling even after the plus/minus changes). I also added tests for terms that have all special characters, and am testing that the returned errors contain the expected duplicate term ID.
This ticket was mentioned in IRC in #wordpress-dev by AaronCampbell. View the logs.
10 years ago
This ticket was mentioned in IRC in #wordpress-dev by helen. View the logs.
10 years ago
This ticket was mentioned in IRC in #wordpress-dev by helen. View the logs.
10 years ago
#40
@
10 years ago
17689.8.diff refreshes so that it applies against trunk and adds more unit tests. About to rip the band-aid off
#41
@
10 years ago
- Owner set to wonderboymusic
- Resolution set to fixed
- Status changed from new to closed
In 28733:
replace sanitize_title with stripslashes