WordPress.org

Make WordPress Core

Opened 9 years ago

Last modified 2 years ago

#10823 new defect (bug)

Bad handling of ampersand in post titles

Reported by: Commeuneimage Owned by:
Milestone: Future Release Priority: normal
Severity: normal Version: 2.8
Component: Formatting Keywords: has-patch needs-refresh
Focuses: Cc:

Description

Titles with ampersand (&) are not correctly handled:

1/ the_title_attribute() doesn't transform & to & causing XHTML validation errors

2/ titles with & followed by ; generate truncated post name (slug) e.g. "Test this & believe ; what ?" => "test-this-what"

Attachments (7)

sanitize_title_with_dashes.patch (540 bytes) - added by williamsba1 8 years ago.
10823.diff (532 bytes) - added by solarissmoke 7 years ago.
10823.2.diff (905 bytes) - added by solarissmoke 7 years ago.
Use \w instead of [^\s]
10823.3.diff (1.3 KB) - added by SergeyBiryukov 6 years ago.
10823-unittests.diff (1.8 KB) - added by boonebgorges 3 years ago.
See #30284.
10823.4.diff (2.1 KB) - added by MikeHansenMe 3 years ago.
Refreshed
10823.5.diff (2.4 KB) - added by lukecavanagh 2 years ago.
Patch Refreshed

Download all attachments as: .zip

Change History (29)

#1 @nacin
8 years ago

  • Component changed from General to Formatting
  • Milestone changed from Unassigned to 3.0

#2 @mdawaffe
8 years ago

I cannot confirm 1. the_title_attribute() does correctly turn ampersands into HTML entities for me.

I can confirm 2.

It comes from the following line in sanitize_title_with_dashes().

$title = preg_replace('/&.+?;/', '', $title); // kill entities

Fixing this could generate more bugs since the output is used in permalinks.

#3 @dd32
8 years ago

Could that be changed to :

$title = preg_replace('/&[a-z0-9]+?;/', '', $title); // kill entities

perhaps? That'd still catch all valid entities...

#4 @williamsba1
8 years ago

  • Keywords has-patch needs-testing added; ampersand the_title_attribute htmlspecialchars removed

Tested out dd32's code and it fixes the issue. Like mdawaffe said this could cause other issues so wanted to patch so others could test.

#5 follow-up: @pufuwozu
8 years ago

Sorry to be pedantic but that patch introduces a slight regression for cases such as × (the permalinks would include "215").

Maybe we should include the hash character in the regex, like so:

$title = preg_replace('/&[#a-z0-9]+?;/', '', $title); // kill entities

After that, I can't think of any other cases.

#6 @dd32
8 years ago

Sorry to be pedantic but that patch introduces a slight regression for cases such as × (the permalinks would include "215").

Very true, That thought came to me after posting, I thought I'd added another reply to that effect..

#7 @nacin
8 years ago

  • Milestone changed from 3.0 to 3.1

#8 @nacin
8 years ago

  • Keywords needs-patch added; has-patch needs-testing removed
  • Milestone changed from Awaiting Triage to Future Release

@solarissmoke
7 years ago

#9 @solarissmoke
7 years ago

  • Keywords has-patch added; needs-patch removed

Patch addresses item (2) - I'm not sure if this would break anything else. In any case this seems quite edge.

#10 @nacin
7 years ago

I imagine we have a regex somewhere that deals with killing entities.

Of note I also see this regex used here:

./wp-includes/formatting.php:758:	$username = preg_replace( '/&.+?;/', '', $username ); // Kill entities

@solarissmoke
7 years ago

Use \w instead of [^\s]

#11 @SergeyBiryukov
7 years ago

  • Milestone changed from Future Release to 3.3

Related: #10797

#13 in reply to: ↑ 5 @azaozz
7 years ago

Replying to pufuwozu:

$title = preg_replace('/&[#a-z0-9]+?;/', '', $title); // kill entities

That looks good except the "?" is not needed. Also we can specify the length of that string:

$title = preg_replace('/&[#a-z0-9]{2,6};/', '', $title);

#14 @westi
7 years ago

  • Keywords 3.4-early needs-unit-tests added
  • Milestone changed from 3.3 to Awaiting Review

Too late in the cycle to be tweaking this.

We should also add some unit tests around this change.

#15 @dd32
6 years ago

  • Keywords needs-unit-tests removed

dd32 * http://unit-tests.trac.wordpress.org/changeset/489 /wp-testcase/test_includes_formatting.php: Cover more entity striping cases for sanitize_user() and sanitize_title_with_dashes(); See http://core.trac.wordpress.org/ticket/10823

Testing for that was done with/passed with the following regex:

'/&#?\w+;/'

I'm sure the above regex's would probably pass with this still.

#16 @SergeyBiryukov
6 years ago

  • Keywords 3.5-early added; 3.4-early removed
  • Milestone changed from Awaiting Review to Future Release

#17 @MikeHansenMe
5 years ago

  • Cc mdhansen@… added

Tested this patch, it still seems to work with 3.6-alpha. Maybe we should try to get this in 3.6 early?

@MikeHansenMe
3 years ago

Refreshed

#19 @chriscct7
3 years ago

  • Keywords 3.5-early removed
  • Severity changed from minor to normal
  • Version changed from 2.8.4 to 2.8

#20 @MikeHansenMe
3 years ago

Patch still applies with some offset

#21 @SergeyBiryukov
2 years ago

#36282 was marked as a duplicate.

#22 @MikeHansenMe
2 years ago

  • Keywords needs-refresh added

Patch needs to be refreshed.

@lukecavanagh
2 years ago

Patch Refreshed

Note: See TracTickets for help on using tickets.