WordPress.org

Make WordPress Core

Opened 5 years ago

Last modified 15 months ago

#10823 new defect (bug)

Bad handling of ampersand in post titles

Reported by: Commeuneimage Owned by:
Milestone: Future Release Priority: normal
Severity: minor Version: 2.8.4
Component: Formatting Keywords: has-patch 3.5-early
Focuses: Cc:

Description

Titles with ampersand (&) are not correctly handled:

1/ the_title_attribute() doesn't transform & to & causing XHTML validation errors

2/ titles with & followed by ; generate truncated post name (slug)
e.g. "Test this & believe ; what ?" => "test-this-what"

Attachments (4)

sanitize_title_with_dashes.patch (540 bytes) - added by williamsba1 4 years ago.
10823.diff (532 bytes) - added by solarissmoke 3 years ago.
10823.2.diff (905 bytes) - added by solarissmoke 3 years ago.
Use \w instead of [^\s]
10823.3.diff (1.3 KB) - added by SergeyBiryukov 2 years ago.

Download all attachments as: .zip

Change History (22)

comment:1 nacin4 years ago

  • Component changed from General to Formatting
  • Milestone changed from Unassigned to 3.0

comment:2 mdawaffe4 years ago

I cannot confirm 1. the_title_attribute() does correctly turn ampersands into HTML entities for me.

I can confirm 2.

It comes from the following line in sanitize_title_with_dashes().

$title = preg_replace('/&.+?;/', '', $title); // kill entities

Fixing this could generate more bugs since the output is used in permalinks.

comment:3 dd324 years ago

Could that be changed to :

$title = preg_replace('/&[a-z0-9]+?;/', '', $title); // kill entities

perhaps? That'd still catch all valid entities...

comment:4 williamsba14 years ago

  • Keywords has-patch needs-testing added; ampersand the_title_attribute htmlspecialchars removed

Tested out dd32's code and it fixes the issue. Like mdawaffe said this could cause other issues so wanted to patch so others could test.

comment:5 follow-up: pufuwozu4 years ago

Sorry to be pedantic but that patch introduces a slight regression for cases such as × (the permalinks would include "215").

Maybe we should include the hash character in the regex, like so:

$title = preg_replace('/&[#a-z0-9]+?;/', '', $title); // kill entities

After that, I can't think of any other cases.

comment:6 dd324 years ago

Sorry to be pedantic but that patch introduces a slight regression for cases such as × (the permalinks would include "215").

Very true, That thought came to me after posting, I thought I'd added another reply to that effect..

comment:7 nacin4 years ago

  • Milestone changed from 3.0 to 3.1

comment:8 nacin3 years ago

  • Keywords needs-patch added; has-patch needs-testing removed
  • Milestone changed from Awaiting Triage to Future Release

solarissmoke3 years ago

comment:9 solarissmoke3 years ago

  • Keywords has-patch added; needs-patch removed

Patch addresses item (2) - I'm not sure if this would break anything else. In any case this seems quite edge.

comment:10 nacin3 years ago

I imagine we have a regex somewhere that deals with killing entities.

Of note I also see this regex used here:

./wp-includes/formatting.php:758:	$username = preg_replace( '/&.+?;/', '', $username ); // Kill entities

solarissmoke3 years ago

Use \w instead of [^\s]

comment:11 SergeyBiryukov3 years ago

  • Milestone changed from Future Release to 3.3

Related: #10797

comment:13 in reply to: ↑ 5 azaozz2 years ago

Replying to pufuwozu:

$title = preg_replace('/&[#a-z0-9]+?;/', '', $title); // kill entities

That looks good except the "?" is not needed. Also we can specify the length of that string:

$title = preg_replace('/&[#a-z0-9]{2,6};/', '', $title);

comment:14 westi2 years ago

  • Keywords 3.4-early needs-unit-tests added
  • Milestone changed from 3.3 to Awaiting Review

Too late in the cycle to be tweaking this.

We should also add some unit tests around this change.

comment:15 dd322 years ago

  • Keywords needs-unit-tests removed

dd32 * http://unit-tests.trac.wordpress.org/changeset/489 /wp-testcase/test_includes_formatting.php: Cover more entity striping cases for sanitize_user() and sanitize_title_with_dashes(); See http://core.trac.wordpress.org/ticket/10823

Testing for that was done with/passed with the following regex:

'/&#?\w+;/'

I'm sure the above regex's would probably pass with this still.

SergeyBiryukov2 years ago

comment:16 SergeyBiryukov2 years ago

  • Keywords 3.5-early added; 3.4-early removed
  • Milestone changed from Awaiting Review to Future Release

comment:17 MikeHansenMe16 months ago

  • Cc mdhansen@… added

Tested this patch, it still seems to work with 3.6-alpha. Maybe we should try to get this in 3.6 early?

Note: See TracTickets for help on using tickets.