#21013 closed defect (bug) (fixed)
Badly truncated slugs in posts with similar, long non-latin titles
Reported by: | nevma | Owned by: | SergeyBiryukov |
---|---|---|---|
Milestone: | 3.6 | Priority: | normal |
Severity: | normal | Version: | 3.4 |
Component: | Permalinks | Keywords: | has-patch 3.6-early commit |
Focuses: | Cc: |
Description
When WP automatically generates a slug for a post with non-latin title AND a number has to be appended to the slug to avoid duplicate slugs, then the resulting slug may be badly truncated.
To reproduce the problem:
Create and publish a post with this title (without the quotes): "Αρνάκι άσπρο και παχύ της μάνας του καμάρι"
Create and publish another one with this title (without the quotes): "Αρνάκι άσπρο και παχύ της μάνας του καμάρι, και άλλα τραγούδια"
In both cases, let WP create the slug.
You will see that the second slug is broken and the URL results in a 404 error.
It seems that, in this scenario, WP does not take into account the fact that the new slug may contain url-encoded characters from multibyte character sets, when truncating it.
Attachments (12)
Change History (33)
#3
@
12 years ago
- Keywords has-patch added
- Milestone changed from Awaiting Review to 3.5
We should be able to use mb_substr()
to cut the string properly, since we have it in compat.php
.
#4
@
12 years ago
- Keywords needs-unit-tests added
Looks good at a glance. Unit tests would be helpful here.
#5
@
12 years ago
21013.patch didn't work for two reasons:
- When we run the query to check if the slug already exists, the slug should stay urlencoded.
- The decoded string is shorter than 200 characters, so the slug wasn't truncated. We would need a compat version of
mb_strlen()
as well.
21013.2.patch fixes that. The compat mb_strlen()
implementation is based on mb_substr()
code:
http://core.trac.wordpress.org/browser/tags/3.4/wp-includes/compat.php#L16
21013.3.patch adds rtrim( $alt_post_name, '-' )
, in case the string we get ends with a hyphen:
предлагаем-металлообрабатывающее--2
#7
follow-up:
↓ 8
@
12 years ago
As an alternative approach, we could attempt to truncate slugs without re-encoding.
Due to more accurate string handling, this allows 1 character longer slugs (still within the limit) than the previous patches, so the unit tests from [UT740] should be updated if 21013.5.patch gets approved.
21013.6.patch also handles 3-octet sequences.
#8
in reply to:
↑ 7
@
12 years ago
- Keywords needs-unit-tests removed
Due to more accurate string handling, this allows 1 character longer slugs (still within the limit) than the previous patches
21013.8.patch brings the same accuracy to the mb_substr()
approach, so both approaches should be functionally the same now. Turned out mb_strlen()
is not needed.
Updated unit tests: [UT741]
#10
@
12 years ago
21013.9.patch adds a urldecode( $slug ) === $slug
check to see if we could get away with substr()
(as per nacin's suggestion).
#11
@
12 years ago
- Cc rfair404 added
The latest patch seems to work for me and does seem to solve the problem.
In my case, publishing the first post with the title works as expected. url generated was pretty much as expected, something like :
http://local.dev/blog/2012/08/06/%CE%B1%CF%81%CE%BD%CE%AC%CE%BA%CE%B9-%CE%AC%CF%83%CF%80%CF%81%CE%BF-%CE%BA%CE%B1%CE%B9-%CF%80%CE%B1%CF%87%CF%8D-%CF%84%CE%B7%CF%82-%CE%BC%CE%AC%CE%BD%CE%B1%CF%82-%CF%84%CE%BF%CF%85-%CE%BA%CE%B1%CE%BC/
Publishing the second post results in an error 400 "Bad Request" and a url that looks like this:
http://local.dev/blog/2012/08/06/%ce%b1%cf%81%ce%bd%ce%ac%ce%ba%ce%b9-%ce%ac%cf%83%cf%80%cf%81%ce%bf-%ce%ba%ce%b1%ce%b9-%cf%80%ce%b1%cf%87%cf%8d-%cf%84%ce%b7%cf%82-%ce%bc%ce%ac%ce%bd%ce%b1%cf%82-%cf%84%ce%bf%cf%85-%ce%ba%ce%b1%ce%b-2/
After applying the patch, I get a url like this:
http://local.dev/blog/2012/08/06/%CE%B1%CF%81%CE%BD%CE%AC%CE%BA%CE%B9-%CE%AC%CF%83%CF%80%CF%81%CE%BF-%CE%BA%CE%B1%CE%B9-%CF%80%CE%B1%CF%87%CF%8D-%CF%84%CE%B7%CF%82-%CE%BC%CE%AC%CE%BD%CE%B1%CF%82-%CF%84%CE%BF%CF%85-%CE%BA%CE%B1-2/
I even tried to add several additional posts and pages with a similar name, and it appears to work across post types including pages.
#12
@
12 years ago
21013.10.patch simplifies the function by eliminating the cycle and passing $length
to utf8_uri_encode()
instead. Still passes the unit test.
#16
@
12 years ago
- Keywords punt added
This looks good but I still feel a bit weary of committing this so late in a cycle. Early 3.6?
#17
@
12 years ago
- Keywords 3.6-early added; early punt removed
- Milestone changed from 3.5 to Future Release
#19
@
12 years ago
- Owner set to SergeyBiryukov
- Resolution set to fixed
- Status changed from new to closed
In 23420:
#20
follow-up:
↓ 21
@
12 years ago
- Cc nashwan.doaqan@… added
Great that this bug solved in 3.6
but I have many posts with this bug (more than 1000), Should I change the slug for each one ?! , and sorry because I comment in a closed ticket but I am not sure that I should open a new one .
#21
in reply to:
↑ 20
@
12 years ago
Replying to alex-ye:
but I have many posts with this bug (more than 1000), Should I change the slug for each one ?!
Try this script to fix the existing slugs: 21013.fix-existing-slugs.php. Place it in the root directory of your site. Remember to back up your database before running the script.
Related: #10483 (this specific issue is mentioned in ticket:10483:18).