Opened 15 years ago
Last modified 2 years ago
#10249 reopened defect (bug)
Page slug in cyrillic = Error 404 - Not Found!
Reported by: | kalifi | Owned by: | westi |
---|---|---|---|
Milestone: | Future Release | Priority: | normal |
Severity: | normal | Version: | 4.9.5 |
Component: | Permalinks | Keywords: | has-patch needs-unit-tests |
Focuses: | Cc: |
Description
When I create a page with page slug for example "киро"
then when I try to open domain/киро - Error 404 - Not Found
The permalinks are %postname%
Post slug with this slug is working just fine, the same BUG exists in 2.7, 2.7.1 and 2.8
Attachments (6)
Change History (76)
#2
@
15 years ago
- Keywords needs-patch added
Example "киро" works when using Month and Name permalinks. Setting the Custom Structure value to "/%postname%/" does cause the test page to throw status 404.
#4
@
15 years ago
- Keywords needs-patch removed
- Milestone 3.0 deleted
- Resolution set to worksforme
- Status changed from new to closed
This is currently working for me in trunk (3.0).
Tested with both /%postname%/ and /%year%/%postname%/
Re-open if this is still around, I believe this may have been fixed by a similar ticket near the start of the dev cycle.
#5
@
14 years ago
- Resolution worksforme deleted
- Status changed from closed to reopened
I have no idea how did you manage to get this issue working with just /%postname%/ but I just installed 3.0b2 and confirmed it's not working. It is working with /%year%/%postname% but with the /%year%, it's also working in 2.9.2.
So I'll reopen it.
#6
@
14 years ago
Ok, we managed to find a "solution" for it. I'm not sure if this is the proper way to do it and it might mess up some other stuff but for what we needed it, it's working.
In query.php (/wp-includes), add the piece of code specified on next line after
function parse_query($query) {
It's (in 3.0 beta 2) on line 1246.
The code that did the job:
if ( !$query['name'] && !$query['pagename'] && $query['category_name']) { $query['pagename']=$query['category_name']; }
I'm no programmer and I really have no idea what's the magic behind it but it's working. Working as in you can use cyrillic (most probably all other non-latin alphabets, too) slugs perfectly both on posts and pages with the /%postname% /%category%/%postname% permalinks (we're using the last one). No /%year% is needed to make it work.
Smart guys out here probably know what's going on when looking at this code, so they can either improve it or if it's known to cause some issues, tell us here that it's not working and shouldn't be used.
#8
@
14 years ago
- Owner changed from ryan to westi
- Status changed from reopened to accepted
I can reproduce this..
Investigating.
#9
@
14 years ago
Ok this caused by the url encoded slug being in the explicit page rules but us matching against a rawurldecoded string.
#11
@
14 years ago
It seems this was introduced for #3727 to fix a similar issue.
It seems rather bogus here as we need the string to be urlencoded to match against our rewrite rules
#12
@
14 years ago
- Cc ryan added
- Keywords has-patch 2nd-opinion added
- Milestone changed from 3.0 to 3.1
- Priority changed from high to normal
- Severity changed from blocker to normal
- Version set to 2.7
Not convinced this is a blocker as it works the same in 2.9
attaching a patch which fixes this for me but needs alot of review first.
Would prefer to leave till 3.1 as this is not a regression.
#13
@
14 years ago
Ok, confirmed working. Non-latin slugs now work perfectly fine (tested with Russian and Greek). We removed the piece of code I mentioned earlier, added Westi's patch and it all works. So thanks, Westi! Nice to see that issue was noticed again. :)
#14
follow-up:
↓ 16
@
14 years ago
Keep in mind Apache usually treats these URLs as identical:
/киро
/%D0%BA%D0%B8%D1%80%D0%BE
Most UAs will transmit the latter, but the former is also commonly used and completely valid when transmitted as a UTF-8 byte stream. As such, I doubt that removing a call to urldecode will fix this problem correctly. It is more likely that the stored permalink needs to be decoded and then matched against the decoded request value.
#15
@
14 years ago
Hmm, now I have to correct myself that it's still not properly working. Sometimes it's working, sometimes it's not working. Totally unpredictable of how or why or when it's working or not.
#16
in reply to:
↑ 14
@
14 years ago
Replying to miqrogroove:
Keep in mind Apache usually treats these URLs as identical:
/киро
/%D0%BA%D0%B8%D1%80%D0%BE
Most UAs will transmit the latter, but the former is also commonly used and completely valid when transmitted as a UTF-8 byte stream. As such, I doubt that removing a call to urldecode will fix this problem correctly. It is more likely that the stored permalink needs to be decoded and then matched against the decoded request value.
Erm. Wouldn't that break all kinds of RFCs - URIs all have to use US-ASCII only chars and even then a limited set of such chars AFAIK
All UserAgents I've ever checked will always ensure that a URI used in an HTTP request has been urlencoded - they will usually now show you the pretty version of the url as that is much more user friendly.
#17
follow-up:
↓ 18
@
14 years ago
I'm not aware of any such requirement. In the wild, UTF-8 requests are common if not frequent and are properly supported by servers. If WordPress is only intended to support the UAs you've checked, then I guess it's not a concern?
#18
in reply to:
↑ 17
@
14 years ago
Replying to miqrogroove:
I'm not aware of any such requirement. In the wild, UTF-8 requests are common if not frequent and are properly supported by servers. If WordPress is only intended to support the UAs you've checked, then I guess it's not a concern?
Can you give an example of a UA which sends UTF8 on the wire in requests?
#21
@
14 years ago
I can confirm the patch is working in my case (WordPress 3.0, Firefox 3.6.3, page slug is “детали”). Ragnaripea, is it not working with certains slugs or certain browsers?
#23
@
14 years ago
More thoughts on the UTF-8 headers issue: I haven't had a chance to test PHP's REQUEST_URI value yet. My idea to urldecode the requested and stored paths was based on the behavior of mod_rewrite, where the curly apostrophe is always represented as "\xe2\x80\xa6" instead of the urlencoded version. What might be a simpler solution if PHP is providing mixed encodings is to urlencode only the bytes >= 128 before comparing it to the stored version. That would convert any UTF-8 byte stream into the WordPress format of url-encoded UTF-8.
#24
@
14 years ago
Oh that sounds like an old friend. Interestingly things get traction when Matt needs to do stuff for his customers. Well, good that finally then.
Related: #11175
Related: #11619
Related: #11669
But in the end I suspect this is related to the romanization / latinization component which creates the slug. Could benefit from an overhaul but everytime this was on topic it was denied to do so because of backwards compat.
#25
@
14 years ago
OK, to add something on the issue. The patch provided westi solves the issue partly. Yes, it allows the usage of UTF-8 slugs for pages, but... it works only if you open the page from the current site. So if someone links to that page from other web site, it will return 404.
So in order to solve the issue, there should be some additional enhancement in the routing of the URLs.
#26
@
14 years ago
More details on the issue. With the patch provided there are two options that I tested:
/%postname%/ - then it works fine if the link to the page is WITHOUT the trailing slash
/%postname% - then it works fine if the link to the page is WITH the trailing slash
However in both cases, if the page slug is "киро" in case one it will not open the page "example.com/киро/" if referred from external site.
In case two it will not open "example.com/киро" if referred from external site.
#28
@
14 years ago
westi's patch looks good. It shouldn't break anything, because later we are matching the regular expression against both the decoded and the raw URI values.
The only difference should be that if we decode it, we might lose the encoded version.
If we want to be extra sure, we could try matching against the encoded version, too, accounting for web servers, which put UTF-8 in the request URI and we end up with not-encoded version, only.
@ivoapostolov, I couldn't reproduce the slash problems—it works for me with and without slashes. If there isn't a slash, WordPress does a 301 redirect to the slashed URL. Moreover, there isn't any difference in routing based on referer. Do you still have this problem with a vanilla install and all plugins disabled?
#29
follow-up:
↓ 30
@
14 years ago
Still we cant use utf-8 chars in permalinks.
http://wordpress.org/extend/ideas/topic/non-latin-characters-need-love
"şŞ İı Ğğ Üü Öö çÇ" chars important for me and those are utf-8 actually.
#30
in reply to:
↑ 29
@
14 years ago
Replying to westi:
(In [15960]) Attempt at a fix for url encoding issues with UTF8 chars. See #10249.
IIRC while testing for #13413 I had exactly problems with removing the rawurldecode. I do not exactly remember what this breaks but I think it had something to do with backwards compatibility because the wp-rewrite-rule is expected to have this encoded.
Replying to unsalkorkmaz:
Still we cant use utf-8 chars in permalinks.
http://wordpress.org/extend/ideas/topic/non-latin-characters-need-love
"şŞ İı Ğğ Üü Öö çÇ" chars important for me and those are utf-8 actually.
Have you tried on trunk? If not, please do so.
Probably you need install/deactivate Normalize URLs (Wordpress Plugin) to make this go.
#31
follow-up:
↓ 32
@
14 years ago
Isn't this ticket a duplicate of #9591 (or better say to be closed as duplicate as being in the family)?
#33
@
14 years ago
If there is a chance that leaving the URI urlencoded, why don't we do it the other way round.
We leave the rawurldecode()
call and when we match, we try to match by URI, urlencoded URI and urldecoded URI.
The patch is attached, it solves the problem in the ticket for me, and I can't imagine how it can break anything.
#35
@
14 years ago
Well, I think i cant explain myself enough good (sorry for my english)
An example.. "примерно" you can use these russian chars in permalinks but "şŞ İı Ğğ Üü Öö çÇ" becoming "ss-ii-gg-uu-oo-cc" and this is a problem actually. You can try this yourself too. those chars are UTF8.
#36
@
14 years ago
and after some tries on formatting.php i see removing those lines works in 823:
$title = remove_accents($title);
"şŞ İı Ğğ Üü Öö çÇ" becoming "şş-iı-ğğ-üü-öö-çç"
#40
@
13 years ago
- Severity changed from normal to major
- Version changed from 2.7 to 3.3
I thought that this bug was gone for good but it reappeared after a nightly build upgrade about a week ago. Pages with cyrillic slug throw a 404 error, while posts and taxonomies (tags, cats) work fine.
#41
@
13 years ago
- Version changed from 3.3 to 2.7
Thanks for the report, but please don't change the Version attribute. It should actually be called "First seen in:".
#42
@
13 years ago
- Keywords dev-feedback added; 2nd-opinion removed
- Milestone changed from Future Release to 3.3
[18541] indeed broke get_page_by_path()
for non-English slugs.
This is what Debug Bar shows:
WARNING: wp-includes/wp-db.php:890 - vsprintf() [function.vsprintf]: Too few arguments
SQL query is:
SELECT ID, post_name, post_parent FROM trunk_posts WHERE post_name IN ('%d0%ba%d0%b8%d1%80%d0%be') AND (post_type = '%s' OR post_type = 'attachment') AND trunk_posts.post_type IN ('post', 'page', 'attachment')
vsprintf()
in $wpdb->prepare()
thinks those encoded chars are placeholders.
Before [18541], only a single page slug was queried here, and it was escaped properly.
10249.1.patch fixes the issue by escaping %
chars before passing $in_string
to $wpdb->prepare()
.
10249.2.patch escapes $post_type
, which makes $wpdb->prepare()
unnecessary, since queried post slugs are also escaped earlier.
Both patches also contain the refreshed page-slugs-urldecode-other-way-round.diff.
#45
follow-up:
↓ 48
@
13 years ago
$in_string probably should be substituted in. A variable inside prepare() is usually a case of doing it wrong.
#46
@
13 years ago
Of course, we need a way to pass in a string without it being automatically quoted.
#48
in reply to:
↑ 45
@
13 years ago
Replying to ryan:
$in_string probably should be substituted in.
I thought of that, but $wpdb->prepare()
adds slashes to inner quotes:
WHERE post_name IN ('%d0%ba%d0%b8%d1%80%d0%be\',\'%d0%ba%d0%b8%d0%ba%d0%b8%d1%80%d0%be')
And if we omit them in implode()
, they are not added back:
WHERE post_name IN ('%d0%ba%d0%b8%d1%80%d0%be,%d0%ba%d0%b8%d0%ba%d0%b8%d1%80%d0%be')
Am I missing something?
Trying to modify prepare()
now to use %r, as per your second comment.
#51
@
13 years ago
%r would be no outside quotes added as is done with %s and no escape_by_ref(). Writing a mini format parser to extract %r could become a fairly big chore.
Actually, esc_sql() is a weak escape. It should not be used here. We probably need to make a public version of wpdb::_escape(). Maybe esc().
#52
follow-up:
↓ 53
@
13 years ago
I prefer to have all queries go through prepare(), just in case it does something more someday, but let's just do a simple fix for now. How about:
$post_type_sql = $post_type; $wpdb->escape_by_ref( $post_type_sql );
#53
in reply to:
↑ 52
@
13 years ago
Done in 10249.3.patch.
We could possibly escape $post_type
itself without introducing a new variable, but later it is passed to get_post()
, though it's not actually used a post type there, but rather as a filter.
#56
in reply to:
↑ 44
@
13 years ago
As for including page-slugs-urldecode-other-way-round.diff:
[15960] affected all PATH_INFO permalinks (not only pages) with non-ASCII chars.
rawurldecode()
of $req_uri
was introduced into [4893]/[4979] as a fix for #3727. So Nikolay's patch (refreshed in 10249.3.patch) is a more proper fix than [15960].
#59
@
13 years ago
- Keywords 3.4-early added
- Milestone changed from 3.3 to Future Release
While I would love to finally close this ticket I will only be happy to do so when we have good unit test coverage.
As this stage in the 3.3 cycle I don't think we should be tweaking this code any more.
#61
@
13 years ago
- Milestone changed from Future Release to 3.4
Will prepare unit tests in a week or two, if no one beats me to it.
#65
@
9 years ago
- Keywords has-patch dev-feedback needs-unit-tests 3.5-early removed
- Resolution set to worksforme
- Severity changed from major to normal
- Status changed from accepted to closed
This works just fine on WordPress 4.2 for posts, pages, and custom post types. This was resolved on an earlier release, possibly when we did the emoji in URL thing for 4.2. Closing as works for me
#66
@
9 years ago
- Keywords has-patch needs-unit-tests added
- Resolution worksforme deleted
- Status changed from closed to reopened
The issue still exists for PATH_INFO permalinks (/index.php/postname
), and comment:56 is still relevant.
Steps to reproduce:
- Disable mod_rewrite support:
add_filter( 'got_rewrite', '__return_false' );
- Set permalinks to "Post name".
- Create a post or page named "Киро" and publish it. You'll get a 404 error.
Not a 2.8 regression so moving to 2.9.