Opened 15 years ago
Last modified 13 months ago
#10483 reviewing enhancement
Change post_name's length from 200 to 400
Reported by: | elnur | Owned by: | SergeyBiryukov |
---|---|---|---|
Milestone: | Future Release | Priority: | normal |
Severity: | minor | Version: | |
Component: | Permalinks | Keywords: | dev-feedback early has-patch needs-testing |
Focuses: | Cc: |
Description
Hello, guys! Thank you very much for providing such a great piece of software! I love WordPress very much! :)
I use WordPress in Russian language and the URLs on my blog consist of Russian characters. There is a post with not such a long URL in Russian, but since it gets encoded to special characters it becomes too long to get fit into post_name
field of post
table.
I've found what code needs to be changed to increase the length. I make these changes every time a new version is released. I think it would be better to submit a patch here so that others people can benefit from it and I will not need to make those changes every release.
I'm attaching the patch to this ticket and asking you to apply it to the code.
Thank you very much again, guys! You do a great job! :)
Cheers,
Elnur
Attachments (5)
Change History (66)
#2
follow-up:
↓ 3
@
15 years ago
This link http://www.ielnur.com/blog/2009/05/%d1%81%d0%bd%d0%be%d0%b2%d0%b0-%d0%b1%d1%80%d0%be%d1%81%d0%b8%d1%82%d1%8c-%d0%ba%d1%83%d1%80%d0%b8%d1%82%d1%8c-30-%d1%82%d0%b8%d0%b4%d0%bd%d0%b5%d0%b2%d0%bd%d0%be%d0%b5-%d0%b8%d1%81%d0%bf%d1%8b%d1%82%d0%b0%d0%bd%d0%b8%d0%b5/ is 258 chars long, post_name
part is 223. It works fine with our local search engine http://www.yandex.ru and with http://www.google.ru as well.
But I might be wrong in trying to change post_name
's length to 400. I remembered that varchar
can't hold more than 255 chars. Isn't that true?
#3
in reply to:
↑ 2
@
15 years ago
Replying to elnur:
But I might be wrong in trying to change
post_name
's length to 400. I remembered thatvarchar
can't hold more than 255 chars. Isn't that true?
yeah, until Mysql 5
#5
in reply to:
↑ 4
@
15 years ago
- Component changed from General to Permalinks
- Milestone changed from 2.8.3 to Future Release
- Owner set to ryan
- Priority changed from normal to low
- Severity changed from normal to minor
Replying to elnur:
So, wouldn't someone apply this patch to the code? :)
We, no... Not until MySQL 5 is the default, anyway. And then we'd need to worry about url length...
Punting to Future in the meanwhile.
#9
@
14 years ago
- Keywords dev-feedback added
Since we're going to MySQL5 in 3.2, can't this be considered for commit?
#10
@
14 years ago
- Milestone Future Release deleted
- Resolution set to maybelater
- Status changed from new to closed
We're not adjusting anything for MySQL 5 at this time. Closing as maybelater.
#11
@
14 years ago
- Keywords close removed
- Milestone set to Future Release
- Resolution maybelater deleted
- Status changed from closed to reopened
Reopening for discussion.
#12
follow-up:
↓ 13
@
14 years ago
- Cc linuxologos@… added
Thanks for giving the opportunity to discuss this.
This is quite a big problem for languages with an alphabet totally different from English. We can't have a post_name with more than ~38 letters, since every letter is urlencoded to be stored in the database, so every letter is converted into many more characters, dramatically cutting down the maximum possible length of the "real" post-name.
#13
in reply to:
↑ 12
;
follow-up:
↓ 14
@
14 years ago
Replying to linuxologos:
Thanks for giving the opportunity to discuss this.
This is quite a big problem for languages with an alphabet totally different from English. We can't have a post_name with more than ~38 letters, since every letter is urlencoded to be stored in the database, so every letter is converted into many more characters, dramatically cutting down the maximum possible length of the "real" post-name.
It's probably worth to drop the urlencoding then inside the storage layer. AFAIK MySQL should be able to store UTF8 in colums, so to have 200 true UTF8 characters instead of 38 to 200 urlencoded, subset of us-ascii ones.
The related refactorings could benefit the overall UTF8 support of the application as a bonus.
#14
in reply to:
↑ 13
;
follow-up:
↓ 15
@
14 years ago
Replying to hakre:
Replying to linuxologos:
Thanks for giving the opportunity to discuss this.
This is quite a big problem for languages with an alphabet totally different from English. We can't have a post_name with more than ~38 letters, since every letter is urlencoded to be stored in the database, so every letter is converted into many more characters, dramatically cutting down the maximum possible length of the "real" post-name.
It's probably worth to drop the urlencoding then inside the storage layer. AFAIK MySQL should be able to store UTF8 in colums, so to have 200 true UTF8 characters instead of 38 to 200 urlencoded, subset of us-ascii ones.
The related refactorings could benefit the overall UTF8 support of the application as a bonus.
MySQL is able to store UTF8 indeed and that is already the fact for post_content and post_title in (wp_)posts table. They don't get urlencoded before stored in the db. post_name is urlencoded though, and I'm not sure if it's technically safe to alter this.
#15
in reply to:
↑ 14
@
14 years ago
Replying to linuxologos:
Replying to hakre:
Replying to linuxologos:
[...]
MySQL is able to store UTF8 indeed and that is already the fact for post_content and post_title in (wp_)posts table. They don't get urlencoded before stored in the db. post_name is urlencoded though, and I'm not sure if it's technically safe to alter this.
I have not said that this is a trivial change and in fact, I can not even say if the project would be able to perform such changes and a refactoring properly at all.
Replying to Denis-de-Bernardy:
if my memory serves me well, the protocol actually assumes a URI is never longer than 255 chars.
Indeed, RFC 2616 suggests to avoid URIs longer than 255 chars:
The HTTP protocol does not place any a priori limit on the length of
a URI. Servers MUST be able to handle the URI of any resource they
serve, and SHOULD be able to handle URIs of unbounded length if they
provide GET-based forms that could generate such URIs. A server
SHOULD return 414 (Request-URI Too Long) status if a URI is longer
than the server can handle (see section 10.4.15).
Note: Servers ought to be cautious about depending on URI lengths
above 255 bytes, because some older client or proxy
implementations might not properly support these lengths.
from: 3.2.1 General Syntax
Next to that 255 char limit by caution, there is a physical one for the browsers. Microsoft Internet Explorer is introducing the lowest limit which is a little bit up to 2000 characters according to WWW FAQs: What is the maximum length of a URL?. Generally these lengths relate to one-char = one-byte in us-ASCII encoding of an (urlencoded) URL, a subset of URI.
I think the 414 response is something WP don't do so far, which is classified as SHOULD. I have no idea about the overall parameters this is related to, I think those are undocumented so far which need to reveal those from the code-base first before coping with that problem which is out of the scope of this ticket as well.
#16
follow-up:
↓ 18
@
14 years ago
- Cc info@… added
I once did a clean URL conversion from the cyrillic alphabet to the 'regular' alphabet in a different CMS. 'Дистрибьюторы' would become 'distributori' as a URL slug. This would solve the problem because I believe the latter is still valid Russian. To me this seems a good solution because this can also be applied to other character sets like ancient Greek. Please let me know if this is desired because then I will look into a WordPress patch for it.
#18
in reply to:
↑ 16
;
follow-up:
↓ 22
@
14 years ago
Replying to ldebrouwer:
I once did a clean URL conversion from the cyrillic alphabet to the 'regular' alphabet in a different CMS. 'Дистрибьюторы' would become 'distributori' as a URL slug. This would solve the problem because I believe the latter is still valid Russian.
It's not Cyrillic, so it's not valid Russian. Transliteration is acceptable for some people (including me), and there are some plugins which transliterate post and term slugs, but it's only a workaround and not a long-term solution for everyone.
The best solution here would be to store slugs as is, not in urlencoded form, since it only allows 33 chars for non-English slugs, which is noticeably less than the original 200 characters limit.
The problem is not only the length, though. Create two posts on 3.3-trunk with the same title:
Предлагаем супер металлообрабатывающее оборудование
The first one will just have a truncated slug:
предлагаем-супер-металлообрабатываю
But the second one will have a broken slug:
предлагаем-супер-металлообрабатыва�%-2
#19
@
14 years ago
Related BuddyPress ticket: http://buddypress.trac.wordpress.org/ticket/2066
#20
@
13 years ago
http://dev.mysql.com/doc/refman/5.0/en/string-type-overview.html
"In MySQL 5.0, the range of M is 0 to 255 before MySQL 5.0.3, and 0 to 65,535 in MySQL 5.0.3 and later."
$required_mysql_version = '5.0';
#21
follow-up:
↓ 27
@
13 years ago
As discussed during the 3.2 cycle, we had aimed to choose 5.0.22. Version 5.0.15 was the first production version, and 5.0.22 was the first with any real usage. We didn't push it because we did not identify any version-specific things we would have wanted.
42.5% of all installs are on 5.0. However, our stats show that there are 59 total installs on 5.0.0, 5.0.1, and 5.0.2. And that number is probably inflated, based on how our stats collection works. I think it would be safe to bump the required version up a bit.
#22
in reply to:
↑ 18
@
13 years ago
Replying to SergeyBiryukov:
But the second one will have a broken slug:
предлагаем-супер-металлообрабатыва�%-2
Related: #21013
This ticket was mentioned in IRC in #wordpress-dev by johnbillion. View the logs.
11 years ago
This ticket was mentioned in IRC in #wordpress-dev by simonwheatley. View the logs.
11 years ago
#26
@
10 years ago
I'm a bit confused. WP 4.2 changes the posts table and specifically the index on post_name to be 191 long, (which takes a long time on large tables, which is why I noticed). Reading over the reasoning, I was expecting the post_name field to be 400 long, but I still see it as 200 on all upgraded 4.2 instances.
Should I be expecting the field name to change to 400 on existing tables or is this only for new ones created from 4.2 on?
#27
in reply to:
↑ 21
;
follow-up:
↓ 29
@
8 years ago
Replying to nacin:
42.5% of all installs are on 5.0. However, our stats show that there are 59 total installs on 5.0.0, 5.0.1, and 5.0.2. And that number is probably inflated, based on how our stats collection works. I think it would be safe to bump the required version up a bit.
MySQL 5.0.x today is at 4.2% https://wordpress.org/about/stats/
Replying to Denis-de-Bernardy:
Replying to elnur:
So, wouldn't someone apply this patch to the code? :)
We, no... Not until MySQL 5 is the default, anyway. And then we'd need to worry about url length...
Punting to Future in the meanwhile.
Is now the future?
Related: #meta1884
#28
@
8 years ago
- Keywords early added
- Milestone changed from Future Release to 4.7
- Owner set to SergeyBiryukov
- Priority changed from low to normal
- Status changed from assigned to reviewing
#29
in reply to:
↑ 27
;
follow-up:
↓ 42
@
8 years ago
Replying to netweb:
Replying to nacin:
42.5% of all installs are on 5.0. However, our stats show that there are 59 total installs on 5.0.0, 5.0.1, and 5.0.2. And that number is probably inflated, based on how our stats collection works. I think it would be safe to bump the required version up a bit.
MySQL 5.0.x today is at 4.2% https://wordpress.org/about/stats/
..
Is now the future?
Although MySQL 5.0.x is only 4.2%, the install count is still in the millions.
MySQL 5.0.3 was the first to support > 255char for VARCHAR.
Stats currently show there are 3 sites on MySQL 5.0.2 (running WordPress 2.8 & 3.0.1), and 36 sites on < MySQL 5.0.15 (Ranges from WordPress 2.8 ~ 4.5.3). To clarify, We have no sites which are recorded as currently running MySQL 5.0.0/5.0.1/5.0.3 in the WordPress.org stats.
If we wanted to bump the requirements in order to get >255 char for VARCHAR, then it's currently safe to bump it to 5.0.4 without any real user impact.
#30
@
8 years ago
The suggested limit of 400 seems still not enough, as it only allows up to 66 encoded characters. Would like to explore our options here.
#31
@
8 years ago
It's also worth remembering that we're limited by the 767-byte index limits, so post_name
can only ever be indexed by the first 767bytes / 4bytes in utf8mb4 = 191 characters, so for encoded character urls, only by the first 31 characters if they're stored encoded in the database.
#32
@
8 years ago
Quoting SergeyBiryukov in #meta1884]:
For a language where most words are longer than in English, this is often insufficient. English topic slugs, on the other hand, allow up to 80 characters (which is the
maxlength
attribute on the input).
500 (rounded up from 484) would allow for 80 characters, kind of bringing it line with the above English max of 80.
#33
@
8 years ago
80 was a reference to the topic title input on support forums, it looks like the title input on Edit Post screen doesn't have a maxlength
attribute at all. 80 is obviously better than 33 though :)
This ticket was mentioned in Slack in #core by helen. View the logs.
8 years ago
#35
@
8 years ago
@SergeyBiryukov Still hoping to get something done for 4.7? I'm worrying that the window for something like this is coming to a close.
This ticket was mentioned in Slack in #core by chriscct7. View the logs.
8 years ago
#39
follow-up:
↓ 41
@
8 years ago
Hello, there is full instruction on russian lang. How to change url length in WordPress for cyrillic or IDN domains.
Всем русскоговорящим привет! Здесь есть подробная инструкция по изменению длины URL в Wordpress
Вот решение: https://ardua.ru/cyrillic-length-url
Все правится ручками.
Два файла formatting.php и schema.php и изменение размера поля в БД (phpmyadmin или SQL запрос)
#41
in reply to:
↑ 39
@
7 years ago
Replying to savvadesogle:
Hello, there is full instruction on russian lang. How to change url length in WordPress for cyrillic or IDN domains.
Всем русскоговорящим привет! Здесь есть подробная инструкция по изменению длины URL в Wordpress
Вот решение: https://ardua.ru/cyrillic-length-url
Все правится ручками.
Два файла formatting.php и schema.php и изменение размера поля в БД (phpmyadmin или SQL запрос)
А можно ли сделать чтобы ВП записывал в базу урл на кирилице, а не конвертировал его? Из-за этой конвертации когда ставишь через редактор ссылку на внутреннюю страницу вместо /сайт.рф/адрес-страницы-поста/ получается такое /%d1%81%d0%b0%d0%bb%d0%b0%d1%82-%d0%b7-%d0%ba%d1%83%d1%80%d0%ba%d0%b8-%d1%96-%d1%88%d0%b0%d0%bc%d0%bf%d1%96%d0%bd%d1%8c%d0%b9%d0%be%d0%bd%d1%96%d0%b2/ . В этом же формате урл ставится в rel="canonical".
---
Пытался написать этот в комментариях на вашем сайте, но там постоянно ругается что неверная каптча, хотя все правильно отвечал.
#42
in reply to:
↑ 29
;
follow-up:
↓ 43
@
7 years ago
Replying to dd32:
Although MySQL 5.0.x is only 4.2%, the install count is still in the millions.
MySQL 5.0.3 was the first to support > 255char for VARCHAR.
Stats currently show there are 3 sites on MySQL 5.0.2 (running WordPress 2.8 & 3.0.1), and 36 sites on < MySQL 5.0.15 (Ranges from WordPress 2.8 ~ 4.5.3). To clarify, We have no sites which are recorded as currently running MySQL 5.0.0/5.0.1/5.0.3 in the WordPress.org stats.
If we wanted to bump the requirements in order to get >255 char for VARCHAR, then it's currently safe to bump it to 5.0.4 without any real user impact.
I see MySQL 5.0 is now down to 2.2%
@dd32 Hows the extended stats look that you have access to, could we bump this for WP 5.0?
#43
in reply to:
↑ 42
;
follow-up:
↓ 44
@
7 years ago
Replying to netweb:
I see MySQL 5.0 is now down to 2.2%
@dd32 Hows the extended stats look that you have access to, could we bump this for WP 5.0?
@netweb:
MySQL 5.0 is at ~1.0% for WP 4.8+4.9.
MySQL 5.1 is at ~2.5% for WP 4.8+4.9.
In order to take baby steps forward, I've dived into the data to figure out what the minimum version numbers we can update to without even thinking twice about it.
- 5.0.90~5.0.96 accounts for 85% of sites running MySQL 5.0.
- 5.0.44+ is where the majority of sites are who aren't running the latest releases starts - That would account for 98% of the 1.0% running MySQL 5.0.x
- 5.0.16+ is the next major point worth mentioning. At this point the percentage of sites I'm talking about is so small it's not worth calculating.
- 5.0~5.0.15 is used by a handful of recent sites, but mostly WP 2.x/3.x.
If we want to bump the MySQL version, just because we can, pick a version - 5.0.16
or 5.0.44
would be my suggestion - either of them would allow this ticket to be fixed. It just depends on what bugs/features were added during the MySQL 5.0 cycle.
Bumping to 5.0.16
is a no-brainer. Bumping to 5.0.44
might affect a few people, but I doubt you'd hear about it. I think if we accidentally started using a 5.0.44
feature and didn't bump the requirements, we'd probably not even hear about it.
If we want to drop support for MySQL 5.0, I'd ask for a feature that 5.1 has that we need. There's enough sites on 5.0.90+ which makes keeping 5.0 support worthwhile, for now.
#44
in reply to:
↑ 43
@
7 years ago
- Milestone changed from Future Release to 5.0
Replying to dd32:
Bumping to
5.0.16
is a no-brainer. Bumping to5.0.44
might affect a few people, but I doubt you'd hear about it. I think if we accidentally started using a5.0.44
feature and didn't bump the requirements, we'd probably not even hear about it.
Thanks for these stats @dd32 :+1:
Let's bump it to 5.0.16
for WP 5.0, that gets this ticket fixed with the lowest impact
Just as the Browse Happy and Server Happy teams that came before, the Query Happy team will be just taking off and ready to bump the MySQL minimum required version to 5.0.44
for WordPress 5.1...
In writing the above in jest, it's not the worst idea I've had today...
#46
@
6 years ago
- Milestone changed from 5.1 to 5.2
Per this post, we're bumping the minimum MySQL version to 5.5 in April, which will probably by WordPress 5.2.
#47
@
6 years ago
- Milestone changed from 5.2 to 5.3
This ticket has not received any attention during the 5.2 cycle. With beta 1 tomorrow, going to punt this to 5.3, especially since it is marked early
.
This ticket was mentioned in Slack in #core by mcmwebsol. View the logs.
5 years ago
#51
@
5 years ago
@SergeyBiryukov The latest patch on this one is still good. Can this be considered for 5.3 Beta 1 landing tomorrow?
#52
@
5 years ago
- Milestone changed from 5.3 to Future Release
With version 5.3 Beta 1 releasing shortly, the deadline for enhancements is now passed. This is being moved to Future Release
. @SergeyBiryukov if you feel this can be included in 5.4, feel free to move up the milestone.
#53
@
4 years ago
Bump!
Mysql 5.0 is getting even more rare, Wordpress 5.3 had been already released, 5.4 too...
Any plans to ever include this patch?
#55
in reply to:
↑ 54
@
4 years ago
Replying to mcmwebsol:
I'm afraid it's too late for WordPress 5.5.
Is it though? @SergeyBiryukov are you still interested in working on this? I am happy to test it.
if my memory serves me well, the protocol actually assumes a uri is never longer than 255 chars.