Opened 5 years ago

Last modified 22 months ago

#10483 reopened enhancement

Change post_name's length from 200 to 400

Reported by: elnur
Owned by: ryan
Milestone: Future Release
Priority: low
Severity: minor
Version:
Component: Permalinks
Keywords: dev-feedback needs-patch
Focuses:
Cc:

Description

Hello, guys! Thank you very much for providing such a great piece of software! I love WordPress very much! :)

I use WordPress in Russian and the URLs on my blog consist of Russian characters. There is a post whose URL is not that long in Russian, but since it gets encoded to special characters it becomes too long to fit into the post_name field of the posts table.

I've found what code needs to be changed to increase the length. I make these changes every time a new version is released. I think it would be better to submit a patch here so that other people can benefit from it and I will not need to make those changes every release.

I'm attaching the patch to this ticket and asking you to apply it to the code.

Thank you very much again, guys! You do a great job! :)

Cheers,
Elnur

Attachments (1)

post_name_patch.diff (1.1 KB) - added by elnur 5 years ago.


Change History (23)


comment:1 Denis-de-Bernardy 5 years ago

if my memory serves me well, the protocol actually assumes a uri is never longer than 255 chars.

comment:2 follow-up: elnur 5 years ago

This link http://www.ielnur.com/blog/2009/05/%d1%81%d0%bd%d0%be%d0%b2%d0%b0-%d0%b1%d1%80%d0%be%d1%81%d0%b8%d1%82%d1%8c-%d0%ba%d1%83%d1%80%d0%b8%d1%82%d1%8c-30-%d1%82%d0%b8%d0%b4%d0%bd%d0%b5%d0%b2%d0%bd%d0%be%d0%b5-%d0%b8%d1%81%d0%bf%d1%8b%d1%82%d0%b0%d0%bd%d0%b8%d0%b5/ is 258 chars long, post_name part is 223. It works fine with our local search engine http://www.yandex.ru and with http://www.google.ru as well.

But I might be wrong in trying to change post_name's length to 400. I remembered that varchar can't hold more than 255 chars. Isn't that true?
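
For reference, the lengths quoted in this comment can be checked with a short Python sketch. It assumes the slug decodes to the Cyrillic text below (obtained by decoding the link above) and uses only the standard `urllib.parse` functions; the variable names are illustrative.

```python
from urllib.parse import quote, unquote

# Decoded form of the post_name taken from the link above.
slug = "снова-бросить-курить-30-тидневное-испытание"

# Percent-encode the UTF-8 bytes; lowercased to match how the slug appears in the URL.
encoded = quote(slug).lower()

print(len(slug))     # 43 characters as typed
print(len(encoded))  # 223 characters once urlencoded

# Round trip: decoding the stored form gives back the typed slug.
assert unquote(encoded) == slug
```

Each Cyrillic letter takes two bytes in UTF-8 and each byte becomes a three-character `%xx` escape, so the 36 letters alone account for 216 of the 223 characters.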

comment:3 in reply to: ↑ 2 Denis-de-Bernardy 5 years ago

Replying to elnur:

But I might be wrong in trying to change post_name's length to 400. I remembered that varchar can't hold more than 255 chars. Isn't that true?

yeah, until MySQL 5

comment:4 follow-up: elnur 5 years ago

So, wouldn't someone apply this patch to the code? :)

comment:5 in reply to: ↑ 4 Denis-de-Bernardy 5 years ago

  • Component changed from General to Permalinks
  • Milestone changed from 2.8.3 to Future Release
  • Owner set to ryan
  • Priority changed from normal to low
  • Severity changed from normal to minor

Replying to elnur:

So, wouldn't someone apply this patch to the code? :)

Well, no... Not until MySQL 5 is the default, anyway. And then we'd need to worry about URL length...

Punting to Future in the meanwhile.

comment:6 elnur 5 years ago

But what if we limit it to 255 chars?

comment:7 Denis-de-Bernardy 5 years ago

sure, but then there's the domain too, and the second point raised above.

comment:8 solarissmoke 3 years ago

  • Keywords close added

close as maybelater?

comment:9 RyanMurphy 3 years ago

  • Keywords dev-feedback added

Since we're going to MySQL5 in 3.2, can't this be considered for commit?

comment:10 nacin 3 years ago

  • Milestone Future Release deleted
  • Resolution set to maybelater
  • Status changed from new to closed

We're not adjusting anything for MySQL 5 at this time. Closing as maybelater.

comment:11 nacin 3 years ago

  • Keywords close removed
  • Milestone set to Future Release
  • Resolution maybelater deleted
  • Status changed from closed to reopened

Reopening for discussion.

comment:12 follow-up: linuxologos 3 years ago

  • Cc linuxologos@… added

Thanks for giving the opportunity to discuss this.

This is quite a big problem for languages with an alphabet totally different from English. We can't have a post_name with more than ~38 letters, since every letter is urlencoded to be stored in the database, so every letter is converted into many more characters, dramatically cutting down the maximum possible length of the "real" post-name.
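
The arithmetic behind this can be sketched in Python. The helper name `max_letters` is made up for illustration, and the sketch assumes the slug is stored percent-encoded as UTF-8, as described in this ticket.

```python
from urllib.parse import quote


def max_letters(sample_letter: str, column_width: int = 200) -> int:
    """How many copies of sample_letter fit in column_width once urlencoded."""
    return column_width // len(quote(sample_letter))


print(max_letters("a"))       # 200: ASCII letters are stored unchanged
print(max_letters("д"))       # 33: a two-byte letter becomes six characters
print(max_letters("д", 400))  # 66: with the proposed 400-character column
```

Slugs that mix in hyphens or digits fit slightly more, which may be where the ~38 figure comes from.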

comment:13 in reply to: ↑ 12 ; follow-up: hakre 3 years ago

Replying to linuxologos:

Thanks for giving the opportunity to discuss this.

This is quite a big problem for languages with an alphabet totally different from English. We can't have a post_name with more than ~38 letters, since every letter is urlencoded to be stored in the database, so every letter is converted into many more characters, dramatically cutting down the maximum possible length of the "real" post-name.

It's probably worth dropping the urlencoding inside the storage layer, then. AFAIK MySQL should be able to store UTF8 in columns, which would give 200 true UTF8 characters instead of 38 to 200 urlencoded ones drawn from a subset of US-ASCII.

The related refactorings could benefit the overall UTF8 support of the application as a bonus.

comment:14 in reply to: ↑ 13 ; follow-up: linuxologos 3 years ago

Replying to hakre:

Replying to linuxologos:

Thanks for giving the opportunity to discuss this.

This is quite a big problem for languages with an alphabet totally different from English. We can't have a post_name with more than ~38 letters, since every letter is urlencoded to be stored in the database, so every letter is converted into many more characters, dramatically cutting down the maximum possible length of the "real" post-name.

It's probably worth dropping the urlencoding inside the storage layer, then. AFAIK MySQL should be able to store UTF8 in columns, which would give 200 true UTF8 characters instead of 38 to 200 urlencoded ones drawn from a subset of US-ASCII.

The related refactorings could benefit the overall UTF8 support of the application as a bonus.

MySQL is indeed able to store UTF8, and that is already the case for post_content and post_title in the (wp_)posts table. They don't get urlencoded before being stored in the db. post_name is urlencoded though, and I'm not sure if it's technically safe to alter this.

comment:15 in reply to: ↑ 14 hakre 3 years ago

Replying to linuxologos:

Replying to hakre:

Replying to linuxologos:

[...]

MySQL is indeed able to store UTF8, and that is already the case for post_content and post_title in the (wp_)posts table. They don't get urlencoded before being stored in the db. post_name is urlencoded though, and I'm not sure if it's technically safe to alter this.

I have not said that this is a trivial change and in fact, I can not even say if the project would be able to perform such changes and a refactoring properly at all.


Replying to Denis-de-Bernardy:

if my memory serves me well, the protocol actually assumes a URI is never longer than 255 chars.

Indeed, RFC 2616 suggests avoiding URIs longer than 255 chars:

The HTTP protocol does not place any a priori limit on the length of
a URI. Servers MUST be able to handle the URI of any resource they
serve, and SHOULD be able to handle URIs of unbounded length if they
provide GET-based forms that could generate such URIs. A server
SHOULD return 414 (Request-URI Too Long) status if a URI is longer
than the server can handle (see section 10.4.15).

Note: Servers ought to be cautious about depending on URI lengths
above 255 bytes, because some older client or proxy
implementations might not properly support these lengths.

from: 3.2.1 General Syntax

Besides that cautionary 255-char limit, browsers impose a hard one. Microsoft Internet Explorer has the lowest limit, a little over 2000 characters according to WWW FAQs: What is the maximum length of a URL?. Generally these lengths assume one char = one byte, as in the US-ASCII encoding of an (urlencoded) URL, a subset of URI.

I think the 414 response, which is classified as SHOULD, is something WP doesn't do so far. I have no idea which overall parameters this depends on; I think they are undocumented so far, so they would first need to be worked out from the code base before tackling that problem, which is out of the scope of this ticket anyway.
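
As a rough illustration of the cautionary limit from the RFC note, a check could look like the Python sketch below. The function name, the constant name, and the example.com URLs are hypothetical; nothing like this is claimed to exist in WordPress.

```python
CAUTIOUS_URI_BYTES = 255  # threshold from the RFC 2616 note quoted above


def exceeds_cautious_length(uri: str) -> bool:
    """True if the URI is longer than the 255 bytes the RFC note warns about."""
    return len(uri.encode("utf-8")) > CAUTIOUS_URI_BYTES


# Hypothetical URLs: a short ASCII slug and a slug of 40 urlencoded two-byte letters.
short_uri = "http://example.com/blog/2009/05/hello-world/"
long_uri = "http://example.com/blog/2009/05/" + "%d1%81" * 40 + "/"

print(exceeds_cautious_length(short_uri))  # False
print(exceeds_cautious_length(long_uri))   # True
```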

comment:16 follow-up: ldebrouwer 3 years ago

  • Cc info@… added

I once did a clean URL conversion from the Cyrillic alphabet to the 'regular' alphabet in a different CMS. 'Дистрибьюторы' would become 'distributori' as a URL slug. This would solve the problem because I believe the latter is still valid Russian. To me this seems a good solution because it can also be applied to other character sets like ancient Greek. Please let me know if this is desired because then I will look into a WordPress patch for it.

comment:17 SergeyBiryukov 3 years ago

  • Keywords needs-patch added

Related: #16230

comment:18 in reply to: ↑ 16 ; follow-up: SergeyBiryukov 3 years ago

Replying to ldebrouwer:

I once did a clean URL conversion from the Cyrillic alphabet to the 'regular' alphabet in a different CMS. 'Дистрибьюторы' would become 'distributori' as a URL slug. This would solve the problem because I believe the latter is still valid Russian.

It's not Cyrillic, so it's not valid Russian. Transliteration is acceptable for some people (including me), and there are some plugins which transliterate post and term slugs, but it's only a workaround and not a long-term solution for everyone.

The best solution here would be to store slugs as is, not in urlencoded form, since urlencoding only allows 33 chars for non-English slugs, which is noticeably less than the original 200-character limit.

The problem is not only the length, though. Create two posts on 3.3-trunk with the same title:

Предлагаем супер металлообрабатывающее оборудование

The first one will just have a truncated slug:

предлагаем-супер-металлообрабатываю

But the second one will have a broken slug:

предлагаем-супер-металлообрабатыва�%-2
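
One way to arrive at exactly these two slugs is sketched below in Python. It assumes the slug is urlencoded and cut to the 200-character column width, and that for the duplicate it is cut a further two characters to make room for the "-2" suffix. This illustrates the effect only; it is not WordPress's actual code, and the variable names are made up.

```python
from urllib.parse import quote, unquote

# The title from above, lowercased and hyphenated (assumed slug form).
slug = "предлагаем-супер-металлообрабатывающее-оборудование"
column_width = 200
suffix = "-2"

encoded = quote(slug).lower()

# First post: the cut at 200 characters happens to fall between two letters.
first = encoded[:column_width]

# Second post: cutting at 198 characters splits the escape "%d1%8e" in half.
second = encoded[:column_width - len(suffix)] + suffix

print(unquote(first))   # предлагаем-супер-металлообрабатываю
print(second[-8:])      # %b0%d1%-2 ends with a dangling lead byte and a stray "%"
print(unquote(second))  # the lone byte decodes to U+FFFD, followed by "%-2"
```

Truncating by characters of the encoded form ignores the six-character units that make up each letter, which is why the second slug ends in an undecodable fragment.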

comment:20 ryan 2 years ago

http://dev.mysql.com/doc/refman/5.0/en/string-type-overview.html

"In MySQL 5.0, the range of M is 0 to 255 before MySQL 5.0.3, and 0 to 65,535 in MySQL 5.0.3 and later."

$required_mysql_version = '5.0';
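
Read together, the quoted rule and the required version can be expressed as a small Python sketch. The helper names are hypothetical, and the only fact used is the 5.0.3 cut-off from the MySQL manual quoted above.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '5.0.22' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def max_varchar_length(mysql_version: str) -> int:
    """Upper bound for M in VARCHAR(M), per the manual excerpt quoted above."""
    return 65535 if parse_version(mysql_version) >= (5, 0, 3) else 255


for version in ("5.0", "5.0.2", "5.0.3", "5.0.22"):
    limit = max_varchar_length(version)
    # Would a 400-character post_name column be allowed on this version?
    print(version, limit, limit >= 400)
```

A bare requirement of '5.0' still admits 5.0.0 to 5.0.2, where the limit is 255, so it does not by itself guarantee that a 400-character column is possible.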

comment:21 nacin 2 years ago

As discussed during the 3.2 cycle, we had aimed to choose 5.0.22. Version 5.0.15 was the first production version, and 5.0.22 was the first with any real usage. We didn't push it because we did not identify any version-specific things we would have wanted.

42.5% of all installs are on 5.0. However, our stats show that there are 59 total installs on 5.0.0, 5.0.1, and 5.0.2. And that number is probably inflated, based on how our stats collection works. I think it would be safe to bump the required version up a bit.

comment:22 in reply to: ↑ 18 SergeyBiryukov 22 months ago

Replying to SergeyBiryukov:

But the second one will have a broken slug:

предлагаем-супер-металлообрабатыва�%-2

Related: #21013
