#11669 closed defect (bug) (worksforme)
There's a problem to show one letter, and it cuts taxonomies names
Reported by: | maorb | Owned by: | hakre |
---|---|---|---|
Milestone: | Priority: | high | |
Severity: | major | Version: | 2.9 |
Component: | Charset | Keywords: | close |
Focuses: | Cc: |
Description
The Hebrew letter called "nun" (ascii char: 144, Unicode: 05E0) ( http://www.htmlescape.net/escape_hebrew.html ) began causing problems in 2.9.
When trying to add that letter as part of the name of a category or post tag, a blank name is added instead.
Only when adding it directly to the DB table, it does appear.
Many people in Israel encountered this issue, but not all of them.
Some think it may be related to a problem with the PHP function preg_replace().
This needs an ASAP fix, since it breaks many Hebrew-based WP sites.
Change History (42)
#3
@
15 years ago
- Keywords reporter-feedback added
Please provide encoding information about your blog. The blogs encoding as well as the encoding of the database would be interesting.
Please provide tests to reproduce, so that this can be taken better care of.
#4
@
15 years ago
- Keywords needs-patch removed
Reviewed:
test -נ- end
could be added as category via ajax on the post editor screen.
test -נ- end 2
could be added as category w/o javascript on categories screen.
test -נ- end 2
could be added as tag w/o javascript on tags screen.
test-נ-end
could be added as tag w/o javascript on Edit Post screen.
test-נ-end2
could be added as tag via ajax javascript on Edit Post screen.
Therefore I was not able to reproduce on a clean install here.
#5
@
15 years ago
The slugs are encoded in the DB, the names are not. But 2.9 does treat it differently than 2.8. In 2.8, the slug appears in admin as %d7%a0, while in 2.9 the slug shows the character.
Can confirm that on a clean utf8 install here too. (That it seems to work as intended, both as a single character, and as part of a string)
I'm thinking it's due to the charset of the database in use differing from what WordPress thinks it is.
maorb: Can you open PhpMyAdmin (Or any other DB viewing app) and have a look and see if it lists the Charset/Collation in use for the database tables?
#6
@
15 years ago
DB collation is utf8_general_ci for both the tables and the database.
The problem might occur only on local XAMPP/WAMP installations and on Windows servers, but that is not certain. It might also be a PHP issue and not a WordPress one, but up to 2.8.6 the bug didn't exist.
Are there any PHP functions in use in 2.9 that were not in use before?
This bug's behavior is not yet fully understood, since it doesn't occur for all blogs and sites.
Here is the link to the discussion of this issue in the wpheb Google group (the discussion there is in Hebrew, so it's just for reference):
http://groups.google.co.il/group/wpheb/browse_thread/thread/996f0e258f75e59?hl=iw
#7
@
15 years ago
I'm running utf8_unicode_ci on the DB, also running XAMPP on Windows. I will test some other collation/charsets.
I feel we're going to need more reporter feedback on this. Can you get some more site admins in here, some of whom are experiencing this and some of whom are not, and ask them to share their setups?
#8
follow-up:
↓ 11
@
15 years ago
I'm having this problem with "nun" on one of my blogs. The problematic blog is 2.9, and setup on windows hosting - IIS7, php5.x.
MySQL version is 5.0
charset = UTF-8 Unicode (utf8)
connection collation = utf8_unicode_ci
From the posts on the hebrew forum, it looks like it has something to do with the windows environment.
#9
@
15 years ago
- Cc dshalgishira added
- Severity changed from normal to major
Hello all,
First I encountered this bug on my development environment WIN7 + XAmpp.
Then I read in the Hebrew group that it is only on XAMPP and not on live servers.
Then I installed it on the hosting, which is Windows server hosting, and it happens there too.
I think it is related to windows.
Daniel
#10
@
15 years ago
I tested OK on Windows7/Apache2/PHP/MySQL all custom installed, virtually using all-defaults.
#11
in reply to:
↑ 8
@
15 years ago
Replying to margolis:
I'm having this problem with "nun" on one of my blogs. The problematic blog is 2.9, and setup on windows hosting - IIS7, php5.x.
MySQL version is 5.0
charset = UTF-8 Unicode (utf8)
connection collation = utf8_unicode_ci
From the posts on the hebrew forum, it looks like it has something to do with the windows environment.
I'm running the exact same setup, except that I'm running Apache instead of IIS. But the original reporter here was using XAMPP.
The only difference I think is that we're not running the same locales...
Can those reporting the problem here disable all plugins, or run a clean install of WP 2.9 on the same server?
#12
@
15 years ago
I tested this in a post and as the tag at my development site: http://wpdev.doublelenterprises.com/2009/12/31/collation-testing/
I am running IIS 7 on Server 2008, with PHP 5.3.1 and MySQL 5.1.39
It appears fine.
Could this be related to the php/mysql combination?
What is your Windows server hosting config?
#13
@
15 years ago
@nacin - That problem occurs also on a fresh install with no plugins installed.
@kcristiano - I saw in your test that you added the letter "nun" inside a post, but the problem is not inside a post or page, but when adding the letter as part of a category or tag names.
#14
@
15 years ago
- Cc tomerc+core.trac.wordpress.org@… added
I have tested some PHP versions on Windows and was unable to reproduce this situation. Can you please provide some steps to reproduce?
Does this issue also occur using Erez Wolf's testcase which he published on the wpheb mailing list linked above?
<?php
echo preg_replace('/\s+/', ' ', 'אבגדהוזחטיכלמנסעפצקרשת');
?>
#15
@
15 years ago
Okay, I translated the thread (thanks, Google), and Erez Wolf points to a line in wp_strip_all_tags(), but the line was changed in [12501], via #11528, for 2.9.1.
Can anyone confirm that's what caused the problem? I can't get the test case above to strip the "nun" character, but since that's the test case (that fails?)...
#16
@
15 years ago
@maorb- I did place "nun" in the post, but that was also in the tag and the category. I could not get it to replicate. That is why I was curious if this pointed to an issue with php/mysql.
#17
@
15 years ago
As @nacin points out, there was a problem with one Cyrillic letter replaced by that regex. Could somebody that can reproduce this run the following test (from Tomer's comment):
echo preg_replace('/\s+/', ' ', 'אבגדהוזחטיכלמנסעפצקרשת') . '<br />';
echo preg_replace('/[\r\n\t ]+/', ' ', 'אבגדהוזחטיכלמנסעפצקרשת') . '<br />';
echo 'אבגדהוזחטיכלמנסעפצקרשת'; // for comparison
#18
@
15 years ago
... there was a problem with one Cyrillic letter replaced by that regex. Could somebody that can reproduce this run the following test
Confirmed here:
<?php
echo preg_replace('/\s+/', ' ', 'אבגדהוזחטיכלמנסעפצקרשת') . '<br />';
echo preg_replace('/[\r\n\t ]+/', ' ', 'אבגדהוזחטיכלמנסעפצקרשת') . '<br />';
echo 'אבגדהוזחטיכלמנסעפצקרשת'; // for comparison
?>
אבגדהוזחטיכלמ� סעפצקרשת
אבגדהוזחטיכלמנסעפצקרשת
אבגדהוזחטיכלמנסעפצקרשת
Note the missing/malformed char in the 1st line.
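One way to see why only a single letter is damaged is to look at the raw bytes of the test string. The sketch below is an editorial illustration, not code from WordPress or from the thread; it assumes the file is saved as UTF-8 and lists every letter whose encoding contains the byte a0. That byte is the no-break space in Latin-1 style single-byte charsets, so a byte-oriented \s could plausibly match it, depending on the locale tables PCRE is using.

```php
<?php
// Editorial sketch: find the letters of the test string whose UTF-8
// encoding contains the byte 0xA0. Assumes this file is saved as UTF-8.
$alphabet = 'אבגדהוזחטיכלמנסעפצקרשת';

// Split into characters; the /u modifier makes PCRE split on UTF-8
// code points instead of single bytes.
$letters = preg_split( '//u', $alphabet, -1, PREG_SPLIT_NO_EMPTY );

$suspects = array();
foreach ( $letters as $letter ) {
    // strpos() compares raw bytes, which is what we want here.
    if ( false !== strpos( $letter, "\xA0" ) ) {
        $suspects[] = $letter;
    }
    // Print each letter next to its hex-encoded bytes.
    printf( "%s = %s\n", $letter, bin2hex( $letter ) );
}

echo 'Letters containing byte a0: ', implode( ', ', $suspects ), "\n";
```

Every letter in the string is encoded as d7 followed by a second byte, and "nun" is the only one whose second byte is a0, which matches the single malformed character in the first output line above.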
#21
follow-up:
↓ 23
@
15 years ago
- Resolution duplicate deleted
- Status changed from closed to reopened
#23
in reply to:
↑ 21
@
15 years ago
- Milestone changed from 2.9.2 to 2.9.1
- Resolution set to fixed
- Status changed from reopened to closed
#24
follow-up:
↓ 40
@
15 years ago
- Resolution fixed deleted
- Status changed from closed to reopened
With the Hebrew letter called "nun" (ascii char: 144, Unicode: 05E0) I still have problems on current trunk browsing the tag containing that char: I get a 404.
Additionally I would not count 05 nor E0 as \s. [12501] only narrowed \s to [\r\n\t ], so I cannot see a match here on the binary level.
Additionally this ticket might be more related to #11175 than to #11528.
For the letter called "nun" please try to reproduce and confirm that it is not working:
- create a tag called test -נ-.
- assign that tag to a published post.
- view that post on the blog.
- visit the archive-link for the test -נ- tag.
Expected result: You should view the archive for the tag containing at least one post.
Actual result: You get a 404.
#25
@
15 years ago
I need to correct that, I actually made some tests and the binary representation of that "nun" letter is d7 a0 (two byte char). I also tested against the /u modifier and it solves the problem.
$source = 'נ';
echo string_dump($source) . '<br>';
echo preg_replace( '/\s+/', ' ', $source ) . '<br>';
echo preg_replace( '/[\r\n\t ]+/', ' ', $source ) . '<br>';
echo preg_replace( '/\s+/u', ' ', $source ) . '<br>';
echo $source; // for comparison

function string_dump($string) {
    $l = strlen( $string );
    $dump = sprintf( '%d:', $l );
    if ( $l )
        for ( $i = 0; $i < $l; $i++ )
            $dump .= sprintf( ' %x', ord( $string[$i] ) );
    return $dump;
}
Output:
2: d7 a0
�
נ
נ
נ
So the cases are actually connected compared to the binary data. But still I get a 404 on browsing the tag.
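The two workarounds tested above (the /u modifier and the explicit character class from [12501]) can be combined into one helper. This is a minimal sketch under stated assumptions: collapse_whitespace() is a hypothetical name, not a WordPress function, and the fallback branch is an editorial addition for input that is not valid UTF-8.

```php
<?php
// Hypothetical helper for illustration only; not part of WordPress.
function collapse_whitespace( $text ) {
    // With /u the subject is parsed as UTF-8, so the trailing a0 byte
    // of "nun" (d7 a0) can never be matched on its own.
    $collapsed = preg_replace( '/\s+/u', ' ', $text );

    // preg_replace() returns null when /u meets invalid UTF-8. In that
    // case fall back to the explicit ASCII-only class used in [12501].
    if ( null === $collapsed ) {
        $collapsed = preg_replace( '/[\r\n\t ]+/', ' ', $text );
    }

    return $collapsed;
}

// "nun" is written as raw bytes so the example does not depend on the
// encoding of this file.
$name  = "test \t-\xD7\xA0-\r\n end";
$clean = collapse_whitespace( $name );

echo $clean, "\n";
echo bin2hex( $clean ), "\n";
```

The hex dump makes it easy to confirm that the d7 a0 pair survives intact while runs of ASCII whitespace are reduced to single spaces.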
#27
@
15 years ago
I saw the #11619 ticket attachments (img1 img2), and here is what I think:
This problem affects only "name" fields; the "slug" works OK. The slug uses the 'editable_slug' filter to fix non-English character issues (see Ticket #10966).
Resolution for this issue:
- Use the 'editable_slug' filter on the "name" field
- Create a new filter for the "name" field to fix non-English character issues.
#28
@
15 years ago
- Cc margolis added
- Milestone changed from 2.9.1 to 2.9.3
- Version changed from 2.9 to 2.9.2
When the slug is in Hebrew and contains "nun" ("נ"), the tag page returns an error. The page works only after the slug is replaced.
This also affects plugins like search-unleashed when trying to replace the tag page with a search page.
Still happening in 2.9.2
#29
@
15 years ago
- Version changed from 2.9.2 to 2.9
Please leave the Version field set to the version in which the bug was originally reported; this allows better tracking of reported bugs.
#32
in reply to:
↑ 31
@
15 years ago
Replying to nacin:
Replying to hakre:
Ticket #11619 should be re-tested after this is fixed in 3.0.
That's assuming there is a patch. (which are welcome)
Sorry, but nope. It's an advice only in case it would. I know that's not much, just read the 3.0 if you wanna punt as 3.0 or above. (you need to deal with my unwelcomed patches first nacin).
#36
@
14 years ago
- Keywords needs-patch added; reporter-feedback removed
- Milestone changed from 3.1 to Future Release
#40
in reply to:
↑ 24
;
follow-up:
↓ 41
@
13 years ago
- Milestone Future Release deleted
- Resolution set to worksforme
- Status changed from reopened to closed
Replying to hakre:
For the letter called "nun" please try to reproduce and confirm that it is not working:
- create a tag called test -נ-.
- assign that tag to a published post.
- view that post on the blog.
- visit the archive-link for the test -נ- tag.
Expected result: You should view the archive for the tag containing at least one post.
Works for me in current trunk.
Some quick testing on the en_US locale:
I can add the character (a single character, or as part of a word) as a tag and a category on 2.8, 2.9 and trunk.
The slugs are encoded in the DB, the names are not. But 2.9 does treat it differently than 2.8. In 2.8, the slug appears in admin as %d7%a0, while in 2.9 the slug shows the character.
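The two ways the slug is displayed come down to the same two bytes. As a small illustration (an editorial sketch using plain PHP functions, not WordPress's own slug handling), percent-encoding the letter gives the form shown in the 2.8 admin, and decoding it gives the character shown in 2.9:

```php
<?php
// "nun" (U+05E0) written as its two UTF-8 bytes.
$letter = "\xD7\xA0";

// rawurlencode() emits upper-case hex digits; lower-case them to match
// the form quoted above.
$encoded = strtolower( rawurlencode( $letter ) );
echo $encoded, "\n"; // %d7%a0

// Decoding restores the original two bytes.
$decoded = rawurldecode( $encoded );
echo bin2hex( $decoded ), "\n"; // d7a0
```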