Make WordPress Core

Opened 14 years ago

Closed 12 years ago

Last modified 12 years ago

#11669 closed defect (bug) (worksforme)

One letter fails to display and cuts off taxonomy names

Reported by: maorb Owned by: hakre
Milestone: Priority: high
Severity: major Version: 2.9
Component: Charset Keywords: close
Focuses: Cc:

Description

The Hebrew letter called "nun" (ascii char: 144, Unicode: 05E0) ( http://www.htmlescape.net/escape_hebrew.html ) started causing problems in 2.9.
When you try to add that letter as part of the name of a category or post tag, it just adds a blank name.
The letter appears only when it is added directly to the DB table.

Many people in Israel encountered this issue, but not all of them.
Some think it may be caused by a problem with the PHP function preg_replace().

This needs a fix ASAP, since it breaks many Hebrew-based WP sites.

Change History (42)

#1 @sirzooro
14 years ago

  • Keywords needs-patch added
  • Milestone changed from Unassigned to 2.9.1

#2 @nacin
14 years ago

Some quick testing on the en_US locale:
I can add the character (a single character, or as part of a word) as a tag and a category on 2.8, 2.9 and trunk.

The slugs are encoded in the DB, the names are not. But 2.9 does treat it differently than 2.8. In 2.8, the slug appears in admin as %d7%a0, while in 2.9 the slug shows the character.
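
For reference, a minimal sketch (not WordPress code) of how the two UTF-8 bytes of "נ" relate to the %d7%a0 form mentioned above. The variable names are illustrative only, and the letter is written as explicit bytes so the result does not depend on the file's encoding.

```php
<?php
// The letter nun (U+05E0) written as explicit UTF-8 bytes.
$nun = "\xD7\xA0";

// Percent-encode each byte; lowercased to match the form shown in the 2.8 admin.
$encoded = strtolower( rawurlencode( $nun ) );
echo $encoded, "\n"; // %d7%a0

// Decoding returns the original two bytes.
$decoded = rawurldecode( $encoded );
echo bin2hex( $decoded ), "\n"; // d7a0
```

The second byte, 0xA0, is the one that matters for the rest of this ticket.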

#3 @hakre
14 years ago

  • Keywords reporter-feedback added

Please provide encoding information about your blog. Both the blog's encoding and the encoding of the database would be interesting.

Please provide steps to reproduce so that this can be looked into properly.

#4 @hakre
14 years ago

  • Keywords needs-patch removed

Reviewed:

test -נ- end could be added as category via ajax on the post editor screen.
test -נ- end 2 could be added as category w/o javascript on categories screen.
test -נ- end 2 could be added as tag w/o javascript on tags screen.
test-נ-end could be added as tag w/o javascript on Edit Post screen.
test-נ-end2 could be added as tag via ajax javascript on Edit Post screen.

Therefore I was not able to reproduce on a clean install here.

#5 @dd32
14 years ago

The slugs are encoded in the DB, the names are not. But 2.9 does treat it differently than 2.8. In 2.8, the slug appears in admin as %d7%a0, while in 2.9 the slug shows the character.

Can confirm that on a clean utf8 install here too. (That it seems to work as intended, both as a single character, and as part of a string)

I'm thinking it's due to the charset of the database in use differing from what WordPress thinks it is.

maorb: Can you open PhpMyAdmin (Or any other DB viewing app) and have a look and see if it lists the Charset/Collation in use for the database tables?

#6 @maorb
14 years ago

The DB collation is utf8_general_ci for both the tables and the database.
The problem might occur only on local XAMPP/WAMP installations and on Windows servers, but that is not certain. It might also be a PHP issue rather than a WordPress one, but the bug did not exist up to and including 2.8.6.
Are there any PHP functions that were not in use before 2.9?
This bug's behavior is not yet fully understood, since it doesn't occur for all blogs and sites.

Here is the link to the discussion of this issue in the wpheb Google group (the discussion there is in Hebrew, so it's just for reference):
http://groups.google.co.il/group/wpheb/browse_thread/thread/996f0e258f75e59?hl=iw

#7 @nacin
14 years ago

I'm running utf8_unicode_ci on the DB, also running XAMPP on Windows. I will test some other collation/charsets.

I feel we're going to need more reporter feedback on this. Can you get some more site admins in here, some of whom are experiencing this and some of whom are not, and ask them to share their setups?

#8 follow-up: @margolis
14 years ago

I'm having this problem with "nun" on one of my blogs. The problematic blog is 2.9, and setup on windows hosting - IIS7, php5.x.
MySQL version is 5.0
charset = UTF-8 Unicode (utf8)
connection collation = utf8_unicode_ci
From the posts on the hebrew forum, it looks like it has something to do with the windows environment.

#9 @dshalgishira
14 years ago

  • Cc dshalgishira added
  • Severity changed from normal to major

Hello all,
First I encountered this bug on my development environment WIN7 + XAmpp.
Then I read in the Hebrew group that it is only on XAMPP and not on live servers.
Then I installed it on the hosting, which is a Windows server, and it happens there too.
I think it is related to windows.
Daniel

#10 @dd32
14 years ago

I tested OK on Windows7/Apache2/PHP/MySQL all custom installed, virtually using all-defaults.

#11 in reply to: ↑ 8 @nacin
14 years ago

Replying to margolis:

I'm having this problem with "nun" on one of my blogs. The problematic blog is 2.9, and setup on windows hosting - IIS7, php5.x.
MySQL version is 5.0
charset = UTF-8 Unicode (utf8)
connection collation = utf8_unicode_ci
From the posts on the hebrew forum, it looks like it has something to do with the windows environment.

I'm running the exact same setup, except that I'm running Apache instead of IIS. But the original reporter here was using XAMPP.

The only difference I think is that we're not running the same locales...

Can those reporting the problem here disable all plugins, or run a clean install of WP 2.9 on the same server?

#12 @kcristiano
14 years ago

I tested this in a post and as the tag at my development site: http://wpdev.doublelenterprises.com/2009/12/31/collation-testing/

I am running IIS 7 on Server 2008, with PHP 5.3.1 and MySQL 5.1.39

It appears fine.

Could this be related to the php/mysql combination?

What is your Windows server hosting config?

#13 @maorb
14 years ago

@nacin - That problem occurs also on a fresh install with no plugins installed.

@kcristiano - I saw in your test that you added the letter "nun" inside a post, but the problem is not inside a post or page, but when adding the letter as part of a category or tag names.

#14 @Tomer
14 years ago

  • Cc tomerc+core.trac.wordpress.org@… added

I have tested some PHP versions on Windows and was unable to reproduce this situation. Can you please provide some steps to reproduce?

Does this issue also occur using Erez Wolf's testcase which he published on the wpheb mailing list linked above?

<?php
echo preg_replace('/\s+/', ' ', 'אבגדהוזחטיכלמנסעפצקרשת');
?>

#15 @nacin
14 years ago

Okay, I translated the thread (thanks, Google), and Erez Wolf points to a line in wp_strip_all_tags(), but the line was changed in [12501], via #11528, for 2.9.1.

Can anyone confirm that's what caused the problem? I can't get the test case above to strip the "nun" character, but since that's the test case (that fails?)...

#16 @kcristiano
14 years ago

@maorb- I did place "nun" in the post, but that was also in the tag and the category. I could not get it to replicate. That is why I was curious if this pointed to an issue with php/mysql.

#17 @azaozz
14 years ago

As @nacin points out, there was a problem with one Cyrillic letter replaced by that regex. Could somebody that can reproduce this run the following test (from Tomer's comment):

echo preg_replace('/\s+/', ' ', 'אבגדהוזחטיכלמנסעפצקרשת') . '<br />';
echo preg_replace('/[\r\n\t ]+/', ' ', 'אבגדהוזחטיכלמנסעפצקרשת') . '<br />';
echo 'אבגדהוזחטיכלמנסעפצקרשת'; // for comparison

#18 @dd32
14 years ago

there was a problem with one Cyrillic letter replaced by that regex. Could somebody that can reproduce this run the following test

Confirmed here:

<?php

 echo preg_replace('/\s+/', ' ', 'אבגדהוזחטיכלמנסעפצקרשת') . '<br />'; 
 echo preg_replace('/[\r\n\t ]+/', ' ', 'אבגדהוזחטיכלמנסעפצקרשת') . '<br />';
 echo 'אבגדהוזחטיכלמנסעפצקרשת'; // for comparison

?> 
אבגדהוזחטיכלמ� סעפצקרשת
אבגדהוזחטיכלמנסעפצקרשת
אבגדהוזחטיכלמנסעפצקרשת

Note the missing/malformed char in the 1st line.
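
A possible explanation, sketched under an assumption rather than confirmed for every setup: without the /u modifier PCRE matches single bytes, and on the affected systems the byte 0xA0 appears to be classified as whitespace. Since "נ" is stored in UTF-8 as the bytes d7 a0, replacing a0 with a space leaves a dangling d7. The snippet below simulates that classification explicitly with [\s\xA0] instead of relying on the server's locale, and uses only three letters (mem, nun, samekh) written as raw bytes.

```php
<?php
// mem, nun, samekh as explicit UTF-8 bytes: d7 9e, d7 a0, d7 a1.
$letters = "\xD7\x9E\xD7\xA0\xD7\xA1";

// Assumption: this mimics '/\s+/' on a setup where byte 0xA0 counts as whitespace.
$damaged = preg_replace( '/[\s\xA0]+/', ' ', $letters );

echo bin2hex( $letters ), "\n"; // d79ed7a0d7a1
echo bin2hex( $damaged ), "\n"; // d79ed720d7a1

// preg_match() with /u returns false when the subject is not valid UTF-8.
$valid_before = ( 1 === preg_match( '//u', $letters ) );
$valid_after  = ( 1 === preg_match( '//u', $damaged ) );

var_dump( $valid_before ); // bool(true)
var_dump( $valid_after );  // bool(false)
```

The lone d7 byte followed by a space is what a browser renders as the replacement character in the first output line.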

#19 @azaozz
14 years ago

  • Resolution set to duplicate
  • Status changed from new to closed

Great, thanks @dd32. So we can close this ticket as duplicate of #11528 fixed in [12501]. Feel free to reopen if the problem persists.

#20 @hakre
14 years ago

Related: #11724

#21 follow-up: @hakre
14 years ago

  • Resolution duplicate deleted
  • Status changed from closed to reopened

The fix in #11528 / [12501] is a placebo. I suggested there to add the u-modifier. Can not confirm that this is actually a duplicate here. It might look healed but indeed it is not.

#22 @Denis-de-Bernardy
14 years ago

  • Milestone changed from 2.9.1 to 2.9.2

#23 in reply to: ↑ 21 @azaozz
14 years ago

  • Milestone changed from 2.9.2 to 2.9.1
  • Resolution set to fixed
  • Status changed from reopened to closed

Replying to hakre:
Can you supply some proof when reopening if you think this is not a duplicate of #11528, or that the problem with mangling the "נ" character persists?

#24 follow-up: @hakre
14 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

With the Hebrew letter called "nun" (ascii char: 144, Unicode: 05E0) I still have problems on current trunk to browse the tag containing that char. I get 404.

Additionally I would not count 05 nor E0 as \s. [12501] only narrowed \s to the subset [\r\n\t ]. So I can not see a match here on the binary level.
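
A side note on the binary level, as a minimal sketch: assuming the name is stored as UTF-8 (which the charset settings reported above suggest), the digits of the code point 05E0 are not the bytes that the regex sees. The helper name below is hypothetical and only covers the two-byte range.

```php
<?php
// Encode a code point in the two-byte UTF-8 range (U+0080..U+07FF) by hand.
// Hypothetical helper for illustration only.
function utf8_two_byte( $code_point ) {
	$lead  = 0xC0 | ( $code_point >> 6 );   // 110xxxxx: upper five bits
	$trail = 0x80 | ( $code_point & 0x3F ); // 10xxxxxx: lower six bits
	return chr( $lead ) . chr( $trail );
}

echo bin2hex( utf8_two_byte( 0x05E0 ) ), "\n"; // d7a0
```

So the bytes to check against \s are d7 and a0, not 05 and e0.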

Additionally this ticket might be more related to #11175 than to #11528.

For the letter called "nun" please try to reproduce and confirm that it is not working:

  1. create a tag called test -נ-.
  2. assign that tag to a published post.
  3. view that post on the blog.
  4. visit the archive-link for the test -נ- tag.

Expected result: You should view the archive for the tag containing at least one post.

Actual result: You get a 404.

#25 @hakre
14 years ago

I need to correct that: I actually ran some tests, and the binary representation of that "nun" letter is d7 a0 (a two-byte char). I also tested against the /u modifier and it solves the problem.

$source = 'נ';

echo string_dump($source) . '<br>';
echo preg_replace( '/\s+/', ' ', $source ) . '<br>';
echo preg_replace( '/[\r\n\t ]+/', ' ', $source ) . '<br>';
echo preg_replace( '/\s+/u', ' ', $source ) . '<br>';
echo $source; // for comparison

function string_dump($string) {
	// Return the byte length followed by each byte in hexadecimal.
	$l    = strlen( $string );
	$dump = sprintf( '%d:', $l );
	for ( $i = 0; $i < $l; $i++ ) {
		$dump .= sprintf( ' %x', ord( $string[$i] ) );
	}

	return $dump;
}

Output:

2: d7 a0
�
נ
נ
נ

So the cases are actually connected at the level of the binary data. But I still get a 404 when browsing the tag.
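
Put together, the two approaches tested above could be combined roughly as follows. This is only a sketch: collapse_whitespace_utf8() is a hypothetical name, not an existing WordPress function, and the fallback simply reuses the narrowed character class from [12501].

```php
<?php
/**
 * Collapse runs of whitespace into a single space, treating the input as UTF-8.
 * Hypothetical helper for illustration; it is not part of WordPress.
 */
function collapse_whitespace_utf8( $text ) {
	// With /u the pattern matches whole code points, so the 0xA0 byte inside
	// a multi-byte character is never examined on its own.
	$result = preg_replace( '/\s+/u', ' ', $text );

	// preg_replace() returns null when the subject is not valid UTF-8;
	// in that case fall back to an ASCII-only character class.
	if ( null === $result ) {
		$result = preg_replace( '/[\r\n\t ]+/', ' ', $text );
	}

	return $result;
}

echo collapse_whitespace_utf8( "test \t\n -\xD7\xA0-  end" ), "\n"; // test -נ- end
```

The fallback matters because a /u pattern rejects any subject that is not valid UTF-8, which would otherwise turn a mis-encoded name into an empty one.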

#26 @hakre
14 years ago

Related / Similar: #11619 (set as fixed, but no frontend tests there as well).

#27 @ramiy
14 years ago

I saw the #11619 ticket attachments (img1 img2), and here is what I think:

This problem affects only "name" fields; the "slug" works OK. The slug uses the 'editable_slug' filter to fix non-English character issues (see Ticket #10966).

Resolution for this issue:

  • Use the 'editable_slug' filter on "name" field
  • Create a new filter for the "name" field to fix non-English character issues.

#28 @margolis
14 years ago

  • Cc margolis added
  • Milestone changed from 2.9.1 to 2.9.3
  • Version changed from 2.9 to 2.9.2

When the slug is in Hebrew and contains "nun" ("נ"), the tag page returns an error. The page works only after replacing the slug.

This also affects plugins like search-unleashed when they try to replace the tag page with a search page.

Still happening in 2.9.2

#29 @dd32
14 years ago

  • Version changed from 2.9.2 to 2.9

Please leave the Version field set to the version in which the bug was originally reported; this allows better tracking of reported bugs.

#30 follow-up: @hakre
14 years ago

Ticket #11619 should be re-tested after this is fixed in 3.0.

#31 in reply to: ↑ 30 ; follow-up: @nacin
14 years ago

Replying to hakre:

Ticket #11619 should be re-tested after this is fixed in 3.0.

That's assuming there is a patch (patches are welcome).

#32 in reply to: ↑ 31 @hakre
14 years ago

Replying to nacin:

Replying to hakre:

Ticket #11619 should be re-tested after this is fixed in 3.0.

That's assuming there is a patch (patches are welcome).

Sorry, but nope. It's an advice only in case it would. I know that's not much, just read the 3.0 if you wanna punt as 3.0 or above. (you need to deal with my unwelcomed patches first nacin).

#33 @hakre
14 years ago

Related: #13413

#34 @nacin
14 years ago

  • Milestone changed from 2.9.3 to 3.1

#35 @maorb
14 years ago

  • Cc maorb added

#36 @hakre
14 years ago

  • Keywords needs-patch added; reporter-feedback removed
  • Milestone changed from 3.1 to Future Release

#37 @hakre
14 years ago

Reference: #14292

#38 @hakre
14 years ago

I was able to write a patch that fixes a related issue: #13413

#39 @solarissmoke
13 years ago

  • Keywords close added; needs-patch removed

This works for me in trunk

#40 in reply to: ↑ 24 ; follow-up: @SergeyBiryukov
12 years ago

  • Milestone Future Release deleted
  • Resolution set to worksforme
  • Status changed from reopened to closed

Replying to hakre:

For the letter called "nun" please try to reproduce and confirm that it is not working:

  1. create a tag called test -נ-.
  2. assign that tag to a published post.
  3. view that post on the blog.
  4. visit the archive-link for the test -נ- tag.

Expected result: You should view the archive for the tag containing at least one post.

Works for me in current trunk.

#41 in reply to: ↑ 40 ; follow-up: @hakre
12 years ago

Replying to SergeyBiryukov:

Works for me in current trunk.

Please add your PHP (incl. platform) and PCRE version.


#42 in reply to: ↑ 41 @SergeyBiryukov
12 years ago

Replying to hakre:

Please add your PHP (incl. platform) and PCRE version.

PHP 5.2.14 (Windows), PCRE 8.02 2010-03-19.
