Opened 8 years ago

Closed 8 years ago

Last modified 4 years ago

#2163 closed defect (bug) (fixed)

Still broken trackback ping in utf-8

Reported by: thinkini
Owned by: matt
Milestone: 2.0.6
Priority: normal
Severity: normal
Version: 2.0
Component: General
Keywords: bg|needs-patch trackback utf-8
Focuses:
Cc:

Description

I recently reported a broken trackback problem in #1647,
but I think there has been some misunderstanding.
[3081] and [3107] do not solve this problem.

Suppose each pair of one capital and one small letter stands for a single two-byte character,
and that AaBbCcDdEeFf is cut down to a 10-byte excerpt.

Before : $excerpt is AaBbCcDdEeFf

In the do_trackbacks function in /wp-includes/functions-post.php:

$excerpt = substr($excerpt, 0, 7) . '...';
After : $excerpt is AaBbCcD...

The character Dd is cut in half, so the excerpt now ends in a broken character.
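The cut can be reproduced with real UTF-8 text. This is only a minimal sketch: the Greek letters are an assumed stand-in for the two-byte characters in the example above, the file must be saved as UTF-8, and the mbstring extension is needed for the validity check.

```php
<?php
// Six Greek letters, two bytes each in UTF-8 (12 bytes in total).
$excerpt = 'αβγδεζ';

// Byte-based cut, as in the example above: 7 bytes plus '...'.
$cut = substr($excerpt, 0, 7);
$result = $cut . '...';

// The 7th byte is only the first half of the fourth letter.
$hex = bin2hex($cut);                           // 'ceb1ceb2ceb3ce'
$valid = mb_check_encoding($result, 'UTF-8');   // false: dangling lead byte
```

The trailing `ce` byte is the lead byte of the fourth letter with its second byte missing, which is why the result is no longer valid UTF-8.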

Another blog tool prints this as AaBbCc?..., because the D is a broken character.

In WordPress, however, wp-trackback.php contains:

if ( function_exists('mb_convert_encoding') ) { // For international trackbacks
	$title     = mb_convert_encoding($title, get_settings('blog_charset'), $charset);
	$excerpt   = mb_convert_encoding($excerpt, get_settings('blog_charset'), $charset);
	$blog_name = mb_convert_encoding($blog_name, get_settings('blog_charset'), $charset);
}

The mb_convert_encoding function decides that AaBbCcD... is not UTF-8 because D is broken.
AaBbCcD... is therefore converted to UTF-8 from some other encoding, and in the end every character is broken.
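The effect can be sketched as follows. The candidate list passed to mb_detect_encoding is an assumption made for illustration, not the list wp-trackback.php actually uses, and the Greek sample text is again a stand-in.

```php
<?php
// A UTF-8 excerpt cut in the middle of a character (file saved as UTF-8).
$truncated = substr('αβγδεζ', 0, 7) . '...';

// Strict detection rejects UTF-8 because of the dangling byte.
// The candidate list here is assumed for the example.
$detected = mb_detect_encoding($truncated, array('UTF-8', 'ISO-8859-1'), true);

// Converting "from" the wrongly detected encoding re-encodes every byte
// as its own character, so the intact letters are damaged as well.
$converted = mb_convert_encoding($truncated, 'UTF-8', $detected);

$chars_before = 3;                               // intact letters before the cut
$chars_after  = mb_strlen($converted, 'UTF-8');  // one character per input byte
```

Because ISO-8859-1 accepts any byte sequence, it wins the detection, and the three letters that survived the cut are turned into six unrelated characters.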

WordPress already uses the mbstring module for international trackbacks, so the same module can make the cut. Open /wp-includes/functions-post.php and find this line in the do_trackbacks function:

$excerpt = substr($excerpt, 0, 252) . '...';

replace it with

if ( function_exists('mb_strcut') ) // For international trackbacks
	$excerpt = mb_strcut($excerpt, 0, 252, get_settings('blog_charset')) . '...';
else
	$excerpt = substr($excerpt, 0, 252) . '...';
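The same idea can be tried outside WordPress. In this sketch, trackback_excerpt_sketch is a hypothetical helper (it is not a WordPress function), the charset is passed in directly instead of coming from get_settings('blog_charset'), and a small byte limit is used so the effect is easy to see.

```php
<?php
// Hypothetical helper, not part of WordPress: cut $excerpt to at most
// $max_bytes bytes without splitting a multibyte character.
function trackback_excerpt_sketch($excerpt, $max_bytes = 252, $charset = 'UTF-8') {
    if (strlen($excerpt) <= $max_bytes) {
        return $excerpt; // already short enough, nothing to cut
    }
    if (function_exists('mb_strcut')) {
        // Byte-limited cut that stops at a character boundary.
        $cut = mb_strcut($excerpt, 0, $max_bytes, $charset);
    } else {
        // Fallback without mbstring: may still split a character.
        $cut = substr($excerpt, 0, $max_bytes);
    }
    return $cut . '...';
}

// 100 Hangul syllables, 3 bytes each in UTF-8 (file saved as UTF-8).
$long = str_repeat('가', 100);
$short = trackback_excerpt_sketch($long, 10); // '가가가...' (9 bytes + 3 dots)
```

With a 10-byte limit only three whole syllables (9 bytes) fit, so the helper drops the partial fourth one rather than emitting a broken byte.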

This must use mb_strcut, not mb_substr!

mb_substr cuts by character, whereas mb_strcut cuts by byte.

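For example, with each letter pair again standing for one two-byte character:

mb_strcut('AaBbCc', 1, 2) returns 'Aa', because the string is treated as a byte stream and the cut is aligned to character boundaries.

mb_substr('AaBbCc', 1, 2) returns 'BbCc', because the string is treated as a character stream.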
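The difference can be checked with real two-byte characters. This is a small sketch in which Greek letters are assumed as the sample text in place of the Aa, Bb, Cc pairs; it needs the mbstring extension and a file saved as UTF-8.

```php
<?php
// Three Greek letters, two bytes each in UTF-8.
$text = 'αβγ';

// Byte offsets: start at byte 1 (inside the first letter), take 2 bytes.
// mb_strcut moves the start back to the character boundary.
$by_bytes = mb_strcut($text, 1, 2, 'UTF-8');   // 'α'

// Character offsets: skip 1 character, take 2 characters.
$by_chars = mb_substr($text, 1, 2, 'UTF-8');   // 'βγ'
```

Only mb_strcut takes a byte limit, which is what a byte-limited trackback excerpt needs.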

Change History (5)

comment:1 ryan 8 years ago

  • Resolution set to fixed
  • Status changed from new to closed

(In [3368]) i18n trackback fix. Props thinkini. fixes #2163

comment:2 ryan 8 years ago

  • Resolution set to fixed

(In [3369]) Use mb_strcut instead of mb_substr. fixes #2163

comment:3 ryan 8 years ago

  • Milestone changed from 2.1 to 2.0.1

comment:4 anonymous 7 years ago

  • Milestone 2.0.1 deleted

comment:5 Denis-de-Bernardy 4 years ago

  • Milestone changed from Unassigned to 2.0.6