#16079 closed defect (bug) (fixed)

Automatic excerpts don't work well with Chinese txt (word counting)

Reported by:	houshuang	Owned by:	nacin
Milestone:	3.4	Priority:	normal
Severity:	normal	Version:	3.0.4
Component:	I18N	Keywords:	has-patch needs-testing commit
Focuses:		Cc:

Description

I use the twentyten theme on my Chinese blog (http://reganmian.net/boke). On search and category pages, the automatic excerpts vary unpredictably in length, and it seems to me that this is due to the way "words" are counted. For example, setting the number of words to 3 (by adding a filter in the theme's functions.php) causes two different posts to display excerpts of widely varying lengths. I believe this is because the way the the_excerpt function counts words does not work well with Chinese, which is written without spaces. Perhaps an option could be offered to count (Unicode) characters instead in this case.
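The behavior described in the report can be reproduced outside WordPress. The following is a minimal Python sketch, not WordPress code: the helper name `first_n_words` and the sample strings are made up for illustration, and it assumes that "words" are found by splitting on whitespace, which is what the report suspects.

```python
def first_n_words(text: str, n: int) -> str:
    """Keep the first n whitespace-separated chunks of text (hypothetical helper)."""
    return " ".join(text.split()[:n])


english = "The quick brown fox jumps over the lazy dog"
chinese = "我今天去了图书馆借了三本书"  # one unbroken run, no spaces

# English is cut after three words.
print(first_n_words(english, 3))

# The Chinese sentence contains no whitespace, so it counts as a single
# "word" and comes back untrimmed, however long it is.
print(first_n_words(chinese, 3))
```

Under this assumption, the length of a Chinese excerpt depends on where spaces or line breaks happen to occur in the post, which would explain the varying lengths in the screenshots.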

Attachments (6)

Screen shot 2011-01-02 at 3.15.46 PM.png (117.3 KB) - added by houshuang 13 years ago. Example of different lengths of extracts
Screen shot 2011-01-02 at 3.16.00 PM.png (213.6 KB) - added by houshuang 13 years ago. Zoomed in
16079.diff (1.3 KB) - added by nacin 12 years ago.
16079.2.diff (1.7 KB) - added by tenpura 12 years ago.
16079.3.diff (4.8 KB) - added by jiehanzheng 12 years ago. Translators could configure how trimming works through their pomo translations. Supports all the requirements mentioned in this ticket + Chinese. Needs testing.
16079.3.2.diff (5.0 KB) - added by jiehanzheng 12 years ago. Provides French and Spanish support based on 16079.3.diff, fixes default setting.

Change History (28)

@houshuang
13 years ago

Attachment Screen shot 2011-01-02 at 3.15.46 PM.png added

Example of different lengths of extracts

@houshuang
13 years ago

Attachment Screen shot 2011-01-02 at 3.16.00 PM.png added

Zoomed in

#1 @nacin
13 years ago

Related: #8759.

#2 @markjaquith
13 years ago

Milestone changed from Awaiting Review to Future Release

#3 @westi
13 years ago

Owner set to westi
Status changed from new to assigned

#4 @westi
13 years ago

Summary changed from Automatic extracts don't work well with Chinese txt (word counting) to Automatic exceprts don't work well with Chinese txt (word counting)

#5 @andrewspittle
13 years ago

Summary changed from Automatic exceprts don't work well with Chinese txt (word counting) to Automatic excerpts don't work well with Chinese txt (word counting)

#6 @nacin
12 years ago

Component changed from Template to I18N

#7 @nacin
12 years ago

Milestone changed from Future Release to 3.4

@nacin
12 years ago

Attachment 16079.diff added

#8 @nacin
12 years ago

Keywords has-patch needs-testing added

16079.diff

#9 @westi
12 years ago

X-Referencing the other related ticket - #8759

#10 @westi
12 years ago

Owner changed from westi to nacin

@tenpura
12 years ago

Attachment 16079.2.diff added

#11 @tenpura
12 years ago

16079.2.diff mainly fixes two things in 16079.diff:

- preg_match_all() without the u (PCRE_UTF8) modifier destroys UTF-8 multibyte characters.
- implode() with a ' ' separator puts a space between every character, producing strings like 'm e a t'.
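The two issues named in this comment can be illustrated with a small Python analogy. This is a sketch only: it does not use PHP's preg_match_all() or implode(), and the sample strings are made up. Matching against raw UTF-8 bytes stands in for a pattern without UTF-8 awareness, and `str.join` stands in for the join step.

```python
import re

text = "日本語のテキスト"  # 8 characters, 3 bytes each in UTF-8

# Issue 1: matching "any single unit" against raw bytes splits every
# multibyte character into its individual bytes.
byte_units = re.findall(rb".", text.encode("utf-8"), flags=re.DOTALL)
char_units = re.findall(r".", text, flags=re.DOTALL)

# Cutting after 4 byte units ends in the middle of the second character,
# so decoding yields a replacement character instead of real text.
broken = b"".join(byte_units[:4]).decode("utf-8", errors="replace")
intact = "".join(char_units[:4])

# Issue 2: per-character units must be rejoined without a separator.
letters = list("meat")
spaced = " ".join(letters)
joined = "".join(letters)

print(len(byte_units), len(char_units))
print(repr(broken), repr(intact))
print(repr(spaced), repr(joined))
```

The point of the analogy is that the unit of splitting and the separator used for joining have to agree: character units need a character-aware match and an empty separator.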

#12 @Nao
12 years ago

I tested tenpura's patch 16079.2.diff against 3.4 beta 1 Japanese.
With Twenty Eleven theme enabled, the search result page correctly showed trimmed Japanese text at 40 characters.

#13 @jiehanzheng
12 years ago

16079.3.2.diff gives translators control over every aspect of how trimming works. Translators may use the same method described in my other comment to configure this.

With this patch, translators may decide:

- whether or not to count the Latin part by characters.
- whether or not to break Latin words apart to fit the word count.
- whether or not to count East Asian punctuation marks.
- whether or not to count spaces.

This should meet the needs of general English usage (the options default to English usage if pomo translations are not present), the Japanese usage mentioned in this ticket, and Chinese conventions.

Last edited 12 years ago by jiehanzheng

@jiehanzheng
12 years ago

Attachment 16079.3.diff added

Translators could configure how trimming works through their pomo translations. Supports all the requirements mentioned in this ticket + Chinese. Needs testing.

@jiehanzheng
12 years ago

Attachment 16079.3.2.diff added

Provides French and Spanish support based on 16079.3.diff, fixes default setting.

#14 follow-ups: ↓ 15 ↓ 20 @nacin
12 years ago

Keywords commit added

There is a lot of good stuff here, but man, that is a lot. What is wrong with 16079.2.diff for 3.4? It is better than what we have for all locales, yes?

#15 in reply to: ↑ 14 ; follow-up: ↓ 16 @jiehanzheng
12 years ago

Replying to nacin:

There is a lot of good stuff here, but man, that is a lot. What is wrong with 16079.2.diff for 3.4? It is better than what we have for all locales, yes?

Because there is no way we can configure 16079.2.diff to make it work for Chinese usage conventions. I might integrate 16079.3.2.diff into our $locale.php and keep that file for Chinese for now.

#16 in reply to: ↑ 15 ; follow-up: ↓ 17 @nacin
12 years ago

Replying to jiehanzheng:

Because there is no way we can configure 16079.2.diff to make it work for Chinese usage conventions. I might integrate 16079.3.2.diff into our $locale.php and keep that file for Chinese for now.

Ideally, we avoid $locale.php for most locales in 3.4.

Ideally, what would it look like for Chinese? Not the configure-any-piece patch, but what specific pieces of code are appropriate for Chinese? And did you have any code in 3.3 to support this at all?

#17 in reply to: ↑ 16 @jiehanzheng
12 years ago

Replying to nacin:

Ideally, we avoid $locale.php for most locales in 3.4.

Ideally, what would it look like for Chinese? Not the configure-any-piece patch, but what specific pieces of code are appropriate for Chinese? And did you have any code in 3.3 to support this at all?

Yes, I understand, and I support the idea that we should avoid using $locale.php. I will try my best to make this happen in core without extra files, but if this is really hard for the dev team and the community, we will have to use $locale.php.

We want to have the automatic excerpt yield the same result as our word-count.js proposal:
http://core.trac.wordpress.org/ticket/8759#comment:30

We did not have automatic excerpt support before. However, now that I've finished it, I might include it in zh_CN.php for now if the updated formatting.php in 3.4 does not have Chinese excerpt capabilities. Thanks.

#18 @sirzooro
12 years ago

Cc sirzooro added

#19 @jane
12 years ago

Is this really a blocker for 3.4? Seems like this could be dealt with in 3.5.

#20 in reply to: ↑ 14 @westi
12 years ago

Replying to nacin:

There is a lot of good stuff here, but man, that is a lot. What is wrong with 16079.2.diff for 3.4? It is better than what we have for all locales, yes?

We will go with this for 3.4 and should raise a new ticket for 3.5 to cover improving this further.

#21 @westi
12 years ago

In [20859]:

i18n: Update the word splitting we use when trimming strings to build excerpts so that it supports a character-based mode for locales where character splitting is more appropriate, such as Japanese.

See #16079 props tenpura.
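The idea in this commit message, a trimming routine that counts either words or characters, can be sketched as follows. This is a minimal Python illustration, not the code committed in [20859]: the function name `trim_text`, the `mode` parameter, and the `more` suffix are all assumptions made for the example.

```python
import re


def trim_text(text: str, limit: int, mode: str = "words", more: str = "...") -> str:
    """Trim text to `limit` units; a unit is a word or a character (hypothetical)."""
    if mode == "characters":
        # Collapse whitespace runs, then treat every character as one unit.
        units = list(re.sub(r"\s+", " ", text).strip())
        sep = ""
    else:
        # Default: whitespace-separated words, rejoined with single spaces.
        units = text.split()
        sep = " "

    if len(units) > limit:
        return sep.join(units[:limit]) + more
    return sep.join(units)


print(trim_text("The quick brown fox", 2))
print(trim_text("日本語のテキストです", 4, mode="characters"))
```

In this sketch the mode is passed in explicitly. How a locale would select the mode is left out, since the commit message does not describe it.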

#22 @westi
12 years ago

Resolution set to fixed
Status changed from assigned to closed

This is done for 3.4. Raised #20739 for future enhancements.
