Opened 14 years ago
Closed 13 years ago
#16079 closed defect (bug) (fixed)
Automatic excerpts don't work well with Chinese txt (word counting)
Reported by: | | Owned by: | |
---|---|---|---|
Milestone: | 3.4 | Priority: | normal |
Severity: | normal | Version: | 3.0.4 |
Component: | I18N | Keywords: | has-patch needs-testing commit |
Focuses: | | Cc: | |
Description
I use the twentyten template on my Chinese blog (http://reganmian.net/boke). On search and category pages, it shows unpredictable amounts of text for the automatic excerpts. It seems to me that this is due to the way it counts "words". For example, setting the number of words to 3 (by adding a filter in the template's functions.php) causes two different posts to display excerpts of widely varying lengths. I believe this is because the way the the_excerpt function counts words does not work well with Chinese, which is written without spaces. Perhaps offer an option to fall back to counting (Unicode) characters in this case.
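To illustrate the behaviour described above, here is a minimal Python sketch (not WordPress's PHP code) comparing whitespace-based word trimming with character-based trimming. The helper names `trim_words` and `trim_chars` and the sample strings are assumptions made up for this illustration.

```python
def trim_words(text: str, num_words: int) -> str:
    """Keep the first num_words whitespace-separated 'words'."""
    words = text.split()
    return " ".join(words[:num_words])


def trim_chars(text: str, num_chars: int) -> str:
    """Keep the first num_chars Unicode characters (code points)."""
    return text[:num_chars]


english = "The quick brown fox jumps over the lazy dog"
# Chinese is written without spaces, so the whole sentence is one "word".
chinese = "我今天去了图书馆借了三本书然后回家"

print(trim_words(english, 3))  # The quick brown
print(trim_words(chinese, 3))  # the entire sentence, untrimmed
print(trim_chars(chinese, 3))  # 我今天
```

With a limit of 3 "words", the English text is cut to three words while the Chinese text is not cut at all, which matches the widely varying excerpt lengths reported here.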
Attachments (6)
Change History (28)
#4
@
14 years ago
- Summary changed from Automatic extracts don't work well with Chinese txt (word counting) to Automatic exceprts don't work well with Chinese txt (word counting)
#5
@
14 years ago
- Summary changed from Automatic exceprts don't work well with Chinese txt (word counting) to Automatic excerpts don't work well with Chinese txt (word counting)
#11
@
13 years ago
16079.2.diff mainly fixes two problems in 16079.diff:
- preg_match_all() without the u (PCRE_UTF8) modifier destroys UTF-8 multibyte characters.
- implode() with a ' ' separator breaks words into spaced-out characters like 'm e a t'.
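Both problems can be reproduced outside PHP. The following is a rough Python analogue, not the patch itself: matching `.` against raw bytes stands in for a PCRE pattern without the u modifier, and `" ".join` stands in for implode() with a ' ' separator. The sample string is an assumption chosen for illustration.

```python
import re

text = "肉 meat"

# Byte-oriented matching: each match is a single byte, so the 3-byte
# UTF-8 sequence for "肉" is split into three meaningless pieces.
byte_units = re.findall(rb".", text.encode("utf-8"), flags=re.DOTALL)

# Unicode-aware matching: each match is one whole character.
char_units = re.findall(r".", text, flags=re.DOTALL)

print(len(byte_units))  # 8 (3 bytes for "肉" + 1 space + 4 letters)
print(len(char_units))  # 6

# Rejoining single characters with a space separator spaces out the word.
print(" ".join(char_units[2:]))  # m e a t

# Rejoining with an empty separator keeps the text intact.
print("".join(char_units[:3]))  # 肉 m
```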
#12
@
13 years ago
I tested tenpura's patch 16079.2.diff against 3.4 beta 1 Japanese.
With the Twenty Eleven theme enabled, the search results page correctly showed Japanese text trimmed to 40 characters.
#13
@
13 years ago
16079.3.diff gives translators the power to control every aspect of how trimming works. Translators can configure this using the same method described in my other comment.
With this patch, translators may decide:
- whether or not to count Latin text by characters.
- whether or not to break Latin words apart to fit the word count.
- whether or not to count East Asian punctuation marks.
- whether or not to count spaces.
This should meet the needs of general English usage (the options default to English usage if pomo translations are not present), the Japanese usage mentioned in this ticket, and Chinese conventions.
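The four switches listed above could interact roughly as in the following Python sketch. This is an illustration under stated assumptions, not the code in 16079.3.diff: the `TrimOptions` field names, the `trim` function, the tokenising regex, and the small punctuation set are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrimOptions:
    # Hypothetical option names; defaults approximate English usage.
    latin_by_char: bool = False         # count Latin text per character
    break_latin_words: bool = False     # allow cutting inside a Latin word
    count_cjk_punctuation: bool = True  # East Asian punctuation costs 1
    count_spaces: bool = False          # spaces cost 1


# A token is a run of ASCII letters/digits, or any single other character.
TOKEN = re.compile(r"[A-Za-z0-9]+|.", re.DOTALL)
# Small illustrative subset of East Asian punctuation marks.
CJK_PUNCT = set("，。、；：？！「」『』（）《》")


def trim(text: str, limit: int, opts: Optional[TrimOptions] = None) -> str:
    opts = opts or TrimOptions()
    out, used = [], 0
    for tok in TOKEN.findall(text):
        if tok.isspace():
            cost = 1 if opts.count_spaces else 0
        elif tok in CJK_PUNCT:
            cost = 1 if opts.count_cjk_punctuation else 0
        elif tok[0].isascii() and tok[0].isalnum():
            cost = len(tok) if opts.latin_by_char else 1
        else:
            cost = 1  # any other character, e.g. one CJK ideograph
        if used + cost > limit:
            # Optionally keep the part of a Latin word that still fits.
            if opts.latin_by_char and opts.break_latin_words and cost > 1:
                out.append(tok[: limit - used])
            break
        out.append(tok)
        used += cost
    return "".join(out).rstrip()


sample = "我喜欢 WordPress，真的"
print(trim(sample, 4))  # 我喜欢 WordPress
print(trim(sample, 4, TrimOptions(latin_by_char=True, break_latin_words=True)))  # 我喜欢 W
```

In this sketch each option only changes the cost of a token, which keeps the trimming loop itself identical across locales.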
@
13 years ago
Translators could configure how trimming works through their pomo translations. Supports all the requirements mentioned in this ticket + Chinese. Needs testing.
#14
follow-ups:
↓ 15
↓ 20
@
13 years ago
- Keywords commit added
There is a lot of good stuff here, but man, that is a lot. What is wrong with 16079.2.diff for 3.4? It is better than what we have for all locales, yes?
#15
in reply to:
↑ 14
;
follow-up:
↓ 16
@
13 years ago
Replying to nacin:
> There is a lot of good stuff here, but man, that is a lot. What is wrong with 16079.2.diff for 3.4? It is better than what we have for all locales, yes?
Because there is no way we can configure 16079.2.diff to make it work for Chinese usage conventions. I might integrate 16079.3.2.diff into our $locale.php and keep that file for Chinese for now.
#16
in reply to:
↑ 15
;
follow-up:
↓ 17
@
13 years ago
Replying to jiehanzheng:
> Because there is no way we can configure 16079.2.diff to make it work for Chinese usage conventions. I might integrate 16079.3.2.diff into our $locale.php and keep that file for Chinese for now.
Ideally, we avoid $locale.php for most locales in 3.4.
Ideally, what would it look like for Chinese? Not the configure-any-piece patch, but what specific pieces of code are appropriate for Chinese? And did you have any code in 3.3 to support this at all?
#17
in reply to:
↑ 16
@
13 years ago
Replying to nacin:
> Ideally, we avoid $locale.php for most locales in 3.4.
> Ideally, what would it look like for Chinese? Not the configure-any-piece patch, but what specific pieces of code are appropriate for Chinese? And did you have any code in 3.3 to support this at all?
Yes, I understand, and I support the idea that we should avoid using $locale.php. I will try my best to make this happen in core without extra files, but if this is really hard for the dev team and the community, we will have to use $locale.php.
We want to have the automatic excerpt yield the same result as our word-count.js proposal:
http://core.trac.wordpress.org/ticket/8759#comment:30
We did not have automatic excerpt support before. However, now that I've finished it, I might include it in zh_CN.php for now if the updated formatting.php in 3.4 does not have Chinese excerpt capabilities. Thanks.
#20
in reply to:
↑ 14
@
13 years ago
Replying to nacin:
> There is a lot of good stuff here, but man, that is a lot. What is wrong with 16079.2.diff for 3.4? It is better than what we have for all locales, yes?
We will go with this for 3.4 and should raise a new ticket for 3.5 to cover improving this further.
Example of different lengths of extracts