Opened 10 years ago
Last modified 7 years ago
#28058 new defect (bug)
Taxonomies defined with UTF8 encoded names cause notices when adding a new term
Reported by: | mikejolley | Owned by: | |
---|---|---|---|
Milestone: | Future Release | Priority: | normal |
Severity: | normal | Version: | 3.9 |
Component: | Taxonomy | Keywords: | |
Focuses: | | Cc: | |
Description
This one is easy to reproduce as follows:
- Register a new taxonomy with UTF8 in the name, e.g. pa_資料庫版本. WooCommerce (WC) makes this possible through its attribute system.
- Add a term via the admin panel
- You get notices like:
Notice: Trying to get property of non-object in /Users/patrick/Documents/woothemes/woocommerce/wp-includes/link-template.php on line 685
I traced it back to https://github.com/WordPress/WordPress/blob/master/wp-admin/includes/screen.php#L413
After adding any term, the sanitize_key() call on that line turns pa_資料庫版本 into just 'pa', which results in the taxonomy not being loaded because 'pa' doesn't exist.
Removing the sanitize_key() call fixes the issue, so the sanitisation could be removed, modified, or moved after the taxonomy checks.
This was originally logged at https://github.com/woothemes/woocommerce/issues/5314.
Change History (7)
#2
10 years ago
- Keywords reporter-feedback added
Maybe I misunderstand this, but why is it important for the taxonomy name to be non-ASCII? The name is internal and is used in a URL, either as an archive or as ?my-tax-name=term
The label, however, is the visible part of the taxonomy identification.
#3
10 years ago
@knutsp I'd personally never need to use non-ASCII chars, but if you are Chinese and speak Chinese, how else would you represent 資料庫版本?
In WooCommerce, users can create global attributes where they define the name and label. This issue occurs when they use non-ASCII characters in the name, which is understandable if they are not English speakers.
#6
9 years ago
- Milestone changed from Awaiting Review to Future Release
To clarify: the issue here is not with *terms*, it's with the taxonomy name itself, correct? E.g.:
register_taxonomy( 'pa_資料庫版本', $args );
Looking through the component, it looks like we don't explicitly support UTF8 characters in taxonomy names, though we don't enforce ASCII-only names either. In most places, these characters will work fine in taxonomy names, but clearly there are some finer points where things break. (The same thing is almost certainly true of post types.)
It would be great to clean this up and provide full support for taxonomies/post types with non-ASCII characters in their names. This will take a pretty thorough review, however. Some things to check:
- The 'taxonomy' field in the 'wp_term_taxonomy' table is VARCHAR(32), which imposes an absolute maximum length on taxonomy names. We throw a related _doing_it_wrong() notice in register_taxonomy() based on strlen(). This check would need to use mb_strlen() instead.
- Non-ASCII characters will be stored differently (or sometimes not at all) in databases with different character encoding. This means that a taxonomy name that works properly on one WP installation may not work properly on another one, just due to the DB charset/collation. This might be an education issue for plugin authors; or it might suggest that core should be stricter about not allowing certain character types in certain fields that are used as keys in plugins/themes.
- We should take special care testing rewrite issues, as non-ASCII characters will be encoded in various places in the context of URLs.
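To illustrate the strlen() point in the list above, here is a minimal standalone sketch of the difference between byte length and character length for the taxonomy name from this ticket. It assumes the mbstring extension is loaded and that the file is saved as UTF-8; example_taxonomy_name_fits() is a hypothetical helper for illustration, not core code.

```php
<?php
// Illustrative only: byte length vs. character length of a UTF-8 name.
// Assumes mbstring is available and this file is saved as UTF-8.
$taxonomy = 'pa_資料庫版本';

$byte_length = strlen( $taxonomy );             // counts bytes
$char_length = mb_strlen( $taxonomy, 'UTF-8' ); // counts characters

printf( "bytes: %d, characters: %d\n", $byte_length, $char_length );

// Hypothetical helper (not part of WordPress): a length check that
// counts characters rather than bytes against the 32-character limit.
function example_taxonomy_name_fits( string $name, int $max = 32 ): bool {
	return mb_strlen( $name, 'UTF-8' ) <= $max;
}
```

Each of the five CJK characters here takes three bytes in UTF-8, so a byte-based check would reject names well before they reach 32 characters.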
The real culprit is this: https://core.trac.wordpress.org/browser/tags/3.9/src/wp-includes/formatting.php#L1040
That regex (/[^a-z0-9_\-]/) removes non-English characters. As a workaround, this string can be omitted from sanitization.
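The effect of that pattern can be checked outside WordPress with a small sketch. example_strip_key() below is a hypothetical stand-in that applies only the quoted regex; it is not the core function and may omit other steps the real one performs.

```php
<?php
// Hypothetical stand-in that applies only the regex quoted above.
// This is NOT the core sanitize_key() function.
function example_strip_key( string $key ): string {
	// Remove every byte that is not a-z, 0-9, underscore, or hyphen.
	return (string) preg_replace( '/[^a-z0-9_\-]/', '', $key );
}

$taxonomy  = 'pa_資料庫版本';
$sanitized = example_strip_key( $taxonomy );

// The sanitized key no longer matches the registered taxonomy name,
// so a lookup by the sanitized key would fail.
var_dump( $sanitized === $taxonomy ); // bool(false)
```

An all-ASCII name such as 'product_cat' passes through this pattern unchanged, which is why the problem only appears with non-ASCII taxonomy names.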