#4647 closed defect (bug) (wontfix)
Text in database should not be entity-encoded
| Reported by: | redsweater | Owned by: | |
|---|---|---|---|
| Milestone: | | Priority: | normal |
| Severity: | normal | Version: | 2.2.1 |
| Component: | General | Keywords: | needs-patch |
| Focuses: | | Cc: | |
Description
I've noticed that some text, e.g. the names of categories and the post_content, is stored in the database with XML-compatible (I think) entity encoding. For instance, the & character is stored in the database as "&amp;".
Other fields, such as the Excerpt and Title, store the same & character verbatim in the field as "&".
It seems that for consistency, the text in the database should be of a standardized form. I would vote for not storing entity encoding in the database, as it seems more of a presentational thing.
To observe the issue, just write a test post in which the & character for instance appears in all possible text fields. Then observe the database directly to see what has happened.
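As a rough illustration of the check described above, the following Python sketch flags stored values that would change if their entities were decoded. The field names and sample values are assumptions taken from the description, not a dump of a real database, and `looks_entity_encoded` is a hypothetical helper.

```python
import html


def looks_entity_encoded(value: str) -> bool:
    """Return True if decoding HTML/XML entities would change the stored text."""
    return html.unescape(value) != value


# Hypothetical snapshot of what the description reports: the same input,
# "Trial & Tribulation", ends up stored differently depending on the field.
stored = {
    "category_name": "Trial &amp; Tribulation",
    "post_content": "Trial &amp; Tribulation",
    "post_title": "Trial & Tribulation",
    "post_excerpt": "Trial & Tribulation",
}

# Collect the fields whose stored value carries entity encoding.
encoded_fields = sorted(
    name for name, value in stored.items() if looks_entity_encoded(value)
)
print(encoded_fields)
```

A check like this only detects that a value contains something decodable; it cannot tell whether the author typed the entity on purpose.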
This has particularly vulgar effects on the sanity of the text values returned by the XML-RPC interface, which I'll describe in another bug report.
Attachments (1)
Change History (10)
#3
17 years ago
- Summary changed from "Text in database is inconsistently entity-encoded" to "Text in database should not be entity-encoded"
#5
17 years ago
This is happening because the 'pre_term_name' filter in wp-includes/default-filters.php encodes the data before it gets to the database. Removing that default filter fixes the problem for both categories and tags.
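To make the mechanism concrete, here is a minimal Python sketch of a pre-save filter hook. The registry functions only mimic the general shape of a filter chain and are not WordPress's implementation; `encode_entities` and `save_term` are hypothetical names, and the actual callback attached to 'pre_term_name' is not shown in this ticket.

```python
import html
from collections import defaultdict

# Hypothetical stand-in for a filter registry, keyed by hook name.
_filters = defaultdict(list)


def add_filter(hook, callback):
    _filters[hook].append(callback)


def remove_filter(hook, callback):
    _filters[hook].remove(callback)


def apply_filters(hook, value):
    # Run the value through every callback registered on the hook, in order.
    for callback in _filters[hook]:
        value = callback(value)
    return value


def encode_entities(text):
    # Assumed behaviour of the default filter: entity-encode before saving.
    return html.escape(text, quote=False)


def save_term(name):
    """Return the value that would be written to the database."""
    return apply_filters("pre_term_name", name)


add_filter("pre_term_name", encode_entities)
with_filter = save_term("Trial & Tribulation")

remove_filter("pre_term_name", encode_entities)
without_filter = save_term("Trial & Tribulation")

print(with_filter)
print(without_filter)
```

With the filter registered the saved value carries the encoding; with it removed the name is stored as typed.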
#7
15 years ago
- Keywords needs-patch added; has-patch needs-testing removed
- Milestone 2.9 deleted
- Resolution set to wontfix
- Status changed from new to closed
The patch is irrelevant as it stands: this is a huge workflow change, and it needs to address the upgrade of everything that gets changed as well.
I'm closing as wontfix, pending a proper patch.
#8
follow-up: ↓ 9
15 years ago
What do you mean when you say it needs to address the upgrade of everything that gets changed? Do you mean a user's existing (inconsistently encoded) data?
It seems to me that at least putting an end to the addition of inconsistent data to the database would be a valuable improvement. If Joseph's patch addresses the problem so that new users would not be building inconsistency into their database, it seems useful.
Is it normal policy of the WordPress team to close bugs as wontfix just because there is not currently an acceptable patch? Or is there more to the story of "wontfix"ing this bug than is summarized in the comment above?
Daniel
#9
in reply to: ↑ 8
15 years ago
Replying to redsweater:
> What do you mean when you say it needs to address the upgrade of everything that gets changed? Do you mean a user's existing (inconsistently encoded) data?

yeah.

> It seems to me that at least putting an end to the addition of inconsistent data to the database would be a valuable improvement.

it could also introduce lots of issues if the data is not made consistent.

> Is it normal policy of the WordPress team to close bugs as wontfix just because there is not currently an acceptable patch? Or is there more to the story of "wontfix"ing this bug than is summarized in the comment above?

understand it as wontfix until a patch that takes care of the upgrade is added, not wontfix ever.
feel very free to re-open the ticket. I closed it because it would have stayed open for another two or three years pending the needed patch.
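The upgrade concern can be sketched in Python as follows. This is an illustration under stated assumptions, not a proposed patch: the rows are invented, and the per-row `legacy` flag is hypothetical (the ticket does not say how old rows would be told apart from new ones).

```python
import html

# Written by the old, encoding code path: the author typed "&".
legacy_value = "Trial &amp; Tribulation"
# Written verbatim after a fix: the author really typed "&amp;".
new_value = "Type &amp; to get an ampersand"

# A blanket decode repairs the legacy row but corrupts the new one.
blanket_legacy = html.unescape(legacy_value)
blanket_new = html.unescape(new_value)


def upgrade(rows):
    """Decode only rows known to predate the fix; leave newer rows untouched.

    Each row is (text, legacy), where `legacy` is a hypothetical marker.
    Upgraded rows come back with legacy=False so the pass is not repeated.
    """
    return [
        (html.unescape(text) if legacy else text, False)
        for text, legacy in rows
    ]


upgraded = upgrade([(legacy_value, True), (new_value, False)])
print(blanket_new)
print(upgraded)
```

The point of the sketch is that stopping the encoding without converting existing rows leaves the database with two indistinguishable conventions, so the decode has to run exactly once and only over data written before the change.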
I was going to write another bug suggesting that the XML-RPC interface should do something to mitigate the effect this has on XML-RPC clients. But on further thought I think it's probably best that the XML-RPC interface serve as an honest interface to the content in the database. This should be fixed in the code that inappropriately inserts the entity-encoded text in the affected fields.
For example, it turns out that the XML-RPC interface's honesty is double-edged. I can write a post with content "Trial & Tribulation" and submit via XML-RPC, and it goes into the database without the problematic encoding. It's only when entered via the WordPress editor that the problematic encoding occurs.
However, a Category submitted via XML-RPC wp.newCategory does suffer the entity encoding problem, and goes into the database with it.
Long story short, anything that writes text to the database should, I think, take pains to make sure it goes in verbatim, and not entity encoded.
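A short Python sketch of why verbatim storage suits XML-RPC clients: the transport layer already escapes text for the wire and the client library undoes it, so any encoding done before storage shows up in what the client receives. It uses Python's standard `xmlrpc.client` marshalling purely as a stand-in for a real client and server; `round_trip` is a hypothetical helper.

```python
import html
import xmlrpc.client


def round_trip(stored: str) -> str:
    """Marshal a stored value as an XML-RPC response and decode it as a client would."""
    wire = xmlrpc.client.dumps((stored,), methodresponse=True)
    (value,), _method = xmlrpc.client.loads(wire)
    return value


verbatim = "Trial & Tribulation"
pre_encoded = html.escape(verbatim, quote=False)  # what an encoding filter would store

client_sees_verbatim = round_trip(verbatim)
client_sees_encoded = round_trip(pre_encoded)

# Encoding belongs at output time, e.g. when the value is placed into an HTML page.
page_output = html.escape(verbatim, quote=False)

print(client_sees_verbatim)
print(client_sees_encoded)
print(page_output)
```

Stored verbatim, the value reaches the client unchanged and is escaped only where a page needs it; stored pre-encoded, the client receives the entity text itself.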