#2743 closed defect (bug) (worksforme)
RSS feeds not cleaning html entities properly
| Reported by: | | Owned by: | |
|---|---|---|---|
| Milestone: | | Priority: | normal |
| Severity: | blocker | Version: | 2.0.2 |
| Component: | General | Keywords: | rss entities |
| Focuses: | | Cc: | |
Description
Example Feed:
http://el-tramo.be/feed/
Error:
XML Parsing Error: undefined entity
Location: http://el-tramo.be/feed/
Line Number 12, Column 25: <description>Remko Tron&ccedil;on's Homepage</description>
The owner of this blog has put the ç ( &ccedil; ) character into his blog literally, and WordPress is not cleaning it properly.
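To reproduce the parser complaint outside a browser, here is a minimal sketch in Python (not WordPress code). It assumes the feed emitted the named entity &ccedil; inside the description element; the helper name `parses_as_xml` is made up for illustration. The numeric reference and the literal UTF-8 character are included for comparison.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(fragment: str) -> bool:
    """Return True if the fragment is well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        # Raised for undefined entities such as &ccedil; when no DTD declares them.
        return False


# Assumed form of the failing line: an HTML named entity inside XML.
named = "<description>Remko Tron&ccedil;on's Homepage</description>"
# Same text using a numeric character reference.
numeric = "<description>Remko Tron&#231;on's Homepage</description>"
# Same text using the literal character.
literal = "<description>Remko Tronçon's Homepage</description>"

for label, fragment in [("named", named), ("numeric", numeric), ("literal", literal)]:
    print(label, parses_as_xml(fragment))
```

Only the named-entity form should be rejected, which matches the "undefined entity" error reported above.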
I see that in functions-formatting.php line 795:
'&Ccedil;' => '&#199;',
However, this function (ent2ncr) is not getting called during feed generation.
Unfortunately, this function may not be easy to adapt for RSS feeds. The named entities allowed in RSS are:
&lt; for <,
&amp; for &,
&gt; for >,
&apos; for ',
and &quot; for ".
However, this function transforms all of those entities into their numeric equivalents as well ( '&quot;' => '&#34;', '&amp;' => '&#38;', etc. ).
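As a rough illustration of the conversion described above, the following Python sketch rewrites HTML named entities as numeric character references while leaving the five XML-allowed entities untouched. It is not the ent2ncr implementation from functions-formatting.php; the function name `named_to_numeric` and the use of Python's `html.entities.name2codepoint` table are assumptions made for the example.

```python
import re
from html.entities import name2codepoint

# The five named entities that XML (and therefore RSS) predefines.
XML_PREDEFINED = {"lt", "gt", "amp", "apos", "quot"}

# Matches a named entity such as &ccedil; (numeric references do not match).
_ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_to_numeric(text: str) -> str:
    """Replace HTML named entities with numeric references (hypothetical helper).

    Entities that XML already defines, and names not in the HTML table,
    are left exactly as they were.
    """
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return "&#%d;" % name2codepoint[name]

    return _ENTITY_RE.sub(replace, text)


print(named_to_numeric("Remko Tron&ccedil;on &amp; &Ccedil;o &lt;b&gt;"))
```

Skipping the predefined entities is a design choice for this sketch only; numeric references for them are also valid XML, so converting them would not by itself break a feed.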
Using UTF-8, this bug is not present in 2.0.4 alpha.
Tested ç in blog description, author name, and post title & body. The original blog that showed the error is also fixed and is running 2.0.3.
Perhaps this can be closed?