Make WordPress Core

Opened 20 years ago

Closed 19 years ago

Last modified 19 years ago

#2743 closed defect (bug) (worksforme)

RSS feeds not cleaning html entities properly

Reported by: VxJasonxV
Owned by:
Milestone:
Priority: normal
Severity: blocker
Version: 2.0.2
Component: General
Keywords: rss entities
Focuses:
Cc:

Description

Example Feed:
http://el-tramo.be/feed/

Error:
XML Parsing Error: undefined entity
Location: http://el-tramo.be/feed/
Line Number 12, Column 25: <description>Remko Tron&ccedil;on's Homepage</description>
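
For anyone trying to reproduce this without the live feed, here is a minimal sketch of the same failure. It assumes Python's standard xml.etree.ElementTree parser as a stand-in for the browser's XML parser; the helper name parses is hypothetical, and the second string uses the numeric reference for lowercase ç (&#231;) for comparison.

```python
import xml.etree.ElementTree as ET

# The element from the failing feed, and the same element with a numeric
# character reference in place of the HTML named entity.
named = "<description>Remko Tron&ccedil;on's Homepage</description>"
numeric = "<description>Remko Tron&#231;on's Homepage</description>"


def parses(xml_text):
    """Return True if xml_text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # &ccedil; is an HTML entity; XML does not define it unless a DTD does.
        return False


named_ok = parses(named)      # False: undefined entity
numeric_ok = parses(numeric)  # True: numeric references need no declaration
print(named_ok, numeric_ok)
```

The named form is rejected because nothing in the feed declares &ccedil;, while the numeric form parses to the expected text.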

The owner of this blog has put the ç ( &ccedil; ) character into his blog literally, and WordPress is not cleaning it properly.
I see that in functions-formatting.php line 795:
'&Ccedil;' => '&#199;',

However, this function (ent2ncr) is not being called during feed generation.
Unfortunately, this function may not be usable as-is for RSS feeds, because the only named entities allowed in RSS are:
&lt; for <,
&amp; for &,
&gt; for >,
&apos; for ',
and &quot; for ".

However, this function also transforms all of those entities into their numeric equivalents ( '&quot;' => '&#34;', '&amp;' => '&#38;', etc. ).
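
A rough sketch of the behaviour I would expect for feeds: convert HTML named entities to numeric references, but leave the five entities that XML already defines untouched. This is not the ent2ncr code; it is an illustration in Python that assumes the standard library's html.entities.name2codepoint table in place of WordPress's own mapping, and the function name named_to_numeric is hypothetical.

```python
import re
from html.entities import name2codepoint

# Entities that XML predefines; these are valid in a feed as they are.
XML_PREDEFINED = {"lt", "gt", "amp", "apos", "quot"}

# Matches named entities such as &ccedil; (not numeric ones like &#231;).
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_to_numeric(text):
    """Replace HTML named entities with numeric character references."""

    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)  # keep &lt;, &amp;, etc. unchanged
        codepoint = name2codepoint.get(name)
        if codepoint is None:
            return match.group(0)  # unknown name: leave it for the author to fix
        return "&#%d;" % codepoint

    return ENTITY_RE.sub(replace, text)


print(named_to_numeric("Remko Tron&ccedil;on's Homepage"))
# prints: Remko Tron&#231;on's Homepage
print(named_to_numeric("&lt;b&gt; &amp; &Ccedil;"))
# prints: &lt;b&gt; &amp; &#199;
```

Skipping the predefined five keeps the output readable and avoids rewriting entities that feed readers already accept.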

Change History (3)

#1 @leftjustified
20 years ago

Using UTF-8, this bug is not present in 2.0.4 alpha.
Tested ç in the blog description, author name, and post title and body. The original blog that showed the error is also fixed and is running 2.0.3.
Perhaps this can be closed?

#2 @foolswisdom
19 years ago

  • Resolution set to worksforme
  • Status changed from new to closed

Closing bug as Works for Me based on leftjustified's update.

#3 @foolswisdom
19 years ago

  • Milestone 2.0.3 deleted