WordPress.org

Make WordPress Core

Opened 6 years ago

Last modified 6 weeks ago

#9993 new defect (bug)

Rss and atom feeds are dropping some characters

Reported by: pm24601 Owned by:
Milestone: Future Release Priority: normal
Severity: normal Version: 2.7.1
Component: Feeds Keywords: has-patch needs-testing
Focuses: Cc:

Description

Blog post with title:

& > test <

has the less than ( < ) is stripped from the atom and rss feed.

Attachments (2)

9993.patch (530 bytes) - added by peaceablewhale 5 years ago.
9993-tests.diff (1014 bytes) - added by stevenkword 3 months ago.
Unit tests for the_title_rss

Download all attachments as: .zip

Change History (23)

comment:1 @pm246016 years ago

  • Cc pm24601 added

comment:2 @redsweater6 years ago

Confirmed in WordPress trunk. The test case example loses the final "<" character, specifically the final output in the feed looks like:

<![CDATA[&amp; &gt; test ]]>

comment:3 @Denis-de-Bernardy6 years ago

  • Keywords needs-patch added
  • Milestone changed from Unassigned to 2.8

comment:5 @peaceablewhale6 years ago

  • Component changed from XML-RPC to Feeds
  • Keywords has-patch added; needs-patch removed
  • Owner josephscott deleted

The PHP's built-in "strip_tags" function has caused the error. It also seems not necessary to have it applied to the_title_rss(). I have uploaded a patch to remove it.

comment:6 @Denis-de-Bernardy6 years ago

trouble with the patch is the strip_tags() call might be there for a reason. isn't it possible to have html in a post title?

comment:7 @Denis-de-Bernardy6 years ago

  • Milestone changed from 2.8 to 2.9

punting per IRC discussion

comment:8 @peaceablewhale6 years ago

Yes. However, WordPress will escape the HTML (via the "ent2ncr" function) and therefore they are no longer HTML.

I think calling strip_tags is problmatic because an author can have a post title like "Use <h1> to <h6> for headings, <p> for paragraphy, but not formatting". strip_tags will remove the HTML unexpectedly.

comment:9 @Denis-de-Bernardy6 years ago

  • Keywords needs-review added

some core dev feedback would be useful here

comment:10 @peaceablewhale6 years ago

The priorities of ent2ncr are no longer modified by the patch. The new patch only removes the unnecessary 'strip_tags' filter.

comment:11 @peaceablewhale6 years ago

  • Keywords needs-testing added; needs-review removed

Marking this report "needs-testing" in order to gain attention in the upcoming bug hunt.

comment:12 @ryan6 years ago

  • Milestone changed from 2.9 to 3.0

There are other areas where we need a smarter strip. Let's figure it out in 3.0.

comment:13 @dd325 years ago

  • Keywords early added
  • Milestone changed from 3.0 to 3.1

Punting due to impending beta

@peaceablewhale5 years ago

comment:14 follow-up: @peaceablewhale5 years ago

Patch updated for 3.1.

"& > test <" now works even without the patch, but "<test>This is a test</test>" not.

I still think there is no reason for having 'strip_tags' in feed title as they are escaped already.

comment:15 @nacin5 years ago

  • Keywords 3.2-early added; early removed
  • Milestone changed from Awaiting Triage to Future Release

Bumping once more.

comment:16 @MikeHansenMe3 years ago

  • Cc mdhansen@… added
  • Keywords 3.2-early removed

I was able to reproduce the problem in trunk (3.5-beta2-22252). Confirmed the patch fixes the problem.

comment:17 @nacin3 years ago

  • Keywords needs-unit-tests added

@stevenkword3 months ago

Unit tests for the_title_rss

comment:18 @stevenkword3 months ago

  • Keywords needs-unit-tests removed

Errors still exists. Patch 9993-tests.diff adds test cases for both lone XML entities and XML tags.

comment:19 in reply to: ↑ 14 ; follow-up: @mdgl3 months ago

Replying to peaceablewhale:

I still think there is no reason for having 'strip_tags' in feed title as they are escaped already.

According to the RSS Best Practices Profile (http://www.rssboard.org/rss-profile), the <title> element for both channels and items is plain text character data and so should not contain embedded HTML tags.

I also don't see how post titles are already "escaped".

comment:20 @stevenkword6 weeks ago

This directly relates to #29621 in the Export component.

comment:21 in reply to: ↑ 19 @stevenkword6 weeks ago

Replying to mdgl:

Replying to peaceablewhale:

I still think there is no reason for having 'strip_tags' in feed title as they are escaped already.

According to the RSS Best Practices Profile (http://www.rssboard.org/rss-profile), the <title> element for both channels and items is plain text character data and so should not contain embedded HTML tags.

I also don't see how post titles are already "escaped".

I believe what @peaceablewhale is saying is that the filters applied to the the_title_rss include methods for escaping HTML. If an additional filter was added to the_title_rss that also escaped the disallowed XML characters, there would be no reason to strip the tags.

The string, <h1>, would be disallowed inside an XML element, but &lt;h1> is valid even thought &lt;h1&gt; would be best.

The point is, properly escaped HTML is technically plain text. The only necessary addition would be an additional filter to encode the disallowed characters.

Last edited 6 weeks ago by stevenkword (previous) (diff)
Note: See TracTickets for help on using tickets.