Opened 16 years ago
Last modified 3 weeks ago
#9993 assigned defect (bug)
Rss and atom feeds are dropping some characters
Reported by: | pm24601 | Owned by: | |
---|---|---|---|
Milestone: | Awaiting Review | Priority: | normal |
Severity: | normal | Version: | 2.7.1 |
Component: | Feeds | Keywords: | has-patch has-unit-tests needs-testing needs-refresh |
Focuses: | Cc: |
Description
Blog post with title:
& > test <
has the less than ( < ) is stripped from the atom and rss feed.
Attachments (3)
Change History (34)
#5
@
16 years ago
- Component changed from XML-RPC to Feeds
- Keywords has-patch added; needs-patch removed
- Owner josephscott deleted
The PHP's built-in "strip_tags" function has caused the error. It also seems not necessary to have it applied to the_title_rss(). I have uploaded a patch to remove it.
#6
@
16 years ago
trouble with the patch is the strip_tags() call might be there for a reason. isn't it possible to have html in a post title?
#8
@
16 years ago
Yes. However, WordPress will escape the HTML (via the "ent2ncr" function) and therefore they are no longer HTML.
I think calling strip_tags is problmatic because an author can have a post title like "Use <h1> to <h6> for headings, <p> for paragraphy, but not formatting". strip_tags will remove the HTML unexpectedly.
#10
@
16 years ago
The priorities of ent2ncr are no longer modified by the patch. The new patch only removes the unnecessary 'strip_tags' filter.
#11
@
15 years ago
- Keywords needs-testing added; needs-review removed
Marking this report "needs-testing" in order to gain attention in the upcoming bug hunt.
#12
@
15 years ago
- Milestone changed from 2.9 to 3.0
There are other areas where we need a smarter strip. Let's figure it out in 3.0.
#13
@
15 years ago
- Keywords early added
- Milestone changed from 3.0 to 3.1
Punting due to impending beta
#14
follow-up:
↓ 19
@
15 years ago
Patch updated for 3.1.
"& > test <" now works even without the patch, but "<test>This is a test</test>" not.
I still think there is no reason for having 'strip_tags' in feed title as they are escaped already.
#15
@
14 years ago
- Keywords 3.2-early added; early removed
- Milestone changed from Awaiting Triage to Future Release
Bumping once more.
#16
@
12 years ago
- Cc mdhansen@… added
- Keywords 3.2-early removed
I was able to reproduce the problem in trunk (3.5-beta2-22252). Confirmed the patch fixes the problem.
#18
@
10 years ago
- Keywords needs-unit-tests removed
Errors still exists. Patch 9993-tests.diff adds test cases for both lone XML entities and XML tags.
#19
in reply to:
↑ 14
;
follow-up:
↓ 21
@
10 years ago
Replying to peaceablewhale:
I still think there is no reason for having 'strip_tags' in feed title as they are escaped already.
According to the RSS Best Practices Profile (http://www.rssboard.org/rss-profile), the <title>
element for both channels and items is plain text character data and so should not contain embedded HTML tags.
I also don't see how post titles are already "escaped".
#21
in reply to:
↑ 19
@
10 years ago
Replying to mdgl:
Replying to peaceablewhale:
I still think there is no reason for having 'strip_tags' in feed title as they are escaped already.
According to the RSS Best Practices Profile (http://www.rssboard.org/rss-profile), the
<title>
element for both channels and items is plain text character data and so should not contain embedded HTML tags.
I also don't see how post titles are already "escaped".
I believe what @peaceablewhale is saying is that the filters applied to the the_title_rss
include methods for escaping HTML. If an additional filter was added to the_title_rss
that also escaped the disallowed XML characters, there would be no reason to strip the tags.
The string, <h1>
, would be disallowed inside an XML element, but <h1>
is valid even thought <h1>
would be best.
The point is, properly escaped HTML is technically plain text. The only necessary addition would be an additional filter to encode the disallowed characters.
#22
@
9 years ago
- Owner set to stevenkword
- Status changed from new to assigned
I'm standing behind (attachment:9993.2.patch) as a reasonable approach for the time being. Since the output is also passed through esc_html(), it is not necessary to strip these tags. It also opens up the benefit of having parity between the web view and the feed content.
Unit tests, (attachment:9993-tests.diff), are present and show failure before the patch is applied and success after.
#23
@
9 years ago
- Keywords has-unit-tests added; needs-testing removed
- Milestone changed from Future Release to 4.6
#25
in reply to:
↑ 24
;
follow-up:
↓ 26
@
9 years ago
Replying to netweb:
@stevenkword Maybe add tests for Atom also?
This might be a case where the naming conventions don't quite make sense. Both Atom and RSS feeds use the same the_title_rss()
method in their respective templates. The only difference being that the Atom template wraps the output in CDATA ,which would be unnecessary, but not problematic after this change.
Ideally, since both templates use XML, naming the method the_title_feed()
would have been more appropriate. While we do have the ability to add unit tests for Atom as of 4.5, I think the tests proposed in this ticket cover those test cases. As a future part of adding test coverage for feeds, I would like to add a new common.php
to /tests/feed/
to help make things more obvious.
#26
in reply to:
↑ 25
@
9 years ago
Replying to stevenkword:
This might be a case where the naming conventions don't quite make sense. Both Atom and RSS feeds use the same
the_title_rss()
method in their respective templates. The only difference being that the Atom template wraps the output in CDATA ,which would be unnecessary, but not problematic after this change.
Ideally, since both templates use XML (the export component too), naming the method
the_title_xml()
would have been more appropriate. While we do have the ability to add unit tests for Atom as of 4.5, I think the tests proposed in this ticket cover those test cases. As a future part of adding test coverage for feeds, I would like to add a newcommon.php
to/tests/feed/
to help make things more obvious.
Because the functionality is different isn't this even more reason to add unit tests for Atom?
Also looking and the_title_rss()
is also used in src/wp-admin/includes/export.php#L426, sadly there are zero export unit tests at the moment, there is a patch adding export unit tests in #22435 but they are dependant on the new Export API enhancements in that ticket.
#27
follow-up:
↓ 28
@
9 years ago
The strip_tags()
call exists since b2, see https://github.com/ericandrewlewis/b2/blame/e0de2a065d3e1d139fbb635e505e7fed5271954e/b2-include/b2template.functions.php#L269.
I think one of the reasons might be that titles can include HTML like <br>
or <strong>
for visual purposes and not as a part of the title. These shouldn't be displayed in a feed reader.
#28
in reply to:
↑ 27
;
follow-up:
↓ 29
@
9 years ago
Replying to ocean90:
The
strip_tags()
call exists since b2, see https://github.com/ericandrewlewis/b2/blame/e0de2a065d3e1d139fbb635e505e7fed5271954e/b2-include/b2template.functions.php#L269.
I think one of the reasons might be that titles can include HTML like<br>
or<strong>
for visual purposes and not as a part of the title. These shouldn't be displayed in a feed reader.
How would you feel about str_replace() those explicity rather than running them through strip_tags()?
#29
in reply to:
↑ 28
@
9 years ago
- Milestone changed from 4.6 to Future Release
Replying to stevenkword:
How would you feel about str_replace() those explicity rather than running them through strip_tags()?
Those were just two examples, it can be any HTML.
Maybe we can add another filter for the_title_rss
which runs before strip_tags()
and encodes <
which are not part of a HTML tag as <
.
#30
@
3 weeks ago
- Keywords needs-testing needs-refresh added
- Milestone set to Awaiting Review
- Owner stevenkword deleted
Found this going through a list of tickets missing a milestone.
Since it has been a good amount of time since this was last investigated, looking at the current state of this report in the context of today's code base would be a good first step to reviving.
I'm going to remove the ticket owner to make it more clear that no one is actively working on this one.
Confirmed in WordPress trunk. The test case example loses the final "<" character, specifically the final output in the feed looks like:
<![CDATA[& > test ]]>