Opened 17 years ago
Last modified 3 weeks ago
#9993 assigned defect (bug)
Rss and atom feeds are dropping some characters
| Reported by: |
|
Owned by: | |
|---|---|---|---|
| Milestone: | Awaiting Review | Priority: | normal |
| Severity: | normal | Version: | 2.7.1 |
| Component: | Feeds | Keywords: | has-patch has-unit-tests needs-refresh needs-test-info |
| Focuses: | Cc: |
Description
Blog post with title:
& > test <
has the less than ( < ) is stripped from the atom and rss feed.
Attachments (3)
Change History (38)
#5
@
17 years ago
- Component changed from XML-RPC to Feeds
- Keywords has-patch added; needs-patch removed
- Owner josephscott deleted
The PHP's built-in "strip_tags" function has caused the error. It also seems not necessary to have it applied to the_title_rss(). I have uploaded a patch to remove it.
#6
@
17 years ago
trouble with the patch is the strip_tags() call might be there for a reason. isn't it possible to have html in a post title?
#8
@
17 years ago
Yes. However, WordPress will escape the HTML (via the "ent2ncr" function) and therefore they are no longer HTML.
I think calling strip_tags is problmatic because an author can have a post title like "Use <h1> to <h6> for headings, <p> for paragraphy, but not formatting". strip_tags will remove the HTML unexpectedly.
#10
@
17 years ago
The priorities of ent2ncr are no longer modified by the patch. The new patch only removes the unnecessary 'strip_tags' filter.
#11
@
16 years ago
- Keywords needs-testing added; needs-review removed
Marking this report "needs-testing" in order to gain attention in the upcoming bug hunt.
#12
@
16 years ago
- Milestone changed from 2.9 to 3.0
There are other areas where we need a smarter strip. Let's figure it out in 3.0.
#13
@
16 years ago
- Keywords early added
- Milestone changed from 3.0 to 3.1
Punting due to impending beta
#14
follow-up:
↓ 19
@
16 years ago
Patch updated for 3.1.
"& > test <" now works even without the patch, but "<test>This is a test</test>" not.
I still think there is no reason for having 'strip_tags' in feed title as they are escaped already.
#15
@
15 years ago
- Keywords 3.2-early added; early removed
- Milestone changed from Awaiting Triage to Future Release
Bumping once more.
#16
@
13 years ago
- Cc mdhansen@… added
- Keywords 3.2-early removed
I was able to reproduce the problem in trunk (3.5-beta2-22252). Confirmed the patch fixes the problem.
#18
@
11 years ago
- Keywords needs-unit-tests removed
Errors still exists. Patch 9993-tests.diff adds test cases for both lone XML entities and XML tags.
#19
in reply to:
↑ 14
;
follow-up:
↓ 21
@
11 years ago
Replying to peaceablewhale:
I still think there is no reason for having 'strip_tags' in feed title as they are escaped already.
According to the RSS Best Practices Profile (http://www.rssboard.org/rss-profile), the <title> element for both channels and items is plain text character data and so should not contain embedded HTML tags.
I also don't see how post titles are already "escaped".
#21
in reply to:
↑ 19
@
11 years ago
Replying to mdgl:
Replying to peaceablewhale:
I still think there is no reason for having 'strip_tags' in feed title as they are escaped already.
According to the RSS Best Practices Profile (http://www.rssboard.org/rss-profile), the
<title>element for both channels and items is plain text character data and so should not contain embedded HTML tags.
I also don't see how post titles are already "escaped".
I believe what @peaceablewhale is saying is that the filters applied to the the_title_rss include methods for escaping HTML. If an additional filter was added to the_title_rss that also escaped the disallowed XML characters, there would be no reason to strip the tags.
The string, <h1>, would be disallowed inside an XML element, but <h1> is valid even thought <h1> would be best.
The point is, properly escaped HTML is technically plain text. The only necessary addition would be an additional filter to encode the disallowed characters.
#22
@
10 years ago
- Owner set to stevenkword
- Status changed from new to assigned
I'm standing behind (attachment:9993.2.patch) as a reasonable approach for the time being. Since the output is also passed through esc_html(), it is not necessary to strip these tags. It also opens up the benefit of having parity between the web view and the feed content.
Unit tests, (attachment:9993-tests.diff), are present and show failure before the patch is applied and success after.
#23
@
10 years ago
- Keywords has-unit-tests added; needs-testing removed
- Milestone changed from Future Release to 4.6
#25
in reply to:
↑ 24
;
follow-up:
↓ 26
@
10 years ago
Replying to netweb:
@stevenkword Maybe add tests for Atom also?
This might be a case where the naming conventions don't quite make sense. Both Atom and RSS feeds use the same the_title_rss() method in their respective templates. The only difference being that the Atom template wraps the output in CDATA ,which would be unnecessary, but not problematic after this change.
Ideally, since both templates use XML, naming the method the_title_feed() would have been more appropriate. While we do have the ability to add unit tests for Atom as of 4.5, I think the tests proposed in this ticket cover those test cases. As a future part of adding test coverage for feeds, I would like to add a new common.php to /tests/feed/ to help make things more obvious.
#26
in reply to:
↑ 25
@
10 years ago
Replying to stevenkword:
This might be a case where the naming conventions don't quite make sense. Both Atom and RSS feeds use the same
the_title_rss()method in their respective templates. The only difference being that the Atom template wraps the output in CDATA ,which would be unnecessary, but not problematic after this change.
Ideally, since both templates use XML (the export component too), naming the method
the_title_xml()would have been more appropriate. While we do have the ability to add unit tests for Atom as of 4.5, I think the tests proposed in this ticket cover those test cases. As a future part of adding test coverage for feeds, I would like to add a newcommon.phpto/tests/feed/to help make things more obvious.
Because the functionality is different isn't this even more reason to add unit tests for Atom?
Also looking and the_title_rss() is also used in src/wp-admin/includes/export.php#L426, sadly there are zero export unit tests at the moment, there is a patch adding export unit tests in #22435 but they are dependant on the new Export API enhancements in that ticket.
#27
follow-up:
↓ 28
@
10 years ago
The strip_tags() call exists since b2, see https://github.com/ericandrewlewis/b2/blame/e0de2a065d3e1d139fbb635e505e7fed5271954e/b2-include/b2template.functions.php#L269.
I think one of the reasons might be that titles can include HTML like <br> or <strong> for visual purposes and not as a part of the title. These shouldn't be displayed in a feed reader.
#28
in reply to:
↑ 27
;
follow-up:
↓ 29
@
10 years ago
Replying to ocean90:
The
strip_tags()call exists since b2, see https://github.com/ericandrewlewis/b2/blame/e0de2a065d3e1d139fbb635e505e7fed5271954e/b2-include/b2template.functions.php#L269.
I think one of the reasons might be that titles can include HTML like<br>or<strong>for visual purposes and not as a part of the title. These shouldn't be displayed in a feed reader.
How would you feel about str_replace() those explicity rather than running them through strip_tags()?
#29
in reply to:
↑ 28
@
10 years ago
- Milestone changed from 4.6 to Future Release
Replying to stevenkword:
How would you feel about str_replace() those explicity rather than running them through strip_tags()?
Those were just two examples, it can be any HTML.
Maybe we can add another filter for the_title_rss which runs before strip_tags() and encodes < which are not part of a HTML tag as <.
#30
@
13 months ago
- Keywords needs-testing needs-refresh added
- Milestone set to Awaiting Review
- Owner stevenkword deleted
Found this going through a list of tickets missing a milestone.
Since it has been a good amount of time since this was last investigated, looking at the current state of this report in the context of today's code base would be a good first step to reviving.
I'm going to remove the ticket owner to make it more clear that no one is actively working on this one.
#31
@
13 months ago
I just tested against a WordPress.com blog (most convenient) and the bug is still there. It seems minor, but it would still be nice to fix. Probably still an issue in .org blogs as well.
#32
@
7 months ago
- Keywords has-test-info added
Reproduction Report
Description
This report validates whether the issue can be reproduced.
Environment
- WordPress: 6.9-alpha-60093-src
- PHP: 8.2.29
- Server: nginx/1.29.0
- Database: mysqli (Server: 8.4.5 / Client: mysqlnd 8.2.29)
- Browser: Chrome 138.0.0.0
- OS: macOS
- Theme: Twenty Fifteen 4.0
- MU Plugins: None activated
- Plugins:
- Hello Dolly 1.7.2
- Test Reports 1.2.0
Actual Results
✅ Error condition does not occur .
- When creating a post with the title
& > test <, the generated RSS/Atom feed encodes the<character as<(numeric entity), not as a missing or stripped character. - The feed output is:
`xml <title> Comments on: & > test < </title>` - The
<character is present and correctly encoded, not stripped.
Additional Notes
- The original concern was that
<was missing from the feed output. However, with the current filter configuration,<is encoded as<.
#33
@
7 months ago
For what it's worth, I just reproduced this with the latest trunk sources. I think that perhaps the test @sukhendu2002 performed made it appear that the bug is fixed because an entity encoding was created by the web admin interface when creating the post. When using the XMLRPC API, for example, to send a post to the blog, the "<" character is still allowed to reach the database, whereupon the bug can be observed.
I don't know that it's necessarily worth fixing after all these years, but the bug still exists.
#34
@
3 weeks ago
- Keywords needs-test-info added; needs-testing has-test-info removed
Given that @redsweater has reproduced it, but has not provided exact instructions, I think it would be nice to have them
#35
@
3 weeks ago
As I mentioned, I'm not sure it's worth fixing this after all these years. Anybody who would work on a fix would probably be able to create arbitrary XML-RPC requests for testing at will, so the steps to reproduce are:
- Establish a test blog e.g. at http://localhost:8889/
- POST to the http://localhost:8889/xmlrpc.php endpoint with XML representing a new post request, where the title is "& > test <"
- Using a newsreader or curl, access the feed for the site:
curl http://localhost:8889/feed/
Expected: the text within <title></title> tag on the post in the feed should include a "<" character at the end.
Actual: It's stripped.
Confirmed in WordPress trunk. The test case example loses the final "<" character, specifically the final output in the feed looks like:
<![CDATA[& > test ]]>