WordPress.org

Make WordPress Core

Opened 10 years ago

Closed 10 years ago

Last modified 10 years ago

#4547 closed enhancement (wontfix)

Convert WP from magpie to Simplepie

Reported by: technosailor Owned by: technosailor
Milestone: Priority: normal
Severity: normal Version: 2.3
Component: General Keywords: simplepie magpie rss
Focuses: Cc:

Description

Moving to convert WP from magpie to Simplepie - this time without the drama of broken feeds.

First attempt. Will pick up tomorrow again.

Questions for me revolve around how we want to handle RSS caching and whether there is API that I don't see that uses magpie.

Specifically in caching, do we want to add a disk cache folder somewhere? I don't like this idea because it requires manual intervention from the user to set folder permissions. Can we object cache this?

Attachments (2)

simplepieconversion-take2.diff (7.6 KB) - added by technosailor 10 years ago.
4547.diff (1.2 KB) - added by Otto42 10 years ago.
Allows a plugin to not load Magpie at all, thus allowing for true replacement.

Download all attachments as: .zip

Change History (31)

#1 @technosailor
10 years ago

And the patch is oversize. Grab it [here http://www.technosailor.com/downloads/simplepie-switchover-take1.diff] instead. Maybe I need to figure out what we don't need from Simplepie.

#2 @foolswisdom
10 years ago

  • Summary changed from Simplepie to Convert WP from magpie to Simplepie

#3236 described Simplepie min. req being PHP 4.3.x

#2864 previous ticket for this issue.

#3 follow-up: @technosailor
10 years ago

#3236 was closed as invalid considering it requires PHP 4.3. Considering we have Atom 1.0 feeds now _and can't even parse them_, I consider this reason to up to PHP 4.3.

#2864 was deleted.

I don't see why either of those tickets should be revisited in this case, though I don't really care where the discussion happens. Obviously, we need to re-examine our minimum compat level though. My patches changes the two and only two places where RSS parsing in the core to use SP instead of MP occurs. We can leave magpie bundled for plugins.

Are we saying this is unimportant?

#4 in reply to: ↑ 3 @foolswisdom
10 years ago

Replying to technosailor:

Are we saying this is unimportant?

I am only connecting some dots. I do not have any expertise in this area.

#5 follow-up: @markjaquith
10 years ago

Tasks:

  • Determine why SimplePie requires PHP 4.3, and see if it is feasible to code a compatibility layer to keep WP requirements static.
  • Determine which WP feed API functions need to be repointed to use SimplePie, and repoint them.
  • Set up additional compatibility layer, if needed (i.e. naming of object properties).
  • Have SimplePie use existing DB caching system.
  • Mark MagpieRSS as a deprecated class (for plugins that use it directly).
  • Set up a schedule for eventual removal of MagpieRSS.
  • Communicate that schedule to plugin authors.

What else?

#6 @markjaquith
10 years ago

Also, some reasons for wanting to do this.

  • MagpieRSS has ceased to be updated
  • MagpieRSS does not support Atom 1.0
  • SimplePie is actively maintained
  • SimplePie supports Atom 1.0

#7 follow-up: @matt
10 years ago

I worry about the complexity and weight of including arbitrary feed parsing in WP.

Perhaps we should think about scaling back our parsing, right now we use RSS as a lightweight title/link content API for news and links. Even Magpie is a tad heavy for this.

In theory I see the appeal of being able to consume any type of feed on the planet, including handling malformed feeds and following Postel's law, but does it really facilitate WP's core mission of making it easier to publish on the web? Or is it plugin territory?

#8 @Otto42
10 years ago

Definitely should be core. People want to read other sites RSS feeds. Whether they want to put them in the sidebar, or whether they want to put them in the main page, it's something that there are lots of requests for. The fetch_rss() page in the codex gets a lot of hits and a lot of questions about how to make it do different things on the support forums.

Making this sort of thing into plugin territory means that every RSS plugin is going to include its own feed-reading stuff, making things difficult, heavier, more complex. Consider that we also have things like Prototype in the core code, both because it's used, but also because it prevents conflicts when several plugins all try to include it as well.

Suggestion: Make the thing not loaded by default. This is currently the case with the rss.php, so there's no reason not to also make it the case for simplepie. Then you only get the weight of it when you actually use it.

#9 @Otto42
10 years ago

technosailor: Suggestion: Instead of cutting/pasting the SimplePie code into rss.php, leave it as a separate file (simplepie.php) and then include it. This allows easy upgrades of the simplepie code later. The backward compatibility stuff can stay in rss.php, of course.

#10 @technosailor
10 years ago

Thanks for candid and helpful suggestions.

@Mark: A little bit of work for me, huh?

@Matt: I think removing RSS parsing from the core is suicide.

@Otto: Yeah, stab 2 will be that way. I don't know what Trac's upload limitation is, but class-Simplepie.php is just around 300kb so maybe it will fit in on the next patch too :)

#11 in reply to: ↑ 7 @markjaquith
10 years ago

Replying to matt:

I worry about the complexity and weight of including arbitrary feed parsing in WP.

Weight, yes... I agree there. But keep in mind that it will only be loaded as needed. Complexity comes in that it is an external library that we have to update ... we have many of these.

Perhaps we should think about scaling back our parsing, right now we use RSS as a lightweight title/link content API for news and links. Even Magpie is a tad heavy for this.

In theory I see the appeal of being able to consume any type of feed on the planet, including handling malformed feeds and following Postel's law, but does it really facilitate WP's core mission of making it easier to publish on the web? Or is it plugin territory?

It facilitates the core mission if you consider republishing to be a form of publishing. Some of the more complex uses of an internal feed parser are plugin territory, but:

  • we need feed parsing for the Dashboard
  • we need feed parsing for the RSS widget (with Atom 1.0 capabilities)
  • we could even use feed parsing for plugin/core update notifications
  • it provides a standard way for all plugins that need feed parsing to get it, without each including their own solution
  • what is the alternative? Axe feed consuming from core entirely?

#12 follow-up: @matt
10 years ago

Load weight isn't a big deal, as you said it's only loaded when needed. I'm more concerned with the weight of maintaining and securing so much new code, related to the comment I left on your blog.

As an alternative I see either maintaining the status quo with the package we currently use, which parses the feeds we use Good Enough, or creating an even more simplistic parser designed just to work with our dashboard feeds. (It could also be a lot more memory/space efficient if it didn't store the whole feed.)

I don't buy the argument of parsing more types of feeds than we currently do as enough to justify the change, because even SimplePie, which is way better than Magpie from what I can see, is still pretty limited when compared to what Google does or UFP, particularly with regard to malformed or invalid feeds.

If parsing everything possible is our goal, we should probably use a web service like Google's feed API, not switch out one almost-there library for another.

#13 @markjaquith
10 years ago

  • Type changed from defect to task

What cramps our style here is the RSS widget. With MagpieRSS as our feed backend, the RSS widget can't parse Atom feeds. If we remove the RSS widget, people who use it aren't going to be happy.

If we can get past that hurdle, I'd be open to creating a quick-and-dirty parser for the dashboard feeds and taking general feed parsing support out of core.

Using a JavaScript-powered third party web service for feeds just sounds like a bad idea to me. And the status quo stinks. If we're going to support general feed parsing, we should be able to parse the formats we produce.

#14 in reply to: ↑ 12 ; follow-up: @Otto42
10 years ago

Replying to matt:

As an alternative I see either maintaining the status quo with the package we currently use, which parses the feeds we use Good Enough, or creating an even more simplistic parser designed just to work with our dashboard feeds. (It could also be a lot more memory/space efficient if it didn't store the whole feed.)

With regards to this argument, I don't think that "rolling our own" is a particularly good idea for a few reasons:

  • Rolling our own feed parser means that we're only likely to support what Wordpress itself needs (the dashboard feeds), making that particular code useless to everybody else. There's no way for general feed functionality to be used by other people (RSS widget, any other use of fetch_rss() in existence).
  • Maintenance becomes a pain in the arse, as people want to add new features to the feed parser in order to support whatever their feeds need.
  • Why reinvent the wheel? Using an existing, well-supported, and actively maintained library means that upgrading is a drop-in operation, with minor changes needed to accommodate changes. We do this with all the javascript code now (prototype, jquery, etc) so where's the problem?


I don't buy the argument of parsing more types of feeds than we currently do as enough to justify the change, because even SimplePie, which is way better than Magpie from what I can see, is still pretty limited when compared to what Google does or UFP, particularly with regard to malformed or invalid feeds.

You don't necessarily need "more feeds" to justify the change. The fact that MagpieRSS is basically dead code should be enough to switch away from it. The fact that it uses Snoopy (which has issues) instead of having its own cross-server compatible URL retrieval (using curl if available, fopen if not, etc) should be enough to switch away from it. All major PHP web projects that used Magpie in the past have switched to SimplePie because the switchover is relatively simple and straightforward, the functionality so much better, and it's still being actively maintained.

If you've tried to use Magpie to parse virtually any of Google's feeds, for example, you'd rapidly learn to appreciate the simplicity and flexibility of SimplePie. Magpie doesn't even fully support RSS feeds, much less Atom feeds.

If parsing everything possible is our goal, we should probably use a web service like Google's feed API, not switch out one almost-there library for another.

Using Google's AJAX API is certainly a possibility for the dashboard, however it doesn't solve the more generic problem of reading feeds for other purposes. It would be beneficial to have generic feed reading capabilities in Wordpress because there are many, many other reasons to read feeds than just displaying news on the dashboard.

The alternative is to eliminate the RSS widget, remove all feed parsing functionality from Wordpress, and then let plugins do it along with all the conflicts that that will introduce as you have half a dozen plugins all trying to load feed parsing libraries.

Want to display flickr pictures? You need a feed parser. Want to display anything from any of Google's services? You need a feed parser. If you want to pull anything from any third party service, the short of it is that you need a feed parser. Having a standardized one that works for most, if not all, cases just makes sense.

#15 @technosailor
10 years ago

Take 2 attached. Also, grab class-Simplepie.php from http://www.technosailor.com/downloads/class-Simplepie.diff

#16 in reply to: ↑ 5 @link92
10 years ago

Replying to markjaquith:

Determine why SimplePie requires PHP 4.3, and see if it is feasible to code a compatibility layer to keep WP requirements static.

As of r775, SP supports PHP4.1 and up.

Have SimplePie use existing DB caching system.

See my patch in prior bug.

#17 in reply to: ↑ 14 ; follow-up: @matt
10 years ago

Replying to Otto42:

Using Google's AJAX API is certainly a possibility for the dashboard, however it doesn't solve the more generic problem of reading feeds for other purposes. It would be beneficial to have generic feed reading capabilities in Wordpress because there are many, many other reasons to read feeds than just displaying news on the dashboard.

Their API returns normalized data structures that are much easier to parse than RSS/Atom, and it's forward-compatible. I think it would give us the combined benefits of supporting all types of feeds current and future, no ongoing maintenance on our part, and reducing the size and complexity of what's bundled with WP.

#18 in reply to: ↑ 17 ; follow-up: @Otto42
10 years ago

Replying to matt:

Their API returns normalized data structures that are much easier to parse than RSS/Atom, and it's forward-compatible. I think it would give us the combined benefits of supporting all types of feeds current and future, no ongoing maintenance on our part, and reducing the size and complexity of what's bundled with WP.

Their API also doesn't give you any control, as it doesn't give you the actual content of the feed directly, it first goes through Google's once-an-hour-maximum caching mechanism and numerous other things. A lot of potentially useful data in the feed is also scrubbed in the process. In other words, you only get what Google supports, not the actual unadulterated feed.

Frankly, I'd prefer an API that would give me the actual content of the feed that I am wanting to get, not some kind of weird thing that may or may not work for any particular application that I may or may not be using.

#19 in reply to: ↑ 18 ; follow-up: @Otto42
10 years ago

Oh, and one more thing: Google's mechanism is Javascript-only. It only works in the browser. I can't get data from a feed and, say, store it in the database or anything.

#20 @technosailor
10 years ago

As far as I can tell, the only one opposing this is Matt. It seems a bit like tags in 2.2. Matt, I don't know if you're just playing devil's advocate but either way, you're outnumbered.

My argument is much more philisophical. Otto makes a good case for the technical and practical side of the argument. However, from where I sit, feeds are becoming more about ubiquitous content. We don't know when, where, how or why content is being used and in what form it's being used. WP has done a good job of providing it's own content in a variety of formats. The reality is WP is also on the consuming side of content and we are bottlenecked right now. Plugin authors can't write plugins to do feed consummation effectively unless they bundle their own feed library and that on it's face is just silly when we already provide a library. Why should a plugin author have to bundle SimplePie when WP is offering Magpie? Quite simply, magpie sucks and by bundling both... well again, that's just silly.

WP should be offering Simplepie because

1) It's better than Magpie
2) It makes it easier for plugin authors to write RSS-consuming plugins
3) It meets the "Code is Poetry" mantra of WP by providing clean and simple functions to do all the stuff a library should do. Magpie is just fugly.
4) Upgrading Simplepie is as clean as dropping an upgraded library into class-Simplepie.php. No fuss, no muss, no editing. Matt, why is upgrading hard?
5) Provides WP a future-proofed method of consuming feeds in new and different ways not yet in this iteration of WP.
6) Because feed consumption in a blogging software should not be relegated to "second rate" functionality or as additional pieces of flair. Feed consumption is _essential_ to blogging and blogging software. Let's stop rying to pretend it can be worked around.

#21 in reply to: ↑ 19 @matt
10 years ago

Replying to Otto42:

Oh, and one more thing: Google's mechanism is Javascript-only. It only works in the browser. I can't get data from a feed and, say, store it in the database or anything.

That's not true.

#22 follow-up: @matt
10 years ago

  • Milestone 2.3 (trunk) deleted
  • Resolution set to wontfix
  • Status changed from new to closed
  • Type changed from task to enhancement

Don't forget the WikiFormatting that Trac needs so your points don't run together.

1) It's better than Magpie

Where "better" means different, and parsing more types of feeds. Like all new code, I'm sure it has bugs and security problems. It is larger than phpmailer, snoopy, IXR, pop3, and KSES combined. I'm sure it's fine code, but that's a lot of bulk for something that (1) generates very little complaints or support right now and (2) works for what we need it for.

2) It makes it easier for plugin authors to write RSS-consuming plugins

Plugin authors can and will bundle their own libraries. If the worst thing that happens is two RSS-consuming plugins bundle the same library, don't do a is_defined type check to see if it's already there, and are loaded at the same time, I don't see that as a big deal.

3) It meets the "Code is Poetry" mantra of WP by providing clean and simple functions to do all the stuff a library should do. Magpie is just fugly.

There is lots of nice code in the world, we shouldn't include it all. Our API functions in front of the library are going to stay the same anyway.

4) Upgrading Simplepie is as clean as dropping an upgraded library into class-Simplepie.php. No fuss, no muss, no editing. Matt, why is upgrading hard?

Are you saying you've audited all 300k of that code?

5) Provides WP a future-proofed method of consuming feeds in new and different ways not yet in this iteration of WP.

It's not actually future proof in any sense of the word. Historically feed formats have changed several times in the short lifetime of WP, and even the interpretation of feed elements have evolved over time.

6) Because feed consumption in a blogging software should not be relegated to "second rate" functionality or as additional pieces of flair. Feed consumption is _essential_ to blogging and blogging software.

Repeating yourself doesn't make it more true. If feed consumption was as essential as you say, and magpie was as broken as you suppose, then no one would use WP. Pick a side.

Write a great Simplepie plugin, put it in the plugin dir, and we can track its usage. If it's popular, that shows a market demand for it and we can look at bringing it into core. We're at the point in WP where we need to focus on improving the most-used parts, rather than adding a third of a megabyte for edge functionality.

#23 in reply to: ↑ 22 @Otto42
10 years ago

Replying to matt:

That's not true.

Okay, so what am I missing? How would I run the Google feed API on my own server, considering that it's AJAX only?

Replying to matt:

We're at the point in WP where we need to focus on improving the most-used parts, rather than adding a third of a megabyte for edge functionality.

Calling it edge functionality over and over again doesn't make that true either.

#24 @markjaquith
10 years ago

Write a great Simplepie plugin, put it in the plugin dir, and we can track its usage.

This should enable that:

(In [5776]) Make it possible for a plugin to replace the feed-parsing engine. see: #4547

Have your plugin hook into the 'load_feed_engine' action, include your feed engine and define the WP feed API functions that you want to replace.

At least now we can have a third party solution.

@Otto42
10 years ago

Allows a plugin to not load Magpie at all, thus allowing for true replacement.

#25 @Otto42
10 years ago

markjaquith: That addition is great and all, but it doesn't allow you to replace Magpie because wp_rss and get_rss are not actually used anywhere in WordPress.

In order to replace the RSS features with a plugin, you need to be able to replace fetch_rss and emulate its behavior.

Also, with just that action at the top, there's still no way to prevent the Magpie class from loading anyway. I suggest the attached patch, which will let you define SKIP_MAGPIE and not load any of the whole mess, thus allowing for a REAL replacement feed plugin.

#26 @markjaquith
10 years ago

It was intentional that you couldn't stop Magpie from loading -- for maximum backwards compatibility with any plugins that might be calling it directly. Lack of fetch_rss() replacement ability was an oversight. I'll !function_exists()-wrap fetch_rss() as a stopgap measure.

Also, I think your patch has it backwards!

#27 @markjaquith
10 years ago

(In [5845]) function_exists() wrapper around fetch_rss() to allow for plugin replacement. Props Otto42. see #4547

#28 follow-up: @markjaquith
10 years ago

I didn't make this clear:

Although I intentionally prevented the hook from replacing the Magpie class, I'm open to arguments for why it should be able to do so. The best argument that comes to mind is a reduction of the amount of PHP code that has to be parsed... is there another?

#29 in reply to: ↑ 28 @Otto42
10 years ago

Replying to markjaquith:

Also, I think your patch has it backwards!

Argh. I swear that there was a ! there when I wrote it!

Replying to markjaquith:

Although I intentionally prevented the hook from replacing the Magpie class, I'm open to arguments for why it should be able to do so. The best argument that comes to mind is a reduction of the amount of PHP code that has to be parsed... is there another?

Seems like enough to me. There's a huge amount of code there that is essentially useless if you're writing a plugin to replace it. All of the rss.php, all of the Snoopy stuff, etc. I mean, with the current codebase (2.2.1) I dropped my replacement code over the file entirely instead of using a plugin method, if you're replacing those functions, there's no reason to keep the old stuff.

Note: See TracTickets for help on using tickets.