Make WordPress Core

Opened 16 years ago

Closed 16 years ago

#5651 closed defect (bug) (fixed)

Optimize get_plugin_data

Reported by: filosofo's profile filosofo Owned by: jacobsantos's profile jacobsantos
Milestone: 2.7 Priority: normal
Severity: normal Version: 2.5
Component: General Keywords: has-patch get_plugin_data memory plugin-management needs-testing
Focuses: Cc:

Description

When get_plugin_data looks up the plugin's Author, URL, Description, etc., it slurps in the entire plugin file, and runs 5 regexp matches against it. This seems to result in fatal errors for a number of people, as googling will show [1].

Instead, the get_plugin_data should check just the first 30 lines, as my attached patch does.

[1] Some examples of this fatal error in the wild:

Attachments (5)

plugin_data_opt.diff (602 bytes) - added by filosofo 16 years ago.
5651.r6603.diff (2.5 KB) - added by darkdragon 16 years ago.
Adds phpdoc comments along with new function to get plugin contents
4048.2.r6603.diff (3.3 KB) - added by darkdragon 16 years ago.
Second try after suggestions, still based off of r6603 and fixes version info
plugins.txt (34.5 KB) - added by filosofo 16 years ago.
[line number on which "plugin name:" appears]:[plugin name] for plugins in wp-plugins.org repository
3089.5651.patch.diff (5.4 KB) - added by darkdragon 16 years ago.
Combined #3089 completed patch, along with #5651 optimizations, with phpdoc and based off of r6666

Download all attachments as: .zip

Change History (34)

#1 @ryan
16 years ago

Would file_get_contents() with the offset and maxlen args be faster? This function requires php >= 4.3, but given that some 4.3-isms snuck into 2.3 (#5416), maybe we should consider bumping the php requirement for WP 2.5.

#2 @darkdragon
16 years ago

Most of the errors can be solved with the patch in #3141.

Your patch does not solve the issue, but merely only covers a large portion of it. file() is still going to read the entire contents and return an array. So it will still increase the memory requirements. However, basically what the code is doing is exactly what file_get_contents() will return.

If you want to solve the memory issue without #3141, then you need to open the file, read the first 2048 bytes, run that through a regular expression that checks if it contains all of the required information, if not then pull in the next 2048 bytes and recheck and continue until you get everything. At the very most you'll probably only be pulling 4 KB of data. Far less than the size of some plugin files. It is also unlikely that you'll need anything more than 2KB, but to be sure you can pull in 8KB of the file and just hope that everything is within that amount.

#3 @darkdragon
16 years ago

Ha ha, most plugin files I have are more than 2KB, but less than 4KB.

Pulling in the first KB might be better.

@darkdragon
16 years ago

Adds phpdoc comments along with new function to get plugin contents

#4 @darkdragon
16 years ago

New patch uses file_get_contents if it exists. It also has a hook with instructions on what to do in the event that not all of the plugin's data is retrieved in the first KB.

Plugin authors will have to make sure that they contain their information in the first KB.

#5 @filosofo
16 years ago

Excellent!

#6 follow-up: @DD32
16 years ago

  1. get_plugin_data @ since shouldnt be 2.5
  2. never trust plugin authors to have all the data within 1kb at the top of the file, I've come accross several with huge FAQ's at the top of the file (and then the metadata)

My thoughts is that the file should be read until the first comment block is in, then match for details. That'd work for files like this:

/*
Name: Blah
*/

But wouldnt work for some others i've seen lately of:

// Plugin Name: Blaah
// Author: Me!

Unless the reading function was smarter.

While i'd say reading the entire file isnt a problem for the admin since its only called once in awhile, However, as you point out, others are having troubles, with such low memory limits(8M), i'm supprised the rest of WP is loading(Which is shown by furthur pages where install cant run due to it)

So maybe a function to read in the first // comment block, Or to read all the lines starting with if a // is not found? How common is the comment style for metadata?(I've only ever seen it once)

#7 @darkdragon
16 years ago

On the comment of why the function is @since 2.5 it is when the function was added and not when the functionality was added. The note expresses this detail in the comment. I couldn't find the function in my version detail list, so I assumed it was Trunk only (I am missing the latest of 2.3 branch however).

I'll take your suggestion and see what can be done about it.

#8 @DD32
16 years ago

Well here's where it was introduced, if thats any help? [1887]

#9 @DD32
16 years ago

(Seems to be introduced with 1.3-alpha-5 http://trac.wordpress.org/browser/trunk/wp-includes/version.php?rev=1887 -- so why its not in your index.. i'm not sure)

#10 in reply to: ↑ 6 ; follow-up: @filosofo
16 years ago

Replying to DD32:

  1. never trust plugin authors to have all the data within 1kb at the top of the file, I've come accross several with huge FAQ's at the top of the file (and then the metadata)

The Codex says the metadata should be at the top, and it's said that for almost three years, so plugin authors should know better. But what plugin is an example of this?

#11 in reply to: ↑ 10 ; follow-up: @DD32
16 years ago

Replying to filosofo:

The Codex says the metadata should be at the top, and it's said that for almost three years, so plugin authors should know better. But what plugin is an example of this?

Just because documentation specifies how something should be done, Doesnt mean authors will follow it when writing their plugins (As they find it actually works their way.

As for an example.. Heres one(Allthough it's metadata is half way through the lot), 14KB of opening comment: http://svn.wp-plugins.org/google-sitemap-generator/trunk/sitemap.php

@darkdragon
16 years ago

Second try after suggestions, still based off of r6603 and fixes version info

#12 in reply to: ↑ 11 @filosofo
16 years ago

Replying to DD32:

As for an example.. Heres one(Allthough it's metadata is half way through the lot), 14KB of opening comment: http://svn.wp-plugins.org/google-sitemap-generator/trunk/sitemap.php

For that plugin everything up to and including the metadata is 1.6K, so let's assume it's one of the worst offenders and read in 2K.

#13 follow-up: @darkdragon
16 years ago

It was a copy and paste error. It was in my version index after all. I'm not sure how many mistakes there are, as it only happens a few times while doing the phpdoc documentation. However that is why there is a ticket open to fix phpdoc documentation mistakes.

Replying to Filosofo, I have to agree with DD32 on this one. It would introduce a regression if we forced the suggestion on Plugin Developers. The goal is to only pull in enough that we get all of the data, not completely break half the plugins out there.

Something that should be, does not mean it is required. If it was required, then the wording would be must. Should just means you can do it, if you want, but it doesn't matter either way. Or that is what I read from the RFC (little bit complicated wording if you ask me).

Patch completes all of DD32 suggestions. Any more?

#14 follow-up: @darkdragon
16 years ago

Well, the current patch will read in 2k, since it won't find it in the first kB and will pull in the next kB. Which will contain all of the required information.

Can someone try the patch on the sitemap plugin to see if it works?

#15 in reply to: ↑ 14 @DD32
16 years ago

Replying to darkdragon:

Can someone try the patch on the sitemap plugin to see if it works?

Yup, Works for that one.

(Keep in mind, that the XML sitemap plugin was just a lucky shot, first "large"(120KB) plugin file i saw in my plugins folder, just so happened to have a lot of comments too, Theres others out there somewhere..)

#16 @darkdragon
16 years ago

Codex Page

The top of your plugin's main PHP file must contain a standard plugin information header.

Well, oops. It does say must, my bad. Technically the sitemaps plugin is not wrong since it is the at the top. So well I apologize.

#17 @darkdragon
16 years ago

I'm sure, DD32, that what you said might just be the case. Which is why I pull in some padding of 512 bytes. I believe it should be enough for version info, since it isn't part of the check.

I think that the version info should be required, but it isn't up to me to make that decision.

Well, the "padding" is really a hack and I'm not too happy with it, but I think pulling in 2.5 kB is better than 120KB. It might be better to pull in increments of 2kB if a lot of problems start popping up. That might be a good idea as a 2kB file is not going to cause the memory from exhausting.

#18 @darkdragon
16 years ago

Well, part of the problem also is that I'm using stripos(), so in theory, if any batch of the contents contains all of the plugin data fields with semicolons it will cause the loop to break and the info to not be pulled in.

I can't justify doing two preg_matches without returning anything. With the worse case, it might be better to return the matches in the plugin_get_contents instead of a string and allow for get_plugin_data() to finish triming and cleaning the data.

#19 @darkdragon
16 years ago

  • Keywords plugin-management added

#20 @westi
16 years ago

  • Owner changed from anonymous to westi
  • Status changed from new to assigned

#21 in reply to: ↑ 13 @filosofo
16 years ago

Replying to darkdragon:

The goal is to only pull in enough that we get all of the data, not completely break half the plugins out there.

To get an idea of how many plugins are doing what, I grepped through the wp-plugins.org repository for "Plugin Name:," outputting and reverse sorting by the line number where "Plugin Name:" appears. I've attached the result as plugins.txt.

As you can see, the worst offender is "WP List Files" at line 37 immediately followed by Google Sitemaps at line 33.

Some of the lines are different versions of the same plugin, but even so, less than 4% have the "Plugin Name:" after the fifth line. So I think it's safe to say that simply reading in 2K would adversely affect a statistically insignificant number of plugins, if any at all.

@filosofo
16 years ago

[line number on which "plugin name:" appears]:[plugin name] for plugins in wp-plugins.org repository

#22 @darkdragon
16 years ago

That is sweet dude! However, that would just mean that the loop would only need to run once before breaking. For most plugins that most that will be pulled in is 1.5kB, for plugins like Google Sitemaps, the total will be around 2.5kB.

I think that it is quite useful information. Thanks.

#23 follow-up: @ryan
16 years ago

Isn't stripos() PHP 5 only?

http://php.net/stripos

#24 in reply to: ↑ 23 @filosofo
16 years ago

Replying to ryan:

Isn't stripos() PHP 5 only?

Yes, but it's defined in WordPress for earlier versions. See wp-includes/compat.php

#25 @darkdragon
16 years ago

Reviewing the patch again, reveals several missing pieces. Basically the description and two others don't complete and will always be true, even if they haven't been pulled in yet.

Still, the hacks should prevent missing data, but it should be correct regardless. I'll patch this up again this weekend, so that it will finally be ready for commit.

@darkdragon
16 years ago

Combined #3089 completed patch, along with #5651 optimizations, with phpdoc and based off of r6666

#26 @darkdragon
16 years ago

Added patch which combines both #3089 and #5651.

#27 @darkdragon
16 years ago

  • Keywords needs-testing added

#28 @santosj
16 years ago

  • Milestone changed from 2.9 to 2.7
  • Owner changed from westi to jacobsantos
  • Status changed from assigned to new

Need to break the two patches from each other and resubmit patch.

#29 @ryan
16 years ago

  • Resolution set to fixed
  • Status changed from new to closed

(In [8367]) Plugin metadata localization and get_plugin_dta() optimization from santosj. fixes #5651 #3089

Note: See TracTickets for help on using tickets.