Make WordPress Core

Opened 14 years ago

Closed 14 years ago

Last modified 14 years ago

#15193 closed defect (bug) (fixed)

get_file_data() regex issue

Reported by: scribu's profile scribu Owned by:
Milestone: 3.1 Priority: lowest
Severity: trivial Version:
Component: Plugins Keywords: has-patch
Focuses: Cc:

Description (last modified by scribu)

I always wondered how the plugin/theme header syntax could be so permissive.

It turns out the regex is too permissive.

If the description field is missing, for example, it will match any line at the beginning of a file that contains 'description:' in it, such as:

Some_Class_Description::init();

_e('Edit description:');

Attachments (3)

15193.diff (536 bytes) - added by scribu 14 years ago.
example.php (108 bytes) - added by scribu 14 years ago.
example plugin, demonstrating bug
15193.2.diff (545 bytes) - added by scribu 14 years ago.
slightly more permissive

Download all attachments as: .zip

Change History (25)

#1 @scribu
14 years ago

  • Description modified (diff)

@scribu
14 years ago

#2 @scribu
14 years ago

  • Keywords has-patch added

@scribu
14 years ago

example plugin, demonstrating bug

#3 @Denis-de-Bernardy
14 years ago

Couldn't that prevent the following two from working?

# Description: foo

/**
 * Description: foo
 */

#4 @Denis-de-Bernardy
14 years ago

Also:

// Description: foo

#5 @filosofo
14 years ago

I've seen a number of plugins with whitespace before each header, as Denis mentions.

If you want to avoid the double colon of a static method, why not use a negative lookahead instead?

:(?!:)(.*)

#6 @scribu
14 years ago

It's not just static method calls I want to avoid.

Another example of the problem:

_e('Edit description:');

#7 @scribu
14 years ago

  • Description modified (diff)

I always wondered how

#8 @scribu
14 years ago

  • Severity changed from normal to minor

#9 @nacin
14 years ago

  • Keywords needs-patch added; has-patch removed

Also:

#Description:
//Version:

etc.

We can probably get away with preventing alphanumeric characters, i.e. [^a-z0-9]*.

Related/duplicate: #14243

@scribu
14 years ago

slightly more permissive

#10 @scribu
14 years ago

  • Keywords has-patch added; needs-patch removed

15193.2.diff allows any combination of whitespace, '/', '#' and '*' characters.

I think that's the sweetspot.

#11 @scribu
14 years ago

  • Resolution set to fixed
  • Status changed from new to closed

(In [16215]) Make get_file_data() regex more precise. Fixes #15193

#12 @nacin
14 years ago

  • Priority changed from normal to lowest
  • Resolution fixed deleted
  • Severity changed from minor to trivial
  • Status changed from closed to reopened

There were no unit tests or thorough research through the plugin directory on this. This could very well break a number of plugins for not being inclusive enough.

This is a trivial bug of the lowest priority. No need to rush it. Suggest revert and punt.

#13 follow-up: @scribu
14 years ago

If we can knowingly break many plugins with #14915, I think we can also afford to potentially break a few plugins with extremely weird header definitions.

#14 in reply to: ↑ 13 @nacin
14 years ago

Replying to scribu:

If we can knowingly break many plugins with #14915, I think we can also afford to potentially break a few plugins with extremely weird header definitions.

We don't bargain like that. #14915 was a change deliberately made to fix an important inconsistency. Here, the fix for an extreme edge bug has the potential to introduce some much more noticeable bugs, like a plugin no longer being considered valid and thus deactivating itself.

I'm asking for research, and potentially unit tests. A grep of "Plugin Name:" through the entire repo should show every preceding character in use, and patterns could be drawn.

#15 @scribu
14 years ago

Fair enough.

#16 @scribu
14 years ago

So, based on a random sample of 2800+ plugins, here are all the combinations I found:

Plugin Name
Plugin Name
    *   Plugin name
/*Plugin name
Plugin name
      Plugin Name
     *   Plugin Name
     * Plugin Name
    * Plugin Name
    Plugin Name
   * Plugin Name
   Plugin Name
  Plugin Name
 *   Plugin Name
 *  Plugin Name
 * Plugin Name
 *	Plugin Name
 *Plugin Name
 Plugin Name
//Plugin Name
/* Plugin Name
/*Plugin Name
* Plugin Name
*	Plugin Name
	/*Plugin Name
	*Plugin Name
		Plugin Name
	Plugin Name
Plugin Name
PLUGIN NAME
  preg_match("|Plugin Name
			preg_match("|Plugin Name
                <td width="80px" valign="top">Plugin Name
				<td width="80px" valign="top">Plugin Name
			while ( ( null !== $item_key = array_rand($items) ) && false !== strpos( $items[$item_key]->get_description(), 'Plugin Name

Notice how the last lines shouldn't be recognised as valid headers.

This is the command I used:

grep --exclude-dir=.svn --include=*.php -ri 'Plugin Name:' | cut -d ':' -f 2 | sort | uniq

#17 @nacin
14 years ago

The thoroughness is much appreciated :-)

Kind of concerned about themes, too, especially since CSS isn't required to parse. Also, this won't fix #14243, but I call that an acceptable edge case.

#18 @scribu
14 years ago

Sample: 1775 random themes

Result:

  Original Theme Name
	Original Theme Name
        Theme Name
      Theme Name
    Theme Name
  Theme Name
 *   Theme Name
 * Theme Name
 *Theme Name
 Theme Name
/* Theme Name
/*	Theme Name
/*Theme Name
* Theme Name
/*Theme Name
		Theme Name
	Theme Name
Theme Name

The 'Original Theme Name' is a header defined by a 1 or two themes, after 'Theme Name'

Command used:

ack -G 'style\.css' 'Theme Name:' | cut -d ':' -f 3 | sort | uniq

#19 @nacin
14 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed

Thanks for that! Let's call this one fixed based on the results of the research.

#20 @scribu
14 years ago

(In [16552]) Make get_file_data() regex even more precise. Props hakre. See #15193

#21 @scribu
14 years ago

hakre pointed out that '\s' in regexes causes issues on Windows systems and we're not interested in matching '\n' or '\r' anyway, so I replaced it with ' \t'.

Note: See TracTickets for help on using tickets.