WordPress.org

Make WordPress Core

Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#15193 closed defect (bug) (fixed)

get_file_data() regex issue

Reported by: scribu Owned by:
Milestone: 3.1 Priority: lowest
Severity: trivial Version:
Component: Plugins Keywords: has-patch
Focuses: Cc:

Description (last modified by scribu)

I always wondered how the plugin/theme header syntax could be so permissive.

It turns out the regex is too permissive.

If the description field is missing, for example, it will match any line at the beginning of a file that contains 'description:' in it, such as:

Some_Class_Description::init();

_e('Edit description:');

Attachments (3)

15193.diff (536 bytes) - added by scribu 3 years ago.
example.php (108 bytes) - added by scribu 3 years ago.
example plugin, demonstrating bug
15193.2.diff (545 bytes) - added by scribu 3 years ago.
slightly more permissive

Download all attachments as: .zip

Change History (25)

comment:1 scribu3 years ago

  • Description modified (diff)

scribu3 years ago

comment:2 scribu3 years ago

  • Keywords has-patch added

scribu3 years ago

example plugin, demonstrating bug

comment:3 Denis-de-Bernardy3 years ago

Couldn't that prevent the following two from working?

# Description: foo

/**
 * Description: foo
 */

comment:4 Denis-de-Bernardy3 years ago

Also:

// Description: foo

comment:5 filosofo3 years ago

I've seen a number of plugins with whitespace before each header, as Denis mentions.

If you want to avoid the double colon of a static method, why not use a negative lookahead instead?

:(?!:)(.*)

comment:6 scribu3 years ago

It's not just static method calls I want to avoid.

Another example of the problem:

_e('Edit description:');

comment:7 scribu3 years ago

  • Description modified (diff)

I always wondered how

comment:8 scribu3 years ago

  • Severity changed from normal to minor

comment:9 nacin3 years ago

  • Keywords needs-patch added; has-patch removed

Also:

#Description:
//Version:

etc.

We can probably get away with preventing alphanumeric characters, i.e. [^a-z0-9]*.

Related/duplicate: #14243

scribu3 years ago

slightly more permissive

comment:10 scribu3 years ago

  • Keywords has-patch added; needs-patch removed

15193.2.diff allows any combination of whitespace, '/', '#' and '*' characters.

I think that's the sweetspot.

comment:11 scribu3 years ago

  • Resolution set to fixed
  • Status changed from new to closed

(In [16215]) Make get_file_data() regex more precise. Fixes #15193

comment:12 nacin3 years ago

  • Priority changed from normal to lowest
  • Resolution fixed deleted
  • Severity changed from minor to trivial
  • Status changed from closed to reopened

There were no unit tests or thorough research through the plugin directory on this. This could very well break a number of plugins for not being inclusive enough.

This is a trivial bug of the lowest priority. No need to rush it. Suggest revert and punt.

comment:13 follow-up: scribu3 years ago

If we can knowingly break many plugins with #14915, I think we can also afford to potentially break a few plugins with extremely weird header definitions.

comment:14 in reply to: ↑ 13 nacin3 years ago

Replying to scribu:

If we can knowingly break many plugins with #14915, I think we can also afford to potentially break a few plugins with extremely weird header definitions.

We don't bargain like that. #14915 was a change deliberately made to fix an important inconsistency. Here, the fix for an extreme edge bug has the potential to introduce some much more noticeable bugs, like a plugin no longer being considered valid and thus deactivating itself.

I'm asking for research, and potentially unit tests. A grep of "Plugin Name:" through the entire repo should show every preceding character in use, and patterns could be drawn.

comment:15 scribu3 years ago

Fair enough.

comment:16 scribu3 years ago

So, based on a random sample of 2800+ plugins, here are all the combinations I found:

Plugin Name
Plugin Name
    *   Plugin name
/*Plugin name
Plugin name
      Plugin Name
     *   Plugin Name
     * Plugin Name
    * Plugin Name
    Plugin Name
   * Plugin Name
   Plugin Name
  Plugin Name
 *   Plugin Name
 *  Plugin Name
 * Plugin Name
 *	Plugin Name
 *Plugin Name
 Plugin Name
//Plugin Name
/* Plugin Name
/*Plugin Name
* Plugin Name
*	Plugin Name
	/*Plugin Name
	*Plugin Name
		Plugin Name
	Plugin Name
Plugin Name
PLUGIN NAME
  preg_match("|Plugin Name
			preg_match("|Plugin Name
                <td width="80px" valign="top">Plugin Name
				<td width="80px" valign="top">Plugin Name
			while ( ( null !== $item_key = array_rand($items) ) && false !== strpos( $items[$item_key]->get_description(), 'Plugin Name

Notice how the last lines shouldn't be recognised as valid headers.

This is the command I used:

grep --exclude-dir=.svn --include=*.php -ri 'Plugin Name:' | cut -d ':' -f 2 | sort | uniq

comment:17 nacin3 years ago

The thoroughness is much appreciated :-)

Kind of concerned about themes, too, especially since CSS isn't required to parse. Also, this won't fix #14243, but I call that an acceptable edge case.

comment:18 scribu3 years ago

Sample: 1775 random themes

Result:

  Original Theme Name
	Original Theme Name
        Theme Name
      Theme Name
    Theme Name
  Theme Name
 *   Theme Name
 * Theme Name
 *Theme Name
 Theme Name
/* Theme Name
/*	Theme Name
/*Theme Name
* Theme Name
/*Theme Name
		Theme Name
	Theme Name
Theme Name

The 'Original Theme Name' is a header defined by a 1 or two themes, after 'Theme Name'

Command used:

ack -G 'style\.css' 'Theme Name:' | cut -d ':' -f 3 | sort | uniq

comment:19 nacin3 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed

Thanks for that! Let's call this one fixed based on the results of the research.

comment:20 scribu3 years ago

(In [16552]) Make get_file_data() regex even more precise. Props hakre. See #15193

comment:21 scribu3 years ago

hakre pointed out that '\s' in regexes causes issues on Windows systems and we're not interested in matching '\n' or '\r' anyway, so I replaced it with ' \t'.

Note: See TracTickets for help on using tickets.