Opened 2 years ago
Closed 2 years ago
#17267 closed defect (bug) (fixed)
twentyeleven_url_grabber() needs work
| Reported by: |
|
Owned by: | |
|---|---|---|---|
| Priority: | normal | Milestone: | 3.2 |
| Component: | Bundled Theme | Version: | 3.2 |
| Severity: | normal | Keywords: | has-patch close |
| Cc: |
Description
/**
* Grab the first URL from a Link post
*/
function twentyeleven_url_grabber() {
global $post, $posts;
$first_url = '';
ob_start();
ob_end_clean();
$output = preg_match_all('/<a.+href=[\'"]([^\'"]+)[\'"].*>/i', $post->post_content, $matches);
$first_url = $matches [1] [0];
if ( empty( $first_url ) )
return false;
return $first_url;
}
The regex needs some work:
- a.+href would match area href. While that isn't much of a concern, it would also skip to the second link as it's greedy.
- Likewise we should probably use a non-greedy (.+?) rather than ([\'"]+) for what we're capturing.
- We probably want the s pattern modifier there, in case there's an \n somewhere such as after <a and before href.
- No need for the .*> at the end, just grab the href and bail.
Something like this might work: '/<a\s[^>]+href=[\'"](.*?)[\'"]/is.
Also:
No need for the $posts global to be called upon.
Rather than preg_match_all, we should just use preg_match, since we want the first one. If ! preg_match, then we should return false. No need to assign the result to $output.
We should arguably use get_the_content() since this is being called in within a loop, rather than the raw, unfiltered data.
Anyone know what the output buffering was designed to do?
Attachments (2)
Change History (10)
comment:2
duck_
— 2 years ago
- Keywords has-patch added; needs-patch removed
How's this?
Slight change to nacin's suggested regex -- can have zero non > characters after the space and matching at least one character in the href attribute.
This also fixes a notice if there is no link present.
Related: #17279
comment:3
lancewillett
— 2 years ago
Nice improvements!
Output buffering is definitely not needed here since the "haystack" is the post content string; it was probably a leftover from a different function.
The updated regex works well in my testing.
comment:4
nacin
— 2 years ago
Note that /** et al. is phpdoc syntax. So any changes to comment text should happen to the text only, not the syntax around it.
duck_'s patch tests well here and looks like a good start. Let's roll with it.
We also need to make sure the link is safe. Perhaps esc_url_raw() before returning.