Opened 15 years ago
Closed 15 years ago
#17267 closed defect (bug) (fixed)
twentyeleven_url_grabber() needs work
| Reported by: |
|
Owned by: | |
|---|---|---|---|
| Milestone: | 3.2 | Priority: | normal |
| Severity: | normal | Version: | 3.2 |
| Component: | Bundled Theme | Keywords: | has-patch close |
| Focuses: | Cc: |
Description
/**
* Grab the first URL from a Link post
*/
function twentyeleven_url_grabber() {
global $post, $posts;
$first_url = '';
ob_start();
ob_end_clean();
$output = preg_match_all('/<a.+href=[\'"]([^\'"]+)[\'"].*>/i', $post->post_content, $matches);
$first_url = $matches [1] [0];
if ( empty( $first_url ) )
return false;
return $first_url;
}
The regex needs some work:
a.+hrefwould matcharea href. While that isn't much of a concern, it would also skip to the second link as it's greedy.- Likewise we should probably use a non-greedy
(.+?)rather than([\'"]+)for what we're capturing. - We probably want the
spattern modifier there, in case there's an\nsomewhere such as after<aand beforehref. - No need for the
.*>at the end, just grab the href and bail.
Something like this might work: '/<a\s[^>]+href=[\'"](.*?)[\'"]/is.
Also:
No need for the $posts global to be called upon.
Rather than preg_match_all, we should just use preg_match, since we want the first one. If ! preg_match, then we should return false. No need to assign the result to $output.
We should arguably use get_the_content() since this is being called in within a loop, rather than the raw, unfiltered data.
Anyone know what the output buffering was designed to do?
Attachments (2)
Change History (10)
#2
@
15 years ago
- Keywords has-patch added; needs-patch removed
How's this?
Slight change to nacin's suggested regex -- can have zero non > characters after the space and matching at least one character in the href attribute.
This also fixes a notice if there is no link present.
Related: #17279
#3
@
15 years ago
Nice improvements!
Output buffering is definitely not needed here since the "haystack" is the post content string; it was probably a leftover from a different function.
The updated regex works well in my testing.
We also need to make sure the link is safe. Perhaps esc_url_raw() before returning.