Opened 13 years ago
Closed 13 years ago
#17267 closed defect (bug) (fixed)
twentyeleven_url_grabber() needs work
| Reported by: | nacin | Owned by: | |
| --- | --- | --- | --- |
| Milestone: | 3.2 | Priority: | normal |
| Severity: | normal | Version: | 3.2 |
| Component: | Bundled Theme | Keywords: | has-patch close |
| Focuses: | | Cc: | |
Description
```php
/**
 * Grab the first URL from a Link post
 */
function twentyeleven_url_grabber() {
	global $post, $posts;
	$first_url = '';

	ob_start();
	ob_end_clean();

	$output = preg_match_all( '/<a.+href=[\'"]([^\'"]+)[\'"].*>/i', $post->post_content, $matches );
	$first_url = $matches[1][0];

	if ( empty( $first_url ) )
		return false;

	return $first_url;
}
```
The regex needs some work:

- `a.+href` would match `area href`. While that isn't much of a concern, it would also skip to the second link, as it's greedy.
- Likewise, we should probably use a non-greedy `(.+?)` rather than `([^\'"]+)` for what we're capturing.
- We probably want the `s` pattern modifier there, in case there's a `\n` somewhere, such as after `<a` and before `href`.
- No need for the `.*>` at the end; just grab the href and bail.

Something like this might work: `/<a\s[^>]+href=[\'"](.*?)[\'"]/is`.
Also:

- No need for the `$posts` global to be called upon.
- Rather than `preg_match_all()`, we should just use `preg_match()`, since we only want the first match. If `! preg_match()`, then we should return false. No need to assign the result to `$output`.
- We should arguably use `get_the_content()`, since this is being called within a loop, rather than the raw, unfiltered data.

Anyone know what the output buffering was designed to do?
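The problems listed above can be reproduced with a quick standalone check. The sample content below is made up purely for illustration:

```php
<?php
// Demonstrates the issues described above with the original pattern.
// The content contains an <area> tag, a link broken across two lines,
// and a second single-line link.
$content = "<area href='/map'>\n"
         . "<a\nhref=\"http://example.com/first\">one</a> "
         . "<a href=\"http://example.com/second\">two</a>";

preg_match_all( '/<a.+href=[\'"]([^\'"]+)[\'"].*>/i', $content, $matches );

// "<a.+href" happily matches the <area> tag, so the "first" URL is wrong,
// and without the "s" modifier the multi-line link is skipped entirely.
var_dump( $matches[1][0] ); // string(4) "/map"
```

So the function would return the `<area>` target and never see the actual first link.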
Attachments (2)
Change History (10)
#2 @ 13 years ago
- Keywords has-patch added; needs-patch removed
How's this?

Slight change to nacin's suggested regex: it allows zero non-`>` characters after the space, and it requires the href attribute to contain at least one character.

This also fixes a notice when no link is present.
Related: #17279
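The attached patch isn't shown inline here, but based on this comment's description the adjusted pattern is presumably `/<a\s[^>]*href=[\'"](.+?)[\'"]/is` (`[^>]*` instead of `[^>]+`, `(.+?)` instead of `(.*?)`). A standalone check of that assumed pattern:

```php
<?php
// Sketch of the adjusted pattern as described in this comment (assumed to
// match the attached patch): "[^>]*" lets href come immediately after the
// whitespace, and "(.+?)" requires at least one character inside the quotes.
$pattern = '/<a\s[^>]*href=[\'"](.+?)[\'"]/is';

// A plain "<a href=...>" now matches, even when the tag spans two lines.
// (nacin's "[^>]+" would have demanded something between the space and href.)
preg_match( $pattern, "<a\nhref=\"http://example.com/first\">one</a>", $m );
var_dump( $m[1] ); // string(24) "http://example.com/first"

// An empty href ("") no longer matches at all, avoiding the undefined-index
// notice on $matches[1].
var_dump( preg_match( $pattern, '<a href="">empty</a>' ) ); // int(0)
```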
#3 @ 13 years ago
Nice improvements!
Output buffering is definitely not needed here since the "haystack" is the post content string; it was probably a leftover from a different function.
The updated regex works well in my testing.
We also need to make sure the link is safe; perhaps run it through esc_url_raw() before returning.
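Putting the pieces of this discussion together, the return path might look like the sketch below. The function name `first_link_url()` is a hypothetical stand-in for `twentyeleven_url_grabber()`, and since `esc_url_raw()` only exists inside WordPress, this standalone version falls back to the raw match so it can run on its own:

```php
<?php
// Hypothetical stand-in for twentyeleven_url_grabber(), taking the content
// as a parameter so it runs outside WordPress. Uses the refined pattern
// from the discussion and returns false when no link is found.
function first_link_url( $content ) {
	if ( ! preg_match( '/<a\s[^>]*href=[\'"](.+?)[\'"]/is', $content, $matches ) ) {
		return false;
	}
	// Inside WordPress, sanitize with esc_url_raw() before returning;
	// fall back to the raw capture when running standalone.
	return function_exists( 'esc_url_raw' ) ? esc_url_raw( $matches[1] ) : $matches[1];
}

var_dump( first_link_url( '<a href="http://example.com/">link</a>' ) ); // string(19) "http://example.com/"
var_dump( first_link_url( 'no links here' ) ); // bool(false)
```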