WordPress.org

Make WordPress Core

Opened 5 years ago

Closed 5 years ago

#23572 closed enhancement (fixed)

Add functions to parse Audio / Video data out of arbitrary content or a post

Reported by: wonderboymusic Owned by: markjaquith
Milestone: 3.6 Priority: normal
Severity: normal Version:
Component: Post Formats Keywords: needs-unit-tests has-patch commit needs-codex
Focuses: Cc:

Description

Scenarios

Audio

After realizing that a post has a format of audio but no metadata for audio:

  • Theme does not support Audio, but the content contains an <audio>, <object>, <embed>, or <iframe>
  • Theme does not support Audio, but the content is a URL
  • Theme does not support Audio, but the first line of content is a URL followed by a new line(s) and some commentary
  • Theme does not support Audio, but the post has attached audio, and core or a plugin has an [audio] shortcode
  • Theme does not support Audio, but the post has attached audio, and core or a plugin does not have an [audio] shortcode

Introduce the_audio(), get_the_audio( $id = 0 ), get_post_audio( $post_id = 0 ), get_content_audio( &$content, $remove = false )

Video

After realizing that a post has a format of video but no metadata for video:

  • Theme does not support Video, but the content contains an <video>, <object>, <embed>, or <iframe>
  • Theme does not support Video, but the content is a URL
  • Theme does not support Video, but the first line of content is a URL followed by a new line(s) and some commentary
  • Theme does not support Video, but the post has attached video, and core or a plugin has an [video] shortcode
  • Theme does not support Video, but the post has attached video, and core or a plugin does not have an [video] shortcode

Introduce the_video(), get_the_video( $id = 0 ), get_post_video( $post_id = 0 ), get_content_video( &$content, $remove = false )

Helper functions

Introduce get_tag_regex() and shortcode_exists( $tag )

Attachments (13)

23572.diff (9.0 KB) - added by wonderboymusic 5 years ago.
23572.2.diff (5.4 KB) - added by wonderboymusic 5 years ago.
23572.3.diff (6.2 KB) - added by wonderboymusic 5 years ago.
23572-html.diff (13.1 KB) - added by wonderboymusic 5 years ago.
23572-html.2.diff (11.6 KB) - added by wonderboymusic 5 years ago.
23572-html.3.diff (13.5 KB) - added by wonderboymusic 5 years ago.
23572-html.4.diff (13.8 KB) - added by wonderboymusic 5 years ago.
23572-html.5.diff (13.8 KB) - added by wonderboymusic 5 years ago.
23572-html.6.diff (20.8 KB) - added by wonderboymusic 5 years ago.
23572-html.7.diff (19.7 KB) - added by wonderboymusic 5 years ago.
23572-html.8.diff (19.4 KB) - added by markjaquith 5 years ago.
remove already-added function
23572-html.9.diff (19.5 KB) - added by wonderboymusic 5 years ago.
23572-html.10.diff (23.4 KB) - added by wonderboymusic 5 years ago.

Download all attachments as: .zip

Change History (67)

@wonderboymusic
5 years ago

#1 @helen
5 years ago

  • Component changed from Media to Post Formats

#2 @lancewillett
5 years ago

  • Cc lancewillett added

#3 @beaulebens
5 years ago

  • Cc beau@… added

#4 @lancewillett
5 years ago

For support in default themes, see #23620 -- only Twenty Thirteen needs this right now.

#5 @wonderboymusic
5 years ago

  • Keywords needs-refresh added; has-patch removed

This ticket really requires #23282 and needs to be refreshed after it is committed

#6 @nacin
5 years ago

  • Keywords needs-unit-tests added

The extraction code definitely needs unit tests.

Probably no need for so many functions. wp_get_post_videos( $post ) and wp_get_post_audio( $post ) both returning arrays should be a sufficient starting point. In general I would start with general utility functions and then add more as we realize how best these will be used in a theme.

#7 follow-up: @wonderboymusic
5 years ago

  • Keywords has-patch added; needs-refresh removed

Probably no need for so many functions.

This is similar to slurping images / galleries from a content blob. There are so many different ways it can be in there, we need 6 extraction functions, and here they are:

  • get_post_audio() - wraps get_children(), returns attached audio, impl'd in #23282
  • get_post_video() - wraps get_children(), returns attached video, impl'd in #23282
  • get_content_audio() - slurps shortcodes out of the passed content and returns a list of lists: each shortcode can have multiple <source>s (the primary audio <source> and its fallbacks)
  • get_content_video() - slurps shortcodes out of the passed content and returns a list of lists: each shortcode can have multiple <source>s (the primary video <source> and its fallbacks)
  • get_embedded_audio() - returns a list: slurps embed HTML (<object>,<embed>,<iframe>) and HTML5 <audio> tags out of content, or if the content is a URL or starts with a URL, returns that as well
  • get_embedded_video() - returns a list: slurps embed HTML (<object>,<embed>,<iframe>) and HTML5 <video> tags out of content, or if the content is a URL or starts with a URL, returns that as well

Patch updated. Unit Tests forthcoming.

#8 in reply to: ↑ 7 @DrewAPicture
5 years ago

  • Keywords dev-feedback added

Replying to wonderboymusic:

Probably no need for so many functions.

This is similar to slurping images / galleries from a content blob. There are so many different ways it can be in there, we need 6 extraction functions, and here they are:

Looking at these six functions, seems like there's a lot of duplication between the pairs. I could see reducing six to three and adding a media flag with a switch:

  • get_post_media( 'audio|video' ) (#23282)
  • get_content_media ( 'audio|video' )
  • get_embedded_media( 'audio|video' )

Also happy to make patch(es) to do just that.

#9 @DrewAPicture
5 years ago

  • Cc xoodrew@… added

#10 @wonderboymusic
5 years ago

  • Keywords dev-feedback removed

patch updated, functions are generic now, etc

#11 @markjaquith
5 years ago

In 23774:

Add functions to parse audio or video data out of arbitrary content or a post

props wonderboymusic. see #23572

#12 @markjaquith
5 years ago

Filters for this too?

#13 follow-up: @lancewillett
5 years ago

@wonderboymusic -- two questions as we dig into integrating this into Twenty Thirteen.

  1. For get_content_images() and get_content_image() could we add an option to return the full <img /> HTML string instead of just the src attribute value?

Right now we'll constrain the display with CSS, but ideally we'd be able to use the HTML that was inserted into the editor so we get the correctly sized img output HTML without having to rebuild it.

Or, even better: return the attachment IDs of the images, if the images strings looks like a WP attachment. That way we can use wp_get_attachment_image() on the image instead of a) trusting the returned HTML and b) trying to build the image HTML string from scratch.

Reasoning: In Twenty Thirteen after we get the bare image src URL we have to turn around and rebuild an img tag again.

  1. get_content_media() -- should this function be outputting the final HTML or just returning bare URLs?
@return array A list of lists. Each item has a list of sources corresponding
 *		to a [{media type}]'s primary src and specified fallbacks

Looks to me like this is only returning a URL now, right? We get the "Primary src" back. (Also, what would be "specified callbacks" here)?

Reasoning: In Twenty Thirteen after we get the bare URL we have to then turn around and re-parse with wp_oembed_get() or similar to make it into the correct output HTML again.

#14 @wonderboymusic
5 years ago

yes, I can see how that is weird - its probably easy to change it to just return the HTML and skip the src regex - I will take a look tonight

#15 @obenland
5 years ago

Looking at Lance's patch for 2013 there is two more things that come to mind for me:

  1. You probably talked about this earlier, but I think it would be beneficial to have template tags a la the_video() as a single point of reference for themes. Apparently there are up to four ways to find a video - we should leave it to themes to come up with an abstraction layer.
  2. It's a pain in the butt to remove a [video] shortcode from the post's content through get_content_media(). Can we come up with a (default) way to remove it so that it is removed from the_content()?

#16 in reply to: ↑ 13 @DrewAPicture
5 years ago

Replying to lancewillett:

Right now we'll constrain the display with CSS, but ideally we'd be able to use the HTML that was inserted into the editor so we get the correctly sized img output HTML without having to rebuild it.

Replying to wonderboymusic:

yes, I can see how that is weird - its probably easy to change it to just return the HTML and skip the src regex - I will take a look tonight

I think we need to be careful about pigeonholing ourselves into only returning the source or a built HTML tag.

Sometimes people want the source, sometimes they want the markup and using a bool $src parameter or similar could accomplish that.

I'd just like to see us avoid a situation where we need two functions to do basically the same thing with a different return, such as with wp_get_attachment_image() and wp_get_attachment_image_src()

#17 @wonderboymusic
5 years ago

  • Keywords commit added

attachment:23572-html.diff changes the function signatures for the following functions:

  • get_content_image( &$content, $html = true, $remove = false )
  • get_content_images( &$content, $html = true, $remove = false, $limit = 0 )
  • get_content_video( &$content, $html = true, $remove = false )
  • get_content_audio( &$content, $html = true, $remove = false )
  • get_content_media( $type, &$content, $html = true, $remove = false )

By default now, you get the HTML for the matched items, set $html to false to get the srcs

#18 follow-up: @wonderboymusic
5 years ago

yes, there def needs to be a the_video() and the_audio() type thing, just needs to be thorough. Has to check for the following things:

  • Is the theme structured and is there meta?
  • Is there a video attached to the post instead? Is there more than one?
  • Is there an attachments AND meta? Is is the same or different?
  • Is there a [video] in the content?
  • Is there embed code(s) in the content?
  • does the content contain a video URL on a line by itself?

In order of precedence, I would say:

  • Metadata
  • Attached video
  • Parse from content
    • Shortcode
    • URL by itself
    • Embed codes

#19 in reply to: ↑ 18 @lancewillett
5 years ago

Replying to wonderboymusic:

yes, there def needs to be a the_video() and the_audio() type thing, just needs to be thorough.

+1

This would remove almost all the complex code currently in the Twenty Thirteen patch to support video post formats.

#20 @lancewillett
5 years ago

I agree with your order of precedence, and for these wrapper functions I'd argue that "removing from content" should be the default behavior. Theme can call the_video() once and get back the HTML output for that video item -- and the post content should be automatically filtered without having to change the_content() call in that same template.

I'd love to start off with:

  • the_video()
  • the_audio()
  • the_image()

Those are the most common use cases for media-related post formats.

#21 follow-up: @obenland
5 years ago

Sorry, I didn't make myself clear: I meant, if remove == true, remove from content so the_content() works, unless a content to remove it from was passed.

#22 in reply to: ↑ 21 @lancewillett
5 years ago

Replying to obenland:

Sorry, I didn't make myself clear: I meant, if remove == true, remove from content so the_content() works, unless a content to remove it from was passed.

+1

So if no content string passed as an argument (which would be the case for a filter) the core function should remove it from the globally scoped $post object (the current post's content)?

Also: should these functions check for being in a post format? has_post_format( 'video' ) before executing? Does that matter?

#23 follow-up: @wonderboymusic
5 years ago

Ok, I updated attachment:23572-html.diff and added 3 functions:

  • get_the_media( $type )
  • the_audio()
  • the_video()

the_audio() and the_video() are just wrappers for get_the_media(), which contains all of the black voodoo to return the HTML for an audio or video and remove it from the global post's content when appropriate. Does a post format check for meta only

Comments?

#24 in reply to: ↑ 23 @lancewillett
5 years ago

Replying to wonderboymusic:

added 3 functions: the_audio() and the_video() are just wrappers for get_the_media(), which contains all of the black voodoo to return the HTML for an audio or video and remove it from the global post's content when appropriate. Does a post format check for meta only

Lookin' awesome.

One thing noticed in testing, if the video output is a bare URL it probably needs to be converted to embed before returning. @obenland noted in IRC, we should probably check there in get_the_media() whether the post meta value is an embeddable URL, similar to how we handle oEmbed in regular content.

Could you also add a case for the_image()?

Since it doesn't involve post meta, would be first checking the first attachment, then fallback to first image found in the post content.

  1. get_attached_images( ... ) -- take the first result, return as output HTML
  2. get_content_image( ... ) -- take 1st result, return as output HTML

@obenland Can you please add your notes here about the $pages global versus $post->post_content?

#25 @wonderboymusic
5 years ago

if the video output is a bare URL

should Line 2042 in patch be [embed] instead of [{type}] shortcode?

I will work on the_image() right now

#26 @obenland
5 years ago

Updating $post->post_content seems to be only one step in a two step process. We would need to call setup_postdata( $post ) again after that, so that all globals get updated for the template tags.

Also I noticed, an attached video (for example) would not get stripped from the post content. In my test case I had it in there with a video shortcode that made it even more challenging to come up with a way to remove it :(

Scott, the next time I see you, beer's on me!

#27 @obenland
5 years ago

do_shortcode( '[embed]http://www.youtube.com/watch?v=9yEtf-r08uM[/embed]' ) did not work for me. Is my syntax wrong?

#28 @wonderboymusic
5 years ago

Patch updated.

Added the_image()

Also fixed get_the_media()so that the shortcode is removed from the content. The problem was nesting of functions that all passed the same variable by reference. Super weird. The content would be correct within the function, but not when the function returned.

do_shortcode( '[embed]http://www.youtube.com/watch?v=9yEtf-r08uM[/embed]' ) did not work for me.

I think [embed] only works during the_content() - it works in post_format_compat() because that filter runs before the embed filter runs.

#29 @wonderboymusic
5 years ago

Updated patch - the_image() will now return HTML and remove the found image from the content when present, even when the image is the result of an Insert Into HTML image attachment.

Few things left to do, and why this is a nightmare:

  • Found an image, the image is from the Media Library, but the image's post parent is another post.
  • The found image is wrapped in a link that could either be the image src or the attachment page
  • All of this nonsense could be wrapped in a caption shortcode
  • Even more awesome, have to loop through every possible size of the image

Probably have to do the following:

  • Extract [caption]s, and for each one, look for the existence of the image URL in all of its sizes, also looping through the attachment URLs (they will be in the same $urls array) - if the URL is present, remove the [caption]
  • Do the same, but for <a> then <img>

Brutal.

#30 @wonderboymusic
5 years ago

Patch updated - strips associated captions now, which takes care of swallowing their contents

Last edited 5 years ago by wonderboymusic (previous) (diff)

@markjaquith
5 years ago

remove already-added function

#31 @markjaquith
5 years ago

So there's some strange behavior here. The wp_video_shortcode() has a printf(). Paste in a YouTube URL and setup_postdata() will echo a link onto the page. Huh?

Seems like it's checking for a local URL before it's checking for an OEmbed capable URL.

#32 @lancewillett
5 years ago

Testing .9.diff with images and video -- works nicely, w00t. Using the_extra_content() works beautifully, using attachment:ticket:23620:23620.7.diff for testing with Twenty Thirteen.

#33 @lancewillett
5 years ago

@wonderboymusic Tiny code style comments on .9.diff, generally with PHP code we avoid camel-cased variable names; you use hasTeaser for example. Better: has_teaser. And something like stripteaser could be strip_teaser, both per Reference.

#34 @wonderboymusic
5 years ago

those snippets also lovingly reside in get_the_content() which is what we copied to make that func. We knew it was duping code, so left an @internal block that says we need to abstract more of it - this code happened at like 3am, pretty sure Jaquith was calling me names as it happened. Will peruse again right now

#35 @wonderboymusic
5 years ago

attachment:23572-html.10.diff modernizes the get_the_content()-inspired code

#36 @markjaquith
5 years ago

  • Owner set to markjaquith
  • Resolution set to fixed
  • Status changed from new to closed

In 23819:

Give themers tangible, user-friendly template functions to take full advantage of structured post formats.

  • the_audio()
  • the_video()
  • the_image()
  • get_the_media()

Also introduces:

  • get_the_extra_content()
  • the_extra_content()

Those two functions are like their non-extra versions, except that they
will have any post-format bits extracted. e.g. It's an image post, for
which the_image() will extract an <img /> tag. the_extra_content() will
output the content *without* that image.

props wonderboymusic. Herculean effort. fixes #23572

#37 @greenshady
5 years ago

I wanted to point out that the get_the_image() function will conflict with the Get the Image plugin here: http://wordpress.org/extend/plugins/get-the-image/

The main function of that plugin is get_the_image(). Introducing that function into core will break 1,000s of WordPress sites that currently use it.

#38 @greenshady
5 years ago

  • Cc justin@… added

#39 @DrewAPicture
5 years ago

  • Keywords needs-codex added

#40 @mintindeed
5 years ago

  • Cc gabriel.koen@… added

Just a heads up that r23819 contains a function naming conflict with the (as far as I can tell) widely-used plugin http://wordpress.org/extend/plugins/get-the-image/

#41 @batmoo
5 years ago

  • Cc batmoo@… added

#42 @johnjamesjacoby
5 years ago

  • Cc jjj@… added

#43 @westi
5 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

I'd been holding back on saying this until the dust settled and someone else found a true issue but I think that these functions have really generic names and are likely to clash with other stuff.

Should we not wp_ prefix them all while we can?

#44 follow-up: @johnjamesjacoby
5 years ago

Perhaps not oddly, I have the opposite opinion.

Plugin authors have a responsibility to namespace their functions and classes, so as not to collide with the non-namespaced environment they run in.

Another cringe-worthy approach, is moving get_the_image() into pluggable.php, and Tadlock making sure there's feature parity between core and his plugin.

#45 in reply to: ↑ 44 @obenland
5 years ago

Replying to johnjamesjacoby:

Perhaps not oddly, I have the opposite opinion.

Plugin authors have a responsibility to namespace their functions and classes, so as not to collide with the non-namespaced environment they run in.

I agree. Especially when it comes to template tags.

#46 follow-up: @greenshady
5 years ago

Replying to westi:

Should we not wp_ prefix them all while we can?

I actually wish we'd do that with all WordPress functions, but that's a different discussion. I say we should definitely do that for these functions.

Replying to johnjamesjacoby:

Plugin authors have a responsibility to namespace their functions and classes, so as not to collide with the non-namespaced environment they run in.

The name of the plugin is "Get the Image". At the time I created the plugin, using get_the_image made a lot of sense. If I had created the plugin today, I'd probably name it different. Five years ago when the plugin was created, I didn't know about prefixing stuff. It wasn't something that was preached that much.

Unfortunately, it's not something that can change in the plugin because the plugin code, specifically the get_the_image() call, requires theme integration.

Another cringe-worthy approach, is moving get_the_image() into pluggable.php, and Tadlock making sure there's feature parity between core and his plugin.

One thing I didn't mention above is the the Get the Image plugin is included with many themes and is used on millions of Web sites in that manner. Themes can't overwrite pluggable functions.

It's also used on WordPress.com, which is problematic.

Last edited 5 years ago by greenshady (previous) (diff)

#47 in reply to: ↑ 46 @johnjamesjacoby
5 years ago

Replying to greenshady:

The name of the plugin is "Get the Image". At the time I created the plugin, using get_the_image made a lot of sense. If I had created the plugin today, I'd probably name it different. Five years ago when the plugin was created, I didn't know about prefixing stuff. It wasn't something that was preached that much.

I understand, and assumed as much.

One thing I didn't mention above is the the Get the Image plugin is included with many themes and is used on millions of Web sites in that manner. Themes can't overwrite pluggable functions. It's also used on WordPress.com, which is problematic.

At the risk of sounding dramatic, it's another example of why including plugins in themes is such a bad idea; functions.php ends up being a booby-trap.

I actually wish we'd do that with all WordPress functions, but that's a different discussion. I say we should definitely do that for these functions.

Seems like the most logical choice given the circumstance. A shame to break the convention, since most other loop-oriented template tags aren't prefixed.

#48 @Jayjdk
5 years ago

  • Cc kontakt@… added

#49 @iamtakashi
5 years ago

  • Cc takashi@… added

#50 @SergeyBiryukov
5 years ago

get_the_image() conflict was also mentioned in ticket:23593:2.

FWIW, "Avoiding Function Name Collisions" section was added to both "Plugin API" and "Writing a Plugin" Codex articles back in 2005:
https://codex.wordpress.org/index.php?title=Plugin_API&diff=6312&oldid=5289
https://codex.wordpress.org/index.php?title=Writing_a_Plugin&diff=9452&oldid=9448

#51 @danielbachhuber
5 years ago

Fatal error reports for get_the_image() have started coming in from WordPress.com VIP devs. r23819 is probably a merge blocker for WordPress.com, as it will break many sites.

#52 @WraithKenny
5 years ago

#23927 related, and should resolve the issue here

#53 @chipbennett
5 years ago

  • Cc chip@… added

#54 @wonderboymusic
5 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed

The names were changed (no more get_the_image()) so we can close this and move along

Note: See TracTickets for help on using tickets.