Make WordPress Core

Opened 10 years ago

Last modified 3 years ago

#30377 reopened enhancement

wp_check_filetype is broken when checking urls with parameters

Reported by: supercleanse's profile supercleanse Owned by: audrasjb's profile audrasjb
Milestone: Future Release Priority: normal
Severity: normal Version: 4.0
Component: Media Keywords: has-unit-tests needs-dev-note needs-patch
Focuses: Cc:

Description

The function in ./wp-includes/media.php named wp_check_filetype has a bug.

It works properly when checking a url such as http://example.org/coolfile.mp4 but as soon as you add parameters to it (a common practice when attempting to embed un-cached or amazon pre-signed urls) like so: http://example.org/coolfile.mp4?extra=true&params=true ... it fails to return the extension / content type.

The fix for this should be *very* easy. The preg_match in this function that looks like this currently:

$ext_preg = '!\.(' . $ext_preg . ')$!i';

could be adjusted to ignore the query string (if there is one) and just return the true extension like so:

$ext_preg = '!\.(' . $ext_preg . ')(\?.*)?$!i';

I've tested this change in my local environment and it works great.

Attachments (5)

30377.diff (1.1 KB) - added by voldemortensen 10 years ago.
30377.2.diff (462 bytes) - added by layotte 9 years ago.
30377.3.diff (4.0 KB) - added by layotte 9 years ago.
30377.4.diff (2.5 KB) - added by ianmjones 8 years ago.
wp_check_filetype now returns correct extension when URL has query params or fragment. Unit tests included.
30377.5.diff (3.5 KB) - added by johnbillion 3 years ago.

Download all attachments as: .zip

Change History (71)

#1 @supercleanse
10 years ago

  • Severity changed from normal to major

#2 @SergeyBiryukov
10 years ago

  • Component changed from General to Media
  • Focuses ui javascript administration performance removed
  • Keywords needs-patch needs-unit-tests added
  • Milestone changed from Awaiting Review to 4.1
  • Severity changed from major to normal

#3 @supercleanse
10 years ago

For some additional info about this issue ... here's someone else who has had this problem:

http://wordpress.stackexchange.com/questions/152316/using-audio-shortcode-for-mp3-urls-with-a-query-string

#4 @voldemortensen
10 years ago

  • Keywords has-patch added; needs-patch needs-unit-tests removed

30377.diff updates the regex to correctly identify file type and adds unit tests.

#5 @supercleanse
10 years ago

That looks great.

So what are the odds that this change will be incorporated into 4.1?

I'm just asking because I figured out a nasty hack in my plugin around this issue but would like to version it so it won't run once this is in place.

Thanks, Blair

#6 @valendesigns
10 years ago

  • Keywords needs-testing added

This ticket was mentioned in Slack in #core by voldemortensen. View the logs.


10 years ago

#8 @wonderboymusic
10 years ago

  • Owner set to wonderboymusic
  • Resolution set to fixed
  • Status changed from new to closed

In 30640:

Adjust the RegEx in wp_check_filetype() to be aware that query strings are thing that exist sometimes in URLs.

Adds unit tests.

Props voldemortensen.
Fixes #30377.

#9 @pento
9 years ago

In 32172:

Revert [30640], as it was incorrectly checking some filenames. This merges [32171] in the 4.1 branch.

See #30377.

#10 @pento
9 years ago

  • Keywords needs-refresh added
  • Milestone changed from 4.1 to Future Release
  • Resolution fixed deleted
  • Status changed from closed to reopened

[30640] was reverted in [32171] and [32172].

#11 @voldemortensen
9 years ago

@pento, can I get an example or two of filetypes it was checking incorrectly so I can do testing?

#12 @johnbillion
9 years ago

#32082 was marked as a duplicate.

#13 follow-up: @cartpauj
9 years ago

This was working in 4.1, but appears to be broken again in 4.1.2 FYI

#14 in reply to: ↑ 13 @helen
9 years ago

Replying to cartpauj:

This was working in 4.1, but appears to be broken again in 4.1.2 FYI

As per above, it was reverted out of 4.1.2, so that's expected.

#15 @supercleanse
9 years ago

Is there any place I can see the urls the reverted regex was failing on?

I'm sure this regex can be adjusted to fix whatever cases it was failing on that have been submitted.

Now we're back to urls that include query strings being completely un-usable by the WordPress media players.

#16 @johnbillion
9 years ago

#32188 was marked as a duplicate.

@layotte
9 years ago

#17 @layotte
9 years ago

I just submitted a patch for this. Running into the problem when using a file from Soundcloud which has query args. Instead of a regex, I chose to get the 'path' from parse_url, which works in these cases:

file.mp3
c:\file.mp3
/usr/local/bin/file.mp3
http://lewayotte.com/file.mp3
lewayotte.com/file.mp3
http://lewayotte.com/file.mp3?test=1
lewayotte.com/file.mp3?test=1
http://lewayotte.com/file.mp3?test=1&echo=2
lewayotte.com/file.mp3?test=1&echo=2

#18 @johnbillion
9 years ago

  • Keywords needs-unit-tests added; needs-refresh removed

The problem here is that wp_check_filetype() was never intended to work with URLs, only file paths.

If we're going to explicitly allow support for URLs (which we need to do by extension of the fact that this function is used in the wp_[audio|video|playlist]_shortcode() functions), this is gonna need unit test coverage and security review.

#19 follow-up: @dd32
9 years ago

If we're going to explicitly allow support for URLs (which we need to do by extension of the fact that this function is used in the wp_[audio|video|playlist]_shortcode() functions), this is gonna need unit test coverage and security review.

I'd personally prefer to add an explicit wp_check_url_filetype() function instead, one which extracts the actual filename from the URL and passes it along to wp_check_filetype() (if appropriate, or duplicates parts of it)

@layotte
9 years ago

#20 in reply to: ↑ 19 @layotte
9 years ago

Replying to dd32:

If we're going to explicitly allow support for URLs (which we need to do by extension of the fact that this function is used in the wp_[audio|video|playlist]_shortcode() functions), this is gonna need unit test coverage and security review.

I'd personally prefer to add an explicit wp_check_url_filetype() function instead, one which extracts the actual filename from the URL and passes it along to wp_check_filetype() (if appropriate, or duplicates parts of it)

I've submitted a patch for this.

#21 @dd32
9 years ago

Sorry I never saw it, 30377.3.diff looks good to me at first glance.

#22 @johnbillion
9 years ago

  • Owner changed from wonderboymusic to johnbillion
  • Status changed from reopened to reviewing

Note to self: [32171]

This ticket was mentioned in Slack in #core by atomicjack. View the logs.


9 years ago

#24 follow-ups: @atomicjack
9 years ago

  • Keywords needs-refresh added; has-patch removed

We pull data from a product feed, with parameters in the URL. It works perfectly in 4.1.1, but breaks after any version past that, even with the latest -3 patch.

It is related to this bug.

#25 in reply to: ↑ 24 @zeen101
9 years ago

wrong account

Last edited 9 years ago by zeen101 (previous) (diff)

#26 in reply to: ↑ 24 ; follow-ups: @layotte
9 years ago

Replying to atomicjack:

We pull data from a product feed, with parameters in the URL. It works perfectly in 4.1.1, but breaks after any version past that, even with the latest -3 patch.

It is related to this bug.

Can you describe the setup and GET request, so I can rework the patch and test?

Last edited 9 years ago by layotte (previous) (diff)

#27 follow-up: @helen
9 years ago

  • Type changed from defect (bug) to enhancement

Changing to enhancement, as wp_check_filetype() was never meant for URLs and the solution in this case is to introduce something else that does work for URLs.

#28 in reply to: ↑ 27 @layotte
9 years ago

Replying to helen:

Changing to enhancement, as wp_check_filetype() was never meant for URLs and the solution in this case is to introduce something else that does work for URLs.

The wp_check_filetype() function is called several times on URLs in wp-includes/media.php. Including the Audio shortcode (which is where I ran into this bug). I believe this should be considered a bug. Especially if the function was never meant to be used for URLs.

Last edited 9 years ago by layotte (previous) (diff)

#29 in reply to: ↑ 26 @atomicjack
9 years ago

Replying to layotte:

Replying to atomicjack:

We pull data from a product feed, with parameters in the URL. It works perfectly in 4.1.1, but breaks after any version past that, even with the latest -3 patch.

It is related to this bug.

Can you describe the setup and GET request, so I can rework the patch and test?

Thanks for offering to help - currently we have a line like this:

<?php
$media_info = wp_check_filetype(basename($vebra_media_line['File_url']));

I understand it is the File_url that you will probably need? I'm waiting for an example of what gets passed into that to be sent to me, but if you can do anything with the above at all, that'd be helpful. :)

#30 in reply to: ↑ 26 @atomicjack
9 years ago

Replying to layotte:

Replying to atomicjack:

We pull data from a product feed, with parameters in the URL. It works perfectly in 4.1.1, but breaks after any version past that, even with the latest -3 patch.

It is related to this bug.

Can you describe the setup and GET request, so I can rework the patch and test?

Here's an example URL that gets passed in:

http://[BUCKET NAME].s3.amazonaws.com/[RANDOM CHARACTERS].jpg
Last edited 9 years ago by SergeyBiryukov (previous) (diff)

#31 @supercleanse
9 years ago

Yeah, our use case is for AWS urls as well ... but ours are expiring links so they include a question mark followed by a bunch of random characters.

Here is a real link we'd use ... it may not actually work because it's an expiring link but this will give you an idea of the type of URL that can be passed to this function:

https://wordup.s3.amazonaws.com/4992_IMG_8521_converted.mp4?AWSAccessKeyId=1D10DF3V7M34NWTQNBG2&Expires=1445884947&Signature=Mp8PDvQHYZJP69hdAFNgAr11X00%3D&_=1

Thanks

Last edited 9 years ago by supercleanse (previous) (diff)

#32 @spencercameron
9 years ago

Worth noting this also breaks importing when attachment URLs contain query parameters since the importer uses wp_check_filetype().

See: https://plugins.trac.wordpress.org/browser/wordpress-importer/trunk/wordpress-importer.php#L861

This ticket was mentioned in Slack in #core by ocean90. View the logs.


8 years ago

@ianmjones
8 years ago

wp_check_filetype now returns correct extension when URL has query params or fragment. Unit tests included.

#34 @ianmjones
8 years ago

  • Keywords has-patch added; needs-unit-tests needs-refresh removed

My 30377.4.diff patch allows wp_check_filetype to find the extension of a file in a URL when it also includes either query params or a fragment.

This is really important for use cases such as @supercleanse's where signed URLs are failing to instantiate the default media player due to wp_check_filetype not finding the extension.

The patch also fixes strip_fragment_from_url throwing "Undefined index: path" when a URL without a trailing slash is passed to it.

#35 @ianmjones
7 years ago

  • Type changed from enhancement to defect (bug)

30377.4.diff is still required on trunk in order to be able to use the built in media player in the admin area, and still applies cleanly to r40891.

#36 @aaemnnosttv
7 years ago

For anyone looking for a temporary workaround, I created this plugin: https://github.com/aaemnnosttv/fix-wp-media-shortcodes-with-params

#37 @paulschreiber
7 years ago

@helen @dd32 What steps need to be taken to get this fixed?

#38 @ocean90
6 years ago

#44657 was marked as a duplicate.

#39 @anthonyeden
6 years ago

Hi all,

Is there any chance someone from the Core team can look at this again? This ticket has been open for four years, and the [audio] shortcode with URL Parameters is still broken.

Are we able to merge this into core? Is there something that still needs to be done?

#40 @paulschreiber
6 years ago

@johnbillion What is blocking including this fix in a release?

This ticket was mentioned in Slack in #core by markparnell. View the logs.


5 years ago

#42 follow-up: @SergeyBiryukov
3 years ago

#54244 was marked as a duplicate.

#43 in reply to: ↑ 42 @Ruxton
3 years ago

oh my god. how on earth has this taken so long to resolve? am I going to have to manage a patch forever (according to this drawn out timeline) for a 1 line resolution?

@johnbillion
3 years ago

#44 @johnbillion
3 years ago

  • Keywords dev-feedback added; needs-testing removed
  • Type changed from defect (bug) to enhancement

30377.5.diff refreshes the patch and adds more tests, particularly ones which include URLs and paths which attempt to subvert the extension detection.

One test fails: http://example.mp3 incorrectly results in an extension of mp3, but this isn't strictly related to this change and AFAIK none of the extensions in wp_get_mime_types() are valid TLDs anyway.

This ticket was mentioned in Slack in #core by ruxton. View the logs.


3 years ago

#46 @audrasjb
3 years ago

  • Keywords needs-refresh added
  • Milestone changed from Future Release to 6.0

Moving for 6.0 consideration
Also marking as needs-refresh since the patch doesn't apply anymore against trunk.
Anyone is welcome to submit a patch refresh.
(see https://wordpress.slack.com/archives/C02RQBWTW/p1646400165837319).

Last edited 3 years ago by audrasjb (previous) (diff)

This ticket was mentioned in PR #2383 on WordPress/wordpress-develop by audrasjb.


3 years ago
#47

  • Keywords has-unit-tests added; needs-refresh removed

#48 @audrasjb
3 years ago

  • Keywords needs-testing added

The above PR refreshes the last patch proposed by @johnbillion.
Adding needs-testing keyword.

#49 @audrasjb
3 years ago

One test fails: http://example.mp3 incorrectly results in an extension of mp3, but this isn't strictly related to this change and AFAIK none of the extensions in wp_get_mime_types() are valid TLDs anyway.

👆 this is still an issue with the current PR. @johnbillion should we just remove the failing test?

#50 @johnbillion
3 years ago

  • Keywords dev-feedback removed

Yes let's remove that test please and get this in 👍

#51 @audrasjb
3 years ago

  • Keywords needs-dev-note commit added; needs-testing removed

Alright the tests are passing now and I tested the patch successfully.
Marking for a mention in the Misc changes devnote.
Self assigning for commit.

#52 @audrasjb
3 years ago

  • Owner changed from johnbillion to audrasjb
  • Status changed from reviewing to accepted

#53 @audrasjb
3 years ago

  • Resolution set to fixed
  • Status changed from accepted to closed

In 52829:

Media: Allow wp_check_filetype() to support query strings in URLs.

This changeset adjusts the regex in wp_check_filetype() to support query strings in URLs.

Follow-up to [30640], [32172].

Props voldemortensen, johnbillion, layotte, dd32, atomicjack, supercleanse, spencercameron, ianmjones, audrasjb.
Fixes #30377.

#55 @dd32
3 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

@audrasjb Unfortunately this causes a PHP Fatal error for ms-files.php:

Uncaught Error: Call to undefined function strip_fragment_from_url() in wp-includes/functions.php:3009
Stack trace:
#0 wp-includes/ms-files.php(32): wp_check_filetype('wp-include...')
#1 {main}
 thrown

Source: GET https://make.wordpress.org/core/files/2022/01/list-view-keyboard-focus-1.mp4
Timestamp: Tue, 08 Mar 2022 00:28:30 +0000 (1646699310)
File: wp-includes/functions.php
Line: 3009

#56 @peterwilsoncc
3 years ago

  • Keywords needs-patch added; has-patch commit removed

I am being a little tedious to keep the milestone report keywords clean.

I think we'd need e2e tests to test this use case as I don't think phpunit's ms-files tests use SHORTINIT.

#57 @dd32
3 years ago

#55333 is a php notice that this change started triggering on WordPress.org, but is 'unrelated'.

#58 follow-up: @audrasjb
3 years ago

@dd32 I'm trying to reproduce the issue but I'm struggling a bit with ms-files.php, it's pretty hard to reproduce on a new install. Do you have an easy to reproduce use case (I mean outside w.org)?

#59 @SergeyBiryukov
3 years ago

As noted in comment:10, this was previously attempted in [30640] and reverted in [32171] as a security fix for WordPress 4.1.2. It would be great to confirm with the Security Team that [52829] does not reintroduce any issues.

#60 @peterwilsoncc
3 years ago

In 52832:

Media: Revert query string support for wp_check_filetype().

Revert [52829] due to fatal errors in some Multisite configurations.

Props dd32, SergeyBiryukov, audrasjb.
See #30377.

#61 in reply to: ↑ 58 @dd32
3 years ago

Replying to audrasjb:

@dd32 I'm trying to reproduce the issue but I'm struggling a bit with ms-files.php, it's pretty hard to reproduce on a new install.

Unfortunately I'm not aware of how to setup Multisite using ms-files.php, You probably need to manually flip the ms_files_rewriting site option, or perhaps even just request ms-files.php?file=relative/path/to/file.ext. I can say that strip_fragment_from_url is in canonical and not loaded in multisite SHORTINIT though.

#62 follow-up: @Ruxton
3 years ago

With the knowledge of what @dd32 just mentioned, it seems the viable solution outside of moving strip_fragment_from_url out of canonical (which is excessive) is to strip the fragment and beyond with strpos, etc. as the query string was.

<?php
$frag_pos = strpos( $filename, '#' );

if ( false !== $frag_pos) {
  $filename = substr($filename,0, $frag_pos-1)
}
?>

#63 in reply to: ↑ 62 ; follow-up: @dd32
3 years ago

Replying to Ruxton:

With the knowledge of what dd32 just mentioned, it seems the viable solution outside of moving strip_fragment_from_url out of canonical (which is excessive) is to strip the fragment and beyond with strpos, etc. as the query string was.

Using string functions to remove parts of a URL (both ?... and #...) is a potential security issue here (and why it would've been reverted prevously), as it's possible to cause it to strip more than anticipated and result in operating on the incorrect filename. You should really use wp_parse_url() instead too (It's kind of like parsing HTML with Regex, possible, but should be avoided).

Additionally, since the parameter can be a Filename, Path, or URL, you need to restrict any URL handling to URLs only, not filenames.

If a filename on disk looks like a url (You can have a filename on Linux that IS a url), you can't treat it as a URL either. If the filename came from an untrusted location, you can't treat it as a URL unless it's known to be intentionally a URL.

Pulling out two comments from above:

  • Comment by me: I'd personally prefer to add an explicit wp_check_url_filetype() function instead, one which extracts the actual filename from the URL and passes it along to wp_check_filetype() (if appropriate, or duplicates parts of it)
  • Comment by Helen: Changing to enhancement, as wp_check_filetype() was never meant for URLs and the solution in this case is to introduce something else that does work for URLs.

Even if the existing function is being passed URLs, it's not intentionally supposed to be used for URLs, and performing processing like this is doomed for failure.

Version 0, edited 3 years ago by dd32 (next)

#64 in reply to: ↑ 63 ; follow-up: @Ruxton
3 years ago

All very fair, someone actually wrote a patch with what you want and you responded to it at the time, then it somehow just drifted away from that.

If patch 3 is the solution you want, should we refresh that and include the tests from John's patch? Would that get us to a better place?

I'm just trying to get this across the line. It's old, annoying and a resolution is needed.

Replying to dd32:

Pulling out two comments from above:

  • Comment by me: I'd personally prefer to add an explicit wp_check_url_filetype() function instead, one which extracts the actual filename from the URL and passes it along to wp_check_filetype() (if appropriate, or duplicates parts of it)
  • Comment by Helen: Changing to enhancement, as wp_check_filetype() was never meant for URLs and the solution in this case is to introduce something else that does work for URLs.

Even if the existing function is being passed URLs, it's not intentionally supposed to be used for URLs, and performing processing like this is doomed for failure.

#65 in reply to: ↑ 64 @SergeyBiryukov
3 years ago

Replying to Ruxton:

If patch 3 is the solution you want, should we refresh that and include the tests from John's patch? Would that get us to a better place?

Yes, I think that would be the preferred direction here.

#66 @peterwilsoncc
3 years ago

  • Milestone changed from 6.0 to Future Release

I'm moving this off the milestone, it's getting close to the beta and there hasn't been any additional patches since it was reverted earlier in this release cycle.

Note: See TracTickets for help on using tickets.