Opened 8 years ago
Closed 3 years ago
#38231 closed enhancement (fixed)
Allow download_url to respect content-disposition header
Reported by: | cklosows | Owned by: | johnjamesjacoby |
---|---|---|---|
Milestone: | 5.9 | Priority: | normal |
Severity: | normal | Version: | 4.7 |
Component: | HTTP API | Keywords: | has-patch has-unit-tests |
Focuses: | Cc: |
Description
In #34938 it was proposed to use a filter in the wp_tempnam
function which was rejected in favor of ignore the query string when downloading a file. This, however can still propose a problem when someone is the function to download a tokenized file where the path can be long.
In the case where the headers of the response provide a content-disposition
header including a file name, it would be ideal to have the download_url
function respect this file name and move the file name (which by default is the token in the path) to the file name requested.
Attachments (11)
Change History (29)
#3
@
4 years ago
- Keywords needs-testing added
- Milestone changed from Awaiting Review to 5.8
- Owner set to johnjamesjacoby
- Status changed from new to assigned
#5
follow-up:
↓ 7
@
3 years ago
- Keywords needs-testing removed
38231.1.diff improves on @cklosows's original patch by:
- Explicitly checking the return value of
preg_match()
- Returning an error when the call to
wp_tempnam()
fails - Returning an error when the call to
rename()
fails
38231-tests.1.diff adds an extra assertion to make sure the file is actually created.
I've also now manually tested this and can confirm the Content-Disposition
header is respected when calling download_url()
.
#6
@
3 years ago
- Milestone changed from 5.8 to 5.9
Today is feature freeze for the 5.8 release. This one is really close! But unfortunately the time for including in 5.8 has passed.
I'm going to punt to 5.9
instead of Future Release
since there is a recent momentum, and @psrpinto has done a great job iterating on the approaches here so far.
#7
in reply to:
↑ 5
@
3 years ago
Replying to psrpinto:
38231.1.diff improves on @cklosows's original patch by:
- Explicitly checking the return value of
preg_match()
- Returning an error when the call to
wp_tempnam()
fails- Returning an error when the call to
rename()
fails
Hi @psrpinto! Thanks for the improved patch :)
In general, WordPress tries to use the most basic loose boolean checking it can, so for example (I'm being nit-picky here, since you're new to Core patches, this is intended to be constructive feedback)
$content_disposition = wp_remote_retrieve_header( $response, 'Content-Disposition' ); if ( ! empty( $content_disposition ) && 1 === preg_match( '/filename="([^ ]+)"/', $content_disposition, $matches ) ) {
can be simplified to simply:
$content_disposition = wp_remote_retrieve_header( $response, 'Content-Disposition' ); if ( $content_disposition && preg_match( '/filename="([^ ]+)"/', $content_disposition, $matches ) ) {
as a) We only want to know that $content_disposition
is 'something' the preg_match will validate it further and b) any truthful value return is acceptable.
I personally think in the event that the Content-Disposition
filename can't be used, it should be something like this, what do you think?
set tmpName = create random name tmp file if ( disposition header is set ) { set tmpNameDisposition = create disposition-named tmp file if ( can create tmpNameDisposition AND can rename tmpName to tmpNameDisposition ) { set tmpName = tmpNameDisposition continue on with processing } else { continue on like no error was encountered, tmpName is still set to random } } /// .. continue on
So that if the header is invalid, can't be created on the filesystem, or something else odd the function can still succeed without erroring out. As this function is used within the WordPress update process, failing on some edgecase here could be disastrous.
In other words, treating this as a progressive enhancement if possible but retaining existing behaviour in the event that it can't be used.
Other edgecases worth considering here:
filename="file.zip"
andfilename=file.zip
are both valid but the patch only handles the former- What if the filename in the URL and in the Content-Disposition header match? For example
https://wordpress.org/wordpress-5.7.2.zip
#8
@
3 years ago
Thanks for the feedback @dd32! I agree with pretty much all your points so I've now refactored the code to reflect your suggested changes.
Please do let me know if you have further suggestions, they will be very welcome and I will be happy to consider them.
What if the filename in the URL and in the Content-Disposition header match?
Filenames are generated through wp_tempnam()
so there can be no collisions, since that method makes sure the filename is unique. So even if the filename in the URL and in the Content-Disposition header are the same, the temporary files that are created will have unique names.
Changes in 38231.2.diff:
- No longer check that
$content_disposition
is not empty before the call topreg_match()
. If it is empty, it won't match, so that check is unnecessary. (wp_remote_retrieve_header()
returns either a filename or an empty string). - Treat Content-Disposition as "progressive enhancement", falling back to original randomly-generated filename in case of error.
- Support the case where the header contains no quotes:
Content-Disposition: filename=foo.txt
Changes in 38231-tests.2.diff
- Add test for the case where the header contains no quotes:
Content-Disposition: filename=foo.txt
#9
@
3 years ago
38231-tests.2.diff Looks good to me @psrpinto! Thanks
The only thought I have is that the unlink could be moved to the else to avoid a file_exists()
call, but that's all I got :) I'll leave that up to you or whomever commits this.
if ( tmpfname_disposition && rename ) { set ... } else if ( tmpfname_disposition ) { unlink tmpfname_disposition }
#10
@
3 years ago
After testing the patch and doing some research, I believe this will need a bit more work.
(I've started that work, and will post an updated patch in the coming days.)
Citations:
- MDN: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Disposition
- RFC6266: https://datatracker.ietf.org/doc/html/rfc6266
In short:
- The
$header
parameter ofwp_remote_retrieve_header()
needs to be lowercase – I'm opening a new core ticket imminently to give this more thought - The regex needs tweaking to account for relative paths and the alternate
filename*
directive:filename
Is followed by a string containing the original name of the file transmitted. The filename is always optional and must not be used blindly by the application: path information should be stripped, and conversion to the server file system rules should be done. This parameter provides mostly indicative information. When used in combination with Content-Disposition: attachment, it is used as the default filename for an eventual "Save As" dialog presented to the user.
filename*
The parameters "filename" and "filename*" differ only in that "filename*" uses the encoding defined in RFC 5987. When both filename and filename* are present in a single header field value, filename* is preferred over filename when both are understood.
Warning: The string following filename should always be put into quotes; but, for compatibility reasons, many browsers try to parse unquoted names that contain spaces.
Possible implications from the RFC, to confirm with additional unit tests:
It is essential that recipients treat the specified filename as advisory only, and thus be very careful in extracting the desired information. In particular: o Recipients MUST NOT be able to write into any location other than one to which they are specifically entitled. To illustrate the problem, consider the consequences of being able to overwrite well-known system locations (such as "/etc/passwd"). One strategy to achieve this is to never trust folder name information in the filename parameter, for instance by stripping all but the last path segment and only considering the actual filename (where 'path segments' are the components of the field value delimited by the path separator characters "\" and "/"). o Many platforms do not use Internet Media Types ([RFC2046]) to hold type information in the file system, but rely on filename extensions instead. Trusting the server-provided file extension could introduce a privilege escalation when the saved file is later opened (consider ".exe"). Thus, recipients that make use of file extensions to determine the media type MUST ensure that a file extension is used that is safe, optimally matching the media type of the received payload. o Recipients SHOULD strip or replace character sequences that are known to cause confusion both in user interfaces and in filenames, such as control characters and leading and trailing whitespace. o Other aspects recipients need to be aware of are names that have a special meaning in the file system or in shell commands, such as "." and "..", "~", "|", and also device names. Recipients SHOULD ignore or substitute names like these. Note: Many user agents do not properly handle the escape character "\" when using the quoted-string form. Furthermore, some user agents erroneously try to perform unescaping of "percent" escapes (see Appendix C.2), and thus might misinterpret filenames containing the percent character followed by two hex digits.
#11
@
3 years ago
38231.3.diff confirms (with unit tests) that:
wp_tempnam()
prevents path traversal- single, double, and no quotes all work
- both
filename=
andfilename*=
now supported
This ticket was mentioned in Slack in #core by jjj. View the logs.
3 years ago
#13
@
3 years ago
38231.4.diff evolves 3.diff with the following feedback from @costdev:
- Prefer
data_
overprovider_
to match existing data pattern from other tests - Lower case
content-disposition
in our fake headers arrays, becausewp_remote_retrieve_header()
won't have the luxury of extendingRequests_Utility_CaseInsensitiveDictionary
in our filtered data
#14
@
3 years ago
38231.5.diff includes the following improvements:
- Prefers
assertMatchesRegularExpression
overassertContains
for future PHPUnit compatibility - Adds array keys to data provider for easier identification
#15
@
3 years ago
Are you aware of WP_REST_Attachments_Controller::get_filename_from_disposition()
(Reference)?
@
3 years ago
Alternate approach, using WP_REST_Attachments_Controller::get_filename_from_disposition()
#16
@
3 years ago
Hey @ocean90 👋 thank you for mentioning that. I was unaware and it seems like a good idea to use it.
- It is a
public static
method so intended to be used elsewhere - It is always included by WordPress core
- It has an array of unit tests already to match RFC6266
38231.6.diff updates the approach to use it over the bespoke regular expression.
#17
@
3 years ago
38231.7.diff includes the following iterations:
- Abandons
WP_REST_Attachments_Controller::get_filename_from_disposition()
approach. @costdev and I collaborated privately for a few full days, and concluded that it does not fully sanitize a filename from an untrusted source in the many ways thatdownload_url()
can be used. - Looks for
'attachment; filename='
instead offilename=
. Per the specification, theContent-Disposition
header only triggers save-as dialogs when theattachment;
"type" is sent with it. (Theinline;
type would be displayed as a web page instead of being downloaded anyways. The type is a required parameter.) - Uses
sanitize_file_name()
to ensure that the suggestedfilename
value is as close to acceptable for the web server and WordPress rules as possible. The nice thing about this is that values with a period in them will automatically be checked againstwp_get_mime_types()
usingwp_check_filetype()
. The value in doing this here (early) is that the temporary file name can still be used if the suggestion happens to be illegal. - Uses
validate_file()
to avoid directory traversal. This adequately prevents thefilename
value from including relative file paths and writing to unintended locations in the file system. - Includes 6 separate tests with 9 total assertions. These tests confirm that valid file name values are correctly saved, and invalid file name values are correctly rejected.
Massive props again to @costdev for going the extra mile with me behind the scenes on this. 🙏
Barring anything we've missed, this patch is ready to be committed.
Let's try to make this happen 5.8.