Opened 8 years ago
Last modified 2 years ago
#39963 reopened enhancement
MIME Alias Handling
Reported by: | blobfolio | Owned by: | |
---|---|---|---|
Milestone: | Future Release | Priority: | normal |
Severity: | normal | Version: | |
Component: | Media | Keywords: | has-patch has-unit-tests needs-testing |
Focuses: | administration | Cc: |
Description
WordPress currently only supports a single MIME type for a given extension. This is good enough in many cases, but begins to cause problems like #39550 when a library references an alternative MIME type (like audio/mpeg vs audio/mp3).
We probably don't want to introduce changes to the basic MIME functions as that could have crazy-far-reaching consequences, but we could add a wp_check_mime_alias($filename, $mime, $allowed_mimes)
function, and query it in places where validation-by-MIME is happening.
This would require a comprehensive collection of MIMEs, past, present, vernacular, etc. I am actually already maintaining such a database for another PHP library that combines results from Nginx, Apache, freedesktop.org, and IANA. So let's start from the assumption that the necessary data already exists for retrieving all possible MIME types for a given extension.
I can whip up a patch with the added functionality, but wanted to first get some feedback on where/how the data should be incorporated (it is a rather big list, a PHP array would go on for miles.. I'm keeping it in a JSON file currently), and what existing areas might need to plug into that.
@joemcgill, do you have any initial thoughts?
Attachments (10)
Change History (52)
This ticket was mentioned in Slack in #core-media by blobfolio. View the logs.
8 years ago
This ticket was mentioned in Slack in #core-media by blobfolio. View the logs.
8 years ago
#4
@
8 years ago
The MIME list has been extended to include more results:
-- MacOS file types, namely those Apple didn't register with IANA)
-- parent classes (likeapplication/ogg
, which some systems return instead of specific types likevideo/ogg
)
There's now a third function:
<?php /** * Retrieve the "real" file type from the file. * * This extends `wp_check_filetype()` to additionally * consider content-based indicators of a file's * true type. * * The content-based type will override the name-based * type if available and included in the $mimes list. * * @since xxx * * @see wp_check_filetype() * @see wp_check_filetype_and_ext() * * @param string $file Full path to the file. * @param string $filename The name of the file (may differ from $file due to $file being in a tmp directory). * @param array $mimes Optional. Key is the file extension with value as the mime type. * @return array Values with extension first and mime type. */ function wp_check_real_filetype( $file, $filename = null, $mimes = null ) { ... }
The idea is to abstract the type-checking from other sanitizing/handling being performed in wp_check_filetype_and_ext()
so that it can be called from other functions that might need a more robust type determination. wp_check_filetype_and_ext()
could then be reduced so that it calls this function, then handles its specific validation checks.
#5
@
8 years ago
Even with the patch that landed in 4.7.3 (#39550) a lot of users are still running into validation issues. When fileinfo
returns an incorrect/old type, chances are it will be application/*
, which wp_check_filetype_and_ext()
subjects to the extra testing.
Thus landing MIME alias handling as a mitigation sooner rather than later would be good.
The database in wp_get_mime_aliases()
is refreshed at least once monthly, and by its nature will need to be re-pushed to the WP Core indefinitely (gotta keep pace!). So any edge cases that might still be missing now should not be a blocking concern; with 1751 MIMEs at the time of this writing, it already addresses most problems people are likely to hit.
What I see as blocking:
(1) We need eyes on the functions themselves. Does anyone see any issues in format, structure, technique?
(2) The quickie patch at #40017 needs to land so exif
fallbacks (for environments without fileinfo
) can be added to wp_check_real_filetype()
.
(3) Layout feedback. Where should these functions ultimately go? I suggest we place the main payload function (wp_get_mime_aliases()
) in its own file because of its size, while moving the others to wp-includes/functions.php
. Those two functions can require_once
the standalone MIME file so we aren't reading such a large file into memory unnecessarily.
With #1-3 out of the way, we can:
(4) Get a patch together, including a reduction of wp_check_filetype_and_ext()
to remove duplicate behaviors, utilize alias checking.
(5) Add unit tests for the core.
(6) Profit!
This ticket was mentioned in Slack in #core-media by blobfolio. View the logs.
8 years ago
#7
follow-up:
↓ 8
@
7 years ago
Would this also cover web fonts (x-font-woff => font-woff)? That would really help my team.
~ thx, Eddie
#8
in reply to:
↑ 7
@
7 years ago
Replying to eddiemcham:
Would this also cover web fonts (x-font-woff => font-woff)? That would really help my team.
Yup, although application/font-woff
isn't technically correct either (see e.g. https://www.iana.org/assignments/media-types/font/woff).
In a nutshell, the point of this enhancement is to deprovincialize WordPress' singular concept of MIMEs (in both time and space). Aside from increasing the explicit MIME relational data by over 17x what WordPress has currently (making it the largest single collection on the planet, haha), it also has automatic detection for whatever/x-variants
(e.g. your x-font-woff
vs font-woff
, non-MIME parent class nonsense (like application/ms-office
, which is not, nor has never been, a valid media type), and won't, by default, penalize generic application/octet-stream
associations (which, while it can be a valid type, is more often just fileinfo
's equivalent of a shrug).
WordPress isn't likely going to rollback or narrow the security fix that is causing all the collateral damage, but at least with MIME alias support, we can seriously mitigate almost all of the incorrect identifications people are seeing on various server environments (while, crucially, not defeating the purpose of the security fix in the first place!).
If you need a fix sooner rather than later, shoot me a message on Slack or email me (contact info can be found at the URL linked in my profile; it gets stripped if I try to post it directly here) and I can package this up as a quickie plugin for you. One of my clients is a font foundry and has had no problem uploading WOFF, WOFF2, TTF, OTF, SVG, etc., files to their site with this workaround.
I know you said you didn't want to add more overhead to your sites, which is understandable, but a surgical workaround is going to be a lot safer than allowing unfiltered uploads or disabling the security checks altogether. ;)
#9
@
7 years ago
Thanks, @blobfolio, for the offer. Let me talk this over with my colleagues and team leads before I dig any deeper. They (and/or our IT dept) may have additional ideas. If they give me the go-ahead, I'll reach out to you via Slack or some other means. :)
#10
follow-up:
↓ 11
@
7 years ago
@blobfolio,
I'm testing a workaround where colleagues upload their WOFFs to an FTP folder I've created for them. Then I'll modify our CSS rendering code to look there for the WOFFs instead of the Media Library. No more MIME types, (super)admin lockouts, or security error messages.
Thanks for all your help, though! :)
#11
in reply to:
↑ 10
@
7 years ago
Replying to eddiemcham:
@blobfolio,
I'm testing a workaround where colleagues upload their WOFFs to an FTP folder I've created for them. Then I'll modify our CSS rendering code to look there for the WOFFs instead of the Media Library. No more MIME types, (super)admin lockouts, or security error messages.
Glad to hear you found your own workaround! The only caution I'd throw out there is that unbridled FTP access could pose an even greater danger than anything conducted through the WordPress UX. If possible, I recommend using SSH keys instead of passwords for your users and jailing access to that one directory. :)
This ticket was mentioned in Slack in #core-media by blobfolio. View the logs.
7 years ago
#14
@
7 years ago
I put together a proof of concept patch:
wp-includes/functions.php
now contains:
--
wp_check_real_filetype()
(function and filter)
--wp_check_mime_alias()
(function and filter)
--wp_check_application_octet_stream
(filter)
-- updatedwp_check_filetype_and_ext()
(see below)
wp-includes/media-mimes.php
--
wp_get_mime_aliases()
(function and filter)
tests/phpunit/tests/functions.php
-- updated
big5.txt
test
This demonstrates the benefits of MIME alias handling by allowing for more robust type matching, increased upload file validation (ALL files are subject to type evaluation when possible), and provides some UX improvements (ALL incorrectly named files, if otherwise valid, are renamed with the correct extension).
wp_check_real_filetype()
begins with a name-based approach (i.e. wp_check_filetype()
). If that fails, the failure is passed on. If it succeeds, it attempts to evaluate the "real" type using EXIF (not yet implemented, waiting on #40017), or that failing, FILEINFO. If either evaluation succeeds, the "real" type is compared against the known aliases for the file extension. If the alias is good, the name-based type is returned (i.e. WordPress' hardcoded definitions take priority). If the real MIME does not match the extension, but it is whitelisted, that MIME and the *correct* extension are returned. If the "real" MIME is not whitelisted, false
is returned. If no content-based evaluation can be performed, the name-based results are returned.
wp_check_mime_alias()
has automatic handling for temporary x-subtype
/subtype
variations (e.g. application/font-woff
and application/x-font-woff
are considered equivalent). By default it also soft-matches application/octet-stream
against any extension, as that tends to be the response returned by a server when it doesn't know what a file is. That behavior can be overridden using the wp_check_application_octet_stream
filter. All checks are case-insensitive and strip out invalid characters.
wp_check_filetype_and_ext()
is updated to call wp_check_real_filetype()
instead of wp_check_filetype()
. This covers whitelist checks and type-based evaluation. Renaming is now applied to all files, not just the small subset of image types in the original version. image/*
and application/*
checks are removed as unnecessary (all content is evaluated now). The ultimate determinations are still filterable as before.
All original PHPUnit tests pass, with the exception of test which passes big5.txt
as a JPEG; because of the improvements, the result is correctly identified and accepted as a text/plain
file. ;) This patch updates the test accordingly.
$ phpunit tests/phpunit/tests/functions.php Installing... Running as single site... To run multisite, use -c tests/phpunit/multisite.xml Not running ajax tests. To execute these, use --group ajax. Not running ms-files tests. To execute these, use --group ms-files. Not running external-http tests. To execute these, use --group external-http. PHPUnit 5.4.6 by Sebastian Bergmann and contributors. ................................................................. 65 / 83 ( 78%) .................. 83 / 83 (100%) Time: 918 ms, Memory: 24.00Mb OK (83 tests, 565 assertions)
It is a lot to digest, I know. But the benefits are numerous. It mitigates the issues in #40175, but happens to do so in a way that increases upload security, and file integrity more generally (for example, users won't accidentally hear a screeching "MP3" because that MP3 is really an OGG).
The MIME database (media-mimes.php
) will need to be indefinitely maintained, as MIME data is always changing. That data, however, is automatically, regularly re-built independently of WordPress. I would propose we aim to update the data once per major release, a task I am more than happy to adopt.
Speaking of, the MIME data in this patch is at roughly 1750 entries. While that data will never be 100% complete, as-is it already improves WordPress' chances of correctly identifying a file by over 2000%.
This ticket was mentioned in Slack in #core-media by blobfolio. View the logs.
7 years ago
@
7 years ago
handle application/CDFV2* types, implicit match of vnd.* variations, unit tests for variable XLS files
#16
@
7 years ago
New patch up! Changes include:
-- Rebuilt against current trunk
-- Catch genericapplication/CDFV2*
MS types https://core.trac.wordpress.org/ticket/40175#comment:8
-- Implicit type checking now includesvnd.*
variants
-- Added wp_check_mime_alias unit tests covering all the different types of XLS files found to date (now up to *4* different MIME types returned by any single instance offileinfo
, haha)
This ticket was mentioned in Slack in #core-media by blobfolio. View the logs.
7 years ago
#19
@
7 years ago
The Features Plugin is yet another way to give this a test!
The code is identical to this ticket's latest patch (save for a couple changed paths), except by necessity it has to duplicate the validation work after wp_check_filetype_and_ext()
has finished its run (which it does by hooking into the eponymous filter).
The code for the plugin version is hosted at https://github.com/Blobfolio/blob-mimes/tree/master/build/WordPress/feature-plugin, but again, contains the same functions as the patch.
For those interested in unit testing, please apply the patch instead.
#20
follow-up:
↓ 21
@
7 years ago
Hello! I tried the patch and it solved the problem for a couple of files I had, but still won't allow me to upload a .xlsx file with application/CDFV2-encrypted
MIME type (password protected file).
#21
in reply to:
↑ 20
@
7 years ago
Replying to hvianna:
Hello! I tried the patch and it solved the problem for a couple of files I had, but still won't allow me to upload a .xlsx file with
application/CDFV2-encrypted
MIME type (password protected file).
Thanks for the feedback! I'll do a little more research this morning and put up a new patch. Right now the application/CDFV2...
workaround applies to 3-letter Office types (like DOC, XLS, etc.), but not the 4-letter types (DOCX, XLSX, etc.).
#22
@
7 years ago
@hvianna Would you mind giving mimes.3.diff
a shot and see if that helps with your XLSX file?
For those using the plugin, please update to blob-mimes.2.zip
. The plugin (as of 0.1.2
) now hooks into WordPress' update hooks, so new version notifications and updates should start coming through the usual way (i.e. you won't have to keep coming back here for new zips). If it looks like it will take a while for the patch to land in the Core, the plugin source will be moved from Github to the main WP Repo, but will transition automatically so long as you're running 0.1.2+
. :)
#23
follow-up:
↓ 24
@
7 years ago
@hvianna
Updated the plugin to blob-mimes.2.zip and this .xlsx test file uploaded fine on a local dev site running WP 4.7.3.
#24
in reply to:
↑ 23
@
7 years ago
Replying to lukecavanagh:
@hvianna
Updated the plugin to blob-mimes.2.zip and this .xlsx test file uploaded fine on a local dev site running WP 4.7.3.
Thanks @lukecavanagh. Because of legacy code living within the fileinfo.so
extension itself, a single PHP server might return a different MIME type from one XLSX file to another.
The class that was throwing @hvianna for a loop earlier, application/CDFV2-xxx
, is a sort of catch-all that is used when a document contains ambiguous, corrupt, or unknowable magic headers (usually because of features like encryption or compression). Some XLSX files will trigger it, some won't.
We'll have to wait for @hvianna to test the exact file from earlier to be sure, but it should be good now.
#25
follow-up:
↓ 26
@
7 years ago
@blobfolio @lukecavanagh I confirm it's now working! Thank you @blobfolio for your work on this issue!
#26
in reply to:
↑ 25
@
7 years ago
Replying to hvianna:
@blobfolio @lukecavanagh I confirm it's now working! Thank you @blobfolio for your work on this issue!
Awesome, thanks for confirming! I know patches can be a bit of a pain to work with; I appreciate you taking the time.
This ticket was mentioned in Slack in #forums by lukecavanagh. View the logs.
7 years ago
This ticket was mentioned in Slack in #core-media by blobfolio. View the logs.
7 years ago
#29
@
7 years ago
The plugin implementation of this patch has been updated to 0.1.3
!
The new release includes a handy upload debug tool for users with manage_options
privileges (e.g. admins) under Tools > Debug File Validation
. If at any point in the future the Media Library is rejecting something it shouldn't, please use that tool to gather some useful feedback information and post it here (or email me if you aren't comfortable posting it publicly).
The plugin is now hosted in the main WordPress Plugins repository (https://wordpress.org/plugins/blob-mimes/). If you are currently running 0.1.2
, you should get the orange upgrade notice automatically once the version cache clears (no later than tomorrow, surely). Once you update, you'll be on the WP repo version.
If you're using an older version of the plugin or running it from a server that can't reach Github (the old host), please remove the plugin and re-install it from the main repo so you can catch up.
That's all, folks. Happy Friday!
This ticket was mentioned in Slack in #core-media by blobfolio. View the logs.
7 years ago
#32
@
7 years ago
- Resolution set to wontfix
- Status changed from new to closed
This solution is not something the WP Core is interested in pursuing, so I'm closing the ticket as WONTFIX.
I still think there is value in providing more complete file validation during the upload process, so will keep the enhancement alive in plugin form (https://wordpress.org/plugins/blob-mimes/). I've renamed it to Lord of the Files
and extended its focus to cover upload issues more generally. To that end, version 0.5.0
disentangles it from this ticket's patch (moves stuff out of wp_* namespaces, etc.), and adds SVG sanitization for sites that have whitelisted that file type.
Developer reference has been moved to https://github.com/Blobfolio/blob-mimes/tree/master/wp, so anybody interested in hooking into the provided filters, etc., should take a peek.
Thanks everyone for all the discussion and feedback!
This ticket was mentioned in Slack in #core-media by joemcgill. View the logs.
7 years ago
#35
@
7 years ago
- Milestone set to 5.0
- Resolution wontfix deleted
- Status changed from closed to reopened
After thinking about this for a while now, I want to take another look at adding MIME aliasing. A few concerns that I'd want us to look into:
- Let's make sure there is a solid plan for keeping the MIME database up to date so things don't go stale.
- I'm curious if we could limit the (massive) list in mimes.3.diff to only include mimes that are currently supported by WordPress, while allowing the list to be extended to a more full list for plugins that want to provide that functionality, but that's not a blocker.
- I like the idea of putting application/octet stream MIMES through an extra set of checks, similarly to what is proposed in #40175, but we may need to read the first few bytes of files ourselves rather than relying on an external library, like fileinfo.
I'd also like to get feedback from folks at larger hosting companies or from WP.com, who have had to add additional checking around MIMES. @mikeschroder, I'd appreciate your feedback there.
#36
@
7 years ago
Thanks @joemcgill!
I just attached an example data file. It is only around 150KB and contains information on 1978 different file extensions. Variations like x-
and vnd.
have been stripped out; those will be discovered as part of the alias matching function (separate from this file), so that saves some space.
My thinking is that every aspect of this enhancement is progressive. The existing upload_mimes
functionality shouldn't be messed with, and whatever MIME type that list declares something to be, that's what should ultimately be used (when e.g. saving an attachment).
What we'd be adding with this enhancement (aside from the alias array) are a couple small functions to check that MIME/ext pairs are appropriate (particularly in cases where the MIME comes from outside WP and might disagree with the upload_mimes
entry). Those functions could then be added to wp_check_filetype_and_ext
to make upload validation more accurate (we can detect and block bad files, but also fix incorrectly named ones).
Insofar as maintenance, the only piece that will need to be updated is the MIME data itself. I already have a build script for this, so it is really easy to regenerate. I'll push it somewhere public once we're in agreement on the file contents. As-is, the file contains a function and filter called wp_get_mime_aliases
. I don't think it needs more than that.
I'm curious if we could limit the (massive) list in mimes.3.diff to only include mimes that are currently supported by WordPress...
I think it would be better to handle all files out-of-the-box. Many users add types to WordPress through plugins, and those plugins aren't going to be capable of populating this data on their own. Again, we're only looking at about 150KB of data, so I'd rather not risk the UX/security implications from shortening it.
I like the idea of putting application/octet stream MIMES through an extra set of checks...
With proper alias handling, we should be checking all uploaded content. That's the real security benefit here. Initially I think we should just continue using fileinfo.so
when available, but down the road we could definitely make our own Magic Header script to strategically examine files for users who don't have that PHP extension installed.
My plan is to put together an updated patch for this this weekend! I'll post it here when that's ready.
#37
@
7 years ago
All rightie, got a new patch in place (39963.diff
and a few files for unit tests in phpunit-data.zip
).
Functions:
This patch includes 5 new functions.
The file-renaming logic that sometimes happens during a file upload has been abstracted into its own function, wp_update_filename_extension()
. This simply replaces the extension on the end of the file name.
/**
* Assign a new extension to a filename.
*
* @since 5.0.0
*
* @param string $filename The original filename.
* @param string $ext The new extension.
* @return string The renamed file.
*/
function wp_update_filename_extension( $filename, $ext ) {
In terms of actual MIME alias work, the most basic check is wp_check_mime_alias()
, which is used to see whether or not a given MIME type is appropriate for a given file extension.
/**
* Check extension and MIME pairing.
*
* @since 5.0.0
*
* @param string $ext File extension.
* @param string $mime MIME type.
* @return bool True/false.
*/
function wp_check_mime_alias( $ext = '', $mime = '' ) {
To go a little deeper, we have wp_check_allowed_mime_aliases()
, which checks to see whether a given MIME type is in the official upload_mimes
list, or is an alias of a type that is in that list. The upload_mimes
types always take priority. This will return an array with ext
and type
keys (the type being the official whitelist entry) if it is allowed, otherwise false
.
/**
* Check Allowed Aliases
*
* This will cycle through each allowed ext/MIME pair to see if an alias
* matches anything.
*
* @since 5.0.0
*
* @param string $alias MIME alias.
* @param array $mimes Allowed MIME types.
* @return array|bool Array containing ext and type keys or false.
*/
function wp_check_allowed_mime_aliases( $alias, $mimes = null ) {
Finally, we have a function that applies this alias-matching to an actual file. This uses a tiered approach to determine the best guess about what kind of a file a file actually is, and whether or not it is allowed by WP.
This staged approach begins with a basic call to wp_check_filetype()
, which determines type solely based on the file name. After that it will evaluate content via EXIF
and fileinfo.so
(if it is able to do so). If the "real" MIME differs from the name-based MIME, alias matching comes into play, and if that file is still allowed, the ext
and type
returned are updated accordingly to match reality.
Like the original wp_check_filetype()
function, invalid/illegal files result in an array with ext
and type
set to false
being returned.
/**
* Retrieve the "real" file type from the file.
*
* This extends `wp_check_filetype()` to additionally consider content-
* based indicators of a file's true type.
*
* The content-based type will override the name-based type if available
* and included in the $mimes list.
*
* A false response will be set if the extension is not allowed, or if a
* "real MIME" was found and that MIME is not allowed.
*
* @since 5.0.0
*
* @see wp_check_filetype()
* @see wp_check_filetype_and_ext()
*
* @param string $file Full path to the file.
* @param string $filename The name of the file (may differ from $file due to $file being in a tmp directory).
* @param array $mimes Optional. Key is the file extension with value as the mime type.
* @return array Values with extension first and mime type.
*/
function wp_check_real_filetype( $file, $filename = null, $mimes = null ) {
And last but not least, we have a function that returns the full database of MIME aliases, which is auto-generated and stored in wp-includes/media-mimes.php
. This data is an array, each key is a file extension, the values are arrays of MIME type(s) for said extension.
/**
* Return MIME Aliases
*
* @since 5.0.0
*
* @return array MIME aliases organized by file extension. See data below for example.
*/
function wp_get_mime_aliases() {
Filters:
Many of these new functions have eponymous filters that users can hook into to change the results:
wp_get_mime_aliases
wp_check_real_filetype
wp_check_mime_alias
There is one additional filter that needs a bit of background:
Most libraries will assign a file type of application/octet-stream
to any file they don't understand. It is the equivalent of "I dunno." Because of this, the wp_check_mime_alias()
function implicitly passes (i.e. returns true
) for any extension paired with this unhelpful type.
The filter wp_check_mime_alias_application_octet_stream
was added to allow users to decide whether or not application/octet-stream
deserves full checks or not. (true
means actually check this, false
(the default) means don't worry.)
Behavior:
The behavioral changes from this patch all stem from modifications to the existing wp_check_filetype_and_ext()
function, which is called as part of the file upload process.
Previously, that function ran some content-based checks on certain types of files, but did so in a way that resulted in false positives and negatives.
Now that functionality is offloaded to wp_check_real_filetype()
.
First and foremost, this means that all uploaded files are now subjected to content-based analysis (for environments that support it). This fixes the security issues the changes in 4.7.1
were meant to handle, but also expands them to fix a lot of similar, unpublished issues.
This also means that WordPress can now intelligently rename files that are using the wrong extension (but are otherwise allowed).
Testing:
This functionality has existed in the Lord of the Files
plugin for some time and worked wonders there, but please give this patch a shot to make sure it behaves the same way. (Also, be sure to deactivate LotF
if it is installed on your test sites.)
For those interested in checking out the build script (that compiles the media-mimes.php
file), you can find that at https://github.com/Blobfolio/wp-blob-mimes.
To get the ball rolling, here's an example of what I was thinking: https://github.com/Blobfolio/blob-mimes/blob/master/build/WordPress/media-mimes.php
This script includes all the necessary data, and two functions.
First, a function to return all known MIME types for a given file extension:
And second, a function to check whether an arbitrary file extension/MIME pairing is valid. This also automatically checks for both
type/subtype
andtype/x-subtype
variants, thex-
prefix being something commonly used during a file type's transformative years, but possibly missing from any official database.This second function could be added to places like
wp_check_filetype_and_ext
where MIME data is pulled from somewhere variable (e.g.finfo
). Rather than comparing$real_mime
to the results ofwp_check_filetype()
, we can evaluate withwp_check_mime_alias( $type, $real_mime )
.