Make WordPress Core

Opened 8 years ago

Last modified 2 years ago

#39963 reopened enhancement

MIME Alias Handling

Reported by: blobfolio's profile blobfolio Owned by:
Milestone: Future Release Priority: normal
Severity: normal Version:
Component: Media Keywords: has-patch has-unit-tests needs-testing
Focuses: administration Cc:

Description

WordPress currently only supports a single MIME type for a given extension. This is good enough in many cases, but begins to cause problems like #39550 when a library references an alternative MIME type (like audio/mpeg vs audio/mp3).

We probably don't want to introduce changes to the basic MIME functions as that could have crazy-far-reaching consequences, but we could add a wp_check_mime_alias($filename, $mime, $allowed_mimes) function, and query it in places where validation-by-MIME is happening.

This would require a comprehensive collection of MIMEs, past, present, vernacular, etc. I am actually already maintaining such a database for another PHP library that combines results from Nginx, Apache, freedesktop.org, and IANA. So let's start from the assumption that the necessary data already exists for retrieving all possible MIME types for a given extension.

I can whip up a patch with the added functionality, but wanted to first get some feedback on where/how the data should be incorporated (it is a rather big list, a PHP array would go on for miles.. I'm keeping it in a JSON file currently), and what existing areas might need to plug into that.

@joemcgill, do you have any initial thoughts?

Attachments (10)

mimes.diff (145.2 KB) - added by blobfolio 8 years ago.
proof of concept
mimes.2.diff (1.0 MB) - added by blobfolio 8 years ago.
handle application/CDFV2* types, implicit match of vnd.* variations, unit tests for variable XLS files
blob-mimes.zip (21.2 KB) - added by blobfolio 8 years ago.
Code repackaged as Features Plugin
mimes.3.diff (1.0 MB) - added by blobfolio 8 years ago.
expand CDFV2 workaround to cover newer 4-letter Office types
blob-mimes.2.zip (23.0 KB) - added by blobfolio 8 years ago.
rebuild data, add plugin upgrade check hooks
media-mimes.php (147.8 KB) - added by blobfolio 7 years ago.
Example data file
39963.diff (1.1 MB) - added by blobfolio 7 years ago.
Updated patch!
phpunit-data.zip (16.2 KB) - added by blobfolio 7 years ago.
Files for unit tests; extract to tests/phpunit/data/
39963.2.diff (1.1 MB) - added by blobfolio 7 years ago.
Refresh to current src, update MIME db.
39963.3.diff (1.1 MB) - added by blobfolio 7 years ago.
Update aliases, rebuild against current src

Change History (52)

#1 @blobfolio
8 years ago

To get the ball rolling, here's an example of what I was thinking: https://github.com/Blobfolio/blob-mimes/blob/master/build/WordPress/media-mimes.php

This script includes all the necessary data, and two functions.

First, a function to return all known MIME types for a given file extension:

<?php
/**
 * Return MIME aliases for a particular file extension.
 *
 * @since xxx
 *
 * @see {https://www.iana.org/assignments/media-types}
 * @see {https://svn.apache.org/repos/asf/httpd/httpd/trunk/docs/conf/mime.types}
 * @see {http://hg.nginx.org/nginx/raw-file/default/conf/mime.types}
 * @see {https://cgit.freedesktop.org/xdg/shared-mime-info/plain/freedesktop.org.xml.in}
 * @see {https://github.com/Blobfolio/blob-mimes}
 *
 * @param string $ext File extension.
 * @return array|bool MIME types. False on failure.
 */
function wp_get_mime_aliases( $ext = '' ) { ... }

And second, a function to check whether an arbitrary file extension/MIME pairing is valid. This also automatically checks for both type/subtype and type/x-subtype variants, the x- prefix being something commonly used during a file type's transformative years, but possibly missing from any official database.

<?php
/**
 * Check extension and MIME pairing.
 *
 * @since xxx
 *
 * @param string $ext File extension.
 * @param string $mime MIME type.
 * @return bool True/false.
 */
function wp_check_mime_alias( $ext = '', $mime = '' ) { ... }

This second function could be added to places like wp_check_filetype_and_ext where MIME data is pulled from somewhere variable (e.g. finfo). Rather than comparing $real_mime to the results of wp_check_filetype(), we can evaluate with wp_check_mime_alias( $type, $real_mime ).

This ticket was mentioned in Slack in #core-media by blobfolio. View the logs.


8 years ago

This ticket was mentioned in Slack in #core-media by blobfolio. View the logs.


8 years ago

#4 @blobfolio
8 years ago

The MIME list has been extended to include more results:

-- MacOS file types, namely those Apple didn't register with IANA)
-- parent classes (like application/ogg, which some systems return instead of specific types like video/ogg)

There's now a third function:

<?php
/**
 * Retrieve the "real" file type from the file.
 *
 * This extends `wp_check_filetype()` to additionally
 * consider content-based indicators of a file's
 * true type.
 *
 * The content-based type will override the name-based
 * type if available and included in the $mimes list.
 *
 * @since xxx
 *
 * @see wp_check_filetype()
 * @see wp_check_filetype_and_ext()
 *
 * @param string $file Full path to the file.
 * @param string $filename The name of the file (may differ from $file due to $file being in a tmp directory).
 * @param array $mimes Optional. Key is the file extension with value as the mime type.
 * @return array Values with extension first and mime type.
 */
function wp_check_real_filetype( $file, $filename = null, $mimes = null ) { ... }

The idea is to abstract the type-checking from other sanitizing/handling being performed in wp_check_filetype_and_ext() so that it can be called from other functions that might need a more robust type determination. wp_check_filetype_and_ext() could then be reduced so that it calls this function, then handles its specific validation checks.

#5 @blobfolio
8 years ago

Even with the patch that landed in 4.7.3 (#39550) a lot of users are still running into validation issues. When fileinfo returns an incorrect/old type, chances are it will be application/*, which wp_check_filetype_and_ext() subjects to the extra testing.

Thus landing MIME alias handling as a mitigation sooner rather than later would be good.

The database in wp_get_mime_aliases() is refreshed at least once monthly, and by its nature will need to be re-pushed to the WP Core indefinitely (gotta keep pace!). So any edge cases that might still be missing now should not be a blocking concern; with 1751 MIMEs at the time of this writing, it already addresses most problems people are likely to hit.

What I see as blocking:

(1) We need eyes on the functions themselves. Does anyone see any issues in format, structure, technique?

(2) The quickie patch at #40017 needs to land so exif fallbacks (for environments without fileinfo) can be added to wp_check_real_filetype().

(3) Layout feedback. Where should these functions ultimately go? I suggest we place the main payload function (wp_get_mime_aliases()) in its own file because of its size, while moving the others to wp-includes/functions.php. Those two functions can require_once the standalone MIME file so we aren't reading such a large file into memory unnecessarily.

With #1-3 out of the way, we can:

(4) Get a patch together, including a reduction of wp_check_filetype_and_ext() to remove duplicate behaviors, utilize alias checking.

(5) Add unit tests for the core.

(6) Profit!

This ticket was mentioned in Slack in #core-media by blobfolio. View the logs.


8 years ago

#7 follow-up: @eddiemcham
8 years ago

Would this also cover web fonts (x-font-woff => font-woff)? That would really help my team.

~ thx, Eddie

#8 in reply to: ↑ 7 @blobfolio
8 years ago

Replying to eddiemcham:

Would this also cover web fonts (x-font-woff => font-woff)? That would really help my team.

Yup, although application/font-woff isn't technically correct either (see e.g. https://www.iana.org/assignments/media-types/font/woff).

In a nutshell, the point of this enhancement is to deprovincialize WordPress' singular concept of MIMEs (in both time and space). Aside from increasing the explicit MIME relational data by over 17x what WordPress has currently (making it the largest single collection on the planet, haha), it also has automatic detection for whatever/x-variants (e.g. your x-font-woff vs font-woff, non-MIME parent class nonsense (like application/ms-office, which is not, nor has never been, a valid media type), and won't, by default, penalize generic application/octet-stream associations (which, while it can be a valid type, is more often just fileinfo's equivalent of a shrug).

WordPress isn't likely going to rollback or narrow the security fix that is causing all the collateral damage, but at least with MIME alias support, we can seriously mitigate almost all of the incorrect identifications people are seeing on various server environments (while, crucially, not defeating the purpose of the security fix in the first place!).

If you need a fix sooner rather than later, shoot me a message on Slack or email me (contact info can be found at the URL linked in my profile; it gets stripped if I try to post it directly here) and I can package this up as a quickie plugin for you. One of my clients is a font foundry and has had no problem uploading WOFF, WOFF2, TTF, OTF, SVG, etc., files to their site with this workaround.

I know you said you didn't want to add more overhead to your sites, which is understandable, but a surgical workaround is going to be a lot safer than allowing unfiltered uploads or disabling the security checks altogether. ;)

#9 @eddiemcham
8 years ago

Thanks, blobfolio, for the offer. Let me talk this over with my colleagues and team leads before I dig any deeper. They (and/or our IT dept) may have additional ideas. If they give me the go-ahead, I'll reach out to you via Slack or some other means. :)

Version 0, edited 8 years ago by eddiemcham (next)

#10 follow-up: @eddiemcham
8 years ago

@blobfolio,

I'm testing a workaround where colleagues upload their WOFFs to an FTP folder I've created for them. Then I'll modify our CSS rendering code to look there for the WOFFs instead of the Media Library. No more MIME types, (super)admin lockouts, or security error messages.

Thanks for all your help, though! :)

#11 in reply to: ↑ 10 @blobfolio
8 years ago

Replying to eddiemcham:

@blobfolio,

I'm testing a workaround where colleagues upload their WOFFs to an FTP folder I've created for them. Then I'll modify our CSS rendering code to look there for the WOFFs instead of the Media Library. No more MIME types, (super)admin lockouts, or security error messages.

Glad to hear you found your own workaround! The only caution I'd throw out there is that unbridled FTP access could pose an even greater danger than anything conducted through the WordPress UX. If possible, I recommend using SSH keys instead of passwords for your users and jailing access to that one directory. :)

#12 @eddiemcham
8 years ago

Thanks for the heads-up. I'll note the SSH part.

This ticket was mentioned in Slack in #core-media by blobfolio. View the logs.


8 years ago

@blobfolio
8 years ago

proof of concept

#14 @blobfolio
8 years ago

I put together a proof of concept patch:
wp-includes/functions.php now contains:

-- wp_check_real_filetype() (function and filter)
-- wp_check_mime_alias() (function and filter)
-- wp_check_application_octet_stream (filter)
-- updated wp_check_filetype_and_ext() (see below)

wp-includes/media-mimes.php

-- wp_get_mime_aliases() (function and filter)

tests/phpunit/tests/functions.php

-- updated big5.txt test

This demonstrates the benefits of MIME alias handling by allowing for more robust type matching, increased upload file validation (ALL files are subject to type evaluation when possible), and provides some UX improvements (ALL incorrectly named files, if otherwise valid, are renamed with the correct extension).

wp_check_real_filetype() begins with a name-based approach (i.e. wp_check_filetype()). If that fails, the failure is passed on. If it succeeds, it attempts to evaluate the "real" type using EXIF (not yet implemented, waiting on #40017), or that failing, FILEINFO. If either evaluation succeeds, the "real" type is compared against the known aliases for the file extension. If the alias is good, the name-based type is returned (i.e. WordPress' hardcoded definitions take priority). If the real MIME does not match the extension, but it is whitelisted, that MIME and the *correct* extension are returned. If the "real" MIME is not whitelisted, false is returned. If no content-based evaluation can be performed, the name-based results are returned.

wp_check_mime_alias() has automatic handling for temporary x-subtype/subtype variations (e.g. application/font-woff and application/x-font-woff are considered equivalent). By default it also soft-matches application/octet-stream against any extension, as that tends to be the response returned by a server when it doesn't know what a file is. That behavior can be overridden using the wp_check_application_octet_stream filter. All checks are case-insensitive and strip out invalid characters.

wp_check_filetype_and_ext() is updated to call wp_check_real_filetype() instead of wp_check_filetype(). This covers whitelist checks and type-based evaluation. Renaming is now applied to all files, not just the small subset of image types in the original version. image/* and application/* checks are removed as unnecessary (all content is evaluated now). The ultimate determinations are still filterable as before.

All original PHPUnit tests pass, with the exception of test which passes big5.txt as a JPEG; because of the improvements, the result is correctly identified and accepted as a text/plain file. ;) This patch updates the test accordingly.

$ phpunit tests/phpunit/tests/functions.php 
Installing...
Running as single site... To run multisite, use -c tests/phpunit/multisite.xml
Not running ajax tests. To execute these, use --group ajax.
Not running ms-files tests. To execute these, use --group ms-files.
Not running external-http tests. To execute these, use --group external-http.
PHPUnit 5.4.6 by Sebastian Bergmann and contributors.

................................................................. 65 / 83 ( 78%)
..................                                                83 / 83 (100%)

Time: 918 ms, Memory: 24.00Mb

OK (83 tests, 565 assertions)

It is a lot to digest, I know. But the benefits are numerous. It mitigates the issues in #40175, but happens to do so in a way that increases upload security, and file integrity more generally (for example, users won't accidentally hear a screeching "MP3" because that MP3 is really an OGG).

The MIME database (media-mimes.php) will need to be indefinitely maintained, as MIME data is always changing. That data, however, is automatically, regularly re-built independently of WordPress. I would propose we aim to update the data once per major release, a task I am more than happy to adopt.

Speaking of, the MIME data in this patch is at roughly 1750 entries. While that data will never be 100% complete, as-is it already improves WordPress' chances of correctly identifying a file by over 2000%.

This ticket was mentioned in Slack in #core-media by blobfolio. View the logs.


8 years ago

@blobfolio
8 years ago

handle application/CDFV2* types, implicit match of vnd.* variations, unit tests for variable XLS files

#16 @blobfolio
8 years ago

New patch up! Changes include:

-- Rebuilt against current trunk
-- Catch generic application/CDFV2* MS types https://core.trac.wordpress.org/ticket/40175#comment:8
-- Implicit type checking now includes vnd.* variants
-- Added wp_check_mime_alias unit tests covering all the different types of XLS files found to date (now up to *4* different MIME types returned by any single instance of fileinfo, haha)

This ticket was mentioned in Slack in #core-media by blobfolio. View the logs.


8 years ago

#18 @blobfolio
8 years ago

  • Keywords has-patch has-unit-tests added

@blobfolio
8 years ago

Code repackaged as Features Plugin

#19 @blobfolio
8 years ago

The Features Plugin is yet another way to give this a test!

The code is identical to this ticket's latest patch (save for a couple changed paths), except by necessity it has to duplicate the validation work after wp_check_filetype_and_ext() has finished its run (which it does by hooking into the eponymous filter).

The code for the plugin version is hosted at https://github.com/Blobfolio/blob-mimes/tree/master/build/WordPress/feature-plugin, but again, contains the same functions as the patch.

For those interested in unit testing, please apply the patch instead.

#20 follow-up: @hvianna
8 years ago

Hello! I tried the patch and it solved the problem for a couple of files I had, but still won't allow me to upload a .xlsx file with application/CDFV2-encrypted MIME type (password protected file).

#21 in reply to: ↑ 20 @blobfolio
8 years ago

Replying to hvianna:

Hello! I tried the patch and it solved the problem for a couple of files I had, but still won't allow me to upload a .xlsx file with application/CDFV2-encrypted MIME type (password protected file).

Thanks for the feedback! I'll do a little more research this morning and put up a new patch. Right now the application/CDFV2... workaround applies to 3-letter Office types (like DOC, XLS, etc.), but not the 4-letter types (DOCX, XLSX, etc.).

@blobfolio
8 years ago

expand CDFV2 workaround to cover newer 4-letter Office types

@blobfolio
8 years ago

rebuild data, add plugin upgrade check hooks

#22 @blobfolio
8 years ago

@hvianna Would you mind giving mimes.3.diff a shot and see if that helps with your XLSX file?

For those using the plugin, please update to blob-mimes.2.zip. The plugin (as of 0.1.2) now hooks into WordPress' update hooks, so new version notifications and updates should start coming through the usual way (i.e. you won't have to keep coming back here for new zips). If it looks like it will take a while for the patch to land in the Core, the plugin source will be moved from Github to the main WP Repo, but will transition automatically so long as you're running 0.1.2+. :)

#23 follow-up: @lukecavanagh
8 years ago

@hvianna

Updated the plugin to blob-mimes.2.zip and this .xlsx test file uploaded fine on a local dev site running WP 4.7.3.

http://dev-builds.libreoffice.org/tmp/test.xlsx

Last edited 8 years ago by lukecavanagh (previous) (diff)

#24 in reply to: ↑ 23 @blobfolio
8 years ago

Replying to lukecavanagh:

@hvianna

Updated the plugin to blob-mimes.2.zip and this .xlsx test file uploaded fine on a local dev site running WP 4.7.3.

http://dev-builds.libreoffice.org/tmp/test.xlsx

Thanks @lukecavanagh. Because of legacy code living within the fileinfo.so extension itself, a single PHP server might return a different MIME type from one XLSX file to another.

The class that was throwing @hvianna for a loop earlier, application/CDFV2-xxx, is a sort of catch-all that is used when a document contains ambiguous, corrupt, or unknowable magic headers (usually because of features like encryption or compression). Some XLSX files will trigger it, some won't.

We'll have to wait for @hvianna to test the exact file from earlier to be sure, but it should be good now.

#25 follow-up: @hvianna
8 years ago

@blobfolio @lukecavanagh I confirm it's now working! Thank you @blobfolio for your work on this issue!

#26 in reply to: ↑ 25 @blobfolio
8 years ago

Replying to hvianna:

@blobfolio @lukecavanagh I confirm it's now working! Thank you @blobfolio for your work on this issue!

Awesome, thanks for confirming! I know patches can be a bit of a pain to work with; I appreciate you taking the time.

This ticket was mentioned in Slack in #forums by lukecavanagh. View the logs.


8 years ago

This ticket was mentioned in Slack in #core-media by blobfolio. View the logs.


8 years ago

#29 @blobfolio
8 years ago

The plugin implementation of this patch has been updated to 0.1.3!

The new release includes a handy upload debug tool for users with manage_options privileges (e.g. admins) under Tools > Debug File Validation. If at any point in the future the Media Library is rejecting something it shouldn't, please use that tool to gather some useful feedback information and post it here (or email me if you aren't comfortable posting it publicly).

The plugin is now hosted in the main WordPress Plugins repository (https://wordpress.org/plugins/blob-mimes/). If you are currently running 0.1.2, you should get the orange upgrade notice automatically once the version cache clears (no later than tomorrow, surely). Once you update, you'll be on the WP repo version.

If you're using an older version of the plugin or running it from a server that can't reach Github (the old host), please remove the plugin and re-install it from the main repo so you can catch up.

That's all, folks. Happy Friday!

#30 @blobfolio
8 years ago

  • Focuses administration added

This ticket was mentioned in Slack in #core-media by blobfolio. View the logs.


8 years ago

#32 @blobfolio
7 years ago

  • Resolution set to wontfix
  • Status changed from new to closed

This solution is not something the WP Core is interested in pursuing, so I'm closing the ticket as WONTFIX.

I still think there is value in providing more complete file validation during the upload process, so will keep the enhancement alive in plugin form (https://wordpress.org/plugins/blob-mimes/). I've renamed it to Lord of the Files and extended its focus to cover upload issues more generally. To that end, version 0.5.0 disentangles it from this ticket's patch (moves stuff out of wp_* namespaces, etc.), and adds SVG sanitization for sites that have whitelisted that file type.

Developer reference has been moved to https://github.com/Blobfolio/blob-mimes/tree/master/wp, so anybody interested in hooking into the provided filters, etc., should take a peek.

Thanks everyone for all the discussion and feedback!

#33 @ocean90
7 years ago

  • Milestone Awaiting Review deleted

This ticket was mentioned in Slack in #core-media by joemcgill. View the logs.


7 years ago

#35 @joemcgill
7 years ago

  • Milestone set to 5.0
  • Resolution wontfix deleted
  • Status changed from closed to reopened

After thinking about this for a while now, I want to take another look at adding MIME aliasing. A few concerns that I'd want us to look into:

  1. Let's make sure there is a solid plan for keeping the MIME database up to date so things don't go stale.
  2. I'm curious if we could limit the (massive) list in mimes.3.diff to only include mimes that are currently supported by WordPress, while allowing the list to be extended to a more full list for plugins that want to provide that functionality, but that's not a blocker.
  3. I like the idea of putting application/octet stream MIMES through an extra set of checks, similarly to what is proposed in #40175, but we may need to read the first few bytes of files ourselves rather than relying on an external library, like fileinfo.

I'd also like to get feedback from folks at larger hosting companies or from WP.com, who have had to add additional checking around MIMES. @mikeschroder, I'd appreciate your feedback there.

@blobfolio
7 years ago

Example data file

#36 @blobfolio
7 years ago

Thanks @joemcgill!

I just attached an example data file. It is only around 150KB and contains information on 1978 different file extensions. Variations like x- and vnd. have been stripped out; those will be discovered as part of the alias matching function (separate from this file), so that saves some space.

My thinking is that every aspect of this enhancement is progressive. The existing upload_mimes functionality shouldn't be messed with, and whatever MIME type that list declares something to be, that's what should ultimately be used (when e.g. saving an attachment).

What we'd be adding with this enhancement (aside from the alias array) are a couple small functions to check that MIME/ext pairs are appropriate (particularly in cases where the MIME comes from outside WP and might disagree with the upload_mimes entry). Those functions could then be added to wp_check_filetype_and_ext to make upload validation more accurate (we can detect and block bad files, but also fix incorrectly named ones).

Insofar as maintenance, the only piece that will need to be updated is the MIME data itself. I already have a build script for this, so it is really easy to regenerate. I'll push it somewhere public once we're in agreement on the file contents. As-is, the file contains a function and filter called wp_get_mime_aliases. I don't think it needs more than that.

I'm curious if we could limit the (massive) list in mimes.3.diff​ to only include mimes that are currently supported by WordPress...

I think it would be better to handle all files out-of-the-box. Many users add types to WordPress through plugins, and those plugins aren't going to be capable of populating this data on their own. Again, we're only looking at about 150KB of data, so I'd rather not risk the UX/security implications from shortening it.

I like the idea of putting application/octet stream MIMES through an extra set of checks...

With proper alias handling, we should be checking all uploaded content. That's the real security benefit here. Initially I think we should just continue using fileinfo.so when available, but down the road we could definitely make our own Magic Header script to strategically examine files for users who don't have that PHP extension installed.

My plan is to put together an updated patch for this this weekend! I'll post it here when that's ready.

@blobfolio
7 years ago

Updated patch!

@blobfolio
7 years ago

Files for unit tests; extract to tests/phpunit/data/

#37 @blobfolio
7 years ago

All rightie, got a new patch in place (39963.diff and a few files for unit tests in phpunit-data.zip).


Functions:

This patch includes 5 new functions.


The file-renaming logic that sometimes happens during a file upload has been abstracted into its own function, wp_update_filename_extension(). This simply replaces the extension on the end of the file name.

/**
 * Assign a new extension to a filename.
 *
 * @since 5.0.0
 *
 * @param string $filename The original filename.
 * @param string $ext      The new extension.
 * @return string          The renamed file.
 */
function wp_update_filename_extension( $filename, $ext ) {


In terms of actual MIME alias work, the most basic check is wp_check_mime_alias(), which is used to see whether or not a given MIME type is appropriate for a given file extension.

/**
 * Check extension and MIME pairing.
 *
 * @since 5.0.0
 *
 * @param string $ext  File extension.
 * @param string $mime MIME type.
 * @return bool True/false.
 */
function wp_check_mime_alias( $ext = '', $mime = '' ) {


To go a little deeper, we have wp_check_allowed_mime_aliases(), which checks to see whether a given MIME type is in the official upload_mimes list, or is an alias of a type that is in that list. The upload_mimes types always take priority. This will return an array with ext and type keys (the type being the official whitelist entry) if it is allowed, otherwise false.

/**
 * Check Allowed Aliases
 *
 * This will cycle through each allowed ext/MIME pair to see if an alias
 * matches anything.
 *
 * @since 5.0.0
 *
 * @param string $alias MIME alias.
 * @param array  $mimes Allowed MIME types.
 * @return array|bool Array containing ext and type keys or false.
 */
function wp_check_allowed_mime_aliases( $alias, $mimes = null ) {


Finally, we have a function that applies this alias-matching to an actual file. This uses a tiered approach to determine the best guess about what kind of a file a file actually is, and whether or not it is allowed by WP.

This staged approach begins with a basic call to wp_check_filetype(), which determines type solely based on the file name. After that it will evaluate content via EXIF and fileinfo.so (if it is able to do so). If the "real" MIME differs from the name-based MIME, alias matching comes into play, and if that file is still allowed, the ext and type returned are updated accordingly to match reality.

Like the original wp_check_filetype() function, invalid/illegal files result in an array with ext and type set to false being returned.

/**
 * Retrieve the "real" file type from the file.
 *
 * This extends `wp_check_filetype()` to additionally consider content-
 * based indicators of a file's true type.
 *
 * The content-based type will override the name-based type if available
 * and included in the $mimes list.
 *
 * A false response will be set if the extension is not allowed, or if a
 * "real MIME" was found and that MIME is not allowed.
 *
 * @since 5.0.0
 *
 * @see wp_check_filetype()
 * @see wp_check_filetype_and_ext()
 *
 * @param string $file     Full path to the file.
 * @param string $filename The name of the file (may differ from $file due to $file being in a tmp directory).
 * @param array  $mimes    Optional. Key is the file extension with value as the mime type.
 * @return array Values with extension first and mime type.
 */
function wp_check_real_filetype( $file, $filename = null, $mimes = null ) {


And last but not least, we have a function that returns the full database of MIME aliases, which is auto-generated and stored in wp-includes/media-mimes.php. This data is an array, each key is a file extension, the values are arrays of MIME type(s) for said extension.

/**
 * Return MIME Aliases
 *
 * @since 5.0.0
 *
 * @return array MIME aliases organized by file extension. See data below for example.
 */
function wp_get_mime_aliases() {


Filters:

Many of these new functions have eponymous filters that users can hook into to change the results:

  • wp_get_mime_aliases
  • wp_check_real_filetype
  • wp_check_mime_alias

There is one additional filter that needs a bit of background:

Most libraries will assign a file type of application/octet-stream to any file they don't understand. It is the equivalent of "I dunno." Because of this, the wp_check_mime_alias() function implicitly passes (i.e. returns true) for any extension paired with this unhelpful type.

The filter wp_check_mime_alias_application_octet_stream was added to allow users to decide whether or not application/octet-stream deserves full checks or not. (true means actually check this, false (the default) means don't worry.)


Behavior:

The behavioral changes from this patch all stem from modifications to the existing wp_check_filetype_and_ext() function, which is called as part of the file upload process.

Previously, that function ran some content-based checks on certain types of files, but did so in a way that resulted in false positives and negatives.

Now that functionality is offloaded to wp_check_real_filetype().

First and foremost, this means that all uploaded files are now subjected to content-based analysis (for environments that support it). This fixes the security issues the changes in 4.7.1 were meant to handle, but also expands them to fix a lot of similar, unpublished issues.

This also means that WordPress can now intelligently rename files that are using the wrong extension (but are otherwise allowed).


Testing:

This functionality has existed in the Lord of the Files plugin for some time and worked wonders there, but please give this patch a shot to make sure it behaves the same way. (Also, be sure to deactivate LotF if it is installed on your test sites.)

For those interested in checking out the build script (that compiles the media-mimes.php file), you can find that at https://github.com/Blobfolio/wp-blob-mimes.

@blobfolio
7 years ago

Refresh to current src, update MIME db.

@blobfolio
7 years ago

Update aliases, rebuild against current src

This ticket was mentioned in Slack in #core-media by joemcgill. View the logs.


6 years ago

This ticket was mentioned in Slack in #core-media by antpb. View the logs.


6 years ago

This ticket was mentioned in Slack in #core-media by blobfolio. View the logs.


6 years ago

#41 @kirasong
6 years ago

  • Keywords needs-testing added
  • Milestone changed from 5.0 to Future Release

Changing to Future Release per @blobfolio's feedback in Slack above.

I'd still love to see this land at some point!

#42 @o815
2 years ago

hi, has this ticket lost attention? when to expect further process?

Note: See TracTickets for help on using tickets.