Make WordPress Core

Opened 9 months ago

Closed 3 months ago

Last modified 3 months ago

#57898 closed defect (bug) (fixed)

Google Doc saved as .docx file not allowed in media library

Reported by: winterstreet's profile winterstreet Owned by: audrasjb's profile audrasjb
Milestone: 6.4 Priority: normal
Severity: normal Version: 6.1.1
Component: Upload Keywords: has-patch has-screenshots changes-requested commit
Focuses: Cc:

Description

On a site with no plugins, and a default theme, I get denied uploading a .docx file created from a Google Doc. If I upload a .docx file downloaded from Microsoft it works fine.

Attachments (6)

media-libray-issue.png (287.9 KB) - added by jakariaistauk 9 months ago.
File extensions are same but one uploaded another denied
mujuonly.docx (6.0 KB) - added by mujuonly 9 months ago.
The sample google docx file
Screenshot 2023-03-11 at 12.36.48 PM.png (408.8 KB) - added by mujuonly 9 months ago.
Screenshot of the error
test-googledoc.docx (211.0 KB) - added by audrasjb 8 months ago.
Test file
Capture d’écran 2023-04-10 à 23.45.54.png (216.0 KB) - added by audrasjb 8 months ago.
Before patch: Google Docs refused (wrong mime type info)
d52ac2026167366cb0315372b5389193.gif (2.3 MB) - added by audrasjb 8 months ago.
After patch: all good!

Change History (26)

#1 @sabernhardt
9 months ago

  • Component changed from General to Upload

#2 @jakariaistauk
9 months ago

Hi,
I also reproduced the issue. the file extension is the same but file created by google can't upload.

@jakariaistauk
9 months ago

File extensions are same but one uploaded another denied

#3 @mujuonly
9 months ago

I have tested this in both WP 6.1.1 and WP 5.7.2 - I was able to reproduce the issue

Env

  • WordPress 5.7.2
  • Chrome Version 110.0.5481.177 (Official Build) (arm64)
  • MacOS Monterey
  • Theme: Twenty Twenty Three

Steps to test

  1. Add one .docx file downloaded from Google
Last edited 9 months ago by mujuonly (previous) (diff)

@mujuonly
9 months ago

The sample google docx file

@mujuonly
9 months ago

Screenshot of the error

This ticket was mentioned in Slack in #core-test by juhise. View the logs.


9 months ago

#5 follow-up: @mi5t4n
9 months ago

<?php
File: wp-includes/functions.php
3116: 
3117:   // Validate files that didn't get validated during previous checks.
3118:   if ( $type && ! $real_mime && extension_loaded( 'fileinfo' ) ) {
3119:           $finfo     = finfo_open( FILEINFO_MIME_TYPE );
3120:           $real_mime = finfo_file( $finfo, $file );
3121:           finfo_close( $finfo );
3122:


The main issue is that finfo_file() is returning application/vnd.openxmlformats-officedocument.wordprocessingml.documentapplication/vnd.openxmlformats-officedocument.wordprocessingml.document mime_type for google docs. As we can see, it is just a redudant of mime type application/vnd.openxmlformats-officedocument.wordprocessingml.document because of that the validation is failing.

Version 1, edited 9 months ago by mi5t4n (previous) (next) (diff)

This ticket was mentioned in PR #4228 on WordPress/wordpress-develop by @mi5t4n.


9 months ago
#6

  • Keywords has-patch added

#7 @sabernhardt
9 months ago

  • Keywords needs-testing added
  • Milestone changed from Awaiting Review to 6.3

#8 @annashopina
8 months ago

Hi! I am trying to deploy the patch, but I was stuck in errors.

Looks like the patch was not merged. Please check https://github.com/WordPress/wordpress-develop/pull/4228

@audrasjb
8 months ago

Test file

@audrasjb
8 months ago

Before patch: Google Docs refused (wrong mime type info)

@audrasjb
8 months ago

After patch: all good!

#9 @audrasjb
8 months ago

  • Keywords has-screenshots added; needs-testing removed
  • Owner set to audrasjb
  • Status changed from new to accepted

I reproduced the issue as well. I tested the proposed patch and it appears it fixes the issue.

By the way, I'm curious about how did we get a redundant file info at the first place… 🤨

#10 in reply to: ↑ 5 @azaozz
8 months ago

  • Keywords 2nd-opinion added

Replying to mi5t4n:

The main issue is that finfo_file() is returning

application/vnd.openxmlformats-officedocument.wordprocessingml.documentapplication/vnd.openxmlformats-officedocument.wordprocessingml.document

mime_type for google docs. As we can see, it is just a redudant of mime type

application/vnd.openxmlformats-officedocument.wordprocessingml.document

because of that the validation is failing.

Sounds like a bug in finfo_file()?

Frankly I'm not sure about the patch in https://github.com/WordPress/wordpress-develop/pull/4228 as this is somewhat security related. If this is a bug in finfo_file() that seems to happen only for *.docx files saved by Google Docs, lets add an exception for this particular case only. I.e. match the whole wrong mime type string and replace it with the right one.

#11 @mi5t4n
8 months ago

@audrasjb The finfo_file() returns redundant mime type for google docs.

@azaozz You're right, it's an issue with the finfo_file() function. It will be better to handle that specific edge case only. I have pushed new changes.

#12 @mikeschroder
6 months ago

Checked into this a bit, and wanted to confirm that it does seem to be an upstream bug:
https://bugs.php.net/bug.php?id=77784

The ticket indicates it may affect Excel / xlsx files as well, returning application/vnd.openxmlformats-officedocument.spreadsheetml.sheetapplication/vnd.openxmlformats-officedocument.spreadsheetml.sheet.

A commenter in the above ticket notes that they believe it's an issue in libmagic. I had difficulty searching for references in the tracker for libmagic for this issue, but if anyone is able to do so, please feel free to reference here.

If there isn't yet an open (or resolved) issue there, it'd likely be a good idea to open one so that it can be fixed upstream.

In the meantime, I'm not opposed to fixing the exceptions here like in the proposed patch.

#13 @oglekler
5 months ago

  • Milestone changed from 6.3 to 6.4

I uploaded several files from my Google Disk - .docx and .xlsx and some of them have correct mime types and were uploaded fine and some of them have this duplicated string. I opened .xlsx file in Excel and resaved it, and it got the correct mime type. I wonder if we need to make a workaround for errors which were made by some other tool instead of addressing the source of this thing — Google Docs or any function/library they are using while saving a file as .docx or .xlsx. All .pptx filed I tried were uploaded successfully, but it doesn't mean that this can be the case also.

Solution is questionable and patch is not ready. And if there will be a patch, I believe that it will need Unit test as well. So, I am moving this ticket for 6.4 for further consideration.

#14 @oglekler
3 months ago

  • Keywords changes-requested added

@azaozz and @mikeschroder we need a decision, what we are doing. I would have preferred to fix this in the source of the problem, but it looks like there things are not going anywhere, and due to inconvenience for WordPress users, I am suggesting to continue with the patch, but make a solution which will work for all doubled mime types which can accrue and make a proper description for this work around to make clear why we need such strange thing in the first place.

I didn't manage to get such doubled mime time with pptx, but I think, it needs to be checked for such possibility as well.

#15 @mi5t4n
3 months ago

@oglekler Since, it is an issue with docx, xlsx and some of the odt files as well. Should I have revert the PR which contains a solution which catches all the duplicated mime issues?

#16 @oglekler
3 months ago

@mi5t4n this is what I want to clarify before you will rework the patch.

#17 @azaozz
3 months ago

  • Keywords commit added; 2nd-opinion removed

In the meantime, I'm not opposed to fixing the exceptions here like in the proposed patch.

Same. The patch (PR) looks good imho. Fixes exactly this problem without affecting anything else.

Last edited 3 months ago by azaozz (previous) (diff)

#18 @audrasjb
3 months ago

  • Resolution set to fixed
  • Status changed from accepted to closed

In 56497:

Upload: Add a MIME type exception for .docx generated by Google Docs.

This changeset adds an exception to prevent permission issues on .docx generated by Google Docs. This is a temporary fix for an upstream bug on the finfo_file()
PHP function which returns a redundant MIME type for these documents.

Props winterstreet, jakariaistauk, mujuonly, mi5t4n, annashopina, audrasjb, azaozz, mikeschroder, oglekler.
Fixes #57898.

#20 @SergeyBiryukov
3 months ago

In 56510:

Upload: Correct duplicate MIME type for .xlsx files generated by Google Docs.

This expands the code block previously added for .docx files to include .xlsx files as well, which are known to have the same issue with finfo_file().

Includes a unit test case for wp_check_filetype_and_ext().

Reference: PHP Bug #77784: mime_content_type() result gets doubled for .xlsx.

Follow-up to [56497].

See #57898.

Note: See TracTickets for help on using tickets.