Make WordPress Core

Opened 20 months ago

Closed 14 months ago

Last modified 14 months ago

#57898 closed defect (bug) (fixed)

Google Doc saved as .docx file not allowed in media library

Reported by: winterstreet's profile winterstreet Owned by: audrasjb's profile audrasjb
Milestone: 6.4 Priority: normal
Severity: normal Version: 6.1.1
Component: Upload Keywords: has-patch has-screenshots changes-requested commit
Focuses: Cc:

Description

On a site with no plugins, and a default theme, I get denied uploading a .docx file created from a Google Doc. If I upload a .docx file downloaded from Microsoft it works fine.

Attachments (6)

media-libray-issue.png (287.9 KB) - added by jakariaistauk 20 months ago.
File extensions are same but one uploaded another denied
mujuonly.docx (6.0 KB) - added by mujuonly 19 months ago.
The sample google docx file
Screenshot 2023-03-11 at 12.36.48 PM.png (408.8 KB) - added by mujuonly 19 months ago.
Screenshot of the error
test-googledoc.docx (211.0 KB) - added by audrasjb 18 months ago.
Test file
Capture d’écran 2023-04-10 à 23.45.54.png (216.0 KB) - added by audrasjb 18 months ago.
Before patch: Google Docs refused (wrong mime type info)
d52ac2026167366cb0315372b5389193.gif (2.3 MB) - added by audrasjb 18 months ago.
After patch: all good!

Change History (26)

#1 @sabernhardt
20 months ago

  • Component changed from General to Upload

#2 @jakariaistauk
20 months ago

Hi,
I also reproduced the issue. the file extension is the same but file created by google can't upload.

@jakariaistauk
20 months ago

File extensions are same but one uploaded another denied

#3 @mujuonly
19 months ago

The same issue exist with 5.7.2.

Version 0, edited 19 months ago by mujuonly (next)

@mujuonly
19 months ago

The sample google docx file

@mujuonly
19 months ago

Screenshot of the error

This ticket was mentioned in Slack in #core-test by juhise. View the logs.


19 months ago

#5 follow-up: @mi5t4n
19 months ago

<?php
File: wp-includes/functions.php
3116: 
3117:   // Validate files that didn't get validated during previous checks.
3118:   if ( $type && ! $real_mime && extension_loaded( 'fileinfo' ) ) {
3119:           $finfo     = finfo_open( FILEINFO_MIME_TYPE );
3120:           $real_mime = finfo_file( $finfo, $file );
3121:           finfo_close( $finfo );
3122:


The main issue is that finfo_file() is returning "application/vnd.openxmlformats-officedocument.wordprocessingml.documentapplication/vnd.openxmlformats-officedocument.wordprocessingml.document" mime_type for google docs. As we can see, it is just a redudant of mime type "application/vnd.openxmlformats-officedocument.wordprocessingml.document" because of that the validation is failing.

Last edited 19 months ago by mi5t4n (previous) (diff)

This ticket was mentioned in PR #4228 on WordPress/wordpress-develop by @mi5t4n.


19 months ago
#6

  • Keywords has-patch added

#7 @sabernhardt
19 months ago

  • Keywords needs-testing added
  • Milestone changed from Awaiting Review to 6.3

#8 @annashopina
19 months ago

Hi! I am trying to deploy the patch, but I was stuck in errors.

Looks like the patch was not merged. Please check https://github.com/WordPress/wordpress-develop/pull/4228

@audrasjb
18 months ago

Test file

@audrasjb
18 months ago

Before patch: Google Docs refused (wrong mime type info)

@audrasjb
18 months ago

After patch: all good!

#9 @audrasjb
18 months ago

  • Keywords has-screenshots added; needs-testing removed
  • Owner set to audrasjb
  • Status changed from new to accepted

I reproduced the issue as well. I tested the proposed patch and it appears it fixes the issue.

By the way, I'm curious about how did we get a redundant file info at the first place… 🤨

#10 in reply to: ↑ 5 @azaozz
18 months ago

  • Keywords 2nd-opinion added

Replying to mi5t4n:

The main issue is that finfo_file() is returning

application/vnd.openxmlformats-officedocument.wordprocessingml.documentapplication/vnd.openxmlformats-officedocument.wordprocessingml.document

mime_type for google docs. As we can see, it is just a redudant of mime type

application/vnd.openxmlformats-officedocument.wordprocessingml.document

because of that the validation is failing.

Sounds like a bug in finfo_file()?

Frankly I'm not sure about the patch in https://github.com/WordPress/wordpress-develop/pull/4228 as this is somewhat security related. If this is a bug in finfo_file() that seems to happen only for *.docx files saved by Google Docs, lets add an exception for this particular case only. I.e. match the whole wrong mime type string and replace it with the right one.

#11 @mi5t4n
18 months ago

@audrasjb The finfo_file() returns redundant mime type for google docs.

@azaozz You're right, it's an issue with the finfo_file() function. It will be better to handle that specific edge case only. I have pushed new changes.

#12 @kirasong
17 months ago

Checked into this a bit, and wanted to confirm that it does seem to be an upstream bug:
https://bugs.php.net/bug.php?id=77784

The ticket indicates it may affect Excel / xlsx files as well, returning application/vnd.openxmlformats-officedocument.spreadsheetml.sheetapplication/vnd.openxmlformats-officedocument.spreadsheetml.sheet.

A commenter in the above ticket notes that they believe it's an issue in libmagic. I had difficulty searching for references in the tracker for libmagic for this issue, but if anyone is able to do so, please feel free to reference here.

If there isn't yet an open (or resolved) issue there, it'd likely be a good idea to open one so that it can be fixed upstream.

In the meantime, I'm not opposed to fixing the exceptions here like in the proposed patch.

#13 @oglekler
16 months ago

  • Milestone changed from 6.3 to 6.4

I uploaded several files from my Google Disk - .docx and .xlsx and some of them have correct mime types and were uploaded fine and some of them have this duplicated string. I opened .xlsx file in Excel and resaved it, and it got the correct mime type. I wonder if we need to make a workaround for errors which were made by some other tool instead of addressing the source of this thing — Google Docs or any function/library they are using while saving a file as .docx or .xlsx. All .pptx filed I tried were uploaded successfully, but it doesn't mean that this can be the case also.

Solution is questionable and patch is not ready. And if there will be a patch, I believe that it will need Unit test as well. So, I am moving this ticket for 6.4 for further consideration.

#14 @oglekler
14 months ago

  • Keywords changes-requested added

@azaozz and @mikeschroder we need a decision, what we are doing. I would have preferred to fix this in the source of the problem, but it looks like there things are not going anywhere, and due to inconvenience for WordPress users, I am suggesting to continue with the patch, but make a solution which will work for all doubled mime types which can accrue and make a proper description for this work around to make clear why we need such strange thing in the first place.

I didn't manage to get such doubled mime time with pptx, but I think, it needs to be checked for such possibility as well.

#15 @mi5t4n
14 months ago

@oglekler Since, it is an issue with docx, xlsx and some of the odt files as well. Should I have revert the PR which contains a solution which catches all the duplicated mime issues?

#16 @oglekler
14 months ago

@mi5t4n this is what I want to clarify before you will rework the patch.

#17 @azaozz
14 months ago

  • Keywords commit added; 2nd-opinion removed

In the meantime, I'm not opposed to fixing the exceptions here like in the proposed patch.

Same. The patch (PR) looks good imho. Fixes exactly this problem without affecting anything else.

Last edited 14 months ago by azaozz (previous) (diff)

#18 @audrasjb
14 months ago

  • Resolution set to fixed
  • Status changed from accepted to closed

In 56497:

Upload: Add a MIME type exception for .docx generated by Google Docs.

This changeset adds an exception to prevent permission issues on .docx generated by Google Docs. This is a temporary fix for an upstream bug on the finfo_file()
PHP function which returns a redundant MIME type for these documents.

Props winterstreet, jakariaistauk, mujuonly, mi5t4n, annashopina, audrasjb, azaozz, mikeschroder, oglekler.
Fixes #57898.

#20 @SergeyBiryukov
14 months ago

In 56510:

Upload: Correct duplicate MIME type for .xlsx files generated by Google Docs.

This expands the code block previously added for .docx files to include .xlsx files as well, which are known to have the same issue with finfo_file().

Includes a unit test case for wp_check_filetype_and_ext().

Reference: PHP Bug #77784: mime_content_type() result gets doubled for .xlsx.

Follow-up to [56497].

See #57898.

Note: See TracTickets for help on using tickets.