Make WordPress Core

Opened 13 years ago

Closed 11 years ago

Last modified 10 years ago

#20074 closed task (blessed) (fixed)

Attempt MD5 checksum verification in upgrader if available

Reported by: sivel's profile sivel Owned by: dd32's profile dd32
Milestone: 3.7 Priority: normal
Severity: normal Version: 2.8
Component: Upgrade/Install Keywords: has-patch
Focuses: Cc:

Description

In #19928 the Content-MD5 header was added for downloads of WP releases.

We had discussed performing MD5 verification, in the past, and now that we can do it without an extra HTTP request, it is somewhat more feasible.

If we add the Content-MD5 header for themes and plugins, this should work for them as well.

The patch will also give us access to the full HTTP API response array from download_url(), if specified.

Attachments (5)

20074.diff (5.1 KB) - added by sivel 13 years ago.
20074.2.diff (2.2 KB) - added by sivel 13 years ago.
Move the verification to download_url()
20074.3.diff (3.1 KB) - added by sivel 13 years ago.
Take this a bit further, perhaps just to show what could be done, to allow people to use either the Content-MD5 header, a hex hash, or a URL to an MD5 file to verify the download via download_url()
20074.4.diff (755 bytes) - added by dd32 11 years ago.
20074.5.diff (1.7 KB) - added by dd32 11 years ago.

Download all attachments as: .zip

Change History (30)

@sivel
13 years ago

#1 @nacin
13 years ago

Nice.

#2 @dd32
13 years ago

Just to be clear: The MD5 verification that was talked about in the past was individual file md5 checks to ensure that the local files were correct before performing a partial build upgrade.

An alternative that could be used here, would be to perform the md5 check within the download function if the header exists, and return a WP_Error instead if it doesn't match. But that's assuming that the md5 headers can be trusted from non-wp sources as well.

This would simply catch a incomplete download earlier, with a better message, a corrupt or incomplete zip download will currently throw a 'archive in an incompatible format' type error (since the zip verification fails) - which is a good thing, a more specific error is nice in this case and lets users know what's up without a potentially cryptic error

#3 @nacin
13 years ago

An alternative that could be used here, would be to perform the md5 check within the download function if the header exists, and return a WP_Error instead if it doesn't match. But that's assuming that the md5 headers can be trusted from non-wp sources as well.

An alternative implementation could be download_url( $url, $timeout = 300, &$response = null ), and then we simply assign the response to $response, therefore optionally being able to pass that back as well. That's basically what sivel's patch does, though this would probably be a bit cleaner.

#4 follow-up: @dd32
13 years ago

That would work too, I'm not too picky on the implementation honestly, I was just counting a md5 failure as a Download failure, and one which should therefor be returned from download_url() rather than being handled outside of the function.

I like the patch as it stands, just wanted to throw that thought into the arena :)

#5 in reply to: ↑ 4 @sivel
13 years ago

If we did the verification within download_url(), we would have to add an arg to specify whether or not we want to perform the verification, perhaps defaulted to false. Although as I think about it, I'm not sure why you wouldn't want to verify if it was available, so perhaps defaulted to true. This may actually help those who use download_url(), so that they don't have to re-implement the verification code. I kind of like this.

In either case, I'd still like to be able to get the full response back from download_url(), whether in the return or as a reference. Although for me, assigning the response by reference feels a little odd. We don't do this often as it is, and could perhaps not be clear to people trying to use it.

I'll update the patch, based on feedback, if we want to move in one or more of those directions.

And for what it is worth, and as a side note, I am pretty much finished on the code to add Content-MD5 to nginx via the embedded perl module.

@sivel
13 years ago

Move the verification to download_url()

@sivel
13 years ago

Take this a bit further, perhaps just to show what could be done, to allow people to use either the Content-MD5 header, a hex hash, or a URL to an MD5 file to verify the download via download_url()

#6 @sivel
13 years ago

20074.2.diff moves the download verification to download_url()

20074.3.diff is the result of a little thinking on how we could make download_url() flexible for developers to use it for download and download verification, regardless of being able to use Content-MD5. It adds a few additional arguments, to download_url() and allows the developer to specify either a bool to use the Content-MD5 header, or a string for either a md5 hash, or a URL to a MD5 file for verification.

#7 follow-up: @westi
13 years ago

20074.2.diff is my preferred of the later two patches but I'm not a big fan of extending download_url to optionally return the HTTP Repsonse object.

The whole point of the function is you either get a temp filename where the file has been downloaded to or an Error object adding a third thing into the mix complicates things for the users of the function unnecessarily - I think if you want access to the full HTTP response object you should make your own version of the function.

Also the current patch leaves files lying around because it is the responsibility of the caller of download_url to unlink but if the funtion returns the full response you don't have access to the temp file to unlink.

#8 in reply to: ↑ 7 @sivel
13 years ago

Replying to westi:

20074.2.diff is my preferred of the later two patches but I'm not a big fan of extending download_url to optionally return the HTTP Repsonse object.

The whole point of the function is you either get a temp filename where the file has been downloaded to or an Error object adding a third thing into the mix complicates things for the users of the function unnecessarily - I think if you want access to the full HTTP response object you should make your own version of the function.

We actually had to do this with wpqi but I don't really see why we couldn't return the response. If it is more palatable we could assign the $response by reference like nacin mentioned.

Also the current patch leaves files lying around because it is the responsibility of the caller of download_url to unlink but if the funtion returns the full response you don't have access to the temp file to unlink.

The HTTP response will return the filename regardless if it was provided to the HTTP API or if the HTTP API generated the temp name.

#9 @nacin
12 years ago

  • Milestone changed from 3.4 to Future Release

#10 @nacin
12 years ago

#23191 was marked as a duplicate.

#11 @nacin
11 years ago

  • Milestone changed from Future Release to 3.7
  • Type changed from enhancement to task (blessed)

@dd32
11 years ago

#12 @dd32
11 years ago

Attachment attachment:20074.4.diff​ added

Bare-bones always-verifying no-options refresh of attachment:20074.2.diff

Completely untested as WordPress.org no longer sends Content-MD5 headers (we're looking into it, it's a nginx short-coming)

#13 @dd32
11 years ago

  • Keywords commit needs-testing added; dev-feedback removed

#14 @nacin
11 years ago

dd32 pointed out to me that Content-MD5 is supposed to be base64-encoded. Does not look like that is handled in the latest patches.

#15 follow-up: @dd32
11 years ago

Correct, It's not handled at present as it was modelled on the previous (incorrect) implementation.

The check would need to change to

$md5_file = md5_file( $tmpfname  );
$content_md5 = bin2hex( base64_decode( $content_md5 ) );
if ( $md5_file != $content_md5 ) {
 ...

#16 in reply to: ↑ 15 @rmccue
11 years ago

Replying to dd32:

Correct, It's not handled at present as it was modelled on the previous (incorrect) implementation.

The check would need to change to

$md5_file = md5_file( $tmpfname  );
$content_md5 = bin2hex( base64_decode( $content_md5 ) );
if ( $md5_file != $content_md5 ) {
 ...

I believe this should actually be md5_file( $tmpfname, true ) to get the MD5 as raw binary data. It's also worth noting that although some servers use base64 encoding as per the specification, many don't. IMO, we should check the base64'd version first, then do the current check.

Also, can we split this into a separate function please? The REST API needs to do these checks too, and it'd be nice to have it in a function like wp_md5_check( $filename, $header_value ) that can handle both base64'd and non-encoded.

#17 @dd32
11 years ago

I believe this should actually be md5_file( $tmpfname, true ) t

We could do it that way too, either way, with the current error string, it requires them in a Hex format, but using bin2hex() conditionally within the error condition would be less computations for sure.

#18 @dd32
11 years ago

  • Keywords needs-patch added; has-patch commit needs-testing removed

Needs patch, anyone feel like whipping up something which does:

  • Separate function
  • Accepts md5 or base64(md5_raw) - Check the string length, md5 = 32, base64(md5_raw) = 24
  • Accepts filename
  • returns a error w/ the octal values
  • performs least amount of calculations in the most-used path
  • performs the check in download_url() if the header is present.

@dd32
11 years ago

#19 @dd32
11 years ago

Attachment attachment:20074.5.diff​ added

  • Splits it into a separate function (verify_file_md5())
  • Performs the check in download_url()
  • Handles a md5() or base64_encode( md5_file( $file, true ) ) value in the header / passed value.

Tested fine for me, WordPress.org still doesn't return the header, but will just work when it does.

#20 @dd32
11 years ago

  • Keywords has-patch added; needs-patch removed

@rmccue would that work for the JSON API use-case?

I'm also not set on the wording of the error - to the extent that I'm considering download_url() might want to return a simplified error, something along the lines of "The download could not be verified against it's provided checksum". We could even just return (bool) from the function, skip a particular error message entirely.

#21 @dd32
11 years ago

  • Owner set to dd32
  • Resolution set to fixed
  • Status changed from new to closed

In 25541:

When using download_url(), if the resource supplies a Content-MD5 header, verify the downloaded file against it. Fixes #20074

This ticket was mentioned in IRC in #wordpress-dev by dz0ny_. View the logs.


10 years ago

#23 follow-up: @dz0ny
10 years ago

wordpress.org servers no longer provide this header. Any plans to re-enable them?

#24 in reply to: ↑ 23 @nacin
10 years ago

Replying to dz0ny:

wordpress.org servers no longer provide this header. Any plans to re-enable them?

We never actually got it set up, as it's a bear to set up in nginx with our load balancing setup.

Note: See TracTickets for help on using tickets.