Make WordPress Core

Opened 20 months ago

Last modified 2 months ago

#60788 new defect (bug)

Content-Disposition support in download_url() seems broken

Reported by: siliconforks's profile siliconforks Owned by:
Milestone: Awaiting Review Priority: normal
Severity: normal Version: 6.4.3
Component: HTTP API Keywords: has-patch 2nd-opinion needs-testing needs-unit-tests
Focuses: Cc:

Description (last modified by SergeyBiryukov)

In [51939] a change was made to resolve ticket #38231 to make download_url() use the Content-Disposition header to specify the name of the downloaded file. I realize I'm a bit late to the party here since the ticket was opened more than 7 years ago and the change was made more than 2 years ago, but I think that this change was flawed and needs to be reconsidered (and possibly reverted entirely). The way it was implemented seems fundamentally broken and has the effect of making download_url() basically impossible to use reliably.

Consider the following simple PHP script named echo.php:

<?php

header( 'Content-Type: text/plain' );
header( 'Content-Disposition: attachment; filename="echo.txt"' );
if ( isset( $_GET['echo'] ) ) {
        echo $_GET['echo'];
}

This script just takes a query string parameter echo and outputs it, in plain text format, with suggested filename echo.txt.

Now consider the following file, download.php, which is intended to be used with WP-CLI:

<?php

$foo = download_url( 'http://example.test/echo.php?echo=foo' );
echo file_get_contents( $foo );
echo "\n";
unlink( $foo );

$bar = download_url( 'http://example.test/echo.php?echo=bar' );
echo file_get_contents( $bar );
echo "\n";
unlink( $bar );

If you run this, it works as expected:

$ wp eval-file download.php
foo
bar

Now consider the following file, download-bad.php:

<?php

$foo = download_url( 'http://example.test/echo.php?echo=foo' );
$bar = download_url( 'http://example.test/echo.php?echo=bar' );

echo file_get_contents( $foo );
echo "\n";

echo file_get_contents( $bar );
echo "\n";

unlink( $foo );
unlink( $bar );

If you try to run this, the results are not good:

$ wp eval-file download-bad.php
bar
bar
PHP Warning:  unlink(...tmp/echo.txt): No such file or directory...

You might argue that the code for download-bad.php is flawed, and should be rewritten to look more like download.php, and I would agree. However, note that download-bad.php would have worked fine before the change [51939] was made. Furthermore, even download.php is likely to display buggy behavior if two users attempt to run it at the same time. I don't think there is any reliable way to use download_url() in a manner that will protect against concurrent access (short of adding some kind of external locking mechanism). Consider the following PHP files, foo.php...

<?php

$foo = download_url( 'http://example.test/echo.php?echo=foo' );
sleep( 10 );
echo file_get_contents( $foo );
echo "\n";
sleep( 10 );
@ unlink( $foo );

...and bar.php:

<?php

sleep( 5 );
$bar = download_url( 'http://example.test/echo.php?echo=bar' );
sleep( 10 );
echo file_get_contents( $bar );
echo "\n";
@ unlink( $bar );

Now try running them both at the same time:

$ wp eval-file foo.php & wp eval-file bar.php
bar
bar

Obviously the output might vary because there's a race condition here, but I've added sleep() calls in such a way that the output will usually be bar and bar.

I'm not really sure what should be done with download_url(), but as it currently stands it seems essentially impossible to use in a reliable manner. Personally I would probably recommend just removing all Content-Disposition support entirely. That would technically be a backward compatibility break, but I'm not sure anyone is actually using this functionality? To me it doesn't really even make any sense as currently implemented. Why would you want a temporary file stored under a fixed name?

It looks like the Content-Disposition feature was originally proposed as a solution to the problem that download_url() uses wp_tempnam() in such a way that it can create very long filenames for some URLs, as described in #34938 - but that doesn't make much sense either:

  1. The change to use the Content-Disposition header bears no resemblance to the change that was originally requested in #34938
  2. Adding support for the Content-Disposition header doesn't actually fix the original problem (what if the URL being downloaded doesn't have a Content-Disposition header?)
  3. Even if the URL being downloaded has a Content-Disposition header, the way the Content-Disposition support was implmented means that it still creates a file based on the URL first, then renames it later, so the first file could still have a filename which is too long
  4. Another change was made in [37598] which should (mostly) address the original problem

That having been said, the original patch https://core.trac.wordpress.org/attachment/ticket/38231/38231.diff actually makes a lot more sense than what was finally committed. The original patch takes the Content-Disposition filename and feeds it as input into wp_tempnam(), so you will get a random filename which should prevent the issues in the examples I've given. Somehow, in the multiple iterations of the patch, the call to wp_tempnam() got lost, so the current code just uses the Content-Disposition filename directly instead of using a random filename based on the Content-Disposition filename.

I suppose the behavior could be changed to use a random filename based on the Content-Disposition filename, as was originally intended, but that would also technically be a backward compatibility break (assuming anyone is actually using this functionality and expecting the current behavior).

To sum up - I'm not sure exactly why this Content-Disposition functionality was added in the first place, but it should probably just be removed. If it can't be removed, then the behavior should be changed so that it uses a random filename based on the Content-Disposition filename rather than using the Content-Disposition filename directly. If that can't be done either, perhaps there could be another argument added to the download_url() function to turn the Content-Disposition functionality on or off? Ideally the default would be off.

Change History (5)

#1 @pmbaldha
2 months ago

Reproduction Report

Description

🐞 This report validates the issue can be reproduced.

Environment

  • WordPress: 6.8.2
  • PHP: 8.2.10
  • Server: Apache/2.4.43 (Win32) mod_fcgid/2.3.9a
  • Database: mysqli (Server: 10.4.10-MariaDB / Client: mysqlnd 8.2.10)
  • Browser: Chrome 139.0.0.0
  • OS: Windows 10/11
  • Theme: Twenty Twenty-Four 1.0
  • MU Plugins: None activated
  • Plugins:
    • Test Reports 1.2.0
    • wpDataTables - Tables & Table Charts 6.3.3.5

Actual Results

  1. Create echo.php file in the root folder with the following code:
    <?php
    header( 'Content-Type: text/plain' );
    header( 'Content-Disposition: attachment; filename="echo.txt"' );
    if ( isset( $_GET['echo'] ) ) {
            echo $_GET['echo'];
    }
    
  1. Create download.php file in the root folder with the following code:
    <?php
    $foo = download_url( 'http://example.test/echo.php?echo=foo' );
    echo file_get_contents( $foo );
    echo "\n";
    unlink( $foo );
    
    $bar = download_url( 'http://example.test/echo.php?echo=bar' );
    echo file_get_contents( $bar );
    echo "\n";
    unlink( $bar );
    
  1. If you run this, it works as expected:
    $ wp eval-file download.php
    foo
    bar
    
  1. Create the download-bad.php file with the following code:
    <?php
    $foo = download_url( 'http://example.test/echo.php?echo=foo' );
    $bar = download_url( 'http://example.test/echo.php?echo=bar' );
    
    echo file_get_contents( $foo );
    echo "\n";
    
    echo file_get_contents( $bar );
    echo "\n";
    
    unlink( $foo );
    unlink( $bar );
    
  1. 🐞 If I try to run this, the results are not good:
    $ wp eval-file download-bad.php
    bar
    bar
    

This ticket was mentioned in PR #9654 on WordPress/wordpress-develop by @pmbaldha.


2 months ago
#2

  • Keywords has-patch added

#3 @siliconforks
2 months ago

I guess PR #9654 is an improvement over the current behavior, but I would still prefer to just see the Content-Disposition support in download_url() removed entirely.

  1. The proposed PR doesn't entirely protect against concurrent access - there is still a race condition (admittedly much less likely than before).
  1. For the people who do want Content-Disposition support, it's not clear they will actually be happy with the behavior proposed: the filename is now no longer predictable, since it might have a -1 (or a -2, or a -3, etc.) in it.
  1. The Content-Disposition functionality has a lot of other issues besides the fundamental problem I've discussed here in #60788. (Some examples: #63015, #63384.) It would take a lot of additional work to fix these. I don't think it's really worth doing so much work when the entire feature is fundamentally flawed and should not even exist in the first place. Again, I think it would be better to simply remove all the Content-Disposition functionality from download_url().

#4 @mindctrl
2 months ago

  • Keywords 2nd-opinion needs-testing needs-unit-tests added

#5 @SergeyBiryukov
2 months ago

  • Description modified (diff)
Note: See TracTickets for help on using tickets.