media_sideload_image() broken with filenames containing strange characters (e.g., +, %)
Reported by: Coolkevman
I'm using the media_sideload_image() method in the upcoming version of my e107 Importer script (see: http://github.com/kdeldycke/e107-importer/blob/b7925fdac6aa43db4be5b7925265a83d95fc62ad/e107-importer.php#L277 ) to upload remote images into WordPress.
This method works as expected with many images from many different sources, but fails on URLs containing spaces.
Let me illustrate this bug with an example.
When trying to upload the image located at
http://home.nordnet.fr/francois.jankowski/pochette avant thumb.jpg
the result looks like this on the file system: http://twitpic.com/3s0dk7 . As you can see, the image file names are clean. But in the Media Manager, here is what you get: http://twitpic.com/3s0e5d . No thumbnail seems to have been created.
Now, trying to fix this, I modified the original URL before calling media_sideload_image() with the following code:
$img_url = str_replace(' ', '%20', html_entity_decode($img_url));
$new_tag = media_sideload_image($img_url, $post_id);
With this patch, here is the result on the filesystem: http://twitpic.com/3s0ets . I was surprised that WordPress does not sanitize URLs. Is that normal?
But the most surprising part is in the Media Manager: http://twitpic.com/3s0hup . It looks like, thanks to this hack, WordPress somehow succeeded in downloading the remote file but mangled the file name on the filesystem. What makes me think this? The Media Manager gets the right thumbnail dimensions but not the binary payload of the thumbnail (unlike the case above, where neither the binary nor the dimensions of the thumbnail are available).
All of this was tested in WordPress 3.1-RC2.
As for the idea behind the patch above, it comes from a very old version of my plugin (v0.9) that was based on WordPress 2.3.2. Back then, I somehow found the root cause of the problem, according to the comment I wrote 3 years ago:
// The fopen() function in wp_remote_fopen() don't like URLs with space chars not translated to html entities
I should have posted this bug report sooner, as now I've forgotten everything about this issue... :(
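The str_replace() workaround in the description only escapes spaces, while the ticket summary also mentions characters such as + and %. A minimal sketch of a more general pre-processing step might look like the following. The helper name e107_encode_url_path() is hypothetical (it is not part of WordPress or of the importer), and the sketch assumes a plain http(s) URL: user info and fragments are dropped. It only changes what is sent in the request; it is not claimed to fix the file naming and thumbnail behaviour described above.

```php
<?php
// Hypothetical helper, not a WordPress API: percent-encode every path
// segment of a URL so that spaces, '+', '%' and similar characters are
// escaped before the URL is handed to a downloader.
function e107_encode_url_path( $url ) {
    $parts = parse_url( $url );

    // Leave anything we cannot parse (or that has no path) untouched.
    if ( false === $parts || empty( $parts['scheme'] ) || empty( $parts['host'] ) || ! isset( $parts['path'] ) ) {
        return $url;
    }

    $segments = explode( '/', $parts['path'] );
    foreach ( $segments as $i => $segment ) {
        // Decode first so an already-encoded segment is not encoded twice.
        $segments[ $i ] = rawurlencode( rawurldecode( $segment ) );
    }

    $encoded = $parts['scheme'] . '://' . $parts['host'];
    if ( isset( $parts['port'] ) ) {
        $encoded .= ':' . $parts['port'];
    }
    $encoded .= implode( '/', $segments );
    if ( isset( $parts['query'] ) ) {
        $encoded .= '?' . $parts['query'];
    }

    return $encoded;
}

// The URL from the report, before and after encoding.
echo e107_encode_url_path( 'http://home.nordnet.fr/francois.jankowski/pochette avant thumb.jpg' ), "\n";
```

In the importer, the result would then be passed on as in the original workaround, i.e. media_sideload_image( e107_encode_url_path( $img_url ), $post_id ).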
Change History (48)
- Milestone Awaiting Review deleted
- Resolution set to invalid
- Status changed from new to closed
comment:7 in reply to: ↑ 5 ; follow-up: ↓ 8 Coolkevman — 3 years ago
- Keywords reporter-feedback removed
- Resolution invalid deleted
- Status changed from closed to reopened
comment:13 Azuaron — 3 years ago
- Component changed from HTTP to Media
- Keywords needs-patch added; has-patch removed
- Severity changed from normal to major
- Summary changed from media_sideload_image() broken with URLs containing spaces to media_sideload_image() broken with filenames containing strange characters (e.g., +, %)
- Version changed from 3.1 to 3.2.1
comment:15 follow-up: ↓ 16 kawauso — 3 years ago
- Keywords has-patch needs-testing added; needs-patch removed
comment:19 nacin — 2 years ago
- Keywords 3.5-early needs-unit-tests added
- Milestone changed from 3.4 to Future Release
comment:22 krembo99 — 2 years ago
- Cc krembo99 added
comment:39 johnbillion — 11 days ago
- Keywords commit added; has-patch needs-testing 3.5-early removed
- Milestone changed from Future Release to 4.0