#35316 closed defect (bug) (fixed)
Images with latin extended characters in exif (slovak/czech) are missing thumbnails
Reported by: | michalrusina | Owned by: | ocean90 |
---|---|---|---|
Milestone: | 4.4.2 | Priority: | normal |
Severity: | normal | Version: | 4.4 |
Component: | Media | Keywords: | has-patch |
Focuses: | | Cc: | |
Description
After uploading an image with Latin Extended characters in its EXIF data (Slovak/Czech), there is no thumbnail in the media library and the image sizes (thumbnail, medium, large and all custom sizes) are missing in WP, so WordPress automatically inserts the full-size image in themes where the thumbnail size should be. The image size files are created but are NOT registered in WordPress.
Attachments (3)
Change History (24)
#1
9 years ago
- Milestone Awaiting Review deleted
- Resolution set to duplicate
- Status changed from new to closed
- Version 4.4 deleted
#2
9 years ago
- Keywords reporter-feedback added
- Resolution duplicate deleted
- Status changed from closed to reopened
- Version set to 4.4
I don't think that this is a duplicate of #15955. In this case there is nothing wrong with the filename (gabcikovo.jpg has no extended characters); the whole problem lies in the EXIF data. As I mentioned earlier, the files are successfully generated, but WordPress assumes that they are not.
#5
9 years ago
@michalrusina: Interesting, could you please upload or link any image example for testing?
#6
9 years ago
@pavelevap image for testing: https://a-static.projektn.sk/2016/01/gabcikovo.jpg
This ticket was mentioned in Slack in #core by pavelevap.
9 years ago
#11
9 years ago
- Keywords has-patch added
- Milestone changed from Awaiting Review to 4.4.2
- Severity changed from major to normal
Since #33772, attachment metadata includes IPTC keywords. But unlike titles and captions, the keywords are not UTF-8 encoded; see tags/4.4/src/wp-admin/includes/image.php?marks=405-409#L404.
Current output:
array(5) {
  ["width"]=> int(4016)
  ["height"]=> int(2673)
  ["file"]=> string(23) "2016/01/gabcikovo-3.jpg"
  ["sizes"]=> array(5) {
    ["thumbnail"]=> array(4) {
      ["file"]=> string(23) "gabcikovo-3-150x150.jpg"
      ["width"]=> int(150)
      ["height"]=> int(150)
      ["mime-type"]=> string(10) "image/jpeg"
    }
    ["medium"]=> array(4) {
      ["file"]=> string(23) "gabcikovo-3-300x200.jpg"
      ["width"]=> int(300)
      ["height"]=> int(200)
      ["mime-type"]=> string(10) "image/jpeg"
    }
    ["medium_large"]=> array(4) {
      ["file"]=> string(23) "gabcikovo-3-768x511.jpg"
      ["width"]=> int(768)
      ["height"]=> int(511)
      ["mime-type"]=> string(10) "image/jpeg"
    }
    ["large"]=> array(4) {
      ["file"]=> string(24) "gabcikovo-3-1024x682.jpg"
      ["width"]=> int(1024)
      ["height"]=> int(682)
      ["mime-type"]=> string(10) "image/jpeg"
    }
    ["post-thumbnail"]=> array(4) {
      ["file"]=> string(24) "gabcikovo-3-1200x799.jpg"
      ["width"]=> int(1200)
      ["height"]=> int(799)
      ["mime-type"]=> string(10) "image/jpeg"
    }
  }
  ["image_meta"]=> array(12) {
    ["aperture"]=> float(4)
    ["credit"]=> string(4) "TASR"
    ["camera"]=> string(8) "NIKON D4"
    ["caption"]=> string(126) "Na snímke turbína na výrobu elektrickej energie vo Vodnej elektrárni Gabèíkovo 9. marca 2015. FOTO TASR - Martin Baumann"
    ["created_timestamp"]=> int(1425908436)
    ["copyright"]=> string(22) "Tlaèová agentúra SR"
    ["focal_length"]=> string(2) "14"
    ["iso"]=> string(4) "2500"
    ["shutter_speed"]=> string(17) "0.066666666666667"
    ["title"]=> string(46) "Vodná turbína na výrobu elektrickej energie"
    ["orientation"]=> int(1)
    ["keywords"]=> array(2) {
      [0]=> string(58) "Slovensko vl�da energetika Vodn� elektr�re� Gab��kovo prem"
      [1]=> string(17) "Fico n�v�teva TTX"
    }
  }
}
35316.patch encodes the keywords.
Until this gets fixed in core you can use the following function:
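<?php
/**
 * Encode IPTC keywords as UTF-8 when they are not already valid UTF-8.
 */
function trac35316_fix_iptc_keywords_encoding( $meta ) {
    // Nothing to do when the image has no keywords.
    if ( empty( $meta['keywords'] ) || ! is_array( $meta['keywords'] ) ) {
        return $meta;
    }

    foreach ( $meta['keywords'] as $key => $keyword ) {
        if ( ! seems_utf8( $keyword ) ) {
            $meta['keywords'][ $key ] = utf8_encode( $keyword );
        }
    }

    return $meta;
}
add_filter( 'wp_read_image_metadata', 'trac35316_fix_iptc_keywords_encoding' );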
#12
follow-up: ↓ 13
9 years ago
@ocean90: Great, patch works well for adding available sizes.
How could wrong encoding lead to missing _wp_attachment_metadata? I am not sure about that...
There are still some encoding issues:
- 3 original keywords from image
Slovensko vláda energetika Vodná elektráreň Gabčíkovo premiér Fico návšteva TTX Slovensko vláda energetika Vodná elektráreň Gabčíkovo prem
- Only 2 results from _wp_attachment_metadata with some wrong encoding
[keywords] => Array ( [0] => Slovensko vláda energetika Vodná elektráreò Gabèíkovo prem [1] => Fico návteva TTX )
#13
in reply to: ↑ 12
9 years ago
Replying to pavelevap:
How could wrong encoding lead to missing _wp_attachment_metadata? I am not sure about that...
Because of the broken chars the data gets blocked by some sanity checks in wpdb.
3 original keywords from image
I have only 2:
("Slovensko vl\U00e1da energetika Vodn\U00e1 elektr\U00e1re\U0148 Gab\U010d\U00edkovo premi\U00e9r ","Fico n\U00e1v\U0161teva TTX")
with some wrong encoding
It uses utf8_encode(), which is also used for titles and captions. Are those wrong too? If yes, that should probably be handled in a separate ticket.
I also noticed that the long keyword is truncated, but it is already truncated when the data comes from iptcparse().
#14
9 years ago
@ocean90: You are right, chars are probably stripped inside strip_invalid_text(): https://core.trac.wordpress.org/browser/tags/4.4.1/src/wp-includes/wp-db.php#L2788
And that is why process_fields() returns false here: https://core.trac.wordpress.org/browser/tags/4.4.1/src/wp-includes/wp-db.php#L2085
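The effect can be sketched outside WordPress. This is not the wpdb code; it only mimics the outcome described above, on the assumption that a value which is not valid UTF-8 would be altered by stripping and therefore causes the whole write to be refused. The function name `sketch_process_fields` and the sample arrays are hypothetical.

```php
<?php
/**
 * Hypothetical stand-in for the check described above: return false when any
 * string in the (possibly nested) data is not valid UTF-8, otherwise return
 * the data unchanged.
 */
function sketch_process_fields( array $data ) {
    $all_valid = true;
    array_walk_recursive(
        $data,
        function ( $value ) use ( &$all_valid ) {
            // With the /u modifier, preg_match() returns false for malformed UTF-8.
            if ( is_string( $value ) && false === preg_match( '//u', $value ) ) {
                $all_valid = false;
            }
        }
    );
    return $all_valid ? $data : false;
}

// The title is valid UTF-8; the first keyword contains a raw 0xE1 byte.
$good = array( 'title' => "Vodn\xC3\xA1 turb\xC3\xADna" );
$bad  = array(
    'title'    => "Vodn\xC3\xA1 turb\xC3\xADna",
    'keywords' => array( "vl\xE1da", 'TTX' ),
);

var_dump( sketch_process_fields( $good ) !== false ); // bool(true)
var_dump( sketch_process_fields( $bad ) );            // bool(false)
```

One badly encoded keyword is enough to reject the whole array, which would match the symptom in this ticket: the resized files exist on disk, but no metadata row describing them is stored.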
Function returns only 2 keywords, but Windows shows 3 (see attached screenshot). I am not sure what is wrong.
Encoding: Yes, also caption (later saved as post_excerpt) is wrong: Gabèíkovo should be Gabčíkovo. Also copyright is wrong, title probably does not contain problematic chars.
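To illustrate why Gabèíkovo can appear instead of Gabčíkovo, here is a small sketch. It assumes the image stores its text in a Central European single-byte charset such as Windows-1250; the ticket does not state the source charset, so that is a guess. The byte string is hand-written for the example. Decoding those bytes as ISO-8859-1, which is the assumption utf8_encode() makes, produces the wrong letters, while decoding them as Windows-1250 produces the expected ones.

```php
<?php
// "Gabčíkovo" encoded as Windows-1250 (assumed charset): č is 0xE8, í is 0xED.
$raw = "Gab\xE8\xEDkovo";

// Treating the bytes as ISO-8859-1 maps 0xE8 to "è" instead of "č".
$as_latin1 = iconv( 'ISO-8859-1', 'UTF-8', $raw );

// Treating the bytes as Windows-1250 gives the intended text.
$as_cp1250 = iconv( 'CP1250', 'UTF-8', $raw );

echo $as_latin1, "\n"; // Gabèíkovo
echo $as_cp1250, "\n"; // Gabčíkovo
```

Both results are valid UTF-8, so either one would pass the database checks; only the second one is the correct text.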
#15
9 years ago
A test image full of non-utf8 data that we can test with and also throw into some unit tests would be beneficial here.
#16
9 years ago
@dd32 this image triggers this bug (was referenced earlier) https://a-static.projektn.sk/2016/01/gabcikovo.jpg
9 years ago
Image from https://core.trac.wordpress.org/ticket/35316#comment:6 for archival purposes
#17
9 years ago
Thanks @michalrusina, I read over that and missed it :(
I've uploaded it here for archival purposes in case the origin ever switches it out.
Hi there, thanks for the report.
We're tracking this issue in #15955, see also comment:11:ticket:15955.
Related/duplicates: #18634, #19842, #21217, #22363, #28808, #23588, #32887.