#31050 closed feature request (fixed)
Better PDF Upload Management
Reported by: | celloexpressions | Owned by: | kirasong |
---|---|---|---|
Milestone: | 4.7 | Priority: | normal |
Severity: | normal | Version: | |
Component: | Media | Keywords: | has-patch needs-testing has-dev-note |
Focuses: | ui | Cc: |
Description
Over the past several releases, WordPress has added improved support for more and more media types, particularly images, audio, and video. All other file types get reasonably decent handling, with file type icons in the admin and link-insertion into content for the front-end.
But a particularly notable gap is with PDF files. Even though it's generally preferred to put content directly into html/site format, there are still many unavoidable situations where users must post PDF files.
In my recent sheet music library project, I discovered that PDFs can easily be converted to images using ImageMagick and WP_Image_Editor. If we create thumbnail images of PDFs on upload, we could use those images instead of the generic document icon in the admin media library. Additionally, we could explore improved means of displaying PDFs on the front-end. For example, this could be accomplished by offering the user the ability to insert a thumbnail-image-preview, linked to the PDF file, instead of a plain-text link.
This is the basic code for generating a thumbnail of a PDF on upload, and storing it as post meta in a plugin:
/** * When a PDF is uploaded, generate a thumbnail image and save it in the attachment meta. */ add_filter( 'wp_generate_attachment_metadata', 'prefix_generate_pdf_thumbnail_metadata', 10, 2 ); function prefix_generate_pdf_thumbnail_metadata( $metadata, $attachment_id ) { $attachment = get_post( $attachment_id ); if ( 'application/pdf' === get_post_mime_type( $attachment ) ) { // Create a png from the pdf. $file = get_attached_file( $attachment_id ); $pdf = wp_get_image_editor( $file ); if ( ! is_wp_error( $pdf ) ) { // Most likely cause for error is that ImageMagick is not available. $filename = $pdf->generate_filename( 'image', null, 'png' ); $uploaded = $pdf->save( $filename, 'image/png' ); if ( ! is_wp_error( $uploaded ) ) { $upload_dir = wp_upload_dir(); update_post_meta( $attachment_id, 'pdf_thumbnail_url', $upload_dir['url'] . '/' . $uploaded['file'] ); } } } return $metadata; }
I'd like to propose with this ticket that we explore creating thumbnails of PDFs when they're uploaded and use them for improved administration and display purposes, when the server environment supports ImageMagick, of course.
Attachments (22)
Change History (127)
#2
@
10 years ago
Let's figure out the general strategy here first, then see how that leads to a patch.
We can do the actual image generation when it's done for audio and video in wp_generate_attachment_metadata()
. The generated image would be stored as its own sub-attachment, which could in turn have multiple sizes, like any other image attachment, that could be used in the admin or on the frontend. But we would need to prevent these sub-attachments from showing up in the admin anywhere. Would it maybe be better to just generate thumbnails at a couple of arbitrary sizes (probably a thumbnail for admin and full-sized for possible frontend use) and store their URLs as attachment meta instead?
We should also back-generate images for all existing PDF uploads if possible, but I have no idea how to do that. I think we could get to a patch here for 4.2 with a team effort, but I know we'll get improvements in here eventually.
This ticket was mentioned in Slack in #core by tomauger. View the logs.
10 years ago
#4
@
10 years ago
We've got a little hackfest group working on this, and the current thinking is that we let core handle generating the PDF thumbnail upon upload, as per @celloexpressions, but that the back-generation belongs more properly in plugin territory, since it's basically a one-time affair.
#5
@
10 years ago
It shouldn't be difficult to build a basic mechanism into core that generates any missing thumbnails either as-needed (probably a bad idea since it will always fail if there's no IMagick), during an upgrade routine, or when the attachment post object is updated. When it comes down to it, this is core functionality and you shouldn't need a plugin to use it on already-uploaded media, it should "just work". Pretty much all users would benefit from that just happening without going through a plugin. But I wouldn't worry about that detail too much yet - let's get the actual functionality working and then dig into those compatibility details.
#6
@
10 years ago
Thanks @celloexpressions, we'll deal with the batch update piece later.
The team here is at a bit of a crossroads and could use core dev feedback. It has to do with the actual thumbnail images that we create.
- We think that the thumbnail image of the PDF ought to be generated not just for the thumbnail size (ie: media gallery) but also for all the intermediate image sizes as well, so that the thumbnail can propberly be used on the front-end.
- If this is the case, then we have to decide whether these generated images are their own attachment that is somehow associated with the original (PDF) attachment, or simply the -150x150.jpg versions (etc) of the original PDF attachment.
- If we simply append the size postfix, then we have a TLA mismatch situation (.pdf vs. .jpg) - I don't know whether that could pose some issues down the road.
Interested to hear some perspectives on this.
This ticket was mentioned in Slack in #core by tomauger. View the logs.
10 years ago
#8
@
10 years ago
As for the "when" part of back-generating, could perhaps handle it the way we deal with ID3 tags.
#9
@
10 years ago
Good thought. I'm not currently familiar with the ID3 tag generation, but I'll look into it at the same time we look into the video thumbnail generation.
This ticket was mentioned in Slack in #core by tomauger. View the logs.
10 years ago
#11
@
9 years ago
- Focuses ui administration removed
Anyone willing to work on patch here?
How is image generation currently working for other non-image post types? Perhaps we can learn something from there.
#13
@
9 years ago
The first part to tackle this. Currently it only works on the grid since only a full size is getting generated.
Questions:
- Should we generate a thumbnail to or handle it differently?
- Should we always generate images for video/audio even without theme support?
#15
@
9 years ago
Tested it out.
Yeah, thinking we might want to generate a thumb as well, so it doesn't need to load the larger image when unnecessary.
Wondering if we can show the full or thumb in MCE/attachment page, when an image is available, as well.
#17
@
9 years ago
I dig the idea of creating a preview image/thumbnail to make it easier to visually distinguish documents inside the media library. I think we should make sure the top level attachment metadata (e.g. mime-type, etc.) refers to the PDF and not the preview image.
We probably should include a thumbnail since that will get used in some places but I wouldn't want the preview images to be thought of as a front-end experience solution. I would think something like https://mozilla.github.io/pdf.js/ might make for better front-end previews if we were going to go down that rabbit hole.
#18
follow-up:
↓ 19
@
9 years ago
Added two new patches:
- Generate thumbnail image
- Changed
image_downsize()
to also work for non images. Please check that part. - Because of that change image now shows at the list table
- Show image on the edit screen
Todo
- Fix front-end. Breaks since it wants to show the pdf file as image
- What to do about Video/Audio image? 31050.3.patch removes the logic added in #23673
#19
in reply to:
↑ 18
;
follow-up:
↓ 20
@
9 years ago
Replying to markoheijnen:
- What to do about Video/Audio image? 31050.3.patch removes the logic added in #23673
I had a similar problem where I needed the image for a video the way I solved it was to get the open graph data saved the image provided there.
<?php /** * http://stackoverflow.com/questions/2068344/how-do-i-get-a-youtube-video-thumbnail-from-the-youtube-api * * @param $url * * @return string */ private static function get_open_graph_data_for_url( $url ) { if ( empty( esc_url( $url ) ) ) { return ''; } $transit_id = 'LP-open-graph-data'; $transit_key = md5( $url ); $open_graph_data = get_transient( $transit_id ); if ( false === $open_graph_data || ! isset( $open_graph_data[ $transit_key ] ) ) { try { $scraper = new SimpleScraper( esc_url( $url ) ); $open_graph_data[ $transit_key ] = $scraper->getOgp(); set_transient( $transit_id, $open_graph_data, HOUR_IN_SECONDS ); } catch ( Exception $e ) { error_log( 'Caught SimpleScrape exception: ' . $e->getMessage() ); return false; } } return $open_graph_data[ $transit_key ]; }
I used the lib
/* +---------------------------------------------------------------------------+ | Simple Scraper (SimpleScraper.class.php ) | | Copyright (c) 2013-2015, Ramon Kayo | +---------------------------------------------------------------------------+ | Author : Ramon Kayo | | Email : contato@ramonkayo.com | | License : Distributed under the MIT License | | Full license : http://code.ramonkayo.com/simple-scraper/license.txt | +---------------------------------------------------------------------------+ | "Simplicity is the ultimate sophistication." - Leonardo Da Vinci | +---------------------------------------------------------------------------+ | Last modified : 2015-07-08 | +---------------------------------------------------------------------------+ */
Just a throught
Paul
#20
in reply to:
↑ 19
@
9 years ago
Replying to pbearne:
Replying to markoheijnen:
- What to do about Video/Audio image? 31050.3.patch removes the logic added in #23673
I had a similar problem where I needed the image for a video the way I solved it was to get the open graph data saved the image provided there.
Thats something unrelated to this. This is about getting data out of video/audio and in this case with ID3 tag data. Currently there is check if is has CPT support or theme support.
#21
@
9 years ago
Information storage was incorrect, fixed that and updated the previous code to work with that. The image of a PDF will be displayed on the attachment page. I'm not that happy with the quality but no clue how to fix that.
This one is ready to be reviewed.
Open questions:
- What to do about Video/Audio image?
- Which mimetypes will we support for this? Currently application/pdf and video/avi (haven't tested that one yet)
- Is
attachment_fallback_mimetypes
the best filter name? would love to hear alternatives. Maybe move the logic to a function or extensions in combination with wp_get_video_extensions().
This ticket was mentioned in Slack in #core by joemcgill. View the logs.
9 years ago
@
9 years ago
Docs for filter, only support PDF, check for fallback support without loading attachment in WP_Image_Editor
#23
@
9 years ago
- Focuses ui added
- Keywords needs-unit-tests needs-testing added
- Milestone changed from Future Release to 4.5
I think we should only support PDF, but allow users to specify other types if they'd like, with the opening to expand these types for core later. As far as I can tell, FFMPEG is required for any video types.
Notes:
- This is cool.
- When we use thumbs for the gallery display, the top and bottom of standard pages get cut off. Wondering if we shouldn't crop for these thumbs.
- It'd be cool to insert the thumbnail into the page in the Visual editor, if this is easily doable. If we do that, will probably want to run a user test to see if it's preferred behavior.
Added 31050.5.patch:
- Docs for filter
- Only support PDF in filter
- Check for fallback support without loading attachment in WP_Image_Editor
#24
follow-up:
↓ 25
@
9 years ago
Fine with only PDF. Was just checking if we could support more. Fallback can be removed since the same check is inside wp_image_editor_supports.
#25
in reply to:
↑ 24
;
follow-up:
↓ 27
@
9 years ago
Replying to markoheijnen:
Fine with only PDF. Was just checking if we could support more. Fallback can be removed since the same check is inside wp_image_editor_supports.
Maybe I misunderstand -- don't we still want to be sure that only PDF is used, even if other types are supported by the image editors available, for now?
This ticket was mentioned in Slack in #core-images by mike. View the logs.
9 years ago
#27
in reply to:
↑ 25
@
9 years ago
Replying to markoheijnen:
Fine with only PDF. Was just checking if we could support more. Fallback can be removed since the same check is inside wp_image_editor_supports.
Ah, nevermind, you are correct -- instantiating with wp_get_image_editor()
uses _wp_image_editor_choose()
internally, which is what wp_image_editor_supports()
runs before loading the image, so we're fine to skip checking for editor support ahead of time, like you did in the previous patch.
#29
@
9 years ago
This looks good. I like the version with Cropped display "landscape" => max-height 100%
- I think a typical PDF is for printing and will have portrait orientation.
#30
follow-up:
↓ 31
@
9 years ago
If possible, let's handle both portrait and landscape PDFs - depending on industry/usage, both could be common so it's better not to assume. For example, construction documents are nearly always landscape, and distributed in PDF format.
I still think there's room for improvement in the way PDFs can be displayed on the front end; maybe some sort of a tile that shows the thumbnail and title, but really anything other than plain links would be an improvement. We can look into that in another ticket once image generation is implemented for the admin side, and it would be optional regardless.
#31
in reply to:
↑ 30
@
9 years ago
Replying to celloexpressions:
If possible, let's handle both portrait and landscape PDFs - depending on industry/usage, both could be common so it's better not to assume. For example, construction documents are nearly always landscape, and distributed in PDF format.
I still think there's room for improvement in the way PDFs can be displayed on the front end; maybe some sort of a tile that shows the thumbnail and title, but really anything other than plain links would be an improvement. We can look into that in another ticket once image generation is implemented for the admin side, and it would be optional regardless.
It's just inverting the logic of images to let the image show completely instead of cropped.
Currently it's showing the image linked to the PDF on the frontend side. So that part has been improved.
This ticket was mentioned in Slack in #core by joemcgill. View the logs.
9 years ago
#33
@
9 years ago
Been thinking about this. To work it requires ImageMagick and Postscript on the server. There are some settings that we can pass through imagick to ps to try to get better quality previews.
There is a difference between setting album art for audio, poster image for video and preview image for PDF.
- Album art can be added separately, it should be a separate image attachment.
- Video posters are usually generated automatically but would be good to support adding them manually, to match audio/album art.
- PDF previews are always generated automatically. There is no need for separate image attachments.
Thinking that it's better to set another meta "key" for preview images in the attachment meta, perhaps preview_image
. Then keep the image details in there. That way we won't mix non-image mime types with image meta.
The structure of the preview_image
array can also be better than the existing image meta. For example we can store the sub-dirs separately and have all sizes in one array, etc.
#34
@
9 years ago
I could look into it. Reason why I did it this way was the easy integration in the existing code. To me the sizes
part already is something separate and not linked to only to images.
About the full image we could probably move that one to sizes
too. Reason why I set $metadatafile? was that it's used to get the relative path. However we could also use the meta key _wp_attached_file
for that. So I'm looking if that could be used to only add the PDF data to sizes
.
#35
follow-up:
↓ 55
@
9 years ago
I added PDF thumbnail generation to my plugin, [Media Library Assistant](https://wordpress.org/support/plugin/media-library-assistant), a while back. It supports dynamic generation in the [mla_gallery]
shortcode. It also has a bulk action in the Media/Assistant admin submenu to generate thumbnails and store them as separate Media Library items. You are welcome to incorporate any of the code or ideas in Core.
I also discovered that adding "Featured Image" support to Media Library items is easy and works well:
add_post_type_support( 'attachment', 'thumbnail' );
Once that's done you can assign an image to any non-image item (PDF, audio, video, Office documents) and use it in gallery displays, etc. It's a nice alternative to generating thumbnails.
#36
@
9 years ago
Added the option to have the full image inside the sizes array. I still need to look over the code since both approaches are generating 5 failing unit tests.
This ticket was mentioned in Slack in #core-images by markoheijnen. View the logs.
9 years ago
#38
@
9 years ago
Some notes from testing the latest patch.
- The thumbnails while browsing in the media modal and library are great.
- The attachment page displays the thumbnail of the PDF, and links to the PDF, great.
- Inserting a PDF into post content creates a link to the thumbnail of the PDF. This should link to the PDF.
- From the media modal and library, the URL in the attachment details pane points to the thumbnail of the PDF. This should link to the PDF.
- Relatedly, it is hard to get to the original PDF from the media modal and library. Even if the URL linked to the PDF, I would have to click on the URL, select all, and navigate to that URL.
#39
follow-up:
↓ 40
@
9 years ago
That's not good. I guess that was always a problem and will look into linking to the PDF instead of the Image.
Isn't it always hard to get the original link? So it's a side issue of what we are trying to fix here?
#40
in reply to:
↑ 39
;
follow-up:
↓ 41
@
9 years ago
Replying to markoheijnen:
Isn't it always hard to get the original link? So it's a side issue of what we are trying to fix here?
In trunk without the patch applied, embedding the PDF in a post will create a link to the original PDF, and the media modal URL points to the original PDF. So this would be a regression.
#41
in reply to:
↑ 40
@
9 years ago
Replying to ericlewis:
Replying to markoheijnen:
Isn't it always hard to get the original link? So it's a side issue of what we are trying to fix here?
In trunk without the patch applied, embedding the PDF in a post will create a link to the original PDF, and the media modal URL points to the original PDF. So this would be a regression.
Okay that way. Should be fixed now. Still looking at why the unit tests are failing.
This ticket was mentioned in Slack in #core by jorbin. View the logs.
9 years ago
#43
@
9 years ago
In general the patch is really good however when a PDF isn't correctly build, the images will be broken. I have tried to run several imagick methods to test things out but couldn't get closer to the answer. Using mergeImageLayers(Imagick::LAYERMETHOD_FLATTEN)
broke a PDF that normally did work for me. When using identifyImage
the only difference was the type. but forcing the broken one to another type didn't work either.
Below the warnings that imagemagick will output for the broken PDF.
**** Warning: can't process font stream, loading font by the name. **** Error reading a content stream. The page may be incomplete. **** File did not complete the page properly and may be damaged. **** This file had errors that were repaired or ignored. **** The file was produced by: **** >>>> Microsoft: Print To PDF <<<< **** Please notify the author of the software that produced this **** file that it does not conform to Adobe's published PDF **** specification.
#44
@
9 years ago
Also in this patch the contruct of Imagick will not load the image anymore but readimage()
will. This because setResolution()
need to be called before the image is been loaded.
#45
@
9 years ago
- Milestone changed from 4.5 to Future Release
Since @markoheijnen noted that this is not working with some unknown (but readily discoverable) quantity of PDFs in a currently undetectable way, going to punt this from 4.5.
I'd still love to see this make its way to core. It might make the most sense in a feature plugin, to allow us to test with a wider variety of PDFs.
#46
@
9 years ago
I would be very interested in testing some of the "broken PDF" files against the code in my plugin. I am using a combination of Imagick calls and direct Ghostscript exec calls and I have never seen the kind of errors reported in earlier posts to this topic.
Where can I find the PDF documents you're testing with? Thanks!
This ticket was mentioned in Slack in #core-images by mike. View the logs.
9 years ago
This ticket was mentioned in Slack in #core-images by joemcgill. View the logs.
9 years ago
This ticket was mentioned in Slack in #core-images by joemcgill. View the logs.
8 years ago
#52
@
8 years ago
I've been looking at this in several of my own projects. It's a bit jarring to see a wall of default images used for PDF documents. It should be relatively easy to generate a thumbnail / sample image using Imagick or similar so that users can see at a glace what this file is. I'd prefer any type of image regardless of how it's cropped, so long as you can get a feel for what type of file is being displayed.
If we can't / don't want to add a preview thumbnail image, it would be good to have a large preview image displayed when you click on the document itself (when it opens in the media attachment lightbox, where it shows you it's metadata).
How likely is it that this will be followed up in 4.7? I'm happy to try a pull together a simple solution if needed.
Cheers
This ticket was mentioned in Slack in #core by joemcgill. View the logs.
8 years ago
This ticket was mentioned in Slack in #core-images by mike. View the logs.
8 years ago
#55
in reply to:
↑ 35
@
8 years ago
Replying to dglingren:
I added PDF thumbnail generation to my plugin, [Media Library Assistant](https://wordpress.org/support/plugin/media-library-assistant), a while back. It supports dynamic generation in the
[mla_gallery]
shortcode. It also has a bulk action in the Media/Assistant admin submenu to generate thumbnails and store them as separate Media Library items. You are welcome to incorporate any of the code or ideas in Core.
I also discovered that adding "Featured Image" support to Media Library items is easy and works well:
add_post_type_support( 'attachment', 'thumbnail' );
Once that's done you can assign an image to any non-image item (PDF, audio, video, Office documents) and use it in gallery displays, etc. It's a nice alternative to generating thumbnails.
Using the post_thumbnail seems like it would be the WordPress way to do things, as attachments are just posts anyway, and it applies to all non-image types instead of just PDFs. Then it's just a matter of generating it on upload, and maybe marking it so the image is not displayed separately in the Media Library (or not).
My opinion for generating the already uploaded ones is that it should only happen when used, as in while displaying the thumbnails in Media Library, the PDF is next, get the thumbnail but if it doesn't exist, create it. It seems to me that all image sizes should be done that way instead of littering the server with unused image sizes. (There's a plugin that does this: Dynamic Image Resizer.)
This ticket was mentioned in Slack in #core-images by joemcgill. View the logs.
8 years ago
#57
@
8 years ago
Current status is that I'm refreshing the patch so it works again.
For plugins, I don't know how "Media Library Assistant" should work but it doesn't generate an image for me. "PDF Image Generator" has same mistake as the current code and so do some others that solly only use imagick logic. I will work on a catalog of PDF's we can use. The current one I use is an invoice that I can't share.
#58
@
8 years ago
@markoheijnen,
I am the Media Library Assistant author and I'd be happy to help you in any way I can. MLA supports image generation in two ways:
- The [mla_gallery] shortcode has an mla_viewer parameter that generates images for its gallery display.
- The Media/Assistant admin submenu has a "Thumbnail" bulk action that will generate images for PDF documents and add them to the Media Library as separate items. Then, for example, the new items can be used as the Featured Image for the document to avoid generating the image each time it is displayed in an [mla_gallery] output.
These options are described in the "Thumbnail Substitution Support, mla_viewer" section of the Settings/Media Library Assistant Documentation tab. Let me know if there's anything I can do to help!
#59
@
8 years ago
@dglingren I found later on that I read your message here as I was sure it was on Slack. So I tried out the bulk action and with the invoice PDF I have, it only shows the logo and the rest is removed. So same behavior as others.
But I will look into your plugins code to see if there are things to learn from.
This ticket was mentioned in Slack in #core-images by joemcgill. View the logs.
8 years ago
#61
@
8 years ago
@markoheijnen,
Thanks for your quick response. I am always looking for PDF documents to test with; if you get a failing document that you can share, let me know. MLA also does a lot of PDF parsing to extract IPTC, EXIF and XMP metadata and there are many odd cases that don't work without special code.
#62
@
8 years ago
Would it be useful to upload various types of PDFs (ideally generated from different types of software) to this ticket for testing? Or maybe setup an open dropbox/google drive folder somewhere? Probably the most unique examples I can share include sheet music, construction drawings, and construction schedules.
#63
@
8 years ago
@markoheijnen Doing some trac gardening. Are you still working on this for 4.7? I want to make sure the milestone accuretly reflects what is being worked on for 4.7
#64
@
8 years ago
@jorbin Awesome job. Yes, I'm still looking at it. I probably give it another 2 weeks to see if I can get it in for 4.7.
@celloexpressions I got something at https://github.com/markoheijnen/Debug-Image-editors but it does more checks. I can create categories so PDF's only can be tested as well.
@dglingren I will collect more PDF's and test it with your plugin this weekend. Sorry for the delay.
This ticket was mentioned in Slack in #core-images by joemcgill. View the logs.
8 years ago
#66
@
8 years ago
- Keywords needs-refresh added
The most recent patch is failing for me when applying. Needs a refresh.
This ticket was mentioned in Slack in #core by jeffpaul. View the logs.
8 years ago
This ticket was mentioned in Slack in #core-images by joemcgill. View the logs.
8 years ago
#69
@
8 years ago
- Milestone changed from 4.7 to Future Release
Chatted with @markoheijnen on this today. Since there isn't a refresh yet, going to remove it from the milestone for now. Of course, if more happens here, we can move it back in.
If I have time today or tomorrow, I'll see what the work looks like to give it a refresh -- by all means, if others are interested, please feel free to do so as well.
#70
@
8 years ago
- Keywords needs-refresh removed
In 31050.diff, refreshed patch, made a few whitespace changes, and added a couple of things we'll want to consider before a commit in TODOs.
This should do for testing, but does not resolve any of the issues found previously with broken thumbnails being generated.
Please test this with your PDFs and please attach any test PDFs you find that break the functionality!
This ticket was mentioned in Slack in #core-images by mike. View the logs.
8 years ago
#72
@
8 years ago
- Keywords dev-feedback removed
- Milestone changed from Future Release to 4.7
After a chat with @helen, adding back to milestone for testing attention.
Please test this with any/all types of PDFs and post any that do not work properly.
Some of the broken ones have been Word PDFs (with images in the header/top of page), but I haven't been able to reproduce this on Word Mac, so testing with PDFs from Word on Windows would be appreciated.
Since we can't detect the warnings Ghostscript/Imagick is tossing, hoping we can find a profile of the PDFs and thumbs that fail.
This ticket was mentioned in Slack in #core-images by mike. View the logs.
8 years ago
@
8 years ago
A really old and bad PDF of a public paper that causes an HTTP Error on upload and no thumbnail generation
#74
follow-up:
↓ 75
@
8 years ago
I attached a file - Emperor_s_New_Truncation.pdf - and I apologize for the Comic Sans in it. When I upload it, I get an HTTP Error on upload as it fails to generate a thumbnail. Per my expectations, the file DID upload, just no thumbnails. It's 144kb which is pretty small so I was surprised at that. I have a few others. They're all public PDFs anyway. IIRC they were made on Windows XP back in 2008 or so, but we couldn't remember. It claims to be made by PrimoPDF http://www.primopdf.com
#75
in reply to:
↑ 74
@
8 years ago
Replying to Ipstenu:
I attached a file - Emperor_s_New_Truncation.pdf
Thanks for testing, Mika!
I wasn't able to reproduce the HTTP Error on VVV (tested in VVV first, since when we chatted you noted that was where you were testing).
Any chance you could use dev tools to check on what the HTTP response/content was, to give us an idea on why it failed?
#76
follow-up:
↓ 77
@
8 years ago
I dropped a bunch of files in and this worked great for a while. Then, one of the exceeded the max upload size and another had an http error (this happened for several, which I believe was an internet connectivity issue). Now, no images will generate thumbnails, in the media modal or library. Not even the ones that worked originally. I tried re-applying the patch, but no luck. Any ideas?
#77
in reply to:
↑ 76
@
8 years ago
Replying to celloexpressions:
I dropped a bunch of files in and this worked great for a while. Then, one of the exceeded the max upload size and another had an http error (this happened for several, which I believe was an internet connectivity issue). Now, no images will generate thumbnails, in the media modal or library. Not even the ones that worked originally. I tried re-applying the patch, but no luck. Any ideas?
This sounds like it might be a hosting throttling issue. I assume it's still broken if you revert the patch?
Time will probably reset whatever throttle you hit, but you could also try killing the PHP processes via the shell, if you have access.
This ticket was mentioned in Slack in #core by desrosj. View the logs.
8 years ago
#79
@
8 years ago
- Owner changed from markoheijnen to mikeschroder
Assigning to myself while I'm working on this -- more than happy to pass it on if/when someone else is taking point next :)
#80
@
8 years ago
Looked at this with @markoheijnen and a bit with @joemcgill (who has other comments as well), but a few notes:
Attached 31050.2.diff. I mis-applied some of the logic at the beginning of image_downsize()
, which is fixed in the latest patch.
However, now that it's working "properly," the black thumbnail issue is back, because it looks like the "full" size rendered JPGs are properly saved, whereas the thumbnails are not being generated correctly with all PDFs. I'll dig into this to see whether it can be solved without intentionally making use of these.
It's great that it highlights the issue, and we'll have to decide how to proceed.
After chatting with @markoheijnen, I'm okay with proceeding without a filter for now, with the PDF detection parts within WP_Image_Editor
, so removed the TODOs related to that, and documented the WP_Image_Editor changes a bit more.
Also, changed the logic in edit_form_image_editor()
so that wp_edit_form_attachment_display
doesn't result in double output.
Still to do: Double-check that Imagick::readImage()
is available all the way back in versions of Imagick.
This ticket was mentioned in Slack in #core-images by mike. View the logs.
8 years ago
#82
@
8 years ago
There were some good details in the Slack conversation above.
A few notes:
- I've got a workaround working where we generate the thumbs with the initially created render, which fixes the
multi_resize
/black thumbnail problem, and allows us to create whatever other sizes we think best. - Imagick supports
readImage()
back to at least 2.0.0 alpha (WP supports 2.2.0+). - @joemcgill notes that he'd like to see a different/smaller size used in the gallery, and I entirely agree.
- It's an option to only commit the thumbs, and not the Attachment Details view, which would allow WP to only generate lower quality images (it's faster to render PDFs at a lower DPI)
- @joemcgill notes that he'd like to explore the post thumbnail option before this gets committed (I'm going to dig into what this would mean, and connect with him and @markoheijnen about it)
- @joemcgill notes that we'll want to address the issue of some PDFs timing out that wouldn't have previously, and had suggested a size limit for the thumbnail generation. I noted that a lower DPI would also help with this -- especially for image intensive PDFs.
- @joemcgill would prefer if we didn’t have to hack around
image_downsize()
, which he thinks would be possible if we didn't need to read$meta['sizes']
from PDF attachments.
If I missed anything, please feel free to add additional comments :)
#83
@
8 years ago
Thanks @mikeschroder. A few other bits of feedback that I would add, which still apply to 31050.2.diff, are that full size image being generated from the PDF should be named like "filename.jpg" rather than "filename.pdf.jpg" and for images we store the full size metadata outside the [sizes]
array. We should probably stay consistent for PDFs unless there's a good reason not to.
#84
@
8 years ago
Any chance you could use dev tools to check on what the HTTP response/content was, to give us an idea on why it failed?
Sorry about the delay. Was offline.
It's a 502 Bad Gateway all along the way.
Safari's Request & Response:
POST Cached No Status Bad Gateway Code 502 Encoded 166 B Decoded 166 B Transferred 329 B
Chrome's Response Headers:
Connection:keep-alive Content-Length:568 Content-Type:text/html; charset=utf-8 Date:Wed, 19 Oct 2016 18:39:05 GMT Server:nginx
Testing just that PDF worked a second time on a fresh setup of the same flavor. So I ran them all again and got different PDFs failing. This made me wonder if the issue was how many I was uploading at once, though /srv/log/php_errors.log
was empty. Also once a PDF failed, it failed 'forever' which was weird. (Now a file called SRA2.pdf is failing... Attached for grins and giggles)
I ran an svn up again and re-tested with 31050.2.diff - same problem, but the same "new" PDF.
Turned on Xdebug and reloaded Vagrant. Same errors.
Both PDFs were made in MS PowerPoint on a Windows XP box.
This ticket was mentioned in Slack in #core by joemcgill. View the logs.
8 years ago
This ticket was mentioned in Slack in #core by jeffpaul. View the logs.
8 years ago
This ticket was mentioned in Slack in #core-media by joemcgill. View the logs.
8 years ago
#88
@
8 years ago
I did a file search for PDFs on my system and started uploading.
I am noticing some PDFs returning HTTP errors, even though the PDF file actually does get copied into the appropriate upload directory. It happens pretty consistently for files that are small in terms of file size, but large in terms of page numbers.
Each of these files weighs in at under 400KB, but each is in the 30-50 page range.
#89
@
8 years ago
Also, after refreshing the page, every PDF file that returned an HTTP error appears in the media library with the default document icon.
@
8 years ago
Work around black thumbnail issue. Generate more sizes; use medium by default in gallery and large by default in attachment details. Lower rendering DPI to 128.
#90
@
8 years ago
In 31050.3.diff:
- Work around black thumbnail issue.
- Name full image '.jpg' rather than '.pdf.jpg'
- Generate more sizes (thumbnail, medium, large, full)
- Make thumbnail not crop by default.
- Use medium by default in gallery
- Use large by default in attachment details.
- Lower rendering DPI to 128. This reduces all PDF rendered image quality, but also reduces the chances of timeouts. We'll have to decide what compromise we're comfortable with here.
Left:
- Decide what to do with full images, whether to discard, or how to store in meta if we're still going to. If we don't store in meta, we'll need to change to checking for whether there is an image representation for a non-image filetype a different way. I actually don't mind the way this is working currently, since it signals "The 'file' isn't an image, so the 'full' rendered image is here."
- Figure out why
data.size.url
isn't getting set by the time we hit the checks inmedia-template.php
. I've added the extra sizes, and they're being sent via JS like with standard images (data.sizes
), but the "default" isn't getting set to medium, which is why it's being hardcoded currently. I'm probably missing something obvious here, so would appreciate any pointers here.
This ticket was mentioned in Slack in #core-media by mike. View the logs.
8 years ago
This ticket was mentioned in Slack in #core by jeffpaul. View the logs.
8 years ago
#93
follow-up:
↓ 94
@
8 years ago
31050.3.diff is working much better on the performance front. 31050.4.diff cleans up a few things:
- Since we're saving the attachment metadata similarly to how images save data, most of the internal image functions work just fine, so I've removed some changes to
image_downsize()
that seemed unnecessary. - I fixed two issues with the way orientations were being calculated. First, in
wp_prepare_attachment_for_js()
when we use the sizes fallback to set the full size data we were calculating the orientation in reverse. While it's simple to do this calculation, I'm not against adding a helper function as @markoheijnen suggested on Slack. I also fixed a typo which meant we were failing to set the orientation, height, and width attributes for the response. - The only other change is to go back to the default size and crop attributes for image sizes in
wp_generate_attachment_metadata()
, which I don't feel strongly about if there is a good reason to hard code those values in.
Remaining issues I see:
- On the edit media screen
wp-admin/post.php?action=edit
the thumbnail is not showing up for PDFs because it's attempting to use the attachment file and not the thumbnail image. - I'm not confident that adding the
attachment_fallback_mimetypes
filter is needed here since any mimes that are added to that array might need its own process for creating a fallback image rather than using the same process we use for PDFs.
I'm going to keep testing, but otherwise this looks good.
#94
in reply to:
↑ 93
;
follow-up:
↓ 95
@
8 years ago
Replying to joemcgill:
- I fixed two issues with the way orientations were being calculated. First, in
wp_prepare_attachment_for_js()
when we use the sizes fallback to set the full size data we were calculating the orientation in reverse. While it's simple to do this calculation, I'm not against adding a helper function as @markoheijnen suggested on Slack. I also fixed a typo which meant we were failing to set the orientation, height, and width attributes for the response.
Also not opposed. Good catch!
- The only other change is to go back to the default size and crop attributes for image sizes in
wp_generate_attachment_metadata()
, which I don't feel strongly about if there is a good reason to hard code those values in.
This was done to make the thumbnail size more useful. By disabling crop for the thumbnail size, this makes the entire page visible, and avoids cropping out of the middle of a page (which allows visible headers in the thumbnail). I've left it the way you changed it, but please test with both and see which one you prefer (it's easiest to test by uploading various PDFs with each patch, then looking in the Media list view).
Remaining issues I see:
- On the edit media screen
wp-admin/post.php?action=edit
the thumbnail is not showing up for PDFs because it's attempting to use the attachment file and not the thumbnail image.
This was caused by the removal of the bits in image_downsize()
that replaced the PDF's filename with the "full image" filename.
I've restored these in the most recent patch, but am definitely open to a different or cleaner way of doing it.
- I'm not confident that adding the
attachment_fallback_mimetypes
filter is needed here since any mimes that are added to that array might need its own process for creating a fallback image rather than using the same process we use for PDFs.
I'm not opposed to pulling this for now, and adding it back if there is a call for it.
I'm going to keep testing, but otherwise this looks good.
Great to hear! I noticed that the 'full' size is still in there. I've been assuming you want to see it pulled before a commit, rather than standardize on it, but want to be sure we're on the same page here, since there's a bit of doing to do so. :)
#95
in reply to:
↑ 94
@
8 years ago
- Keywords needs-unit-tests removed
Replying to mikeschroder:
This was done to make the thumbnail size more useful. By disabling crop for the thumbnail size, this makes the entire page visible, and avoids cropping out of the middle of a page (which allows visible headers in the thumbnail).
It's probably better the way you had it. 31050.6.diff reverts back to disabling crop for thumbnails.
This was caused by the removal of the bits in
image_downsize()
that replaced the PDF's filename with the "full image" filename.
I've restored these in the most recent patch, but am definitely open to a different or cleaner way of doing it.
Ah, I had missed that when image_get_intermediate_size()
fails, we needed to set fallback data. Looks good to me.
I'm not opposed to pulling [the
attachment_fallback_mimetypes
filter] for now, and adding it back if there is a call for it.
Let's do that for now. 31050.6.diff removes the filter. I'm ok to add it back if we have a need for it but would rather not get stuck with back-compat issues if we find we needed the filter to work differently.
Great to hear! I noticed that the 'full' size is still in there. I've been assuming you want to see it pulled before a commit, rather than standardize on it, but want to be sure we're on the same page here since there's a bit of doing to do so. :)
The more I think about it, the more I'm coming around to the idea of keeping the full
size data in the sizes
array.
I've added a unit test for wp_generate_attachment_metadata()
to show that it should generate image sizes for PDFs. We could certainly do with more tests here, but it's a start. You'll need to add the wordpress-gsoc-flyer.pdf file from this ticket to your /tests/phpunit/data/images/
directory to get this test to pass.
#96
@
8 years ago
- Keywords commit added
Thanks @joemcgill!
Made a minor changes to the test (mainly to clean up the PDF after the fact) and putting this together for a first-run commit.
Was able to test using all of the posted PDFs locally, and they end up with rendered thumbnails as expected.
Would be interested to hear from those who have tested if the change to rendering DPI makes them work better on their hosts as well.
#98
@
8 years ago
Now that this has landed in trunk, if you still see issues with PDFs/hosting you're testing, please follow up here: #38522
#99
@
8 years ago
#100
@
8 years ago
Ghostscript was missing, it is not part of Imagemagick. Installs separately.
Sorry for mail notifications, but it can help someone else.
Not so easy to install on localhost/windows. Kind of 7 different tutorials with 2 of my own guesses/tweaks.
This ticket was mentioned in Slack in #core by swissspidy. View the logs.
7 years ago
#105
@
4 years ago
- Keywords has-dev-note added; commit removed
Linking to the published dev note for reference: https://make.wordpress.org/core/2016/11/15/enhanced-pdf-support-4-7/
Yes, please! Will happily move this to 4.2 if a patch materializes in a timely manner.