Opened 18 months ago
Last modified 9 days ago
#55535 accepted enhancement
Pre-populate Image Alt Text field with IPTC Photo Metadata Standard Alt Text
| Reported by: | | Owned by: | |
|---|---|---|---|
| Milestone: | 6.5 | Priority: | normal |
| Severity: | minor | Version: | |
| Component: | Media | Keywords: | needs-patch 2nd-opinion |
| Focuses: | accessibility | Cc: | |
Description
The IPTC Photo Metadata Standard includes the ability to embed Alt Text with a photo.
Seems like it would be helpful if WordPress would check for this data when an image is uploaded, and, if it exists, pre-populate the Alt Text field with it.
I could see this being especially useful for site owners who purchase stock photography. If the alt text is embedded with those images, it would save the site owner time. It would also help ensure that alt text is added at all, and that it is an accurate description of the image (assuming the photographer or stock photo site actually enters a good description!).
http://www.iptc.org/std/photometadata/specification/IPTC-PhotoMetadata#alt-text-accessibility
Attachments (1)
Change History (48)
This ticket was mentioned in Slack in #accessibility by eatingrules. View the logs.
18 months ago
#3
@
18 months ago
@joyously Yes, that's the same type of thing @eatingrules is referring to. But the alt text property in IPTC data was only recently added to the specification, and it is not currently handled by WordPress.
#5
@
18 months ago
Thought it might help to provide an image with embedded photo metadata. This meme has the new IPTC AltTextAccessibility and ExtDescrAccessibility properties, as well as a number of other descriptive fields (Title, Headline, Description/Caption, Description Writer). Posted above.
Not sure if uploading this image will strip out the metadata so here's a Google Drive link just in case: https://drive.google.com/file/d/1MKSk5GZfGoxUdJxyWubS9PNKnIZCu8gS/view?usp=sharing
This ticket was mentioned in Slack in #accessibility by ryokuhi. View the logs.
17 months ago
This ticket was mentioned in Slack in #accessibility by ryokuhi. View the logs.
17 months ago
This ticket was mentioned in Slack in #accessibility by joedolson. View the logs.
17 months ago
This ticket was mentioned in Slack in #accessibility by ryokuhi. View the logs.
15 months ago
#12
@
13 months ago
I'm not having any luck getting the new IPTC fields from the image using `iptcparse`, and I'm not really sure why - I don't see anything in the function source that looks like it would skip newly defined tags.
I verified that the provided image does contain the extended alt information by viewing the raw image data, but the function doesn't pick them up.
#13
@
13 months ago
I'm having the same problem as @joedolson: a simple `var_dump` does not show the new image data. It seems that `getimagesize` is not returning the alt text information.
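A minimal sketch of why neither call surfaces the new fields, assuming (as I understand the IPTC spec) that the accessibility properties exist only in XMP and have no IIM equivalent. `getimagesize( $file, $info )` only fills `$info` with raw APPn blocks, and `iptcparse()` only decodes IIM datasets: a 0x1C byte, a record number, a dataset number, a two-byte big-endian length, then the data. The helper `build_iim_dataset()` is hypothetical and exists only to fabricate test input without needing an image file.

```php
<?php
// Hypothetical helper: encode one IIM dataset the way iptcparse() expects it:
// 0x1C tag marker, record number, dataset number, 2-byte big-endian length, data.
function build_iim_dataset( int $record, int $dataset, string $data ): string {
	return chr( 0x1C ) . chr( $record ) . chr( $dataset ) . pack( 'n', strlen( $data ) ) . $data;
}

// An IIM block containing only a Caption/Abstract (record 2, dataset 120).
$iim_block = build_iim_dataset( 2, 120, 'A tired cartoon sponge' );

// iptcparse() returns datasets keyed "record#dataset", each holding an array of values.
$iim = iptcparse( $iim_block );
var_dump( $iim['2#120'][0] ); // string(22) "A tired cartoon sponge"

// An XMP packet contains no 0x1C dataset markers, so iptcparse() finds no tags
// and returns false: XMP-only properties are invisible to it.
$xmp_packet = '<x:xmpmeta xmlns:x="adobe:ns:meta/"><rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/></x:xmpmeta>';
var_dump( iptcparse( $xmp_packet ) ); // bool(false)
```

As far as I can tell, WordPress's `wp_read_image_metadata()` passes `$info['APP13']` from `getimagesize()` to `iptcparse()`, so it inherits the same limitation.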
This ticket was mentioned in Slack in #accessibility by sabernhardt. View the logs.
13 months ago
#15
@
13 months ago
- Milestone changed from 6.1 to Future Release
Moving to Future Release for further investigation
#16
@
13 months ago
@sabernhardt - I'm disappointed to see that this request has been changed from 6.1 to future release. What needs to be done to investigate / resolve this issue and get this back on track? I'm a member of the IPTC photo metadata group. Would it help to arrange a meeting to discuss further?
#17
@
13 months ago
@carolinescribely If somebody can provide a way to get the IPTC metadata out of the image using PHP, we can continue to research it; but multiple people have worked on it without successfully getting that data. If we can't get the attribute, we can't use it. See the previous comments.
#18
@
12 months ago
@joedolson A not perfect, but well-working solution could be ExifTool - https://exiftool.org/ - a Perl-based library for reading and writing metadata. With the proper parameters it returns a structured JSON object with all the metadata in the image. We at IPTC could help with using ExifTool properly for reading metadata - but sorry, we are not PHP experts.
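For anyone who wants to try this route, here is a minimal sketch of calling ExifTool from PHP. It assumes the `exiftool` binary is installed and on the PATH, that the installed version recognizes the `XMP-iptcCore:AltTextAccessibility` tag (older releases predate it), and that with `-j` the output is a JSON array with one object per file, keyed by the bare tag name. Both function names are hypothetical. Note that `shell_exec()` is often disabled on shared hosting, which is one concrete form of the dependency problem.

```php
<?php
// Hypothetical helper: pull the alt text out of ExifTool's JSON output.
// Assumes output shaped like: [{"SourceFile": "...", "AltTextAccessibility": "..."}]
function parse_exiftool_alt_text( string $json ): ?string {
	$records = json_decode( $json, true );

	if ( ! is_array( $records ) || ! isset( $records[0]['AltTextAccessibility'] ) ) {
		return null;
	}

	return (string) $records[0]['AltTextAccessibility'];
}

// Hypothetical helper: run ExifTool on one file and return its alt text, if any.
// Requires the exiftool binary and permission to run shell commands.
function exiftool_alt_text( string $file ): ?string {
	$command = 'exiftool -j -XMP-iptcCore:AltTextAccessibility ' . escapeshellarg( $file );
	$output  = shell_exec( $command );

	return is_string( $output ) ? parse_exiftool_alt_text( $output ) : null;
}

// Parsing a captured sample, so no ExifTool installation is needed here.
$sample = '[{"SourceFile": "image.jpg", "AltTextAccessibility": "Meme. Spongebob Squarepants leans against the wall of an undersea cave."}]';
var_dump( parse_exiftool_alt_text( $sample ) );
```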
#19
@
12 months ago
- Milestone changed from Future Release to 6.2
Going to look at this again for 6.2; need to see if we can resolve the PHP issue with `iptcparse`.
@mwsat I don't think I can see a way that would help resolve the issue. I can definitely extract and see the metadata, but for these purposes we need WordPress to be able to extract that data, and that's what isn't working as expected.
#20
@
9 months ago
Hi @joedolson and all, if I can explain a bit of the detail behind IPTC Photo Metadata (more details in the spec: https://www.iptc.org/std/photometadata/specification/IPTC-PhotoMetadata-2022.1.html#history):
The "original" IPTC metadata is now referred to as IPTC/IIM. It is a binary format encoded in the Photoshop Image Resource Block (APP13 in JPEG). It was implemented by Photoshop in the 1990s. This is what is supported by PHP's native `iptcparse` function.
In the 2000s, IPTC and Adobe moved to the XMP format, which is stored in a different image header using the APP1 block marker (which is also used to store Exif tags). Unfortunately, PHP's `iptcparse` has still not been updated. It's understandable, because the parsing is more complicated - it's an XML-encoded set of RDF triples.
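To make the APP1/APP13 distinction concrete, here is a minimal sketch (not WordPress code) that walks a JPEG's marker segments and reports which APPn blocks carry XMP and which carry the Photoshop IRB where IIM lives. It assumes a well-formed JPEG without fill bytes and stops at the start-of-scan marker. The identifiers it checks ("http://ns.adobe.com/xap/1.0/" plus a NUL byte for XMP, "Photoshop 3.0" plus a NUL byte for APP13) are the conventional segment headers; `find_jpeg_app_segments()` and `jpeg_segment()` are hypothetical names. If I'm reading PHP's source correctly, `getimagesize()` keeps only the first segment of each APPn type in `$info`, so when an Exif APP1 precedes the XMP APP1 (as in the sample below), the XMP never reaches PHP at all.

```php
<?php
// Hypothetical helper: list the APPn segments of a JPEG held in a string, tagging
// XMP packets (APP1 + Adobe XMP identifier) and Photoshop IRBs (APP13, where IIM lives).
// Assumes a well-formed JPEG without fill bytes; stops at SOS (0xDA) or EOI (0xD9).
function find_jpeg_app_segments( string $jpeg ): array {
	$segments = array();

	// Every JPEG starts with the SOI marker 0xFFD8.
	if ( "\xFF\xD8" !== substr( $jpeg, 0, 2 ) ) {
		return $segments;
	}

	$xmp_id = "http://ns.adobe.com/xap/1.0/\x00";
	$irb_id = "Photoshop 3.0\x00";
	$length = strlen( $jpeg );
	$offset = 2;

	while ( $offset + 4 <= $length && "\xFF" === $jpeg[ $offset ] ) {
		$marker = ord( $jpeg[ $offset + 1 ] );

		// Entropy-coded image data follows SOS, so there are no more metadata segments.
		if ( 0xDA === $marker || 0xD9 === $marker ) {
			break;
		}

		// The big-endian segment length counts its own two bytes but not the marker.
		$size = unpack( 'n', substr( $jpeg, $offset + 2, 2 ) )[1];
		if ( $size < 2 ) {
			break;
		}
		$payload = substr( $jpeg, $offset + 4, $size - 2 );

		// APP0..APP15 are markers 0xE0..0xEF.
		if ( $marker >= 0xE0 && $marker <= 0xEF ) {
			$type = 'other';
			if ( 0xE1 === $marker && 0 === strncmp( $payload, $xmp_id, strlen( $xmp_id ) ) ) {
				$type = 'xmp';
			} elseif ( 0xED === $marker && 0 === strncmp( $payload, $irb_id, strlen( $irb_id ) ) ) {
				$type = 'photoshop-irb';
			}
			$segments[] = array(
				'marker'  => 'APP' . ( $marker - 0xE0 ),
				'type'    => $type,
				'payload' => $payload,
			);
		}

		$offset += 2 + $size;
	}

	return $segments;
}

// Hypothetical helper: build one marker segment (used here to fake a tiny JPEG).
function jpeg_segment( int $marker, string $payload ): string {
	return "\xFF" . chr( $marker ) . pack( 'n', strlen( $payload ) + 2 ) . $payload;
}

// Synthetic JPEG: SOI, an Exif APP1, an XMP APP1, then EOI.
$xmp  = '<x:xmpmeta xmlns:x="adobe:ns:meta/"/>';
$jpeg = "\xFF\xD8"
	. jpeg_segment( 0xE1, "Exif\x00\x00" )
	. jpeg_segment( 0xE1, "http://ns.adobe.com/xap/1.0/\x00" . $xmp )
	. "\xFF\xD9";

foreach ( find_jpeg_app_segments( $jpeg ) as $segment ) {
	echo $segment['marker'], ' ', $segment['type'], "\n";
}
// APP1 other
// APP1 xmp
```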
I'm not a PHP expert either, but here's some code to extract the XMP packet from an image and retrieve the Alt Text field (created with a bit of help from ChatGPT!)
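```php
<?php
// Read the contents of the JPEG file into a string
$jpeg_contents = file_get_contents( 'image.jpg' );

// Find the start and end positions of the XMP metadata
$xmp_start = strpos( $jpeg_contents, '<x:xmpmeta' );
$xmp_end   = strpos( $jpeg_contents, '</x:xmpmeta>' );

$alt_text = '';

if ( false !== $xmp_start && false !== $xmp_end ) {
	// Extract the XMP metadata from the JPEG contents (12 is the length of '</x:xmpmeta>')
	$xmp_data = substr( $jpeg_contents, $xmp_start, $xmp_end - $xmp_start + 12 );

	// Parse the XMP metadata using DOMDocument
	$doc = new DOMDocument();

	if ( $doc->loadXML( $xmp_data ) ) {
		// Get the AltTextAccessibility element by namespace URI rather than by prefix
		$element = $doc->getElementsByTagNameNS( 'http://iptc.org/std/Iptc4xmpCore/1.0/xmlns/', 'AltTextAccessibility' )->item( 0 );

		if ( $element ) {
			// Take the first language alternative (rdf:li); nodeValue on the parent
			// would concatenate every language version into one string
			$li = $element->getElementsByTagNameNS( 'http://www.w3.org/1999/02/22-rdf-syntax-ns#', 'li' )->item( 0 );

			// Extract the value of the element
			$alt_text = $li ? trim( $li->textContent ) : '';
		}
	}
}
```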
I hope that helps move towards a solution. I guess the best path forward would be to change PHP's core `iptcparse` to support XMP metadata. But in the meantime this wouldn't be a bad step forward.
Another option would be to use a third-party tool like exiftool, as @mwsat suggests. Of course, this creates a dependency for WordPress that might cause issues.
This ticket was mentioned in Slack in #accessibility by ryokuhi. View the logs.
9 months ago
This ticket was mentioned in Slack in #accessibility by ryokuhi. View the logs.
8 months ago
#23
@
8 months ago
Thanks, @brendanquinn.
For everyone, here's the embedded XMP from Tired Spongebob Meme.jpg (with `...` meaning that parts have been elided for clarity):
```xml
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 7.0-c000 1.000000, 0000/00/00-00:00:00">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description
        xmlns:Iptc4xmpCore="http://iptc.org/std/Iptc4xmpCore/1.0/xmlns/"
        ... >
      ...
      <Iptc4xmpCore:ExtDescrAccessibility>
        <rdf:Alt>
          <rdf:li xml:lang="x-default">Spongebob leans on his hand as if to balance or stabilize himself. He's naked, not wearing his usual square pants, and he's looking off to one side with his brow pulled up, looking completely exhausted. Both of his cheeks are puffed out as he exhales from a small opening in his mouth, like he's out of breath and trying to recover. </rdf:li>
          <rdf:li xml:lang="en">Spongebob leans on his hand as if to balance or stabilize himself. He's naked, not wearing his usual square pants, and he's looking off to one side with his brow pulled up, looking completely exhausted. Both of his cheeks are puffed out as he exhales from a small opening in his mouth, like he's out of breath and trying to recover. </rdf:li>
        </rdf:Alt>
      </Iptc4xmpCore:ExtDescrAccessibility>
      <Iptc4xmpCore:AltTextAccessibility>
        <rdf:Alt>
          <rdf:li xml:lang="x-default">Meme. Spongebob Squarepants leans against the wall of an undersea cave, puffing out his cheeks with a tired and put-upon expression. Caption: Me after I put the fitted sheet on the bed by myself.</rdf:li>
          <rdf:li xml:lang="en">Meme. Spongebob Squarepants leans against the wall of an undersea cave, puffing out his cheeks with a tired and put-upon expression. Caption: Me after I put the fitted sheet on the bed by myself.</rdf:li>
        </rdf:Alt>
      </Iptc4xmpCore:AltTextAccessibility>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>
```
And here's some working code to extract the `AltTextAccessibility` from the XMP, which builds on what @brendanquinn posted and is more complete. I threw this together pretty quickly, and I'm sure it can be improved upon.
```php
<?php
$jpeg_contents = file_get_contents( $file );

// Find the start and end positions of the XMP metadata
$xmp_start = strpos( $jpeg_contents, '<x:xmpmeta' );
$xmp_end   = strpos( $jpeg_contents, '</x:xmpmeta>' );

// Extract the XMP metadata from the JPEG contents
$xmp_data = substr( $jpeg_contents, $xmp_start, $xmp_end - $xmp_start + 12 );

// Parse the XMP metadata using DOMDocument
$doc = new DOMDocument();
$doc->loadXML( $xmp_data );

// Instantiate an XPath object, used to extract portions of the XMP.
$xpath = new DOMXPath( $doc );

// Register the relevant XML namespaces.
$xpath->registerNamespace( 'x', 'adobe:ns:meta/' );
$xpath->registerNamespace( 'rdf', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#' );
$xpath->registerNamespace( 'Iptc4xmpCore', 'http://iptc.org/std/Iptc4xmpCore/1.0/xmlns/' );

$value     = '';
$node_list = $xpath->query( '/x:xmpmeta/rdf:RDF/rdf:Description/Iptc4xmpCore:AltTextAccessibility' );

if ( $node_list && $node_list->count() ) {
	// Get the alt text accessibility alternative most appropriate for the site language.
	$node = $node_list->item( 0 );

	// Get the site's locale (e.g., "en_US") and convert it to the hyphenated
	// form used in xml:lang values (e.g., "en-US").
	$locale = str_replace( '_', '-', get_locale() );

	// There are 3 possibilities:
	//
	// 1. there is an rdf:li with an exact match on the site locale
	// 2. there is an rdf:li with a partial match on the site locale (e.g., site locale is en_US and rdf:li has @xml:lang="en")
	// 3. there is an rdf:li with an "x-default" lang.
	//
	// we evaluate them in that order, stopping when we have a match.
	$value = $xpath->evaluate( "string( rdf:Alt/rdf:li[ @xml:lang = '{$locale}' ] )", $node );

	if ( ! $value ) {
		// The language subtag is everything before the first hyphen ("en", "fil", ...).
		$language = strtok( $locale, '-' );
		$value    = $xpath->evaluate( "string( rdf:Alt/rdf:li[ @xml:lang = '{$language}' ] )", $node );
	}

	if ( ! $value ) {
		$value = $xpath->evaluate( 'string( rdf:Alt/rdf:li[ @xml:lang = "x-default" ] )', $node );
	}
}
```
Also note that it's possible for multiple `<x:xmpmeta>` elements to be embedded in a given image, and the above code doesn't handle that case.
#24
@
8 months ago
p.s. the above code won't "do the right thing" if the image doesn't have XMP data :-)
also, see this Stack Overflow post for some code that is supposed to be able to extract multiple XMP blocks...I haven't tried it.
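Both caveats (several `<x:xmpmeta>` packets, or none at all) can be handled by collecting every packet before parsing. A minimal sketch using a plain `strpos()` loop rather than the Stack Overflow code mentioned above (not evaluated here); `extract_xmp_packets()` is a hypothetical name, and it assumes each packet appears as a complete `<x:xmpmeta>...</x:xmpmeta>` element in the raw bytes. A caller would run the XPath lookup against each packet in turn, and an empty result means the image has no XMP to read.

```php
<?php
// Hypothetical helper: return every <x:xmpmeta>...</x:xmpmeta> packet found in a
// file's raw bytes. An empty array means the image carries no XMP at all.
function extract_xmp_packets( string $contents ): array {
	$packets   = array();
	$open_tag  = '<x:xmpmeta';
	$close_tag = '</x:xmpmeta>';
	$offset    = 0;

	while ( false !== ( $start = strpos( $contents, $open_tag, $offset ) ) ) {
		$end = strpos( $contents, $close_tag, $start );

		// An opening tag without a matching close is a truncated packet: stop here.
		if ( false === $end ) {
			break;
		}

		$packets[] = substr( $contents, $start, $end - $start + strlen( $close_tag ) );

		// Continue searching after this packet's closing tag.
		$offset = $end + strlen( $close_tag );
	}

	return $packets;
}

$contents = 'binary...<x:xmpmeta xmlns:x="adobe:ns:meta/">A</x:xmpmeta>...more binary...'
	. '<x:xmpmeta xmlns:x="adobe:ns:meta/">B</x:xmpmeta>';

var_dump( count( extract_xmp_packets( $contents ) ) );              // int(2)
var_dump( array() === extract_xmp_packets( 'no metadata here' ) );  // bool(true)
```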
#25
follow-up:
↓ 26
@
8 months ago
another thing to note: there is already code in core to extract image caption/description info from EXIF data within an image.
@eatingrules and @brendanquinn : is the intent of this ticket to prefer the 2 new IPTC fields for accessibility purposes?
#26
in reply to:
↑ 25
@
8 months ago
IPTC Caption/Description is more for the visible captions that appear below the image. Caption/Descriptions provide the "facts" about an image (who, what, when, where).
The IPTC Alt Text and Extended Description properties are for accessibility purposes specifically. Alt Text goes to the alt attribute for the image and Extended Description goes to the page reference linked from the image. More details on that here: https://www.w3.org/WAI/tutorials/images/complex/
So, yes, the intent of this ticket would be to import Alt Text instead of Caption/Description when Alt Text is present. We want to import / preserve the Extended Description from the image metadata in WordPress so users can add it to the web using their preferred approach from the W3C guidance.
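Since the goal is to keep both properties, here is a minimal sketch that reads `AltTextAccessibility` and `ExtDescrAccessibility` from a single XMP packet with one helper, returning every language alternative keyed by its `xml:lang` value. It assumes the `rdf:Alt`/`rdf:li` structure shown in the sample XMP earlier in this ticket; `get_iptc_accessibility_text()` is a hypothetical name, and choosing which language to use is left to the caller.

```php
<?php
// Hypothetical helper: read both IPTC accessibility properties from one XMP packet.
// Returns every language alternative keyed by its xml:lang value, e.g.
// array( 'alt' => array( 'x-default' => '...', 'en' => '...' ), 'extended' => array( ... ) ).
function get_iptc_accessibility_text( string $xmp_packet ): array {
	$result = array(
		'alt'      => array(),
		'extended' => array(),
	);

	// DOMDocument::loadXML() rejects empty strings, so bail out early.
	if ( '' === $xmp_packet ) {
		return $result;
	}

	// Collect libxml parse errors internally instead of emitting warnings.
	$previous = libxml_use_internal_errors( true );
	$doc      = new DOMDocument();
	$loaded   = $doc->loadXML( $xmp_packet );
	libxml_clear_errors();
	libxml_use_internal_errors( $previous );

	if ( ! $loaded ) {
		return $result;
	}

	$xpath = new DOMXPath( $doc );
	$xpath->registerNamespace( 'rdf', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#' );
	$xpath->registerNamespace( 'Iptc4xmpCore', 'http://iptc.org/std/Iptc4xmpCore/1.0/xmlns/' );

	$properties = array(
		'alt'      => 'Iptc4xmpCore:AltTextAccessibility',
		'extended' => 'Iptc4xmpCore:ExtDescrAccessibility',
	);

	foreach ( $properties as $key => $property ) {
		// Each property holds an rdf:Alt container with one rdf:li per language.
		foreach ( $xpath->query( "//rdf:Description/{$property}/rdf:Alt/rdf:li" ) as $item ) {
			$lang                    = $item->getAttributeNS( 'http://www.w3.org/XML/1998/namespace', 'lang' );
			$result[ $key ][ $lang ] = trim( $item->textContent );
		}
	}

	return $result;
}

$xmp_packet = '<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description xmlns:Iptc4xmpCore="http://iptc.org/std/Iptc4xmpCore/1.0/xmlns/">
   <Iptc4xmpCore:AltTextAccessibility><rdf:Alt>
    <rdf:li xml:lang="x-default">Tired Spongebob meme.</rdf:li>
    <rdf:li xml:lang="en">Tired Spongebob meme.</rdf:li>
   </rdf:Alt></Iptc4xmpCore:AltTextAccessibility>
   <Iptc4xmpCore:ExtDescrAccessibility><rdf:Alt>
    <rdf:li xml:lang="x-default">Spongebob leans on his hand, looking exhausted.</rdf:li>
   </rdf:Alt></Iptc4xmpCore:ExtDescrAccessibility>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>';

print_r( get_iptc_accessibility_text( $xmp_packet ) );
```

Keeping the two properties separate mirrors the point above: alt text would go to the alt attribute, while the extended description is preserved for whatever long-description approach the user prefers.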
Replying to pbiron:
another thing to note: there is already code in core to extract image caption/description info from EXIF data within an image.
@eatingrules and @brendanquinn : is the intent of this ticket to prefer the 2 new IPTC fields for accessibility purposes?
This ticket was mentioned in Slack in #accessibility by ryokuhi. View the logs.
8 months ago
This ticket was mentioned in Slack in #accessibility by ryokuhi. View the logs.
6 months ago
#31
@
6 months ago
So, yes, the intent of this ticket would be to import Alt Text instead of Caption/Description when Alt Text is present. We want to import / preserve the Extended Description from the image metadata in WordPress so users can add it to the web using their preferred approach from the W3C guidance.
This isn't an either/or situation - there's no reason not to import caption/description because the alt text is present. These are all separate fields in WordPress.
The ticket will add support for importing alt attribute values when provided, but make no changes to how existing captions and descriptions are handled.
#32
@
6 months ago
@brendanquinn and/or @carolinescribely: can you please add some additional images to this ticket to "showcase" the variability in the way this IPTC metadata can appear in an image? For example, there can be multiple `<x:xmpmeta>` elements, etc. That will help with implementing this.
#33
@
6 months ago
@pbiron Yes, there are sample images on the IPTC site, which already have values for both of the accessibility fields as well as other IPTC metadata properties. Including a few options below. Is this what you need? If not, can you explain what you mean by "showcase the variability in multiple <x:xmpmeta> elements?"
Current IPTC Reference images:
http://iptc.org/standards/photo-metadata/guidelines-support/
It's part way down the page, under "Reference Image with all current IPTC Photo Metadata fields filled in"
The direct link to the Standard 2021.1 file is here:
https://www.iptc.org/std/photometadata/examples/IPTC-PhotometadataRef-Std2021.1.jpg
The image is linked to from https://iptc.org/standards/photo-metadata/iptc-standard/
https://www.iptc.org/std/photometadata/examples/IPTC-PhotometadataRef-Std2021.1-large.jpg is the link to the image at the top of the "Get IPTC Photometadata" page, https://getpmd.iptc.org/getiptcpmd.html (which might be a better one to use, as it also shows how to view photo metadata via this service).
#34
@
6 months ago
- Keywords 2nd-opinion added
Hi there, I have a quick question about this proposal. It seems to me that the alt text of an image should not only describe the image alone, but should describe it in the context where the image is used.
For example, the photo of a green gourd could have different alt texts:
- On a mountain equipment e-commerce website, the alt text could be "Green steel hiking gourd" to describe the product
- On a photographer's site, the same photo might have a more technical alt text "Photo with motion blur - aperture priority mode"
- On a hiking blog where the photo would only be decorative, we would rather have an empty alt text.
Since everything depends on the context in which the photo is inserted, wouldn't it be better to avoid generating the alternative text automatically from the exif data?
#35
@
5 months ago
Alternative text provided by the source (the photo owner and/or photographer, for example) can provide details that aren't necessarily available to the user at the time they use the photo. Retaining that information is helpful.
The alt text field in WordPress should never be considered the source of the optimal alt text, because WP only supports a single field for that use, and alt text needs to be context-sensitive, anyway.
The value to this is to retain valuable information that could be provided by the photo creator.
See #47456
#36
@
5 months ago
Thanks, Joe. I thought I had responded, but it looks like the comment didn't post.
Good question @audrasjb. Yes, alt text does depend on the surrounding context. The ideal use case is to show the embedded alt text and allow WP users to accept/edit the description. There are a few good reasons for doing this. First, it gives users a place to start their descriptions from - always easier to adapt than create from scratch. Second, the alt text may come from the original creator or source provider and provide helpful information about how to accurately describe the image. Third, the alt text can be imported from an outside source/database managed by the host of the website. The idea with this ticket is to put the plumbing in place to preserve and pass any existing descriptive text through, and then allow users to adapt the alt text as needed. Let me know if there are any additional questions or comments on contextual descriptive text.
#37
@
3 months ago
- Milestone changed from 6.3 to 6.4
This is an enhancement, and we are 12 days away from Beta 1, after which no new enhancements will be added to the release. Because there is no patch, I am moving this ticket to 6.4.
#38
follow-up:
↓ 39
@
3 months ago
I agree with @audrasjb: if the information is imported automatically and is effectively invisible to the user who is uploading it, the user can miss it and end up with something unwanted or unexpected. If a picture was bought, its metadata can contain a lot of keywords meant to optimize the image's search visibility on the site where it was purchased, as well as author attributions and links to buy more.
Additional reading: https://yoast.com/developer-blog/why-we-dont-set-the-og-image-alt-tag/
#39
in reply to:
↑ 38
@
3 months ago
Why wouldn't WordPress make the embedded alt text visible to users? Is there a way to have an actual conversation with stakeholders about this ticket? It has been pushed back last minute for several releases at this point and I think we would benefit from having a discussion about the best way forward.
Replying to oglekler:
I think alongside with @audrasjb, if information will be uploaded automatically, and it is actually invisible to user who is uploading it, the user can miss it and land with something unwanted or unexpected. If a picture was bought, there can be a lot of keywords to optimize search visibility of this image where it was purchased and also authors attributes and links to buy more.
Article in addition: https://yoast.com/developer-blog/why-we-dont-set-the-og-image-alt-tag/
#40
@
3 months ago
@oglekler I'm not sure in what sense you mean "would be invisible to user who is uploading it" - the alt text would be visible exactly like any other alt text would be visible.
I'll also note that in the case where the author doesn't notice the alt text, then they were never going to have appropriate alt text in the first place - they were going to have no alt text. So it's somewhat a wash there; just a different default.
@carolinescribely This has been pushed back largely because there's no consensus on what to do and PHP doesn't have good support for extracting the data in the first place, so it's fairly labor intensive to make this happen.
There is a larger question here which is about how WordPress is going to handle alt text. Historically, WordPress has only stored a single canonical alt text, and that leads to a number of problems in usage. However, due to the many different ways of using this data, it's difficult to make significant changes to how it's stored. I've had a number of conversations about that with other contributors, but we don't have a clear path yet.
#41
@
3 months ago
I didn't realize this was part of a larger conversation and that it requires a higher level of effort (LOE). Thanks for sharing that. Let me know if I can help weigh in with contributors and decide on a go-forward alt text strategy. I work with clients and partners on this sort of thing all the time and I would be happy to contribute to moving this conversation forward.
Here are initial thoughts in case they are helpful...
Ideally there's a way to store multiple versions of alt text within a media library and have users control which one works best and whether or not to adapt / write a new description. It would be good to be able to categorize an alt text description as "primary / default" in the media library, as well as include other categories that might be helpful to users like "IPTC / embedded" or "custom."
From there, users could set controls for images on the actual webpage to pull, let's say, the "primary" alt text from the media library, or write a new description that's just for that page. This addresses the context concern brought up on this thread. Webflow has a similar process of pulling descriptions from its asset library, if you want a reference for this, but I don't believe they allow you to store and select from multiple versions of alt text in the media library, which is really what's needed for accessibility workflows.
Putting my ideas down here but happy to talk through any of them with you / others if it's helpful.
This ticket was mentioned in Slack in #core-media by joedolson. View the logs.
2 months ago
This ticket was mentioned in Slack in #accessibility by joedolson. View the logs.
2 months ago
This ticket was mentioned in Slack in #accessibility by joedolson. View the logs.
4 weeks ago
#45
@
4 weeks ago
@carolinescribely Do you know of any image sources who are actively using this field? For example, stock imagery providers, photography studios, etc. - a place where we could look at a wide variety of images to examine.
The accessibility team's preferred approach to this is to add a media data panel that can expose this and other important metadata without automatically assigning it to fields that will be output on the front end, so that the user can choose to use this information, but it isn't automatic.
Do you mean like this? https://core.trac.wordpress.org/browser/tags/5.9/src/wp-admin/includes/image.php#L752