Make WordPress Core

Opened 5 years ago

Last modified 3 years ago

#49413 reviewing defect (bug)

wp_exif_date2ts should use Datetime and accept an optional offset

Reported by: dshanske's profile dshanske Owned by: rarst's profile Rarst
Milestone: Future Release Priority: low
Severity: normal Version:
Component: Date/Time Keywords: has-patch needs-refresh needs-unit-tests needs-dev-note
Focuses: Cc:

Description

I think this falls under the date/time component. This function should be updated to use DateTime and to accept an optional timezone offset.

Right now, this is called by wp_read_image_metadata with the result the timestamp is wrong. This is because timestamps in exif are stored without timezone offsets, usually in local time. Storing it as a timestamp was therefore a mistake in my opinion as it will always be off. We should be assuming that the time is local site time unless an optional offset/timezone string is passed in.

Separately, as newer cameras and most cell phones do store that offset, will be proposing changes for that in a separate ticket.

Attachments (1)

49413.diff (3.7 KB) - added by dshanske 5 years ago.
Proof of Concept

Download all attachments as: .zip

Change History (25)

#1 @skithund
5 years ago

PHP exif doesn't support the latest exif standards, which include OffsetTime, OffsetTimeOriginal and OffsetTimeDigitized tags.
There's GPSTimeStamp though, which PHP supports, and it has the GPS time in UTC - no date.

#2 @Rarst
5 years ago

Yeah, I am getting to look at the function some time, it didn't get into revamp effort. Another ticket about #48204

I know what EXIF is, but not familiar with specifics. Is there any good resource describing what we have there to work with?

Last edited 5 years ago by Rarst (previous) (diff)

#3 @skithund
5 years ago

There's the actual 2.32 standard (pretty slow PDF download), which describes different tags, their types and possible values. Like in the case of #48204 and DateTimeDigitized:

When the date and time are unknown, all the character spaces except colons (":") should be filled with blank characters

At the moment PHP supports Exif 2.2 (timezone offsets were introduced in Exif 2.31)

#4 @dshanske
5 years ago

I addressed the issue of getting the data in #49414. I opened this to address the update to the function to allow an offset and improve the conversion, but they are related.

#5 @dshanske
5 years ago

Timezone offset aside, converting a time without a timezone to a timestamp is fraught with issues that we need to document, standardize and be consistent about. I think we might have been better off to store as an iso8601 date

#6 @dshanske
5 years ago

@skithund I added a filter to read_image_metadata that dumped the exif from a jpeg from my phone to a file for testing...and regrettably you are wrong. The data did come out.

"ExifVersion": "0231",
"DateTimeOriginal": "2020:02:08 20:31:08",
"DateTimeDigitized": "2020:02:08 20:31:08",
"UndefinedTag:0x9010": "+02:00",
"UndefinedTag:0x9011": "+02:00",
"UndefinedTag:0x9012": "+02:00",
"SubSecTime": "190673",
"SubSecTimeOriginal": "190673",
"SubSecTimeDigitized": "190673",
"GPSTimeStamp": [

"18\/1",
"31\/1",
"8\/1"

],

Now, the EXIF 2.32 standard defines that as

9010 - Offset Data of DateTime
9011 - Offset Data of DateTimeOriginal
9012 - Offset Data of DateTimeDigitized
SubSec is subseconds attached to those same fields.
DateTimeOriginal - Date and time original was made,
DateTimeDigitized - Date and time it was digitized
DateTime - File Change and Date Time

And The GPS Timestamp and GPS DateStamp indicates the time as UTC (Coordinated Universal Time). TimeStamp is expressed as three RATIONAL values giving the hour, minute, and second....which means we could try to use that to derive a correct timestamp

DateTimes are expressed in format "1997:09:01 12:00:00"

#7 @dshanske
5 years ago

According to wp_read_image_metadata, it uses DateTimeDigitized in EXIF, if the IPTC date and time fields are empty.

This ticket was mentioned in Slack in #core by noisysocks. View the logs.


5 years ago

#9 @dshanske
5 years ago

Related: #47717

#10 @dshanske
5 years ago

I've given some thought to my proposal. I'd like to propose introducing a wp_exif_datetime function, which seems consistent with the changes in the other time retrieval functions, and this can replace the usage of strtotime inside this function. DateTime actually accepts EXIF formatted dates without any of the special parsing we do now.

Last edited 5 years ago by dshanske (previous) (diff)

@dshanske
5 years ago

Proof of Concept

#11 @dshanske
5 years ago

I uploaded a very quick and dirty proof of concept for what I'm thinking here.

#12 @dshanske
5 years ago

Joy, this gets better. I just read about EXIF 2.32...we've been discussing 2.31.

http://www.cipa.jp/std/documents/download_e.html?DC-X010-2020

If I'm reading this correctly, in 2.32, the DateTimeOriginal/DateTimeDigitized fields will be stored in ISO8601, as opposed to the EXIF format...so we may need to check the Exif Version, which deprecates all of the offset stuff I was talking about, and will break our existing code and my proposed amendment.

This ticket was mentioned in Slack in #core-datetime by dshanske. View the logs.


4 years ago

This ticket was mentioned in PR #495 on WordPress/wordpress-develop by dshanske.


4 years ago
#14

  • Keywords has-patch added

This patch removes uses of strtotime from image metadata functions, replacing them with the DateTime class. It introduces a new wp_exif_datetime function that can be used for this. wp_exif_date2ts if now a wrapper around it that generates a timestamp for backcompat only. It replaces the use of the old function in the read metadata functionality and to ensure timezone is respected, stores as a new property the full datetime with timezone.

Trac ticket: https://core.trac.wordpress.org/ticket/49413

#15 @dshanske
4 years ago

  • Milestone changed from Awaiting Review to 5.6

Spent some time with this and simplified it. The other items I put in my proof of concept could be addressed in a future ticket and addressed as a media issue.

This new version removes all usages of strtotime and replaces them with the DateTime class. It also stores the generated time as an RFC3339 timestamp with the key 'created' so the timezone data is preserved. The DateTime object uses the offset info introduced in Exif 2.31.

Because the new function uses DateTime, when 2.32, which now incorporates the timezone in the DateDigitized field starts to be adopted, this will require no changes to just work.

It maintains the created_timestamp key as the timestamp for backward compatibility...however, when there is an offset, the timestamp will be adjusted appropriately. When there isn't an offset, it will use the site's timezone, which means it will be at least the correct timezone much of the time, if not all.

While the focus was on the EXIF data, there was a need to make sure the IPTC data was consistent by adding the same 'created' key when it is used.

#16 follow-up: @Rarst
4 years ago

  • Owner set to Rarst
  • Status changed from new to reviewing

Thank you for keeping this going! :) From quick look at PR it looks generally sound, but I definitely need to find time to thoroughly read up on context and review it then.

Accommodating for different versions of input has me suspicious how many problems/edge cases will float up. Also unit tests (I can take care of that).

#17 in reply to: ↑ 16 @dshanske
4 years ago

Replying to Rarst:

Thank you for keeping this going! :) From quick look at PR it looks generally sound, but I definitely need to find time to thoroughly read up on context and review it then.

Accommodating for different versions of input has me suspicious how many problems/edge cases will float up. Also unit tests (I can take care of that).

That's why I pulled back from other ideas...I want this to just an iteration of what is there now...Even if you get rid of the offset check in the exif function...you still will make the timestamp more accurate by assuming it is the site timezone.

#18 @hellofromTonya
4 years ago

  • Keywords needs-refresh needs-unit-tests needs-dev-note added

Adding needs-refresh as PR has PHPCS issues. The tests are failing (though may just need a merge from master to sync with latest).

Adding needs-dev-note for the new wp_exif_datetime() function.

This ticket was mentioned in Slack in #core by hellofromtonya. View the logs.


4 years ago

#20 @davidbaumwald
4 years ago

  • Summary changed from wp_exif_date2ts should use Dateatime and accept an optional offset to wp_exif_date2ts should use Datetime and accept an optional offset

#21 @hellofromTonya
4 years ago

  • Milestone changed from 5.6 to Future Release

Too deep into 5.6 beta and no movement. Punting this one to Future Release.

If any maintainer or committer feels this can be resolved in time, or wishes to assume ownership during a specific cycle, feel free to update the milestone accordingly.

#22 @dshanske
3 years ago

@Rarst What do I need to do to finally move this one along? We're storing incorrect and therefore questionably useful time information. I could try to figure out how to add sufficient testing for it, though to be honest, unit testing was never my strength.

#23 @Rarst
3 years ago

I have it noted down and wanted to do some moping up of date/time tickets in my winter downtime, but right now sick with covid instead of fun stuff. :( I'll try to review if I can knock it out once I recover.

#24 @dshanske
3 years ago

@Rarst Again, tell me if I can help. And feel better soon.

Note: See TracTickets for help on using tickets.