Make WordPress Core

Opened 13 years ago

Closed 6 years ago

Last modified 6 years ago

#15397 closed enhancement (fixed)

redirect_guess_404_permalink() doesn't guess posts with updated dates

Reported by: archon810's profile archon810 Owned by: pento's profile pento
Milestone: 4.9.3 Priority: normal
Severity: normal Version:
Component: Canonical Keywords: has-patch has-unit-tests fixed-major
Focuses: Cc:

Description

Problem

Here's my post path scheme: http://site.com/YEAR/MONTH/DAY/SLUG. Whenever I have writers working on a post for a while and saving drafts (we're all using Windows Live Writer), they oftentimes publish to the date when the last draft was saved, i.e. several days in the past. Then, they quickly correct the date but the previously tweeted/shared link is now 404 due to the changed date.

I've looked into the source of redirect_guess_404_permalink(), and it purposedly narrows down the query when it sees a post date to that date only. If I understand correctly, this is done to minimize accidental redirects to the wrong post, but has the side effect of not guessing the new link if only the date was changed.

A workaround of removing these lines:

  if ( get_query_var('year') )
    $where .= $wpdb->prepare(" AND YEAR(post_date) = %d", get_query_var('year'));
  if ( get_query_var('monthnum') )
    $where .= $wpdb->prepare(" AND MONTH(post_date) = %d", get_query_var('monthnum'));
  if ( get_query_var('day') )
    $where .= $wpdb->prepare(" AND DAYOFMONTH(post_date) = %d", get_query_var('day'));

fixes the problem for me.

Can this case be solved in the trunk and the code above removed, or logic improved? My .htaccess file is filled with 301 redirects to correct wrong dates.

Thank you.

Attachments (3)

15397.diff (2.5 KB) - added by solarissmoke 13 years ago.
15397-20171212.diff (13.9 KB) - added by nickmomrik 6 years ago.
15397-tests.patch (9.0 KB) - added by Frank Klein 6 years ago.

Download all attachments as: .zip

Change History (36)

#1 @archon810
13 years ago

  • Cc admin@… added

#2 @nacin
13 years ago

  • Keywords 2nd-opinion added
  • Milestone changed from Awaiting Review to Future Release
  • Priority changed from normal to lowest
  • Type changed from defect (bug) to enhancement

Seems to me we'll have to store date changes as well. Otherwise we'll be making far too liberal redirects, which this is designed to prevent.

#3 @archon810
13 years ago

Sounds like an acceptable workaround, similar to how _wp_old_slug works for slugs. Alternatively, maybe add a hook into this function, or provide an option we can set in order to disable date checking.

Right now, my core is edited (and it's worked really well), but I hate to see my core files with modifications.

Thanks.

#4 @nacin
13 years ago

I would love to see this function instead hooked in. I was asked a few weeks ago if there was a way to disable exactly this (but keep the rest of canonical), and it required a goofy hack.

#5 follow-up: @dd32
13 years ago

Removing the specific months/days from the query has been mentioned before, and IMO it'd be nice if it could catch cases like that as well.

But it gets to a point where it may make the redirection more liberate than preferred, As soon as I saw this ticket, I thought that perhaps, we should be returning a Search Results page listing the possible locations which may have been intended..

Another option, is to attempt redirection with the date, and progressively remove refinements until a single post is returned. (ie. 2010/05/04/postname doesnt return, does 2010 + 05 + 04 + postname return? ok, how about 2010 + 05 + postname, 2010+postname? etc)

#6 in reply to: ↑ 5 @archon810
13 years ago

Replying to dd32:

Removing the specific months/days from the query has been mentioned before, and IMO it'd be nice if it could catch cases like that as well.

But it gets to a point where it may make the redirection more liberate than preferred, As soon as I saw this ticket, I thought that perhaps, we should be returning a Search Results page listing the possible locations which may have been intended..

Another option, is to attempt redirection with the date, and progressively remove refinements until a single post is returned. (ie. 2010/05/04/postname doesnt return, does 2010 + 05 + 04 + postname return? ok, how about 2010 + 05 + postname, 2010+postname? etc)

A search page for when the # of results is > 1 sounds acceptable, and so does the latter solution, although my concern is that it might result in more queries (which, in this case, is actually not bad - it would only result in 1 more query if there are no results by the time you get to the date filter).

#7 @solarissmoke
13 years ago

  • Keywords has-patch dev-feedback added; 2nd-opinion removed

Attaching is a patch that tries to make the function more lenient without being too liberal:

If only one match for post name is found, it returns that without querying dates. Otherwise it tries to find the best match out of several by factoring in dates, incrementally by year, month, and day. There has to be a clear winner on the date match otherwise it returns false.

@solarissmoke
13 years ago

#8 follow-up: @markjaquith
13 years ago

Good start. What about limiting the number of days that it can drift? I mean, what happens if you delete a post from 2004 with a slug of "my-post" and then subsequently you create a post in 2011 with a slug of "my-post" — you wouldn't want the 2004 URL to now redirect to the 2011 URL. When people change dates on posts, I'd wager that 90% of them change it plus or minus one day. We could restrict the changes to plus or minus 5 days (or so) and probably cover the majority of legitimate changes, without risking false redirection scenarios.

#9 in reply to: ↑ 8 @solarissmoke
13 years ago

Replying to markjaquith:

Good start. What about limiting the number of days that it can drift?

It would only be possible when the permalink structure is /%year%/%monthnum%/%date%/%postname%/, which is only one of many possible structures. So if the structure is /%year%/%postname%/ then we can only match the year..

#10 @archon810
13 years ago

@mark Unless you make the time period configurable, your solution is highly subjective. Writers can be working on a piece for days, weeks, or even months and then the final URL published by accident. Similarly, a post may be updated and its date changed so that it jumps back onto the front page - thus the date difference could be anything.

#11 @danielbachhuber
12 years ago

  • Cc wordpress@… added

#12 @danielbachhuber
10 years ago

  • Keywords needs-patch needs-unit-tests added; has-patch dev-feedback removed

I like the idea of storing some small number of date changes in post meta, and only performing the redirect if the post has an old matching date. It seems like the least magic option.

We can offer some level of customization by allowing the number of dates store to be configurable (__return_zero would disable completely)

#13 @chriscct7
8 years ago

  • Priority changed from lowest to normal

#14 @johnbillion
8 years ago

  • Summary changed from redirect_guess_404_permalink() purposedly doesn't guess posts with updated dates to redirect_guess_404_permalink() doesn't guess posts with updated dates

#15 @nickmomrik
6 years ago

Big thanks to @pento for guiding me with the patch.

#16 @pento
6 years ago

  • Keywords has-patch has-unit-tests added; needs-patch needs-unit-tests removed
  • Milestone changed from Future Release to 4.9.2
  • Owner set to pento
  • Status changed from new to accepted

#17 @pento
6 years ago

  • Resolution set to fixed
  • Status changed from accepted to closed

In 42401:

Canonical URLs: Redirect to the correct URL when the post date changes.

When a post slug is changed, we store a copy of the old slug, so that we can redirect visitors visiting the old URL to the new URL.

In the same way, this stores a copy of the old date, when the post date changes, so we can redirect visitors to the new URL.

Props nickmomrik.
Fixes #15397 for trunk.

#18 @pento
6 years ago

  • Keywords fixed-major added
  • Resolution fixed deleted
  • Status changed from closed to reopened

#19 @matt
6 years ago

Great fix for 4.9.x.

#20 @archon810
6 years ago

Sorry, why was this reopened?

#21 @pento
6 years ago

@archon810: It's currently fixed in trunk (which will be WordPress 5.0). I re-opened it for backporting to 4.9.2, as well.

#22 @Frank Klein
6 years ago

The tests could be improved:

  • Use shared fixtures, which makes the tests faster.
  • Use assertSame(), which checks types (not necessary here, but a good best practice nonetheless).
  • Avoid using legacy method arguments when creating the attachment fixture (existing tests are often reused when writing others, so let's do it right the first time).
  • Do not remove the added filter on tearDown(), as this is handled by the WP_UnitTestCase class (hooks are saved in setUp() and restored in tearDown()).

#23 @dd32
6 years ago

  • Milestone changed from 4.9.2 to 4.9.3

Bumping to 4.9.3 due to 4.9.2s release

This ticket was mentioned in Slack in #core by desrosj. View the logs.


6 years ago

This ticket was mentioned in Slack in #core by desrosj. View the logs.


6 years ago

#26 @pento
6 years ago

In 42587:

Tests: Improve the old date redirect tests.

Props frank-klein.
See #15397.

#27 @pento
6 years ago

In 42588:

Docs: Correct some version numbers.

wp_check_for_changed_dates(), _find_post_by_old_slug(), _find_post_by_old_date(), and the old_slug_redirect_post_id` filter were intended to land in 4.9.2, but didn't quite make it. Instead, they'll be landing in 4.9.3.

See #15397.

#28 @pento
6 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed

In 42589:

Canonical URLs: Redirect to the correct URL when the post date changes.

When a post slug is changed, we store a copy of the old slug, so that we can redirect visitors visiting the old URL to the new URL.

In the same way, this stores a copy of the old date, when the post date changes, so we can redirect visitors to the new URL.

Merge of [42401,42587,42588] to the 4.9 branch.

Props nickmomrik, frank-klein.
Fixes #15397.

This ticket was mentioned in Slack in #core by sergey. View the logs.


6 years ago

#30 @archon810
6 years ago

4.9.3 is out, and so is the fix.

#31 follow-up: @archon810
6 years ago

Warning: Looks like the fix is not retroactive so I'll have to keep my redirect function in indefinitely.

#32 in reply to: ↑ 31 ; follow-up: @dd32
6 years ago

Replying to archon810:

Warning: Looks like the fix is not retroactive so I'll have to keep my redirect function in indefinitely.

Correct, you can also migrate it to store the data the same way core does, to avoid having to run your own variant indefinately.

If you look at the above commits, you'll find it stores the data in _wp_old_date. I'd suggest inspecting an actual install though.

#33 in reply to: ↑ 32 @archon810
6 years ago

Replying to dd32:

Replying to archon810:

Warning: Looks like the fix is not retroactive so I'll have to keep my redirect function in indefinitely.

Correct, you can also migrate it to store the data the same way core does, to avoid having to run your own variant indefinately.

If you look at the above commits, you'll find it stores the data in _wp_old_date. I'd suggest inspecting an actual install though.

The problem is our solution doesn't keep track of dates either, it just detects 404s and attempts to check if the slug is valid for a post, then finds the right date. So I can change the dates all day long, and they'll all still redirect correctly.

Note: See TracTickets for help on using tickets.