WordPress.org

Make WordPress Core

Opened 7 years ago

Last modified 30 hours ago

#5305 assigned defect (bug)

permalinks broken when article name is numeric

Reported by: thomask Owned by: valendesigns
Milestone: 4.2 Priority: normal
Severity: major Version: 2.3.1
Component: Permalinks Keywords: has-patch
Focuses: Cc:

Description

if you create numeric-only post name, the generated slug is this number - this conflicts with article ID, so it returns different article or 404 page, never the article. It can be then solved by generating manual slug with some char in it, but i think it would be better to include some char in that case, e.g. underscore, like _123

Also if someone will try to solve this, it would be nice to solve other problem - if post slug is begining with the slug of the category, than the category page returns that post, not the category

Attachments (16)

wp-includes_post.php.diff (693 bytes) - added by loushou 6 months ago.
wp-includes/post.php patch for handling numeric slugs
wp-includes_post.php.2.diff (704 bytes) - added by loushou 6 months ago.
updated patch, which includes translation on 'number'
5305-unittests.diff (1.1 KB) - added by MikeHansenMe 4 months ago.
see #30284
5305.diff (1.3 KB) - added by valendesigns 6 weeks ago.
5305.1.diff (1.6 KB) - added by valendesigns 6 weeks ago.
5305.2.diff (1.6 KB) - added by valendesigns 3 weeks ago.
5305.3.diff (1.6 KB) - added by valendesigns 3 weeks ago.
5305.4.diff (2.5 KB) - added by valendesigns 3 weeks ago.
Added a unit test.
5305.5.diff (10.5 KB) - added by valendesigns 3 weeks ago.
5305.6.diff (10.5 KB) - added by valendesigns 3 weeks ago.
5305.7.diff (7.9 KB) - added by boonebgorges 12 days ago.
5305.8.diff (10.3 KB) - added by valendesigns 12 days ago.
5305.9.diff (12.0 KB) - added by boonebgorges 11 days ago.
5305.10.diff (13.2 KB) - added by boonebgorges 7 days ago.
5305.11.diff (18.6 KB) - added by valendesigns 7 days ago.
5305.12.diff (24.5 KB) - added by boonebgorges 7 days ago.

Download all attachments as: .zip

Change History (92)

comment:1 @lloydbudd7 years ago

thomask, can you include your Permalink Structure? And provide specific examples? This could help duplicate the problem.

comment:2 @darkfate7 years ago

I don't think he's using permalinks. I think he's saying, like if his post slug is 123 and his title is 123, it breaks. It looks like 2.4 trunk doesn't break like that though. Unless it is a specific permalink structure.

comment:3 @dille7 years ago

My structure is /archief/%postname%/, and I see the same problem. I've simply re-slugged the offending posts to the number in words, so 30 would become "thirty" (or in my case, "dertig", as I'm Dutch).

I didn't have this problem when my structure was /archief/year/month/day/postname/ though.

comment:4 @JDTrower7 years ago

Actually, I am able to verify that this does happen. If you have a custom permalink, where you have something like /archives/%postname%/ or /archives/post/%postname%/ and you publish a post with a numeric title such as 2008, when you click to view the post, you get a 404 error. In all my testing, this only occurs with this type of a permalink where you have a static word followed by a forward slash and then the dynamic postname.

If you try /archive/%author%/%postname/ or put any of the other dynamic options between the static word and the dynamic postname, the link will work fine.

I verified this in 2.3.1 and in 2.4 trunk.

comment:5 @JDTrower7 years ago

Also, if you have a custom permalink that is just /%postname%/ you get a 404 error on a post with a numeric title.

As for the other problem that was mentioned concerning if a post slug is beginning with the slug of the category, than the category page returns that post, not the category. I have been unable to recreate it.

comment:6 @JDTrower7 years ago

Just a note, Ryan in comment 25 on ticket #3614 (http://trac.wordpress.org/ticket/3614#comment:25) explains why the 404 errors occur with permalinks like those that I tried (I don't personally use them myself) or that others use. I have yet to test the fix that they have come up with on that ticket to see if it solves the problem here(although, it appears as though you would still need to modify your permalinks structure to not be like those above). When I get a chance I'll test this ticket based on ticket #3614.

comment:7 @lloydbudd7 years ago

  • Milestone changed from 2.3.2 to 2.5

comment:8 follow-up: @chrismou7 years ago

I've just a had a similar ticket closed as a duplicate, although I'm not 100% sure from the comments here that people have grasped what the problem is.

If you need examples, check the info on http://trac.wordpress.org/ticket/6997

To summarise, I create a page (lets call it "a"). The slug is also "a". I then create another page, give it a slug of "2008" and set "a" to be its parent. When I try to navigate to /a/2008/, it simply shows me the content of "a", rather than "2008". If I add anything else onto the end (ie a/2008/a/) it redirects me to a post written in 2008.

I just seems to be the case that as soon as it sees 2008, it tries to guess what I want, rather than just checking to see if there is actually a page called 2008.

comment:9 in reply to: ↑ 8 @DD327 years ago

By the look of it, Its reconising the 2008 as a page value(ie. The 2008th page):

object(WP_Query)[3]
  public 'query_vars' => &
    array
      'page' => string '/2008' (length=5)
      'pagename' => string 'a' (length=1)

(Which in itself seems to be a bug that /2008 is reconised as the page number, it shouldnt have the slash in there)

The rule which is being matched is:

[(.+?)(/[0-9]+)?/?$] => index.php?pagename=$matches[1]&page=$matches[2]

I can see 2 ways to resolve it:

  1. Revert to the old method of actually having *every* page slug being checked for, which is slow, messy, and really isnt going to happen
  2. Limit pages to only have a certain ammount of pages (Why remove functionality?)
  3. Prefix extra pages with /page/

More on that last one: Most common permalink structures in WordPress allready use that method (.*/page/[0-9]+).

Changing the current permalink settings will break links on pages at present (ie. /a/2/ for page 2) in search results, But would allow for numeric page slugs on subpages.

comment:10 @DD327 years ago

Also, For consistancy, Is there a reason thats using &page= instead of &paged= ?

comment:11 @SupersonicSquirrel7 years ago

I also had a ticket about this closed and I had no idea it existed already, sorry; as I had no such problems in any version prior to 2.5.1. I'm confused now. How come that the issue occurs in different versions of WP for each person?

comment:13 @Viper007Bond7 years ago

For anyone looking to do this meanwhile and is willing to give up the paging ability, here's a plugin I wrote:

<?php
/*
Plugin Name:  Allow Numeric Stubs
Description:  Allows children Pages to have a stub that is only a number. Sacrifices the <code>&lt;!--nextpage--&gt;</code> ability in Pages.
Version:      2008.07.23
Author:       Viper007Bond
Author URI:   http://www.viper007bond.com/
*/


// Register plugin hooks
register_activation_hook( __FILE__, 'allow_numeric_stubs_activate' );
add_filter( 'page_rewrite_rules', 'allow_numeric_stubs' );


// Force a flush of the rewrite rules when this plugin is activated
function allow_numeric_stubs_activate() {
	global $wp_rewrite;
	$wp_rewrite->flush_rules();
}


// Remove the offending rule and replace it with something else
function allow_numeric_stubs( $rules ) {
	unset( $rules['(.+?)(/[0-9]+)?/?$'] );

	$rules['(.+?)?/?$'] = 'index.php?pagename=$matches[1]';

	return $rules;
}

?>

comment:14 @bitethemailman7 years ago

This never happened to me in 2.3.2, but happens in 2.5. I'm having the same problem described by chrismou in comment 8.
Duplicated in #7444.

comment:16 @mrmist6 years ago

Also ref #7355

comment:17 @mrmist6 years ago

  • Keywords needs-patch added

comment:18 @Denis-de-Bernardy6 years ago

  • Keywords permalinks number removed

comment:19 @Denis-de-Bernardy6 years ago

  • Component changed from General to Permalinks
  • Owner changed from anonymous to ryan

comment:20 @janeforshort5 years ago

  • Milestone changed from 2.9 to Future Release

Punting due to schedule.

comment:21 @Viper007Bond4 years ago

See also this dupe ticket with some good discussion on it: #14238

comment:22 follow-up: @SergeyBiryukov4 years ago

Just stumbled upon this. Steps to reproduce:

  1. Create a post with slug 1.
  2. Set permalink structure to /%year%/%monthnum%/%postname%/.
  3. Then we have request: 2008/12/1
  4. Matched rule: year=2008&monthnum=12&day=1

Other permalink structures (Day and name, Numeric, even /%postname%/ mentioned above) work fine.

comment:23 in reply to: ↑ 22 ; follow-up: @Viper007Bond4 years ago

Replying to SergeyBiryukov:

  1. Create a post with slug 1.

I don't believe that's possible. WordPress will currently change that to 1-2 as far as I know.

comment:24 in reply to: ↑ 23 @SergeyBiryukov4 years ago

Replying to Viper007Bond:

I don't believe that's possible. WordPress will currently change that to 1-2 as far as I know.

That's for pages. It's still possible to create a post with a numeric-only slug.

comment:26 @lkraav3 years ago

  • Cc lkraav added

comment:27 @SergeyBiryukov3 years ago

Closed #20617 as a duplicate.

comment:28 @wonderboymusic2 years ago

This is the date rewrite rules being generated regardless of the post_permastruct. Because the post permastruct Sergey listed doesn't match any form of endian, this date permastruct wins:

if ( empty($date_endian) )
	$date_endian = '%year%/%monthnum%/%day%';

Because of that, day archives are generated which wipe out post permastruct when the slug matches a 1-2 digit integer. Here's an abbreviated list of the rules that are generated:

array (
  '([0-9]{4})/([0-9]{1,2})/([0-9]{1,2})/page/?([0-9]{1,})/?$' => 'index.php?year=$matches[1]&monthnum=$matches[2]&day=$matches[3]&paged=$matches[4]',
  '([0-9]{4})/([0-9]{1,2})/([0-9]{1,2})/?$' => 'index.php?year=$matches[1]&monthnum=$matches[2]&day=$matches[3]',
  '([0-9]{4})/([0-9]{1,2})/page/?([0-9]{1,})/?$' => 'index.php?year=$matches[1]&monthnum=$matches[2]&paged=$matches[3]',
  '([0-9]{4})/([0-9]{1,2})/?$' => 'index.php?year=$matches[1]&monthnum=$matches[2]',
  '([0-9]{4})/page/?([0-9]{1,})/?$' => 'index.php?year=$matches[1]&paged=$matches[2]',
  '([0-9]{4})/?$' => 'index.php?year=$matches[1]',
  '([0-9]{4})/([0-9]{1,2})/([^/]+)(/[0-9]+)?/?$' => 'index.php?year=$matches[1]&monthnum=$matches[2]&name=$matches[3]&page=$matches[4]',
)

Similar issue: #13701

comment:29 @SergeyBiryukov2 years ago

#22592 was marked as a duplicate.

comment:30 @toscho2 years ago

  • Cc info@… added

comment:31 @DrewAPicture22 months ago

#24294 was marked as a duplicate.

comment:32 @c3mdigital18 months ago

#20185 was marked as a duplicate.

comment:33 @SergeyBiryukov15 months ago

#26641 was marked as a duplicate.

comment:34 @SergeyBiryukov13 months ago

#26974 was marked as a duplicate.

comment:35 @ircbot13 months ago

This ticket was mentioned in IRC in #wordpress-dev by SergeyBiryukov. View the logs.

comment:36 @Denis-de-Bernardy13 months ago

Posting the entire #26974 ticket's description for reference:

To reproduce, create a new blog, select that permalink structure, and publish an untitled post. View it. Expected: the post shows up. Actual: the day shows up empty — unless, of course, the post's day matches the post's id.
Minor, but annoying. Temporary fix to disable day links entirely:

add_filter('date_rewrite_rules', function($date_rewrite_rules) {
	if (get_option('permalink_structure') == '/%year%/%monthnum%/%postname%/') {
		$date_rewrite_rules = array_filter($date_rewrite_rules, function($rule) {
			return strpos($rule, '&day=') === false;
		});
	}
	return $date_rewrite_rules;
});

Methinks disabling day links should be the default behavior in WP when that structure is chosen.

comment:37 @SergeyBiryukov13 months ago

  • Milestone changed from Future Release to 3.9

comment:38 @SergeyBiryukov12 months ago

  • Milestone changed from 3.9 to Future Release

Still needs a patch.

comment:39 @ryan8 months ago

  • Owner ryan deleted
  • Status changed from new to assigned

comment:40 @loushou6 months ago

  • Keywords has-patch added; needs-patch removed

Seems to me that the only real solution to this problem is going to be to prefix those numeric slugs with some alpha-dash-text of some sort; otherwise, it will always get confused with any numeric value expected in the permalink, and thus interpreted incorrectly. Prefixing it with something as nondescript as '_' seems like the wrong solution, if for no other reason than _123 just looks weird in a URL.

My solution simply prefixes a numeric slug with 'number-' ( http://example.com/number-123/ ). It is longer, and one could argue that it is equally nondescript, but at least you know it is a number and it does not get confused with expected numeric values in the permalink. This is a blanket patch to the slug generation function, so it applies to all post_types. As requested, this prefix allows any post type to work in any of the standard permalink structures. As a party favor to all those devs out there who love their filters (like me), I also have a filter on 'number-' that only gets called when it is needed. As a fallback to all of this, if 'number-123' is not a nice permalink for a specific person, we obviously still have the ability to manually override the permalink, just not with anything numeric.

The original question also included a secondary issue, where apparently post_name values that were prefixes to a category slug used to force the category page to load instead of the post page. Based on the current permalink class, this sounds like it is already solved and not even possible any more. Even the old category slug - parent page slug collision problem seems solved. This is because categories and tags are all prefixed with a slug now. Matching clashes with that specified slug seems unnecessary, so I assume we are just dropping that second request.

Thus I will attach the patch.

@loushou6 months ago

wp-includes/post.php patch for handling numeric slugs

comment:41 follow-up: @toscho6 months ago

number must be translatable, and then you end up with something like číslo. That would probably raise new issues. I don’t think this is a robust solution.

@loushou6 months ago

updated patch, which includes translation on 'number'

comment:42 in reply to: ↑ 41 @loushou6 months ago

Replying to toscho:

number must be translatable, and then you end up with something like číslo. That would probably raise new issues. I don’t think this is a robust solution.

You make a good point that number needs to be translatable. Thus, I have updated the patch to include that in the solution. The second part of your objection is moot however. The slug sanitization process already handles sanitization of translations. If that is really an objection, then the sanitization functions need redesign, though I think you will find they don't. As for the 'robustness' of the solution, it is meant to be simple. If there is something I missed as far as the solution is concerned in the ways of 'robustness', feel free to suggest them, and I will try to work them in. Thanks again for the comment. Anything to make it better.

comment:43 @ircbot5 months ago

This ticket was mentioned in IRC in #wordpress-dev by SergeyBiryukov. View the logs.

comment:44 @slackbot2 months ago

This ticket was mentioned in Slack in #forums by radices. View the logs.

comment:45 @slackbot2 months ago

This ticket was mentioned in Slack in #forums by radices. View the logs.

comment:46 @ericlewis7 weeks ago

  • Keywords needs-patch good-first-bug added; has-patch removed
  • Milestone changed from Future Release to 4.2

Let's just prepend or append the number with an abstract character (maybe -?) and avoid translations.

Bringing it into 4.2 for consideration - needs a patch soon though.

@valendesigns6 weeks ago

@valendesigns6 weeks ago

comment:47 @valendesigns6 weeks ago

  • Keywords has-patch added; needs-patch removed

Replying to ericlewis:

Let's just prepend or append the number with an abstract character (maybe -?) and avoid translations.

Bringing it into 4.2 for consideration - needs a patch soon though.

The problem with prepending a hyphen alone is that it's technically still numeric and will run a second time when saving which then prepends a double hyphen --.

I've updated the patch to prepend the value of $post_type together with a hyphen to avoid translations. So you get something like post-123 or page-123. As well, I've added proper documentation to the new filter.

The one caveat is we are going to be changing slugs without any notice to the end user. So I created a second patch to allow this feature to be filtered off. That way if you want your slugs unmolested because you use a compatible permalink structure you're not completely without options.

comment:48 follow-up: @Denis-de-Bernardy6 weeks ago

I don't like this last patch, as it introduces potential localization problems.

Prepending or appending a hyphen might technically be numeric, but it doesn't get caught by the rewrite regex, and that's the only thing that counts. Imho, it's a better option that trying to introduce some arbitrary string that screams "translate me".

Along similar lines, we could prepend and/or append a colon : to the post's slug instead of a hyphen, in order to get urls like /2015/01/:23/ or /2015/01/:23:/. (This is assuming, of course, that sanitize_title wouldn't balk at the input. Whereas it actually might...)

Anyway, my concern is these potential solutions don't really fix the problem. They merely sweep things under the rug in the hopes of not allowing the bug in the future while not addressing the multitudes of sites that exhibit the issue with various duct tape fixes already. That and the other solutions discussed here merely stick to sweeping it under the rug so as to try to avoid it from occurring.

If the latest patch or a variation thereof gets applied, I'm left wondering what to do with the existing posts on my site that led me to report a bug, and I'd gather other reporters in the thread are left asking for more as well.

Be it #22592, #26974, #22592 or #24294, #20185, #26641, the issue seems to invariably break down to one of:

  • /yyyy/mm/slug/ conflicting with /yyyy/mm/dd/
  • /yyyy/slug/ conflicting with /yyyy/mm/
  • /slug/ conflicting with /yyyy/

On my own site, I disabled day links. It's not a one-size fits all solution, since for all I know I'd like these day links to get recognized in the event a user tries an arbitrary day link, but it was the right thing to do.

It seems to me that the better solution would be something like this:

  1. On permalink save (and on the next WP upgrade), test for permalink conflict potential. Suppose a two or four digit slug. Does a day rule catch it? The same for month and year rules. If yes, flag the structure as potentially leading to conflicts (as in year, month, day, or none).
  1. On the appropriate date archive type where we know conflicts may occur (year, month or day), try treating the offending parameter as if it was a post slug. This is a nearly free query -- we're looking for a single post. If a post is found, serve the expected post. If not, either fallback to serving the date archive as usual, or redirect to something appropriate (e.g. day url to month url, month url to year url, year url to blog url).

The upside of this suggestion is it "just works" from the end-user's point of view. I may be wrong, but I can't imagine a site owner caring the slightest bit about offending date archives having a slightly non-standard behavior if he opted for a permalink structure that basically invites potential conflicts.

Last edited 6 weeks ago by Denis-de-Bernardy (previous) (diff)

comment:49 in reply to: ↑ 48 ; follow-up: @SergeyBiryukov6 weeks ago

Replying to Denis-de-Bernardy:

On the appropriate date archive type where we know conflicts may occur (year, month or day), try treating the offending parameter as if it was a post ID. This is a nearly free query -- we're looking for a single post. If a post is found, serve the expected post. If not, either fallback to serving the date archive as usual, or redirect to something appropriate (e.g. day url to month url, month url to year url, year url to blog url).

Makes the most sense to me.

comment:50 @Denis-de-Bernardy6 weeks ago

Yeah. Just to clarify, I meant "as if it was a post slug", rather than as a post ID.

comment:51 follow-up: @jesin6 weeks ago

How about padding the slug with a leading zero?

I found another bug, not sure if there is a ticket for it.

  1. Set the permalink structure to %postname%.
  2. Create a post with slug foo-bar.
  3. Create a page with the same slug foo-bar.

Now the URL example.com/foo-bar/ loads the page created in step 2.

Problem lies in wp_unique_post_slug() which checks for uniqueness only within the post type.

comment:52 in reply to: ↑ 51 @SergeyBiryukov6 weeks ago

Replying to jesin:

Now the URL example.com/foo-bar/ loads the page created in step 2.

See #13459.

comment:53 @DrewAPicture4 weeks ago

  • Owner set to valendesigns

comment:54 in reply to: ↑ 49 @valendesigns3 weeks ago

  • Keywords needs-patch needs-unit-tests added; good-first-bug has-patch removed

Replying to SergeyBiryukov:

Replying to Denis-de-Bernardy:

On the appropriate date archive type where we know conflicts may occur (year, month or day), try treating the offending parameter as if it was a post ID. This is a nearly free query -- we're looking for a single post. If a post is found, serve the expected post. If not, either fallback to serving the date archive as usual, or redirect to something appropriate (e.g. day url to month url, month url to year url, year url to blog url).

Makes the most sense to me.

I'll start looking into a better long term solution, which doesn't force a prepend or append on a numeric slug.

@valendesigns3 weeks ago

comment:55 @valendesigns3 weeks ago

  • Keywords needs-unit-tests removed

Disregard the last patch it doesn't have full coverage, but is on the right path.

@valendesigns3 weeks ago

comment:56 @valendesigns3 weeks ago

  • Keywords has-patch added; needs-patch removed

OK, so patch 5305.3.diff should solve this issue. After trying a couple different things I stumbled into WP_Rewrite::get_date_permastruct where the solution was staring me in the face. We already have some code in place that was preventing %post_id% collisions. All I did was extend the current implementation to include fixes for the offending %postname% permalink structures.

@valendesigns3 weeks ago

Added a unit test.

comment:57 @valendesigns3 weeks ago

Patch 5305.4.diff adds a unit test that fails before the fix and passes once it's applied.

comment:58 @valendesigns3 weeks ago

  • Keywords has-patch removed

I stepped away for a minute and quickly realized there were backwards compatibility concerns due to redirect implications with this fix. I also ran all the phpunit tests, not just the post group, and got back failing tests. I'm going to keep at it, but I think we're at least moving in the right direction. I'll keep you guys posted. Sorry for all the chatter. I should have done more testing before uploading a patch.

Last edited 3 weeks ago by valendesigns (previous) (diff)

comment:59 @valendesigns3 weeks ago

Hooray, I'm confident I have a solution!

However, I want to test it rigorously before I upload it this time. The previous fix was OK in theory, but it didn't work in practice. This time around I created a pre_get_posts filter that solves the issue for this and all the other associated tickets without changing the permalinks, so no backwards compatibility concerns or broken tests.

I'll do a bit more testing and upload the patch in the next day or so. In the meantime I need to know how to create a unit test post that has an ID above 1k, so I can test for one of the known year permalink conflicts. Anyone have suggestions or tricks that might be useful in artificially inflating the post ID without actually creating that many posts?

@valendesigns3 weeks ago

comment:60 @valendesigns3 weeks ago

  • Keywords has-patch needs-testing added

Patch 5305.5.diff is ready for testing! I've included 6 new unit tests and I feel confident that the issue is now solved. I couldn't account for every possible permalink structure, but I feel we now have coverage on the majority of them. As well, this fixes posts without titles, which has also been a long standing issue.

I did have to introduce a new filter, as much as I didn't want to, but we needed a way to extend this functionality to custom post types without having to use the functions I created. Please let me know if there is an outlier permalink structure that you know of that wasn't mentioned in the comments that needs to be accounted for, or you think I did something in a way that doesn't make sense so I can fix it.

Thanks Guys! And thank you @DrewAPicture for making me the owner.

@valendesigns3 weeks ago

comment:61 @slackbot3 weeks ago

This ticket was mentioned in Slack in #core by valendesigns. View the logs.

comment:62 @valendesigns2 weeks ago

  • Keywords dev-feedback added

It's been a week without any response to the latest 5305.6.diff patch, so I figure it's time to ping this thread and get the conversation going again. Has anyone had time to test the patch or look over the code?

comment:63 @boonebgorges12 days ago

valendesigns - Thanks for your work on this. Broadly speaking, I think the strategy here is good: when a date query is detected and the permalink structure contains %postname%, give precedence to a post matching the postname, rather than the corresponding date archive.

I have some implementation suggestions, some large and some small. See [5305.7.diff].

First, a 'pre_get_posts' callback doesn't seem like the right place to do this. For one thing, while you've given it a priority of 1, it's still possible for plugins to intervene before it runs, which could cause various problems. It's also not clear to me why we would want the proposed to be unhookable. It seems to me that the proper place is in WP_Query::parse_query(), right after the existing date checks.

I've also simplified (and, I think, generalized) your $token_index trick. What we really care about is the location of %postname%, and whether it's preceded directly by %year% or %month% or whether it's in the 0th position. My patch tests this more directly.

I also combined the two functions into a single method. By returning from the method whenever we know there's nothing to do, we prevent the nesting that I'm assuming you were trying to avoid.

I simplified <!--nextpage--> support as well. Most of the necessary work - like the nextpage count - is done for us later on in setup_postdata(). No need to repeat it here. The only thing required is to translate the date query var into 'page'.

I've left a @todo comment on the zeroise() block. There is still a bug here. Month and day URL chunks are parsed properly with or without the leading zero (([0-9]{4})/([0-9]{1,2})/([0-9]{1,2})/page/?([0-9]{1,})/?$' - note the {1,2}). This means that 2015/2 is a potential clash when post_name=2, in addition to 2015/02 and post_name=02. Does this mean we have to do two calls to get_page_by_path()`?

@boonebgorges12 days ago

comment:64 @boonebgorges12 days ago

Forgot to mention in the above that I left out the post_type filter, and hardcoded 'post'. WP doesn't create date-based archives for CPTs out of the box. So I'm not sure that there's a natural situation where you'd have the kind of slug conflict covered in this post. I suppose it'd be possible in cases where a plugin registers its own rewrite rules for these kinds of archives. But in that case, we should probably be doing something more automatic to detect that these post types are subject to the clash in question - maybe examining the rewrite rules or something? Any thoughts are welcome.

comment:65 @valendesigns12 days ago

  • Keywords dev-feedback removed

@boonebgorges Thank you for testing my patch. Also, I really like your iteration, it's very clean and concise.

To address using pre_get_posts, my thinking was doing it that way first would make it easier to test and iterate on. I just wasn't sure of the best place to jump in and take over. You seemed to have found it though.

As for <!--nextpage-->, this is something that does not act in the same manner as the last patch. For example, say I have a /%year%/%postname%/ permalink structure and I create a post named '02' and I try and view any day archive in February, the post takes over. The way I had it setup was that it took over only if <!--nextpage--> was being used. And then, only for the pages it uses. So that means if I have 2 pages for the '02' post, then /2015/02/1/ & /2015/02/2/ would match a page, but /2015/02/3/ would pass and the date archive would be shown instead. So even though setup_postdata() is where the current page count happens, we need to do our own logic to establish the paging boundaries and avoid introducing a regression.

I used zeroise() in an attempt to normalize input so it returned in a date format with a leading zero. However, after second thought, that logic may be somewhat flawed due to the reasons you mentioned above with '2' & '02' being the same as a date, but not as a post. The real issue is that the 0 is not passed in the query var, so it doesn't match the '02' post name format. I guess the question should be why is the leading zero being stripped away from the query var? We might want to start looking in that direction to avoid a second query and/or using zeroise(). If that can't be changed, then yes we'll likely need to do a second get_page_by_path().

Something to note, is that get_page_by_path() cannot find posts without titles. It will always return null in that situation. So in order to resolve that, we need to have the second check in place or update the logic in get_page_by_path() to allow for checks against ID and not just post_name. In either case, we should find a way to address resolving posts without titles where the ID collides with a date.

I realized yesterday that the post_type filter was not needed when I tested the patch further by creating a very simple CPT. I quickly found out that it was a pointless filter and I had every intention of removing it, but got sidetracked on something else. It would seem you beat me to it. ;)

I only briefly tested your new patch, but I'll put it through its paces and lob back an update on Monday. Overall though, I think we're getting pretty close to having a final solution to this long standing issue.

Thanks again for taking the time to rework the code and give feedback. Cheers!

comment:66 @valendesigns12 days ago

Sorry, I take my comment back about get_page_by_path(). After further investigation and testing, I was very much wrong on that one. I'm trying to remember why I thought that it would return null in that situation, I know there was a reason for the extra get_post call in the resolve function, but for the life of me I can't remember what I was thinking when I wrote it. Disregard that part of my previous response. Though if I remember what edge case I was trying to account for I'll let you know.

@valendesigns12 days ago

comment:67 @valendesigns12 days ago

I've added three new tests and a couple fixes in patch 5305.8.diff. Two of those tests are to demonstrate the post_name=2 vs. post_name=02 issue, and that it's now been fixed. The third test demonstrates the pagination issue I mentioned above. Basically, the test would return true before the patch is applied and is expected to return true after the patch is applied. Without the latest changes, is_day would return false, which is not the expected behavior. I would have left this all alone until Monday, but I got pulled in when I started testing.

@boonebgorges11 days ago

comment:68 follow-up: @boonebgorges11 days ago

Thanks, valendesigns. You're right about nextpage - I hadn't been thinking of the case of invalid page numbers. On that note, I've added a few more tests, and tweaked the logic a bit more. Take the following two cases:

  1. A post called 02 with two instances of <!--nextpage-->. example.com/2015/02/05 is not a valid page, so should resolve to the day archive. (Your patch did this.)
  2. A post called 02 with *no* instances of <!--nextpage-->. example.com/2015/02/05 is not a valid page, so should resolve to the day archive. (Your patch did not do this, but 5305.9.diff does.)

I also reorganized the comments to narrate the flow a bit better. Let me know what you think.

comment:69 in reply to: ↑ 68 @valendesigns11 days ago

@boonebgorges This looks great! Nice catch on the page 1 bug, as well. I think the only thing left to do is have more people test it out. So I'll summarize the patch for those interested in testing.

  • Added a total of twelve new tests. Two of those tests should pass and ten should fail before the patch is applied. The two passing tests are:
    test_date_slug_collision_should_distinguish_too_high_pagination_from_date()
    
    test_date_slug_collision_should_be_ignored_when_pagination_var_is_present_but_post_does_not_have_multiple_pages()
    
  • Resolved the following date premalink structures for numeric slugs; this includes posts without titles.
    • /post_name/
    • /post_name/page/
    • /year/post_name/
    • /year/post_name/page/
    • /year/month/post_name/
  • Accounted for day and monthnum values that are cast to integers, which strips leading zeroes. This allows us to resolve variations in post_name, like /2015/2/ and /2015/02/.
  • Added the page query var when post pagination is present and valid.
Last edited 11 days ago by valendesigns (previous) (diff)

@boonebgorges7 days ago

comment:70 @boonebgorges7 days ago

  • Keywords needs-patch added; has-patch needs-testing removed

Coming back to review this, I think there's a problem with the current approach. By intervening in WP_Query::parse_query(), we make it impossible to create a WP_Query object whose date params clash with the permalink of a post with a numeric slug. See test_numeric_slug_permalink_conflicts_should_only_be_resolved_for_the_main_query() in 5305.10.diff. What we are trying to resolve here is improperly parsed URLs, but solving the problem in WP_Query means that we're "fixing" queries that didn't originate in URLs at all.

I see two possible ways forward:

  1. Move our proposed disambiguation logic to somewhere around the 'request' filter or 'parse_request' action. The internals will have to change a bit.
  2. Keep it in WP_Query, but only run it if is_main_query().

2 is easier, but I don't think it'll be reliable. It won't fix url_to_postid(), for instance. It looks like 1 could possibly fit into parse_request() after non-public post_types are filtered out - the disambiguation is a similar kind of sanity check.

valendesigns, do you have thoughts about this?

comment:71 @valendesigns7 days ago

@boonebgorges I originally had is_main_query() in the code about 5 patches ago, but I removed it before uploading and after finding that is_date worked reasonably well to keep our code out of other queries. Though, as you've pointed out we are now blocking a valid date query altogether. So yes, I agree we need to move it or rethink the implementation. I'm working on some ideas to see if it's possible to keep it where it is, but I'm not all that optimistic.

@valendesigns7 days ago

comment:72 @valendesigns7 days ago

  • Keywords has-patch added; needs-patch removed

@boonebgorges I've moved the function, and how & where it's called. It now directly modifies the query vars before WP_Query is called. As well, I've added additional tests. Please take a look at 5305.11.diff and let me know what you think.

@boonebgorges7 days ago

comment:73 @boonebgorges7 days ago

Getting closer :)

After some conversation with valendesigns, I'm going to propose that we take a two-pronged strategy. See 5305.12.diff.

  1. During parse_request(), if we detect that a URL looks like a date query, look for conflicts with posts that have numeric slugs. (This is what 11.diff already did.)
  2. Simply doing (1) is a potentially incomplete solution, because it fixes the original bug - not being able to access a post '28' located at /2015/02/28/ - at the expense of creating another, lesser bug - not being able to access the date archive at /2015/02/28/. So we should also attempt to prevent the creation of posts with this kind of slug. 12.diff attempts a solution that is more modest, in two ways, than what has been suggested in previous discussion on this ticket.
    1. It only affects the auto-generated slug in get_sample_permalink(). If you want to override this by manually entering a conflicting slug (or by creating posts programatically), you may.
    2. It only enforces uniqueness (using the -2 etc formula used in wp_unique_post_slug()) if it's detected that the suggested post would actually create a date conflict. We look at the current permalink structure, and verify that the autogenerated post_name would be a valid value for year/monthnum/day, as appropriate. So the suffixing will only happen in very rare cases, instead of every time the post_name is numeric.

Feedback welcome. I'd especially like to know what people think about the get_sample_permalink() strategy - is it too lax? Should we be enforcing this in wp_unique_post_slug()?

comment:74 @valendesigns2 days ago

@boonebgorges Sorry for the delay, I was out of town the last couple of days. This looks great! Though, I haven't tested it yet. I will tonight and get back to you. As far as the get_sample_permalink() strategy, my first thoughts are that it's good to be somewhere in the middle when enforcing unique slugs. Especially since we have a solid two prong approach that covers both sides of the fence without being overly aggressive.

comment:75 @slackbot38 hours ago

This ticket was mentioned in Slack in #core by boone. View the logs.

comment:76 @slackbot30 hours ago

This ticket was mentioned in Slack in #core by drew. View the logs.

Note: See TracTickets for help on using tickets.