WordPress.org

Make WordPress Core

Opened 8 years ago

Last modified 39 hours ago

#5305 assigned defect (bug)

permalinks broken when article name is numeric

Reported by: thomask Owned by: valendesigns
Milestone: 4.3 Priority: normal
Severity: major Version: 2.3.1
Component: Permalinks Keywords: has-patch
Focuses: Cc:

Description

if you create numeric-only post name, the generated slug is this number - this conflicts with article ID, so it returns different article or 404 page, never the article. It can be then solved by generating manual slug with some char in it, but i think it would be better to include some char in that case, e.g. underscore, like _123

Also if someone will try to solve this, it would be nice to solve other problem - if post slug is begining with the slug of the category, than the category page returns that post, not the category

Attachments (21)

wp-includes_post.php.diff (693 bytes) - added by loushou 8 months ago.
wp-includes/post.php patch for handling numeric slugs
wp-includes_post.php.2.diff (704 bytes) - added by loushou 8 months ago.
updated patch, which includes translation on 'number'
5305-unittests.diff (1.1 KB) - added by MikeHansenMe 7 months ago.
see #30284
5305.diff (1.3 KB) - added by valendesigns 4 months ago.
5305.1.diff (1.6 KB) - added by valendesigns 4 months ago.
5305.2.diff (1.6 KB) - added by valendesigns 4 months ago.
5305.3.diff (1.6 KB) - added by valendesigns 4 months ago.
5305.4.diff (2.5 KB) - added by valendesigns 4 months ago.
Added a unit test.
5305.5.diff (10.5 KB) - added by valendesigns 4 months ago.
5305.6.diff (10.5 KB) - added by valendesigns 4 months ago.
5305.7.diff (7.9 KB) - added by boonebgorges 3 months ago.
5305.8.diff (10.3 KB) - added by valendesigns 3 months ago.
5305.9.diff (12.0 KB) - added by boonebgorges 3 months ago.
5305.10.diff (13.2 KB) - added by boonebgorges 3 months ago.
5305.11.diff (18.6 KB) - added by valendesigns 3 months ago.
5305.12.diff (24.5 KB) - added by boonebgorges 3 months ago.
5305.13.diff (24.5 KB) - added by valendesigns 2 months ago.
refresh
5305.14.diff (24.5 KB) - added by valendesigns 5 weeks ago.
refresh
5305.15.diff (24.9 KB) - added by boonebgorges 2 weeks ago.
5305.16.diff (11.8 KB) - added by boonebgorges 2 days ago.
5305.17.diff (33.1 KB) - added by boonebgorges 39 hours ago.

Download all attachments as: .zip

Change History (109)

comment:1 @lloydbudd8 years ago

thomask, can you include your Permalink Structure? And provide specific examples? This could help duplicate the problem.

comment:2 @darkfate8 years ago

I don't think he's using permalinks. I think he's saying, like if his post slug is 123 and his title is 123, it breaks. It looks like 2.4 trunk doesn't break like that though. Unless it is a specific permalink structure.

comment:3 @dille8 years ago

My structure is /archief/%postname%/, and I see the same problem. I've simply re-slugged the offending posts to the number in words, so 30 would become "thirty" (or in my case, "dertig", as I'm Dutch).

I didn't have this problem when my structure was /archief/year/month/day/postname/ though.

comment:4 @JDTrower7 years ago

Actually, I am able to verify that this does happen. If you have a custom permalink, where you have something like /archives/%postname%/ or /archives/post/%postname%/ and you publish a post with a numeric title such as 2008, when you click to view the post, you get a 404 error. In all my testing, this only occurs with this type of a permalink where you have a static word followed by a forward slash and then the dynamic postname.

If you try /archive/%author%/%postname/ or put any of the other dynamic options between the static word and the dynamic postname, the link will work fine.

I verified this in 2.3.1 and in 2.4 trunk.

comment:5 @JDTrower7 years ago

Also, if you have a custom permalink that is just /%postname%/ you get a 404 error on a post with a numeric title.

As for the other problem that was mentioned concerning if a post slug is beginning with the slug of the category, than the category page returns that post, not the category. I have been unable to recreate it.

comment:6 @JDTrower7 years ago

Just a note, Ryan in comment 25 on ticket #3614 (http://trac.wordpress.org/ticket/3614#comment:25) explains why the 404 errors occur with permalinks like those that I tried (I don't personally use them myself) or that others use. I have yet to test the fix that they have come up with on that ticket to see if it solves the problem here(although, it appears as though you would still need to modify your permalinks structure to not be like those above). When I get a chance I'll test this ticket based on ticket #3614.

comment:7 @lloydbudd7 years ago

  • Milestone changed from 2.3.2 to 2.5

comment:8 follow-up: @chrismou7 years ago

I've just a had a similar ticket closed as a duplicate, although I'm not 100% sure from the comments here that people have grasped what the problem is.

If you need examples, check the info on http://trac.wordpress.org/ticket/6997

To summarise, I create a page (lets call it "a"). The slug is also "a". I then create another page, give it a slug of "2008" and set "a" to be its parent. When I try to navigate to /a/2008/, it simply shows me the content of "a", rather than "2008". If I add anything else onto the end (ie a/2008/a/) it redirects me to a post written in 2008.

I just seems to be the case that as soon as it sees 2008, it tries to guess what I want, rather than just checking to see if there is actually a page called 2008.

comment:9 in reply to: ↑ 8 @DD327 years ago

By the look of it, Its reconising the 2008 as a page value(ie. The 2008th page):

object(WP_Query)[3]
  public 'query_vars' => &
    array
      'page' => string '/2008' (length=5)
      'pagename' => string 'a' (length=1)

(Which in itself seems to be a bug that /2008 is reconised as the page number, it shouldnt have the slash in there)

The rule which is being matched is:

[(.+?)(/[0-9]+)?/?$] => index.php?pagename=$matches[1]&page=$matches[2]

I can see 2 ways to resolve it:

  1. Revert to the old method of actually having *every* page slug being checked for, which is slow, messy, and really isnt going to happen
  2. Limit pages to only have a certain ammount of pages (Why remove functionality?)
  3. Prefix extra pages with /page/

More on that last one: Most common permalink structures in WordPress allready use that method (.*/page/[0-9]+).

Changing the current permalink settings will break links on pages at present (ie. /a/2/ for page 2) in search results, But would allow for numeric page slugs on subpages.

comment:10 @DD327 years ago

Also, For consistancy, Is there a reason thats using &page= instead of &paged= ?

comment:11 @SupersonicSquirrel7 years ago

I also had a ticket about this closed and I had no idea it existed already, sorry; as I had no such problems in any version prior to 2.5.1. I'm confused now. How come that the issue occurs in different versions of WP for each person?

comment:13 @Viper007Bond7 years ago

For anyone looking to do this meanwhile and is willing to give up the paging ability, here's a plugin I wrote:

<?php
/*
Plugin Name:  Allow Numeric Stubs
Description:  Allows children Pages to have a stub that is only a number. Sacrifices the <code>&lt;!--nextpage--&gt;</code> ability in Pages.
Version:      2008.07.23
Author:       Viper007Bond
Author URI:   http://www.viper007bond.com/
*/


// Register plugin hooks
register_activation_hook( __FILE__, 'allow_numeric_stubs_activate' );
add_filter( 'page_rewrite_rules', 'allow_numeric_stubs' );


// Force a flush of the rewrite rules when this plugin is activated
function allow_numeric_stubs_activate() {
	global $wp_rewrite;
	$wp_rewrite->flush_rules();
}


// Remove the offending rule and replace it with something else
function allow_numeric_stubs( $rules ) {
	unset( $rules['(.+?)(/[0-9]+)?/?$'] );

	$rules['(.+?)?/?$'] = 'index.php?pagename=$matches[1]';

	return $rules;
}

?>

comment:14 @bitethemailman7 years ago

This never happened to me in 2.3.2, but happens in 2.5. I'm having the same problem described by chrismou in comment 8.
Duplicated in #7444.

comment:16 @mrmist6 years ago

Also ref #7355

comment:17 @mrmist6 years ago

  • Keywords needs-patch added

comment:18 @Denis-de-Bernardy6 years ago

  • Keywords permalinks number removed

comment:19 @Denis-de-Bernardy6 years ago

  • Component changed from General to Permalinks
  • Owner changed from anonymous to ryan

comment:20 @janeforshort6 years ago

  • Milestone changed from 2.9 to Future Release

Punting due to schedule.

comment:21 @Viper007Bond4 years ago

See also this dupe ticket with some good discussion on it: #14238

comment:22 follow-up: @SergeyBiryukov4 years ago

Just stumbled upon this. Steps to reproduce:

  1. Create a post with slug 1.
  2. Set permalink structure to /%year%/%monthnum%/%postname%/.
  3. Then we have request: 2008/12/1
  4. Matched rule: year=2008&monthnum=12&day=1

Other permalink structures (Day and name, Numeric, even /%postname%/ mentioned above) work fine.

comment:23 in reply to: ↑ 22 ; follow-up: @Viper007Bond4 years ago

Replying to SergeyBiryukov:

  1. Create a post with slug 1.

I don't believe that's possible. WordPress will currently change that to 1-2 as far as I know.

comment:24 in reply to: ↑ 23 @SergeyBiryukov4 years ago

Replying to Viper007Bond:

I don't believe that's possible. WordPress will currently change that to 1-2 as far as I know.

That's for pages. It's still possible to create a post with a numeric-only slug.

comment:26 @lkraav3 years ago

  • Cc lkraav added

comment:27 @SergeyBiryukov3 years ago

Closed #20617 as a duplicate.

comment:28 @wonderboymusic3 years ago

This is the date rewrite rules being generated regardless of the post_permastruct. Because the post permastruct Sergey listed doesn't match any form of endian, this date permastruct wins:

if ( empty($date_endian) )
	$date_endian = '%year%/%monthnum%/%day%';

Because of that, day archives are generated which wipe out post permastruct when the slug matches a 1-2 digit integer. Here's an abbreviated list of the rules that are generated:

array (
  '([0-9]{4})/([0-9]{1,2})/([0-9]{1,2})/page/?([0-9]{1,})/?$' => 'index.php?year=$matches[1]&monthnum=$matches[2]&day=$matches[3]&paged=$matches[4]',
  '([0-9]{4})/([0-9]{1,2})/([0-9]{1,2})/?$' => 'index.php?year=$matches[1]&monthnum=$matches[2]&day=$matches[3]',
  '([0-9]{4})/([0-9]{1,2})/page/?([0-9]{1,})/?$' => 'index.php?year=$matches[1]&monthnum=$matches[2]&paged=$matches[3]',
  '([0-9]{4})/([0-9]{1,2})/?$' => 'index.php?year=$matches[1]&monthnum=$matches[2]',
  '([0-9]{4})/page/?([0-9]{1,})/?$' => 'index.php?year=$matches[1]&paged=$matches[2]',
  '([0-9]{4})/?$' => 'index.php?year=$matches[1]',
  '([0-9]{4})/([0-9]{1,2})/([^/]+)(/[0-9]+)?/?$' => 'index.php?year=$matches[1]&monthnum=$matches[2]&name=$matches[3]&page=$matches[4]',
)

Similar issue: #13701

comment:29 @SergeyBiryukov3 years ago

#22592 was marked as a duplicate.

comment:30 @toscho3 years ago

  • Cc info@… added

comment:31 @DrewAPicture2 years ago

#24294 was marked as a duplicate.

comment:32 @c3mdigital21 months ago

#20185 was marked as a duplicate.

comment:33 @SergeyBiryukov18 months ago

#26641 was marked as a duplicate.

comment:34 @SergeyBiryukov16 months ago

#26974 was marked as a duplicate.

comment:35 @ircbot16 months ago

This ticket was mentioned in IRC in #wordpress-dev by SergeyBiryukov. View the logs.

comment:36 @Denis-de-Bernardy16 months ago

Posting the entire #26974 ticket's description for reference:

To reproduce, create a new blog, select that permalink structure, and publish an untitled post. View it. Expected: the post shows up. Actual: the day shows up empty — unless, of course, the post's day matches the post's id.
Minor, but annoying. Temporary fix to disable day links entirely:

add_filter('date_rewrite_rules', function($date_rewrite_rules) {
	if (get_option('permalink_structure') == '/%year%/%monthnum%/%postname%/') {
		$date_rewrite_rules = array_filter($date_rewrite_rules, function($rule) {
			return strpos($rule, '&day=') === false;
		});
	}
	return $date_rewrite_rules;
});

Methinks disabling day links should be the default behavior in WP when that structure is chosen.

comment:37 @SergeyBiryukov16 months ago

  • Milestone changed from Future Release to 3.9

comment:38 @SergeyBiryukov15 months ago

  • Milestone changed from 3.9 to Future Release

Still needs a patch.

comment:39 @ryan11 months ago

  • Owner ryan deleted
  • Status changed from new to assigned

comment:40 @loushou8 months ago

  • Keywords has-patch added; needs-patch removed

Seems to me that the only real solution to this problem is going to be to prefix those numeric slugs with some alpha-dash-text of some sort; otherwise, it will always get confused with any numeric value expected in the permalink, and thus interpreted incorrectly. Prefixing it with something as nondescript as '_' seems like the wrong solution, if for no other reason than _123 just looks weird in a URL.

My solution simply prefixes a numeric slug with 'number-' ( http://example.com/number-123/ ). It is longer, and one could argue that it is equally nondescript, but at least you know it is a number and it does not get confused with expected numeric values in the permalink. This is a blanket patch to the slug generation function, so it applies to all post_types. As requested, this prefix allows any post type to work in any of the standard permalink structures. As a party favor to all those devs out there who love their filters (like me), I also have a filter on 'number-' that only gets called when it is needed. As a fallback to all of this, if 'number-123' is not a nice permalink for a specific person, we obviously still have the ability to manually override the permalink, just not with anything numeric.

The original question also included a secondary issue, where apparently post_name values that were prefixes to a category slug used to force the category page to load instead of the post page. Based on the current permalink class, this sounds like it is already solved and not even possible any more. Even the old category slug - parent page slug collision problem seems solved. This is because categories and tags are all prefixed with a slug now. Matching clashes with that specified slug seems unnecessary, so I assume we are just dropping that second request.

Thus I will attach the patch.

@loushou8 months ago

wp-includes/post.php patch for handling numeric slugs

comment:41 follow-up: @toscho8 months ago

number must be translatable, and then you end up with something like číslo. That would probably raise new issues. I don’t think this is a robust solution.

@loushou8 months ago

updated patch, which includes translation on 'number'

comment:42 in reply to: ↑ 41 @loushou8 months ago

Replying to toscho:

number must be translatable, and then you end up with something like číslo. That would probably raise new issues. I don’t think this is a robust solution.

You make a good point that number needs to be translatable. Thus, I have updated the patch to include that in the solution. The second part of your objection is moot however. The slug sanitization process already handles sanitization of translations. If that is really an objection, then the sanitization functions need redesign, though I think you will find they don't. As for the 'robustness' of the solution, it is meant to be simple. If there is something I missed as far as the solution is concerned in the ways of 'robustness', feel free to suggest them, and I will try to work them in. Thanks again for the comment. Anything to make it better.

comment:43 @ircbot8 months ago

This ticket was mentioned in IRC in #wordpress-dev by SergeyBiryukov. View the logs.

comment:44 @slackbot5 months ago

This ticket was mentioned in Slack in #forums by radices. View the logs.

comment:45 @slackbot5 months ago

This ticket was mentioned in Slack in #forums by radices. View the logs.

comment:46 @ericlewis4 months ago

  • Keywords needs-patch good-first-bug added; has-patch removed
  • Milestone changed from Future Release to 4.2

Let's just prepend or append the number with an abstract character (maybe -?) and avoid translations.

Bringing it into 4.2 for consideration - needs a patch soon though.

@valendesigns4 months ago

@valendesigns4 months ago

comment:47 @valendesigns4 months ago

  • Keywords has-patch added; needs-patch removed

Replying to ericlewis:

Let's just prepend or append the number with an abstract character (maybe -?) and avoid translations.

Bringing it into 4.2 for consideration - needs a patch soon though.

The problem with prepending a hyphen alone is that it's technically still numeric and will run a second time when saving which then prepends a double hyphen --.

I've updated the patch to prepend the value of $post_type together with a hyphen to avoid translations. So you get something like post-123 or page-123. As well, I've added proper documentation to the new filter.

The one caveat is we are going to be changing slugs without any notice to the end user. So I created a second patch to allow this feature to be filtered off. That way if you want your slugs unmolested because you use a compatible permalink structure you're not completely without options.

comment:48 follow-up: @Denis-de-Bernardy4 months ago

I don't like this last patch, as it introduces potential localization problems.

Prepending or appending a hyphen might technically be numeric, but it doesn't get caught by the rewrite regex, and that's the only thing that counts. Imho, it's a better option that trying to introduce some arbitrary string that screams "translate me".

Along similar lines, we could prepend and/or append a colon : to the post's slug instead of a hyphen, in order to get urls like /2015/01/:23/ or /2015/01/:23:/. (This is assuming, of course, that sanitize_title wouldn't balk at the input. Whereas it actually might...)

Anyway, my concern is these potential solutions don't really fix the problem. They merely sweep things under the rug in the hopes of not allowing the bug in the future while not addressing the multitudes of sites that exhibit the issue with various duct tape fixes already. That and the other solutions discussed here merely stick to sweeping it under the rug so as to try to avoid it from occurring.

If the latest patch or a variation thereof gets applied, I'm left wondering what to do with the existing posts on my site that led me to report a bug, and I'd gather other reporters in the thread are left asking for more as well.

Be it #22592, #26974, #22592 or #24294, #20185, #26641, the issue seems to invariably break down to one of:

  • /yyyy/mm/slug/ conflicting with /yyyy/mm/dd/
  • /yyyy/slug/ conflicting with /yyyy/mm/
  • /slug/ conflicting with /yyyy/

On my own site, I disabled day links. It's not a one-size fits all solution, since for all I know I'd like these day links to get recognized in the event a user tries an arbitrary day link, but it was the right thing to do.

It seems to me that the better solution would be something like this:

  1. On permalink save (and on the next WP upgrade), test for permalink conflict potential. Suppose a two or four digit slug. Does a day rule catch it? The same for month and year rules. If yes, flag the structure as potentially leading to conflicts (as in year, month, day, or none).
  1. On the appropriate date archive type where we know conflicts may occur (year, month or day), try treating the offending parameter as if it was a post slug. This is a nearly free query -- we're looking for a single post. If a post is found, serve the expected post. If not, either fallback to serving the date archive as usual, or redirect to something appropriate (e.g. day url to month url, month url to year url, year url to blog url).

The upside of this suggestion is it "just works" from the end-user's point of view. I may be wrong, but I can't imagine a site owner caring the slightest bit about offending date archives having a slightly non-standard behavior if he opted for a permalink structure that basically invites potential conflicts.

Last edited 4 months ago by Denis-de-Bernardy (previous) (diff)

comment:49 in reply to: ↑ 48 ; follow-up: @SergeyBiryukov4 months ago

Replying to Denis-de-Bernardy:

On the appropriate date archive type where we know conflicts may occur (year, month or day), try treating the offending parameter as if it was a post ID. This is a nearly free query -- we're looking for a single post. If a post is found, serve the expected post. If not, either fallback to serving the date archive as usual, or redirect to something appropriate (e.g. day url to month url, month url to year url, year url to blog url).

Makes the most sense to me.

comment:50 @Denis-de-Bernardy4 months ago

Yeah. Just to clarify, I meant "as if it was a post slug", rather than as a post ID.

comment:51 follow-up: @jesin4 months ago

How about padding the slug with a leading zero?

I found another bug, not sure if there is a ticket for it.

  1. Set the permalink structure to %postname%.
  2. Create a post with slug foo-bar.
  3. Create a page with the same slug foo-bar.

Now the URL example.com/foo-bar/ loads the page created in step 2.

Problem lies in wp_unique_post_slug() which checks for uniqueness only within the post type.

comment:52 in reply to: ↑ 51 @SergeyBiryukov4 months ago

Replying to jesin:

Now the URL example.com/foo-bar/ loads the page created in step 2.

See #13459.

comment:53 @DrewAPicture4 months ago

  • Owner set to valendesigns

comment:54 in reply to: ↑ 49 @valendesigns4 months ago

  • Keywords needs-patch needs-unit-tests added; good-first-bug has-patch removed

Replying to SergeyBiryukov:

Replying to Denis-de-Bernardy:

On the appropriate date archive type where we know conflicts may occur (year, month or day), try treating the offending parameter as if it was a post ID. This is a nearly free query -- we're looking for a single post. If a post is found, serve the expected post. If not, either fallback to serving the date archive as usual, or redirect to something appropriate (e.g. day url to month url, month url to year url, year url to blog url).

Makes the most sense to me.

I'll start looking into a better long term solution, which doesn't force a prepend or append on a numeric slug.

@valendesigns4 months ago

comment:55 @valendesigns4 months ago

  • Keywords needs-unit-tests removed

Disregard the last patch it doesn't have full coverage, but is on the right path.

@valendesigns4 months ago

comment:56 @valendesigns4 months ago

  • Keywords has-patch added; needs-patch removed

OK, so patch 5305.3.diff should solve this issue. After trying a couple different things I stumbled into WP_Rewrite::get_date_permastruct where the solution was staring me in the face. We already have some code in place that was preventing %post_id% collisions. All I did was extend the current implementation to include fixes for the offending %postname% permalink structures.

@valendesigns4 months ago

Added a unit test.

comment:57 @valendesigns4 months ago

Patch 5305.4.diff adds a unit test that fails before the fix and passes once it's applied.

comment:58 @valendesigns4 months ago

  • Keywords has-patch removed

I stepped away for a minute and quickly realized there were backwards compatibility concerns due to redirect implications with this fix. I also ran all the phpunit tests, not just the post group, and got back failing tests. I'm going to keep at it, but I think we're at least moving in the right direction. I'll keep you guys posted. Sorry for all the chatter. I should have done more testing before uploading a patch.

Last edited 4 months ago by valendesigns (previous) (diff)

comment:59 @valendesigns4 months ago

Hooray, I'm confident I have a solution!

However, I want to test it rigorously before I upload it this time. The previous fix was OK in theory, but it didn't work in practice. This time around I created a pre_get_posts filter that solves the issue for this and all the other associated tickets without changing the permalinks, so no backwards compatibility concerns or broken tests.

I'll do a bit more testing and upload the patch in the next day or so. In the meantime I need to know how to create a unit test post that has an ID above 1k, so I can test for one of the known year permalink conflicts. Anyone have suggestions or tricks that might be useful in artificially inflating the post ID without actually creating that many posts?

@valendesigns4 months ago

comment:60 @valendesigns4 months ago

  • Keywords has-patch needs-testing added

Patch 5305.5.diff is ready for testing! I've included 6 new unit tests and I feel confident that the issue is now solved. I couldn't account for every possible permalink structure, but I feel we now have coverage on the majority of them. As well, this fixes posts without titles, which has also been a long standing issue.

I did have to introduce a new filter, as much as I didn't want to, but we needed a way to extend this functionality to custom post types without having to use the functions I created. Please let me know if there is an outlier permalink structure that you know of that wasn't mentioned in the comments that needs to be accounted for, or you think I did something in a way that doesn't make sense so I can fix it.

Thanks Guys! And thank you @DrewAPicture for making me the owner.

@valendesigns4 months ago

comment:61 @slackbot3 months ago

This ticket was mentioned in Slack in #core by valendesigns. View the logs.

comment:62 @valendesigns3 months ago

  • Keywords dev-feedback added

It's been a week without any response to the latest 5305.6.diff patch, so I figure it's time to ping this thread and get the conversation going again. Has anyone had time to test the patch or look over the code?

comment:63 @boonebgorges3 months ago

valendesigns - Thanks for your work on this. Broadly speaking, I think the strategy here is good: when a date query is detected and the permalink structure contains %postname%, give precedence to a post matching the postname, rather than the corresponding date archive.

I have some implementation suggestions, some large and some small. See [5305.7.diff].

First, a 'pre_get_posts' callback doesn't seem like the right place to do this. For one thing, while you've given it a priority of 1, it's still possible for plugins to intervene before it runs, which could cause various problems. It's also not clear to me why we would want the proposed to be unhookable. It seems to me that the proper place is in WP_Query::parse_query(), right after the existing date checks.

I've also simplified (and, I think, generalized) your $token_index trick. What we really care about is the location of %postname%, and whether it's preceded directly by %year% or %month% or whether it's in the 0th position. My patch tests this more directly.

I also combined the two functions into a single method. By returning from the method whenever we know there's nothing to do, we prevent the nesting that I'm assuming you were trying to avoid.

I simplified <!--nextpage--> support as well. Most of the necessary work - like the nextpage count - is done for us later on in setup_postdata(). No need to repeat it here. The only thing required is to translate the date query var into 'page'.

I've left a @todo comment on the zeroise() block. There is still a bug here. Month and day URL chunks are parsed properly with or without the leading zero (([0-9]{4})/([0-9]{1,2})/([0-9]{1,2})/page/?([0-9]{1,})/?$' - note the {1,2}). This means that 2015/2 is a potential clash when post_name=2, in addition to 2015/02 and post_name=02. Does this mean we have to do two calls to get_page_by_path()`?

@boonebgorges3 months ago

comment:64 @boonebgorges3 months ago

Forgot to mention in the above that I left out the post_type filter, and hardcoded 'post'. WP doesn't create date-based archives for CPTs out of the box. So I'm not sure that there's a natural situation where you'd have the kind of slug conflict covered in this post. I suppose it'd be possible in cases where a plugin registers its own rewrite rules for these kinds of archives. But in that case, we should probably be doing something more automatic to detect that these post types are subject to the clash in question - maybe examining the rewrite rules or something? Any thoughts are welcome.

comment:65 @valendesigns3 months ago

  • Keywords dev-feedback removed

@boonebgorges Thank you for testing my patch. Also, I really like your iteration, it's very clean and concise.

To address using pre_get_posts, my thinking was doing it that way first would make it easier to test and iterate on. I just wasn't sure of the best place to jump in and take over. You seemed to have found it though.

As for <!--nextpage-->, this is something that does not act in the same manner as the last patch. For example, say I have a /%year%/%postname%/ permalink structure and I create a post named '02' and I try and view any day archive in February, the post takes over. The way I had it setup was that it took over only if <!--nextpage--> was being used. And then, only for the pages it uses. So that means if I have 2 pages for the '02' post, then /2015/02/1/ & /2015/02/2/ would match a page, but /2015/02/3/ would pass and the date archive would be shown instead. So even though setup_postdata() is where the current page count happens, we need to do our own logic to establish the paging boundaries and avoid introducing a regression.

I used zeroise() in an attempt to normalize input so it returned in a date format with a leading zero. However, after second thought, that logic may be somewhat flawed due to the reasons you mentioned above with '2' & '02' being the same as a date, but not as a post. The real issue is that the 0 is not passed in the query var, so it doesn't match the '02' post name format. I guess the question should be why is the leading zero being stripped away from the query var? We might want to start looking in that direction to avoid a second query and/or using zeroise(). If that can't be changed, then yes we'll likely need to do a second get_page_by_path().

Something to note, is that get_page_by_path() cannot find posts without titles. It will always return null in that situation. So in order to resolve that, we need to have the second check in place or update the logic in get_page_by_path() to allow for checks against ID and not just post_name. In either case, we should find a way to address resolving posts without titles where the ID collides with a date.

I realized yesterday that the post_type filter was not needed when I tested the patch further by creating a very simple CPT. I quickly found out that it was a pointless filter and I had every intention of removing it, but got sidetracked on something else. It would seem you beat me to it. ;)

I only briefly tested your new patch, but I'll put it through its paces and lob back an update on Monday. Overall though, I think we're getting pretty close to having a final solution to this long standing issue.

Thanks again for taking the time to rework the code and give feedback. Cheers!

comment:66 @valendesigns3 months ago

Sorry, I take my comment back about get_page_by_path(). After further investigation and testing, I was very much wrong on that one. I'm trying to remember why I thought that it would return null in that situation, I know there was a reason for the extra get_post call in the resolve function, but for the life of me I can't remember what I was thinking when I wrote it. Disregard that part of my previous response. Though if I remember what edge case I was trying to account for I'll let you know.

@valendesigns3 months ago

comment:67 @valendesigns3 months ago

I've added three new tests and a couple fixes in patch 5305.8.diff. Two of those tests are to demonstrate the post_name=2 vs. post_name=02 issue, and that it's now been fixed. The third test demonstrates the pagination issue I mentioned above. Basically, the test would return true before the patch is applied and is expected to return true after the patch is applied. Without the latest changes, is_day would return false, which is not the expected behavior. I would have left this all alone until Monday, but I got pulled in when I started testing.

@boonebgorges3 months ago

comment:68 follow-up: @boonebgorges3 months ago

Thanks, valendesigns. You're right about nextpage - I hadn't been thinking of the case of invalid page numbers. On that note, I've added a few more tests, and tweaked the logic a bit more. Take the following two cases:

  1. A post called 02 with two instances of <!--nextpage-->. example.com/2015/02/05 is not a valid page, so should resolve to the day archive. (Your patch did this.)
  2. A post called 02 with *no* instances of <!--nextpage-->. example.com/2015/02/05 is not a valid page, so should resolve to the day archive. (Your patch did not do this, but 5305.9.diff does.)

I also reorganized the comments to narrate the flow a bit better. Let me know what you think.

comment:69 in reply to: ↑ 68 @valendesigns3 months ago

@boonebgorges This looks great! Nice catch on the page 1 bug, as well. I think the only thing left to do is have more people test it out. So I'll summarize the patch for those interested in testing.

  • Added a total of twelve new tests. Two of those tests should pass and ten should fail before the patch is applied. The two passing tests are:
    test_date_slug_collision_should_distinguish_too_high_pagination_from_date()
    
    test_date_slug_collision_should_be_ignored_when_pagination_var_is_present_but_post_does_not_have_multiple_pages()
    
  • Resolved the following date premalink structures for numeric slugs; this includes posts without titles.
    • /post_name/
    • /post_name/page/
    • /year/post_name/
    • /year/post_name/page/
    • /year/month/post_name/
  • Accounted for day and monthnum values that are cast to integers, which strips leading zeroes. This allows us to resolve variations in post_name, like /2015/2/ and /2015/02/.
  • Added the page query var when post pagination is present and valid.
Last edited 3 months ago by valendesigns (previous) (diff)

@boonebgorges3 months ago

comment:70 @boonebgorges3 months ago

  • Keywords needs-patch added; has-patch needs-testing removed

Coming back to review this, I think there's a problem with the current approach. By intervening in WP_Query::parse_query(), we make it impossible to create a WP_Query object whose date params clash with the permalink of a post with a numeric slug. See test_numeric_slug_permalink_conflicts_should_only_be_resolved_for_the_main_query() in 5305.10.diff. What we are trying to resolve here is improperly parsed URLs, but solving the problem in WP_Query means that we're "fixing" queries that didn't originate in URLs at all.

I see two possible ways forward:

  1. Move our proposed disambiguation logic to somewhere around the 'request' filter or 'parse_request' action. The internals will have to change a bit.
  2. Keep it in WP_Query, but only run it if is_main_query().

2 is easier, but I don't think it'll be reliable. It won't fix url_to_postid(), for instance. It looks like 1 could possibly fit into parse_request() after non-public post_types are filtered out - the disambiguation is a similar kind of sanity check.

valendesigns, do you have thoughts about this?

comment:71 @valendesigns3 months ago

@boonebgorges I originally had is_main_query() in the code about 5 patches ago, but I removed it before uploading and after finding that is_date worked reasonably well to keep our code out of other queries. Though, as you've pointed out we are now blocking a valid date query altogether. So yes, I agree we need to move it or rethink the implementation. I'm working on some ideas to see if it's possible to keep it where it is, but I'm not all that optimistic.

@valendesigns3 months ago

comment:72 @valendesigns3 months ago

  • Keywords has-patch added; needs-patch removed

@boonebgorges I've moved the function, and how & where it's called. It now directly modifies the query vars before WP_Query is called. As well, I've added additional tests. Please take a look at 5305.11.diff and let me know what you think.

@boonebgorges3 months ago

comment:73 @boonebgorges3 months ago

Getting closer :)

After some conversation with valendesigns, I'm going to propose that we take a two-pronged strategy. See 5305.12.diff.

  1. During parse_request(), if we detect that a URL looks like a date query, look for conflicts with posts that have numeric slugs. (This is what 11.diff already did.)
  2. Simply doing (1) is a potentially incomplete solution, because it fixes the original bug - not being able to access a post '28' located at /2015/02/28/ - at the expense of creating another, lesser bug - not being able to access the date archive at /2015/02/28/. So we should also attempt to prevent the creation of posts with this kind of slug. 12.diff attempts a solution that is more modest, in two ways, than what has been suggested in previous discussion on this ticket.
    1. It only affects the auto-generated slug in get_sample_permalink(). If you want to override this by manually entering a conflicting slug (or by creating posts programatically), you may.
    2. It only enforces uniqueness (using the -2 etc formula used in wp_unique_post_slug()) if it's detected that the suggested post would actually create a date conflict. We look at the current permalink structure, and verify that the autogenerated post_name would be a valid value for year/monthnum/day, as appropriate. So the suffixing will only happen in very rare cases, instead of every time the post_name is numeric.

Feedback welcome. I'd especially like to know what people think about the get_sample_permalink() strategy - is it too lax? Should we be enforcing this in wp_unique_post_slug()?

comment:74 @valendesigns3 months ago

@boonebgorges Sorry for the delay, I was out of town the last couple of days. This looks great! Though, I haven't tested it yet. I will tonight and get back to you. As far as the get_sample_permalink() strategy, my first thoughts are that it's good to be somewhere in the middle when enforcing unique slugs. Especially since we have a solid two prong approach that covers both sides of the fence without being overly aggressive.

comment:75 @slackbot3 months ago

This ticket was mentioned in Slack in #core by boone. View the logs.

comment:76 @slackbot3 months ago

This ticket was mentioned in Slack in #core by drew. View the logs.

comment:77 follow-up: @boonebgorges3 months ago

  • Keywords 4.3-early added
  • Milestone changed from 4.2 to Future Release

Given the late date, I don't feel comfortable pulling the trigger on this for 4.2. Let's do it first thing for 4.3.

comment:78 in reply to: ↑ 77 ; follow-up: @valendesigns3 months ago

Replying to boonebgorges:

Given the late date, I don't feel comfortable pulling the trigger on this for 4.2. Let's do it first thing for 4.3.

That's really unfortunate to hear. I assumed all that was left to do was get additional dev feedback. The last patch looked complete, and with all the unit tests this seemed like it would make it into 4.2 for sure. :(

comment:79 in reply to: ↑ 78 @boonebgorges3 months ago

Replying to valendesigns:

Replying to boonebgorges:

Given the late date, I don't feel comfortable pulling the trigger on this for 4.2. Let's do it first thing for 4.3.

That's really unfortunate to hear. I assumed all that was left to do was get additional dev feedback. The last patch looked complete, and with all the unit tests this seemed like it would make it into 4.2 for sure. :(

The hard work will not be in vain. It's just that this ticket has been open for many years, with many of the Great Minds of WordPress trying unsuccessfully to solve the problem, so I want to be sure we have adequate time for feedback pre- and post-commit. Thanks for your patience.

comment:80 @valendesigns3 months ago

I totally understand and that makes perfect sense. Cheers!

@valendesigns2 months ago

refresh

@valendesigns5 weeks ago

refresh

comment:81 @boonebgorges4 weeks ago

  • Keywords 4.3-early removed
  • Milestone changed from Future Release to 4.3

comment:82 @slackbot2 weeks ago

This ticket was mentioned in Slack in #core by valendesigns. View the logs.

@boonebgorges2 weeks ago

comment:83 @boonebgorges2 weeks ago

valendesigns - I'm glad we let this percolate for 4.3, because in revisiting it, I found some problems :) I've run out of time to work on the patch today, so I wanted to throw it up here for you to chew on.

5305.15.diff proposes a couple of changes:

  1. Force 'wp_unique_post_slug_is_bad_[hierarchical|flat]_slug' in get_sample_permalink(). This is necessary, because when the existing $post is an auto-draft, the post_name in the database will be empty, which means that the slug won't be suffixed. I'm not a huge fan of the add/remove filter trick, but it's done elsewhere in core, and it's far simpler than any other solution I could come up with.
  1. I changed the "resolve" function name to avoid the underscore prefix.

Unfortunately, I still don't think we're ready to move on this. A couple outstanding issues that I want to throw out for consideration:

  1. The get_sample_permalink() suffixing is too aggressive. On a typical WP installation, it's only 'post' objects that will have URLs that clash with date archives. (Objects from hierarchical post types are prevented from having numeric slugs; see #11917 [13424].) So perhaps we should only be massaging the suggested permalink if the post_type is 'post'. But are there cases where a CPT might have a conflicting slug? I don't think so, but it needs some thought.
  1. I'm having second thoughts about the get_sample_permalink() strategy. Hierarchical post type objects are prevented from having numeric slugs at all; see #11917 [13424]. I don't think we have to go that far, but perhaps we should put our date-clash enforcement at the level of wp_unique_post_slug(), if only for consistency. This will mean rewriting a number of our tests, since it'll no longer be possible to create fixtures with date-clash slugs through the unit test factory.

comment:84 @valendesigns8 days ago

@boonebgorges Sorry for not responding back sooner, I'll have some dedicated time this weekend to focus on this issue. But I wanted to comment that I do not think there is an issue here with CPT as a conflict cannot exists with these permalink structures. Correct me if I'm wrong here, but to me there is only an issue with posts because without the post_type query var it cannot route to a CPT. So a quick check for post sounds like a good solution to me.

comment:85 @boonebgorges2 days ago

In 32604:

Improve unit tests for wp_unique_post_slug().

See #5305.

@boonebgorges2 days ago

comment:86 @boonebgorges2 days ago

Back to the drawing board with 5305.16.diff. I've moved the check into wp_unique_post_slug(). This avoids weird issues with post drafts, and more closely parallels the similar pagination checks in [13424].

16.diff does not contain the rewrite/query parsing. I'm of two minds about this. On the one hand, forcing uniqueness in wp_unique_post_slug() means that slug conflicts won't happen under normal circumstances, so it's not worth the additional overhead that wp_resolve_numeric_slug_conflicts() introduces. On the other hand, post/archive URL conflicts may still arise in the following cases: (a) legacy content, (b) changed permalink settings, (c) manual manipulation (direct SQL, filtering 'wp_unique_post_slug', etc). I could go either way here. valendesigns, what do you think?

comment:87 @valendesigns2 days ago

Well 16 does remove a lot of potential overhead, so I am with you on that. Though, as you mentioned it has drawbacks with legacy slugs. Which I think is potentially the biggest issue we face, as the point of this ticket is really to fix legacy slugs.

Yes, 16 fixes future collisions, but if we go this route we're only half fixing the issue. I feel like wp_resolve_numeric_slug_conflicts should remain intact and be ran when there is a 404 or something along those lines. There needs to be a way to prevent the collision for legacy slugs for this really to be a complete fix. We need to limit the use of the function to specific scenarios so we don't introduce unnecessary overhead. Is that even possible or are we chasing a unicorn here?

@boonebgorges39 hours ago

comment:88 @boonebgorges39 hours ago

Which I think is potentially the biggest issue we face, as the point of this ticket is really to fix legacy slugs.

I disagree - the point of the ticket is to fix numeric slug conflicts. At any given time, all existing conflicts are "legacy", but preventing future instances of the problem is still a Good Thing.

Anyway, I was really just throwing 16 out there to see what you thought. Let's go with something like 5305.17.diff, which includes the rewrite-level conflict resolution. That patch also includes some enhancements to the unit tests so that they work properly with the new wp_unique_post_slug() changes, as well as new logic that prevents a URL like /2013/05/25 from resolving to post_name=25 if that post was not published in May 2013 (phew).

We need to limit the use of the function to specific scenarios so we don't introduce unnecessary overhead.

I think 17.diff is as focused as we're going to get. We can't run on 404, because the problem here is not 404s - the problem is that certain URLs resolve to date archives when they should resolve to single posts. So we only run the additional regex and database query when WP has parsed the request as a date archive. Anything more focused than this is probably a unicorn.

I think this is about ready to go, but I'd appreciate a last look.

Note: See TracTickets for help on using tickets.