Make WordPress Core

Opened 17 years ago

Closed 9 years ago

Last modified 5 years ago

#5305 closed defect (bug) (fixed)

permalinks broken when article name is numeric

Reported by: thomask's profile thomask Owned by: valendesigns's profile valendesigns
Milestone: 4.3 Priority: normal
Severity: major Version: 2.3.1
Component: Permalinks Keywords: has-patch commit
Focuses: Cc:

Description

if you create numeric-only post name, the generated slug is this number - this conflicts with article ID, so it returns different article or 404 page, never the article. It can be then solved by generating manual slug with some char in it, but i think it would be better to include some char in that case, e.g. underscore, like _123

Also if someone will try to solve this, it would be nice to solve other problem - if post slug is begining with the slug of the category, than the category page returns that post, not the category

Attachments (24)

wp-includes_post.php.diff (693 bytes) - added by loushou 10 years ago.
wp-includes/post.php patch for handling numeric slugs
wp-includes_post.php.2.diff (704 bytes) - added by loushou 10 years ago.
updated patch, which includes translation on 'number'
5305-unittests.diff (1.1 KB) - added by MikeHansenMe 10 years ago.
see #30284
5305.diff (1.3 KB) - added by valendesigns 10 years ago.
5305.1.diff (1.6 KB) - added by valendesigns 10 years ago.
5305.2.diff (1.6 KB) - added by valendesigns 10 years ago.
5305.3.diff (1.6 KB) - added by valendesigns 10 years ago.
5305.4.diff (2.5 KB) - added by valendesigns 10 years ago.
Added a unit test.
5305.5.diff (10.5 KB) - added by valendesigns 10 years ago.
5305.6.diff (10.5 KB) - added by valendesigns 10 years ago.
5305.7.diff (7.9 KB) - added by boonebgorges 10 years ago.
5305.8.diff (10.3 KB) - added by valendesigns 10 years ago.
5305.9.diff (12.0 KB) - added by boonebgorges 10 years ago.
5305.10.diff (13.2 KB) - added by boonebgorges 10 years ago.
5305.11.diff (18.6 KB) - added by valendesigns 10 years ago.
5305.12.diff (24.5 KB) - added by boonebgorges 10 years ago.
5305.13.diff (24.5 KB) - added by valendesigns 9 years ago.
refresh
5305.14.diff (24.5 KB) - added by valendesigns 9 years ago.
refresh
5305.15.diff (24.9 KB) - added by boonebgorges 9 years ago.
5305.16.diff (11.8 KB) - added by boonebgorges 9 years ago.
5305.17.diff (33.1 KB) - added by boonebgorges 9 years ago.
5305.18.diff (3.7 KB) - added by boonebgorges 9 years ago.
5305.19.diff (1.4 KB) - added by boonebgorges 9 years ago.
5305.20.diff (1.4 KB) - added by boonebgorges 9 years ago.

Download all attachments as: .zip

Change History (140)

#1 @lloydbudd
17 years ago

thomask, can you include your Permalink Structure? And provide specific examples? This could help duplicate the problem.

#2 @darkfate
17 years ago

I don't think he's using permalinks. I think he's saying, like if his post slug is 123 and his title is 123, it breaks. It looks like 2.4 trunk doesn't break like that though. Unless it is a specific permalink structure.

#3 @dille
17 years ago

My structure is /archief/%postname%/, and I see the same problem. I've simply re-slugged the offending posts to the number in words, so 30 would become "thirty" (or in my case, "dertig", as I'm Dutch).

I didn't have this problem when my structure was /archief/year/month/day/postname/ though.

#4 @JDTrower
17 years ago

Actually, I am able to verify that this does happen. If you have a custom permalink, where you have something like /archives/%postname%/ or /archives/post/%postname%/ and you publish a post with a numeric title such as 2008, when you click to view the post, you get a 404 error. In all my testing, this only occurs with this type of a permalink where you have a static word followed by a forward slash and then the dynamic postname.

If you try /archive/%author%/%postname/ or put any of the other dynamic options between the static word and the dynamic postname, the link will work fine.

I verified this in 2.3.1 and in 2.4 trunk.

#5 @JDTrower
17 years ago

Also, if you have a custom permalink that is just /%postname%/ you get a 404 error on a post with a numeric title.

As for the other problem that was mentioned concerning if a post slug is beginning with the slug of the category, than the category page returns that post, not the category. I have been unable to recreate it.

#6 @JDTrower
17 years ago

Just a note, Ryan in comment 25 on ticket #3614 (http://trac.wordpress.org/ticket/3614#comment:25) explains why the 404 errors occur with permalinks like those that I tried (I don't personally use them myself) or that others use. I have yet to test the fix that they have come up with on that ticket to see if it solves the problem here(although, it appears as though you would still need to modify your permalinks structure to not be like those above). When I get a chance I'll test this ticket based on ticket #3614.

#7 @lloydbudd
17 years ago

  • Milestone changed from 2.3.2 to 2.5

#8 follow-up: @chrismou
16 years ago

I've just a had a similar ticket closed as a duplicate, although I'm not 100% sure from the comments here that people have grasped what the problem is.

If you need examples, check the info on http://trac.wordpress.org/ticket/6997

To summarise, I create a page (lets call it "a"). The slug is also "a". I then create another page, give it a slug of "2008" and set "a" to be its parent. When I try to navigate to /a/2008/, it simply shows me the content of "a", rather than "2008". If I add anything else onto the end (ie a/2008/a/) it redirects me to a post written in 2008.

I just seems to be the case that as soon as it sees 2008, it tries to guess what I want, rather than just checking to see if there is actually a page called 2008.

#9 in reply to: ↑ 8 @DD32
16 years ago

By the look of it, Its reconising the 2008 as a page value(ie. The 2008th page):

object(WP_Query)[3]
  public 'query_vars' => &
    array
      'page' => string '/2008' (length=5)
      'pagename' => string 'a' (length=1)

(Which in itself seems to be a bug that /2008 is reconised as the page number, it shouldnt have the slash in there)

The rule which is being matched is:

[(.+?)(/[0-9]+)?/?$] => index.php?pagename=$matches[1]&page=$matches[2]

I can see 2 ways to resolve it:

  1. Revert to the old method of actually having *every* page slug being checked for, which is slow, messy, and really isnt going to happen
  2. Limit pages to only have a certain ammount of pages (Why remove functionality?)
  3. Prefix extra pages with /page/

More on that last one: Most common permalink structures in WordPress allready use that method (.*/page/[0-9]+).

Changing the current permalink settings will break links on pages at present (ie. /a/2/ for page 2) in search results, But would allow for numeric page slugs on subpages.

#10 @DD32
16 years ago

Also, For consistancy, Is there a reason thats using &page= instead of &paged= ?

#11 @SupersonicSquirrel
16 years ago

I also had a ticket about this closed and I had no idea it existed already, sorry; as I had no such problems in any version prior to 2.5.1. I'm confused now. How come that the issue occurs in different versions of WP for each person?

#13 @Viper007Bond
16 years ago

For anyone looking to do this meanwhile and is willing to give up the paging ability, here's a plugin I wrote:

<?php
/*
Plugin Name:  Allow Numeric Stubs
Description:  Allows children Pages to have a stub that is only a number. Sacrifices the <code>&lt;!--nextpage--&gt;</code> ability in Pages.
Version:      2008.07.23
Author:       Viper007Bond
Author URI:   http://www.viper007bond.com/
*/


// Register plugin hooks
register_activation_hook( __FILE__, 'allow_numeric_stubs_activate' );
add_filter( 'page_rewrite_rules', 'allow_numeric_stubs' );


// Force a flush of the rewrite rules when this plugin is activated
function allow_numeric_stubs_activate() {
	global $wp_rewrite;
	$wp_rewrite->flush_rules();
}


// Remove the offending rule and replace it with something else
function allow_numeric_stubs( $rules ) {
	unset( $rules['(.+?)(/[0-9]+)?/?$'] );

	$rules['(.+?)?/?$'] = 'index.php?pagename=$matches[1]';

	return $rules;
}

?>

#14 @bitethemailman
16 years ago

This never happened to me in 2.3.2, but happens in 2.5. I'm having the same problem described by chrismou in comment 8.
Duplicated in #7444.

#16 @mrmist
16 years ago

Also ref #7355

#17 @mrmist
16 years ago

  • Keywords needs-patch added

#18 @Denis-de-Bernardy
15 years ago

  • Keywords permalinks number removed

#19 @Denis-de-Bernardy
15 years ago

  • Component changed from General to Permalinks
  • Owner changed from anonymous to ryan

#20 @janeforshort
15 years ago

  • Milestone changed from 2.9 to Future Release

Punting due to schedule.

#21 @Viper007Bond
13 years ago

See also this dupe ticket with some good discussion on it: #14238

#22 follow-up: @SergeyBiryukov
13 years ago

Just stumbled upon this. Steps to reproduce:

  1. Create a post with slug 1.
  2. Set permalink structure to /%year%/%monthnum%/%postname%/.
  3. Then we have request: 2008/12/1
  4. Matched rule: year=2008&monthnum=12&day=1

Other permalink structures (Day and name, Numeric, even /%postname%/ mentioned above) work fine.

#23 in reply to: ↑ 22 ; follow-up: @Viper007Bond
13 years ago

Replying to SergeyBiryukov:

  1. Create a post with slug 1.

I don't believe that's possible. WordPress will currently change that to 1-2 as far as I know.

#24 in reply to: ↑ 23 @SergeyBiryukov
13 years ago

Replying to Viper007Bond:

I don't believe that's possible. WordPress will currently change that to 1-2 as far as I know.

That's for pages. It's still possible to create a post with a numeric-only slug.

#26 @lkraav
13 years ago

  • Cc lkraav added

#27 @SergeyBiryukov
12 years ago

Closed #20617 as a duplicate.

#28 @wonderboymusic
12 years ago

This is the date rewrite rules being generated regardless of the post_permastruct. Because the post permastruct Sergey listed doesn't match any form of endian, this date permastruct wins:

if ( empty($date_endian) )
	$date_endian = '%year%/%monthnum%/%day%';

Because of that, day archives are generated which wipe out post permastruct when the slug matches a 1-2 digit integer. Here's an abbreviated list of the rules that are generated:

array (
  '([0-9]{4})/([0-9]{1,2})/([0-9]{1,2})/page/?([0-9]{1,})/?$' => 'index.php?year=$matches[1]&monthnum=$matches[2]&day=$matches[3]&paged=$matches[4]',
  '([0-9]{4})/([0-9]{1,2})/([0-9]{1,2})/?$' => 'index.php?year=$matches[1]&monthnum=$matches[2]&day=$matches[3]',
  '([0-9]{4})/([0-9]{1,2})/page/?([0-9]{1,})/?$' => 'index.php?year=$matches[1]&monthnum=$matches[2]&paged=$matches[3]',
  '([0-9]{4})/([0-9]{1,2})/?$' => 'index.php?year=$matches[1]&monthnum=$matches[2]',
  '([0-9]{4})/page/?([0-9]{1,})/?$' => 'index.php?year=$matches[1]&paged=$matches[2]',
  '([0-9]{4})/?$' => 'index.php?year=$matches[1]',
  '([0-9]{4})/([0-9]{1,2})/([^/]+)(/[0-9]+)?/?$' => 'index.php?year=$matches[1]&monthnum=$matches[2]&name=$matches[3]&page=$matches[4]',
)

Similar issue: #13701

#29 @SergeyBiryukov
12 years ago

#22592 was marked as a duplicate.

#30 @toscho
12 years ago

  • Cc info@… added

#31 @DrewAPicture
11 years ago

#24294 was marked as a duplicate.

#32 @c3mdigital
11 years ago

#20185 was marked as a duplicate.

#33 @SergeyBiryukov
11 years ago

#26641 was marked as a duplicate.

#34 @SergeyBiryukov
11 years ago

#26974 was marked as a duplicate.

This ticket was mentioned in IRC in #wordpress-dev by SergeyBiryukov. View the logs.


11 years ago

#36 @Denis-de-Bernardy
11 years ago

Posting the entire #26974 ticket's description for reference:

To reproduce, create a new blog, select that permalink structure, and publish an untitled post. View it. Expected: the post shows up. Actual: the day shows up empty — unless, of course, the post's day matches the post's id.
Minor, but annoying. Temporary fix to disable day links entirely:

add_filter('date_rewrite_rules', function($date_rewrite_rules) {
	if (get_option('permalink_structure') == '/%year%/%monthnum%/%postname%/') {
		$date_rewrite_rules = array_filter($date_rewrite_rules, function($rule) {
			return strpos($rule, '&day=') === false;
		});
	}
	return $date_rewrite_rules;
});

Methinks disabling day links should be the default behavior in WP when that structure is chosen.

#37 @SergeyBiryukov
11 years ago

  • Milestone changed from Future Release to 3.9

#38 @SergeyBiryukov
11 years ago

  • Milestone changed from 3.9 to Future Release

Still needs a patch.

#39 @ryan
10 years ago

  • Owner ryan deleted
  • Status changed from new to assigned

#40 @loushou
10 years ago

  • Keywords has-patch added; needs-patch removed

Seems to me that the only real solution to this problem is going to be to prefix those numeric slugs with some alpha-dash-text of some sort; otherwise, it will always get confused with any numeric value expected in the permalink, and thus interpreted incorrectly. Prefixing it with something as nondescript as '_' seems like the wrong solution, if for no other reason than _123 just looks weird in a URL.

My solution simply prefixes a numeric slug with 'number-' ( http://example.com/number-123/ ). It is longer, and one could argue that it is equally nondescript, but at least you know it is a number and it does not get confused with expected numeric values in the permalink. This is a blanket patch to the slug generation function, so it applies to all post_types. As requested, this prefix allows any post type to work in any of the standard permalink structures. As a party favor to all those devs out there who love their filters (like me), I also have a filter on 'number-' that only gets called when it is needed. As a fallback to all of this, if 'number-123' is not a nice permalink for a specific person, we obviously still have the ability to manually override the permalink, just not with anything numeric.

The original question also included a secondary issue, where apparently post_name values that were prefixes to a category slug used to force the category page to load instead of the post page. Based on the current permalink class, this sounds like it is already solved and not even possible any more. Even the old category slug - parent page slug collision problem seems solved. This is because categories and tags are all prefixed with a slug now. Matching clashes with that specified slug seems unnecessary, so I assume we are just dropping that second request.

Thus I will attach the patch.

@loushou
10 years ago

wp-includes/post.php patch for handling numeric slugs

#41 follow-up: @toscho
10 years ago

number must be translatable, and then you end up with something like číslo. That would probably raise new issues. I don’t think this is a robust solution.

@loushou
10 years ago

updated patch, which includes translation on 'number'

#42 in reply to: ↑ 41 @loushou
10 years ago

Replying to toscho:

number must be translatable, and then you end up with something like číslo. That would probably raise new issues. I don’t think this is a robust solution.

You make a good point that number needs to be translatable. Thus, I have updated the patch to include that in the solution. The second part of your objection is moot however. The slug sanitization process already handles sanitization of translations. If that is really an objection, then the sanitization functions need redesign, though I think you will find they don't. As for the 'robustness' of the solution, it is meant to be simple. If there is something I missed as far as the solution is concerned in the ways of 'robustness', feel free to suggest them, and I will try to work them in. Thanks again for the comment. Anything to make it better.

This ticket was mentioned in IRC in #wordpress-dev by SergeyBiryukov. View the logs.


10 years ago

This ticket was mentioned in Slack in #forums by radices. View the logs.


10 years ago

This ticket was mentioned in Slack in #forums by radices. View the logs.


10 years ago

#46 @ericlewis
10 years ago

  • Keywords needs-patch good-first-bug added; has-patch removed
  • Milestone changed from Future Release to 4.2

Let's just prepend or append the number with an abstract character (maybe -?) and avoid translations.

Bringing it into 4.2 for consideration - needs a patch soon though.

@valendesigns
10 years ago

@valendesigns
10 years ago

#47 @valendesigns
10 years ago

  • Keywords has-patch added; needs-patch removed

Replying to ericlewis:

Let's just prepend or append the number with an abstract character (maybe -?) and avoid translations.

Bringing it into 4.2 for consideration - needs a patch soon though.

The problem with prepending a hyphen alone is that it's technically still numeric and will run a second time when saving which then prepends a double hyphen --.

I've updated the patch to prepend the value of $post_type together with a hyphen to avoid translations. So you get something like post-123 or page-123. As well, I've added proper documentation to the new filter.

The one caveat is we are going to be changing slugs without any notice to the end user. So I created a second patch to allow this feature to be filtered off. That way if you want your slugs unmolested because you use a compatible permalink structure you're not completely without options.

#48 follow-up: @Denis-de-Bernardy
10 years ago

I don't like this last patch, as it introduces potential localization problems.

Prepending or appending a hyphen might technically be numeric, but it doesn't get caught by the rewrite regex, and that's the only thing that counts. Imho, it's a better option that trying to introduce some arbitrary string that screams "translate me".

Along similar lines, we could prepend and/or append a colon : to the post's slug instead of a hyphen, in order to get urls like /2015/01/:23/ or /2015/01/:23:/. (This is assuming, of course, that sanitize_title wouldn't balk at the input. Whereas it actually might...)

Anyway, my concern is these potential solutions don't really fix the problem. They merely sweep things under the rug in the hopes of not allowing the bug in the future while not addressing the multitudes of sites that exhibit the issue with various duct tape fixes already. That and the other solutions discussed here merely stick to sweeping it under the rug so as to try to avoid it from occurring.

If the latest patch or a variation thereof gets applied, I'm left wondering what to do with the existing posts on my site that led me to report a bug, and I'd gather other reporters in the thread are left asking for more as well.

Be it #22592, #26974, #22592 or #24294, #20185, #26641, the issue seems to invariably break down to one of:

  • /yyyy/mm/slug/ conflicting with /yyyy/mm/dd/
  • /yyyy/slug/ conflicting with /yyyy/mm/
  • /slug/ conflicting with /yyyy/

On my own site, I disabled day links. It's not a one-size fits all solution, since for all I know I'd like these day links to get recognized in the event a user tries an arbitrary day link, but it was the right thing to do.

It seems to me that the better solution would be something like this:

  1. On permalink save (and on the next WP upgrade), test for permalink conflict potential. Suppose a two or four digit slug. Does a day rule catch it? The same for month and year rules. If yes, flag the structure as potentially leading to conflicts (as in year, month, day, or none).
  1. On the appropriate date archive type where we know conflicts may occur (year, month or day), try treating the offending parameter as if it was a post slug. This is a nearly free query -- we're looking for a single post. If a post is found, serve the expected post. If not, either fallback to serving the date archive as usual, or redirect to something appropriate (e.g. day url to month url, month url to year url, year url to blog url).

The upside of this suggestion is it "just works" from the end-user's point of view. I may be wrong, but I can't imagine a site owner caring the slightest bit about offending date archives having a slightly non-standard behavior if he opted for a permalink structure that basically invites potential conflicts.

Last edited 10 years ago by Denis-de-Bernardy (previous) (diff)

#49 in reply to: ↑ 48 ; follow-up: @SergeyBiryukov
10 years ago

Replying to Denis-de-Bernardy:

On the appropriate date archive type where we know conflicts may occur (year, month or day), try treating the offending parameter as if it was a post ID. This is a nearly free query -- we're looking for a single post. If a post is found, serve the expected post. If not, either fallback to serving the date archive as usual, or redirect to something appropriate (e.g. day url to month url, month url to year url, year url to blog url).

Makes the most sense to me.

#50 @Denis-de-Bernardy
10 years ago

Yeah. Just to clarify, I meant "as if it was a post slug", rather than as a post ID.

#51 follow-up: @jesin
10 years ago

How about padding the slug with a leading zero?

I found another bug, not sure if there is a ticket for it.

  1. Set the permalink structure to %postname%.
  2. Create a post with slug foo-bar.
  3. Create a page with the same slug foo-bar.

Now the URL example.com/foo-bar/ loads the page created in step 2.

Problem lies in wp_unique_post_slug() which checks for uniqueness only within the post type.

#52 in reply to: ↑ 51 @SergeyBiryukov
10 years ago

Replying to jesin:

Now the URL example.com/foo-bar/ loads the page created in step 2.

See #13459.

#53 @DrewAPicture
10 years ago

  • Owner set to valendesigns

#54 in reply to: ↑ 49 @valendesigns
10 years ago

  • Keywords needs-patch needs-unit-tests added; good-first-bug has-patch removed

Replying to SergeyBiryukov:

Replying to Denis-de-Bernardy:

On the appropriate date archive type where we know conflicts may occur (year, month or day), try treating the offending parameter as if it was a post ID. This is a nearly free query -- we're looking for a single post. If a post is found, serve the expected post. If not, either fallback to serving the date archive as usual, or redirect to something appropriate (e.g. day url to month url, month url to year url, year url to blog url).

Makes the most sense to me.

I'll start looking into a better long term solution, which doesn't force a prepend or append on a numeric slug.

@valendesigns
10 years ago

#55 @valendesigns
10 years ago

  • Keywords needs-unit-tests removed

Disregard the last patch it doesn't have full coverage, but is on the right path.

@valendesigns
10 years ago

#56 @valendesigns
10 years ago

  • Keywords has-patch added; needs-patch removed

OK, so patch 5305.3.diff should solve this issue. After trying a couple different things I stumbled into WP_Rewrite::get_date_permastruct where the solution was staring me in the face. We already have some code in place that was preventing %post_id% collisions. All I did was extend the current implementation to include fixes for the offending %postname% permalink structures.

@valendesigns
10 years ago

Added a unit test.

#57 @valendesigns
10 years ago

Patch 5305.4.diff adds a unit test that fails before the fix and passes once it's applied.

#58 @valendesigns
10 years ago

  • Keywords has-patch removed

I stepped away for a minute and quickly realized there were backwards compatibility concerns due to redirect implications with this fix. I also ran all the phpunit tests, not just the post group, and got back failing tests. I'm going to keep at it, but I think we're at least moving in the right direction. I'll keep you guys posted. Sorry for all the chatter. I should have done more testing before uploading a patch.

Last edited 10 years ago by valendesigns (previous) (diff)

#59 @valendesigns
10 years ago

Hooray, I'm confident I have a solution!

However, I want to test it rigorously before I upload it this time. The previous fix was OK in theory, but it didn't work in practice. This time around I created a pre_get_posts filter that solves the issue for this and all the other associated tickets without changing the permalinks, so no backwards compatibility concerns or broken tests.

I'll do a bit more testing and upload the patch in the next day or so. In the meantime I need to know how to create a unit test post that has an ID above 1k, so I can test for one of the known year permalink conflicts. Anyone have suggestions or tricks that might be useful in artificially inflating the post ID without actually creating that many posts?

@valendesigns
10 years ago

#60 @valendesigns
10 years ago

  • Keywords has-patch needs-testing added

Patch 5305.5.diff is ready for testing! I've included 6 new unit tests and I feel confident that the issue is now solved. I couldn't account for every possible permalink structure, but I feel we now have coverage on the majority of them. As well, this fixes posts without titles, which has also been a long standing issue.

I did have to introduce a new filter, as much as I didn't want to, but we needed a way to extend this functionality to custom post types without having to use the functions I created. Please let me know if there is an outlier permalink structure that you know of that wasn't mentioned in the comments that needs to be accounted for, or you think I did something in a way that doesn't make sense so I can fix it.

Thanks Guys! And thank you @DrewAPicture for making me the owner.

@valendesigns
10 years ago

This ticket was mentioned in Slack in #core by valendesigns. View the logs.


10 years ago

#62 @valendesigns
10 years ago

  • Keywords dev-feedback added

It's been a week without any response to the latest 5305.6.diff patch, so I figure it's time to ping this thread and get the conversation going again. Has anyone had time to test the patch or look over the code?

#63 @boonebgorges
10 years ago

valendesigns - Thanks for your work on this. Broadly speaking, I think the strategy here is good: when a date query is detected and the permalink structure contains %postname%, give precedence to a post matching the postname, rather than the corresponding date archive.

I have some implementation suggestions, some large and some small. See [5305.7.diff].

First, a 'pre_get_posts' callback doesn't seem like the right place to do this. For one thing, while you've given it a priority of 1, it's still possible for plugins to intervene before it runs, which could cause various problems. It's also not clear to me why we would want the proposed to be unhookable. It seems to me that the proper place is in WP_Query::parse_query(), right after the existing date checks.

I've also simplified (and, I think, generalized) your $token_index trick. What we really care about is the location of %postname%, and whether it's preceded directly by %year% or %month% or whether it's in the 0th position. My patch tests this more directly.

I also combined the two functions into a single method. By returning from the method whenever we know there's nothing to do, we prevent the nesting that I'm assuming you were trying to avoid.

I simplified <!--nextpage--> support as well. Most of the necessary work - like the nextpage count - is done for us later on in setup_postdata(). No need to repeat it here. The only thing required is to translate the date query var into 'page'.

I've left a @todo comment on the zeroise() block. There is still a bug here. Month and day URL chunks are parsed properly with or without the leading zero (([0-9]{4})/([0-9]{1,2})/([0-9]{1,2})/page/?([0-9]{1,})/?$' - note the {1,2}). This means that 2015/2 is a potential clash when post_name=2, in addition to 2015/02 and post_name=02. Does this mean we have to do two calls to get_page_by_path()`?

@boonebgorges
10 years ago

#64 @boonebgorges
10 years ago

Forgot to mention in the above that I left out the post_type filter, and hardcoded 'post'. WP doesn't create date-based archives for CPTs out of the box. So I'm not sure that there's a natural situation where you'd have the kind of slug conflict covered in this post. I suppose it'd be possible in cases where a plugin registers its own rewrite rules for these kinds of archives. But in that case, we should probably be doing something more automatic to detect that these post types are subject to the clash in question - maybe examining the rewrite rules or something? Any thoughts are welcome.

#65 @valendesigns
10 years ago

  • Keywords dev-feedback removed

@boonebgorges Thank you for testing my patch. Also, I really like your iteration, it's very clean and concise.

To address using pre_get_posts, my thinking was doing it that way first would make it easier to test and iterate on. I just wasn't sure of the best place to jump in and take over. You seemed to have found it though.

As for <!--nextpage-->, this is something that does not act in the same manner as the last patch. For example, say I have a /%year%/%postname%/ permalink structure and I create a post named '02' and I try and view any day archive in February, the post takes over. The way I had it setup was that it took over only if <!--nextpage--> was being used. And then, only for the pages it uses. So that means if I have 2 pages for the '02' post, then /2015/02/1/ & /2015/02/2/ would match a page, but /2015/02/3/ would pass and the date archive would be shown instead. So even though setup_postdata() is where the current page count happens, we need to do our own logic to establish the paging boundaries and avoid introducing a regression.

I used zeroise() in an attempt to normalize input so it returned in a date format with a leading zero. However, after second thought, that logic may be somewhat flawed due to the reasons you mentioned above with '2' & '02' being the same as a date, but not as a post. The real issue is that the 0 is not passed in the query var, so it doesn't match the '02' post name format. I guess the question should be why is the leading zero being stripped away from the query var? We might want to start looking in that direction to avoid a second query and/or using zeroise(). If that can't be changed, then yes we'll likely need to do a second get_page_by_path().

Something to note, is that get_page_by_path() cannot find posts without titles. It will always return null in that situation. So in order to resolve that, we need to have the second check in place or update the logic in get_page_by_path() to allow for checks against ID and not just post_name. In either case, we should find a way to address resolving posts without titles where the ID collides with a date.

I realized yesterday that the post_type filter was not needed when I tested the patch further by creating a very simple CPT. I quickly found out that it was a pointless filter and I had every intention of removing it, but got sidetracked on something else. It would seem you beat me to it. ;)

I only briefly tested your new patch, but I'll put it through its paces and lob back an update on Monday. Overall though, I think we're getting pretty close to having a final solution to this long standing issue.

Thanks again for taking the time to rework the code and give feedback. Cheers!

#66 @valendesigns
10 years ago

Sorry, I take my comment back about get_page_by_path(). After further investigation and testing, I was very much wrong on that one. I'm trying to remember why I thought that it would return null in that situation, I know there was a reason for the extra get_post call in the resolve function, but for the life of me I can't remember what I was thinking when I wrote it. Disregard that part of my previous response. Though if I remember what edge case I was trying to account for I'll let you know.

@valendesigns
10 years ago

#67 @valendesigns
10 years ago

I've added three new tests and a couple fixes in patch 5305.8.diff. Two of those tests are to demonstrate the post_name=2 vs. post_name=02 issue, and that it's now been fixed. The third test demonstrates the pagination issue I mentioned above. Basically, the test would return true before the patch is applied and is expected to return true after the patch is applied. Without the latest changes, is_day would return false, which is not the expected behavior. I would have left this all alone until Monday, but I got pulled in when I started testing.

@boonebgorges
10 years ago

#68 follow-up: @boonebgorges
10 years ago

Thanks, valendesigns. You're right about nextpage - I hadn't been thinking of the case of invalid page numbers. On that note, I've added a few more tests, and tweaked the logic a bit more. Take the following two cases:

  1. A post called 02 with two instances of <!--nextpage-->. example.com/2015/02/05 is not a valid page, so should resolve to the day archive. (Your patch did this.)
  2. A post called 02 with *no* instances of <!--nextpage-->. example.com/2015/02/05 is not a valid page, so should resolve to the day archive. (Your patch did not do this, but 5305.9.diff does.)

I also reorganized the comments to narrate the flow a bit better. Let me know what you think.

#69 in reply to: ↑ 68 @valendesigns
10 years ago

@boonebgorges This looks great! Nice catch on the page 1 bug, as well. I think the only thing left to do is have more people test it out. So I'll summarize the patch for those interested in testing.

  • Added a total of twelve new tests. Two of those tests should pass and ten should fail before the patch is applied. Those tests are:
    test_date_slug_collision_should_distinguish_too_high_pagination_from_date()
    
    test_date_slug_collision_should_be_ignored_when_pagination_var_is_present_but_post_does_not_have_multiple_pages()
    
  • Resolved the following date premalink structures for numeric slugs; this includes posts without titles.
    • /post_name/
    • /post_name/page/
    • /year/post_name/
    • /year/post_name/page/
    • /year/month/post_name/
  • Accounted for day and monthnum values that are cast to integers, which strips leading zeroes. This allows us to resolve variations in post_name, like /2015/2/ and /2015/02/.
  • Added the page query var when post pagination is present and valid.
Version 1, edited 10 years ago by valendesigns (previous) (next) (diff)

#70 @boonebgorges
10 years ago

  • Keywords needs-patch added; has-patch needs-testing removed

Coming back to review this, I think there's a problem with the current approach. By intervening in WP_Query::parse_query(), we make it impossible to create a WP_Query object whose date params clash with the permalink of a post with a numeric slug. See test_numeric_slug_permalink_conflicts_should_only_be_resolved_for_the_main_query() in 5305.10.diff. What we are trying to resolve here is improperly parsed URLs, but solving the problem in WP_Query means that we're "fixing" queries that didn't originate in URLs at all.

I see two possible ways forward:

  1. Move our proposed disambiguation logic to somewhere around the 'request' filter or 'parse_request' action. The internals will have to change a bit.
  2. Keep it in WP_Query, but only run it if is_main_query().

2 is easier, but I don't think it'll be reliable. It won't fix url_to_postid(), for instance. It looks like 1 could possibly fit into parse_request() after non-public post_types are filtered out - the disambiguation is a similar kind of sanity check.

valendesigns, do you have thoughts about this?

#71 @valendesigns
10 years ago

@boonebgorges I originally had is_main_query() in the code about 5 patches ago, but I removed it before uploading and after finding that is_date worked reasonably well to keep our code out of other queries. Though, as you've pointed out we are now blocking a valid date query altogether. So yes, I agree we need to move it or rethink the implementation. I'm working on some ideas to see if it's possible to keep it where it is, but I'm not all that optimistic.

#72 @valendesigns
10 years ago

  • Keywords has-patch added; needs-patch removed

@boonebgorges I've moved the function, and how & where it's called. It now directly modifies the query vars before WP_Query is called. As well, I've added additional tests. Please take a look at 5305.11.diff and let me know what you think.

#73 @boonebgorges
10 years ago

Getting closer :)

After some conversation with valendesigns, I'm going to propose that we take a two-pronged strategy. See 5305.12.diff.

  1. During parse_request(), if we detect that a URL looks like a date query, look for conflicts with posts that have numeric slugs. (This is what 11.diff already did.)
  2. Simply doing (1) is a potentially incomplete solution, because it fixes the original bug - not being able to access a post '28' located at /2015/02/28/ - at the expense of creating another, lesser bug - not being able to access the date archive at /2015/02/28/. So we should also attempt to prevent the creation of posts with this kind of slug. 12.diff attempts a solution that is more modest, in two ways, than what has been suggested in previous discussion on this ticket.
    1. It only affects the auto-generated slug in get_sample_permalink(). If you want to override this by manually entering a conflicting slug (or by creating posts programatically), you may.
    2. It only enforces uniqueness (using the -2 etc formula used in wp_unique_post_slug()) if it's detected that the suggested post would actually create a date conflict. We look at the current permalink structure, and verify that the autogenerated post_name would be a valid value for year/monthnum/day, as appropriate. So the suffixing will only happen in very rare cases, instead of every time the post_name is numeric.

Feedback welcome. I'd especially like to know what people think about the get_sample_permalink() strategy - is it too lax? Should we be enforcing this in wp_unique_post_slug()?

#74 @valendesigns
10 years ago

@boonebgorges Sorry for the delay, I was out of town the last couple of days. This looks great! Though, I haven't tested it yet. I will tonight and get back to you. As far as the get_sample_permalink() strategy, my first thoughts are that it's good to be somewhere in the middle when enforcing unique slugs. Especially since we have a solid two prong approach that covers both sides of the fence without being overly aggressive.

This ticket was mentioned in Slack in #core by boone. View the logs.


10 years ago

This ticket was mentioned in Slack in #core by drew. View the logs.


10 years ago

#77 follow-up: @boonebgorges
10 years ago

  • Keywords 4.3-early added
  • Milestone changed from 4.2 to Future Release

Given the late date, I don't feel comfortable pulling the trigger on this for 4.2. Let's do it first thing for 4.3.

#78 in reply to: ↑ 77 ; follow-up: @valendesigns
10 years ago

Replying to boonebgorges:

Given the late date, I don't feel comfortable pulling the trigger on this for 4.2. Let's do it first thing for 4.3.

That's really unfortunate to hear. I assumed all that was left to do was get additional dev feedback. The last patch looked complete, and with all the unit tests this seemed like it would make it into 4.2 for sure. :(

#79 in reply to: ↑ 78 @boonebgorges
10 years ago

Replying to valendesigns:

Replying to boonebgorges:

Given the late date, I don't feel comfortable pulling the trigger on this for 4.2. Let's do it first thing for 4.3.

That's really unfortunate to hear. I assumed all that was left to do was get additional dev feedback. The last patch looked complete, and with all the unit tests this seemed like it would make it into 4.2 for sure. :(

The hard work will not be in vain. It's just that this ticket has been open for many years, with many of the Great Minds of WordPress trying unsuccessfully to solve the problem, so I want to be sure we have adequate time for feedback pre- and post-commit. Thanks for your patience.

#80 @valendesigns
10 years ago

I totally understand and that makes perfect sense. Cheers!

@valendesigns
9 years ago

refresh

@valendesigns
9 years ago

refresh

#81 @boonebgorges
9 years ago

  • Keywords 4.3-early removed
  • Milestone changed from Future Release to 4.3

This ticket was mentioned in Slack in #core by valendesigns. View the logs.


9 years ago

@boonebgorges
9 years ago

#83 @boonebgorges
9 years ago

valendesigns - I'm glad we let this percolate for 4.3, because in revisiting it, I found some problems :) I've run out of time to work on the patch today, so I wanted to throw it up here for you to chew on.

5305.15.diff proposes a couple of changes:

  1. Force 'wp_unique_post_slug_is_bad_[hierarchical|flat]_slug' in get_sample_permalink(). This is necessary, because when the existing $post is an auto-draft, the post_name in the database will be empty, which means that the slug won't be suffixed. I'm not a huge fan of the add/remove filter trick, but it's done elsewhere in core, and it's far simpler than any other solution I could come up with.
  1. I changed the "resolve" function name to avoid the underscore prefix.

Unfortunately, I still don't think we're ready to move on this. A couple outstanding issues that I want to throw out for consideration:

  1. The get_sample_permalink() suffixing is too aggressive. On a typical WP installation, it's only 'post' objects that will have URLs that clash with date archives. (Objects from hierarchical post types are prevented from having numeric slugs; see #11917 [13424].) So perhaps we should only be massaging the suggested permalink if the post_type is 'post'. But are there cases where a CPT might have a conflicting slug? I don't think so, but it needs some thought.
  1. I'm having second thoughts about the get_sample_permalink() strategy. Hierarchical post type objects are prevented from having numeric slugs at all; see #11917 [13424]. I don't think we have to go that far, but perhaps we should put our date-clash enforcement at the level of wp_unique_post_slug(), if only for consistency. This will mean rewriting a number of our tests, since it'll no longer be possible to create fixtures with date-clash slugs through the unit test factory.

#84 @valendesigns
9 years ago

@boonebgorges Sorry for not responding back sooner, I'll have some dedicated time this weekend to focus on this issue. But I wanted to comment that I do not think there is an issue here with CPT as a conflict cannot exists with these permalink structures. Correct me if I'm wrong here, but to me there is only an issue with posts because without the post_type query var it cannot route to a CPT. So a quick check for post sounds like a good solution to me.

#85 @boonebgorges
9 years ago

In 32604:

Improve unit tests for wp_unique_post_slug().

See #5305.

@boonebgorges
9 years ago

#86 @boonebgorges
9 years ago

Back to the drawing board with 5305.16.diff. I've moved the check into wp_unique_post_slug(). This avoids weird issues with post drafts, and more closely parallels the similar pagination checks in [13424].

16.diff does not contain the rewrite/query parsing. I'm of two minds about this. On the one hand, forcing uniqueness in wp_unique_post_slug() means that slug conflicts won't happen under normal circumstances, so it's not worth the additional overhead that wp_resolve_numeric_slug_conflicts() introduces. On the other hand, post/archive URL conflicts may still arise in the following cases: (a) legacy content, (b) changed permalink settings, (c) manual manipulation (direct SQL, filtering 'wp_unique_post_slug', etc). I could go either way here. valendesigns, what do you think?

#87 @valendesigns
9 years ago

Well 16 does remove a lot of potential overhead, so I am with you on that. Though, as you mentioned it has drawbacks with legacy slugs. Which I think is potentially the biggest issue we face, as the point of this ticket is really to fix legacy slugs.

Yes, 16 fixes future collisions, but if we go this route we're only half fixing the issue. I feel like wp_resolve_numeric_slug_conflicts should remain intact and be ran when there is a 404 or something along those lines. There needs to be a way to prevent the collision for legacy slugs for this really to be a complete fix. We need to limit the use of the function to specific scenarios so we don't introduce unnecessary overhead. Is that even possible or are we chasing a unicorn here?

@boonebgorges
9 years ago

#88 @boonebgorges
9 years ago

Which I think is potentially the biggest issue we face, as the point of this ticket is really to fix legacy slugs.

I disagree - the point of the ticket is to fix numeric slug conflicts. At any given time, all existing conflicts are "legacy", but preventing future instances of the problem is still a Good Thing.

Anyway, I was really just throwing 16 out there to see what you thought. Let's go with something like 5305.17.diff, which includes the rewrite-level conflict resolution. That patch also includes some enhancements to the unit tests so that they work properly with the new wp_unique_post_slug() changes, as well as new logic that prevents a URL like /2013/05/25 from resolving to post_name=25 if that post was not published in May 2013 (phew).

We need to limit the use of the function to specific scenarios so we don't introduce unnecessary overhead.

I think 17.diff is as focused as we're going to get. We can't run on 404, because the problem here is not 404s - the problem is that certain URLs resolve to date archives when they should resolve to single posts. So we only run the additional regex and database query when WP has parsed the request as a date archive. Anything more focused than this is probably a unicorn.

I think this is about ready to go, but I'd appreciate a last look.

#89 @boonebgorges
9 years ago

In 32647:

Disallow post slugs that will result in permalinks that conflict with date archive URLs.

On certain permalink structures, a numeric post slug will result in a post
permalink that conflicts with a date archive URL. For example, with permastruct
/%year%/%monthnum%/%postname%/, a post published in May 2015 with slug
'15' will result in a URL (/2015/05/15/) that conflicts with the archive
for May 15, 2015.

To avoid this problem, wp_unique_post_slug() rejects a requested slug when it
would generate a conflict of this type. Thus, in our example, '15' would
become '15-2'.

Props valendesigns, boonebgorges, Denis-de-Bernardy, loushou.
See #5305.

#90 @boonebgorges
9 years ago

  • Resolution set to fixed
  • Status changed from assigned to closed

In 32648:

When parsing what appears to be a date archive request, check for a post with a clashing permalink before resolving to the archive.

A URL like example.com/2015/05/15/ generally resolves to the May 15, 2015 date
archive. But in certain cases, it could also be the permalink of a post with
the slug '2015'. When a conflict of this sort is detected, resolve to the post
instead of the archive.

URL conflicts of this sort should no longer occur for new posts; see [32647].

Props valendesigns, boonebgorges, Denis-de-Bernardy.
Fixes #5305.

#91 follow-up: @hkucukali
9 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

I tested it by creating a post without title which supposed to get an id=3862.
Permalink setting was %postname%

It converted the permalink to example.com/3862-2, but url redirects to 404 still.

#92 follow-up: @valendesigns
9 years ago

Shouldn't a permalink structure have slashes? Like, /%postname%/ for example.

#93 in reply to: ↑ 92 @hkucukali
9 years ago

Replying to valendesigns:

Shouldn't a permalink structure have slashes? Like, /%postname%/ for example.

Yes, I meant /%postname%/ in the first place. Sorry for sloppy writing.

5305.17.diff is not working on "creating a post without title" case while using /%postname%/ structure. Other information that may help: I tested 5305.9.diff, that was working.

#94 in reply to: ↑ 91 @boonebgorges
9 years ago

Replying to hkucukali:

I tested it by creating a post without title which supposed to get an id=3862.
Permalink setting was %postname%

It converted the permalink to example.com/3862-2, but url redirects to 404 still.

Just tested on trunk (rev 33152):

  1. Dashboard > Settings > Permalinks - changed permalink structure to /%postname%/ and hit Save Changes.
  2. Posts > Add New
  3. Created a post with no title and hit Publish
  4. A post is created with id 67.
  5. "Permalink" is of the form example.com/67-2/.
  6. Visiting example.com/67-2/ *shows the post as expected*, not a 404.

hkucukali - Can you verify that these are the steps you've taken? And can you also verify that you are testing against trunk, and not just a patched version of 4.2.x? It's possible that you made a mistake when backporting.

Can anyone else verify one way or another whether posts with no titles are working correctly?

#95 @hkucukali
9 years ago

Exactly same steps, but at last it goes 404.
I did not test against trunk, just patched 4.2.2. Sorry for inconvenience, I wanted to notify if it helps little bit.

#96 @boonebgorges
9 years ago

hkucukali - Thanks very much for the update. valendesigns, would you mind firing up trunk to test the steps listed above? If you can confirm it's working, we'll chalk hkucukali's problems up to a bad backport to 4.2.

#97 follow-up: @valendesigns
9 years ago

  • Keywords needs-patch added; has-patch removed

I get a 404 now too without the title. Though, all my old test posts that don't have titles and the -2 in them still work without returning a 404. Whats worse is that my slugs for the old posts are being forced updated to -2 structure if I save, which would break incoming links. I kind of wish we had gone with an earlier more backwards compatible patch that allowed the URL to be whatever we want them to be and not append junk to the end of the slug.

#98 @hkucukali
9 years ago

Well, I am happy for pointing that out.

@boonebgorges
9 years ago

#99 in reply to: ↑ 97 @boonebgorges
9 years ago

Replying to valendesigns:

I kind of wish we had gone with an earlier more backwards compatible patch that allowed the URL to be whatever we want them to be and not append junk to the end of the slug.

Oh pshaw. If it were easy, it would've been done years ago :)

I did manage to reproduce the issue. Two things in 5305.18.diff:

  • When no post_name is provided to wp_insert_post(), it falls back on sanitize_title( $post_title, $post_ID ). The $post_ID fallback hasn't needed a unique check in the past, since post IDs are unique. But now that wp_unique_post_slug() is doing a bit more work, we need to run the fallback title through it. (I didn't test, but it's likely that there is an existing bug here that's unrelated to numeric slugs: if you have a existing post with slug 'foo', and then try to create a post like this wp_insert_post( array( 'post_name' => '', 'post_title' => 'Foo' ) ), you'd probably get some weirdness. The unique check proposed here will fix that bonus bug.)
  • As valendesigns notes, wp_unique_post_slug() has no business suggesting a new unique slug for an existing post under normal circumstances. I've added a check to wp_unique_post_slug() that ensures that the numeric-slug check is only performed when the slug being checked is different from the one that the post already has.

It looks like this fixes the workflows that we've discussed in the last couple comments. valendesigns and hkucukali, could you put 5305.18.diff through its paces?

#100 @valendesigns
9 years ago

Thanks Boone! I'll thoroughly test the new patch later today.

#101 @valendesigns
9 years ago

  • Keywords has-patch commit added; needs-patch removed

Looks good! No more 404's on my end, all the old test posts work, and new posts only get the extra bits when they are going to collide with a date.

#102 @boonebgorges
9 years ago

Thanks for confiming, valendesigns!

#103 @boonebgorges
9 years ago

In 33261:

When creating a new post with an empty post_name and post_title, don't generate a post_name that conflicts with a date archive permalink.

See #5305.

#104 @boonebgorges
9 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed

In 33262:

In wp_unique_post_slug(), only prevent date archive conflicts when the slug is being changed.

This prevents existing posts with numeric slugs from having their permalinks
changed on update.

Fixes #5305.

#105 @boonebgorges
9 years ago

#33089 was marked as a duplicate.

#106 @netweb
9 years ago

#33392 was marked as a duplicate.

#107 @netweb
9 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

Reopening as r33262 breaks bbPress, bbPress replies are no longer created with a unique GUID:

Related: #bbPress2844

This ticket was mentioned in Slack in #core by obenland. View the logs.


9 years ago

@boonebgorges
9 years ago

@boonebgorges
9 years ago

#109 @boonebgorges
9 years ago

The bbPress ticket cited above is not strictly caused by the changes in [33262], but [33262] did expose an ancient bug in wp_insert_post().

In cases where a post has an empty post_name - generally this happens when post_title is empty - wp_insert_post() generates a post_name from the post's ID. (See eg [33261].) The function then directly updates the database using $wpdb->update(). https://core.trac.wordpress.org/browser/tags/4.2.4/src/wp-includes/post.php#L3407

A few lines later, wp_insert_post() generates a guid for new posts, using get_permalink(). https://core.trac.wordpress.org/browser/tags/4.2.4/src/wp-includes/post.php?marks=3438#L3434

The problem is that get_permalink() generates its return value based on a *cached* post, if available. Prior to [33262], the cache for the newly created post would always be empty at this point, which means that the fresh data would always be pulled from the database. But the get_post() call introduced in [33262] means that the cache is populated before the guid is determined, meaning that get_permalink() is looking at stale data. (wp_insert_post() does clear the post cache, but only later in the function.)

The solution is to clear the post cache immediately after the database update. See 5305.20.diff.

#110 @netweb
9 years ago

Boone, thanks for this, confirming 5305.20.diff​ fixes the issue for bbPress :)

#111 @dd32
9 years ago

5305.20.diff looks good to me.

#112 @boonebgorges
9 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed

In 33630:

When generating a fallback post_name using the post ID, wp_insert_post() should clear the post cache immediately.

If the post cache is not cleared at this point, the cache can become stale
for operations performed before the cache is cleared later in the function.
Specifically, the generation of a guid for new posts can use stale data,
resulting in non-unique values. [33262] introduced a call to get_post()
that introduced just such an invalidation problem.

Fixes #5305.

This ticket was mentioned in Slack in #core by superphly. View the logs.


9 years ago

This ticket was mentioned in Slack in #core by boone. View the logs.


8 years ago

#115 @ocean90
6 years ago

#46172 was marked as a duplicate.

#116 @guigmonteiro
5 years ago

I'm using wordpress 5.2.2 and the bug still happens, but only if the post id is above 1000...
i have:

mywebsite.com.br/posttype/999 << Works fine
mywebsite.com.br/posttype/1001 << returns to archive page and the bug happens

Someone could help me with that?

Note: See TracTickets for help on using tickets.