Make WordPress Core

Opened 16 years ago

Last modified 5 years ago

#6492 reopened enhancement

Guids No Longer Have Permalink Format

Reported by: brianwhite Owned by:
Milestone: Future Release Priority: normal
Severity: normal Version: 2.5
Component: Database Keywords: needs-patch
Focuses: Cc:

Description

When you create a new post using WordPress 2.5 the GUID is created in the http://siteurl/?p=<PostId> format even when permalinks are enabled. This is because the _transition_post_status function in /wp-includes/post.php now checks if the guid is empty (which it never is) before resetting/creating it with the proper permalink structure. Line 2841 should be removed.

Attachments (3)

uuid.diff (3.0 KB) - added by Otto42 16 years ago.
UUID patch example
uuid_guid.6492.diff (1.5 KB) - added by filosofo 14 years ago.
6492.get_the_guid.patch (371 bytes) - added by r-a-y 14 years ago.


Change History (92)

#1 @lloydbudd
16 years ago

  • Milestone set to 2.6
  • Version set to 2.5

#2 @lloydbudd
16 years ago

  • Milestone changed from 2.6 to 2.5.1

#3 @Otto42
16 years ago

  • Severity changed from normal to major

#4 follow-up: @markjaquith
16 years ago

  • Milestone 2.5.1 deleted
  • Resolution set to invalid
  • Status changed from new to closed

This is intentional. The GUID is merely a unique identifier for that post. It does not have to be a URL at all. We just use a URL because our ?p=ID URLs are always going to be unique because the ID is unique to the blog, and the URL of the blog is unique to the blog. GUIDs should never, ever change.

#5 in reply to: ↑ 4 ; follow-up: @lloydbudd
16 years ago

  • Milestone set to 2.7
  • Resolution invalid deleted
  • Severity changed from major to trivial
  • Status changed from closed to reopened

ENV: wp trunk r7566 (version 2.5)

Replying to markjaquith:

This is intentional. The GUID is merely a unique identifier for that post.

It's trivial to break this, though these are definitely edge cases. Simply do a WordPress (WXR) import of content from a staging copy of the same blog. Another scenario would be deciding to restart your blog and later reimporting some of the content, maybe selectively by author. You can easily end up with multiple posts with the same GUID (as seen in the feed).

The property of uniqueness actually seems to have decreased with this change; not by much, but it does seem to have. Are there other scenarios where uniqueness has increased?

#6 @markjaquith
16 years ago

Lloyd, my mistake -- didn't realize this was a change. At one time, we used to make '?p=X' guids. (My 2005 posts seem to have a lot of them). I'm looking into the issue.

#7 @westi
16 years ago

  • Keywords regression added

The change was introduced to preserve permalinks on import, from memory; I believe it was something I committed a while ago.

Marking as a regression. Off to search for the relevant changeset and ticket.

#8 @westi
16 years ago

[6593] is the changeset and #5589 was the ticket.

Does reverting that change revert the behaviour I wonder....

#9 @lloydbudd
16 years ago

  • Milestone changed from 2.7 to 2.6

#10 follow-up: @AaronCampbell
16 years ago

I'm sorry, if the ID is unique already, and it doesn't matter what the GUID is as long as it's unique, what's its purpose exactly?

#11 in reply to: ↑ 10 @filosofo
16 years ago

Replying to AaronCampbell:

I'm sorry, if the ID is unique already, and it doesn't matter what the GUID is as long as it's unique, what's its purpose exactly?

GUID stands for Globally Unique Identifier; the problem with the ID is that it's not globally unique. For example, both your blog and mine have posts with IDs of "24."

Maybe we should hash the GUID to keep people from being tempted to use it as a permalink.

#12 @AaronCampbell
16 years ago

A hash sounds like a good idea, but technically collisions COULD occur, so it wouldn't be global...just close to it. Another advantage would be a known length, which would allow a smaller column than the current varchar(255).

#13 in reply to: ↑ 5 @lloydbudd
16 years ago

The ID is the whole URI, so the "24" isn't itself a problem, and generally a URI-based scheme is an elegant solution, but as I wrote in comment 5, it is currently possible to get duplicates even within a blog through maintenance or moving the blog.

Also, personally it irritates me when I see localhost in the GUID after doing local development for a client and then importing the data ;-)

#14 @Otto42
16 years ago

Dumb question, perhaps, but why are we reinventing the wheel on this one?

mysql> select UUID();
+--------------------------------------+
| UUID()                               |
+--------------------------------------+
| d003bd32-9427-102b-9d35-000f1f666c3b |
+--------------------------------------+

#15 follow-up: @AaronCampbell
16 years ago

The Function Otto was talking about:
UUID in MySQL 5.0

It was added in MySQL v4.1.2

Also, note that UUID() does not yet work with replication.

#16 @Otto42
16 years ago

Yeah, sorry if that was unclear...

I just don't see the value in using the permalink for this if the permalink can change anyway. What's more, as has been shown, you can get conflicts when you do imports and such, if you are importing from your own older blog and so forth.

A UUID makes more sense. It's unique across the whole of space and time (in theory). It's shorter than most permalinks. And it's built in for just this sort of usage.

Quick and dirty example patch attached.

@Otto42
16 years ago

UUID patch example

#17 @AaronCampbell
16 years ago

Unfortunately, the WordPress Requirements support down to MySQL 4.0 which doesn't have UUID(), so there would be more code involved than that. Also, since it doesn't work with replication, I wonder what problems that would cause.

#18 @Otto42
16 years ago

I think that the so-called replication problem with it basically means that if you do something like this:
INSERT INTO whatever (guid) VALUES (UUID());

And have that whatever table replicating, then the slaves will get different UUIDs than the master. The UUID function call is replicated over to the slaves, making them generate new UUIDs. If, however, you pull out a UUID and then treat it as a string instead, it works fine.

Also, I think the ante should be upped to MySQL 4.1 anyway. But, it's entirely possible to create a UUID function in PHP, or just snag it from free code elsewhere, if the MySQL 4.0 support is such a big requirement.
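Otto's suggestion of generating the UUID in PHP can be sketched as follows. This is a minimal version 4 (random) UUID generator, assuming PHP 7 or later for `random_bytes()`; `uuid_v4` is a hypothetical name, not a WordPress function. Because the value is produced in PHP and written as an ordinary string, a replica would receive the literal UUID rather than re-evaluating `UUID()`, which is the replication issue described above.

```php
<?php
// Hypothetical helper: RFC 4122 version 4 (random) UUID.
// Assumes PHP 7+, where random_bytes() draws from a CSPRNG.
function uuid_v4(): string
{
    $bytes = random_bytes(16);

    // Version: force the high nibble of byte 6 to 0100 (version 4).
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40);
    // Variant: force the top two bits of byte 8 to 10 (RFC 4122 variant).
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);

    // 32 hex digits split into 8 groups of 4, re-joined as 8-4-4-4-12.
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

echo uuid_v4(), "\n";
```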

#19 @generalkeebler
16 years ago

I've run headfirst into this issue because wp_link_pages seems to use the GUID. Now all my pagination links are "ugly". According to filosofo, this shouldn't be the case. Should wp_link_pages using GUID be its own bug report?

#20 @Otto42
16 years ago

wp_link_pages doesn't use the GUID. It uses the get_permalink function, as it should.

#21 @generalkeebler
16 years ago

My fault. wp_link_pages ditches get_permalink and uses the default URL for drafts/pending for whatever reason. Should've dug in the code first, sorry.

#22 @Denis-de-Bernardy
15 years ago

  • Keywords dev-feedback added; regression removed

Why not use php's uniqid() function?
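A quick sketch of what `uniqid()` offers. It is derived from the current time in microseconds, so on its own it is only unique on one host at one moment; prefixing the site host, as below, is one assumed way of scoping it, and `site_scoped_uniqid` is a hypothetical name.

```php
<?php
// Hypothetical helper: scope uniqid() to a site so that two blogs
// generating IDs in the same microsecond do not produce the same value.
function site_scoped_uniqid(string $site_url): string
{
    $host = parse_url($site_url, PHP_URL_HOST) ?: $site_url;
    // more_entropy = true appends extra digits (23 chars instead of 13),
    // lowering the chance that two calls in one microsecond collide.
    return $host . ':' . uniqid('', true);
}

echo site_scoped_uniqid('http://example.com'), "\n";
```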

#23 follow-up: @fix22
15 years ago

This is causing issues for a plugin I developed. It used the wp_posts.guid to get the full permalink to a post.

How can I get the permalink from the database now? If I retrieve this field in WP 2.5+, it only comes back with an ugly permalink (/?p=post_nr), not a properly rewritten one as in /%postname%/.

Also, the documentation says that this is still the permalink: http://codex.wordpress.org/Function_Reference/get_post

#24 in reply to: ↑ 23 ; follow-up: @DD32
15 years ago

Replying to fix22:

How can I get the permalink from the database now? If I retrieve this field in WP 2.5+, it only comes back with an ugly permalink (/?p=post_nr), not a properly rewritten one as in /%postname%/.

You've got 2 choices:

  • Manually create the URL based on the details in the DB
  • Include wp-load.php in your script, let it load all of WP, then call get_permalink($id)

Your other option is to filter the data and always set the GUID to the permalink (or even to add a meta field holding the current permalink).

#25 @Denis-de-Bernardy
15 years ago

When the post gets published, cache the permalink in a hidden post meta field. (Don't forget to keep it up to date when the permalink structure or the post changes.) Then, use a join to retrieve it.

#26 in reply to: ↑ 24 @fix22
15 years ago

You've got 2 choices:

  • Manually create the URL based on the details in the DB
  • Include wp-load.php in your script, let it load all of WP, then call get_permalink($id)

Thanks! In what version was get_permalink($id) introduced to WP?

#27 @DD32
15 years ago

Thanks! In what version was get_permalink($id) introduced to WP?

1.0.0 apparently. http://trac.wordpress.org/browser/trunk/wp-includes/link-template.php#L71

#28 @jidanni
15 years ago

  • Cc jidanni@… added

#9183 has (manually, dang nab it) been marked as a duplicate of this bug.

Gentlemen, observe the daring GUIDs at
http://www.coolloud.org.tw/tag/%E6%96%B0%E7%A7%BB%E6%B0%91/feed
E.g.,

<guid isPermaLink="false">35528 at http://www.coolloud.org.tw</guid>

which makes it quite clear that these aren't intended to be URLs,
isPermaLink or not.

Hmmm,

123 at http://nerds-are-us.net
123 at http://nerds-are-us.net/
123@nerds-are-us.net #like Message-IDs
123 at nerds-are-us.net

the last seems like a winner... gets the point (not a URL, not an
email) across. But fragile due to embedded spaces.
Anyway, maybe there's an Internet RFC for this...
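A sketch of jidanni's last candidate, building "<post ID> at <host>" from a post ID and the site URL. `tagged_guid` is a hypothetical name, and reducing the site URL to its lowercased host is an assumption; the result still contains the embedded spaces he flags as fragile.

```php
<?php
// Hypothetical helper: build a "<post ID> at <host>" GUID string.
function tagged_guid(int $post_id, string $site_url): string
{
    // Keep only the host, so http/https or a trailing slash do not
    // yield different GUIDs for the same site.
    $host = parse_url($site_url, PHP_URL_HOST);
    if ($host === null || $host === false) {
        throw new InvalidArgumentException("No host in URL: $site_url");
    }
    return $post_id . ' at ' . strtolower($host);
}

echo tagged_guid(123, 'http://nerds-are-us.net/'), "\n"; // 123 at nerds-are-us.net
```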

#30 @Denis-de-Bernardy
15 years ago

  • Keywords has-patch tested added
  • Milestone changed from 2.9 to 2.8
  • Type changed from defect (bug) to enhancement

#31 @ryan
15 years ago

  • Milestone changed from 2.8 to Future Release
  • Resolution set to fixed
  • Status changed from reopened to closed

UUID() requires MySQL 4.1.2. We'd have to bump our MySQL requirement first.

#32 @ryan
15 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

#33 @Denis-de-Bernardy
15 years ago

  • Keywords dev-feedback removed

#34 @Denis-de-Bernardy
15 years ago

  • Milestone changed from Future Release to 2.9

#35 @Denis-de-Bernardy
15 years ago

  • Keywords needs-patch added; has-patch tested removed

Borked patch.

#36 @westi
14 years ago

  • Milestone changed from 2.9 to Future Release

As we are now feature frozen it is too late to include this into 2.9.

Moving to Future Release as there is currently no patch.

#37 @filosofo
14 years ago

  • Keywords has-patch guid added; needs-patch removed
  • Milestone changed from Future Release to 3.0

I refreshed Otto's patch, which we can now use because 2.8 requires the use of MySQL 4.1.2.

#38 @filosofo
14 years ago

I mean 2.9, not 2.8.

#39 @Denis-de-Bernardy
14 years ago

  • Keywords bug-hunt added

#40 @Denis-de-Bernardy
14 years ago

  • Keywords featured added; bug-hunt removed

#41 @nacin
14 years ago

  • Milestone changed from 3.0 to Future Release

#42 @filosofo
14 years ago

  • Cc jidanni@… removed
  • Milestone changed from Future Release to 3.1

This has a patch. It's just waiting for commit.

#43 @r-a-y
14 years ago

Instead of setting the GUID when permalinks are saved, you could also offer developers a filter on "get_the_guid":

function get_the_guid( $id = 0 ) {
	$post = &get_post($id);

	return apply_filters('get_the_guid', $post->guid, $post );
}

Attached patch above this comment.

#44 @filosofo
14 years ago

  • Milestone Awaiting Triage deleted
  • Resolution set to wontfix
  • Status changed from reopened to closed

Not going anywhere.

#45 @nacin
14 years ago

  • Milestone set to Awaiting Review

I think this is a good idea to move to, especially now that we're bumping minimum requirements (and have reached the point where we can use UUID now).

Perhaps getting a UUID should be a specific function or $wpdb method?

#46 @nacin
14 years ago

  • Resolution wontfix deleted
  • Status changed from closed to reopened

#47 @Denis-de-Bernardy
14 years ago

Here's a PHP implementation of v4 UUIDs for PHP 5.3, in case it helps:

	/**
	 * Generates random bytes for use in UUIDs and password salts, using
	 * (when available) a cryptographically strong random number generator.
	 *
	 * @param integer $bytes The number of random bytes to generate
	 * @return string Random bytes
	 */
	public static function random($bytes) {
		$source = static::$_source ?: static::_source();
		return $source($bytes);
	}

	/**
	 * Initializes Crypto::$_source using the best available random
	 * number generator.
	 *
	 * When available, /dev/urandom and COM are used on *nix and
	 * Windows systems, respectively.
	 *
	 * If all else fails, a Mersenne Twister gets used. (Strictly
	 * speaking, this fallback is inadequate, but good enough.)
	 *
	 * @return Closure The random number generator.
	 */
	protected static function _source() {
		switch (true) {
			case isset(static::$_source):
				return static::$_source;

			case is_readable('/dev/urandom') && $fp = fopen('/dev/urandom', 'rb'):
				return static::$_source = function($bytes) use (&$fp) {
					return fread($fp, $bytes);
				};

			case class_exists('COM', 0):
				// http://msdn.microsoft.com/en-us/library/aa388182(VS.85).aspx
				try {
					$com = new COM('CAPICOM.Utilities.1');
					return static::$_source = function($bytes) use ($com) {
						return base64_decode($com->GetRandom($bytes,0));
					};
				} catch (Exception $e) {
				}

			default:
				// fallback to using mt_rand() if all else fails
				return static::$_source = function($bytes) {
					$rand = '';
					for ($i = 0; $i < $bytes; $i++) {
						$rand .= chr(mt_rand(0, 255));
					}
					return $rand;
				};
		}
	}

	/**
	 * UUID related constants
	 */
	const clearVer = 15;  // 00001111  Clears all bits of version byte
	const version4 = 64;  // 01000000  Sets the version bit
	const clearVar = 63;  // 00111111  Clears relevant bits of variant byte
	const varRFC   = 128; // 10000000  The RFC 4122 variant

	/**
	 * Generates an RFC 4122-compliant version 4 UUID.
	 *
	 * @return string The string representation of an RFC 4122-compliant, version 4 UUID.
	 * @link http://www.ietf.org/rfc/rfc4122.txt
	 */
	public static function uuid() {
		$uuid = Crypto::random(16);

		// Set version
		$uuid[6] = chr(ord($uuid[6]) & static::clearVer | static::version4);

		// Set variant
		$uuid[8] = chr(ord($uuid[8]) & static::clearVar | static::varRFC);

		// Return the uuid's string representation
		return bin2hex(substr($uuid, 0, 4)) . '-'
			. bin2hex(substr($uuid, 4, 2)) . '-'
			. bin2hex(substr($uuid, 6, 2)) . '-'
			. bin2hex(substr($uuid, 8, 2)) . '-'
			. bin2hex(substr($uuid, 10, 6));
	}
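The masks in `uuid()` above force the version nibble to 4 and the top two variant bits to `10`. A small checker that reads those fields back out of a UUID string can help when testing any of the generators discussed in this ticket. A sketch; `uuid_version` is a hypothetical name.

```php
<?php
// Hypothetical helper: return the RFC 4122 version of a UUID string,
// or null if it is malformed or not of the RFC 4122 variant.
function uuid_version(string $uuid): ?int
{
    $pattern = '/^[0-9a-f]{8}-[0-9a-f]{4}-([0-9a-f])[0-9a-f]{3}-([0-9a-f])[0-9a-f]{3}-[0-9a-f]{12}$/i';
    if (!preg_match($pattern, $uuid, $m)) {
        return null;
    }
    // Variant: top two bits of the 17th hex digit must be 10,
    // i.e. the digit is 8, 9, a or b.
    if ((hexdec($m[2]) & 0xc) !== 0x8) {
        return null;
    }
    // Version: the 13th hex digit (first digit of the third group).
    return (int) hexdec($m[1]);
}

// The MySQL UUID() output quoted earlier in this ticket is version 1.
var_dump(uuid_version('d003bd32-9427-102b-9d35-000f1f666c3b')); // int(1)
```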

#48 follow-up: @Otto42
14 years ago

Might be worth using a v3 UUID in the URL Namespace, as defined by RFC 4122. Here's some easy code to do that, just pass it a valid URL (like the permalink):

function uuid_v3_url($url) {
	$nhex = '6ba7b8119dad11d180b400c04fd430c8'; // Namespace_URL as defined in RFC 4122 (6ba7b811-9dad-11d1-80b4-00c04fd430c8)
	$nstr = '';
	// Convert the namespace UUID from hex into its 16 raw bytes.
	for($i = 0; $i < strlen($nhex); $i+=2) {
		$nstr .= chr(hexdec($nhex[$i].$nhex[$i+1]));
	}
	// Version 3: MD5 of the namespace bytes followed by the name (the URL).
	$hash = md5($nstr . $url);
	$uuid = sprintf('%08s-%04s-%04x-%04x-%12s',
		substr($hash, 0, 8),
		substr($hash, 8, 4),
		(hexdec(substr($hash, 12, 4)) & 0x0fff) | 0x3000, // set version 3
		(hexdec(substr($hash, 16, 4)) & 0x3fff) | 0x8000, // set RFC 4122 variant
		substr($hash, 20, 12)
	);
	return $uuid;
}

Example usage:
echo uuid_v3_url('http://example.com');

This form of UUID is based on the uniqueness of the URL. Giving it the same URL will give you the same result every time.

#49 @Otto42
14 years ago

Whoops, I left a var_dump in there from testing. Just remove that line.
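Otto's `uuid_v3_url()` hashes the namespace bytes followed by the name, then overwrites the version and variant bits. The same RFC 4122 construction, parameterised on namespace and hash, yields version 3 (MD5) or version 5 (SHA-1, which RFC 4122 prefers when backward compatibility is not a concern). A sketch assuming PHP 7+; `uuid_name_based` is a hypothetical name. The check values are the ones shown in the Python `uuid` module documentation for the DNS namespace and the name `python.org`.

```php
<?php
// Hypothetical helper: RFC 4122 name-based UUID (version 3 or 5).
// $namespace is a UUID string such as NameSpace_DNS or NameSpace_URL.
function uuid_name_based(string $namespace, string $name, int $version): string
{
    $ns_bytes = hex2bin(str_replace('-', '', $namespace));

    if ($version === 3) {
        $hash = md5($ns_bytes . $name, true);                 // 16 raw bytes
    } elseif ($version === 5) {
        $hash = substr(sha1($ns_bytes . $name, true), 0, 16); // first 16 of 20 bytes
    } else {
        throw new InvalidArgumentException('Only versions 3 and 5 are name-based.');
    }

    $hash[6] = chr((ord($hash[6]) & 0x0f) | ($version << 4)); // version nibble
    $hash[8] = chr((ord($hash[8]) & 0x3f) | 0x80);            // RFC 4122 variant

    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($hash), 4));
}

const NS_DNS = '6ba7b810-9dad-11d1-80b4-00c04fd430c8';
const NS_URL = '6ba7b811-9dad-11d1-80b4-00c04fd430c8';

echo uuid_name_based(NS_DNS, 'python.org', 3), "\n"; // 6fa459ea-ee8a-3ca4-894e-db77e160355e
echo uuid_name_based(NS_DNS, 'python.org', 5), "\n"; // 886313e1-3b8a-5372-9b90-0c9aee199e5d
echo uuid_name_based(NS_URL, 'http://example.com', 3), "\n";
```

As with Otto's version, the same namespace and name always give the same UUID, so a name-based GUID changes whenever the URL it was derived from changes.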

#50 @nacin
13 years ago

  • Milestone Awaiting Review deleted
  • Resolution set to wontfix
  • Status changed from reopened to closed

#51 @Otto42
13 years ago

  • Resolution wontfix deleted
  • Status changed from closed to reopened

WordPress 3.2 is moving to require MySQL 5. Suggest reconsideration of this for 3.3.

#52 @Otto42
13 years ago

  • Milestone set to Awaiting Review

#53 in reply to: ↑ 15 @aaroncampbell
13 years ago

Replying to AaronCampbell:

The Function Otto was talking about:
UUID in MySQL 5.0

It was added in MySQL v4.1.2

Also, note that UUID() does not yet work with replication.

Moving to use UUID() wouldn't be that difficult. I'm still a little unsure of how it not working with replication will affect us.

#54 follow-up: @arnee
13 years ago

Why not just use a md5/sha1(microtime() . $blog_url)? That should be pretty unique and works everywhere?
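arnee's proposal, sketched literally; `hashed_guid` is a hypothetical name. The result is 40 hex digits rather than a UUID-formatted value, and two calls on the same site within the same microsecond return the same string, which is the collision tolerance question Denis raises below.

```php
<?php
// Hypothetical helper: arnee's proposal, sha1(microtime() . $blog_url).
function hashed_guid(string $blog_url): string
{
    // microtime() returns "msec sec" (e.g. "0.12345600 1234567890"), so the
    // hashed input changes every microsecond; the URL scopes it to one site.
    return sha1(microtime() . $blog_url);
}

echo hashed_guid('http://example.com'), "\n";
```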

#55 in reply to: ↑ 54 @Denis-de-Bernardy
13 years ago

Replying to arnee:

Why not just use a md5/sha1(microtime() . $blog_url)? That should be pretty unique and works everywhere?

Because uuid() is even more unique, probably faster, and works everywhere too? :-)

#56 follow-up: @arnee
13 years ago

Because uuid() is even more unique, probably faster, and works everywhere too? :-)

I'm still a little unsure of how it not working with replication will affect us.

Sounds like you already need workarounds for some cases ;-)

Also the md5/sha1 value is available before and after INSERT, while the UUID() has to be queried again.

Last edited 13 years ago by arnee

#57 in reply to: ↑ 56 @Denis-de-Bernardy
13 years ago

Replying to arnee:

Also the md5/sha1 value is available before and after INSERT, while the UUID() has to be queried again.

You can generate a uuid using php, e.g.:

http://rad-dev.org/forks/ddebernardy/lithium/source/branches/uuid/libraries/lithium/util/String.php

http://rad-dev.org/forks/ddebernardy/lithium/source/branches/uuid/libraries/lithium/security/Crypto.php

#58 follow-up: @arnee
13 years ago

Sorry, I don't really see the benefit of including a 300-line library for generating "just" a random string. I think a sha1 hash of microtime() and the blog URL is enough random and unique for this case, a call to /dev/urandom or initializing COM objects sounds just a bit excessive, no?

#59 in reply to: ↑ 58 @Denis-de-Bernardy
13 years ago

Replying to arnee:

Sorry, I don't really see the benefit of including a 300-line library for generating "just" a random string. I think a sha1 hash of microtime() and the blog URL is enough random and unique for this case, a call to /dev/urandom or initializing COM objects sounds just a bit excessive, no?

That entirely depends on how much what you're trying to do can cope with collisions.

Also, FWIW, it's not "just" a random string. It's a 128-bit integer.

#60 follow-up: @westi
13 years ago

I'm not sure there is significant benefit for most installs in switching to UUID, which, from what I can tell from a quick Google search, is MySQL-specific, so UUID generation would probably have to become part of the wpdb interface.

I think it might be better to at least have a filter so that plugins could use a different GUID generation scheme for new posts.

#61 in reply to: ↑ 60 @Denis-de-Bernardy
13 years ago

Replying to westi:

I'm not sure there is significant benefit for most installs in switching to UUID, which, from what I can tell from a quick Google search, is MySQL-specific, so UUID generation would probably have to become part of the wpdb interface.

On the small benefit, I tend to agree, since the core devs expressed no interest in using it downstream in the editor and introduced the new draft types instead. Re-opening this ticket feels a bit like beating a dead horse, so I suggest we close it again (possibly with a filter, as you suggest).

Your Googling is giving you incorrect impressions, by the way.

For one, other DB products support it under other names: GUID, UUID, Identity, Unique Identifier, etc. More importantly, it has less to do with being a native DB type (leading to efficient storage and indexing) than it has to do with having a means to generate a unique number that can be used as a surrogate key on distributed systems.

With respect to the latter point, PostgreSQL offers the UUID type but no means to generate one unless you install a contrib module. Best I recollect the pg-hackers discussion, one of the arguments was that generating them can, and usually should, be done at the application layer anyway.

#62 @aaroncampbell
13 years ago

I don't particularly care how we get the unique hash, I'd just prefer that it be a hash of some kind. The biggest issue here seems to be the huge number of new developers that grab a post from the database, look at the data, see that guid is the URL, and use it accordingly.

#63 @braydonf
12 years ago

Doesn't the GUID have to be a URL for all media? Related ticket concerning media URLs stored as a GUID value: #19110

#64 in reply to: ↑ 48 ; follow-up: @braydonf
12 years ago

Replying to Otto42:

(...)

This form of UUID is based on the uniqueness of the URL. Giving it the same URL will give you the same result every time.

Basing it on the URL really wouldn't be much different from just using the URL itself, especially since revisions wouldn't get a different GUID. For example, if a post is published and read by various feed readers under one GUID, a new revision would change the GUID, signalling that what the feed reader last read has changed and is no longer valid. This is especially sensitive for posts with time-sensitive information, such as PayPal buttons for items available in limited quantities. I've been working on a feed reader, and because posts can be updated, I need to verify that the title and content have not changed; if they have, I update the post.

#65 in reply to: ↑ 64 ; follow-up: @Otto42
12 years ago

Replying to braydonf:

Basing it on the URL really wouldn't be much different from just using the URL itself, especially since revisions wouldn't get a different GUID.

The URL can change, but the GUID should never change even if you change the URL. The GUID shouldn't change because the post content changes.

For example, if a post is published and read by various feed readers under one GUID, a new revision would change the GUID, signalling that what the feed reader last read has changed and is no longer valid. This is especially sensitive for posts with time-sensitive information, such as PayPal buttons for items available in limited quantities. I've been working on a feed reader, and because posts can be updated, I need to verify that the title and content have not changed; if they have, I update the post.

Different problem. The GUID is an identifier for the post. A new GUID means a new post, not an update to an old one.

#66 in reply to: ↑ 65 ; follow-up: @braydonf
12 years ago

Replying to Otto42:

Replying to braydonf:

Basing it on the URL really wouldn't be much different from just using the URL itself, especially since revisions wouldn't get a different GUID.

The URL can change, but the GUID should never change even if you change the URL. The GUID shouldn't change because the post content changes.

So the GUID shouldn't be based on the URL, but what else would it be based on, if it were a UUID, MD5, or SHA-1 hash that wouldn't change? Or could the GUID be a randomly generated hash that serves as its own point of reference? However, if the GUID isn't a permalink, how would the permalink be sent to RSS readers, since that is what is currently used?

For example, if a post is published and read by various feed readers under one GUID, a new revision would change the GUID, signalling that what the feed reader last read has changed and is no longer valid. This is especially sensitive for posts with time-sensitive information, such as PayPal buttons for items available in limited quantities. I've been working on a feed reader, and because posts can be updated, I need to verify that the title and content have not changed; if they have, I update the post.

Different problem. The GUID is an identifier for the post. A new GUID means a new post, not an update to an old one.

In feeds the GUID would need to stay the same, you're right; otherwise there would be no point of reference.

#67 in reply to: ↑ 66 ; follow-up: @Otto42
12 years ago

Replying to braydonf:

So the GUID shouldn't be based on the URL, but what else would it be based on, if it were a UUID, MD5, or SHA-1 hash that wouldn't change? Or could the GUID be a randomly generated hash that serves as its own point of reference?

The GUID can be anything that is "globally unique".

However, if the GUID isn't a permalink, how would the permalink be sent to RSS readers, since that is what is currently used?

No, it's not. The GUID is *not* a permalink, and the spec specifically says not to treat it as one. It even has an isPermaLink="false" attribute, just to push this idea across properly.

The <link> tag contains the permalink to the post.

In feeds the GUID would need to stay the same, you're right; otherwise there would be no point of reference.

The GUID needs to stay the same, period. And the GUID should only be used in feeds. The fact that it's being used for other things is what is wrong.
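To make the `<guid>` versus `<link>` distinction concrete, here is a sketch of an RSS 2.0 item that carries the permalink in `<link>` and a stable identifier in `<guid isPermaLink="false">`. `render_rss_item` is a hypothetical helper, not WordPress's feed template, and the sample values are made up.

```php
<?php
// Hypothetical helper: render a minimal RSS 2.0 <item>.
// <link> carries the (changeable) permalink; <guid> carries a stable ID.
function render_rss_item(string $title, string $permalink, string $guid): string
{
    $esc = function (string $s): string {
        return htmlspecialchars($s, ENT_QUOTES | ENT_XML1, 'UTF-8');
    };
    return "<item>\n"
        . "  <title>" . $esc($title) . "</title>\n"
        . "  <link>" . $esc($permalink) . "</link>\n"
        . "  <guid isPermaLink=\"false\">" . $esc($guid) . "</guid>\n"
        . "</item>\n";
}

echo render_rss_item(
    'Hello world',
    'http://example.com/2008/04/hello-world/',
    'urn:uuid:6fa459ea-ee8a-3ca4-894e-db77e160355e'
);
```

If the permalink structure changes later, only `<link>` differs; readers keyed on `<guid>` still see the same item.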

#68 in reply to: ↑ 67 @braydonf
12 years ago

Replying to Otto42:

Replying to braydonf:

So the GUID shouldn't be based on the URL, but what else would it be based on, if it were a UUID, MD5, or SHA-1 hash that wouldn't change? Or could the GUID be a randomly generated hash that serves as its own point of reference?

The GUID can be anything that is "globally unique".

A UUID would be unique here, since the probability of a collision is extremely small: after generating 70,368,744,177,664 random UUIDs, the probability of a duplicate is about 0.0000000004.

Ref: http://en.wikipedia.org/wiki/Universally_unique_identifier#Random_UUID_probability_of_duplicates
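The quoted figure can be checked with the usual birthday approximation. A version 4 UUID has 122 random bits (the other 6 are the fixed version and variant bits), so for n random UUIDs the chance of at least one duplicate is roughly p = 1 - exp(-n^2 / 2^123). For n = 2^46 = 70,368,744,177,664 that is about 2^-31, roughly 4.66 x 10^-10, in line with the figure above. The arithmetic in PHP, as a sketch (`uuid4_collision_probability` is a hypothetical name):

```php
<?php
// Birthday-bound estimate of at least one collision among $n random
// version 4 UUIDs, each carrying 122 random bits.
function uuid4_collision_probability(float $n): float
{
    $space = 2.0 ** 122;               // number of distinct v4 UUIDs
    $x = ($n * $n) / (2.0 * $space);   // approx. expected number of colliding pairs
    return -expm1(-$x);                // 1 - e^(-x), accurate for tiny x
}

printf("%.3e\n", uuid4_collision_probability(70368744177664)); // 4.657e-10
```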

#69 @toscho
12 years ago

  • Cc info@… added

#70 @ruckus
11 years ago

  • Cc kimmo@… added

#71 @iseulde
11 years ago

  • Component changed from General to Database

#72 @wonderboymusic
11 years ago

  • Keywords needs-patch added; has-patch guid featured removed
  • Milestone changed from Awaiting Review to 3.7

Let's resurrect this

#73 @jeremyfelt
11 years ago

  • Cc jeremy.felt@… added
  • Severity changed from trivial to normal

Lots of +1

Related (for comparing uniqueness during an import) #18315

Last edited 11 years ago by jeremyfelt

#74 follow-up: @nacin
11 years ago

What's the plan here? I see a few:

  1. Stop using GUIDs for anything other than UUID usage. Attachments are guilty here.
  2. Start using real UUIDs for GUID.

I'm for 1. I'm not sure 2 is worth the pain.

#75 @wonderboymusic
11 years ago

Looks like customizer is the only thing that uses GUIDs to display images (background/header #yolo) - the rest of the time it is used to short circuit the filename or fake a title by doing basename( $guid ) or something

Last edited 11 years ago by wonderboymusic

#76 follow-up: @jeremyfelt
11 years ago

  • Milestone changed from 3.7 to Future Release

We should approach this again in the future. Decisions should probably be made early in a cycle so that thorough testing can happen.

#77 in reply to: ↑ 76 @braydonf
10 years ago

Replying to jeremyfelt:

We should approach this again in the future. Decisions should probably be made early in a cycle so that thorough testing can happen.

Likely problem would be that any plugin code that incorrectly uses the GUID as a link would not work correctly, and would need to be updated.

#78 @here
9 years ago

There is also no verification that a newly created guid be unique. Out of the box, guids are unreliable as unique identifiers.

In my case, an imported post has the GUID "http://example.com/?p=100", yet the site only has 90 posts. As a result, the 10th newly created post gets post ID 100 and ends up with a duplicate GUID.

Since we are relying on guids to be unique, this breaks things.
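The collision described above can be reproduced in isolation: the default GUID is built from the site URL and the new post ID, and nothing compares it with GUIDs that arrived via import. A sketch assuming the `?p=ID` format from the ticket description; `default_guid` and `guid_collides` are hypothetical helpers, not core functions.

```php
<?php
// Hypothetical: the "?p=ID" GUID format a new post receives by default.
function default_guid(string $site_url, int $post_id): string
{
    return rtrim($site_url, '/') . '/?p=' . $post_id;
}

// Hypothetical check that core does not perform: is this GUID already taken?
function guid_collides(string $guid, array $existing_guids): bool
{
    return in_array($guid, $existing_guids, true);
}

// An imported post kept its GUID from the source site...
$existing = ['http://example.com/?p=100'];

// ...and locally created post ID 100 later receives the same default GUID.
$new = default_guid('http://example.com', 100);
var_dump(guid_collides($new, $existing)); // bool(true)
```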

Filtering wp_insert_post_data will allow an override to generate and use UUIDs or other unique identifier, but this definitely feels like something worth addressing in core.

Are there still examples in core, like attachments and the Customizer, that are known to rely on GUIDs for URI or path information? Let's confirm and fix those first (see comments 74 and 75).

#79 in reply to: ↑ 74 @here
8 years ago

Perceived goal of this 8 year old ticket: Reliable unique identifiers for WordPress posts table.

Replying to nacin:

What's the plan here? I see a few:

  1. Stop using GUIDs for anything other than UUID usage. Attachments are guilty here.
  2. Start using real UUIDs for GUID.
  3. Require folks who need a UUID to filter or ignore GUID.
  4. Add new UUID field and leave GUID in place for compatibility?

If the current guid field didn't exist, adding a unique identifier would be an obvious enhancement. Since we do not, in fact, currently have a working unique identifier, let's add one.

Changing the existing guid field may break things in core and 3rd party plugins, so we can leave it there as deprecated.

#80 @here
8 years ago

Related to #18286 where adding a new uuid field was also discussed.

#81 @here
8 years ago

Of note, the rewritten WordPress Importer will no longer be updating the guid to match the target.

https://make.wordpress.org/core/2015/11/18/wordpress-importer-redux/

"... Removing the GUID change means we can easily check if an attachment has been imported already. The downside is that plugins still using the GUID may break; this needs to be fixed in the plugins in question, as the GUID is already an unreliable reference to the image."

#82 follow-up: @rmccue
8 years ago

FWIW, I am hugely in favor of using actual GUIDs/UUIDs instead of URIs. One of the issues with using URIs is that people want to change them to hide their development environment host/etc, which causes issues for deduplication on imports.

(There was only a single instance of the GUID being used in core UI that I found while working on the importer, but that was fixed in #33386.)

#83 follow-up: @rmccue
8 years ago

I wrote a plugin to generate actual UUIDs for the GUID. It also uses the urn:uuid: syntax so it's compatible with people expecting URLs. It adds urn: as an allowed protocol, although we can probably get away with just not running esc_url_raw.
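RFC 4122 defines `urn:uuid:` as the URN namespace for UUIDs, which is why that form still has a scheme for code expecting a URI. A sketch of wrapping and unwrapping it; the helper names are hypothetical and this is not the code of rmccue's plugin.

```php
<?php
// Hypothetical helpers for storing a UUID as an RFC 4122 URN.
function uuid_to_urn(string $uuid): string
{
    return 'urn:uuid:' . strtolower($uuid);
}

// Returns the bare UUID, or null if $guid is not a urn:uuid: value.
function urn_to_uuid(string $guid): ?string
{
    $prefix = 'urn:uuid:';
    if (strncasecmp($guid, $prefix, strlen($prefix)) !== 0) {
        return null;
    }
    return strtolower(substr($guid, strlen($prefix)));
}

$guid = uuid_to_urn('D003BD32-9427-102B-9D35-000F1F666C3B');
echo $guid, "\n";              // urn:uuid:d003bd32-9427-102b-9d35-000f1f666c3b
echo urn_to_uuid($guid), "\n"; // d003bd32-9427-102b-9d35-000f1f666c3b
```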

#84 @mattheu
8 years ago

Huge +1 from me. Seems like a neat solution.

#85 in reply to: ↑ 82 ; follow-up: @Otto42
8 years ago

Replying to rmccue:

(There was only a single instance of the GUID being used in core UI that I found while working on the importer, but that was fixed in #33386.)

Does that adequately address nacin's point 1 from above?

  1. Stop using GUIDs for anything other than UUID usage. Attachments are guilty here.

If so, then addressing point 2 becomes possible, however we may want to consider any impact on plugins using the guid field incorrectly. Might be worth a scan.

#86 in reply to: ↑ 85 @rmccue
8 years ago

Replying to Otto42:

Does that adequately address nacin's point 1 from above?

  1. Stop using GUIDs for anything other than UUID usage. Attachments are guilty here.

If so, then addressing point 2 becomes possible, however we may want to consider any impact on plugins using the guid field incorrectly. Might be worth a scan.

It doesn't change GUIDs for attachments at all yet. The primary issue with these is that at one point, WordPress used the GUID for the filename itself, and that was the recommended way of getting the direct file URL. If we change this, we technically break BC, which is a problem. I'm not sure whether we want to consider that.

Core already stopped using GUIDs, I believe #33386 mopped up the last instances of it.

Apart from attachments, all other GUIDs become UUIDs, but only for new posts. Old posts won't change.

#87 in reply to: ↑ 83 @keithcancel
8 years ago

Replying to rmccue:

I wrote a plugin to generate actual UUIDs for the GUID. It also uses the urn:uuid: syntax so it's compatible with people expecting URLs. It adds urn: as an allowed protocol, although we can probably get away with just not running esc_url_raw.

I looked at your plugin. Would you recommend registering a protocol, as you did in your plugin, for #36928? Honestly, not having to prepend the GUID with http (or any protocol at all) would be nice, so getting rid of esc_url_raw would be nice too.

Edit: wouldn't adding a unique constraint to the guid column also be a good idea?

Last edited 8 years ago by keithcancel

#88 @dd32
6 years ago

#42312 was marked as a duplicate.

#89 @thomasprice61
5 years ago

Now that wp_generate_uuid4 has been introduced in 4.7, will this be reopened?
