Make WordPress Core

Opened 17 years ago

Closed 15 years ago

Last modified 15 years ago

#5183 closed enhancement (wontfix)

General Meta-Data Table

Reported by: filosofo's profile filosofo Owned by:
Milestone: Priority: normal
Severity: normal Version: 2.9
Component: General Keywords: needs-patch meta-table meta
Focuses: Cc:

Description

As I proposed on wp-hackers, it would be great to have a general meta table.

There are at least a couple of benefits to such a table:

  • We could have meta-data for comments and categories.

Others have proposed meta tables for the comments (e.g. #5153 and #2659); rather than
adding meta tables seriatim, a general meta-data table would meet that
need and offer the possibility of actually reducing the total number
of tables.

  • It would provide a better place for plugins to store data.

Currently, most plugins---when they don't have enough data to justify
creating their own table---store data in the options table. I think
there are disadvantages to bloating the options table, one of which is
that WP loads almost all options into the object cache.

My attached patch includes the code necessary get such a table working in WP (aside from decrementing the db_version), and I've tested each of the get_general_meta, update_general_meta, and delete_general_meta functions successfully.

By the way, update_meta and delete_meta would have been cleaner names, but they're already taken for admin post meta stuff.

Attachments (1)

general_meta_table.diff (4.4 KB) - added by filosofo 17 years ago.

Download all attachments as: .zip

Change History (42)

#1 @westi
17 years ago

+1

Some notes from a quick first pass of the patch:

  1. You need to create (or find) a meta_key sanitization function - duplicated preg_replace's are bad.
  2. Doesn't ->prepare escape the data for you?/prepa
  3. db_insert and db_update should probably me used now they exists ;-)

#2 @filosofo
17 years ago

  1. I was thinking about using sanitize_title, but I wondered if there was some reason someone uses this regex in the user meta functions to exclude dashes.
  2. I've updated the patch to take out the extraneous escape.
  3. I've updated the patch to use db_insert, but db_update is no good, as it allows you to specify only one column in the where clause.

#3 @Otto42
17 years ago

General thoughts that I can think of right now:

I agree with Marcos' opinion on wp-hackers: Change meta_type to object_type.

I didn't think about this when designing the table schema, but having a NULL meta_key makes no sense that I can think of. Change it to NOT NULL with no default. Force the entry to have a defined key.

#4 @darkdragon
17 years ago

I would really like to see this be included in WordPress!

I'm referencing #3364, since it has to do with metadata db improvements.

#5 @ryan
17 years ago

  • Milestone changed from 2.5 to 2.6

#6 @Denis-de-Bernardy
15 years ago

the table should have a key on (meta_type, object_id, meta_key), to allow for quick queries with a meta_type/object_id constraint.

#7 @Denis-de-Bernardy
15 years ago

  • Keywords needs-patch added; has-patch removed
  • Milestone changed from 2.9 to Future Release

#8 @filosofo
15 years ago

  • Milestone Future Release deleted
  • Resolution set to wontfix
  • Status changed from new to closed

I no longer think this is necessary. There is a meta-data API in WordPress, but it's known as the "taxonomy" API.

#9 @Denis-de-Bernardy
15 years ago

see #10142 however, and I'm pretty certain we'll want a comments meta table at some point.

leaving this closed, however, because I think the meta tables should be separate for every data type.

#10 @filosofo
15 years ago

Is there a particular reason why you think separate tables would be better? The problem I see is that WordPress is trending toward more abstract data types.

#11 @Denis-de-Bernardy
15 years ago

yeah. it makes it impossible to enforce a foreign key constraint, and performance is harmed due to the larger index.

2.3 already removed the ability to do so with categories and tags. If anything, I'd personally push towards using post2tags, link2cats, post2cats tables for instance, and dropping the terms_relationship tables. But this is probably a discussion we'd want to have in IRC or a separate ticket.

#12 @westi
15 years ago

  • Resolution wontfix deleted
  • Status changed from closed to reopened

I don't see the taxonomy api as a place for storing metameta.

It doesn't fit at all what I would envisage meta meta containing.

#13 @Denis-de-Bernardy
15 years ago

We should probably close this nonetheless. For performance sake, we'll be better off with a meta table *per* relevant table than one huge table that references things all over the place. This doesn't prevent us from having a generalized *API* to grab things, but the last thing we want is another table like wp_term_relationships that stores cats and dogs.

#14 @Denis-de-Bernardy
15 years ago

just for reference:

itching to produce a big patch along the lines of:
11:41 PM
drop table wp_terms; drop table wp_term_taxonomy; drop table wp_term_relationships;
11:42 PM
create table wp_taxonomies; create table wp_post2tax; create table wp_link2tax
11:42 PM
and then, create table wp_tax_meta;
11:43 PM
at some point

#15 @Denis-de-Bernardy
15 years ago

Keeping more reference material:

filosofo
ddebernardy, that would be a regression.  the taxonomy system is one of the best parts of wp, because of its abstractness.
11:48 PM ddebernardy
I'd vote for it being one fo the worst parts, myself
11:48 PM filosofo
why?
11:48 PM ddebernardy
well, for one thing, the current implementation has a shared terms table
11:49 PM
which prevents a category and a tag with the same slug from having the a different name (with/without cap)
11:49 PM
then, there is the wp_term_relationships table
11:49 PM
which contains cats, and dogs, and what not
11:49 PM
this prevents us from defining a foreign key if needs be
11:50 PM
and it underperforms what we'd get with a table for posts (it can contain tags and cats, that's no issue) and a separate one for links
11:50 PM
then, in the wp_taxonomies I suggested, we'd have a unique key on (taxonomy,slug,parent)
11:51 PM
or something like that involving the three fields
11:51 PM
that would allow to have a category with the same slug and multiple parents
11:51 PM
lastly, a key on slug (or the previous key) would still allow to join wp_tax on itself, in order to cross the boundaries on link terms and post terms

#16 @Denis-de-Bernardy
15 years ago

More reference, since IRC logs are broken:

well, I don't think there's anything about the wp_terms table structure that prevents different terms from having the same name.  that's an api issue
11:53 PM ddebernardy
no no, it's a db issue underneath
11:53 PM filosofo
The foreign key thing could be solved with another table, wp_objects
11:54 PM ddebernardy
that's even worse than our current implementation
11:54 PM
and it won't allow the foreign key
11:54 PM filosofo
?
11:55 PM ddebernardy
well, unless you're meaning wp_objects is merging posts and links (heck, why not...), it's really not a good idea, because you'd merely add an extra table in the lot that doesn't fix the foreign key constraint problem
11:56 PM filosofo
right, that's what I mean.  all objects in one table
11:56 PM ddebernardy
FWIW, I've been playing with that idea fo the past 10 years or so
11:56 PM
the only implementation I found worthy of implementing was like this:
11:56 PM
table nodes (id)
11:56 PM
table foo (id references nodes(id))
11:57 PM
that way you're working on native structures
11:57 PM
and you still have native indexes
11:57 PM
but it actually comes down to lackings in database engines
11:57 PM
because what the above really should be is:
11:57 PM
table nodes (id)
11:57 PM
table foo inherits nodes
11:58 PM
with id shared across the board
11:58 PM
you can do that with a uuid field, basically. but your meta, e.g. created, modified, etc. then needs to be redefined in each table
11:59 PM ruslany__ has left IRC ("Leaving")
11:59 PM ddebernardy
in pgsql, inherit won't enforce unique indexes. and I'm not even aware of a similar keyword in mysql
12:01 AM filosofo
I was going to say, I didn't think you could do that in MySQL.  I don't think foreign keys are supported with all mysql storage engines.
12:01 AM ddebernardy
no, they're not. only in innodb, ndb, and a few more (but not myisam)
12:02 AM rodrigo_sampaio1 has left IRC ("Leaving.")
12:02 AM ddebernardy
still, it's generally good to be able to enforce them. it means the db is well designed if you can, and I'd like to make it possible to enforce transactions in 2.9 or 3.0.
12:04 AM filosofo
what about the db structure prevents different terms from having the same name?
12:04 AM ddebernardy
UNIQUE KEY slug (slug),
12:04 AM
in wp_terms
12:04 AM filosofo
ok, missed that.  So change that.
12:05 AM ddebernardy
no no, you want more than that, really
12:05 AM
by the same token, then other fields would want to be in wp_term_taxonomy
12:05 AM
with a unique index on slug/parent/taxonomy
12:05 AM
and the term table then disappears
12:06 AM
for the sake of marking the change, I'd suggest renaming the table as wp_taxonomies, so that old plugins end up miserably failing (or then, we'd add a pseudoview to make it backwards compat)
12:07 AM
I'd be curious to have the thoughts of rboren on all of this, though

12:11 AM sorich87 has joined the channel.
12:13 AM filosofo
so how about 3 tables: terms, taxonomies, objects
12:13 AM
we could move taxonomy definition into the db
12:13 AM ddebernardy
why terms and taxonomies?
12:14 AM
if you've an index on slug, held in taxonomies, then you can join that table on itself using ( slug ) where t1.id <> t2.id, and you've the same thing as terms
12:14 AM
the ultimate difference is, with three tables, you've three joins, with two tables, you've only two :-)
12:14 AM
and when you join tax join tax join tax as I and others do in a few plugins, it can make a huge difference
12:15 AM
query optimizers are generally lost after 8-10 joins and end up using a (slow) genetic algorithm to optimize query plans
12:15 AM filosofo
ok, but wouldn't there be a performance benefit?
12:15 AM sivel[a] is now known as sivel
12:16 AM filosofo
currently, to query a taxonomy you have to do a text search on varchar
12:16 AM
with a taxonomy table, you could make that an integer
12:16 AM ddebernardy
oh, that's what you mean
12:16 AM filosofo
db size would be smaller, too
12:17 AM ddebernardy
in this case yeah, agreeing with you, but the general trend in WP is to use php arrays for this kind of stuff
12:17 AM
I mean, we're using a serialized option to store roles :D
12:18 AM filosofo
I think everyone would like to move away from that
12:18 AM
the roles, I mean
12:18 AM ddebernardy
yeah, I hope I'll find enough time to get this done in 2.9
12:19 AM
but there are still a couple of things we'll want in varchar fields even if we do that, for the same of making thing pluginable
12:19 AM
take roles, for instance
12:19 AM filosofo
so a role is taxonomy
12:19 AM ddebernardy
by the same token as the taxo table you're describing, we'd want roles and caps defined in tables with ids references in a user2role table
12:20 AM
trouble is, this makes heavy queries
12:20 AM
in practice, it's better to have an enum field to store roles, and caps
12:20 AM
enum is really an int with another name
12:20 AM
now, to make the thing extendable, we actually want a varchar in there
12:20 AM
and the same holds for taxonomies/terms
12:21 AM
whatever we do, we'll end up with a varchar field to make things extendable
12:21 AM
and end up with the thingy containing post_tag, category, link_category (and cities, countries, and whatever plugin authors want it to contain)
12:22 AM filosofo
problem with role as enum is that it doesn't allow easy custom roles, right?
12:22 AM ddebernardy
exactly
12:22 AM
hence the need for a varchar in WP, even if it performs more poorly
12:23 AM filosofo
hmm, but see what I meant before is that in your wp_taxonomies table...
12:23 AM ddebernardy
yes?
12:23 AM filosofo
you'd have to query over zillions of varchars, instead of just a few in a taxonomy table.
12:24 AM ddebernardy
no no, you'd still have a single one - to fetch post_tag, or category, or link_category, or...
12:24 AM
picture it as the current term_taxonomy and terms combined
12:24 AM filosofo
right, if we combine present terms and taxonomy....
12:25 AM ddebernardy
and renamed for the sake of making things backwards compatible (we'd make subqueries or views available for plugins that dig straight into it)
12:25 AM filosofo
then if we want to query for the "post_tag" "my term", we have to query over the large terms table looking for "post_tag"
12:26 AM ddebernardy
it would contain the same number of rows as the current term_taxonomy table
12:26 AM
it would merely have a few extra fields to play with, avoiding a join, and adding for the possibility of extra indexes
12:26 AM filosofo
yes, that's why I think we need a taxonomy table.
12:27 AM ddebernardy
issue with a taxonomy table (as you're thinking, referencing what really ends up being an array in WP, or so I'm understanding), has to do with the register_taxonomy function
12:27 AM
I suspect
12:28 AM filosofo
just make register_taxonomy like add_rewrite_rule
12:28 AM
a one-time thing
12:28 AM ddebernardy
works too, but we end up breaking plugins in the process, if terms becomes a different animal
12:29 AM

filosofo
And this would improve the query_posts queries,
12:30 AM
because you could do one query
12:31 AM ddebernardy
correct
12:31 AM
hence the need for rboren feedback
12:31 AM
I mean, I'd be all +1 to break existing plugins and change the semantics of wp_terms
12:32 AM
but one of the core devs may end up wanting to keep backwards compat

#17 @Denis-de-Bernardy
15 years ago

  • Milestone set to Future Release

#18 @Denis-de-Bernardy
15 years ago

see also #10233, closed as dup

Replying to scribu:

Over time, there have been requests for category metadata, comment metadata and even link metadata.

In one of the more recent discussions on this topic, a generic metadata table has been proposed.

I also think that this is the way to go forward: instead of having a separate table for each object type, just have a single, generic, plugin friendly table.

This table would have the following columns:

object_id
object_type
meta_id
meta_key
meta_value

It's basically the same as the current postmeta and usermeta, except that post_id and user_id are replaced with object_id. Also, a new column 'object_type' would simplify lookups.

#19 follow-up: @mikeschinkel
15 years ago

  • Milestone changed from Future Release to 2.9
  • Version changed from 2.3 to 2.9

This came up as a discussion on the hackers list because of my need for link metadata as I wanted to do it how it might evolve to be done in future versions of WordPress.

The consensus on the list was to move everything to a wp_meta table and scribu nicely submitted ticket #10223 although he was pointed back to this one. Here it seems the consensus is that a single meta table is too abstract and will add too much overhead for most people's liking. Further this ticket has stalled without addressing the problem.

So let me take a stab at a strawman. Let's keep wp_postmeta and wp_usermeta but lets add a more generic table called wp_othermeta and let's use it for comments, links, or whatever. It would need an object_type field, of course but this would give anyone the ability to add meta to anything they need and should be a good balance of the concerns expressed thus far.

BTW, I selected "General" as a "Component" because there was no "Meta" to select.

#20 @Denis-de-Bernardy
15 years ago

  • Version changed from 2.9 to 2.3

#21 in reply to: ↑ 19 ; follow-up: @Denis-de-Bernardy
15 years ago

  • Milestone changed from 2.9 to Future Release

Replying to mikeschinkel:

This ticket will likely get fixed in 2.9 indirectly, when we address permalink history. In the meanwhile, punting it back to Future pending patch.

Personally, I'd be +1 to adding several meta tables (comment_meta, link_meta, term_meta), and abstract the meta functions if needs be to fix the issues in user_meta. A unique meta table that includes the user_meta is a no go due to WPMU considerations. And combining the rest is a no go, imo, for performance reasons (the index is too large).

#22 in reply to: ↑ 21 @mikeschinkel
15 years ago

  • Version changed from 2.3 to 2.9

Replying to Denis-de-Bernardy:

Can you elaborate to what you mean by "when we address permalink history?" Maybe provide a ticket link? Thanks.

As for adding several meta tables I'd ask that we consider possibly adding one for comment_meta (which could be very large) but use an abstract solution for items such as links and terms that are likely to have orders of magnitude less records than posts or comments. This would also allow anyone to add meta for any arbitrary object type without having to add custom tables.

If we only add explicit meta tables i.e. wp_linkmeta, wp_termmeta, etc. this issue will come up over and over when people want meta for some new object type (including "meta meta") plus we'll be littering the database with tables. Better to address it once now moving forward than having to constantly readdress it.

As for user meta, I have yet to work on a project that needs advanced user meta so currently I have no experience nor opinion on user meta.

#23 follow-up: @Denis-de-Bernardy
15 years ago

see #9825 for permalinks, and #7540 for user meta.

#24 in reply to: ↑ 23 @mikeschinkel
15 years ago

Replying to Denis-de-Bernardy:

see #9825 for permalinks, and #7540 for user meta.

Thanks for the links.

What of the idea to have a flexible meta table for the "lesser" objects like links, terms, and whatever object a plugin developer might need meta for? Does a wp_othermeta work?

-Mike

#25 follow-up: @Denis-de-Bernardy
15 years ago

Personally, I'd rather see the schema changed as follows:

  • wp_posts
  • wp_postmeta
  • wp_users
  • wp_usermeta
  • wp_links
  • wp_linkmeta
  • wp_terms
  • wp_termmeta
  • wp_term2post
  • wp_term2link
  • wp_term2user (assuming we want that around too; this would allow us to manage user roles as a taxonomy)

... that way you can have foreign keys and proper indexes everywhere. But it's a discussion we should be having during an IRC meet-up.

#26 @dd32
15 years ago

... that way you can have foreign keys and proper indexes everywhere. But it's a discussion we should be having during an IRC meet-up.

I dont see why you couldnt have a foriegn key.. They can be based on a Where column = 'blah' and ID = foriegn.id right?

..Proper indexes can include multiple columns too..

No need for a huge number of tables, Only usermeta/postmeta are used at present, the others will have little usage for a few more revisions to come.. and i dont see the meta becoming a huge burden.

#27 @Denis-de-Bernardy
15 years ago

@dd32: A foreign key constraint can only reference a single table. Currently, a constraint on wp_term_relationships(object_id) referencing wp_posts(ID) would fail for this reason -- it could also be a link id. The performance implication is you cannot easily do: delete from wp_term_relationships where object_id = $post->ID. Merging meta tables would lead us to similar problems.

#28 in reply to: ↑ 25 @mikeschinkel
15 years ago

  • Cc mikeschinkel@… added

Replying to Denis-de-Bernardy:

Are you are wanting to see the terms/term_relationships/term_taxonomy table go away? Can I assume because you want to see foreign key constraints?

Here is my perspective and I've got over 20 years database experience on several different platforms as a developer, as a trainer, and as a book author. In my earlier years I was very much into 3rd normal form, referential integrity, explicit tables and relationships and so on. What I found was complexity quickly ratcheted up and significant flexibility was lost with that approach.

Since I'm worked with both Drupal and WordPress I've really come to appreciate the elegant simplicity and flexibily of the models used and would never go back to the other if I could help it. I really think that enforcing referential integrity at the database level would be a real mistake. If you want an enterprise class app that evidently doesn't use database-level foreign key constraints take a look at SAP.

#29 follow-up: @Denis-de-Bernardy
15 years ago

There are occasions where you some DB integrity is very desired. When dealing with accounting-related data, for instance: if you delete a user, you need to be sure that the query will fail if it's about to delete lines in your accounting.

Don't get me wrong, I'm not saying we should have a posts, pages, post_revisions, etc. tables. Just like you I like the fact that we're able to store attachments and cats and dogs in there.

When it comes to properly optimize the queries afterwards, joining posts with a table contains IDs from posts, comments, links and users won't fare as well as one on a table that contains IDs from a single one.

Now, if we introduce some kind of node ID so as to ensure no two pieces of data in posts, comments, links and users share the same numeric ID, that's a different story. Generic meta and term2node tables then makes perfect sense. But that's a much more massive change.

#30 @Denis-de-Bernardy
15 years ago

Expanding on the previous comment, the last thing want are joins that use indexes like (foo_id, vtype) or (foo_id, bar_id, vtype), where vtype is a varchar. We can use the vtype internally in php and simple queries, but at the end of the day, there needs to be a means to join tables using integers only for performance reasons.

#31 in reply to: ↑ 29 ; follow-up: @mikeschinkel
15 years ago

Replying to Denis-de-Bernardy:

There are occasions where you some DB integrity is very desired. When dealing with accounting-related data, for instance: if you delete a user, you need to be sure that the query will fail if it's about to delete lines in your accounting.

Oh, agreed; right tool for the right job. I didn't mention those use-cases because generally people don't use WordPress for accounting (though admittedly some do.)

When it comes to properly optimize the queries afterwards, joining posts with a table contains IDs from posts, comments, links and users won't fare as well as one on a table that contains IDs from a single one.

Agreed. My biggest concerns are making sure there is what I like to call "an escape valve"; i.e. the ability for a plugin developer to add metadata for any arbitrary object w/o having to add their own table or to overload an existing table in a non-relational manner. If the team thinks link meta should be in a specific table I'm fine with that but do want to see a table that can handle arbitrary objects.

Now, if we introduce some kind of node ID so as to ensure no two pieces of data in posts, comments, links and users share the same numeric ID, that's a different story. Generic meta and term2node tables then makes perfect sense. But that's a much more massive change.

I'd caution against a generic node ID; I think it adds too much complexity w/o enough benefits. I also think that there is huge value to established structure and changing established structure should only happen when very significant improvements would come with it.

Expanding on the previous comment, the last thing want are joins that use indexes like (foo_id, vtype) or (foo_id, bar_id, vtype), where vtype is a varchar. We can use the vtype internally in php and simple queries, but at the end of the day, there needs to be a means to join tables using integers only for performance reasons.

It's ironic. I came to WordPress/MySQL via a year of Drupal/MySQL but before that 12 years of ASP/VBScript/SQL Server where we *never* used anything except numeric keys (and with constraints in the database!) My first reaction to storing strings for keys was one of disgust but after working with both Drupal then WordPress I've really come to appreciate and even prefer them. I do understand the performance concerns but wonder if the gains for moving to all integers is really worth the loss of usability for working directly with the database which I do all the time (for development.)

Simply put: 15 years ago I was exactly certain the answers for all the rules about how database apps should be developed. Today I wonder if there really are any rules at all that don't fail in one use-case or another.

#32 in reply to: ↑ 31 ; follow-up: @Denis-de-Bernardy
15 years ago

Replying to mikeschinkel:

When it comes to properly optimize the queries afterwards, joining posts with a table contains IDs from posts, comments, links and users won't fare as well as one on a table that contains IDs from a single one.

Agreed. My biggest concerns are making sure there is what I like to call "an escape valve"; i.e. the ability for a plugin developer to add metadata for any arbitrary object w/o having to add their own table or to overload an existing table in a non-relational manner. If the team thinks link meta should be in a specific table I'm fine with that but do want to see a table that can handle arbitrary objects.

I think the real underlying issue is we need abstraction in the php. It's like, if the data cannot go into wp_posts (which could really be called wp_nodes), and ends up requiring a separate table, I'd personally suggest it should get its own meta table as well. So long as the needed php functions are around to manage this user-defined meta table, things should be fine.

Now, if we introduce some kind of node ID so as to ensure no two pieces of data in posts, comments, links and users share the same numeric ID, that's a different story. Generic meta and term2node tables then makes perfect sense. But that's a much more massive change.

I'd caution against a generic node ID; I think it adds too much complexity w/o enough benefits. I also think that there is huge value to established structure and changing established structure should only happen when very significant improvements would come with it.

I know... I've been playing around with the idea of a nodes table for over 10 years. We could actually do this by switching to UUID types instead of auto-incremented integers. It adds an extra benefit along the way, too: you can then sync sites that use shared databases.

Simply put: 15 years ago I was exactly certain the answers for all the rules about how database apps should be developed. Today I wonder if there really are any rules at all that don't fail in one use-case or another.

Might it be that you only ever needed to write queries that involved one or two joins? I ask, because my own 10 years of experience tell me a rather different story. I was a varchar index junkie until 2-3 years ago -- until I ended up needing to look into the performance of an app that was heavy on joins and that had more than a minute amount of data.

PGSQL ended up using merge join plans (= the slowest you can get) that involved zillions of rows. No amount of vacuum analyze, adding partial indexes, or clustering the data helped in any meaningful way. Re-engineering the DB in such a way that PG had means to fetch the needed rows using less joins, on integer/uuid fields only, bought the app was brought back to life.

My own bottom line was this: primary key (object_id, object_type) = avoid like the Plague if you've a large amount of data. If you can ensure that object_id is unique across the various tables using a single sequence (and thus the primary key all by itself), you can abstract as much as you want (and indeed, you don't need the hard-coded referential integrity). But it *must* be unique.

In case you're thinking WP will never get so much data that this will make a big difference, consider the zillions of blogs, users, posts, terms, links, and comments on wordpress.com.

#33 in reply to: ↑ 32 ; follow-up: @mikeschinkel
15 years ago

Replying to Denis-de-Bernardy:

I think the real underlying issue is we need abstraction in the php. It's like, if the data cannot go into wp_posts (which could really be called wp_nodes),

How very Drupalish of you. (I came to WordPress by way of Drupal. :)

and ends up requiring a separate table, I'd personally suggest it should get its own meta table as well. So long as the needed php functions are around to manage this user-defined meta table, things should be fine.

I think that is grand in concept and certainly follows the relational model but I think in practice following said rules results in increases in rigidity and complexity without equivalent increases in benefit.

Certainly for the most prominent data elements it makes sense to have them each in their own tables but I urge you to reconsider the edge cases as they are IMO handled much better by creating a repository to hold them all instead of requiring "yet another table" for what are often edge cases.

I know... I've been playing around with the idea of a nodes table for over 10 years.

Not to one-up you (well, okay maybe I am) I've been playing with them for 20 years... :) And for the most I've decided the complexity is not worth the pain.

We could actually do this by switching to UUID types instead of auto-incremented integers. It adds an extra benefit along the way, too: you can then sync sites that use shared databases.

UUID have a certain elegance but if you are concerned about performance as you were regarding "vtypes" then UUIDs have a whole series of issues:
http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-not-to-uuid/

Might it be that you only ever needed to write queries that involved one or two joins?

LOL!!!!! Sorry, I've at times written queries that exceeded the number of tables that the then SQL Server parser could handle (max 32 tables.)

I've built some very complex database systems. I built a full ecommerce platform starting in 1995 for internal use that processed almost $100 million in revenue for my company over a 10 year period. I can promise you that almost everything I did had more than two joins.

And I've had to do a lot of work with tables in Drupal too. For example, here's a query I pulled from a project of mine (not exactly trivial):

SELECT
	vehicle.nid,
	vehicle.title,
	vehicle_content.teaser,
	vehicle_path.dst AS path,
	IFNULL(caption.field_photo_caption_value,image_node.title) AS photo_caption,
	files.filepath AS photo_file,
	IFNULL(photo_info.field_image_orientation_value,'landscape') AS photo_orientation
FROM
	node showcase
	INNER JOIN content_field_vehicle cfv ON showcase.nid=cfv.nid
	INNER JOIN node vehicle ON cfv.field_vehicle_nid=vehicle.nid
	INNER JOIN content_type_vehicle ctv ON vehicle.nid=ctv.nid
	INNER JOIN node_revisions vehicle_content ON vehicle.nid=vehicle_content.nid AND vehicle.vid=vehicle_content.vid
	INNER JOIN url_alias vehicle_path ON vehicle_path.src=CONCAT('node/',vehicle.nid)
	LEFT OUTER JOIN content_field_photo_caption caption ON vehicle.nid=caption.nid
	LEFT OUTER JOIN image_attach ia ON vehicle.nid=ia.nid
	LEFT OUTER JOIN node image_node ON ia.iid=image_node.nid
	LEFT OUTER JOIN content_type_image photo_info ON photo_info.nid=image_node.nid
	LEFT OUTER JOIN files ON image_node.nid=files.nid
WHERE
	showcase.type='showcase' AND
	vehicle.type='vehicle' AND
	ctv.field_vehicle_subtype_value in ('abstract','standalone') AND
	(files.fid IS NULL OR COALESCE(files.filename,'small')='%s') AND
	showcase.nid=%d AND
	vehicle.status=1

I ask, because my own 10 years of experience tell me a rather different story. I was a varchar index junkie until 2-3 years ago -- until I ended up needing to look into the performance of an app that was heavy on joins and that had more than a minute amount of data.

Funny how the pendulum swings one way and then the next. It is likely our cycles are just not in sync.

PGSQL ended up using merge join plans (= the slowest you can get) that involved zillions of rows. No amount of vacuum analyze, adding partial indexes, or clustering the data helped in any meaningful way. Re-engineering the DB in such a way that PG had means to fetch the needed rows using less joins, on integer/uuid fields only, bought the app was brought back to life.

No debate there.

My own bottom line was this: primary key (object_id, object_type) = avoid like the Plague if you've a large amount of data.

What's large? 1000? Or 100 million? I'll give you the latter, not the former.

For performance on large data sets, integer keys are the best. String are much slower, and the string the larger the indexes get and the more page loads from disk are required to seek into the index. But that's doesn't mean all indexes MUST be integers as I'll explain below.

Also I'm not a fan of composite keys in general, but I've also come to believe that every rule has reasonable exceptions. You can get around the composite key issue by concatonating (i.e. "273links_meta") but that's not elegant unless it can be done on a trigger.

OTOH, one of the biggest evils I have found in technology is premature optimization. To enforce a prohibition on composite indexes of (int,string) "at all costs" is a premature optimization for tables that may 99.9% of the time have less than 100,000 records. That's why I am suggesting them for use in edge cases. IF we find that one of those objects starts getting hundreds of thousands of records and many installations THEN it is time to optimize. But don't force all the complexity on the edge cases simple because there is a tiny chance on of them might become a non-edge case.

In case you're thinking WP will never get so much data that this will make a big difference, consider the zillions of blogs, users, posts, terms, links, and comments on wordpress.com.

I'm not at all assuming WP won't get that much data. I am instead assuming that not all data in WP will get so big that performance optimization concerns actually matter.

Look at it this way; if you don't give plugin and theme developers an appropriate place to store data they will instead store their data in far less appropriate places (i.e. an overloaded wp_options table, for example) and they will see far worse performance than the purity of your approach wants to ensure against. Remember that most people who use and program for WordPress do know have any experience with SQL and are even afraid of it so they won't be creating custom tables on their own, they will be overloading wp_options.

As a case in point: Nathan Rice. Nathan is a brillant theme developer but has said he doesn't understand database and won't go near SQL even though I know if he did he's easily smart enough to learn it in no time. There are *many* Nathans out that and you need to give them a tool they are comfortable using instead of giving them prohibition for fear that one day by some small chance they might misuse that tool.

#34 in reply to: ↑ 33 ; follow-up: @Denis-de-Bernardy
15 years ago

Replying to mikeschinkel:

Had a thought earlier today. FWIW, WP and WPMU seem slated to merge into a single code-base. So we're probably kicking a dead horse. Best I'm aware, terms and users are already shared across blogs. This would merely leave merging post_meta and a yet to be done comment_meta.

#35 in reply to: ↑ 34 @mikeschinkel
15 years ago

Replying to Denis-de-Bernardy:

I haven't worked with the latest WPMU; what's it's meta structure? Is it all one but wp_meta?

#36 follow-up: @Denis-de-Bernardy
15 years ago

Haven't tried MU in a while, but here's the current schema:

http://trac.mu.wordpress.org/browser/trunk/wp-admin/includes/schema.php

Re your prior question... To me, a "real" database means when the query engine decides that using an index can be faster than a seq scan. (Truth is, most databases are so small that the query optimizer will seq scan the bloody mess, since the table stats show it's faster to load all pages into the memory anyway.) Large is when having the correct indexes becomes meaningful (usually 50-100k rows). Very large means you're dead unless your indexes are optimized to fit your queries and data. That can be 100k to hundreds of millions of rows, depending on the app.

As for the query you highlighted, it looks very yucky, but I'm guessing that the showcase.nid=%d will prompt the optimizer to return a nested loop plan. Whereas the odds are the optimizer will be merge joining its way through temporary files on the swap if you try the same with showcase.nid IN (..) on a few dozens of thousands of rows.

#37 in reply to: ↑ 36 ; follow-up: @mikeschinkel
15 years ago

Replying to Denis-de-Bernardy:

Haven't tried MU in a while, but here's the current schema:

http://trac.mu.wordpress.org/browser/trunk/wp-admin/includes/schema.php

All I saw additional was sitemeta which is basically wp_options for a single site. Nothing to address comment meta, link meta or "whatever" meta.

Re your prior question... To me, a "real" database means when the query engine decides that using an index can be faster than a seq scan. (Truth is, most databases are so small that the query optimizer will seq scan the bloody mess, since the table stats show it's faster to load all pages into the memory anyway.) Large is when having the correct indexes becomes meaningful (usually 50-100k rows). Very large means you're dead unless your indexes are optimized to fit your queries and data. That can be 100k to hundreds of millions of rows, depending on the app.

With the exception of use of the term "real" :-) I agree.

As for the query you highlighted, it looks very yucky, but I'm guessing that the showcase.nid=%d will prompt the optimizer to return a nested loop plan. Whereas the odds are the optimizer will be merge joining its way through temporary files on the swap if you try the same with showcase.nid IN (..) on a few dozens of thousands of rows.

Didn't run it through the optimizer. Honestly I've never seen a single query (that I wrote) be a bottleneck on any web app I've been involved with, and I've dealt with 500k record tables before. I've instead seen performance concerns from calling too many queries (Drupal was horrible for doing that, WordPress is only marginally better), from running too much server code per page load (Drupal's bad again, WordPress only slightly better), consuming too much memory per page load (Drupal is awful at that, WordPress not so bad), and then client-side issues: Too many HTTP requests, image/swf/etc files too large, HTTP caching used poorly or not at all, application caching not used and poorly written Javascript and/or Flash hogging the client processor.

Can you actually point to several large WordPress blogs that are currently having performance problems explicitly because of existing database design?

#38 in reply to: ↑ 37 ; follow-up: @Denis-de-Bernardy
15 years ago

Replying to mikeschinkel:

Can you actually point to several large WordPress blogs that are currently having performance problems explicitly because of existing database design?

Yes. Try blogs that run the various related posts plugins that exist. They all have a different approach to fetching related posts. All of these approaches are just bad, from a performance standpoint, except the very worst of them all (using a MyISAM full text index).

Then, there are all sorts of horrifying (not to mention paid-for) memberships plugins. Most are really using WP Roles (another area where much needed change is needed). Each of those that I looked into were simply horrifying. Those that don't use WP roles are lacking when it comes to transactions/accounting management.

As you highlight, not many coders understand SQL, and many are prompt to treat db tables as huge arrays. For patented examples, try the Vanilla forum application. Enable whispers, tags, a plugins that add extra filters, and see it bring an NDB cluster to a crawl -- even with a few thousand rows.

#39 in reply to: ↑ 38 @mikeschinkel
15 years ago

Replying to Denis-de-Bernardy:

Ah, good point! While I am aware of some of the really nasty code in some even popular plugins I wasn't thinking about it above; I was instead thinking about how people who have some idea how to properly use a SQL database would implement.

One example to support your comments is a very popular plugin that stored relationships that should have been in a linking table in a serialized array in one of the two tables making certain queries that I needed to run impossible in SQL and for performance reasons impractical in PHP. I won't name it here but that plugin is very well-known.

I do, however, don't think these people misusing tables should be an argument against edge-cases using composite keys; instead I think it argues *for* it because if we don't give them the ability they will do much worse with their own designs.

#40 @scribu
15 years ago

  • Resolution set to wontfix
  • Status changed from reopened to closed

The cards have been dealt. We're using separate meta tables. See #2659.

But we do have a generic meta API, which makes it trivial to add new meta types.

#41 @westi
15 years ago

  • Milestone Future Release deleted
Note: See TracTickets for help on using tickets.