Make WordPress Core

Opened 11 years ago

Closed 9 years ago

#25043 closed enhancement (wontfix)

Add a pre_ hook to term_exists() to allow pre-query optimization

Reported by: dllh's profile dllh Owned by: boonebgorges's profile boonebgorges
Milestone: Priority: normal
Severity: normal Version: 3.6
Component: Taxonomy Keywords: has-patch
Focuses: Cc:


In benchmarking imports, I noticed that term_exists() is a very expensive operation. In a command line import of a WXR with 100 posts, 500 comments, 5 tags, 5 categories (and 5 tags and 5 cats associated with each post), term_exists() accounted for about 17% of the run-time (using qcachegrind for metrics). It's doing queries every time it runs, and sometimes more than one query.

In day-to-day usage, this probably isn't so awfully expensive that people are noticing and griping about it, but in potentially long-running processes like imports and bulk edits, it can be very significant.

The issue could be mitigated with a pre_ filter that can be used by plugins to, for example, fetch stored term data from a cache.

I tested this by applying a simple filter in term_exists() (patch forthcoming), adding the filter during import, and having it store/check term data in an array. This allowed term_exists() to just look up the term in the array vs. the database if we had already fetched it. With the filter and the array lookup added, the percentage of run time spent in term_exists() dropped to 0.10% and an import that consistently otherwise ran in about 2:50 ran consistently in about 2:20 (not really significant for a small import, but very significant as orders of magnitude climb).

I also did some benchmarking using wp-cli, to make sure that the fact that I was doing costly import stuff wasn't skewing the perceived benefit of the filter addition. To test, I did simple term deletion. Predictably, the results are less dramatic for simple, short operations than for long-running ones, but the performance increases I saw are not insignificant.

To test, I made a WXR with 3 categories and 50 posts. Each category was associated with every post, so my test was to measure the cost of term_exists() for a category belonging to 50 posts, both with and without the pre_ filter. Results were as follows (it was a very small set of tests, I'll grant):

With pre_ filter Without pre_ filter
run timeterm_exists %run timeterm_exists %
1.605s 0.42% 2.278s 11.53%
4.790s 0.15% 1.492s 8.64%
1.986s 0.22% 2.222s 7.44%

I present the data as tabular, but of course it doesn't make sense to assume that the uncached and cached cells correlate across a single row. Curiously, the middle run for each set of three tests was something of an outlier. Discarding those and averaging the run times, we get 1.795s for the operation with the filter and 2.25 for the operation without. The moral, then, is that even for short, single-task operations like deletion of a term, we stand to see fairly significant improvements in performance with the addition of a pre_filter.

The pre_ filter pattern occurs in other places in core, so this seems to me like a pretty common-sense, low-risk, potentially high-gain addition.

Attachments (1)

taxonomy.diff (611 bytes) - added by dllh 11 years ago.
Adds pre_term_exists filter to term_exists()

Download all attachments as: .zip

Change History (6)

11 years ago

Adds pre_term_exists filter to term_exists()

#1 @nacin
11 years ago

Given that we are likely to change term_exists() in #17689 and then very much change it when implementing the early phases of (#5809, #22023), I wonder if it will be very difficult to manage expectations of using this hook — it could make it more difficult for plugins using this hook to be backwards compatible.

#2 @yoavf
11 years ago

  • Cc yoavf added

#3 @SergeyBiryukov
10 years ago

  • Version changed from trunk to 3.6

#4 @wonderboymusic
9 years ago

  • Milestone changed from Awaiting Review to 4.4
  • Owner set to boonebgorges
  • Status changed from new to assigned

Tax Roadmap chatter in the comments

#5 @boonebgorges
9 years ago

  • Milestone 4.4 deleted
  • Resolution set to wontfix
  • Status changed from assigned to closed

term_exists() is a minefield that I think we ought to stay out of. Use get_term_by() to do precise lookups by name or slug, as necessary. (In the future, these calls will be cached. See #21760.) If you need to do a parent match, consider using get_terms(), which also caches its queries.

Ideally, term_exists() would be phased out over time. Putting a hook in there marries us to the function.

Note: See TracTickets for help on using tickets.