Make WordPress Core

Opened 11 months ago

Last modified 8 months ago

#25043 new enhancement

Add a pre_ hook to term_exists() to allow pre-query optimization

Reported by: dllh Owned by:
Milestone: Awaiting Review Priority: normal
Severity: normal Version: 3.6
Component: Taxonomy Keywords: has-patch
Focuses: Cc:


In benchmarking imports, I noticed that term_exists() is a very expensive operation. In a command line import of a WXR with 100 posts, 500 comments, 5 tags, 5 categories (and 5 tags and 5 cats associated with each post), term_exists() accounted for about 17% of the run-time (using qcachegrind for metrics). It's doing queries every time it runs, and sometimes more than one query.

In day-to-day usage, this probably isn't so awfully expensive that people are noticing and griping about it, but in potentially long-running processes like imports and bulk edits, it can be very significant.

The issue could be mitigated with a pre_ filter that can be used by plugins to, for example, fetch stored term data from a cache.

I tested this by applying a simple filter in term_exists() (patch forthcoming), adding the filter during import, and having it store/check term data in an array. This allowed term_exists() to just look up the term in the array vs. the database if we had already fetched it. With the filter and the array lookup added, the percentage of run time spent in term_exists() dropped to 0.10% and an import that consistently otherwise ran in about 2:50 ran consistently in about 2:20 (not really significant for a small import, but very significant as orders of magnitude climb).

I also did some benchmarking using wp-cli, to make sure that the fact that I was doing costly import stuff wasn't skewing the perceived benefit of the filter addition. To test, I did simple term deletion. Predictably, the results are less dramatic for simple, short operations than for long-running ones, but the performance increases I saw are not insignificant.

To test, I made a WXR with 3 categories and 50 posts. Each category was associated with every post, so my test was to measure the cost of term_exists() for a category belonging to 50 posts, both with and without the pre_ filter. Results were as follows (it was a very small set of tests, I'll grant):

With pre_ filter Without pre_ filter
run timeterm_exists %run timeterm_exists %
1.605s 0.42% 2.278s 11.53%
4.790s 0.15% 1.492s 8.64%
1.986s 0.22% 2.222s 7.44%

I present the data as tabular, but of course it doesn't make sense to assume that the uncached and cached cells correlate across a single row. Curiously, the middle run for each set of three tests was something of an outlier. Discarding those and averaging the run times, we get 1.795s for the operation with the filter and 2.25 for the operation without. The moral, then, is that even for short, single-task operations like deletion of a term, we stand to see fairly significant improvements in performance with the addition of a pre_filter.

The pre_ filter pattern occurs in other places in core, so this seems to me like a pretty common-sense, low-risk, potentially high-gain addition.

Attachments (1)

taxonomy.diff (611 bytes) - added by dllh 11 months ago.
Adds pre_term_exists filter to term_exists()

Download all attachments as: .zip

Change History (4)

dllh11 months ago

Adds pre_term_exists filter to term_exists()

comment:1 nacin11 months ago

Given that we are likely to change term_exists() in #17689 and then very much change it when implementing the early phases of http://make.wordpress.org/core/2013/07/28/potential-roadmap-for-taxonomy-meta-and-post-relationships/ (#5809, #22023), I wonder if it will be very difficult to manage expectations of using this hook — it could make it more difficult for plugins using this hook to be backwards compatible.

comment:2 yoavf11 months ago

  • Cc yoavf added

comment:3 SergeyBiryukov8 months ago

  • Version changed from trunk to 3.6
Note: See TracTickets for help on using tickets.