Add a pre_ hook to term_exists() to allow pre-query optimization
|Reported by:||dllh||Owned by:|
In benchmarking imports, I noticed that term_exists() is a very expensive operation. In a command line import of a WXR with 100 posts, 500 comments, 5 tags, and 5 categories (with all 5 tags and all 5 categories associated with each post), term_exists() accounted for about 17% of the run time (using qcachegrind for metrics). It runs at least one query every time it is called, and sometimes more than one.
In day-to-day usage, this probably isn't so awfully expensive that people are noticing and griping about it, but in potentially long-running processes like imports and bulk edits, it can be very significant.
The issue could be mitigated with a pre_ filter that can be used by plugins to, for example, fetch stored term data from a cache.
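To make the short-circuit idea concrete, here is a minimal sketch of how such a hook might behave. It is not the forthcoming patch: the hook name `pre_term_exists`, the `sketch_term_exists()` wrapper, the callables standing in for the filter and the database query, and the simplified return values are all assumptions made for illustration.

```php
<?php
// Hypothetical sketch: sketch_term_exists() and the 'pre_term_exists'
// hook name are illustrative assumptions, not WordPress core code.

/**
 * Simplified stand-in for term_exists() with a pre_ short-circuit.
 *
 * @param mixed    $term       Term name or ID to look for.
 * @param string   $taxonomy   Taxonomy to search in.
 * @param callable $pre_filter Stand-in for the callbacks hooked to the filter.
 * @param callable $query      Stand-in for the real database lookup.
 */
function sketch_term_exists( $term, $taxonomy, callable $pre_filter, callable $query ) {
    // Inside WordPress this line would presumably be something like:
    // $pre = apply_filters( 'pre_term_exists', null, $term, $taxonomy );
    $pre = $pre_filter( null, $term, $taxonomy );

    // Any non-null value means a plugin already knows the answer.
    if ( null !== $pre ) {
        return $pre;
    }

    // Otherwise fall through to the expensive lookup.
    return $query( $term, $taxonomy );
}

// Count how often the "database" is actually consulted.
$queries = 0;
$query   = function ( $term, $taxonomy ) use ( &$queries ) {
    $queries++;
    return 7; // Pretend every term resolves to ID 7.
};

// A filter callback that answers from a prefilled array when it can.
$known      = array( 'category:news' => 7 );
$pre_filter = function ( $pre, $term, $taxonomy ) use ( $known ) {
    $key = $taxonomy . ':' . $term;
    return isset( $known[ $key ] ) ? $known[ $key ] : $pre;
};

$hit  = sketch_term_exists( 'news', 'category', $pre_filter, $query );   // Answered by the filter.
$miss = sketch_term_exists( 'sports', 'category', $pre_filter, $query ); // Falls through to $query.
```

One design caveat under these assumptions: term_exists() already returns null for a term that does not exist, so a null sentinel cannot carry a cached "does not exist" answer. A real hook might need a different default, such as false, if negative results should be cacheable too.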
I tested this by applying a simple filter in term_exists() (patch forthcoming), adding the filter during import, and having it store/check term data in an array. This let term_exists() look up a term in the array instead of the database if we had already fetched it. With the filter and the array lookup in place, the share of run time spent in term_exists() dropped to 0.10%, and an import that otherwise consistently ran in about 2:50 consistently ran in about 2:20 (not really significant for a small import, but very significant as import sizes climb by orders of magnitude).
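The array-backed store/check described above could look roughly like the following. This is a sketch under stated assumptions rather than the code used in the benchmark: the `Term_Exists_Memo` class, its `filter()` method, the injected lookup callable, and the `pre_term_exists` hook name in the comment are all hypothetical.

```php
<?php
// Hypothetical sketch: class, method, and hook names are illustrative
// assumptions and do not come from the patch.

final class Term_Exists_Memo {
    /** @var array Results already fetched, keyed by "taxonomy:term". */
    private $seen = array();

    /** @var callable Performs the real lookup on a cache miss. */
    private $lookup;

    /** @var int Number of times the real lookup had to run. */
    public $misses = 0;

    public function __construct( callable $lookup ) {
        $this->lookup = $lookup;
    }

    /**
     * Filter callback: return a remembered result, or look it up once
     * and remember it. Only non-null results are stored, because null
     * is assumed to mean "no short-circuit".
     */
    public function filter( $pre, $term, $taxonomy = '' ) {
        $key = $taxonomy . ':' . $term;

        if ( isset( $this->seen[ $key ] ) ) {
            return $this->seen[ $key ];
        }

        $this->misses++;
        $result = call_user_func( $this->lookup, $term, $taxonomy );

        if ( null !== $result ) {
            $this->seen[ $key ] = $result;
        }

        return $result;
    }
}

// Stand-in for a database query: only "news" exists, with term ID 7.
$memo = new Term_Exists_Memo(
    function ( $term, $taxonomy ) {
        return 'news' === $term ? 7 : null;
    }
);

$first  = $memo->filter( null, 'news', 'category' ); // Miss: runs the lookup.
$second = $memo->filter( null, 'news', 'category' ); // Hit: served from the array.

// Inside WordPress, and only if core gained such a hook, registration
// might look like this (three accepted arguments: $pre, $term, $taxonomy):
// add_filter( 'pre_term_exists', array( $memo, 'filter' ), 10, 3 );
```

In a real plugin the lookup could not simply call term_exists() while the callback is hooked, since that would re-enter the filter. It would need to unhook itself first or query the terms directly.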
I also did some benchmarking using wp-cli, to make sure that the fact that I was doing costly import stuff wasn't skewing the perceived benefit of the filter addition. To test, I did simple term deletion. Predictably, the results are less dramatic for simple, short operations than for long-running ones, but the performance increases I saw are not insignificant.
To test, I made a WXR with 3 categories and 50 posts. Each category was associated with every post, so my test was to measure the cost of term_exists() for a category belonging to 50 posts, both with and without the pre_ filter. Results were as follows (it was a very small set of tests, I'll grant):
|With pre_ filter||Without pre_ filter|
|run time||term_exists %||run time||term_exists %|
I present the data in a table, but the rows are not paired: a filtered run and an unfiltered run that happen to share a row have no other connection. Curiously, the middle run in each set of three tests was something of an outlier. Discarding those and averaging the run times, we get 1.795s for the operation with the filter and 2.25s for the operation without, a reduction of roughly 20%. The moral, then, is that even for short, single-task operations like deletion of a term, we stand to see fairly significant improvements in performance with the addition of a pre_ filter.
The pre_ filter pattern occurs in other places in core, so this seems to me like a pretty common-sense, low-risk, potentially high-gain addition.