Make WordPress Core

Opened 3 months ago

Last modified 5 days ago

#38418 new feature request

Add telemetry (aka usage data collection) as opt-in feature in core

Reported by: mor10 Owned by:
Milestone: Awaiting Review Priority: normal
Severity: normal Version: 4.7
Component: General Keywords:
Focuses: Cc:


Many discussions around changes, additions, and removal of features in WordPress core run into the same problem: We don't have the necessary data to know how end-users are interacting with the application and its features.

To solve this problem I propose the adoption of an opt-in telemetry feature in WordPress core that collects anonymized data on feature and functionality use. This is in line with what major software providers do, and it is a feature most users will be familiar with.

Implementation and activation

  • The opt-in selector for the feature should be surfaced on first install, or when the site is updated to the first version of WordPress containing the feature.
  • For new installs the opt-in question should appear on the 5-minute install page along with "Allow search engines to index the site" or similar.
  • For upgrades, the opt-in question should be revealed in a dedicated modal.
  • The feature should be disabled by default, so that the admin must make an active choice to participate.
  • The feature should be controllable at any time through a dedicated section under Settings->General.
  • Possibly the best way to reassure users that this feature is not a Trojan horse is to ship it as a plugin that auto-installs on opt-in and auto-uninstalls on opt-out.

Data collection

Some core data should always be collected, including but not limited to:

  • Number of themes and plugins installed
  • Frequency of use of specific views (Settings, Customizer, etc)
  • Current version
  • Update status
  • Locale (generalized to country)
  • Language
  • etc
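
As a rough illustration of the data points listed above, here is a minimal Python sketch that assembles a payload only when the admin has opted in. The `SiteState` class, the `build_payload` function, and all field names are hypothetical and are not an existing WordPress API. Splitting a locale string such as "en_US" into language and country is also an assumption about how "generalized to country" might work.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SiteState:
    """Hypothetical snapshot of the values the ticket proposes to collect."""
    opted_in: bool
    wp_version: str
    update_available: bool
    locale: str  # assumed format: "en_US"
    theme_count: int
    plugin_count: int
    view_counts: dict = field(default_factory=dict)  # e.g. {"customizer": 3}


def build_payload(site: SiteState) -> Optional[dict]:
    """Return the telemetry payload, or None when the site has not opted in."""
    if not site.opted_in:
        return None
    # Keep only the coarse parts of the locale: language and country.
    language, _, country = site.locale.partition("_")
    return {
        "version": site.wp_version,
        "update_available": site.update_available,
        "language": language,
        "country": country or None,
        "themes": site.theme_count,
        "plugins": site.plugin_count,
        "views": dict(site.view_counts),
    }
```

Returning `None` for sites that have not opted in keeps the "disabled by default" rule in one place, so no caller can send data by accident.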

In addition, it should be possible to push custom queries to sites that have opted in, to test for specific interactions, for example how many users click the Underline button in TinyMCE. I'm not sure exactly what the best approach is here, but this is one idea: the feature queries a centralized service on a weekly or monthly basis to get instructions on what type of data is currently being collected.
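
One way to picture the polling idea is the Python sketch below. It is only a sketch under stated assumptions: the weekly interval, the JSON manifest shape (`{"collect": [...]}`), the event names, and every function and class name are hypothetical. No network call is made; the manifest is passed in as a string.

```python
import json
from datetime import datetime, timedelta
from typing import Optional

POLL_INTERVAL = timedelta(days=7)  # assumption: weekly polling


def is_poll_due(last_poll: Optional[datetime], now: datetime) -> bool:
    """True when the instruction manifest has never been fetched or is stale."""
    return last_poll is None or now - last_poll >= POLL_INTERVAL


def parse_instructions(raw: str, known_events: set) -> set:
    """Return the requested event names that this client knows how to count."""
    manifest = json.loads(raw)
    requested = set(manifest.get("collect", []))
    # Ignore anything the client does not recognise.
    return requested & known_events


class EventCounter:
    """Counts only the events named in the current instructions."""

    def __init__(self, active: set):
        self.active = active
        self.counts = {}

    def record(self, event: str) -> None:
        if event in self.active:
            self.counts[event] = self.counts.get(event, 0) + 1
```

Intersecting the requested events with a fixed set of known events means the central service can only switch existing counters on or off. It cannot ask a site for arbitrary data.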

The decision on what data to collect should be made by committee, based on currently active tickets that require user data.

Anonymity and transparency

A core requirement for the success of this feature is that data collection must be 100% anonymized. No data collected can be traced back to an individual user. Ideally the feature itself will be built in such a way that even accidental collection of personal data is impossible.
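
One possible way to make accidental collection harder is an allowlist with a strict validator per field, so that anything not explicitly permitted is dropped before sending. The Python sketch below is an illustration only. The field names and patterns are assumptions, and an allowlist reduces the risk of leaking personal data rather than proving it impossible.

```python
import re

# Hypothetical allowlist: field name -> validator. Anything else is dropped.
VALIDATORS = {
    "version": lambda v: isinstance(v, str)
    and re.fullmatch(r"\d+(\.\d+){1,2}", v) is not None,
    "language": lambda v: isinstance(v, str)
    and re.fullmatch(r"[a-z]{2,3}", v) is not None,
    "country": lambda v: isinstance(v, str)
    and re.fullmatch(r"[A-Z]{2}", v) is not None,
    # type(...) is int rejects booleans, which are a subclass of int.
    "themes": lambda v: type(v) is int and v >= 0,
    "plugins": lambda v: type(v) is int and v >= 0,
}


def scrub(payload: dict) -> dict:
    """Keep only allowlisted fields whose values pass their validator."""
    return {
        key: payload[key]
        for key, is_valid in VALIDATORS.items()
        if key in payload and is_valid(payload[key])
    }
```

For example, a payload that also contains an `admin_email` or `site_url` key would come out of `scrub` without those keys, because they are not on the allowlist.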

At any time, information about what data is being collected should be available to end-users both on a dedicated page on WordPress.org and through the setting in admin.

All data collected should be made public, both to ensure transparency and so that it can actually be used.

Practical way forward

To prove the viability of this feature I propose a slow incremental deployment: Start with collection of certain uncontroversial datapoints like current language setting, number of themes and plugins, and one UI interaction that needs testing. Once this MVP has proven itself effective, a larger scale testing program can be shipped.

Change History (9)

This ticket was mentioned in Slack in #design by mor10. View the logs.

3 months ago

#2 @mor10
3 months ago

Related: #38419

#3 @karmatosed
3 months ago

A good thing to bring up and glad the discussion in design on Slack has helped lead to this. I do have a few thoughts on seeing this:

  • Data collection through tracking like this isn't the only data source we should look to. I often think we don't do enough surveys, interviews and other non-tracking methods.
  • That said, doing both would be good; what matters is how we do it and what impact it has.
  • I think whatever data we have should be made anonymous and made public from the start.

At this point I don't have any other thoughts, but I'm interested to see how this develops.

Last edited 3 months ago by karmatosed

#5 @gibrown
7 days ago

There were a number of discussions about this in early October. I think a good summary is that there are a lot of hurdles in the way and currently no one has time to work on it.

@azaozz and @iseulde probably also have opinions. Some of the original chats happened in core-editor also: https://wordpress.slack.com/archives/core-editor/p1475087414000621

The prototype we narrowed in on was:

Why this direction?

  • All of these technologies have been deployed by .org systems folks before; they just need to be adapted to work on .org servers.
  • Getting the systems deployed onto WP.org is a non-trivial step, and is unlikely to be a high priority. So we need a working prototype that can be proven and is likely to be easy to deploy.
  • Most of this will scale pretty well. Not to millions of users, but certainly tracking events for many thousands of users.
  • It's a step in the right direction

Ultimately, I think the biggest blocker is getting someone with the time, inclination, and persistence to work on this. Getting it deployed onto .org is the right thing to do eventually, but I suspect it will take quite a while.

#6 @dd32
7 days ago

IMHO for the initial testing it should just be a MySQL backend, piggybacking on an 'official' plugin such as the WordPress Beta Tester plugin, which is probably the ideal home for telemetry to start with.
We can deploy pixel-type delayed data collection to api.w.org reasonably easily with very little systems work.

Having an MVP running on w.org infrastructure is IMHO the best way forward; hosting it off-site raises questions about data privacy and whether it'll ever get deployed there. Deploying the MVP would also show how valuable the captured information is.
The obvious issue with using MySQL rather than jumping straight to logstash is that it'd be limited in capacity to start with, but that would allow another backend to be designed and deployed transparently.
Data extraction from MySQL for a front-end may also be an issue, as it's obviously limited compared to logstash for the same data.
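
To illustrate how a simple SQL backend could later be swapped out transparently, here is a small Python sketch. It uses SQLite from the standard library purely as a stand-in for MySQL, and the `EventStore` interface, the `SqlEventStore` class, and the table layout are all hypothetical. Only aggregate totals are stored, with no per-site rows.

```python
import sqlite3
from typing import Protocol


class EventStore(Protocol):
    """Hypothetical storage interface; another backend could implement it later."""

    def add(self, event: str, count: int) -> None: ...
    def totals(self) -> dict: ...


class SqlEventStore:
    """SQL-backed store holding one running total per event name."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS events "
            "(name TEXT PRIMARY KEY, total INTEGER NOT NULL)"
        )

    def add(self, event: str, count: int) -> None:
        # Create the row if it is missing, then increment its total.
        self.conn.execute(
            "INSERT OR IGNORE INTO events (name, total) VALUES (?, 0)", (event,)
        )
        self.conn.execute(
            "UPDATE events SET total = total + ? WHERE name = ?", (count, event)
        )
        self.conn.commit()

    def totals(self) -> dict:
        rows = self.conn.execute("SELECT name, total FROM events ORDER BY name")
        return dict(rows.fetchall())
```

Code that depends only on `add` and `totals` would not need to change if the store were later replaced by a different backend.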

The biggest hurdle I see at present would be defining what data is being captured and how.

It would also be valuable to see how our existing stats system can complement or be replaced by the proposal here.
I mention this as most of the stats from the original description are already tracked, just not exposed in any form.
The only new things mentioned here are the frequency of use of specific views (Settings, Customizer, etc) and the transparency part (which would still probably only be anonymised summaries, not exact data).

#7 @lukecavanagh
6 days ago

This seems related to ticket 16778.


This ticket was mentioned in Slack in #core by lukecavanagh. View the logs.

5 days ago

This ticket was mentioned in Slack in #design by lukecavanagh. View the logs.

5 days ago
