Opened 11 months ago

Closed 6 weeks ago

#38418 closed feature request (wontfix)

Add telemetry (aka usage data collection) as opt-in feature in core

Reported by: mor10 Owned by:
Milestone: Priority: normal
Severity: normal Version: 4.7
Component: General Keywords:
Focuses: Cc:

Description

Many discussions around changes, additions, and removal of features in WordPress core run into the same problem: We don't have the necessary data to know how end-users are interacting with the application and its features.

To solve this problem I propose the adoption of an opt-in telemetry feature in WordPress core that collects anonymized data on feature and functionality use. This is in line with what major software providers do, and it is a feature most users will be familiar with.

Implementation and activation

  • The opt-in selector for the feature should be surfaced on first install, or when the site is updated to the first version of WordPress containing the feature.
  • For new installs the opt-in question should appear on the 5-minute install page along with "Allow search engines to index the site" or similar.
  • For upgrades, the opt-in question should be revealed in a dedicated modal.
  • The feature should be disabled by default, so that the admin must make an active choice to participate.
  • The feature should be controllable at any time through a dedicated section under Settings->General
  • It is possible the best way to make users feel this feature is not a Trojan horse is to ship it as a plugin that auto-installs on opt-in and auto-uninstalls on opt-out.

Data collection

Some core data should always be collected, including but not limited to:

  • Number of themes and plugins installed
  • Frequency of use of specific views (Settings, Customizer, etc)
  • Current version
  • Update status
  • Locale (generalized to country)
  • Language
  • etc

In addition, it should be possible to push custom queries to users who have activated the feature to test for specific interactions, for example how many users click the Underline button in TinyMCE. I'm not sure exactly what the best approach is here, but this is one idea: the feature queries a centralized service on a weekly or monthly basis to get instructions on what type of data is currently being collected.
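
One way to picture the polling idea above is the minimal sketch below, written in Python for brevity rather than in WordPress's PHP. Everything in it is an assumption made for illustration: the `Collector` class, the `collect` key in the service response, the event names, and the weekly interval are hypothetical and do not describe any existing WordPress.org API.

```python
import json
from typing import Callable, Dict, Optional, Set

# Hypothetical polling interval: once a week.
POLL_INTERVAL_SECONDS = 7 * 24 * 60 * 60


def parse_instructions(raw: str) -> Set[str]:
    """Parse a (hypothetical) service response into the set of event names to count."""
    doc = json.loads(raw)
    return {str(name) for name in doc.get("collect", [])}


class Collector:
    """Counts only the events the central service currently asks for."""

    def __init__(self, fetch: Callable[[], str], opted_in: bool = False):
        self.fetch = fetch            # injected so no real network call is assumed
        self.opted_in = opted_in      # disabled by default, as the proposal requires
        self.active: Set[str] = set()
        self.counts: Dict[str, int] = {}
        self.last_poll: Optional[float] = None

    def maybe_refresh(self, now: float) -> bool:
        """Fetch new instructions if opted in and the interval has elapsed."""
        if not self.opted_in:
            return False
        if self.last_poll is not None and now - self.last_poll < POLL_INTERVAL_SECONDS:
            return False
        self.active = parse_instructions(self.fetch())
        self.last_poll = now
        return True

    def record(self, event: str) -> None:
        """Increment a counter, ignoring anything not currently requested."""
        if self.opted_in and event in self.active:
            self.counts[event] = self.counts.get(event, 0) + 1
```

In this sketch a site that has not opted in never contacts the service at all, and an opted-in site counts only the interactions named in the latest instructions; everything else is dropped before it is stored.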

The decision on what data to collect should be made by committee, based on currently active tickets that require user data.

Anonymity and transparency

A core requirement for the success of this feature is that data collection must be 100% anonymized. No data collected can be traced back to an individual user. Ideally the feature itself will be built in such a way that even accidental collection of personal data is impossible.
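
As a rough illustration of building the feature so that accidental collection is hard, the sketch below (Python, for brevity) sends only fields named on a fixed allowlist and silently drops everything else. The field names are assumptions loosely based on the data points listed earlier, and `build_payload` is a hypothetical helper, not an existing WordPress function.

```python
# Hypothetical allowlist: field name -> required type.
ALLOWED_FIELDS = {
    "theme_count": int,
    "plugin_count": int,
    "wp_version": str,
    "update_status": str,
    "country": str,
    "language": str,
}


def build_payload(site_data: dict) -> dict:
    """Return only allowlisted fields whose values have the expected type."""
    payload = {}
    for field, expected_type in ALLOWED_FIELDS.items():
        value = site_data.get(field)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, expected_type) and not isinstance(value, bool):
            payload[field] = value
    return payload
```

An allowlist limits what can leave the site, but it does not by itself guarantee anonymity: a combination of otherwise harmless fields can still narrow down an individual site, so the list itself would need review.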

At any time, information about what data is being collected should be available to end-users both on a dedicated page on WordPress.org and through the setting in admin.

All data collected should be made public, both to ensure transparency and to let others scrutinize and use it.

Practical way forward

To prove the viability of this feature I propose a slow incremental deployment: Start with collection of certain uncontroversial datapoints like current language setting, number of themes and plugins, and one UI interaction that needs testing. Once this MVP has proven itself effective, a larger scale testing program can be shipped.

Change History (21)

This ticket was mentioned in Slack in #design by mor10.


11 months ago

#2 @mor10
11 months ago

Related: #38419

#3 @karmatosed
11 months ago

A good thing to bring up, and I'm glad the discussion in design on Slack has triggered this. I do have a few thoughts on seeing this:

  • Data collection through tracking like this isn't the only data source we should look to. I often think we don't do enough surveys, interviews and other non-tracking methods.
  • That said, doing that and having tracking would be good; what matters is how we do it and what impact it has.
  • I think whatever data we have should be made anonymous and made public from the start.

At this point I don't have any other thoughts, but I'm interested to see how this develops.


#5 @gibrown
8 months ago

There were a number of discussions about this in early October. I think a good summary is that there are a lot of hurdles in the way and currently no one has time to work on it.

@azaozz and @iseulde probably also have opinions. Some of the original chats happened in core-editor also: https://wordpress.slack.com/archives/core-editor/p1475087414000621

The prototype we narrowed in on was:

Why this direction?

  • All the technologies have been deployed by .org systems folks before; they just need to be adapted to work on .org servers.
  • Getting the systems deployed onto WP.org is a non-trivial step, and is unlikely to be a high priority. So we need a working prototype that can be proven and is likely to be easy to deploy.
  • Most of this will scale pretty well. Not to millions of users, but certainly tracking events for many thousands of users.
  • It's a step in the right direction

Ultimately, I think the biggest blocker is getting someone with the time, inclination, and persistence to work on this. Getting it deployed onto .org is the right thing to do eventually, but I suspect it will take quite a while.

#6 follow-up: @dd32
8 months ago

IMHO for the initial testing it should just be a MySQL backend, piggybacking on an 'official' plugin such as the WordPress Beta Tester plugin, which is probably the ideal home for telemetry to start with.
We can deploy a pixel-type delayed-data-collection to api.w.org reasonably easily with very little systems work.

Having an MVP running on w.org infrastructure is IMHO the best option forward; hosting it off-site brings into question data privacy and whether it'll ever get deployed there. Deploying the MVP would also show how valuable the information captured is.
The obvious issue with using MySQL rather than jumping straight to logstash would be that it'd be limited in capacity to start with, but that would allow another backend to be designed and deployed transparently.
Data extraction from MySQL for a front-end may also be an issue, as it's obviously limited compared to logstash for the same data.

The biggest hurdle I see at present would be defining what data is being captured and how.


It would also be valuable to see how our existing stats system can complement or be replaced by the proposal here though.
I mention this as most of the stats from the original description are already tracked, just not exposed in any form.
The only new thing mentioned here is the Frequency of use of specific views (Settings, Customizer, etc) and transparency part (Which would still probably only be anonymised summaries, not exact data).

#7 @lukecavanagh
8 months ago

This seems related to ticket #16778.


This ticket was mentioned in Slack in #core by lukecavanagh.


8 months ago

This ticket was mentioned in Slack in #design by lukecavanagh.


8 months ago

#10 in reply to: ↑ 6 @mor10
8 months ago

Replying to dd32:

It would also be valuable to see how our existing stats system can complement or be replaced by the proposal here though.
I mention this as most of the stats from the original description are already tracked, just not exposed in any form.
The only new thing mentioned here is the Frequency of use of specific views (Settings, Customizer, etc) and transparency part (Which would still probably only be anonymised summaries, not exact data).

The main feature of my proposal is the ability for developers and researchers to add custom queries to the data collection when new data is required. This would enable us to learn relevant and time/place-specific data on user behavior and could open the door for real human-centric design decisions. This of course would have to be done while adhering to strict guidelines about privacy and anonymity.

#11 follow-ups: @matt
8 months ago

  • Milestone Awaiting Review deleted
  • Resolution set to wontfix
  • Status changed from new to closed

I realized this was discussed in October, but it is off the table for 2017 as it is not within the three focus areas. If after 2017 it is considered: There is no part of current or potential WP development that is being held back by the lack of this existing, as there are easy and current ways to answer questions with data to the extent it would inform our decisions.

#12 in reply to: ↑ 11 @mor10
8 months ago

Replying to matt:

I realized this was discussed in October, but it is off the table for 2017 as it is not within the three focus areas. If after 2017 it is considered: There is no part of current or potential WP development that is being held back by the lack of this existing, as there are easy and current ways to answer questions with data to the extent it would inform our decisions.

I respectfully disagree. Quantitative user testing falls squarely within the Customizer focus area and is vital for future development of the Customizer. I would argue since the release of the Customizer some years back, it has gone through a multi-year large-scale quantitative user test with incremental tweaks and improvements. This is in line with standard agile development. At this juncture, the Customizer can be considered mature, and moving a mature solution forward requires hard data on usage, use cases, and user needs. This goes beyond standard user testing to large scale data collection, which is what this ticket aims at addressing.

Just within the scope of the Customizer there are large blind spots that could be illuminated through data gathering. Some examples:

  • Are users successful in adding logos to their sites or do they often make more than one attempt before getting the result they want? If the latter, what exactly are they doing? Uploading the same image multiple times, experimenting with cropping, uploading formats not supported (SVG in particular)?
  • How often do admins change menus, and what kind of changes do they undertake? Do they use features like "Open in new tab"? Do they drag-and-drop or use nav buttons? What is their flow (add items, then organize, or add single item, organize, add another item)?
  • How many users change the default colors of the selected theme, and do they use the color picker or add a specific color?
  • Are there items in the Customizer that see very little use (below 5%)?
  • Are there items in the Customizer that are often interacted with but never saved?
  • How often is the hide/show feature in the Customizer sidebar used?
  • How often is the "responsive preview" feature in the Customizer used?
  • this list could go on for a long time

Even with extensive qualitative (in-person) user testing, we would not be able to get actionable data on these types of questions. Only large data sets can provide a reliable answer and uncover surprising results. Considering the scale and reach of WordPress' user base, it is not enough to say the community provides sufficient feedback or we know what people want. The "community" in this context is a small and non-representative subset of total users with a particular interest in the application and its function. The average user is not someone who takes part in the community but rather uses the application as any other tool without active participation and feedback procedures. These are the 80% we build WordPress for, and these are the people we need data from.

This ticket was mentioned in Slack in #core by jeffpaul.


7 months ago

This ticket was mentioned in Slack in #core by desrosj.


6 months ago

#15 @davetgreen
6 weeks ago

Given that Gutenberg now has opt-in usage data collection (https://make.wordpress.org/core/2017/08/06/opt-in-usage-tracking-in-gutenberg/), can this discussion be revisited with a view toward consideration in 2018?

@matt is there any chance this ticket could be re-opened to provoke further discussion instead of it being closed and hidden from view? We're already halfway into Q3 2017, so surely now is as good a time as any to re-visit the benefits of implementing @mor10's suggestion. :)

#16 follow-up: @matt
6 weeks ago

I think it's a terrible idea for Gutenberg too, I doubt that anything actionable or useful will come of it that couldn't be obtained by non-data-collecting means.

#17 @mor10
6 weeks ago

@matt I've seen how important telemetry is for data driven design and I have yet to see you provide any strong arguments against it beyond "we already have data" which begs the question "where is that data?"

Your argument for closing this ticket was that it was "not within the current focus areas." That is no longer the case, as @davetgreen points out. I would like to see this ticket reopened so we can have an open discussion about telemetry. I'm especially interested to know your arguments against the proposals, especially why "it's a terrible idea."

#18 @jcasabona
6 weeks ago

I'm not a regular in these parts but I'm pretty baffled as to why this isn't even up for consideration. Knowing how people actually use WordPress will do nothing but improve the experience for everyone. We no longer need to make decisions based on "I think." We can make them based on, "This is what we know."

Apple is constantly praised for its excellent UX. That's not because Jony Ive is really good at guessing. It's because they have the data. Why can't WordPress be the same way?

#19 in reply to: ↑ 16 @markzahra
6 weeks ago

Replying to matt:

I think it's a terrible idea for Gutenberg too, I doubt that anything actionable or useful will come of it that couldn't be obtained by non-data-collecting means.

What are the other "non-data-collecting means", if I may ask?

Although I'm no expert, and just like @jcasabona I'm not a regular in these parts, I agree with the idea above and have personally seen it put to good use on a smaller scale. I've seen this be rejected twice now, but with no alternatives being proposed and no valid reasons being presented. "I doubt that anything actionable or useful will come of it..." is more of a personal opinion rather than that of an open-source community.

#20 in reply to: ↑ 11 @mor10
6 weeks ago

  • Resolution wontfix deleted
  • Status changed from closed to reopened

Replying to matt:

The reason for closing this ticket is no longer valid:

it is off the table for 2017 as it is not within the three focus areas.

As @davetgreen points out above (comment:15), Gutenberg has officially adopted telemetry in the beta plugin, placing telemetry squarely inside the editor focus area. When Gutenberg merges with core beta, the question of whether telemetry should be part of that merge will come up and will be precedent-setting. Keeping the ticket closed hinders discussion of the merits and issues around telemetry and leaves an important debate to be decided by committee rather than the community. It also prevents discussion from taking place in one centralized location, resulting instead in various conversations happening on GitHub, Twitter, and various blogs.

I'm reopening this ticket to surface the conversation and open the discussion to the community. May its status remain open until the community decides whether there is a future for telemetry in WordPress.

#21 @pento
6 weeks ago

  • Resolution set to wontfix
  • Status changed from reopened to closed

Usage tracking has been removed from Gutenberg, the reason for closing this ticket remains valid.

With that in mind, I'm re-closing this ticket, though everyone is welcome to continue discussing it here, as is the case with any closed ticket. If you're after an idea for when a good time to reopen it would be, getting a lead developer to post their vision of a way forward would be an excellent measure.

In the meantime, the 3 focus areas remain this year's primary goals.

On a personal note, I know how tempting it can be to take ideas you love and try to fit them into one of the focus areas - I have to pull myself up on that as much as anyone. But the only way the focus concept can work is if we remain focussed, and don't allow ourselves to be sidetracked. Gutenberg, Customiser, and REST API are big ideas, they touch most of WordPress. It's easy to find things that could fit into them, and hard to choose the things that do.

Finally, I'd like you all to take a moment to read James' comment on removing tracking from Gutenberg. There's no need to reply, but we've all seen the mini storm that spun up around this in the last few days. Remember that there are real people on the other side of the screen. What we write in comment and tweet boxes has a real impact on people, so please consider that before pressing the send button.

We're all in this to make WordPress the best it can be, so let's try to make a pleasant journey for our friends, colleagues, and co-voyagers, 'kay? :-)
