Opened 7 years ago

Closed 3 years ago

#5066 closed enhancement (maybelater)

Anonymize update checking

Reported by: zamoose
Owned by:
Milestone:
Priority: normal
Severity: normal
Version: 2.9
Component: Administration
Keywords: needs-patch
Focuses:
Cc:

Description

As per this thread, we ought to allow for user opt-out of sending their blog_url and plugin info unencrypted, over the wire, unbidden, at least without a stated Usage Policy, notification and EULA in place.

Attachments (2)

updateoption.diff (2.7 KB) - added by intoxination 7 years ago.
updateoption-2.diff (2.7 KB) - added by intoxination 7 years ago.

Change History (47)

comment:1 johnbillion 7 years ago

I know it's stupidly close to the line, but I think it would be a good move to delay 2.3 and get this into 2.3. Putting it in 2.4 or even 2.3.1 will be too late for the tinfoil hat brigade.

Note that I couldn't give a shit. I'm thinking of the PR here.

comment:2 johnbillion 7 years ago

Actually scrap that last comment.

A good move would be to delay 2.3 by a day or two in order to prepare a bullet-proof press release explaining in detail what the update notification system is, how it works, and what information will be sent by it (for both the core updates and plugin updates). The aim of it being to reassure people that this is a good thing™. Because it is.

comment:3 intoxination 7 years ago

Attached a quick patch. Basically if the option 'update_plugins' is not set then no checks will be done. The nag screen at the top asks the user to select if they want to opt in or out, with a link to the privacy screen. Once they select either yes or no, the nag screen goes away.

I would like to have some other eyes check over the wording real quick.
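
As a rough illustration of the opt-in behaviour described in this comment (not the attached patch itself), the logic could be sketched like this in Python. The option name `update_plugins` comes from the comment; the dict standing in for the options table, the stored values "yes"/"no", and the function names are assumptions made for the example.

```python
from typing import Optional

OPTION_NAME = "update_plugins"  # option name taken from the comment


def get_update_choice(options: dict) -> Optional[bool]:
    """Return True (opted in), False (opted out) or None (no choice made yet)."""
    value = options.get(OPTION_NAME)
    if value is None:
        return None
    return value == "yes"  # assumed storage format: "yes" or "no"


def should_check_for_updates(options: dict) -> bool:
    # Nothing is sent until the user has explicitly opted in.
    return get_update_choice(options) is True


def should_show_nag(options: dict) -> bool:
    # The nag stays visible only while no choice has been recorded.
    return get_update_choice(options) is None
```

With an empty options dict, no check runs and the nag is shown; once either answer is stored, the nag disappears and only "yes" enables checks.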

comment:4 intoxination 7 years ago

  • Keywords has-patch added

OOPS - I attached the wrong patch (guess I should clean the desktop). -2 is better.

comment:5 intoxination 7 years ago

  • Milestone changed from 2.4 to 2.3

Changing to 2.3 in hopes of getting this in.

comment:6 Viper007Bond 7 years ago

  • Keywords 2nd-opinion added
  • Type changed from defect to enhancement
  • Version set to 2.3

-1

Checking for core updates (and possibly plugins) shouldn't be able to be disabled without a plugin. It's just too important.

+1 for an option to adjust sending statistics data (blog URL, etc.).

comment:7 zamoose 7 years ago

There needs to be notification/disclosure of the information being sent, though, which was the initial thrust of my ticket.

comment:8 follow-up: chmac 7 years ago

I support adding 3 options:

  • Option name "Updates"
  • Option choice 1: "WordPress should check for updates automatically and send my blog url."
  • Option choice 2: "WordPress should check for updates automatically but *not* send my blog url."
  • Option choice 3: "WordPress should not check for updates."
  • Then the text: "For more on how this information is used, see the Privacy Policy" - link to the policy online.

If there's good support on wp-hackers I'll write a patch.
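
A minimal sketch of how the three choices listed in this comment might map onto what gets sent, written in Python for illustration only. The enum, the function name, and the payload field names (`version`, `blog_url`) are hypothetical and not taken from WordPress.

```python
from enum import Enum
from typing import Optional


class UpdateMode(Enum):
    CHECK_WITH_URL = 1     # choice 1: check and send the blog url
    CHECK_WITHOUT_URL = 2  # choice 2: check but do not send the blog url
    NO_CHECK = 3           # choice 3: do not check at all


def build_check_payload(mode: UpdateMode, wp_version: str, blog_url: str) -> Optional[dict]:
    """Return the fields to send, or None when no request should be made."""
    if mode is UpdateMode.NO_CHECK:
        return None
    payload = {"version": wp_version}
    if mode is UpdateMode.CHECK_WITH_URL:
        payload["blog_url"] = blog_url
    return payload
```

Keeping the decision in one function means every caller gets the same answer about whether the url may leave the site.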

comment:9 in reply to: ↑ 8 ; follow-up: docwhat 7 years ago

Replying to chmac:

May I suggest that you create two radio buttons?

  • "WordPress should check for updates automatically"
  • "WordPress should not check for updates"

And

  • "WordPress should advertise itself via User-Agent"
  • "WordPress should not advertise itself"

This is because there are lots of User-Agent strings with data that some blog owners don't want transmitted. See bug #5065

Even better would be if there was an option to take "WordPress/$wp_version" out of everything.

If you do the options part, I'd be willing to go around and locate all the places that WordPress leaks its name and version number and change them depending on the option.

Ciao!

comment:10 arnee 7 years ago

+1 for just one single checkbox:

[X] Check for updates of WordPress and Plugins <a href="#">Privacy Policy</a>

All other things should be adjusted via plugins.

comment:11 docwhat 7 years ago

I have added a patch to bug #5065 and added a new bug #5085 for the various generator strings.

The patch in bug #5065 would allow exactly as arnee suggests: turning off the user-agent via a plugin.

Bug #5085 is a little more complicated to make it plugin-patchable.

Ciao!

comment:12 in reply to: ↑ 9 ; follow-up: johnbillion 7 years ago

I'm on the fence about an option to disable update checking, but I don't think it's really necessary. Those who want to disable it can install a relevant plugin - that's what the plugin system is there for, to enable control of options and functionality that a smaller subset of people who use WordPress might want.

To quote Westi:

One of the core design ideas for WordPress is that we don't introduce options lightly. The moment you think of making a feature optional you challenge the argument for introducing the feature in the beginning.

This is exactly why WordPress has a very extensive plugin API. Write a plugin so people can change their user-agent and be done with it.

comment:13 zamoose 7 years ago

There needs to be a way to allow those who are concerned about this issue to ensure that nothing gets sent the first time they load up the WP admin and plugin admin pages. If you have to go to the plugin page (at which point, if my understanding is correct, it will fire off an update attempt) in order to activate the plugin, you've already sent the information at least once. This, to be quite frank, is unacceptable and will preclude users concerned about this issue from using WordPress.

There needs to be an opt-in solution from the beginning, IMNSHO.

comment:14 in reply to: ↑ 12 arnee 7 years ago

Replying to johnbillion:

I'm on the fence about an option to disable update checking, but I don't think it's really necessary. Those who want to disable it can install a relevant plugin - that's what the plugin system is there for, to enable control of options and functionality that a smaller subset of people who use WordPress might want.

You are right, but I don't think that every future release will state in the announcement that WP will look for updates and "phone home". An option to enable or disable the functionality with a link to an explanation or the privacy policy would avoid the "Whaaat? WP phones home? Why didn't I know???" effect because I think that most of the users click through the options pages first. Or include this checkbox in the installation where the other privacy options are set too.

Of course every user may disable the update-check via plugin, but why should users install or look for a plugin if they don't know that this update-feature actually exists?

One checkbox, "enable update-check", at the privacy options and the installation page won't hurt, or?

comment:15 f00f 7 years ago

I noticed that there is negative response to that feature mainly for one reason:
You have missed communicating it. If you transmit data, make the users aware of it, tell them what the data is used for and give them a choice.
I am perfectly ok with sending a list of all my plugins to wp.org, but I do not want to expose which of them are active.
And, as a side note, is it really necessary to transmit the description of every plugin? Just thinking about performance here.

Solving this via a plugin is a good idea, but not that easy as
a) you have to go to plugins page to activate it, thus the update-check will be done once and
b) plugins have little access to what's going on in wp_update_plugins().

Furthermore I like the idea of having a function wp_user_agent() that provides Wordpress/$wp_version but can be overridden by a plugin.
And for the really paranoid I could imagine an option to remove/obfuscate the blog url in http requests.

My plugin (http://wordpress.org/extend/plugins/anonymous-wordpress-plugin-updates/) replaces wp_update_plugins with a version that does not transmit plugin-desc etc. and the list of active plugins. It also replaces WP version and blog url.
I also offer making a patch out of it, but I'd rather wait and see how the discussion goes on.
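
The idea of a `wp_user_agent()` function that a plugin can override could look roughly like the following. This is a Python sketch of the pattern only: the filter registry, `add_user_agent_filter`, and the hard-coded version value are assumptions for illustration, not WordPress code.

```python
from typing import Callable, List

WP_VERSION = "2.3"  # illustrative value only

# Hypothetical registry of callbacks that may rewrite the user agent.
_user_agent_filters: List[Callable[[str], str]] = []


def add_user_agent_filter(callback: Callable[[str], str]) -> None:
    """Register a callback that receives the current string and returns a new one."""
    _user_agent_filters.append(callback)


def wp_user_agent() -> str:
    """Build the default user agent, then let registered callbacks override it."""
    agent = f"WordPress/{WP_VERSION}"
    for callback in _user_agent_filters:
        agent = callback(agent)
    return agent
```

A plugin wanting a generic string would register something like `add_user_agent_filter(lambda agent: "generic-client")`, after which every caller of `wp_user_agent()` gets the replacement.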

comment:16 f00f 7 years ago

  • Keywords privacy added

Just found #5065 which addresses the user agent part.

comment:17 docwhat 7 years ago

How about instead of enabling/disabling the auto-update-notification
feature, we enable/disable the sending of the information that bugs
people?

E.g.:

When receiving and sending services 
(RSS, API, ATOM, etc.) WordPress should:
 [x] Identify as WordPress/2.3 and the blog
     URL when appropriate
 [ ] Be anonymous

Selecting "Be anonymous" would replace the user-agent string with something generic (bug #5065) and remove the generator strings (bug #5085).

Ciao!

comment:18 follow-up: johnbillion 7 years ago

I'm getting bored of this now, but let me just say this:

Selecting "Be anonymous" would replace the user-agent string with something generic (bug #5065) and remove the generator strings (bug #5085).

Having this option will mislead users into thinking that by using it,
their requests to the update server will be anonymous, when in fact
they are not and cannot be.

comment:19 in reply to: ↑ 18 docwhat 7 years ago

Replying to johnbillion:

I'm getting bored of this now, but let me just say this:

Selecting "Be anonymous" would replace the user-agent string with something generic (bug #5065) and remove the generator strings (bug #5085).

Having this option will mislead users into thinking that by using it,
their requests to the update server will be anonymous, when in fact
they are not and cannot be.

Well, the verbiage is easily changed. But I meant the concept more than the verbiage.

Instead of "Be anonymous" how about "Don't identify" or something like that. Less stigma associated and makes the choice that the developers dislike a negative choice (and therefore influences the user to not choose it).

comment:20 ffemtcj 6 years ago

  • Milestone changed from 2.5 to 2.6

comment:21 matt 6 years ago

  • Resolution set to worksforme
  • Status changed from new to closed
  • Summary changed from Make update checking more consumer-friendly to Anonymize update checking

We promoted a plugin about this in the release announcement for 2.3, but it has had very little adoption:

http://wordpress.org/extend/plugins/disable-wordpress-plugin-updates/stats/

Rather than bumping the milestone, let's just close it until there are compelling new arguments.

comment:22 lloydbudd 6 years ago

  • Milestone 2.6 deleted

comment:23 jeffr0 5 years ago

Can someone take a look at this thread http://lists.automattic.com/pipermail/wp-hackers/2009-December/029083.html and see if there have been any more compelling arguments made or is this just one of those topics that's like a circle, never ending?

comment:24 zamoose 5 years ago

  • Cc zamoose added
  • Milestone set to 3.0
  • Resolution worksforme deleted
  • Status changed from closed to reopened
  • Version 2.3 deleted

This ought to be reopened for discussion as per this thread on wp-hackers

comment:25 docwhat 5 years ago

So the information sent is:

  • The version of WordPress you are using - we need this to be able to give you the correct response
  • The versions of PHP and mysql you are using - we need these to be able to make sensible decisions about which versions we should support
  • The locale you are using - so we can offer you the update in your language
  • The url of the site doing the checks - so we can differentiate between different clients in order to aggregate the version numbers correctly.
  • All plugins, active and inactive, in your plugins directory

Which means that this is one-stop shopping for someone who wants to
exploit a wordpress vulnerability.

Which means that if someone breaks into wordpress.org and gets this
information he/she will be able to target exactly which boxes have
which versions of Wordpress, mysql, php, and plugins.

If I was looking to mass-exploit wordpress boxes, this is exactly what
I'd do.

I'd like to propose the following:

  • Use a different identifier instead of URL:
    • Old wordpress installations, will work the same.
    • New ones will have an identifier in the request saying they are using the new update check method.
    • The new check method will request an ID on the first check. This ID will be stored in the wordpress installation for use in the future.
    • In the future, this ID will be used instead of the URL.
    • If the user checks a check box on the privacy page ("Don't send stats") then instead of an ID, a token is sent that tells wordpress.org not to track this request. This prevents bogus IDs from collecting. In addition, non-important information (PHP version, MySQL version, etc.) won't be sent.
  • Modify wordpress.org to stop tracking any old wordpress installations (ones that use URLs as identifiers).
  • Add a description on the privacy page explaining what information wordpress.org collects, for how long it is saved, and why this is useful.
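
The client side of the proposal above might be sketched as follows, in Python for illustration. Everything named here is hypothetical: the token value, the `protocol` marker, the field names, the `store` dict standing in for saved options, and the `request_new_id` callable standing in for the first-check round trip to the server. Sending the locale in both cases is also an assumption, since the proposal only names PHP and MySQL versions as non-important.

```python
from typing import Callable, Optional

NO_TRACK_TOKEN = "no-track"  # hypothetical value for the "do not track" token
PROTOCOL_VERSION = 2         # hypothetical marker for the new check method


def build_update_request(
    store: dict,
    wp_version: str,
    locale: str,
    dont_send_stats: bool,
    request_new_id: Callable[[], str],
    php_version: Optional[str] = None,
    mysql_version: Optional[str] = None,
) -> dict:
    """Build the update-check request using an opaque ID instead of the site URL."""
    request = {"protocol": PROTOCOL_VERSION, "version": wp_version, "locale": locale}
    if dont_send_stats:
        # Opted out: send the token and leave out the non-essential fields.
        request["id"] = NO_TRACK_TOKEN
        return request
    if "update_id" not in store:
        # First check: obtain an ID once and keep it for later requests.
        store["update_id"] = request_new_id()
    request["id"] = store["update_id"]
    request["php"] = php_version
    request["mysql"] = mysql_version
    return request
```

Because the ID is stored after the first check, later requests reuse it, and an opted-out site never asks for an ID at all.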

Ciao!

comment:26 docwhat 5 years ago

Yarg...Editing Wordpress and Trac at the same time is hard.

The list at the top should look like this:

  • The version of WordPress you are using - we need this to be able to give you the correct response.
  • The versions of PHP and mysql you are using - we need these to be able to make sensible decisions about which versions we should support
  • The locale you are using - so we can offer you the update in your language
  • The url of the site doing the checks - so we can differentiate between different clients in order to aggregate the version numbers correctly.
  • All plugins, active and inactive, in your plugins directory

comment:27 follow-up: Denis-de-Bernardy 5 years ago

It also collects theme information, and active plugin information.

Suggesting we close this as wontfix once and for all, personally. I totally hate the idea of sending the site url, but I completely understand the reasoning behind almost all, if not every single bit, of this information:

  • WP version, Plugin version, and Theme version are all needed for update checks
  • Plugin data is needed to identify the plugins; some plugins have similar names, and their directory/file name, author, url, etc. are all needed in order to discriminate, say, My Super Plugin from author A, from My Super Plugin from author B.
  • Theme data is needed to identify the themes, for the very same reason
  • I doubt that a plugin's description gets stored anywhere on wp.org, so maybe we can get rid of that in order to remove network clutter

The url might be questionable... But it seems benign. And I can see reasons for collecting it. As well as reasons for not wanting to change the unique identifier -- namely, blogs would end up changing ID when upgrading.

comment:28 Denis-de-Bernardy 5 years ago

Just adding to my prior comment. I think this whole privacy issue would be a lot less of an issue if the data's key findings (WP version usage, theme usage with version, plugin usage with version, etc.) were made available on a page from wp.org.

comment:29 in reply to: ↑ 27 filosofo 5 years ago

Replying to Denis-de-Bernardy:

  • WP version, Plugin version, and Theme version are all needed for update checks

Not so. A client doesn't need to give its own version to request and receive the current version of WP core, plugins, or themes. In fact a request sans self-identification would take up less bandwidth and the response would involve fewer CPU cycles on the part of the WP.org server. So not only is it not technically necessary, there are technical reasons (i.e. outside the moral issue of privacy) to prefer it.

And to really save bandwidth and CPU cycles, we could push out update info with something like PubSubHubbub or RSSCloud.
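
To illustrate the point that the client can decide for itself, here is a minimal Python sketch in which the client only receives the latest version string and compares it locally. The function names are hypothetical, and the parsing assumes purely numeric dotted versions such as "2.9.1"; suffixes like "-beta" would need extra handling.

```python
def parse_version(version: str) -> tuple:
    """Turn "2.9.1" into (2, 9, 1), ignoring trailing zeros so "2.9.0" equals "2.9"."""
    parts = [int(piece) for piece in version.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def update_available(installed: str, latest: str) -> bool:
    # The comparison happens on the client, so the installed version is never sent.
    return parse_version(latest) > parse_version(installed)
```

The server response is then identical for every client, which is also what makes it cacheable.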

comment:30 follow-up: strider72 5 years ago

Why can't we simply do an MD5 hash of the URL? We need a unique identifier, but do we need to know the actual URL? I can't see any reason why we do.

One line of code in WP to resolve any privacy issues I can imagine....

comment:31 in reply to: ↑ 30 ; follow-up: Denis-de-Bernardy 5 years ago

Replying to strider72:

Why can't we simply do an MD5 hash of the URL? We need a unique identifier, but do we need to know the actual URL? I can't see any reason why we do.

I can see one good reason -- stats. md5($url) cannot go through parse_url() in order to eliminate dup data from the same domain: two sites or more on subdomains, multiple sites in folders, etc.

comment:33 in reply to: ↑ 31 filosofo 5 years ago

Replying to Denis-de-Bernardy:

I can see one good reason -- stats. md5($url) cannot go through parse_url() in order to eliminate dup data from the same domain: two sites or more on subdomains, multiple sites in folders, etc.

If the problem is eliminating duplicate entries per domain, then the solution is simply to parse the URL before hashing it.
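
Parsing before hashing could be sketched like this, using Python's standard library in place of PHP's `parse_url()` and `md5()`. The function name `site_identifier` is hypothetical, and the sketch assumes the URL includes a scheme such as `http://`.

```python
import hashlib
from urllib.parse import urlsplit


def site_identifier(url: str) -> str:
    """Hash only the host part so every site on the same host shares one ID."""
    host = urlsplit(url).hostname or ""  # hostname is returned lowercased
    return hashlib.md5(host.encode("utf-8")).hexdigest()
```

Sites in different folders of one host collapse to a single identifier. Subdomains still hash differently here; folding them together would need a rule for finding the registrable domain, which is beyond this sketch. An unsalted hash of a hostname can also be reversed by hashing a list of known hostnames, so this hides the URL from casual inspection rather than guaranteeing anonymity.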

comment:34 zamoose 5 years ago

Reposting a forum entry I penned over at WPTavern. Reposting it here for the "if it isn't in Trac, it doesn't exist" factor.

A common complaint from those in favor of the status quo is: what harm could there possibly be in storing your URL? Allow me to give two examples to illustrate potential for harm.

Military/Gov't Contractor blogging environment
I don't know how many of you have worked for a military or US gov't contractor in the past, but one of their concerns when it comes to information disclosure is the revelation of employee names to third parties. Foreign intelligence officers (FIOs) from e.g. Syria, North Korea, China, Iran, etc. are always on the lookout for employee names/identifiable information so that they can potentially exploit that person as an intelligence asset, either via compromising their home (or work) machines with spear-phishing attacks or via direct physical surveillance. The vast majority of proprietary/classified information leaks come not through direct technological hacks/cracks but through social engineering and careful use of human factors (see Mitnick, et al.)

In the event that a contractor is using WordPress internal to the company and employing custom themes or plugins whose authors (as is a good practice in the WP community) have identified themselves, the .org site is potentially storing said information in a way directly tied to the company. If WordPress.org inadvertently discloses the information in question, either through human error on their part or through a security breach by FIOs, you now have exploitable humint on discrete employees working for said contractors.

If it was widely known, this fact alone is enough to make most contractors' IT departments ban WordPress outright. At the very least, they will disable update checking in its entirety. Neither of these situations is particularly a good thing.

Political Dissident blogging environment
I referred to FIOs above. For hostile/totalitarian regimes, FIOs generally serve two purposes: exploitation of information from gov't interests and observation/intimidation of political dissidents.

As in the gov't contractor example I gave above, if dissidents are using any custom themes or plugins, their information could be accidentally disclosed to FIOs which could either lead to direct physical danger for themselves (if they are living within the borders of an oppressive state) or, in the case of ex-pats with families that remain behind, danger for their families still under the sway of these states.

Or, say a plugin was written that made your stylesheet go green in support of the protestors in Iran. If WP.org's data was compromised, Iranian IOs would have access to a comprehensive list of folks running said plugin (and a list of everyone using the Farsi locale) and thus be able to narrow their intelligence-gathering and intimidation efforts based solely upon an installed plugin.

If you think I'm being paranoid here, please see the recent examples of Egypt and Cuba jailing political bloggers and the Iranian intelligence services threatening expats (http://online.wsj.com/article/SB125978649644673331.html?mod=WSJ_hpp_LEFTTopStories)

(I'm not even going to get into the area of compulsory legal disclosure of the info -- i.e., a third party brings suit or attempts to get law enforcement to retrieve .org's data based upon discovery or a warrant.)

It's not "just" a URL in these situations, it's real people whose real lives stand to be substantively affected in the event of a disclosure, unintentional or otherwise.

comment:35 zamoose 5 years ago

There are two additional points to consider:

  1. Matt has stated that he will re-open the retention of the info (in specific and in aggregate) w/Automattic's legal counsel "in January or February" (cite).
  2. After Thursday's IRC meetup, Westi reverted the new-for-2.9 default post to the old "Hello World" boilerplate in r12366 as the implementation of Critical-First-Time-Info-As-Post was apparently confusing the tar out of newbies.

I'd like to suggest that, as a portion of the disclosure policy, we include verbiage in version 3.0+ that either explicitly spells out the policy or makes note of the URL on the .org and lets people know why it's important. Also, links to the disabling/anonymizing plugins in said new default post language or in the .org's privacy policy page. In combination, I believe this would answer 99% of the objections currently held (including mine).

comment:36 zamoose 5 years ago

Self correction: "Include verbiage in" the new implementation of the default post/first time info. I think the changes that were reverted were a great start, it's just the implementation as the first default post was rather opaque.

comment:37 intoxination 5 years ago

As one of the originals on this ticket I would like to throw in the issue of the information being sent. I just saw Mark Jaquith's Twitter post stating the items sent when checking for an update - including plugin/theme author's name. When you think about that, it does present another issue. For example -- John Smith may be doing some little anonymous blogging. He wanted some simple feature on his blog that there wasn't a plugin for, so he headed over to wp-hackers and someone said "hey just do this quick filter". Now he uses one of the plugins packaged with WP and grabs the top comment section that includes this information and replaces it all with his own, thinking that no one will ever see it. Of course no one should, but the fact that WP is sending it out means there is always a chance.

Adding also that URLs are not always public. WP is a widely used application, which speaks to the power of it, and in some cases people use it on private corporate intranets complete with their own DNS server and that gives them the power of having internal URLs that aren't for the public's eye.

comment:38 follow-up: chmac 5 years ago

zamoose, you raise some interesting scenarios where privacy becomes very important. I think because your scenarios are all based on wp.org being compromised, it's unlikely to hold much sway with the decision makers.

I see a fundamental issue here. For early versions of WordPress, it was hard to figure out how many times the software was being used. Now with the update mechanism "phoning home", wp.org has a list of every site running their software. There's huge bragging rights, stats geekiness, and all sorts of other benefits to this.

From this perspective, it seems natural that small, obscure, and unlikely privacy concerns would be quashed by the desire to track the number of installs.

The crux for me is that WP phones home immediately, so there's no way to install a plugin to stop that initial call in. One option would be to link the phone home feature, or maybe anonymize it, based on the "Announce this blog to the world" option during install. Honestly though, I think it's unlikely the core devs will do that. Again, it's upside versus downside. Their interest is better served by all sites phoning home.

I think the most sensible scenario all round is a fork. I can see space for a few forks based on different use cases. A privacy focused fork, a security focused fork, and so on. I'd consider them WP flavours. I think using subversion, quilt and one or two custom scripts, it would be minimal work to roll a few custom versions of WordPress and publish them in tgz and svn.

To that end, I've started a discussion here:
http://www.callum-macdonald.com/2009/12/17/proposing-wp-flavours/

comment:39 in reply to: ↑ 38 ; follow-up: docwhat 5 years ago

Replying to chmac:

zamoose, you raise some interesting scenarios where privacy becomes very important. I think because your scenarios are all based on wp.org being compromised, it's unlikely to hold much sway with the decision makers.

WP.org doesn't have to be compromised, a man-in-the-middle attack would work as well. But either way, why wouldn't it hold sway? Unless wordpress.org signs a contract with me that contains penalties for being hacked, I don't see how or why I should trust them with my data.

The fact they don't need it just rubs salt in the wound.

comment:40 in reply to: ↑ 39 chmac 5 years ago

Replying to docwhat:

WP.org doesn't have to be compromised, a man-in-the-middle attack would work as well. But either way, why wouldn't it hold sway? Unless wordpress.org signs a contract with me that contains penalties for being hacked, I don't see how or why I should trust them with my data.

Personally, I agree completely. But, I don't think I'd hold the same view if I were a core dev or an Automattic employee. In that case, I think I'd be willing to tolerate these outside and unlikely problems for a few users. I'd most likely justify my decision saying that it was "for the greater good".

The man in the middle attack is particularly appropriate in the Iranian example. I imagine the Iranian government has the capability to easily monitor all outgoing traffic to the update service without even needing a man in the middle attack. Simple traffic monitoring would probably be sufficient (I'm assuming the data is sent in plain text).

comment:41 hakre 5 years ago

The main problem is, that there was no awareness about security / privacy issues when this was put in the source. Get yourself a plugin that just disables the problematic area and you're fine.

comment:43 hakre 5 years ago

  • Keywords needs-testing added; 2nd-opinion privacy removed

comment:44 Denis-de-Bernardy 5 years ago

  • Keywords needs-patch added; has-patch needs-testing removed
  • Milestone changed from 3.0 to Future Release

Patch is using the wrong approach. The general consensus is that disabling updates should not be in core.

Punting to future given recent developments (the whole idea seems slated to be revisited later this year).

comment:45 dd32 3 years ago

  • Milestone Future Release deleted
  • Resolution set to maybelater
  • Status changed from reopened to closed