Opened 7 years ago
Last modified 6 years ago
#43588 new enhancement
Anonymize commenter IP address once a comment is no longer pending
Reported by: | allendav | Owned by: | |
---|---|---|---|
Milestone: | Awaiting Review | Priority: | normal |
Severity: | normal | Version: | |
Component: | Privacy | Keywords: | needs-patch needs-unit-tests |
Focuses: | | Cc: | |
Description
A commenter's IP address is stored with each comment. The commenter's IP address can often be used to identify a single individual or device at a location.
To enhance commenters' privacy, and to reduce the amount of personal data stored by a WordPress site in preparation for upcoming laws like the GDPR, this issue proposes that once a comment transitions out of pending, core WordPress should zero the final octet of the commenter's IP address, similar to how Google Analytics anonymizes IP addresses.
The rationale for keeping it while a comment is pending is to continue to allow anti-spam access to the IP address which can be used to detect spam.
The rationale for keeping all but the last octet is to still allow statistics to be gathered about the general geographic location of commenters based on the first three octets of the IP address.
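As a rough illustration of the final-octet idea described above, here is a minimal sketch in plain PHP. The function name `example_zero_final_octet()` is hypothetical and is not a core function; the sketch only handles IPv4 and returns any other input unchanged, which is an assumption made for brevity and not part of the proposal.

```php
<?php
/**
 * Hypothetical helper (not a core function): zero the final octet of an
 * IPv4 address. Anything that is not a valid IPv4 address is returned
 * unchanged; IPv6 handling is deliberately left out of this sketch.
 */
function example_zero_final_octet( string $ip ): string {
	// Only dotted-quad IPv4 addresses are processed here.
	if ( false === filter_var( $ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4 ) ) {
		return $ip;
	}

	$octets    = explode( '.', $ip );
	$octets[3] = '0'; // Keep the first three octets, drop the last.

	return implode( '.', $octets );
}
```

The first three octets are kept, so coarse geographic statistics would presumably still be possible, while the individual host is no longer stored.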
Change History (24)
#4
@
6 years ago
- Keywords needs-unit-tests added
The insert and update comment functions come to mind:
- wp_insert_comment()
- wp_update_comment()
where e.g.:
- wp_new_comment() is a wrapper for wp_insert_comment().
- edit_comment() is a wrapper for wp_update_comment().
- wp_filter_comment() is called within wp_new_comment().
So within wp_insert_comment() and wp_update_comment() we could consider something like:
    // Anonymize the comment author's IP, if the comment is not pending.
    // comment_approved may arrive as the string '0', so compare as a string.
    if ( '0' !== (string) $data['comment_approved'] ) {
        $data['comment_author_IP'] = wp_privacy_anonymize_ip( $data['comment_author_IP'] );
    }
where wp_privacy_anonymize_ip() comes from #43545.
Further, we might consider what should happen when the pending comment status is changed with wp_transition_comment_status().
#5
follow-ups:
↓ 7
↓ 9
@
6 years ago
Together with anonymizing the IP address we should probably delete the browser UA. It is used for pretty much the same purpose and has no other role.
However this has to be done after some time. Sometimes comments are "approved" but the user marks them as spam later, or the other way around. In both cases all comment data is still submitted to spam detection services together with the user action. If we remove some of the data before this, it may cause errors.
#7
in reply to:
↑ 5
;
follow-up:
↓ 8
@
6 years ago
Replying to azaozz:
Together with anonymizing the IP address we should probably delete the browser UA. It is used for pretty much the same purpose and has no other role.
However this has to be done after some time. Sometimes comments are "approved" but the user marks them as spam later, or the other way around. In both cases all comment data is still submitted to spam detection services together with the user action. If we remove some of the data before this, it may cause errors.
Agree.
What retention period do you propose?
#8
in reply to:
↑ 7
;
follow-up:
↓ 16
@
6 years ago
Replying to idea15:
Replying to azaozz:
Together with anonymizing the IP address we should probably delete the browser UA. It is used for pretty much the same purpose and has no other role.
However this has to be done after some time. Sometimes comments are "approved" but the user marks them as spam later, or the other way around. In both cases all comment data is still submitted to spam detection services together with the user action. If we remove some of the data before this, it may cause errors.
Agree.
What retention period do you propose?
Let's do 30 days and make it filterable.
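To make the retention idea concrete, here is a minimal sketch of the age check in plain PHP. The function and constant names are hypothetical; in core, the default of 30 days would presumably be passed through apply_filters() under a hook name that has not been decided in this ticket, so the sketch simply takes the period as a parameter.

```php
<?php
// Hypothetical constant for this sketch: seconds in one day.
const EXAMPLE_DAY_IN_SECONDS = 86400;

/**
 * Hypothetical helper: decide whether a comment is old enough for its
 * IP address to be anonymized.
 *
 * @param int $comment_timestamp Unix timestamp of the comment.
 * @param int $now               Current Unix timestamp.
 * @param int $retention_days    Retention period; 30 days as suggested above.
 */
function example_is_past_retention( int $comment_timestamp, int $now, int $retention_days = 30 ): bool {
	return ( $now - $comment_timestamp ) >= $retention_days * EXAMPLE_DAY_IN_SECONDS;
}
```

A scheduled task could then anonymize only those non-pending comments for which this check returns true, leaving recent comments intact for spam reporting.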
#9
in reply to:
↑ 5
;
follow-up:
↓ 11
@
6 years ago
Replying to azaozz:
Together with anonymizing the IP address we should probably delete the browser UA. It is used for pretty much the same purpose and has no other role.
I disagree with deleting the browser UA. As a site owner, I want to know my mix of mobile vs non-mobile users for example so I can know where to invest my marketing. Other site owners may want to know what mix of browsers are visiting their site to determine where to prioritize bug fixes (e.g. we have bug X on Opera - but how many users are visiting our site with that vs fixing bug Y on Edge)
If anything, could we have a separate issue for UA erasure please?
This ticket was mentioned in Slack in #gdpr-compliance by tz-media. View the logs.
6 years ago
#11
in reply to:
↑ 9
;
follow-up:
↓ 12
@
6 years ago
Replying to allendav:
I disagree with deleting the browser UA. As a site owner, I want to know my mix of mobile vs non-mobile users for example so I can...
This makes sense, but we are talking only about people (hopefully not bots) who have left comments on the site. They are a very small percentage of the site visitors, and that data is pretty useless for "statistical purposes" :)
I don't mind either way. Seems anonymizing one piece of data, but keeping the rest of it wouldn't make any significant difference to user privacy.
#12
in reply to:
↑ 11
@
6 years ago
Replying to azaozz:
Replying to allendav:
I disagree with deleting the browser UA. As a site owner, I want to know my mix of mobile vs non-mobile users for example so I can...
This makes sense, but we are talking only about people (hopefully not bots) who have left comments on the site. They are a very small percentage of the site visitors, and that data is pretty useless for "statistical purposes" :)
JFTR: Maybe I would like to make statistics specifically about users who interact with the content on my site.
Maybe we could strip down the user agent during anonymization, to keep essential information while deleting excess data.
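One way to picture "stripping down" the user agent is to reduce it to a coarse browser family and device class. The sketch below is plain PHP with a hypothetical function name and a deliberately short, incomplete pattern list; it is an assumption about what "essential information" could mean, not something agreed in this ticket.

```php
<?php
/**
 * Hypothetical helper: reduce a full user agent string to a coarse
 * "family/device" label such as "Firefox/desktop".
 */
function example_reduce_user_agent( string $user_agent ): string {
	// Very rough device detection based on common substrings.
	$device = preg_match( '/Mobi|Android|iPhone|iPad/i', $user_agent ) ? 'mobile' : 'desktop';

	// Order matters: Edge and Opera UAs also contain "Chrome",
	// and Chrome UAs also contain "Safari".
	$families = array(
		'Edge'    => '/Edge?\//',
		'Opera'   => '/OPR\/|Opera/',
		'Firefox' => '/Firefox\//',
		'Chrome'  => '/Chrome\//',
		'Safari'  => '/Safari\//',
	);

	$browser = 'other';
	foreach ( $families as $name => $pattern ) {
		if ( preg_match( $pattern, $user_agent ) ) {
			$browser = $name;
			break;
		}
	}

	return $browser . '/' . $device;
}
```

The stored value would then still support browser and mobile/desktop statistics, while version numbers, operating system details, and other identifying tokens are dropped.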
This ticket was mentioned in Slack in #gdpr-compliance by mnelson4. View the logs.
6 years ago
#14
@
6 years ago
FYI commenter IP addresses are now getting anonymized when a commenter requests to be forgotten, see https://core.trac.wordpress.org/ticket/43442
#16
in reply to:
↑ 8
@
6 years ago
Replying to allendav:
Replying to idea15:
Replying to azaozz:
Together with anonymizing the IP address we should probably delete the browser UA. It is used for pretty much the same purpose and has no other role.
However this has to be done after some time. Sometimes comments are "approved" but the user marks them as spam later, or the other way around. In both cases all comment data is still submitted to spam detection services together with the user action. If we remove some of the data before this, it may cause errors.
Agree.
What retention period do you propose?
Let's do 30 days and make it filterable.
I think that is a great idea! And an important one, as data may only be stored for as long as it is needed. The idea is to store and keep as little data as possible, and after some time the IP is not needed anymore.
This ticket was mentioned in Slack in #gdpr-compliance by desrosj. View the logs.
6 years ago
#18
@
6 years ago
- Version trunk deleted
Some input from spam detection plugins would be useful here. Some plugins may be checking a comment for spam on edit or when the comment status is changed.
#19
@
6 years ago
Hi,
I am collaborating via the Pluginkollektiv on Antispam Bee.
We use the IP address for three different checks:
- You can whitelist/blacklist countries and block comments which come from a specific country. (We send an anonymized IP to an external service. Currently we anonymize it ourselves; let's see if we could utilize the new anonymize functionality from core for this.)
- We check the local database for spam comments from the same IP. We altered this behavior in the last release and started to save a hash (using wp_create_password($ip)) in the meta data of the comment. We have to see how this plays out for a couple of reasons (e.g. it's quite an expensive check). Our options right now are to get rid of this IP check completely or to strengthen it, because currently we are hooked into comment_post to save the data out of $comment_data. My thought here would be to abandon $comment_data completely and rely on our own IP detection.
- The last check is called fake_ip.
None of those checks rely on the data given by $comment_data (or they soon won't, regardless of your moves, as there are also some filters in play that we need to consider, which others already use to anonymize); we use our own implementation.
With all this said, with regard to Antispam Bee, we are closely monitoring the moves you guys and girls make in core and are very happy you are taking the necessary steps. Thanks a lot for all your work. Even if you didn't save the IP at all, this wouldn't affect us. But I can only speak for Antispam Bee.
#20
@
6 years ago
Some plugins may be checking a comment for spam on edit or when the comment status is changed.
Sorry, I missed that note. In the case of Antispam Bee, we do check on preprocess_comment in wp_new_comment(). We are not hooked into wp_insert_comment or edit_comment.
#21
@
6 years ago
- Keywords gdpr removed
Removing the GDPR keyword. This has been replaced by the new Privacy component and privacy focuses in Trac.
This ticket was mentioned in Slack in #core-privacy by desrosj. View the logs.
6 years ago
#23
@
6 years ago
- Keywords changed from needs-patch, needs-unit-tests to needs-patch needs-unit-tests
This is one of those instances where we need to find the middle ground between privacy and security. It may be necessary for an administrator to want to keep a particular commenter's IP address recorded for security purposes, be it a hack attempt, trolling, harassment, etc. We can easily think of examples where wiping all IP addresses too fast, or too universally, would also delete information which might be essential to protect network or human security.
I would suggest something like six months for a default retention period. By that point, if an admin is aware of a particular commenter creating a particular problem, they can take whatever steps are required.
There are probably statistics/anti-spam plugins in the repository that show which country people are commenting from? In that case I suppose they use that IP?
Not sure if removing the last octet would make a difference or not ...
See also comment from Stefan @stk_jj :
have a look at that: https://github.com/pluginkollektiv/antispam-bee/blob/master/antispam_bee.php#L1376