WordPress.org

Make WordPress Core

Opened 2 months ago

Last modified 19 hours ago

#43588 new enhancement

Anonymize commenter IP address once a comment is no longer pending

Reported by: allendav
Owned by:
Milestone: Awaiting Review
Priority: normal
Severity: normal
Version: trunk
Component: Privacy
Keywords: gdpr needs-patch needs-unit-tests
Focuses:
Cc:

Description

A commenter's IP address is stored with each comment. The commenter's IP address can often be used to identify a single individual or device at a location.

To enhance commenters' privacy, and to reduce the amount of personal data a WordPress site stores in preparation for upcoming laws like the GDPR, this issue proposes that once a comment transitions out of pending, core WordPress should zero the final octet of the commenter's IP address, similar to how Google Analytics anonymizes IP addresses.

The rationale for keeping the full address while a comment is pending is to let anti-spam checks keep using it, since the IP address can help detect spam.

The rationale for keeping all but the last octet is to still allow statistics to be gathered about the general geographic location of commenters based on the first three octets of the IP address.
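A minimal sketch of the proposed anonymization, assuming Google Analytics' documented behaviour (IPv4: zero the last octet; IPv6: zero the last 80 bits). `sketch_anonymize_ip()` is a hypothetical name, not core's `wp_privacy_anonymize_ip()`, and returning `'0.0.0.0'` for unparseable input is likewise an assumption.

```php
<?php
/**
 * Hypothetical helper: Google-Analytics-style IP anonymization.
 * IPv4 keeps the first three octets; IPv6 keeps the first 48 bits
 * (the last 80 bits are zeroed). Not the core implementation.
 */
function sketch_anonymize_ip(string $ip): string
{
    if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) !== false) {
        $bytes = inet_pton($ip); // 4-byte binary string
        $bytes[3] = "\0";        // zero the final octet
        return inet_ntop($bytes);
    }

    if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV6) !== false) {
        $bytes = inet_pton($ip); // 16-byte binary string
        // Keep the first 6 bytes (48 bits), zero the remaining 10 bytes (80 bits).
        $bytes = substr($bytes, 0, 6) . str_repeat("\0", 10);
        return inet_ntop($bytes);
    }

    // Assumption: anything that is not a valid IP becomes an all-zero address.
    return '0.0.0.0';
}
```

For example, `192.168.1.123` becomes `192.168.1.0`, which still allows the coarse geographic statistics described above.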

Change History (16)

#1 @allendav
2 months ago

  • Keywords gdpr added

#2 @allendav
2 months ago

  • Keywords needs-patch added

#3 @casiepa
2 months ago

There are probably statistics/anti-spam plugins in the repository that show which country people are commenting from? In that case, I suppose they use the IP?

Not sure whether removing the last octet would make a difference or not...

See also this pointer from Stefan (@stk_jj): https://github.com/pluginkollektiv/antispam-bee/blob/master/antispam_bee.php#L1376

Last edited 2 months ago by casiepa

#4 @birgire
2 months ago

  • Keywords needs-unit-tests added

The insert and update comment functions come to mind:

  • wp_insert_comment()
  • wp_update_comment()

where, e.g.:

  • wp_new_comment() is a wrapper for wp_insert_comment().
  • edit_comment() is a wrapper for wp_update_comment().
  • wp_filter_comment() is called within wp_new_comment().
  • wp_handle_comment_submission() uses wp_new_comment().

So within wp_insert_comment() and wp_update_comment() we could consider something like:

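// Anonymize the comment author's IP if the comment is not pending.
// comment_approved may be an int (0/1) or a string ('0', '1', 'spam', 'trash'),
// so cast before comparing: a strict `0 !==` check is true for the string '0'.
if ( isset( $data['comment_approved'] ) && '0' !== (string) $data['comment_approved'] ) {
    $data['comment_author_IP'] = wp_privacy_anonymize_ip( $data['comment_author_IP'] );
}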

where wp_privacy_anonymize_ip() comes from #43545.

Further, we might consider:

  • What should happen when the pending comment status is changed with wp_set_comment_status() / wp_transition_comment_status()?
  • What about the author IP in the comment cache?

Last edited 2 months ago by birgire
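For the status-change question, a pure-PHP sketch of the decision a `transition_comment_status` action callback (which receives `$new_status, $old_status, $comment`) might make. The status mapping is modelled on the one wp_transition_comment_status() applies ('0' and 'hold' to 'unapproved', '1' and 'approve' to 'approved'), but both helper names are hypothetical and the mapping should be checked against core.

```php
<?php
/**
 * Hypothetical helper: normalize raw comment_approved values and the
 * wp_set_comment_status() aliases into 'unapproved' / 'approved'.
 * Other statuses ('spam', 'trash', ...) pass through unchanged.
 */
function sketch_normalize_comment_status(string $status): string
{
    $map = [
        '0'       => 'unapproved',
        'hold'    => 'unapproved',
        '1'       => 'approved',
        'approve' => 'approved',
    ];
    return $map[$status] ?? $status;
}

/**
 * Hypothetical helper: true when a comment moves out of the pending queue,
 * the point at which the ticket proposes anonymizing the IP.
 */
function sketch_leaves_pending(string $old_status, string $new_status): bool
{
    return sketch_normalize_comment_status($old_status) === 'unapproved'
        && sketch_normalize_comment_status($new_status) !== 'unapproved';
}
```

Whether such a callback should anonymize immediately or only schedule it is left open here; the sketch only identifies the transition.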

#5 follow-ups: @azaozz
2 months ago

Together with anonymizing the IP address we should probably delete the browser UA. It is used for pretty much the same purpose and has no other role.

However, this has to be done after some time. Sometimes comments are "approved" but the user marks them as spam later, or the other way around. In both cases all comment data is still submitted to spam detection services together with the user action. If we remove some of the data before this, it may cause errors.

#6 @SergeyBiryukov
2 months ago

  • Component changed from General to Comments

#7 in reply to: ↑ 5 ; follow-up: @idea15
8 weeks ago

Replying to azaozz:

Together with anonymizing the IP address we should probably delete the browser UA. It is used for pretty much the same purpose and has no other role.

However, this has to be done after some time. Sometimes comments are "approved" but the user marks them as spam later, or the other way around. In both cases all comment data is still submitted to spam detection services together with the user action. If we remove some of the data before this, it may cause errors.

Agree.

What retention period do you propose?

#8 in reply to: ↑ 7 ; follow-up: @allendav
7 weeks ago

Replying to idea15:

Agree.

What retention period do you propose?

Let's do 30 days and make it filterable.
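A plain-PHP sketch of the 30-day, filterable retention check proposed here. `sketch_retention_expired()` and its `$filter` callable are hypothetical; in WordPress the callable would more likely be an apply_filters() call on a new, not-yet-named hook, with the check run from a scheduled task.

```php
<?php
/**
 * Hypothetical helper: is a comment older than the retention window after
 * which its IP (and possibly its UA) could be anonymized?
 *
 * $filter stands in for a WordPress filter hook (assumption): it receives
 * the default number of days and returns the number of days to use.
 */
function sketch_retention_expired(
    DateTimeImmutable $comment_date,
    DateTimeImmutable $now,
    ?callable $filter = null
): bool {
    $days = 30; // default proposed in this thread
    if ($filter !== null) {
        $days = (int) $filter($days);
    }
    $days = max(0, $days); // DateInterval does not accept negative periods

    $cutoff = $now->sub(new DateInterval('P' . $days . 'D'));
    return $comment_date <= $cutoff;
}
```

Passing the default through a callable keeps 30 days as the baseline while letting a site owner shorten or lengthen the window.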

#9 in reply to: ↑ 5 ; follow-up: @allendav
7 weeks ago

Replying to azaozz:

Together with anonymizing the IP address we should probably delete the browser UA. It is used for pretty much the same purpose and has no other role.

I disagree with deleting the browser UA. As a site owner, I want to know my mix of mobile vs. non-mobile users, for example, so I know where to invest in marketing. Other site owners may want to know which browsers visit their site to decide where to prioritize bug fixes (e.g. we have bug X on Opera, but how many of our visitors use Opera compared with those affected by bug Y on Edge?).

If anything, could we have a separate ticket for UA erasure, please?

This ticket was mentioned in Slack in #gdpr-compliance by tz-media.
7 weeks ago

#11 in reply to: ↑ 9 ; follow-up: @azaozz
7 weeks ago

Replying to allendav:

I disagree with deleting the browser UA. As a site owner, I want to know my mix of mobile vs non-mobile users for example so I can...

This makes sense, but we are talking only about people (hopefully not bots) who have left comments on the site. They are a very small percentage of the site's visitors, and that data is pretty useless for "statistical purposes" :)

I don't mind either way, though anonymizing one piece of data while keeping the rest doesn't seem to make any significant difference to user privacy.

#12 in reply to: ↑ 11 @TZ Media
7 weeks ago

Replying to azaozz:

This makes sense, but we are talking only about people (hopefully not bots) who have left comments on the site. They are a very small percentage of the site's visitors, and that data is pretty useless for "statistical purposes" :)

JFTR: I might want to gather statistics specifically about users who interact with the content on my site.

Maybe we could strip down the user agent during anonymization, keeping the essential information while deleting the excess data.
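A rough sketch of what "stripping down" the user agent could look like: keep only a browser family and a mobile/desktop flag, which covers the statistics mentioned earlier in the thread, and discard version numbers and platform details. The function name and token list are assumptions, and substring matching on UA strings is a heuristic that will misclassify some browsers (for example Chrome on iOS, which reports `CriOS` and would be labelled Safari).

```php
<?php
/**
 * Hypothetical helper: reduce a full User-Agent string to a coarse
 * "Browser (device)" label. Heuristic only, not a reliable UA parser.
 */
function sketch_coarsen_user_agent(string $ua): string
{
    // Order matters: Edge and Opera UAs also contain "Chrome" and "Safari",
    // and Chrome UAs also contain "Safari".
    $families = [
        'Edge'    => '/\bEdg(e|A|iOS)?\//',
        'Opera'   => '/\bOPR\//',
        'Firefox' => '/\bFirefox\//',
        'Chrome'  => '/\bChrome\//',
        'Safari'  => '/\bSafari\//',
    ];

    $browser = 'Other';
    foreach ($families as $name => $pattern) {
        if (preg_match($pattern, $ua) === 1) {
            $browser = $name;
            break;
        }
    }

    // "Mobi" is a common marker in mobile browser UAs.
    $device = (strpos($ua, 'Mobi') !== false) ? 'mobile' : 'desktop';

    return $browser . ' (' . $device . ')';
}
```

Storing only this label would still answer "mobile vs. non-mobile" and "which browsers" questions without keeping the full string.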

This ticket was mentioned in Slack in #gdpr-compliance by mnelson4.
10 days ago

#14 @mnelson4
10 days ago

FYI: commenter IP addresses are now anonymized when a commenter requests to be forgotten; see https://core.trac.wordpress.org/ticket/43442

#15 @desrosj
9 days ago

  • Component changed from Comments to Privacy

Moving to the new Privacy component.

#16 in reply to: ↑ 8 @Michael_Hartl
19 hours ago

Replying to allendav:

Let's do 30 days and make it filterable.

I think that is a great idea, and an important one, since data may only be stored for as long as it is needed. The principle is to store and keep as little data as possible, and after some time the IP is no longer needed.
