Make WordPress Core

Opened 7 years ago

Closed 7 years ago

Last modified 7 years ago

#43442 closed enhancement (fixed)

Add tools for anonymizing of commenters

Reported by: azaozz's profile azaozz Owned by: allendav's profile allendav
Milestone: 4.9.6 Priority: normal
Severity: normal Version:
Component: Privacy Keywords: gdpr has-patch needs-testing commit fixed-major
Focuses: Cc:

Description

Site visitors and logged-in users that submit comments can request the removal of their personal data.

Ideally we should provide tools for both anonymization of the comments and (easier) deletion of comments by email. Then the sire owner(s) would decide which method suits them best. There is a caveat that once anonymized, the comments cannot be attributed to any user and so a request to delete them will not be possible.

To avoid misuse there should be a confirmation email sent to the commenter with a link similar to the reset password link. After clicking that link another email can be sent to the site owner(s) to perform the action.

Attachments (8)

43442_delete.diff (1.7 KB) - added by xkon 7 years ago.
simple function for deleting comments by email
43442_anonymize.diff (1.5 KB) - added by fclaussen 7 years ago.
Anonymization function for comments
43442.diff (3.3 KB) - added by allendav 7 years ago.
Builds on fclaussen's patch but makes it pageable and compatible with the ajax in 43637
43442.2.diff (3.2 KB) - added by allendav 7 years ago.
Updated to match new impl in 43637 (returns number of comments updated per page, etc)
43442.3.diff (3.2 KB) - added by allendav 7 years ago.
Updated to use new anonymization functions recently added to core
43442.4.c.diff (8.6 KB) - added by birgire 7 years ago.
43442.5.diff (9.0 KB) - added by azaozz 7 years ago.
43442.6.diff (13.0 KB) - added by birgire 7 years ago.

Download all attachments as: .zip

Change History (43)

@xkon
7 years ago

simple function for deleting comments by email

@fclaussen
7 years ago

Anonymization function for comments

#1 @fclaussen
7 years ago

  • Keywords has-patch needs-testing added; needs-patch removed

#2 @fclaussen
7 years ago

I believe the email confirmation handling is being taken care of on a separate ticket, isn't it?

#3 @xkon
7 years ago

I've forgotten to update this ticket:

The new goal here would be to go with anonymizing everything to not break front-end UI elements from nested comments etc. So @fclaussen 's diff goes along that path well.

And yes the e-mails should be handled at #43437 afaic.

#4 @fclaussen
7 years ago

Do we want to replace the comment itself for something like deleted ?

#5 @jesperher
7 years ago

I have written an example of a helper function to anonymize data, based on its types.
https://core.trac.wordpress.org/ticket/43545#comment:4
Maybe you guys have som thoughts on this. also if there should be some special handling for comments?

@allendav
7 years ago

Builds on fclaussen's patch but makes it pageable and compatible with the ajax in 43637

#6 @allendav
7 years ago

@fclaussen - I've taken your patch and rolled it into a new one that is compatible with the ajax pump and eraser registration in #43637

@jesperher - I'd like to use your anonymizers when they land - we need anonymizers for author, author email and IP

To test:

cc @azaozz @mikejolley

#7 @azaozz
7 years ago

Looking at 43442.diff, thinking we may be better off doing a custom db query. Updating each comment in a loop would be very ineffective. Perhaps something like:

$wpdb->query(
    $wpdb->prepare( "UPDATE $wpdb->comments SET comment_agent = '', comment_author = %s, comment_author_email = '', comment_author_IP = '0.0.0.0', comment_author_url = '', user_id = 0 WHERE comment_author_email = %s", __( 'Anonymous' ), $email )
);

Of course this will not run any of the actions and filters that run when updating a comment. Thinking that is acceptable since this is a special multi-comments update, not a usual edit-a-comment one.

Last edited 7 years ago by azaozz (previous) (diff)

This ticket was mentioned in Slack in #gdpr-compliance by pento. View the logs.


7 years ago

This ticket was mentioned in Slack in #gdpr-compliance by azaozz. View the logs.


7 years ago

#10 follow-up: @azaozz
7 years ago

Also wondering if we should support deleting the comment text here. This may be required in some cases.

Thinking we should still keep the comment row in the db when the comment has "child comments". Can replace the text with something like:

This content was deleted by the author.

That would mean running a lot more complex db queries. I'll run couple of tests to see which would be faster.

#11 in reply to: ↑ 10 @allendav
7 years ago

Replying to azaozz:

Also wondering if we should support deleting the comment text here. This may be required in some cases.

Thinking we should still keep the comment row in the db when the comment has "child comments". Can replace the text with something like:

This content was deleted by the author.

That would mean running a lot more complex db queries. I'll run couple of tests to see which would be faster.

I see your point - we can't know if a given comment has personal data or not - i.e. it would be difficult to detect things like "I've lived on such and such street for the last 20 years, and this was the best parade yet!"

So... the _expectation_ that comment text _could_ be erased (and replaced with a placeholder, not unlike what you've suggested) is probably there, if not a legal requirement.

So... maybe this should be a checkbox on the erasure UX - checked by default - to erase the comment text as well. What do you think?

(The reason for checking this by default is, 1) like i said, the expectation is probably there in general that comment text will be erased when you erase me from your site and 2) once personal data is erased from comments, finding a user's comments to then erase their comments' content could be impossible

#12 follow-up: @azaozz
7 years ago

Thinking that individual comments containing personal data can be handled directly. The commenter can contact the admin and ask a comment to be edited or deleted. Then the admin can use the email verification/confirmation.

Was mostly thinking to give the site owners the option to also delete the comments completely. In that case we may have to keep comments that have child comments. Currently when a comment parent is deleted we hide all children.

On the other hand the current behavior also makes sense. Yes, child comments are not displayed, but if they were, they will probably be out of context.

This also can be fixed when displaying the comments. Maybe we should leave it as-is for now :)

#13 in reply to: ↑ 12 @TZ Media
7 years ago

Replying to azaozz:

Thinking that individual comments containing personal data can be handled directly. The commenter can contact the admin and ask a comment to be edited or deleted. Then the admin can use the email verification/confirmation.

+1 for this. We should, however, inform the user that personal data inside the comment body will not be touched. Ideally this information should be given to the user before commenting at all, as to avoid putting personal data in a comment body in the first place. And we should give this information prior to anonymizing comments, so deletion of any personal data from the body can occur before anonymizing them.

Still there might be an option to delete the body, or replace it with a replacement as suggested when the comment has children.

@allendav
7 years ago

Updated to match new impl in 43637 (returns number of comments updated per page, etc)

#14 @allendav
7 years ago

Updated patch to work with new ajax on #43637 -- response now includes how many comments were anonymized for the given page.

In addition to requiring #43637 to test, this patch also requires #43602 (coming soon) to test.

cc @azaozz @mikejolley

@allendav
7 years ago

Updated to use new anonymization functions recently added to core

#15 @allendav
7 years ago

Note: In addition to requiring #43637 to test, this patch also requires #43602 to test.

#16 @birgire
7 years ago

Three approaches come to mind regarding how we can update each comment in wp_comments_personal_data_eraser():

  1. Use wp_update_comment().
  2. Use wp_update_comment() with wp_defer_comment_counting() for better performance.
  3. Use $wpdb->update(), I would suggest this instead of $wpdb->query().

I did some timing and query counting over three runs for each approach, with 500 comments:

a) wp_update_comment()
------------------
1. time: 1.627704
   #qs:  2501
2. time: 1.661613
   #qs:  2501
3. time: 1.607060
   #qs:  2501

b) wp_update_comment() + defer counting
------------------
1. time: 0.992301
   #qs:  1004
2. time: 0.928880
   #qs:  1004
3. time: 0.952708
   #qs:  1004

c) $wpdb->update()
------------------
1. time: 0.358650
   #qs:  501
2. time: 0.348916
   #qs:  501
3. time: 0.334349
   #qs:  501

As expected c) is the fastest with lowest number of queries.

Here's the test setup:

global $wpdb;
$count = 500;
$post_id = self::factory()->post->create();
$args = array(
	'comment_post_ID'      => $post_id,
	'comment_author'       => 'Comment Author',
	'comment_author_email' => 'personal@local.host',
	'comment_author_url'   => 'https://local.host/',
	'comment_author_IP'    => '192.168.0.1',
	'comment_date'         => '2018-03-28 20:05:00',
	'comment_agent'        => 'SOME_AGENT',
	'comment_content'      => 'Comment',
);
self::factory()->comment->create_many( $count, $args );

$wpdb->timer_start();
$query_count = $wpdb->num_queries;
wp_comments_personal_data_eraser( $args['comment_author_email'] );
		
printf( 'time: %f' . PHP_EOL, $wpdb->timer_stop() );
printf( '#qs: %d' . PHP_EOL, $wpdb->num_queries - $query_count );

@birgire
7 years ago

#17 follow-ups: @birgire
7 years ago

43442.4.c.diff is a suggestion that:

  • Adds the c) approach (mentioned above) to 43442.3.diff.
  • Adds clean_comment_cache( $comment->comment_ID ) after each comment update.
  • Adjustments according to the Coding Standard.
  • Adds the tests
    • test_wp_comments_personal_data_eraser()
    • test_wp_comments_personal_data_eraser_empty_first_page_output()
    • test_wp_comments_personal_data_eraser_non_empty_first_page_output()
    • test_wp_comments_personal_data_eraser_empty_second_page_output()

I was wondering about the idea of num_items_retained and message in wp_comments_personal_data_eraser():

In the patch I set 'num_items_retained' => count( $comments ) - $num_items_removed, i.e. the number of comments that were not erased due to some error. Is that correct understanding? The message is still an empty array though.

#18 in reply to: ↑ 17 @allendav
7 years ago

Replying to birgire:

I was wondering about the idea of num_items_retained and message in wp_comments_personal_data_eraser():

The idea is that if core or a plugin decides not to erase personal data (e.g. in a WooCommerce Order) it increments num_items_retained and includes a message explaining itself to the admin (e.g. "Order 1234 is less than 180 days old. Personal data is not removed from orders less than 180 days old due to dispute resolution requirements.")

#19 in reply to: ↑ 17 @allendav
7 years ago

Replying to birgire:

43442.4.c.diff is a suggestion that...

@birgire - thank you - that tests nicely! The only downside I see is that core actions/filters associated with wp_update_comment won't fire, but in this particular case I don't think that will be a problem. @azaozz - do you concur?

I was wondering about the idea of num_items_retained and message...

@birgire - Ideally, message would include an array of messages like "Comment 1432 contained personal data but could not be anonymized" - again, I think that is unlikely if not impossible (which is why my own patch returned 0 and an empty array) - which I think would be ok if yours did that as well.

If you do want to have num_item_retained non-zero, then we should also have a message for each failure - that way the admin can make informed decisions about what they want to do next.

This ticket was mentioned in Slack in #gdpr-compliance by casiepa. View the logs.


7 years ago

#21 @desrosj
7 years ago

  • Owner set to allendav
  • Status changed from new to assigned

Moving to the 4.9.6 milestone after consensus was reached in the most recent GDPR chat (https://wordpress.slack.com/archives/C9695RJBW/p1524063200000304).

#22 @birgire
7 years ago

Thanks for the feedback @allendav

  • Yes, it's something to consider, I'm still 50/50 divided between approach b) and c) ;-) I might have had few timeout cases in the past, when deleting hundreds of spam comments in bulk, in wp-admin.
  • Regarding the messages in wp_comments_personal_data_eraser, one way could be to grab possible (rare) db update errors with:
$updated = $wpdb->update( $wpdb->comments, $anonymized_comment, array(
	'comment_ID' => $comment->comment_ID,
) );

if ( $updated ) {
	$num_items_removed++;
} else {
	$messages[] = sprintf( 
		'Comment %d contained personal data but could not be anonymized.', 
		$comment->comment_ID
	);
}

  • Any particular structure here for $messages (like we see e.g. in the $data_to_export array here) ?
  • Another thing to consider is empty input for wp_comments_personal_data_eraser(), it will still erase comments, as it will call get_comments() but without the email filtering.
  • Also the admin page is Remove Personal Data but here we have erase functions. So we have both "remove" and "erase". I think we should consider only one of them for consistency.

@azaozz
7 years ago

#23 @azaozz
7 years ago

In 43442.5.diff: building on 43442.4.c.diff and the above suggestions, added more and better error messages. Also added new filter: 'wp_anonymize_comment' so plugins can prevent anonymization of a particular comment and return an error message with the reason.

@birgire
7 years ago

#24 follow-up: @birgire
7 years ago

I like the idea in 43442.5.diff to provide the user with a link to manually handle comments with anonymization errors.

43442.6.diff additionally:

  • Adds a docblock for the wp_anonymize_comment filter
  • Adjusts two strings to make them translatable
  • Adds translators comments.
  • Adds accessibility text for the _blank link.
  • Adds external-link class for _blank link.
  • Adds an escape for $edit_url.
  • Changes Ah error to An error in a string.
  • Changes $result to $updated.
  • Adds a test for thewp_anonymize_comment filter, to prevent comment anonymization.
  • Adds a test for thewp_anonymize_comment filter, to prevent comment anonymization, with a custom message.

ps: I wonder if we should rename $num_items_removed to $num_items_erased in general?

#25 @azaozz
7 years ago

In 42994:

Privacy: add functionality to anonymize commenters.

Props xkon, fclaussen, allendav, birgire, azaozz.
See #43442.

#26 in reply to: ↑ 24 @azaozz
7 years ago

  • Milestone changed from 5.0 to 4.9.6

Replying to birgire:

I like the idea in 43442.5.diff to provide the user with a link to manually handle comments with anonymization errors.

Yeah, like that idea too. However looked into it a bit more and couldn't find any cases where a comment updating will fail but the comment can still be edited. Removed it for now. If we can prove it's useful, can add it back later.

This ticket was mentioned in Slack in #gdpr-compliance by eric.daams. View the logs.


7 years ago

This ticket was mentioned in Slack in #core by jeffpaul. View the logs.


7 years ago

#29 @azaozz
7 years ago

  • Keywords commit fixed-major added

This ticket was mentioned in Slack in #gdpr-compliance by allendav. View the logs.


7 years ago

#31 @joemcgill
7 years ago

This requires changes from [42971] (see #43545) before merging to the 4.9 branch.

This ticket was mentioned in Slack in #gdpr-compliance by desrosj. View the logs.


7 years ago

#33 @SergeyBiryukov
7 years ago

In 43080:

Privacy: add functionality to anonymize commenters.

Props xkon, fclaussen, allendav, birgire, azaozz.
Merges [42994] to the 4.9 branch.
See #43442.

#34 @SergeyBiryukov
7 years ago

  • Resolution set to fixed
  • Status changed from assigned to closed

#35 @desrosj
7 years ago

  • Component changed from General to Privacy

Moving to the new Privacy component.

Note: See TracTickets for help on using tickets.