#43442 closed enhancement (fixed)
Add tools for anonymizing of commenters
Reported by: | azaozz | Owned by: | allendav |
---|---|---|---|
Milestone: | 4.9.6 | Priority: | normal |
Severity: | normal | Version: | |
Component: | Privacy | Keywords: | gdpr has-patch needs-testing commit fixed-major |
Focuses: | Cc: |
Description
Site visitors and logged-in users that submit comments can request the removal of their personal data.
Ideally we should provide tools for both anonymization of the comments and (easier) deletion of comments by email. Then the sire owner(s) would decide which method suits them best. There is a caveat that once anonymized, the comments cannot be attributed to any user and so a request to delete them will not be possible.
To avoid misuse there should be a confirmation email sent to the commenter with a link similar to the reset password link. After clicking that link another email can be sent to the site owner(s) to perform the action.
Attachments (8)
Change History (43)
#2
@
6 years ago
I believe the email confirmation handling is being taken care of on a separate ticket, isn't it?
#3
@
6 years ago
I've forgotten to update this ticket:
The new goal here would be to go with anonymizing everything to not break front-end UI elements from nested comments etc. So @fclaussen 's diff goes along that path well.
And yes the e-mails should be handled at #43437 afaic.
#5
@
6 years ago
I have written an example of a helper function to anonymize data, based on its types.
https://core.trac.wordpress.org/ticket/43545#comment:4
Maybe you guys have som thoughts on this. also if there should be some special handling for comments?
@
6 years ago
Builds on fclaussen's patch but makes it pageable and compatible with the ajax in 43637
#6
@
6 years ago
@fclaussen - I've taken your patch and rolled it into a new one that is compatible with the ajax pump and eraser registration in #43637
@jesperher - I'd like to use your anonymizers when they land - we need anonymizers for author, author email and IP
To test:
- install 43442.diff
- install 43637.diff from https://core.trac.wordpress.org/ticket/43637
- install the temporary plugin from https://github.com/allendav/wp-privacy-requests and follow the instructions in the README.md in that repo to kick off personal data removal
- note: the temporary plugin will be delivered in #43602 + #43546
- profit!
cc @azaozz @mikejolley
#7
@
6 years ago
Looking at 43442.diff, thinking we may be better off doing a custom db query. Updating each comment in a loop would be very ineffective. Perhaps something like:
$wpdb->query( $wpdb->prepare( "UPDATE $wpdb->comments SET comment_agent = '', comment_author = %s, comment_author_email = '', comment_author_IP = '0.0.0.0', comment_author_url = '', user_id = 0 WHERE comment_author_email = %s", __( 'Anonymous' ), $email ) );
Of course this will not run any of the actions and filters that run when updating a comment. Thinking that is acceptable since this is a special multi-comments update, not a usual edit-a-comment one.
This ticket was mentioned in Slack in #gdpr-compliance by pento. View the logs.
6 years ago
This ticket was mentioned in Slack in #gdpr-compliance by azaozz. View the logs.
6 years ago
#10
follow-up:
↓ 11
@
6 years ago
Also wondering if we should support deleting the comment text here. This may be required in some cases.
Thinking we should still keep the comment row in the db when the comment has "child comments". Can replace the text with something like:
This content was deleted by the author.
That would mean running a lot more complex db queries. I'll run couple of tests to see which would be faster.
#11
in reply to:
↑ 10
@
6 years ago
Replying to azaozz:
Also wondering if we should support deleting the comment text here. This may be required in some cases.
Thinking we should still keep the comment row in the db when the comment has "child comments". Can replace the text with something like:
This content was deleted by the author.That would mean running a lot more complex db queries. I'll run couple of tests to see which would be faster.
I see your point - we can't know if a given comment has personal data or not - i.e. it would be difficult to detect things like "I've lived on such and such street for the last 20 years, and this was the best parade yet!"
So... the _expectation_ that comment text _could_ be erased (and replaced with a placeholder, not unlike what you've suggested) is probably there, if not a legal requirement.
So... maybe this should be a checkbox on the erasure UX - checked by default - to erase the comment text as well. What do you think?
(The reason for checking this by default is, 1) like i said, the expectation is probably there in general that comment text will be erased when you erase me from your site and 2) once personal data is erased from comments, finding a user's comments to then erase their comments' content could be impossible
#12
follow-up:
↓ 13
@
6 years ago
Thinking that individual comments containing personal data can be handled directly. The commenter can contact the admin and ask a comment to be edited or deleted. Then the admin can use the email verification/confirmation.
Was mostly thinking to give the site owners the option to also delete the comments completely. In that case we may have to keep comments that have child comments. Currently when a comment parent is deleted we hide all children.
On the other hand the current behavior also makes sense. Yes, child comments are not displayed, but if they were, they will probably be out of context.
This also can be fixed when displaying the comments. Maybe we should leave it as-is for now :)
#13
in reply to:
↑ 12
@
6 years ago
Replying to azaozz:
Thinking that individual comments containing personal data can be handled directly. The commenter can contact the admin and ask a comment to be edited or deleted. Then the admin can use the email verification/confirmation.
+1 for this. We should, however, inform the user that personal data inside the comment body will not be touched. Ideally this information should be given to the user before commenting at all, as to avoid putting personal data in a comment body in the first place. And we should give this information prior to anonymizing comments, so deletion of any personal data from the body can occur before anonymizing them.
Still there might be an option to delete the body, or replace it with a replacement as suggested when the comment has children.
#16
@
6 years ago
Three approaches come to mind regarding how we can update each comment in wp_comments_personal_data_eraser()
:
- Use
wp_update_comment()
. - Use
wp_update_comment()
withwp_defer_comment_counting()
for better performance. - Use
$wpdb->update()
, I would suggest this instead of$wpdb->query()
.
I did some timing and query counting over three runs for each approach, with 500 comments:
a) wp_update_comment() ------------------ 1. time: 1.627704 #qs: 2501 2. time: 1.661613 #qs: 2501 3. time: 1.607060 #qs: 2501 b) wp_update_comment() + defer counting ------------------ 1. time: 0.992301 #qs: 1004 2. time: 0.928880 #qs: 1004 3. time: 0.952708 #qs: 1004 c) $wpdb->update() ------------------ 1. time: 0.358650 #qs: 501 2. time: 0.348916 #qs: 501 3. time: 0.334349 #qs: 501
As expected c) is the fastest with lowest number of queries.
Here's the test setup:
global $wpdb; $count = 500; $post_id = self::factory()->post->create(); $args = array( 'comment_post_ID' => $post_id, 'comment_author' => 'Comment Author', 'comment_author_email' => 'personal@local.host', 'comment_author_url' => 'https://local.host/', 'comment_author_IP' => '192.168.0.1', 'comment_date' => '2018-03-28 20:05:00', 'comment_agent' => 'SOME_AGENT', 'comment_content' => 'Comment', ); self::factory()->comment->create_many( $count, $args ); $wpdb->timer_start(); $query_count = $wpdb->num_queries; wp_comments_personal_data_eraser( $args['comment_author_email'] ); printf( 'time: %f' . PHP_EOL, $wpdb->timer_stop() ); printf( '#qs: %d' . PHP_EOL, $wpdb->num_queries - $query_count );
#17
follow-ups:
↓ 18
↓ 19
@
6 years ago
43442.4.c.diff is a suggestion that:
- Adds the c) approach (mentioned above) to 43442.3.diff.
- Adds
clean_comment_cache( $comment->comment_ID )
after each comment update. - Adjustments according to the Coding Standard.
- Adds the tests
test_wp_comments_personal_data_eraser()
test_wp_comments_personal_data_eraser_empty_first_page_output()
test_wp_comments_personal_data_eraser_non_empty_first_page_output()
test_wp_comments_personal_data_eraser_empty_second_page_output()
I was wondering about the idea of num_items_retained
and message
in wp_comments_personal_data_eraser()
:
In the patch I set 'num_items_retained' => count( $comments ) - $num_items_removed
, i.e. the number of comments that were not erased due to some error. Is that correct understanding? The message
is still an empty array though.
#18
in reply to:
↑ 17
@
6 years ago
Replying to birgire:
I was wondering about the idea of
num_items_retained
andmessage
inwp_comments_personal_data_eraser()
:
The idea is that if core or a plugin decides not to erase personal data (e.g. in a WooCommerce Order) it increments num_items_retained and includes a message explaining itself to the admin (e.g. "Order 1234 is less than 180 days old. Personal data is not removed from orders less than 180 days old due to dispute resolution requirements.")
#19
in reply to:
↑ 17
@
6 years ago
Replying to birgire:
43442.4.c.diff is a suggestion that...
@birgire - thank you - that tests nicely! The only downside I see is that core actions/filters associated with wp_update_comment won't fire, but in this particular case I don't think that will be a problem. @azaozz - do you concur?
I was wondering about the idea of num_items_retained and message...
@birgire - Ideally, message would include an array of messages like "Comment 1432 contained personal data but could not be anonymized" - again, I think that is unlikely if not impossible (which is why my own patch returned 0 and an empty array) - which I think would be ok if yours did that as well.
If you do want to have num_item_retained non-zero, then we should also have a message for each failure - that way the admin can make informed decisions about what they want to do next.
This ticket was mentioned in Slack in #gdpr-compliance by casiepa. View the logs.
6 years ago
#21
@
6 years ago
- Owner set to allendav
- Status changed from new to assigned
Moving to the 4.9.6 milestone after consensus was reached in the most recent GDPR chat (https://wordpress.slack.com/archives/C9695RJBW/p1524063200000304).
#22
@
6 years ago
Thanks for the feedback @allendav
- Yes, it's something to consider, I'm still 50/50 divided between approach b) and c) ;-) I might have had few timeout cases in the past, when deleting hundreds of spam comments in bulk, in wp-admin.
- Regarding the messages in
wp_comments_personal_data_eraser
, one way could be to grab possible (rare) db update errors with:
$updated = $wpdb->update( $wpdb->comments, $anonymized_comment, array( 'comment_ID' => $comment->comment_ID, ) ); if ( $updated ) { $num_items_removed++; } else { $messages[] = sprintf( 'Comment %d contained personal data but could not be anonymized.', $comment->comment_ID ); }
- Any particular structure here for
$messages
(like we see e.g. in the$data_to_export
array here) ?
- Another thing to consider is empty input for
wp_comments_personal_data_eraser()
, it will still erase comments, as it will callget_comments()
but without the email filtering.
- Also the admin page is Remove Personal Data but here we have erase functions. So we have both "remove" and "erase". I think we should consider only one of them for consistency.
#23
@
6 years ago
In 43442.5.diff: building on 43442.4.c.diff and the above suggestions, added more and better error messages. Also added new filter: 'wp_anonymize_comment'
so plugins can prevent anonymization of a particular comment and return an error message with the reason.
#24
follow-up:
↓ 26
@
6 years ago
I like the idea in 43442.5.diff to provide the user with a link to manually handle comments with anonymization errors.
43442.6.diff additionally:
- Adds a docblock for the
wp_anonymize_comment
filter - Adjusts two strings to make them translatable
- Adds translators comments.
- Adds accessibility text for the
_blank
link. - Adds
external-link
class for_blank
link. - Adds an escape for
$edit_url
. - Changes
Ah error
toAn error
in a string. - Changes
$result
to$updated
. - Adds a test for the
wp_anonymize_comment
filter, to prevent comment anonymization. - Adds a test for the
wp_anonymize_comment
filter, to prevent comment anonymization, with a custom message.
ps: I wonder if we should rename $num_items_removed
to $num_items_erased
in general?
#26
in reply to:
↑ 24
@
6 years ago
- Milestone changed from 5.0 to 4.9.6
Replying to birgire:
I like the idea in 43442.5.diff to provide the user with a link to manually handle comments with anonymization errors.
Yeah, like that idea too. However looked into it a bit more and couldn't find any cases where a comment updating will fail but the comment can still be edited. Removed it for now. If we can prove it's useful, can add it back later.
simple function for deleting comments by email