Make WordPress Core

Opened 7 years ago

Closed 6 years ago

Last modified 6 years ago

#43440 closed enhancement (fixed)

Add personal data from comments to personal data export

Reported by: azaozz's profile azaozz Owned by: azaozz's profile azaozz
Milestone: 4.9.6 Priority: normal
Severity: normal Version:
Component: Privacy Keywords: gdpr has-patch has-unit-tests fixed-major commit
Focuses: Cc:

Description

The "right to data portability" in GDPR also applies to site visitors that submit comments. The personal data they provide is their name, email address and (optionally) site URL.

For sites to be able to comply with the GDPR we will have to add tools (UI and a method) to both show that data and export it as JSON.

As the only private data is the user email address, thinking that we don't need a confirmation and can simply email the data and the export file to the user at their email.

Attachments (10)

43440.diff (1.1 KB) - added by xkon 7 years ago.
export comments
43440.1.diff (1.6 KB) - added by xkon 7 years ago.
minor fixes and correct json output
43440.2.diff (3.1 KB) - added by allendav 7 years ago.
Comment exporter ported from 43438
43440.3.diff (3.5 KB) - added by azaozz 6 years ago.
43440.4.diff (3.4 KB) - added by azaozz 6 years ago.
43440.5.diff (3.9 KB) - added by allendav 6 years ago.
Add additional data from comment row to export; structure export to allow grouping and to allow de-duplication when the same object appears in the export twice
43440.5.png (851.0 KB) - added by allendav 6 years ago.
Screenshot example of new export structure
43440.6.diff (7.7 KB) - added by birgire 6 years ago.
43440.7.diff (7.6 KB) - added by birgire 6 years ago.
43440.8.diff (2.2 KB) - added by desrosj 6 years ago.

Download all attachments as: .zip

Change History (50)

#1 @azaozz
7 years ago

As discussed in Slack computer generated data like IDs, IP addresses, browser UAs, etc. are not data entered by the user, cannot be reused at another site, and should not be exported.

@xkon
7 years ago

export comments

#2 @xkon
7 years ago

Good morning!

As we're waiting for the Tools page to get created I thought of going forward a bit with this to at least have a snippet, broad idea of what we could easily do. Then we can easily implement it inside the page and adjust it to our needs of export.

43440.diff adds a simple function that utilizes the already-existing get_comments() and returns either the json or just a notification string for now. So basically it will return anything that get_comments() returns as well.

The idea of this is to be able to do something like :

$export = export_comments('wapuu@wordpress.example');
var_dump( $export );

I will clean this up a bit and proceed to return the array/json we want for this asap.

Last edited 7 years ago by xkon (previous) (diff)

@xkon
7 years ago

minor fixes and correct json output

#3 @xkon
7 years ago

43440.1.diff just cleans up the process a bit and returns a JSON with the:

date
author_email
author_name
author_url
comment

I added the date as a means of reference, I'm sure it will somehow be needed eventually if somebody comes to the use of the export function.

As an example if we do :

$exports = export_comments( 'wapuu@wordpress.example' );
error_log( print_r( $exports, true ) );

We can get:

[  
   {  
      "date":"2018-03-06 06:58:01",
      "author_email":"wapuu@wordpress.example",
      "author_name":"A WordPress Commenter",
      "author_url":"https:\/\/wordpress.org\/",
      "comment":"Hi, this is a comment.\r\nTo get started with moderating, editing, and deleting comments, please visit the Comments screen in the dashboard.\r\nCommenter avatars come from Gravatar<\/a>."
   },
   {  
      "date":"2018-03-06 06:58:01",
      "author_email":"wapuu@wordpress.example",
      "author_name":"A WordPress Commenter",
      "author_url":"https:\/\/wordpress.org\/",
      "comment":"Hi, this is a comment.\r\nTo get started with moderating, editing, and deleting comments, please visit the Comments screen in the dashboard.\r\nCommenter avatars come from Gravatar<\/a>."
   }
]

We can either show this JSON for copy/paste in a placeholder or make it a file etc on the actual Tools page.

IMHO as this pretty much extends the current comments functions either way, it might be useful in the future in different areas as well if we want to add more things or make it create a wider range of options / exports.

#4 @xkon
7 years ago

  • Keywords has-patch 2nd-opinion added; needs-patch removed

#5 follow-up: @xkon
7 years ago

First of all I was going to merge this idea with #43442 as well by creating a single function that handles an $arg as an action. So 1 function could tackle deletion/anonymization/export/display of comments depending on what we want for the Tools page: for example handle_comments( some@where.com, 'delete' ); etc. This would've been easy BUT!

@netweb brought to my attention on this ticket that there might me 10.000 comments and because of that we should consider using the Export API that already works with batches so it should be perfect for websites with massive ammounts of comments etc instead of re-creating something that exists.

The thing is that the current Export API is only handling WXR but there's already a ticket around for -oh well long-enough- to enable more support for extra files / outputs #22435.

I'd like @azaozz and @allendav inputs if possible on this ( and everybody else that likes to follow along it be great of course! ), since we had the chat last night to see what is best fitting for now as we're working under a deadline etc. I'm not very well versed in all matters of core still so your inputs will definitely give a better opinion and shed a light somehow.

#6 in reply to: ↑ 5 @azaozz
7 years ago

Replying to xkon:

First of all I was going to merge this idea with #43442 as well by creating a single function that handles an $arg as an action. So 1 function could tackle deletion/anonymization/export/display of comments depending on what we want for the Tools page: for example handle_comments( some@where.com, 'delete' ); etc. This would've been easy BUT!

I'd rather we have separate functions for each action. Even if we handle all cases in one "low level" function we should have separate "helpers" for each action that will fire a filter or an action for plugins to hook in, etc.

@netweb brought to my attention on this ticket that there might me 10.000 comments and because of that we should consider using the Export API that already works with batches so it should be perfect for websites with massive ammounts of comments etc instead of re-creating something that exists.

I'm not so sure we need that. Even with an extreme amount of comments we do a similar DB query every time when displaying comments on each post on the front-end. The important part is that we have an index on comment_author_email that speeds things up considerably.

Of course writing to the DB when deleting or anonymizing will be slower, but still think we can pull that off without batching. Lets do it "directly" for now and optimize for writing later if needed. Possible optimization can be to select the comment_ID in batches of lets say 500 and use that to limit the writing.

#7 @allendav
7 years ago

  • Summary changed from Add tools to show and export the personal data of commenters to Add personal data from comments to personal data export

Re-focusing this ticket to just "Add personal data from comments to personal data export" per ticket scrub discussion in Making WordPress gdpr-compliance chat today. This ticket relies on the infrastructure added in #43438.

Last edited 6 years ago by SergeyBiryukov (previous) (diff)

#8 @allendav
7 years ago

@azaozz wrote in https://core.trac.wordpress.org/ticket/43438#comment:18

For getting only unique data from the comments query in 43438.3.diff I think we can try something like...

I don't think we can - that will only guarantee unique data within that page of that exporter's results. Since this exporter will be an example for others to follow for their own plugins, I'd like to keep this ticket as simple as possible and unique-ify data when we create the export file in #43551

edit: That said, I'm going to simplify the name in the name - value pair from things like "Comment 13 : Comment Author" to "Comment Author" which will help us unique-ify and cut down on repetition in #43551

Last edited 7 years ago by allendav (previous) (diff)

@allendav
7 years ago

Comment exporter ported from 43438

#9 @allendav
7 years ago

FYI - testing this patch now requires also applying the patches from #43438 and from #43546

#10 follow-up: @allendav
6 years ago

Note: It is possible that we will need to export not just obvious unique personal data for a user (e.g. author_email) from objects like comments, but every "field" in a comment that user made. If so, this means that the name->value approach would result in extremely large files since we 1) wouldn't be able to de-duplicate since 2) we'd probably keep each exported comment object "intact" (i.e. with all its fields together, so the export made more sense.)

Hope to have an update shortly from discussion with stakeholders.

cc @azaozz

#11 in reply to: ↑ 10 @azaozz
6 years ago

Replying to allendav:

As far as I understand we are "exporting" private or identifiable user data, not dumping all rows from the comments table in the DB, see https://core.trac.wordpress.org/ticket/43438#comment:31.

If we have to "export" all comments a user ever made, perhaps we should be using the WP exporter, no need of separate functions.

@azaozz
6 years ago

#12 @azaozz
6 years ago

In 43440.3.diff: if we don't need to dump all comments rows, tweaked 43440.2.diff to output structured data "map" without duplicates. The data looks like this:

    [data] => Array
        (
            [comment_author] => Array
                (
                    [name] => Comment Author
                    [values] => Array
                        (
                            [0] => Andrew
                        )

                )

            [comment_author_email] => Array
                (
                    [name] => Comment Author Email
                    [values] => Array
                        (
                            [0] => admin@local.host
                        )

                )

            [comment_author_url] => Array
                (
                    [name] => Comment Author URL
                    [values] => Array
                        (
                        )

                )

            [comment_author_IP] => Array
                (
                    [name] => Comment Author IP
                    [values] => Array
                        (
                            [0] => 127.0.0.1
                            [1] => 192.168.0.166
                            [2] => 192.168.0.152
                        )

                )

            [comment_agent] => Array
                (
                    [name] => Comment Agent
                    [values] => Array
                        (
                            [0] => Mozilla/5.0 (Windows NT 6.1; WOW64; rv:10.0.2) Gecko/20100101 Firefox/10.0.2
                            [1] => Mozilla/5.0 (Windows NT 6.1; WOW64; rv:11.0) Gecko/20100101 Firefox/11.0
                            [2] => Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0
                            .....

@azaozz
6 years ago

#13 follow-up: @azaozz
6 years ago

Also, in 43440.4.diff: we can determine when we're done when $done = count( $comments ) < $number; to avoid one more loop. That leaves the edge case when count( $comments ) exactly matches $number but thinking we can live with that :)

#14 in reply to: ↑ 13 @allendav
6 years ago

Replying to azaozz:

Also, in 43440.4.diff: we can determine when we're done when $done = count( $comments ) < $number; to avoid one more loop. That leaves the edge case when count( $comments ) exactly matches $number but thinking we can live with that :)

Nice!

#15 @allendav
6 years ago

@azaozz - based on lengthy discussion with my stakeholders, it would be better if this feature exported more of the comment row to make the exported data actually useful/understandable for the user (and not just include the columns that have their personal info in them). So... I’d include the following at this point for EACH comment:

Comment Author Username or Name (comment_author)
Comment Author Email (comment_author_email)
Comment Author URL (comment_author_url), if provided
Comment Author IP Address (comment_author_IP)
Comment Author Agent (comment_agent), if provided
Comment Text (or at least an excerpt)
Datetime (in blog time zone)
URL (if approved)

I would NOT export these columns but I would expect the site owner to disclose in a site’s privacy policy that they may be calculated automatically (e.g. by Akismet)

Approved/Pending/Spam
Karma

And I’d not bother exporting these columns for a comment at all, as they won’t be meaningful to the user, don’t contain personal data, and we shouldn’t include other’s comments in the export for a given user (so therefore we won’t be exporting comment trees and won’t need parent IDs):

Comment ID
Parent Comment ID, if any
Datetime (in GMT)
Post ID
Comment Type
Commenter’s User ID, if any

I'll update your patch to pull this all in - we'll need to keep the meta for a given comment together now, so I'll probably change your array structure a bit but we'll keep the de-duplication concept for plugins like WooCommerce that adds additional meta data to comments (for product review ratings) so the export can keep those items together and cut down on the export size a bit.

@allendav
6 years ago

Add additional data from comment row to export; structure export to allow grouping and to allow de-duplication when the same object appears in the export twice

@allendav
6 years ago

Screenshot example of new export structure

#16 @allendav
6 years ago

@azaozz - in 43440.5 I took your patch and modified it to include the additional data listed in https://core.trac.wordpress.org/ticket/43440#comment:15

With this change, it will be easier for the end user of the export to understand the overall comment context in which their personal data is stored.

Also, since the original proposal (an array of name-value pairs) makes it impossible to represent personal data in context, I've nested the comment data as discussed and I added the concept of a item_id to allow us to collect all the personal data that core or plugins (e.g. WooCommerce product ratings) add to a given comment and I've added the concept of a group_id and group_label so we can present all the comments together in the export HTML report, one after the other - not unlike how Facebook groups things in their export.

We can have additional groups for posts, attachments, usermeta, etc and plugins can add their own if needed.

Unfortunately, this (presenting personal data in context) means that we can't really de-duplicate things like IP addresses or user agent that repeat across comments, so I had to remove that code from your patch.

To test this patch, you'll need:

This patch (43440.5)
Patch 43438.6 (for exporter filters and wp-admin ajax)
Patch 43443.4 (for sending the confirmation email to the user and getting the request in the list so you can kick off the export)
And this plugin (until 43546 is ready): https://github.com/allendav/wp-privacy-requests (to see the list under wp-admin > Tools > Personal Data Export)

And then, after installing all that:

Make a few comments on your site if none exist
Open the Javascript console for the browser tab if not already open
Go to wp-admin > Tools > Personal Data Export
Select "Personal Data Export" as the Type of Action to Request
Enter the commenter's email address and hit "Send Request"
It is not necessary to actually confirm the request if you don't want to
You should see the request in the WP List Table
Hover over the email and you should see a "Download" action. Click it.
The download should successfully complete and the comment data should appear in the console.
(ZIP file download is not yet available - but soon will be)

Please let me know what you think.

#17 @birgire
6 years ago

Will GDPR support in WordPress be backported to older versions?

If not then in wp_comments_personal_data_exporter() one could use paged for pagination in get_comments() instead of calculating an $offset.

This is supported in WP_Comment_Query since 4.9 (#38268)

If it will be backported, then I wonder if it will be more clear to use a "1-based" pagination, instead of "0-based".

As it might be confusing that page=0 is the first page, and page=1 is the second page, etc.

So instead of:

$offset = $number * $page; 

where $page >= 0, we would use:

$offset = $number * ( $page - 1 ); 

where $page >= 1.

#18 follow-up: @azaozz
6 years ago

  • Owner set to azaozz
  • Resolution set to fixed
  • Status changed from new to closed

In 42888:

Privacy: add functionality for exporting personal data from comments.

Props allendav, xkon.
Fixes #43440.

#19 in reply to: ↑ 18 @allendav
6 years ago

Replying to azaozz:

In 42888:

Privacy: add functionality for exporting personal data from comments.

Props allendav, xkon.
Fixes #43440.

Thank you @azaozz ! This patch works in concert with #43438 - will that land soon too as well?

#20 @ocean90
6 years ago

  • Keywords needs-patch added; has-patch 2nd-opinion removed
  • Resolution fixed deleted
  • Status changed from closed to reopened

[42888] has a few issues which need to be addressed:

  • The @since tag is missing for the new functions.
  • @param and @return shouldn't be aligned.
  • Comments should end with a period.
  • Multi-line comments should start with a /*.
  • "Comment Agent" should be "Comment Author User Agent".
  • trim( strtolower( $email_address ) ) should be strtolower( trim( $email_address ) ), strtolower() doesn't produce whitespace characters.
  • After the switch keyword should be a whitespace.
  • In associative arrays with more than one item each item should start on a new line.
  • For variable property names we usually use curly braces, so $comment->{$key};.
  • It doesn't seem like comment meta are used, can we pass update_comment_meta_cache to get_comments()?

@birgire
6 years ago

#21 @birgire
6 years ago

  • Keywords has-patch has-unit-tests added; needs-patch removed

43440.6.diff includes:

  • the adjustments suggested by @ocean90 according to the Coding Standard
  • the follwoing tests:
    • test_wp_comments_personal_data_exporter()
    • test_wp_comments_personal_data_exporter_with_no_comments_found()
    • test_wp_comments_personal_data_exporter_with_empty_comment_prop()
    • test_wp_comments_personal_data_exporter_second_page()

@birgire
6 years ago

#23 @birgire
6 years ago

43440.7.diff has minor adjustments for the tests, like using assertTrue() instead of assertSame( true, ...) and adjusting the test method names.

#24 @desrosj
6 years ago

  • Milestone changed from 5.0 to 4.9.6

Moving to the 4.9.6 milestone after consensus was reached in the most recent GDPR chat (https://wordpress.slack.com/archives/C9695RJBW/p1524063200000304).

#25 @azaozz
6 years ago

In 42987:

Privacy: fix docs, formatting, white space, add tests for the personal data from comments exporter.

Props birgire.
See #43440.

#26 @jorbin
6 years ago

In 42989:

Privacy: Fix JSHint errors

Introduced in [42986].

See #43440.

#27 @jorbin
6 years ago

In 42990:

Remove accidental addition to package.json

Introduced in [42989].

See #43440.
Unprops jorbin.

#28 @jorbin
6 years ago

In 42991:

Wow. I really shouldn't try to fix the build.

Previous [42989] [42986].
See #43440.

This ticket was mentioned in Slack in #core by jeffpaul. View the logs.


6 years ago

#30 @azaozz
6 years ago

  • Keywords fixed-major added

#31 @azaozz
6 years ago

  • Keywords commit added

This ticket was mentioned in Slack in #gdpr-compliance by desrosj. View the logs.


6 years ago

@desrosj
6 years ago

#33 @desrosj
6 years ago

43440.8.diff addresses the remainder of the feedback from @ocean90.

  • @param and @return shouldn't be aligned.
  • "Comment Agent" should be "Comment Author User Agent".
  • For variable property names, we usually use curly braces, so $comment->{$key};.
  • Comment meta is not used, pass update_comment_meta_cache to get_comments()?

#34 @azaozz
6 years ago

In 43058:

Privacy: docs fixes and improvements for wp_comments_personal_data_exporter().

Props desrosj.
See #43440.

#35 @SergeyBiryukov
6 years ago

In 43076:

Privacy: add functionality for exporting personal data from comments.

Props allendav, xkon.
Merges [42888] to the 4.9 branch.
See #43440.

#36 @SergeyBiryukov
6 years ago

In 43077:

Privacy: fix docs, formatting, white space, add tests for the personal data from comments exporter.

Props birgire.
Merges [42987] to the 4.9 branch.
See #43440.

#37 @SergeyBiryukov
6 years ago

In 43078:

Privacy: Fix JSHint errors.

Introduced in [42986].

Props jorbin.
Merges [42989] to the 4.9 branch.
See #43440.

#38 @SergeyBiryukov
6 years ago

In 43079:

Privacy: docs fixes and improvements for wp_comments_personal_data_exporter().

Props desrosj.
Merges [43058] to the 4.9 branch.
See #43440.

#39 @SergeyBiryukov
6 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed

#40 @desrosj
6 years ago

  • Component changed from Administration to Privacy

Moving to the new Privacy component.

Note: See TracTickets for help on using tickets.