#43545 closed enhancement (fixed)
Helper functions: Anonymizing data in a standardized way
Reported by: | dejliglama | Owned by: | azaozz |
---|---|---|---|
Milestone: | 4.9.6 | Priority: | normal |
Severity: | normal | Version: | 5.1 |
Component: | Privacy | Keywords: | needs-patch gdpr |
Focuses: | Cc: |
Description
To help GDPR Compliance with the anonymization functionality, we need to consider
- returning anonymized data based on the type of field holding the data eg. e-mail, address, number sequences and other
- returning data that can't be misunderstood as real data (making sense locally as well as to the administrator of a site
- ?
A proposal has been to consider a general format with a prefix, an MD5 hash of the data, and a postfix to make the data compliant with the field type.
eg:
johndoe@… > [deleted]-[MD5 of Johndoe@…][postfix]
The e-mail would then become: deleted-0c47598f5738e73c843552572a371543@…
The prefix and postfix of each field type should be localized.
Furthermore, field data that doesn't require MD5 hashing like standard addresses: 123 Main Street - could be localized.
Attachments (9)
Change History (57)
#4
follow-up:
↓ 18
@
6 years ago
We could do something like this:
<?php function wp_privacy_anonymize_data( $data_to_anonymize = '', $type = 'text' ) { switch ( $type ) { case 'email': $data_to_anonymize = ... break ... ... } return $data_to_anonymize; }
That would mean we could expand it with multiple types, and also allow filtering on the data to be returned.
fx:
<?php /** * Anonymizes data for the GDPR "Right to Anonymization". * * @since 5.0.0 * * @param string $data_to_anonymize. * @param string $type the type of the data to be anonymized. * * @return string $data_to_anonymize as anonymized. */ function wp_privacy_anonymize_data( $data_to_anonymize = '', $type = 'text' ) { if( ! $data_to_anonymize) { return false; } $type = trim( strtolower( $type ) ); $orginal_data_to_anonymize = $data_to_anonymize; switch ( $type ) { case 'id': // should we ever make something to do this? break; case 'email': /* translators: IP to anonymize */ $data_to_anonymize = __('anonymized-email') . '-' . wp_hash( $data_to_anonymize ) .'@example.com'; break; case 'url': /* translators: url to anonymize */ $data_to_anonymize = __('example.com'); break; case 'ip': /* translators: IP to anonymize */ $data_to_anonymize = __('0.0.0.0'); break; case 'agent': break; case 'float': break; case 'number': case 'int': break; case 'date': //should we ever make something to do this? - can dates be personal? break; case 'name': break; case 'text': /* translators: text to anonymize */ $data_to_anonymize = __('anonymized-text'); break; case 'shorttext': /* translators: shorttext to anonymize */ $data_to_anonymize = __('anonymized-text. This text has been deleted on request of the person who wrote it.'); break; case 'longtext': /* translators: longtext to anonymize */ $data_to_anonymize = __('anonymized-text. This text has been deleted on request of the person who wrote it. This text has been deleted on request of the person who wrote it. This text has been deleted on request of the person who wrote it. This text has been deleted on request of the person who wrote it. This text has been deleted on request of the person who wrote it. This text has been deleted on request of the person who wrote it.'); break; default: //keep this empty to allow filtering for other types break; } // allow for others to filter data_to_anonymize $data_to_anonymize = apply_filters( 'wp_privacy_anonymize_data', $data_to_anonymize, $type, $orginal_data_to_anonymize ); // allow for others to filter data_to_anonymize specific for type $data_to_anonymize = apply_filters( "wp_privacy_anonymize_data_{$type}", $data_to_anonymize, $type, $orginal_data_to_anonymize ); return $data_to_anonymize; }
Then we could create small helper functions fx:
<?php function wp_privacy_anonymize_ip( $ip_to_anonymize ) { return wp_privacy_anonymize_data( $ip_to_anonymize, 'ip' ); }
#5
@
6 years ago
What file will these functions fit into, Do you think?
/includes/functions.php
Or is there another file that is more appropriate ?
#6
follow-up:
↓ 7
@
6 years ago
In relation to e-mail anoymization (and possibly other fields), the requirement of an e-mail to be unique should be thought into this solution so that e-mails stay unique even after anonymization.
#7
in reply to:
↑ 6
@
6 years ago
Replying to dejliglama:
In relation to e-mail anoymization (and possibly other fields), the requirement of an e-mail to be unique should be thought into this solution so that e-mails stay unique even after anonymization.
Emails, is already unique, since we already use it as an ID.
The situation where an two hashed emails gets hashed to the same, is very unlikely.
And if the there still needs to be a relation between seperate data, based on email as link, it wouldn't be good to make 2 different emails, based on the same email.
But maybe we could make a "unique" parameter to the function, and if its true, we include a unique ID
#8
@
6 years ago
Thanks for the patch @jesperher
If we use a wrapper like this one:
function wp_privacy_anonymize_ip( $ip_to_anonymize ) { return wp_privacy_anonymize_data( $ip_to_anonymize, 'ip' ); }
then we could use:
case 'ip': $data_to_anonymize = _wp_privacy_anonymize_ip_handler( $data_to_anonymize ); break;
in wp_privacy_anonymize_data()
, where we could use the IP anonymizing part from the WP_Community_Events::get_unsafe_client_ip()
method in (src):
/** * Partially anonymize the IP address by reducing it to the corresponding network ID. * * Modified from https://github.com/geertw/php-ip-anonymizer, MIT license. * * @since 5.0.0 * @access private * @param string $ip IP address. * @return false|string $ip The anonymized IP address on success; the given * address or false on failure. */ function _wp_privacy_anonymize_ip( $ip ) { $ip_prefix = ''; $netmask = false; // Detect what kind of IP address this is. $is_ipv6 = substr_count( $ip, ':' ) > 1; $is_ipv4 = ( 3 === substr_count( $ip, '.' ) ); if ( $is_ipv6 && $is_ipv4 ) { // IPv6 compatibility mode, temporarily strip the IPv6 part, and treat it like IPv4. $ip_prefix = '::ffff:'; $ip = preg_replace( '/^\[?[0-9a-f:]*:/i', '', $ip ); $ip = str_replace( ']', '', $ip ); $is_ipv6 = false; } if ( $is_ipv6 ) { // IPv6 addresses will always be enclosed in [] if there's a port. $ip_start = 1; $ip_end = (int) strpos( $ip, ']' ) - 1; $netmask = 'ffff:ffff:ffff:ffff:0000:0000:0000:0000'; // Strip the port (and [] from IPv6 addresses), if they exist. if ( $ip_end > 0 ) { $ip = substr( $ip, $ip_start, $ip_end ); } // Partially anonymize the IP by reducing it to the corresponding network ID. if ( function_exists( 'inet_pton' ) && function_exists( 'inet_ntop' ) ) { $ip = inet_ntop( inet_pton( $ip ) & inet_pton( $netmask ) ); } } elseif ( $is_ipv4 ) { // Strip any port and partially anonymize the IP. $last_octet_position = strrpos( $ip, '.' ); $ip = substr( $ip, 0, $last_octet_position ) . '.0'; } else { return false; } // Restore the IPv6 prefix to compatibility mode addresses. $ip = $ip_prefix . $ip; return $ip; }
Note that the above version is currently suited for the WP_Community_Events::get_unsafe_client_ip()
method, where it returns the full IP address if e.g. the functions inet_pton()
and inet_ntop()
do not exists. That's intentional behavior there, see e.g. #41083.
I think we might need to adjust the full IP address return here, with something else. Also should the function return false
or e.g. WP_Error
instead on failure? But it would be great if this function could be re-used in e.g. the WP_Community_Events::get_unsafe_client_ip()
method.
#10
follow-up:
↓ 12
@
6 years ago
Thanks @jesperher for that detailed list!
I wanted to comment on this bit:
case 'date': //should we ever make something to do this? - can dates be personal?
Birthdates or the day when a user joins a service, for example, could partly reveal identity as it filters a larger dataset into a smaller one.
It reminds me of the great dialogue in the #gdpr-compliance Slack channel about how user agents aren't exactly anonymous... Also, the EFF wrote a great post a few years back about what a user agent contains.
My proposal would be to, on random submit for time
, just return 0 UTC
/Thursday, January 1, 1970, which should indicate, universally, that this date was redacted.
#11
follow-up:
↓ 13
@
6 years ago
This sounds good but imho there are far too many options to anonymize things. It would be handy for translatable strings but not for numbers, tokens, etc. For example: what would be an anonymized id
or email
or url
? All these should be empty strings.
I'd also argue that an anonymized IP address is 0.0.0.0
(if not an empty string). Don't see any value to try and keep partial user data for general purposes. If a plugin requires to retain specific data, it can choose to anonymize it in a specific way.
I like the translatable text
and longtext
, but they seem too long? Something like Deleted
and Deleted by the author.
would be enough imho.
Also, two different filters, one of which is dynamic (a.k.a very hard to read/use) for few strings seem excessive :)
#12
in reply to:
↑ 10
@
6 years ago
Replying to postphotos:
case 'date': //should we ever make something to do this? - can dates be personal?Birthdates or the day when a user joins a service, for example, could partly reveal identity as it filters a larger dataset into a smaller one.
It reminds me of the great dialogue in the #gdpr-compliance Slack channel about how user agents aren't exactly anonymous... Also, the EFF wrote a great post a few years back about what a user agent contains.
My proposal would be to, on random submit for
time
, just return0 UTC
/Thursday, January 1, 1970, which should indicate, universally, that this date was redacted.
I think you are right on this one. I will make it so it returns 0 / January 1, 1970, nomatter what the input is.
For the user agents - i dont see other solution right now, than just to return an empty string.
#13
in reply to:
↑ 11
@
6 years ago
Replying to azaozz:
This sounds good but imho there are far too many options to anonymize things. It would be handy for translatable strings but not for numbers, tokens, etc. For example: what would be an anonymized
id
orurl
? All these should be empty strings.
I kind of agree on numbers - when first creating the list, it was kind of a brainstorm on different types of data.
For the ID part, i dont think we can anonymize that.
But for the email part i still think this i very relevant, because of the way WordPress uses mail adresses, and how a lot of code normally us depending on the the data to be of a valid email type.
Then i belive it is better, to create a "standard" of how it us anonymized, and help plugin developers to use this feature, istead of it beeing done on 1000 different ways.
I'd also argue that an anonymized IP address is
0.0.0.0
(if not an empty string). Don't see any value to try and keep partial user data for general purposes. If a plugin requires to retain specific data, it can choose to anonymize it in a specific way.
We could maybe create a Ip_zero, and a IP_partially or something like that, to make both methods avaviable.
I like the translatable
text
andlongtext
, but they seem too long? Something likeDeleted
andDeleted by the author.
would be enough imho.
You might be right on this.
Also, two different filters, one of which is dynamic (a.k.a very hard to read/use) for few strings seem excessive :)
I disagree.
It is also used in that way other places in core, with dynamic hooks. Like on creating/saving posts.
#14
follow-up:
↓ 19
@
6 years ago
We could maybe create a Ip_zero, and a IP_partially...
Actually think there was a court ruling in Germany requiring anonymized IPs to be cut to only the first two octets, i.e. 192.168.0.0. This makes them really useless for any purpose :(
#16
@
6 years ago
Instead of 0.0.0.0, it is helpful to retain the first octets of the IP address so you can still see where (geographically) your commenters are coming from without exposing their identify (which the full IP address could.)
For IPV4, let's zero the last octet and for IPV6 the last ten. If a site needs more than that, let's make the IP anonymization filterable
#17
@
6 years ago
In 43545.2.diff: make the changes discussed in https://core.trac.wordpress.org/ticket/43545#comment:11 and https://core.trac.wordpress.org/ticket/43545#comment:13. I've also removed the $data_to_anonymize
param since it became unused.
If we want to be able to anonymize IP addresses by truncating them, we can extract the function from the Community Events dashboard widget https://core.trac.wordpress.org/browser/tags/4.9.4/src/wp-admin/includes/class-wp-community-events.php#L210 and reuse it. However seems that this will not comply with the privacy laws in some countries.
Also, seems that the comment text is considered "personal data", so we will have to delete the whole comments instead of "anonymizing" them. Then there will be no need for truncated IPs for comments.
#18
in reply to:
↑ 4
@
6 years ago
NB there is no "GDPR right to anonymisation" (I think you may be conflating two different things there.) Anonymisation is highly recommended under GDPR but it's not one of the fundamental user rights - it's one of the means which could make it possible.
Might sound a bit petty but it's like Capital P dangit
- Anonymizes data for the GDPR "Right to Anonymization".
#19
in reply to:
↑ 14
;
follow-up:
↓ 20
@
6 years ago
Replying to azaozz:
We could maybe create a Ip_zero, and a IP_partially...
Actually think there was a court ruling in Germany requiring anonymized IPs to be cut to only the first two octets, i.e. 192.168.0.0. This makes them really useless for any purpose :(
Have you got any source for this? I know of no such ruling and have not been able to find any German article that mentions it. All German articles on the topic of IP anonymization seem talk about killing the last octet (or using Google's standard _anonymizeIp()
function).
#20
in reply to:
↑ 19
;
follow-up:
↓ 47
@
6 years ago
Replying to pputzer:
Have you got any source for this? I know of no such ruling and have not been able to find any German article that mentions it. All German articles on the topic of IP anonymization seem talk about killing the last octet (or using Google's standard
_anonymizeIp()
function).
Somebody mentioned this few weeks ago in Slack (I think) but can't find anything more about it either. Thinking we can consider this as "not real unless proven otherwise" :)
#21
@
6 years ago
Instead of example.test, could we use example.com or example.org per https://www.iana.org/domains/reserved
#22
follow-up:
↓ 24
@
6 years ago
@azaozz - I added IPV4 and IPV6 anonymization and unit tests for the lot. I also modified the domains to example.com and the time stamp to a more appropriate default
To run the unit tests, run
phpunit tests/phpunit/tests/functions/anonymization.php
Please let me know if adding a unit test file will be a no-no for a minor release and i can add these somewhere else
This ticket was mentioned in Slack in #gdpr-compliance by allendav. View the logs.
6 years ago
#24
in reply to:
↑ 22
@
6 years ago
Replying to allendav:
43545.diff looks good. There is one place where it may fail to anonymize an IPv6 address, if ! function_exists( 'inet_pton' )
. I'll fix that.
Was also thinking we can reuse the "extracted" function in WP_Community_Events: https://core.trac.wordpress.org/browser/tags/4.9.4/src/wp-admin/includes/class-wp-community-events.php#L210
...modified the domains to example.com and the time stamp to a more appropriate default
Not sure if that's better. Thinking that it should be changed to something that can never be used. The example.com is an actual website, it belongs to "somebody". However the .test domain will never ever exist.
#25
@
6 years ago
In 43545.3.diff:
- Fix output from
wp_privacy_anonymize_ip()
wheninet_pton()
orinet_ntop()
don't exist. - Make
wp_privacy_anonymize_data()
return an empty string when$type
is not passed. - Change the anonymous site address to
site.invalid
. That way nobody ever can mistake it for a "real site" as it will always return an error when trying to reach it.
#26
follow-up:
↓ 27
@
6 years ago
I was wondering about the current 0.0.0.0
output for an IPv6 address.
For example:
wp_privacy_anonymize_ip( '2001:db8:a0b:12f0::1' )
now outputs '0.0.0.0'
when inet_ntop()
or inet_pton()
are not available.
The unspecified address for IPv6 is 0:0:0:0:0:0:0:0
or ::
.
Would ::
be expected in the above example?
Some other suggestions for 43545.3.diff:
Make the tests aware of missing inet_ntop()
/inet_pton()
functions, with e.g.
if ( ! function_exists( 'inet_ntop' ) ) { $this->markTestSkipped( 'inet_ntop extension not available.' ); } if ( ! function_exists( 'inet_pton' ) ) { $this->markTestSkipped( 'inet_pton extension not available.' ); }
or modify the expected value explicitly like:
if( function_exists( 'inet_ntop' ) && function_exists( 'inet_pton' ) ) { $expected = '...'; } else { $expected = '0.0.0.0'; }
Adjustments according to the Coding Standard.
Should we use data providers here instead of for loops?
#27
in reply to:
↑ 26
@
6 years ago
Replying to birgire:
I was wondering about the current
0.0.0.0
output for an IPv6 address.
Yeah, can change that to ::
although (afaik) IPv4 and IPv6 addresses are fully interchangeable.
This ticket was mentioned in Slack in #gdpr-compliance by allendav. View the logs.
6 years ago
#29
follow-up:
↓ 30
@
6 years ago
FYI, there was a recent bug fix to the Events Widget IP anonymization in r42968, so that should probably be synced here.
Although, rather than duplicating that code, would it be better to abstract it into a modular function that can be used by both the GDPR anonymization and the Events Widget anonymization?
#30
in reply to:
↑ 29
@
6 years ago
Replying to iandunn:
FYI, there was a recent bug fix to the Events Widget IP anonymization in r42968, so that should probably be synced here.
Yep, thanks for the ping! :)
Although, rather than duplicating that code, would it be better to abstract it into a modular function that can be used by both the GDPR anonymization and the Events Widget anonymization?
Yeah, was thinking that too. In 43545.4.diff:
- Abstract the code that anonymizes IP address and use it in both the Events Widget and
wp_privacy_anonymize_data()
. - Make
$type
required inwp_privacy_anonymize_data()
. The right data type is pretty important. Thinking it's better to always pass it rather than get an empty string for unspecified types. That will let us add more types in the future if needed. - Clean up the test a bit.
#31
@
6 years ago
43545.4.diff looks pretty good to me. I made a few minor changes in 43545.5.diff:
src
:
- Move
$ip_prefix
fromget_unsafe_client_ip()
wp_privacy_anonymize_ip()
to preserve the default value (although theNULL
would probably get converted to''
anyway). - Minor cleanup (trailing spaces, full stops, etc).
tests
:
- Moved the tests from
Test_WP_Community_Events::test_get_unsafe_client_ip_anonymization()
toTests_Functions_Anonymization
, so that they remain attached to the function they're testing. - Refactored
Test_WP_Community_Events::test_get_unsafe_client_ip()
to only test the functionality left inget_unsafe_client_ip()
. - Swapped function params in unit test calls to
wp_privacy_anonymize_ip()
, so that the tests pass.
It might be worth noting that the Events Widget doesn't strictly need $ipv6_fallback
mode, if using the un-anonymized IP is going to cause GDPR issues. If only anonymized IPs were sent to api.wordpress.org, then a small subset of users -- those running PHP < 5.3
on Windows -- would lose the functionality that automatically displays events based on geolocating the IP address. Instead, they'd have to manually enter their location.
Automatically detecting the location and showing local events is one of key features of the Events Widget, because the demographic it primarily targets are those users who aren't already aware of the WP community, and are therefore less likely to put in the effort to explore the dashboard widget and manually fetch the events. Having said that, though, removing that functionality could be an acceptable compromise.
If we do keep $ipv6_fallback
mode, then we should probably add some tests for it.
#32
@
6 years ago
If this gets bumped from `5.0` to `4.9.x`, then it'd probably be best to move the tests from tests/functions/anonymization.php
to tests/functions.php
, to avoid adding a new file in a point release.
#33
@
6 years ago
@iandunn thanks for the review and the fixes! 43545.5.diff looks good. Going to commit it later today if no other remarks.
...the Events Widget doesn't strictly need $ipv6_fallback mode, if using the un-anonymized IP is going to cause GDPR issues.
Was thinking about that too. For privacy we have to guarantee we return an anonymized IP, no matter what. Since we depend on the PHP version/environment to be able to do anonymization of IPv6 addresses, it's probably a good idea to be able to fall back gracefully. However removing that will make wp_privacy_anonymize_ip()
simpler to use... Kind of 50/50 there :)
If this gets bumped from
5.0
to4.9.x
...
No problems with new files in /tests. They are only problematic in /build for minor releases.
#34
@
6 years ago
I was testing various IP address inputs, when I noticed that for e.g.:
$ip_addr = ':::'; echo inet_pton( $ip_addr );
gives a PHP warning:
Warning: inet_pton(): Unrecognized address ::: in [...][...] on line 3
We could handle it with e.g.:
try { $ip_addr = inet_ntop( inet_pton( $ip_addr ) & inet_pton( $netmask ) ); if( false === $ip_addr ) { return '::'; } } catch( Exception $error ) { return '::'; }
where we also handle the possible false
output of inet_ntop()
.
I checked and these warnings will be removed in PHP 7.1+
https://github.com/php/php-src/pull/3200
The 43545.6.diff patch includes suggestions that:
tests part:
- Declares visibility and adds doc comments for
test_anonymize_email()
,test_anonymize_url()
,test_anonymize_date()
,test_anonymize_text()
,test_anonymize_long_text()
.
- Adds misisng full-stops to inline comments.
- Adds doc parameters for
$raw_ip
and$expected_result
. - Detailed
@return
fordata_wp_privacy_anonymize_ip()
. - Adds a file doc comment.
- Adds short doc comment for the class.
- Makes the tests explicit in
test_anonymize_text()
,test_anonymize_long_text()
, i.e. useassertEquals()
instead ofassertNotEquals()
. - Adds the invalid IP testcase:
::: -> ::
that would give a PHP warning if unhandled. - Adds the invalid IP testcase:
null -> 0.0.0.0
. - Adds the netmask testcase:
10.20.30.45/24 -> 10.20.30.0
. - Adds a
markTestSkipped
intest_wp_privacy_anonymize_ip()
ifinet_pton
andinet_pton
are not available. Otherwise most of the IPv6 testcases, provided bydata_wp_privacy_anonymize_ip()
, would fail. Another approach would be to adjust the testcases to to handle that. - Replaces:
Test that get_unsafe_client_ip() properly anonymizes all possible address formats.
with:Test that wp_privacy_anonymize_ip() properly anonymizes all possible IP address formats.
src part:
- Adds
@uses inet_ntop and inet_pton if available, to handle IPv6 addresses.
- Handle a possible PHP warning from
inet_pton
andinet_pton
for an unrecognized address. - Handle a possible
false
output ofinet_ntop()
.
To consider:
Example:
echo wp_privacy_anonymize_data( 'ip', '...' );
outputs: '...0'
instead of 0.0.0.0
.
Another example is how out-of-range IPv4 should be handled:
echo wp_privacy_anonymize_data( 'ip', '999.999.999.999' );
outputs: '999.999.999.0'
but should it be instead 0.0.0.0
?
One might aim for a stricter IPv4 validation in a future ticket or e.g. ticket #34538 Improvement of the IPv4 address format check on WP_Http::is_ip_address
.
I wonder if there should be a general WP IP utility (possible future ticket) to avoid IP handling duplications in various parts of the core (e.g. db hosts, HTTP API, Rest, Multisite support of IPv6, ... ).
#35
@
6 years ago
No problems with new files in /tests. They are only problematic in /build for minor releases.
Doh, good point :)
43545.6.diff has lots of good tweaks. A few minor questions:
- Should
@uses
be added? It sounds like the standard discourages it now, and I personally feel like it adds a maintenance overhead without being particularly useful. - I don't think
inet_(ntop|pton)
throw
any exceptions, so is there a benefit of usingtry/catch
? - It looks like the changes to
src/wp-admin/includes/class-wp-community-events.php
andtests/phpunit/tests/admin/includesCommunityEvents.php
were left out.
#36
@
6 years ago
Thanks @iandunn, I wasn't aware of the @uses
being discouraged, I only consulted core here regarding that tag ;-)
You're absolutely right about the try/catch not working for PHP warnings, it was the first thing that I tried and it somehow worked well with phpunit ;-)
43545.7.diff should include all the four files and it removes the @uses
, try/catch and the unrecognized address testcase.
#37
@
6 years ago
- Owner set to azaozz
- Resolution set to fixed
- Status changed from new to closed
In 42971:
#38
@
6 years ago
- Milestone changed from 5.0 to 4.9.6
- Resolution fixed deleted
- Status changed from closed to reopened
Reopen for "next minor".
This ticket was mentioned in Slack in #gdpr-compliance by azaozz. View the logs.
6 years ago
This ticket was mentioned in Slack in #core by jeffpaul. View the logs.
6 years ago
#41
@
6 years ago
43545.8.diff Updates version numbers in the docblocks before backporting. This needs to be backported before r42994, since that commit depends on functionality from this ticket.
This ticket was mentioned in Slack in #gdpr-compliance by desrosj. View the logs.
6 years ago
This ticket was mentioned in Slack in #gdpr-compliance by allendav. View the logs.
6 years ago
#46
@
6 years ago
- Component changed from Options, Meta APIs to Privacy
Moving to the new Privacy component.
#47
in reply to:
↑ 20
;
follow-up:
↓ 48
@
6 years ago
Replying to azaozz:
Replying to pputzer:
Have you got any source for this? I know of no such ruling and have not been able to find any German article that mentions it. All German articles on the topic of IP anonymization seem talk about killing the last octet (or using Google's standard
_anonymizeIp()
function).
Somebody mentioned this few weeks ago in Slack (I think) but can't find anything more about it either. Thinking we can consider this as "not real unless proven otherwise" :)
I am not aware of any court rulings, but the consideration about a potential requirement of cutting to only the first two octets (IPv4) may stem from the EU Article 29 Working Group's Opinion paper WP148 (2008):
"Currently, some search engine providers truncate IPv4 addresses by removing the final octet, thus in effect retaining information about the user's ISP or subnet, but not directly
identifying the individual. The activity could then originate from any of 254 IP addresses. This may not always be enough to guarantee anonymisation." (emphasis mine)
The "Unabhängiges Landeszentrum für Datenschutz Schleswig-Holstein", one of Germany's 16 Federal Data Protection Authorities references that text in their FAQ on IP addresses (in German) to suggest obfuscation of two octets as acceptable.
This is the opinion of public advisory bodies on data protection, not binding law; deleting the last octet appears to indeed be common practice in Germany, though some sources (legal blogs, mainly) tend to recommend deleting two. Since privacy compliance is always about minimising risk, not absolute rules, I believe that at least a filter to set a higher anonymization level could be worth considering?
#48
in reply to:
↑ 47
@
6 years ago
Replying to sgreger:
This is the opinion of public advisory bodies on data protection, not binding law; deleting the last octet appears to indeed be common practice in Germany, though some sources (legal blogs, mainly) tend to recommend deleting two. Since privacy compliance is always about minimising risk, not absolute rules, I believe that at least a filter to set a higher anonymization level could be worth considering?
Thanks, now we at least know where that is coming from. While I still think that removing the last octet is fine, having a filter hook seems like a good idea.
Copied from #43550 (accidental duplicate)
For example, come up with functions to provide replacement text for things like name and address fields that can be used by plugins when removing personal data. The ideal text will be recognizable by the administration as having been thus removed, e.g. "Anonymized User".
Since email addresses are at the core of our export and removal strategy, the anonymized email address function should also probably support a random hash or some such, e.g. anonymized-user-123456abcdef@… to avoid creating a situation where all anonymized content ends up pointing at a single fictitious user.
From the ticket scrub discussion in Making WordPress gdpr-compliance chat today.