WordPress.org

Make WordPress Core

Opened 6 years ago

Last modified 8 months ago

#31992 new defect (bug)

Unicode Email Addresses

Reported by: ysalame
Owned by:
Milestone:
Priority: normal
Severity: normal
Version:
Component: Formatting
Keywords: is-email
Focuses:
Cc:

Description (last modified by SergeyBiryukov)

Tested against trunk (2015-04-16)

Test case

$target_email = 'dummy-üñîçøðé.y.!#$%@üñîçøðé.gmail.com';
echo $target_email.'<br>';
echo sanitize_email($target_email).'<br>';
echo 'is_email : '.is_email($target_email);

Return

dummy-üñîçøðé.y.!#$%@üñîçøðé.gmail.com
dummy-email.y.!#$%@gmail.com
is_email :

Function is_email() @ /wp-includes/formatting.php line 2177
The preg_match() call @ line 2211 is not correct: its character class only accepts ASCII characters in the local part.

if ( !preg_match( '/^[a-zA-Z0-9!#$%&\'*+\/=?^_`{|}~\.-]+$/', $local ) ) {

Function sanitize_email() @ /wp-includes/formatting.php line 2430
The preg_replace() call @ line 2460 is not correct: it strips every non-ASCII character from the local part.

$local = preg_replace( '/[^a-zA-Z0-9!#$%&\'*+\/=?^_`{|}~\.-]/', '', $local );
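The behaviour of both patterns can be reproduced outside WordPress. What follows is a minimal sketch, not the core code: it applies the character class from the two lines quoted above to a local part. The constant `ASCII_LOCAL_CLASS` and the helpers `local_part_is_ascii_valid()` and `strip_local_part()` are hypothetical names used only for illustration, and the source file is assumed to be saved as UTF-8.

```php
<?php
// Character class copied from the is_email() / sanitize_email() lines quoted above.
const ASCII_LOCAL_CLASS = 'a-zA-Z0-9!#$%&\'*+\/=?^_`{|}~\.-';

// Hypothetical helper: mirrors the preg_match() check on the local part.
function local_part_is_ascii_valid( string $local ): bool {
    return 1 === preg_match( '/^[' . ASCII_LOCAL_CLASS . ']+$/', $local );
}

// Hypothetical helper: mirrors the preg_replace() stripping of the local part.
function strip_local_part( string $local ): string {
    return preg_replace( '/[^' . ASCII_LOCAL_CLASS . ']/', '', $local );
}

var_dump( local_part_is_ascii_valid( 'dummy.y.!#$%' ) );          // bool(true)
var_dump( local_part_is_ascii_valid( 'dummy-üñîçøðé.y.!#$%' ) ); // bool(false)
```

Because neither pattern uses the `u` modifier, every byte of a multi-byte UTF-8 character falls outside the class, so the check fails and the stripping removes the character entirely.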

Attachments (1)

31992.patch (8.5 KB) - added by prfidneai 11 months ago.

Change History (10)

#1 @boonebgorges
6 years ago

  • Keywords reporter-feedback added

Can you specify exactly what the bug is? As far as I can see, dummy-email.y.!#$%@gmail.com is a valid email address. See e.g. http://en.wikipedia.org/wiki/Email_address#Local_part.

#2 @ysalame
6 years ago

ugh... sorry. I actually pasted the email that was sanitized.

The test I made was

$target_email = 'dummy-üñîçøðé.y.!#$%@üñîçøðé.gmail.com';
echo $target_email.'<br>';
echo sanitize_email($target_email).'<br>';
echo 'is_email : '.is_email($target_email);

with return as

dummy-üñîçøðé.y.!#$%@üñîçøðé.gmail.com
dummy-email.y.!#$%@gmail.com
is_email : 

The Unicode characters were all removed. For international email addresses this can be a real problem.

ps. I actually used the Wiki page you sent as a base for my ticket. I tried a mix of one of the last examples in the "Valid email Examples" list.

#3 @jasonhendriks
6 years ago

I had looked into email validation when I wrote my SMTP plugin, and eventually concluded it is nearly impossible to validate an email address at all. Most email validators are much more restrictive than any RFC requires.

I think currently my code doesn't even call sanitize_text_field because it failed too many of my test cases.

https://wordpress.org/plugins/postman-smtp/

http://girders.org/blog/2013/01/31/dont-rfc-validate-email-addresses/

Last edited 6 years ago by jasonhendriks (previous) (diff)

#4 @miqrogroove
6 years ago

  • Keywords reporter-feedback removed
  • Summary changed from sanitize_email() and is_email() preg_replace/preg_match problems to Unicode Email Addresses
  • Version trunk deleted

Replying to ysalame:

ugh... sorry. I actually pasted the email that was sanitized.

We need an admin to update the ticket description then.

#5 @SergeyBiryukov
6 years ago

  • Description modified (diff)

#6 @miqrogroove
6 years ago

  • Keywords is-email added

#7 @desrosj
11 months ago

#51732 was marked as a duplicate.

#8 @prfidneai
11 months ago

Hi!

I opened ticket #51732 to propose a patch that changes the email validation rules to work with EAI, but it was marked as a duplicate of this one.

As far as I can see, nothing has changed regarding EAI in the 6 years since this ticket was opened, and Unicode EAI addresses still cannot be used in WP. I would like to propose some changes to fix that:

1) Let's modify sanitize_email() and is_email() to use dedicated constants in default-constants.php for the regex checks of the username and the domain name (WP_IDN_LOCAL_MAIL_REGEX and WP_IDN_DOMAIN_REGEX), which every admin can change to match the IDN rules for the language they use. As an example, I added Russian IDN rules to these regexes.

2) Let's also add decode_punycode() to sanitize_email(), which checks whether the domain part of the e-mail address is in Punycode form and converts it to Unicode, as required by uasg.tech.

3) In addition, we can modify sanitize_user() in the same manner to allow Unicode usernames and make WP Universal Acceptance ready.

I attached the patch for all of these.
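As a rough illustration of point 1 only, and not the contents of 31992.patch, the idea of overridable patterns could look like the sketch below. The two constant names come from the comment above; their default values here and the function `example_is_email_idn()` are assumptions made for this example.

```php
<?php
// Assumed default values; the attached patch may define these differently.
// A site admin could define either constant earlier (e.g. in wp-config.php).
if ( ! defined( 'WP_IDN_LOCAL_MAIL_REGEX' ) ) {
    define( 'WP_IDN_LOCAL_MAIL_REGEX', '/^[\p{L}\p{N}!#$%&\'*+\/=?^_`{|}~.-]+$/u' );
}
if ( ! defined( 'WP_IDN_DOMAIN_REGEX' ) ) {
    define( 'WP_IDN_DOMAIN_REGEX', '/^[\p{L}\p{N}-]+(\.[\p{L}\p{N}-]+)+$/u' );
}

// Hypothetical validator: splits at the last "@" and checks each part
// against the configurable patterns.
function example_is_email_idn( string $email ): bool {
    $at = strrpos( $email, '@' );
    if ( false === $at || 0 === $at ) {
        return false; // No "@" at all, or an empty local part.
    }
    $local  = substr( $email, 0, $at );
    $domain = substr( $email, $at + 1 );

    return 1 === preg_match( WP_IDN_LOCAL_MAIL_REGEX, $local )
        && 1 === preg_match( WP_IDN_DOMAIN_REGEX, $domain );
}

var_dump( example_is_email_idn( 'пользователь@пример.рф' ) ); // bool(true)
```

The assumed defaults accept any Unicode letter or digit but not combining marks, so scripts that rely on them would need a wider class. For point 2, PHP's intl extension provides `idn_to_utf8()`, which might serve for the Punycode conversion where that extension is available; that step is not shown here.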

Last edited 11 months ago by prfidneai (previous) (diff)

@prfidneai
11 months ago

#9 @liedekef
8 months ago

This patch does not solve the problem with e.g. "测试5@普遍接受-测试.世界".
A simpler approach would be to use PHP's FILTER_VALIDATE_EMAIL (which doesn't work for Unicode, hence the second branch). An example validation function could look like this:

    if (filter_var($email, FILTER_VALIDATE_EMAIL)) {
        return true;
    } elseif (preg_match("/^([\w\-\.\+]+@([\w\-]+\.)+[\w\-]{2,63})?$/u", $email)) {
        return true;
    } else {
        return false;
    }

I guess even simpler would be to use just the regex:

    if (preg_match("/^([\w\-\.\+]+@([\w\-]+\.)+[\w\-]{2,63})?$/u", $email)) {
        return true;
    } else {
        return false;
    }

In any case, is_email() currently blocks Unicode email addresses and should be fixed.
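The two-step check suggested in this comment can be wrapped in a function along the following lines. This is a sketch under stated assumptions, not a proposed replacement for is_email(): the name `example_validate_email()` is hypothetical, and the pattern differs from the one quoted in the comment in one respect. The quoted pattern wraps the whole address in an optional group (`(...)?`), which lets the empty string match, so the sketch drops that group.

```php
<?php
// Hypothetical wrapper around the two checks proposed in the comment above.
function example_validate_email( string $email ): bool {
    // ASCII addresses: defer to PHP's built-in filter.
    if ( false !== filter_var( $email, FILTER_VALIDATE_EMAIL ) ) {
        return true;
    }
    // Unicode fallback: the comment's pattern without the optional outer
    // group, so that an empty string is rejected. The "u" modifier makes
    // \w match Unicode letters and digits.
    return 1 === preg_match( '/^[\w\-\.\+]+@([\w\-]+\.)+[\w\-]{2,63}$/u', $email );
}

var_dump( example_validate_email( 'user@example.com' ) );       // bool(true)
var_dump( example_validate_email( '测试5@普遍接受-测试.世界' ) ); // bool(true)
var_dump( example_validate_email( '' ) );                       // bool(false)
```

The fallback pattern is deliberately permissive: it accepts underscores in domain labels, for example, so it is a loose plausibility check and not an RFC-level validation.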
