Make WordPress Core

Opened 6 years ago

Last modified 3 weeks ago

#31992 new defect (bug)

Unicode Email Addresses

Reported by: ysalame
Owned by:
Milestone:
Priority: normal
Severity: normal
Version:
Component: Formatting
Keywords: is-email
Focuses:
Cc:

Description (last modified by SergeyBiryukov)

Tested against trunk (2015-04-16)

Test case

$target_email = 'dummy-üñîçøðé.y.!#$%@üñîçøðé.gmail.com';
echo $target_email.'<br>';
echo sanitize_email($target_email).'<br>';
echo 'is_email : '.is_email($target_email);


is_email :

Function is_email @ /wp-includes/formatting.php line 2177
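The preg_match() call @ line 2211 is not correct: its character class accepts only ASCII, so a local part containing Unicode characters is rejected.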

if ( !preg_match( '/^[a-zA-Z0-9!#$%&\'*+\/=?^_`{|}~\.-]+$/', $local ) ) {

Function sanitize_email() @ /wp-includes/formatting.php line 2430
The preg_replace() call @ line 2460 is not correct: it strips every character outside the same ASCII set, including all Unicode characters.

$local = preg_replace( '/[^a-zA-Z0-9!#$%&\'*+\/=?^_`{|}~\.-]/', '', $local );
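
To make the reported behaviour concrete, here is a minimal sketch that applies the two patterns quoted above to the local part of the test address and compares them with a Unicode-aware character class. The `$unicode_match` pattern (using `\p{L}`, `\p{M}`, `\p{N}` and the `u` modifier) is only an assumption about what a more permissive rule might look like; it is not taken from core or from any patch, and the file is assumed to be saved as UTF-8.

```php
<?php
// Local part of the test address from this ticket.
$local = 'dummy-üñîçøðé.y.!#$%';

// Patterns quoted from is_email() and sanitize_email().
$ascii_match   = '/^[a-zA-Z0-9!#$%&\'*+\/=?^_`{|}~\.-]+$/';
$ascii_replace = '/[^a-zA-Z0-9!#$%&\'*+\/=?^_`{|}~\.-]/';

// Hypothetical alternative (an assumption, not core code): also allow any
// Unicode letter, combining mark, or digit. The "u" modifier makes PCRE
// treat the pattern and the subject as UTF-8.
$unicode_match = '/^[\p{L}\p{M}\p{N}!#$%&\'*+\/=?^_`{|}~\.-]+$/u';

// The ASCII-only check fails because of the accented letters.
$ascii_ok = preg_match( $ascii_match, $local );

// The ASCII-only replace removes every byte of the accented letters,
// leaving only the ASCII characters of the local part.
$sanitized = preg_replace( $ascii_replace, '', $local );

// The Unicode-aware check accepts the same local part.
$unicode_ok = preg_match( $unicode_match, $local );

var_dump( $ascii_ok, $sanitized, $unicode_ok );
```

A class this broad accepts far more than any single language needs, so it only illustrates the mechanism and is not a proposed rule.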

Attachments (1)

31992.patch (8.5 KB) - added by prfidneai 3 weeks ago.


Change History (9)

#1 @boonebgorges
6 years ago

  • Keywords reporter-feedback added

Can you specify exactly what the bug is? As far as I can see, dummy-email.y.!#$%@gmail.com is a valid email address. See, e.g., http://en.wikipedia.org/wiki/Email_address#Local_part.

#2 @ysalame
6 years ago

ugh... sorry. I actually pasted the email that was sanitized.

The test I ran was:

$target_email = 'dummy-üñîçøðé.y.!#$%@üñîçøðé.gmail.com';
echo $target_email.'<br>';
echo sanitize_email($target_email).'<br>';
echo 'is_email : '.is_email($target_email);

which returned:

is_email : 

All of the Unicode characters were removed. For international email addresses this can be a real problem.

P.S. I actually used the Wikipedia page you linked as the basis for my ticket. I tried a variation on one of the last examples in the "Valid email Examples" list.

#3 @jasonhendriks
6 years ago

I had looked into email validation when I wrote my SMTP plugin, and eventually concluded it is nearly impossible to validate an email address at all. Most email validators are much more restrictive than any RFC requires.

I think currently my code doesn't even call sanitize_text_field because it failed too many of my test cases.



Last edited 6 years ago by jasonhendriks

#4 @miqrogroove
6 years ago

  • Keywords reporter-feedback removed
  • Summary changed from sanitize_email() and is_email() preg_replace/preg_match problems to Unicode Email Addresses
  • Version trunk deleted

Replying to ysalame:

ugh... sorry. I actually pasted the email that was sanitized.

We need an admin to update the ticket description then.

#5 @SergeyBiryukov
6 years ago

  • Description modified

#6 @miqrogroove
5 years ago

  • Keywords is-email added

#7 @desrosj
3 weeks ago

#51732 was marked as a duplicate.

#8 @prfidneai
3 weeks ago


I opened ticket #51732 to propose a patch that changes the email validation rules to work with EAI, but it was marked as a duplicate of this one.

As far as I can see, nothing has changed regarding EAI in the 6 years since this ticket was opened, and Unicode email addresses still cannot be used in WP. I would like to propose a few changes to fix this:

1) Let's modify sanitize_email() and is_email() to take the regexes that check the user name and the domain name from constants in default-constants.php (WP_IDN_LOCAL_MAIL_REGEX and WP_IDN_DOMAIN_REGEX), which any admin can change to match the IDN rules for the languages they use. As an example, I added Russian IDN rules to these regexes.

2) Let's also add a decode_punycode() step to sanitize_email() that checks whether the domain part of the email address is in Punycode form and, if so, converts it to Unicode, as required by uasg.tech.

3) In addition, we can modify sanitize_user() in the same manner to allow Unicode usernames, making WP Universal Acceptance ready.

I attached the patch for all of these.
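
For readers without the patch at hand, proposals 1) and 2) could look roughly like the sketch below. The two constant names come from this comment; their values here, the function name `example_sanitize_email()`, and the use of `idn_to_utf8()` from PHP's intl extension in place of the patch's `decode_punycode()` are assumptions made for illustration, not the contents of 31992.patch.

```php
<?php
// Assumed default values. A site could define its own, stricter patterns
// (for example in wp-config.php) before these lines run.
if ( ! defined( 'WP_IDN_LOCAL_MAIL_REGEX' ) ) {
	// Matches every character that is NOT allowed in the local part.
	define( 'WP_IDN_LOCAL_MAIL_REGEX', '/[^\p{L}\p{M}\p{N}!#$%&\'*+\/=?^_`{|}~\.-]/u' );
}
if ( ! defined( 'WP_IDN_DOMAIN_REGEX' ) ) {
	// Matches every character that is NOT allowed in the domain part.
	define( 'WP_IDN_DOMAIN_REGEX', '/[^\p{L}\p{M}\p{N}.-]/u' );
}

/**
 * Hypothetical helper: strips disallowed characters using the configurable
 * patterns above. Returns '' when nothing usable is left. It deliberately
 * omits the other checks that the real sanitize_email() performs.
 */
function example_sanitize_email( $email ) {
	$at = strrpos( $email, '@' );
	if ( false === $at || 0 === $at ) {
		return '';
	}
	$local  = substr( $email, 0, $at );
	$domain = substr( $email, $at + 1 );

	// Convert a Punycode domain to Unicode when the intl extension is loaded.
	if ( function_exists( 'idn_to_utf8' ) && false !== stripos( $domain, 'xn--' ) ) {
		$decoded = idn_to_utf8( $domain, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46 );
		if ( false !== $decoded ) {
			$domain = $decoded;
		}
	}

	// preg_replace() returns null if the input is not valid UTF-8.
	$local  = preg_replace( WP_IDN_LOCAL_MAIL_REGEX, '', $local );
	$domain = preg_replace( WP_IDN_DOMAIN_REGEX, '', $domain );
	if ( null === $local || null === $domain || '' === $local || '' === $domain ) {
		return '';
	}
	return $local . '@' . $domain;
}

// The test address from this ticket is kept unchanged by the assumed patterns.
echo example_sanitize_email( 'dummy-üñîçøðé.y.!#$%@üñîçøðé.gmail.com' ), "\n";
```

Keeping the patterns in constants means a site can narrow them to one script without touching the function body, which seems to be the point of proposal 1).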

Last edited 3 weeks ago by prfidneai
