Opened 8 years ago
Last modified 2 years ago
#31992 new defect (bug)
Unicode Email Addresses
Reported by: | | Owned by: | |
---|---|---|---|
Milestone: | | Priority: | normal |
Severity: | normal | Version: | |
Component: | Formatting | Keywords: | is-email |
Focuses: | | Cc: | |
Description (last modified by )
Tested against trunk (2015-04-16)
Test case
$target_email = 'dummy-üñîçøðé.y.!#$%@üñîçøðé.gmail.com';
echo $target_email.'<br>';
echo sanitize_email($target_email).'<br>';
echo 'is_email : '.is_email($target_email);
Return
dummy-üñîçøðé.y.!#$%@üñîçøðé.gmail.com
dummy-email.y.!#$%@gmail.com
is_email :
Function is_email() @ /wp-includes/formatting.php line 2177
The preg_match @ line 2211 is not correct.
if ( !preg_match( '/^[a-zA-Z0-9!#$%&\'*+\/=?^_`{|}~\.-]+$/', $local ) ) {
Function sanitize_email() @ /wp-includes/formatting.php line 2430
The preg_replace @ line 2460 is not correct.
$local = preg_replace( '/[^a-zA-Z0-9!#$%&\'*+\/=?^_`{|}~\.-]/', '', $local );
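To illustrate why the non-ASCII characters vanish, here is a minimal standalone sketch (no WordPress required). It applies the character class quoted above to the local part of the test address, then a Unicode-aware variant for comparison. The second pattern (`\p{L}`, `\p{M}`, `\p{N}` with the `/u` modifier) is only an assumption about what a more permissive class might look like. It is not a proposed patch, and it says nothing about whether such an address is deliverable.

```php
<?php
// Local part of the test address from the description (file must be saved as UTF-8).
$local = 'dummy-üñîçøðé.y.!#$%';

// Character class as quoted from sanitize_email(): ASCII letters, digits
// and a fixed set of symbols. Every byte of a multi-byte UTF-8 character
// falls outside this class, so those characters are stripped.
$ascii_only = preg_replace('/[^a-zA-Z0-9!#$%&\'*+\/=?^_`{|}~\.-]/', '', $local);

// Hypothetical variant (an assumption, not WordPress code): also keep any
// Unicode letter, combining mark or number. The /u modifier makes PCRE
// treat the subject as UTF-8 instead of as single bytes.
$unicode_aware = preg_replace('/[^\p{L}\p{M}\p{N}!#$%&\'*+\/=?^_`{|}~\.-]/u', '', $local);

echo $ascii_only, PHP_EOL;
echo $unicode_aware, PHP_EOL;
```

The first call should leave only `dummy-.y.!#$%`, while the second should return the local part unchanged.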
Attachments (2)
Change History (11)
#2
@
8 years ago
ugh... sorry. I actually pasted the email that was sanitized.
The test I made was
$target_email = 'dummy-üñîçøðé.y.!#$%@üñîçøðé.gmail.com';
echo $target_email.'<br>';
echo sanitize_email($target_email).'<br>';
echo 'is_email : '.is_email($target_email);
with return as
dummy-üñîçøðé.y.!#$%@üñîçøðé.gmail.com
dummy-email.y.!#$%@gmail.com
is_email :
The Unicode characters were all removed. For international email addresses this can be a real problem.
P.S. I actually used the Wiki page you sent as the basis for my ticket. I tried a mix based on one of the last examples in the "Valid email Examples" list.
#3
@
8 years ago
I had looked into email validation when I wrote my SMTP plugin, and eventually concluded it is nearly impossible to validate an email address at all. Most email validators are much more restrictive than any RFC requires.
I think currently my code doesn't even call sanitize_text_field because it failed too many of my test cases.
https://wordpress.org/plugins/postman-smtp/
http://girders.org/blog/2013/01/31/dont-rfc-validate-email-addresses/
#4
@
8 years ago
- Keywords reporter-feedback removed
- Summary changed from sanitize_email() and is_email() preg_replace/preg_match problems to Unicode Email Addresses
- Version trunk deleted
Replying to ysalame:
ugh... sorry. I actually pasted the email that was sanitized.
We need an admin to update the ticket description then.
#8
@
3 years ago
Hi!
I opened ticket #51732 to propose a patch that changes the email validation rules to work with EAI, but it was marked as a duplicate of this one.
As far as I can see, nothing has changed regarding EAI in the 6 years since this ticket was opened, and Unicode EAI addresses still cannot be used in WP. I would like to propose a few things to fix this:
1) Let's modify sanitize_email() and is_email() to use dedicated constants in default-constants.php for the regex checks of the username and the domain name (WP_IDN_LOCAL_MAIL_REGEX and WP_IDN_DOMAIN_REGEX), which every admin can change to validate the IDN rules for the language they use. For example, I added Russian IDN rules to these regexes.
2) Let's also add decode_punycode() to the sanitize_email() function, which checks whether the domain part of the email is in Punycode form and converts it to Unicode, as required by uasg.tech.
3) In addition, we can modify sanitize_user() in the same manner to allow Unicode usernames and make WP Universal Acceptance ready.
I attached a patch for all of these.
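As a rough illustration of point 2, the Punycode-to-Unicode step could be sketched as below. This is not the code from the attached patch: the helper name `decode_email_domain_sketch()` is hypothetical, and the sketch assumes PHP's intl extension is available for `idn_to_utf8()`. When the extension is missing, the address is returned unchanged.

```php
<?php
// Hypothetical helper; name and behaviour are illustrative only and are
// not taken from the attached patch.
function decode_email_domain_sketch(string $email): string {
    $at = strrpos($email, '@');
    if ($at === false || !function_exists('idn_to_utf8')) {
        // No domain part, or the intl extension is not loaded.
        return $email;
    }

    $local  = substr($email, 0, $at);
    $domain = substr($email, $at + 1);
    if ($domain === '') {
        return $email;
    }

    // Convert an ASCII-compatible (xn--...) domain to its Unicode form.
    $decoded = idn_to_utf8($domain, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);

    // On failure, keep the original address instead of guessing.
    return $decoded === false ? $email : $local . '@' . $decoded;
}

echo decode_email_domain_sketch('user@xn--mnchen-3ya.de'), PHP_EOL;
```

With intl loaded, the example address should come back with the domain `münchen.de`. Plain ASCII domains pass through untouched.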
#9
@
2 years ago
This patch does not solve the problem with e.g. "测试5@普遍接受-测试.世界"
A simpler approach would be to use PHP's FILTER_VALIDATE_EMAIL (which doesn't work for Unicode, so we add a second check for that). An example validation function could look like this:
if (filter_var($email, FILTER_VALIDATE_EMAIL)) {
    return true;
} elseif (preg_match("/^([\w\-\.\+]+@([\w\-]+\.)+[\w\-]{2,63})?$/u", $email)) {
    return true;
} else {
    return false;
}
I guess even simpler would be to use just the regex:
if (preg_match("/^([\w\-\.\+]+@([\w\-]+\.)+[\w\-]{2,63})?$/u", $email)) {
    return true;
} else {
    return false;
}
In any case, is_email() currently blocks Unicode email addresses and should be fixed.
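The two-step check suggested in this comment can be wrapped in a standalone function for testing, as sketched below. The function name `is_email_sketch()` is hypothetical and this is not WordPress code. One deliberate deviation: the pattern as posted wraps the whole address in an optional group (`(...)?`), so it also matches an empty string. The sketch drops that `?` so that empty input is rejected.

```php
<?php
// Hypothetical wrapper around the approach from the comment above.
function is_email_sketch(string $email): bool {
    // First try PHP's built-in validator, which covers ASCII addresses.
    if (filter_var($email, FILTER_VALIDATE_EMAIL) !== false) {
        return true;
    }

    // Fallback for Unicode addresses. With the /u modifier, \w also
    // matches non-ASCII letters and digits. Unlike the posted pattern,
    // the address is not optional here, so '' does not match.
    return preg_match('/^[\w\-\.\+]+@([\w\-]+\.)+[\w\-]{2,63}$/u', $email) === 1;
}

var_dump(is_email_sketch('测试5@普遍接受-测试.世界'));
var_dump(is_email_sketch('user@example.com'));
var_dump(is_email_sketch(''));
```

The fallback is intentionally loose: it checks shape only and does not implement the RFC or IDN rules discussed earlier in this ticket.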
Can you specify exactly what the bug is? As far as I can see, dummy-email.y.!#$%@gmail.com is a valid email address. See e.g. http://en.wikipedia.org/wiki/Email_address#Local_part.