Make WordPress Core

Opened 9 years ago

Last modified 12 months ago

#31992 new defect (bug)

Unicode Email Addresses

Reported by: ysalame's profile ysalame Owned by:
Milestone: Priority: normal
Severity: normal Version:
Component: Formatting Keywords: has-testing-info has-patch has-screenshots changes-requested has-unit-tests
Focuses: Cc:

Description (last modified by SergeyBiryukov)

Tested against trunk (2015-04-16)

Test case

$target_email = 'dummy-üñîçøðé.y.!#$%@üñîçøðé.gmail.com';
echo $target_email.'<br>';
echo sanitize_email($target_email).'<br>';
echo 'is_email : '.is_email($target_email);

Return

dummy-üñîçøðé.y.!#$%@üñîçøðé.gmail.com
dummy-email.y.!#$%@gmail.com
is_email :

Function is_email @ /wp-includes/formatting.php line 2177
Preg_replace @ line 2211 is not correct.

if ( !preg_match( '/^[a-zA-Z0-9!#$%&\'*+\/=?^_`{|}~\.-]+$/', $local ) ) {

Function sanitize_email() @ /wp-includes/formatting.php line 2430
Preg_replace @ line 2460 is not correct.

$local = preg_replace( '/[^a-zA-Z0-9!#$%&\'*+\/=?^_`{|}~\.-]/', '', $local );

Attachments (2)

31992.patch (8.5 KB) - added by prfidneai 4 years ago.
31992-icann.patch (3.5 KB) - added by agulbra 17 months ago.
Patch to add support for unicode email addresses in is_email, but making no other changes and resolving no other limitations

Download all attachments as: .zip

Change History (19)

#1 @boonebgorges
9 years ago

  • Keywords reporter-feedback added

Can you specify exactly what the bug is? As far as I can see, dummy-email.y.!#$%@gmail.com is a valid email address. See eg http://en.wikipedia.org/wiki/Email_address#Local_part.

#2 @ysalame
9 years ago

ugh... sorry. I actually pasted the email that was sanitized.

The test I made was

$target_email = 'dummy-üñîçøðé.y.!#$%@üñîçøðé.gmail.com';
echo $target_email.'<br>';
echo sanitize_email($target_email).'<br>';
echo 'is_email : '.is_email($target_email);

with return as

dummy-üñîçøðé.y.!#$%@üñîçøðé.gmail.com
dummy-email.y.!#$%@gmail.com
is_email : 

The unicode characters were all removed. For international emails this can be a real problem.

ps. I actually used the Wiki page you sent as a base for my ticket. I tried a mix of one of the last examples in the "Valid email Examples" list.

#3 @jasonhendriks
9 years ago

I had looked into email validation when I wrote my SMTP plugin, and eventually concluded it is nearly impossible to validate an email address at all. Most email validators are much more restrictive than any RFC requires.

I think currently my code doesn't even call sanitize_text_field because it failed too many of my test cases.

https://wordpress.org/plugins/postman-smtp/

http://girders.org/blog/2013/01/31/dont-rfc-validate-email-addresses/

Last edited 9 years ago by jasonhendriks (previous) (diff)

#4 @miqrogroove
9 years ago

  • Keywords reporter-feedback removed
  • Summary changed from sanitize_email() and is_email() preg_replace/preg_match problems to Unicode Email Addresses
  • Version trunk deleted

Replying to ysalame:

ugh... sorry. I actually pasted the email that was sanitized.

We need an admin to update the ticket description then.

#5 @SergeyBiryukov
9 years ago

  • Description modified (diff)

#6 @miqrogroove
9 years ago

  • Keywords is-email added

#7 @desrosj
4 years ago

#51732 was marked as a duplicate.

#8 @prfidneai
4 years ago

Hi!

I opened #51732 ticket to propose a patch to change email validation rules to work with EAI, but it was recognized a duplicate of this one.

As I can see, nothing was changed with EAI for 6 years from the time this ticket was open and Unicode EAI cannot be used in WP. I would like to propose some things to fix it:

1) Let's modify sanitize_email() and is_email() to use special constants for regex check username and domain name WP_IDN_LOCAL_MAIL_REGEX and WP_IDN_DOMAIN_REGEX in default-constants.php which every admin can change to validate IDN domains rules for language he uses. For example I added Russian IDN rules to these regexes.

2) Let's also add decode_punycode() to sanitize_email() function which check if e-mail domain part in Punycode from and convert it to Unicode as required by uasg.tech

3) In addition to all these things we can modify sanitize_user() in the same manner to allow Unicode usernames to WP Universal Acceptance ready.

I attached the patch for all of these.

Version 1, edited 4 years ago by prfidneai (previous) (next) (diff)

@prfidneai
4 years ago

#9 @liedekef
4 years ago

This patch does not solve the problem with e.g. "测试5@普遍接受-测试.世界"
A more simpler approach would be to use php FILTER_VALIDATE_EMAIL (which doesn't work for unicode, but we add the second if for that) here. An example validation function could look like this:

   if (filter_var($email, FILTER_VALIDATE_EMAIL)) {
        return true;
   } elseif (preg_match("/^([\w\-\.\+]+@([\w\-]+\.)+[\w\-]{2,63})?$/u", $email)) {
        return true;
   } else {
        return false;
   }

I guess even more simple would be to use just the regex:

   if (preg_match("/^([\w\-\.\+]+@([\w\-]+\.)+[\w\-]{2,63})?$/u", $email)) {
        return true;
   } else {
        return false;
   }

In any case is_email currently blocks unicode email addresses and should get fixed.

@agulbra
17 months ago

Patch to add support for unicode email addresses in is_email, but making no other changes and resolving no other limitations

#10 @ironprogrammer
13 months ago

  • Keywords has-testing-info has-patch has-screenshots added; is-email removed

Test Report

Patch tested: https://core.trac.wordpress.org/attachment/ticket/31992/31992-icann.patch

Steps to Test

  1. In WP admin, navigate to Users > Add New.
  2. Enter a Username in standard ASCII characters (no need to confuse with related #5918).
  3. Enter a valid Email with extended ASCII or unicode characters, e.g.: grå@grå.org.
  4. 🐞 Observe error returned: Error: The email address is not correct.

Expected Results

  • ✅ An email with extended ASCII or unicode characters should be accepted.

Environment

  • Hardware: MacBook Pro Apple M1 Pro
  • OS: macOS 13.5
  • Browser: Safari 16.6
  • Server: nginx/1.25.2
  • PHP: 8.2.9
  • Database: SQLite
  • WordPress: 6.4-alpha-56267-src

Actual Results

  • ✅ The Add New User form accepts the email with special characters.
  • ❌ The email stored in the database (and subsequently displayed on the page) has special characters stripped out. E.g. (also see Figure 1):
    • grå@grå.org => gr@gr.org
    • 测试5@普遍接受-测试.世界 => (empty!)

Supplemental Artifacts

Figure 1: Emails stored in database from test samples.
https://cldup.com/N6nKk0qLuA.thumb.png

#11 @agulbra
13 months ago

Thanks for testing, and thanks for not confusing it with that other issue.

My patch fixes only email address validation, it doesn't touch database storage. I'll be happy to work on database storage too, but do you want it on this issue? I lean towards fixing only is_email in this issue, and fixing other related issues in separate patches.

(JFYI, my most important goal is to accept unicode email addresses in forms managed by plugins, CF7 for example.)

#12 @ironprogrammer
13 months ago

  • Keywords changes-requested added

@agulbra, yes, it's fine to work on the DB portion with a separate patch file, but please do include it with this ticket. Based on testing 31992-icann.patch, addressing the DB storage issue is first required to prevent the issue noted in comment:10. Also, there are other workflows that will need testing, like during site creation, etc, so this could very well touch even more areas in Core.

I wanted to note that GitHub pull requests are the preferred mechanism for code review, as they facilitate easier reviews and collaboration. (For instance, see this branch with your first patch applied.) PRs are often faster to update, and additional requirements can be addressed through additional commits as they reveal themselves, such as the database storage update.

I've marked this ticket with changes-requested to signify that additional work is needed on top of the existing patch.

(In addition to the handbook, I invite you to join Make WordPress Slack if you have questions about the contribution workflow, particularly the #core channel.)

This ticket was mentioned in PR #5237 on WordPress/wordpress-develop by arnt.


12 months ago
#13

  • Keywords has-unit-tests added

This adds support for unicode email addresses in is_email and sanitise_email. It is intended to be sufficient to accept unicode email addresses in contact forms etc.,

it does *not* aspire to unicode support everywhere, in particular it doesn't deal with unicode in logins or passwords.

It also does not add a tests/formatting/sanitizeEmail.php; I wouldn't mind adding one, but don't quite understand your testing policies. It looks like you want to keep the tests fast by avoiding unnecessary tests, I'd say, but I don't understand what you consider necessary and unnecessary. Anyway, if you want a file I'm happy to add one.

arnt commented on PR #5237:


12 months ago
#14

I don't understand the test failures. They appear to be unrelated to these code changes. Could you take a look, @ironprogrammer ? Thanks.

@ironprogrammer commented on PR #5237:


12 months ago
#15

In response to @arnt:

It also does not add a tests/formatting/sanitizeEmail.php; I wouldn't mind adding one...

Absolutely, please do add unit tests (or updates to existing tests) for sanitize_email(). While Core test code coverage is pretty low today 😓, it is a project policy that new Core merges include appropriate tests.

I don't understand the test failures. They appear to be unrelated to these code changes. Could you take a look, @ironprogrammer ? Thanks.

It's not uncommon for E2E test timeouts to occur in this repo's automated jobs. These can be ignored.

@ironprogrammer commented on PR #5237:


12 months ago
#16

Thanks for the updated PR! The updates to sanitize_email() appear to have resolved the issue with storing the correct email in the database 🎉.

This isn't a formal test report (there are lots of scenarios needed), but I ran through a few smoke tests so as to get back to you quickly on what could be addressed next. It would be great to get additional reviewers/testers involved to cover what I've missed:

  • ✅ Add New User: email with unicode characters is accepted and saved to the database correctly.
  • ✅ Profile (edit user): email can be changed with unicode characters, and when saved, persists correctly.
  • ❓ Do new user creation, email change confirmation, or password change emails go to the correct address?
  • ❓ Do links in these emails work correctly?
  • ❓ Does the forgotten password workflow work correctly with a unicode address?
  • ✅ Login: logging in with a unicode email works as expected.
  • ✅ Add post comment: email with unicode is accepted for post comments.
  • ❌ Comments (approval list): emails with unicode are not displayed correctly (Figure 1).
  • ✅ In the Comments list, the Email field under Quick Edit does appear correctly.
  • ❓ Site setup wizard: can the initial admin use a unicode address, and do the emails work correctly?
  • ❓ Do various author dropdowns display unicode addresses as expected?
  • ❓ Are unicode emails in feeds presented accurately?

_Figure 1: Display of commenter unicode email addresses._

https://i0.wp.com/github.com/WordPress/wordpress-develop/assets/824344/80c46bdf-bc77-449c-b9b8-c34f217ff103

arnt commented on PR #5237:


12 months ago
#17

Sigh. I see what you mean.

My own focus is/was quite different: I wanted Contact Form 7 and similar to accept unicode email addresses, which requires some utility functions in Wordpress to do the right thing. I didn't aspire to complete support in Wordpress. The thinking behind your response seems to be that either Wordpress supports it properly or not at all, there's no inbetween.

And, sadly, I think you're right and I was wrong.

I'll extend the PR. It'll take a few weeks. I'll also find some testers.

Note: See TracTickets for help on using tickets.