Opened 10 years ago
Last modified 4 weeks ago
#31992 new defect (bug)
Unicode Email Addresses
Reported by: |
|
Owned by: | |
---|---|---|---|
Milestone: | Priority: | normal | |
Severity: | normal | Version: | |
Component: | Formatting | Keywords: | has-testing-info has-patch has-screenshots changes-requested has-unit-tests |
Focuses: | Cc: |
Description (last modified by )
Tested against trunk (2015-04-16)
Test case
$target_email = 'dummy-üñîçøðé.y.!#$%@üñîçøðé.gmail.com'; echo $target_email.'<br>'; echo sanitize_email($target_email).'<br>'; echo 'is_email : '.is_email($target_email);
Return
dummy-üñîçøðé.y.!#$%@üñîçøðé.gmail.com dummy-email.y.!#$%@gmail.com is_email :
Function is_email @ /wp-includes/formatting.php line 2177
Preg_replace @ line 2211 is not correct.
if ( !preg_match( '/^[a-zA-Z0-9!#$%&\'*+\/=?^_`{|}~\.-]+$/', $local ) ) {
Function sanitize_email() @ /wp-includes/formatting.php line 2430
Preg_replace @ line 2460 is not correct.
$local = preg_replace( '/[^a-zA-Z0-9!#$%&\'*+\/=?^_`{|}~\.-]/', '', $local );
Attachments (2)
Change History (28)
#2
@
10 years ago
ugh... sorry. I actually pasted the email that was sanitized.
The test I made was
$target_email = 'dummy-üñîçøðé.y.!#$%@üñîçøðé.gmail.com'; echo $target_email.'<br>'; echo sanitize_email($target_email).'<br>'; echo 'is_email : '.is_email($target_email);
with return as
dummy-üñîçøðé.y.!#$%@üñîçøðé.gmail.com dummy-email.y.!#$%@gmail.com is_email :
The unicode characters were all removed. For international emails this can be a real problem.
ps. I actually used the Wiki page you sent as a base for my ticket. I tried a mix of one of the last examples in the "Valid email Examples" list.
#3
@
10 years ago
I had looked into email validation when I wrote my SMTP plugin, and eventually concluded it is nearly impossible to validate an email address at all. Most email validators are much more restrictive than any RFC requires.
I think currently my code doesn't even call sanitize_text_field because it failed too many of my test cases.
https://wordpress.org/plugins/postman-smtp/
http://girders.org/blog/2013/01/31/dont-rfc-validate-email-addresses/
#4
@
10 years ago
- Keywords reporter-feedback removed
- Summary changed from sanitize_email() and is_email() preg_replace/preg_match problems to Unicode Email Addresses
- Version trunk deleted
Replying to ysalame:
ugh... sorry. I actually pasted the email that was sanitized.
We need an admin to update the ticket description then.
#8
@
4 years ago
Hi!
I opened #51732 ticket to propose a patch to change email validation rules to work with EAI, but it was recognized a duplicate of this one.
As I can see, nothing was changed with EAI for 6 years from the time this ticket was open and Unicode EAI cannot be used in WP. I would like to propose some things to fix it:
1) Let's modify sanitize_email() and is_email() to use special constants for regex check username and domain name WP_IDN_LOCAL_MAIL_REGEX and WP_IDN_DOMAIN_REGEX in default-constants.php which every admin can change to validate IDN domains rules for language he uses. For example I added Russian IDN rules to these regexes.
2) Let's also add decode_punycode() to sanitize_email() function which check if e-mail domain part in Punycode from and convert it to Unicode as required by uasg.tech
3) In addition to all these things we can modify sanitize_user() in the same manner to allow Unicode usernames to WP Universal Acceptance ready.
I attached the patch for all of these.
#9
@
4 years ago
This patch does not solve the problem with e.g. "测试5@普遍接受-测试.世界"
A more simpler approach would be to use php FILTER_VALIDATE_EMAIL (which doesn't work for unicode, but we add the second if for that) here. An example validation function could look like this:
if (filter_var($email, FILTER_VALIDATE_EMAIL)) { return true; } elseif (preg_match("/^([\w\-\.\+]+@([\w\-]+\.)+[\w\-]{2,63})?$/u", $email)) { return true; } else { return false; }
I guess even more simple would be to use just the regex:
if (preg_match("/^([\w\-\.\+]+@([\w\-]+\.)+[\w\-]{2,63})?$/u", $email)) { return true; } else { return false; }
In any case is_email currently blocks unicode email addresses and should get fixed.
@
22 months ago
Patch to add support for unicode email addresses in is_email, but making no other changes and resolving no other limitations
#10
@
18 months ago
- Keywords has-testing-info has-patch has-screenshots added; is-email removed
Test Report
Patch tested: https://core.trac.wordpress.org/attachment/ticket/31992/31992-icann.patch
Steps to Test
- In WP admin, navigate to Users > Add New.
- Enter a Username in standard ASCII characters (no need to confuse with related #5918).
- Enter a valid Email with extended ASCII or unicode characters, e.g.:
grå@grå.org
. - 🐞 Observe error returned:
Error: The email address is not correct.
Expected Results
- ✅ An email with extended ASCII or unicode characters should be accepted.
Environment
- Hardware: MacBook Pro Apple M1 Pro
- OS: macOS 13.5
- Browser: Safari 16.6
- Server: nginx/1.25.2
- PHP: 8.2.9
- Database: SQLite
- WordPress: 6.4-alpha-56267-src
Actual Results
- ✅ The Add New User form accepts the email with special characters.
- ❌ The email stored in the database (and subsequently displayed on the page) has special characters stripped out. E.g. (also see Figure 1):
grå@grå.org
=>gr@gr.org
测试5@普遍接受-测试.世界
=>(empty!)
Supplemental Artifacts
#11
@
18 months ago
Thanks for testing, and thanks for not confusing it with that other issue.
My patch fixes only email address validation, it doesn't touch database storage. I'll be happy to work on database storage too, but do you want it on this issue? I lean towards fixing only is_email in this issue, and fixing other related issues in separate patches.
(JFYI, my most important goal is to accept unicode email addresses in forms managed by plugins, CF7 for example.)
#12
@
18 months ago
- Keywords changes-requested added
@agulbra, yes, it's fine to work on the DB portion with a separate patch file, but please do include it with this ticket. Based on testing 31992-icann.patch, addressing the DB storage issue is first required to prevent the issue noted in comment:10. Also, there are other workflows that will need testing, like during site creation, etc, so this could very well touch even more areas in Core.
I wanted to note that GitHub pull requests are the preferred mechanism for code review, as they facilitate easier reviews and collaboration. (For instance, see this branch with your first patch applied.) PRs are often faster to update, and additional requirements can be addressed through additional commits as they reveal themselves, such as the database storage update.
I've marked this ticket with changes-requested
to signify that additional work is needed on top of the existing patch.
(In addition to the handbook, I invite you to join Make WordPress Slack if you have questions about the contribution workflow, particularly the #core channel.)
This ticket was mentioned in PR #5237 on WordPress/wordpress-develop by arnt.
17 months ago
#13
- Keywords has-unit-tests added
This adds support for unicode email addresses in is_email and sanitise_email. It is intended to be sufficient to accept unicode email addresses in contact forms etc.,
it does *not* aspire to unicode support everywhere, in particular it doesn't deal with unicode in logins or passwords.
It also does not add a tests/formatting/sanitizeEmail.php; I wouldn't mind adding one, but don't quite understand your testing policies. It looks like you want to keep the tests fast by avoiding unnecessary tests, I'd say, but I don't understand what you consider necessary and unnecessary. Anyway, if you want a file I'm happy to add one.
17 months ago
#14
I don't understand the test failures. They appear to be unrelated to these code changes. Could you take a look, @ironprogrammer ? Thanks.
@ironprogrammer commented on PR #5237:
17 months ago
#15
In response to @arnt:
It also does not add a tests/formatting/sanitizeEmail.php; I wouldn't mind adding one...
Absolutely, please do add unit tests (or updates to existing tests) for sanitize_email()
. While Core test code coverage is pretty low today 😓, it is a project policy that new Core merges include appropriate tests.
I don't understand the test failures. They appear to be unrelated to these code changes. Could you take a look, @ironprogrammer ? Thanks.
It's not uncommon for E2E test timeouts to occur in this repo's automated jobs. These can be ignored.
@ironprogrammer commented on PR #5237:
17 months ago
#16
Thanks for the updated PR! The updates to sanitize_email()
appear to have resolved the issue with storing the correct email in the database 🎉.
This isn't a formal test report (there are lots of scenarios needed), but I ran through a few smoke tests so as to get back to you quickly on what could be addressed next. It would be great to get additional reviewers/testers involved to cover what I've missed:
- ✅ Add New User: email with unicode characters is accepted and saved to the database correctly.
- ✅ Profile (edit user): email can be changed with unicode characters, and when saved, persists correctly.
- ❓ Do new user creation, email change confirmation, or password change emails go to the correct address?
- ❓ Do links in these emails work correctly?
- ❓ Does the forgotten password workflow work correctly with a unicode address?
- ✅ Login: logging in with a unicode email works as expected.
- ✅ Add post comment: email with unicode is accepted for post comments.
- ❌ Comments (approval list): emails with unicode are not displayed correctly (Figure 1).
- ✅ In the Comments list, the Email field under Quick Edit does appear correctly.
- ❓ Site setup wizard: can the initial admin use a unicode address, and do the emails work correctly?
- ❓ Do various author dropdowns display unicode addresses as expected?
- ❓ Are unicode emails in feeds presented accurately?
_Figure 1: Display of commenter unicode email addresses._
17 months ago
#17
Sigh. I see what you mean.
My own focus is/was quite different: I wanted Contact Form 7 and similar to accept unicode email addresses, which requires some utility functions in Wordpress to do the right thing. I didn't aspire to complete support in Wordpress. The thinking behind your response seems to be that either Wordpress supports it properly or not at all, there's no inbetween.
And, sadly, I think you're right and I was wrong.
I'll extend the PR. It'll take a few weeks. I'll also find some testers.
5 months ago
#18
Hi,
you asked for wider testing. Around 20 sites have now run the patch in production, the longest-running one for more than six months already. Nothing has broken. The sites are real (University of Somewhere, not someone's private five-visitor site).
If you want, I can send you the email addresses of some testers. When I pinged them last week, around half said it was okay to give you their addresses, I'm sure most will say the same if I ask a few times ;)
@ironprogrammer commented on PR #5237:
5 months ago
#20
Thanks for the update, @arnt, and for seeking wider testing! There are still some outstanding questions and one previous failure to consider, but it's great to see this moving forward! 🙌🏻
A good next step would be to rebase this PR on the latest trunk
, and to resolve the coding standards issues to tidy things up for an updated review and testing. When the CI checks are in a happy state, then it will help attract more attention.
5 months ago
#21
Finding testers who were willing and able to run a patch on sites that mattered to them was hard work. That took a good long while. Do you want their names? If so, please send me mail.
Rebasing is not a problem, of course. The other stuff was a little difficult. I only recently found a good workaround for IntlChar's lack of support for UAX24.
Can you suggest a model end-to-end test that I could emulate and write automatic tests for the desired features? Manual testing is possible, but I'd much prefer to have complete automatic test coverage.
2 months ago
#22
Hi,
the remaining problems, as far as I can tell:
- The code now requires PHP 7.4, which I thought was OK but now I see that 7.4 is merely recommended, not required. Oops. Not clear to me how to do mb_str_split in 7.2.
- I've clicked around, but don't feel sure that I have all of the "various dropdowns". The filters in default-filters.php are okay, but I don't feel confident that every third-party plugin will be okay, or that I've looked at everything in the UI. Wordpress is large, there are nooks and corners…
2 months ago
#23
I added some code to prevent PHP 7.2.24-7.3.x from choking on mb_str_split().
A bit on testing:
- ✅ Add New User: email with unicode characters is accepted and saved to the database correctly.
- ✅ Profile (edit user): email can be changed with unicode characters, and when saved, persists correctly.
- ✅ New user creation, email change confirmation, and password change emails go to the correct address.
- ✅ Links in all messages work correctly AFAICT (plugins can add new kinds of course…).
- ✅ The forgotten password workflow works correctly with a unicode address.
- ✅ Login: logging in with a unicode email works as expected.
- ✅ Add post comment: email with unicode is accepted for post comments.
- ✅ Comments (approval list): emails with unicode are displayed correctly.
- ✅ In the Comments list, the Email field under Quick Edit does appear correctly.
- ✅ WordPress works on an IPv6-only server (oh, you weren't wondering?)
And… oh drat… Must retest the site setup wizard, too much code has changed.
Two questions remain.
- What author dropdowns display email addresses? I can't find anything. Lots of user names displayed but not email addresses. Maybe I misunderstand. I checked for usage of email in the code that reads the database and use of that code in display code.
- How can email addresses ever appear in feeds? The feed generators don't seem to reveal authors etc.
2 months ago
#24
Just set up a new site from scratch using 'rød@grå.org' as admin address; it works.
6 weeks ago
#25
Hi @ironprogrammer,
would you mind having a look?
Also, on an unrelated matter, I'm with you. I've worked as a full-time maintainer on a widely used open source project and got a lot of flak. It hurt. Some people can be really shitty. Don't let it grind you down.
@ironprogrammer commented on PR #5237:
4 weeks ago
#26
Thanks a lot, @arnt, and apologies for not being involved with this the past several months 🙏🏻. I've got some other work priorities at this time, but hope to be able to dive back in soon!
In the meantime, for code review purposes, I would suggest raising this Trac ticket during #core Dev Chat, or dropping in a comment on the next agenda post to add it to open floor discussion. This will help spread some awareness of the initiative. If you're available on Slack and sync during that time to answer questions, that would be ideal.
Glad you could wrangle additional testing for the updates! If this info can be made public through test reports in Trac, that would be amazing -- and also prerequisite to merging as Core committers begin taking a look at the ticket.
You could get additional visibility by sharing the ticket in the #core-test channel. There may also be contributors willing to help put test instructions together based on what you've got so far. This update covers a lot of territory, so the more detailed the tests for each use case, the better.
Can you specify exactly what the bug is? As far as I can see,
dummy-email.y.!#$%@gmail.com
is a valid email address. See eg http://en.wikipedia.org/wiki/Email_address#Local_part.