Make WordPress Core

Opened 5 months ago

Last modified 5 months ago

#62038 new defect (bug)

Issue with is_email() and sanitize_email()

Reported by: debarghyabanerjee Owned by:
Milestone: Awaiting Review Priority: normal
Severity: normal Version:
Component: Formatting Keywords: has-patch has-unit-tests
Focuses: Cc:

Description

It has been observed that certain email addresses are passing through the validation and sanitization processes with trailing numbers appended to them, such as:

example@example.com1234
example@example.com1234567812345678

Also, example@204.32.222.14 is accepted as a valid email address by is_email().

Currently, is_email() and sanitize_email() do not handle these cases as expected. According to RFC 5321, an IP address is a valid domain part only when it is enclosed in square brackets. This rule is not currently enforced, so addresses like abc@204.32.222.14 are incorrectly accepted as valid.
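
To make the distinction above concrete, here is a minimal Python sketch (illustration only, not WordPress code) that classifies the domain part of an address as a bracketed IPv4 literal, a bare IPv4 address, or a hostname. The function name `classify_domain` and its return labels are hypothetical, and IPv6 or other address literals are deliberately left out.

```python
import ipaddress


def classify_domain(domain: str) -> str:
    """Classify the domain part of an email address (hypothetical helper).

    Returns one of: "bracketed-ipv4", "bare-ipv4", "hostname", "other".
    """
    # Address-literal form: an IP address wrapped in square brackets.
    if domain.startswith("[") and domain.endswith("]"):
        try:
            ipaddress.IPv4Address(domain[1:-1])
            return "bracketed-ipv4"
        except ValueError:
            # Bracketed, but not an IPv4 address (IPv6 etc. is not handled here).
            return "other"
    # Bare dotted-quad IPv4 address without brackets.
    try:
        ipaddress.IPv4Address(domain)
        return "bare-ipv4"
    except ValueError:
        return "hostname"


for email in ("example@example.com1234", "example@204.32.222.14", "user@[192.0.2.1]"):
    domain = email.rpartition("@")[2]
    print(email, "->", classify_domain(domain))
```

Only the "bare-ipv4" case corresponds to the unbracketed IP addresses this ticket proposes to reject; an address such as example@example.com1234 falls under "hostname" in this sketch.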

Change History (3)

This ticket was mentioned in PR #7334 on WordPress/wordpress-develop by @debarghyabanerjee.


5 months ago
#1

  • Keywords has-patch has-unit-tests added

Trac Ticket: Core-62038

## Problem Statement

  • It has been observed that certain email addresses are passing through the validation and sanitization processes with trailing numbers appended to them, such as:
    • example@example.com1234
    • example@example.com1234567812345678
  • Also, example@204.32.222.14 is accepted as a valid email address by is_email().
  • Currently, is_email() and sanitize_email() do not handle these cases as expected. According to RFC 5321, an IP address is a valid domain part only when it is enclosed in square brackets. This rule is not currently enforced, so addresses like abc@204.32.222.14 are incorrectly accepted as valid.

## Fixes Implemented

  • The fixes focus on adhering to standard email validation and sanitization criteria, including:

### is_email() Validation Changes:

  • Bracketed IP Address Handling:
    • Validates bracketed IP addresses (e.g., user@[192.0.2.1]) as valid email addresses according to RFC 5321.
    • Rejects non-bracketed IP addresses (e.g., user@192.0.2.1) as invalid.
  • Trailing Numbers in Domain:
    • Updated validation logic to reject email addresses if the domain part contains trailing numbers, unless the domain also includes at least one alphabetic character.

### sanitize_email() Sanitization Changes:

  • Bracketed IP Address:
    • Ensures that emails with bracketed IP addresses are returned unchanged.
  • Trailing Numbers in Domain:
    • If the domain contains trailing numbers and includes alphabetic characters, the function will remove the trailing numbers as part of the sanitization process.


## Detailed Changes

### is_email()

  • Added logic to validate IP addresses enclosed in square brackets as valid.
  • Introduced a condition to reject email addresses with trailing numbers in the domain if the domain does not contain alphabetic characters.

### sanitize_email()

  • Implemented logic to accept bracketed IP addresses without modification.
  • Modified the sanitization process to remove trailing numbers from the domain if the domain contains alphabetic characters.
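
The sanitization rule described above can be sketched roughly as follows. This is a Python illustration of the rule as worded in this description, not the actual patch to sanitize_email(); the function name `sanitize_email_sketch` is hypothetical, and the rest of WordPress's sanitization (character stripping, length checks, filters) is omitted.

```python
import re


def sanitize_email_sketch(email: str) -> str:
    """Apply only the two domain rules described above (hypothetical helper)."""
    local, sep, domain = email.rpartition("@")
    if not sep:
        # No "@" present; this sketch leaves such input untouched.
        return email
    # Rule 1: bracketed IP addresses are returned unchanged.
    if domain.startswith("[") and domain.endswith("]"):
        return email
    # Rule 2: strip trailing digits, but only when the domain
    # also contains at least one alphabetic character.
    if re.search(r"[A-Za-z]", domain):
        domain = re.sub(r"[0-9]+\Z", "", domain)
    return f"{local}@{domain}"


print(sanitize_email_sketch("example@example.com1234"))
print(sanitize_email_sketch("user@[192.0.2.1]"))
print(sanitize_email_sketch("user@192.0.2.1"))
```

One consequence worth noting: under this rule, any hostname that ends in digits is rewritten as well, so in this sketch `user@host1` becomes `user@host`.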

## Example Updates

### Validation

  • user@[192.0.2.1] is considered a valid email address.
  • user@192.0.2.1 is considered an invalid email address.
  • example@example.com1234 is considered invalid due to trailing numbers in the domain.

### Sanitization

  • Bracketed IP Address: user@[192.0.2.1] remains unchanged.
  • Trailing Numbers: example@… is sanitized to example@… if the domain part includes alphabetic characters.

## Testing

  • Bracketed IP Addresses: Verified that bracketed IP addresses are correctly validated and sanitized.
  • Non-Bracketed IP Addresses: Ensured non-bracketed IP addresses are rejected.
  • Trailing Numbers: Confirmed that trailing numbers are handled correctly based on the presence of alphabetic characters in the domain.
  • General Email Validity: Ensured that standard email addresses continue to pass without issues.

## Considerations

  • Backward Compatibility: Ensured that the updated validation and sanitization rules do not negatively impact existing valid email addresses.
  • RFC Compliance: The changes are aligned with email validation standards set by RFC 5321.

#2 @ayeshrajans
5 months ago

Email and IP address validation functions are perhaps the easiest places to start bikeshed conversations, so I will summarize what I think; I'm generally against this change.

  • What's your basis for disallowing hostnames that have trailing numbers? I don't think this is disallowed by RFC 5321.
  • If we no longer allow user@192.0.2.1 and start to allow user@[192.0.2.1], this can leave existing users with such addresses stuck.
  • Generally speaking, we should try to reduce the disparity with other email validation functions. The safest way to do that is to allow only the most restrictive variant of the data. That ship has sailed now, so our second goal would be to reduce this disparity.
  • The last time we updated the email validation logic, it was quickly followed by bug reports from people who could no longer get certain things to work; that's why we now have a separate test for PHPMailer.

You are correct that email addresses of the form user@192.0.2.1 are not technically correct. FILTER_VALIDATE_EMAIL agrees on this too. However, they are _practically_ still considered valid. For example, both Chrome and Firefox accept example@example.com1234 and user@192.0.2.1 as valid email addresses for <input type=email /> fields. The Chrome source has more test cases.
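
For reference, the browser behaviour mentioned above can be approximated with the regular expression that the WHATWG HTML Standard gives for a valid e-mail address in `<input type=email>` fields. The Python sketch below transcribes that pattern as best understood here, so check it against the spec before relying on it; the names `HTML_EMAIL_RE` and `html_accepts` are hypothetical, and real browsers may apply additional processing (for example, to internationalized domain names).

```python
import re

# Pattern transcribed from the HTML Standard's "valid e-mail address" definition
# (assumed transcription; the spec is the authority).
HTML_EMAIL_RE = re.compile(
    r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"
    r"@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*"
)


def html_accepts(address: str) -> bool:
    """Return True if the whole string matches the pattern above."""
    return HTML_EMAIL_RE.fullmatch(address) is not None


for address in ("example@example.com1234", "user@192.0.2.1", "user@[192.0.2.1]"):
    print(address, "->", html_accepts(address))
```

With this pattern the first two addresses match and the bracketed form does not, which illustrates the disparity between the RFC 5321 address-literal syntax and what HTML form validation accepts.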

#3 @debarghyabanerjee
5 months ago

Hi @ayeshrajans, thanks for your feedback. In that case, we should also add support for user@[192.0.2.1] addresses, since these are the standard form.
