Make WordPress Core

Opened 5 weeks ago

#62940 new defect (bug)

wp_mail(): Address header parsing is not RFC-5322 complient and fails on quoted-string when including a "<", ">" or ","

Reported by: bhujagendra's profile bhujagendra Owned by:
Milestone: Awaiting Review Priority: normal
Severity: normal Version: trunk
Component: Mail Keywords:
Focuses: Cc:

Description

Problem

The following From field values result in an error, although both comply with RFC 2822:

Case 1: quoted printable:

  • Input: From: "test<stage" <test@example.com>
  • Result:
    • $from_name: "test
    • $from_email: stage" <test@example.com

Case 2: multiple from addresses:

  • Input: From: test <test@example.com>, User <user@example.com>
  • Result:
    • $from_name: test
    • $from_email: test@example.com, User <user@example.com

Case 3: similar problem for to, cc, bcc, and reply_to fields!

  • Input: To: test <test@example.com>, "Joe, Doe" <joe.doe@example.com>
  • Result:
    • address 1: test <test@example.com>
    • address 2: "Joe
    • address 3: Doe" <joe.doe@example.com>

Reason

The php mail() documentation does not explicitly state that the From header needs to comply with RFC 2822. But it does say so for the To header. Furthermore, the current implementation does already support the name-addr-syntax (i.e. [display-name] angle-addr), which was not allowed in the preceding RFC 822.

However, wp_mail does ignore the facts

  • that display-name may include < (case #1 above), and
  • that the From field may actually include multiple addresses.

RFC 2822 Specification

Current Implementation

The parsing of the field simply splits the two parts of the name-addr at the position of the first occurrence of <, see line 2 of the code in question (pluggable.php:301-319):

<?php
case 'from':
        $bracket_pos = strpos( $content, '<' );
        if ( false !== $bracket_pos ) {
                // Text before the bracketed email is the "From" name.
                if ( $bracket_pos > 0 ) {
                        $from_name = substr( $content, 0, $bracket_pos );
                        $from_name = str_replace( '"', '', $from_name );
                        $from_name = trim( $from_name );
                }

                $from_email = substr( $content, $bracket_pos + 1 );
                $from_email = str_replace( '>', '', $from_email );
                $from_email = trim( $from_email );

                // Avoid setting an empty $from_email.
        } elseif ( '' !== trim( $content ) ) {
                $from_email = trim( $content );
        }
        break;

Similarly, for the to, cc, bcc, and reply_to fields:

Discussion

Unfortunately there is no simple solution:

  • Correct parsing - according to the standard outlined in RFC 2822 - would probably require the use of a real parser (as opposed to searching the entire string for certain tokens).
    • PHPMailer::parseAddresses() method cannot be used, as it lacks the same shortcoming, if the IMAP extension is not installed.
    • mail-mime-parser could work as an alternative. But it's a whole other level of complexity and only supports php v8+
  • Even if correct parsing would be supported and $from_name as well as $from_email correctly assigned, this would only work for single-mailbox entries. Multiple from addresses are not supported by

Weak implementation

It must be noted, that the current implementation has repeatedly had bugs and regressions (although it has been quiet around this in the last years):

Pragmatic proposal

  • Introduce a new filter wp_mail_from_raw at the beginning of the From field parsing:
    <?php
    /**
     * Filters the raw "From" header field before parsing it.
     *
     * @since 7.6.2
     *
     * @param string $content Form header content.
     * @param string $header Full raw header.
     */
    $content = apply_filters( 'wp_mail_from_raw', $content, $header );
    
    This would allow a user (or plugin author) to mitigate the issue, if it happens on a given site. But it would save them the effort of parsing the headers entirely (as would be required with wp_mail).
  • Alternatively, the filter could be called for all headers in line 300
    <?php
    
    $origName = $name; // used later for the default case in line 347
    $name = strtolower( $name );
    
    /**
     * Filters the raw header fields before parsing them.
     *
     * @since 7.6.2
     *
     * @param string $content Header field's content.
     * @param string $name Header field's name.
     * @param string $header Full raw header.
     */
    $content = apply_filters( 'wp_mail_raw_header', $content, $name, $header );
    
    /**
     * Filters the raw header fields before parsing them.
     *
     * @since 7.6.2
     *
     * @param string $content Header field's content.
     * @param string $name Header field's name.
     * @param string $header Full raw header.
     */
    $content = apply_filters( "wp_mail_header_$name", $content, $header );
    
    // ...
    
    // Line 347 needs to be updated:
    // $headers[ trim( $name ) ] = trim( $content );
    $headers[ $origName ] = $content; // both values are already trimmed in lines 296 & 297
    
  • Potentially use a RFC-compliant parser to avoid future errors with quoted address names at least in the other address fields
  • Use an instance of an object implementing ArrayAccess and Stringable. This can be used to store one or a set of addresses/names in the $from_email/wp_mail_from and $from_name/wp_mail_from_name values and is sent through the respective filters. By this it is basically backwards compatible.
  • Parse the Sender header, add a filter and default it to the first address of a multi-address From field of the default wordpress from address.
  • Enhance the PHPMailer::setFrom() method to accept $address and $name to be arrays. And therefore also the corresponting class fields $From and $FromName (maybe through a PR to PHPMailer)

Happy to help with a PR. I would just love to know what would be accepted in there.

Thanks,
Bhujagendra

Change History (0)

Note: See TracTickets for help on using tickets.