Make WordPress Core

Opened 4 weeks ago

Closed 3 weeks ago

#62354 closed defect (bug) (fixed)

fetch_feed should use `get_bloginfo( 'charset' )` over `get_option( 'blog_charset' )` when setting the output encoding

Reported by: davidbinda's profile david.binda Owned by: desrosj's profile desrosj
Milestone: 6.7 Priority: normal
Severity: normal Version: 6.7
Component: Feeds Keywords: has-patch has-unit-tests commit dev-reviewed
Focuses: Cc:

Description (last modified by sabernhardt)

Testing the SimplePie 1.8.0 which is part of the upcoming WordPress 6.7 release, I've run into some fatal errors on blogs which have the blog_charset option empty:

Fatal error: Uncaught ValueError: mb_convert_encoding(): Argument #2 ($to_encoding) must be a valid encoding, "" given in wp-includes/SimplePie/src/Misc.php:344

It turns out that the fetch_feed function is setting the SimplePie's output encoding by passing in the get_option( 'blog_charset' ) ( related code in fetch_feed ) which might be empty in some cases.

While I'm not sure how or why the option is empty, the get_bloginfo( 'charset' ) accounts for such a case and outputs the default UTF-8 encoding in such case ( related code in get_bloginfo ).

IMHO, the fetch_feed function should use the get_bloginfo( 'charset' ) over the unchecked option value.

Attachments (2)

62354.diff (506 bytes) - added by david.binda 4 weeks ago.
fetchFeed.php (1.1 KB) - added by peterwilsoncc 3 weeks ago.
First pass at tests -- don't fail on trunk

Download all attachments as: .zip

Change History (20)

@david.binda
4 weeks ago

This ticket was mentioned in PR #7744 on WordPress/wordpress-develop by @yogeshbhutkar.


4 weeks ago
#1

  • Keywords has-patch added

Updated the output encoding to use get_bloginfo() instead of the previous approach of using the option get_option( 'blog_charset' ) as the absence of the option was leading to unexpected fatal errors as described in the Trac Tickets.

Trac ticket: https://core.trac.wordpress.org/ticket/62354

#2 @sabernhardt
4 weeks ago

  • Description modified (diff)
  • Milestone changed from Awaiting Review to 6.7.1

#3 @desrosj
4 weeks ago

Linking a few tickets and change sets for easy reference.

It looks like the current get_bloginfo( 'blog_charset' ) line was introduced in [32469] (see #10713) and shipped in WordPress 4.3. So an upstream change is likely responsible for surfacing this bug that has existed for some time.

The SimplePie library was updated for WP 6.7 to 1.8.0 in [59141] (see #55604).

@davidbinda Do you have any sense for how common this may be on the sites that you manage?

#4 @azaozz
4 weeks ago

IMHO, the fetch_feed function should use the get_bloginfo( 'charset' ) over the unchecked option value.

This was added in #10713 from 15 years ago. Seems the reason the patch there is using get_option() is because get_bloginfo( 'charset' ) was added later/wasn't available at the time.

Frankly I'm thinking this fix should probably be in 6.7 as the fatal seems triggered by the SimplePie update. Looking at the PR, seems the only possibility of any problems might be because of the filter at the bottom of get_bloginfo(). Considering how close to release it is, maybe the patch for 6.7 can be to just fix this in fetch_feed(), but switch to using get_bloginfo() for 6.8+.

#5 follow-up: @azaozz
4 weeks ago

  • Milestone changed from 6.7.1 to 6.7

After chatting with @peterwilsoncc moving to 6.7 for consideration.

An alternative (perhaps a bit safer?) patch would be to hard-code this directly in fetch_feed(). Replace $feed->set_output_encoding( get_option( 'blog_charset' ) ); with:

$blog_charset = get_option( 'blog_charset' );

if ( empty( $blog_charset ) ) {
    $blog_charset = 'UTF-8';
}

$feed->set_output_encoding( $blog_charset );
Last edited 4 weeks ago by azaozz (previous) (diff)

#6 @SergeyBiryukov
3 weeks ago

It appears that we already use get_option( 'blog_charset' ) and get_bloginfo( 'charset' ) somewhat interchangeably in quite a few places, so I think just switching to the latter should be fine here.

@peterwilsoncc
3 weeks ago

First pass at tests -- don't fail on trunk

#7 @peterwilsoncc
3 weeks ago

I think the switch to get_bloginfo() makes sense, and I don't have any objection to doing so for 6.7.

I'd like to include some tests to reproduce the problem on trunk but haven't been able to do so. I've included my first pass in fetchFeed.php but they are passing as the empty character set is eventually converted to UTF-8. If someone could provide a pointer towards what I am missing, it would be very much appreciated.

This ticket was mentioned in PR #7758 on WordPress/wordpress-develop by @peterwilsoncc.


3 weeks ago
#8

  • Keywords has-unit-tests added

Trac ticket: Core-62354

#9 @david.binda
3 weeks ago

@peterwilsoncc the fatal error happens when the feed items are being sanitized, thus they need to be accessed. For instance:

foreach( $feed->get_items( 0, 1 ) as $item ) {
    $content = $item->get_content();
}

in Tests_Feed_fetchFeed::test_empty_charset_does_not_trigger_fatal_error after the $feed = fetch_feed( 'https://wordpress.org/news/feed/' ); should trigger the fatal error.

Last edited 3 weeks ago by david.binda (previous) (diff)

This ticket was mentioned in PR #7763 on WordPress/wordpress-develop by @davidbaumwald.


3 weeks ago
#10

This ticket was mentioned in Slack in #core by david.baumwald. View the logs.


3 weeks ago

#12 in reply to: ↑ 5 @desrosj
3 weeks ago

Replying to azaozz:

An alternative (perhaps a bit safer?) patch would be to hard-code this directly in fetch_feed(). Replace $feed->set_output_encoding( get_option( 'blog_charset' ) ); with:

$blog_charset = get_option( 'blog_charset' );

if ( empty( $blog_charset ) ) {
    $blog_charset = 'UTF-8';
}

$feed->set_output_encoding( $blog_charset );

I think that I'd prefer to just switch to get_bloginfo( 'charset' ). This fallback snippet almost exactly matches the code in the charset case within get_bloginfo(), and I don't see any compelling reason to have duplicate code.

I've been able to reproduce the problem with @davidbinda's latest feedback. You can see the new test class failing here: https://github.com/WordPress/wordpress-develop/actions/runs/11782905557. Working to add a real assertion and to confirm the proposed fix resolves the problem.

#13 @desrosj
3 weeks ago

The linked PR is now has a passing test with the suggested fix included: https://github.com/WordPress/wordpress-develop/actions/runs/11783234915.

Last edited 3 weeks ago by desrosj (previous) (diff)

#14 @desrosj
3 weeks ago

  • Owner set to desrosj
  • Resolution set to fixed
  • Status changed from new to closed

In 59382:

Feeds: Avoid fatal error with empty blog_charset value.

After the SimplePie library was updated to version 1.8.0 in [59141], an edge case has been discovered where a fatal error can encountered if the blog_charset option is missing or empty.

In fetch_feed(), this option is retrieved using get_option() instead of get_bloginfo( ‘charset’ ). The latter will detect this scenario and apply a default value of UTF-8 and is already used interchangeably throughout Core. This switches to get_bloginfo( ‘charset’ ) instead to prevent this edge case.

Props david.binda, davidbaumwald, SergeyBiryukov, sabernhardt, azaozz, peterwilsoncc.
Fixes #62354.

#15 @desrosj
3 weeks ago

  • Keywords dev-feedback commit added
  • Resolution fixed deleted
  • Status changed from closed to reopened

Marking for a second committer's signoff to backport.

#16 @davidbaumwald
3 weeks ago

  • Keywords dev-reviewed added; dev-feedback removed

Looks good to backport to the 6.7 branch.

#18 @desrosj
3 weeks ago

  • Resolution set to fixed
  • Status changed from reopened to closed

In 59383:

Feeds: Avoid fatal error with empty blog_charset value.

After the SimplePie library was updated to version 1.8.0 in [59141], an edge case has been discovered where a fatal error can encountered if the blog_charset option is missing or empty.

In fetch_feed(), this option is retrieved using get_option() instead of get_bloginfo( ‘charset’ ). The latter will detect this scenario and apply a default value of UTF-8 and is already used interchangeably throughout Core. This switches to get_bloginfo( ‘charset’ ) instead to prevent this edge case.

Reviewed by davidbaumwald.
Merges [59382] to the 6.7 branch.

Props david.binda, davidbaumwald, SergeyBiryukov, sabernhardt, azaozz, peterwilsoncc.
Fixes #62354.

Note: See TracTickets for help on using tickets.