Make WordPress Core

Opened 6 weeks ago

Closed 4 weeks ago

Last modified 3 weeks ago

#62105 closed defect (bug) (fixed)

scraping error text while in normal operation

Reported by: georgwordpress's profile georgwordpress Owned by: jorbin's profile jorbin
Milestone: 6.7 Priority: normal
Severity: normal Version: 5.5
Component: Bootstrap/Load Keywords: has-patch commit
Focuses: Cc:

Description

For error detection and rollback functions (e.g. when auto-updating plugin/theme files or editing files in the backend), WordPress also starts a loopback request to the homepage.
This loopback request is made with special parameters (/?wp_scrape_key&wp_scrape_nonce), so WordPress starts special handling of this request.

However, this special treatment also occurs during normal operation (e.g. no maintenance, anonymous users without login).

Every request to the home URL with “/?wp_scrape_key&wp_scrape_nonce” is answered by WordPress with a simple text:

###### wp_scraping_result_start: ######{"code": "scrape_nonce_failure", "message": "Scrape key verification failed. Please try again."}###### wp_scraping_result_end : ### ###

and the http code is 200 = OK.

Check out examples of homepages built with WordPress:
https://wordpress.org/?wp_scrape_key&wp_scrape_nonce
https://www.whitehouse.gov/?wp_scrape_key&wp_scrape_nonce

As one result, the home URL and the scraping error text is also indexed by search engines.

This special handling of requests with scraping parameters should only be done when necessary:
e.g.:

  • in maintenance mode
  • when a user is logged in

Change History (19)

#1 @swissspidy
6 weeks ago

  • Milestone changed from Awaiting Review to 6.7
  • Version set to 5.5

Sounds like there is some room for improvement here. At the very least we could send a X-Robots-Tag: noindex header.

And if the transient doesn't exist we could also bail early.

#2 @georgwordpress
6 weeks ago

Yes - the non existing transient in function wp_start_scraping_edited_file_errors() is the problem:

https://github.com/WordPress/WordPress/blob/6.6-branch/wp-includes/load.php

/**
 * Starts scraping edited file errors.
 *
 * @since 4.9.0
 */
function wp_start_scraping_edited_file_errors() {
...
	$key   = substr( sanitize_key( wp_unslash( $_REQUEST['wp_scrape_key'] ) ), 0, 32 );
	$nonce = wp_unslash( $_REQUEST['wp_scrape_nonce'] );

	if ( get_transient( 'scrape_key_' . $key ) !== $nonce ) {
		echo "###### wp_scraping_result_start:$key ######";
...

Last minutes I tested this change:

        $nonce = wp_unslash( $_REQUEST['wp_scrape_nonce'] );

        # new check:
        if ( get_transient( 'scrape_key_' . $key ) === false ) return;

	if ( get_transient( 'scrape_key_' . $key ) !== $nonce ) {
		echo "###### wp_scraping_result_start:$key ######";

Buggy edited files and also a buggy plugin update was still detected and rollback was done.
And the problem was solved.

This ticket was mentioned in PR #7427 on WordPress/wordpress-develop by Georg-Git.


6 weeks ago
#3

  • Keywords has-patch added

No scraping error text when transient is not existing

Trac ticket:
https://core.trac.wordpress.org/ticket/62105

This ticket was mentioned in PR #7428 on WordPress/wordpress-develop by Georg-Git.


6 weeks ago
#4

Fix: scraping error text while in normal operation

https://core.trac.wordpress.org/ticket/62105

This ticket was mentioned in PR #7433 on WordPress/wordpress-develop by @georgwordpress.


6 weeks ago
#5

No scraping error text when transient does not exist

Trac ticket: https://core.trac.wordpress.org/ticket/62105

#7 @swissspidy
5 weeks ago

  • Keywords commit added

Yep, looks good at first glance

#8 follow-up: @jorbin
5 weeks ago

nocache_headers might also be worth including so that the page doesn't get cached.

#9 in reply to: ↑ 8 @georgwordpress
5 weeks ago

@jorbin:
Thanks for taking a look at this PR and Your comments!

After this bug fix, the error message will only be delivered if a transient exist (triggered by a wp loopback request)
And the transients exists only for a limited time.

After this bug fix it is very unlikely that a user is calling the homepage with parameters belonging to a still existing transient, isn't it?

Or what were you thinking about?

Nevertheless, I have expanded the PR and added the standard WordPress no-caching headers.

Does the PR now match your comments?
https://github.com/WordPress/wordpress-develop/pull/7433/files

#10 @georgwordpress
5 weeks ago

Use nocache_headers()

Last edited 5 weeks ago by georgwordpress (previous) (diff)

#11 @jorbin
4 weeks ago

  • Owner set to jorbin
  • Status changed from new to assigned

This ticket was mentioned in Slack in #core by chaion07. View the logs.


4 weeks ago

#13 @chaion07
4 weeks ago

  • Keywords changes-requested added

Thanks @georgwordpress for reporting this. We reviewed this Ticket during a recent bug-scrub session. Here's a summary of the feedback received:

  1. We can see that the PR for above is approved
  2. It also includes the changes suggested according to this comment
  3. Furthermore we would want to follow up with @jorbin for commit

Thank you.

Props to @pratiklondhe for these suggestions

Cheers!

#14 @swissspidy
4 weeks ago

  • Keywords changes-requested removed

Wrong use of the keyword, as there are no new chsnges requested AFAICT

#15 @georgwordpress
4 weeks ago

@swissspidy
I am with You: there are no new changes requested the last day.
All requests are already part of this PR.

But from the formal point of view Your first review was done before I added the proposal of @jorbin (add function nocache_headers()).

So right now I requested an update of Your review.
https://github.com/WordPress/wordpress-develop/pull/7433/files

#16 @georgwordpress
4 weeks ago

@jorbin
Since this is my first PR, I'm a little bit lost in the workflow on trac.wordpress.

If I understand the documentation correctly, a second review by a core committer is needed?

Do I have to trigger the second review? How?
This second review will be done and approved by You in this ticket?

After that You as the owner could change the status from 'assigned' to 'accepted'?

Am I on the right path - or is anything that I have to do?

#17 @jorbin
4 weeks ago

@georgwordpress Everything you have done looks good, no action is needed on your part. I just needed to find the time to commit it (which I am going to do now). Thank you for helping make WordPress better!

#18 @jorbin
4 weeks ago

  • Resolution set to fixed
  • Status changed from assigned to closed

In 59171:

Bootstrap/Load: Prevent loopback scraping errors when there is no key or nonce.

For error detection and rollback functions WordPress also starts a loopback request to the homepage. This loopback request is made with special parameters that when they don't match, generates an erorr. This hardens that flow by exiting out of the check if the nonce or key is missing or the nonce is not saved in the DB. It further hardens it by not caching the failures and asking search engines not to index the url with the failures.

Props georgwordpress, swissspidy, jorbin.
Fixes #62105.

Note: See TracTickets for help on using tickets.