Make WordPress Core

Opened 10 months ago

Closed 10 months ago

Last modified 8 months ago

#59386 closed defect (bug) (invalid)

Paginated posts enter a redirect loop behind a CDN

Reported by: brentbaccala Owned by:
Milestone: Priority: normal
Severity: normal Version:
Component: Canonical Keywords:
Focuses: Cc:

Description

I've got a server (u20.freesoft.org) that sits behind Amazon's CDN, which is caching for the DNS name www.freesoft.org. "www.freesoft.org" is set for the WordPress Address and the Site Address in General Settings, and everything is mostly working fine. I'm using the Twenty Seventeen theme (version 3.2) and the "Post name" permalink structure, on WordPress version 6.3.1.

The problem arises for paginated posts, and is related to trailing slashes. If I request a URL like https://www.freesoft.org/blogs/soapbox/my-confession/, it works fine for the first page (this is a paginated post). The CDN requests the page from u20.freesoft.org, and the links returned in the HTML look like this:

<a href="https://www.freesoft.org/blogs/soapbox/my-confession/2/" class="post-page-numbers">

The problem arises when a browser tries to access that link. The CDN sends it to https://u20.freesoft.org/blogs/soapbox/my-confession/2/, and this happens:

ubuntu@u20:~$ wget -O- https://u20.freesoft.org/blogs/soapbox/my-confession/2/ 2>&1 | head
--2023-09-18 18:38:40--  https://u20.freesoft.org/blogs/soapbox/my-confession/2/
Resolving u20.freesoft.org (u20.freesoft.org)... 172.30.2.192
Connecting to u20.freesoft.org (u20.freesoft.org)|172.30.2.192|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://www.freesoft.org/blogs/soapbox/my-confession/2/ [following]
--2023-09-18 18:38:40--  https://www.freesoft.org/blogs/soapbox/my-confession/2/
Resolving www.freesoft.org (www.freesoft.org)... 172.30.2.192
Connecting to www.freesoft.org (www.freesoft.org)|172.30.2.192|:443... connected.
HTTP request sent, awaiting response... 200 OK

Which, you see, is a redirect loop. I can take the trailing slash off the URL and it works, but that isn't how the links are formatted on the first page.

I've been looking through the code base, and I've homed in on this block of code in wp-includes/canonical.php:

                // Post paging.
                if ( is_singular() && get_query_var( 'page' ) ) {
                        $page = get_query_var( 'page' );

                        if ( ! $redirect_url ) {
                                $redirect_url = get_permalink( get_queried_object_id() );
                                $redirect_obj = get_post( get_queried_object_id() );
                        }

                        if ( $page > 1 ) {
                                $redirect_url = trailingslashit( $redirect_url );

                                if ( is_front_page() ) {
                                        $redirect_url .= user_trailingslashit( "$wp_rewrite->pagination_base/$page", 'paged' );
                                } else {
                                        $redirect_url .= user_trailingslashit( $page, 'single_paged' );
                                }
                        }

                        $redirect['query'] = remove_query_arg( 'page', $redirect['query'] );
                }

I'm wondering if there should be some test here to check if $redirect_url is false and leave it alone. It looks to me like it always sets $redirect_url for a paginated post, and I'm not sure if that's right.

But I really don't know the code base at all. It's just a guess. Maybe it's a useful guess, or maybe I don't have a clue.
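
One way to experiment with the kind of check suggested above, without editing core, is the `redirect_canonical` filter: it receives the proposed redirect URL and the requested URL, and the redirect is skipped when the callback returns false. The following is a minimal sketch, not a tested fix for this site. The `myprefix_` function names are hypothetical, and the rule "skip the redirect when only the host differs" is an assumption about what would help behind a CDN that rewrites the Host header.

```php
<?php
/**
 * Hypothetical helper: true when the two URLs differ only in their host.
 */
function myprefix_differs_only_by_host(string $redirect_url, string $requested_url): bool
{
    $a = parse_url($redirect_url);
    $b = parse_url($requested_url);
    if (!is_array($a) || !is_array($b)) {
        return false; // at least one URL could not be parsed
    }

    $host_a = strtolower($a['host'] ?? '');
    $host_b = strtolower($b['host'] ?? '');
    unset($a['host'], $b['host']);

    // Sort keys so the strict comparison does not depend on key order.
    ksort($a);
    ksort($b);

    return $host_a !== $host_b && $a === $b;
}

/**
 * Hypothetical filter callback: cancel the canonical redirect when the
 * only difference between the two URLs is the host name.
 */
function myprefix_skip_host_only_redirect($redirect_url, $requested_url)
{
    if (is_string($redirect_url) && is_string($requested_url)
        && myprefix_differs_only_by_host($redirect_url, $requested_url)) {
        return false; // a false value makes redirect_canonical() skip the redirect
    }
    return $redirect_url;
}

// Register the callback only when running inside WordPress.
if (function_exists('add_filter')) {
    add_filter('redirect_canonical', 'myprefix_skip_host_only_redirect', 10, 2);
}
```

Note that this sketch would also suppress www/non-www host redirects, so it is better suited to diagnosing the loop than to permanent use.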

Anyway, it's a problem for my site, and probably for others. Hopefully somebody can help get it resolved.

Thanks!

Change History (6)

This ticket was mentioned in Slack in #core-test by ironprogrammer.


10 months ago

#2 @ironprogrammer
10 months ago

  • Milestone Awaiting Review deleted
  • Resolution set to invalid
  • Status changed from new to closed
  • Version 6.3.1 deleted

Welcome to Trac, @brentbaccala, and thank you for the report!

In my testing of the URLs in question, the redirect loop occurs even without going through the CDN, e.g. when requesting https://www.freesoft.org/blogs/soapbox/my-confession/2/ directly. The redirects also seem to behave differently when making the request through a browser, where it redirects from the origin to the CDN domain. CDN aside, this seems like it might be due to the site/environment configuration.

Unfortunately we cannot provide support here in Trac, as it is used for bug reports and enhancements for the WordPress core software. Please check out the dedicated support forums for further help: https://wordpress.org/support/forums/.

If you do find an underlying redirect problem that can be reproduced independent of your environment, please feel free to re-open this ticket.

#3 @brentbaccala
10 months ago

www.freesoft.org is the CDN. I tried using Redirect Checker at whatsmydns.net to check u20.freesoft.org; it redirected to the CDN (www.freesoft.org) properly, then went into the redirect loop.

I can't imagine what could be wrong in the site configuration. There's some custom CSS, but nothing I can think of that should cause a redirect loop.

I'll look into it some more.

#4 @brentbaccala
9 months ago

I’m not using any kind of CDN plugin here. Instead, I’ve got the entire site configured behind the CDN. So, the DNS entry for www.freesoft.org is a CNAME for the AWS CloudFront distribution, which is configured with www.freesoft.org as “alternate domain name” with a corresponding SSL certificate. Thus, all requests go first to the CDN, then on to the actual web server.

Is this a supported configuration, or would you consider this to be a site misconfiguration?

#5 @emmaevy
8 months ago

Your configuration with the entire site behind a CDN, using AWS CloudFront, is a supported setup. The issue seems related to the handling of paginated posts and trailing slashes. To address this, you can try adding the following code snippet to your theme's functions.php file:

```php
function remove_pagination_trailing_slash($redirect_url, $post, $leavename) {
    if (is_singular() && get_query_var('page')) {
        $page = get_query_var('page');

        if ($page > 1) {
            $redirect_url = untrailingslashit($redirect_url);
        }
    }

    return $redirect_url;
}

add_filter('post_link', 'remove_pagination_trailing_slash', 10, 3);
```

This code aims to remove the trailing slash for paginated posts, potentially resolving the redirect loop issue. Add this code to your theme's functions.php file and check if it helps.
Use browser developer tools to inspect network requests and redirects, or use a redirect checker tool. This can help you identify trailing-slash issues, see the HTTP status codes, and view the detailed redirection path.

#6 @brentbaccala
8 months ago

I fixed the problem by configuring AWS CloudFront to pass the HTTP "Host" header through from the CDN to the web server.

It makes sense to me that that would impact its behavior, but I don't completely understand why it works now.

I'd like to see more documentation about which HTTP headers need to be passed through, and why.
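
A toy model may help explain why passing the Host header through stops the loop. It is a deliberate simplification and not WordPress's actual logic: it assumes that, for a paginated post, the origin builds the canonical URL from the configured Site Address and issues a 301 whenever that differs from the URL the request appeared to arrive on (scheme, Host header, and path). The function names below are hypothetical; the hostnames and path come from the ticket.

```php
<?php
// Toy model only: these helpers are hypothetical and are not WordPress functions.

/**
 * Models the origin's decision: return the redirect target, or null when
 * the requested URL already matches the canonical one and the page is served.
 */
function origin_redirect_target(string $host_header, string $path, string $site_host): ?string
{
    $requested = "https://{$host_header}{$path}";
    $canonical = "https://{$site_host}{$path}";
    return $requested === $canonical ? null : $canonical;
}

/**
 * Follows redirects through a CDN that fronts the public hostname. The CDN
 * forwards either the viewer's Host header or the origin's own hostname.
 * Returns the number of redirects followed, or -1 once $max_hops is exceeded.
 */
function count_redirects(bool $forward_host, string $path, int $max_hops = 5): int
{
    $site_host   = 'www.freesoft.org'; // Site Address from the ticket
    $origin_host = 'u20.freesoft.org'; // origin server from the ticket

    for ($hops = 0; $hops <= $max_hops; $hops++) {
        // The browser always asks the CDN for the public hostname;
        // what the origin sees depends on the CDN's Host handling.
        $host_seen_by_origin = $forward_host ? $site_host : $origin_host;
        $target = origin_redirect_target($host_seen_by_origin, $path, $site_host);
        if ($target === null) {
            return $hops; // page served
        }
        // Otherwise the browser follows the 301 back to the CDN and repeats.
    }
    return -1; // gave up: redirect loop
}

$path = '/blogs/soapbox/my-confession/2/';
var_dump(count_redirects(false, $path)); // int(-1): Host rewritten to the origin name, loop
var_dump(count_redirects(true, $path));  // int(0): Host passed through, served at once
```

In this model the origin never sees a request for www.freesoft.org unless the CDN forwards the Host header, so every 301 sends the browser back to a URL that triggers the same 301 again.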
