Make WordPress Core

Opened 4 years ago

Last modified 3 years ago

#52998 new enhancement

Add a filter to paginate_links to allow bypassing merging query params

Reported by: jonoaldersonwp's profile jonoaldersonwp Owned by:
Milestone: Awaiting Review Priority: normal
Severity: normal Version:
Component: General Keywords: seo
Focuses: performance Cc:

Description (last modified by jonoaldersonwp)

The paginate_links function constructs the HTML markup used in archive pagination.

As part of this, any parameters present in the requesting URL are merged into the pagination links (ref).

This is a safeguard to ensure that any logic which relies on $_GET parameters in a paginated series still functions as expected as the user navigates through paginated states. E.g., a plugin may use query parameters to describe a date range in an archive; such as /archive/page/2/?date_from=2001-12-20&date_to=2020-05-14.

I'd like for theme/plugin authors to be able to disable/bypass this 'merging' process.

The problem(s)

The current behaviour makes it possible to request (paginated) archive templates with arbitrary, meaningless query parameters (e.g., https://wordpress.org/support/topic-tag/iframe/page/5/?cats=yes&dogs=no&cake=please&utm_tracking=broken). Those parameters persist through all subsequent paginated requests.

This bypasses caches, breaks digital analytics, and inflates crawling from third party systems (search engines, social media, etc), as it 'creates' many additional URLs.

Furthermore, because WordPress doesn't natively output canonical URL tags on paginated archives, this can be extremely destructive from an SEO perspective; particularly on large sites, where the 'existence' of many such URLs compounds the problems above. And whilst the presence of a canonical tag would mitigate some of that damage, it wouldn't address those other problems.

A potential solution

I propose that we should introduce a new filter into paginate_links which allows theme/plugin authors to bypass the code block which merges in query parameters.

This could be used by sites/themes/plugins when they're confident that they don't need to persist queries.

The filter should accept a simple boolean value, which acts as a switch to skip the merging logic.

Alternatively:

  • The filter could define a list of allowed/expected query params, or;
  • The filter could restrict valid params to only include registered query vars.

A note on the existing paginate_links filter

Note that a paginate_links filter already exists at the end of the function, but this only allows for manipulating the href attribute of the link. Whilst this could technically be used to strip parameters via brute force, this is cumbersome, and requires knowledge of the correct canonical URL (which WordPress doesn't natively/reliably have).

Change History (5)

#1 @jonoaldersonwp
4 years ago

  • Description modified (diff)

#2 @jonoaldersonwp
4 years ago

  • Description modified (diff)

#3 @jonoaldersonwp
3 years ago

Note that this 'vulnerability' is responsible for large-scale SEO flavoured XSS attacks. Attackers can populate psuedo-links, as well as branding or political messaging, and have those persist across multiple paginated states. Those URLs might be crawled, shared, or exposed by other systems.

#4 @antonvlasenko
3 years ago

Alternatively:
The filter could define a list of allowed/expected query params, or;
The filter could restrict valid params to only include registered query vars.

I like the alternative approach you described above.
IMO allowing to short circuit/bypass the query logic may not be enough. A more flexible approach is better as it will enable fine-tuning the query parameters depending on the needs of a particular plugin.
Speaking of implementation, I'd add 2 filters:

  1. The first filter would be right before this line: https://core.trac.wordpress.org/browser/tags/5.7/src/wp-includes/general-template.php#L4215; it would allow filtering $args['add_args'] values.
  2. The second filter would be before this line: https://core.trac.wordpress.org/browser/tags/5.7/src/wp-includes/general-template.php#L4212. The second filter would allow filtering $url_query_args variable. It would allow to control which values get merged to the $args[‘add_args’] array.

#5 @paulkevan
3 years ago

Note that this 'vulnerability' is responsible for large-scale SEO flavoured XSS attacks.

Aside from the ability to continue with linked parameters, there doesn't appear to be any XSS in the current method.

Note: See TracTickets for help on using tickets.