WordPress.org

Make WordPress Core

Opened 4 years ago

Closed 4 years ago

#36390 closed defect (bug) (fixed)

Disallow /wp-json/ crawling

Reported by: SergeyBiryukov Owned by: rmccue
Milestone: 4.6 Priority: normal
Severity: normal Version: 4.4
Component: REST API Keywords: has-patch commit
Focuses: Cc:

Description (last modified by SergeyBiryukov)

Since 4.4, I'm seeing support topics about /wp-json/ being crawled by search engines, caused by wp_oembed_add_discovery_links(). This apparently leads to duplicate-content penalties in some cases and higher server load in others.

Some blog posts suggest disallowing /wp-json/ in robots.txt or even disabling the REST API altogether.

Should we add Disallow: /wp-json/ to do_robots()?
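
For illustration only, here is a minimal Python sketch of what that rule would mean for a crawler that honors robots.txt. The robots.txt body and the can_crawl() helper are hypothetical and are not output from do_robots(); urllib.robotparser is the Python standard library parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body containing the rule proposed in this ticket.
# This is an assumption for illustration, not what do_robots() emits.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-json/
"""


def can_crawl(path: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `path` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/wp-json/wp/v2/posts", "/hello-world/"):
        print(path, can_crawl(path))
```

With this rule, every path under /wp-json/ is blocked for compliant crawlers, including ones that would need the API to render JavaScript-driven content.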

Attachments (2)

36390.patch (710 bytes) - added by m_uysl 4 years ago.
36390.2.diff (634 bytes) - added by rachelbaker 4 years ago.


Change History (12)

#1 @SergeyBiryukov
4 years ago

  • Description modified (diff)

#2 @johnbillion
4 years ago

  • Version set to 4.4

#3 @SergeyBiryukov
4 years ago

  • Summary changed from Disallow: /wp-json/ crawling to Disallow /wp-json/ crawling

#4 @peterwilsoncc
4 years ago

Google etc. need to crawl it for sites that load content via the API using JavaScript, so it can't be disallowed in robots.txt.

Perhaps add an X-Robots-Tag: noindex header to the API endpoints to prevent the duplicate content penalties.

This ticket was mentioned in Slack in #core-restapi by joehoyle.


4 years ago

#6 @peterwilsoncc
4 years ago

  • Keywords needs-patch added

@m_uysl
4 years ago

#7 @SergeyBiryukov
4 years ago

  • Keywords has-patch added; needs-patch removed
  • Milestone changed from Awaiting Review to 4.6

@rachelbaker
4 years ago

#8 @rachelbaker
4 years ago

  • Keywords commit added

Thanks for the patch @m_uysl, I just moved the line up a bit.

@rmccue you okay with 36390.2.diff?

#9 @rachelbaker
4 years ago

  • Owner set to rmccue
  • Status changed from new to assigned

#10 @rachelbaker
4 years ago

  • Resolution set to fixed
  • Status changed from assigned to closed

In 37726:

REST API: Include X-Robots-Tag: noindex header in REST API responses to prevent endpoints from being indexed by search engines.

Prevent duplicate content issues with search engines and REST API endpoint response data.

Fixes #36390.
Props m_uysl for the initial patch.
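
As a rough sketch of how a client could verify the header described in this commit message, the following Python helper checks a response's headers for a noindex directive. The is_noindex() name and the sample header values are hypothetical; the sketch only handles a plain comma-separated directive list and ignores the optional user-agent prefix form of X-Robots-Tag.

```python
from typing import Mapping


def is_noindex(headers: Mapping[str, str]) -> bool:
    """Return True if an X-Robots-Tag header carries a noindex directive."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() != "x-robots-tag":
            continue
        directives = [d.strip().lower() for d in value.split(",")]
        if "noindex" in directives:
            return True
    return False


if __name__ == "__main__":
    # Sample headers are made up for illustration.
    print(is_noindex({"Content-Type": "application/json", "X-Robots-Tag": "noindex"}))
    print(is_noindex({"Content-Type": "text/html"}))
```

Unlike a robots.txt Disallow rule, this header leaves the endpoint crawlable and only asks search engines not to index the response.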
