WordPress.org

Make WordPress Core

Opened 3 years ago

Closed 2 years ago

Last modified 2 years ago

#18841 closed defect (bug) (invalid)

Sites with default permalinks do not return 'robots.txt'

Reported by: mbijon
Owned by:
Milestone:
Priority: normal
Severity: minor
Version: 3.3
Component: Permalinks
Keywords: 2nd-opinion dev-feedback close
Focuses:
Cc:

Description (last modified by dd32)

Currently there is no .htaccess file when Permalinks are on the default (i.e., /?p=123) setting.

Without an .htaccess file, Apache does not direct requests for non-existent files to WordPress. Directing the request to WP is needed so the 'do_robots' or 'do_robotstxt' actions can respond with the rules set by the user in Permalink Settings.

  • Servers other than Apache likely have the same problem, but I have not tested them
  • This occurs in 3.2.1 and 3.3

I've marked severity 'major' since this undermines the Privacy Settings panel & may not meet user expectations. Please modify if this is not appropriate.

Change History (12)

comment:1 dd32 3 years ago

  • Description modified (diff)

The "Privacy" settings are really named incorrectly to start with. They're simply a request, and not one which bots have to respect (and certainly don't).

As you've noted, robots.txt cannot be processed without Pretty Permalinks enabled. There's another ticket to enable them by default; however, for technical reasons it's not always possible to enable them, so they're not enabled by default.

One option would be to disable the Privacy settings when mod_rewrite isn't enabled.

See also: #16416

comment:2 mbijon 3 years ago

  • Keywords 2nd-opinion added

I'm not sure the name is completely wrong; it does communicate the "intent" to a new user fairly well. Naming discussions are always challenging to keep short too, so I'll punt on the naming here and suggest that it go in another ticket.

As for disabling the Privacy options, we don't disable Permalinks when mod_rewrite or file write perms aren't there. Instead there are a number of alerts and instructions for manual settings or a changed server config. Mirroring the type of help in the Permalink messages to Privacy seems like a good option here.

Another (or additional) option is to try writing a static robots.txt file to disk during setup (since privacy options are prompted then), then deleting it if privacy options are changed from default. Writing and deleting a static file is more code, but the static file will be more performant <edit> ...more performant in particular than the 404 that's currently returned</edit>

@dd32, thanks for that typo fix

Last edited 3 years ago by mbijon

comment:3 mbijon 3 years ago

  • Keywords dev-feedback added

Just got time to look into fixing this and there seem to be two ways to approach it. Can I get some feedback on what the preferred method might be?

The options are below; either of them would basically add one .htaccess rule. The difference is whether they're run from the Privacy or Permalinks pages:
RewriteRule ^robots\.txt$ /index.php [L]

(a) Add a new permalink structure to the WP_Rewrite class. Then from either WP setup or Options > Privacy we check to see if there's a non-default permalink structure in use. If there's a structure, this bug doesn't occur, and we do nothing. But if permalinks are on the default setting, then the "robots_structure" is set.

(b) Create a default permalink structure that attempts to set the one .htaccess rule noted above during WP setup, or when Options > Permalinks is set from anything back to default. This would be a bit more indirect than (a), but would probably require less code.
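
Either option ends up writing the same rule. As a rough illustration only, here is a Python sketch of adding that rule to the text of an .htaccess file exactly once. The marker comments, the function name `ensure_robots_rule`, and the `<IfModule>` guard are assumptions made for this sketch, not WordPress code; the guard is there so Apache skips the rule when mod_rewrite is not loaded.

```python
# Hypothetical sketch: add the robots.txt rewrite rule quoted above to
# .htaccess text exactly once. Marker and function names are assumptions.
BEGIN = "# BEGIN robots"
END = "# END robots"
RULE = r"RewriteRule ^robots\.txt$ /index.php [L]"


def ensure_robots_rule(htaccess: str) -> str:
    """Return .htaccess text that contains the rule block, adding it if missing."""
    if BEGIN in htaccess:
        return htaccess  # block already present, leave the text untouched
    block = "\n".join([
        BEGIN,
        "<IfModule mod_rewrite.c>",  # rule is ignored when mod_rewrite is absent
        "RewriteEngine On",
        RULE,
        "</IfModule>",
        END,
    ])
    # Make sure existing content ends with a newline before appending.
    prefix = htaccess if not htaccess or htaccess.endswith("\n") else htaccess + "\n"
    return prefix + block + "\n"
```

Calling the function a second time on its own output returns the text unchanged, which is the property either option would need if it ran on every settings save.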

comment:4 dd32 3 years ago

I think I'd prefer to just disable robots handling if pretty permalinks are disabled.

Pretty permalinks are more of an "Is some kind of rewriting enabled?" switch. There's a reason we don't enable rewrites by default: not all server configurations allow it (especially non-Apache servers), and some even return HTTP 500 errors when rewrite rules are used but mod_rewrite isn't supported.

Until #6481 (enable pretty permalinks by default) is fixed I don't think we can support the direction requested here.

comment:5 mbijon 3 years ago

By "disable robots handling" do you mean Privacy settings in install/Options page should be disabled?

Not a problem to disable these until Permalinks are on, but I think it's best UX if they're hidden during the install process then.


Related?

In light of this reliance on Permalinks, is it worth suggesting in #16416 that Privacy settings be moved to the Permalinks page, or at least be linked from Permalinks (as if they're "step #2" of permalinks)?

comment:6 nacin 3 years ago

Privacy settings probably also need to be disabled if file_exists(robots.txt).
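
Taken together with the earlier comments, there are three cases for what answers a request for /robots.txt. The sketch below models them in Python; the function name and the return labels are hypothetical and only summarize the behavior described in this ticket.

```python
import os
import tempfile


def robots_source(docroot: str, pretty_permalinks: bool) -> str:
    """Classify what would answer GET /robots.txt (labels are hypothetical)."""
    if os.path.isfile(os.path.join(docroot, "robots.txt")):
        # A physical file is served by the web server; WordPress never runs,
        # so the Privacy setting has no effect on the response.
        return "static-file"
    if pretty_permalinks:
        # Rewrite rules hand the request to WordPress, which fires do_robots.
        return "wordpress"
    # Default permalinks: nothing routes the request, so the server returns 404.
    return "not-found"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print(robots_source(root, pretty_permalinks=False))
```

Only the "wordpress" case reflects the Privacy setting, which is why the other two are candidates for disabling or annotating the option.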

comment:7 mbijon 3 years ago

That brings up an interesting workflow issue: Robots.txt is often manually created for complicated sites that want more control than the global (*) that WP provides, or when WP runs in a sub-folder.

Is it a good idea to duplicate the Permalinks prompt like "Paste the following into .htaccess", or does WP aim to abstract sysadmin tasks away from users (I guess making robots-prompting a plugin issue)?

comment:8 follow-up: mbijon 2 years ago

Based on feedback here & several other permalinks tasks, .htaccess is a minor landmine. Using that looks to be an overly-complicated solution.

Instead, would anyone be opposed to the 'noindex' meta tag being used on all pages as an alternative to a missing robots.txt (only when the default permalinks are on)?

@nacin's patch in #19251 could be extended into a simple fix for this.

comment:9 in reply to: ↑ 8 nacin 2 years ago

  • Keywords close added
  • Severity changed from major to minor

Replying to mbijon:

> Based on feedback here & several other permalinks tasks, .htaccess is a minor landmine. Using that looks to be an overly-complicated solution.
>
> Instead, would anyone be opposed to the 'noindex' meta tag being used on all pages as an alternative to a missing robots.txt (only when the default permalinks are on)?
>
> @nacin's patch in #19251 could be extended into a simple fix for this.

We already do this.

While sites with default permalinks do not return robots.txt, the site can be crawled correctly. Lowering severity, suggesting close.
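
For readers unfamiliar with the meta-tag fallback, it can be sketched roughly as below. This is a Python model, not the WordPress source: `blog_public` stands in for the option behind the Privacy setting, and the function name and exact tag text are assumptions.

```python
def robots_meta_tag(blog_public: bool) -> str:
    """Return the <meta> tag to print in <head>, or "" when the site is public."""
    if blog_public:
        return ""
    # Assumed tag text; it asks well-behaved crawlers not to index the page
    # or follow its links.
    return "<meta name='robots' content='noindex,nofollow' />"
```

Because the tag is part of each page's HTML, it does not depend on URL rewriting and so still applies with default permalinks.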

comment:10 mbijon 2 years ago

My mistake. Didn't think to read meta info before I created this ticket. Agreed on closing it.

Is it wrong for me to close it?
(Or should something like that be left for core team?)

comment:11 mbijon 2 years ago

  • Resolution set to invalid
  • Status changed from new to closed

comment:12 helenyhou 2 years ago

  • Milestone Awaiting Review deleted