Opened 9 months ago
Last modified 2 months ago
#63901 new feature request
Add `AGENTS.md` for the project
| Reported by: | | Owned by: | |
|---|---|---|---|
| Milestone: | 7.1 | Priority: | normal |
| Severity: | normal | Version: | |
| Component: | Build/Test Tools | Keywords: | |
| Focuses: | | Cc: | |
Description
This ticket proposes the inclusion of an AGENTS.md file for WordPress Core, to centrally provide the context for AI assisted coding tools and agentic solutions.
AGENTS.md is an emerging standard for a central LLM assistance file supported by many tools. It addresses the problem of having to favor specific tools, and the problem of having to include many different files with similar contents just to satisfy different popular tools.
AGENTS.md is already widely supported, as seen on the linked website. For tools without out-of-the-box support, it should be possible to configure the file manually as an additional context file, or to symlink it under another file name mandated by the respective tool.
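As a rough illustration of the symlink approach, here is a minimal sketch. It assumes a tool that expects a differently named file in the repository root; `CLAUDE.md` is used only because it comes up later in this thread, and the helper name `link_agents_file` is hypothetical. It runs in a throwaway directory rather than a real checkout, and creating symlinks may need extra privileges on some platforms (notably Windows).

```python
import os
import tempfile
from pathlib import Path


def link_agents_file(repo_root: Path, alias: str, source: str = "AGENTS.md") -> Path:
    """Create `alias` in repo_root as a relative symlink pointing at `source`."""
    target = repo_root / source
    if not target.is_file():
        raise FileNotFoundError(f"{target} does not exist")
    link = repo_root / alias
    if link.is_symlink() or link.exists():
        # Leave any existing file alone rather than overwrite a contributor's setup.
        return link
    # A relative target keeps the link valid if the checkout is moved.
    os.symlink(source, link)
    return link


# Demonstration in a temporary directory (not a real wordpress-develop checkout).
demo_root = Path(tempfile.mkdtemp())
(demo_root / "AGENTS.md").write_text("# Agent context\n", encoding="utf-8")
demo_link = link_agents_file(demo_root, "CLAUDE.md")
demo_text = demo_link.read_text(encoding="utf-8")
```

Reading through the alias returns the contents of `AGENTS.md`, so there is only one file to maintain. Whether a given tool follows symlinks is something each contributor would need to confirm for their own setup.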
As for the contents of the file, I think we should approach this pragmatically. We will not be able to (nor should we) holistically cover everything, and we will need to come up with something that works well enough to start with, rather than being perfect. Only actual usage with the different AI coding assistants that individual contributors prefer will identify room for improvements, whether missing crucial context or existing content that confuses LLMs.
Change History (18)
#2
@
9 months ago
I've had good luck pointing agents to where they can find information such as how to run tests, rather than including it in the helper file. "See CONTRIBUTING.md for information on how to run tests", etc.
#3
@
9 months ago
I think that this reopens the debate: Is wordpress-develop meant to be a fully-featured development environment? Or was it only meant for CI build purposes?
I have the same doubt when planning new features, such as the need to test anything for the Mail component without having a tool available by default.
Personally, I'm in favor of adding dev tools of any type, maybe including this (technically it is not a tool, but it serves as if it were one).
It might, however, generate a lot of redundancy with other files such as CONTRIBUTING.md, as @johnbillion suggests. Obviously, keeping a file like this short and clear for an LLM will save some context credits.
PS: Btw, are we eager to promote the use of LLMs for Core?
#4
@
9 months ago
Here's a proposal for a high-level outline we could start with:
That's a great overview of what could be included. At the same time, I share the general sentiment that, at this point, all these coding tools should be mature enough to understand a project's usual structure, process README.md, CONTRIBUTING.md, composer.json, and package.json, and synthesize that into their own entry in the context.
I'm not against including AGENTS.md if that improves the workflows. It's more of a remark that CONTRIBUTING.md should work for both usual contributors and agents. They could standardize sections in that file if there is a need for some agent-oriented information.
#5
@
9 months ago
"Are we eager to promote the use of LLMs for Core?"
Absolutely. We aim to make WordPress itself, as well as its plugins, themes, and extended ecosystem, more legible and easy to use with AI tools. This will enable us to harness the passion, talent, and creativity of WordPress contributors to explore and experiment with these tools, ultimately becoming more efficient in achieving our mission of democratizing publishing, making the web more open source, and enhancing the stability, performance, and security of all WordPress users.
Our founding ethos was fueled by web standards, interoperability, and hackability. This is today's version of that. There are tools available for free or pennies that give capabilities beyond what we could have imagined even five years ago, let's support that and see what happens. Let a thousand flowers bloom.
#6
@
9 months ago
- High-level architecture: Outlines a few key concepts, design patterns, and philosophies for the project's architecture, potentially including a few sub sections. Could also cover aspects like directory structure.
@flixos90 For bigger code bases, when trying to convey bigger concepts, I've had really good success mentioning specific folders where the "Agent" can find .md files with more explanations and even code examples, following the Context7 pattern (https://context7.com/wordpress/gutenberg?topic=slotfill).
#7
@
9 months ago
My anecdotal experience aligns with @johnbillion 's and @gziolo 's comments:
whatever.md AGENTS.md is great when a project lacks documentation or tooling, but can't compete with those "sources of truth" and can in many cases cause LLM output to degrade, for example:
- Across models, e.g. using absolutist language ("always","never") is strongly recommended in GPT3.5/Claude Sonnet 3.7, but a footgun in more "sycophantic" models like GPT4o.
- When the .md conflicts with the sources of truth, e.g. when told to "follow WordPress Coding Standards" but the agent keeps discovering noncompliant code (legacy in core, modern if we're talking other WordPress/* projects) or the lints keep failing.
Also want to remind folks how amorphous evaluating the efficacy of these early-stage experiments is. Taking a cue from Matt's Q&A (albeit in a different context), I think it's crucial to first lay out a plan to test/measure/iterate instead of just theory-crafting with our (albeit collectively experienced) gut. For example, we should be able to answer:
- Is this (or any) AGENTS.md better or worse than no file at all?
- Is this (or any) AGENTS.md better than a Directory Tree with some context comments and a link to existing documentation? (Or just the CONTRIBUTING.md if it's already been optimized for both humans and agents.)
- Is X version of the .md better or worse than whatever first version we decide to commit?
Otherwise we're just throwing seeds out of the car window shouting "bloom" in hopes something will catch hold and germinate. There are faster and more effective/impactful ways to start a garden. (I'm assuming the metaphor was intended literally and not as employed by Mao.)
This ticket was mentioned in Slack in #core-committers by westonruter. View the logs.
9 months ago
#9
follow-up:
↓ 10
@
9 months ago
I think adding AGENTS.md is a great idea, that we should start simple, and that while tools should generally be good at analyzing and picking up existing structure, it can be helpful to point them in the right initial direction.
FWIW, I've found success by maintaining context in a separate directory (like /docs) and then explicitly loading it through the main agent file (previously, CLAUDE.md).
    # Agent context for the WordPress project

    This is the root level context file for the open source project, WordPress.

    ## Project overview

    Always load @CONTRIBUTING.md, @README.md, and @docs/architecture.md .... when starting a new session.
This can then be tested with a prompt like:
> What context have you loaded already? Please provide filenames.
I've loaded the following context files:
- /{HOME}/wordpress-develop/AGENTS.md
- /{HOME}/wordpress-develop/CONTRIBUTING.md
- /{HOME}/wordpress-develop/README.md
- /{HOME}/wordpress-develop/docs/architecture.md
IMO, this helps keep AGENTS.md clean and can allow for additional context to be designed more for people and agents.
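One low-cost check that could accompany this pattern is confirming that every file the agent file points at actually exists. The sketch below is only an illustration: it assumes references are written as `@` followed by a relative `.md` path, as in the example above, and the names `find_references` and `missing_references` are hypothetical. It says nothing about whether a given tool actually loads those files.

```python
import re
import tempfile
from pathlib import Path

# Assumption: a reference looks like "@README.md" or "@docs/architecture.md".
# The lookbehind skips things like "user@notes.md".
REFERENCE_PATTERN = re.compile(r"(?<![\w@])@([\w./-]+\.md)\b")


def find_references(agent_text: str) -> list[str]:
    """Return referenced paths in order of first appearance, without duplicates."""
    seen: dict[str, None] = {}
    for match in REFERENCE_PATTERN.finditer(agent_text):
        seen.setdefault(match.group(1), None)
    return list(seen)


def missing_references(repo_root: Path, agent_file: str = "AGENTS.md") -> list[str]:
    """List references in the agent file that do not resolve to a file on disk."""
    text = (repo_root / agent_file).read_text(encoding="utf-8")
    return [ref for ref in find_references(text) if not (repo_root / ref).is_file()]


# Demonstration in a temporary directory where only README.md exists.
demo_root = Path(tempfile.mkdtemp())
(demo_root / "AGENTS.md").write_text(
    "Load @CONTRIBUTING.md, @README.md, and @docs/architecture.md "
    "when starting a new session.\n",
    encoding="utf-8",
)
(demo_root / "README.md").write_text("# Demo\n", encoding="utf-8")
demo_missing = missing_references(demo_root)
```

In the demonstration, `demo_missing` lists the two references that have no file behind them. A check like this could run in CI so that the entry point does not silently drift away from the documentation it links to.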
This is also a good opportunity to revisit our existing documentation and improve it for the current state of the project. (e.g. code maintained in other repos, explanation of src/ and build/ directories, etc...)
#10
in reply to:
↑ 9
;
follow-up:
↓ 11
@
8 months ago
Using this solely for illustrative purposes (I understand the specific wording isn't the focus 🙇):
FWIW, I've found success by maintaining context in a separate directory (like /docs) and then explicitly loading it through the main agent file (previously, CLAUDE.md).

    # Agent context for the WordPress project

    This is the root level context file for the open source project, WordPress.

    ## Project overview

    Always load @CONTRIBUTING.md, @README.md, and @docs/architecture.md .... when starting a new session.
I want to repeat that GitHub Copilot explicitly recommends not to use absolute language like "always".
You should also consider the size and complexity of your repository. The following types of instructions may work for a small repository with only a few contributors, but for a large and diverse repository, these may cause problems:
- Requests to refer to external resources when formulating a response
- Instructions to answer in a particular style
- Requests to always respond with a certain level of detail
For example, the following instructions may not have the intended results:
- Always conform to the coding styles defined in styleguide.md in repo my-org/my-repo when generating code.
- Use @terminal when answering questions about Git.
- Answer all questions in the style of a friendly colleague, using informal language.
- Answer all questions in less than 1000 characters, and words of no more than 12 characters.
Does Claude Code or whatever still need absolute language to prevent it from ignoring our AGENTS.md and falling back to the built-in instruction set when the context window gets too large? Is GitHub's recommendation just as true for when using GPT5 or only the more sycophantic 4x models that are used by default?
I don't know. But I do feel that in most any other context the bulk of core committers and leadership (yup, acutely aware of all my heroes I'm core-splaining to right now 😅) would strongly oppose adding such an opaque footgun to core. I mean, we won't even phpcbf legacy code because it might cause some diff headaches on old PRs, but we're cool with something that can actively degrade the contributor experience - while costing them money on wasted tokens! - with no explicit indicator or hint that it's a bug with the instructions and not e.g. Anthropic secretly rate limiting and using a worse model?
More direct feedback
Replying to jeremyfelt:
This can then be tested with a prompt like:
What context have you loaded already? Please provide filenames.
I think we need to test the _results_, i.e. the effect on the ability to generate compliant code or accurately answer questions about/navigate the codebase.
- A positive answer here doesn't prove those files are in context. It doesn't even prove that AGENTS.md is in the context (or still unsupported by the IDE), just that when asked the question the LLM was able to discover that and parrot back what's written there.
- Just because something is "in context" doesn't mean it's having a positive effect on the LLM output. A big part of the shift to subagents rn is that ability to only have the relevant info for the task.
#11
in reply to:
↑ 10
@
8 months ago
Replying to justlevine:
I don't know. But I do feel that in most any other context the bulk of core committers and leadership [...] would strongly oppose adding such an opaque footgun to core.
I think that's why starting simple is important. Focus on keeping documentation readable and useful to humans. See what happens when you tell the model to start with that. As time goes on, add documentation for tools that enhance the experience.
My previous example could be just the one line, and without "Always", but I don't think it hurts for AGENTS.md to also be targeted to humans.
- A positive answer here doesn't prove those files are in context.
Don't trust, verify. :) So far, in my test case of me, additional prompts react as expected, even with weird chains of context files I've set up.
- Just because something is "in context" doesn't mean it's having a positive effect on the LLM output.
This is much harder to measure, of course, and I'm not sure there's an automated way to test it.
Issue-specific user prompting matters more, but it at least feels helpful for there to be an entry point that provides a readable, structured overview so that the agent doesn't start attempting to parse a bunch of unrelated files into context immediately.
#12
@
8 months ago
Felt it important to come back and highlight that yet again a tool that was ostensibly supposed to improve results when working with an LLM is now being said to produce worse results vs traditional best practices that we already enforce. That doesn't mean that LLMs.txt, Structured Markup, MCP Agents.md is complete hype, it's just a reminder that we should test the results of adopting this tool at least as much as we would for any other.
As to why not go with our gut and iterate by trial and error, I'll remind everyone about July's METR report that showed that developers consistently felt AI assisted coding tools sped them up, when they were actually making them take ~19% longer. There is a measurable dopamine influence involved in how we "experience" AI productivity gains, so it's pretty important we rely on something just a bit more concrete than vibes.
This is much harder to measure
We can use the preexisting test report flow where we come up with a handful of prompts for a few different types of tasks/scenarios, and then people share their results (branch diff if it was a task or answer if it was a question, chat history, model, contents of their AGENTS.md file, etc.). We make a template so it's mindless to report, and we compare it to a baseline of no AGENTS.md and an AGENTS.md that is just a table-of-contents to existing documentation files.
It's a lower barrier-to-entry than contributing normal test reports because the tester doesn't even need specific WP or AI knowledge to prompt the LLM, nor to even evaluate the results, just report them. Plus it balances the presumed model skew based on who's starred this thread (which is probably the closest metaphor to traditional "environments" we have right now). We don't need a specific threshold on how many to collect, just a bare minimum of due diligence and a tangible feedback loop.
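To make the comparison concrete, here is a minimal sketch of how collected reports might be tallied per condition. Everything in it is an assumption for illustration: the `AgentReport` fields, the condition labels ("no-file", "toc-only", "full-file"), and the idea of reducing each report to a pass/fail flag. The sample reports are placeholder values, not measurements.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentReport:
    """One contributor-submitted result (hypothetical template fields)."""
    prompt_id: str   # which shared prompt was run
    condition: str   # "no-file", "toc-only", or "full-file"
    model: str       # model/harness the tester used
    passed: bool     # reviewer's judgment of the outcome


def success_rates(reports: list[AgentReport]) -> dict[str, float]:
    """Return the share of passing reports for each condition."""
    totals: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # [passed, total]
    for report in reports:
        totals[report.condition][0] += int(report.passed)
        totals[report.condition][1] += 1
    return {cond: passed / total for cond, (passed, total) in totals.items()}


# Placeholder data only, to show the shape of the comparison.
sample_reports = [
    AgentReport("run-php-tests", "no-file", "model-a", True),
    AgentReport("rest-api-change", "no-file", "model-b", False),
    AgentReport("run-php-tests", "toc-only", "model-a", True),
    AgentReport("rest-api-change", "toc-only", "model-b", True),
    AgentReport("run-php-tests", "full-file", "model-a", False),
    AgentReport("rest-api-change", "full-file", "model-b", True),
]
sample_rates = success_rates(sample_reports)
```

A real template would likely also need to record the exact AGENTS.md contents and chat history mentioned above. A single pass/fail flag is a deliberate simplification, and with only a handful of reports per condition the rates would be indicative at best.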
#13
@
3 months ago
- Milestone changed from Awaiting Review to 7.0
Related: #64587 (AI Guidelines should be referenced in the pull request template)
When using Copilot for pull request reviews, it should have the context from AGENTS.md to provide the best feedback and suggestions.
#14
@
3 months ago
- Type changed from enhancement to task (blessed)
This enhancement still has no patch and no consensus on the patch, so it may be a bit late for this to ship in 7.0.
However, given the scope of the (future) changeset, I believe this is more a Blessed Task than an Enhancement, so I'm converting it to a task so it can ship whenever it's ready to go.
#15
follow-up:
↓ 17
@
3 months ago
whenever it's ready to go.
To move the discussion forward in the meantime, here's yet another preprint that doesn't just cast doubt on the efficacy of AGENTS.md but tries to quantify the measurable negative impacts. From the abstract:
Across multiple coding agents and LLMs, we find that context files tend to reduce task success rates compared to providing no repository context, while also increasing inference cost by over 20%.
And that's before taking into account that different LLMs treat instructions differently. We recently found out that even many on the Core AI team aren't up-to-date on model particularities like how you should avoid ALL CAPS with the GPT-5 model, so I don't know how other folks are supposed to keep up.
Case in point: it took almost half a year before Gutenberg noticed that Claude Code wasn't even reading their AGENTS.md file.
Unsurprisingly, the vibe-coded update includes ALL CAPS and other anecdotally problematic antipatterns. (I've asked for clarification there on whether anyone tested efficacy before committing.)
We wouldn't commit any other build tool that we didn't have the availability to review or maintain. And yet, unlike other build tools, its behavior changes wildly with external changes (harnesses, the LLMs themselves), even if we could commit today's "best practices", let alone in a way where it didn't hurt contributors who aren't using a particular proprietary model/harness.
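The two patterns raised in this thread (absolutist words such as "always"/"never", and ALL CAPS emphasis) are easy to flag mechanically, even if their actual effect on any given model is still anecdotal. The sketch below is an assumption-laden illustration: the word list, the allow-list of uppercase names, the four-letter threshold, and the function name `lint_agent_text` are all hypothetical choices, not an established rule set.

```python
import re

# Assumption: the only absolutist words checked are the two quoted in this thread.
ABSOLUTE_WORDS = {"always", "never"}
# Assumption: uppercase names that are legitimate identifiers, not emphasis.
ALLOWED_UPPERCASE = {"AGENTS", "README", "CONTRIBUTING", "REST", "JSON", "HTML"}

WORD_PATTERN = re.compile(r"[A-Za-z][A-Za-z']*")


def lint_agent_text(text: str) -> list[tuple[int, str, str]]:
    """Return (line number, word, reason) for each flagged word."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in WORD_PATTERN.finditer(line):
            word = match.group(0)
            if word.lower() in ABSOLUTE_WORDS:
                findings.append((line_number, word, "absolute language"))
            elif len(word) >= 4 and word.isupper() and word not in ALLOWED_UPPERCASE:
                # Short acronyms (API, PHP, CI) fall under the length threshold.
                findings.append((line_number, word, "all caps"))
    return findings


sample_text = (
    "Always run the linter.\n"
    "You MUST follow the coding standards.\n"
    "See README.md for the REST API.\n"
)
sample_findings = lint_agent_text(sample_text)
```

A check like this only surfaces candidates for a human to review. It cannot say whether rewording them improves results for a particular model or harness, which is the open question in this discussion.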
I recommend we immediately remove any existing AGENTS.md files from all public WordPress/* repos that are intended for community contribution, and only restore them when:
- We've done basic tests to demonstrate the specific contents in the file help, not hurt, contributors (and across more than just 1 proprietary model+harness).
- We have some plan for how folks can sustainably maintain and keep the files updated for as long as it remains relevant as a context hack.
#16
@
3 months ago
I recommend we immediately remove any existing AGENTS.md files from all public WordPress/* repos that are intended for community contribution, and only restore them when:
This is backwards to me. AGENTS.md (and other agent instructions) are early emerging mechanisms for agent focused documentation/intro into a codebase. Like any documentation, they can be iterated on. Instead of gating their introduction, let's suggest improvements and include them.
The idea of adding some sort of basic tests is neat, run with it - set up some tests and see what they surface! I don't think that should be a blocker to getting these extra guidelines in place though. Especially when I'm not that confident in the ability of the tests to cover all the widely different ways non-deterministic LLMs handle different prompting, context, model, and harness usage.
#17
in reply to:
↑ 15
@
3 months ago
Replying to justlevine:
whenever it's ready to go.
To move the discussion forward in the meantime, here's yet another preprint that doesn't just cast doubt on the efficacy of AGENTS.md but tries to quantify the measurable negative impacts. From the abstract:
Across multiple coding agents and LLMs, we find that context files tend to reduce task success rates compared to providing no repository context, while also increasing inference cost by over 20%.
Thanks for linking to the paper, David. While it has its limitations (Python only, mostly smaller repos), I also found it super useful, and reading deeper into it, the conclusions that matter are more nuanced. The main headline is “Human context files increase cost and performance” and for newer, smarter models with more context, the cost and context differences are smaller. I wish they had tested with codex 5.3/opus 4.6, but I guess they ran the tests some time ago.
Back to WordPress. Running the tests is an admittedly anecdotal example, but I think it illustrates a similar point well. I tried it without many instructions, and both claude code opus 4.6 and codex gpt-5.3-codex figured something out, but they took 10+ minutes of research, burnt a ton of tokens in the process, and did different things (one tried to use the scripts in tools/local-env directly and had some missteps). Adding a few commands and pointers to a main AGENTS.md file saved a lot of effort while adding very little context. Similar experience with making REST API changes: lots of tokens and research went into finding what lives where.
Here is what I'd suggest as guidelines and path forward:
- Write minimal guideline files by hand. Avoid LLM-generated files and long explanations.
- Include mostly conventions and pointers that aren't easily inferred, like basic commands, what lives where, and important ideas (like backwards compatibility) that we'd like the model to really follow.
- Given the model progress and the size of the codebase, I am not worried about guideline files adding a ton of extra context or steps.
- Over time invite contributors to share situations where LLMs have failed (or have figured things out really slowly) and if they're important enough we can include them in the main file, but we should be conservative.
- I feel strongly that we should add something minimal and iterate to avoid the extra barrier of adding something in the future.
#18
@
2 months ago
- Milestone changed from 7.0 to 7.1
- Type changed from task (blessed) to feature request
With 7.0 RC1 due out in a few hours, I'm going to punt this to 7.1.
If there's consensus and a working patch before final release, this can be reconsidered because it's not a change that affects the built software.
Here's a proposal for a high-level outline we could start with:
`composer lint` or `npm run test:php`.