The Prompt-to-PR Workflow: Briefing an AI Agent Like a Senior Developer
Turn a written brief into a reviewable pull request. This guide shows how to delegate to a coding agent the way you'd brief a senior developer, with a step-by-step workflow.

There is a meaningful difference between asking an AI to "fix the login bug" and handing it the kind of brief you'd give a senior developer before a two-week trip. The first gets you a guess. The second gets you a pull request you can actually review. This guide walks through the prompt-to-PR workflow: turning a written brief into a reviewable change set, the way good engineering managers delegate.
The shift behind this is that the leading coding tools are no longer autocomplete — they are agents that run a loop. GitHub's Copilot coding agent, launched in 2025, is described plainly in GitHub's own announcement: "You assign Copilot an issue and it plans the work, opens a pull request, writes the code, runs the tests, and then asks for your review." Anthropic's Claude Code is "an agentic coding environment" that "can read your files, run commands, make changes, and autonomously work through problems." OpenAI's Codex runs each task "in its own cloud sandbox environment, preloaded with your repository" and then proposes a pull request. The common pattern: plan, edit, run tests, iterate, open a PR. Your leverage is entirely in the brief you hand it and the review you do at the end.
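The loop is easier to reason about when written down. The following is a toy sketch, not how any of the products above is implemented: `agent_loop`, `attempt_edit`, and `tests_pass` are hypothetical names, and the stand-in functions simply pretend the third edit fixes the problem.

```python
from typing import Callable


def agent_loop(
    attempt_edit: Callable[[int], None],
    tests_pass: Callable[[], bool],
    max_iterations: int = 5,
) -> str:
    """Edit, run the check, and repeat until it passes or the budget runs out."""
    for i in range(1, max_iterations + 1):
        attempt_edit(i)
        if tests_pass():
            return f"open PR after {i} iteration(s)"
    return "stop and ask the human"


# Stand-ins for illustration: the "tests" start passing on the third edit.
state = {"attempts": 0}


def attempt_edit(iteration: int) -> None:
    state["attempts"] = iteration


def tests_pass() -> bool:
    return state["attempts"] >= 3


outcome = agent_loop(attempt_edit, tests_pass)
print(outcome)  # open PR after 3 iteration(s)
```

The point of the sketch is the exit condition: without a `tests_pass` the agent can actually run, the loop has no way to know when to stop.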
Step 1 — Write the brief like a spec, not a wish
The biggest predictor of a usable PR is the quality of the brief. Addy Osmani (Google Chrome) puts it bluntly: "Most agent files fail because they're too vague." GitHub's guidance for writing a good Copilot issue lists exactly what to include: relevant background, the expected outcome ("what 'done' looks like"), technical details (file names, functions, components involved), and any formatting or linting rules.
Anthropic's advice is the same in different words: "The more precise your instructions, the fewer corrections you'll need." Its docs recommend you scope the task, point to sources, reference existing patterns, and describe the symptom — turning "fix the login bug" into "users report login fails after session timeout; check the auth flow in src/auth/, write a failing test that reproduces it, then fix it." The most useful specs, Anthropic notes, "name the files and interfaces involved, state what is out of scope, and end with an end-to-end verification step."
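One way to keep briefs consistent is to treat them as structured data and render the text from that. The sketch below is illustrative only: the `Brief` class and its field names are assumptions, not a format any of these tools requires, and the `billing` entry and `pytest tests/auth` command are made-up examples.

```python
from dataclasses import dataclass, field


@dataclass
class Brief:
    """A written brief for a coding agent (hypothetical structure)."""

    background: str
    expected_outcome: str
    files_in_scope: list[str]
    out_of_scope: list[str] = field(default_factory=list)
    verification: str = ""

    def render(self) -> str:
        """Render the brief as plain text suitable for an issue body."""
        lines = [
            "Background: " + self.background,
            "Done means: " + self.expected_outcome,
            "Files in scope: " + ", ".join(self.files_in_scope),
        ]
        if self.out_of_scope:
            lines.append("Out of scope: " + ", ".join(self.out_of_scope))
        if self.verification:
            lines.append("Verify with: " + self.verification)
        return "\n".join(lines)


brief = Brief(
    background="Users report login fails after session timeout.",
    expected_outcome="A failing test reproduces the bug, then passes after the fix.",
    files_in_scope=["src/auth/"],
    out_of_scope=["billing"],          # hypothetical example entry
    verification="pytest tests/auth",  # hypothetical command
)
print(brief.render())
```

Making `files_in_scope` a required field is a deliberate choice in this sketch: a brief that cannot name any files is usually still a wish.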
Step 2 — Set boundaries and a definition of done
A senior developer knows what not to touch. An agent does not, unless you tell it. Osmani recommends a three-tier boundary system in the brief — what to always do, what to ask first about, and what to never touch — alongside explicit acceptance criteria and a definition of done. Without "out of scope" lines, agents cheerfully refactor things you never asked about, which is how a one-line fix becomes a 400-line diff nobody can review.
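Boundaries are also easy to check mechanically once the agent hands back a diff. Here is a minimal sketch, assuming the boundaries are expressed as path prefixes; the `Tier` enum, the `classify` helper, and every path shown are hypothetical.

```python
from enum import Enum


class Tier(Enum):
    ALLOWED = "allowed"
    ASK_FIRST = "ask first"
    NEVER = "never touch"


def _under(path: str, prefix: str) -> bool:
    """True if path is the prefix itself or lives inside that directory."""
    prefix = prefix.rstrip("/")
    return path == prefix or path.startswith(prefix + "/")


def classify(path: str, ask_first: list[str], never: list[str]) -> Tier:
    """Assign a changed file to a boundary tier; NEVER wins over ASK_FIRST."""
    if any(_under(path, p) for p in never):
        return Tier.NEVER
    if any(_under(path, p) for p in ask_first):
        return Tier.ASK_FIRST
    return Tier.ALLOWED


# Hypothetical boundaries and changed files, for illustration only.
NEVER = ["src/billing", "migrations"]
ASK_FIRST = ["package.json", "src/auth/config"]
changed = ["src/auth/session.py", "src/billing/invoice.py", "package.json"]

report = {path: classify(path, ASK_FIRST, NEVER) for path in changed}
for path, tier in report.items():
    print(f"{tier.value:12} {path}")
```

Any file that lands in the never-touch tier is a reason to reject the diff before reading the rest of it.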
Step 3 — Give it a feedback loop it can run
This step marks the difference between a session you can walk away from and one you have to babysit. Anthropic's guidance is direct: "Give Claude a check it can run: tests, a build, a screenshot to compare." The agent uses that loop to self-correct before it ever reaches you. The danger it names is the "trust-then-verify gap": the agent "produces a plausible-looking implementation that doesn't handle edge cases," and the fix is "Always provide verification... If you can't verify it, don't ship it."
The infrastructure already supports this. GitHub's Copilot agent works "in its own ephemeral development environment, powered by GitHub Actions, where it can explore your code, make changes, execute automated tests and linters," with a hard 59-minute cap per session that nudges you toward smaller tasks. Codex provides "verifiable evidence of its actions through citations of terminal logs and test outputs." If your repo has tests and CI, the agent inherits a referee.
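A runnable check can be as small as one command with a clear pass or fail. The sketch below wraps such a command in a hypothetical `run_check` helper; the two commands are stand-ins so the example runs anywhere, and a real brief would name the project's own test or build command.

```python
import subprocess
import sys


def run_check(command: list[str], timeout_s: int = 600) -> tuple[bool, str]:
    """Run one verification command; return (passed, combined output)."""
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout_s}s"
    return result.returncode == 0, result.stdout + result.stderr


# Stand-in commands: one check that passes and one that fails.
passing = [sys.executable, "-c", "assert 1 + 1 == 2"]
failing = [sys.executable, "-c", "assert 1 + 1 == 3"]

ok, _ = run_check(passing)
bad, log = run_check(failing)
print(ok, bad)
```

Returning the captured output alongside the verdict matters: a bare "failed" gives the agent, and later the reviewer, nothing to act on.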
Step 4 — Briefing parts, side by side
The table maps each part of a good developer brief to its agent equivalent and the failure it prevents.
| Brief component | What to write | What it prevents |
|---|---|---|
| Context / background | Why the change matters; link the issue or PRD | The agent solving the wrong problem |
| Files in scope | Exact paths and components to touch | Sprawling, off-target diffs |
| Out of scope | "Never touch" list (e.g. auth, billing) | Unrequested, risky refactors |
| Acceptance criteria | Observable "done" conditions | Vague output you can't grade |
| Verification | The test/build/command that proves success | The trust-then-verify gap |
| Conventions | House patterns, lint rules, commit format | Style drift and reinvented wheels |
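
The six rows above can double as a checklist that is applied before a brief is sent. This is a rough sketch under stated assumptions: the section labels in `REQUIRED` are shortened versions of the table rows, `missing_sections` is a hypothetical helper, and it only checks that each label starts a line, not that the content is any good.

```python
REQUIRED = [
    "Context",
    "Files in scope",
    "Out of scope",
    "Acceptance criteria",
    "Verification",
    "Conventions",
]


def missing_sections(brief_text: str) -> list[str]:
    """Return required section labels that never start a line in the brief."""
    starts = [line.strip().lower() for line in brief_text.splitlines()]
    return [
        name
        for name in REQUIRED
        if not any(s.startswith(name.lower() + ":") for s in starts)
    ]


# A draft brief that forgot two sections (illustrative content).
draft = """Context: session timeout breaks login
Files in scope: src/auth/
Acceptance criteria: expired sessions redirect to the login page
Verification: the new regression test passes
"""
print(missing_sections(draft))  # ['Out of scope', 'Conventions']
```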
Step 5 — Review the diff, not the vibe
The final step is non-negotiable, and it is yours. Simon Willison (co-creator of Django) frames the rule for agentic engineering: "Don't file pull requests with code you haven't reviewed yourself... The initial review pass is your responsibility, not something you should farm out to others." His mental model of these tools is worth keeping in mind — an "over-confident pair programming assistant" that is fast and well-read but "will absolutely make mistakes — sometimes subtle, sometimes huge." He also prefers "several small PRs" over one large one, which keeps each review tractable.
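Keeping PRs small is easier with a number attached. As a sketch, the function below totals changed lines from the output of `git diff --numstat`, which prints added lines, deleted lines, and path separated by tabs. The 400-line budget is an assumption chosen for illustration, and the sample input is made up.

```python
def diff_size(numstat: str) -> int:
    """Sum added and deleted lines from `git diff --numstat` output."""
    total = 0
    for line in numstat.splitlines():
        if not line.strip():
            continue
        added, deleted, _path = line.split("\t", 2)
        # Binary files report "-" for both counts; skip them here.
        if added == "-" or deleted == "-":
            continue
        total += int(added) + int(deleted)
    return total


MAX_REVIEWABLE_LINES = 400  # assumed budget, not a rule from any tool

# Made-up numstat output: two text files and one binary file.
sample = (
    "12\t3\tsrc/auth/session.py\n"
    "40\t0\ttests/test_session.py\n"
    "-\t-\tdocs/flow.png\n"
)
size = diff_size(sample)
print(size, "ok" if size <= MAX_REVIEWABLE_LINES else "split this PR")  # 55 ok
```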
The agent cannot be the final reviewer of its own code, for the same reason a junior developer can't approve their own PR: the author is the worst-placed person to spot what they missed. Anthropic even builds this in with a "Writer/Reviewer pattern" — a separate, fresh-context session reviews the diff, because "a fresh context improves code review since Claude won't be biased toward code it just wrote."
A note on capability
These agents are genuinely strong now, which is why the discipline matters. On SWE-bench Verified, a benchmark of real-world software-engineering tasks, Anthropic's Claude Opus 4.5 reported 80.9% (announced November 2025) and OpenAI's GPT-5 reported 74.9% (announced August 2025). High scores on bounded benchmark tasks are not the same as judgment on your messy production repo — treat the numbers as evidence the agent deserves a real brief and a real review, not as permission to skip either.
FAQ
Do I need an autonomous agent, or is inline autocomplete enough? For a single line, autocomplete is fine. The prompt-to-PR workflow pays off when the unit of work is a whole feature or fix you want delivered as a reviewable branch.
How big should a task be? Small enough that the diff is reviewable in one sitting. GitHub explicitly advises starting with "well-scoped, smaller tasks before attempting complex ones," and its agent's 59-minute session cap reinforces it.
Where do I put repo conventions so I don't repeat them? In a checked-in context file. Claude Code reads CLAUDE.md at the start of every conversation. Anthropic warns that bloated context files get ignored, so keep it tight.
Can I let the agent merge its own PR? No. Review is the human's job. Both GitHub's and Anthropic's workflows end with a human reviewing the diff before merge.
Bottom line
Briefing an agent well is the same skill as delegating to a senior developer: state the goal, name the files, mark what's out of scope, give it a way to prove success, and review the diff yourself. Get the brief right and the loop runs itself; skip it and you get fast, confident, wrong code. The tool is capable — the leverage is in how you hand it the work.
Sources and further reading
- Anthropic: Best practices for Claude Code https://code.claude.com/docs/en/best-practices
- GitHub Blog: Assigning and completing issues with coding agent in GitHub Copilot https://github.blog/ai-and-ml/github-copilot/assigning-and-completing-issues-with-coding-agent-in-github-copilot/
- Addy Osmani: How to write a good spec for AI agents https://addyosmani.com/blog/good-spec/
- Simon Willison: Here's how I use LLMs to help me write code https://simonwillison.net/2025/Mar/11/using-llms-for-code/


