Debug a Failing Codex App Task With Logs, Images, and a Tighter Retry Prompt

The first broken Codex run is usually where people make the worst decision.

They throw away the thread, open a new one, and write a longer prompt.

That feels like progress. Most of the time it just erases the only useful thing the failed run produced: evidence.

The better move is narrower. Keep the thread. Isolate the failure signal. Feed that signal back with less narration and better constraints.

This guide is for that repair loop.

Start By Naming What Actually Failed

Do not begin with "it didn't work."

That phrase hides the only part Codex can really use. A failed task usually broke in one of four ways:

the code changed the wrong files
the change did not fix the failing test or command
the code works mechanically but the UI is still wrong
the diff solved part of the task and wandered into unrelated cleanup

Those are different failure shapes. They need different retries.

If you do not name the failure shape first, the next prompt usually gets broader, not better.

Worktree Is Still The Right Place To Be Wrong

If the thread is already in Worktree, keep it there.

The official app docs say worktree mode isolates changes in a Git worktree while keeping the result reviewable inside the app. That matters more on a bad run than on a good one. A messy retry is much easier to tolerate when the damage is still boxed away from your main checkout.

If you started in Local and the task is drifting, that is usually a sign to get stricter with review and staging before you ask for another edit pass.

The goal of the debug loop is not to let Codex try more things. The goal is to keep each retry small enough that you can tell what changed.

Gather One Strong Failure Signal Before You Retry

Do not dump five different kinds of evidence into the thread at once.

Pick the signal that best matches the bug:

For a failing command or test, use terminal output.
For a visual regression, use one image.
For a suspicious edit, use the review pane and comment on the exact line.
For diff sprawl, use the smallest review scope instead of more prose.

The official app features page says every thread includes an integrated terminal, and Codex can read the current terminal output. That makes the terminal the best first source of truth for non-visual failures.

The same page also says you can drag and drop images into the composer as context, and you can ask Codex to view images on your system. That matters for front-end or browser issues where a screenshot explains the bug faster than a paragraph.

One screenshot is useful when the problem is visual. Five screenshots usually mean you have not reduced the bug yet.

Choose one strong signal before you write the retry prompt. The point is not to give Codex more context. The point is to give it cleaner context.
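The pairing of bug type and evidence listed above can be sketched as a small lookup. This is only an illustration: the FailureShape names and the pick_signal helper are made up for this guide and are not part of the Codex app or any API.

```python
from enum import Enum


class FailureShape(Enum):
    """Hypothetical labels for the kinds of bug named in this section."""
    FAILING_COMMAND = "failing command or test"
    VISUAL_REGRESSION = "visual regression"
    SUSPICIOUS_EDIT = "suspicious edit"
    DIFF_SPRAWL = "diff sprawl"


# One strong signal per failure shape, mirroring the list above.
SIGNAL_FOR_SHAPE = {
    FailureShape.FAILING_COMMAND: "terminal output",
    FailureShape.VISUAL_REGRESSION: "one image",
    FailureShape.SUSPICIOUS_EDIT: "inline comment on the exact line in the review pane",
    FailureShape.DIFF_SPRAWL: "the smallest review scope",
}


def pick_signal(shape: FailureShape) -> str:
    """Return the single kind of evidence to gather for this failure shape."""
    return SIGNAL_FOR_SHAPE[shape]


if __name__ == "__main__":
    print(pick_signal(FailureShape.VISUAL_REGRESSION))
```

The lookup returns exactly one value per shape on purpose: if you cannot decide which key applies, the bug has probably not been reduced far enough yet.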

Use The Review Pane To Shrink The Problem Back Down

Do not scroll the whole diff and hope the right correction jumps out at you.

The official review docs say the app can show:

Uncommitted changes
All branch changes
Last turn changes

When a thread starts drifting, switch to Last turn changes first. That is the fastest way to answer a simple question: which part of the most recent turn actually made things worse?

The review docs also say the pane can show changes made by Codex, by you, and by any other uncommitted work in the repo. That is exactly why broad retries go bad: a vague follow-up cannot tell Codex's latest edits apart from yours or from older uncommitted work. If the repo state is already mixed, your follow-up prompt has to be more surgical.

Inline Comments Beat Paragraph Feedback

If one hunk is wrong, comment on the hunk.

The app review page says inline comments are often the fastest way to guide Codex to the right fix because the feedback stays anchored to the exact line. After leaving the comment, send a short follow-up message that makes the intent explicit.

That is better than writing a long correction in the thread because line-anchored feedback removes guesswork.

[Figure: screenshot from the official OpenAI Developers review documentation, checked on 2026-04-13. It shows the same review loop this tutorial describes: line-anchored feedback, visible review output, and Git actions before the next retry.]

Useful inline comment patterns:

"Keep this file change, but do not alter the response shape."
"The bug is not here. Revert this hunk and inspect the validation branch instead."
"This test expectation changed, but the underlying bug is still in the formatter."

Bad inline feedback usually sounds like vague disappointment:

"This seems off."
"Try again."
"Not what I meant."

Those phrases create another search task when what you need is a fix task.

Run The Smallest Reproduction In The Integrated Terminal

A retry without a fresh reproduction step is mostly wishful thinking.

The app docs explicitly recommend using the integrated terminal for project commands such as git status, npm test, lint checks, or other project-specific commands. For debugging, that means you should rerun the smallest command that proves the bug still exists.

Good examples:

the one failing test file, not the full suite
the one lint command for the touched package, not the whole monorepo
the one local preview step that reproduces the broken state
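If you prefer to script the reproduction step, a minimal sketch might look like the following. It assumes nothing about your project: run_repro is a hypothetical helper, and the command shown is a stand-in that simply prints a failure line and exits non-zero.

```python
import subprocess
import sys


def run_repro(command: list[str]) -> tuple[int, str]:
    """Run one narrow command and return (exit code, combined output)."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # keep stdout and stderr in one stream
        text=True,
        check=False,  # a non-zero exit is the signal here, not an error
    )
    return result.returncode, result.stdout


if __name__ == "__main__":
    # Stand-in command. In a real project this would be the one failing
    # test file or the one lint command for the touched package.
    code, output = run_repro(
        [sys.executable, "-c", "import sys; print('expected 200, got 500'); sys.exit(1)"]
    )
    print(f"exit code: {code}")
    print(output)
```

Keeping the exit code and the raw output together makes it easy to see whether a retry actually changed the failure or only changed the diff.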

If the command output changed, paste that fact back into the thread. Do not summarize it into softer language.

"expected 200, got 500" is better than "the endpoint still seems broken."

If the terminal output is very long, paste only the failing section plus one line of command context. Dumping a whole build log back into the thread usually recreates the same ambiguity you were trying to remove.
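Trimming a long log down to the failing section plus one line of command context can be sketched like this. The trim_failure helper and its default marker "FAIL" are assumptions for illustration; use whatever keyword your test runner actually prints.

```python
def trim_failure(log: str, command: str, marker: str = "FAIL", context: int = 5) -> str:
    """Return the command line plus a few lines around the first marker match.

    `marker` is an assumed failure keyword, not a standard; adjust it to
    match the output of your own tooling.
    """
    lines = log.splitlines()
    for index, line in enumerate(lines):
        if marker in line:
            start = max(0, index - 1)  # one line of lead-in before the failure
            end = min(len(lines), index + context + 1)
            return "\n".join([f"$ {command}", *lines[start:end]])
    # No marker found: fall back to the tail of the log.
    tail = lines[-context:] if context > 0 else []
    return "\n".join([f"$ {command}", *tail])
```

The first line of the result is the command itself, so whoever reads the snippet (you or Codex) knows which reproduction step produced it.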

Use This Diagnosis-First Retry Prompt

When the first pass is wrong, do not ask for another full implementation immediately. Ask Codex to diagnose the failure first and keep the next move small.

The previous turn did not solve the task.

Failure signal:
- [paste the exact failing command output, or attach one image and describe the mismatch in one sentence]

Before editing again:
1. Explain the most likely root cause in 3 to 5 lines
2. Name the smallest file or code path that now matters most
3. Say what from the previous turn should be kept
4. Say what should be reverted or ignored

Retry rules:
- keep the work inside the smallest possible boundary
- do not expand into cleanup
- validate with the same narrow reproduction step
- if the failure signal is still ambiguous, stop after diagnosis instead of guessing

This does two useful things. It forces Codex to re-read the failure instead of continuing momentum from the old plan, and it gives you a clean off-ramp if the root cause still is not clear.
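If you reuse this prompt often, one option is to keep it as a template and fill in only the failure signal. The sketch below is a convenience for your own notes, not a Codex feature; RETRY_TEMPLATE and build_retry_prompt are hypothetical names.

```python
RETRY_TEMPLATE = """\
The previous turn did not solve the task.

Failure signal:
- {signal}

Before editing again:
1. Explain the most likely root cause in 3 to 5 lines
2. Name the smallest file or code path that now matters most
3. Say what from the previous turn should be kept
4. Say what should be reverted or ignored

Retry rules:
- keep the work inside the smallest possible boundary
- do not expand into cleanup
- validate with the same narrow reproduction step
- if the failure signal is still ambiguous, stop after diagnosis instead of guessing
"""


def build_retry_prompt(signal: str) -> str:
    """Fill the template with one failure signal, rejecting empty input."""
    signal = signal.strip()
    if not signal:
        raise ValueError("provide one failure signal before retrying")
    return RETRY_TEMPLATE.format(signal=signal)


if __name__ == "__main__":
    print(build_retry_prompt("expected 200, got 500"))
```

Refusing an empty signal is the point of the helper: it makes it hard to send a retry that carries no evidence.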

This loop is deliberately small. If the failure changes shape, stop treating it as the same task.

For Visual Bugs, Use One Image And One Sentence

This is where many retries become noisy.

If the bug is visual, the app's image input support is usually worth using. Drag one screenshot into the composer or ask Codex to inspect the saved image on disk. Then describe the mismatch in one sentence:

"The modal footer is outside the card on mobile."
"The selected tab is losing its active background after refresh."
"The CTA wraps to two lines at tablet width."

Do not turn the note into a design essay. The screenshot already carries most of the context.

If the bug is not visual, skip the image and give Codex the failing command output instead. A screenshot of a stack trace is usually worse than copying the text.

If The Diff Is Half Right, Stage The Good Part First

A bad run is not always a fully bad run.

The review docs say the app can stage, unstage, or revert changes at the diff, file, or hunk level. Use that. If one part of the thread is correct, stage it so the next retry does not reopen it and Codex is not solving the same problem twice.

This is especially useful when:

one test fix is correct but the refactor around it is not
one UI adjustment is right but the surrounding cleanup is noisy
one file has the real fix and the second file is speculative

The app can only stay sharp if you help separate "good but incomplete" from "wrong and distracting."
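After staging in the app, you may want to confirm from the terminal what is staged and what is still open. As a rough sketch, the function below groups paths from the output of git status --porcelain (the v1 short format, where the first character is the index state and the second is the working tree state). The split_status name is hypothetical, and renamed or ignored entries are not given special handling.

```python
def split_status(porcelain: str) -> dict[str, list[str]]:
    """Group paths from `git status --porcelain` (v1) output by staged state."""
    groups: dict[str, list[str]] = {"staged": [], "unstaged": [], "untracked": []}
    for line in porcelain.splitlines():
        if len(line) < 4:
            continue  # too short to hold "XY path"
        index_state, worktree_state, path = line[0], line[1], line[3:]
        if index_state == "?" and worktree_state == "?":
            groups["untracked"].append(path)
            continue
        if index_state != " ":
            groups["staged"].append(path)  # kept: already in the index
        if worktree_state != " ":
            groups["unstaged"].append(path)  # still open for the next retry
    return groups
```

A file can appear in both groups when only some of its hunks are staged, which is exactly the "half right" case this section describes.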

Know When To Stop Retrying In The Same Thread

Not every bad result deserves a third pass.

Start a new thread when one of these becomes true:

the failure signal changed and you are now debugging a different bug
the diff history is too mixed to review calmly
the task boundary turned out to be wrong
you now need a different mode, project scope, or environment than the one this thread started with

Stay in the same thread when the bug is still the same and the missing piece is clearer evidence.

That distinction matters. New threads are for new task boundaries, not for emotional reset.

Common Mistakes

retrying with a broader prompt instead of a narrower failure signal
attaching multiple screenshots before deciding which visual bug actually matters
summarizing command output instead of pasting the exact failing line
asking for more edits before deciding what to keep, stage, or revert
leaving vague inline comments that create another interpretation problem
treating a changed failure as if it were still the original bug

The first failed run is not wasted work if it gives you a cleaner second brief.

What To Read Next

Read Use the Codex App to Hand Off One Bounded Coding Task and Review the Result if you skipped the first-thread setup logic and need the calmer starting workflow first.

Read Install and Update Codex CLI Before Your First Real Task if the real problem is still local access, login, or environment setup rather than task debugging.

Read Codex vs Cursor if you keep wanting live editor steering instead of thread-based retries.