Two weeks with the GitHub Coding Agent

Jun 28, 2025 · 6 min read

For the last two weeks, we’ve been learning to work with the new GitHub Coding Agent. This is a shift for me: moving from working with AI in my IDE to delegating entire tasks to an agent that works asynchronously in the cloud. The aim is to be more efficient and have a bigger impact for the business.

From now on we’ll call the GitHub Coding Agent “Mx Robot”. Why? Because I think you need to treat it as a colleague who may or may not deliver what you need. In the community, people compare a coding agent to a junior developer, and that makes sense to me at the moment. I’m also learning new ways to think about solving a problem, which is something you get with junior engineers too (fresh eyes!).

How we did it

First off, I’m in a team of two people at the moment, so the first week it was very much an afterthought: “oh, I could have given this to AI”. Only 7% of the pull requests merged were from Mx Robot. The second week we were much more intentional. Each morning at 9am, we had an informal review of the backlog and agreed what to delegate to Mx Robot. We currently work in Jira, with no integration with the coding agent, which meant rewriting or copy/pasting each Jira ticket into a GitHub Issue or the Copilot agent window on GitHub.

In the second week, 24% of our pull requests were created by Mx Robot.

Once a prompt had kicked off work, we copied it back to the Jira ticket so we had a record of what was asked. The intention here is to learn which phrasing provides the most success.
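As an aside, the Jira-to-GitHub hop is mechanical enough to script. Here’s a minimal sketch of what that bridge could look like, not our actual tooling: every name and URL is a placeholder, it assumes a Jira API token, a GitHub personal access token, and the requests library, and it uses Jira’s v2 REST API (which returns the description as plain text).

```python
import os
import requests

JIRA_BASE = "https://your-team.atlassian.net"  # placeholder Jira site
GITHUB_REPO = "your-org/your-repo"             # placeholder GitHub repo


def fetch_jira_ticket(key: str) -> dict:
    """Fetch the summary and description of a Jira issue like 'ABC-123'."""
    resp = requests.get(
        f"{JIRA_BASE}/rest/api/2/issue/{key}",
        auth=(os.environ["JIRA_EMAIL"], os.environ["JIRA_TOKEN"]),
        timeout=30,
    )
    resp.raise_for_status()
    fields = resp.json()["fields"]
    return {"summary": fields["summary"], "description": fields["description"] or ""}


def create_github_issue(ticket: dict, jira_key: str) -> str:
    """Mirror the Jira ticket as a GitHub Issue and return its URL."""
    resp = requests.post(
        f"https://api.github.com/repos/{GITHUB_REPO}/issues",
        headers={
            "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
            "Accept": "application/vnd.github+json",
        },
        json={
            "title": f"[{jira_key}] {ticket['summary']}",
            # Keep the original ticket text in the body so the prompt stays
            # traceable back to the Jira ticket it came from.
            "body": f"{ticket['description']}\n\n_Source: {JIRA_BASE}/browse/{jira_key}_",
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["html_url"]


if __name__ == "__main__":
    key = "ABC-123"  # hypothetical ticket key
    print(create_github_issue(fetch_jira_ticket(key), key))
```

Assigning Copilot to the resulting issue would still happen in GitHub, but the ticket text travels between the two systems by construction, which also covers the “copy the prompt back” step.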

The process took an hour each morning, mainly because we were learning and then demoing our process to another team; in reality it could have been 30 minutes. I would say roughly 60% of the work last week was earmarked as “AI achievable”, meaning we believed we could delegate it to AI if we wanted to and get some level of success.

We agreed to only have two tickets in progress with Mx Robot at a time. Maybe it’s a coincidence, or maybe one AI ticket per dev (plus their own work) is a workable heuristic; we haven’t gathered enough data to make a determination yet. Our team grows to three devs next week, so we can see if this holds true.

We also agreed to a maximum of three interactions with Mx Robot on a pull request. If we couldn’t get it over the line within three interactions, we would either:

  1. Scrap it entirely and get humans to do the work, or
  2. Take over the pull request ourselves if there was a good foundation to build on.

Lastly, and maybe we were too pessimistic, we agreed to leave greenfield development that needed a wide range of thought to the humans.

Alongside that 24% figure, we went from 1.1 tickets per human per day to 1.8, roughly a 64% increase, measured over a 3-week rolling average.

What worked

We’ve given Mx Robot a variety of types of work.

Most of those changes were then reviewed, tested, approved, and merged within three interactions. I personally feel I’ve had more of a “testing” mindset when reviewing Mx Robot’s work: I don’t trust that it will have caught everything, so there is more testing overhead here. Keep this in mind when thinking about productivity; the results don’t come entirely for free, human effort still goes in.

How did it all play out then?

What didn’t work

The flow from Jira to a GitHub Issue or the Copilot window is very clunky. So much so that we aim to trial not working in Jira for an entire week. We want to bring the product managers and designers closer to Mx Robot, not make engineers a proxy between ticket and pull request. “Pushing left” is still a thing in this new era.

The other big takeaway from the two weeks is specificity. When we were super clear about what we wanted, Mx Robot did well; where we were vaguer, we had to be more involved with the pull request. I suspect this is the distinction between AI-assisted development and vibe coding: we cared about the code and the solution, whereas with vibe coding you’re less concerned with the code that comes out.
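To make that concrete, here’s an illustrative contrast (hypothetical prompts, not from our backlog): “tidy up the logging” leaves the agent guessing, while “add a debug-level log line to OrderService.submit recording the order ID and retry count, matching the existing log format” gives it a verifiable target. The second style tended to need fewer of our three interactions.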

Next experiments

Next week we are going to mix it up a little.

Speaking for myself, I believe it increased cognitive load and context switching to some extent. However, I think this is about letting the flow bed in. Removing the 9am sync call will give us some time back, which means we can spread the AI reviews throughout the day a little more.

It’s early days, but I can see this becoming a core part of my workflow. Not because it’s flashy, but because it quietly gets things done. I’m curious to see when our Mx Robot might be up for promotion too.

… and yes, the banner image was created by Copilot (and Leonardo da Vinci).