News
GitHub Devs Go Hands-On: Comparing Copilot AI Models Across Modes
Which AI is best for your coding task in GitHub Copilot? With multiple models and modes available, it's not always clear. Two GitHub developers directly compared models like Claude 3.5 and Gemini 2.5 Pro in Copilot's Ask, Edit, and Agent modes, demonstrating their strengths and weaknesses on real development challenges.
GitHub developer advocates Kedasha Kerr and Jon Peck built a travel reservation app from scratch using Copilot inside Visual Studio Code and talked about modes and models in the May 10 video, "GitHub Copilot deep dive: Model selection, prompting techniques & agent mode." Over the course of the session, they explored Copilot's three interaction modes -- Ask, Edit, and Agent -- while directly comparing how different AI models performed at each stage. By switching between Claude 3.5 Sonnet, Gemini 2.5 Pro, GPT-4.1 and other models in real time, they revealed how each model handles building, refactoring, debugging, and documenting in practical development workflows.
Kedasha Kerr and Jon Peck (source: GitHub).
The video follows guidance on modes published earlier this month by GitHub, "Copilot ask, edit, and agent modes: What they do and when to use them."
Here's a summary of the modes:
- Ask mode: A conversational interface for questions, explanations, and quick debugging. Use it when you want insight, feedback, or help understanding code -- without making direct changes.
- Edit mode: A targeted editing tool for precise refactors or updates. Use it when you want to apply specific changes across one or more files with full control over what gets modified.
- Agent mode: An autonomous assistant for multi-step tasks like building features or scaffolding projects. Use it when you have a broad prompt or PRD and want Copilot to generate or modify code across your project.
Agent mode, of course, has been getting most of the attention lately, as agentic AI has been dubbed the next big thing: AI systems that go off and act on their own -- even operating computers -- to accomplish tasks set by humans.
"Agent mode is just so good, honestly," said Kerr, who added at another point, "And then agent mode is when you just want to vibe -- you just want things done. So you just use agent mode to get it done for you."
And vibing, of course, is where this whole AI push is supposedly headed, with the technology destined -- we're told -- to churn out all of humanity's software in the future.
Peck concurred with admiration for Agent mode. "One of my favorite ways to do prompting for agent mode... what agent mode does, it takes really a fairly large prompt... it's going to pause every now and then, kind of ask if it's doing the right thing, tell us what it wants to do next, go ahead and do more," he said.
Rather than abstract benchmarks or isolated examples, Kerr and Peck demonstrated how Claude 3.5 Sonnet, Gemini 2.5 Pro, GPT-4.1, and others behave in practical scenarios -- building a full-stack app, debugging code, refining UI, and writing documentation. By using the same project across Ask, Edit, and Agent modes, they illustrated how each model responds to developer intent, how to troubleshoot when things go wrong, and how to choose the right model for the job at hand. The takeaway is clear: success with Copilot isn't just about the tool -- it's about matching the right model to the right mode for the task.
Beyond the excitement over agentic capabilities, the practical lesson of the demonstration is that getting the most out of GitHub Copilot today means understanding not just the different modes available, but also how the underlying AI models perform differently within them. The video's real-time coding challenges show that the "best" model isn't a one-size-fits-all answer; it depends heavily on the specific task at hand -- whether that's brainstorming, implementing a complex feature via Agent mode, applying precise edits in Edit mode, or debugging in Ask mode. Watching the developers switch models and modes based on the challenge gives viewers a sense of how to tailor their own Copilot usage for maximum efficiency.
And it all started with a good prompt: Peck explained the considerable detail he put into a Markdown README.md file describing the app to be built.
The README prompt contains:
- A description of the app (a travel reservation app)
- A list of desired features (e.g., view hotel rooms, make reservations)
- The technology stack (Flask backend, Vue.js frontend loaded via CDN)
- A project structure outline written in markdown
- Setup instructions, like creating a Python virtual environment
- Hints at styling, structure preferences, and coding conventions
"I was kind of working with Copilot and saying like, 'okay this is what I want to do, but help me refine this readme' -- basically making it a full PRD [product requirements document]."
From that voluminous prompt the project began, but before we get into direct use-case comparisons, here are some quotes lifted from the presentation that set the style, tone, and approach:
- "Ask is really good at talking through changes that you want to do. Edit mode is for those specific changes that you want. And then agent mode is when you just want to vibe so you just use agent mode to get it done for you." -- Kerr
- "What agent mode does, it takes really a fairly large prompt, usually pretty descriptive about a number of different things you want to do and it's going to run through it. It's going to pause every now and then, kind of ask if it's doing the right thing, tell us what it wants to do next, go ahead and do more. And then we get to kind of check in with it and see how the progress is going each time." -- Peck
- "I've been partial towards Claude 3.5. For some reason I tend to get the best results from Claude 3.5 but I am open." -- Peck
- "Not going to lie, I'm liking what Gemini 2.5 Pro did before it went into that terminal command." -- Kerr
- "So far already, you know, you can see Claude is already like writing stuff... Claude it looks like is going right to building. Yes. Whereas 2.5 Pro is kind of giving me a description, a pretty heavy description of what its plan is going to be." -- Peck
- "Claude is still working. Claude is still going. Claude's going nuts. Let's watch yours for a while because it looks like Gemini is spinning back up and so it'll be sitting for a while." -- Peck
- "When you hit a bug like this you can just ask it what's going on and have it try to figure out a solution." -- Peck
- "Claude does take that visual input. So like if you're using like I think 04 Mini, 04 Mini does not take visual input and you'll see it by like that it crosses it out out there to indicate like no girl I cannot see that." -- Peck
- "There's the keep step but sometimes you also have to manually save the file or do a save all I found just to actually get it down to disk." -- Peck
The 77-minute video is too long to fully digest here, but here are some visual takeaways, with much of the discussion concerning Gemini 2.5 and Claude 3.5:
GitHub Copilot Modes at a Glance
| Feature | Ask Mode | Edit Mode | Agent Mode |
| --- | --- | --- | --- |
| Primary Use | Conversational help, explanations, Q&A | Targeted code changes, refactoring | Complex, multi-step tasks, autonomous execution |
| Input Method | Text prompt, file drop (for context) | Text prompt, explicit file/folder selection | Text prompt (often detailed), file drop, visual input (model dependent) |
| Output Method | Chat response, code suggestions | Direct file modifications | Multi-step execution, file modifications, terminal commands (with approval), chat updates |
| Model Availability | Broadest selection | Subset of models | Limited subset of models |
| Key Characteristic | Interactive, explanatory | Precise, focused | Goal-oriented, potentially multi-turn |
Model Performance on Key Tasks (Video Examples)
| Task Demonstrated | Mode Used | Model(s) Compared | Observed Behavior / Outcome | Key Takeaway |
| --- | --- | --- | --- | --- |
| Initial App Build | Agent | Claude 3.5 Sonnet | Went straight to building, completed first, ran server. | Action-oriented, fast. |
| | | Gemini 2.5 Pro | Planned extensively first, hit preview issue, eventually built. | Planning-oriented, detailed approach. |
| Debugging Front-end Error | Ask | Claude 3.5 Sonnet | Struggled to identify the core workflow issue (caching). | Less effective for workflow-specific bugs. |
| | | GPT-3.5T / GPT-4 (Copilot Base) | Identified caching/file copy issue, provided correct steps. | Better understanding of developer workflow issues. |
| Code Refinement (Add Price, Remove ID) | Edit | GPT-4 (Copilot Base) | Successfully modified multiple files (JSON, HTML, JS). | Effective for targeted multi-file changes. |
| UI Restyling from Screenshot | Agent | Gemini (Flash?) | Attempted to change data structure along with UI. | Visual input can be misinterpreted; needs clear scope. |
| Adhering to Instructions (Pirate Speak) | Agent | Gemini | Successfully adopted the specified persona. | Follows custom instructions effectively. |
Model Strengths Highlighted in the Video
- Claude 3.5 Sonnet:
  - Good balance of speed and precision.
  - Effective for building projects in Agent mode.
- Gemini 2.5 Pro:
  - Strong planning phase before coding.
  - Effective for generating documentation.
- GPT-4 (Copilot Base):
  - Capable of reasoning and understanding context (e.g., security recommendations, generating tests).
  - Effective in Edit mode for multi-file changes.
- Models supporting visual input (Claude 3.5 Sonnet, Gemini Flash, GPT-4 Mini/4.1):
  - Can process images (like screenshots) for context.
- Smaller/faster models (e.g., GPT-4 Mini, Flash models):
  - Suitable for quick code generation tasks.
Choosing Your Copilot Tool (Mode + Model Considerations)
| Task / Goal | Recommended Mode | Model Considerations (Based on Video) | Notes |
| --- | --- | --- | --- |
| Understanding Code, Getting Explanations | Ask | Any capable model; broader selection available. | Good for initial exploration. |
| Building a New Feature (Complex) | Agent | Claude 3.5 Sonnet (action-oriented) or Gemini 2.5 Pro (planning-oriented). | Depends on preference for direct action vs. detailed planning. Needs a good prompt. |
| Refactoring Code Across Files | Edit | Models effective in Edit mode (e.g., GPT-4 variants); need to select files. | Requires careful context selection. |
| Fixing a Workflow/Environment Bug | Ask | Models demonstrating awareness of developer environments (e.g., GPT variants). | May require pasting errors and describing symptoms. |
| Applying Design Changes from Mockups | Agent or Edit | Models supporting visual input (Claude 3.5, Gemini, GPT-4 variants). | Be very explicit about the scope of changes (UI vs. data). |
| Generating Boilerplate / Simple Code | Ask or Edit | Smaller/faster models can be efficient. | Use Edit for applying patterns across files. |
| Generating Documentation / Tests | Ask or Agent | Models noted for detailed output or planning (e.g., Gemini 2.5 Pro, GPT-4). | Agent mode can handle multi-part doc generation. |
At one point Kerr provided a good session summary: "So to kind of answer that question that we've been seeing floating around of like what model do I use, how do I know what model to use -- it's just a matter of trying the model. If you get the output you like, you keep it. If you don't, you remove the output... and try a different model. If you want to go back and forth, think in thinking mode, use a reasoning model. If you want to not use up your tokens too much, Claude 3.5 is a really good one. I think even GPT-4o is a good one too, or like [OpenAI GPT-4.1] is a good option."