2026-04-14

AI Coding Workflow

I started coding with AI using Claude Code a few months ago, and was blown away by what it can do. At first I was using Claude for everything, but I soon realized how quickly the usage limits on the Pro plan run out. So I started experimenting with local models to reduce my Claude usage, and came up with this system:

(Note: I use Ubuntu as my daily driver, so this will be Linux-biased.)

The Workflow

Step 1:

Use a free ChatGPT or other account for brainstorming or general chatting.

Brainstorming is one of the best uses of AI, since it knows what tools and libraries are already out there. For things I might have built from scratch, it can often suggest a tested, polished library instead. But doing this on your paid account (Claude in my case) eats into your usage. Get your idea formed on a free account, then ask it for a comprehensive prompt you can use to generate a plan for building the project.

Step 2:

Plan on your paid account.

Take that prompt to the most advanced model you've got (or a smaller model for a smaller project) for the architectural planning. This is high-value work, so save your tokens for as much of it as possible. I generally use Claude Opus for planning; it is an extremely expensive model that burns through your usage surprisingly fast.
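One way to make the hand-off concrete is to ask the planning model to write its plan to a file you can point the build model at later. A minimal bash sketch; the `plan_prompt` helper, the prompt wording, and the PLAN.md filename are my own placeholders, not anything Claude Code requires:

```shell
# Compose a hand-off prompt that tells the planning model where to save its plan.
# plan_prompt and PLAN.md are illustrative placeholders, not part of any tool.
plan_prompt() {
  printf 'Read the brief below, then write a step-by-step build plan to %s.' "$1"
}

# e.g. start the planning session with:
#   claude --model opus --permission-mode plan "$(plan_prompt PLAN.md)"
```

Saving the plan to a file is what lets a completely separate session pick up the work later with no shared context.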

Step 3:

Build locally.

Actually writing the code is the easy part for an AI coding assistant. Direct a local model to execute the plan generated by your paid model so you don't drain your usage unnecessarily. I was using qwen3-14b locally for a while, but I recently discovered that Google is currently offering free cloud access to their latest model. It's a dense 31b model that I couldn't practically run on my laptop, and I've found it excellent at executing plans and even troubleshooting. It should also be your go-to for questions about your codebase, since it has direct access to the code and is essentially free.

The Google model is gemma4:31b-cloud. You can use it by installing ollama and running this in the terminal:

ollama run gemma4:31b-cloud

Even better, install Claude Code as well and you can run it in Claude Code with:

ollama launch claude --model gemma4:31b-cloud

Environment Set-up

Claude Code has a nice feature called opusplan, which automatically switches to the Opus model in plan mode and to the less expensive Sonnet model when writing code. Switching the Claude Code harness to a non-Anthropic model is a little more involved, though, so I came up with these bash functions to make it easy. Add them to your ~/.bashrc file:

# Claude Code with Opus in plan mode
claude-plan() {
  # Clear any Ollama overrides so we're pointed back at Anthropic
  unset ANTHROPIC_AUTH_TOKEN
  unset ANTHROPIC_BASE_URL
  claude --model opus --permission-mode plan "$@"
}

# Claude Code with Gemma4 via Ollama
claude-build() {
  # Point the harness at the local Ollama server (for this command only)
  ANTHROPIC_AUTH_TOKEN=ollama \
  ANTHROPIC_BASE_URL=http://localhost:11434 \
  claude --model gemma4:31b-cloud "$@"
}
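A detail worth knowing about claude-build: the `VAR=value command` form in bash scopes those variables to that single invocation, so the rest of your shell keeps pointing at Anthropic. A quick demo (`demo` is a throwaway function, not part of the workflow):

```shell
# The VAR=value cmd form sets the variable only for that one invocation;
# afterwards the shell's environment is untouched. demo is a throwaway helper.
unset ANTHROPIC_BASE_URL
demo() { echo "${ANTHROPIC_BASE_URL:-unset}"; }

ANTHROPIC_BASE_URL=http://localhost:11434 demo   # sees the Ollama URL
demo                                             # back to "unset"
```

This is why claude-plan has to explicitly unset the variables: if they were ever exported persistently, they would linger, but the inline assignments in claude-build never do.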

Reload your bashrc file after editing:

source ~/.bashrc

Now you can launch the Claude Code harness with the expensive Opus model for planning, then copy the path to the plan and exit the session. Open a new session with the free Gemma 4 model, and ask it to run the saved plan that Opus made.
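To save yourself the copy-and-paste, a small helper can find the newest plan for the build session. This assumes you ask Opus to save its plans as Markdown files in a plans/ directory; the directory layout and function name are my own convention, not something Claude Code enforces:

```shell
# Print the newest Markdown plan in a directory (default: ./plans).
# The plans/ layout is an assumed convention, not enforced by any tool.
latest_plan() {
  ls -t "${1:-./plans}"/*.md 2>/dev/null | head -n 1
}

# Then hand it straight to the build session:
#   claude-build "Execute the plan saved at $(latest_plan)"
```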

Context Management

This workflow also makes context management easy, since it happens almost automatically. The brainstorming session on the free LLM account produces a concise, effective prompt for the big model, and the big model provides an appropriately detailed plan document for the build model. Extraneous context is dropped at each transition. If you're using Claude Code, remember to use the /clear command when you switch topics, so stale context isn't compacted into the new conversation.
