
SYS.BLOG
Building Vibe Coder: A Journey into Private, Agentic AI Coding
The story behind Vibe Coder, a privacy-first CLI coding assistant powered by local LLMs via Ollama. Why local matters, how agentic AI changes coding, and the technical details.
Vibe Coder is a command-line coding assistant that runs entirely on local LLMs. No cloud, no API keys, no code leaving your machine. The source is on GitHub, and there's a project page with more details.
Why Build This
I wanted an AI coding assistant that didn't require sending my code to someone else's servers. Cloud-based tools like Claude and Cursor are good at what they do, but every query goes through their infrastructure. For proprietary work or personal projects, that's a real constraint.
Ollama makes it straightforward to run models like LLaMA and CodeLLaMA locally. I built Vibe Coder on top of it, not just as a Q&A wrapper, but as a tool that can write files, edit code, and execute commands directly from the terminal.
How It Works
Vibe Coder is written in Python (source on GitHub). The core architecture:
- LLM Interface: The `get_ollama_response` function talks to Ollama's API at `http://localhost:11434`, sending conversation history and streaming back responses.
- Prefix-Based Commands: Input is routed by prefix: `edit:`, `run:`, `create:`, `search:`. For example, `edit: [main.py] Add error handling` triggers `handle_edit_query`, which reads the file, sends it to the LLM with the instruction, and applies changes after confirmation.
- Multi-Step Plans: The `plan:` command asks the LLM to break a request into a JSON array of steps: create files, write code, run commands. Vibe Coder executes each step sequentially, prompting for approval before anything destructive.
- File Handling: `read_file_content` and `write_file_content` handle I/O with encoding detection via `chardet` and line-range support (e.g., `[file.py:10-20]`). The `generate_colored_diff` function uses `difflib` and `colorama` to show changes before they're written.
- Safety Controls: Command execution goes through `subprocess` with a whitelist of allowed commands (`python`, `git`, etc.) and blocks dangerous ones (`rm`). All operations are scoped to a configurable working directory. Every file write and command execution requires explicit user confirmation.
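The prefix routing is simple to picture. A minimal sketch, assuming one handler per prefix (the handler names and dispatch table here are illustrative, not Vibe Coder's exact internals):

```python
# Sketch of prefix-based command routing. Handler names are
# placeholders; the real tool dispatches to functions like
# handle_edit_query.

def handle_edit(args):
    return f"edit -> {args}"

def handle_run(args):
    return f"run -> {args}"

HANDLERS = {
    "edit:": handle_edit,
    "run:": handle_run,
}

def route(user_input):
    """Dispatch input to the handler whose prefix it starts with."""
    for prefix, handler in HANDLERS.items():
        if user_input.startswith(prefix):
            # Strip the prefix and pass the remainder to the handler
            return handler(user_input[len(prefix):].strip())
    return None  # no prefix: fall through to plain Q&A with the model

print(route("edit: [main.py] Add error handling"))
```

Unprefixed input falls through to the default conversational path, so the prefixes never get in the way of ordinary questions.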
Dependencies are minimal: `requests`, `beautifulsoup4` for web search scraping, `colorama` for terminal output, and `chardet` for encoding detection.
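The diff display is mostly standard library. A sketch of the idea, using raw ANSI escape codes instead of `colorama` so it runs with no dependencies (the real `generate_colored_diff` may format things differently):

```python
import difflib

GREEN, RED, RESET = "\033[92m", "\033[91m", "\033[0m"

def generate_colored_diff(old, new, path="file"):
    """Return a unified diff with added lines green, removed lines red.

    Sketch only: Vibe Coder's actual function uses colorama and may
    differ in details.
    """
    lines = difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile=f"a/{path}", tofile=f"b/{path}", lineterm="",
    )
    colored = []
    for line in lines:
        if line.startswith("+") and not line.startswith("+++"):
            colored.append(GREEN + line + RESET)
        elif line.startswith("-") and not line.startswith("---"):
            colored.append(RED + line + RESET)
        else:
            colored.append(line)
    return "\n".join(colored)

print(generate_colored_diff("x = 1\n", "x = 2\n", "main.py"))
```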
Local LLMs vs. Cloud Tools
Cloud-based assistants like Claude and Cursor use larger, faster models and require no local setup. In practice, their code generation is better. That's the honest tradeoff.
What you get with local execution: your code never leaves your machine. No data retention policies to read, no network dependency, no risk of a third party being breached with your source code in their logs. For proprietary or sensitive work, this matters.
What you give up: local models need real hardware. Expect 16GB+ RAM for useful model sizes, and response quality scales with what your machine can run. A 7B parameter model on a laptop doesn't match a frontier model behind an API. Vibe Coder works best when you have a machine that can comfortably run at least a 13B model.
What Actually Works Well
The `plan:` command is the most useful feature. Typing `plan: Set up a Flask web app with a database` produces a step-by-step execution plan (scaffold files, write boilerplate, install dependencies) and runs each step with your approval. For repetitive project setup, this saves real time.
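The plan loop amounts to: parse the model's JSON array, then confirm each step before executing it. A rough sketch, with an assumed step schema (`action`/`target`) that is not necessarily Vibe Coder's actual format:

```python
import json

def execute_plan(plan_json, confirm=input):
    """Run each step of an LLM-produced plan, asking approval first.

    Assumed step schema: {"action": ..., "target": ...}. The confirm
    callable defaults to input() so the user approves interactively.
    """
    steps = json.loads(plan_json)
    done = []
    for i, step in enumerate(steps, 1):
        prompt = f"Step {i}: {step['action']} {step['target']} -- ok? [y/N] "
        if confirm(prompt).strip().lower() != "y":
            print(f"Skipped: {step['action']} {step['target']}")
            continue
        # A real implementation would dispatch to the file and
        # command handlers here; this sketch just records the action.
        done.append(step["action"])
    return done

plan = ('[{"action": "create_file", "target": "app.py"},'
        ' {"action": "run", "target": "pip install flask"}]')
# Auto-approve every step for demonstration
print(execute_plan(plan, confirm=lambda _: "y"))
```

Taking `confirm` as a parameter keeps the approval gate testable while preserving the default interactive behavior.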
The `edit:` command also works well for targeted changes. Showing a colored diff before applying edits catches mistakes early, and the line-range syntax lets you point the model at specific code instead of dumping an entire file into context.
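Parsing a reference like `[file.py:10-20]` is one small regex. A sketch (the actual parser may be stricter about paths and ranges):

```python
import re

# Matches [path] or [path:start-end]; the path part stops at ':' or ']'
FILE_REF = re.compile(r"\[([^:\]]+)(?::(\d+)-(\d+))?\]")

def parse_file_ref(text):
    """Extract (path, start, end) from a reference like [file.py:10-20].

    Returns (path, None, None) when no line range is given, and None
    when the text contains no bracketed reference at all.
    """
    m = FILE_REF.search(text)
    if not m:
        return None
    path, start, end = m.group(1), m.group(2), m.group(3)
    return (path, int(start) if start else None, int(end) if end else None)

print(parse_file_ref("edit: [main.py:10-20] tighten the loop"))
```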
Limitations and What I'd Do Differently
Vibe Coder is a useful tool for specific workflows, but it has real limitations. The quality ceiling is set by whatever local model you run, and even good local models struggle with large codebases or subtle bugs that require broad context. The `plan:` command sometimes generates steps that don't quite fit together, and you end up manually correcting the output, which partly defeats the purpose.
If I were starting over, I'd add better context management. Right now the conversation history is simple: it sends recent messages to the model, but there's no summarization or smart truncation. Long sessions degrade in quality as the context window fills up. I'd also add a `test:` command for running unit tests and a `commit:` command for Git integration. Both are obvious gaps.
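For the truncation gap, one simple approach would be to keep the system prompt plus as many recent messages as fit a budget. A sketch, using character count as a rough stand-in for token count (a real version would use the model's tokenizer):

```python
def truncate_history(messages, budget=8000):
    """Keep the system message plus the newest messages within budget.

    Character length stands in for token count in this sketch; it
    illustrates the shape of the fix, not a production policy.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    kept, used = [], sum(len(m["content"]) for m in system)
    for msg in reversed(rest):  # walk from newest to oldest
        used += len(msg["content"])
        if used > budget:
            break
        kept.append(msg)
    return system + list(reversed(kept))
```

Dropping the oldest turns first preserves the system prompt and the recent exchange, which is usually what the model actually needs.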
The code is on GitHub. If you try it, I'd be interested in hearing what breaks.