
SYS.BLOG
Building Vibe Coder: A Journey into Private, Agentic AI Coding
The story behind Vibe Coder, a privacy-first CLI coding assistant powered by local LLMs via Ollama. Why local matters, how agentic AI changes coding, and the technical details.
Vibe Coder is a command-line coding assistant that runs entirely on local LLMs. No cloud, no API keys, no code leaving your machine. The source is on GitHub, and there's a project page with more details.
Why Build This
I wanted an AI coding assistant that didn't require sending my code to someone else's servers. Cloud-based tools like Claude and Cursor are good at what they do, but every query goes through their infrastructure. For proprietary work or personal projects, that's a real constraint.
Ollama makes it straightforward to run models like LLaMA and CodeLLaMA locally. I built Vibe Coder on top of it, not just as a Q&A wrapper, but as a tool that can write files, edit code, and execute commands directly from the terminal.
How It Works
Vibe Coder is written in Python (source on GitHub). The core architecture:
- LLM Interface: The `get_ollama_response` function talks to Ollama's API at `http://localhost:11434`, sending conversation history and streaming back responses.
- Prefix-Based Commands: Input is routed by prefix: `edit:`, `run:`, `create:`, `search:`. For example, `edit: [main.py] Add error handling` triggers `handle_edit_query`, which reads the file, sends it to the LLM with the instruction, and applies changes after confirmation.
- Multi-Step Plans: The `plan:` command asks the LLM to break a request into a JSON array of steps: create files, write code, run commands. Vibe Coder executes each step sequentially, prompting for approval before anything destructive.
- File Handling: `read_file_content` and `write_file_content` handle I/O with encoding detection via `chardet` and line-range support (e.g., `[file.py:10-20]`). The `generate_colored_diff` function uses `difflib` and `colorama` to show changes before they're written.
- Safety Controls: Command execution goes through `subprocess` with a whitelist of allowed commands (`python`, `git`, etc.) and blocks dangerous ones (`rm`). All operations are scoped to a configurable working directory. Every file write and command execution requires explicit user confirmation.
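The prefix routing is simple to picture. A minimal sketch, assuming one handler per prefix (the handler names and dispatch table here are illustrative, not Vibe Coder's exact internals):

```python
# Sketch of prefix-based command routing. Handler names are
# placeholders; the real tool dispatches to functions like
# handle_edit_query.

def handle_edit(args):
    return f"edit -> {args}"

def handle_run(args):
    return f"run -> {args}"

HANDLERS = {
    "edit:": handle_edit,
    "run:": handle_run,
}

def route(user_input):
    """Dispatch input to the handler whose prefix it starts with."""
    for prefix, handler in HANDLERS.items():
        if user_input.startswith(prefix):
            # Strip the prefix and pass the remainder to the handler
            return handler(user_input[len(prefix):].strip())
    return None  # no prefix: fall through to plain Q&A with the model

print(route("edit: [main.py] Add error handling"))
```

Unprefixed input falls through to the default conversational path, so the prefixes never get in the way of ordinary questions.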
Dependencies are minimal: `requests`, `beautifulsoup4` for web search scraping, `colorama` for terminal output, and `chardet` for encoding detection.
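The diff display is mostly standard library. A sketch of the idea, using raw ANSI escape codes instead of `colorama` so it runs with no dependencies (the real `generate_colored_diff` may format things differently):

```python
import difflib

GREEN, RED, RESET = "\033[92m", "\033[91m", "\033[0m"

def generate_colored_diff(old, new, path="file"):
    """Return a unified diff with added lines green, removed lines red.

    Sketch only: Vibe Coder's actual function uses colorama and may
    differ in details.
    """
    lines = difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile=f"a/{path}", tofile=f"b/{path}", lineterm="",
    )
    colored = []
    for line in lines:
        if line.startswith("+") and not line.startswith("+++"):
            colored.append(GREEN + line + RESET)
        elif line.startswith("-") and not line.startswith("---"):
            colored.append(RED + line + RESET)
        else:
            colored.append(line)
    return "\n".join(colored)

print(generate_colored_diff("x = 1\n", "x = 2\n", "main.py"))
```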
Local LLMs vs. Cloud Tools
Cloud-based assistants like Claude and Cursor use larger, faster models and require no local setup. In practice, their code generation is better. That's the honest tradeoff.
What you get with local execution: your code never leaves your machine. No data retention policies to read, no network dependency, no risk of a third party being breached with your source code in their logs. For proprietary or sensitive work, this matters.
What you give up: local models need real hardware. Expect 16GB+ RAM for useful model sizes, and response quality scales with what your machine can run. A 7B parameter model on a laptop doesn't match a frontier model behind an API. Vibe Coder works best when you have a machine that can comfortably run at least a 13B model.
What Actually Works Well
The `plan:` command is the most useful feature. Typing `plan: Set up a Flask web app with a database` produces a step-by-step execution plan (scaffold files, write boilerplate, install dependencies) and runs each step with your approval. For repetitive project setup, this saves real time.
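The plan loop amounts to: parse the model's JSON array, then confirm each step before executing it. A rough sketch, with an assumed step schema (`action`/`target`) that is not necessarily Vibe Coder's actual format:

```python
import json

def execute_plan(plan_json, confirm=input):
    """Run each step of an LLM-produced plan, asking approval first.

    Assumed step schema: {"action": ..., "target": ...}. The confirm
    callable defaults to input() so the user approves interactively.
    """
    steps = json.loads(plan_json)
    done = []
    for i, step in enumerate(steps, 1):
        prompt = f"Step {i}: {step['action']} {step['target']} -- ok? [y/N] "
        if confirm(prompt).strip().lower() != "y":
            print(f"Skipped: {step['action']} {step['target']}")
            continue
        # A real implementation would dispatch to the file and
        # command handlers here; this sketch just records the action.
        done.append(step["action"])
    return done

plan = ('[{"action": "create_file", "target": "app.py"},'
        ' {"action": "run", "target": "pip install flask"}]')
# Auto-approve every step for demonstration
print(execute_plan(plan, confirm=lambda _: "y"))
```

Taking `confirm` as a parameter keeps the approval gate testable while preserving the default interactive behavior.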
The `edit:` command also works well for targeted changes. Showing a colored diff before applying edits catches mistakes early, and the line-range syntax lets you point the model at specific code instead of dumping an entire file into context.
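Parsing a reference like `[file.py:10-20]` is one small regex. A sketch (the actual parser may be stricter about paths and ranges):

```python
import re

# Matches [path] or [path:start-end]; the path part stops at ':' or ']'
FILE_REF = re.compile(r"\[([^:\]]+)(?::(\d+)-(\d+))?\]")

def parse_file_ref(text):
    """Extract (path, start, end) from a reference like [file.py:10-20].

    Returns (path, None, None) when no line range is given, and None
    when the text contains no bracketed reference at all.
    """
    m = FILE_REF.search(text)
    if not m:
        return None
    path, start, end = m.group(1), m.group(2), m.group(3)
    return (path, int(start) if start else None, int(end) if end else None)

print(parse_file_ref("edit: [main.py:10-20] tighten the loop"))
```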
Limitations and What I'd Do Differently
Vibe Coder is a useful tool for specific workflows, but it has real limitations. The quality ceiling is set by whatever local model you run, and even good local models struggle with large codebases or subtle bugs that require broad context. The `plan:` command sometimes generates steps that don't quite fit together, and you end up manually correcting the output, which partly defeats the purpose.
If I were starting over, I'd add better context management. Right now the conversation history is simple: it sends recent messages to the model, but there's no summarization or smart truncation. Long sessions degrade in quality as the context window fills up. I'd also add a `test:` command for running unit tests and a `commit:` command for Git integration. Both are obvious gaps.
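For the truncation gap, one simple approach would be to keep the system prompt plus as many recent messages as fit a budget. A sketch, using character count as a rough stand-in for token count (a real version would use the model's tokenizer):

```python
def truncate_history(messages, budget=8000):
    """Keep the system message plus the newest messages within budget.

    Character length stands in for token count in this sketch; it
    illustrates the shape of the fix, not a production policy.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    kept, used = [], sum(len(m["content"]) for m in system)
    for msg in reversed(rest):  # walk from newest to oldest
        used += len(msg["content"])
        if used > budget:
            break
        kept.append(msg)
    return system + list(reversed(kept))
```

Dropping the oldest turns first preserves the system prompt and the recent exchange, which is usually what the model actually needs.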
The code is on GitHub. If you try it, I'd be interested in hearing what breaks.