I’m a practicing physician who builds production AI systems — and ships them at the pace of an engineering team by working with frontier models. I’d spent a year waiting on vendors to deliver the AI agent I wanted; when it didn’t come, I built it myself in Claude Code — live within days. Four months later the whole practice runs on systems I build and own. This is the work.
I don’t have an engineering department. I have frontier AI and a disciplined way of directing it — and that’s exactly what I help other practices set up.
Most practices buy AI as a black box from an agency or a SaaS vendor and rent it forever. I take the opposite approach: I build the systems in-house and own them — the patient-facing agents, the EMR and CRM integrations, the operations automation — using frontier models as my engineering team. I learned why this matters the hard way: I’d paid vendors for over a year to build my AI and waited, then built it myself in a fraction of the time.
The edge isn’t the models; everyone has those. It’s the harness: typed contracts at every boundary, kill switches on every risky feature, adversarial review of my own work, and verification against primary sources instead of taking the model’s word for it. That discipline is what turns one physician into a shipping team — and it’s what I bring to a practice that wants to actually own its AI.
I did what every practice owner is told to do: hire the experts and wait. Here’s the actual sequence — from my own messages — that ended with me building it myself. It’s exactly why I now build in-house, and why I help other practices skip the wait.
Eleven case studies — each a system in production at my practice today, built between February 16 and June 16, 2026. Ordered by how impressive I think each one is; number one is the one I’m proudest of.
The problemPatients text at all hours; staff can’t keep up, and a missed message is a missed surgery.
I built a production LLM agent — not a chatbot — that replies 24/7 in English and Spanish and books and reschedules patients directly in the EMR. It runs an Opus 4.8 tool-calling loop behind a five-guard safety contract and a correction pass, so it can take irreversible clinical actions over text without saying the wrong thing. Live for months.
The problemThe practice ran on a patchwork of tools and a clunky commercial EMR surface.
I built one app the whole office runs on — patient lookup, scheduling, EMR booking, billing follow-up, clinical notes, consents, and front-desk call handling across ~25 tabs. From zero to a live, production foundation in days (105 commits in the first 72 hours), now past its hundredth release. It’s the operational backbone, not a demo.
The problemOperations don’t scale when there’s one of you.
I run ~97 production automations — SMS agents, call intelligence, CRM task automation, marketing dashboards, monitoring — on real engineering governance: every workflow is generated by a versioned builder script (the builder is the source of truth), with kill switches, dead-letter queues, and drift detection. A 24/7 ops platform run by one person.
The problemOur conversions collapsed 92% and the outside agency couldn’t explain why.
I ran a 100-day forensic recovery from live API data — repeatedly proving the docs and the agency wrong — and restored the account’s biddable conversions. Then I built an hourly pipeline that maps real booked-revenue back into Google’s bidding, so the algorithm finally learns what an actual patient is worth, not just a form fill.
The problemEvery phone call left zero trace in the CRM.
I built a pipeline that turns each call into structured intelligence within minutes — it resolves the caller, enriches the transcript with Claude Sonnet 4.6 (summary, objections, sentiment, next steps), writes a rich CRM note and ~16 properties, and feeds fourteen downstream systems. Live since late February.
The problemThe EMR exposes no public scheduling API — you can’t just “book an appointment.”
I reverse-engineered real booking into a closed EMR over FHIR — appointment creation with conflict detection and free-slot search, closed-day inference, location mapping, and the surgical roster read from the EMR as source of truth — behind a tested integration layer. On the CRM side, HubSpot is the system of record. Nearly everything above stands on this.
The problemDocumentation eats clinical time — and iOS won’t let a web app keep the mic alive in the background.
I built an iPad recorder that defeats the iOS background-mic limit (rotating decodable audio segments and reassembling them at the end), transcribes the visit in Spanish or English with speaker labels, writes a clean note, and files it to the CRM — with a hard wrong-chart guard after a real near-miss.
The problemBuilding fast with AI risks shipping plausible-but-wrong work.
So I built tooling to catch myself: a 16-agent bug hunter, a three-model plan jury that red-teams a design before I build it, and a self-grading quality harness fenced by deterministic checks that refuses to hide its own worst failures. This is the machinery that lets me move fast and stay correct — and it’s the methodology I’d bring to a client.
The problemTurning a paper patient sheet into a correctly-staged CRM deal was slow, manual, and error-prone.
I built a photo-to-deal tool wired right into the coordinator’s Practice OS — a coordinator snaps a photo, GPT Vision extracts the data, the AI maps it to the correct pipeline and stage, and a deal exists in about thirty seconds with no typing. One of the most demo-able things in the whole stack.
The problemStaff clear the board whatever way is fastest — not by whether the patient was actually reached.
I built an engine that treats “Mark done” as a claim, not a close, and verifies it against systems no one can fake — a real connected phone call, an actual EMR booking, or an opt-out — then finalizes, reopens, or escalates. It’s adversarial design applied to human behavior, built to survive turnover. My most original idea here, enforced since June 14.
The problemDictation tools are clunky and cloud-locked — and I wanted one I controlled.
I shipped a signed, native macOS dictation app — a different modality from everything else here. A Swift helper, built and code-signed through CI: hold a key, speak, and clean text lands wherever the cursor is, at ~2.4-second latency. A daily driver, and the desktop sibling of the Clinical Scribe.
A living log of what I shipped, anchored to real version history. It starts here — and it keeps going.
Anyone can call a frontier model. Shipping safe, production systems with one person comes down to a handful of disciplines — the ones I’d set up for any practice I work with.
I fact-check the model’s output against code, version history, and live systems — never the prose. This very portfolio was put through an adversarial review and corrected where it was wrong.
I put structure at the boundaries so a model physically can’t leak its reasoning — or a wrong date — into a patient’s text. Correctness is enforced, not requested.
Every risky feature ships behind a flag and a safe default. Nothing I deploy is irreversible, so I can move fast without betting the practice on it.
Automations are generated from versioned scripts, not hand-edited in a console. The system can always be rebuilt, diffed, and audited — which is how one person safely runs ~97 of them.
I help practices stand up the same kind of systems you see here — patient-facing AI, EMR and CRM integration, and operations automation — built and owned in-house, not rented from an agency. If you want your practice to run on AI you actually control, let’s talk.
Four months ago none of this existed. One person, paired with the right model and the right discipline, can now build at the scale of a team.