I build the AI that runs a medical practice.

I’m a practicing physician who builds production AI systems — and ships them at the pace of an engineering team by working with frontier models. I’d spent a year waiting on vendors to deliver the AI agent I wanted; when it didn’t come, I built it myself in Claude Code — live within days. Four months later the whole practice runs on systems I build and own. This is the work.

I don’t have an engineering department. I have frontier AI and a disciplined way of directing it — and that’s exactly what I help other practices set up.

How I’m different

One operator, team-scale output.

Most practices buy AI as a black box from an agency or a SaaS vendor and rent it forever. I take the opposite approach: I build the systems in-house and own them — the patient-facing agents, the EMR and CRM integrations, the operations automation — using frontier models as my engineering team. I learned why this matters the hard way: I’d paid vendors for over a year to build my AI and waited, then built it myself in a fraction of the time.

The edge isn’t the models; everyone has those. It’s the harness: typed contracts at every boundary, kill switches on every risky feature, adversarial review of my own work, and verification against primary sources instead of taking the model’s word for it. That discipline is what turns one physician into a shipping team — and it’s what I bring to a practice that wants to actually own its AI.

The Origin

I asked them to build it. They didn’t.

I did what every practice owner is told to do: hire the experts and wait. Here’s the actual sequence — from my own messages — that ended with me building it myself. It’s exactly why I now build in-house, and why I help other practices skip the wait.

Sep 2025

The first agent didn’t work.

I’d hired vendors to bring AI into the practice. I shut one of their agents off — it was no good — and asked a simpler question: could we just build an AI that texts patients back and books them?

Dec 2025

I called it the year of AI agents.

I handed the agency a roadmap for 2026. At the very top: deploy AI agents for SMS and website chat. This was going to be the year.

Early Feb 2026

Still waiting — so I started myself.

Months in, I spelled out the real problem: we’re too slow to answer texts, and an AI agent could fix it. I admitted I’d already begun building it in Claude Code. The reply I got was, “Let me have a look.”

Feb 12–13

“This is the dream.”

I made the deck and wrote the build spec for the agent I wanted, and sent it over. That was the last thing I handed off.

Feb 14, 2026

I stopped waiting.

The SMS agent went live — confirming real patient consults in English and Spanish. I’d built it myself.

Feb 16, 2026

Everything else followed.

Two days later I opened the repository that became all of this. The agency never built it. I did — and now I help other practices skip the wait.

The receipts — from my own messages (names blurred)

Me → the agency · Dec 23, 2025 · 2026 roadmap

“Deploy AI agents for SMS and website chat.”

Me → the agency · Feb 10, 2026

“It would be great if AI could be used like an agent to respond to texts and book consults. I’ve been playing with the idea of building this through Claude Code.”

The agency · Feb 10, 2026

“We definitely need to create some AI agents for these kind of task. Let me have a look.”

My SMS agent — live · Feb 14, 2026

“¡Perfecto, [paciente]! Su consulta con el Dr. Leva está confirmada.”

The agency · Mar 16, 2026

“We don’t have lot’s of works to do now as you are doing most of the work your self.”

Case Studies · First Four Months

Eleven systems, all live.

Eleven case studies — each a system in production at my practice today, built between February 16 and June 16, 2026. Ordered by how impressive I think each one is; number one is the one I’m proudest of.

Bilingual AI Patient-Texting Agent · Live

The problem: Patients text at all hours; staff can’t keep up, and a missed message is a missed surgery.

I built a production LLM agent — not a chatbot — that replies 24/7 in English and Spanish and books and reschedules patients directly in the EMR. It runs an Opus 4.8 tool-calling loop behind a five-guard safety contract and a correction pass, so it can take irreversible clinical actions over text without saying the wrong thing. Live for months.

Opus 4.8 · 12 tools | EN + ES | books in the EMR | avg 6.71/10 over ~2,053 graded days

The Practice OS · Live

The problem: The practice ran on a patchwork of tools and a clunky commercial EMR surface.

I built one app the whole office runs on — patient lookup, scheduling, EMR booking, billing follow-up, clinical notes, consents, and front-desk call handling across ~25 tabs. From zero to a live, production foundation in days (105 commits in the first 72 hours), now past its hundredth release. It’s the operational backbone, not a demo.

Next.js / TypeScript | ~25 tabs · 106 API routes | ~100 tests · kill switches | live in days

The Automation Fleet · Live

The problem: Operations don’t scale when there’s one of you.

I run ~97 production automations — SMS agents, call intelligence, CRM task automation, marketing dashboards, monitoring — on real engineering governance: every workflow is generated by a versioned builder script (the builder is the source of truth), with kill switches, dead-letter queues, and drift detection. A 24/7 ops platform run by one person.

~97 live workflows | 40+ builder scripts | self-recovering | HubSpot · Ads · Aircall · EMR

Google Ads Recovery + Real-Dollar Bidding · Live

The problem: Our conversions collapsed 92% and the outside agency couldn’t explain why.

I ran a 100-day forensic recovery from live API data, repeatedly proving the docs and the agency wrong, and restored the account’s biddable conversions. Then I built an hourly pipeline that maps real booked revenue back into Google’s bidding, so the algorithm finally learns what an actual patient is worth, not just a form fill.

−92% → ramped recovery | 27 conversions restored | real-$ → Smart Bidding | dedup + circuit breaker
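The "dedup + circuit breaker" pairing can be sketched in a few lines of TypeScript. This is an illustration under stated assumptions, not the production pipeline: `Conversion`, `ConversionUploader`, the injected `send` function, and the failure threshold are all hypothetical, and the upload call is synchronous here only to keep the sketch short.

```typescript
// Hypothetical shapes for illustration; the real pipeline and upload call are not shown.
interface Conversion {
  id: string;       // stable key per booked conversion (assumed)
  valueUsd: number; // booked revenue attributed to it
}

class ConversionUploader {
  private readonly sent = new Set<string>();
  private consecutiveFailures = 0;
  private readonly send: (c: Conversion) => boolean;
  private readonly maxFailures: number;

  // `send` stands in for the real upload call and returns true on success.
  constructor(send: (c: Conversion) => boolean, maxFailures = 3) {
    this.send = send;
    this.maxFailures = maxFailures;
  }

  // The breaker is open once too many uploads in a row have failed.
  get open(): boolean {
    return this.consecutiveFailures >= this.maxFailures;
  }

  // Closing the breaker is a deliberate, manual step.
  reset(): void {
    this.consecutiveFailures = 0;
  }

  upload(batch: Conversion[]): { uploaded: number; skipped: number; failed: number } {
    let uploaded = 0;
    let skipped = 0;
    let failed = 0;
    for (const c of batch) {
      // Skip anything already sent (dedup) and everything while the breaker is open.
      if (this.open || this.sent.has(c.id)) {
        skipped++;
        continue;
      }
      if (this.send(c)) {
        this.sent.add(c.id);
        this.consecutiveFailures = 0;
        uploaded++;
      } else {
        this.consecutiveFailures++;
        failed++;
      }
    }
    return { uploaded, skipped, failed };
  }
}
```

Dedup keeps a repeated hourly run from reporting the same revenue twice, and the breaker stops a failing upload path from hammering the API until someone has looked at it.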

Call Intelligence Hub · Live

The problem: Every phone call left zero trace in the CRM.

I built a pipeline that turns each call into structured intelligence within minutes — it resolves the caller, enriches the transcript with Claude Sonnet 4.6 (summary, objections, sentiment, next steps), writes a rich CRM note and ~16 properties, and feeds fourteen downstream systems. Live since late February.

Sonnet 4.6 enrichment | call → CRM in minutes | ~16 fields written | 14 consumers

The Integration Spine (EMR + CRM) · Live

The problem: The EMR exposes no public scheduling API, so you can’t just “book an appointment.”

I reverse-engineered real booking into a closed EMR over FHIR — appointment creation with conflict detection and free-slot search, closed-day inference, location mapping, and the surgical roster read from the EMR as source of truth — behind a tested integration layer. On the CRM side, HubSpot is the system of record. Nearly everything above stands on this.

FHIR · real EMR booking | no API → reverse-engineered | CRM = system of record | offline-conversion upload
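Conflict detection and free-slot search come down to interval arithmetic. The TypeScript below is a minimal sketch of that idea only: `Slot`, `overlaps`, and `findFreeSlots` are hypothetical names, times are plain minutes since midnight, and reading the booked appointments from the EMR over FHIR is not shown.

```typescript
// Hypothetical types and helpers for illustration; not the production integration layer.
interface Slot {
  start: number; // minutes since midnight
  end: number;   // exclusive
}

// Two half-open intervals conflict when each one starts before the other ends.
function overlaps(a: Slot, b: Slot): boolean {
  return a.start < b.end && b.start < a.end;
}

// Walk the day in fixed steps and keep every candidate that conflicts with nothing booked.
function findFreeSlots(
  booked: Slot[],
  dayStart: number,
  dayEnd: number,
  length: number,
  step = 15,
): Slot[] {
  const free: Slot[] = [];
  for (let start = dayStart; start + length <= dayEnd; start += step) {
    const candidate: Slot = { start, end: start + length };
    if (!booked.some((b) => overlaps(candidate, b))) {
      free.push(candidate);
    }
  }
  return free;
}
```

Treating the end of a slot as exclusive means an appointment that ends at 10:00 does not block one that starts at 10:00.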

Clinical Scribe · Live

The problem: Documentation eats clinical time, and iOS won’t let a web app keep the mic alive in the background.

I built an iPad recorder that defeats the iOS background-mic limit (rotating decodable audio segments and reassembling them at the end), transcribes the visit in Spanish or English with speaker labels, writes a clean note, and files it to the CRM — with a hard wrong-chart guard after a real near-miss.

iPad-first | beats iOS background-mic | ES / EN · diarized | note in ~20–51s

My AI Tooling Harness · Live

The problem: Building fast with AI risks shipping plausible-but-wrong work.

So I built tooling to catch myself: a 16-agent bug hunter, a three-model plan jury that red-teams a design before I build it, and a self-grading quality harness fenced by deterministic checks that refuses to hide its own worst failures. This is the machinery that lets me move fast and stay correct — and it’s the methodology I’d bring to a client.

Bug Hunter · 16 agents | Model Jury · 3 models | nightly grading harness | ~10 custom skills

DealSnap · Live

The problem: Turning a paper patient sheet into a correctly-staged CRM deal was slow, manual, and error-prone.

I built a photo-to-deal tool wired right into the coordinator’s Practice OS — a coordinator snaps a photo, GPT Vision extracts the data, the AI maps it to the correct pipeline and stage, and a deal exists in about thirty seconds with no typing. One of the most demo-able things in the whole stack.

built into the coordinator OS | photo → GPT Vision → deal | right pipeline + stage | ~30 seconds

Lead Integrity (Verified-Close) · Live

The problem: Staff clear the board whichever way is fastest, whether or not the patient was actually reached.

I built an engine that treats “Mark done” as a claim, not a close, and verifies it against systems no one can fake — a real connected phone call, an actual EMR booking, or an opt-out — then finalizes, reopens, or escalates. It’s adversarial design applied to human behavior, built to survive turnover. My most original idea here, enforced since June 14.

verifies vs phone + EMR | reconciliation cron | off / observe / enforce | turnover-proof
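The "claim, not a close" idea can be modeled as a small decision function. The TypeScript below is a sketch under assumptions, not the engine itself: the type and function names are hypothetical, the reading of the three modes (off skips the check, observe only records the outcome, enforce applies it) is assumed, and so is the rule that repeated unverified claims escalate.

```typescript
// Hypothetical model of the verify step; names and evidence fields are assumptions.
type Mode = "off" | "observe" | "enforce";
type Outcome = "finalize" | "reopen" | "escalate";

interface Evidence {
  connectedCall: boolean; // the phone system shows a connected call
  emrBooking: boolean;    // the EMR shows an actual booking
  optedOut: boolean;      // the patient opted out
}

interface Decision {
  outcome: Outcome;
  applied: boolean; // false in observe mode, where the outcome is only recorded
}

// Returns null when the engine is off, so "Mark done" closes the task as before.
function reviewClaim(
  mode: Mode,
  evidence: Evidence,
  priorReopens: number,
  maxReopens = 2, // assumed threshold before a human is pulled in
): Decision | null {
  if (mode === "off") return null;
  const verified = evidence.connectedCall || evidence.emrBooking || evidence.optedOut;
  let outcome: Outcome;
  if (verified) {
    outcome = "finalize";
  } else if (priorReopens >= maxReopens) {
    outcome = "escalate";
  } else {
    outcome = "reopen";
  }
  return { outcome, applied: mode === "enforce" };
}
```

Running in observe mode first gives a record of what enforcement would have done before it starts reopening anyone's tasks.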

ScribeSnap: native Mac app · Live

The problem: Dictation tools are clunky and cloud-locked, and I wanted one I controlled.

I shipped a signed, native macOS dictation app — a different modality from everything else here. A Swift helper, built and code-signed through CI: hold a key, speak, and clean text lands wherever the cursor is, at ~2.4-second latency. A daily driver, and the desktop sibling of the Clinical Scribe.

signed macOS app | CI-built + signed release | hold-to-dictate (⌥Space) | ~2.4s latency

Track Record

Month by month.

A living log of what I shipped, anchored to real version history. It starts here — and it keeps going.

Feb 2026

Foundations & the fire

After a year of vendors promising an AI agent — and one I’d already shut off as “no good” — I stopped waiting and built it myself in Claude Code. The SMS agent went live confirming consults (EN + ES) on Feb 14, two days before I even opened the repo.
Stood up the core repo Feb 16; a ~92% Google Ads collapse added urgency and started a 100-day recovery.
Got the SMS agent self-rating; began the call-intelligence pipeline and front-desk task automation.

Mar 2026

Ship month

Took the SMS agent’s current decision-service architecture live; added self-improvement packages.
Rebuilt lead scoring, a 40-field conversation object, and a 39-campaign drip system; decomposed a 14,000-line dashboard.
Planned and began the QBO accounting migration; shipped the call-intelligence refresh and a Slack alert fabric.

Apr 2026

Classifier redesign & research loops

Rebuilt SMS intent classification — 138 regex rules to an AI cascade, ~30× cheaper and ~500ms faster.
Drafted a voice-agent design; planned and sent a bilingual promo campaign; brought content + research pipelines online.
Drained a 2,628-deal backlog to 22 in a single day of stabilization.

May 2026

The Practice OS is born

Migrated the codebase to git-only sync and stood up a secrets vault.
Built a true-ROAS aggregator feeding real booked revenue back to Meta; shipped the ScribeSnap Mac app.
Created the Practice OS on May 29 — foundation live within the week.

Jun 2026

It all comes together

Moved the SMS agent to Opus 4.8; shipped the Clinical Scribe, the Front-Desk Live Console, and tablet Consents.
Ran an in-house legal dispute-protection overhaul; enforced the Lead Integrity engine.
Started new ventures: an agentic-investing repo and a personal Health OS.

— the record continues —

By the Numbers

Four months, counted.

11 live systems

~2,400 tracked commits

~97 production automations

AI model families

120 days, solo + AI

How I Build

The method is the edge.

Anyone can call a frontier model. Shipping safe, production systems with one person comes down to a handful of disciplines — the ones I’d set up for any practice I work with.

Verify against primary sources

I fact-check the model’s output against code, version history, and live systems — never the prose. This very portfolio was put through an adversarial review and corrected where it was wrong.

Typed contracts over hope

I put structure at the boundaries so a model physically can’t leak its reasoning — or a wrong date — into a patient’s text. Correctness is enforced, not requested.
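One way to picture a typed contract at the model boundary is a parser that either returns a fully validated message or throws. The TypeScript below is a minimal sketch, not the production contract: `OutboundText`, `parseOutbound`, and the field names are hypothetical, and the date check assumes the booked date comes from the EMR, not from the model.

```typescript
// Hypothetical contract; field names are assumptions for illustration.
interface OutboundText {
  language: "en" | "es";
  body: string;
  appointmentDate: string; // ISO date, YYYY-MM-DD
}

const ALLOWED_KEYS = ["language", "body", "appointmentDate"];

// Returns a typed message, or throws so nothing unvalidated reaches a patient.
function parseOutbound(raw: string, bookedDate: string): OutboundText {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    throw new Error("model output is not a JSON object");
  }
  const record = data as Record<string, unknown>;

  // Any field outside the contract (for example a "reasoning" key) is rejected outright.
  const extra = Object.keys(record).filter((k) => !ALLOWED_KEYS.includes(k));
  if (extra.length > 0) throw new Error(`unexpected fields: ${extra.join(", ")}`);

  const language = record.language === "es" ? "es" : record.language === "en" ? "en" : null;
  if (language === null) throw new Error("unsupported language");

  const body = record.body;
  if (typeof body !== "string" || body.trim() === "") throw new Error("empty body");

  // The date must match the booking on file; the model's version is never trusted alone.
  const appointmentDate = record.appointmentDate;
  if (typeof appointmentDate !== "string" || appointmentDate !== bookedDate) {
    throw new Error("date does not match the booking on file");
  }
  return { language, body, appointmentDate };
}
```

Because the only way to get an `OutboundText` is through the parser, the sending code never has to ask the model to behave; malformed output simply has no path to a patient.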

Kill switches on everything

Every risky feature ships behind a flag and a safe default. Any feature I deploy can be switched off, so I can move fast without betting the practice on it.
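A flag with a safe default might look like the TypeScript sketch below. The names (`FlagStore`, `isEnabled`, `guarded`) and the `"on"` convention are assumptions for illustration, as is the choice to fall back when the risky path throws.

```typescript
// Hypothetical flag reader; the flag store and feature names are assumptions.
type FlagStore = Record<string, string | undefined>;

// A feature is on only when its flag is exactly "on"; missing or malformed values mean off.
function isEnabled(flags: FlagStore, feature: string): boolean {
  return flags[feature] === "on";
}

// Run the risky path only when enabled, and fall back if the flag is off or the path throws.
function guarded<T>(
  flags: FlagStore,
  feature: string,
  risky: () => T,
  fallback: () => T,
): T {
  if (!isEnabled(flags, feature)) return fallback();
  try {
    return risky();
  } catch {
    return fallback();
  }
}
```

For example, `guarded(flags, "autoBook", bookDirectly, handOffToStaff)` would book directly only while the flag reads `"on"`; deleting the flag is the kill switch, and the hand-off path is what runs by default.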

The builder is the source of truth

Automations are generated from versioned scripts, not hand-edited in a console. The system can always be rebuilt, diffed, and audited — which is how one person safely runs ~97 of them.
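Drift detection follows naturally once the builder output is the reference: compare what the builder would generate with what is actually deployed. The TypeScript below is a sketch of that comparison only; the `Workflow` shape, `canonical`, and `detectDrift` are hypothetical, and fetching the deployed definitions from the automation platform is not shown.

```typescript
// Hypothetical workflow shape; the real builder scripts and platform API are not shown.
interface Workflow {
  name: string;
  steps: string[];
  enabled: boolean;
}

// Serialize with sorted keys so the same definition always produces the same string.
function canonical(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((v) => canonical(v)).join(",")}]`;
  }
  if (typeof value === "object" && value !== null) {
    const obj = value as Record<string, unknown>;
    const parts = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonical(obj[k])}`);
    return `{${parts.join(",")}}`;
  }
  return JSON.stringify(value);
}

// Names of workflows whose deployed copy is missing or differs from the builder's output.
function detectDrift(built: Workflow[], deployed: Workflow[]): string[] {
  const live = new Map<string, string>();
  for (const w of deployed) live.set(w.name, canonical(w));
  return built.filter((w) => live.get(w.name) !== canonical(w)).map((w) => w.name);
}
```

Anything this reports was edited by hand in a console or never deployed, and in both cases the remedy is the same: rerun the builder.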

Work With Me

What I can do for your practice.

I help practices stand up the same kind of systems you see here — patient-facing AI, EMR and CRM integration, and operations automation — built and owned in-house, not rented from an agency. If you want your practice to run on AI you actually control, let’s talk.

See how to work with me — offers & pricing →

Jean-Paul Leva · Leva Medical · drleva@drleva.com · Queens & Northport, NY

Four months ago none of this existed. One person, paired with the right model and the right discipline, can now build at the scale of a team.
