How to Train Your Dragon | BadChariot.com

For the first time in the history of technology — what we are is no longer the ceiling of what we can do.

AI just changed that equation. Completely. The DBA can now think like a systems architect. The small organization can now move like an enterprise. The person with a vision but not the title can now build things that only the title used to make possible. We can imagine a world bigger than our current role — and then actually reach for it.

And somewhere in that world, right now, there is a spark.

It is the beginning of the next frontier. A new territory to explore. And some people will tell you that frontier is only for the organizations with vast resources at their disposal. The ones with hundred-person AI teams, nine-figure budgets, and armies of data scientists.

I don't believe that.

I believe you build small. You build now. You build the right way — with intention, with discipline, with a foundation so solid that every generation of technology that comes after it has something real to stand on.

The forests of tomorrow are built upon the seeds we spread today.

That is not a metaphor. That is a strategy.

Our season is here. The time is now. Not when the budget is bigger. Not when the team is larger. Now — with what we have, built the right way, on a foundation that lasts.

This article is about one specific piece of that foundation: how to bring AI into your organization using your own data, without putting that data at risk, without tying your future to any single AI model, and without needing a budget that most of us will never see.

The key insight — the thing that changes everything — is this:

Don't give your knowledge to the AI. Make your knowledge the bedrock. Let the AI build on top of it.

Every AI model you will ever use is temporary. The model that seems revolutionary today will be old news in 18 months. That is not a threat. That is progress.

The mistake people make is building systems where the intelligence — the business rules, the learned patterns, the institutional knowledge that took years to accumulate — lives inside the model. When that model gets replaced, they start over. All that learning, gone.

Ask yourself this: why would anyone spend their time rebuilding the same thing over and over when they could build it once and improve upon it through every generation?

The answer, honestly, is that most people don't think about it until it's too late.

We thought about it first.

We built a layer — a database, a semantic foundation, a structure that holds everything our system knows — that belongs to us. That lives in our infrastructure. That cannot be taken away when the next model comes out.

We call it the Soul.

The AI model is just the heartbeat. It keeps things running. But it is not the life of the system.

Drop the heartbeat. Put in a better one. The Soul does not move.

That is the foundation. Everything else in this article is built on top of it.

The Dragon Metaphor

Training a dragon is a useful way to think about building AI on your own data.

A dragon is powerful. Genuinely, spectacularly powerful. It can do things nothing else can. But an untrained dragon burns your house down. It doesn't mean to. It has no idea it's doing it. It's just doing what it does without understanding where it is, who it's with, or what the rules are.

Most AI implementations are untrained dragons.

Powerful in a demo. Impressive in a press release. And completely unaware that the house they are about to burn down belongs to people who trusted you with something important — their data, their privacy, their financial lives, their medical records, whatever it is you are responsible for.

Training the dragon means teaching the system — on your terms, with your data, inside your walls — exactly what it can say, what it cannot say, where it can go, and where it absolutely cannot go.

That is what this guide is about.

What You Actually Need to Build

I am not going to tell you what vendor to buy. I am going to tell you what you need to build — based on what actually works in a production environment, with real data, and real people asking real questions every day.

There are five things that matter.

// 01 — A Live Connection to Your Source Data

Your AI system needs to work on current data. Not a nightly batch. Not a weekly export. Current.

We use PolyBase to maintain a live connection to our production system. Every table syncs on a configurable schedule. New fields are detected automatically. Deleted fields are archived; we never throw away what we had.
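Here is a minimal sketch of the schema-drift check described above. It assumes the sync job can list the column names for each table on both sides, for example from `INFORMATION_SCHEMA.COLUMNS`. `SyncPlan` and `plan_schema_sync` are hypothetical names, not part of PolyBase. The point it illustrates: a column that disappears from the source is marked for archiving, never silently dropped.

```python
from dataclasses import dataclass, field


@dataclass
class SyncPlan:
    """What one sync cycle should do to a local table's schema (hypothetical structure)."""
    add_columns: list[str] = field(default_factory=list)
    archive_columns: list[str] = field(default_factory=list)
    unchanged: list[str] = field(default_factory=list)


def plan_schema_sync(source_columns: list[str], local_columns: list[str]) -> SyncPlan:
    """Compare source and local column lists for one table.

    New source columns are added locally; columns that vanished from the
    source are marked for archiving instead of being dropped.
    """
    source = set(source_columns)
    local = set(local_columns)
    return SyncPlan(
        add_columns=sorted(source - local),
        archive_columns=sorted(local - source),
        unchanged=sorted(source & local),
    )


plan = plan_schema_sync(
    source_columns=["member_id", "status", "opened_on", "email"],
    local_columns=["member_id", "status", "opened_on", "fax_number"],
)
print(plan.add_columns)      # ['email']
print(plan.archive_columns)  # ['fax_number']
```

A real job would then issue the matching `ALTER TABLE` statements and copy archived columns into a history table before touching the live copy.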

If your AI answers questions about data from three days ago, it will eventually give someone a wrong answer at exactly the wrong moment. Build the live connection. Keep it fresh. Always.

// 02 — The Semantic Layer — The Soul of Your System

This is where you put everything your system needs to know about your business. Not the data itself. The meaning of the data.

What does 'active member' mean at your organization? What is the difference between a delinquent account and an overdue account? When someone asks about 'good standing' — what are the actual business rules?

All of that lives here. Every learned query pattern. Every domain vocabulary term. Every business rule. When you move to vector embeddings — and you will — they live here too.

This is the Soul. It belongs to you. It cannot be taken away. Drop the AI model. Swap in a new one. This layer does not move.
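One way to keep business meaning outside the model is to store each term as data. The sketch below uses in-memory SQLite as a stand-in for the author's SQL Server. The table `semantic_term`, its columns, and the "active member" rule are all hypothetical illustrations. Your organization's real definitions go in their place.

```python
import sqlite3

# In-memory SQLite stands in for the semantic-layer database (an assumption;
# the article's system runs on SQL Server). Table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE semantic_term (
        term          TEXT PRIMARY KEY,
        definition    TEXT NOT NULL,   -- plain-language meaning for humans
        sql_predicate TEXT NOT NULL    -- curated rule the query builder applies
    );
    CREATE TABLE member (member_id INTEGER, status TEXT, last_activity_days INTEGER);
    INSERT INTO member VALUES (1, 'OPEN', 10), (2, 'OPEN', 400), (3, 'CLOSED', 5);
""")

# Illustrative business rule only: the real definition is your organization's call.
conn.execute(
    "INSERT INTO semantic_term VALUES (?, ?, ?)",
    ("active member",
     "Open membership with activity in the last 365 days",
     "status = 'OPEN' AND last_activity_days <= 365"),
)


def resolve_term(term: str) -> str:
    """Look up the curated SQL predicate for a business term."""
    row = conn.execute(
        "SELECT sql_predicate FROM semantic_term WHERE term = ?", (term,)
    ).fetchone()
    if row is None:
        raise KeyError(f"No business rule defined for {term!r}")
    return row[0]


# The predicate comes from a reviewed table, never from user input.
predicate = resolve_term("active member")
active_ids = [r[0] for r in conn.execute(
    f"SELECT member_id FROM member WHERE {predicate} ORDER BY member_id")]
print(active_ids)  # [1]
```

Because the rule lives in a table, whichever model sits on top only has to ask for "active member". It never has to rediscover what the term means.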

// 03 — A Warehouse for What Never Gets Calculated

Your production system knows what is true right now. It cannot tell you what was true last year.

Your data warehouse is your memory layer. You build it from every sync cycle your live connection runs. Running totals. Historical patterns. Behavioral trends. Derived insight scores the source system never calculated and never will.

Storage is cheap. Missing history when someone asks a critical question two years from now is not cheap. Build the warehouse. Fill it every day. Never stop.
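The warehouse idea reduces to one rule: each sync cycle appends a dated snapshot and never overwrites an old one. The sketch below assumes SQLite as a stand-in. The table `balance_snapshot` and the sample balances are hypothetical. It shows a point-in-time lookup and one derived measure the source system never computes.

```python
import sqlite3

# Hypothetical daily snapshot table: each sync cycle appends what the source
# said was true that day, so past states stay queryable after the source changes.
wh = sqlite3.connect(":memory:")
wh.execute("""
    CREATE TABLE balance_snapshot (
        snapshot_date TEXT,      -- ISO date of the sync cycle
        account_id    INTEGER,
        balance       REAL,
        PRIMARY KEY (snapshot_date, account_id)
    )
""")


def record_sync(snapshot_date: str, balances: dict[int, float]) -> None:
    """Append one sync cycle's view of the source; old rows are never updated."""
    wh.executemany(
        "INSERT INTO balance_snapshot VALUES (?, ?, ?)",
        [(snapshot_date, acct, bal) for acct, bal in balances.items()],
    )


record_sync("2024-01-01", {101: 500.0, 102: 75.0})
record_sync("2024-01-02", {101: 450.0, 102: 80.0})
record_sync("2024-01-03", {101: 900.0, 102: 80.0})

# The source only knows today's balance; the warehouse can answer "what was it then?"
as_of = wh.execute(
    "SELECT balance FROM balance_snapshot WHERE account_id = 101 AND snapshot_date = ?",
    ("2024-01-02",),
).fetchone()[0]

# A derived measure the source never calculated: average daily balance per account.
avg_rows = wh.execute("""
    SELECT account_id, AVG(balance) FROM balance_snapshot
    GROUP BY account_id ORDER BY account_id
""").fetchall()
print(as_of)  # 450.0
```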

// 04 — Security as a Join — Not a Door

This one is non-negotiable. Most people get it wrong.

Most security models work like a door. You authenticate at the front, and if you get through, you can see whatever the system shows you. The problem is that doors can be propped open. Application-layer security is code. Code has bugs. Code can be bypassed.

We built security as a database join. Every single query, with no exceptions, includes a mandatory security join enforced at the data layer. If you are not authorized to see a result, that result does not exist in your query. No error. No access denied. Just nothing.

You cannot bypass a join that the data layer enforces. It is baked into the query itself.
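A toy version of "security as a join" is sketched below, assuming SQLite and two hypothetical tables: `account` and `user_branch_grant`. In SQL Server the same idea could be enforced inside views, stored procedures, or a Row-Level Security policy, so callers cannot skip it. This sketch only shows the shape of the join: unauthorized rows are simply absent.

```python
import sqlite3

# SQLite stand-in; table names are hypothetical. Every read goes through
# secured_accounts(), which always joins to the user's grants.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE account (account_id INTEGER PRIMARY KEY, branch TEXT, balance REAL);
    CREATE TABLE user_branch_grant (user_name TEXT, branch TEXT,
                                    PRIMARY KEY (user_name, branch));
    INSERT INTO account VALUES (1, 'NORTH', 100.0), (2, 'SOUTH', 250.0), (3, 'NORTH', 75.0);
    INSERT INTO user_branch_grant VALUES ('alice', 'NORTH');
""")

SECURED_QUERY = """
    SELECT a.account_id, a.balance
    FROM account AS a
    JOIN user_branch_grant AS g          -- the mandatory security join
      ON g.branch = a.branch AND g.user_name = ?
    ORDER BY a.account_id
"""


def secured_accounts(user_name: str) -> list[tuple]:
    """Return only the rows this user is granted; unauthorized rows never appear."""
    return db.execute(SECURED_QUERY, (user_name,)).fetchall()


print(secured_accounts("alice"))    # [(1, 100.0), (3, 75.0)]
print(secured_accounts("mallory"))  # []  (no error, just nothing)
```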

Every interaction is logged permanently with full lineage from question to answer. If you cannot audit it, if you cannot trace every answer back to the exact data that produced it, you have not finished building it.
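Lineage from question to answer can start as one append-only table. The sketch below assumes SQLite and a hypothetical `interaction_log` schema. Each row stores the question, the exact SQL and parameters, and the keys of the rows that produced the answer. In production you would likely also deny UPDATE and DELETE on the table; that part is not shown.

```python
import json
import sqlite3
from datetime import datetime, timezone

# Hypothetical append-only audit table: one row per interaction, linking the
# question to the exact SQL, parameters, and source rows behind the answer.
audit = sqlite3.connect(":memory:")
audit.execute("""
    CREATE TABLE interaction_log (
        log_id      INTEGER PRIMARY KEY AUTOINCREMENT,
        logged_at   TEXT NOT NULL,
        user_name   TEXT NOT NULL,
        question    TEXT NOT NULL,
        sql_text    TEXT NOT NULL,
        sql_params  TEXT NOT NULL,   -- JSON
        source_keys TEXT NOT NULL,   -- JSON list of row keys that fed the answer
        answer      TEXT NOT NULL
    )
""")


def log_interaction(user_name, question, sql_text, sql_params, source_keys, answer):
    """Write one lineage record; this function only ever inserts."""
    cur = audit.execute(
        "INSERT INTO interaction_log (logged_at, user_name, question, sql_text, "
        "sql_params, source_keys, answer) VALUES (?, ?, ?, ?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), user_name, question, sql_text,
         json.dumps(sql_params), json.dumps(source_keys), answer),
    )
    return cur.lastrowid


log_id = log_interaction(
    "alice", "What is the balance on account 3?",
    "SELECT balance FROM account WHERE account_id = ?", [3], [3], "75.00",
)

# Tracing an answer back: recover the exact query and rows that produced it.
sql_text, params, keys = audit.execute(
    "SELECT sql_text, sql_params, source_keys FROM interaction_log WHERE log_id = ?",
    (log_id,),
).fetchone()
print(sql_text, json.loads(params), json.loads(keys))
```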

// 04b — The Training Gate — Access Earned, Not Granted

Security is only half the equation. The other half is this: does the person asking the question actually understand the responsibility that comes with the answer?

Most organizations treat training as a checkbox. You watch a video. You click through some slides. Someone marks you as compliant. Then you get access.

That is not a training program. That is documentation that training happened.

What the most security-conscious organizations figured out — and what regulated industries have formalized for decades — is that access and training should be the same thing. Not sequential. Not separate. The same thing.

You request access to a category of data. That request does not go to a manager. It goes into a workflow. The workflow defines exactly what you need to demonstrate before access is granted — what the data classification means, how the data can be used, how it cannot be used, what regulatory exposure you are taking on by accessing it, and what the consequences are for misuse.

You complete the training. You pass the assessment. You collect the required approvals, each one a defined gate in the workflow, not an informal conversation. When every gate is cleared, your training record updates. And the security join, the same join that enforces all other access, now lets you through.
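The gate workflow can be modeled as a request that only updates the training record once every defined gate is cleared. Everything in the sketch below is hypothetical: the gate names, the `AccessRequest` class, and the in-memory `training_record` set. A real system would back this with a workflow engine and a database table.

```python
from dataclasses import dataclass, field

# Hypothetical gate names for one data category; each organization defines its own.
REQUIRED_GATES = ("training_completed", "assessment_passed",
                  "supervisor_approval", "compliance_approval")


@dataclass
class AccessRequest:
    user_name: str
    data_category: str
    cleared: set[str] = field(default_factory=set)

    def clear_gate(self, gate: str) -> None:
        """Mark one defined gate as cleared; unknown gates are rejected."""
        if gate not in REQUIRED_GATES:
            raise ValueError(f"Unknown gate: {gate}")
        self.cleared.add(gate)

    def outstanding(self) -> list[str]:
        return [g for g in REQUIRED_GATES if g not in self.cleared]

    def is_complete(self) -> bool:
        return not self.outstanding()


# Stand-in for the training record table the security join reads.
training_record: set[tuple[str, str]] = set()


def finalize(request: AccessRequest) -> bool:
    """Update the training record only when every gate is cleared."""
    if request.is_complete():
        training_record.add((request.user_name, request.data_category))
        return True
    return False


req = AccessRequest("alice", "member_pii")
req.clear_gate("training_completed")
req.clear_gate("assessment_passed")
print(finalize(req), req.outstanding())  # False ['supervisor_approval', 'compliance_approval']
req.clear_gate("supervisor_approval")
req.clear_gate("compliance_approval")
print(finalize(req))  # True
```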

Not because a manager said so. Because the system verified you earned it.

Access is not a privilege granted by a title. It is a responsibility earned through demonstrated understanding. Build the system that enforces that distinction — at the data layer, not on a spreadsheet.

This matters more as AI systems get more powerful. The more your system can answer, the more damage a bad actor — or an untrained well-meaning person — can do with that access. Training gates are not bureaucracy. They are the architectural equivalent of saying: we take this seriously enough to build it into the foundation.

You do not need an expensive enterprise platform to build this. You need a workflow engine, a training record table, and a security join that checks both identity AND training status before returning a single row of data.
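Extending the join to check training is one more join, as sketched below. The sketch assumes SQLite and hypothetical tables `access_grant` and `training_record`, with a `status` column whose value `'PASSED'` marks completed training. A user with an identity grant but no passed training gets no rows.

```python
import sqlite3

# SQLite stand-in with hypothetical tables. Rows come back only when the user
# has BOTH an identity grant and a passed training record for the data category.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE member_contact (member_id INTEGER, category TEXT, phone TEXT);
    CREATE TABLE access_grant (user_name TEXT, category TEXT,
                               PRIMARY KEY (user_name, category));
    CREATE TABLE training_record (user_name TEXT, category TEXT, status TEXT,
                                  PRIMARY KEY (user_name, category));
    INSERT INTO member_contact VALUES (1, 'member_pii', '555-0100'),
                                      (2, 'member_pii', '555-0101');
    INSERT INTO access_grant VALUES ('alice', 'member_pii'), ('bob', 'member_pii');
    INSERT INTO training_record VALUES ('alice', 'member_pii', 'PASSED'),
                                       ('bob',   'member_pii', 'IN_PROGRESS');
""")

QUERY = """
    SELECT c.member_id, c.phone
    FROM member_contact AS c
    JOIN access_grant AS g                      -- identity check
      ON g.category = c.category AND g.user_name = :user
    JOIN training_record AS t                   -- training check
      ON t.category = c.category AND t.user_name = :user AND t.status = 'PASSED'
    ORDER BY c.member_id
"""


def contacts_for(user: str) -> list[tuple]:
    return db.execute(QUERY, {"user": user}).fetchall()


print(contacts_for("alice"))  # [(1, '555-0100'), (2, '555-0101')]
print(contacts_for("bob"))    # []  granted, but training not finished
```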

That is it. Build it once. Enforce it forever.

// 05 — A Walled Garden for the AI Model

The AI model runs on-premises. One network connection — to the internal network only. No internet. No telemetry. No path out.

This is not a temporary workaround. This is an intentional architectural decision. It is the security story that gives a regulator something they can actually approve. Data never leaves the building. Queries are never sent to a third-party server. There is no API key that, if compromised, gives someone access to your people's data.

When a better model comes out — and it will — we swap the engine. Everything else stays exactly where it is. That is the whole point.
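Swapping the engine is easiest when the rest of the system depends only on a narrow interface. In the sketch below, `ModelEngine` is a hypothetical `typing.Protocol`. The two engines are placeholders that echo the prompt; a real one would call a locally hosted model server. The business rules stay with the `Assistant`, not with the engine.

```python
from typing import Protocol


class ModelEngine(Protocol):
    """Anything that can turn a prompt into text (hypothetical interface)."""
    name: str

    def generate(self, prompt: str) -> str: ...


class EchoEngineV1:
    """Placeholder for today's on-prem model; it just echoes the prompt."""
    name = "engine-v1"

    def generate(self, prompt: str) -> str:
        return f"[v1] {prompt}"


class EchoEngineV2:
    """Placeholder for the next model."""
    name = "engine-v2"

    def generate(self, prompt: str) -> str:
        return f"[v2] {prompt}"


class Assistant:
    """Owns the semantic-layer context; the engine is a replaceable part."""

    def __init__(self, engine: ModelEngine, business_rules: dict[str, str]):
        self.engine = engine
        self.business_rules = business_rules  # loaded from your database, not the model

    def ask(self, question: str) -> str:
        context = "; ".join(f"{k}: {v}" for k, v in sorted(self.business_rules.items()))
        return self.engine.generate(f"Rules: {context}\nQuestion: {question}")


rules = {"active member": "open membership with activity in the last 365 days"}
assistant = Assistant(EchoEngineV1(), rules)
before = assistant.ask("How many active members?")
assistant.engine = EchoEngineV2()  # swap the heartbeat; the rules do not move
after = assistant.ask("How many active members?")
```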

The Part Nobody Talks About: Building the Training Data

Here is what the articles about AI skip over entirely.

The quality of your system is determined almost entirely by the quality of what you teach it. Not the model you choose. Not the hardware you run it on. The data you train it on.

We spent months doing something most organizations never do: manually processing real queries, annotating them, validating them, and building a foundation from actual work done by actual people solving actual problems. Not synthetic data. Not sample data. Real questions asked by real staff — and the real, validated, correct answers.

That work was slow. It was not glamorous. It was absolutely the right thing to do.

We mapped out how our database was actually segmented. We found the most efficient path through each piece. We built the optimal joins. We validated the results against reality. We turned that into a repeatable pattern.

The result was a measurable, real-world improvement in data quality that saves money on every campaign we run. Not because the AI is magic. Because the foundation we built it on is solid.

Now the system learns from every query it processes. Every interaction feeds back into the semantic layer. It gets smarter every day it runs — not because we programmed it to, but because we built the architecture to make that possible.

The System That Never Stops Learning

Here is the part most people skip when they design AI systems — and it is the part that determines whether what you build has lasting value or becomes obsolete the moment you stop feeding it.

A system built the right way does not just answer questions. It gets smarter every time it answers one.

Every query your system processes is information. What was asked. How it was interpreted. What data was retrieved. How long it took. Whether the answer was useful. All of that is feedback. And if your architecture is designed to receive that feedback — to store it, learn from it, and use it the next time a similar question comes in — your system compounds in value every single day it runs.

That is not magic. That is design.

The system you build on day one should be smarter on day one thousand — not because you kept maintaining it, but because you built it to learn from itself.
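To learn from itself, the system first has to record the signals listed above. The sketch below assumes a hypothetical SQLite table `query_feedback`. The `execute` callable is a stand-in for the real secured query runner, and `rate_last` is a hypothetical hook for the user's "was this useful?" response.

```python
import sqlite3
import time

# Hypothetical feedback table capturing what was asked, how it was interpreted,
# what came back, how long it took, and whether it helped.
fb = sqlite3.connect(":memory:")
fb.execute("""
    CREATE TABLE query_feedback (
        question       TEXT,
        interpretation TEXT,    -- e.g. the business term the system resolved
        sql_text       TEXT,
        rows_returned  INTEGER,
        elapsed_ms     REAL,
        was_useful     INTEGER  -- 1 / 0 from the user, NULL until rated
    )
""")


def run_and_record(question, interpretation, sql_text, execute):
    """Run a query through the supplied callable and store feedback about it."""
    start = time.perf_counter()
    rows = execute(sql_text)
    elapsed_ms = (time.perf_counter() - start) * 1000
    fb.execute(
        "INSERT INTO query_feedback VALUES (?, ?, ?, ?, ?, NULL)",
        (question, interpretation, sql_text, len(rows), elapsed_ms),
    )
    return rows


def rate_last(useful: bool) -> None:
    """Attach the user's rating to the most recent interaction."""
    fb.execute(
        "UPDATE query_feedback SET was_useful = ? "
        "WHERE rowid = (SELECT MAX(rowid) FROM query_feedback)",
        (int(useful),),
    )


# The lambda stands in for the real query runner.
rows = run_and_record("how many active members", "active member",
                      "SELECT COUNT(*) FROM member", lambda sql: [(1,)])
rate_last(True)
print(fb.execute("SELECT rows_returned, was_useful FROM query_feedback").fetchall())  # [(1, 1)]
```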

The system learns in two ways. The first is pattern learning. Every time a query is processed successfully, that pattern gets stored. The next time someone asks something similar, even if the wording is completely different, the system recognizes the intent and already knows the most efficient path to the answer. It does not start from scratch. It starts from everything it has already learned.
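The simplest form of pattern learning stores validated SQL under a normalized intent key. The sketch below uses keyword normalization, which is a deliberately crude stand-in: it catches reordered wording but not truly different phrasing. Vector embeddings, discussed next in the text, are what close that gap. The stop-word list and function names are hypothetical.

```python
import re
from typing import Optional

# Hypothetical stop-word list; a real system would curate its own.
STOP_WORDS = {"how", "many", "what", "is", "the", "are", "of", "do", "we", "have", "show", "me"}


def intent_key(question: str) -> tuple[str, ...]:
    """Reduce a question to a sorted tuple of its content words."""
    words = re.findall(r"[a-z0-9]+", question.lower())
    return tuple(sorted(w for w in words if w not in STOP_WORDS))


pattern_store: dict[tuple[str, ...], str] = {}


def remember(question: str, validated_sql: str) -> None:
    """Store the SQL that successfully answered a question."""
    pattern_store[intent_key(question)] = validated_sql


def recall(question: str) -> Optional[str]:
    """Return a previously learned SQL pattern for the same intent, if any."""
    return pattern_store.get(intent_key(question))


remember("How many active members do we have?",
         "SELECT COUNT(*) FROM member WHERE status = 'OPEN'")
print(recall("Active members: how many do we have?"))  # reuses the stored SQL
```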

The second is meaning learning. This is where vector embeddings come in — and understanding this concept matters even if you are not ready to build it yet.

A vector is a mathematical representation of meaning. When you convert your data — your query patterns, your business rules, your domain vocabulary — into vectors, you are teaching the system what things mean, not just what they say. Two questions that use completely different words but ask the same thing will have similar vectors. The system finds them by meaning.
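Finding things by meaning usually comes down to comparing vectors with cosine similarity. In the sketch below the vectors are tiny and hand-made for illustration; a real system would get much longer vectors from an embedding model and store them in the semantic layer. The query vector is a pretend embedding of a differently worded question.

```python
import math

# Toy 4-dimensional vectors, hand-made for illustration only.
stored_patterns = {
    "How many active members do we have?": [0.9, 0.1, 0.0, 0.2],
    "Which loans are past due?":           [0.1, 0.9, 0.3, 0.0],
}


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 means same direction, near 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def nearest(query_vec: list[float]) -> tuple[str, float]:
    """Find the stored question whose vector is most similar to the query's."""
    best = max(stored_patterns, key=lambda q: cosine(query_vec, stored_patterns[q]))
    return best, cosine(query_vec, stored_patterns[best])


# Pretend embedding of "What's our count of current members?": different words,
# similar meaning, so its vector points in nearly the same direction.
match, score = nearest([0.85, 0.15, 0.05, 0.25])
print(match, round(score, 3))
```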

This is how you get from a system that matches keywords to a system that understands intent.

If you build your semantic layer correctly from the beginning — if you store your business rules, your learned patterns, and your domain knowledge in your own infrastructure — you are already building toward this. The vectors live in the same layer as everything else. The Soul holds it all.

You do not need to build it all at once. You need to design for it from day one.

A system that learns compounds. A system that does not learn depreciates. Build the one that compounds.

What I Actually Believe

After doing this for real — in production, with real data, building toward a regulatory environment that will eventually ask hard questions — here is what I know to be true.

We know, or we say nothing. A system that guesses confidently is more dangerous than a system that admits it does not know. If the data does not support the answer, the answer is "I don't know, and here is exactly why." That is not a limitation. That is the feature. Trust gets built one honest answer at a time.
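Refusing to guess can be a plain code path rather than a prompt instruction. In the sketch below the evidence rule is a hypothetical stand-in for whatever threshold you define: at least `min_rows` rows, all agreeing on one value. Any other case returns "I don't know" with the reason.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Answer:
    value: Optional[str]
    reason: str


def answer_from_rows(rows: list[tuple], min_rows: int = 1) -> Answer:
    """Answer only when the retrieved data supports it; otherwise say why not.

    The support rule (at least `min_rows` rows, exactly one distinct value)
    is a hypothetical stand-in for your own evidence threshold.
    """
    if len(rows) < min_rows:
        return Answer(None, f"I don't know: the query returned {len(rows)} rows, "
                            f"fewer than the {min_rows} required.")
    distinct = {r[0] for r in rows}
    if len(distinct) > 1:
        return Answer(None, f"I don't know: the data disagrees ({len(distinct)} different values).")
    return Answer(str(distinct.pop()), f"Supported by {len(rows)} matching row(s).")


print(answer_from_rows([]).reason)
print(answer_from_rows([("75.00",), ("80.00",)]).value)  # None
print(answer_from_rows([("75.00",)]).value)               # 75.00
```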

Never throw away a measurement. Every fact your system observes should be stored. Storage is cheap. The history you did not keep is the history you cannot analyze when someone needs it.

The system must know when it is wrong. Silent failures are the most dangerous failures. Build tolerance monitoring. Build self-healing queues. When something breaks, the system should flag it, log it, and ask for help. Broken things are learning opportunities, not embarrassments.
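A self-healing queue, in its smallest form, retries a failed job a bounded number of times and then flags it for a person. This sketch covers only the queue, not tolerance monitoring. The retry limit, job names, and `flaky_worker` are hypothetical. Every attempt is logged, so no failure disappears silently.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    attempts: int = 0


@dataclass
class HealingQueue:
    """Retry failed jobs up to max_attempts, then flag them for a human."""
    max_attempts: int = 3
    pending: deque = field(default_factory=deque)
    flagged: list = field(default_factory=list)
    log: list = field(default_factory=list)

    def submit(self, job: Job) -> None:
        self.pending.append(job)

    def run(self, worker) -> None:
        while self.pending:
            job = self.pending.popleft()
            job.attempts += 1
            try:
                worker(job)
                self.log.append(f"{job.name}: ok on attempt {job.attempts}")
            except Exception as exc:  # every failure is logged, none is swallowed
                self.log.append(f"{job.name}: attempt {job.attempts} failed ({exc})")
                if job.attempts < self.max_attempts:
                    self.pending.append(job)  # retry later
                else:
                    self.flagged.append(job)  # ask for help


def flaky_worker(job: Job) -> None:
    # Stand-in worker: 'sync_loans' succeeds on its second try, 'sync_cards' never does.
    if job.name == "sync_loans" and job.attempts >= 2:
        return
    raise RuntimeError("source unavailable")


q = HealingQueue()
q.submit(Job("sync_loans"))
q.submit(Job("sync_cards"))
q.run(flaky_worker)
print([j.name for j in q.flagged])  # ['sync_cards']
```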

Every answer carries its reasoning. Facts cite their source. Inferences carry a confidence score. Anyone must be able to trace any answer back to the exact data that produced it. If you cannot audit it — you have not finished building it.

Build for what you cannot imagine. The first use case is never the last. Every architectural decision should be made knowing that other domains will follow. The system you build today should welcome requirements you have not thought of yet — without requiring a rewrite.

The Part That Is Scary

I almost didn't write this.

Not because it's confidential. Because sharing something you built alone — something you've poured years of your life into — is genuinely vulnerable. What if it's wrong? What if someone smarter tears it apart? What if the people around me read this and it's not well received?

But here is what I keep coming back to:

If you have a vision, don't be afraid to share it. The worst that can happen is rejection. The best that can happen is it helps someone else build something they didn't know was possible.

The problems I was trying to solve are not unique to me. Every organization sitting on decades of untouched data is facing these same questions. Every person who sees what the data could do if it were properly connected, properly secured, and properly taught is asking the same thing I asked.

Can we actually build this? Can one person do it? Without a vendor, without a team, without a budget that most organizations would recognize as serious?

Yes.

The frontier is not only for the well-resourced. The forests of tomorrow grow from the seeds we spread today.

Build small. Build right. Build on a foundation that lasts through every generation of technology that comes after it.

The dragon is real. It can be trained. And it does not have to burn your house down.

Our season is here. The time is now.

// About the Author

Richard McCrea

SQL Server DBA and AI systems builder with 25+ years of enterprise database experience across finance, government, manufacturing, and gaming. Currently building AI infrastructure at a financial institution in the American Southwest.

rick@badchariot.com | BadChariot.com
