SJ Waller | Space Software Sound

Abstract

Software increasingly treats the model as an external dependency: loaded through an SDK, invoked through an API, orchestrated by a framework layered on a language whose semantics know nothing about it. The language sees a network call returning a string. This paper asks a different question: what would a language look like if inference, memory, embeddings and agents were primitives — first-class citizens of the type system and runtime, as functions, strings and arrays are today? It develops the question around Witchcraft, a concept proposal for an AI-native compiled language (Waller, 2025) built on a counterfactual: what if large-scale AI had existed before programming languages were designed?

That counterfactual cuts against the proposal if taken literally: an intelligence-first world would have no high-level language at all, only weights, embeddings and direct binary, because high-level languages exist for humans. So the paper reframes: Witchcraft is not the language AIs would build for themselves but a human-authored language that makes AI a first-class primitive, for the collaboration layer where a person must still read, constrain and answer for what an intelligence does. It proposes four candidate primitives, offers a discriminator for telling a genuine primitive from a sugared library, works an example in which inference is the computation rather than a call within it, and turns its own demand for rigour onto its own primitives. The contribution is methodological: an account of what would make a language native to intelligence while remaining legible to its human authors.

1. Introduction

The phrase "AI-native programming language" is now common and almost always imprecise. Usually it means one of two things: a language that models happen to generate well, or a language with good bindings to an inference SDK. Neither is native in any deep sense. JavaScript with a copilot is still JavaScript. Python with an agent framework is still Python. A network call to a model is not a new computational primitive; it is a side-effecting function that returns text, indistinguishable to the compiler from any other.

Much current practice in 2025–26 reflects this. The dominant response to model-mediated software has been protocols and libraries — standards for tool and data access, agent harnesses, retrieval pipelines, orchestration frameworks — all sitting around unchanged languages. This is useful engineering. It is also, structurally, the same move every time: wrap the model in a library and call it from a language that does not know the model exists.

A genuinely AI-native language would begin elsewhere — from the premise that a growing class of software interleaves deterministic code with inference, retrieval, semantic memory, model-mediated judgement and autonomous action, and that a language built for that world would surface these in its type system, compiler and runtime rather than quarantining them in libraries. This is a paradigm-level reframing in Kuhn's (1962) sense: the assumption that a language is a deterministic symbol-manipulator with the model living outside it defines which questions can be asked, and "use AI in a program" can then only mean "call a model from a program." Suspending that assumption is what the Witchcraft counterfactual is for.

The paper uses Witchcraft as a concrete probe — a named, versioned proposal (Waller, 2025) — not a claim that it should be built or that its syntax is right. And one clarification governs everything that follows: the counterfactual does not imply a language written by AIs for AIs. That world has no high-level language at all (§9.1); high-level languages exist for human comprehension, and removing the human removes the reason the language exists. Witchcraft sits deliberately on the other side of that line — a human-authored language in which AI is a first-class primitive, for the collaboration layer where a person must still read, constrain and answer for what an intelligence does. The primitives serve that reader.

2. Terminology and Scope

AI-assisted describes the current mainstream: a model helps produce or transform code that is otherwise conventional, leaving no trace in the program's semantics.
AI-native describes the target: a language whose formal semantics include model-mediated operations, such that the compiler and runtime know which values are inferred, with what uncertainty, from what context, under what policy.
Primitive is used in the programming-language sense: a construct the language provides directly — with its own type, syntax and runtime support — rather than one assembled from a library. Integers, strings, arrays and functions are primitives in most languages; files and sockets usually are not, though some languages elevate them. The paper's question is whether inference, memory, embeddings and agents belong in the first category.
Model-as-value names the treatment of a trained model as a first-class value that can be bound, passed and invoked, rather than a resource reached through a handle.

Witchcraft is a concept proposal (Waller, 2025) for a compiled AI-native language. Its vocabulary is deliberately occult — oracle for a model, familiar for an agent, sigil for a mutable reference, ritual for a function, glyph for text, and a runtime named WitchCore — and §7 examines that choice rather than adopting it uncritically. These are exploratory design-layer concepts, not a finished specification, and the paper treats them as material to be tested, not endorsed.

The scope is language and runtime design, not a survey of tooling and not a specification. Code fragments are illustrative of principles, drawn from or extending the Witchcraft v0.1 proposal; none is proposed as final syntax.

3. Where This Sits: Relation to Existing Work

The instinct — that something deserves first-class language support which is currently handled by libraries — is old, and honesty requires positioning it against stronger, more formal programmes.

Probabilistic programming languages (Goodman and Stuhlmüller, 2014; Ge, Xu and Ghahramani, 2018; Fjelde et al., 2025). This is the most developed body of work in which inference is made part of the programming model rather than an external workflow. Turing.jl, Church, Stan and Pyro embed distributions, conditioning and inference directly in the language. The relation here is debt and extension: PPLs solved the analogous problem for statistical inference, and any AI-native language should inherit their type-system insights rather than reinvent them. The extension claimed is to a messier class of operation — natural-language context, prompt construction, semantic memory, hallucination, explanation, autonomy — that the PPL tradition was not built to address. Witchcraft's oracle is, in effect, a bet that the PPL move generalises from distributions to models.

Effect systems and effect handlers (programming-language theory). Treating a model call as a first-class effect — distinct from pure computation because it consumes external state, may leak information, and may act — is a direct application of existing research. No novelty is claimed in the machinery; only that model calls are a category of effect worth distinguishing, because they carry uncertainty and provenance ordinary effects do not.

The "everything is a primitive" lineage in language design. Languages periodically elevate what a prior generation treated as a library: Lisp made the list primitive (McCarthy, 1960); later languages made the string, the hash map, the coroutine, the async task first-class. Each elevation was justified when a pattern became so pervasive that library-level treatment created more friction than it removed. The paper's empirical bet is that inference, memory, embeddings and agents have reached that threshold. That bet may be wrong — §10 takes the objection seriously — but it is the right kind of argument, made many times before.

Domain-specific and "agentic" frameworks (current practice). The large ecosystem wrapping models in libraries and protocols is the paper's foil, not its ally: orchestration around a language is not nativeness in it, and the persistence of the framework approach is evidence the language layer has not caught up, not evidence it need not.

Stated contribution. This is not new programming-language theory and not a working language. The claim is that a recurring industry phrase — "AI-native" — almost always denotes better libraries around unchanged languages, and that it earns meaning only when inference, memory, embeddings and agents become primitives with their own types, semantics and runtime support. If the proposal reduces to "use a probabilistic language plus a vector database plus an agent library," then it has identified nothing new, and §10.2 requires saying so.

4. Primitive or Sugar? A Discriminator

When a language elevates something to a primitive, the elevation is frequently cosmetic: a keyword wrapping a library call, syntax sugar that compiles to exactly what the programmer could have written by hand. The same oracle keyword could denote a genuine primitive or a thin alias for an HTTP client. If the distinction is to do work, the cosmetic case must be separable from the substantive one.

A construct is more likely to be a genuine primitive than a sugared library to the degree that:

(1) The type system treats it specially. A model-as-value has a type the compiler reasons about (what it consumes, what it may emit, what uncertainty it carries) rather than an opaque handle of type Object. If oracle is just Client, nothing native has happened.
(2) It changes what is statically checkable. A genuine primitive lets the compiler catch errors that, in a library, would surface only at runtime: an inferred value used authoritatively without a confidence discharge, a memory accessed outside its declared scope, an embedding compared across incompatible spaces.
(3) The runtime is built around it. A primitive has first-class runtime support: Witchcraft's WitchCore manages model loading, attention-cache, context persistence, embedding operations and agent scheduling as core services, not as user-space libraries the language is unaware of.
(4) It composes with the rest of the language. A primitive interoperates with the language's existing constructs (an oracle's output flows through the type system, an embedding is a value like any vector, a familiar is schedulable like any task) rather than living in a walled-off SDK idiom.
(5) Removing it would force every program to rebuild it. The test of a true primitive: its absence is felt everywhere. If most non-trivial programs in the target domain reimplement the same scaffolding (a retry-and-validate loop around inference, a scoped memory store, an embedding-similarity helper), that scaffolding is a primitive the language is missing.

A construct satisfying none of these is sugar: a nicer way to spell a library call, with no semantic gain. A construct satisfying most is doing something a conventional language cannot express. This converts "first-class" from a marketing word into a checklist; it is fallible — a language could pass it and still be unpleasant — but it tells you whether the nativeness claim refers to anything real.

5. The Minimum-Nativeness Principle

The minimum condition for nativeness is that the language's opacity about the model ends: that it knows it is dealing with a model and can reason about what makes a model unlike an ordinary function — output that is uncertain, non-reproducible across versions, dependent on retrieved context, and potentially consequential. Nativeness is not a keyword; it is the compiler having something true to say about the construct.

From that condition, four candidate primitives follow. Each is currently a library concern; the proposal is that each has become pervasive enough to deserve elevation.

5.1 The model as a value (`oracle`)

In current practice a model is reached through a client object — a handle to a remote or local service. The native move is to make the model itself a value: bindable, passable, invocable, with a type the compiler understands.

oracle muse = summon "llama3"
let response = muse.invoke("Explain quantum computing")

The gain is not the brevity; it is that muse has a model type, not a client type. The compiler can know that muse.invoke performs inference (an effect, per §3), that its result is uncertain and must be handled as such, and that the call has provenance worth tracking. A model-as-value can be passed to a function, stored in a structure, swapped for another model of compatible type — and the type system can enforce that an authoritative decision is not made on a raw, undischarged inference result. None of this is available when the model is an opaque client; all of it is the ordinary consequence of the model being a first-class value.
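The idea of an inference result that cannot be used authoritatively until it is discharged can be approximated in a conventional language. The following is a minimal sketch in Python, standing in for Witchcraft, which has no implementation. The names Inferred, Oracle, discharge and LowConfidence are assumptions made for illustration, the confidence figure is canned, and the check happens at runtime, whereas the proposal intends a compile-time rule.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar("T")


class LowConfidence(Exception):
    """Raised when an inferred value fails its confidence gate."""


@dataclass(frozen=True)
class Inferred(Generic[T]):
    """An inference result that keeps its uncertainty and origin attached."""
    _value: T
    confidence: float   # assumed to be reported by the model runtime
    model: str          # provenance: which model produced the value

    def discharge(self, min_confidence: float) -> T:
        """Return the raw value only if it clears the confidence threshold."""
        if self.confidence < min_confidence:
            raise LowConfidence(
                f"{self.model}: {self.confidence:.2f} < {min_confidence:.2f}"
            )
        return self._value


class Oracle:
    """Hypothetical stand-in for a model value; invoke() returns canned output."""

    def __init__(self, name: str) -> None:
        self.name = name

    def invoke(self, prompt: str) -> Inferred[str]:
        # A real runtime would run inference here; this stub fakes a result.
        return Inferred(f"answer to: {prompt}", confidence=0.62, model=self.name)
```

Here Oracle("llama3").invoke(...) hands back an Inferred[str], and the caller reaches the text only through discharge. Python cannot stop a determined caller from reading _value directly, which is the gap a native type rule would be meant to close.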

5.2 Semantic memory (`memory`)

Memory in model-mediated systems is currently an accidental architecture: a vector database bolted to the side of an application, accessed through queries the language does not understand. The native move is to make memory a governed runtime resource with declared scope, retention, retrieval policy and access rules.

memory customer_context:
  scope tenant
  retention 18 months
  retrieval semantic + recency
  audit required

As a primitive, memory becomes something the compiler and runtime can police: a read outside the declared scope is a type error, not a silent data leak; retention and audit are enforced by the runtime rather than hoped for in application code. The point is not that this is impossible with a library — it is that, as a library, the governance depends entirely on discipline, and is absent exactly when it matters most. A memory primitive makes the governance structural.
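A library can imitate part of this governance, which helps show what the primitive would add. The sketch below is Python, not Witchcraft, and every name in it (ScopedMemory, ScopeViolation, audit_log) is hypothetical. It enforces scope, retention and audit at runtime, so a violation surfaces when the read executes rather than when the program is compiled.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


class ScopeViolation(Exception):
    """Raised when a read names a tenant other than the store's own."""


@dataclass
class ScopedMemory:
    tenant: str                                   # declared scope
    retention: timedelta                          # declared retention window
    _items: list = field(default_factory=list)    # (timestamp, text) pairs
    audit_log: list = field(default_factory=list)

    def write(self, text: str, now: datetime) -> None:
        self._items.append((now, text))

    def read(self, tenant: str, now: datetime) -> list:
        # Every read attempt is recorded before it is allowed or refused.
        self.audit_log.append((now, tenant))
        if tenant != self.tenant:
            raise ScopeViolation(f"{tenant!r} cannot read {self.tenant!r} memory")
        cutoff = now - self.retention
        return [text for stamp, text in self._items if stamp >= cutoff]
```

The sketch still depends on every caller going through read, which is the discipline problem described above: nothing prevents application code from touching _items directly.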

5.3 The embedding (`embedding`)

Embeddings are already a basic unit of model-mediated software — the representation in which similarity, retrieval and clustering happen — yet no mainstream language treats them as a type. They are arrays of floats the compiler cannot distinguish from any other array, which means the most common embedding bug (comparing vectors from incompatible models or spaces) is invisible until it produces silently wrong results.

embedding q = muse.embed("payment failed")
let hits = customer_context.nearest(q, k: 5)

As a primitive, an embedding carries its space as part of its type: the compiler can refuse a similarity comparison between embeddings of different provenance, the way a strong type system refuses to add a length to a duration. This is a small, concrete example of nativeness earning its place — a class of runtime error converted into a compile-time one, which is exactly what elevating a library type to a language type is for.
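A rough Python approximation of an embedding that carries its space is sketched here. The names Embedding, SpaceMismatch and cosine are assumptions for illustration, and the space label is a plain string. The mismatch is detected at runtime, whereas the proposal would have the compiler reject it.

```python
import math
from dataclasses import dataclass


class SpaceMismatch(Exception):
    """Raised when two embeddings from different spaces are compared."""


@dataclass(frozen=True)
class Embedding:
    space: str      # hypothetical label, e.g. the embedding model and version
    vector: tuple


def cosine(a: Embedding, b: Embedding) -> float:
    """Cosine similarity, refused unless both embeddings share a space."""
    if a.space != b.space:
        raise SpaceMismatch(f"{a.space} vs {b.space}")
    if len(a.vector) != len(b.vector):
        raise ValueError("vectors differ in length")
    dot = sum(x * y for x, y in zip(a.vector, b.vector))
    norm = math.sqrt(sum(x * x for x in a.vector)) * math.sqrt(
        sum(y * y for y in b.vector)
    )
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm
```

With bare float lists the same comparison would return a number without complaint, which is the silent failure described above.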

5.4 The agent (`familiar`)

The most speculative primitive is the autonomous, persistent, goal-directed process — the agent, which Witchcraft names a familiar. In current practice an agent is a loop in application code wrapped around model calls and tools. The native move is to make the agent a first-class schedulable entity with declared capabilities, permissions and lifecycle.

familiar observer()
  whilst true
    monitor()
  end
end

This is the primitive that most needs the discipline of §4 and the guardrails of §10, because it is where "AI-native" most easily becomes "untestable autonomy." A defensible familiar is not a free-running intelligence; it is a bounded process whose permissions, tools and stopping conditions are declared and enforced by the runtime (WitchCore's agent scheduler), so that what it may see, do and persist is part of the program's checkable structure rather than an emergent property of a prompt. The primitive earns its place only if it makes agent behaviour more legible than a hand-rolled loop, not less.
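One way to picture a bounded familiar is as a loop whose permissions and stopping condition are data the runtime checks, not conventions inside a prompt. The Python sketch below is illustrative only: Familiar, PermissionDenied, permits and max_steps are hypothetical names, and the "plan" is a plain iterable of (capability, argument) pairs standing in for whatever an agent would decide to do next.

```python
from dataclasses import dataclass, field
from typing import Iterable


class PermissionDenied(Exception):
    """Raised when a step needs a capability the familiar was not granted."""


@dataclass
class Familiar:
    permits: frozenset          # declared capabilities
    max_steps: int              # declared stopping condition
    log: list = field(default_factory=list)

    def run(self, plan: Iterable, tools: dict) -> str:
        """Execute (capability, argument) steps until done, denied or out of budget."""
        for step, (capability, arg) in enumerate(plan):
            if step >= self.max_steps:
                return "halted: step budget exhausted"
            if capability not in self.permits:
                raise PermissionDenied(capability)
            self.log.append((capability, tools[capability](arg)))
        return "finished"
```

Even an endless plan, the analogue of whilst true, stops after max_steps, and a step outside permits is refused before its tool is called.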

5.5 Collapsing the proposal's own primitives

The method requires examining these primitives rather than exempting them, and two do not survive cleanly. First, "the model as a value" assumes a model is one kind of thing the type system can name. A deterministic-at-zero-temperature classifier and a frontier model with tool access differ so much that a single oracle type may be too coarse to be honest, collapsing distinctions as badly as the opaque client it replaces. The principle, applied to itself, reaches not one clean model-type but a family of effects whose correct granularity is unresolved: the proposal risks the very failure §4 warns against, a type that tells the compiler nothing it can use. Second, the familiar may not be a primitive at all but a composite: an orchestration of the other three plus tools. If so, elevating it is a category error, because it is a composition of existing units rather than a new one. The paper does not resolve this; it flags that its most exciting-sounding primitive is its least defensible, and that "agent" may name a pattern, not a primitive.

6. The Method: Designing a Primitive

If inference, memory, embeddings and agents are candidate primitives, the question is how one would responsibly elevate a library concern to a language construct. The following procedure is proposed, with the empirical-contact requirement built in at the end.

6.1 Designing a primitive: the procedure

Five questions decide whether a candidate primitive is real.

(1) Is the scaffold pervasive? The retry-and-validate loop, the scoped store and the similarity helper are scaffolds that nearly every program rewrites (§4, point 5).
(2) Can the compiler reason about its type? For an oracle, that it infers and carries uncertainty; for memory, its scope; for an embedding, its space. If the type adds nothing checkable, the elevation is cosmetic.
(3) What becomes statically checkable? Name the class of runtime error it converts to a compile error (§4, point 2); if none, stop, it is sugar.
(4) What runtime support is required? Model loading, attention-cache, scheduling; a primitive without runtime support is a keyword (§4, point 3).
(5) Does the metaphor pay here? oracle names something new; whilst themes something old and costs legibility (§7).

6.2 Test the primitive against fault injection

A mandatory discipline: a primitive earns its place only if a program using it fails more safely, or audits more readily, than the same program written with a library. Build a task twice — once with library calls, once with native primitives — then inject faults: a model version change, a low-confidence output, a cross-space embedding comparison, a memory access outside scope, an agent exceeding its permissions. The native version should contain these — the version change flagged by provenance, the low-confidence output blocked by type, the cross-space comparison rejected at compile time, the scope violation refused, the agent halted — where the library version fails silently or at runtime. If the two fail identically, the primitives are decoration. This paper specifies the test rather than reporting it, so the claim is falsifiable rather than merely attractive.
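The shape of such a test can be sketched for one of the listed faults, the cross-space embedding comparison. The Python below is an illustration, not a reported experiment: library_similarity, native_similarity, Tagged and outcome are hypothetical names, the vectors are made up, and "contained" here means only that the faulty call was refused at runtime.

```python
from dataclasses import dataclass


def library_similarity(a: list, b: list) -> float:
    """Library style: bare float lists, no record of which space they came from."""
    return sum(x * y for x, y in zip(a, b))


@dataclass(frozen=True)
class Tagged:
    space: str
    vector: tuple


def native_similarity(a: Tagged, b: Tagged) -> float:
    """Native style: the space travels with the value and is checked before use."""
    if a.space != b.space:
        raise TypeError(f"cross-space comparison: {a.space} vs {b.space}")
    return sum(x * y for x, y in zip(a.vector, b.vector))


def outcome(run) -> str:
    """Classify one faulty run as contained (refused) or silent (a value came back)."""
    try:
        run()
    except Exception:
        return "contained"
    return "silent"


def inject_cross_space_fault() -> dict:
    # Fault: the query and the stored document were embedded by different models.
    q, doc = (0.1, 0.9), (0.8, 0.2)
    return {
        "library": outcome(lambda: library_similarity(list(q), list(doc))),
        "native": outcome(
            lambda: native_similarity(Tagged("model-a", q), Tagged("model-b", doc))
        ),
    }
```

Calling inject_cross_space_fault() returns {"library": "silent", "native": "contained"}. A full harness would repeat this for each fault in the list and would count the primitive as decoration wherever both columns read the same.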

6.3 A worked example: inference as the computation, not a call within it

The four primitives are only interesting if they compose, and the composition only earns the title if inference becomes the computation itself — not a value fetched by hand-written control flow. This is the line between Witchcraft and a fancy prompt DSL, and it must be drawn explicitly, because the two look alike from a distance.

Consider what a conventional "AI-native" example usually does: a human writes the branching logic and calls a model for a sub-result.

urgency = classifier.invoke(msg)          # model fills a value
if urgency == "routine":                  # human wrote the algorithm
    draft = drafter.invoke(msg)
elif urgency == "critical":
    escalate(msg)

The model fills a hole; the reasoning — the routing, the decision structure — is hand-coded. That is orchestration. It is useful, and it is not the thesis: the AI is called by the computation, it is not doing it.

The Witchcraft version inverts this. The hard part of triage is a judgement no one can write as an algorithm: given a garbled, multi-issue, emotionally loaded message and a customer's messy history, what is actually being asked, how urgent is it really, and what is the right disposition? You cannot express that as if/elif. In Witchcraft it is a single typed inference region, a divine block, whose output type is the specification, and the model's job is to inhabit it:

oracle triage = summon "support-reasoner-v3"

memory tickets:
  scope tenant
  retention 24 months
  retrieval semantic + recency

# the decision the model must produce — this type IS the program logic
type Disposition = {
  issue:      one_of { Billing, Outage, HowTo, Abuse, Unclear }
  urgency:    spark in 0..10
  action:     one_of { Draft(reply: glyph), Escalate(to: Team), AskClarify(q: glyph) }
  rationale:  glyph
}

familiar support_triage(msg: glyph, customer: tenant_id)
  permits { read tickets, invoke triage, escalate, draft, ask_clarify }

  embedding q = triage.embed(msg)
  let history  = tickets.nearest(q, k: 5) within customer

  # inference IS the computation: the model resolves the whole judgement,
  # constrained to inhabit Disposition. there is no hand-written branch tree.
  divine decision: Disposition
    from (msg, history)
    using triage
    with confidence >= 0.80
    fallback escalate(msg, customer, reason: "low_confidence")

  enact decision.action          # runtime executes the typed action;
                                 # Escalate/Draft/AskClarify are the only shapes possible
end

There is no if. The routing logic is the inference, bounded by Disposition. The human has not written the algorithm; they have written the space of acceptable answers and handed the reasoning to the model. That is inference as a primitive.

Why this is not a prompt DSL. The obvious objection: is divine just sugar for "assemble a prompt, call the model, parse the JSON, validate against a schema"? If it were, Witchcraft would be a costume over prompt() and the thesis would collapse. Two mechanisms make the difference real, not cosmetic.

First, the type constrains generation, not validation. Disposition is not checked after the model speaks; it is enforced during decoding. The runtime (WitchCore) compiles the output type into a generation-time constraint — a grammar the decoder must satisfy token by token — so that, assuming the runtime enforces constrained decoding correctly, the model cannot emit a value outside the type. urgency cannot come back as "quite high"; the decoder is constrained to a spark in 0..10. action cannot be a fourth, hallucinated shape; only the three declared variants are reachable in the output space. This is the opposite of schema-validation-after-the-fact: illegal outputs are not rejected but, in principle, unreachable. (The qualifier matters: real constrained decoding depends on grammar support, the model interface and tokenisation, so the guarantee is a design intention the runtime must honour, not a property that comes for free.)
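The difference between constraining generation and validating afterwards can be shown with a toy decoder. The Python sketch below is a deliberate simplification: it chooses a single token from a hand-written score table, whereas real constrained decoding masks the model's token distribution at every step against a grammar. SCORES, decode and URGENCY_TOKENS are hypothetical names and the scores are invented.

```python
from typing import Optional

# Hypothetical scores a model might assign to candidate outputs for `urgency`.
SCORES = {"quite high": 0.55, "7": 0.30, "8": 0.10, "eleven": 0.05}


def decode(scores: dict, allowed: Optional[set] = None) -> str:
    """Pick the best-scoring token, optionally restricted to an allowed set."""
    if allowed is None:
        candidates = scores
    else:
        candidates = {tok: s for tok, s in scores.items() if tok in allowed}
    if not candidates:
        raise ValueError("no token satisfies the constraint")
    return max(candidates, key=candidates.get)


# Constraint derived from the type `spark in 0..10`: only these tokens are reachable.
URGENCY_TOKENS = {str(n) for n in range(0, 11)}
```

Unconstrained, decode(SCORES) returns "quite high". With the constraint, decode(SCORES, URGENCY_TOKENS) returns "7". The ill-typed answer is never produced and then rejected; it is simply not among the candidates, and removing the constraint changes what is generated from the same scores.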

Second, confidence and provenance are part of the value and flow downstream. divine does not return a bare Disposition; it returns one carrying its confidence and its provenance (which model, which retrieved history, which prompt lineage). The with confidence >= 0.80 clause is a typed discharge — below threshold, the value never materialises as a Disposition and the fallback fires instead. The provenance rides along into enact, so the audit trail is structural, not logged by hand.
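A minimal Python sketch of that discharge-or-fallback behaviour follows. It assumes the runtime can report a confidence figure and a provenance record for each result. The names Provenance, Divined and divine are illustrative and are not the proposal's API, and the gate runs at runtime.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar, Union

T = TypeVar("T")
F = TypeVar("F")


@dataclass(frozen=True)
class Provenance:
    model: str
    context_ids: tuple      # identifiers of the retrieved history items


@dataclass(frozen=True)
class Divined(Generic[T]):
    """A decision that carries its confidence and provenance with it."""
    value: T
    confidence: float
    provenance: Provenance


def divine(
    candidate: Divined[T],
    threshold: float,
    fallback: Callable[[Provenance], F],
) -> Union[Divined[T], F]:
    """Return the candidate if it clears the threshold, else the fallback's result."""
    if candidate.confidence >= threshold:
        return candidate
    # Below threshold the decision is never handed on; only its provenance is.
    return fallback(candidate.provenance)
```

A caller that receives a Divined value still holds the model name and the retrieved items that produced it, so an audit record can be written from the value itself instead of from separate logging calls.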

The litmus test. Here is the test that separates genuine inference-as-primitive (§3's "everything is a primitive" sense) from a prompt DSL wearing types: if you deleted the type, would the computation at the moment of inference change? In a prompt DSL, no — the type is post-hoc validation, the model generated the same tokens regardless, and stripping the type changes only whether you catch a bad output afterward. In Witchcraft, yes — deleting Disposition removes the grammar that constrains decoding, so the model's actual generation changes: it is now free to emit prose, malformed actions, out-of-range urgencies. The type is not a check bolted after the computation; it is part of the computation, shaping what the model is permitted to produce as it produces it. A construct that fails this test is a prompt DSL and should be called one. divine passes it, and that passing is the whole claim.

What a conventional language cannot express. The precise "couldn't do" is not that Python cannot triage a ticket — given scaffolding it can — but that Python has no construct for a typed hole filled by reasoning, where the type is enforced into the generation and carries confidence and provenance as part of its value. To approximate divine you would hand-write the prompt, hand-build the grammar, wire constrained decoding against the model API, parse and validate, thread confidence through manually, and attach provenance at every site — exactly the pervasive scaffold §4 says a real primitive eliminates. Once elevated, the compiler can guarantee what the hand-rolled version only hopes: that the reasoning's output is type-correct by construction. And per §9.1 the beneficiary is not the model, which needs none of this, but the human who declared Disposition and must answer for whatever enact does.

The compile-error contrasts of the orchestration case still apply and are sharper here: an undischarged divine result used without its confidence gate will not compile; an unscoped read of the tenant-scoped tickets will not compile; the familiar cannot enact an action outside its declared permits. But the deeper point is the one the litmus test isolates — inference is doing the work, the type is shaping the work as it happens, and neither is true of a prompt DSL.

7. The Whimsy, and the Limits of the Metaphor

Witchcraft's most conspicuous feature is its vocabulary: oracles, familiars, sigils, a WitchCore runtime, a Coven package registry. This is easy to dismiss as gimmick and easy to over-defend as essential; neither is right.

The whimsy does real cognitive work where it names something new. "Oracle" captures what "client" obscures: that you consult it, that its answer is not guaranteed, that it speaks with authority it has not earned. Memorable words lower the cost of thinking about unfamiliar concepts, and a language is a culture as much as a grammar: Python, Ruby and Perl all carry playful names and outlasted soberly named rivals. The framing makes an arid question (should inference be a primitive?) inviting to a reader who would never open a paper titled "On the Type-Theoretic Status of Model-Mediated Effects", and adoption is downstream of comprehension.

But a metaphor that illuminates a new concept obscures a familiar one. The v0.1 proposal themes the mundane — whilst for while, chant for print, summon for construction — and there it costs rather than pays: a while loop is the same loop it always was, and renaming it subtracts the recognition decades of convention provide. Worse, blanket theming makes the whole language feel like magic, encouraging the reader to treat its non-determinism as mystical rather than managed — and the more delightful the metaphor, the more it can substitute for rigour. An oracle that "just knows things" is a seductive way to avoid confronting that the model is frequently, confidently wrong. A metaphor that makes unreliability feel like enchantment is not a hook but an anaesthetic.

The line, then: the whimsy is a good front door and a bad uniform. Keep it for the brand and the genuinely new primitives, where it makes novel constructs graspable; surrender it everywhere it re-costumes the familiar or makes the hard parts feel easy. Keep the name; let the plumbing look like plumbing.

8. Case Study: Compilation, and the Limits of Static Reach

A compiled AI-native language raises the expectation that model-mediated programs could be checked the way ordinary types are. They can — but only to a boundary. The compiler verifies structural properties: that an inferred value is not used authoritatively without discharge, that an embedding comparison stays within one space, that a memory access respects its scope, that a familiar's actions stay within its permissions. These are exactly the runtime errors §5 promised to convert into compile-time ones.

The compiler cannot verify semantic properties: that the output is correct, the confidence calibrated, the retrieved context relevant, the agent's plan sound. Those are runtime, model-dependent facts. Nativeness reaches as far as the structure of the computation and no further, and the danger is to mistake the structural guarantee for a semantic one: a program that type-checks is not a program whose model outputs are true. Treating the green build as correctness reintroduces the original sin, presenting an uncertain operation as settled, one level up and with more authority.

9. Primitives, History and Directionality

It is natural to treat the set of language primitives as fixed — the furniture every language has. But the set has never been fixed; it has accreted. The list, the string, the hash map, the coroutine, the async task each began as a library pattern and became first-class once it grew pervasive enough that library treatment created more friction than it removed. Asking whether inference should be a primitive is not exotic; it is the same question the field has answered repeatedly, applied to the newest pervasive pattern — library pattern → idiom → first-class construct, driven by ubiquity. This places the proposal in a tradition rather than proving the elevation is due; it remains subject to the objection (§10) that some patterns should stay libraries because they change too fast to fix in a language. The claim is only that drawing the historical line is more useful than treating the primitive set as eternal.

9.1 The objection from human-readability: why an AI-first world might have no language at all

The strongest objection to the premise is historical. Every high-level language exists for humans — assembly was a mnemonic over machine code; Fortran and its descendants were concessions to the fact that people cannot hold raw instructions in their heads — and the compiler's whole job is to discard that legibility and emit what the machine runs. Run the counterfactual honestly against this and it turns hostile: an intelligence that does not think in source code has no reason to exchange annotated source any more than two humans would exchange one another's neural firing patterns. It would trade weights, embeddings and latent state, or compile intent straight to binary. The likeliest product of a truly AI-first history is therefore not Witchcraft but the absence of any high-level language — direct machine representations with no source to read, because no one who mattered needed to read it.

This is fatal to the naive reading of the counterfactual, and the paper accepts it. The objection assumes the language is for the AI; it is not. The moment a human must stay in the loop — to specify, to bound, to audit, to carry responsibility — a readable, checkable artefact becomes necessary again, for the human. So the counterfactual is a provocation, not a prediction: it does not tell us what would have been built, it isolates what is new by stripping the assumption that these concerns must live in libraries. The readable, compiled, human-authored frame is not an inconsistency smuggled in; it is the point. On the seam between human intent and machine execution, readability is the entire reason a language, rather than a binary handshake, is there at all.

10. Guardrails Against Undisciplined Design

A language that makes intelligence native is double-edged, and the edges must be named.

10.1 Four standing cautions

Four limits travel with the proposal and have been argued above; they are gathered here. Conventional languages remain valid for software with no model-mediated behaviour — the apparatus is overhead there. "AI-native" claims must be located against the §4 discriminator: semantics, not marketing. A primitive must touch a real failure (§6.2) — one that changes no outcome under fault injection is decoration. And structural guarantees are not semantic ones (§8): a green build is not a correct output, and treating the type-check as a correctness guarantee is the most dangerous available misreading, placing weight on outputs the language has merely typed, not validated.

10.2 The metaphor must be resisted, and the primitives collapsed

Ritual vocabulary is defensible only where it makes a new primitive graspable, and becomes obfuscation the moment it costumes the familiar or dresses non-determinism as magic (§7). And if the design removes the "model behind an SDK" assumption only to introduce equally unexamined primitives — a single oracle type too coarse to be honest, a familiar that is a composite masquerading as a unit (§5.5) — it has failed by its own standard. If Witchcraft reduces to a probabilistic language plus a vector store plus an agent library wearing a costume, the costume is the only novel thing, and that is not enough. Naming these failure points is the recursive discipline, not a footnote to it.

11. Beyond Languages

The same question transfers wherever model-mediated work is rebuilt by hand: which pervasive scaffold are we reconstructing in every service, and has it earned elevation to infrastructure with checkable scope and provenance? The caution of §8 transfers with it: elevating a concern to first-class status exposes its structure but does not resolve its semantics. The method finds where a pattern is hand-rebuilt and asks whether it should be a primitive; it does not promise that the elevated version is correct, only that it is honest about what it is.

12. Discussion: Suspending the Library Boundary Without Abandoning It

"AI-native" does not mean abolishing libraries. Most software will keep reaching models through SDKs, and should; the apparatus here is for systems where inference is central enough that library-level treatment has become the friction rather than the convenience, and even there, conventional code and ordinary libraries surround the native core. The value of asking what if AI had existed before the language? is that it stops the library boundary from being mistaken for a law of nature, and reveals which concerns have grown too pervasive to leave outside the language. Early on an occasional inference call was an exotic side effect, rightly handled by a library; as inference, memory, embeddings and agents move to the centre of whole classes of software, the library approach stops scaling, and the temptation is to paper over the gap with ever-larger frameworks. The argument is that the honest response may be at the language level, where the pervasive concerns can be typed and checked, rather than the framework level, where they remain conventions discipline alone enforces. Witchcraft's value is not that it is right, but that it makes the question hard to avoid.

13. Conclusion

An AI-native programming language is not one a model generates well, nor one with good inference bindings. It is a language whose semantics treat intelligence as a primitive: a model as a first-class value, memory as a governed resource, an embedding as a typed quantity, an agent as a bounded schedulable entity — each with a type the compiler reasons about and a class of error it converts from runtime to compile time. Witchcraft dramatises this through one counterfactual — what if AI had existed before languages? — whose honest answer is the key to the whole proposal: an AI-first world would have no readable language, because it would have no human who needed to read. The reason to build a language at all is the human who stays in the loop. Witchcraft is not the language AIs would write for themselves; it is a human-authored language that makes AI a first-class primitive so a person can still read, constrain and answer for what the intelligence does.

The goal is to make intelligence less magical in software, not more. The deepest question is not:

How do we let a program call a model?

but:

What must a language make primitive for the human who stays responsible for what the intelligence does — and have we earned the right to build it in?

References

Fjelde, T.E., Xu, K., Widmann, D., Tarek, M., Pfiffer, C., Trapp, M., Axen, S.D., Sun, X., Hauru, M., Yong, P., Tebbutt, W., Ghahramani, Z. and Ge, H. (2025) 'Turing.jl: a general-purpose probabilistic programming language', ACM Transactions on Probabilistic Machine Learning. doi:10.1145/3711897.

Ge, H., Xu, K. and Ghahramani, Z. (2018) 'Turing: a language for flexible probabilistic inference', Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, 84, pp. 1682–1690.

Goodman, N.D. and Stuhlmüller, A. (2014) The Design and Implementation of Probabilistic Programming Languages. Available at: http://dippl.org/ (Accessed: 19 June 2025).

Gordon, A.D., Henzinger, T.A., Nori, A.V. and Rajamani, S.K. (2014) 'Probabilistic programming', Future of Software Engineering Proceedings (FOSE 2014), pp. 167–181. doi:10.1145/2593882.2593900.

Kuhn, T.S. (1962) The Structure of Scientific Revolutions. Chicago: University of Chicago Press.

McCarthy, J. (1960) 'Recursive functions of symbolic expressions and their computation by machine, Part I', Communications of the ACM, 3(4), pp. 184–195. doi:10.1145/367177.367199.

Van de Meent, J.W., Paige, B., Yang, H. and Wood, F. (2018) 'An introduction to probabilistic programming', arXiv. Available at: https://arxiv.org/abs/1809.10756 (Accessed: 19 June 2025).

Waller, S. (2025) Witchcraft: An AI-First Compiled Programming Language (Concept Proposal, v0.1). Original concept 19 June 2025.

Weizenbaum, J. (1976) Computer Power and Human Reason: From Judgment to Calculation. San Francisco: W.H. Freeman.
