The reality of AI CAD generation
It is worth being blunt about the state of the art, because it explains why the toolkit is built the way it is.
What actually ships
Section titled “What actually ships”As of early 2026, no major CAD vendor ships neural-network-based 3D model generation. What is marketed as “AI CAD” falls into three categories that are often conflated:
- Topology optimization rebranded as “generative design” — shipping for years, FEA-based, not neural networks.
- Workflow assistants / copilots — documentation chatbots (Onshape AI Advisor explicitly “does not generate designs”).
- Actual LLM-based CAD generation — one substantive production product (Zoo.dev), a handful of early startups, zero verified enterprise manufacturing deployments.
Even the leading product is candid about limits: it works best for “traditional, simple mechanical parts — fasteners, bearings, connectors,” is stochastic (same prompt → different result), and independent testing shows quality “drops off sharply with medium and high-complexity designs.”
The universal architecture: code → kernel → validate
Section titled “The universal architecture: code → kernel → validate”Every shipping and near-shipping system follows the same three phases:
stochastic phase LLM generates code in a DSL (CadQuery / OpenSCAD / KCL)deterministic phase a CAD kernel executes that code → exact B-Repvalidation phase check watertight / manifold / sane; errors feed backNo neural network writes directly into a CAD kernel’s data structures. The indirection is the point: it leverages decades of proven kernel engineering and gives the network a tractable code-writing task instead of an intractable geometry-synthesis task. The cost is that the system inherits LLM code generation’s limits — stochastic output, prompt sensitivity, failure on complex specs, and no engineering guarantees.
Why ll_gen is “propose, then dispose”
Section titled “Why ll_gen is “propose, then dispose””ll_gen implements exactly this pattern:
- Propose — a generator (neural latent sample, an LLM code proposal, or a deterministic template) produces a candidate.
- Dispose — the candidate is executed in a sandboxed CadQuery subprocess, which produces real geometry and reports whether it is a valid closed solid.
- Align — a REINFORCE loop rewards proposals that dispose into valid solids, pushing the generator toward geometry the kernel accepts.
This is the same code-through-kernel discipline the production systems use, and the same RL-from-kernel-feedback technique that took sketch-constraint satisfaction from ~9% to ~93% in the literature.
Generative design ≠ generative AI CAD
Section titled “Generative design ≠ generative AI CAD”These are architecturally unrelated and routinely confused:
| Generative design (topology opt.) | Generative AI CAD | |
|---|---|---|
| Input | fully specified loads/constraints | natural language |
| Method | gradient-based physics optimization | neural network trained on data |
| Determinism | same input → same output | same input → different outputs |
| Maturity | shipping ~a decade | one production system since late 2023 |
| Output | optimized geometry, engineering meaning | a starting point needing validation |
What to expect from this toolkit
Section titled “What to expect from this toolkit”The toolkit now ships trained generators — and they confirm the thesis above. The ll_gen generators that produce valid CAD are the ones that generate the construction program and execute it: an autoregressive command model (0.914 valid) and a latent diffusion over a program autoencoder (0.934 sampled-z valid), both measured through the real kernel and gated on a non-degenerate solid. The route that doesn’t work is the one that generates raw B-rep faces to be sewn — its independently-sampled faces never mate, so honest validity is 0. That is the “code → kernel → validate” lesson made concrete in this codebase: validity comes from the executable representation, not from a richer neural sampler.
Still, treat generated output as a starting point to be validated, never as a manufacturing-ready part. These models are trained on the DeepCAD distribution of parametric sketch-and-extrude programs; quality drops off for high-complexity designs, and a human does the majority of real design effort. The toolkit makes the reliable part (kernel execution + validation) load-bearing, and the neural part improvable.