
Tutorial: Generate CAD with ll_gen

In this tutorial you will run ll_gen’s propose→dispose loop and then train a proof-of-life neural generator. Allow ~20 minutes (training is short and CPU-friendly).

from ll_gen import GenerationOrchestrator
orch = GenerationOrchestrator()
result = orch.generate("a 20 mm cube with a 5 mm hole through the center")
print("valid solid:", result.is_valid)
print("geometry:", result.geometry_report)

The orchestrator routes the prompt (code vs. neural), proposes a candidate, disposes it in the CadQuery sandbox, and — if invalid — retries with structured error feedback.
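The control flow described above can be sketched in a few lines. This is a minimal illustration, not ll_gen's implementation: `Verdict`, `propose_dispose`, and the toy `toy_propose`/`toy_dispose` callables are hypothetical names, and treating `max_retries` as "retries after the first attempt" is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Verdict:
    """Outcome of disposing (checking) one candidate. Hypothetical type."""
    is_valid: bool
    error: Optional[str] = None  # structured feedback for the next proposal


def propose_dispose(
    prompt: str,
    propose: Callable[[str, Optional[str]], str],
    dispose: Callable[[str], Verdict],
    max_retries: int = 3,
) -> Tuple[str, Verdict, int]:
    """Propose a candidate, dispose it, and retry with feedback while invalid."""
    feedback = None
    for attempt in range(max_retries + 1):  # first attempt plus retries (assumed)
        candidate = propose(prompt, feedback)
        verdict = dispose(candidate)
        if verdict.is_valid:
            break
        feedback = verdict.error  # hand the error back to the proposer
    return candidate, verdict, attempt + 1


# Toy stand-ins: the first proposal fails, the repaired one passes.
def toy_propose(prompt, feedback):
    return "box+hole" if feedback else "box"


def toy_dispose(candidate):
    if "hole" in candidate:
        return Verdict(True)
    return Verdict(False, error="missing feature: hole")


candidate, verdict, attempts = propose_dispose(
    "cube with a hole", toy_propose, toy_dispose
)
print(candidate, verdict.is_valid, attempts)
```

The point of the sketch is that the proposer never decides validity itself: only the dispose step does, and its error message is the sole signal carried into the next attempt.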

The code path proposes CadQuery/OpenSCAD code, which the kernel executes — the most reliable route for simple mechanical parts.

from ll_gen import GenerationRoute
result = orch.generate(
    "an M6 hex bolt, 30 mm long",
    force_route=GenerationRoute.CODE_CADQUERY,
    max_retries=3,
    export=True,  # write STEP/STL on success
)
print(result.is_valid, result.step_path)
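For a sense of what the code route might propose, here is a hand-written CadQuery program for the cube-with-hole prompt used earlier. It is an illustrative guess, not captured ll_gen output. The candidate is held as a string and only syntax-checked with the standard library, so the snippet runs without CadQuery installed; `CANDIDATE` and `syntax_ok` are hypothetical names, and whether ll_gen performs such a pre-check is not stated in this tutorial.

```python
import ast

# Hypothetical proposal for "a 20 mm cube with a 5 mm hole through the center".
CANDIDATE = '''
import cadquery as cq

result = (
    cq.Workplane("XY")
    .box(20, 20, 20)   # 20 mm cube centered on the origin
    .faces(">Z")       # select the top face
    .workplane()       # new sketch plane on that face
    .hole(5)           # 5 mm diameter hole cut through the part
)
'''


def syntax_ok(source: str) -> bool:
    """Cheap check: does the candidate parse as Python at all?"""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


print(syntax_ok(CANDIDATE))
```

Parsing only rules out malformed code. Whether the program yields a valid solid can only be decided by executing it in the kernel.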

Measure prior-sampling validity of the same VAE before and after the REINFORCE dispose-reward loop. You need a small JSONL of eval prompts (eval/heldout.jsonl, one {"prompt": "..."} per line).
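If you do not have an eval file yet, a short script can create one in the format described above. The two prompts are placeholders reused from this tutorial; substitute your own held-out prompts.

```python
import json
from pathlib import Path

# Placeholder prompts; replace with your own held-out set.
prompts = [
    "a 20 mm cube with a 5 mm hole through the center",
    "an M6 hex bolt, 30 mm long",
]

path = Path("eval/heldout.jsonl")
path.parent.mkdir(parents=True, exist_ok=True)

# One JSON object per line: {"prompt": "..."}
with path.open("w", encoding="utf-8") as f:
    for p in prompts:
        f.write(json.dumps({"prompt": p}) + "\n")

# Read the file back to confirm every line parses.
with path.open(encoding="utf-8") as f:
    rows = [json.loads(line) for line in f if line.strip()]
print(len(rows), "prompts written")
```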

python -m ll_gen.training.proof_of_life \
  --generator vae \
  --prompts eval/heldout.jsonl \
  --epochs 5 --steps-per-epoch 80 --n-eval-samples 100 \
  --seed 0 --save checkpoints/vae_rl.pt \
  --results results/proof_of_life_vae.json

Read the results:

import json

with open("results/proof_of_life_vae.json") as f:  # close the file after reading
    r = json.load(f)
print("baseline validity:", r["baseline"]["validity_rate"],
      "distinct:", r["baseline"]["num_distinct_valid"])
print("trained validity:", r["trained"]["validity_rate"],
      "distinct:", r["trained"]["num_distinct_valid"])

num_distinct_valid matters: if a validity gain comes from one valid shape repeated over and over (mode collapse), the distinct count exposes it instead of letting it pass as success.
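To see why the two numbers need to be read together, consider a toy computation of both. The `summarize` helper and the string fingerprints are hypothetical stand-ins: how ll_gen actually decides that two solids are "distinct" is not specified here.

```python
def summarize(samples):
    """samples: list of (is_valid, fingerprint) pairs, one per generated shape."""
    valid = [fp for ok, fp in samples if ok]
    return {
        "validity_rate": len(valid) / len(samples),
        "num_distinct_valid": len(set(valid)),  # unique valid shapes only
    }


# Same validity rate, very different diversity (made-up toy data).
collapsed = [(True, "cube")] * 9 + [(False, None)]
diverse = [(True, f"shape{i}") for i in range(9)] + [(False, None)]

print(summarize(collapsed))  # {'validity_rate': 0.9, 'num_distinct_valid': 1}
print(summarize(diverse))    # {'validity_rate': 0.9, 'num_distinct_valid': 9}
```

Both toy runs score 0.9 on validity, and only the distinct count separates the collapsed generator from the diverse one.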

To train on a dataset rather than the proof-of-life harness:

python -m ll_gen.training.run \
  --generator vae \
  --dataset deepcad --data-path <hf-id-or-path> \
  --max-samples 2000 --epochs 1 --lr 1e-5 \
  --device cpu --save checkpoints/vae_rl.pt

The command prints a metrics JSON with reward, advantage, baseline, and loss.
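As a rough guide to how those four quantities usually relate in REINFORCE with a baseline, here is a generic single-sample sketch. It assumes the reported baseline is a running average of reward and that the loss is the standard score-function surrogate. Neither is confirmed by this tutorial, and `reinforce_step` and `momentum` are hypothetical names.

```python
def reinforce_step(log_prob, reward, baseline, momentum=0.9):
    """One textbook REINFORCE-with-baseline update for a single sample.

    log_prob: log-probability the generator assigned to the sampled output
    reward:   scalar from the dispose step (for example, 1.0 for a valid solid)
    baseline: running average of past rewards (assumed form)
    """
    advantage = reward - baseline    # how much better than expected
    loss = -advantage * log_prob     # surrogate loss to minimize
    new_baseline = momentum * baseline + (1 - momentum) * reward
    return {
        "reward": reward,
        "advantage": advantage,
        "baseline": new_baseline,
        "loss": loss,
    }


metrics = reinforce_step(log_prob=-2.0, reward=1.0, baseline=0.5)
print(metrics)
```

A positive advantage makes the loss gradient raise the sample's log-probability, and a negative one lowers it. Subtracting the baseline reduces variance without changing the expected gradient.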

The generators that actually produce valid CAD take the construction-program route — generate the command program and execute it, so the kernel builds a watertight solid. They train and run natively in MLX on Apple Silicon:

# Autoregressive command generator: trains on real DeepCAD programs, then samples + executes.
# Reports validity through the real kernel, gated on a non-degenerate solid.
python ll_gen/mlx/ar_generator_mlx.py --mode train
# -> validity 0.914 (234/256), distinct 104, non-degenerate
# Latent diffusion over a program autoencoder: sample z -> decode -> execute.
python ll_gen/mlx/latent_diffusion_mlx.py --mode train
# -> sampled-z validity 0.934 (239/256), distinct 138 (vs a z=0 mean baseline: 14 distinct)

Both report num_distinct alongside validity, so a high rate from one repeated shape (mode collapse) is visible. The latent-diffusion run prints sampled-z validity against a z=0 predict-the-mean baseline — see ll_gen Usage for what that metric means and why the baseline matters.