API Reference

Generated from the stepnet package source. Each symbol links to its definition on GitHub.

`class AdaptiveLayer(nn.Module)`

Source: ll_stepnet/stepnet/conditioning.py:45

Single transformer block with cross-attention for conditioning injection.

Contains a self-attention sub-layer, a cross-attention sub-layer (attending to conditioning embeddings), and a feed-forward sub-layer.

Args: hidden_dim: Model hidden dimension. num_heads: Number of attention heads. dropout: Dropout rate.

Methods

`__init__`

__init__(hidden_dim: int = 1024, num_heads: int = 8, dropout: float = 0.1) -> None

Source: ll_stepnet/stepnet/conditioning.py:57

`forward`

forward(hidden_states: torch.Tensor, conditioning: torch.Tensor, attention_mask: Optional[torch.Tensor] = None) -> torch.Tensor

Source: ll_stepnet/stepnet/conditioning.py:93

Forward pass with self-attention, cross-attention, and FFN.

Args: hidden_states: [B, S, D] decoder hidden states. conditioning: [B, C, D] conditioning embeddings. attention_mask: Optional mask for the conditioning sequence.

Returns: Updated hidden states [B, S, D].

`class BranchAnnotation`

Source: ll_stepnet/stepnet/annotations.py:20

Annotation for a single DFS branch (subtree rooted at a root entity).

Attributes: root_id: Entity ID of the branch root. root_type: STEP entity type of the branch root. descendant_count: Number of descendants (excluding root). max_depth: Maximum depth reached in this branch. type_distribution: Counts of each entity type in the branch.

Methods

`format`

format(max_types: int = 5) -> str

Source: ll_stepnet/stepnet/annotations.py:37

Format branch annotation as text string.

Args: max_types: Maximum number of type counts to include.

Returns: Formatted annotation string with [BRANCH] delimiters.
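A minimal sketch of how such an annotation might be rendered. The class below is a hypothetical stand-in rather than the package's code: the field order inside the delimiters and the closing `[/BRANCH]` tag are assumptions made for illustration. It also uses `field(default_factory=dict)`, the usual dataclass idiom for a mutable default such as the `dict()` shown in the generated constructor signature.

```python
from dataclasses import dataclass, field


@dataclass
class BranchAnnotationSketch:  # hypothetical stand-in, not stepnet's class
    root_id: int
    root_type: str
    descendant_count: int
    max_depth: int
    type_distribution: dict[str, int] = field(default_factory=dict)

    def format(self, max_types: int = 5) -> str:
        # Keep only the max_types most frequent entity types (ties broken by name).
        top = sorted(self.type_distribution.items(),
                     key=lambda kv: (-kv[1], kv[0]))[:max_types]
        types = ", ".join(f"{name}:{count}" for name, count in top)
        # Assumed layout; the real delimiters and field order may differ.
        return (f"[BRANCH] root=#{self.root_id} {self.root_type} "
                f"desc={self.descendant_count} depth={self.max_depth} "
                f"types={{{types}}} [/BRANCH]")


ann = BranchAnnotationSketch(12, "ADVANCED_FACE", 40, 6,
                             {"EDGE_CURVE": 8, "VERTEX_POINT": 16, "LINE": 8})
text = ann.format(max_types=2)
```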

`__init__`

__init__(root_id: int, root_type: str, descendant_count: int, max_depth: int, type_distribution: dict[str, int] = dict()) -> None

Source: ll_stepnet/stepnet/annotations.py

`class CADDenoiser(nn.Module)`

Source: ll_stepnet/stepnet/diffusion.py:356

Self-attention denoiser that predicts noise from noisy latents.

Architecture: sinusoidal timestep embedding + self-attention transformer with num_layers layers and num_heads heads.

Args: latent_dim: Dimension of the input noisy latent. hidden_dim: Transformer hidden dimension. num_layers: Number of self-attention layers. num_heads: Number of attention heads. dropout: Dropout rate.

Methods

`__init__`

__init__(latent_dim: int = 256, hidden_dim: int = DEFAULT_DENOISER_HIDDEN_DIM, num_layers: int = 12, num_heads: int = DEFAULT_DENOISER_HEADS, dropout: float = 0.1) -> None

Source: ll_stepnet/stepnet/diffusion.py:370

`forward`

forward(noisy_latent: torch.Tensor, timesteps: torch.Tensor) -> torch.Tensor

Source: ll_stepnet/stepnet/diffusion.py:413

Predict noise from a noisy latent and timestep.

Args: noisy_latent: [B, D] or [B, S, D] noisy data. timesteps: [B] integer timestep indices.

Returns: Predicted noise, same shape as noisy_latent.
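The sinusoidal timestep embedding mentioned in the class description can be sketched in plain Python. This is the standard Transformer-style formulation, assumed here for illustration; the function name, the `max_period` constant, and the sin-then-cos layout are not taken from the package source.

```python
import math


def sinusoidal_embedding(timestep: int, dim: int, max_period: float = 10000.0) -> list[float]:
    """Transformer-style sinusoidal embedding of one integer timestep."""
    half = dim // 2
    # Frequencies decay geometrically from 1 toward 1/max_period.
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    args = [timestep * f for f in freqs]
    # First half holds sines, second half holds cosines (one common layout).
    return [math.sin(a) for a in args] + [math.cos(a) for a in args]


emb0 = sinusoidal_embedding(0, 8)
emb5 = sinusoidal_embedding(5, 8)
```

In a denoiser this vector would typically be projected to the hidden dimension and added to, or concatenated with, the latent tokens.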

`class CADGenerationPipeline(nn.Module)`

Source: ll_stepnet/stepnet/generation_pipeline.py:58

End-to-end CAD generation pipeline.

Connects ll_stepnet generative models → geotoken decoding → cadling reconstruction. Supports VAE, VQ-VAE, and Diffusion sampling modes.

Attributes: model: Generative model (STEPVAE, VQVAEModel, or StructuredDiffusion). mode: Generation mode (‘vae’, ‘vqvae’, or ‘diffusion’). device: Target device (‘cpu’ or ‘cuda’). executor_tolerance: Tolerance for cadling CommandExecutor validation. quantization_levels: Number of quantization levels per parameter.

Methods

`__init__`

__init__(model: nn.Module, mode: str = 'vae', device: str = 'cpu', executor_tolerance: float = 1e-06, quantization_levels: int = 256, fallback_config: Optional[FallbackConfig] = None, on_error: Optional[Callable[[CADGenerationError], None]] = None) -> None

Source: ll_stepnet/stepnet/generation_pipeline.py:72

Initialize the CAD generation pipeline.

Args: model: Generative model (STEPVAE, VQVAEModel, or StructuredDiffusion). mode: Generation mode - ‘vae’, ‘vqvae’, or ‘diffusion’. device: Target device - ‘cpu’ or ‘cuda’. executor_tolerance: Tolerance threshold for cadling CommandExecutor. quantization_levels: Number of quantization bins per parameter. fallback_config: Configuration for error recovery strategies. on_error: Optional callback invoked when generation errors occur.

Raises: ValueError: If mode is not one of the supported modes. TypeError: If model is not a recognized generative model.

`generate`

generate(num_samples: int = 1, seq_len: Optional[int] = None, reconstruct: bool = True, **kwargs) -> List[Dict[str, Any]]

Source: ll_stepnet/stepnet/generation_pipeline.py:184

Generate CAD sequences end-to-end.

Samples from the generative model, decodes to geotoken TokenSequence, and optionally reconstructs using cadling’s CommandExecutor.

Args: num_samples: Number of sequences to generate. seq_len: Sequence length for generation (mode-dependent). reconstruct: Whether to reconstruct via cadling. Defaults to True. **kwargs: Additional keyword arguments passed to the model’s sampling method (e.g., temperature, top_k for diffusion).

Returns: List of result dictionaries, each containing:

- 'token_sequence': geotoken TokenSequence (if geotoken installed)
- 'commands': List of command dicts (if geotoken installed)
- 'command_logits': Raw command logits [num_samples, seq_len, 6]
- 'param_logits': Raw parameter logits, list of 16 tensors
- 'shape': Reconstructed cadling Shape (if reconstruct=True and cadling installed)
- 'valid': Boolean validity flag (if reconstruct=True and cadling installed)
- 'error': Error message if reconstruction failed (if reconstruct=True)

`evaluate`

evaluate(generated_results: List[Dict[str, Any]], reference_shapes: Optional[List[Any]] = None) -> Dict[str, float]

Source: ll_stepnet/stepnet/generation_pipeline.py:1055

Evaluate generation quality using cadling’s metrics.

Computes validity rate, and optionally novelty and coverage if reference shapes are provided.

Args: generated_results: List of result dicts from generate(). reference_shapes: Optional list of reference cadling.Shape objects for novelty/coverage computation.

Returns: Dictionary with evaluation metrics:

- 'validity_rate': Fraction of valid generated shapes
- 'novelty': Novelty score (if references provided)
- 'coverage': Coverage score (if references provided)

Raises: ImportError: If cadling is not installed.
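As a rough illustration of the validity-rate metric, the sketch below computes the fraction of result dictionaries whose 'valid' flag is set. The helper name is hypothetical, and treating a missing flag as invalid is an assumption; the package may handle that case differently.

```python
def validity_rate(results: list[dict]) -> float:
    """Fraction of results whose 'valid' flag is truthy; missing flags count as invalid."""
    if not results:
        return 0.0
    return sum(1 for r in results if r.get("valid", False)) / len(results)


# Stand-ins for the dictionaries returned by generate(); only the keys used here are filled in.
results = [
    {"valid": True},
    {"valid": False, "error": "self-intersecting profile"},
    {"valid": True},
    {"error": "reconstruction skipped"},
]
metrics = {"validity_rate": validity_rate(results)}
```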

`class CadlingDataset(Dataset)`

Source: ll_stepnet/stepnet/data.py:413

PyTorch Dataset for cadling Sketch2DItem objects.

Accepts a list of cadling Sketch2DItem instances and converts each one to the same dict format as `GeoTokenDataset` by calling item.to_geotoken_commands() to get the command sequence in geotoken-compatible format.

This lets you train ll_stepnet’s generative models (STEPVAE, SkexGenVQVAE, etc.) directly on cadling’s in-memory geometry objects without writing them to disk first.

Each __getitem__ returns:

- command_types: [max_commands] integer command type IDs
- parameters: [max_commands, NUM_PARAM_SLOTS] parameter values
- parameter_mask: [max_commands, NUM_PARAM_SLOTS] active-parameter mask
- attention_mask: [max_commands] validity mask

Args: sketch_items: List of cadling Sketch2DItem objects (or any object with a to_geotoken_commands() method). max_commands: Maximum command sequence length. include_topology: If True, build topology and include in output. labels: Optional labels for supervised learning.

Methods

`__init__`

__init__(sketch_items: List, max_commands: int = DEFAULT_MAX_SEQ_LEN, include_topology: bool = False, labels: Optional[List] = None) -> None

Source: ll_stepnet/stepnet/data.py:451

`class CodebookDecoder(nn.Module)`

Source: ll_stepnet/stepnet/vqvae.py:548

Autoregressive transformer decoder for generating codebook indices.

Given a sequence of codebook indices, this module predicts the next index autoregressively. One CodebookDecoder is instantiated per codebook stream (topology, geometry, extrusion) so that each stream can be generated independently.

The architecture is a standard GPT-style transformer decoder with causal masking, learned positional embeddings, and a final linear head projecting to the codebook vocabulary.

Args: code_dim: Hidden dimension of the transformer. num_layers: Number of transformer decoder layers. num_heads: Number of attention heads. vocab_size: Number of codebook entries this decoder predicts over (must match the corresponding VectorQuantizer num_embeddings). max_codes: Maximum sequence length of codes to generate. dropout: Dropout rate applied throughout the transformer.

Methods

`__init__`

__init__(code_dim: int = 256, num_layers: int = 4, num_heads: int = 8, vocab_size: int = 500, max_codes: int = 16, dropout: float = 0.1) -> None

Source: ll_stepnet/stepnet/vqvae.py:571

`forward`

forward(codes: torch.Tensor, mask: Optional[torch.Tensor] = None) -> torch.Tensor

Source: ll_stepnet/stepnet/vqvae.py:627

Compute next-code logits for a sequence of codebook indices.

Args: codes: (batch, seq_len) LongTensor of codebook indices. Should be prepended with BOS token for teacher-forced training. mask: Optional (batch, seq_len) padding mask where True/1 indicates valid positions and False/0 indicates padding.

Returns: Logits tensor of shape (batch, seq_len, vocab_size) giving the predicted distribution over the next codebook index at each position.
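The teacher-forcing setup and causal masking described above can be sketched with plain lists. The BOS id used here is hypothetical (one past a 500-entry codebook); the real id depends on how the decoder lays out its vocabulary.

```python
BOS = 500  # hypothetical BOS id; not taken from the package source


def teacher_forcing_pair(codes: list[int]) -> tuple[list[int], list[int]]:
    """Inputs are BOS followed by codes[:-1]; targets are the codes themselves."""
    return [BOS] + codes[:-1], codes


def causal_mask(seq_len: int) -> list[list[bool]]:
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


inputs, targets = teacher_forcing_pair([17, 4, 93, 250])
mask = causal_mask(4)
```

With this shift, the logits at position i are trained against the code at position i, which is the "next" code relative to the inputs the decoder has seen so far.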

`sample`

sample(num_samples: int, max_codes: Optional[int] = None, temperature: float = 1.0, top_k: Optional[int] = None) -> torch.Tensor

Source: ll_stepnet/stepnet/vqvae.py:680

Generate code sequences autoregressively.

Starts from a BOS token and samples one code at a time, feeding each sampled code back as input for the next step.

Args: num_samples: Batch size of sequences to generate in parallel. max_codes: Maximum number of codes to generate per sequence. Defaults to self.max_codes. temperature: Sampling temperature. Higher values produce more diverse outputs. top_k: If set, only sample from the top-k highest probability entries at each step.

Returns: (num_samples, max_codes) LongTensor of generated codebook indices.
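A single sampling step with temperature and top-k filtering might look like the following. This is a generic sketch over one position's logits, not the package's implementation; `sample_next` and its `rng` parameter are names introduced for illustration.

```python
import math
import random
from typing import List, Optional


def sample_next(logits: List[float], temperature: float = 1.0,
                top_k: Optional[int] = None,
                rng: Optional[random.Random] = None) -> int:
    rng = rng or random.Random()
    scaled = [x / temperature for x in logits]
    if top_k is not None:
        # Keep the k largest logits; everything below the cutoff gets zero probability.
        # (Entries tied with the cutoff are all kept.)
        cutoff = sorted(scaled, reverse=True)[min(top_k, len(scaled)) - 1]
        scaled = [x if x >= cutoff else float("-inf") for x in scaled]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]  # numerically stable softmax numerators
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


rng = random.Random(0)
logits = [2.0, 0.5, 1.5, -1.0]
draws = [sample_next(logits, temperature=0.8, top_k=2, rng=rng) for _ in range(200)]
greedy = sample_next(logits, top_k=1, rng=rng)
```

Autoregressive generation repeats this step, appending each sampled index to the input sequence before computing the next position's logits.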

`class CommandType(IntEnum)`

Source: ll_stepnet/stepnet/output_heads.py:31

CAD command types matching geotoken's six-command vocabulary.

Ordering matches geotoken.CommandType enum ordering so that integer indices are directly interchangeable between the two modules.
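For orientation, here is a sketch of such an enum using the index mapping listed under `GeoTokenDataset` in this reference (SOL=0 through EOS=5). The class name is a stand-in so that it is not mistaken for the package's own definition.

```python
from enum import IntEnum


class CommandTypeSketch(IntEnum):  # stand-in name; values follow this reference
    SOL = 0
    LINE = 1
    ARC = 2
    CIRCLE = 3
    EXTRUDE = 4
    EOS = 5


NUM_COMMAND_TYPES = len(CommandTypeSketch)
predicted = CommandTypeSketch(4)  # an integer index (e.g. from an argmax) maps to a member
```

Because the class is an `IntEnum`, members compare equal to plain integers, so tensors of class indices can be used without conversion.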

`class CommandTypeHead(nn.Module)`

Source: ll_stepnet/stepnet/output_heads.py:66

Predicts the CAD command type at each sequence position.

Args: embed_dim: Dimension of the decoder hidden states. num_command_types: Number of distinct command types.

Methods

`__init__`

__init__(embed_dim: int = 256, num_command_types: int = NUM_COMMAND_TYPES) -> None

Source: ll_stepnet/stepnet/output_heads.py:74

`forward`

forward(hidden_states: torch.Tensor) -> torch.Tensor

Source: ll_stepnet/stepnet/output_heads.py:78

Compute command type logits.

Args: hidden_states: [batch, seq_len, embed_dim]

Returns: Logits [batch, seq_len, num_command_types].

`class CompositeHead(nn.Module)`

Source: ll_stepnet/stepnet/output_heads.py:129

Combined command-type and parameter prediction head with masking.

During training this head:

1. Predicts command type logits.
2. Predicts 16 parameter logits.
3. Applies PARAMETER_MASKS to zero out gradients for parameters that do not belong to the predicted (or target) command type.
4. Optionally predicts vertex positions via `VertexPredictionHead` (when include_vertex_head=True).

Args: embed_dim: Dimension of the decoder hidden states. num_command_types: Number of distinct command types. num_param_slots: Number of parameter slots. num_levels: Number of quantisation levels per parameter. include_vertex_head: Whether to include the `VertexPredictionHead` for direct 3D vertex prediction. max_vertices: Maximum number of vertex slots (only used when include_vertex_head=True). num_refinement_steps: Number of learned vertex refinement iterations (only used when include_vertex_head=True).

Methods

`__init__`

__init__(embed_dim: int = 256, num_command_types: int = NUM_COMMAND_TYPES, num_param_slots: int = NUM_PARAM_SLOTS, num_levels: int = DEFAULT_QUANTIZATION_LEVELS, include_vertex_head: bool = False, max_vertices: int = 512, num_refinement_steps: int = 3) -> None

Source: ll_stepnet/stepnet/output_heads.py:154

`forward`

forward(hidden_states: torch.Tensor, command_targets: Optional[torch.Tensor] = None) -> Dict[str, object]

Source: ll_stepnet/stepnet/output_heads.py:204

Predict command types and parameters with optional masking.

Args: hidden_states: [batch, seq_len, embed_dim] command_targets: [batch, seq_len] integer command-type targets. When provided, masking uses the ground-truth command types; otherwise the argmax prediction is used.

Returns: Dictionary with:

- command_type_logits: [batch, seq_len, num_command_types]
- parameter_logits: list of 16 [batch, seq_len, num_levels]
- parameter_mask: [batch, seq_len, num_param_slots] bool
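The effect of the parameter mask can be illustrated with a small pure-Python sketch. The slot assignments in `ACTIVE_SLOTS` are invented for the example; the real PARAMETER_MASKS table may assign the 16 slots differently.

```python
NUM_PARAM_SLOTS = 16
# Hypothetical table: which slots each command type (0..5) uses.
ACTIVE_SLOTS = {0: [], 1: [0, 1], 2: [0, 1, 2, 3], 3: [0, 1, 4],
                4: list(range(5, 16)), 5: []}


def parameter_mask(command_types: list[int]) -> list[list[bool]]:
    """One boolean row per sequence position, True for slots the command uses."""
    return [[slot in ACTIVE_SLOTS[c] for slot in range(NUM_PARAM_SLOTS)]
            for c in command_types]


def masked_loss(per_slot_loss: list[list[float]], mask: list[list[bool]]) -> float:
    """Average the per-slot losses over active slots only."""
    total = sum(l for row, m in zip(per_slot_loss, mask)
                for l, keep in zip(row, m) if keep)
    count = sum(keep for m in mask for keep in m)
    return total / max(count, 1)


mask = parameter_mask([0, 1, 3, 5])  # ground-truth targets when available, else argmax
loss = masked_loss([[1.0] * 16 for _ in range(4)], mask)
```

Slots that are masked out contribute nothing to the loss, so no gradient reaches the corresponding parameter logits.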

`decode_to_token_sequence`

decode_to_token_sequence(command_logits: torch.Tensor, param_logits: List[torch.Tensor], batch_index: int = 0)

Source: ll_stepnet/stepnet/output_heads.py:262

Convert model output logits to a geotoken TokenSequence.

Takes the raw logits from a generative model’s forward pass and produces a geotoken-compatible TokenSequence by argmaxing command types and parameters, applying PARAMETER_MASKS, and stopping at the first EOS token.

This is the inverse of what `GeoTokenDataset` does (TokenSequence → tensors); here we go tensors → TokenSequence.

Args: command_logits: [B, S, num_command_types] logits. param_logits: List of 16 [B, S, num_levels] logits. batch_index: Which sample in the batch to decode (default 0).

Returns: A geotoken.TokenSequence with decoded command_tokens. Only command tokens are populated; graph and constraint tokens are not (those come from separate decoders in full models).
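The argmax-and-stop-at-EOS logic can be sketched as follows. To stay short, the example uses nested lists instead of tensors, two parameter slots instead of 16, and skips the PARAMETER_MASKS step; the output dictionaries are an illustrative stand-in for geotoken command tokens.

```python
EOS = 5  # EOS index as listed for CommandType in this reference


def argmax(values: list[float]) -> int:
    return max(range(len(values)), key=values.__getitem__)


def decode_commands(command_logits, param_logits):
    """command_logits: [S][num_types]; param_logits: list of slots, each [S][num_levels]."""
    decoded = []
    for pos, logits in enumerate(command_logits):
        cmd = argmax(logits)
        if cmd == EOS:
            break  # nothing after the first EOS is decoded
        params = [argmax(slot[pos]) for slot in param_logits]
        decoded.append({"type": cmd, "params": params})
    return decoded


cmd_logits = [[0, 9, 0, 0, 0, 0], [0, 0, 0, 9, 0, 0],
              [0, 0, 0, 0, 0, 9], [0, 9, 0, 0, 0, 0]]
slot = [[0, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 0, 1, 0]]
commands = decode_commands(cmd_logits, [slot, slot])
```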

`class ConditioningConfig`

Source: ll_stepnet/stepnet/config.py:212

Configuration for cross-attention conditioning modules.

Methods

`__init__`

__init__(text_encoder_name: str = 'bert-base-uncased', image_encoder_name: str = 'facebook/dinov2-base', conditioning_dim: int = 1024, skip_cross_attention_blocks: int = 2, freeze_encoder: bool = True, num_adaptive_layers: int = 1) -> None

Source: ll_stepnet/stepnet/config.py

`class DDPMScheduler`

Source: ll_stepnet/stepnet/diffusion.py:30

Linear-beta DDPM noise scheduler.

Precomputes alpha, alpha_bar, and sigma schedules for T timesteps and supports:

- add_noise (forward process)
- step (single reverse step, DDPM)
- pndm_step (accelerated PNDM/PLMS reverse step)

Args: num_timesteps: Total number of diffusion steps. beta_start: Starting value of the linear beta schedule. beta_end: Ending value of the linear beta schedule. inference_steps: Number of evenly-spaced steps for PNDM sampling.
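A linear beta schedule and its derived quantities can be computed as below. The defaults mirror the constructor signature; the function name and the evenly spaced, largest-first inference timesteps are assumptions about one common convention, not a description of the package's internals.

```python
def linear_beta_schedule(num_timesteps=1000, beta_start=1e-4, beta_end=0.02):
    step = (beta_end - beta_start) / (num_timesteps - 1)
    betas = [beta_start + i * step for i in range(num_timesteps)]
    alphas = [1.0 - b for b in betas]
    # alpha_bar_t is the running product of alphas up to and including t.
    alpha_bars, running = [], 1.0
    for a in alphas:
        running *= a
        alpha_bars.append(running)
    return betas, alphas, alpha_bars


betas, alphas, alpha_bars = linear_beta_schedule()

# Evenly spaced inference timesteps, largest first.
inference_steps = 200
timesteps = list(range(0, 1000, 1000 // inference_steps))[::-1]
```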

Methods

`__init__`

__init__(num_timesteps: int = 1000, beta_start: float = 0.0001, beta_end: float = 0.02, inference_steps: int = 200) -> None

Source: ll_stepnet/stepnet/diffusion.py:46

`add_noise`

add_noise(x_start: torch.Tensor, noise: torch.Tensor, timesteps: torch.Tensor) -> torch.Tensor

Source: ll_stepnet/stepnet/diffusion.py:85

Forward diffusion: add noise to clean data.

x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, which is a sample from q(x_t | x_0)

Args: x_start: Clean data [B, …] (e.g. [B, D] or [B, S, D]). noise: Gaussian noise, same shape as x_start. timesteps: Integer timestep indices [B].

Returns: Noisy data, same shape as x_start, at the given timesteps.
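The forward process can be written out for nested lists as a sketch. Each sample in the batch gets its own timestep, and alpha_bar is recomputed from a linear beta schedule with the scheduler's default values; the helper names are illustrative.

```python
import math
import random


def alpha_bar(t: int, num_timesteps=1000, beta_start=1e-4, beta_end=0.02) -> float:
    """Product of (1 - beta_s) for s = 0..t under a linear beta schedule."""
    step = (beta_end - beta_start) / (num_timesteps - 1)
    prod = 1.0
    for s in range(t + 1):
        prod *= 1.0 - (beta_start + s * step)
    return prod


def add_noise(x_start, noise, timesteps):
    """x_start, noise: [B][D] nested lists; timesteps: [B] ints, one per sample."""
    out = []
    for x0, eps, t in zip(x_start, noise, timesteps):
        ab = alpha_bar(t)
        out.append([math.sqrt(ab) * a + math.sqrt(1.0 - ab) * e
                    for a, e in zip(x0, eps)])
    return out


rng = random.Random(0)
x0 = [[1.0, -1.0], [1.0, -1.0]]
eps = [[rng.gauss(0, 1) for _ in range(2)] for _ in range(2)]
x_t = add_noise(x0, eps, [0, 999])  # first sample barely noised, second almost pure noise
```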

`step`

step(model_output: torch.Tensor, timestep: int, sample: torch.Tensor) -> torch.Tensor

Source: ll_stepnet/stepnet/diffusion.py:114

Single DDPM reverse step: predict x_{t-1} from x_t.

Args: model_output: Predicted noise [B, D]. timestep: Current integer timestep. sample: Current noisy sample [B, D].

Returns: Denoised sample at t-1, [B, D].
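A single ancestral DDPM step, written for one sample as a plain list, might look like this. The variance choice sigma_t^2 = beta_t is an assumption (the posterior variance is another common option), and the helper names are illustrative.

```python
import math
import random


def make_schedule(num_timesteps=1000, beta_start=1e-4, beta_end=0.02):
    step = (beta_end - beta_start) / (num_timesteps - 1)
    betas = [beta_start + i * step for i in range(num_timesteps)]
    alpha_bars, running = [], 1.0
    for b in betas:
        running *= 1.0 - b
        alpha_bars.append(running)
    return betas, alpha_bars


def ddpm_step(eps_pred, t, x_t, betas, alpha_bars, rng):
    beta_t, ab_t = betas[t], alpha_bars[t]
    coef = beta_t / math.sqrt(1.0 - ab_t)
    # Posterior mean: (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
    mean = [(x - coef * e) / math.sqrt(1.0 - beta_t) for x, e in zip(x_t, eps_pred)]
    if t == 0:
        return mean  # final step: no noise is added
    sigma = math.sqrt(beta_t)  # assumed variance choice
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mean]


betas, alpha_bars = make_schedule()
x0, eps = [0.3, -0.7], [0.5, -1.2]
x_noisy = [math.sqrt(alpha_bars[0]) * a + math.sqrt(1 - alpha_bars[0]) * e
           for a, e in zip(x0, eps)]
recovered = ddpm_step(eps, 0, x_noisy, betas, alpha_bars, random.Random(0))
```

With the true noise supplied at t = 0, the step returns the clean sample, which is a convenient sanity check for the coefficients.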

`ddim_step_with_log_prob`

ddim_step_with_log_prob(model_output: torch.Tensor, timestep: int, timestep_prev: int, sample: torch.Tensor, eta: float = 1.0) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]

Source: ll_stepnet/stepnet/diffusion.py:155

Stochastic DDIM reverse step returning its Gaussian log-probability.

Implements the per-step transition used by diffusion policy-gradient training (DDPO; Black et al., 2023). The math is the DDIM eta sampler shared verbatim by the reference implementations (Make-a-Shape gaussian_diffusion.ddim_sample and the identical brepdiff/diff3d variants):

    x0     = (x_t - sqrt(1 - ab_t) * eps) / sqrt(ab_t)
    sigma  = eta * sqrt((1 - ab_prev)/(1 - ab_t)) * sqrt(1 - ab_t/ab_prev)
    mean   = sqrt(ab_prev) * x0 + sqrt(max(1 - ab_prev - sigma^2, 0)) * eps
    x_prev = mean + sigma * noise            # noise ~ N(0, I)

where eps is the model’s predicted noise. The returned log-prob is log N(x_prev; mean, sigma^2 I) summed over the feature dimension (one scalar per batch element). Because mean is a function of model_output (the denoiser’s epsilon prediction), gradients flow into the model parameters — this is what makes the RL signal real rather than a detached stand-in.

Args: model_output: Predicted noise eps_theta(x_t, t), shape [B, D]. timestep: Current integer timestep t. timestep_prev: Next (smaller) timestep t'; pass -1 for the final step landing on x_0 (ab_prev = 1.0). sample: Current noisy sample x_t, shape [B, D]. eta: DDIM stochasticity. Must be > 0 for a non-degenerate policy gradient; 1.0 recovers ancestral-DDPM-like noise.

Returns: Tuple (x_prev, log_prob, entropy) where log_prob and entropy are shape [B] tensors. log_prob carries the gradient to model_output (and thus the model parameters).
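The equations above translate directly into a single-sample sketch. It takes alpha_bar values rather than timestep indices, and the entropy formula (that of an isotropic Gaussian) is an assumed definition. The sketch does not handle sigma = 0, which occurs for eta = 0 or on the final step where ab_prev = 1.0; the package presumably guards that case in some way not shown here.

```python
import math
import random


def ddim_step_with_log_prob(eps, ab_t, ab_prev, x_t, eta=1.0, rng=None):
    """eps and x_t are length-D lists; ab_t and ab_prev are alpha_bar values."""
    rng = rng or random.Random()
    sigma = eta * math.sqrt((1 - ab_prev) / (1 - ab_t)) * math.sqrt(1 - ab_t / ab_prev)
    dir_coef = math.sqrt(max(1 - ab_prev - sigma ** 2, 0.0))
    x_prev, log_prob = [], 0.0
    for x, e in zip(x_t, eps):
        x0 = (x - math.sqrt(1 - ab_t) * e) / math.sqrt(ab_t)
        mean = math.sqrt(ab_prev) * x0 + dir_coef * e
        sample = mean + sigma * rng.gauss(0.0, 1.0)
        # log N(sample; mean, sigma^2), accumulated over the feature dimension
        log_prob += (-0.5 * ((sample - mean) / sigma) ** 2
                     - math.log(sigma) - 0.5 * math.log(2 * math.pi))
        x_prev.append(sample)
    # Entropy of an isotropic Gaussian; depends only on sigma and D.
    entropy = len(x_t) * (0.5 * math.log(2 * math.pi * math.e) + math.log(sigma))
    return x_prev, log_prob, entropy


x_prev, log_prob, entropy = ddim_step_with_log_prob(
    eps=[0.1, -0.3, 0.7], ab_t=0.5, ab_prev=0.6, x_t=[0.2, 0.0, -1.0],
    rng=random.Random(0))
```

In an autograd framework the same arithmetic keeps `mean` attached to the denoiser's output, which is how the log-probability carries gradient back to the model.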

`reset_pndm`

reset_pndm() -> None

Source: ll_stepnet/stepnet/diffusion.py:261

Reset the PNDM multi-step buffer before a new sampling run.

`pndm_step`

pndm_step(model_output: torch.Tensor, timestep: int, sample: torch.Tensor) -> torch.Tensor

Source: ll_stepnet/stepnet/diffusion.py:265

Pseudo Numerical Methods for Diffusion Models (PNDM/PLMS) reverse step.

Uses a linear multi-step method (up to 4th order) for faster inference with fewer function evaluations.

Args: model_output: Predicted noise [B, D]. timestep: Current timestep index. sample: Current noisy sample [B, D].

Returns: Denoised sample, [B, D].
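The linear multi-step idea can be illustrated with the Adams-Bashforth weights commonly used in PLMS samplers. This sketch only shows how buffered noise predictions are combined; falling back to lower-order formulas while the buffer fills is an assumption about the warm-up, and scalars stand in for tensors.

```python
def plms_combine(eps_history: list[float]) -> float:
    """Combine newest-first noise predictions with Adams-Bashforth weights."""
    e = eps_history
    if len(e) == 1:
        return e[0]
    if len(e) == 2:
        return (3 * e[0] - e[1]) / 2
    if len(e) == 3:
        return (23 * e[0] - 16 * e[1] + 5 * e[2]) / 12
    return (55 * e[0] - 59 * e[1] + 37 * e[2] - 9 * e[3]) / 24


buffer: list[float] = []   # the kind of state a reset before sampling would clear
combined = []
for eps in [1.0, 1.0, 1.0, 1.0, 2.0]:
    buffer.insert(0, eps)  # newest prediction first
    del buffer[4:]         # keep at most four entries
    combined.append(plms_combine(buffer))
```

The weights in each formula sum to one, so a constant prediction passes through unchanged, while a sudden change is extrapolated forward.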

`class DataConfig`

Source: ll_stepnet/stepnet/config.py:122

Configuration for data loading.

Methods

`__init__`

__init__(data_dir: str = 'data', train_split: str = 'train', val_split: str = 'val', test_split: str = 'test', max_length: int = 2048, use_topology: bool = True, num_workers: int = 4) -> None

Source: ll_stepnet/stepnet/config.py

`class DiffusionConfig`

Source: ll_stepnet/stepnet/config.py:185

Configuration for the diffusion-based CAD denoiser.

Methods

`__init__`

__init__(num_timesteps: int = 1000, beta_start: float = 0.0001, beta_end: float = 0.02, inference_steps: int = 200, denoiser_layers: int = 12, denoiser_heads: int = DEFAULT_DENOISER_HEADS, denoiser_hidden_dim: int = DEFAULT_DENOISER_HIDDEN_DIM, latent_dim: int = 256, num_faces: int = 8, num_edges: int = 12, uv_grid_size: int = 8, edge_num_points: int = 12, codec_hidden_dim: int = 256) -> None

Source: ll_stepnet/stepnet/config.py

`class DiffusionTrainer`

Source: ll_stepnet/stepnet/training/diffusion_trainer.py:30

Trainer for denoising diffusion probabilistic models on CAD data.

Implements the DDPM training procedure:

1. Sample random timesteps for each item in the batch
2. Add noise according to the noise schedule at those timesteps
3. Train the model to predict the added noise
4. Maintain an EMA copy of the model for generation

Args: model: Denoising model that takes (noisy_input, timestep) and predicts noise. scheduler: Noise scheduler with add_noise() and step() methods, providing the beta schedule and noise levels for each timestep. train_dataloader: DataLoader for training data. val_dataloader: Optional DataLoader for validation data. device: Device string. ‘auto’ selects CUDA if available, else CPU. checkpoint_dir: Directory path for saving checkpoints and samples. ema_decay: Decay rate for exponential moving average (default 0.9999).
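The EMA update controlled by `ema_decay` follows the usual exponential-moving-average rule. A sketch over plain lists of parameter values; the function name is illustrative, and a small decay is used in the example so the effect is visible after a few steps.

```python
def ema_update(ema_params: list[float], params: list[float], decay: float = 0.9999) -> None:
    """In-place update: ema <- decay * ema + (1 - decay) * param."""
    for i, p in enumerate(params):
        ema_params[i] = decay * ema_params[i] + (1.0 - decay) * p


ema = [0.0, 0.0]
for _ in range(3):
    ema_update(ema, [1.0, -2.0], decay=0.5)
```

With the default decay of 0.9999 the averaged weights change very slowly, which smooths out step-to-step noise in the trained weights.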

Methods

`__init__`

__init__(model: nn.Module, scheduler: Any, train_dataloader: DataLoader, val_dataloader: Optional[DataLoader] = None, device: str = 'auto', checkpoint_dir: Optional[str] = None, ema_decay: float = 0.9999) -> None

Source: ll_stepnet/stepnet/training/diffusion_trainer.py:50

`train_epoch`

train_epoch() -> Dict[str, float]

Source: ll_stepnet/stepnet/training/diffusion_trainer.py:231

Train for one epoch of denoising diffusion.

For each batch:

1. Sample random timesteps uniformly
2. Sample Gaussian noise
3. Create noisy versions of the input
4. Predict the noise with the model
5. Compute MSE between predicted and actual noise
6. Update EMA model

Returns: Dictionary with keys: ‘loss’, ‘noise_mse’.

`validate`

validate() -> Dict[str, float]

Source: ll_stepnet/stepnet/training/diffusion_trainer.py:334

Run validation on the diffusion model.

Computes noise prediction MSE on the validation set using both the training model and the EMA model.

Returns: Dictionary with keys: ‘val_loss’, ‘val_noise_mse’, ‘ema_val_loss’.

`sample_and_visualize`

sample_and_visualize(num_samples: int, epoch: int) -> None

Source: ll_stepnet/stepnet/training/diffusion_trainer.py:396

Generate samples using the EMA model and save visualizations.

Runs the full reverse diffusion process starting from pure noise and saves the resulting samples and a visualization to the checkpoint directory.

Args: num_samples: Number of samples to generate. epoch: Current epoch number for labeling the output.

`train`

train(num_epochs: int, save_every: int = 1) -> None

Source: ll_stepnet/stepnet/training/diffusion_trainer.py:504

Train for multiple epochs with EMA updates and periodic sampling.

Orchestrates the full diffusion training loop:

- Per-epoch noise prediction training
- EMA model updates each step
- Validation and sample generation
- Checkpointing

Args: num_epochs: Total number of epochs to train. save_every: Save a checkpoint and generate samples every N epochs.

`save_checkpoint`

save_checkpoint(filename: str) -> None

Source: ll_stepnet/stepnet/training/diffusion_trainer.py:571

Save model and EMA model checkpoint to disk.

Args: filename: Name of the checkpoint file.

`load_checkpoint`

load_checkpoint(filename: str) -> None

Source: ll_stepnet/stepnet/training/diffusion_trainer.py:596

Load model and EMA model checkpoint from disk.

Args: filename: Name of the checkpoint file to load.

`save_history`

save_history() -> None

Source: ll_stepnet/stepnet/training/diffusion_trainer.py:623

Save training history to a JSON file in the checkpoint directory.

`class DisentangledCodebooks(nn.Module)`

Source: ll_stepnet/stepnet/vqvae.py:269

SkexGen’s three-codebook disentangled quantization system.

Maintains separate codebooks for three aspects of a CAD model:

- Topology codebook: encodes curve type sequences (e.g. line, arc, spline ordering in a sketch profile).
- Geometry codebook: encodes 2D point positions on a 64x64 quantized grid.
- Extrusion codebook: encodes 3D extrusion operations (direction, depth, taper, boolean type).

Each input feature stream is projected to code_dim, quantized through its respective codebook, then decoded back. The model produces 10 total codes per CAD model split across the three codebooks (3 topology + 4 geometry + 3 extrusion by default).

Args: topology_codes: Number of entries in the topology codebook. geometry_codes: Number of entries in the geometry codebook. extrusion_codes: Number of entries in the extrusion codebook. code_dim: Dimensionality shared by all codebook vectors.

Methods

`__init__`

__init__(topology_codes: int = 500, geometry_codes: int = 1000, extrusion_codes: int = 1000, code_dim: int = 256) -> None

Source: ll_stepnet/stepnet/vqvae.py:299

`set_epoch`

set_epoch(epoch: int) -> None

Source: ll_stepnet/stepnet/vqvae.py:388

Propagate epoch counter to all child codebooks.

Args: epoch: Current training epoch (0-indexed).

`encode`

encode(sketch_features: torch.Tensor, geometry_features: torch.Tensor, extrusion_features: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor, torch.Tensor]

Source: ll_stepnet/stepnet/vqvae.py:411

Encode feature streams into discrete codebook indices.

Each feature tensor is projected, reshaped into a short code sequence, then quantized through the corresponding codebook.

Args: sketch_features: Topology/sketch features of shape (batch, code_dim). geometry_features: Geometry features of shape (batch, code_dim). extrusion_features: Extrusion features of shape (batch, code_dim).

Returns: A tuple of (topology_codes, geometry_codes, extrusion_codes) where each is a LongTensor of codebook indices with shape (batch, num_codes_for_that_stream).
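The quantization step itself is a nearest-neighbour lookup in the codebook. A minimal sketch with a three-entry, two-dimensional codebook; the projection and reshaping that precede it in `encode` are omitted, and the function name is illustrative.

```python
def quantize(vectors, codebook):
    """Return, for each vector, the index of the closest codebook entry (squared L2)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(codebook)), key=lambda k: sq_dist(v, codebook[k]))
            for v in vectors]


codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
codes = quantize([[0.9, 1.2], [0.1, -0.2], [-0.8, 1.7]], codebook)
reconstructed = [codebook[k] for k in codes]  # what decoding the indices looks up
```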

`decode`

decode(topology_codes: torch.Tensor, geometry_codes: torch.Tensor, extrusion_codes: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor, torch.Tensor]

Source: ll_stepnet/stepnet/vqvae.py:475

Decode codebook indices back to reconstructed feature vectors.

Args: topology_codes: (batch, TOPOLOGY_NUM_CODES) LongTensor of topology codebook indices. geometry_codes: (batch, GEOMETRY_NUM_CODES) LongTensor of geometry codebook indices. extrusion_codes: (batch, EXTRUSION_NUM_CODES) LongTensor of extrusion codebook indices.

Returns: A tuple of (topo_features, geom_features, extr_features) each of shape (batch, code_dim).

`decode_quantized`

decode_quantized() -> tuple[torch.Tensor, torch.Tensor, torch.Tensor]

Source: ll_stepnet/stepnet/vqvae.py:513

Decode using the cached quantized outputs from the last encode.

This uses the straight-through quantized tensors (with gradients) from the most recent encode call, which is needed during training to allow gradient flow through the VQ bottleneck.

Returns: A tuple of (topo_features, geom_features, extr_features) each of shape (batch, code_dim).

Raises: RuntimeError: If called before encode.

`class GANTrainer`

Source: ll_stepnet/stepnet/training/gan_trainer.py:28

Trainer for Wasserstein GAN with gradient penalty in the latent space.

Trains a generator and discriminator (critic) to produce realistic latent vectors that can be decoded by a pre-trained VAE decoder into valid CAD token sequences.

The WGAN-GP formulation provides:

- Wasserstein distance as a meaningful training signal
- Gradient penalty for stable training without mode collapse
- Alternating critic/generator updates with configurable ratio

Args: generator: Generator network mapping noise -> latent vectors. discriminator: Discriminator (critic) network scoring latent vectors. train_dataloader: DataLoader providing real latent vectors for training. device: Device string. ‘auto’ selects CUDA if available, else CPU. checkpoint_dir: Directory path for saving checkpoints. gp_lambda: Gradient penalty coefficient (default 10.0 per WGAN-GP paper). n_critic: Number of critic updates per generator update (default 5). lr_gen: Learning rate for the generator optimizer. lr_disc: Learning rate for the discriminator optimizer.

Methods

`__init__`

__init__(generator: nn.Module, discriminator: nn.Module, train_dataloader: DataLoader, device: str = 'auto', checkpoint_dir: Optional[str] = None, gp_lambda: float = 10.0, n_critic: int = 5, lr_gen: float = 0.0001, lr_disc: float = 0.0001) -> None

Source: ll_stepnet/stepnet/training/gan_trainer.py:52

`train_discriminator_step`

train_discriminator_step(real_latents: torch.Tensor) -> Dict[str, float]

Source: ll_stepnet/stepnet/training/gan_trainer.py:202

Perform one discriminator (critic) training step.

Computes the WGAN loss with gradient penalty: D_loss = D(fake) - D(real) + lambda * GP

Args: real_latents: Real latent vectors, shape (batch, latent_dim).

Returns: Dictionary with keys: ‘d_loss’, ‘gp_loss’, ‘wasserstein_dist’.
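To make the loss concrete without an autograd framework, the sketch below uses a toy linear critic D(x) = w . x, whose gradient with respect to its input is w everywhere. A real critic needs autograd to evaluate the gradient at random interpolates between real and fake samples. Reporting 'wasserstein_dist' as D(real) - D(fake) is an assumption.

```python
import math


def critic(w, x):
    """Toy linear critic D(x) = w . x; its gradient w.r.t. x is w everywhere."""
    return sum(wi * xi for wi, xi in zip(w, x))


def critic_loss(w, real, fake, gp_lambda=10.0):
    d_real = sum(critic(w, x) for x in real) / len(real)
    d_fake = sum(critic(w, x) for x in fake) / len(fake)
    # Gradient penalty: (||grad_x D(x_hat)|| - 1)^2 at interpolated points x_hat.
    # For a linear critic the gradient is w regardless of x_hat.
    grad_norm = math.sqrt(sum(wi * wi for wi in w))
    gp = (grad_norm - 1.0) ** 2
    return {"d_loss": d_fake - d_real + gp_lambda * gp,
            "gp_loss": gp,
            "wasserstein_dist": d_real - d_fake}


out = critic_loss([0.6, 0.8], real=[[1.0, 1.0], [3.0, 1.0]], fake=[[0.0, 0.0], [0.0, 2.0]])
steep = critic_loss([3.0, 4.0], real=[[1.0, 1.0]], fake=[[0.0, 0.0]])
```

The first critic has unit gradient norm and pays no penalty; the second has norm 5, so the penalty term (5 - 1)^2 = 16 dominates its loss.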

`train_generator_step`

train_generator_step(batch_size: int) -> Dict[str, float]

Source: ll_stepnet/stepnet/training/gan_trainer.py:246

Perform one generator training step.

Generates fake latents and optimizes the generator to fool the critic: G_loss = -D(G(z))

Args: batch_size: Number of samples to generate.

Returns: Dictionary with key: ‘g_loss’.

`train_epoch`

train_epoch() -> Dict[str, float]

Source: ll_stepnet/stepnet/training/gan_trainer.py:275

Train for one epoch with alternating critic/generator updates.

Performs n_critic discriminator updates for every 1 generator update, following the WGAN-GP training protocol.

Returns: Dictionary with keys: ‘d_loss’, ‘g_loss’, ‘gp_loss’, ‘wasserstein_dist’.

`validate`

validate() -> Dict[str, float]

Source: ll_stepnet/stepnet/training/gan_trainer.py:362

Compute validation metrics for the GAN.

Evaluates generation quality using FID-style metrics:

- Mean and std difference between generated and real latent distributions
- Approximate FID score from distribution moments
- Discriminator accuracy on real vs fake

Returns: Dictionary with validation metrics including ‘fid_approx’, ‘mean_diff’, ‘std_diff’, ‘disc_accuracy’.
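One way to obtain moment-based metrics of this kind is to treat each latent dimension as an independent Gaussian, in which case the Fréchet distance reduces to a sum of squared differences of means and of standard deviations. The diagonal-covariance simplification and the use of mean absolute differences for 'mean_diff' and 'std_diff' are assumptions; the package may compute these differently.

```python
import statistics


def moment_metrics(real, fake):
    """real, fake: [N][D] nested lists. Treats dimensions as independent Gaussians."""
    dims = range(len(real[0]))
    mu_r = [statistics.fmean([x[d] for x in real]) for d in dims]
    mu_f = [statistics.fmean([x[d] for x in fake]) for d in dims]
    sd_r = [statistics.pstdev([x[d] for x in real]) for d in dims]
    sd_f = [statistics.pstdev([x[d] for x in fake]) for d in dims]
    # Frechet distance between diagonal Gaussians.
    fid = (sum((a - b) ** 2 for a, b in zip(mu_r, mu_f))
           + sum((a - b) ** 2 for a, b in zip(sd_r, sd_f)))
    return {
        "fid_approx": fid,
        "mean_diff": statistics.fmean([abs(a - b) for a, b in zip(mu_r, mu_f)]),
        "std_diff": statistics.fmean([abs(a - b) for a, b in zip(sd_r, sd_f)]),
    }


metrics = moment_metrics(real=[[0.0, 0.0], [2.0, 2.0]], fake=[[1.0, 0.0], [1.0, 4.0]])
```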

`sample`

sample(num_samples: int) -> torch.Tensor

Source: ll_stepnet/stepnet/training/gan_trainer.py:444

Generate latent vectors using the trained generator.

Args: num_samples: Number of latent vectors to generate.

Returns: Tensor of generated latent vectors, shape (num_samples, latent_dim).

`train`

train(num_epochs: int, save_every: int = 1) -> None

Source: ll_stepnet/stepnet/training/gan_trainer.py:466

Train for multiple epochs.

Orchestrates the full GAN training loop with checkpointing and periodic validation.

Args: num_epochs: Total number of epochs to train. save_every: Save a checkpoint every N epochs.

`save_checkpoint`

save_checkpoint(filename: str) -> None

Source: ll_stepnet/stepnet/training/gan_trainer.py:527

Save generator and discriminator checkpoint to disk.

Args: filename: Name of the checkpoint file.

`load_checkpoint`

load_checkpoint(filename: str) -> None

Source: ll_stepnet/stepnet/training/gan_trainer.py:552

Load generator and discriminator checkpoint from disk.

Args: filename: Name of the checkpoint file to load.

`save_history`

save_history() -> None

Source: ll_stepnet/stepnet/training/gan_trainer.py:587

Save training history to a JSON file in the checkpoint directory.

`class GeoTokenCollator`

Source: ll_stepnet/stepnet/data.py:520

Collate function for batching GeoTokenDataset samples.

Handles variable-length graph and constraint token sequences by padding to the maximum length in the batch.

Args: pad_token_id: Token ID for padding. Default 0.
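Padding variable-length token ID lists to the longest sequence in a batch can be sketched as follows. Returning a companion mask is an assumption made for illustration; the collator's actual output keys are not listed in this reference.

```python
def pad_batch(sequences: list[list[int]], pad_token_id: int = 0):
    """Right-pad every sequence to the longest length in the batch."""
    max_len = max(len(s) for s in sequences)
    padded = [s + [pad_token_id] * (max_len - len(s)) for s in sequences]
    mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in sequences]
    return padded, mask


graph_token_ids, graph_mask = pad_batch([[7, 3, 9], [4], [5, 5]])
```

Padding per batch rather than to a global maximum keeps tensors as small as each batch allows.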

Methods

`__init__`

__init__(pad_token_id: int = 0) -> None

Source: ll_stepnet/stepnet/data.py:530

`class GeoTokenDataset(Dataset)`

Source: ll_stepnet/stepnet/data.py:281

PyTorch Dataset for geotoken TokenSequence objects.

Consumes geotoken TokenSequences directly — no format conversion needed because ll_stepnet’s CommandType enum and PARAMETER_MASKS match geotoken’s natively.

The integer index of each geotoken CommandType enum member maps directly to ll_stepnet’s CommandType IntEnum: SOL=0, LINE=1, ARC=2, CIRCLE=3, EXTRUDE=4, EOS=5

Each item is a dictionary containing:

- command_types: [seq_len] integer command type IDs
- parameters: [seq_len, NUM_PARAM_SLOTS] quantized parameter values
- parameter_mask: [seq_len, NUM_PARAM_SLOTS] boolean active-parameter mask
- attention_mask: [seq_len] validity mask (1=real, 0=padding)

When encode_graph_tokens=True and the TokenSequence has graph tokens, the item also contains:

- graph_token_ids: [variable] integer IDs from CADVocabulary

When encode_constraint_tokens=True and the TokenSequence has constraint tokens, the item also contains:

- constraint_token_ids: [variable] integer IDs from CADVocabulary

Args: token_sequences: List of geotoken TokenSequence objects. max_commands: Maximum command sequence length (pad/truncate). labels: Optional labels for supervised learning. encode_graph_tokens: Encode graph tokens via CADVocabulary. Default False. encode_constraint_tokens: Encode constraint tokens via CADVocabulary. Default False.

Methods

`init`

__init__(token_sequences: List, max_commands: int = DEFAULT_MAX_SEQ_LEN, labels: Optional[List] = None, encode_graph_tokens: bool = False, encode_constraint_tokens: bool = False) -> None

Source: ll_stepnet/stepnet/data.py:318

`class ImageConditioner(nn.Module)`

Source: ll_stepnet/stepnet/conditioning.py:332

Condition CAD generation on rendered images.

Wraps a frozen DINOv2 or CLIP vision encoder, projects patch embeddings to the decoder dimension, and applies AdaptiveLayer blocks for cross-attention injection.

Args: encoder_name: Hugging Face model identifier (e.g. “facebook/dinov2-base”). conditioning_dim: Dimension of the conditioning embeddings. freeze_encoder: Whether to freeze the vision encoder weights. num_adaptive_layers: Number of AdaptiveLayer blocks. skip_cross_attention_blocks: Number of initial decoder blocks that skip cross-attention (Text2CAD default = 2).

Methods

`init`

__init__(encoder_name: str = 'facebook/dinov2-base', conditioning_dim: int = 1024, freeze_encoder: bool = True, num_adaptive_layers: int = 1, skip_cross_attention_blocks: int = 2) -> None

Source: ll_stepnet/stepnet/conditioning.py:349

`encode_image`

encode_image(pixel_values: torch.Tensor) -> torch.Tensor

Source: ll_stepnet/stepnet/conditioning.py:438

Encode pixel values into conditioning embeddings.

Args: pixel_values: [B, C, H, W] preprocessed image tensors.

Returns: Conditioning embeddings [B, N, conditioning_dim] where N is the number of patch tokens.

`forward`

forward(hidden_states: torch.Tensor, pixel_values: torch.Tensor, block_index: Optional[int] = None) -> torch.Tensor

Source: ll_stepnet/stepnet/conditioning.py:461

Condition decoder hidden states on image features.

Early decoder blocks skip cross-attention (same logic as TextConditioner).

Args: hidden_states: [B, S, D] decoder hidden states. pixel_values: [B, C, H, W] preprocessed image tensors. block_index: Optional zero-based decoder block index.

Returns: Conditioned hidden states [B, S, D].
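One plausible reading of the skip rule is sketched below. The function name and the behaviour when block_index is None are assumptions; the source only states that early blocks skip cross-attention and that skip_cross_attention_blocks defaults to 2.

```python
from typing import Optional


def should_apply_cross_attention(
    block_index: Optional[int], skip_cross_attention_blocks: int = 2
) -> bool:
    """Return True when a decoder block should attend to the conditioning."""
    if block_index is None:
        # Assumption: with no index supplied, conditioning is always applied.
        return True
    # Zero-based blocks 0 .. skip-1 pass hidden states through unchanged.
    return block_index >= skip_cross_attention_blocks


decisions = [should_apply_cross_attention(i) for i in range(4)]
```

Under this reading, a caller would return hidden_states untouched whenever the function yields False.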

`class LatentDiscriminator(nn.Module)`

Source: ll_stepnet/stepnet/latent_gan.py:75

Wasserstein critic that scores latent vectors.

Args: latent_dim: Dimensionality of the input latent vector. hidden_dims: Sizes of hidden layers in the MLP.

Methods

`init`

__init__(latent_dim: int = 256, hidden_dims: Optional[List[int]] = None) -> None

Source: ll_stepnet/stepnet/latent_gan.py:83

`forward`

forward(z: torch.Tensor) -> torch.Tensor

Source: ll_stepnet/stepnet/latent_gan.py:111

Score each latent vector in the batch (higher = more real).

Args: z: [batch_size, latent_dim].

Returns: Scalar scores [batch_size, 1].

`class LatentGAN`

Source: ll_stepnet/stepnet/latent_gan.py:123

WGAN-GP training loop for the VAE latent space.

Manages the generator, discriminator, their optimizers, and the gradient-penalty computation.

Args: latent_dim: Dimensionality of the latent space. gen_hidden_dims: Generator MLP hidden sizes. disc_hidden_dims: Discriminator MLP hidden sizes. gp_lambda: Gradient penalty coefficient. n_critic: Number of discriminator updates per generator update. lr_gen: Generator learning rate. lr_disc: Discriminator learning rate. device: Torch device.

Methods

`init`

__init__(latent_dim: int = 256, gen_hidden_dims: Optional[List[int]] = None, disc_hidden_dims: Optional[List[int]] = None, gp_lambda: float = 10.0, n_critic: int = 5, lr_gen: float = 0.0001, lr_disc: float = 0.0001, device: Optional[torch.device] = None) -> None

Source: ll_stepnet/stepnet/latent_gan.py:140

`train_step`

train_step(real_latents: torch.Tensor) -> Dict[str, float]

Source: ll_stepnet/stepnet/latent_gan.py:217

One training step: update critic n_critic times, then generator once.

Args: real_latents: [batch_size, latent_dim] latent vectors produced by the VAE encoder on real data.

Returns: Dictionary of loss values for logging:
- disc_loss: latest discriminator loss
- gen_loss: generator loss (0 if not updated this step)
- gp: gradient penalty
- wasserstein_distance: estimated Wasserstein distance
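For orientation, the textbook WGAN-GP objective that a step like this is usually built on is written out below. It is the standard formulation, not a transcription of latent_gan.py; the symbols D (critic), G (generator), \lambda (gp_lambda), and the interpolate \hat{z} are notation chosen here.

```latex
\begin{aligned}
\hat{z} &= \epsilon\, z_{\text{real}} + (1-\epsilon)\, G(z_{\text{noise}}), \qquad \epsilon \sim U[0,1] \\
\mathcal{L}_D &= \mathbb{E}\big[D(G(z_{\text{noise}}))\big] - \mathbb{E}\big[D(z_{\text{real}})\big]
  + \lambda\, \mathbb{E}\Big[\big(\lVert \nabla_{\hat{z}} D(\hat{z}) \rVert_2 - 1\big)^2\Big] \\
\mathcal{L}_G &= -\,\mathbb{E}\big[D(G(z_{\text{noise}}))\big] \\
W &\approx \mathbb{E}\big[D(z_{\text{real}})\big] - \mathbb{E}\big[D(G(z_{\text{noise}}))\big]
\end{aligned}
```

In this notation the penalty term corresponds to gp and W to wasserstein_distance in the returned dictionary, assuming the implementation follows the standard form.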

`sample`

sample(num_samples: int = 1, device: Optional[torch.device] = None) -> torch.Tensor

Source: ll_stepnet/stepnet/latent_gan.py:285

Sample latent vectors from the trained generator.

Args: num_samples: Number of samples to generate. device: Target device (defaults to self.device).

Returns: Generated latent vectors [num_samples, latent_dim].

`to`

to(device: torch.device) -> LatentGAN

Source: ll_stepnet/stepnet/latent_gan.py:308

Move all models and state to the given device.

Args: device: Target torch device.

Returns: Self for chaining.

`class LatentGANConfig`

Source: ll_stepnet/stepnet/config.py:173

Configuration for the latent-space WGAN-GP.

Methods

`init`

__init__(latent_dim: int = 256, gen_hidden_dims: List[int] = (lambda: [512, 512])(), disc_hidden_dims: List[int] = (lambda: [512, 512])(), gp_lambda: float = 10.0, n_critic: int = 5, lr_gen: float = 0.0001, lr_disc: float = 0.0001) -> None

Source: ll_stepnet/stepnet/config.py

`class LatentGenerator(nn.Module)`

Source: ll_stepnet/stepnet/latent_gan.py:26

Maps noise vectors to fake latent codes.

Args: latent_dim: Dimension of both input noise and output latent. hidden_dims: Sizes of hidden layers in the MLP.

Methods

`init`

__init__(latent_dim: int = 256, hidden_dims: Optional[List[int]] = None) -> None

Source: ll_stepnet/stepnet/latent_gan.py:34

`forward`

forward(z_noise: torch.Tensor) -> torch.Tensor

Source: ll_stepnet/stepnet/latent_gan.py:63

Generate fake latent vectors from noise.

Args: z_noise: [batch_size, latent_dim] samples from N(0,I).

Returns: Fake latent vectors [batch_size, latent_dim].

`class MultiModalConditioner(nn.Module)`

Source: ll_stepnet/stepnet/conditioning.py:494

Fuses text and image conditioning for CAD generation.

Combines TextConditioner and ImageConditioner by concatenating their conditioning embeddings along the sequence dimension before passing through shared AdaptiveLayer blocks.

Args: text_encoder_name: Hugging Face text model identifier. image_encoder_name: Hugging Face vision model identifier. conditioning_dim: Shared conditioning dimension. freeze_encoders: Whether to freeze both pretrained encoders. num_adaptive_layers: Number of shared AdaptiveLayer blocks.

Methods

`init`

__init__(text_encoder_name: str = 'bert-base-uncased', image_encoder_name: str = 'facebook/dinov2-base', conditioning_dim: int = 1024, freeze_encoders: bool = True, num_adaptive_layers: int = 1, skip_cross_attention_blocks: int = 2) -> None

Source: ll_stepnet/stepnet/conditioning.py:509

`forward`

forward(hidden_states: torch.Tensor, text_input_ids: Optional[torch.Tensor] = None, text_attention_mask: Optional[torch.Tensor] = None, pixel_values: Optional[torch.Tensor] = None, block_index: Optional[int] = None) -> torch.Tensor

Source: ll_stepnet/stepnet/conditioning.py:563

Condition decoder hidden states on text and/or image features.

At least one of text_input_ids or pixel_values must be provided. Early decoder blocks skip cross-attention per Text2CAD design.

Args: hidden_states: [B, S, D] decoder hidden states. text_input_ids: [B, L_text] text token ids (optional). text_attention_mask: [B, L_text] text padding mask. pixel_values: [B, C, H, W] image tensors (optional). block_index: Optional zero-based decoder block index.

Returns: Conditioned hidden states [B, S, D].

`class ParameterHeads(nn.Module)`

Source: ll_stepnet/stepnet/output_heads.py:90

Independent linear heads, one per parameter slot (sixteen by default).

Each head maps the decoder hidden state to num_levels logits representing the quantised bin for that parameter.

Args: embed_dim: Dimension of the decoder hidden states. num_param_slots: Number of parameter slots (default 16). num_levels: Number of quantisation levels per parameter.

Methods

`init`

__init__(embed_dim: int = 256, num_param_slots: int = NUM_PARAM_SLOTS, num_levels: int = DEFAULT_QUANTIZATION_LEVELS) -> None

Source: ll_stepnet/stepnet/output_heads.py:102

`forward`

forward(hidden_states: torch.Tensor) -> List[torch.Tensor]

Source: ll_stepnet/stepnet/output_heads.py:115

Compute per-slot parameter logits.

Args: hidden_states: [batch, seq_len, embed_dim]

Returns: List of NUM_PARAM_SLOTS tensors, each [batch, seq_len, num_levels].

`class STEPAnnotatedOutput`

Source: ll_stepnet/stepnet/annotations.py:103

Combined reserialized output with structural annotations.

Attributes: summary: File-level structural summary (if generated). branches: Per-branch annotations for each root subtree. reserialized_text: The DFS-reserialized entity text. annotated_text: Full output combining summary, branches, and entities.

Methods

`format`

format() -> str

Source: ll_stepnet/stepnet/annotations.py:118

Format full annotated output.

Combines summary, branch annotations, and reserialized text into a single string separated by newlines.

Returns: Complete annotated output string.

`init`

__init__(summary: Optional[StructuralSummary] = None, branches: list[BranchAnnotation] = list(), reserialized_text: str = '', annotated_text: str = '') -> None

Source: ll_stepnet/stepnet/annotations.py

`class STEPAnnotationConfig`

Source: ll_stepnet/stepnet/config.py:145

Configuration for structural annotations.

Methods

`init`

__init__(include_file_summary: bool = True, include_branch_annotations: bool = True, max_types_shown: int = 5, verbose: bool = False) -> None

Source: ll_stepnet/stepnet/config.py

`class STEPCaptioningConfig`

Source: ll_stepnet/stepnet/config.py:83

Configuration for STEP captioning model.

Methods

`init`

__init__(vocab_size: int = 50000, decoder_vocab_size: int = 50000, output_dim: int = 1024, max_caption_length: int = 128) -> None

Source: ll_stepnet/stepnet/config.py

`class STEPClassificationConfig`

Source: ll_stepnet/stepnet/config.py:65

Configuration for STEP classification model.

Methods

`init`

__init__(vocab_size: int = 50000, num_classes: int = 10, output_dim: int = 1024, dropout: float = 0.1) -> None

Source: ll_stepnet/stepnet/config.py

`class STEPCollator`

Source: ll_stepnet/stepnet/data.py:150

Collate function for batching STEP data. Handles variable-length sequences and topology graphs.

Methods

`init`

__init__(pad_token_id: int = 0)

Source: ll_stepnet/stepnet/data.py:156

`class STEPDFSSerializer`

Source: ll_stepnet/stepnet/reserialization.py:181

DFS-based STEP entity serializer.

Reorders STEP entities by depth-first traversal of the reference graph, producing output where related entities appear contiguously. Each entity is expanded exactly once (branch pruning).

Args: config: Reserialization configuration. Uses defaults if None.

Methods

`init`

__init__(config: Optional[STEPReserializationConfig] = None)

Source: ll_stepnet/stepnet/reserialization.py:192

`serialize`

serialize(graph: STEPEntityGraph) -> STEPReserializedOutput

Source: ll_stepnet/stepnet/reserialization.py:195

Perform DFS reserialization of entity graph.

Algorithm:

1. Find roots using configured strategy
2. DFS traverse, visiting each entity exactly once
3. Append orphans (unreachable entities)
4. Optionally renumber IDs sequentially
5. Optionally normalize floats

Args: graph: Parsed STEP entity graph.

Returns: STEPReserializedOutput with reserialized text and metadata.
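The traversal, orphan, and renumbering steps of the algorithm above can be sketched on a plain adjacency dict. This is a simplified illustration under stated assumptions: roots are given rather than discovered, float normalization and max_depth are omitted, and the (entity_id, depth) pair layout is a guess at what traversal_order holds.

```python
from typing import Dict, List, Tuple


def dfs_reserialize(
    children: Dict[int, List[int]], roots: List[int]
) -> Tuple[List[Tuple[int, int]], List[int], Dict[int, int]]:
    """Return (traversal_order, orphans, id_mapping) for a reference graph."""
    visited = set()
    order: List[Tuple[int, int]] = []  # (entity_id, depth) pairs

    # Iterative DFS; children are pushed in reverse so they pop in listed order.
    stack = [(root, 0) for root in reversed(roots)]
    while stack:
        node, depth = stack.pop()
        if node in visited:
            continue  # already expanded once: prune this branch
        visited.add(node)
        order.append((node, depth))
        for child in reversed(children.get(node, [])):
            stack.append((child, depth + 1))

    # Entities never reached from any root are appended afterwards.
    orphans = sorted(n for n in children if n not in visited)

    # Sequential renumbering in output order, starting at 1.
    emitted = [n for n, _ in order] + orphans
    id_mapping = {old: new for new, old in enumerate(emitted, start=1)}
    return order, orphans, id_mapping


# Entity 4 is shared by 2 and 3; entities 5 and 6 only reference each other.
graph = {1: [2, 3], 2: [4], 3: [4], 4: [], 5: [6], 6: [5]}
order, orphans, id_mapping = dfs_reserialize(graph, roots=[1])
```

Entity 4 appears once, directly after its first parent, which is the contiguity property the serializer is described as providing.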

`class STEPDataset(Dataset)`

Source: ll_stepnet/stepnet/data.py:26

PyTorch Dataset for STEP files. Loads and preprocesses STEP files on the fly.

Methods

`init`

__init__(file_paths: List[str], labels: Optional[List] = None, tokenizer: Optional[STEPTokenizer] = None, max_length: int = 2048, use_topology: bool = True, use_reserialization: bool = False, use_annotations: bool = False, reserialization_config: Optional[STEPReserializationConfig] = None, annotation_config: Optional[STEPAnnotationConfig] = None)

Source: ll_stepnet/stepnet/data.py:32

Args:
- file_paths: List of paths to STEP files
- labels: Optional labels for supervised learning
- tokenizer: STEPTokenizer instance (creates default if None)
- max_length: Maximum sequence length
- use_topology: Whether to build topology graphs
- use_reserialization: Whether to apply DFS reserialization to entity text
- use_annotations: Whether to prepend structural annotations
- reserialization_config: Configuration for DFS reserialization
- annotation_config: Configuration for structural annotations

`class STEPEncoder(nn.Module)`

Source: ll_stepnet/stepnet/encoder.py:454

Complete STEP encoder combining all components.

Architecture:
1. Tokenizer: Text → Token IDs (external)
2. Transformer: Token IDs → Sequence features
3. Feature Extractor: Entities → Geometric features (external)
4. Graph Network: Topology → Structural features
5. Fusion: Combine all representations

The graph encoder accepts node features from either source natively:

- cadling’s TopologyGraph: 48-dim features (default)
- ll_stepnet’s STEPTopologyBuilder: 129-dim features (set graph_input_dim=129)

No adapters or conversion needed — pass topology_data directly from cadling’s TopologyGraph or ll_stepnet’s topology builder.

Methods

`init`

__init__(vocab_size: int = 50000, token_embed_dim: int = 256, graph_node_dim: int = 128, graph_input_dim: int = 48, output_dim: int = 1024, num_transformer_layers: int = 6, num_graph_layers: int = 3, expected_feature_dims: Optional[List[int]] = None)

Source: ll_stepnet/stepnet/encoder.py:473

`register_feature_projection`

register_feature_projection(in_dim: int) -> nn.Linear

Source: ll_stepnet/stepnet/encoder.py:523

Pre-register a projection layer for a given input feature dimension.

Call this before constructing the optimizer to ensure the projection parameters are included in model.parameters().

Args: in_dim: Input feature dimension to project from.

Returns: The nn.Linear projection layer (also stored in self._feature_projs).

`load_state_dict`

load_state_dict(state_dict, strict = True, assign = False)

Source: ll_stepnet/stepnet/encoder.py:543

Pre-create lazy projection layers found in checkpoint before loading.

_feature_projs entries are created lazily during forward(), so a freshly constructed model has an empty ModuleDict. Without this override, load_state_dict would either raise an error (strict=True) or silently drop the learned projection weights (strict=False).

`forward`

forward(token_ids: torch.Tensor, topology_data: Optional[Dict] = None) -> torch.Tensor

Source: ll_stepnet/stepnet/encoder.py:570

Args:
- token_ids: [batch_size, seq_len] from STEPTokenizer
- topology_data: Dict with topology from either cadling or ll_stepnet:
  - adjacency_matrix: [num_nodes, num_nodes] (dense, sparse COO, or sparse CSR)
  - node_features: [num_nodes, feature_dim] (48-dim from cadling TopologyGraph, or 129-dim from STEPTopologyBuilder)

Returns: encoded: [batch_size, output_dim] final encoding

`prepare_topology_data`

prepare_topology_data(topology_obj) -> Dict[str, torch.Tensor]

Source: ll_stepnet/stepnet/encoder.py:661

Convert a cadling TopologyGraph or raw dict to forward()-ready format.

Accepts either:

- A dict already in forward() format (with adjacency_matrix and node_features tensor values): returned as-is after ensuring values are tensors.
- A cadling TopologyGraph object (or any object with to_numpy_node_features() and to_edge_index() methods): numpy arrays are extracted, a sparse adjacency matrix is built, and the results are converted to tensors.

The adjacency matrix is returned as a sparse COO tensor to avoid O(N^2) memory on large B-Rep graphs.

This lets callers pass a cadling TopologyGraph directly without writing glue code:

    topo = cadling_item.topology_graph  # cadling TopologyGraph
    out = encoder(token_ids, STEPEncoder.prepare_topology_data(topo))

Args: topology_obj: Either a dict with adjacency_matrix and node_features keys, or an object with to_numpy_node_features() / to_edge_index() methods (e.g. cadling’s TopologyGraph).

Returns: Dict with adjacency_matrix (sparse COO [N, N]) and node_features [N, D] float tensors ready for forward().

`class STEPEncoderConfig`

Source: ll_stepnet/stepnet/config.py:52

Configuration for STEP encoder.

Methods

`init`

__init__(vocab_size: int = 50000, token_embed_dim: int = 256, graph_node_dim: int = 128, graph_input_dim: int = 48, output_dim: int = 1024, num_transformer_layers: int = 6, num_graph_layers: int = 3, dropout: float = 0.1) -> None

Source: ll_stepnet/stepnet/config.py

`class STEPEntityGraph`

Source: ll_stepnet/stepnet/reserialization.py:58

Graph of STEP entities with parent/child relationships.

Parses raw STEP text into an entity reference graph suitable for DFS traversal. Each entity node stores its forward references (children) and back references (parents). See also STEPFeatureExtractor.extract_entity_info and STEPFeatureExtractor.extract_references in features.py for similar single-entity parsing — the regex approach used here is optimised for bulk graph construction.

Methods

`parse`

parse(step_text: str) -> 'STEPEntityGraph'

Source: ll_stepnet/stepnet/reserialization.py:72

Parse STEP text into entity graph.

Extracts entity definitions and builds parent/child reference graph.

Args: step_text: Raw STEP text containing entity definitions.

Returns: Populated STEPEntityGraph instance.
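A regex-based parse of this kind might look like the sketch below. The patterns, the dict-of-dicts node layout, and the one-entity-per-line assumption are all illustrative choices, not the module's actual implementation; in particular, a `#` inside a quoted string would be miscounted as a reference here.

```python
import re
from typing import Dict

# Hypothetical patterns: "#<id>=<TYPE>(<params>);" and "#<id>" references.
ENTITY_RE = re.compile(r"#(\d+)\s*=\s*([A-Z0-9_]+)\s*\((.*)\)\s*;")
REF_RE = re.compile(r"#(\d+)")


def parse_entities(step_text: str) -> Dict[int, dict]:
    """Build a parent/child reference graph from one-entity-per-line text."""
    nodes: Dict[int, dict] = {}
    for line in step_text.splitlines():
        match = ENTITY_RE.match(line.strip())
        if not match:
            continue  # header lines, blanks, and anything unrecognised
        nodes[int(match.group(1))] = {
            "type": match.group(2),
            "children": [int(r) for r in REF_RE.findall(match.group(3))],
            "parents": [],
        }
    # Second pass: fill back references once every entity is known.
    for entity_id, node in nodes.items():
        for child in node["children"]:
            if child in nodes:
                nodes[child]["parents"].append(entity_id)
    return nodes


SAMPLE = """
#1=CARTESIAN_POINT('',(0.,0.,0.));
#2=DIRECTION('',(0.,0.,1.));
#3=AXIS2_PLACEMENT_3D('',#1,#2,$);
#4=CYLINDRICAL_SURFACE('',#3,5.0);
"""
nodes = parse_entities(SAMPLE)
```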

`roots_by_strategy`

roots_by_strategy(strategy: str = 'both') -> list[int]

Source: ll_stepnet/stepnet/reserialization.py:127

Find roots using specified strategy.

Strategies:
- no_incoming: Entities with no parent references
- type_hierarchy: Entities with highest B-Rep type weight
- both: Combine and deduplicate, no_incoming first

Args: strategy: Root-finding strategy name.

Returns: Ordered list of root entity IDs.
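The three strategies can be sketched as follows. The TYPE_WEIGHT table is invented for illustration, since the real B-Rep type weighting is not documented here, and the function operates on plain dicts rather than a STEPEntityGraph.

```python
from typing import Dict, List

# Hypothetical weights; the package's actual type hierarchy may differ.
TYPE_WEIGHT = {"MANIFOLD_SOLID_BREP": 3, "CLOSED_SHELL": 2, "ADVANCED_FACE": 1}


def find_roots(
    types: Dict[int, str], parents: Dict[int, List[int]], strategy: str = "both"
) -> List[int]:
    """Return root entity IDs for the named strategy."""
    no_incoming = sorted(i for i in types if not parents.get(i))
    top = max((TYPE_WEIGHT.get(t, 0) for t in types.values()), default=0)
    by_type = sorted(i for i, t in types.items() if TYPE_WEIGHT.get(t, 0) == top)
    if strategy == "no_incoming":
        return no_incoming
    if strategy == "type_hierarchy":
        return by_type
    if strategy == "both":
        # dict.fromkeys deduplicates while keeping first-seen order.
        return list(dict.fromkeys(no_incoming + by_type))
    raise ValueError(f"unknown strategy: {strategy!r}")


types = {10: "PRODUCT", 20: "MANIFOLD_SOLID_BREP", 30: "CLOSED_SHELL"}
parents = {10: [], 20: [10], 30: [20]}
roots = find_roots(types, parents)
```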

`init`

__init__(nodes: dict[int, STEPEntityNode] = dict()) -> None

Source: ll_stepnet/stepnet/reserialization.py

`class STEPEntityNode`

Source: ll_stepnet/stepnet/reserialization.py:47

A single STEP entity with its references.

Methods

`init`

__init__(entity_id: int, entity_type: str, parameters: str, children: list[int] = list(), parents: list[int] = list(), raw_line: str = '') -> None

Source: ll_stepnet/stepnet/reserialization.py

`class STEPFeatureExtractor`

Source: ll_stepnet/stepnet/features.py:11

Extracts geometric features from tokenized STEP content. Separate from tokenization: operates on parsed entities.

Methods

`init`

__init__()

Source: ll_stepnet/stepnet/features.py:17

Initialize feature extractor with parameter patterns.

`extract_entity_info`

extract_entity_info(entity_text: str) -> Dict

Source: ll_stepnet/stepnet/features.py:51

Parse a single STEP entity to extract basic info.

Args: entity_text: Single STEP entity string (e.g., “#31=CYLINDER(…);”)

Returns: Dictionary with entity_id, entity_type, parameters

`extract_numeric_params`

extract_numeric_params(params_text: str) -> List[float]

Source: ll_stepnet/stepnet/features.py:80

Extract all numeric values from parameter string.

Args: params_text: Parameter text from entity

Returns: List of numeric values
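A minimal sketch of numeric extraction is shown below. The regular expression is an assumption, not the pattern used in features.py; it skips digits that belong to `#` references but does not skip numbers inside quoted strings.

```python
import re
from typing import List

# Hypothetical pattern: optional sign, digits with optional fraction, optional
# exponent. The lookbehind rejects digits preceded by '#', a word char, or '.'.
NUMBER_RE = re.compile(r"(?<![#\w.])[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")


def numeric_params(params_text: str) -> List[float]:
    """Return every numeric literal in the parameter text as a float."""
    return [float(tok) for tok in NUMBER_RE.findall(params_text)]


values = numeric_params("'',#3,5.0,-1.5E-3,(0.,0.,1.)")
```

STEP writes reals such as `0.` without trailing digits, which Python's float() accepts, so no extra normalization is needed in this sketch.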

`extract_references`

extract_references(params_text: str) -> List[int]

Source: ll_stepnet/stepnet/features.py:103

Extract entity reference IDs (#123, #456, etc.).

Args: params_text: Parameter text from entity

Returns: List of referenced entity IDs

`extract_geometric_features`

extract_geometric_features(entity_text: str) -> Dict

Source: ll_stepnet/stepnet/features.py:117

Extract complete geometric features from an entity.

Args: entity_text: STEP entity text

Returns: Dictionary with:
- entity_id: int
- entity_type: str
- numeric_params: List[float]
- references: List[int]
- named_params: Dict (if known pattern)

`extract_features_from_chunk`

extract_features_from_chunk(chunk_text: str) -> List[Dict]

Source: ll_stepnet/stepnet/features.py:152

Extract features from a chunk of STEP text (multiple entities).

Args: chunk_text: Raw STEP text with multiple entities

Returns: List of feature dictionaries

`class STEPForCaptioning(nn.Module)`

Source: ll_stepnet/stepnet/tasks.py:14

STEP encoder with caption generation head. Predicts: Natural language description of CAD part.

Use case: “This is a mounting bracket with 4 bolt holes”

Methods

`init`

__init__(vocab_size: int = 50000, decoder_vocab_size: int = 50000, output_dim: int = 1024, max_caption_length: int = 128)

Source: ll_stepnet/stepnet/tasks.py:22

`forward`

forward(token_ids: torch.Tensor, caption_ids: Optional[torch.Tensor] = None, topology_data: Optional[Dict] = None) -> torch.Tensor

Source: ll_stepnet/stepnet/tasks.py:46

Args:
- token_ids: [batch, seq_len] STEP tokens
- caption_ids: [batch, caption_len] target captions (for training)
- topology_data: Optional topology dict

Returns: logits: [batch, caption_len, vocab_size] - caption predictions

`generate`

generate(token_ids: torch.Tensor, topology_data: Optional[Dict] = None, max_length: int = 64, num_beams: int = 4, temperature: float = 1.0, eos_token_id: int = 2, pad_token_id: int = 0, bos_token_id: int = 1) -> torch.Tensor

Source: ll_stepnet/stepnet/tasks.py:78

Generate captions using beam search decoding.

Args:
- token_ids: [batch, seq_len] STEP tokens
- topology_data: Optional topology dict
- max_length: Maximum caption length to generate
- num_beams: Number of beams for beam search (1 = greedy)
- temperature: Sampling temperature (lower = more deterministic)
- eos_token_id: End of sequence token ID
- pad_token_id: Padding token ID
- bos_token_id: Beginning of sequence token ID

Returns: generated_ids: [batch, generated_len] generated caption token IDs
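Beam search itself can be sketched independently of the model. The version below works on a single sequence, takes a hypothetical step_fn that returns next-token log-probabilities, and omits temperature, padding, and length normalization, so it illustrates the decoding idea rather than this method's exact behaviour.

```python
import math
from typing import Callable, Dict, List, Tuple

StepFn = Callable[[List[int]], Dict[int, float]]


def beam_search(
    step_fn: StepFn,
    bos_token_id: int = 1,
    eos_token_id: int = 2,
    num_beams: int = 4,
    max_length: int = 64,
) -> List[int]:
    """Keep the num_beams highest-scoring prefixes at every step."""
    beams: List[Tuple[List[int], float]] = [([bos_token_id], 0.0)]
    for _ in range(max_length - 1):
        candidates = []
        for tokens, score in beams:
            if tokens[-1] == eos_token_id:
                candidates.append((tokens, score))  # finished beams carry over
                continue
            for token, logp in step_fn(tokens).items():
                candidates.append((tokens + [token], score + logp))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:num_beams]
        if all(tokens[-1] == eos_token_id for tokens, _ in beams):
            break
    return beams[0][0]


# Toy next-token distribution in which the greedy first choice is a trap.
TABLE = {
    (1,): {3: 0.6, 4: 0.4},
    (1, 3): {2: 0.55, 4: 0.45},
    (1, 4): {2: 0.9, 3: 0.1},
}


def toy_step(tokens: List[int]) -> Dict[int, float]:
    probs = TABLE.get(tuple(tokens), {2: 1.0})
    return {tok: math.log(p) for tok, p in probs.items()}


greedy = beam_search(toy_step, num_beams=1)
beamed = beam_search(toy_step, num_beams=4)
```

With one beam the search commits to token 3 and ends with probability 0.33, while four beams recover the 0.36 path through token 4, which is the case that num_beams=1 (greedy) cannot find.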

`class STEPForCausalLM(nn.Module)`

Source: ll_stepnet/stepnet/pretrain.py:62

Autoregressive (GPT-style) token prediction for STEP files. Predicts next token given previous tokens.

STEP-aware architecture:

- Token sequence modeling with causal attention
- Topology/geometry understanding via graph encoder
- Fusion of both modalities

Trains on raw STEP files; no labels are required.

Methods

`init`

__init__(vocab_size: int = 50000, embed_dim: int = 512, num_layers: int = 12, num_heads: int = 8, max_length: int = 4096, dropout: float = 0.1, graph_node_dim: int = 128, num_graph_layers: int = 3)

Source: ll_stepnet/stepnet/pretrain.py:75

`forward`

forward(input_ids: torch.Tensor, attention_mask: Optional[torch.Tensor] = None, topology_data: Optional[Dict] = None, labels: Optional[torch.Tensor] = None) -> Dict[str, torch.Tensor]

Source: ll_stepnet/stepnet/pretrain.py:118

Args:
- input_ids: [batch_size, seq_len] tokenized STEP content
- attention_mask: [batch_size, seq_len] mask for padding
- topology_data: Optional dict with:
  - adjacency_matrix: [num_nodes, num_nodes]
  - node_features: [num_nodes, feature_dim]
- labels: [batch_size, seq_len] next token targets (optional, for training)

Returns: Dictionary with:
- logits: [batch_size, seq_len, vocab_size]
- loss: scalar (if labels provided)

`generate`

generate(input_ids: torch.Tensor, max_new_tokens: int = 100, temperature: float = 1.0, top_k: Optional[int] = 50) -> torch.Tensor

Source: ll_stepnet/stepnet/pretrain.py:173

Generate STEP tokens autoregressively.

Args:
- input_ids: [batch_size, seq_len] prompt tokens
- max_new_tokens: How many tokens to generate
- temperature: Sampling temperature (higher = more random)
- top_k: Only sample from top K tokens

Returns: Generated token IDs [batch_size, seq_len + max_new_tokens]
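The interaction of temperature and top_k can be sketched for a single decoding step. This is the usual top-k sampling recipe written with the standard library; the function name and the rng parameter are assumptions, and the model's own implementation operates on batched tensors.

```python
import math
import random
from typing import List, Optional


def sample_top_k(
    logits: List[float],
    temperature: float = 1.0,
    top_k: Optional[int] = 50,
    rng: Optional[random.Random] = None,
) -> int:
    """Sample one token index from temperature-scaled, top-k-filtered logits."""
    if rng is None:
        rng = random.Random()
    scaled = [x / temperature for x in logits]  # temperature must be > 0
    indices = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    if top_k is not None:
        indices = indices[:top_k]  # keep only the K most likely tokens
    peak = scaled[indices[0]]
    # Unnormalised softmax weights; subtracting the peak avoids overflow.
    weights = [math.exp(scaled[i] - peak) for i in indices]
    return rng.choices(indices, weights=weights, k=1)[0]


rng = random.Random(0)
argmax_pick = sample_top_k([0.1, 2.0, -1.0], top_k=1, rng=rng)
picks = {sample_top_k([0.1, 2.0, -1.0], temperature=0.7, top_k=2, rng=rng) for _ in range(200)}
```

Setting top_k=1 reduces sampling to argmax, and lowering the temperature sharpens the distribution over whichever tokens survive the filter.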

`class STEPForClassification(nn.Module)`

Source: ll_stepnet/stepnet/tasks.py:193

STEP encoder with classification head. Predicts: Part category.

Use case: “bracket”, “housing”, “shaft”, “gear”, etc.

Methods

`init`

__init__(vocab_size: int = 50000, num_classes: int = 100, output_dim: int = 1024)

Source: ll_stepnet/stepnet/tasks.py:201

`forward`

forward(token_ids: torch.Tensor, topology_data: Optional[Dict] = None) -> torch.Tensor

Source: ll_stepnet/stepnet/tasks.py:218

Args:
- token_ids: [batch, seq_len]
- topology_data: Optional topology

Returns: logits: [batch, num_classes]

`class STEPForHybridLM(nn.Module)`

Source: ll_stepnet/stepnet/pretrain.py:335

Hybrid pre-training model that combines causal (next-token) and masked-token prediction.

Both objectives share a single graph encoder so topology understanding learned from one objective transfers to the other.

Methods

`init`

__init__(vocab_size: int = 50000, embed_dim: int = 512, num_layers: int = 12, num_heads: int = 8, max_length: int = 4096, dropout: float = 0.1, graph_node_dim: int = 128, num_graph_layers: int = 3)

Source: ll_stepnet/stepnet/pretrain.py:344

`forward`

forward(input_ids: torch.Tensor, attention_mask: Optional[torch.Tensor] = None, topology_data: Optional[Dict] = None, labels: Optional[torch.Tensor] = None, masked_input_ids: Optional[torch.Tensor] = None, masked_labels: Optional[torch.Tensor] = None) -> Dict[str, torch.Tensor]

Source: ll_stepnet/stepnet/pretrain.py:385

Train both objectives simultaneously with STEP topology awareness.

Args:
- input_ids: For causal LM
- attention_mask: Attention mask
- topology_data: STEP topology (adjacency + node features)
- labels: Next token labels for causal LM
- masked_input_ids: For masked LM
- masked_labels: Original tokens for masked LM

`class STEPForMaskedLM(nn.Module)`

Source: ll_stepnet/stepnet/pretrain.py:221

Masked language modeling (BERT-style) for STEP files. Predict masked tokens from context.

STEP-aware architecture:

- Token sequence modeling with bidirectional attention (can use STEPTransformerEncoder)
- Topology/geometry understanding via graph encoder
- Fusion of both modalities

Trains on raw STEP files; no labels are required.

Methods

`init`

__init__(vocab_size: int = 50000, embed_dim: int = 512, num_layers: int = 12, num_heads: int = 8, max_length: int = 4096, dropout: float = 0.1, graph_node_dim: int = 128, num_graph_layers: int = 3)

Source: ll_stepnet/stepnet/pretrain.py:234

`forward`

forward(input_ids: torch.Tensor, attention_mask: Optional[torch.Tensor] = None, topology_data: Optional[Dict] = None, labels: Optional[torch.Tensor] = None) -> Dict[str, torch.Tensor]

Source: ll_stepnet/stepnet/pretrain.py:286

Args:
- input_ids: [batch_size, seq_len] tokenized STEP with [MASK] tokens
- attention_mask: [batch_size, seq_len] mask for padding
- topology_data: Optional dict with:
  - adjacency_matrix: [num_nodes, num_nodes]
  - node_features: [num_nodes, feature_dim]
- labels: [batch_size, seq_len] original tokens (before masking)

Returns: Dictionary with:
- logits: [batch_size, seq_len, vocab_size]
- loss: scalar (if labels provided)

`class STEPForPropertyPrediction(nn.Module)`

Source: ll_stepnet/stepnet/tasks.py:236

STEP encoder with regression head. Predicts: Physical properties (volume, mass, surface area, etc.)

Use case: Predict part weight, bounding box dimensions, etc.

Methods

`init`

__init__(vocab_size: int = 50000, num_properties: int = 10, output_dim: int = 1024)

Source: ll_stepnet/stepnet/tasks.py:244

`forward`

forward(token_ids: torch.Tensor, topology_data: Optional[Dict] = None) -> torch.Tensor

Source: ll_stepnet/stepnet/tasks.py:261

Args:
- token_ids: [batch, seq_len]
- topology_data: Optional topology

Returns: properties: [batch, num_properties]

`class STEPForQA(nn.Module)`

Source: ll_stepnet/stepnet/tasks.py:325

STEP encoder for question answering. Predicts: Answer to questions about the CAD part.

Use case: Q: “How many holes does this part have?” A: “4”

Methods

`init`

__init__(step_vocab_size: int = 50000, text_vocab_size: int = 50000, output_dim: int = 1024)

Source: ll_stepnet/stepnet/tasks.py:335

`forward`

forward(step_token_ids: torch.Tensor, question_token_ids: torch.Tensor, answer_token_ids: Optional[torch.Tensor] = None, topology_data: Optional[Dict] = None) -> torch.Tensor

Source: ll_stepnet/stepnet/tasks.py:362

Args:
- step_token_ids: [batch, step_seq_len]
- question_token_ids: [batch, q_seq_len]
- answer_token_ids: [batch, a_seq_len] (for training)
- topology_data: Optional

Returns: logits: [batch, a_seq_len, vocab_size]

`generate`

generate(step_token_ids: torch.Tensor, question_token_ids: torch.Tensor, topology_data: Optional[Dict] = None, max_length: int = 64, num_beams: int = 4, temperature: float = 1.0, eos_token_id: int = 2, pad_token_id: int = 0, bos_token_id: int = 1) -> torch.Tensor

Source: ll_stepnet/stepnet/tasks.py:402

Generate answers using beam search decoding.

Args:
- step_token_ids: [batch, step_seq_len] STEP tokens
- question_token_ids: [batch, q_seq_len] question tokens
- topology_data: Optional topology dict
- max_length: Maximum answer length to generate
- num_beams: Number of beams for beam search (1 = greedy)
- temperature: Sampling temperature (lower = more deterministic)
- eos_token_id: End of sequence token ID
- pad_token_id: Padding token ID
- bos_token_id: Beginning of sequence token ID

Returns: generated_ids: [batch, generated_len] generated answer token IDs

`class STEPForSimilarity(nn.Module)`

Source: ll_stepnet/stepnet/tasks.py:279

STEP encoder for similarity/retrieval tasks. Predicts: Embedding for similar part search.

Use case: “Find similar CAD parts in database”

Methods

`init`

__init__(vocab_size: int = 50000, embedding_dim: int = 512)

Source: ll_stepnet/stepnet/tasks.py:287

`forward`

forward(token_ids: torch.Tensor, topology_data: Optional[Dict] = None) -> torch.Tensor

Source: ll_stepnet/stepnet/tasks.py:303

Args:
- token_ids: [batch, seq_len]
- topology_data: Optional topology

Returns: embeddings: [batch, embedding_dim] - L2 normalized
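Because the returned embeddings are L2-normalized, a dot product between two of them equals their cosine similarity. The sketch below shows that relationship on plain Python lists; the helper names are illustrative and not part of the package.

```python
import math
from typing import List


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit Euclidean length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Dot product of the two unit-length vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


unit = l2_normalize([3.0, 4.0])
same = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

For retrieval this means nearest-neighbour search over the stored embeddings can use inner product directly.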

`class STEPGraphEncoder(nn.Module)`

Source: ll_stepnet/stepnet/encoder.py:350

Graph neural network for STEP/B-Rep topology.

Processes entity reference graphs from either:

- ll_stepnet’s STEPTopologyBuilder (129-dim features: 128 numeric + 1 hash)
- cadling’s TopologyGraph (48-dim features, native format)

The input_dim parameter controls which format is accepted. When working with cadling data, set input_dim=48 to accept cadling’s native topology features directly with no conversion.

Methods

`init`

__init__(input_dim: int = 48, node_dim: int = 128, edge_dim: int = 64, num_layers: int = 3)

Source: ll_stepnet/stepnet/encoder.py:363

`forward`

forward(node_features: torch.Tensor, adjacency_matrix: torch.Tensor) -> torch.Tensor

Source: ll_stepnet/stepnet/encoder.py:397

Args:
- node_features: [num_nodes, input_dim]
- adjacency_matrix: [num_nodes, num_nodes]; dense, sparse COO, or sparse CSR. Sparse inputs avoid O(N^2) memory for large B-Rep graphs.

Returns: updated_features: [num_nodes, node_dim]

`class STEPLearningCurveGenerator`

Source: ll_stepnet/stepnet/data_requirements.py:128

Generate learning curves for STEP models to determine data requirements.

This class trains models on varying dataset sizes and measures performance to establish empirical scaling relationships.

Methods

`init`

__init__(model_class: type, model_kwargs: Dict, train_kwargs: Optional[Dict] = None, device: str = 'cuda' if torch.cuda.is_available() else 'cpu')

Source: ll_stepnet/stepnet/data_requirements.py:136

Args:
- model_class: STEP model class (e.g., STEPForClassification)
- model_kwargs: Model initialization arguments
- train_kwargs: Training configuration (epochs, lr, etc.)
- device: Device to train on

`generate_learning_curve`

generate_learning_curve(train_dataset: STEPDataset, val_dataset: STEPDataset, sample_fractions: List[float], n_iterations: int = 3, save_dir: Optional[str] = None) -> Dict[str, np.ndarray]

Source: ll_stepnet/stepnet/data_requirements.py:170

Generate learning curve by training on varying dataset sizes.

Args:
- train_dataset: Full training dataset
- val_dataset: Validation dataset
- sample_fractions: List of fractions of training data to use (e.g., [0.1, 0.2, 0.5, 1.0])
- n_iterations: Number of training runs per sample size
- save_dir: Optional directory to save checkpoints

Returns: Dictionary with learning curve data
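How sample_fractions and n_iterations might combine into a set of training subsets is sketched below. The seeded random subsampling, the rounding rule, and the helper name are assumptions; the source does not say how subsets are drawn.

```python
import random
from typing import Dict, List


def subsample_indices(n_total: int, fraction: float, seed: int) -> List[int]:
    """Pick a reproducible random subset covering `fraction` of the dataset."""
    rng = random.Random(seed)
    k = max(1, round(n_total * fraction))  # always keep at least one sample
    return sorted(rng.sample(range(n_total), k))


sample_fractions = [0.1, 0.2, 0.5, 1.0]
n_iterations = 3

# One subset per (fraction, iteration); a different seed per repeat.
plan: Dict[float, List[int]] = {
    frac: [len(subsample_indices(200, frac, seed=i)) for i in range(n_iterations)]
    for frac in sample_fractions
}
```

Repeating each size with different seeds gives a spread of validation losses per dataset size, which is what a later power-law fit would consume.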

`class STEPPropertyPredictionConfig`

Source: ll_stepnet/stepnet/config.py:74

Configuration for STEP property prediction model.

Methods

`init`

__init__(vocab_size: int = 50000, num_properties: int = 6, output_dim: int = 1024, dropout: float = 0.1) -> None

Source: ll_stepnet/stepnet/config.py

`class STEPQAConfig`

Source: ll_stepnet/stepnet/config.py:99

Configuration for STEP question answering model.

Methods

`init`

__init__(step_vocab_size: int = 50000, text_vocab_size: int = 50000, output_dim: int = 1024) -> None

Source: ll_stepnet/stepnet/config.py

`class STEPReserializationConfig`

Source: ll_stepnet/stepnet/config.py:134

Configuration for DFS reserialization of STEP files.

Methods

`init`

__init__(max_depth: int = 50, float_precision: int = 6, normalize_floats: bool = True, renumber_ids: bool = True, root_strategy: str = 'both', include_orphans: bool = True) -> None

Source: ll_stepnet/stepnet/config.py

`class STEPReserializedOutput`

Source: ll_stepnet/stepnet/reserialization.py:170

Output of DFS reserialization.

Methods

`init`

__init__(text: str, traversal_order: list[tuple[int, int]], entity_count: int, orphan_count: int, max_depth_reached: int, id_mapping: dict[int, int] = dict()) -> None

Source: ll_stepnet/stepnet/reserialization.py

`class STEPScalingLawAnalyzer`

Source: ll_stepnet/stepnet/data_requirements.py:416

Analyze scaling laws for STEP models and predict data requirements.

Fits power law relationships to learning curve data and extrapolates to estimate required dataset sizes for target performance.

Methods

`init`

__init__()

Source: ll_stepnet/stepnet/data_requirements.py:424

`fit_power_law`

fit_power_law(sample_sizes: np.ndarray, losses: np.ndarray, law_type: str = 'openai') -> Dict[str, float]

Source: ll_stepnet/stepnet/data_requirements.py:428

Fit power law to learning curve data.

Args: sample_sizes: Array of dataset sizes. losses: Corresponding validation losses. law_type: 'openai' for L(D) = (D_c / D)^alpha_D, or 'standard' for Error = a * n^(-b) + c.

Returns: Dictionary of fitted parameters
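
For the 'openai' form, taking logarithms gives log L = alpha_D * (log D_c - log D), which is a straight line in log-log space. The sketch below fits that line by ordinary least squares using only the standard library. It is an illustration under assumptions: the package may well use a nonlinear optimizer instead, and the dictionary keys `alpha_D` and `D_c` are chosen here for readability.

```python
import math
from typing import Dict, Sequence


def fit_openai_power_law(sample_sizes: Sequence[float], losses: Sequence[float]) -> Dict[str, float]:
    """Least-squares fit of log L = alpha_D * (log D_c - log D)."""
    xs = [math.log(d) for d in sample_sizes]
    ys = [math.log(loss) for loss in losses]
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    intercept = y_mean - slope * x_mean
    alpha_d = -slope                     # the log-log slope is -alpha_D
    d_c = math.exp(intercept / alpha_d)  # the intercept equals alpha_D * log D_c
    return {"alpha_D": alpha_d, "D_c": d_c}


# Synthetic curve generated from known parameters, then recovered by the fit.
sizes = [100, 1_000, 10_000, 100_000]
true_alpha, true_dc = 0.3, 5.0e4
losses = [(true_dc / d) ** true_alpha for d in sizes]
params = fit_openai_power_law(sizes, losses)
```

The 'standard' form has an additive offset c, so it does not linearize this way and needs an iterative fit.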

`predict_required_samples`

predict_required_samples(target_loss: float, current_sizes: Optional[np.ndarray] = None, current_losses: Optional[np.ndarray] = None) -> int

Source: ll_stepnet/stepnet/data_requirements.py:501

Predict number of samples needed to achieve target loss.

Args: target_loss: Desired target loss current_sizes: Optional array of current dataset sizes (for fitting) current_losses: Optional array of current losses (for fitting)

Returns: Estimated required sample size
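
Once a law of the form L(D) = (D_c / D)^alpha_D has been fitted, the required dataset size follows by solving for D: D = D_c * target_loss^(-1 / alpha_D). A minimal sketch of that inversion, assuming the two fitted parameters are passed in directly (the function and argument names are illustrative):

```python
import math


def required_samples(target_loss: float, d_c: float, alpha_d: float) -> int:
    """Solve (D_c / D) ** alpha_D = target_loss for D and round up."""
    if target_loss <= 0:
        raise ValueError("target_loss must be positive for a pure power law")
    return math.ceil(d_c * target_loss ** (-1.0 / alpha_d))


# With alpha_D = 0.5, halving the loss requires four times the data.
n_at_one = required_samples(1.0, d_c=50_000, alpha_d=0.5)
n_at_half = required_samples(0.5, d_c=50_000, alpha_d=0.5)
```

For a law with an irreducible offset c, any target at or below c is unreachable, so an implementation has to reject or clamp such targets.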

`extrapolate_performance`

extrapolate_performance(target_size: int, current_sizes: Optional[np.ndarray] = None, current_losses: Optional[np.ndarray] = None) -> float

Source: ll_stepnet/stepnet/data_requirements.py:559

Predict performance at a given dataset size.

Args: target_size: Dataset size to predict for current_sizes: Optional current dataset sizes (for fitting) current_losses: Optional current losses (for fitting)

Returns: Predicted loss at target_size

`class STEPSimilarityConfig`

Source: ll_stepnet/stepnet/config.py:92

Configuration for STEP similarity model.

Methods

`init`

__init__(vocab_size: int = 50000, embedding_dim: int = 512) -> None

Source: ll_stepnet/stepnet/config.py

`class STEPStructuralAnnotator`

Source: ll_stepnet/stepnet/annotations.py:139

Generates structural annotations for STEP entity graphs.

Analyzes DFS-reserialized output and the underlying entity graph to produce human/LLM-readable summaries of the file structure.

Args: config: Annotation configuration. Uses defaults if None.

Methods

`init`

__init__(config: Optional[STEPAnnotationConfig] = None)

Source: ll_stepnet/stepnet/annotations.py:149

`annotate`

annotate(graph: STEPEntityGraph, reserialized: STEPReserializedOutput) -> STEPAnnotatedOutput

Source: ll_stepnet/stepnet/annotations.py:152

Generate annotations for reserialized output.

Args: graph: The entity graph (pre-reserialization). reserialized: The DFS reserialization output.

Returns: STEPAnnotatedOutput with summary, branch annotations, and full text.

`class STEPTokenizer`

Source: ll_stepnet/stepnet/tokenizer.py:14

Standard tokenizer for STEP files. Only handles text → token IDs conversion. No feature extraction or graph building.

Methods

`init`

__init__(vocab_size: int = 50000, config: Optional[STEPTokenizerConfig] = None)

Source: ll_stepnet/stepnet/tokenizer.py:21

Args: vocab_size: Maximum vocabulary size config: Optional STEPTokenizerConfig instance. When provided, its values override the vocab_size and max_length parameters.

`tokenize`

tokenize(text: str) -> List[str]

Source: ll_stepnet/stepnet/tokenizer.py:95

Split STEP text into tokens.

Args: text: Raw STEP text

Returns: List of token strings
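
To make the text-to-token step concrete, here is a small regex-based splitter for a single STEP entity line. The token pattern is an assumption made for illustration; the actual rules used by STEPTokenizer are defined in the package source and may differ.

```python
import re
from typing import List

# Hypothetical token pattern: entity references, quoted strings, numbers,
# identifiers/keywords, then any single punctuation character.
_TOKEN_RE = re.compile(
    r"#\d+"                              # entity reference, e.g. #12
    r"|'[^']*'"                          # quoted string
    r"|[-+]?\d+\.?\d*(?:[eE][-+]?\d+)?"  # integer or real, e.g. 0. or 1.5E-3
    r"|[A-Za-z_][A-Za-z0-9_]*"           # entity type or keyword
    r"|[^\sA-Za-z0-9_]"                  # punctuation such as = ( ) , ;
)


def tokenize_step(text: str) -> List[str]:
    """Split STEP text into reference, string, number, name, and punctuation tokens."""
    return _TOKEN_RE.findall(text)


tokens = tokenize_step("#12=CARTESIAN_POINT('',(0.,1.5,-2.));")
```

Keeping references such as #12 as single tokens preserves the links between entities that the topology builder later turns into graph edges.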

`encode`

encode(text: str) -> List[int]

Source: ll_stepnet/stepnet/tokenizer.py:110

Encode STEP text to token IDs.

Args: text: Raw STEP text

Returns: List of token IDs

`decode`

decode(token_ids: List[int]) -> str

Source: ll_stepnet/stepnet/tokenizer.py:132

Decode token IDs back to text (approximate).

Args: token_ids: List of token IDs

Returns: Decoded text

`batch_encode`

batch_encode(texts: List[str], add_special_tokens: bool = True) -> Dict[str, List[List[int]]]

Source: ll_stepnet/stepnet/tokenizer.py:151

Batch encode multiple texts.

Args: texts: List of STEP text strings add_special_tokens: Add CLS and SEP tokens

Returns: Dictionary with token_ids

`class STEPTopologyBuilder`

Source: ll_stepnet/stepnet/topology.py:18

Builds topological graphs from STEP entity relationships. Separate from tokenization and feature extraction.

Methods

`init`

__init__()

Source: ll_stepnet/stepnet/topology.py:24

Initialize topology builder.

`build_reference_graph`

build_reference_graph(features_list: List[Dict]) -> Dict

Source: ll_stepnet/stepnet/topology.py:28

Build entity reference graph from extracted features.

Args: features_list: List of feature dicts from STEPFeatureExtractor

Returns: Dictionary with:
- adjacency_dict: Dict[int, List[int]], mapping entity_id → referenced_ids
- edge_list: List[Tuple[int, int]], list of (from, to) edges
- num_nodes: int
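
A minimal sketch of how such a graph can be assembled, assuming each feature dict carries 'entity_id' and 'references' keys. Dropping references to entities that are not in the list is an assumption of this sketch, not documented behavior.

```python
from typing import Dict, List, Tuple


def build_reference_graph_sketch(features_list: List[Dict]) -> Dict:
    """Collect entity -> referenced-entity edges from feature dicts."""
    known_ids = {feat["entity_id"] for feat in features_list}
    adjacency: Dict[int, List[int]] = {}
    edges: List[Tuple[int, int]] = []
    for feat in features_list:
        src = feat["entity_id"]
        # Assumption: references to entities missing from the list are dropped.
        targets = [t for t in feat.get("references", []) if t in known_ids]
        adjacency[src] = targets
        edges.extend((src, t) for t in targets)
    return {"adjacency_dict": adjacency, "edge_list": edges, "num_nodes": len(known_ids)}


features = [
    {"entity_id": 1, "entity_type": "ADVANCED_FACE", "references": [2, 3]},
    {"entity_id": 2, "entity_type": "FACE_OUTER_BOUND", "references": [3]},
    {"entity_id": 3, "entity_type": "EDGE_LOOP", "references": [99]},  # 99 is undefined
]
graph = build_reference_graph_sketch(features)
```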

`build_adjacency_matrix`

build_adjacency_matrix(reference_graph: Dict) -> torch.Tensor

Source: ll_stepnet/stepnet/topology.py:74

Convert reference graph to sparse adjacency matrix.

Returns a sparse COO tensor to avoid O(N^2) memory on large B-Rep graphs.

Args: reference_graph: Output from build_reference_graph

Returns: Sparse COO adjacency matrix [N, N] where N = num_nodes

`build_edge_index`

build_edge_index(reference_graph: Dict) -> torch.Tensor

Source: ll_stepnet/stepnet/topology.py:114

Build edge index in PyTorch Geometric format.

Args: reference_graph: Output from build_reference_graph

Returns: Edge index tensor [2, num_edges] for PyG
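
The [2, num_edges] layout stores source indices in the first row and target indices in the second, with entity IDs remapped to contiguous node indices. The sketch below shows that remapping with plain lists; the helper name and the sorted-ID ordering are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def edge_index_sketch(edge_list: List[Tuple[int, int]]) -> Tuple[List[List[int]], Dict[int, int]]:
    """Map entity IDs to contiguous indices and return a [2, num_edges] layout."""
    node_ids = sorted({node for edge in edge_list for node in edge})
    id_to_idx = {node_id: idx for idx, node_id in enumerate(node_ids)}
    sources = [id_to_idx[src] for src, _ in edge_list]
    targets = [id_to_idx[dst] for _, dst in edge_list]
    return [sources, targets], id_to_idx


edge_index, id_to_idx = edge_index_sketch([(10, 20), (10, 35), (20, 35)])
```

Wrapping the nested list in `torch.tensor(edge_index, dtype=torch.long)` yields a tensor of the shape that PyTorch Geometric layers expect.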

`compute_node_degrees`

compute_node_degrees(reference_graph: Dict) -> Dict[int, Dict[str, int]]

Source: ll_stepnet/stepnet/topology.py:140

Compute in-degree and out-degree for each node.

Args: reference_graph: Output from build_reference_graph

Returns: Dict mapping node_id → {'in_degree': int, 'out_degree': int}

`identify_topology_types`

identify_topology_types(features_list: List[Dict]) -> Dict[str, List[int]]

Source: ll_stepnet/stepnet/topology.py:158

Categorize entities by topological role.

Args: features_list: List of feature dicts

Returns: Dict mapping category → list of entity IDs

`build_node_features`

build_node_features(features_list: List[Dict], reference_graph: Dict) -> torch.Tensor

Source: ll_stepnet/stepnet/topology.py:200

Build node feature matrix from extracted features.

Args: features_list: List of feature dicts from STEPFeatureExtractor reference_graph: Output from build_reference_graph

Returns: Node features tensor [num_nodes, feature_dim]

`build_complete_topology`

build_complete_topology(features_list: List[Dict], compact: bool = True) -> Dict

Source: ll_stepnet/stepnet/topology.py:242

Build complete topology representation.

Args: features_list: List of feature dicts from STEPFeatureExtractor. compact: If True (default), use build_compact_node_features() to produce 48-dim features in cadling's native layout. This matches the default input_dim=48 of STEPGraphEncoder. Pass compact=False to use the legacy build_node_features() (129-dim: 128 numeric + 1 type hash).

Returns: Complete topology dictionary with:
- reference_graph
- adjacency_matrix
- edge_index
- node_degrees
- topology_types
- node_features
- num_nodes
- num_edges

`build_coedge_structure`

build_coedge_structure(features_list: List[Dict]) -> Dict

Source: ll_stepnet/stepnet/topology.py:288

Build coedge adjacency structure from STEP topology.

In B-Rep topology, each topological edge is shared by (at most) two adjacent faces. Each such sharing creates two oriented coedges — one per face. This method reconstructs the coedge-level graph with next/prev/mate pointers from the STEP entity hierarchy.

The coedge structure is the primary input format for the BRepNet architecture (see cadling.models.segmentation.architectures.brep_net).

Args: features_list: List of feature dicts from STEPFeatureExtractor. Each dict should have 'entity_id', 'entity_type', 'references', and optionally 'numeric_params'.

Returns: Dictionary with:
- coedge_features: torch.Tensor [num_coedges, feature_dim]
- next_indices: torch.Tensor [num_coedges]
- prev_indices: torch.Tensor [num_coedges]
- mate_indices: torch.Tensor [num_coedges]
- face_indices: torch.Tensor [num_coedges]
- num_coedges: int
- num_faces: int
- face_entity_ids: List[int], entity IDs of faces
- edge_entity_ids: List[int], entity IDs of edges
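
The next/prev/mate pointers are easiest to see on a tiny example. The sketch below takes each face as an ordered loop of edge IDs, creates one coedge per (face, edge) use, and links them. It is an illustration under assumptions: the input format is hypothetical, and treating a boundary coedge as its own mate is a convention chosen here, not confirmed package behavior.

```python
from typing import Dict, List


def coedge_pointers(face_loops: Dict[int, List[int]]) -> Dict[str, object]:
    """Build next/prev/mate/face index arrays from per-face edge loops."""
    coedge_edge: List[int] = []  # edge ID carried by each coedge
    face_indices: List[int] = []
    next_indices: List[int] = []
    prev_indices: List[int] = []
    for face_idx, loop in enumerate(face_loops.values()):
        start = len(coedge_edge)
        for pos, edge_id in enumerate(loop):
            coedge_edge.append(edge_id)
            face_indices.append(face_idx)
            next_indices.append(start + (pos + 1) % len(loop))
            prev_indices.append(start + (pos - 1) % len(loop))
    # Mate: the other coedge using the same edge. A boundary coedge is its
    # own mate (an assumption made for this sketch).
    by_edge: Dict[int, List[int]] = {}
    for coedge, edge_id in enumerate(coedge_edge):
        by_edge.setdefault(edge_id, []).append(coedge)
    mate_indices = list(range(len(coedge_edge)))
    for users in by_edge.values():
        if len(users) == 2:
            mate_indices[users[0]], mate_indices[users[1]] = users[1], users[0]
    return {
        "next_indices": next_indices,
        "prev_indices": prev_indices,
        "mate_indices": mate_indices,
        "face_indices": face_indices,
        "num_coedges": len(coedge_edge),
        "num_faces": len(face_loops),
    }


# Two triangular faces (entity IDs 100 and 200) sharing edge 2.
structure = coedge_pointers({100: [1, 2, 3], 200: [2, 4, 5]})
```

Following next repeatedly walks around one face, while mate jumps across the shared edge to the neighboring face.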

`build_compact_node_features`

build_compact_node_features(features_list: List[Dict], reference_graph: Optional[Dict] = None, feature_dim: int = 48) -> torch.Tensor

Source: ll_stepnet/stepnet/topology.py:540

Build node features in cadling-compatible compact format.

Produces node features in the same 48-dim layout used by cadling’s TopologyGraph, so both ll_stepnet and cadling feed the same native representation into STEPGraphEncoder (default input_dim=48) and geotoken’s GraphTokenizer.

Feature layout (48 dims):
[0:32]: first 32 numeric parameters (zero-padded)
[32:48]: entity type one-hot (16 common B-Rep types)

Args: features_list: Feature dicts from STEPFeatureExtractor. reference_graph: Optional reference graph from build_reference_graph(). If provided, uses its num_nodes and id_to_idx to ensure node_features shape matches adjacency_matrix. If None, uses len(features_list) as num_nodes (legacy behavior). feature_dim: Output feature dimension (default 48).

Returns: torch.Tensor of shape [num_nodes, feature_dim].
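
The 48-dim layout can be sketched for a single entity as follows. The 16-entry type table below is hypothetical: the names are real STEP entity types, but which 16 types the package uses, and in what order, is defined in its source.

```python
from typing import Dict, List

# Hypothetical type table; the real list of common B-Rep types may differ.
COMMON_TYPES = [
    "ADVANCED_FACE", "FACE_OUTER_BOUND", "FACE_BOUND", "EDGE_LOOP",
    "ORIENTED_EDGE", "EDGE_CURVE", "VERTEX_POINT", "CARTESIAN_POINT",
    "DIRECTION", "AXIS2_PLACEMENT_3D", "PLANE", "CYLINDRICAL_SURFACE",
    "LINE", "CIRCLE", "CLOSED_SHELL", "MANIFOLD_SOLID_BREP",
]
NUM_NUMERIC = 32


def compact_features(feat: Dict) -> List[float]:
    """Return [0:32] zero-padded numeric params followed by [32:48] type one-hot."""
    params = [float(p) for p in feat.get("numeric_params", [])][:NUM_NUMERIC]
    numeric = params + [0.0] * (NUM_NUMERIC - len(params))
    one_hot = [0.0] * len(COMMON_TYPES)
    entity_type = feat.get("entity_type")
    if entity_type in COMMON_TYPES:  # unknown types leave the one-hot all zero
        one_hot[COMMON_TYPES.index(entity_type)] = 1.0
    return numeric + one_hot


vec = compact_features({"entity_type": "PLANE", "numeric_params": [1.0, 2.0]})
```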

`to_cadling_topology_graph`

to_cadling_topology_graph(topo_dict: Dict)

Source: ll_stepnet/stepnet/topology.py:622

Convert a build_complete_topology() output dict to a cadling TopologyGraph.

This closes the round-trip: cadling → ll_stepnet → cadling. The cadling TopologyGraph is constructed from the adjacency matrix and node features stored in topo_dict.

Args: topo_dict: Dictionary returned by build_complete_topology(). Must contain adjacency_matrix (tensor or array) and node_features (tensor or array).

Returns: A cadling.datamodel.base_models.TopologyGraph instance.

Raises: ImportError: If cadling is not installed.

`class STEPTrainer`

Source: ll_stepnet/stepnet/trainer.py:39

Trainer for STEP models. Handles training loop, validation, and checkpointing.

Methods

`init`

__init__(model: nn.Module, train_dataloader: DataLoader, val_dataloader: Optional[DataLoader] = None, optimizer: Optional[torch.optim.Optimizer] = None, loss_fn: Optional[Callable] = None, device: str = 'auto', checkpoint_dir: Optional[str] = None)

Source: ll_stepnet/stepnet/trainer.py:45

Args: model: STEP model to train train_dataloader: Training data loader val_dataloader: Optional validation data loader optimizer: Optimizer (creates AdamW if None) loss_fn: Loss function (creates based on task if None) device: Device to train on checkpoint_dir: Directory to save checkpoints

`train_epoch`

train_epoch() -> float

Source: ll_stepnet/stepnet/trainer.py:213

Train for one epoch.

Returns: Average training loss

`validate`

validate() -> Dict[str, float]

Source: ll_stepnet/stepnet/trainer.py:249

Run validation.

Returns: Dictionary with validation metrics

`train`

train(num_epochs: int, save_every: int = 1)

Source: ll_stepnet/stepnet/trainer.py:285

Train for multiple epochs.

Args: num_epochs: Number of epochs to train save_every: Save checkpoint every N epochs

`save_checkpoint`

save_checkpoint(filename: str)

Source: ll_stepnet/stepnet/trainer.py:333

Save model checkpoint.

`load_checkpoint`

load_checkpoint(filename: str)

Source: ll_stepnet/stepnet/trainer.py:347

Load model checkpoint.

`save_history`

save_history()

Source: ll_stepnet/stepnet/trainer.py:361

Save training history to JSON.

`class STEPTransformerDecoder(nn.Module)`

Source: ll_stepnet/stepnet/encoder.py:112

Transformer decoder for STEP token sequences with causal attention. Used for autoregressive generation (GPT-style).

Supports optional conditioning via a conditioner module (TextConditioner, ImageConditioner, or MultiModalConditioner) that applies cross-attention to inject text/image features. Following Text2CAD, the first N blocks can skip cross-attention via the conditioner’s skip_cross_attention_blocks.

Methods

`init`

__init__(vocab_size: int = 50000, embed_dim: int = 256, num_heads: int = 8, num_layers: int = 6, ff_dim: int = 1024, dropout: float = 0.1)

Source: ll_stepnet/stepnet/encoder.py:123

`forward`

forward(token_ids: torch.Tensor, attention_mask: Optional[torch.Tensor] = None, cross_attention_memory: Optional[torch.Tensor] = None, conditioner: Optional[nn.Module] = None, conditioning_inputs: Optional[Dict] = None) -> torch.Tensor

Source: ll_stepnet/stepnet/encoder.py:226

Args: token_ids: [batch_size, seq_len]. attention_mask: [batch_size, seq_len], optional. cross_attention_memory: [batch_size, mem_len, embed_dim], optional. When provided, used as the memory (key/value) for the decoder cross-attention layers. When None, the decoder uses self-attention only (existing behaviour). conditioner: Optional conditioning module (TextConditioner, ImageConditioner, or MultiModalConditioner). When provided, applies cross-attention conditioning after each decoder block. conditioning_inputs: Dict with conditioning data for the conditioner:
- text_input_ids: [B, L] text token ids
- text_attention_mask: [B, L] text padding mask
- pixel_values: [B, C, H, W] image tensors

Returns: encoded: [batch_size, seq_len, embed_dim]
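
Causal attention means that position i can only attend to positions j <= i. A minimal sketch of the corresponding mask, using the convention that True marks an allowed pair (the helper name is illustrative, and how the decoder builds its mask internally is not shown in this reference):

```python
from typing import List


def causal_mask(seq_len: int) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


mask = causal_mask(4)
```

Note that PyTorch attention modules that accept a boolean attn_mask use the opposite convention (True marks a blocked pair), so a mask like this one would be inverted before being passed in.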

`class STEPTransformerEncoder(nn.Module)`

Source: ll_stepnet/stepnet/encoder.py:38

Transformer encoder for STEP token sequences. Standard transformer architecture.

Methods

`init`

__init__(vocab_size: int = 50000, embed_dim: int = 256, num_heads: int = 8, num_layers: int = 6, ff_dim: int = 1024, dropout: float = 0.1)

Source: ll_stepnet/stepnet/encoder.py:44

`forward`

forward(token_ids: torch.Tensor, attention_mask: Optional[torch.Tensor] = None) -> torch.Tensor

Source: ll_stepnet/stepnet/encoder.py:78

Args: token_ids: [batch_size, seq_len] attention_mask: [batch_size, seq_len] optional

Returns: encoded: [batch_size, seq_len, embed_dim]

`class STEPVAE(nn.Module)`

Source: ll_stepnet/stepnet/vae.py:35

Variational Autoencoder wrapping existing STEP encoder/decoder.

Follows the DeepCAD architecture: sequences of CAD command tokens are encoded into a Gaussian latent, then decoded autoregressively back to command-type and parameter predictions.

Args: encoder_config: Configuration object with vocab_size, token_embed_dim, num_transformer_layers, dropout, etc. latent_dim: Dimensionality of the latent vector z. kl_weight: Maximum weight applied to the KL divergence term. num_command_types: Number of distinct CAD command types. num_param_levels: Number of quantisation levels per parameter. max_seq_len: Maximum sequence length the decoder can produce.

Methods

`init`

__init__(encoder_config, latent_dim: int = DEFAULT_QUANTIZATION_LEVELS, kl_weight: float = 1.0, num_command_types: int = NUM_COMMAND_TYPES, num_param_levels: int = DEFAULT_QUANTIZATION_LEVELS, max_seq_len: int = DEFAULT_MAX_SEQ_LEN) -> None

Source: ll_stepnet/stepnet/vae.py:52

`set_epoch`

set_epoch(epoch: int) -> None

Source: ll_stepnet/stepnet/vae.py:133

Update current epoch for KL warmup scheduling.

Args: epoch: Current training epoch (0-indexed).

`set_kl_warmup_epochs`

set_kl_warmup_epochs(warmup_epochs: int) -> None

Source: ll_stepnet/stepnet/vae.py:141

Set the number of epochs over which beta warms up.

Args: warmup_epochs: Number of warmup epochs.

`encode`

encode(token_ids: torch.Tensor, attention_mask: Optional[torch.Tensor] = None) -> tuple[torch.Tensor, torch.Tensor]

Source: ll_stepnet/stepnet/vae.py:149

Encode token sequence to Gaussian parameters.

Args: token_ids: [batch_size, seq_len] token indices. attention_mask: [batch_size, seq_len] 1=real, 0=pad.

Returns: Tuple of (mu, log_var) each [batch_size, latent_dim].

`reparameterize`

reparameterize(mu: torch.Tensor, log_var: torch.Tensor) -> torch.Tensor

Source: ll_stepnet/stepnet/vae.py:180

Reparameterization trick: z = mu + eps * exp(0.5 * log_var).

Args: mu: Mean of the posterior, [batch_size, latent_dim]. log_var: Log-variance of the posterior, [batch_size, latent_dim].

Returns: Sampled latent z of shape [batch_size, latent_dim].
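
The formula can be checked numerically with the standard library alone. This sketch works on plain lists for a single latent vector; the method itself operates on batched tensors.

```python
import math
import random
from typing import List


def reparameterize(mu: List[float], log_var: List[float], rng: random.Random) -> List[float]:
    """z = mu + eps * exp(0.5 * log_var) with eps ~ N(0, 1), per dimension."""
    return [m + rng.gauss(0.0, 1.0) * math.exp(0.5 * lv) for m, lv in zip(mu, log_var)]


rng = random.Random(0)
# log_var = log(4) corresponds to a standard deviation of 2.
samples = [reparameterize([1.0], [math.log(4.0)], rng)[0] for _ in range(20_000)]
mean = sum(samples) / len(samples)
std = (sum((s - mean) ** 2 for s in samples) / len(samples)) ** 0.5
```

Because the randomness enters only through eps, gradients can flow through mu and log_var during training.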

`decode`

decode(z: torch.Tensor, seq_len: Optional[int] = None, conditioner: Optional[nn.Module] = None, conditioning_inputs: Optional[Dict] = None) -> torch.Tensor

Source: ll_stepnet/stepnet/vae.py:198

Decode a latent vector to hidden states.

Optionally applies text/image conditioning via a conditioner module that injects cross-attention. Following Text2CAD, the first N decoder blocks can skip cross-attention (controlled by the conditioner’s skip_cross_attention_blocks parameter).

Args: z: Latent vector [batch_size, latent_dim]. seq_len: Length of output sequence. Defaults to max_seq_len. conditioner: Optional conditioning module (TextConditioner, ImageConditioner, or MultiModalConditioner). conditioning_inputs: Dict with conditioning data for the conditioner:
- text_input_ids: [B, L] text token ids
- text_attention_mask: [B, L] text padding mask
- pixel_values: [B, C, H, W] image tensors

Returns: Hidden states [batch_size, seq_len, embed_dim].

`forward`

forward(token_ids: torch.Tensor, attention_mask: Optional[torch.Tensor] = None, command_targets: Optional[torch.Tensor] = None, param_targets: Optional[torch.Tensor] = None) -> Dict[str, torch.Tensor]

Source: ll_stepnet/stepnet/vae.py:307

Full forward pass: encode, reparameterize, decode, compute losses.

Args: token_ids: [batch_size, seq_len] input token ids. attention_mask: [batch_size, seq_len] padding mask. command_targets: [batch_size, seq_len] ground-truth command types. param_targets: [batch_size, seq_len, 16] ground-truth params.

Returns: Dictionary with z, mu, log_var, command_logits, param_logits, kl_loss, and optionally recon_loss and loss.
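
For a diagonal Gaussian posterior and a standard normal prior, the KL term has the closed form -0.5 * sum(1 + log_var - mu^2 - exp(log_var)). The sketch below evaluates it for one latent vector. Whether the package sums or averages over latent dimensions and batch entries is not stated here, so the reduction is an assumption.

```python
import math
from typing import List


def gaussian_kl(mu: List[float], log_var: List[float]) -> float:
    """KL(N(mu, sigma^2) || N(0, I)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


kl_zero = gaussian_kl([0.0, 0.0], [0.0, 0.0])     # posterior equals the prior
kl_shifted = gaussian_kl([1.0, 0.0], [0.0, 0.0])  # mean shifted by 1 in one dimension
```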

`sample`

sample(num_samples: int = 1, seq_len: Optional[int] = None, device: Optional[torch.device] = None, conditioner: Optional[nn.Module] = None, conditioning_inputs: Optional[Dict] = None) -> Dict[str, torch.Tensor]

Source: ll_stepnet/stepnet/vae.py:396

Sample new CAD sequences from the prior N(0, I).

Optionally applies text/image conditioning via a conditioner module. Following Text2CAD, the first N decoder blocks skip cross-attention to allow initial CAD structure formation before conditioning kicks in.

Args: num_samples: Number of sequences to generate. seq_len: Output sequence length. Defaults to max_seq_len. device: Target device for the generated tensors. conditioner: Optional conditioning module (TextConditioner, ImageConditioner, or MultiModalConditioner). conditioning_inputs: Dict with conditioning data:
- text_input_ids: [B, L] text token ids
- text_attention_mask: [B, L] text padding mask
- pixel_values: [B, C, H, W] image tensors

Returns: Dictionary with command_preds [N, S] and param_preds [N, S, 16].

`class SinusoidalTimestepEmbedding(nn.Module)`

Source: ll_stepnet/stepnet/diffusion.py:312

Sinusoidal positional embedding for diffusion timesteps.

Maps integer timesteps to dense vectors using sin/cos functions, then projects through a 2-layer MLP.

Args: embed_dim: Output embedding dimension.

Methods

`init`

__init__(embed_dim: int = 1024) -> None

Source: ll_stepnet/stepnet/diffusion.py:322

`forward`

forward(timesteps: torch.Tensor) -> torch.Tensor

Source: ll_stepnet/stepnet/diffusion.py:331

Compute timestep embeddings.

Args: timesteps: Integer timestep indices [B].

Returns: Embeddings [B, embed_dim].
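
The sin/cos part of the embedding can be sketched as below, before the MLP projection. The frequency base of 10000 and the sin-then-cos concatenation follow the common transformer convention and are assumptions here; the sketch also assumes an even embed_dim.

```python
import math
from typing import List


def sinusoidal_embedding(timestep: int, embed_dim: int, max_period: float = 10_000.0) -> List[float]:
    """Return [sin(t * f_0), ..., sin(t * f_k), cos(t * f_0), ..., cos(t * f_k)]."""
    half = embed_dim // 2
    # Geometrically spaced frequencies from 1 down towards 1 / max_period.
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    return [math.sin(timestep * f) for f in freqs] + [math.cos(timestep * f) for f in freqs]


emb_zero = sinusoidal_embedding(timestep=0, embed_dim=8)
emb_late = sinusoidal_embedding(timestep=500, embed_dim=8)
```

Nearby timesteps get similar vectors while distant ones differ, which gives the denoiser a smooth notion of the noise level.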

`class StreamingCadlingConfig`

Source: ll_stepnet/stepnet/config.py:223

Configuration for streaming cadling data into ll_stepnet trainers.

When passed to a streaming trainer’s __init__, the trainer will lazy-import cadling.data.streaming.CADStreamingDataset and build a streaming data pipeline from cadling data automatically.

Attributes:
dataset_id: HuggingFace dataset ID or local path to cadling data.
split: Dataset split ("train", "val", "test").
streaming: Whether to use HuggingFace streaming mode.
batch_size: Training batch size.
shuffle: Whether to shuffle the stream.
shuffle_buffer_size: Buffer size for streaming shuffle.
max_samples: Maximum samples to load (None = all).
max_commands: Maximum command sequence length (pad/truncate).
compact_topology: Use 48-dim compact topology (cadling native).
lazy_load_topology: Whether to load topology on-demand.
topology_cache_size: Maximum topologies to cache in memory.
preprocess_fn: Preprocessing function name ('geotoken', 'tokenize', None).
prefetch_factor: Number of batches to prefetch in background.
max_memory_mb: Maximum memory for cached data.
chunk_size: Number of samples per processing chunk.
include_graph_data: Whether to include graph/topology data.
graph_feature_dim: Expected node feature dimension (48 cadling, 129 legacy).
num_workers: Number of dataloader workers.

Methods

`init`

__init__(dataset_id: str = '', split: str = 'train', streaming: bool = True, batch_size: int = 8, shuffle: bool = True, shuffle_buffer_size: int = 10000, max_samples: Optional[int] = None, max_commands: int = DEFAULT_MAX_SEQ_LEN, lazy_load_topology: bool = True, topology_cache_size: int = 1000, preprocess_fn: Optional[str] = None, prefetch_factor: int = 2, max_memory_mb: int = 4096, chunk_size: int = 1000, include_graph_data: bool = False, graph_feature_dim: int = 48, compact_topology: bool = True, num_workers: int = 4) -> None

Source: ll_stepnet/stepnet/config.py

`class StreamingDiffusionTrainer`

Source: ll_stepnet/stepnet/training/streaming_diffusion_trainer.py:34

Trainer for diffusion models with streaming dataset support.

Combines the step-based training loop from StreamingVAETrainer with the denoising diffusion training from DiffusionTrainer. Key features:

Step-based scheduling: Uses total_steps instead of epochs
EMA model maintenance: Updates EMA weights per step for stable generation
Noise scheduling: Configurable noise schedule with step-based warmup
Mid-stream checkpointing: Save/resume within a streaming epoch
Epoch-based shuffle seed: set_epoch() for reproducible shuffling

Usage:

    >>> from cadling.data.streaming import CADStreamingDataset, CADStreamingConfig
    >>>
    >>> config = CADStreamingConfig(
    ...     dataset_id="latticelabs/deepcad-latents",
    ...     batch_size=8,
    ... )
    >>> dataset = CADStreamingDataset(config)
    >>>
    >>> trainer = StreamingDiffusionTrainer(
    ...     model=diffusion_model,
    ...     scheduler=noise_scheduler,
    ...     dataset=dataset,
    ...     total_steps=100000,
    ...     warmup_steps=1000,
    ...     ema_decay=0.9999,
    ... )
    >>> trainer.train()

Args: model: Denoising model that takes (noisy_input, timestep) and predicts noise. scheduler: Noise scheduler with add_noise() method, providing the beta schedule and noise levels for each timestep. dataset: Streaming dataset with __iter__() method. val_dataset: Optional validation dataset. total_steps: Total training steps. warmup_steps: Steps for learning rate warmup. ema_decay: Decay rate for exponential moving average (default 0.9999). optimizer: Optional optimizer (creates AdamW if None). lr_scheduler: Optional LR scheduler. device: Device string ('auto' selects CUDA if available). checkpoint_dir: Directory for saving checkpoints. log_every: Log metrics every N steps. eval_every: Run validation every N steps. save_every: Save checkpoint every N steps. sample_every: Generate samples every N steps. gradient_accumulation_steps: Accumulate gradients over N steps. max_grad_norm: Maximum gradient norm for clipping. learning_rate: Learning rate for optimizer.
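
The per-step EMA maintenance controlled by ema_decay amounts to blending each EMA weight towards the current model weight. A minimal sketch on a dictionary of scalar weights follows; the trainer applies the same rule to tensors, and the function name is illustrative.

```python
from typing import Dict


def ema_update(ema: Dict[str, float], params: Dict[str, float], decay: float = 0.9999) -> None:
    """In place: ema = decay * ema + (1 - decay) * params."""
    for name, value in params.items():
        ema[name] = decay * ema[name] + (1.0 - decay) * value


ema_weights = {"w": 0.0}
ema_update(ema_weights, {"w": 1.0}, decay=0.5)
after_one = ema_weights["w"]
ema_update(ema_weights, {"w": 1.0}, decay=0.5)
after_two = ema_weights["w"]
```

With the default decay of 0.9999 the average has an effective horizon of roughly 10000 steps, which is why EMA weights are smoother than the raw weights and are used for sampling.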

Methods

`init`

__init__(model: 'nn.Module', scheduler: Any, dataset: Any = None, val_dataset: Optional[Any] = None, total_steps: int = 100000, warmup_steps: int = 1000, ema_decay: float = 0.9999, optimizer: Optional[Any] = None, lr_scheduler: Optional[Any] = None, device: str = 'auto', checkpoint_dir: Optional[str] = None, log_every: int = 100, eval_every: int = 5000, save_every: int = 10000, sample_every: int = 10000, gradient_accumulation_steps: int = 1, max_grad_norm: float = 1.0, learning_rate: float = 0.0001, dataset_config: Optional[Any] = None) -> None

Source: ll_stepnet/stepnet/training/streaming_diffusion_trainer.py:87

`train_step`

train_step(batch: Dict[str, Any]) -> Dict[str, float]

Source: ll_stepnet/stepnet/training/streaming_diffusion_trainer.py:338

Execute a single training step.

Args: batch: Training batch.

Returns: Dictionary with step metrics.
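
A denoising training step first corrupts a clean sample with scheduler noise and then asks the model to predict that noise. The forward corruption can be sketched as below. This assumes a DDPM-style linear beta schedule with commonly used endpoint values; the real scheduler's add_noise() signature and schedule are not documented in this reference.

```python
import math
from typing import List


def alpha_bars(num_steps: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Cumulative products of (1 - beta_t) for a linear beta schedule."""
    bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        bars.append(running)
    return bars


def add_noise(x0: List[float], noise: List[float], t: int, bars: List[float]) -> List[float]:
    """x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    a, b = math.sqrt(bars[t]), math.sqrt(1.0 - bars[t])
    return [a * x + b * n for x, n in zip(x0, noise)]


bars = alpha_bars()
x_early = add_noise([1.0], [0.0], t=0, bars=bars)    # almost the clean sample
x_late = add_noise([1.0], [0.0], t=999, bars=bars)   # signal almost gone
```

In the usual formulation the loss is the mean squared error between the sampled noise and the model's prediction for (x_t, t).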

`validate`

validate() -> Dict[str, float]

Source: ll_stepnet/stepnet/training/streaming_diffusion_trainer.py:398

Run validation on the validation dataset.

Returns: Dictionary with validation metrics.

`sample_and_visualize`

sample_and_visualize(num_samples: int = 4) -> None

Source: ll_stepnet/stepnet/training/streaming_diffusion_trainer.py:460

Generate samples using the EMA model and save visualizations.

`train`

train() -> None

Source: ll_stepnet/stepnet/training/streaming_diffusion_trainer.py:514

Run the full training loop.

Iterates through the streaming dataset, training until total_steps is reached. Supports resuming from checkpoints and handles epoch boundaries for streaming datasets.

`save_checkpoint`

save_checkpoint(filename: str) -> None

Source: ll_stepnet/stepnet/training/streaming_diffusion_trainer.py:611

Save training checkpoint.

Args: filename: Checkpoint filename.

`load_checkpoint`

load_checkpoint(filename: str) -> None

Source: ll_stepnet/stepnet/training/streaming_diffusion_trainer.py:639

Load training checkpoint.

Args: filename: Checkpoint filename.

`save_history`

save_history() -> None

Source: ll_stepnet/stepnet/training/streaming_diffusion_trainer.py:667

Save training history to JSON.

`class StreamingGANTrainer`

Source: ll_stepnet/stepnet/training/streaming_gan_trainer.py:31

Trainer for WGAN-GP models with streaming dataset support.

Combines the step-based training loop from StreamingVAETrainer with the Wasserstein GAN with gradient penalty from GANTrainer. Key features:

Step-based scheduling: Uses total_steps instead of epochs
Alternating critic/generator updates: n_critic critic steps per generator step
Gradient penalty: WGAN-GP for stable training
Mid-stream checkpointing: Save/resume within a streaming epoch
Epoch-based shuffle seed: set_epoch() for reproducible shuffling

Usage:

    >>> from cadling.data.streaming import CADStreamingDataset, CADStreamingConfig
    >>>
    >>> config = CADStreamingConfig(
    ...     dataset_id="latticelabs/deepcad-latents",
    ...     batch_size=8,
    ... )
    >>> dataset = CADStreamingDataset(config)
    >>>
    >>> trainer = StreamingGANTrainer(
    ...     generator=generator_model,
    ...     critic=critic_model,
    ...     dataset=dataset,
    ...     total_steps=100000,
    ...     n_critic=5,
    ...     lambda_gp=10.0,
    ... )
    >>> trainer.train()

Args: generator: Generator network mapping noise -> latent vectors. critic: Critic (discriminator) network scoring latent vectors. dataset: Streaming dataset with __iter__() method. total_steps: Total training steps (generator updates). warmup_steps: Steps for learning rate warmup. n_critic: Number of critic updates per generator update (default 5). lambda_gp: Gradient penalty coefficient (default 10.0 per WGAN-GP paper). optimizer_gen: Optional generator optimizer. optimizer_critic: Optional critic optimizer. device: Device string ('auto' selects CUDA if available). checkpoint_dir: Directory for saving checkpoints. log_every: Log metrics every N steps. eval_every: Run validation every N steps. save_every: Save checkpoint every N steps. sample_every: Generate samples every N steps. max_grad_norm: Maximum gradient norm for clipping. lr_gen: Learning rate for generator. lr_critic: Learning rate for critic.
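
The gradient penalty weighted by lambda_gp pushes the norm of the critic's gradient towards 1 at points interpolated between real and generated samples. The sketch below computes it for one sample pair with a toy linear critic whose gradient is known analytically. The function name and the `critic_grad` callback are hypothetical; a real trainer would obtain the gradient with automatic differentiation (for example torch.autograd.grad).

```python
import math
import random
from typing import Callable, List, Optional

Vector = List[float]


def gradient_penalty(
    real: Vector,
    fake: Vector,
    critic_grad: Callable[[Vector], Vector],  # hypothetical: gradient of the critic at a point
    lambda_gp: float = 10.0,
    rng: Optional[random.Random] = None,
) -> float:
    """lambda_gp * (||grad D(x_hat)||_2 - 1)^2 at a random interpolate x_hat."""
    rng = rng or random.Random(0)
    eps = rng.random()  # interpolation coefficient in [0, 1)
    x_hat = [eps * r + (1.0 - eps) * f for r, f in zip(real, fake)]
    grad_norm = math.sqrt(sum(g * g for g in critic_grad(x_hat)))
    return lambda_gp * (grad_norm - 1.0) ** 2


# Toy linear critic D(x) = w . x has the constant gradient w.
penalty_unit = gradient_penalty([1.0, 2.0], [0.0, 0.0], lambda x: [0.6, 0.8])  # ||w|| = 1
penalty_big = gradient_penalty([1.0, 2.0], [0.0, 0.0], lambda x: [3.0, 4.0])   # ||w|| = 5
```

A critic with unit gradient norm pays no penalty, while a steeper critic is penalized quadratically, which is what keeps the critic approximately 1-Lipschitz.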

Methods

`init`

__init__(generator: 'nn.Module', critic: 'nn.Module', dataset: Any = None, total_steps: int = 100000, warmup_steps: int = 1000, n_critic: int = 5, lambda_gp: float = 10.0, optimizer_gen: Optional[Any] = None, optimizer_critic: Optional[Any] = None, device: str = 'auto', checkpoint_dir: Optional[str] = None, log_every: int = 100, eval_every: int = 5000, save_every: int = 10000, sample_every: int = 10000, max_grad_norm: float = 1.0, lr_gen: float = 0.0001, lr_critic: float = 0.0001, dataset_config: Optional[Any] = None) -> None

Source: ll_stepnet/stepnet/training/streaming_gan_trainer.py:83

`train_critic_step`

train_critic_step(real_latents: 'torch.Tensor') -> Dict[str, float]

Source: ll_stepnet/stepnet/training/streaming_gan_trainer.py:328

Perform one critic training step.

Args: real_latents: Real latent vectors.

Returns: Dictionary with critic metrics.

`train_generator_step`

train_generator_step(batch_size: int) -> Dict[str, float]

Source: ll_stepnet/stepnet/training/streaming_gan_trainer.py:373

Perform one generator training step.

Args: batch_size: Number of samples to generate.

Returns: Dictionary with generator metrics.

`validate`

validate() -> Dict[str, float]

Source: ll_stepnet/stepnet/training/streaming_gan_trainer.py:408

Compute validation metrics for the GAN.

Returns: Dictionary with validation metrics.

`sample`

sample(num_samples: int = 16) -> 'torch.Tensor'

Source: ll_stepnet/stepnet/training/streaming_gan_trainer.py:473

Generate latent vectors using the trained generator.

Args: num_samples: Number of samples to generate.

Returns: Generated latent vectors.

`train`

train() -> None

Source: ll_stepnet/stepnet/training/streaming_gan_trainer.py:488

Run the full training loop.

Iterates through the streaming dataset, training until total_steps is reached. Uses n_critic critic updates per generator update.

`save_checkpoint`

save_checkpoint(filename: str) -> None

Source: ll_stepnet/stepnet/training/streaming_gan_trainer.py:592

Save training checkpoint.

`load_checkpoint`

load_checkpoint(filename: str) -> None

Source: ll_stepnet/stepnet/training/streaming_gan_trainer.py:618

Load training checkpoint.

`save_history`

save_history() -> None

Source: ll_stepnet/stepnet/training/streaming_gan_trainer.py:643

Save training history to JSON.

`class StreamingVAETrainer`

Source: ll_stepnet/stepnet/training/streaming_vae_trainer.py:31

Trainer for VAE models with streaming dataset support.

Extends the standard VAETrainer to work with HuggingFace IterableDatasets and CADStreamingDataset. Key differences from epoch-based training:

Step-based scheduling: Uses total_steps instead of epochs
KL warmup on global_step: Linear warmup over warmup_steps
Mid-stream checkpointing: Save/resume within a streaming epoch
Epoch-based shuffle seed: set_epoch() for reproducible shuffling

Usage:

    >>> from cadling.data.streaming import CADStreamingDataset, CADStreamingConfig
    >>>
    >>> config = CADStreamingConfig(
    ...     dataset_id="latticelabs/deepcad-sequences",
    ...     batch_size=8,
    ... )
    >>> dataset = CADStreamingDataset(config)
    >>>
    >>> trainer = StreamingVAETrainer(
    ...     model=vae_model,
    ...     dataset=dataset,
    ...     total_steps=100000,
    ...     warmup_steps=5000,
    ... )
    >>> trainer.train()

Args: model: VAE model with encode(), decode(), and forward() returning (reconstructed, mu, log_var). dataset: Streaming dataset with __iter__() method. val_dataset: Optional validation dataset. total_steps: Total training steps. warmup_steps: Steps for KL divergence warmup. optimizer: Optional optimizer (creates AdamW if None). scheduler: Optional LR scheduler. device: Device string ('auto' selects CUDA if available). checkpoint_dir: Directory for saving checkpoints. log_every: Log metrics every N steps. eval_every: Run validation every N steps. save_every: Save checkpoint every N steps. gradient_accumulation_steps: Accumulate gradients over N steps. max_grad_norm: Maximum gradient norm for clipping.
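
The linear KL warmup on global_step can be written as a one-line schedule. In this sketch the `max_weight` argument is an assumption standing in for whatever final KL weight the model is configured with.

```python
def kl_weight_at(global_step: int, warmup_steps: int = 5000, max_weight: float = 1.0) -> float:
    """Ramp the KL weight linearly from 0 to max_weight over warmup_steps."""
    if warmup_steps <= 0:
        return max_weight
    return max_weight * min(1.0, global_step / warmup_steps)


schedule = [kl_weight_at(step) for step in (0, 2500, 5000, 10000)]
```

Starting with a small KL weight lets the decoder learn to reconstruct before the latent is pulled towards the prior, a common way to reduce posterior collapse.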

Methods

`init`

__init__(model: 'nn.Module', dataset: Any = None, val_dataset: Optional[Any] = None, total_steps: int = 100000, warmup_steps: int = 5000, optimizer: Optional[Any] = None, scheduler: Optional[Any] = None, device: str = 'auto', checkpoint_dir: Optional[str] = None, log_every: int = 100, eval_every: int = 5000, save_every: int = 10000, gradient_accumulation_steps: int = 1, max_grad_norm: float = 1.0, learning_rate: float = 0.0001, dataset_config: Optional[Any] = None) -> None

Source: ll_stepnet/stepnet/training/streaming_vae_trainer.py:77

`train_step`

train_step(batch: Dict[str, Any]) -> Dict[str, float]

Source: ll_stepnet/stepnet/training/streaming_vae_trainer.py:296

Execute a single training step.

Args: batch: Training batch.

Returns: Dictionary with step metrics.

`validate`

validate() -> Dict[str, float]

Source: ll_stepnet/stepnet/training/streaming_vae_trainer.py:348

Run validation on the validation dataset.

Returns: Dictionary with validation metrics.

`train`

train() -> None

Source: ll_stepnet/stepnet/training/streaming_vae_trainer.py:406

Run the full training loop.

Iterates through the streaming dataset, training until total_steps is reached. Supports resuming from checkpoints and handles epoch boundaries for streaming datasets.
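A step-bounded loop over a re-iterable stream might look like the sketch below. `ToyStream` and `run_steps` are hypothetical stand-ins written for illustration, not the trainer's code; the sketch assumes the dataset exposes set_epoch() and can be iterated again once exhausted.

```python
from typing import Iterator, List


class ToyStream:
    """Stand-in for a streaming dataset (hypothetical; not CADStreamingDataset)."""

    def __init__(self, batches: List[int]) -> None:
        self.batches = batches
        self.epoch = 0

    def set_epoch(self, epoch: int) -> None:
        # A real dataset would reseed its shuffle buffer here.
        self.epoch = epoch

    def __iter__(self) -> Iterator[int]:
        return iter(self.batches)


def run_steps(dataset: ToyStream, total_steps: int, start_step: int = 0) -> List[tuple]:
    """Loop over the stream until total_steps is reached, restarting at epoch ends."""
    global_step, epoch, log = start_step, 0, []
    while global_step < total_steps:
        dataset.set_epoch(epoch)
        seen = 0
        for batch in dataset:
            seen += 1
            log.append((epoch, global_step, batch))  # a train_step(batch) call would go here
            global_step += 1
            if global_step >= total_steps:
                break
        if seen == 0:
            break  # guard against an empty stream looping forever
        epoch += 1
    return log
```

Passing a non-zero start_step mimics resuming from a checkpoint. A fuller version would also restore the epoch counter, which this sketch restarts at 0.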

`save_checkpoint`

save_checkpoint(filename: str) -> None

Source: ll_stepnet/stepnet/training/streaming_vae_trainer.py:511

Save training checkpoint.

Args: filename: Checkpoint filename.

`load_checkpoint`

load_checkpoint(filename: str) -> None

Source: ll_stepnet/stepnet/training/streaming_vae_trainer.py:537

Load training checkpoint.

Args: filename: Checkpoint filename.

`save_history`

save_history() -> None

Source: ll_stepnet/stepnet/training/streaming_vae_trainer.py:563

Save training history to JSON.

`class StructuralSummary`

Source: ll_stepnet/stepnet/annotations.py:61

File-level structural summary.

Attributes: total_entities: Total number of entities in the graph. root_count: Number of root entities identified. max_depth: Maximum DFS depth reached. type_distribution: Counts of each entity type. dominant_category: Classified category (B-Rep, Geometry, Assembly, Mixed).

Methods

`format`

format(max_types: int = 5) -> str

Source: ll_stepnet/stepnet/annotations.py:78

Format summary as text string.

Args: max_types: Maximum number of type counts to include.

Returns: Formatted summary string with [SUMMARY] delimiters.

`init`

__init__(total_entities: int, root_count: int, max_depth: int, type_distribution: dict[str, int] = dict(), dominant_category: str = 'unknown') -> None

Source: ll_stepnet/stepnet/annotations.py

`class StructuredDiffusion(nn.Module)`

Source: ll_stepnet/stepnet/diffusion.py:601

Four-stage sequential diffusion following BrepGen.

Stages (each with its own CADDenoiser):

1. Face positions
2. Face geometry
3. Edge positions
4. Edge-vertex geometry

Each stage is conditioned on the denoised output of the preceding stage via concatenation.

Args: config: DiffusionConfig with architectural hyperparameters.

Methods

`init`

__init__(config: Optional[object] = None) -> None

Source: ll_stepnet/stepnet/diffusion.py:624

`forward_train`

forward_train(stage_data: Optional[Dict[str, torch.Tensor]] = None, geometry: Optional[Dict[str, torch.Tensor]] = None) -> Dict[str, torch.Tensor]

Source: ll_stepnet/stepnet/diffusion.py:738

Training forward: denoising loss per stage + codec reconstruction.

Each stage independently samples a random timestep, adds noise, and predicts the noise (teacher-forced on a pooled summary of the previous stage’s clean tokens). When geometry is supplied, the clean per-stage token latents are the codec’s encoding of the real geometry, so the diffusion learns to denoise in the codec’s latent space. A masked-MSE reconstruction term trains the codec itself, which keeps the latent-to-geometry mapping coherent end to end.

Args: stage_data: Optional explicit clean stage latents, each [B, S, D] (or [B, D], which is promoted to a single token). Overrides the geometry-derived targets per stage when both are given. geometry: Optional dict with face_grids [B, N_faces, U, V, 3], edge_points [B, N_edges, M, 3] and optional face_mask / edge_mask [B, N] (True = padded/empty primitive).

Returns: Dictionary with {stage_name}_loss denoising terms, optional face_recon_loss / edge_recon_loss, and total_loss.
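The per-stage "sample a timestep, add noise, predict the noise" step can be illustrated with the standard DDPM forward process. The sketch below uses plain Python lists and assumes a linear beta schedule with conventional endpoints; the function names, the schedule, and the zero "prediction" are illustrative assumptions, not values read from DiffusionConfig.

```python
import math
import random
from typing import List, Tuple


def alpha_bar(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed)."""
    out, prod = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(1, num_steps - 1)
        prod *= 1.0 - beta
        out.append(prod)
    return out


def add_noise(x0: List[float], t: int, abar: List[float],
              rng: random.Random) -> Tuple[List[float], List[float]]:
    """Return (x_t, eps) with x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a, b = math.sqrt(abar[t]), math.sqrt(1.0 - abar[t])
    return [a * x + b * e for x, e in zip(x0, eps)], eps


def noise_mse(pred: List[float], eps: List[float]) -> float:
    """Denoising loss: mean squared error between predicted and true noise."""
    return sum((p - e) ** 2 for p, e in zip(pred, eps)) / len(eps)


rng = random.Random(0)
abar = alpha_bar(1000)
x0 = [0.5, -1.0, 2.0]                      # stands in for one clean stage latent
t = rng.randrange(len(abar))               # each stage would draw its own timestep
x_t, eps = add_noise(x0, t, abar, rng)
loss = noise_mse([0.0] * len(eps), eps)    # a zero prediction stands in for the denoiser
```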

`sample`

sample(batch_size: int = 1, device: Optional[torch.device] = None, use_pndm: bool = True) -> Dict[str, torch.Tensor]

Source: ll_stepnet/stepnet/diffusion.py:831

Generate new structured CAD data via sequential denoising.

Args: batch_size: Number of samples. device: Target device. use_pndm: Whether to use PNDM accelerated sampling.

Returns: Dictionary mapping each stage name to its denoised token latents ([B, N_faces or N_edges, D]) plus decoded geometry tensors face_grids [B, N_faces, U, V, 3] and edge_points [B, N_edges, M, 3].

`sample_with_log_prob`

sample_with_log_prob(batch_size: int = 1, device: Optional[torch.device] = None, num_inference_steps: Optional[int] = None, eta: float = 1.0) -> Tuple[Dict[str, torch.Tensor], torch.Tensor, torch.Tensor]

Source: ll_stepnet/stepnet/diffusion.py:907

DDPO sampling: geometry plus a differentiable trajectory log-prob.

Runs stochastic DDIM reverse diffusion (eta > 0) over all four stages without torch.no_grad and accumulates the per-step Gaussian log-probabilities from `DDPMScheduler.ddim_step_with_log_prob`. Each step’s transition mean depends on its denoiser’s epsilon prediction, so the summed log-prob backpropagates into every denoiser and into the stage conditioning projections. This enables diffusion policy-gradient (REINFORCE / DDPO) reinforcement learning in which the RL signal trains the actual model parameters, replacing the previously decoupled noise-prior stand-in.

Trajectory states are detached between steps (each x_t is treated as a fixed state in the action history, as in DDPO), which bounds memory while preserving the gradient inside each transition’s log-prob.

Args: batch_size: Number of samples to draw. device: Target device (defaults to the model’s device). num_inference_steps: Reverse steps per stage (defaults to the scheduler’s inference_steps). eta: DDIM stochasticity. Coerced to 1.0 if <= 0 because a deterministic trajectory has a degenerate (delta) policy that cannot provide a usable policy gradient.

Returns: Tuple (results, total_log_prob, total_entropy):

- results: {stage_name: token latent [B, N, D]} plus decoded face_grids / edge_points (all detached).
- total_log_prob: [B] sum of per-step log-probs across all stages, connected to the model parameters.
- total_entropy: [B] sum of per-step Gaussian entropies.
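The accumulated quantities are ordinary Gaussian log-densities and entropies. The sketch below shows the arithmetic for a single sample with an isotropic transition N(mean, sigma^2 I). It is a plain-Python illustration with hypothetical helper names, not the scheduler's implementation, and it carries no gradients.

```python
import math
from typing import List, Sequence, Tuple


def gaussian_log_prob(x: List[float], mean: List[float], sigma: float) -> float:
    """Log-density of x under N(mean, sigma^2 I), summed over dimensions."""
    var = sigma * sigma
    return sum(
        -0.5 * ((xi - mi) ** 2) / var - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)
        for xi, mi in zip(x, mean)
    )


def gaussian_entropy(dim: int, sigma: float) -> float:
    """Entropy of N(mean, sigma^2 I) in `dim` dimensions."""
    return dim * 0.5 * math.log(2.0 * math.pi * math.e * sigma * sigma)


def trajectory_totals(steps: Sequence[Tuple[List[float], List[float], float]]) -> Tuple[float, float]:
    """Sum log-probs and entropies over (x_prev, mean, sigma) transitions."""
    total_lp = sum(gaussian_log_prob(x, m, s) for x, m, s in steps)
    total_h = sum(gaussian_entropy(len(x), s) for x, m, s in steps)
    return total_lp, total_h
```

A deterministic step (sigma = 0) has no finite log-density, which is consistent with the documented coercion of eta to a positive value.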

`class TextConditioner(nn.Module)`

Source: ll_stepnet/stepnet/conditioning.py:131

Condition CAD generation on natural-language descriptions.

Wraps a frozen BERT or CLIP text encoder, projects its hidden states to the decoder dimension, and applies AdaptiveLayer blocks for cross-attention injection.

Following Text2CAD, the first skip_cross_attention_blocks decoder blocks skip cross-attention to allow initial CAD structure formation before conditioning kicks in. When block_index is passed to `forward`, the cross-attention layers are only applied when block_index >= skip_cross_attention_blocks.

Args: encoder_name: Hugging Face model identifier (e.g. “bert-base-uncased”). conditioning_dim: Dimension of the conditioning embeddings (must match the decoder hidden dim). freeze_encoder: Whether to freeze the pretrained encoder weights. num_adaptive_layers: Number of AdaptiveLayer blocks. skip_cross_attention_blocks: Number of initial decoder blocks that skip cross-attention (Text2CAD default = 2).

Methods

`init`

__init__(encoder_name: str = 'bert-base-uncased', conditioning_dim: int = 1024, freeze_encoder: bool = True, num_adaptive_layers: int = 1, skip_cross_attention_blocks: int = 2) -> None

Source: ll_stepnet/stepnet/conditioning.py:155

`encode_text`

encode_text(input_ids: torch.Tensor, attention_mask: Optional[torch.Tensor] = None) -> torch.Tensor

Source: ll_stepnet/stepnet/conditioning.py:253

Encode tokenised text into conditioning embeddings.

Args: input_ids: [B, L] token ids from the text tokenizer. attention_mask: [B, L] padding mask.

Returns: Conditioning embeddings [B, L, conditioning_dim].

`forward`

forward(hidden_states: torch.Tensor, text_input_ids: torch.Tensor, text_attention_mask: Optional[torch.Tensor] = None, block_index: Optional[int] = None) -> torch.Tensor

Source: ll_stepnet/stepnet/conditioning.py:281

Condition decoder hidden states on text.

Following Text2CAD, early decoder blocks can skip cross-attention to let the initial CAD structure form before conditioning kicks in. When block_index is provided and is less than self.skip_cross_attention_blocks, the hidden states are returned unchanged (no cross-attention applied).

Args: hidden_states: [B, S, D] decoder hidden states. text_input_ids: [B, L] text token ids. text_attention_mask: [B, L] text padding mask. block_index: Optional zero-based decoder block index. When provided and < skip_cross_attention_blocks, the cross-attention layers are bypassed entirely.

Returns: Conditioned hidden states [B, S, D].
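The gating rule reduces to a single comparison. The helper below is a hypothetical sketch of that rule as documented here, not code from conditioning.py.

```python
from typing import Optional


def applies_cross_attention(block_index: Optional[int],
                            skip_cross_attention_blocks: int = 2) -> bool:
    """True when a decoder block should receive text conditioning."""
    if block_index is None:
        return True  # no index given: the bypass never triggers
    return block_index >= skip_cross_attention_blocks


# With the default of 2, blocks 0 and 1 are bypassed and blocks 2+ are conditioned.
schedule = [applies_cross_attention(i) for i in range(5)]
```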

`class TrainingConfig`

Source: ll_stepnet/stepnet/config.py:107

Configuration for training.

Methods

`init`

__init__(batch_size: int = 8, learning_rate: float = 0.0001, num_epochs: int = 10, warmup_steps: int = 1000, max_grad_norm: float = 1.0, weight_decay: float = 0.01, save_every: int = 1, eval_every: int = 1, checkpoint_dir: str = 'checkpoints', log_dir: str = 'logs') -> None

Source: ll_stepnet/stepnet/config.py

`class VAEConfig`

Source: ll_stepnet/stepnet/config.py:154

Configuration for the STEP Variational Autoencoder.

Command types follow geotoken’s vocabulary: SOL=0, LINE=1, ARC=2, CIRCLE=3, EXTRUDE=4, EOS=5

Methods

`init`

__init__(latent_dim: int = DEFAULT_QUANTIZATION_LEVELS, kl_weight: float = 1.0, kl_warmup_epochs: int = 10, encoder_vocab_size: int = 50000, encoder_embed_dim: int = DEFAULT_QUANTIZATION_LEVELS, encoder_layers: int = 6, decoder_layers: int = 6, num_command_types: int = NUM_COMMAND_TYPES, num_param_levels: int = DEFAULT_QUANTIZATION_LEVELS, max_seq_len: int = DEFAULT_MAX_SEQ_LEN) -> None

Source: ll_stepnet/stepnet/config.py

`class VAETrainer`

Source: ll_stepnet/stepnet/training/vae_trainer.py:31

Trainer for Variational Autoencoder models on CAD token sequences.

Extends the STEPTrainer concept with VAE-specific training:

- Beta-VAE warmup: linearly ramps KL weight from 0 to 1 over warmup epochs
- Reconstruction loss via cross-entropy on command tokens
- KL divergence regularization on the latent distribution
- Latent space visualization at each epoch

Supports two model output conventions:

- Dict output (STEPVAE): forward() returns a dict with keys command_logits, param_logits, mu, log_var, kl_loss, and optionally recon_loss and loss.
- Tuple output (legacy): forward() returns (reconstructed, mu, log_var).

The trainer auto-detects which convention is used on the first batch and adapts accordingly.
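One way to handle both conventions is to normalise the model output into a dict before computing losses. The helper below is a hypothetical sketch of that idea. How the trainer actually detects and caches the convention is not shown in this reference.

```python
from typing import Any, Dict


def normalize_vae_output(output: Any) -> Dict[str, Any]:
    """Map either output convention onto a dict with 'mu' and 'log_var' (hypothetical)."""
    if isinstance(output, dict):
        missing = {"mu", "log_var"} - output.keys()
        if missing:
            raise KeyError(f"dict output is missing keys: {sorted(missing)}")
        return dict(output)
    if isinstance(output, (tuple, list)) and len(output) == 3:
        reconstructed, mu, log_var = output  # legacy tuple convention
        return {"reconstructed": reconstructed, "mu": mu, "log_var": log_var}
    raise TypeError(f"unsupported VAE output type: {type(output).__name__}")
```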

Args: model: VAE model with encode(), decode(), and reparameterize() methods. train_dataloader: Training data loader. val_dataloader: Optional validation data loader. optimizer: Optimizer instance. Creates AdamW with lr=1e-4 if None. device: Device string. ‘auto’ selects CUDA if available, else CPU. checkpoint_dir: Directory path for saving checkpoints and visualizations. kl_warmup_epochs: Number of epochs to linearly ramp beta from 0 to 1.

Methods

`init`

__init__(model: nn.Module, train_dataloader: DataLoader, val_dataloader: Optional[DataLoader] = None, optimizer: Optional[Any] = None, device: str = 'auto', checkpoint_dir: Optional[str] = None, kl_warmup_epochs: int = 10) -> None

Source: ll_stepnet/stepnet/training/vae_trainer.py:61

`train_epoch`

train_epoch() -> Dict[str, float]

Source: ll_stepnet/stepnet/training/vae_trainer.py:262

Train for one epoch with beta-VAE warmup.

Computes:

- Reconstruction loss (cross-entropy on command tokens)
- KL divergence with current beta weight
- Total loss = recon_loss + beta * kl_loss

Returns: Dictionary with keys: ‘total_loss’, ‘recon_loss’, ‘kl_loss’, ‘beta’.
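The loss composition can be checked by hand with the closed-form KL divergence between a diagonal Gaussian and the standard normal. The sketch below assumes that standard form and a sum over latent dimensions; the reduction the trainer actually uses (sum or mean, per batch or per sample) is not stated in this reference.

```python
import math
from typing import List


def kl_diag_gaussian(mu: List[float], log_var: List[float]) -> float:
    """KL(N(mu, exp(log_var)) || N(0, 1)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def total_loss(recon_loss: float, kl_loss: float, beta: float) -> float:
    """Total loss = recon_loss + beta * kl_loss."""
    return recon_loss + beta * kl_loss


kl = kl_diag_gaussian(mu=[1.0, 0.0], log_var=[0.0, 0.0])   # only the shifted dim contributes
loss_early = total_loss(recon_loss=2.0, kl_loss=kl, beta=0.0)  # start of warmup
loss_late = total_loss(recon_loss=2.0, kl_loss=kl, beta=1.0)   # after warmup
```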

`validate`

validate() -> Dict[str, float]

Source: ll_stepnet/stepnet/training/vae_trainer.py:340

Run validation and compute reconstruction quality metrics.

Computes:

- Validation loss (recon + beta * KL)
- Command accuracy: exact match rate of predicted vs target command tokens
- Parameter MSE: mean squared error of continuous parameter predictions

Returns: Dictionary with keys: ‘val_loss’, ‘recon_loss’, ‘kl_loss’, ‘command_accuracy’, ‘param_mse’.
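The two quality metrics are simple to compute. The sketch below assumes a per-token match rate for command accuracy and an unweighted mean for parameter MSE; masking of padded positions, which a real implementation may apply, is omitted.

```python
from typing import Sequence


def command_accuracy(pred: Sequence[int], target: Sequence[int]) -> float:
    """Fraction of positions where the predicted command equals the target."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    if not target:
        return 0.0
    return sum(p == t for p, t in zip(pred, target)) / len(target)


def param_mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between predicted and target parameter values."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    if not target:
        return 0.0
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


# Command ids follow the documented vocabulary: SOL=0, LINE=1, ARC=2, CIRCLE=3, EOS=5.
acc = command_accuracy([0, 1, 2, 5], [0, 1, 3, 5])   # ARC predicted where CIRCLE was expected
mse = param_mse([1.0, 2.0], [1.0, 4.0])
```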

`visualize_latent_space`

visualize_latent_space(epoch: int, max_samples: int = 5000) -> None

Source: ll_stepnet/stepnet/training/vae_trainer.py:419

Encode validation set and visualize latent space in 2D.

Uses t-SNE (or UMAP if available) to reduce latent representations to 2D and saves the scatter plot to the checkpoint directory.

Args: epoch: Current epoch number, used for the filename. max_samples: Maximum number of samples to process (default 5000, per the signature).

`train`

train(num_epochs: int, save_every: int = 1) -> None

Source: ll_stepnet/stepnet/training/vae_trainer.py:534

Train for multiple epochs with beta-VAE scheduling.

Orchestrates the full training loop with:

- Per-epoch training with beta warmup
- Validation after each epoch
- Latent space visualization
- Checkpointing

Args: num_epochs: Total number of epochs to train. save_every: Save a checkpoint every N epochs.

`save_checkpoint`

save_checkpoint(filename: str) -> None

Source: ll_stepnet/stepnet/training/vae_trainer.py:619

Save model checkpoint to disk.

Args: filename: Name of the checkpoint file.

`load_checkpoint`

load_checkpoint(filename: str) -> None

Source: ll_stepnet/stepnet/training/vae_trainer.py:643

Load model checkpoint from disk.

Args: filename: Name of the checkpoint file to load.

`save_history`

save_history() -> None

Source: ll_stepnet/stepnet/training/vae_trainer.py:669

Save training history to a JSON file in the checkpoint directory.

`class VQVAEModel(nn.Module)`

Source: ll_stepnet/stepnet/vqvae.py:749

Complete VQ-VAE model for CAD generation.

Combines an encoder MLP, the `DisentangledCodebooks` vector quantization layer, and a decoder MLP into an end-to-end model that can:

- Encode continuous CAD feature vectors into compact discrete code sequences (10 codes per model, split 3/4/3 across topology, geometry, and extrusion codebooks).
- Decode discrete codes back to reconstructed feature vectors.
- Train the full pipeline with reconstruction loss + commitment loss.

The encoder splits its output into three equal-sized chunks that are fed to the topology, geometry, and extrusion codebook streams respectively. The decoder concatenates the three decoded streams and projects back to the original input dimensionality.

Args: input_dim: Dimensionality of the input feature vector (e.g. flattened STEP entity features). code_dim: Internal dimensionality for codebook vectors. topology_codes: Number of topology codebook entries. geometry_codes: Number of geometry codebook entries. extrusion_codes: Number of extrusion codebook entries. encoder_hidden_dim: Hidden dimension of the encoder MLP. decoder_hidden_dim: Hidden dimension of the decoder MLP.

Methods

`init`

__init__(input_dim: int, code_dim: int = 256, topology_codes: int = 500, geometry_codes: int = 1000, extrusion_codes: int = 1000, encoder_hidden_dim: int = 512, decoder_hidden_dim: int = 512) -> None

Source: ll_stepnet/stepnet/vqvae.py:779

`set_epoch`

set_epoch(epoch: int) -> None

Source: ll_stepnet/stepnet/vqvae.py:859

Propagate epoch to all codebooks for warmup tracking.

Args: epoch: Current training epoch (0-indexed).

`forward`

forward(x: torch.Tensor) -> Dict[str, torch.Tensor]

Source: ll_stepnet/stepnet/vqvae.py:871

Full forward pass: encode -> quantize -> decode.

Args: x: Input features of shape (batch, input_dim).

Returns: Dictionary containing:

- ``"reconstructed"``: Reconstructed features
  ``(batch, input_dim)``.
- ``"commitment_loss"``: Scalar commitment loss.
- ``"codes"``: Dictionary with ``"topology"``,
  ``"geometry"``, and ``"extrusion"`` index tensors.
- ``"reconstruction_loss"``: MSE reconstruction loss.

`encode_to_codes`

encode_to_codes(x: torch.Tensor) -> Dict[str, torch.Tensor]

Source: ll_stepnet/stepnet/vqvae.py:934

Encode input features to compact discrete codes.

Produces 10 codes per model (3 topology + 4 geometry + 3 extrusion), which serve as a compact discrete representation of the CAD model.

Args: x: Input features of shape (batch, input_dim).

Returns: Dictionary with "topology", "geometry", and "extrusion" keys mapping to LongTensor codebook indices.

`decode_from_codes`

decode_from_codes(topology_codes: torch.Tensor, geometry_codes: torch.Tensor, extrusion_codes: torch.Tensor) -> torch.Tensor

Source: ll_stepnet/stepnet/vqvae.py:970

Decode discrete codebook indices back to reconstructed features.

Args: topology_codes: (batch, 3) topology codebook indices. geometry_codes: (batch, 4) geometry codebook indices. extrusion_codes: (batch, 3) extrusion codebook indices.

Returns: Reconstructed features of shape (batch, input_dim).

`generate`

generate(num_samples: int = 1, temperature: float = 1.0, top_k: Optional[int] = None) -> Dict[str, torch.Tensor]

Source: ll_stepnet/stepnet/vqvae.py:1004

Generate new CAD models by sampling codes autoregressively.

Uses the three independent CodebookDecoder instances to generate topology, geometry, and extrusion code sequences in parallel, then decodes them back to feature space.

Args: num_samples: Number of CAD models to generate. temperature: Sampling temperature for all three decoders. top_k: Top-k filtering for sampling.

Returns: Dictionary containing:

- ``"reconstructed"``: Generated features
  ``(num_samples, input_dim)``.
- ``"codes"``: Dictionary with ``"topology"``,
  ``"geometry"``, and ``"extrusion"`` index tensors.
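Temperature and top-k sampling of a single code index can be sketched as below. `sample_code` is a hypothetical helper that operates on a plain list of logits. How CodebookDecoder applies these settings internally is not described in this reference.

```python
import math
import random
from typing import List, Optional


def sample_code(logits: List[float], temperature: float = 1.0,
                top_k: Optional[int] = None,
                rng: Optional[random.Random] = None) -> int:
    """Sample one codebook index from logits with temperature and optional top-k."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng or random.Random()
    scaled = [value / temperature for value in logits]
    # Keep only the top_k highest-scoring indices when top_k is set.
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    keep = order[:top_k] if top_k else order
    # Softmax weights over the kept logits (shifted by the max for stability).
    peak = max(scaled[i] for i in keep)
    weights = [math.exp(scaled[i] - peak) for i in keep]
    return rng.choices(keep, weights=weights, k=1)[0]
```

Lower temperatures concentrate probability on the highest logits, and top_k=1 reduces to greedy decoding.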

`compute_ar_loss`

compute_ar_loss(topology_codes: torch.Tensor, geometry_codes: torch.Tensor, extrusion_codes: torch.Tensor) -> Dict[str, torch.Tensor]

Source: ll_stepnet/stepnet/vqvae.py:1079

Compute autoregressive next-code prediction losses.

Used for training the CodebookDecoder modules. Takes ground-truth code sequences (from encode_to_codes) and computes cross-entropy loss for each decoder.

Args: topology_codes: (batch, 3) ground-truth topology indices. geometry_codes: (batch, 4) ground-truth geometry indices. extrusion_codes: (batch, 3) ground-truth extrusion indices.

Returns: Dictionary with "topology_ar_loss", "geometry_ar_loss", "extrusion_ar_loss", and "total_ar_loss" scalar tensors.

`codebook_utilization`

codebook_utilization() -> Dict[str, float]

Source: ll_stepnet/stepnet/vqvae.py:1139

Report utilization (fraction of active entries) per codebook.

Returns: Dictionary mapping codebook name to utilization float in [0.0, 1.0].

`total_codes_per_model`

total_codes_per_model() -> int

Source: ll_stepnet/stepnet/vqvae.py:1152

Number of discrete codes produced per CAD model.

Returns: Total number of codes (10 by default: 3 + 4 + 3).

`num_parameters`

num_parameters() -> Dict[str, int]

Source: ll_stepnet/stepnet/vqvae.py:1160

Count trainable parameters by component.

Returns: Dictionary mapping component names to parameter counts.

`class VectorQuantizer(nn.Module)`

Source: ll_stepnet/stepnet/vqvae.py:33

Core vector quantization layer with EMA codebook updates.

Maps continuous latent vectors to the nearest entry in a learned codebook embedding table. During training the codebook is updated via exponential moving average (EMA) rather than straight gradient descent, which is more stable for VQ-VAE training.

A warmup period (SkexGen stabilisation trick) bypasses quantization for the first warmup_epochs epochs so the encoder can learn a reasonable latent distribution before the codebook locks in.

Args: num_embeddings: Number of codebook entries (K). embedding_dim: Dimensionality of each codebook vector. commitment_cost: Weight beta for the commitment loss term. decay: EMA decay factor for codebook updates. warmup_epochs: Number of initial training epochs to skip quantization (pass-through mode).

Methods

`init`

__init__(num_embeddings: int, embedding_dim: int, commitment_cost: float = 0.25, decay: float = 0.99, warmup_epochs: int = 25) -> None

Source: ll_stepnet/stepnet/vqvae.py:54

`set_epoch`

set_epoch(epoch: int) -> None

Source: ll_stepnet/stepnet/vqvae.py:106

Update the internal epoch counter (used for warmup gating).

Args: epoch: Current training epoch (0-indexed).

`forward`

forward(inputs: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor, torch.Tensor]

Source: ll_stepnet/stepnet/vqvae.py:114

Quantize continuous input vectors to nearest codebook entries.

During warmup (epoch < warmup_epochs) quantization is skipped and the inputs are returned as-is with zero commitment loss so the encoder can pre-train freely.

Args: inputs: Continuous latent vectors of shape (batch, *, embedding_dim) where * is any number of intermediate dimensions (typically sequence length).

Returns: A tuple of (quantized, commitment_loss, encoding_indices) where:

- **quantized** has the same shape as *inputs* and contains
  the selected codebook vectors (with straight-through
  gradient).
- **commitment_loss** is a scalar
  ``beta * ||z_e - sg(z_q)||^2``.
- **encoding_indices** is a ``LongTensor`` of shape
  ``(batch * seq_len,)`` giving the selected codebook index
  for each input vector.
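The lookup and commitment term can be illustrated without tensors. The plain-Python sketch below covers nearest-neighbour selection, the warmup bypass, and a mean-squared commitment loss; it omits the straight-through gradient and the EMA codebook update. The empty index list returned during warmup is a placeholder assumption, since the reference does not say what is returned in that mode.

```python
from typing import List, Tuple


def quantize(inputs: List[List[float]], codebook: List[List[float]],
             commitment_cost: float = 0.25, epoch: int = 0,
             warmup_epochs: int = 0) -> Tuple[List[List[float]], float, List[int]]:
    """Nearest-neighbour vector quantization with a warmup bypass (sketch)."""
    if epoch < warmup_epochs:
        # Warmup: pass inputs through unchanged with zero commitment loss.
        return [list(v) for v in inputs], 0.0, []

    def sq_dist(a: List[float], b: List[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # For each input vector, pick the index of the closest codebook entry.
    indices = [min(range(len(codebook)), key=lambda k: sq_dist(v, codebook[k]))
               for v in inputs]
    quantized = [list(codebook[k]) for k in indices]
    # Commitment loss: beta times the mean squared distance to the selected codes.
    n_elems = sum(len(v) for v in inputs)
    mse = sum(sq_dist(v, q) for v, q in zip(inputs, quantized)) / n_elems
    return quantized, commitment_cost * mse, indices
```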

`codebook_utilization`

codebook_utilization() -> float

Source: ll_stepnet/stepnet/vqvae.py:258

Fraction of codebook entries actively used (non-zero EMA count).

Returns: A float in [0.0, 1.0] representing the proportion of codebook entries that have received at least one assignment.

`chinchilla_optimal_tokens`

chinchilla_optimal_tokens(num_params: int) -> int

Source: ll_stepnet/stepnet/data_requirements.py:108

Estimate optimal number of training tokens based on Chinchilla scaling laws.

Rule: ~20-25 tokens per parameter for compute-optimal training.

Args: num_params: Number of model parameters (non-embedding)

Returns: Recommended number of training tokens/samples
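The rule amounts to a single multiplication. The sketch below exposes the tokens-per-parameter ratio as an argument; the helper name and the default of 20 are assumptions, and the multiplier the library uses within the documented 20-25 range is not stated here.

```python
def chinchilla_tokens(num_params: int, tokens_per_param: float = 20.0) -> int:
    """Rule-of-thumb token budget: tokens_per_param * num_params (hypothetical helper)."""
    if num_params < 0:
        raise ValueError("num_params must be non-negative")
    return int(round(num_params * tokens_per_param))


# A 10M-parameter model lands between 200M and 250M tokens under the 20-25 rule.
low = chinchilla_tokens(10_000_000, 20.0)
high = chinchilla_tokens(10_000_000, 25.0)
```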

`count_model_parameters`

count_model_parameters(model: nn.Module, exclude_embeddings: bool = True) -> int

Source: ll_stepnet/stepnet/data_requirements.py:789

Count model parameters (excluding embeddings for scaling law analysis).

Args: model: PyTorch model exclude_embeddings: Whether to exclude embedding parameters

Returns: Number of parameters

`create_dataloader`

create_dataloader(file_paths: List[str], labels: Optional[List] = None, batch_size: int = 8, max_length: int = 2048, use_topology: bool = True, shuffle: bool = True, num_workers: int = 0) -> DataLoader

Source: ll_stepnet/stepnet/data.py:189

Create DataLoader for STEP files.

Args: file_paths: List of STEP file paths labels: Optional labels batch_size: Batch size max_length: Maximum sequence length use_topology: Whether to build topology graphs shuffle: Whether to shuffle data num_workers: Number of worker processes

Returns: DataLoader instance

`estimate_data_requirements`

estimate_data_requirements(model: nn.Module, target_accuracy: float, sample_sizes: np.ndarray, val_accuracies: np.ndarray) -> int

Source: ll_stepnet/stepnet/data_requirements.py:759

Convenience function to estimate required dataset size for target accuracy.

Args: model: STEP model target_accuracy: Desired accuracy sample_sizes: Current dataset sizes tested val_accuracies: Corresponding validation accuracies

Returns: Estimated required dataset size

`get_config`

get_config(task: str = 'classification', **kwargs)

Source: ll_stepnet/stepnet/config.py:286

Get configuration for a specific task.

Args: task: Task name (‘classification’, ‘property’, ‘captioning’, ‘similarity’, ‘qa’) **kwargs: Additional config overrides

Returns: Configuration object

`inverse_power_law_accuracy`

inverse_power_law_accuracy(D: np.ndarray, a: float, b: float, c: float) -> np.ndarray

Source: ll_stepnet/stepnet/data_requirements.py:90

Inverse power law for accuracy.

Acc(D) = c - a * D^(-b)

Args: D: Dataset size a: Scaling coefficient b: Scaling exponent c: Maximum achievable accuracy (asymptote)

Returns: Accuracy values
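Solving Acc(D) = c - a * D^(-b) for D gives D = (a / (c - Acc))^(1/b), which is the kind of inversion a dataset-size estimate needs. The sketch below uses scalar inputs and hypothetical helper names; it is not the library's fitting code.

```python
def accuracy_at(D: float, a: float, b: float, c: float) -> float:
    """Acc(D) = c - a * D**(-b)."""
    return c - a * D ** (-b)


def required_size(target: float, a: float, b: float, c: float) -> float:
    """Solve c - a * D**(-b) = target for D; needs a > 0, b > 0 and target < c."""
    if not (a > 0 and b > 0):
        raise ValueError("a and b must be positive")
    if target >= c:
        raise ValueError("target accuracy is at or above the asymptote c")
    return (a / (c - target)) ** (1.0 / b)


# Illustrative parameters only: with a=2.0, b=0.5, c=0.95, reaching 90% accuracy
# needs (2.0 / 0.05) ** 2 samples.
needed = required_size(0.90, a=2.0, b=0.5, c=0.95)
```

Targets at or above the asymptote c are unreachable under this model, so the helper rejects them rather than returning a size.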

`mask_tokens`

mask_tokens(input_ids: torch.Tensor, mask_token_id: int, vocab_size: int, mask_prob: float = 0.15, replace_prob: float = 0.1, random_prob: float = 0.1) -> tuple[torch.Tensor, torch.Tensor]

Source: ll_stepnet/stepnet/pretrain.py:430

Create masked input for BERT-style training.

Args: input_ids: [batch_size, seq_len] - original tokens mask_token_id: ID for [MASK] token vocab_size: Vocabulary size mask_prob: Probability of masking a token replace_prob: Probability of replacing with random token instead of [MASK] random_prob: Probability of keeping original token

Returns: masked_input: [batch_size, seq_len] - input with masks labels: [batch_size, seq_len] - targets (-100 for non-masked)
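The masking scheme can be sketched on nested lists. The version below follows the parameter meanings documented above (replace_prob for a random token, random_prob for keeping the original, [MASK] otherwise); the function name, the seed argument, and the use of Python's random module are illustrative assumptions rather than the tensor implementation.

```python
import random
from typing import List, Tuple

IGNORE_INDEX = -100  # label value for positions that are not prediction targets


def mask_tokens_sketch(input_ids: List[List[int]], mask_token_id: int, vocab_size: int,
                       mask_prob: float = 0.15, replace_prob: float = 0.1,
                       random_prob: float = 0.1,
                       seed: int = 0) -> Tuple[List[List[int]], List[List[int]]]:
    """BERT-style masking on nested lists; returns (masked_input, labels)."""
    rng = random.Random(seed)
    masked, labels = [], []
    for row in input_ids:
        out_row, label_row = [], []
        for tok in row:
            if rng.random() >= mask_prob:
                out_row.append(tok)              # not selected: left untouched
                label_row.append(IGNORE_INDEX)
                continue
            label_row.append(tok)                # selected: target is the original token
            r = rng.random()
            if r < replace_prob:
                out_row.append(rng.randrange(vocab_size))  # random replacement
            elif r < replace_prob + random_prob:
                out_row.append(tok)                        # keep the original token
            else:
                out_row.append(mask_token_id)              # usual case: [MASK]
        masked.append(out_row)
        labels.append(label_row)
    return masked, labels
```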

`plot_learning_curve_with_scaling_law`

plot_learning_curve_with_scaling_law(sample_sizes: np.ndarray, val_losses: np.ndarray, val_accuracies: Optional[np.ndarray] = None, fitted_params: Optional[Dict] = None, fit_type: str = 'openai', extrapolate_to: Optional[int] = None, save_path: Optional[str] = None)

Source: ll_stepnet/stepnet/data_requirements.py:605

Plot learning curves with fitted scaling law.

Args: sample_sizes: Array of dataset sizes val_losses: Validation losses [num_sizes, num_iterations] val_accuracies: Optional validation accuracies fitted_params: Fitted power law parameters fit_type: ‘openai’ or ‘standard’ extrapolate_to: Optional target size for extrapolation save_path: Optional path to save figure

`power_law_error`

power_law_error(n: np.ndarray, a: float, b: float, c: float) -> np.ndarray

Source: ll_stepnet/stepnet/data_requirements.py:72

Power law for error as a function of dataset size.

Error(n) = a * n^(-b) + c

Args: n: Dataset size a: Scaling coefficient b: Scaling exponent c: Irreducible error (Bayes error rate)

Returns: Error values

`power_law_loss`

power_law_loss(D: np.ndarray, D_c: float, alpha_D: float) -> np.ndarray

Source: ll_stepnet/stepnet/data_requirements.py:55

OpenAI-style power law for loss as a function of dataset size.

L(D) = (D_c / D)^alpha_D

Args: D: Dataset size (number of samples or tokens) D_c: Data scaling constant alpha_D: Data scaling exponent (~0.095 for language models)

Returns: Loss values
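Taking logs gives log L = alpha_D * log D_c - alpha_D * log D, so the law is a straight line in log-log space and can be fitted with ordinary least squares. The sketch below is a stdlib-only illustration with hypothetical helper names; the library's own fitting routine may use a different optimiser.

```python
import math
from typing import Sequence, Tuple


def power_law_loss_value(D: float, D_c: float, alpha_D: float) -> float:
    """L(D) = (D_c / D) ** alpha_D for a single dataset size."""
    return (D_c / D) ** alpha_D


def fit_power_law(sizes: Sequence[float], losses: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line fit in log-log space; returns (D_c, alpha_D)."""
    xs = [math.log(d) for d in sizes]
    ys = [math.log(v) for v in losses]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx                 # equals -alpha_D
    intercept = my - slope * mx       # equals alpha_D * log(D_c)
    alpha = -slope
    return math.exp(intercept / alpha), alpha


# Synthetic, noise-free data generated from illustrative parameters.
sizes = [1e3, 1e4, 1e5, 1e6]
losses = [power_law_loss_value(d, D_c=5e6, alpha_D=0.1) for d in sizes]
D_c_hat, alpha_hat = fit_power_law(sizes, losses)
```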

`reserialize_step`

reserialize_step(step_text: str, config: Optional[STEPReserializationConfig] = None) -> STEPReserializedOutput

Source: ll_stepnet/stepnet/reserialization.py:388

Convenience function to reserialize STEP text via DFS.

Parses the raw STEP text into an entity graph, then performs DFS reserialization to produce semantically-grouped output.

Args: step_text: Raw STEP file text (DATA section entities). config: Optional reserialization configuration.

Returns: STEPReserializedOutput with reserialized text and metadata.
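The parse-then-DFS idea can be sketched in a few lines. The code below is a simplified illustration: it handles single-line entities only, treats any entity that no other entity references as a root, and visits references in the order they appear. The regexes and function names are hypothetical, and the real reserializer's root selection, ordering, and output format may differ.

```python
import re
from typing import Dict, List

ENTITY_RE = re.compile(r"#(\d+)\s*=\s*([A-Z0-9_]+)\s*\((.*)\)\s*;")
REF_RE = re.compile(r"#(\d+)")


def parse_entities(step_text: str) -> Dict[int, dict]:
    """Parse single-line DATA-section entities into {id: {'type', 'refs', 'line'}}."""
    entities: Dict[int, dict] = {}
    for raw in step_text.splitlines():
        line = raw.strip()
        m = ENTITY_RE.match(line)
        if m:
            refs = [int(r) for r in REF_RE.findall(m.group(3))]
            entities[int(m.group(1))] = {"type": m.group(2), "refs": refs, "line": line}
    return entities


def dfs_order(entities: Dict[int, dict]) -> List[int]:
    """Visit each root, then its references depth-first, emitting each entity once."""
    referenced = {r for e in entities.values() for r in e["refs"]}
    roots = [eid for eid in sorted(entities) if eid not in referenced]
    order: List[int] = []
    seen = set()
    stack = list(reversed(roots))
    while stack:
        eid = stack.pop()
        if eid in seen or eid not in entities:
            continue  # already emitted, or a dangling reference
        seen.add(eid)
        order.append(eid)
        stack.extend(reversed(entities[eid]["refs"]))
    return order
```

Emitting the stored lines in this order places each root next to the entities it depends on. Graphs in which every entity is referenced by another (pure cycles) have no root under this definition and would need separate handling.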

`suggest_dataset_size`

suggest_dataset_size(model: nn.Module, task_type: str = 'classification', quality_level: str = 'good') -> Dict[str, int]

Source: ll_stepnet/stepnet/data_requirements.py:819

Suggest dataset size based on model size and task complexity.

Based on research guidelines:

- Classification: 1,000-5,000 samples per class
- Property prediction: 10,000-100,000 samples
- Fine-tuning: 100-1,000 samples (with pretrained model)

Args: model: STEP model task_type: ‘classification’, ‘property’, ‘captioning’, etc. quality_level: ‘minimum’, ‘good’, or ‘excellent’

Returns: Dictionary with recommended dataset sizes