API Reference

Generated from the geotoken package source. Each symbol links to its definition on GitHub.

`class AdaptiveBitAllocationConfig`

Source: geotoken/geotoken/config.py:43

Configuration for adaptive bit allocation.

Bit semantics:
- min_bits: Absolute floor; no vertex ever gets fewer bits than this.
- base_bits: Starting allocation for flat / low-complexity regions.
- max_bits: Absolute ceiling; no vertex ever gets more bits than this.

The invariant min_bits <= base_bits <= max_bits must always hold.
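The docstring names the knobs but not the allocation rule. The following is a minimal sketch, assuming each vertex gets `base_bits` plus a share of `max_additional_bits` proportional to a weighted curvature/density score. The score is normalized between the `percentile_low` and `percentile_high` percentiles, and the result is clamped to `[min_bits, max_bits]`. Only the field names come from the config; `BitAllocationSketch`, `percentile`, and `allocate_bits` are hypothetical, and the real allocator may differ.

```python
import math
from dataclasses import dataclass


@dataclass
class BitAllocationSketch:
    """Hypothetical stand-in whose fields mirror AdaptiveBitAllocationConfig."""
    base_bits: int = 8
    max_additional_bits: int = 4
    min_bits: int = 4
    max_bits: int = 16
    curvature_weight: float = 0.7
    density_weight: float = 0.3
    percentile_low: float = 10.0
    percentile_high: float = 90.0

    def __post_init__(self) -> None:
        # Enforce the documented invariant min_bits <= base_bits <= max_bits.
        if not (self.min_bits <= self.base_bits <= self.max_bits):
            raise ValueError("require min_bits <= base_bits <= max_bits")


def percentile(values: list[float], q: float) -> float:
    """Linear-interpolation percentile (the same convention as numpy's default)."""
    s = sorted(values)
    k = (len(s) - 1) * q / 100.0
    lo = math.floor(k)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)


def allocate_bits(curvature: list[float], density: list[float],
                  cfg: BitAllocationSketch) -> list[int]:
    """Assumed rule: base_bits + (normalized score) * max_additional_bits, clamped."""
    scores = [cfg.curvature_weight * c + cfg.density_weight * d
              for c, d in zip(curvature, density)]
    p_lo = percentile(scores, cfg.percentile_low)
    p_hi = percentile(scores, cfg.percentile_high)
    span = p_hi - p_lo
    bits = []
    for s in scores:
        # Map the score into [0, 1], clipping values outside the percentile band.
        t = 0.0 if span <= 0 else min(max((s - p_lo) / span, 0.0), 1.0)
        b = cfg.base_bits + round(t * cfg.max_additional_bits)
        # min_bits and max_bits act as absolute floor and ceiling.
        bits.append(min(max(b, cfg.min_bits), cfg.max_bits))
    return bits


print(allocate_bits([0.0, 0.5, 1.0], [0.0, 0.5, 1.0], BitAllocationSketch()))  # [8, 10, 12]
```

Percentile-based normalization keeps a few extreme-curvature vertices from compressing everyone else's score toward zero. The final clamp is what makes `min_bits` and `max_bits` hard limits regardless of `base_bits + max_additional_bits`.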

Methods

`__init__`

__init__(base_bits: int = 8, max_additional_bits: int = 4, min_bits: int = 4, max_bits: int = 16, curvature_weight: float = 0.7, density_weight: float = 0.3, percentile_low: float = 10.0, percentile_high: float = 90.0) -> None

Source: geotoken/geotoken/config.py

`class AdaptiveQuantizer`

Source: geotoken/geotoken/quantization/adaptive.py:35

Adaptive precision quantizer for geometric data.

Pipeline:

Normalize to unit bounding cube
Analyze curvature
Analyze feature density
Allocate bits per vertex
Quantize with per-vertex precision
Prevent feature collapse
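Steps 1 and 5 of this pipeline can be sketched in a few lines. This is a minimal illustration, not the class's implementation. It assumes "unit bounding cube" means `[0, 1]^3` reached with a single uniform scale (the largest axis extent), and that a vertex with `b` bits is snapped to a grid of `2**b` levels per axis. Per-vertex bit widths are passed in directly (for example from the allocation step). Curvature/density analysis and feature-collapse prevention are not modeled, and all function names are hypothetical.

```python
Vec3 = tuple[float, float, float]


def normalize_unit_cube(vertices: list[Vec3]) -> tuple[list[Vec3], list[float], float]:
    """Step 1 (assumed form): map vertices into [0, 1]^3 with one uniform scale."""
    mins = [min(v[i] for v in vertices) for i in range(3)]
    # Largest axis extent; fall back to 1.0 for a degenerate (single-point) input.
    extent = max(max(v[i] for v in vertices) - mins[i] for i in range(3)) or 1.0
    normed = [tuple((v[i] - mins[i]) / extent for i in range(3)) for v in vertices]
    return normed, mins, extent


def quantize_per_vertex(normed: list[Vec3], bits: list[int]) -> list[tuple[int, int, int]]:
    """Step 5: each vertex uses its own grid with 2**b levels per axis."""
    out = []
    for v, b in zip(normed, bits):
        top = (1 << b) - 1
        out.append(tuple(round(c * top) for c in v))
    return out


def dequantize_per_vertex(quantized: list[tuple[int, int, int]], bits: list[int],
                          mins: list[float], extent: float) -> list[Vec3]:
    """Invert steps 5 and 1 to recover approximate original coordinates."""
    out = []
    for q, b in zip(quantized, bits):
        top = (1 << b) - 1
        out.append(tuple(mins[i] + q[i] / top * extent for i in range(3)))
    return out


verts = [(0.0, 0.0, 0.0), (2.0, 1.0, 0.0), (1.0, 2.0, 2.0)]
normed, mins, extent = normalize_unit_cube(verts)
q = quantize_per_vertex(normed, [4, 8, 12])
```

With this scheme the per-axis reconstruction error of a vertex is at most `extent / (2 * (2**b - 1))`, so each extra bit roughly halves the error for that vertex only.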

Methods

`__init__`

__init__(config: Optional[QuantizationConfig] = None)

Source: geotoken/geotoken/quantization/adaptive.py:47

`quantize`

quantize(vertices: np.ndarray, faces: Optional[np.ndarray] = None) -> AdaptiveQuantizationResult

Source: geotoken/geotoken/quantization/adaptive.py:54

Quantize vertices with adaptive precision.

Args:
- vertices: (N, 3) vertex positions.
- faces: (F, 3) face indices (optional; improves analysis).

Returns: AdaptiveQuantizationResult

Raises:
- TypeError: If vertices is not a numpy array.
- ValueError: If vertices is not a 2D array with 3 columns.

`dequantize`

dequantize(result: AdaptiveQuantizationResult) -> np.ndarray

Source: geotoken/geotoken/quantization/adaptive.py:147

Dequantize back to original coordinate space.

Args: result: Quantization result

Returns: (N, 3) reconstructed vertex positions

`class BooleanOpToken`

Source: geotoken/geotoken/tokenizer/token_types.py:153

A CSG boolean operation combining solid bodies.

Args:
- op_type: The boolean operation (union, intersection, subtraction).
- operand_indices: Indices of the bodies being combined.

Methods

`__init__`

__init__(op_type: BooleanOpType, operand_indices: list[int] = list()) -> None

Source: geotoken/geotoken/tokenizer/token_types.py

`class BooleanOpType(str, Enum)`

Source: geotoken/geotoken/tokenizer/token_types.py:38

Boolean/CSG operation types.

`class CADVocabulary`

Source: geotoken/geotoken/tokenizer/vocabulary.py:35

Discrete token vocabulary for CAD command sequences.

Maps every possible (command_type, parameter_index, quantized_value) triple to a unique integer token ID. With 6 command types, 16 parameters, and 256 quantization levels, this yields 6 * 16 * 256 = 24,576 tokens, plus special tokens.

Args:
- num_command_types: Number of command type categories.
- num_parameters: Maximum number of parameters per command.
- num_levels: Number of quantization levels per parameter.

Methods

`__init__`

__init__(num_command_types: int = 6, num_parameters: int = 16, num_levels: int = 256, num_constraint_types: int = 9, max_constraint_index: int = 60, num_graph_structure_types: int = 6, node_feature_dim: int = 48, edge_feature_dim: int = 16, graph_feature_levels: int = 256) -> None

Source: geotoken/geotoken/tokenizer/vocabulary.py:48

`encode`

encode(command_tokens: list[CommandToken]) -> list[int]

Source: geotoken/geotoken/tokenizer/vocabulary.py:169

Convert command token sequence to integer ID sequence.

Each CommandToken produces 1 (command type) + N (active parameter) token IDs, where N is the number of active parameters for that command. The whole sequence is wrapped with BOS and EOS tokens.

Args: command_tokens: List of CommandToken objects.

Returns: List of integer token IDs.

Raises: TypeError: If command_tokens is not a list or contains non-CommandToken items.

`decode`

decode(token_ids: list[int]) -> list[CommandToken]

Source: geotoken/geotoken/tokenizer/vocabulary.py:215

Convert integer ID sequence back to CommandToken list.

Args: token_ids: List of integer token IDs.

Returns: List of CommandToken objects.

`encode_flat`

encode_flat(command_tokens: list[CommandToken]) -> list[int]

Source: geotoken/geotoken/tokenizer/vocabulary.py:285

Flat encoding: fixed-width per command.

Each command emits exactly 1 (command type ID) + num_active_params parameter IDs, padded or truncated to a fixed width determined by the command type’s canonical mask.

Args: command_tokens: List of CommandToken objects.

Returns: List of integer token IDs (BOS + fixed-width commands + EOS).

`encode_constraints`

encode_constraints(constraint_tokens: list[ConstraintToken]) -> list[int]

Source: geotoken/geotoken/tokenizer/vocabulary.py:333

Encode constraint tokens to integer IDs.

Each constraint is encoded as a single integer: `constraint_offset + type_idx * max_idx**2 + source * max_idx + target`.

Args: constraint_tokens: List of ConstraintToken objects.

Returns: List of integer token IDs for constraints.
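The constraint formula is a mixed-radix packing, so it can be inverted with two `divmod` calls. The sketch below assumes `max_idx` corresponds to `CADVocabulary`'s `max_constraint_index` (default 60) and that `type_idx` ranges over `num_constraint_types` (default 9). The `constraint_offset` value of 1000 is purely hypothetical; the real offset depends on where the vocabulary places the constraint block.

```python
MAX_IDX = 60    # assumed to match max_constraint_index's default
NUM_TYPES = 9   # num_constraint_types default


def encode_constraint(type_idx: int, source: int, target: int, constraint_offset: int) -> int:
    """Documented layout: offset + type_idx * max_idx**2 + source * max_idx + target."""
    if not (0 <= source < MAX_IDX and 0 <= target < MAX_IDX):
        raise ValueError("source and target must lie in [0, max_idx)")
    return constraint_offset + type_idx * MAX_IDX**2 + source * MAX_IDX + target


def decode_constraint(token_id: int, constraint_offset: int) -> tuple[int, int, int]:
    """Undo the packing: peel off type_idx first, then split source/target."""
    type_idx, rest = divmod(token_id - constraint_offset, MAX_IDX**2)
    source, target = divmod(rest, MAX_IDX)
    return type_idx, source, target


token = encode_constraint(2, 5, 7, constraint_offset=1000)  # 1000 + 7200 + 300 + 7 = 8507
```

Under these defaults the constraint block spans `9 * 60**2 = 32,400` IDs. Indices at or above `max_idx` would collide with the next slot, which is why the sketch rejects them.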

`decode_constraints`

decode_constraints(token_ids: list[int]) -> list[ConstraintToken]

Source: geotoken/geotoken/tokenizer/vocabulary.py:367

Decode integer IDs back to constraint tokens.

Args: token_ids: List of constraint token IDs.

Returns: List of ConstraintToken objects.

`encode_graph_structure`

encode_graph_structure(structure_tokens: list[GraphStructureToken]) -> list[int]

Source: geotoken/geotoken/tokenizer/vocabulary.py:411

Encode graph structure tokens to integer IDs.

Args: structure_tokens: List of GraphStructureToken objects.

Returns: List of integer token IDs.

`encode_graph_node_features`

encode_graph_node_features(node_tokens: list[GraphNodeToken]) -> list[int]

Source: geotoken/geotoken/tokenizer/vocabulary.py:428

Encode quantized node feature tokens to integer IDs.

Each feature dimension of each node becomes a separate token: `graph_node_offset + dim_idx * levels + quantized_value`.

Args: node_tokens: List of GraphNodeToken objects.

Returns: List of integer token IDs (one per feature per node).
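The node-feature formula can be worked through directly. The node index does not appear in it, so an ID only says "dimension `dim_idx` has value `quantized_value`". Which node the token belongs to is carried by its position in the sequence (and by the surrounding graph structure tokens). In this sketch, `graph_node_offset=5000` is a hypothetical value and `levels` is assumed to be `graph_feature_levels` (default 256). The function name is illustrative.

```python
def encode_node_features(nodes: list[list[int]], graph_node_offset: int,
                         levels: int = 256) -> list[int]:
    """Flatten per-node quantized features into IDs using the documented formula."""
    ids = []
    for features in nodes:
        for dim_idx, q in enumerate(features):
            if not 0 <= q < levels:
                raise ValueError(f"quantized value {q} outside [0, {levels})")
            # graph_node_offset + dim_idx * levels + quantized_value
            ids.append(graph_node_offset + dim_idx * levels + q)
    return ids


ids = encode_node_features([[0, 255, 3], [1, 0, 0]], graph_node_offset=5000)
```

With 48 node dimensions and 256 levels, the node-feature block would span `48 * 256 = 12,288` IDs. The edge-feature encoding follows the same pattern with `graph_edge_offset`.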

`encode_graph_edge_features`

encode_graph_edge_features(edge_tokens: list[GraphEdgeToken]) -> list[int]

Source: geotoken/geotoken/tokenizer/vocabulary.py:456

Encode quantized edge feature tokens to integer IDs.

Each feature dimension of each edge becomes a separate token: `graph_edge_offset + dim_idx * levels + quantized_value`.

Args: edge_tokens: List of GraphEdgeToken objects.

Returns: List of integer token IDs (one per feature per edge).

`encode_full_sequence`

encode_full_sequence(token_sequence) -> list[int]

Source: geotoken/geotoken/tokenizer/vocabulary.py:484

Encode a complete TokenSequence to integer IDs.

Combines command tokens, constraint tokens, and graph tokens into a single flat integer sequence with separators.

Args: token_sequence: A TokenSequence with any combination of command, constraint, and graph tokens.

Returns: List of integer token IDs.

`save`

save(path: str | Path) -> None

Source: geotoken/geotoken/tokenizer/vocabulary.py:560

Save vocabulary configuration to JSON.

`encode_to_ids`

encode_to_ids(command_tokens: list[CommandToken], seq_len: int = 60, pad_id: Optional[int] = None) -> list[int]

Source: geotoken/geotoken/tokenizer/vocabulary.py:585

Encode command tokens to a fixed-length integer ID sequence.

This bridges the variable-length output of `encode` to the fixed-length sequences that transformer models consume. The sequence is padded (with pad_id) or truncated to exactly seq_len integer token IDs.

Args:
- command_tokens: List of CommandToken objects.
- seq_len: Target sequence length (default 60, following DeepCAD).
- pad_id: Padding token ID. Defaults to `pad_token_id`.

Returns: List of exactly seq_len integer token IDs.
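The fixed-length contract is easy to state in code. The sketch below assumes right-padding and a plain cut on truncation. The docstring does not say whether a truncated sequence keeps its trailing EOS, and this sketch does not re-insert one. `DEFAULT_PAD_ID = 0` is a hypothetical placeholder for the vocabulary's `pad_token_id`.

```python
from typing import Optional

DEFAULT_PAD_ID = 0  # hypothetical placeholder for CADVocabulary.pad_token_id


def to_fixed_length(ids: list[int], seq_len: int = 60, pad_id: Optional[int] = None) -> list[int]:
    """Truncate or right-pad an ID list to exactly seq_len entries."""
    pad = DEFAULT_PAD_ID if pad_id is None else pad_id
    fixed = ids[:seq_len]  # plain cut: EOS is not re-inserted after truncation
    return fixed + [pad] * (seq_len - len(fixed))


padded = to_fixed_length([1, 2, 3], seq_len=5)      # right-padded with the default pad ID
clipped = to_fixed_length(list(range(10)), seq_len=4)
```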

`encode_to_tensor`

encode_to_tensor(command_tokens: list[CommandToken], seq_len: int = 60, pad_id: Optional[int] = None)

Source: geotoken/geotoken/tokenizer/vocabulary.py:619

Encode command tokens to a fixed-length torch.Tensor.

Wrapper around `encode_to_ids` that returns a torch.Tensor instead of a plain list.

Args:
- command_tokens: List of CommandToken objects.
- seq_len: Target sequence length (default 60, following DeepCAD).
- pad_id: Padding token ID. Defaults to `pad_token_id`.

Returns: torch.Tensor of shape [seq_len] with dtype torch.long.

`load`

load(path: str | Path) -> CADVocabulary

Source: geotoken/geotoken/tokenizer/vocabulary.py:649

Load vocabulary configuration from JSON.

`class CommandSequenceTokenizer`

Source: geotoken/geotoken/tokenizer/command_tokenizer.py:28

Tokenize parametric CAD construction history into command sequences.

Follows the DeepCAD pipeline: parse → normalize sketches → normalize 3D → quantize parameters → pad/truncate to fixed length.

Args:
- quantization_config: Controls precision tiers and normalization.
- sequence_config: Controls sequence length and padding.
- command_config: Controls command tokenization specifics.

Methods

`__init__`

__init__(quantization_config: Optional[QuantizationConfig] = None, sequence_config: Optional[SequenceConfig] = None, command_config: Optional[CommandTokenizationConfig] = None) -> None

Source: geotoken/geotoken/tokenizer/command_tokenizer.py:40

`tokenize`

tokenize(construction_history: dict | list, constraints: list[dict[str, Any]] | None = None) -> TokenSequence

Source: geotoken/geotoken/tokenizer/command_tokenizer.py:88

Main entry point: convert construction history to TokenSequence.

Args:
- construction_history: DeepCAD-format JSON (list of sketch + extrude operations) or a dict with a "sequence" key.
- constraints: Optional list of constraint dicts from cadling's SketchGeometryExtractor or Sketch2DItem.to_geotoken_constraints(). Each dict should have "type", "source_index", "target_index", and optionally "value". Only processed when include_constraints is True in the config.

Returns: TokenSequence with command_tokens, coordinate_tokens, and optionally constraint_tokens.

Raises: TypeError: If construction_history is not a dict or list.

`parse_construction_history`

parse_construction_history(commands: list[dict[str, Any]]) -> list[dict[str, Any]]

Source: geotoken/geotoken/tokenizer/command_tokenizer.py:181

Parse raw DeepCAD JSON commands into internal representation.

Each command becomes a dict with type (CommandType) and params (list of floats).

Supports three source formats via source_format config:

"deepcad": Params are compact and match masks directly.
"cadling": Auto-strips z-interleaving and trailing padding from older cadling output to produce compact params.
"auto" (default): Detects format by checking if params have z-interleaving patterns.

`normalize_sketches`

normalize_sketches(commands: list[dict[str, Any]]) -> list[dict[str, Any]]

Source: geotoken/geotoken/tokenizer/command_tokenizer.py:314

Normalize each sketch group to the origin and a 2x2 square.

Translates each sketch so its centroid lies at the origin, scales it to fit the normalization range, and, when configured, canonicalizes loop ordering (counter-clockwise, starting from the bottom-left vertex).
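The three sketch-normalization steps can be illustrated on a single closed loop of 2D vertices. This is a sketch under stated assumptions, not the tokenizer's code. The centroid is taken as the vertex mean. The scale makes the largest absolute coordinate equal `normalization_range / 2`, so the default of 2.0 fits the 2x2 square. "Bottom-left" is taken as the smallest x, with ties broken by the smallest y; the library's exact tie-break is not documented. Orientation is tested with the shoelace signed area.

```python
Point = tuple[float, float]


def normalize_loop(points: list[Point], normalization_range: float = 2.0) -> list[Point]:
    """Center, scale, orient CCW, and rotate a loop to start at its bottom-left vertex."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centered = [(x - cx, y - cy) for x, y in points]
    # Uniform scale so the largest |coordinate| equals half the normalization range.
    radius = max(max(abs(x), abs(y)) for x, y in centered) or 1.0
    s = (normalization_range / 2.0) / radius
    loop = [(x * s, y * s) for x, y in centered]
    # Shoelace formula: a negative signed area means the loop runs clockwise.
    area2 = sum(loop[i][0] * loop[(i + 1) % n][1] - loop[(i + 1) % n][0] * loop[i][1]
                for i in range(n))
    if area2 < 0:
        loop.reverse()
    # Rotate so the loop starts at the assumed bottom-left vertex.
    start = min(range(n), key=lambda i: (loop[i][0], loop[i][1]))
    return loop[start:] + loop[:start]


# A clockwise square away from the origin becomes a CCW unit-centered square.
result = normalize_loop([(10.0, 10.0), (10.0, 12.0), (12.0, 12.0), (12.0, 10.0)])
```

Canonicalizing orientation and start vertex means two drawings of the same shape produce the same command sequence, which removes a source of label noise for a sequence model.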

`normalize_3d`

normalize_3d(commands: list[dict[str, Any]]) -> list[dict[str, Any]]

Source: geotoken/geotoken/tokenizer/command_tokenizer.py:378

Scale the full solid to a bounding cube using the normalization range.

Collects all 3D parameters (extrusion heights, offsets) and SOL sketch plane offsets, then scales to fit within the configured normalization range.

`quantize_parameters`

quantize_parameters(commands: list[dict[str, Any]]) -> list[CommandToken]

Source: geotoken/geotoken/tokenizer/command_tokenizer.py:431

Map continuous parameters to discrete quantization levels.

Uses classification-not-regression: each parameter is mapped to one of param_levels discrete bins within the normalization range.
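Classification-style quantization turns each continuous parameter into one of `levels` bin indices. The sketch below assumes the normalization range is centered on zero (`[-1, 1]` for the default 2.0, matching the 2x2 square). It also assumes `levels - 1` equal steps with round-to-nearest and clamping of out-of-range inputs. These are assumptions, not confirmed details of `quantize_parameters`, and the function names are illustrative.

```python
def quantize_param(x: float, levels: int = 256, normalization_range: float = 2.0) -> int:
    """Map a float in [-r/2, r/2] to a bin index in [0, levels - 1]."""
    lo, hi = -normalization_range / 2.0, normalization_range / 2.0
    x = min(max(x, lo), hi)  # clamp out-of-range inputs (assumption)
    return round((x - lo) / (hi - lo) * (levels - 1))


def dequantize_param(q: int, levels: int = 256, normalization_range: float = 2.0) -> float:
    """Map a bin index back to the center of its bin."""
    lo, hi = -normalization_range / 2.0, normalization_range / 2.0
    return lo + q / (levels - 1) * (hi - lo)


q = quantize_param(0.1)
x_hat = dequantize_param(q)
```

With 256 levels over a width of 2.0 the step is `2 / 255`, so the round-trip error of an in-range value is at most half a step, `1 / 255` (about 0.0039). That bound is the kind of quantity a quantize/dequantize quality check would be expected to stay under.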

`pad_or_truncate`

pad_or_truncate(tokens: list[CommandToken]) -> list[CommandToken]

Source: geotoken/geotoken/tokenizer/command_tokenizer.py:468

Pad or truncate to fixed sequence length.

Shorter sequences are padded with EOS tokens. Longer sequences are truncated, prioritizing keeping complete sketch-extrude pairs.
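One plausible way to "prioritize complete sketch-extrude pairs" is to cut after the last EXTRUDE that still fits, so no sketch is left without its extrusion. The sketch below represents commands by their type names only. It falls back to a hard cut if no EXTRUDE fits at all; that fallback, and the function name, are assumptions rather than documented behavior.

```python
def pad_or_truncate_sketch(commands: list[str], max_len: int = 60) -> list[str]:
    """Pad with EOS, or truncate at the last complete sketch-extrude pair."""
    if len(commands) > max_len:
        window = commands[:max_len]
        # Index of the last EXTRUDE inside the window (-1 if there is none).
        last = max((i for i, c in enumerate(window) if c == "EXTRUDE"), default=-1)
        commands = window[: last + 1] if last >= 0 else window
    return commands + ["EOS"] * (max_len - len(commands))


cmds = ["SOL", "LINE", "EXTRUDE", "SOL", "CIRCLE", "EXTRUDE"]
out = pad_or_truncate_sketch(cmds, max_len=5)  # the second, incomplete pair is dropped
```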

`dequantize_parameters`

dequantize_parameters(tokens: list[CommandToken]) -> list[dict[str, Any]]

Source: geotoken/geotoken/tokenizer/command_tokenizer.py:515

Reverse quantization: convert quantized tokens back to floats.

Args: tokens: List of quantized CommandTokens.

Returns: List of dicts with type and continuous params.

`analyze_roundtrip_quality`

analyze_roundtrip_quality(construction_history: dict | list) -> dict[str, float]

Source: geotoken/geotoken/tokenizer/command_tokenizer.py:542

Analyze quality of quantize-dequantize roundtrip.

Measures parameter reconstruction accuracy to assess tokenization quality for a given construction history.

Args: construction_history: DeepCAD-format JSON commands.

Returns: Dict with quality metrics:
- param_mse: Mean squared error of the parameters.
- max_error: Maximum parameter error.
- command_preservation_rate: Fraction of commands preserved.

`disentangle`

disentangle(command_tokens: list[CommandToken]) -> dict[str, list[CommandToken]]

Source: geotoken/geotoken/tokenizer/command_tokenizer.py:657

Split a command sequence into SkexGen-style disentangled streams.

SkexGen (Xu et al., 2022) decomposes CAD models into three independently learnable factors:

topology: which commands appear and how they connect (command types plus sketch/extrude grouping structure). For the tokenizer, this means each CommandToken is reduced to its type and mask; parameter values are zeroed.
geometry: the 2D sketch parameters (LINE endpoints, ARC start/mid/end points, CIRCLE centers and radii). Only sketch-level command tokens are included, with extrusion parameters zeroed.
extrusion: the 3D manufacturing parameters (extrude extent, direction, booleans). Only EXTRUDE tokens are included, with sketch parameters zeroed.

Each stream is a list of CommandToken objects, so each can be encoded independently via `CADVocabulary`.

Args: command_tokens: Flat command token sequence (the command_tokens of the TokenSequence returned by `tokenize`).

Returns: Dictionary with keys "topology", "geometry", and "extrusion", each containing a list of CommandToken objects representing that stream.
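The three-way split can be sketched with a stand-in token type. `Cmd` is a hypothetical mirror of CommandToken, and treating SOL as a sketch-level command is an assumption. The topology stream keeps every command's type and mask but zeroes its parameters. The geometry and extrusion streams filter by command type.

```python
from dataclasses import dataclass, field, replace

SKETCH_TYPES = {"SOL", "LINE", "ARC", "CIRCLE"}  # assumption: SOL counts as sketch-level


@dataclass
class Cmd:
    """Hypothetical stand-in for CommandToken."""
    command_type: str
    parameters: list[int] = field(default_factory=lambda: [0] * 16)
    parameter_mask: list[bool] = field(default_factory=lambda: [False] * 16)


def disentangle_sketch(tokens: list[Cmd]) -> dict[str, list[Cmd]]:
    # Topology: same commands and masks, parameter values zeroed (copies, not in-place).
    topology = [replace(t, parameters=[0] * len(t.parameters),
                        parameter_mask=list(t.parameter_mask)) for t in tokens]
    geometry = [t for t in tokens if t.command_type in SKETCH_TYPES]
    extrusion = [t for t in tokens if t.command_type == "EXTRUDE"]
    return {"topology": topology, "geometry": geometry, "extrusion": extrusion}


tokens = [Cmd("SOL", [1] * 16), Cmd("LINE", [2] * 16), Cmd("EXTRUDE", [3] * 16)]
streams = disentangle_sketch(tokens)
```

Keeping the streams as separate token lists lets a model learn "what gets built" independently of "where" and "how far", which is the point of the SkexGen factorization.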

`parse_constraints`

parse_constraints(constraints: list[dict[str, Any]]) -> list[ConstraintToken]

Source: geotoken/geotoken/tokenizer/command_tokenizer.py:745

Parse constraint dicts into ConstraintToken objects.

Accepts constraint dicts from cadling's SketchGeometryExtractor (via Sketch2DItem.to_geotoken_constraints()) or any dict with "type", "source_index"/"entity_a", and "target_index"/"entity_b".

Unknown constraint types are silently skipped.

Args: constraints: List of constraint dicts. Each dict should have:
- type: String matching a ConstraintType name (e.g., "PARALLEL", "TANGENT", "COINCIDENT").
- source_index or entity_a: Index of the first entity.
- target_index or entity_b: Index of the second entity.
- value (optional): Quantized constraint value.

Returns: List of ConstraintToken instances.

Example:
    constraints = [
        {"type": "PARALLEL", "source_index": 0, "target_index": 2},
        {"type": "TANGENT", "entity_a": 1, "entity_b": 3, "confidence": 0.95},
    ]
    tokens = CommandSequenceTokenizer.parse_constraints(constraints)

`class CommandToken`

Source: geotoken/geotoken/tokenizer/token_types.py:78

A single CAD operation in a sketch-and-extrude sequence.

Represents one command in the DeepCAD-style construction history. Each command has a type, up to 16 quantized integer parameters, and a mask indicating which parameters are active.

Args:
- command_type: The CAD operation type (SOL, LINE, ARC, etc.).
- parameters: Up to 16 quantized integer parameter values.
- parameter_mask: Boolean mask marking the active parameters for the command type.

Methods

`active_parameters`

active_parameters() -> list[int]

Source: geotoken/geotoken/tokenizer/token_types.py:95

Return only the active parameters for this command.

`get_parameter_mask`

get_parameter_mask(command_type: CommandType) -> list[bool]

Source: geotoken/geotoken/tokenizer/token_types.py:99

Get the canonical parameter mask for a command type.

Parameter semantics per command type:

SOL: 2 active parameters
- params[0]: sketch plane z-offset (height from origin)
- params[1]: sketch plane rotation/normal orientation
LINE: xy endpoints (4 params: x1, y1, x2, y2)
ARC: start/mid/end points (6 params: x1, y1, x2, y2, x3, y3)
CIRCLE: center + radius (3 params: cx, cy, r)
EXTRUDE: extent/scale/boolean params (up to 8 params)
EOS: no active parameters
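The per-type parameter counts above can be turned into 16-slot masks. This sketch assumes active parameters occupy the leading slots and that EXTRUDE uses its upper bound of 8. Neither detail is confirmed by the docstring, and `ACTIVE_COUNTS`, `parameter_mask`, and `active_parameters` here are hypothetical helpers, not the library's functions.

```python
ACTIVE_COUNTS = {"SOL": 2, "LINE": 4, "ARC": 6, "CIRCLE": 3, "EXTRUDE": 8, "EOS": 0}


def parameter_mask(command_type: str, width: int = 16) -> list[bool]:
    """Assumed layout: the first k slots are active, the rest are padding."""
    k = ACTIVE_COUNTS[command_type]
    return [i < k for i in range(width)]


def active_parameters(parameters: list[int], mask: list[bool]) -> list[int]:
    """Keep only the parameter values whose mask bit is set."""
    return [p for p, m in zip(parameters, mask) if m]


line_params = active_parameters(list(range(16)), parameter_mask("LINE"))  # x1, y1, x2, y2
```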

`__init__`

__init__(command_type: CommandType, parameters: list[int] = (lambda: [0] * 16)(), parameter_mask: list[bool] = (lambda: [False] * 16)()) -> None

Source: geotoken/geotoken/tokenizer/token_types.py

`class CommandTokenizationConfig`

Source: geotoken/geotoken/config.py:89

Configuration for CAD command sequence tokenization.

Controls how parametric CAD construction history is converted to fixed-length command token sequences for transformer consumption.

Args:
- max_sequence_length: Target command count (DeepCAD uses 60).
- coordinate_quantization: Precision for coordinate values.
- parameter_quantization: Precision for command parameters.
- normalization_range: Bounding cube size (2.0 = 2x2x2 cube).
- canonicalize_loops: Reorder sketch loops to canonical form.
- include_constraints: Include SketchGraphs-style constraint tokens.
- pad_to_max_length: Pad shorter sequences to max_sequence_length.
- source_format: Source data format. "deepcad" expects compact params that match the masks directly. "cadling" auto-strips z-interleaved and padded params to compact form, for backward compatibility with older cadling output. The default, "auto", detects the format heuristically.

Methods

`__init__`

__init__(max_sequence_length: int = 60, coordinate_quantization: PrecisionTier = PrecisionTier.STANDARD, parameter_quantization: PrecisionTier = PrecisionTier.STANDARD, normalization_range: float = 2.0, canonicalize_loops: bool = True, include_constraints: bool = False, pad_to_max_length: bool = True, source_format: Literal['deepcad', 'cadling', 'auto'] = 'auto') -> None

Source: geotoken/geotoken/config.py

`class CommandType(str, Enum)`

Source: geotoken/geotoken/tokenizer/token_types.py:15

CAD command types following DeepCAD's six-command vocabulary (SOL, LINE, ARC, CIRCLE, EXTRUDE, EOS).

`class ConstraintToken`

Source: geotoken/geotoken/tokenizer/token_types.py:134

A geometric constraint between sketch primitives.

Maps to SketchGraphs’ constraint edges — encodes designer-imposed relationships like parallelism, tangency, and coincidence.

Args:
- constraint_type: Type of geometric constraint.
- source_index: Index of the first primitive in the command sequence.
- target_index: Index of the second primitive.
- value: Optional quantized constraint value (for distance/angle).

Methods

`__init__`

__init__(constraint_type: ConstraintType, source_index: int, target_index: int, value: Optional[int] = None) -> None

Source: geotoken/geotoken/tokenizer/token_types.py

`class ConstraintType(str, Enum)`

Source: geotoken/geotoken/tokenizer/token_types.py:25

Geometric constraint types following SketchGraphs.

`class CoordinateToken`

Source: geotoken/geotoken/tokenizer/token_types.py:49

A quantized 3D coordinate token.

Methods

`to_tuple`

to_tuple() -> tuple[int, int, int]

Source: geotoken/geotoken/tokenizer/token_types.py:58

`__init__`

__init__(x: int, y: int, z: int, bits: int = 8, vertex_index: int = -1) -> None

Source: geotoken/geotoken/tokenizer/token_types.py

`class EdgeUVGridTokens`

Source: geotoken/geotoken/quantization/uv_grid_quantizer.py:125

Quantized tokens for [10, 6] edge UV-grid.

Represents an edge sampled at regular parameter intervals with xyz points and tangent vectors.

Channels in the input [num_samples, 6] grid:
- 0-2: XYZ edge points (quantized).
- 3-5: Tangent vectors (quantized separately).

Attributes:
- edge_index: Index of the edge in the topology graph.
- num_samples: Number of samples along the edge.
- quantized_xyz: Quantized XYZ values, shape (N, 3) int32.
- quantized_tangents: Quantized tangent values, shape (N, 3) int32.
- params_xyz: Normalization parameters for XYZ quantization.
- params_tangents: Normalization parameters for tangent quantization.
- bits: Bit-width per coordinate dimension.

Methods

`__init__`

__init__(edge_index: int = -1, num_samples: int = 10, quantized_xyz: np.ndarray = (lambda: np.empty((0, 3), dtype=(np.int32)))(), quantized_tangents: np.ndarray = (lambda: np.empty((0, 3), dtype=(np.int32)))(), params_xyz: FeatureQuantizationParams | None = None, params_tangents: FeatureQuantizationParams | None = None, bits: int = 8, is_approximated: bool = False) -> None

Source: geotoken/geotoken/quantization/uv_grid_quantizer.py

`class FaceUVGridTokens`

Source: geotoken/geotoken/quantization/uv_grid_quantizer.py:85

Quantized tokens for [10, 10, 7] face UV-grid.

Represents a full face UV-grid with xyz points, normals, and trim mask. The grid is sampled on a regular (U, V) parameter grid.

Channels in the input [num_u, num_v, 7] grid:
- 0-2: XYZ surface points (quantized).
- 3-5: Surface normals (quantized separately).
- 6: Trim mask (preserved as boolean, NOT quantized).

Attributes:
- face_index: Index of the face in the topology graph.
- grid_resolution: (num_u, num_v) grid dimensions.
- quantized_xyz: Quantized XYZ values, shape (U*V, 3) int32.
- quantized_normals: Quantized normal values, shape (U*V, 3) int32.
- trim_mask: Boolean trim mask, shape (U*V,) bool.
- params_xyz: Normalization parameters for XYZ quantization.
- params_normals: Normalization parameters for normal quantization.
- bits: Bit-width per coordinate dimension.
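The channel layout explains the (U*V, 3) shapes of the stored arrays: the [num_u, num_v, 7] grid is flattened into three per-sample streams before quantization. This sketch uses nested lists instead of arrays. It assumes row-major flattening (U outer, V inner) and a trim channel holding 0/1 values; both are assumptions. `split_face_grid` is a hypothetical helper.

```python
Grid = list[list[list[float]]]  # [num_u][num_v][7]


def split_face_grid(grid: Grid) -> tuple[list[tuple], list[tuple], list[bool]]:
    """Split channels 0-2 (xyz), 3-5 (normals), and 6 (trim) into flat U*V streams."""
    xyz, normals, trim = [], [], []
    for row in grid:              # U direction
        for sample in row:        # V direction (row-major flattening: an assumption)
            xyz.append(tuple(sample[0:3]))
            normals.append(tuple(sample[3:6]))
            trim.append(bool(sample[6]))  # kept as a boolean, never quantized
    return xyz, normals, trim


grid = [[[float(u), float(v), 0.0, 0.0, 0.0, 1.0, float((u + v) % 2)]
         for v in range(10)] for u in range(10)]
xyz, normals, trim = split_face_grid(grid)
```

The xyz and normal streams would then each get their own normalization parameters (`params_xyz`, `params_normals`), matching "quantized separately" in the channel list.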

Methods

`__init__`

__init__(face_index: Optional[int] = None, grid_resolution: tuple[int, int] = (10, 10), quantized_xyz: np.ndarray = (lambda: np.empty((0, 3), dtype=(np.int32)))(), quantized_normals: np.ndarray = (lambda: np.empty((0, 3), dtype=(np.int32)))(), trim_mask: np.ndarray = (lambda: np.empty((0,), dtype=bool))(), params_xyz: FeatureQuantizationParams | None = None, params_normals: FeatureQuantizationParams | None = None, bits: int = 8, is_approximated: bool = False) -> None

Source: geotoken/geotoken/quantization/uv_grid_quantizer.py

`class FeatureQuantizationParams`

Source: geotoken/geotoken/quantization/feature_quantizer.py:21

Fitted normalization parameters for feature quantization.

Stores per-dimension min/max values learned from training data, used to map feature values to the [0, levels-1] quantization range.

Args:
- dim: Feature dimensionality.
- min_vals: Per-dimension minimum values (shape: (D,)).
- max_vals: Per-dimension maximum values (shape: (D,)).
- scale: Pre-computed per-dimension scale factor.

Methods

`__init__`

__init__(dim: int, min_vals: np.ndarray, max_vals: np.ndarray, scale: np.ndarray) -> None

Source: geotoken/geotoken/quantization/feature_quantizer.py

`class FeatureVectorQuantizer`

Source: geotoken/geotoken/quantization/feature_quantizer.py:40

Quantize dense feature vectors to discrete token sequences.

Maps N-dimensional float feature vectors to integer values in [0, 2^bits - 1] using per-dimension linear quantization. Supports fit/quantize workflow for learning normalization from data.

Args:
- bits: Quantization bit width per dimension. Default 8 → 256 levels.
- strategy: Normalization strategy:
  - "per_dimension": Normalize each feature dimension independently (default).
  - "global": Use a single min/max across all dimensions.

Example:
    quantizer = FeatureVectorQuantizer(bits=8)
    params = quantizer.fit(training_features)                # (N, 48) float32
    quantized = quantizer.quantize(features, params)         # (N, 48) int
    reconstructed = quantizer.dequantize(quantized, params)  # (N, 48) float32
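The fit/quantize/dequantize workflow can be sketched with plain lists to show the difference between the two strategies. This mirrors the documented behavior (per-dimension or global min/max, mapping to `[0, 2**bits - 1]`, clamping outside the fitted range). It assumes `scale = (levels - 1) / (max - min)` and gives constant dimensions a scale of 0. `FeatureParamsSketch` and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FeatureParamsSketch:
    """Hypothetical mirror of FeatureQuantizationParams (lists instead of arrays)."""
    min_vals: list[float]
    max_vals: list[float]
    scale: list[float]  # assumed: (levels - 1) / (max - min) per dimension


def fit_params(features: list[list[float]], bits: int = 8,
               strategy: str = "per_dimension") -> FeatureParamsSketch:
    dim = len(features[0])
    if strategy == "per_dimension":
        mins = [min(row[d] for row in features) for d in range(dim)]
        maxs = [max(row[d] for row in features) for d in range(dim)]
    elif strategy == "global":
        lo = min(min(row) for row in features)
        hi = max(max(row) for row in features)
        mins, maxs = [lo] * dim, [hi] * dim
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    top = (1 << bits) - 1
    # A constant dimension gets scale 0, so all its values map to bin 0.
    scale = [top / (b - a) if b > a else 0.0 for a, b in zip(mins, maxs)]
    return FeatureParamsSketch(mins, maxs, scale)


def quantize_features(features: list[list[float]], params: FeatureParamsSketch,
                      bits: int = 8) -> list[list[int]]:
    top = (1 << bits) - 1
    # Values outside the fitted range are clamped to [0, top].
    return [[min(max(round((x - a) * s), 0), top)
             for x, a, s in zip(row, params.min_vals, params.scale)]
            for row in features]


def dequantize_features(quantized: list[list[int]],
                        params: FeatureParamsSketch) -> list[list[float]]:
    return [[a + q / s if s else a
             for q, a, s in zip(row, params.min_vals, params.scale)]
            for row in quantized]


feats = [[0.0, 10.0], [1.0, 20.0], [0.5, 15.0]]
p = fit_params(feats)
q = quantize_features(feats, p)
```

"per_dimension" gives each feature its full 256-level resolution, which matters when dimensions have very different ranges (here [0, 1] vs [10, 20]). "global" shares one range, so narrow dimensions end up using only a few bins.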

Methods

`__init__`

__init__(bits: int = 8, strategy: str = 'per_dimension')

Source: geotoken/geotoken/quantization/feature_quantizer.py:60

`fit`

fit(features: np.ndarray) -> FeatureQuantizationParams

Source: geotoken/geotoken/quantization/feature_quantizer.py:66

Compute normalization parameters from feature data.

Learns per-dimension (or global) min/max values that will be used to map features to the quantization range.

Note: This method caches the returned params in self._params for convenience (so that subsequent quantize() calls can omit the params argument). However, because the cached state is overwritten on every call, callers that need thread-safety or interleaved fits should use the returned FeatureQuantizationParams object directly instead of relying on the cached instance state.

Args: features: Feature array of shape (N, D) where N is number of samples and D is feature dimensionality.

Returns: FeatureQuantizationParams with learned normalization values.

Raises: ValueError: If features array is not 2D.

`quantize`

quantize(features: np.ndarray, params: Optional[FeatureQuantizationParams] = None) -> np.ndarray

Source: geotoken/geotoken/quantization/feature_quantizer.py:133

Quantize float features to integer tokens.

Maps each feature dimension from [min_d, max_d] to [0, levels-1]. Values outside the fitted range are clamped.

Args:
- features: Feature array of shape (N, D) float32.
- params: Normalization params (uses the fitted params if None).

Returns: Quantized array of shape (N, D) int64 with values in [0, levels-1].

Raises:
- ValueError: If params is not provided and fit() hasn't been called.
- ValueError: If the feature dimensionality doesn't match params.

`dequantize`

dequantize(quantized: np.ndarray, params: Optional[FeatureQuantizationParams] = None) -> np.ndarray

Source: geotoken/geotoken/quantization/feature_quantizer.py:176

Reconstruct approximate float features from quantized tokens.

Inverse of quantize(): maps integer tokens back to approximate continuous feature values.

Args:
- quantized: Quantized array of shape (N, D) int.
- params: Normalization params (uses the fitted params if None).

Returns: Reconstructed feature array of shape (N, D) float32.

`quantize_single`

quantize_single(feature_vector: np.ndarray, params: Optional[FeatureQuantizationParams] = None) -> list[int]

Source: geotoken/geotoken/quantization/feature_quantizer.py:205

Quantize a single feature vector and return as list of ints.

Convenience method for quantizing individual vectors.

Args:
- feature_vector: 1D feature array of shape (D,).
- params: Normalization params.

Returns: List of quantized integer values.

`class GeoTokenizer`

Source: geotoken/geotoken/tokenizer/geo_tokenizer.py:23

Main geometric tokenizer.

Converts 3D mesh/point cloud data into discrete token sequences using adaptive or uniform quantization.

Example:
    tokenizer = GeoTokenizer()
    tokens = tokenizer.tokenize(vertices, faces)
    reconstructed = tokenizer.detokenize(tokens)

Methods

`__init__`

__init__(config: Optional[QuantizationConfig] = None)

Source: geotoken/geotoken/tokenizer/geo_tokenizer.py:35

Initialize tokenizer.

Args: config: Quantization configuration. Defaults to STANDARD tier with adaptive quantization.

`tokenize`

tokenize(vertices: np.ndarray, faces: Optional[np.ndarray] = None) -> TokenSequence

Source: geotoken/geotoken/tokenizer/geo_tokenizer.py:47

Tokenize 3D geometry into a token sequence.

Args:
- vertices: (N, 3) vertex positions.
- faces: (F, 3) face indices (optional; improves adaptive quality).

Returns: TokenSequence with coordinate and geometry tokens

Raises:
- TypeError: If vertices is not a numpy array.
- ValueError: If vertices is not a 2D array with 3 columns.

`detokenize`

detokenize(tokens: TokenSequence) -> np.ndarray

Source: geotoken/geotoken/tokenizer/geo_tokenizer.py:82

Reconstruct 3D coordinates from token sequence.

Args: tokens: TokenSequence from tokenize()

Returns: (N, 3) reconstructed vertex positions

`analyze_impact`

analyze_impact(vertices: np.ndarray, faces: Optional[np.ndarray] = None) -> ImpactReport

Source: geotoken/geotoken/tokenizer/geo_tokenizer.py:131

Analyze quantization impact on geometry quality.

Args:
- vertices: Original vertex positions.
- faces: Face indices (optional).

Returns: ImpactReport with quality metrics

`class GeometryToken`

Source: geotoken/geotoken/tokenizer/token_types.py:66

A token representing geometric structure (face, edge, etc.).

Methods

`__init__`

__init__(token_type: str, indices: list[int] = list(), properties: dict = dict()) -> None

Source: geotoken/geotoken/tokenizer/token_types.py

`class GraphEdgeToken`

Source: geotoken/geotoken/tokenizer/token_types.py:194

Quantized edge feature token.

Represents a directed edge in the B-Rep topology graph with its feature vector quantized to discrete token values.

Args:
- source_index: Source node index.
- target_index: Target node index.
- feature_tokens: Quantized feature values (one per dimension).
- bits: Quantization bit width per feature dimension.

Methods

`__init__`

__init__(source_index: int, target_index: int, feature_tokens: list[int] = list(), bits: int = 8) -> None

Source: geotoken/geotoken/tokenizer/token_types.py

`class GraphNodeToken`

Source: geotoken/geotoken/tokenizer/token_types.py:169

Quantized node (face/edge/vertex) feature token.

Represents a single node in a B-Rep topology graph with its feature vector quantized to discrete token values. Each dimension of the feature vector becomes a separate quantized integer.

Args:
- node_index: Index of this node in the graph.
- feature_tokens: Quantized feature values (one per dimension).
- node_type: Node entity type ("face", "edge", "vertex").
- bits: Quantization bit width per feature dimension.

Methods

`__init__`

__init__(node_index: int, feature_tokens: list[int] = list(), node_type: str = 'face', bits: int = 8) -> None

Source: geotoken/geotoken/tokenizer/token_types.py

`class GraphStructureToken`

Source: geotoken/geotoken/tokenizer/token_types.py:218

Graph structure marker token.

Used to delimit graph, node, and edge boundaries in the serialized flat token sequence.

Args:
- token_type: Structure marker type. One of:
  - "graph_start": Start of a graph (value = num_nodes).
  - "graph_end": End of a graph.
  - "node_start": Start of a node (value = node_index).
  - "node_end": End of a node.
  - "adjacency": Adjacency list entry (value = neighbor_index).
  - "edge": Edge marker (value encodes src << 16 | tgt).
- value: Associated integer value.
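The "edge" marker packs both endpoints into one integer, with the source in the high bits and the target in the low 16 bits. The helper names in this sketch are illustrative. Decoding reverses the shift and mask. The scheme only round-trips when both indices fit in 16 bits, which the default `max_nodes` of 256 satisfies comfortably.

```python
def pack_edge(src: int, tgt: int) -> int:
    """value = src << 16 | tgt; both indices must fit in 16 bits."""
    if not (0 <= src < 1 << 16 and 0 <= tgt < 1 << 16):
        raise ValueError("node indices must lie in [0, 65536)")
    return (src << 16) | tgt


def unpack_edge(value: int) -> tuple[int, int]:
    """Recover (src, tgt) from a packed edge value."""
    return value >> 16, value & 0xFFFF


value = pack_edge(3, 7)  # 3 * 65536 + 7 = 196615
```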

Methods

`__init__`

__init__(token_type: Literal['graph_start', 'graph_end', 'node_start', 'node_end', 'adjacency', 'edge'], value: int = 0) -> None

Source: geotoken/geotoken/tokenizer/token_types.py

`class GraphTokenizationConfig`

Source: geotoken/geotoken/config.py:119

Configuration for B-Rep topology graph tokenization.

Controls how enriched B-Rep topology graphs (nodes with feature vectors, edges with feature vectors, adjacency structure) are converted to flat token sequences for transformer consumption.

Args:
- node_bits: Quantization bits per dimension for node features.
- edge_bits: Quantization bits per dimension for edge features.
- max_nodes: Maximum node count (truncate or pad).
- max_edges: Maximum edge count (truncate or pad).
- include_uv_grids: Include UV-grid summary tokens per face/edge.
- uv_grid_summary_dim: Number of summary dimensions per UV grid.
- node_feature_dim: Expected node feature dimensionality (48 for cadling).
- edge_feature_dim: Expected edge feature dimensionality. Set to 12 for BrepGen-aligned configurations, or 16 for cadling's default enhanced edge features (which include 4 extra topology flags).
- adjacency_encoding: How adjacency is serialized. "explicit" lists neighbor indices per node; "implicit" relies on a sorted edge list.
- pad_to_max: Pad shorter graphs to max_nodes/max_edges.

Methods

`__init__`

__init__(node_bits: int = 8, edge_bits: int = 8, max_nodes: int = 256, max_edges: int = 1024, include_uv_grids: bool = False, uv_grid_summary_dim: int = 6, node_feature_dim: int = 48, edge_feature_dim: int = 16, adjacency_encoding: Literal['implicit', 'explicit'] = 'explicit', pad_to_max: bool = True) -> None

Source: geotoken/geotoken/config.py

`class GraphTokenizer`

Source: geotoken/geotoken/tokenizer/graph_tokenizer.py:41

Tokenize B-Rep topology graphs with node/edge features.

Converts a topology graph with dense feature vectors on nodes and edges into a TokenSequence containing quantized graph tokens. The resulting flat token sequence can be encoded into integer IDs via CADVocabulary for transformer input.

The tokenizer holds FeatureVectorQuantizer instances for both node and edge features. These quantizers can be pre-fitted on training data with fit(); otherwise they auto-fit on the first call to tokenize().

Args: config: Graph tokenization configuration.

Example:

```python
config = GraphTokenizationConfig(node_bits=8, edge_bits=8)
tokenizer = GraphTokenizer(config)

# From cadling's TopologyGraph:
node_feats = topology_graph.to_numpy_node_features()  # (N, 48)
edge_idx = topology_graph.to_edge_index()              # (2, M)
edge_feats = topology_graph.to_numpy_edge_features()   # (M, 16)

token_seq = tokenizer.tokenize(node_feats, edge_idx, edge_feats)
```

Methods

`init`

__init__(config: Optional[GraphTokenizationConfig] = None) -> None

Source: geotoken/geotoken/tokenizer/graph_tokenizer.py:68

`fit`

fit(node_features: np.ndarray, edge_features: Optional[np.ndarray] = None) -> None

Source: geotoken/geotoken/tokenizer/graph_tokenizer.py:82

Pre-fit quantization parameters on training data.

Call this before tokenize() to use consistent normalization across multiple graphs. If not called, tokenize() will auto-fit on each graph independently.

Args: node_features: Training node features, shape (N_total, D_node). edge_features: Training edge features, shape (M_total, D_edge).
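Why pre-fitting matters: if each graph is normalized with its own ranges, the same raw feature value can land on different integer codes in different graphs. A minimal sketch, assuming per-dimension min-max normalization with clipping (the docs above do not say how `FeatureVectorQuantizer` normalizes); `MinMaxFeatureQuantizer` is a hypothetical stand-in, not the geotoken class.

```python
from typing import Optional

import numpy as np


class MinMaxFeatureQuantizer:
    """Hypothetical per-dimension min-max quantizer; not geotoken's FeatureVectorQuantizer."""

    def __init__(self, bits: int = 8) -> None:
        self.levels = 2 ** bits
        self.lo: Optional[np.ndarray] = None
        self.hi: Optional[np.ndarray] = None

    def fit(self, features: np.ndarray) -> None:
        # One range per feature dimension, taken over every training row.
        self.lo = features.min(axis=0)
        self.hi = features.max(axis=0)

    def quantize(self, features: np.ndarray) -> np.ndarray:
        if self.lo is None or self.hi is None:
            raise RuntimeError("call fit() before quantize()")
        # Constant dimensions get a span of 1 to avoid dividing by zero.
        span = np.where(self.hi > self.lo, self.hi - self.lo, 1.0)
        unit = np.clip((features - self.lo) / span, 0.0, 1.0)
        return np.round(unit * (self.levels - 1)).astype(np.int64)


# Fit once on features pooled from the whole training set...
train = np.array([[0.0, 10.0], [1.0, 20.0], [0.5, 15.0]])
quantizer = MinMaxFeatureQuantizer(bits=8)
quantizer.fit(train)

# ...then every graph is quantized against the same ranges.
graph_a = quantizer.quantize(np.array([[0.0, 10.0], [1.0, 20.0]]))
graph_b = quantizer.quantize(np.array([[2.0, 5.0]]))  # outside the fitted range: clipped
print(graph_a, graph_b)
```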

`tokenize`

tokenize(node_features: np.ndarray, edge_index: np.ndarray, edge_features: Optional[np.ndarray] = None, node_types: Optional[list[str]] = None) -> TokenSequence

Source: geotoken/geotoken/tokenizer/graph_tokenizer.py:107

Convert topology graph to token sequence.

Args:

- node_features: Node feature array, shape (N, D_node) float32. Typically 48-dim for cadling face features.
- edge_index: Edge index array, shape (2, M) int64. Row 0 = source indices, row 1 = target indices.
- edge_features: Optional edge feature array, shape (M, D_edge) float32. Typically 16-dim for cadling edge features.
- node_types: Optional per-node type labels (e.g., ["face", "face", …]).

Returns: TokenSequence with graph_node_tokens, graph_edge_tokens, and graph_structure_tokens populated.

Raises: ValueError: If array shapes are inconsistent.
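The documented shapes imply a handful of consistency checks. The hypothetical `check_graph_arrays` helper below sketches the kind of validation that could raise the `ValueError` above; geotoken's actual checks and messages may differ.

```python
from typing import Optional

import numpy as np


def check_graph_arrays(node_features: np.ndarray, edge_index: np.ndarray,
                       edge_features: Optional[np.ndarray] = None,
                       node_types: Optional[list[str]] = None) -> None:
    """Hypothetical shape checks mirroring the documented tokenize() arguments."""
    if node_features.ndim != 2:
        raise ValueError("node_features must have shape (N, D_node)")
    if edge_index.ndim != 2 or edge_index.shape[0] != 2:
        raise ValueError("edge_index must have shape (2, M)")
    num_nodes = node_features.shape[0]
    # Every endpoint must refer to an existing node.
    if edge_index.size and (edge_index.min() < 0 or edge_index.max() >= num_nodes):
        raise ValueError("edge_index refers to a node outside [0, N)")
    if edge_features is not None and edge_features.shape[0] != edge_index.shape[1]:
        raise ValueError("edge_features must have one row per edge (M rows)")
    if node_types is not None and len(node_types) != num_nodes:
        raise ValueError("node_types must have one label per node")


nodes = np.zeros((3, 48), dtype=np.float32)
edges = np.array([[0, 1], [1, 2]], dtype=np.int64)
check_graph_arrays(nodes, edges, np.zeros((2, 16), dtype=np.float32))  # consistent

try:
    check_graph_arrays(nodes, edges, np.zeros((3, 16), dtype=np.float32))
except ValueError as err:
    print("rejected:", err)
```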

`detokenize`

detokenize(token_sequence: TokenSequence) -> dict[str, Any]

Source: geotoken/geotoken/tokenizer/graph_tokenizer.py:289

Reconstruct graph structure from token sequence.

Inverse of tokenize(). Recovers approximate node features, edge features, and adjacency structure from the token sequence.

Args: token_sequence: TokenSequence with graph tokens.

Returns: Dict with keys:

- node_features: Reconstructed (N, D) float32 array
- edge_index: (2, M) int64 array
- edge_features: Reconstructed (M, D) float32 array or None
- num_nodes: int
- num_edges: int
- node_types: list[str]
- adjacency: dict[int, list[int]]

`class NormalizationConfig`

Source: geotoken/geotoken/config.py:34

Configuration for geometry normalization.

Methods

`init`

__init__(center: bool = True, uniform_scale: bool = True, preserve_aspect_ratio: bool = True, target_range: tuple[float, float] = (0.0, 1.0)) -> None

Source: geotoken/geotoken/config.py
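A sketch of what the default settings plausibly do: center the bounding box, divide every axis by the largest extent (a single uniform scale, so the aspect ratio is preserved), and map into `target_range`. How the individual flags combine is an assumption here, and `normalize_vertices` is a hypothetical helper that hard-codes `center=True` and `uniform_scale=True`.

```python
import numpy as np


def normalize_vertices(vertices: np.ndarray,
                       target_range: tuple[float, float] = (0.0, 1.0)) -> np.ndarray:
    """Sketch: center the bounding box, scale uniformly, map into target_range."""
    lo = vertices.min(axis=0)
    hi = vertices.max(axis=0)
    center = (lo + hi) / 2.0
    # One scale factor for all axes, so the aspect ratio survives.
    extent = float((hi - lo).max()) or 1.0  # guard against a single point
    unit = (vertices - center) / extent + 0.5  # every axis now lies in [0, 1]
    t_lo, t_hi = target_range
    return t_lo + unit * (t_hi - t_lo)


box = np.array([[0.0, 0.0, 0.0], [4.0, 2.0, 0.0]])
print(normalize_vertices(box))
```

Only the longest axis spans the full target range; shorter axes stay centered inside it.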

`class PrecisionTier(str, Enum)`

Source: geotoken/geotoken/config.py:12

Quantization precision tiers.

| Tier | Bits | Levels | Use case |
|---|---|---|---|
| DRAFT | 6 | 64 | Fast preview |
| STANDARD | 8 | 256 | Balanced (default) |
| PRECISION | 10 | 1024 | High fidelity |
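Bits translate into spatial resolution once geometry is normalized. Assuming the levels are spaced uniformly and include both endpoints of the normalized range (as `UniformQuantizer` does when it maps [0, 1] to [0, levels-1]), the step between levels is extent / (levels - 1), and round-to-nearest errs by at most half a step. `tier_resolution` is a hypothetical helper, the 100 mm extent is only an illustrative value, and adaptive quantization allocates bits per vertex rather than using one tier width everywhere.

```python
# Bit widths copied from the PrecisionTier table above.
TIER_BITS = {"DRAFT": 6, "STANDARD": 8, "PRECISION": 10}


def tier_resolution(bits: int, extent: float = 1.0) -> float:
    """Spacing between adjacent levels when `extent` is split into 2**bits levels."""
    return extent / (2 ** bits - 1)


for name, bits in TIER_BITS.items():
    step = tier_resolution(bits, extent=100.0)  # e.g. a part whose largest side is 100 mm
    print(f"{name:<9} {2 ** bits:>5} levels  step {step:.3f} mm  max error {step / 2:.3f} mm")
```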

`class QuantizationConfig`

Source: geotoken/geotoken/config.py:78

Main quantization configuration.

Methods

`init`

__init__(tier: PrecisionTier = PrecisionTier.STANDARD, adaptive: bool = True, normalization: NormalizationConfig = NormalizationConfig(), bit_allocation: AdaptiveBitAllocationConfig = AdaptiveBitAllocationConfig(), minimum_feature_threshold: float = 0.05, float_tolerance: float = 1e-10) -> None

Source: geotoken/geotoken/config.py

`class SequenceConfig`

Source: geotoken/geotoken/tokenizer/token_types.py:239

Metadata and configuration for a command sequence.

Controls fixed-length padding, quantization resolution, and normalization range for transformer consumption.

Args:

- max_commands: Target sequence length (DeepCAD uses 60).
- quantization_bits: Bits for parameter quantization.
- coordinate_range: Normalization bounding cube size.
- padding_token_id: Token ID used for padding.
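A sketch of the two jobs these settings describe: fixing the sequence length and quantizing parameters inside the normalization cube. It assumes right-padding, one ID per command, and a cube centered at the origin (so `coordinate_range=2.0` spans [-1, 1]); `pad_command_ids` and `quantize_param` are hypothetical helpers, not geotoken functions.

```python
def pad_command_ids(command_ids: list[int], max_commands: int = 60,
                    padding_token_id: int = 0) -> list[int]:
    """Truncate long sequences and right-pad short ones to exactly max_commands."""
    clipped = command_ids[:max_commands]
    return clipped + [padding_token_id] * (max_commands - len(clipped))


def quantize_param(value: float, quantization_bits: int = 8,
                   coordinate_range: float = 2.0) -> int:
    """Map a value in [-range/2, range/2] onto [0, 2**bits - 1] (assumed centered cube)."""
    half = coordinate_range / 2.0
    clamped = min(max(value, -half), half)  # values outside the cube are clamped
    unit = (clamped + half) / coordinate_range
    return round(unit * (2 ** quantization_bits - 1))


print(pad_command_ids([3, 1, 4], max_commands=6))
print(quantize_param(0.5))
```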

Methods

`init`

__init__(max_commands: int = 60, quantization_bits: int = 8, coordinate_range: float = 2.0, padding_token_id: int = 0) -> None

Source: geotoken/geotoken/tokenizer/token_types.py

`class TokenSequence`

Source: geotoken/geotoken/tokenizer/token_types.py:262

A sequence of geometric tokens.

Contains coordinate tokens (mesh-level), geometry tokens (structural), command tokens (parametric CAD construction history), constraint tokens, and graph tokens (B-Rep topology).

Methods

`to_array`

to_array() -> np.ndarray

Source: geotoken/geotoken/tokenizer/token_types.py:287

Convert coordinate tokens to (N, 3) integer array.

`command_types`

command_types() -> list[CommandType]

Source: geotoken/geotoken/tokenizer/token_types.py:305

Get the sequence of command types.

`init`

__init__(coordinate_tokens: list[CoordinateToken] = list(), geometry_tokens: list[GeometryToken] = list(), command_tokens: list[CommandToken] = list(), constraint_tokens: list[ConstraintToken] = list(), boolean_op_tokens: list[BooleanOpToken] = list(), graph_node_tokens: list[GraphNodeToken] = list(), graph_edge_tokens: list[GraphEdgeToken] = list(), graph_structure_tokens: list[GraphStructureToken] = list(), metadata: dict = dict()) -> None

Source: geotoken/geotoken/tokenizer/token_types.py

`class UVGridQuantizer`

Source: geotoken/geotoken/quantization/uv_grid_quantizer.py:165

Quantize UV-parameter grid samples from B-Rep surface patches.

Samples each B-Rep face on a regular (U, V) parameter grid, evaluates the surface mapping (u, v) → (x, y, z), and quantizes the resulting 3-D points using `FeatureVectorQuantizer`.

The quantized grid can be linearised into a flat token sequence (row-major order) and prepended/appended to the topology-graph token stream to give the model an explicit surface shape channel.

Args: grid_resolution: (num_u, num_v) sample count along each parameter direction. Total samples per face = num_u × num_v. Default (5, 5) → 25 samples. bits: Quantization bit-width per coordinate axis. Default 8 → 256 levels.

Example:

```python
quantizer = UVGridQuantizer(grid_resolution=(5, 5), bits=8)
tokens = quantizer.quantize_surface_samples(uv, xyz)
```
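`uv` and `xyz` are not defined in the example above; the caller produces them by evaluating a surface. A sketch assuming a quarter cylinder as the surface and a U-major sample order (U varies slowest, matching the row-major layout described under `to_flat_tokens`); `regular_uv_grid` and `eval_cylinder` are hypothetical helpers.

```python
import numpy as np


def regular_uv_grid(u_bounds: tuple[float, float], v_bounds: tuple[float, float],
                    grid_resolution: tuple[int, int] = (5, 5)) -> np.ndarray:
    """Regular (num_u * num_v, 2) grid of UV samples, U-major (V varies fastest)."""
    num_u, num_v = grid_resolution
    u = np.linspace(u_bounds[0], u_bounds[1], num_u)
    v = np.linspace(v_bounds[0], v_bounds[1], num_v)
    uu, vv = np.meshgrid(u, v, indexing="ij")  # both (num_u, num_v)
    return np.stack([uu.ravel(), vv.ravel()], axis=1)


def eval_cylinder(uv: np.ndarray, radius: float = 1.0) -> np.ndarray:
    """Illustrative surface: u is the angle around the axis, v the height."""
    u, v = uv[:, 0], uv[:, 1]
    return np.stack([radius * np.cos(u), radius * np.sin(u), v], axis=1)


uv = regular_uv_grid((0.0, np.pi / 2), (0.0, 1.0)).astype(np.float32)
xyz = eval_cylinder(uv).astype(np.float32)
print(uv.shape, xyz.shape)  # 25 samples for the default (5, 5) grid
```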

Methods

`init`

__init__(grid_resolution: tuple[int, int] = (5, 5), bits: int = 8) -> None

Source: geotoken/geotoken/quantization/uv_grid_quantizer.py:190

`fit_global`

fit_global(all_xyz_samples: np.ndarray, all_normals_samples: np.ndarray | None = None, all_tangents_samples: np.ndarray | None = None) -> None

Source: geotoken/geotoken/quantization/uv_grid_quantizer.py:225

Fit global normalization params on combined data from all faces/edges.

When global params are set, quantize_surface_samples, quantize_face_uv_grid, and quantize_edge_uv_grid will use them instead of per-face fitting. This ensures cross-face token comparability — the same XYZ coordinate always maps to the same token regardless of which face it belongs to.

Args:

- all_xyz_samples: Combined XYZ points from all faces/edges, shape (N, 3) float32.
- all_normals_samples: Optional combined normals from all faces, shape (M, 3) float32.
- all_tangents_samples: Optional combined tangents from all edges, shape (K, 3) float32.
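The comparability argument is easy to see with two faces that share a vertex. A minimal sketch, assuming per-axis min-max normalization (the exact form of the global params is not documented here); `fit_minmax` and `quantize_xyz` are hypothetical helpers.

```python
import numpy as np


def fit_minmax(points: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    return points.min(axis=0), points.max(axis=0)


def quantize_xyz(points: np.ndarray, lo: np.ndarray, hi: np.ndarray, bits: int = 8) -> np.ndarray:
    span = np.where(hi > lo, hi - lo, 1.0)  # flat axes get a span of 1
    unit = np.clip((points - lo) / span, 0.0, 1.0)
    return np.round(unit * (2 ** bits - 1)).astype(np.int64)


# Two faces that share the vertex (1, 1, 0).
face_a = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
face_b = np.array([[1.0, 1.0, 0.0], [2.0, 2.0, 1.0]])

# Per-face fitting: the shared vertex gets different codes on each face.
per_face_a = quantize_xyz(face_a[1], *fit_minmax(face_a))
per_face_b = quantize_xyz(face_b[0], *fit_minmax(face_b))

# Global fitting: one set of params for every face, so the codes agree.
lo, hi = fit_minmax(np.concatenate([face_a, face_b]))
global_a = quantize_xyz(face_a[1], lo, hi)
global_b = quantize_xyz(face_b[0], lo, hi)
print(per_face_a, per_face_b, global_a, global_b)
```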

`quantize_surface_samples`

quantize_surface_samples(uv_samples: np.ndarray, xyz_samples: np.ndarray, face_index: Optional[int] = None) -> UVGridTokens

Source: geotoken/geotoken/quantization/uv_grid_quantizer.py:275

Quantize (u, v) → (x, y, z) mappings from a single surface.

The caller is responsible for evaluating the surface at the desired UV points. This method quantizes the xyz side and packages the result into `UVGridTokens`.

Args:

- uv_samples: UV parameters, shape (N, 2) where N = num_u * num_v.
- xyz_samples: Corresponding 3-D points, shape (N, 3).
- face_index: Optional face index for bookkeeping.

Returns: `UVGridTokens` with the quantized grid.

Raises: ValueError: If shapes are inconsistent.

`quantize_from_topology`

quantize_from_topology(topology: TopologyGraph) -> dict[int, UVGridTokens]

Source: geotoken/geotoken/quantization/uv_grid_quantizer.py:333

Extract and quantize UV grids for all faces in a topology graph.

The topology graph’s 48-dim node features encode UV statistics at known indices (see cadling.backend.step.enhanced_features). When raw UV samples are not directly available, this method synthesises a regular grid from the per-face UV bounds stored in the feature vector.

Args: topology: A `TopologyGraph` from a `CADlingDocument`.

Returns: Mapping of {face_index → UVGridTokens}. Faces without UV data are silently skipped.

`dequantize`

dequantize(tokens: UVGridTokens) -> np.ndarray

Source: geotoken/geotoken/quantization/uv_grid_quantizer.py:431

Dequantize grid tokens back to approximate 3-D coordinates.

Args: tokens: `UVGridTokens` with the quantized grid.

Returns: Reconstructed (N, 3) float32 array.

Raises: ValueError: If params are missing from the token object.

`to_flat_tokens`

to_flat_tokens(tokens: UVGridTokens) -> list[int]

Source: geotoken/geotoken/quantization/uv_grid_quantizer.py:450

Linearise quantized grid to a flat integer token list.

The grid is serialised in row-major order (U-major, then V). Each sample produces 3 tokens (quantized x, y, z), so the total length is num_u * num_v * 3.
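Because the layout is plain row-major, linearisation and its inverse are just reshapes. A sketch with a random stand-in for `quantized_grid`, assuming its rows are already in U-major order:

```python
import numpy as np

num_u, num_v, bits = 5, 5, 8
rng = np.random.default_rng(0)
# Stand-in for UVGridTokens.quantized_grid: (num_u * num_v, 3) ints, rows in U-major order.
quantized_grid = rng.integers(0, 2 ** bits, size=(num_u * num_v, 3), dtype=np.int32)

# Row-major linearisation: sample 0 (x, y, z), then sample 1 (x, y, z), and so on.
flat_tokens = quantized_grid.reshape(-1).tolist()

# The inverse is a reshape, so the layout loses nothing.
restored = np.asarray(flat_tokens, dtype=np.int32).reshape(num_u, num_v, 3)
print(len(flat_tokens))  # num_u * num_v * 3
```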

Args: tokens: `UVGridTokens` with the quantized grid.

Returns: Flat list of integer tokens.

`quantize_face_uv_grid`

quantize_face_uv_grid(uv_grid: np.ndarray, face_index: Optional[int] = None) -> FaceUVGridTokens

Source: geotoken/geotoken/quantization/uv_grid_quantizer.py:469

Quantize full [num_u, num_v, 7] face UV-grid.

Processes a face UV-grid with xyz points, normals, and trim mask. XYZ and normals are quantized independently; the trim mask is preserved as a boolean array without quantization.

Channels:

- 0-2: XYZ points (quantized)
- 3-5: normals (quantized separately)
- 6: trim mask (preserved as bool)

Args: uv_grid: Face UV-grid — shape (num_u, num_v, 7) float32. face_index: Optional face index for bookkeeping.

Returns: `FaceUVGridTokens` with quantized xyz, normals, and preserved trim mask.

Raises: ValueError: If shape is not (num_u, num_v, 7).
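A sketch of the channel split and of quantizing XYZ and normals independently. It assumes the trim mask is stored as 0.0/1.0 and that each channel group uses its own per-axis min-max range; `split_face_grid` and `quantize_group` are hypothetical helpers, and geotoken keeps its actual parameters in the returned `FaceUVGridTokens`.

```python
import numpy as np


def split_face_grid(uv_grid: np.ndarray):
    """Split a (num_u, num_v, 7) face grid into xyz, normals and a boolean trim mask."""
    if uv_grid.ndim != 3 or uv_grid.shape[2] != 7:
        raise ValueError(f"expected shape (num_u, num_v, 7), got {uv_grid.shape}")
    xyz = uv_grid[..., 0:3]
    normals = uv_grid[..., 3:6]
    trim_mask = uv_grid[..., 6] > 0.5  # assumption: mask stored as 0.0 / 1.0
    return xyz, normals, trim_mask


def quantize_group(values: np.ndarray, bits: int = 8):
    """Min-max quantize one channel group using that group's own per-axis range."""
    flat = values.reshape(-1, values.shape[-1])
    lo, hi = flat.min(axis=0), flat.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    codes = np.round((values - lo) / span * (2 ** bits - 1)).astype(np.int32)
    return codes, (lo, hi)


rng = np.random.default_rng(0)
grid = np.zeros((5, 5, 7), dtype=np.float32)
grid[..., 0:3] = rng.uniform(-50.0, 50.0, size=(5, 5, 3))  # xyz in model units
grid[..., 5] = 1.0                                          # normals along +z
grid[..., 6] = 1.0
grid[0, 0, 6] = 0.0                                         # one trimmed-away sample

xyz, normals, trim_mask = split_face_grid(grid)
xyz_codes, xyz_params = quantize_group(xyz)             # range fitted to xyz only
normal_codes, normal_params = quantize_group(normals)   # independent range for normals
```

Keeping the ranges separate stops the large model-unit coordinates from crushing the unit-length normals into a handful of codes.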

`quantize_edge_uv_grid`

quantize_edge_uv_grid(uv_grid: np.ndarray, edge_index: int = -1) -> EdgeUVGridTokens

Source: geotoken/geotoken/quantization/uv_grid_quantizer.py:543

Quantize full [num_samples, 6] edge UV-grid.

Processes an edge UV-grid with xyz points and tangent vectors. Both are quantized independently.

Channels:

- 0-2: XYZ points (quantized)
- 3-5: tangent vectors (quantized)

Args: uv_grid: Edge UV-grid — shape (num_samples, 6) float32. edge_index: Optional edge index for bookkeeping.

Returns: `EdgeUVGridTokens` with quantized xyz and tangents.

Raises: ValueError: If shape is not (num_samples, 6).

`dequantize_face_grid`

dequantize_face_grid(tokens: FaceUVGridTokens) -> np.ndarray

Source: geotoken/geotoken/quantization/uv_grid_quantizer.py:611

Dequantize face grid tokens back to [num_u, num_v, 7] array.

Reconstructs the full face UV-grid from quantized tokens. XYZ and normals are dequantized; the trim mask is restored from the boolean array.

Args: tokens: `FaceUVGridTokens` with quantized data.

Returns: Reconstructed (num_u, num_v, 7) float32 array.

Raises: ValueError: If params are missing from the token object.

`dequantize_edge_grid`

dequantize_edge_grid(tokens: EdgeUVGridTokens) -> np.ndarray

Source: geotoken/geotoken/quantization/uv_grid_quantizer.py:647

Dequantize edge grid tokens back to [num_samples, 6] array.

Reconstructs the full edge UV-grid from quantized tokens.

Args: tokens: `EdgeUVGridTokens` with quantized data.

Returns: Reconstructed (num_samples, 6) float32 array.

Raises: ValueError: If params are missing from the token object.

`class UVGridTokens`

Source: geotoken/geotoken/quantization/uv_grid_quantizer.py:54

Quantized UV-grid tokens for a single B-Rep face.

Attributes:

- face_index: Index of the face in the topology graph.
- grid_resolution: (num_u, num_v) grid dimensions.
- uv_samples: Original UV parameter values, shape (U*V, 2).
- xyz_samples: Original 3-D surface points, shape (U*V, 3).
- quantized_grid: Quantized XYZ values, shape (U*V, 3), with integer values in [0, 2^bits - 1].
- params: Normalization parameters used for quantization.
- bits: Bit-width per coordinate dimension.

Methods

`init`

__init__(face_index: Optional[int] = None, grid_resolution: tuple[int, int] = (5, 5), uv_samples: np.ndarray = (lambda: np.empty((0, 2), dtype=(np.float32)))(), xyz_samples: np.ndarray = (lambda: np.empty((0, 3), dtype=(np.float32)))(), quantized_grid: np.ndarray = (lambda: np.empty((0, 3), dtype=(np.int32)))(), params: FeatureQuantizationParams | None = None, bits: int = 8, is_approximated: bool = False) -> None

Source: geotoken/geotoken/quantization/uv_grid_quantizer.py

`class UniformQuantizer`

Source: geotoken/geotoken/quantization/uniform.py:18

Fixed-precision quantizer (DeepCAD baseline).

Quantizes all coordinates to the same number of levels.

Methods

`init`

__init__(bits: int = 8)

Source: geotoken/geotoken/quantization/uniform.py:24

Initialize with fixed bit width.

Args: bits: Bit width for all coordinates (default 8 = 256 levels)

`from_tier`

from_tier(tier: PrecisionTier) -> 'UniformQuantizer'

Source: geotoken/geotoken/quantization/uniform.py:35

Create quantizer from precision tier.

`quantize`

quantize(values: np.ndarray) -> np.ndarray

Source: geotoken/geotoken/quantization/uniform.py:40

Quantize normalized values [0, 1] to integer levels.

Args: values: Normalized values in [0, 1]

Returns: Integer quantized values in [0, levels-1]

`dequantize`

dequantize(quantized: np.ndarray) -> np.ndarray

Source: geotoken/geotoken/quantization/uniform.py:53

Dequantize integer levels back to [0, 1] range.

Args: quantized: Integer values in [0, levels-1]

Returns: Reconstructed values in [0, 1]
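A round-trip sketch of fixed-precision quantization on [0, 1], assuming round-to-nearest with clipping (the docs above give the input and output ranges but not the rounding rule). `uniform_quantize` and `uniform_dequantize` are hypothetical helpers, not the `UniformQuantizer` methods themselves.

```python
import numpy as np


def uniform_quantize(values: np.ndarray, bits: int = 8) -> np.ndarray:
    """Map [0, 1] onto integer levels [0, 2**bits - 1]; out-of-range inputs are clipped."""
    levels = 2 ** bits
    return np.clip(np.round(values * (levels - 1)), 0, levels - 1).astype(np.int64)


def uniform_dequantize(quantized: np.ndarray, bits: int = 8) -> np.ndarray:
    """Map integer levels back to evenly spaced values in [0, 1]."""
    return quantized.astype(np.float64) / (2 ** bits - 1)


values = np.linspace(0.0, 1.0, 1001)
recon = uniform_dequantize(uniform_quantize(values))
max_err = float(np.abs(recon - values).max())  # bounded by half a step, 0.5 / 255
print(max_err)
```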