API Reference

Generated from the geotoken package source. Each symbol links to its definition on GitHub.

Source: geotoken/geotoken/config.py:43

Configuration for adaptive bit allocation.

Bit semantics:

  • min_bits: Absolute floor; no vertex ever gets fewer bits than this.
  • base_bits: Starting allocation for flat / low-complexity regions.
  • max_bits: Absolute ceiling; no vertex ever gets more bits than this.

The invariant min_bits <= base_bits <= max_bits must always hold.
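The invariant and the floor/ceiling semantics can be illustrated with a minimal sketch. `BitAllocationSketch` and its `clamp` method are hypothetical names used only for illustration; whether the real config class validates its fields at construction time is an assumption, not something this page confirms.

```python
from dataclasses import dataclass


@dataclass
class BitAllocationSketch:
    """Hypothetical stand-in for the documented bit-allocation config."""

    base_bits: int = 8
    max_additional_bits: int = 4
    min_bits: int = 4
    max_bits: int = 16

    def __post_init__(self) -> None:
        # Assumed validation of the documented invariant.
        if not (self.min_bits <= self.base_bits <= self.max_bits):
            raise ValueError("expected min_bits <= base_bits <= max_bits")

    def clamp(self, bits: int) -> int:
        # Keep any per-vertex allocation inside [min_bits, max_bits].
        return max(self.min_bits, min(self.max_bits, bits))


cfg = BitAllocationSketch()
flat_region = cfg.clamp(cfg.base_bits)
sharp_region = cfg.clamp(cfg.base_bits + cfg.max_additional_bits)
```

With the default values, a flat region stays at 8 bits and a region granted the full `max_additional_bits` lands at 12, still below the 16-bit ceiling.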

Methods

__init__(base_bits: int = 8, max_additional_bits: int = 4, min_bits: int = 4, max_bits: int = 16, curvature_weight: float = 0.7, density_weight: float = 0.3, percentile_low: float = 10.0, percentile_high: float = 90.0) -> None

Source: geotoken/geotoken/config.py

Source: geotoken/geotoken/quantization/adaptive.py:35

Adaptive precision quantizer for geometric data.

Pipeline:

  1. Normalize to unit bounding cube
  2. Analyze curvature
  3. Analyze feature density
  4. Allocate bits per vertex
  5. Quantize with per-vertex precision
  6. Prevent feature collapse
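Steps 1 and 5 of the pipeline can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions: the per-vertex bit counts are supplied by hand rather than derived from curvature or density, the helper names are hypothetical, and the library's actual rounding and normalization details may differ.

```python
def normalize_unit_cube(points):
    """Translate and uniformly scale points so they fit inside [0, 1]^3."""
    mins = [min(p[i] for p in points) for i in range(3)]
    maxs = [max(p[i] for p in points) for i in range(3)]
    # Use the largest axis extent so the aspect ratio is preserved.
    extent = max(hi - lo for lo, hi in zip(mins, maxs)) or 1.0
    unit = [[(p[i] - mins[i]) / extent for i in range(3)] for p in points]
    return unit, mins, extent


def quantize_vertex(coords, bits):
    """Map unit-cube coordinates to integers in [0, 2^bits - 1]."""
    top = (1 << bits) - 1
    return [round(c * top) for c in coords]


def dequantize_vertex(q, bits, mins, extent):
    """Map integers back to the original coordinate space."""
    top = (1 << bits) - 1
    return [qi / top * extent + lo for qi, lo in zip(q, mins)]


points = [[0.0, 0.0, 0.0], [2.0, 1.0, 0.5], [1.0, 2.0, 1.0]]
bits_per_vertex = [8, 12, 8]  # hand-picked; the real allocator computes these

unit, mins, extent = normalize_unit_cube(points)
quantized = [quantize_vertex(u, b) for u, b in zip(unit, bits_per_vertex)]
restored = [
    dequantize_vertex(q, b, mins, extent)
    for q, b in zip(quantized, bits_per_vertex)
]
```

Vertices given more bits reconstruct with a smaller worst-case error, which is the motivation for spending extra bits on high-curvature or feature-dense regions.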

Methods

__init__(config: Optional[QuantizationConfig] = None)

Source: geotoken/geotoken/quantization/adaptive.py:47

quantize(vertices: np.ndarray, faces: Optional[np.ndarray] = None) -> AdaptiveQuantizationResult

Source: geotoken/geotoken/quantization/adaptive.py:54

Quantize vertices with adaptive precision.

Args: vertices: (N, 3) vertex positions faces: (F, 3) face indices (optional, improves analysis)

Returns: AdaptiveQuantizationResult

Raises: TypeError: If vertices is not a numpy array. ValueError: If vertices is not a 2D array with 3 columns.

dequantize(result: AdaptiveQuantizationResult) -> np.ndarray

Source: geotoken/geotoken/quantization/adaptive.py:147

Dequantize back to original coordinate space.

Args: result: Quantization result

Returns: (N, 3) reconstructed vertex positions

Source: geotoken/geotoken/tokenizer/token_types.py:153

A CSG boolean operation combining solid bodies.

Args: op_type: The boolean operation (union, intersection, subtraction). operand_indices: Indices of bodies being combined.

Methods

__init__(op_type: BooleanOpType, operand_indices: list[int] = list()) -> None

Source: geotoken/geotoken/tokenizer/token_types.py

Source: geotoken/geotoken/tokenizer/token_types.py:38

Boolean/CSG operation types.

Source: geotoken/geotoken/tokenizer/vocabulary.py:35

Discrete token vocabulary for CAD command sequences.

Maps every possible (command_type, parameter_index, quantized_value) triple to a unique integer token ID. With 6 command types, 16 parameters, and 256 quantization levels this yields 6 × 16 × 256 = 24,576 tokens plus specials.

Args: num_command_types: Number of command type categories. num_parameters: Maximum number of parameters per command. num_levels: Number of quantization levels per parameter.
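One way to give every triple a unique ID is a mixed-radix layout, sketched below. The number of special tokens (`NUM_SPECIAL`) and the ordering of the three fields are assumptions made for illustration; the actual ID layout in the library is not documented on this page.

```python
NUM_COMMAND_TYPES = 6
NUM_PARAMETERS = 16
NUM_LEVELS = 256
NUM_SPECIAL = 3  # assumption: a few special IDs (e.g. PAD/BOS/EOS) come first


def triple_to_id(command_type, parameter_index, value):
    """Flatten a (command_type, parameter_index, value) triple to one ID."""
    flat = (command_type * NUM_PARAMETERS + parameter_index) * NUM_LEVELS + value
    return NUM_SPECIAL + flat


def id_to_triple(token_id):
    """Invert triple_to_id."""
    rest, value = divmod(token_id - NUM_SPECIAL, NUM_LEVELS)
    command_type, parameter_index = divmod(rest, NUM_PARAMETERS)
    return command_type, parameter_index, value


# Size of the triple space before special tokens are added.
triple_count = NUM_COMMAND_TYPES * NUM_PARAMETERS * NUM_LEVELS
```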

Methods

__init__(num_command_types: int = 6, num_parameters: int = 16, num_levels: int = 256, num_constraint_types: int = 9, max_constraint_index: int = 60, num_graph_structure_types: int = 6, node_feature_dim: int = 48, edge_feature_dim: int = 16, graph_feature_levels: int = 256) -> None

Source: geotoken/geotoken/tokenizer/vocabulary.py:48

encode(command_tokens: list[CommandToken]) -> list[int]

Source: geotoken/geotoken/tokenizer/vocabulary.py:169

Convert command token sequence to integer ID sequence.

Each CommandToken produces 1 (command type) + N (active parameter) token IDs. Sequence is wrapped with BOS and EOS.

Args: command_tokens: List of CommandToken objects.

Returns: List of integer token IDs.

Raises: TypeError: If command_tokens is not a list or contains non-CommandToken items.

decode(token_ids: list[int]) -> list[CommandToken]

Source: geotoken/geotoken/tokenizer/vocabulary.py:215

Convert integer ID sequence back to CommandToken list.

Args: token_ids: List of integer token IDs.

Returns: List of CommandToken objects.

encode_flat(command_tokens: list[CommandToken]) -> list[int]

Source: geotoken/geotoken/tokenizer/vocabulary.py:285

Flat encoding: fixed-width per command.

Each command emits exactly 1 (command type ID) + num_active_params parameter IDs, padded or truncated to a fixed width determined by the command type’s canonical mask.

Args: command_tokens: List of CommandToken objects.

Returns: List of integer token IDs (BOS + fixed-width commands + EOS).

encode_constraints(constraint_tokens: list[ConstraintToken]) -> list[int]

Source: geotoken/geotoken/tokenizer/vocabulary.py:333

Encode constraint tokens to integer IDs.

Each constraint is encoded as a single integer: constraint_offset + type_idx * max_idx^2 + source * max_idx + target

Args: constraint_tokens: List of ConstraintToken objects.

Returns: List of integer token IDs for constraints.
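The packing formula above and its inverse can be sketched as follows. `MAX_IDX` uses the `max_constraint_index` default of 60 from the constructor signature; `CONSTRAINT_OFFSET` is a made-up value for illustration, and the helper names are hypothetical. The scheme is only reversible when both indices are smaller than `MAX_IDX`.

```python
MAX_IDX = 60              # default max_constraint_index
CONSTRAINT_OFFSET = 1000  # hypothetical offset, for illustration only


def encode_constraint(type_idx, source, target):
    """Pack (type, source, target) into a single integer ID."""
    if not (0 <= source < MAX_IDX and 0 <= target < MAX_IDX):
        raise ValueError("indices must be in [0, MAX_IDX)")
    return CONSTRAINT_OFFSET + type_idx * MAX_IDX**2 + source * MAX_IDX + target


def decode_constraint(token_id):
    """Unpack an ID produced by encode_constraint."""
    type_idx, rest = divmod(token_id - CONSTRAINT_OFFSET, MAX_IDX**2)
    source, target = divmod(rest, MAX_IDX)
    return type_idx, source, target


token_id = encode_constraint(2, 5, 7)  # 1000 + 2*3600 + 5*60 + 7
```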

decode_constraints(token_ids: list[int]) -> list[ConstraintToken]

Source: geotoken/geotoken/tokenizer/vocabulary.py:367

Decode integer IDs back to constraint tokens.

Args: token_ids: List of constraint token IDs.

Returns: List of ConstraintToken objects.

encode_graph_structure(structure_tokens: list[GraphStructureToken]) -> list[int]

Source: geotoken/geotoken/tokenizer/vocabulary.py:411

Encode graph structure tokens to integer IDs.

Args: structure_tokens: List of GraphStructureToken objects.

Returns: List of integer token IDs.

encode_graph_node_features(node_tokens: list[GraphNodeToken]) -> list[int]

Source: geotoken/geotoken/tokenizer/vocabulary.py:428

Encode quantized node feature tokens to integer IDs.

Each feature dimension of each node becomes a separate token: graph_node_offset + dim_idx * levels + quantized_value

Args: node_tokens: List of GraphNodeToken objects.

Returns: List of integer token IDs (one per feature per node).
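The per-dimension formula can be sketched for a single node as below. `GRAPH_NODE_OFFSET` is a hypothetical value and the function names are illustrative; only the arithmetic follows the formula stated above.

```python
LEVELS = 256                # graph_feature_levels default
GRAPH_NODE_OFFSET = 50_000  # hypothetical offset, for illustration only


def encode_node_features(feature_tokens):
    """Emit one token ID per feature dimension of a node."""
    return [
        GRAPH_NODE_OFFSET + dim_idx * LEVELS + value
        for dim_idx, value in enumerate(feature_tokens)
    ]


def decode_node_feature(token_id):
    """Recover (dim_idx, quantized_value) from a node feature token ID."""
    return divmod(token_id - GRAPH_NODE_OFFSET, LEVELS)


ids = encode_node_features([0, 255, 17])
```

Because the dimension index is part of the ID, the same quantized value in two different dimensions maps to two different tokens.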

encode_graph_edge_features(edge_tokens: list[GraphEdgeToken]) -> list[int]

Source: geotoken/geotoken/tokenizer/vocabulary.py:456

Encode quantized edge feature tokens to integer IDs.

Each feature dimension of each edge becomes a separate token: graph_edge_offset + dim_idx * levels + quantized_value

Args: edge_tokens: List of GraphEdgeToken objects.

Returns: List of integer token IDs (one per feature per edge).

encode_full_sequence(token_sequence) -> list[int]

Source: geotoken/geotoken/tokenizer/vocabulary.py:484

Encode a complete TokenSequence to integer IDs.

Combines command tokens, constraint tokens, and graph tokens into a single flat integer sequence with separators.

Args: token_sequence: A TokenSequence with any combination of command, constraint, and graph tokens.

Returns: List of integer token IDs.

save(path: str | Path) -> None

Source: geotoken/geotoken/tokenizer/vocabulary.py:560

Save vocabulary configuration to JSON.

encode_to_ids(command_tokens: list[CommandToken], seq_len: int = 60, pad_id: Optional[int] = None) -> list[int]

Source: geotoken/geotoken/tokenizer/vocabulary.py:585

Encode command tokens to a fixed-length integer ID sequence.

This bridges the variable-length output of :meth:encode to the fixed-length sequences that transformer models consume. The sequence is padded (with pad_id) or truncated to exactly seq_len integer token IDs.

Args: command_tokens: List of CommandToken objects. seq_len: Target sequence length (default 60 per DeepCAD). pad_id: Padding token ID. Defaults to :attr:pad_token_id.

Returns: List of exactly seq_len integer token IDs.
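The pad-or-truncate step can be sketched independently of the vocabulary. `fit_to_length` is a hypothetical helper, and `pad_id=0` is an assumed placeholder rather than the library's actual padding ID.

```python
def fit_to_length(token_ids, seq_len=60, pad_id=0):
    """Return exactly seq_len IDs: truncate long inputs, right-pad short ones."""
    clipped = list(token_ids[:seq_len])
    return clipped + [pad_id] * (seq_len - len(clipped))


short = fit_to_length([1, 7, 9, 2])
long = fit_to_length(list(range(100)))
```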

encode_to_tensor(command_tokens: list[CommandToken], seq_len: int = 60, pad_id: Optional[int] = None)

Source: geotoken/geotoken/tokenizer/vocabulary.py:619

Encode command tokens to a fixed-length torch.Tensor.

Wrapper around :meth:encode_to_ids that returns an actual torch.Tensor instead of a plain list.

Args: command_tokens: List of CommandToken objects. seq_len: Target sequence length (default 60 per DeepCAD). pad_id: Padding token ID. Defaults to :attr:pad_token_id.

Returns: torch.Tensor of shape [seq_len] with dtype torch.long.

load(path: str | Path) -> CADVocabulary

Source: geotoken/geotoken/tokenizer/vocabulary.py:649

Load vocabulary configuration from JSON.

Source: geotoken/geotoken/tokenizer/command_tokenizer.py:28

Tokenize parametric CAD construction history into command sequences.

Follows the DeepCAD pipeline: parse → normalize sketches → normalize 3D → quantize parameters → pad/truncate to fixed length.

Args: quantization_config: Controls precision tiers and normalization. sequence_config: Controls sequence length and padding. command_config: Controls command tokenization specifics.

Methods

__init__(quantization_config: Optional[QuantizationConfig] = None, sequence_config: Optional[SequenceConfig] = None, command_config: Optional[CommandTokenizationConfig] = None) -> None

Source: geotoken/geotoken/tokenizer/command_tokenizer.py:40

tokenize(construction_history: dict | list, constraints: list[dict[str, Any]] | None = None) -> TokenSequence

Source: geotoken/geotoken/tokenizer/command_tokenizer.py:88

Main entry point: convert construction history to TokenSequence.

Args: construction_history: DeepCAD-format JSON (list of sketch + extrude operations) or a dict with a "sequence" key. constraints: Optional list of constraint dicts from cadling’s SketchGeometryExtractor or Sketch2DItem.to_geotoken_constraints(). Each dict should have “type”, “source_index”, “target_index”, and optionally “value”. Only processed when include_constraints is True in config.

Returns: TokenSequence with command_tokens, coordinate_tokens, and optionally constraint_tokens.

Raises: TypeError: If construction_history is not a dict or list.

parse_construction_history(commands: list[dict[str, Any]]) -> list[dict[str, Any]]

Source: geotoken/geotoken/tokenizer/command_tokenizer.py:181

Parse raw DeepCAD JSON commands into internal representation.

Each command becomes a dict with type (CommandType) and params (list of floats).

Supports three source formats via source_format config:

  • "deepcad": Params are compact and match masks directly.
  • "cadling": Auto-strips z-interleaving and trailing padding from older cadling output to produce compact params.
  • "auto" (default): Detects format by checking if params have z-interleaving patterns.

normalize_sketches(commands: list[dict[str, Any]]) -> list[dict[str, Any]]

Source: geotoken/geotoken/tokenizer/command_tokenizer.py:314

Normalize each sketch group to the origin and a 2x2 square.

Translates sketch to centroid, scales to fit normalization range, and canonicalizes loop ordering (CCW from bottom-left vertex) when configured.

normalize_3d(commands: list[dict[str, Any]]) -> list[dict[str, Any]]

Source: geotoken/geotoken/tokenizer/command_tokenizer.py:378

Scale the full solid to a bounding cube using normalization range.

Collects all 3D parameters (extrusion heights, offsets) and SOL sketch plane offsets, then scales to fit within the configured normalization range.

quantize_parameters(commands: list[dict[str, Any]]) -> list[CommandToken]

Source: geotoken/geotoken/tokenizer/command_tokenizer.py:431

Map continuous parameters to discrete quantization levels.

Uses classification-not-regression: each parameter is mapped to one of param_levels discrete bins within the normalization range.
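A minimal sketch of this binning, assuming the normalization range is symmetric around zero (a range of 2.0 meaning [-1, 1]) and that out-of-range values are clamped. Both assumptions and the helper names are illustrative; the library's exact bin boundaries may differ.

```python
def quantize_param(value, levels=256, norm_range=2.0):
    """Map a float in [-norm_range/2, norm_range/2] to a bin in [0, levels-1]."""
    half = norm_range / 2.0
    clamped = max(-half, min(half, value))
    return round((clamped + half) / norm_range * (levels - 1))


def dequantize_param(level, levels=256, norm_range=2.0):
    """Map a bin index back to the float at that bin's position."""
    return level / (levels - 1) * norm_range - norm_range / 2.0


low = quantize_param(-1.0)
high = quantize_param(1.0)
clipped = quantize_param(5.0)
roundtrip_error = abs(dequantize_param(quantize_param(0.3)) - 0.3)
```

Treating each parameter as a class label over `levels` bins lets a transformer predict it with a softmax, at the cost of a reconstruction error bounded by half a bin width.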

pad_or_truncate(tokens: list[CommandToken]) -> list[CommandToken]

Source: geotoken/geotoken/tokenizer/command_tokenizer.py:468

Pad or truncate to fixed sequence length.

Shorter sequences are padded with EOS tokens. Longer sequences are truncated, preferring to keep complete sketch-extrude pairs.

dequantize_parameters(tokens: list[CommandToken]) -> list[dict[str, Any]]

Source: geotoken/geotoken/tokenizer/command_tokenizer.py:515

Reverse quantization: convert quantized tokens back to floats.

Args: tokens: List of quantized CommandTokens.

Returns: List of dicts with type and continuous params.

analyze_roundtrip_quality(construction_history: dict | list) -> dict[str, float]

Source: geotoken/geotoken/tokenizer/command_tokenizer.py:542

Analyze quality of quantize-dequantize roundtrip.

Measures parameter reconstruction accuracy to assess tokenization quality for a given construction history.

Args: construction_history: DeepCAD-format JSON commands.

Returns: Dict with quality metrics:

  • param_mse: Mean squared error of parameters
  • max_error: Maximum parameter error
  • command_preservation_rate: Fraction of commands preserved
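The two parameter metrics can be computed from paired original and reconstructed values as sketched below. `roundtrip_metrics` is a hypothetical helper operating on flat lists of floats; it does not cover command_preservation_rate, and it is not the library's implementation.

```python
def roundtrip_metrics(original, reconstructed):
    """Compute mean squared error and maximum absolute error."""
    errors = [abs(a - b) for a, b in zip(original, reconstructed)]
    return {
        "param_mse": sum(e * e for e in errors) / len(errors),
        "max_error": max(errors),
    }


metrics = roundtrip_metrics([0.0, 0.5, -1.0], [0.0, 0.25, -1.0])
```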

disentangle(command_tokens: list[CommandToken]) -> dict[str, list[CommandToken]]

Source: geotoken/geotoken/tokenizer/command_tokenizer.py:657

Split a command sequence into SkexGen-style disentangled streams.

SkexGen (Xu et al., 2022) decomposes CAD models into three independently learnable factors:

  1. topology: which commands appear and their connectivity (command types + sketch/extrude grouping structure). For the tokenizer this means each CommandToken is reduced to its type and mask only; parameter values are zeroed.
  2. geometry: the 2-D sketch parameters (LINE endpoints, ARC centers, CIRCLE radii). Only sketch-level command tokens are included, with extrusion parameters zeroed.
  3. extrusion: the 3-D manufacturing parameters (extrude extent, direction, booleans). Only EXTRUDE tokens are included, with sketch parameters zeroed.

Each stream is a list of CommandToken objects so they can be independently encoded via :class:CADVocabulary.

Args: command_tokens: Flat command token sequence (output of :meth:tokenize).

Returns: Dictionary with keys "topology", "geometry", and "extrusion", each containing a list of CommandToken objects representing that stream.
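The three-way split can be sketched with a simplified token type. `Tok`, `SKETCH_TYPES`, and `disentangle_sketch` are hypothetical stand-ins; in particular, treating only LINE, ARC, and CIRCLE as sketch-level commands is an assumption based on the description above.

```python
from dataclasses import dataclass, field

SKETCH_TYPES = {"LINE", "ARC", "CIRCLE"}  # assumed sketch-level commands


@dataclass
class Tok:
    """Simplified stand-in for a command token."""

    command_type: str
    parameters: list = field(default_factory=list)


def zeroed(tok):
    # Keep the command type but drop all parameter information.
    return Tok(tok.command_type, [0] * len(tok.parameters))


def disentangle_sketch(tokens):
    """Split one command sequence into topology/geometry/extrusion streams."""
    return {
        "topology": [zeroed(t) for t in tokens],
        "geometry": [t for t in tokens if t.command_type in SKETCH_TYPES],
        "extrusion": [t for t in tokens if t.command_type == "EXTRUDE"],
    }


sequence = [
    Tok("SOL", [0, 0]),
    Tok("LINE", [10, 10, 200, 10]),
    Tok("LINE", [200, 10, 200, 90]),
    Tok("CIRCLE", [100, 50, 20]),
    Tok("EXTRUDE", [128, 64]),
    Tok("EOS"),
]
streams = disentangle_sketch(sequence)
```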

parse_constraints(constraints: list[dict[str, Any]]) -> list[ConstraintToken]

Source: geotoken/geotoken/tokenizer/command_tokenizer.py:745

Parse constraint dicts into ConstraintToken objects.

Accepts constraint dicts from cadling’s SketchGeometryExtractor (via Sketch2DItem.to_geotoken_constraints()) or any dict with “type”, “source_index”/“entity_a”, and “target_index”/“entity_b”.

Unknown constraint types are silently skipped.

Args:

  • constraints: List of constraint dicts. Each dict should have:
    • type: String matching a ConstraintType name (e.g., "PARALLEL", "TANGENT", "COINCIDENT")
    • source_index or entity_a: Index of first entity
    • target_index or entity_b: Index of second entity
    • value (optional): Quantized constraint value

Returns: List of ConstraintToken instances.

Example:

constraints = [
    {"type": "PARALLEL", "source_index": 0, "target_index": 2},
    {"type": "TANGENT", "entity_a": 1, "entity_b": 3, "confidence": 0.95},
]
tokens = CommandSequenceTokenizer.parse_constraints(constraints)
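The key-aliasing and skip-unknown behavior described above can be sketched without the library. `KNOWN_TYPES` lists only the three constraint names mentioned on this page, and `parse_constraints_sketch` returns plain tuples rather than ConstraintToken objects; both are simplifications for illustration.

```python
KNOWN_TYPES = {"PARALLEL", "TANGENT", "COINCIDENT"}  # subset named in the docs


def parse_constraints_sketch(constraints):
    """Normalize constraint dicts to (type, source, target, value) tuples."""
    parsed = []
    for c in constraints:
        ctype = c.get("type")
        if ctype not in KNOWN_TYPES:
            continue  # unknown constraint types are silently skipped
        # Accept either naming scheme for the two entity indices.
        source = c.get("source_index", c.get("entity_a"))
        target = c.get("target_index", c.get("entity_b"))
        parsed.append((ctype, source, target, c.get("value")))
    return parsed


parsed = parse_constraints_sketch([
    {"type": "PARALLEL", "source_index": 0, "target_index": 2},
    {"type": "TANGENT", "entity_a": 1, "entity_b": 3, "confidence": 0.95},
    {"type": "WOBBLY", "source_index": 4, "target_index": 5},
])
```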

Source: geotoken/geotoken/tokenizer/token_types.py:78

A single CAD operation in a sketch-and-extrude sequence.

Represents one command in the DeepCAD-style construction history. Each command has a type, up to 16 quantized integer parameters, and a mask indicating which parameters are active.

Args: command_type: The CAD operation type (SOL, LINE, ARC, etc.). parameters: Up to 16 quantized integer parameter values. parameter_mask: Boolean mask for active parameters per command type.

Methods

active_parameters() -> list[int]

Source: geotoken/geotoken/tokenizer/token_types.py:95

Return only the active parameters for this command.
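Filtering by mask amounts to a one-line comprehension, sketched here as a free function on plain lists. The LINE mask of four leading active slots is an assumption based on the parameter semantics listed for LINE (x1, y1, x2, y2); the real slot positions may differ.

```python
def active_parameters(parameters, parameter_mask):
    """Keep only the parameters whose mask entry is True."""
    return [p for p, m in zip(parameters, parameter_mask) if m]


# Assumed layout: a LINE uses the first 4 of 16 parameter slots.
line_mask = [True] * 4 + [False] * 12
line_params = [10, 20, 30, 40] + [0] * 12
active = active_parameters(line_params, line_mask)
```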

get_parameter_mask(command_type: CommandType) -> list[bool]

Source: geotoken/geotoken/tokenizer/token_types.py:99

Get the canonical parameter mask for a command type.

Parameter semantics per command type:

  • SOL: 2 active parameters
    • params[0]: sketch plane z-offset (height from origin)
    • params[1]: sketch plane rotation/normal orientation
  • LINE: xy endpoints (4 params: x1, y1, x2, y2)
  • ARC: start/mid/end points (6 params: x1, y1, x2, y2, x3, y3)
  • CIRCLE: center + radius (3 params: cx, cy, r)
  • EXTRUDE: extent/scale/boolean params (up to 8 params)
  • EOS: no active parameters

__init__(command_type: CommandType, parameters: list[int] = (lambda: [0] * 16)(), parameter_mask: list[bool] = (lambda: [False] * 16)()) -> None

Source: geotoken/geotoken/tokenizer/token_types.py

Source: geotoken/geotoken/config.py:89

Configuration for CAD command sequence tokenization.

Controls how parametric CAD construction history is converted to fixed-length command token sequences for transformer consumption.

Args:

  • max_sequence_length: Target command count (DeepCAD uses 60).
  • coordinate_quantization: Precision for coordinate values.
  • parameter_quantization: Precision for command parameters.
  • normalization_range: Bounding cube size (2.0 = 2x2x2 cube).
  • canonicalize_loops: Reorder sketch loops to canonical form.
  • include_constraints: Include SketchGraphs-style constraint tokens.
  • pad_to_max_length: Pad shorter sequences to max_sequence_length.
  • source_format: Source data format. "deepcad" expects compact params matching masks directly. "cadling" auto-strips z-interleaved and padded params to compact form for backward compatibility with older cadling output. The default, "auto", detects the format heuristically.

Methods

__init__(max_sequence_length: int = 60, coordinate_quantization: PrecisionTier = PrecisionTier.STANDARD, parameter_quantization: PrecisionTier = PrecisionTier.STANDARD, normalization_range: float = 2.0, canonicalize_loops: bool = True, include_constraints: bool = False, pad_to_max_length: bool = True, source_format: Literal['deepcad', 'cadling', 'auto'] = 'auto') -> None

Source: geotoken/geotoken/config.py

Source: geotoken/geotoken/tokenizer/token_types.py:15

CAD command types following DeepCAD’s 6-command vocabulary.

Source: geotoken/geotoken/tokenizer/token_types.py:134

A geometric constraint between sketch primitives.

Maps to SketchGraphs’ constraint edges — encodes designer-imposed relationships like parallelism, tangency, and coincidence.

Args: constraint_type: Type of geometric constraint. source_index: Index of first primitive in the command sequence. target_index: Index of second primitive. value: Optional quantized constraint value (for distance/angle).

Methods

__init__(constraint_type: ConstraintType, source_index: int, target_index: int, value: Optional[int] = None) -> None

Source: geotoken/geotoken/tokenizer/token_types.py

Source: geotoken/geotoken/tokenizer/token_types.py:25

Geometric constraint types following SketchGraphs.

Source: geotoken/geotoken/tokenizer/token_types.py:49

A quantized 3D coordinate token.

Methods

to_tuple() -> tuple[int, int, int]

Source: geotoken/geotoken/tokenizer/token_types.py:58

__init__(x: int, y: int, z: int, bits: int = 8, vertex_index: int = -1) -> None

Source: geotoken/geotoken/tokenizer/token_types.py

Source: geotoken/geotoken/quantization/uv_grid_quantizer.py:125

Quantized tokens for [10, 6] edge UV-grid.

Represents an edge sampled at regular parameter intervals with xyz points and tangent vectors.

Channels in the input [num_samples, 6] grid:

  • 0-2: XYZ edge points (quantized)
  • 3-5: Tangent vectors (quantized separately)

Attributes:

  • edge_index: Index of the edge in the topology graph.
  • num_samples: Number of samples along the edge.
  • quantized_xyz: Quantized XYZ values, shape (N, 3) int32.
  • quantized_tangents: Quantized tangent values, shape (N, 3) int32.
  • params_xyz: Normalization parameters for XYZ quantization.
  • params_tangents: Normalization parameters for tangent quantization.
  • bits: Bit-width per coordinate dimension.

Methods

__init__(edge_index: int = -1, num_samples: int = 10, quantized_xyz: np.ndarray = (lambda: np.empty((0, 3), dtype=(np.int32)))(), quantized_tangents: np.ndarray = (lambda: np.empty((0, 3), dtype=(np.int32)))(), params_xyz: FeatureQuantizationParams | None = None, params_tangents: FeatureQuantizationParams | None = None, bits: int = 8, is_approximated: bool = False) -> None

Source: geotoken/geotoken/quantization/uv_grid_quantizer.py

Source: geotoken/geotoken/quantization/uv_grid_quantizer.py:85

Quantized tokens for [10, 10, 7] face UV-grid.

Represents a full face UV-grid with xyz points, normals, and trim mask. The grid is sampled on a regular (U, V) parameter grid.

Channels in the input [num_u, num_v, 7] grid:

  • 0-2: XYZ surface points (quantized)
  • 3-5: Surface normals (quantized separately)
  • 6: Trim mask (preserved as boolean, NOT quantized)

Attributes:

  • face_index: Index of the face in the topology graph.
  • grid_resolution: (num_u, num_v) grid dimensions.
  • quantized_xyz: Quantized XYZ values, shape (U*V, 3) int32.
  • quantized_normals: Quantized normal values, shape (U*V, 3) int32.
  • trim_mask: Boolean trim mask, shape (U*V,) bool.
  • params_xyz: Normalization parameters for XYZ quantization.
  • params_normals: Normalization parameters for normals quantization.
  • bits: Bit-width per coordinate dimension.

Methods

__init__(face_index: Optional[int] = None, grid_resolution: tuple[int, int] = (10, 10), quantized_xyz: np.ndarray = (lambda: np.empty((0, 3), dtype=(np.int32)))(), quantized_normals: np.ndarray = (lambda: np.empty((0, 3), dtype=(np.int32)))(), trim_mask: np.ndarray = (lambda: np.empty((0,), dtype=bool))(), params_xyz: FeatureQuantizationParams | None = None, params_normals: FeatureQuantizationParams | None = None, bits: int = 8, is_approximated: bool = False) -> None

Source: geotoken/geotoken/quantization/uv_grid_quantizer.py

Source: geotoken/geotoken/quantization/feature_quantizer.py:21

Fitted normalization parameters for feature quantization.

Stores per-dimension min/max values learned from training data, used to map feature values to the [0, levels-1] quantization range.

Args: dim: Feature dimensionality. min_vals: Per-dimension minimum values (shape: (D,)). max_vals: Per-dimension maximum values (shape: (D,)). scale: Pre-computed per-dimension scale factor.

Methods

__init__(dim: int, min_vals: np.ndarray, max_vals: np.ndarray, scale: np.ndarray) -> None

Source: geotoken/geotoken/quantization/feature_quantizer.py

Source: geotoken/geotoken/quantization/feature_quantizer.py:40

Quantize dense feature vectors to discrete token sequences.

Maps N-dimensional float feature vectors to integer values in [0, 2^bits - 1] using per-dimension linear quantization. Supports fit/quantize workflow for learning normalization from data.

Args:

  • bits: Quantization bit width per dimension. Default 8 → 256 levels.
  • strategy: Normalization strategy.
    • "per_dimension": Normalize each feature dimension independently (default).
    • "global": Use a single min/max across all dimensions.

Example:

quantizer = FeatureVectorQuantizer(bits=8)
params = quantizer.fit(training_features)  # (N, 48) float32
quantized = quantizer.quantize(features, params)  # (N, 48) int
reconstructed = quantizer.dequantize(quantized, params)  # (N, 48) float32

Methods

__init__(bits: int = 8, strategy: str = 'per_dimension')

Source: geotoken/geotoken/quantization/feature_quantizer.py:60

fit(features: np.ndarray) -> FeatureQuantizationParams

Source: geotoken/geotoken/quantization/feature_quantizer.py:66

Compute normalization parameters from feature data.

Learns per-dimension (or global) min/max values that will be used to map features to the quantization range.

Note: This method caches the returned params in self._params for convenience (so that subsequent quantize() calls can omit the params argument). However, because the cached state is overwritten on every call, callers that need thread-safety or interleaved fits should use the returned FeatureQuantizationParams object directly instead of relying on the cached instance state.

Args: features: Feature array of shape (N, D) where N is number of samples and D is feature dimensionality.

Returns: FeatureQuantizationParams with learned normalization values.

Raises: ValueError: If features array is not 2D.

quantize(features: np.ndarray, params: Optional[FeatureQuantizationParams] = None) -> np.ndarray

Source: geotoken/geotoken/quantization/feature_quantizer.py:133

Quantize float features to integer tokens.

Maps each feature dimension from [min_d, max_d] to [0, levels-1]. Values outside the fitted range are clamped.

Args: features: Feature array of shape (N, D) float32. params: Normalization params (uses fitted params if None).

Returns: Quantized array of shape (N, D) int64 with values in [0, levels-1].

Raises: ValueError: If params not provided and fit() hasn’t been called. ValueError: If feature dimensionality doesn’t match params.

dequantize(quantized: np.ndarray, params: Optional[FeatureQuantizationParams] = None) -> np.ndarray

Source: geotoken/geotoken/quantization/feature_quantizer.py:176

Reconstruct approximate float features from quantized tokens.

Inverse of quantize(): maps integer tokens back to approximate continuous feature values.

Args: quantized: Quantized array of shape (N, D) int. params: Normalization params (uses fitted params if None).

Returns: Reconstructed feature array of shape (N, D) float32.
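The fit/quantize/dequantize workflow for the "per_dimension" strategy can be sketched in plain Python on nested lists. This is an illustration under stated assumptions: the function names mirror the documented methods but are free functions, the params are a simple (mins, maxs) tuple, and the guard for constant dimensions is an assumed detail.

```python
def fit(features):
    """Learn per-dimension min and max from training rows."""
    dims = range(len(features[0]))
    mins = [min(row[d] for row in features) for d in dims]
    maxs = [max(row[d] for row in features) for d in dims]
    return mins, maxs


def quantize(features, params, bits=8):
    """Map each value to [0, 2^bits - 1], clamping values outside the fit range."""
    mins, maxs = params
    top = (1 << bits) - 1
    out = []
    for row in features:
        q_row = []
        for x, lo, hi in zip(row, mins, maxs):
            span = (hi - lo) or 1.0  # assumed guard for constant dimensions
            t = min(1.0, max(0.0, (x - lo) / span))
            q_row.append(round(t * top))
        out.append(q_row)
    return out


def dequantize(quantized, params, bits=8):
    """Map integer levels back to approximate float values."""
    mins, maxs = params
    top = (1 << bits) - 1
    return [
        [lo + q / top * (hi - lo) for q, lo, hi in zip(row, mins, maxs)]
        for row in quantized
    ]


params = fit([[0.0, 10.0], [1.0, 20.0], [0.5, 15.0]])
tokens = quantize([[0.0, 20.0], [2.0, 5.0]], params)
approx = dequantize(tokens, params)
```

The second input row lies outside the fitted range in both dimensions, so it is clamped to the extreme levels and reconstructs to the fitted max and min rather than to its original values.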

quantize_single(feature_vector: np.ndarray, params: Optional[FeatureQuantizationParams] = None) -> list[int]

Source: geotoken/geotoken/quantization/feature_quantizer.py:205

Quantize a single feature vector and return as list of ints.

Convenience method for quantizing individual vectors.

Args: feature_vector: 1D feature array of shape (D,). params: Normalization params.

Returns: List of quantized integer values.

Source: geotoken/geotoken/tokenizer/geo_tokenizer.py:23

Main geometric tokenizer.

Converts 3D mesh/point cloud data into discrete token sequences using adaptive or uniform quantization.

Example:

tokenizer = GeoTokenizer()
tokens = tokenizer.tokenize(vertices, faces)
reconstructed = tokenizer.detokenize(tokens)

Methods

__init__(config: Optional[QuantizationConfig] = None)

Source: geotoken/geotoken/tokenizer/geo_tokenizer.py:35

Initialize tokenizer.

Args: config: Quantization configuration. Defaults to STANDARD tier with adaptive quantization.

tokenize(vertices: np.ndarray, faces: Optional[np.ndarray] = None) -> TokenSequence

Source: geotoken/geotoken/tokenizer/geo_tokenizer.py:47

Tokenize 3D geometry into a token sequence.

Args: vertices: (N, 3) vertex positions faces: (F, 3) face indices (optional, improves adaptive quality)

Returns: TokenSequence with coordinate and geometry tokens

Raises: TypeError: If vertices is not a numpy array. ValueError: If vertices is not a 2D array with 3 columns.

detokenize(tokens: TokenSequence) -> np.ndarray

Source: geotoken/geotoken/tokenizer/geo_tokenizer.py:82

Reconstruct 3D coordinates from token sequence.

Args: tokens: TokenSequence from tokenize()

Returns: (N, 3) reconstructed vertex positions

analyze_impact(vertices: np.ndarray, faces: Optional[np.ndarray] = None) -> ImpactReport

Source: geotoken/geotoken/tokenizer/geo_tokenizer.py:131

Analyze quantization impact on geometry quality.

Args: vertices: Original vertex positions faces: Face indices

Returns: ImpactReport with quality metrics

Source: geotoken/geotoken/tokenizer/token_types.py:66

A token representing geometric structure (face, edge, etc.).

Methods

__init__(token_type: str, indices: list[int] = list(), properties: dict = dict()) -> None

Source: geotoken/geotoken/tokenizer/token_types.py

Source: geotoken/geotoken/tokenizer/token_types.py:194

Quantized edge feature token.

Represents a directed edge in the B-Rep topology graph with its feature vector quantized to discrete token values.

Args: source_index: Source node index. target_index: Target node index. feature_tokens: Quantized feature values (one per dimension). bits: Quantization bit width per feature dimension.

Methods

__init__(source_index: int, target_index: int, feature_tokens: list[int] = list(), bits: int = 8) -> None

Source: geotoken/geotoken/tokenizer/token_types.py

Source: geotoken/geotoken/tokenizer/token_types.py:169

Quantized node (face/edge/vertex) feature token.

Represents a single node in a B-Rep topology graph with its feature vector quantized to discrete token values. Each dimension of the feature vector becomes a separate quantized integer.

Args: node_index: Index of this node in the graph. feature_tokens: Quantized feature values (one per dimension). node_type: Node entity type (“face”, “edge”, “vertex”). bits: Quantization bit width per feature dimension.

Methods

__init__(node_index: int, feature_tokens: list[int] = list(), node_type: str = 'face', bits: int = 8) -> None

Source: geotoken/geotoken/tokenizer/token_types.py

Source: geotoken/geotoken/tokenizer/token_types.py:218

Graph structure marker token.

Used to delimit graph, node, and edge boundaries in the serialized flat token sequence.

Args:

  • token_type: Structure marker type. One of:
    • "graph_start": Start of a graph (value = num_nodes)
    • "graph_end": End of a graph
    • "node_start": Start of a node (value = node_index)
    • "node_end": End of a node
    • "adjacency": Adjacency list entry (value = neighbor_index)
    • "edge": Edge marker (value encodes src << 16 | tgt)
  • value: Associated integer value.
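The "edge" marker packs two indices into one integer. A minimal sketch of that packing and its inverse, assuming both indices fit in 16 bits (the range check and helper names are illustrative additions):

```python
def pack_edge(src, tgt):
    """Pack source and target node indices as src << 16 | tgt."""
    if not (0 <= src < (1 << 16) and 0 <= tgt < (1 << 16)):
        raise ValueError("indices must fit in 16 bits")
    return (src << 16) | tgt


def unpack_edge(value):
    """Recover (src, tgt) from a packed edge value."""
    return value >> 16, value & 0xFFFF


packed = pack_edge(3, 7)
```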

Methods

__init__(token_type: Literal['graph_start', 'graph_end', 'node_start', 'node_end', 'adjacency', 'edge'], value: int = 0) -> None

Source: geotoken/geotoken/tokenizer/token_types.py

Source: geotoken/geotoken/config.py:119

Configuration for B-Rep topology graph tokenization.

Controls how enriched B-Rep topology graphs (nodes with feature vectors, edges with feature vectors, adjacency structure) are converted to flat token sequences for transformer consumption.

Args:

  • node_bits: Quantization bits per dimension for node features.
  • edge_bits: Quantization bits per dimension for edge features.
  • max_nodes: Maximum node count (truncate or pad).
  • max_edges: Maximum edge count (truncate or pad).
  • include_uv_grids: Include UV-grid summary tokens per face/edge.
  • uv_grid_summary_dim: Number of summary dimensions per UV grid.
  • node_feature_dim: Expected node feature dimensionality (48 for cadling).
  • edge_feature_dim: Expected edge feature dimensionality. Set to 12 for BrepGen-aligned configurations, or 16 for cadling’s default enhanced edge features (which include 4 extra topology flags).
  • adjacency_encoding: How adjacency is serialized. "explicit" lists neighbor indices per node. "implicit" relies on sorted edge list.
  • pad_to_max: Pad shorter graphs to max_nodes/max_edges.

Methods

__init__(node_bits: int = 8, edge_bits: int = 8, max_nodes: int = 256, max_edges: int = 1024, include_uv_grids: bool = False, uv_grid_summary_dim: int = 6, node_feature_dim: int = 48, edge_feature_dim: int = 16, adjacency_encoding: Literal['implicit', 'explicit'] = 'explicit', pad_to_max: bool = True) -> None

Source: geotoken/geotoken/config.py

Source: geotoken/geotoken/tokenizer/graph_tokenizer.py:41

Tokenize B-Rep topology graphs with node/edge features.

Converts a topology graph with dense feature vectors on nodes and edges into a TokenSequence containing quantized graph tokens. The resulting flat token sequence can be encoded into integer IDs via CADVocabulary for transformer input.

The tokenizer includes FeatureVectorQuantizer instances for both node and edge features. These quantizers can be pre-fitted on training data, or will auto-fit on the first call to tokenize().

Args: config: Graph tokenization configuration.

Example:

config = GraphTokenizationConfig(node_bits=8, edge_bits=8)
tokenizer = GraphTokenizer(config)

# From cadling's TopologyGraph:
node_feats = topology_graph.to_numpy_node_features() # (N, 48)
edge_idx = topology_graph.to_edge_index() # (2, M)
edge_feats = topology_graph.to_numpy_edge_features() # (M, 16)
token_seq = tokenizer.tokenize(node_feats, edge_idx, edge_feats)

Methods

__init__(config: Optional[GraphTokenizationConfig] = None) -> None

Source: geotoken/geotoken/tokenizer/graph_tokenizer.py:68

fit(node_features: np.ndarray, edge_features: Optional[np.ndarray] = None) -> None

Source: geotoken/geotoken/tokenizer/graph_tokenizer.py:82

Pre-fit quantization parameters on training data.

Call this before tokenize() to use consistent normalization across multiple graphs. If not called, tokenize() will auto-fit on each graph independently.

Args: node_features: Training node features, shape (N_total, D_node). edge_features: Training edge features, shape (M_total, D_edge).

tokenize(node_features: np.ndarray, edge_index: np.ndarray, edge_features: Optional[np.ndarray] = None, node_types: Optional[list[str]] = None) -> TokenSequence

Source: geotoken/geotoken/tokenizer/graph_tokenizer.py:107

Convert topology graph to token sequence.

Args: node_features: Node feature array, shape (N, D_node) float32. Typically 48-dim for cadling face features. edge_index: Edge index array, shape (2, M) int64. Row 0 = source indices, row 1 = target indices. edge_features: Optional edge feature array, shape (M, D_edge) float32. Typically 16-dim for cadling edge features. node_types: Optional per-node type labels (e.g., ["face", "face", ...]).

Returns: TokenSequence with graph_node_tokens, graph_edge_tokens, and graph_structure_tokens populated.

Raises: ValueError: If array shapes are inconsistent.
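
The reference does not say which consistency checks tokenize() performs. The sketch below is a hypothetical pre-flight check that follows the documented shapes; the function name check_graph_shapes and the specific checks are assumptions, so the library's own validation may differ.

```python
import numpy as np

def check_graph_shapes(node_features, edge_index, edge_features=None):
    """Hypothetical check based on the documented array shapes."""
    if node_features.ndim != 2:
        raise ValueError("node_features must have shape (N, D_node)")
    if edge_index.ndim != 2 or edge_index.shape[0] != 2:
        raise ValueError("edge_index must have shape (2, M)")
    num_nodes = node_features.shape[0]
    num_edges = edge_index.shape[1]
    # Every source/target index must refer to an existing node.
    if num_edges and (edge_index.min() < 0 or edge_index.max() >= num_nodes):
        raise ValueError("edge_index refers to a node outside [0, N)")
    if edge_features is not None and edge_features.shape[0] != num_edges:
        raise ValueError("edge_features must have one row per edge")
    return num_nodes, num_edges

node_feats = np.zeros((4, 48), dtype=np.float32)
edge_idx = np.array([[0, 1, 2], [1, 2, 3]], dtype=np.int64)
edge_feats = np.zeros((3, 16), dtype=np.float32)
print(check_graph_shapes(node_feats, edge_idx, edge_feats))  # (4, 3)
```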

detokenize(token_sequence: TokenSequence) -> dict[str, Any]

Source: geotoken/geotoken/tokenizer/graph_tokenizer.py:289

Reconstruct graph structure from token sequence.

Inverse of tokenize(). Recovers approximate node features, edge features, and adjacency structure from the token sequence.

Args: token_sequence: TokenSequence with graph tokens.

Returns: Dict with keys:
- node_features: Reconstructed (N, D) float32 array
- edge_index: (2, M) int64 array
- edge_features: Reconstructed (M, D) float32 array or None
- num_nodes: int
- num_edges: int
- node_types: list[str]
- adjacency: dict[int, list[int]]
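
To illustrate how the edge_index and adjacency entries relate, here is a minimal sketch that derives a dict[int, list[int]] from a (2, M) edge index. It assumes edges are directed from row 0 (source) to row 1 (target); whether the library also records the reverse direction is not stated here, and adjacency_from_edge_index is a hypothetical name.

```python
import numpy as np

def adjacency_from_edge_index(edge_index, num_nodes):
    """Build {node: [neighbours]} from a (2, M) edge index (directed, assumed)."""
    adjacency = {i: [] for i in range(num_nodes)}
    sources = edge_index[0].tolist()
    targets = edge_index[1].tolist()
    for src, dst in zip(sources, targets):
        adjacency[src].append(dst)
    return adjacency

edge_index = np.array([[0, 0, 1], [1, 2, 2]], dtype=np.int64)
adjacency = adjacency_from_edge_index(edge_index, num_nodes=3)
print(adjacency)  # {0: [1, 2], 1: [2], 2: []}
```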

Source: geotoken/geotoken/config.py:34

Configuration for geometry normalization.

Methods

__init__(center: bool = True, uniform_scale: bool = True, preserve_aspect_ratio: bool = True, target_range: tuple[float, float] = (0.0, 1.0)) -> None

Source: geotoken/geotoken/config.py

Source: geotoken/geotoken/config.py:12

Quantization precision tiers.

| Tier | Bits | Levels | Use case |
| --- | --- | --- | --- |
| DRAFT | 6 | 64 | Fast preview |
| STANDARD | 8 | 256 | Balanced (default) |
| PRECISION | 10 | 1024 | High fidelity |
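
The level counts follow from levels = 2^bits. The short sketch below reproduces them and also prints the grid spacing on a unit range, assuming both endpoints are representable (spacing 1 / (levels - 1)); that spacing convention is an assumption, and the names used here are illustrative rather than geotoken's.

```python
def levels_for_bits(bits):
    """Number of distinct quantization levels for a given bit width."""
    return 2 ** bits

# Bit widths as listed for each tier.
TIER_BITS = {"DRAFT": 6, "STANDARD": 8, "PRECISION": 10}

for name, bits in TIER_BITS.items():
    levels = levels_for_bits(bits)
    step = 1.0 / (levels - 1)  # assumed spacing on [0, 1], endpoints included
    print(f"{name:<10} bits={bits:>2} levels={levels:>4} step={step:.6f}")
```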

Source: geotoken/geotoken/config.py:78

Main quantization configuration.

Methods

__init__(tier: PrecisionTier = PrecisionTier.STANDARD, adaptive: bool = True, normalization: NormalizationConfig = NormalizationConfig(), bit_allocation: AdaptiveBitAllocationConfig = AdaptiveBitAllocationConfig(), minimum_feature_threshold: float = 0.05, float_tolerance: float = 1e-10) -> None

Source: geotoken/geotoken/config.py

Source: geotoken/geotoken/tokenizer/token_types.py:239

Metadata and configuration for a command sequence.

Controls fixed-length padding, quantization resolution, and normalization range for transformer consumption.

Args: max_commands: Target sequence length (DeepCAD uses 60). quantization_bits: Bits for parameter quantization. coordinate_range: Normalization bounding cube size. padding_token_id: Token ID used for padding.
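
A minimal sketch of what fixed-length padding with these settings could look like, using plain integer IDs. Truncating sequences longer than max_commands is an assumption made for this example, and pad_to_length is a hypothetical helper rather than a geotoken function.

```python
def pad_to_length(ids, max_commands=60, padding_token_id=0):
    """Pad (or, by assumption, truncate) a list of IDs to max_commands entries."""
    ids = list(ids)[:max_commands]
    return ids + [padding_token_id] * (max_commands - len(ids))

padded = pad_to_length([5, 7, 9])
print(len(padded), padded[:5])  # 60 [5, 7, 9, 0, 0]
```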

Methods

__init__(max_commands: int = 60, quantization_bits: int = 8, coordinate_range: float = 2.0, padding_token_id: int = 0) -> None

Source: geotoken/geotoken/tokenizer/token_types.py

Source: geotoken/geotoken/tokenizer/token_types.py:262

A sequence of geometric tokens.

Contains coordinate tokens (mesh-level), geometry tokens (structural), command tokens (parametric CAD construction history), constraint tokens, and graph tokens (B-Rep topology).

Methods

to_array() -> np.ndarray

Source: geotoken/geotoken/tokenizer/token_types.py:287

Convert coordinate tokens to (N, 3) integer array.

command_types() -> list[CommandType]

Source: geotoken/geotoken/tokenizer/token_types.py:305

Get the sequence of command types.

__init__(coordinate_tokens: list[CoordinateToken] = list(), geometry_tokens: list[GeometryToken] = list(), command_tokens: list[CommandToken] = list(), constraint_tokens: list[ConstraintToken] = list(), boolean_op_tokens: list[BooleanOpToken] = list(), graph_node_tokens: list[GraphNodeToken] = list(), graph_edge_tokens: list[GraphEdgeToken] = list(), graph_structure_tokens: list[GraphStructureToken] = list(), metadata: dict = dict()) -> None

Source: geotoken/geotoken/tokenizer/token_types.py

Source: geotoken/geotoken/quantization/uv_grid_quantizer.py:165

Quantize UV-parameter grid samples from B-Rep surface patches.

Samples each B-Rep face on a regular (U, V) parameter grid, evaluates the surface mapping (u, v) → (x, y, z), and quantizes the resulting 3-D points using FeatureVectorQuantizer.

The quantized grid can be linearised into a flat token sequence (row-major order) and prepended/appended to the topology-graph token stream to give the model an explicit surface shape channel.

Args: grid_resolution: (num_u, num_v) sample count along each parameter direction. Total samples per face = num_u × num_v. Default (5, 5) → 25 samples. bits: Quantization bit-width per coordinate axis. Default 8 → 256 levels.

Example:

quantizer = UVGridQuantizer(grid_resolution=(5, 5), bits=8)
tokens = quantizer.quantize_surface_samples(uv, xyz)
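
The uv and xyz arrays in the example above are supplied by the caller. As a minimal sketch of how they might be produced, the snippet below samples a quarter-cylinder patch on a regular 5 × 5 grid. The surface, the function name sample_cylinder_patch, and the U-major ordering of the flattened samples are assumptions chosen for illustration.

```python
import numpy as np

def sample_cylinder_patch(num_u=5, num_v=5, radius=1.0, height=2.0):
    """Evaluate a quarter-cylinder on a regular (U, V) grid (illustrative)."""
    u = np.linspace(0.0, np.pi / 2, num_u)  # angle around the axis
    v = np.linspace(0.0, height, num_v)     # position along the axis
    # indexing="ij" keeps U as the slow axis when the grid is flattened.
    uu, vv = np.meshgrid(u, v, indexing="ij")
    uv = np.stack([uu.ravel(), vv.ravel()], axis=1).astype(np.float32)
    xyz = np.stack(
        [radius * np.cos(uv[:, 0]), radius * np.sin(uv[:, 0]), uv[:, 1]],
        axis=1,
    ).astype(np.float32)
    return uv, xyz

uv, xyz = sample_cylinder_patch()
print(uv.shape, xyz.shape)  # (25, 2) (25, 3)
```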

Methods

__init__(grid_resolution: tuple[int, int] = (5, 5), bits: int = 8) -> None

Source: geotoken/geotoken/quantization/uv_grid_quantizer.py:190

fit_global(all_xyz_samples: np.ndarray, all_normals_samples: np.ndarray | None = None, all_tangents_samples: np.ndarray | None = None) -> None

Source: geotoken/geotoken/quantization/uv_grid_quantizer.py:225

Fit global normalization params on combined data from all faces/edges.

When global params are set, quantize_surface_samples, quantize_face_uv_grid, and quantize_edge_uv_grid will use them instead of per-face fitting. This ensures cross-face token comparability — the same XYZ coordinate always maps to the same token regardless of which face it belongs to.

Args: all_xyz_samples: Combined XYZ points from all faces/edges, shape (N, 3) float32. all_normals_samples: Optional combined normals from all faces, shape (M, 3) float32. all_tangents_samples: Optional combined tangents from all edges, shape (K, 3) float32.
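
The comparability argument can be seen with a toy min-max quantizer. The sketch below is not the library's implementation: fit_minmax and quantize are hypothetical stand-ins that assume per-axis min-max normalization followed by rounding. A point shared by two faces receives different tokens under per-face fitting and a single token under a global fit.

```python
import numpy as np

def fit_minmax(points):
    """Per-axis (min, max) of a point set (hypothetical stand-in)."""
    return points.min(axis=0), points.max(axis=0)

def quantize(points, lo, hi, bits=8):
    """Min-max normalize to [0, 1] and round to integer levels (assumed scheme)."""
    levels = 2 ** bits
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against zero-width axes
    unit = np.clip((points - lo) / span, 0.0, 1.0)
    return np.round(unit * (levels - 1)).astype(np.int32)

face_a = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
face_b = np.array([[1.0, 1.0, 1.0], [3.0, 3.0, 3.0]])
shared = np.array([[1.0, 1.0, 1.0]])  # a point lying on both faces

# Per-face fitting: the same coordinate lands on different tokens.
tok_a = quantize(shared, *fit_minmax(face_a))
tok_b = quantize(shared, *fit_minmax(face_b))

# Global fitting: one set of parameters, one token per coordinate.
lo, hi = fit_minmax(np.concatenate([face_a, face_b], axis=0))
tok_global = quantize(shared, lo, hi)

print(tok_a[0], tok_b[0], tok_global[0])
```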

quantize_surface_samples(uv_samples: np.ndarray, xyz_samples: np.ndarray, face_index: Optional[int] = None) -> UVGridTokens

Source: geotoken/geotoken/quantization/uv_grid_quantizer.py:275

Quantize (u, v) → (x, y, z) mappings from a single surface.

The caller is responsible for evaluating the surface at the desired UV points. This method quantizes the xyz side and packages the result into UVGridTokens.

Args: uv_samples: UV parameters — shape (N, 2) where N = num_u * num_v. xyz_samples: Corresponding 3-D points — shape (N, 3). face_index: Optional face index for bookkeeping.

Returns: UVGridTokens with quantized grid.

Raises: ValueError: If shapes are inconsistent.

quantize_from_topology(topology: TopologyGraph) -> dict[int, UVGridTokens]

Source: geotoken/geotoken/quantization/uv_grid_quantizer.py:333

Extract and quantize UV grids for all faces in a topology graph.

The topology graph’s 48-dim node features encode UV statistics at known indices (see cadling.backend.step.enhanced_features). When raw UV samples are not directly available, this method synthesises a regular grid from the per-face UV bounds stored in the feature vector.

Args: topology: A TopologyGraph from a CADlingDocument.

Returns: Mapping of {face_index → UVGridTokens}. Faces without UV data are silently skipped.

dequantize(tokens: UVGridTokens) -> np.ndarray

Source: geotoken/geotoken/quantization/uv_grid_quantizer.py:431

Dequantize grid tokens back to approximate 3-D coordinates.

Args: tokens: UVGridTokens with quantized grid.

Returns: Reconstructed (N, 3) float32 array.

Raises: ValueError: If params are missing from the token object.

to_flat_tokens(tokens: UVGridTokens) -> list[int]

Source: geotoken/geotoken/quantization/uv_grid_quantizer.py:450

Linearise quantized grid to a flat integer token list.

The grid is serialised in row-major order (U-major, then V). Each sample produces 3 tokens (quantized x, y, z), so the total length is num_u * num_v * 3.

Args: tokens: UVGridTokens with quantized grid.

Returns: Flat list of integer tokens.
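
A small sketch of the row-major layout, assuming the quantized grid is stored as a (num_u * num_v, 3) array with the U index varying slowest. The helper sample_at is hypothetical and only shows where a given grid sample ends up in the flat list.

```python
import numpy as np

num_u, num_v = 2, 3
# Stand-in for UVGridTokens.quantized_grid: one (x, y, z) row per grid sample.
quantized_grid = np.arange(num_u * num_v * 3, dtype=np.int32).reshape(num_u * num_v, 3)

# Row-major flattening: x, y, z of sample 0, then x, y, z of sample 1, and so on.
flat = quantized_grid.reshape(-1).tolist()

def sample_at(flat_tokens, i, j, num_v):
    """Return the (x, y, z) tokens of grid sample (i, j), assuming U-major order."""
    start = 3 * (i * num_v + j)
    return flat_tokens[start:start + 3]

print(len(flat))                   # 18
print(sample_at(flat, 1, 0, num_v))  # [9, 10, 11]
```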

quantize_face_uv_grid(uv_grid: np.ndarray, face_index: Optional[int] = None) -> FaceUVGridTokens

Source: geotoken/geotoken/quantization/uv_grid_quantizer.py:469

Quantize full [num_u, num_v, 7] face UV-grid.

Processes a face UV-grid with xyz points, normals, and trim mask. XYZ and normals are quantized independently; the trim mask is preserved as a boolean array without quantization.

Channels:
- 0-2: XYZ points (quantized)
- 3-5: normals (quantized separately)
- 6: trim mask (preserved as bool)

Args: uv_grid: Face UV-grid — shape (num_u, num_v, 7) float32. face_index: Optional face index for bookkeeping.

Returns: FaceUVGridTokens with quantized xyz, normals, and preserved trim mask.

Raises: ValueError: If shape is not (num_u, num_v, 7).
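
The channel layout can be sketched with plain NumPy slicing. This only illustrates how a (num_u, num_v, 7) grid splits into its three parts; the function name split_face_grid and the 0.5 threshold used to turn the mask channel into booleans are assumptions, not the library's code.

```python
import numpy as np

def split_face_grid(uv_grid):
    """Split a (num_u, num_v, 7) face grid into xyz, normals, and trim mask."""
    if uv_grid.ndim != 3 or uv_grid.shape[2] != 7:
        raise ValueError("expected shape (num_u, num_v, 7)")
    xyz = uv_grid[..., 0:3]
    normals = uv_grid[..., 3:6]
    trim_mask = uv_grid[..., 6] > 0.5  # threshold is an assumption
    return xyz, normals, trim_mask

# Made-up 5 x 5 grid: flat patch with +Z normals, first U-row trimmed away.
uv_grid = np.zeros((5, 5, 7), dtype=np.float32)
uv_grid[..., 5] = 1.0
uv_grid[1:, :, 6] = 1.0

xyz, normals, trim_mask = split_face_grid(uv_grid)
print(xyz.shape, normals.shape, trim_mask.shape, int(trim_mask.sum()))
```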

quantize_edge_uv_grid(uv_grid: np.ndarray, edge_index: int = -1) -> EdgeUVGridTokens

Source: geotoken/geotoken/quantization/uv_grid_quantizer.py:543

Quantize full [num_samples, 6] edge UV-grid.

Processes an edge UV-grid with xyz points and tangent vectors. Both are quantized independently.

Channels:
- 0-2: XYZ points (quantized)
- 3-5: tangent vectors (quantized)

Args: uv_grid: Edge UV-grid — shape (num_samples, 6) float32. edge_index: Optional edge index for bookkeeping.

Returns: EdgeUVGridTokens with quantized xyz and tangents.

Raises: ValueError: If shape is not (num_samples, 6).

dequantize_face_grid(tokens: FaceUVGridTokens) -> np.ndarray

Source: geotoken/geotoken/quantization/uv_grid_quantizer.py:611

Dequantize face grid tokens back to [num_u, num_v, 7] array.

Reconstructs the full face UV-grid from quantized tokens. XYZ and normals are dequantized; the trim mask is restored from the boolean array.

Args: tokens: FaceUVGridTokens with quantized data.

Returns: Reconstructed (num_u, num_v, 7) float32 array.

Raises: ValueError: If params are missing from the token object.

dequantize_edge_grid(tokens: EdgeUVGridTokens) -> np.ndarray

Source: geotoken/geotoken/quantization/uv_grid_quantizer.py:647

Dequantize edge grid tokens back to [num_samples, 6] array.

Reconstructs the full edge UV-grid from quantized tokens.

Args: tokens: EdgeUVGridTokens with quantized data.

Returns: Reconstructed (num_samples, 6) float32 array.

Raises: ValueError: If params are missing from the token object.

Source: geotoken/geotoken/quantization/uv_grid_quantizer.py:54

Quantized UV-grid tokens for a single B-Rep face.

Attributes: face_index: Index of the face in the topology graph. grid_resolution: (num_u, num_v) grid dimensions. uv_samples: Original UV parameter values — shape (U*V, 2). xyz_samples: Original 3-D surface points — shape (U*V, 3). quantized_grid: Quantized XYZ values — shape (U*V, 3) with integer values in [0, 2^bits - 1]. params: Normalization parameters used for quantization. bits: Bit-width per coordinate dimension.

Methods

__init__(face_index: Optional[int] = None, grid_resolution: tuple[int, int] = (5, 5), uv_samples: np.ndarray = (lambda: np.empty((0, 2), dtype=(np.float32)))(), xyz_samples: np.ndarray = (lambda: np.empty((0, 3), dtype=(np.float32)))(), quantized_grid: np.ndarray = (lambda: np.empty((0, 3), dtype=(np.int32)))(), params: FeatureQuantizationParams | None = None, bits: int = 8, is_approximated: bool = False) -> None

Source: geotoken/geotoken/quantization/uv_grid_quantizer.py

Source: geotoken/geotoken/quantization/uniform.py:18

Fixed-precision quantizer (DeepCAD baseline).

Quantizes all coordinates to the same number of levels.

Methods

__init__(bits: int = 8)

Source: geotoken/geotoken/quantization/uniform.py:24

Initialize with fixed bit width.

Args: bits: Bit width for all coordinates (default 8 = 256 levels)

from_tier(tier: PrecisionTier) -> 'UniformQuantizer'

Source: geotoken/geotoken/quantization/uniform.py:35

Create quantizer from precision tier.

quantize(values: np.ndarray) -> np.ndarray

Source: geotoken/geotoken/quantization/uniform.py:40

Quantize normalized values in [0, 1] to integer levels.

Args: values: Normalized values in [0, 1]

Returns: Integer quantized values in [0, levels-1]

dequantize(quantized: np.ndarray) -> np.ndarray

Source: geotoken/geotoken/quantization/uniform.py:53

Dequantize integer levels back to [0, 1] range.

Args: quantized: Integer values in [0, levels-1]

Returns: Reconstructed values in [0, 1]
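
A minimal sketch of a quantize/dequantize round trip consistent with the documented ranges, assuming round-to-nearest onto levels - 1 equal steps. The exact rounding and clipping used by UniformQuantizer are not stated here, so uniform_quantize and uniform_dequantize are hypothetical stand-ins.

```python
import numpy as np

def uniform_quantize(values, bits=8):
    """Map values in [0, 1] to integers in [0, levels - 1] (assumed scheme)."""
    levels = 2 ** bits
    clipped = np.clip(values, 0.0, 1.0)
    return np.round(clipped * (levels - 1)).astype(np.int64)

def uniform_dequantize(quantized, bits=8):
    """Map integers in [0, levels - 1] back to [0, 1]."""
    levels = 2 ** bits
    return quantized.astype(np.float64) / (levels - 1)

values = np.array([0.0, 0.25, 0.5, 1.0])
q = uniform_quantize(values)
recon = uniform_dequantize(q)
max_error = np.abs(recon - values).max()
print(q, max_error)
```

Under this scheme the reconstruction error is at most half a step, 0.5 / (levels - 1), which is roughly 0.002 at 8 bits.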