Tutorial: Tokenize a mesh

In this tutorial you will tokenize a mesh with geotoken, reconstruct it, and compare precision tiers. Allow ~10 minutes. geotoken needs only NumPy (trimesh is optional).

import numpy as np
# A unit tetrahedron
vertices = np.array(
[[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=np.float32
)
faces = np.array(
[[0, 1, 2], [0, 1, 3], [0, 2, 3], [1, 2, 3]], dtype=np.int64
)

To start from a real mesh file instead, load it with trimesh and pass mesh.vertices / mesh.faces.

2. Tokenize with the default (STANDARD, adaptive) config

from geotoken import GeoTokenizer, QuantizationConfig, PrecisionTier
config = QuantizationConfig(tier=PrecisionTier.STANDARD, adaptive=True)
tokenizer = GeoTokenizer(config)
tokens = tokenizer.tokenize(vertices, faces)
reconstructed = tokenizer.detokenize(tokens)
print("reconstructed vertices:\n", reconstructed)

The tokenizer normalizes into a unit cube, scores per-vertex complexity (curvature + feature density), allocates bits accordingly, and prevents distinct vertices from collapsing into the same quantized value.
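To make the first and last of those steps concrete, here is a minimal NumPy-only sketch of unit-cube normalization followed by a fixed-width round trip and a collision check. It is not geotoken's implementation: the helper names (`normalize_to_unit_cube`, `quantize`, `dequantize`) are hypothetical, the bit-width is uniform rather than adaptive, and the complexity scoring step is left out entirely.

```python
import numpy as np

def normalize_to_unit_cube(vertices):
    """Translate and uniformly scale vertices so they fit inside [0, 1]^3."""
    vertices = np.asarray(vertices, dtype=np.float64)
    lo = vertices.min(axis=0)
    extent = float((vertices.max(axis=0) - lo).max())
    scale = extent if extent > 0 else 1.0  # guard against a degenerate mesh
    return (vertices - lo) / scale, lo, scale

def quantize(unit_vertices, bits):
    """Map coordinates in [0, 1] to integer codes in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    return np.rint(unit_vertices * levels).astype(np.int64)

def dequantize(codes, bits, lo, scale):
    """Invert quantize() and undo the normalization."""
    levels = (1 << bits) - 1
    return codes / levels * scale + lo

# The tutorial's tetrahedron, scaled and shifted so normalization has work to do.
base = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=np.float64)
mesh_vertices = base * 4.0 + np.array([-2.0, 1.0, 3.0])

bits = 8  # assumed uniform width for this sketch
unit, lo, scale = normalize_to_unit_cube(mesh_vertices)
codes = quantize(unit, bits)
restored = dequantize(codes, bits, lo, scale)

# Collision check: every distinct vertex should keep a distinct code triple.
distinct_codes = len(np.unique(codes, axis=0))
print("distinct codes:", distinct_codes, "of", len(codes))
print("max round-trip error:", np.abs(restored - mesh_vertices).max())
```

A uniform scale (the largest axis extent) is used rather than a per-axis scale so that the mesh keeps its aspect ratio inside the cube. An adaptive scheme would replace the single `bits` value with a per-vertex width.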

impact = tokenizer.analyze_impact(vertices, faces)
print(f"mean error: {impact.mean_error:.6f}")
print(f"hausdorff distance: {impact.hausdorff_distance:.6f}")
# Compare the three precision tiers on the same mesh
for tier in (PrecisionTier.DRAFT, PrecisionTier.STANDARD, PrecisionTier.PRECISION):
    t = GeoTokenizer(QuantizationConfig(tier=tier, adaptive=True))
    impact = t.analyze_impact(vertices, faces)
    print(f"{tier.name:9s} mean_error={impact.mean_error:.6f}")

Expect error to fall as bit-width rises (DRAFT 6-bit → STANDARD 8-bit → PRECISION 10-bit), at the cost of more tokens.
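The trend can be checked independently of geotoken with plain uniform quantization. The sketch below assumes the tier-to-bit mapping stated above (6, 8, 10) and random points already inside the unit cube. It does not model adaptive allocation, so the absolute numbers will differ from what `analyze_impact` reports. For a grid with `2**bits - 1` steps, the per-coordinate rounding error is bounded by half a step, `0.5 / (2**bits - 1)`.

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.random((1000, 3))  # synthetic points inside the unit cube

def round_trip_error(points, bits):
    """Per-coordinate absolute error after uniform quantization at `bits` bits."""
    levels = (1 << bits) - 1
    restored = np.rint(points * levels) / levels
    return np.abs(restored - points)

results = {}
for name, bits in (("DRAFT", 6), ("STANDARD", 8), ("PRECISION", 10)):
    err = round_trip_error(points, bits)
    bound = 0.5 / ((1 << bits) - 1)  # half of one quantization step
    results[name] = (err.mean(), err.max(), bound)
    print(f"{name:9s} bits={bits:2d} mean={err.mean():.6f} "
          f"max={err.max():.6f} bound={bound:.6f}")
```

Each extra 2 bits shrinks the step size by roughly a factor of 4, so the mean error should drop by about the same factor from one tier to the next.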

from geotoken import CommandSequenceTokenizer, CADVocabulary
commands = [
{"type": "SOL", "params": [0.5, 0.5] + [0] * 14},
{"type": "LINE", "params": [0.0, 0.0, 0, 1.0, 0.0] + [0] * 11},
{"type": "EXTRUDE", "params": [0] * 15 + [5.0]},
{"type": "EOS", "params": [0] * 16},
]
seq = CommandSequenceTokenizer().tokenize(commands)
ids = CADVocabulary().encode(seq.command_tokens)
print("token ids:", ids[:10], "")