API Reference

Generated from the ll_ocadr package source. Each symbol links to its definition on GitHub.

build_model_and_tokenizer(model_name: str, device: str = 'cpu', shape_depth: int | None = None) -> tuple[LatticelabsOCADRForCausalLM, AutoTokenizer, LLOCADRConfig, LLOCADRProcessor]

Source: ll_ocadr/run_ll_ocadr_hf.py:51

Build the OCADR model, tokenizer, config, and preprocessor for HF inference.

n_embed is derived from the chosen language model so the mesh embeddings line up with the LM’s embedding space. The <mesh> token is registered and the LM’s token-embedding matrix is resized to match.
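The token-registration step can be illustrated with a toy vocabulary and embedding table. This is only a sketch: `register_token`, the plain-list "embedding matrix", and the random initialisation are hypothetical stand-ins, not the tokenizer or model calls that build_model_and_tokenizer makes.

```python
import random


def register_token(vocab: dict[str, int], embeddings: list[list[float]],
                   token: str, seed: int = 0) -> int:
    """Hypothetical helper: add `token` to a toy vocab and grow the table to match."""
    if token in vocab:
        # Already registered: nothing to resize.
        return vocab[token]
    token_id = len(vocab)
    vocab[token] = token_id
    n_embed = len(embeddings[0])  # width of the existing embedding rows
    rng = random.Random(seed)
    # Assumption: the new row gets a small random initialisation.
    embeddings.append([rng.gauss(0.0, 0.02) for _ in range(n_embed)])
    return token_id


vocab = {"<pad>": 0, "hello": 1, "world": 2}
embeddings = [[0.0] * 4 for _ in vocab]  # toy table, n_embed = 4
mesh_token_id = register_token(vocab, embeddings, "<mesh>")
```

After the call, the table has one row per vocabulary entry, which is the invariant the resize step is meant to restore.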

main() -> None

Source: ll_ocadr/run_ll_ocadr_hf.py:151

run_inference(model: LatticelabsOCADRForCausalLM, processor: LLOCADRProcessor, tokenizer, mesh_file: str | Sequence[str], prompt: str, max_new_tokens: int = 64, cropping: bool = True, do_sample: bool = False) -> str

Source: ll_ocadr/run_ll_ocadr_hf.py:94

Run the full mesh-file -> text pipeline and return the decoded output.

mesh_file may be a single path or a sequence of paths. The number of <mesh> placeholders in prompt must equal the number of mesh files; as a convenience, a single mesh with no placeholder gets one appended.

The target device is taken from the model itself (single source of truth), so the inputs always land on the same device as the model.
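The placeholder rule described above for run_inference can be sketched as follows. `normalize_prompt` is a hypothetical helper, and both the newline separator before the appended placeholder and the use of ValueError on a mismatch are assumptions; the source states only that one placeholder is appended and that the counts must match.

```python
from __future__ import annotations

from typing import Sequence

MESH_TOKEN = "<mesh>"  # placeholder string named in the text above


def normalize_prompt(prompt: str, mesh_file: str | Sequence[str]) -> tuple[str, list[str]]:
    """Hypothetical helper: reconcile <mesh> placeholders with the mesh files."""
    # Accept a single path or a sequence of paths.
    mesh_files = [mesh_file] if isinstance(mesh_file, str) else list(mesh_file)
    n_placeholders = prompt.count(MESH_TOKEN)

    # Convenience case: one mesh and no placeholder, so append one.
    if n_placeholders == 0 and len(mesh_files) == 1:
        prompt = f"{prompt}\n{MESH_TOKEN}"  # separator is an assumption
        n_placeholders = 1

    if n_placeholders != len(mesh_files):
        raise ValueError(
            f"prompt has {n_placeholders} {MESH_TOKEN} placeholder(s) "
            f"but {len(mesh_files)} mesh file(s) were given"
        )
    return prompt, mesh_files
```

Under this sketch, a single mesh with a plain prompt passes through with one placeholder added, while two meshes with no placeholders raise an error.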

Source: ll_ocadr/vllm/latticelabs_ocadr.py:575

vLLM integration layer for LL-OCADR. Mirrors DeepseekOCRMultiModalProcessor structure.

Methods

__init__(config)

Source: ll_ocadr/vllm/latticelabs_ocadr.py:581

Source: ll_ocadr/vllm/latticelabs_ocadr.py:520

Metadata about mesh processing for vLLM. Provides token count calculations for KV cache allocation.

Methods

__init__(config)

Source: ll_ocadr/vllm/latticelabs_ocadr.py:526

get_num_mesh_tokens(mesh_file: str, chunking: bool = True) -> int

Source: ll_ocadr/vllm/latticelabs_ocadr.py:532

Calculate token count based on mesh complexity. Critical for vLLM’s KV cache allocation.

Args:
    mesh_file: Path to mesh file
    chunking: Whether to use spatial chunking

Returns: Total number of tokens this mesh will generate

class LatticelabsOCADRForCausalLM(nn.Module)

Source: ll_ocadr/vllm/latticelabs_ocadr.py:15

Main model class integrating 3D geometry processing with an LLM. Mirrors the structure of DeepseekOCRForCausalLM, but for 3D meshes.

Methods

__init__(config)

Source: ll_ocadr/vllm/latticelabs_ocadr.py:21

get_input_embeddings(input_ids: torch.Tensor, multimodal_embeddings: list[torch.Tensor] | None = None, image_embeddings: list[torch.Tensor] | None = None) -> torch.Tensor

Source: ll_ocadr/vllm/latticelabs_ocadr.py:338

Merge mesh (and optional rendered-image) embeddings with text embeddings. Mesh logic is identical to DeepSeek-OCR’s implementation; image tokens are spliced at image_token_id positions when the vision modality is enabled.

Args:
    input_ids: [batch, seq_len] with mesh_token_id (and optionally image_token_id) placeholders
    multimodal_embeddings: List of [num_mesh_tokens, n_embed] tensors
    image_embeddings: Optional list of [num_vision_tokens, n_embed] tensors (one per item)

Returns: inputs_embeds: [batch, seq_len, n_embed] merged embeddings
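The merge step can be sketched for a single sequence using plain lists in place of tensors. `merge_embeddings` is a hypothetical name, and the sketch assumes the prompt already contains exactly one mesh_token_id placeholder per mesh embedding row; the actual method works on batched tensors and also handles image tokens.

```python
def merge_embeddings(input_ids, text_embeds, mesh_embeds, mesh_token_id):
    """Hypothetical sketch: splice mesh rows into text rows at placeholder positions.

    input_ids:   [seq_len] token ids
    text_embeds: [seq_len][n_embed] embeddings looked up from input_ids
    mesh_embeds: [num_mesh_tokens][n_embed] rows produced by the mesh encoders
    """
    positions = [i for i, tok in enumerate(input_ids) if tok == mesh_token_id]
    if len(positions) != len(mesh_embeds):
        raise ValueError(
            f"{len(positions)} placeholders but {len(mesh_embeds)} mesh embedding rows"
        )
    merged = [list(row) for row in text_embeds]  # copy so the input stays untouched
    for pos, emb in zip(positions, mesh_embeds):
        merged[pos] = list(emb)  # overwrite the placeholder row, in order
    return merged
```

Rows at non-placeholder positions keep their text embeddings, so the output has the same [seq_len, n_embed] shape as the input.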

forward(input_ids: torch.Tensor, attention_mask: torch.Tensor | None = None, vertex_coords: torch.Tensor | None = None, vertex_normals: torch.Tensor | None = None, chunks_coords: torch.Tensor | None = None, chunks_normals: torch.Tensor | None = None, mesh_spatial_partition: torch.Tensor | None = None, pixel_values: torch.Tensor | None = None, **kwargs)

Source: ll_ocadr/vllm/latticelabs_ocadr.py:376

Full inference pipeline integrating 3D geometry + language.

Args:
    input_ids: [batch, seq_len] with mesh_token_id placeholders
    attention_mask: [batch, seq_len]
    vertex_coords: [batch, N, 3]
    vertex_normals: [batch, N, 3]
    chunks_coords: [batch, num_chunks, M, 3]
    chunks_normals: [batch, num_chunks, M, 3]
    mesh_spatial_partition: [batch, 3]
    pixel_values: optional rendered images [batch, 3, H, W] or [batch, V, 3, H, W] (V views); used only when the vision modality is enabled (config.use_vision)

Returns: Language model outputs

generate(input_ids: torch.Tensor, attention_mask: torch.Tensor | None = None, vertex_coords: torch.Tensor | None = None, vertex_normals: torch.Tensor | None = None, chunks_coords: torch.Tensor | None = None, chunks_normals: torch.Tensor | None = None, mesh_spatial_partition: torch.Tensor | None = None, pixel_values: torch.Tensor | None = None, **kwargs)

Source: ll_ocadr/vllm/latticelabs_ocadr.py:439

Autoregressive generation with 3D mesh (and optional rendered-image) conditioning.

Processes mesh inputs through the 3D encoders (and rendered images through the vision tower when config.use_vision is set), merges the resulting embeddings with text token embeddings, then delegates to the inner language model’s generate() (which inherits from GenerationMixin).

Accepts all keyword arguments supported by transformers.GenerationMixin.generate (e.g. max_new_tokens, temperature, top_p, do_sample).

Returns: Generated token IDs from the language model.

build_ll_ocadr_model(config)

Source: ll_ocadr/vllm/latticelabs_ocadr.py:503

Build LatticeLabs OCADR model.

Module ll_ocadr.vllm.lattice_encoder.geometry_net

Source: ll_ocadr/vllm/lattice_encoder/geometry_net.py:241

Local geometry encoder for mesh chunks. It plays the role that SAM plays for images: it extracts fine-grained geometric features.

Architecture:
- PointNet++ with 2 set abstraction layers
- Multi-head attention for local context
- Outputs 256-dimensional features per sampled point

Input: coords [B, N, 3], normals [B, N, 3]
Output: features [B, 128, 256]

Methods

__init__()

Source: ll_ocadr/vllm/lattice_encoder/geometry_net.py:255

forward(coords, normals)

Source: ll_ocadr/vllm/lattice_encoder/geometry_net.py:285

Args:
    coords: [B, N, 3] vertex coordinates
    normals: [B, N, 3] vertex normals

Returns: features: [B, 128, 256] - 128 sampled points with 256-dim features

Source: ll_ocadr/vllm/lattice_encoder/geometry_net.py:195

Set Abstraction layer for hierarchical point cloud feature learning.

Methods

__init__(npoint, radius, nsample, in_channel, mlp)

Source: ll_ocadr/vllm/lattice_encoder/geometry_net.py:200

forward(xyz, points)

Source: ll_ocadr/vllm/lattice_encoder/geometry_net.py:213

Args:
    xyz: [B, N, 3] coordinates
    points: [B, N, D] features

Returns:
    new_xyz: [B, npoint, 3]
    new_points: [B, npoint, mlp[-1]]

build_geometry_net()

Source: ll_ocadr/vllm/lattice_encoder/geometry_net.py:311

Build GeometryNet encoder.

farthest_point_sample(xyz, npoint)

Source: ll_ocadr/vllm/lattice_encoder/geometry_net.py:68

Farthest Point Sampling for downsampling point clouds.

Uses torch_cluster.fps (single fused CUDA kernel) when available, falling back to a Python loop implementation otherwise.

Args:
    xyz: [B, N, 3] point cloud
    npoint: number of samples

Returns: centroids: [B, npoint] sampled point indices
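For reference, a loop-based farthest point sampling can be sketched in pure Python for a single, unbatched cloud. `farthest_point_sample_py` is a hypothetical name, and the fixed `start` index is an assumption made for determinism; this is not the package's fallback implementation, which operates on [B, N, 3] tensors.

```python
def farthest_point_sample_py(points, npoint, start=0):
    """Sketch of greedy FPS: return indices of `npoint` mutually distant points.

    points: list of (x, y, z) tuples for one cloud (no batch dimension)
    start:  index of the first sample (assumed fixed here for reproducibility)
    """
    n = len(points)
    if not 0 < npoint <= n:
        raise ValueError("npoint must be between 1 and the number of points")
    # Squared distance from every point to its nearest already-selected sample.
    min_dist = [float("inf")] * n
    selected = []
    current = start
    for _ in range(npoint):
        selected.append(current)
        cx, cy, cz = points[current]
        for i, (x, y, z) in enumerate(points):
            d = (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2
            if d < min_dist[i]:
                min_dist[i] = d
        # Next sample: the point farthest from everything selected so far.
        current = max(range(n), key=min_dist.__getitem__)
    return selected
```

Each iteration costs O(N), so the loop is O(N * npoint) overall, which is why a fused kernel is preferred when one is available.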

index_points(points, idx)

Source: ll_ocadr/vllm/lattice_encoder/geometry_net.py:100

Index points based on indices.

Args:
    points: [B, N, C]
    idx: [B, S] or [B, S, K]

Returns: new_points: [B, S, C] or [B, S, K, C]
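The two index shapes can be illustrated for one batch element with nested lists. `index_points_py` is a hypothetical stand-in that only demonstrates the gather semantics; it is not the tensor implementation.

```python
def index_points_py(points, idx):
    """Sketch of the gather for one batch element.

    points: [N][C] nested list
    idx:    flat [S] list of indices, or nested [S][K] list of index groups
    Returns [S][C] for flat idx, or [S][K][C] for nested idx.
    """
    if idx and isinstance(idx[0], (list, tuple)):
        # Grouped case: one list of neighbour indices per query point.
        return [[points[j] for j in row] for row in idx]
    # Flat case: one index per output row.
    return [points[j] for j in idx]
```

The flat form is what a sampling step produces, and the nested form is what a grouping step produces, which is why both are supported.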

query_ball_point(radius, nsample, xyz, new_xyz)

Source: ll_ocadr/vllm/lattice_encoder/geometry_net.py:127

Find up to nsample points within radius of each query point.

Args:
    radius: local region radius
    nsample: max sample number in local region
    xyz: [B, N, 3] all points
    new_xyz: [B, S, 3] query points

Returns: group_idx: [B, S, nsample]
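A ball query for one unbatched cloud might look like the sketch below. `query_ball_point_py` is a hypothetical name. Padding short groups with the first hit follows the common PointNet++ convention and is an assumption here, as is raising an error for an empty ball.

```python
def query_ball_point_py(radius, nsample, xyz, new_xyz):
    """Sketch: for each query point, return `nsample` indices into `xyz`.

    xyz:     list of (x, y, z) tuples, all points
    new_xyz: list of (x, y, z) tuples, query points
    """
    r2 = radius * radius
    groups = []
    for qx, qy, qz in new_xyz:
        hits = [
            i for i, (x, y, z) in enumerate(xyz)
            if (x - qx) ** 2 + (y - qy) ** 2 + (z - qz) ** 2 <= r2
        ][:nsample]  # keep at most nsample neighbours
        if not hits:
            raise ValueError("query point has no neighbours within radius")
        # Assumption: pad short groups by repeating the first neighbour,
        # so every group has exactly nsample entries.
        hits += [hits[0]] * (nsample - len(hits))
        groups.append(hits)
    return groups
```

When the query points are sampled from xyz itself, every ball contains at least its own centre, so the empty case does not arise.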

sample_and_group(npoint, radius, nsample, xyz, points)

Source: ll_ocadr/vllm/lattice_encoder/geometry_net.py:159

Sample and group points.

Args:
    npoint: number of centroids
    radius: ball query radius
    nsample: max number of samples per ball
    xyz: [B, N, 3] coordinates
    points: [B, N, D] point features

Returns:
    new_xyz: [B, npoint, 3]
    new_points: [B, npoint, nsample, 3+D]
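The 3+D in the output shape comes from concatenating each neighbour's coordinates with its D feature channels. The sketch below shows only that grouping step for one cloud, given precomputed centroid and neighbour indices. `group_features` is a hypothetical helper, and expressing neighbour coordinates relative to the centroid is an assumption taken from the usual PointNet++ formulation.

```python
def group_features(xyz, points, centroid_idx, group_idx):
    """Sketch of the grouping step for one unbatched cloud.

    xyz:          [N] list of (x, y, z)
    points:       [N][D] per-point features
    centroid_idx: [npoint] indices of sampled centroids
    group_idx:    [npoint][nsample] neighbour indices per centroid
    Returns (new_xyz [npoint][3], new_points [npoint][nsample][3 + D]).
    """
    new_xyz = [xyz[c] for c in centroid_idx]
    new_points = []
    for center, neighbours in zip(new_xyz, group_idx):
        rows = []
        for j in neighbours:
            # Assumption: coordinates are made relative to the centroid.
            rel = [a - b for a, b in zip(xyz[j], center)]
            rows.append(rel + list(points[j]))  # 3 coords + D features
        new_points.append(rows)
    return new_xyz, new_points
```

With D = 1, every grouped row has four entries: three relative coordinates followed by the single feature value.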

square_distance(src, dst)

Source: ll_ocadr/vllm/lattice_encoder/geometry_net.py:12

Calculate Euclidean distance between each pair of points.

Args:
    src: [B, N, C]
    dst: [B, M, C]

Returns: dist: [B, N, M]
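The [N, M] output can be illustrated for one batch element. The sketch assumes, as the function name suggests, that the values are squared distances with no square root taken; `square_distance_py` is a hypothetical name, not the tensor implementation.

```python
def square_distance_py(src, dst):
    """Sketch: pairwise squared Euclidean distance for one batch element.

    src: [N][C] points, dst: [M][C] points
    Returns an [N][M] nested list where entry [i][j] = ||src[i] - dst[j]||^2.
    """
    return [
        [sum((a - b) ** 2 for a, b in zip(p, q)) for q in dst]
        for p in src
    ]


dist = square_distance_py(
    [(0, 0, 0), (1, 2, 2)],
    [(0, 0, 0), (1, 0, 0), (0, 3, 4)],
)
# dist == [[0, 1, 25], [9, 8, 6]]
```

Skipping the square root preserves the ordering of distances, which is all that nearest-neighbour and radius comparisons need.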

Module ll_ocadr.vllm.lattice_encoder.shape_net

Source: ll_ocadr/vllm/lattice_encoder/shape_net.py:15

Tokenize point cloud into spatial patches. Divides point cloud into groups and embeds each group.

Methods

__init__(patch_size = 32, embed_dim = 768)

Source: ll_ocadr/vllm/lattice_encoder/shape_net.py:21

forward(coords, normals)

Source: ll_ocadr/vllm/lattice_encoder/shape_net.py:41

Args:
    coords: [B, N, 3] vertex coordinates
    normals: [B, N, 3] vertex normals

Returns: patch_tokens: [B, num_patches, embed_dim]

Source: ll_ocadr/vllm/lattice_encoder/shape_net.py:148

Global shape encoder for full mesh context. It plays the role that CLIP plays for images: it extracts high-level semantic features.

Architecture:
- Patch-based tokenization of point cloud
- Transformer encoder with positional encoding
- CLS token for global shape representation

Input: coords [B, N, 3], normals [B, N, 3]
Output: features [B, 257, 768] - CLS + 256 patch tokens

Methods

__init__(embed_dim = 768, depth = 12, num_heads = 12)

Source: ll_ocadr/vllm/lattice_encoder/shape_net.py:162

forward(coords, normals)

Source: ll_ocadr/vllm/lattice_encoder/shape_net.py:195

Args:
    coords: [B, N, 3] downsampled mesh vertices
    normals: [B, N, 3] vertex normals

Returns: features: [B, 257, 768] - CLS token + 256 patch tokens

Source: ll_ocadr/vllm/lattice_encoder/shape_net.py:108

Standard Transformer block with self-attention and FFN.

Methods

__init__(embed_dim, num_heads, mlp_ratio = 4.0, dropout = 0.0)

Source: ll_ocadr/vllm/lattice_encoder/shape_net.py:113

forward(x)

Source: ll_ocadr/vllm/lattice_encoder/shape_net.py:129

Args: x: [B, N, embed_dim]

Returns: x: [B, N, embed_dim]

build_shape_net(embed_dim = 768, depth = 12, num_heads = 12)

Source: ll_ocadr/vllm/lattice_encoder/shape_net.py:237

Build ShapeNet encoder.