API Reference
Generated from the ll_ocadr package source. Each symbol links to its definition on GitHub.
Module ll_ocadr.run_ll_ocadr_hf
build_model_and_tokenizer
build_model_and_tokenizer(model_name: str, device: str = 'cpu', shape_depth: int | None = None) -> tuple[LatticelabsOCADRForCausalLM, AutoTokenizer, LLOCADRConfig, LLOCADRProcessor]
Source: ll_ocadr/run_ll_ocadr_hf.py:51
Build the OCADR model, tokenizer, config, and preprocessor for HF inference.
n_embed is derived from the chosen language model so the mesh embeddings
line up with the LM’s embedding space. The <mesh> token is registered and
the LM’s token-embedding matrix is resized to match.
main
main() -> None
Source: ll_ocadr/run_ll_ocadr_hf.py:151
run_inference
run_inference(model: LatticelabsOCADRForCausalLM, processor: LLOCADRProcessor, tokenizer, mesh_file: str | Sequence[str], prompt: str, max_new_tokens: int = 64, cropping: bool = True, do_sample: bool = False) -> str
Source: ll_ocadr/run_ll_ocadr_hf.py:94
Run the full mesh-file -> text pipeline and return the decoded output.
mesh_file may be a single path or a sequence of paths. The number of
<mesh> placeholders in prompt must equal the number of mesh files;
as a convenience, a single mesh with no placeholder gets one appended.
The target device is taken from the model itself (single source of truth), so the inputs always land on the same device as the model.
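The placeholder rule above can be sketched in plain Python. This is an illustration only: normalize_prompt is a hypothetical helper, not a function of the package, and the way the placeholder is appended (on a new line) is an assumption.

```python
from typing import List, Sequence, Union

MESH_TOKEN = "<mesh>"  # placeholder string named in the description above


def normalize_prompt(prompt: str, mesh_file: Union[str, Sequence[str]]) -> str:
    """Hypothetical helper: enforce one <mesh> placeholder per mesh file."""
    # Accept either a single path or a sequence of paths.
    mesh_files: List[str] = [mesh_file] if isinstance(mesh_file, str) else list(mesh_file)
    n_placeholders = prompt.count(MESH_TOKEN)

    # Convenience case: a single mesh with no placeholder gets one appended.
    # Appending on a new line is an assumption made for this sketch.
    if len(mesh_files) == 1 and n_placeholders == 0:
        return f"{prompt}\n{MESH_TOKEN}"

    # Otherwise the counts must match exactly.
    if n_placeholders != len(mesh_files):
        raise ValueError(
            f"prompt has {n_placeholders} {MESH_TOKEN} placeholder(s) "
            f"but {len(mesh_files)} mesh file(s) were given"
        )
    return prompt
```

Failing early on a count mismatch keeps the error close to the prompt rather than surfacing later as a shape error when embeddings are merged.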
Module ll_ocadr.vllm.latticelabs_ocadr
class LLOCADRMultiModalProcessor
Source: ll_ocadr/vllm/latticelabs_ocadr.py:575
vLLM integration layer for LL-OCADR. Mirrors DeepseekOCRMultiModalProcessor structure.
Methods
__init__
__init__(config)
Source: ll_ocadr/vllm/latticelabs_ocadr.py:581
class LLOCADRProcessingInfo
Source: ll_ocadr/vllm/latticelabs_ocadr.py:520
Metadata about mesh processing for vLLM. Provides token count calculations for KV cache allocation.
Methods
__init__
__init__(config)
Source: ll_ocadr/vllm/latticelabs_ocadr.py:526
get_num_mesh_tokens
get_num_mesh_tokens(mesh_file: str, chunking: bool = True) -> int
Source: ll_ocadr/vllm/latticelabs_ocadr.py:532
Calculate token count based on mesh complexity. Critical for vLLM’s KV cache allocation.
Args:
    mesh_file: Path to mesh file
    chunking: Whether to use spatial chunking
Returns:
    Total number of tokens this mesh will generate
class LatticelabsOCADRForCausalLM(nn.Module)
Source: ll_ocadr/vllm/latticelabs_ocadr.py:15
Main model class integrating 3D geometry processing with LLM. Mirrors DeepseekOCRForCausalLM structure but for 3D meshes.
Methods
__init__
__init__(config)
Source: ll_ocadr/vllm/latticelabs_ocadr.py:21
get_input_embeddings
get_input_embeddings(input_ids: torch.Tensor, multimodal_embeddings: list[torch.Tensor] | None = None, image_embeddings: list[torch.Tensor] | None = None) -> torch.Tensor
Source: ll_ocadr/vllm/latticelabs_ocadr.py:338
Merge mesh (and optional rendered-image) embeddings with text embeddings.
Mesh logic is identical to DeepSeek-OCR’s implementation; image tokens
are spliced at image_token_id positions when the vision modality is
enabled.
Args:
    input_ids: [batch, seq_len] with mesh_token_id (and optionally image_token_id) placeholders
    multimodal_embeddings: List of [num_mesh_tokens, n_embed] tensors
    image_embeddings: Optional list of [num_vision_tokens, n_embed] tensors (one per item)
Returns:
    inputs_embeds: [batch, seq_len, n_embed] merged embeddings
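The merge described above can be illustrated with a pure-Python sketch for a single sequence. It is not the library's implementation: merge_embeddings, text_table, and the list-based vectors are hypothetical stand-ins for the tensor operations, and batching and image tokens are left out.

```python
from typing import List

Vector = List[float]


def merge_embeddings(
    input_ids: List[int],
    text_table: List[Vector],
    mesh_embeddings: List[Vector],
    mesh_token_id: int,
) -> List[Vector]:
    """Replace each placeholder position with the next mesh embedding row."""
    # The number of placeholder slots must equal the number of mesh rows.
    n_slots = sum(1 for tok in input_ids if tok == mesh_token_id)
    if n_slots != len(mesh_embeddings):
        raise ValueError(
            f"{n_slots} placeholder(s) but {len(mesh_embeddings)} mesh row(s)"
        )

    rows = iter(mesh_embeddings)
    merged: List[Vector] = []
    for tok in input_ids:
        if tok == mesh_token_id:
            merged.append(next(rows))        # splice in a mesh embedding
        else:
            merged.append(text_table[tok])   # ordinary token-embedding lookup
    return merged
```

The count check mirrors why the placeholder count and the mesh token count have to agree before the sequence reaches the language model.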
forward
forward(input_ids: torch.Tensor, attention_mask: torch.Tensor | None = None, vertex_coords: torch.Tensor | None = None, vertex_normals: torch.Tensor | None = None, chunks_coords: torch.Tensor | None = None, chunks_normals: torch.Tensor | None = None, mesh_spatial_partition: torch.Tensor | None = None, pixel_values: torch.Tensor | None = None, **kwargs)
Source: ll_ocadr/vllm/latticelabs_ocadr.py:376
Full inference pipeline integrating 3D geometry + language.
Args:
    input_ids: [batch, seq_len] with mesh_token_id placeholders
    attention_mask: [batch, seq_len]
    vertex_coords: [batch, N, 3]
    vertex_normals: [batch, N, 3]
    chunks_coords: [batch, num_chunks, M, 3]
    chunks_normals: [batch, num_chunks, M, 3]
    mesh_spatial_partition: [batch, 3]
    pixel_values: optional rendered images [batch, 3, H, W] or [batch, V, 3, H, W] (V views); used only when the vision modality is enabled (config.use_vision)
Returns:
    Language model outputs
generate
generate(input_ids: torch.Tensor, attention_mask: torch.Tensor | None = None, vertex_coords: torch.Tensor | None = None, vertex_normals: torch.Tensor | None = None, chunks_coords: torch.Tensor | None = None, chunks_normals: torch.Tensor | None = None, mesh_spatial_partition: torch.Tensor | None = None, pixel_values: torch.Tensor | None = None, **kwargs)
Source: ll_ocadr/vllm/latticelabs_ocadr.py:439
Autoregressive generation with 3D mesh (and optional rendered-image) conditioning.
Processes mesh inputs through the 3D encoders (and rendered images
through the vision tower when config.use_vision is set), merges the
resulting embeddings with text token embeddings, then delegates to the
inner language model’s generate() (which inherits from GenerationMixin).
Accepts all keyword arguments supported by
transformers.GenerationMixin.generate (e.g. max_new_tokens,
temperature, top_p, do_sample).
Returns: Generated token IDs from the language model.
build_ll_ocadr_model
build_ll_ocadr_model(config)
Source: ll_ocadr/vllm/latticelabs_ocadr.py:503
Build LatticeLabs OCADR model.
Module ll_ocadr.vllm.lattice_encoder.geometry_net
class GeometryNet(nn.Module)
Source: ll_ocadr/vllm/lattice_encoder/geometry_net.py:241
Local geometry encoder for mesh chunks. Equivalent to SAM for images, extracts fine-grained geometric features.
Architecture:
    - PointNet++ with 2 set abstraction layers
    - Multi-head attention for local context
    - Outputs 256-dimensional features per sampled point
Input: coords [B, N, 3], normals [B, N, 3]
Output: features [B, 128, 256]
Methods
__init__
__init__()
Source: ll_ocadr/vllm/lattice_encoder/geometry_net.py:255
forward
forward(coords, normals)
Source: ll_ocadr/vllm/lattice_encoder/geometry_net.py:285
Args:
    coords: [B, N, 3] vertex coordinates
    normals: [B, N, 3] vertex normals
Returns:
    features: [B, 128, 256] - 128 sampled points with 256-dim features
class PointNetSetAbstraction(nn.Module)
Source: ll_ocadr/vllm/lattice_encoder/geometry_net.py:195
Set Abstraction layer for hierarchical point cloud feature learning.
Methods
__init__
__init__(npoint, radius, nsample, in_channel, mlp)
Source: ll_ocadr/vllm/lattice_encoder/geometry_net.py:200
forward
forward(xyz, points)
Source: ll_ocadr/vllm/lattice_encoder/geometry_net.py:213
Args:
    xyz: [B, N, 3] coordinates
    points: [B, N, D] features
Returns:
    new_xyz: [B, npoint, 3]
    new_points: [B, npoint, mlp[-1]]
build_geometry_net
build_geometry_net()
Source: ll_ocadr/vllm/lattice_encoder/geometry_net.py:311
Build GeometryNet encoder.
farthest_point_sample
farthest_point_sample(xyz, npoint)
Source: ll_ocadr/vllm/lattice_encoder/geometry_net.py:68
Farthest Point Sampling for downsampling point clouds.
Uses torch_cluster.fps (single fused CUDA kernel) when available, falling back to a Python loop implementation otherwise.
Args:
    xyz: [B, N, 3] point cloud
    npoint: number of samples
Returns:
    centroids: [B, npoint] sampled point indices
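For intuition, the loop-based fallback can be sketched in pure Python for a single point cloud. This is a minimal illustration of greedy farthest point sampling, not the library's code: farthest_point_sample_py is a hypothetical name, and the fixed start index is an assumption (implementations often pick the first centroid at random).

```python
from typing import List, Sequence

Point = Sequence[float]


def farthest_point_sample_py(xyz: Sequence[Point], npoint: int, start: int = 0) -> List[int]:
    """Greedy FPS over one point cloud; returns npoint indices into xyz."""
    if not 0 < npoint <= len(xyz):
        raise ValueError("npoint must be between 1 and len(xyz)")

    # Squared distance from each point to its nearest already-chosen centroid.
    min_dist = [float("inf")] * len(xyz)
    chosen = [start]
    for _ in range(npoint - 1):
        last = xyz[chosen[-1]]
        # Update nearest-centroid distances using the most recent centroid.
        for i, p in enumerate(xyz):
            d = sum((a - b) ** 2 for a, b in zip(p, last))
            if d < min_dist[i]:
                min_dist[i] = d
        # The next centroid is the point farthest from the chosen set.
        chosen.append(max(range(len(xyz)), key=min_dist.__getitem__))
    return chosen
```

Each iteration scans all N points, so the loop costs O(N * npoint) per cloud, which is why a fused kernel is preferred when one is available.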
index_points
index_points(points, idx)
Source: ll_ocadr/vllm/lattice_encoder/geometry_net.py:100
Index points based on indices.
Args:
    points: [B, N, C]
    idx: [B, S] or [B, S, K]
Returns:
    new_points: [B, S, C] or [B, S, K, C]
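The two index shapes can be illustrated with a small pure-Python gather over one batch element. index_points_py is a hypothetical name used only for this sketch; the library version works on batched tensors.

```python
from typing import Any, List, Sequence


def index_points_py(points: Sequence[Any], idx: Sequence[Any]) -> List[Any]:
    """Gather rows of one [N, C] point list by a flat [S] or nested [S, K] index list."""
    # Nested indices ([S, K]) yield one group of K rows per entry.
    if idx and isinstance(idx[0], (list, tuple)):
        return [[points[j] for j in row] for row in idx]
    # Flat indices ([S]) yield one row per entry.
    return [points[j] for j in idx]
```

A flat index list selects S rows, while a nested list selects S groups of K rows, matching the [B, S, C] and [B, S, K, C] outputs documented above with the batch dimension dropped.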
query_ball_point
query_ball_point(radius, nsample, xyz, new_xyz)
Source: ll_ocadr/vllm/lattice_encoder/geometry_net.py:127
Find all points within radius from query points.
Args:
    radius: local region radius
    nsample: max sample number in local region
    xyz: [B, N, 3] all points
    new_xyz: [B, S, 3] query points
Returns:
    group_idx: [B, S, nsample]
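A ball query with a fixed-width output can be sketched in pure Python for one batch element. Two points here are assumptions rather than documented behavior: query_ball_point_py is a hypothetical name, and padding short groups by repeating the first hit follows the common PointNet++ convention, which this package may or may not use.

```python
from typing import List, Sequence

Point = Sequence[float]


def query_ball_point_py(
    radius: float, nsample: int, xyz: Sequence[Point], new_xyz: Sequence[Point]
) -> List[List[int]]:
    """For each query point, return exactly nsample indices of points within radius."""
    r2 = radius * radius
    group_idx: List[List[int]] = []
    for q in new_xyz:
        # Indices of all points inside the ball, in index order.
        hits = [
            i for i, p in enumerate(xyz)
            if sum((a - b) ** 2 for a, b in zip(p, q)) <= r2
        ]
        if not hits:
            raise ValueError("query point has no neighbours within radius")
        hits = hits[:nsample]                         # cap at nsample
        hits += [hits[0]] * (nsample - len(hits))     # assumed padding rule
        group_idx.append(hits)
    return group_idx
```

When the query points are themselves drawn from xyz, as with sampled centroids, every ball contains at least its own centre, so the empty case does not arise.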
sample_and_group
sample_and_group(npoint, radius, nsample, xyz, points)
Source: ll_ocadr/vllm/lattice_encoder/geometry_net.py:159
Sample and group points.
Args:
    npoint: number of centroids
    radius: ball query radius
    nsample: max number of samples per ball
    xyz: [B, N, 3] coordinates
    points: [B, N, D] point features
Returns:
    new_xyz: [B, npoint, 3]
    new_points: [B, npoint, nsample, 3+D]
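The 3+D width of new_points can be illustrated with a pure-Python sketch of the grouping step for one batch element. It takes centroid and neighbour indices as given, so the sampling and ball-query steps are out of scope. group_features is a hypothetical name, and centring each neighbour on its centroid is the usual PointNet++ convention, assumed here rather than confirmed for this package.

```python
from typing import List, Sequence, Tuple

Row = List[float]


def group_features(
    xyz: Sequence[Sequence[float]],
    points: Sequence[Sequence[float]],
    centroid_idx: Sequence[int],
    group_idx: Sequence[Sequence[int]],
) -> Tuple[List[Row], List[List[Row]]]:
    """Build [npoint, 3] centroids and [npoint, nsample, 3+D] grouped features."""
    new_xyz = [list(xyz[c]) for c in centroid_idx]
    new_points: List[List[Row]] = []
    for center, neighbours in zip(new_xyz, group_idx):
        group = []
        for j in neighbours:
            # Coordinates relative to the centroid (assumed convention) ...
            rel = [a - b for a, b in zip(xyz[j], center)]
            # ... concatenated with the neighbour's D feature channels.
            group.append(rel + list(points[j]))
        new_points.append(group)
    return new_xyz, new_points
```

With 3 coordinate channels and D feature channels per neighbour, each grouped row has 3+D entries, which is the input width the set abstraction MLP would see.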
square_distance
square_distance(src, dst)
Source: ll_ocadr/vllm/lattice_encoder/geometry_net.py:12
Calculate the squared Euclidean distance between each pair of points.
Args:
    src: [B, N, C]
    dst: [B, M, C]
Returns:
    dist: [B, N, M]
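A pure-Python sketch of the pairwise distance matrix for one batch element, showing the [N, M] output shape. It assumes, based on the function name, that the entries are squared distances with no square root applied. square_distance_py is a hypothetical name for this illustration.

```python
from typing import List, Sequence

Point = Sequence[float]


def square_distance_py(src: Sequence[Point], dst: Sequence[Point]) -> List[List[float]]:
    """Return an [N, M] matrix where entry [i][j] is |src[i] - dst[j]|^2."""
    return [
        [sum((a - b) ** 2 for a, b in zip(s, d)) for d in dst]
        for s in src
    ]
```

Skipping the square root is enough for nearest-neighbour and radius comparisons, since squaring preserves the ordering of non-negative distances.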
Module ll_ocadr.vllm.lattice_encoder.shape_net
class PointPatchEmbedding(nn.Module)
Source: ll_ocadr/vllm/lattice_encoder/shape_net.py:15
Tokenize point cloud into spatial patches. Divides point cloud into groups and embeds each group.
Methods
__init__
__init__(patch_size = 32, embed_dim = 768)
Source: ll_ocadr/vllm/lattice_encoder/shape_net.py:21
forward
forward(coords, normals)
Source: ll_ocadr/vllm/lattice_encoder/shape_net.py:41
Args:
    coords: [B, N, 3] vertex coordinates
    normals: [B, N, 3] vertex normals
Returns:
    patch_tokens: [B, num_patches, embed_dim]
class ShapeNet(nn.Module)
Source: ll_ocadr/vllm/lattice_encoder/shape_net.py:148
Global shape encoder for full mesh context. Equivalent to CLIP for images, extracts high-level semantic features.
Architecture:
    - Patch-based tokenization of point cloud
    - Transformer encoder with positional encoding
    - CLS token for global shape representation
Input: coords [B, N, 3], normals [B, N, 3]
Output: features [B, 257, 768] - CLS + 256 patch tokens
Methods
__init__
__init__(embed_dim = 768, depth = 12, num_heads = 12)
Source: ll_ocadr/vllm/lattice_encoder/shape_net.py:162
forward
forward(coords, normals)
Source: ll_ocadr/vllm/lattice_encoder/shape_net.py:195
Args:
    coords: [B, N, 3] downsampled mesh vertices
    normals: [B, N, 3] vertex normals
Returns:
    features: [B, 257, 768] - CLS token + 256 patch tokens
class TransformerBlock(nn.Module)
Source: ll_ocadr/vllm/lattice_encoder/shape_net.py:108
Standard Transformer block with self-attention and FFN.
Methods
__init__
__init__(embed_dim, num_heads, mlp_ratio = 4.0, dropout = 0.0)
Source: ll_ocadr/vllm/lattice_encoder/shape_net.py:113
forward
forward(x)
Source: ll_ocadr/vllm/lattice_encoder/shape_net.py:129
Args:
    x: [B, N, embed_dim]
Returns:
    x: [B, N, embed_dim]
build_shape_net
build_shape_net(embed_dim = 768, depth = 12, num_heads = 12)
Source: ll_ocadr/vllm/lattice_encoder/shape_net.py:237
Build ShapeNet encoder.