vision3d.ops

Geometric operators for 3D data.

Functions

`batched_nms_3d`(boxes, scores, idxs, ...)	Class-aware 3D NMS: runs `nms_3d()` independently per class.
`box3d_convert`(boxes, in_fmt, out_fmt)	Convert 3D bounding boxes from `in_fmt` to `out_fmt`.
`box3d_corners`(boxes, format)	Compute the 8 world-space corners of 3D bounding boxes.
`box3d_iou`(boxes1, boxes2, format)	Compute the pairwise intersection-over-union of 3D `boxes1` and `boxes2`.
`box3d_overlap`(boxes1, boxes2, format)	Check 3D overlap between two sets of oriented bounding boxes.
`nms_3d`(boxes, scores, iou_threshold, format)	Greedy, class-agnostic non-maximum suppression on 3D bounding boxes.
`points_in_boxes_3d`(points, boxes, format)	Compute a boolean mask indicating which points fall inside which boxes.
`points_in_boxes_3d_indices`(points, boxes, format)	Return per-point box assignment.
`project_to_image`(points_3d, extrinsics, ...)	Project 3D points in lidar frame to pixel coordinates.
`voxelize`(points, point_cloud_range, voxel_size)	Bucket points into a 3D voxel grid.

vision3d.ops.batched_nms_3d(boxes, scores, idxs, iou_threshold, format)

Class-aware 3D NMS: runs nms_3d() independently per class.

Parameters:

boxes (Tensor) – [N, K] boxes. K depends on format.
scores (Tensor) – [N] prediction confidences.
idxs (Tensor) – [N] integer class labels.
iou_threshold (float) – See nms_3d().
format (BoundingBox3DFormat) – Format of boxes.

Returns:

int64 tensor of indices into boxes that survived, sorted in decreasing order of score.

Return type:

Tensor
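The per-class behaviour can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation. It assumes a precomputed [N][N] IoU matrix given as nested lists, such as the output of box3d_iou() converted to lists. The name batched_nms_sketch is hypothetical. Boxes are grouped by label, greedy NMS runs inside each group, and the survivors are merged in decreasing order of score.

```python
from collections import defaultdict


def batched_nms_sketch(iou, scores, idxs, iou_threshold):
    """Class-aware greedy NMS over a precomputed [N][N] IoU matrix (hypothetical helper).

    Returns indices of kept boxes in decreasing order of score.
    """
    # Group box indices by their class label.
    groups = defaultdict(list)
    for i, label in enumerate(idxs):
        groups[label].append(i)

    keep = []
    for members in groups.values():
        # Greedy NMS restricted to one class, highest score first.
        kept_in_class = []
        for i in sorted(members, key=lambda j: scores[j], reverse=True):
            # Keep i unless an already-kept box of the same class overlaps it too much.
            if all(iou[i][k] <= iou_threshold for k in kept_in_class):
                kept_in_class.append(i)
        keep.extend(kept_in_class)

    # Merge the per-class survivors in decreasing order of score.
    return sorted(keep, key=lambda j: scores[j], reverse=True)


# Boxes 0 and 2 are identical but belong to different classes.
iou = [[1.0, 0.95, 1.0],
       [0.95, 1.0, 0.95],
       [1.0, 0.95, 1.0]]
scores = [0.9, 0.8, 0.7]
print(batched_nms_sketch(iou, scores, idxs=[0, 0, 1], iou_threshold=0.5))  # [0, 2]
```

Suppression only happens within a group. As a result, two identical boxes with different labels both survive, while box 1 is removed by the higher-scoring box 0 of the same class.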

vision3d.ops.box3d_convert(boxes, in_fmt, out_fmt)

Convert 3D bounding boxes from in_fmt to out_fmt.

Only conversions between XYZXYZ and XYZLWH are supported. Both formats are axis-aligned, so this conversion is lossless. Converting to or from a rotated format would require discarding or fabricating rotation angles.

Parameters:

boxes (Tensor) – Boxes to convert with shape [..., K]. Supports any number of leading batch dimensions.
in_fmt (BoundingBox3DFormat | str) – Source format.
out_fmt (BoundingBox3DFormat | str) – Target format.

Returns:

Converted boxes with the same leading dimensions.

Return type:

Tensor
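A minimal sketch of the two supported conversions on a single box follows. Both layouts are assumptions about the format definitions:

- XYZXYZ is assumed to store (x_min, y_min, z_min, x_max, y_max, z_max).
- XYZLWH is assumed to store the box center followed by its extents along x, y and z.

The function names are hypothetical. The library itself operates on [..., K] tensors rather than single tuples.

```python
def xyzxyz_to_xyzlwh(box):
    """(x_min, y_min, z_min, x_max, y_max, z_max) -> (cx, cy, cz, l, w, h)."""
    x1, y1, z1, x2, y2, z2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2, (z1 + z2) / 2, x2 - x1, y2 - y1, z2 - z1)


def xyzlwh_to_xyzxyz(box):
    """(cx, cy, cz, l, w, h) -> (x_min, y_min, z_min, x_max, y_max, z_max)."""
    cx, cy, cz, l, w, h = box
    return (cx - l / 2, cy - w / 2, cz - h / 2, cx + l / 2, cy + w / 2, cz + h / 2)


box = (0.0, 0.0, 0.0, 2.0, 4.0, 6.0)
print(xyzxyz_to_xyzlwh(box))  # (1.0, 2.0, 3.0, 2.0, 4.0, 6.0)
# No angle is involved, so the round trip recovers the original box.
assert xyzlwh_to_xyzxyz(xyzxyz_to_xyzlwh(box)) == box
```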

vision3d.ops.box3d_corners(boxes, format)

Compute the 8 world-space corners of 3D bounding boxes.

Supports all rotation formats including full 9-DOF (yaw, pitch, roll).

Corner ordering:

4 -------- 5       z  x
|\         |\      |  /
| 7 -------| 6     | /
| |        | |     |/
0 |--------1 |     +------ y
 \|         \|
  3 -------- 2

Bottom face (z-): {0, 1, 2, 3}. Top face (z+): {4, 5, 6, 7}.

Parameters:

boxes (Tensor) – 3D bounding boxes [N, K].
format (BoundingBox3DFormat) – Format of the bounding boxes.

Returns:

Corner coordinates [N, 8, 3].

Return type:

Tensor
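The sketch below shows how corners can be generated for a yaw-only box. It assumes a hypothetical (x, y, z, l, w, h, yaw) layout:

- the center is at (x, y, z);
- l lies along the box's local x axis, w along local y, and h along z;
- yaw is measured counterclockwise about +z.

The sketch follows the documented split: bottom face 0-3, top face 4-7, with corner i + 4 directly above corner i as in the diagram. The order within each face is an assumption and may differ from the library's.

```python
import math


def corners_xyzlwh_yaw(box):
    """Return the 8 corners of a yaw-only box as a list of (x, y, z) tuples."""
    cx, cy, cz, l, w, h, yaw = box
    c, s = math.cos(yaw), math.sin(yaw)
    # Bottom-face footprint in the box's local frame (within-face order is assumed).
    footprint = [(-l / 2, -w / 2), (-l / 2, w / 2), (l / 2, w / 2), (l / 2, -w / 2)]
    corners = []
    for dz in (-h / 2, h / 2):  # bottom face -> indices 0-3, top face -> 4-7
        for lx, ly in footprint:
            # Rotate the local offset about +z by yaw, then translate to the center.
            corners.append((cx + c * lx - s * ly, cy + s * lx + c * ly, cz + dz))
    return corners


corners = corners_xyzlwh_yaw((0.0, 0.0, 0.0, 2.0, 4.0, 6.0, 0.0))
print(corners[0], corners[6])  # (-1.0, -2.0, -3.0) (1.0, 2.0, 3.0)
```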

vision3d.ops.box3d_iou(boxes1, boxes2, format)

Compute the pairwise intersection-over-union of 3D boxes1 and boxes2.

iou = vol / (vol1 + vol2 - vol), where vol is the volume of the intersecting convex polyhedron and vol1, vol2 are the volumes of the two input boxes.

The same algorithm handles every supported box format, including full 9-DOF orientation (XYZLWHYPR). This works because the clipping step operates on the 8 box corners regardless of how they were produced.

Note: This function is not differentiable.

Parameters:

boxes1 (Tensor) – First set of boxes [N, K].
boxes2 (Tensor) – Second set of boxes [M, K].
format (BoundingBox3DFormat) – Format of both box sets.

Returns:

[N, M] matrix of IoU values in [0, 1].

Return type:

Tensor
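Consider the special case of axis-aligned boxes in XYZXYZ layout, assumed here to be (x_min, y_min, z_min, x_max, y_max, z_max). The intersection is then itself a box, so the IoU formula reduces to a product of per-axis overlaps. The sketch below illustrates only that special case and uses hypothetical function names. It does not reproduce the polyhedron clipping used for rotated boxes, and it assumes every box has positive volume.

```python
def iou_xyzxyz(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, z_min, x_max, y_max, z_max)."""
    # Overlap length along each axis; zero when the intervals are disjoint.
    overlap = [max(0.0, min(a[k + 3], b[k + 3]) - max(a[k], b[k])) for k in range(3)]
    vol = overlap[0] * overlap[1] * overlap[2]
    vol1 = (a[3] - a[0]) * (a[4] - a[1]) * (a[5] - a[2])
    vol2 = (b[3] - b[0]) * (b[4] - b[1]) * (b[5] - b[2])
    return vol / (vol1 + vol2 - vol)


def pairwise_iou(boxes1, boxes2):
    """[N][M] nested list of IoU values, mirroring the [N, M] output shape."""
    return [[iou_xyzxyz(a, b) for b in boxes2] for a in boxes1]


a = (0.0, 0.0, 0.0, 2.0, 2.0, 2.0)
b = (1.0, 0.0, 0.0, 3.0, 2.0, 2.0)  # shifted by half a box along x
print(pairwise_iou([a], [a, b]))  # [[1.0, 0.3333333333333333]]
```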

vision3d.ops.box3d_overlap(boxes1, boxes2, format)

Check 3D overlap between two sets of oriented bounding boxes.

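Uses the Separating Axis Theorem (SAT) with 15 candidate separating axes: the 3 face normals of each box (6 in total), plus the 9 cross products formed by pairing each edge direction of one box with each edge direction of the other.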

Parameters:

boxes1 (Tensor) – First set of boxes [N, K].
boxes2 (Tensor) – Second set of boxes [M, K].
format (BoundingBox3DFormat) – Format of both box sets.

Returns:

Boolean matrix [N, M] where True indicates overlap.

Return type:

Tensor
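The 15-axis test can be sketched for yaw-only boxes. The (x, y, z, l, w, h, yaw) layout and the helper names are assumptions for illustration. For each candidate axis L, the boxes are separated if the projected center distance |t · L| exceeds the sum of the boxes' projected half-extents. Cross-product axes need not be normalized because every term scales by |L|, and near-zero cross products (from parallel edges) are skipped. In this sketch, boxes that merely touch count as overlapping; the library's boundary convention is not specified above.

```python
import math
from itertools import product


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def obb_from_yaw(box):
    """Hypothetical (x, y, z, l, w, h, yaw) -> (center, unit axes, half extents)."""
    x, y, z, l, w, h, yaw = box
    c, s = math.cos(yaw), math.sin(yaw)
    axes = ((c, s, 0.0), (-s, c, 0.0), (0.0, 0.0, 1.0))
    return (x, y, z), axes, (l / 2, w / 2, h / 2)


def obb_overlap(box1, box2, eps=1e-9):
    c1, a1, e1 = obb_from_yaw(box1)
    c2, a2, e2 = obb_from_yaw(box2)
    t = tuple(c2[k] - c1[k] for k in range(3))  # center-to-center vector
    # 3 face normals per box + 9 pairwise edge cross products = 15 candidate axes.
    candidates = list(a1) + list(a2) + [cross(u, v) for u, v in product(a1, a2)]
    for axis in candidates:
        if dot(axis, axis) < eps:
            continue  # parallel edges give a degenerate axis; skip it
        # Projected half-extents of each box onto this axis.
        r1 = sum(e1[i] * abs(dot(a1[i], axis)) for i in range(3))
        r2 = sum(e2[i] * abs(dot(a2[i], axis)) for i in range(3))
        if abs(dot(t, axis)) > r1 + r2:
            return False  # found a separating axis
    return True


cube = (0.0, 0.0, 0.0, 2.0, 2.0, 2.0, 0.0)
print(obb_overlap(cube, (2.3, 0.0, 0.0, 2.0, 2.0, 2.0, math.pi / 4)))  # True
print(obb_overlap(cube, (2.5, 0.0, 0.0, 2.0, 2.0, 2.0, math.pi / 4)))  # False
```

The [N, M] result described above would be obtained by evaluating this test for every pair of boxes.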

vision3d.ops.nms_3d(boxes, scores, iou_threshold, format)

Greedy, class-agnostic non-maximum suppression on 3D bounding boxes.

Visits boxes in decreasing order of score. Each box that has not yet been suppressed is kept, and every lower-scoring box whose IoU with it exceeds iou_threshold is removed.

Parameters:

boxes (Tensor) – [N, K] boxes to perform NMS on. K depends on format.
scores (Tensor) – [N] prediction confidences.
iou_threshold (float) – Discard any box whose IoU with a higher-scoring kept box is strictly greater than this value.
format (BoundingBox3DFormat) – Format of boxes.

Returns:

int64 tensor of indices into boxes that survived, sorted in decreasing order of score.

Return type:

Tensor
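A minimal sketch of the greedy loop follows. It assumes a precomputed [N][N] IoU matrix given as nested lists, and the function name is hypothetical. Only kept boxes suppress others: a box that was itself suppressed cannot remove a lower-scoring neighbour.

```python
def greedy_nms_from_iou(iou, scores, iou_threshold):
    """Greedy NMS over a precomputed [N][N] IoU matrix.

    Returns the indices of kept boxes in decreasing order of score.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    suppressed = [False] * len(scores)
    keep = []
    for pos, i in enumerate(order):
        if suppressed[i]:
            continue
        keep.append(i)  # highest-scoring box that is still alive
        # Remove every lower-scoring box whose IoU with i is strictly above the threshold.
        for j in order[pos + 1:]:
            if iou[i][j] > iou_threshold:
                suppressed[j] = True
    return keep


iou = [[1.0, 0.6, 0.1],
       [0.6, 1.0, 0.7],
       [0.1, 0.7, 1.0]]
scores = [0.9, 0.8, 0.7]
print(greedy_nms_from_iou(iou, scores, 0.5))  # [0, 2]
print(greedy_nms_from_iou(iou, scores, 0.6))  # [0, 1]
```

With a threshold of 0.5, box 2 survives because the box it overlaps most (box 1) was already suppressed by box 0. With a threshold of 0.6, box 1 is kept, since an IoU equal to the threshold does not trigger suppression.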

vision3d.ops.points_in_boxes_3d(points, boxes, format)

Compute a boolean mask indicating which points fall inside which boxes.

Supports all rotation formats including full 9-DOF (yaw, pitch, roll).

Parameters:

points (Tensor) – Point cloud coordinates [N, 3+C]. Only the first 3 columns (x, y, z) are used.
boxes (Tensor) – 3D bounding boxes [M, K] where K depends on format.
format (BoundingBox3DFormat) – Format of the bounding boxes.

Returns:

Boolean tensor [N, M] where entry (i, j) is True if point i is inside box j.

Return type:

Tensor
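A per-point sketch for yaw-only boxes follows, assuming the same hypothetical (x, y, z, l, w, h, yaw) center-based layout. Each point's offset from the box center is rotated by -yaw into the box's local frame and compared against the half-extents. Points exactly on a face count as inside here. That boundary convention is an assumption, since the library's boundary handling is not specified above.

```python
import math


def points_in_yaw_boxes(points, boxes):
    """[N][M] nested list of booleans: mask[i][j] is True if point i lies in box j."""
    mask = []
    for p in points:
        px, py, pz = p[0], p[1], p[2]  # extra feature columns are ignored
        row = []
        for cx, cy, cz, l, w, h, yaw in boxes:
            dx, dy = px - cx, py - cy
            c, s = math.cos(yaw), math.sin(yaw)
            # Rotate the offset by -yaw to express it in the box's local frame.
            lx = c * dx + s * dy
            ly = -s * dx + c * dy
            row.append(abs(lx) <= l / 2 and abs(ly) <= w / 2 and abs(pz - cz) <= h / 2)
        mask.append(row)
    return mask


boxes = [(0.0, 0.0, 0.0, 4.0, 2.0, 2.0, math.pi / 2),  # long side rotated onto the y axis
         (10.0, 0.0, 0.0, 2.0, 2.0, 2.0, 0.0)]
points = [(0.0, 1.5, 0.0, 0.9),  # 4th column (e.g. intensity) is ignored
          (1.5, 0.0, 0.0, 0.2),
          (10.5, 0.5, -0.5, 0.4)]
print(points_in_yaw_boxes(points, boxes))
# [[True, False], [False, False], [False, True]]
```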

vision3d.ops.points_in_boxes_3d_indices(points, boxes, format)

Return the index of the box that contains each point.

If a point is inside multiple boxes, the box with the lowest index is returned.

Parameters:

points (Tensor) – Point cloud coordinates [N, 3+C].
boxes (Tensor) – 3D bounding boxes [M, K].
format (BoundingBox3DFormat) – Format of the bounding boxes.

Returns:

Integer tensor [N] with the index of the box each point belongs to, or -1 if the point is not in any box.

Return type:

Tensor
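Given a boolean [N, M] mask like the one produced by points_in_boxes_3d(), the lowest-index rule amounts to taking the first True entry in each row. A plain-Python sketch over nested lists follows; the name first_box_index is hypothetical.

```python
def first_box_index(mask):
    """mask[i][j] is True if point i lies in box j; returns one box index per point."""
    # next() yields the first j whose entry is True, or -1 if the row has none.
    return [next((j for j, inside in enumerate(row) if inside), -1) for row in mask]


mask = [[False, True, True],
        [False, False, False],
        [True, False, True]]
print(first_box_index(mask))  # [1, -1, 0]
```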

vision3d.ops.project_to_image(points_3d, extrinsics, intrinsics)

Project 3D points from the lidar frame to pixel coordinates.

Parameters:

points_3d (Tensor) – Points in the lidar frame [N, 3].
extrinsics (Tensor) – Lidar-to-camera homogeneous transformation [4, 4].
intrinsics (Tensor) – Camera intrinsic matrix [3, 3].

Returns:

(uv, depth) where uv is the pixel coordinates [N, 2] (u, v) and depth is the camera-frame depth [N].

Return type:

tuple[Tensor, Tensor]
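A minimal pinhole-projection sketch follows, using nested lists as matrices; the name project_points is hypothetical. It moves each point into the camera frame with the 4x4 extrinsics, applies the 3x3 intrinsics, and divides by the resulting homogeneous coordinate. With a standard intrinsic matrix whose last row is (0, 0, 1), that divisor equals the camera-frame depth. The sketch does not filter points behind the camera, and a point with zero depth would raise ZeroDivisionError. Whether project_to_image() masks such points is not stated above.

```python
def project_points(points_3d, extrinsics, intrinsics):
    """Project lidar-frame points to pixels; returns (uv, depth) as lists."""
    uv, depth = [], []
    for x, y, z in points_3d:
        # Lidar frame -> camera frame using rows 0-2 of the 4x4 homogeneous transform.
        cam = [extrinsics[r][0] * x + extrinsics[r][1] * y + extrinsics[r][2] * z + extrinsics[r][3]
               for r in range(3)]
        # Apply the 3x3 intrinsics, then divide by the homogeneous coordinate.
        img = [sum(intrinsics[r][k] * cam[k] for k in range(3)) for r in range(3)]
        uv.append((img[0] / img[2], img[1] / img[2]))
        depth.append(cam[2])  # camera-frame depth (z)
    return uv, depth


identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]  # for simplicity
K = [[100, 0, 50], [0, 100, 40], [0, 0, 1]]  # fx = fy = 100, principal point (50, 40)
print(project_points([(1.0, 2.0, 10.0)], identity, K))  # ([(60.0, 60.0)], [10.0])
```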

vision3d.ops.voxelize(points, point_cloud_range, voxel_size, max_points_per_voxel=32, max_voxels=None)

Bucket points into a 3D voxel grid.

For PointPillars-style pillars, set voxel_size = (dx, dy, z_max - z_min) so that the grid has a single z-slice and every kept point lands at iz = 0. For 3D voxel detectors (VoxelNet, SECOND, CenterPoint-Voxel), use a normal three-axis voxel_size.

The op is not differentiable.

Grid dimensions are computed per axis as round((max - min) / voxel_size), with ties rounded away from zero. Ranges that are not an exact multiple of voxel_size therefore round to the nearest integer cell count. When that count rounds down, points in the leftover partial cell at the upper end of the range are clamped to the final grid index. Points with non-finite coordinates (NaN, +/-Inf) are treated as out-of-range and dropped.
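The rounding rule can be sketched in a few lines. Python's built-in round() rounds ties to even (round(2.5) == 2), so a hand-written half-away-from-zero rounding is needed to match the rule above. The helper names are hypothetical. cell_index assumes each point's cell index along an axis is floor((value - min) / voxel_size), which the text above does not state explicitly.

```python
import math


def round_half_away(x):
    # Built-in round() uses banker's rounding (2.5 -> 2), so round ties away from zero by hand.
    return math.copysign(math.floor(abs(x) + 0.5), x)


def grid_shape(point_cloud_range, voxel_size):
    """Cells per axis as (nx, ny, nz)."""
    mins, maxs = point_cloud_range[:3], point_cloud_range[3:]
    return tuple(int(round_half_away((hi - lo) / v)) for lo, hi, v in zip(mins, maxs, voxel_size))


def cell_index(value, lo, size, n):
    """Assumed floor-based cell index, with the partial trailing cell clamped to n - 1."""
    return min(int(math.floor((value - lo) / size)), n - 1)


print(grid_shape((0.0, 0.0, -3.0, 10.0, 5.0, 1.0), (4.0, 2.0, 4.0)))  # (3, 3, 1)
print(round(2.5))  # 2, which would undercount the x and y axes above
# Range 10 with size 3 gives 3 cells; x = 9.5 falls in the partial 4th cell and is clamped.
print(cell_index(9.5, 0.0, 3.0, 3))  # 2
```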

Parameters:

points (Tensor) – [N, C] float32 point cloud. The first three columns must be (x, y, z). Remaining columns may be arbitrary per-point features. Points outside point_cloud_range along any axis are discarded.
point_cloud_range (tuple[float, float, float, float, float, float]) – (x_min, y_min, z_min, x_max, y_max, z_max).
voxel_size (tuple[float, float, float]) – (dx, dy, dz) voxel size.
max_points_per_voxel (int) – Cap on points stored per voxel. Points are stored in input order; once a voxel is full, later points that fall into it are dropped. Default: 32.
max_voxels (int | None) – Optional cap on the number of output voxels. None (default) means no cap. When set, points that would create a new voxel beyond max_voxels are silently dropped, while points landing in already-allocated voxels are still written (subject to max_points_per_voxel).

Returns:

(voxels, coords, num_points) where voxels is [P, max_points_per_voxel, C] per-voxel point buffers padded with zeros at unfilled slots, coords is [P, 3] int64 (iz, iy, ix) voxel indices, and num_points is [P] int64 point counts per voxel, each in [1, max_points_per_voxel]. P is the number of non-empty voxels (capped at max_voxels when set). When no points fall inside point_cloud_range, all three tensors have a leading dimension of zero.

Return type:

tuple[Tensor, Tensor, Tensor]
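The bucketing described above can be sketched in plain Python, with points given as tuples whose first three entries are (x, y, z). This is an illustration under stated assumptions, not the library's implementation:

- voxels are emitted in the order they are first created;
- range bounds are treated as inclusive;
- cell indices are computed with floor division.

The name voxelize_sketch is hypothetical.

```python
import math


def voxelize_sketch(points, point_cloud_range, voxel_size,
                    max_points_per_voxel=32, max_voxels=None):
    """Return (voxels, coords, num_points) as nested lists."""
    mins, maxs = point_cloud_range[:3], point_cloud_range[3:]
    # Cells per axis, rounding half away from zero (extents are positive).
    dims = [int(math.floor((hi - lo) / v + 0.5)) for lo, hi, v in zip(mins, maxs, voxel_size)]
    slots = {}  # (iz, iy, ix) -> points, in the order voxels are first created
    for p in points:
        xyz = p[:3]
        if not all(math.isfinite(c) for c in xyz):
            continue  # NaN / +-Inf are treated as out of range
        if not all(lo <= c <= hi for c, lo, hi in zip(xyz, mins, maxs)):
            continue  # outside point_cloud_range (bounds assumed inclusive)
        # Floor to a cell index, clamping the partial trailing cell to the last index.
        ix, iy, iz = (min(int((c - lo) // v), n - 1)
                      for c, lo, v, n in zip(xyz, mins, voxel_size, dims))
        key = (iz, iy, ix)
        if key not in slots:
            if max_voxels is not None and len(slots) >= max_voxels:
                continue  # would create a voxel beyond the cap: drop the point
            slots[key] = []
        if len(slots[key]) < max_points_per_voxel:
            slots[key].append(list(p))  # surplus points are dropped in input order
    num_features = len(points[0]) if points else 3
    # Pad each voxel's buffer with zero rows up to max_points_per_voxel.
    voxels = [pts + [[0.0] * num_features for _ in range(max_points_per_voxel - len(pts))]
              for pts in slots.values()]
    coords = list(slots.keys())
    num_points = [len(pts) for pts in slots.values()]
    return voxels, coords, num_points


points = [(0.5, 0.5, 0.5, 1.0), (0.6, 0.4, 0.1, 2.0), (1.5, 0.5, 0.5, 3.0),
          (5.0, 0.0, 0.0, 4.0), (float("nan"), 0.0, 0.0, 5.0)]
voxels, coords, num_points = voxelize_sketch(points, (0, 0, 0, 2, 2, 2), (1, 1, 2),
                                             max_points_per_voxel=4)
print(coords, num_points)  # [(0, 0, 0), (0, 0, 1)] [2, 1]
```

Python 3.7+ dictionaries preserve insertion order, which makes the max_voxels cap deterministic in this sketch. Later points that would open a new voxel are skipped, while points landing in existing voxels are still appended until max_points_per_voxel is reached.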