vision3d.tensors#

TVTensor subclasses with 3D semantics.

Functions

wrap(wrappee, *, like, **kwargs)

Convert a Tensor into the same TVTensor subclass as like.

Classes

BoundingBox3DFormat(*values)

Coordinate format of a 3D bounding box.

BoundingBoxes3D(data, *, format[, dtype, ...])

Tensor subclass for 3D bounding boxes with shape [N, K].

CameraExtrinsics(data, *[, dtype, device, ...])

Tensor subclass for camera extrinsic matrices with shape [N, 4, 4].

CameraImages(data, *[, dtype, device, ...])

Image subclass for multi-camera images with shape [N, C, H, W].

CameraIntrinsics(data, *, image_size[, ...])

Tensor subclass for camera intrinsic matrices with shape [N, 3, 3].

PointCloud3D(data, *[, dtype, device, ...])

Tensor subclass for 3D point clouds with shape [N, 3+C].

class vision3d.tensors.BoundingBox3DFormat(*values)[source]#

Bases: Enum

Coordinate format of a 3D bounding box.

Dimension convention:

  • l (length) = extent along X (forward axis, dx)

  • w (width) = extent along Y (lateral axis, dy)

  • h (height) = extent along Z (vertical axis, dz)

Rotation uses intrinsic Tait-Bryan ZY′X″ angles (yaw, pitch, roll) in radians, each in (-pi, +pi).

Available formats are:

  • XYZXYZ: axis-aligned box via two opposite corners; x1, y1, z1 (min corner), x2, y2, z2 (max corner). 6 values.

  • XYZLWH: center position and axis-aligned extents; cx, cy, cz (center), l, w, h (dx, dy, dz). 6 values.

  • XYZLWHY: center, extents, and yaw; cx, cy, cz, l, w, h, yaw. 7 values.

  • XYZLWHYPR: center, extents, and full Euler rotation; cx, cy, cz, l, w, h, yaw, pitch, roll. 9 values.
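
The two axis-aligned formats carry the same information in different parameterizations. A torch-only sketch of the conversion between them (the helper names `xyzxyz_to_xyzlwh` and `xyzlwh_to_xyzxyz` are illustrative, not part of the API):

```python
import torch

def xyzxyz_to_xyzlwh(boxes: torch.Tensor) -> torch.Tensor:
    """Convert [N, 6] min/max-corner boxes to center + extents."""
    mins, maxs = boxes[:, :3], boxes[:, 3:]
    center = (mins + maxs) / 2
    extents = maxs - mins  # (l, w, h) = (dx, dy, dz)
    return torch.cat([center, extents], dim=1)

def xyzlwh_to_xyzxyz(boxes: torch.Tensor) -> torch.Tensor:
    """Convert [N, 6] center + extents boxes back to corner form."""
    center, half = boxes[:, :3], boxes[:, 3:] / 2
    return torch.cat([center - half, center + half], dim=1)

box = torch.tensor([[0.0, 0.0, 0.0, 4.0, 2.0, 2.0]])  # XYZXYZ
centered = xyzxyz_to_xyzlwh(box)  # center (2, 1, 1), extents (4, 2, 2)
```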

class vision3d.tensors.BoundingBoxes3D(data, *, format, dtype=None, device=None, requires_grad=None)[source]#

Bases: TVTensor

Tensor subclass for 3D bounding boxes with shape [N, K].

Here N is the number of bounding boxes and K depends on the format: 6 for the axis-aligned formats (XYZXYZ, XYZLWH), 7 for yaw-only (XYZLWHY), and 9 for full 9-DOF (XYZLWHYPR).

Rotation angles are Euler angles in radians (-pi to +pi).

Parameters:
  • data (Any) – Any data that can be turned into a tensor with torch.as_tensor().

  • format (BoundingBox3DFormat, str) – Format of the 3D bounding box.

  • dtype (torch.dtype, optional) – Desired data type of the bounding box. If omitted, will be inferred from data.

  • device (torch.device, optional) – Desired device of the bounding box. If omitted and data is a Tensor, the device is taken from it. Otherwise, the bounding box is constructed on the CPU.

  • requires_grad (bool, optional) – Whether autograd should record operations on the bounding box. If omitted and data is a Tensor, the value is taken from it. Otherwise, defaults to False.

Return type:

Self
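
To make the XYZLWHY convention concrete, the following torch-only sketch (the helper `yaw_box_corners` is illustrative, not part of the API) expands one 7-value box into its 8 corners by rotating the axis-aligned corner offsets about Z:

```python
import math
import torch

def yaw_box_corners(box: torch.Tensor) -> torch.Tensor:
    """Expand one XYZLWHY box (7 values) into its 8 corners, shape [8, 3]."""
    cx, cy, cz, l, w, h, yaw = box.tolist()
    # All eight sign combinations of the half-extents (axis-aligned offsets).
    signs = torch.tensor([[sx, sy, sz] for sx in (-1.0, 1.0)
                          for sy in (-1.0, 1.0) for sz in (-1.0, 1.0)])
    offsets = signs * torch.tensor([l, w, h]) / 2
    # Rotate the offsets about Z by yaw, then translate to the box center.
    c, s = math.cos(yaw), math.sin(yaw)
    rot_z = torch.tensor([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return offsets @ rot_z.T + torch.tensor([cx, cy, cz])

corners = yaw_box_corners(torch.tensor([1.0, 2.0, 0.0, 4.0, 2.0, 2.0, 0.0]))
```

With yaw = 0 this reduces to the axis-aligned corner expansion; a non-zero yaw swaps which world axes the l and w extents span.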

class vision3d.tensors.CameraExtrinsics(data, *, dtype=None, device=None, requires_grad=None)[source]#

Bases: TVTensor

Tensor subclass for camera extrinsic matrices with shape [N, 4, 4].

Each matrix transforms a point from the dataset’s source frame to the camera frame. The source frame is dataset-defined: the lidar frame for lidar-equipped datasets (e.g. KITTI, nuScenes) and the ego/world frame for camera-only datasets.

3D spatial transforms (flip, rotate, etc.) update these matrices to keep the source-to-camera mapping consistent after the source frame changes.

Parameters:
  • data (Any) – Any data that can be turned into a tensor with torch.as_tensor().

  • dtype (torch.dtype, optional) – Desired data type.

  • device (torch.device, optional) – Desired device.

  • requires_grad (bool, optional) – Whether autograd should record operations.

Return type:

Self
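
The source-to-camera mapping is plain homogeneous-coordinate algebra. A torch-only sketch (the helper `to_camera_frame` is illustrative, not part of the API):

```python
import torch

def to_camera_frame(extrinsics: torch.Tensor, points: torch.Tensor) -> torch.Tensor:
    """Apply [N, 4, 4] extrinsics to [P, 3] source-frame points -> [N, P, 3]."""
    homo = torch.cat([points, torch.ones(points.shape[0], 1)], dim=1)  # [P, 4]
    return (homo @ extrinsics.transpose(1, 2))[..., :3]

# Identity rotation, camera origin offset 2 units from the source origin along X.
E = torch.eye(4).unsqueeze(0)
E[0, 0, 3] = 2.0
cam_pts = to_camera_frame(E, torch.tensor([[1.0, 0.0, 0.0]]))  # -> [[[3.0, 0.0, 0.0]]]
```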

class vision3d.tensors.CameraImages(data, *, dtype=None, device=None, requires_grad=None)[source]#

Bases: Image

Image subclass for multi-camera images with shape [N, C, H, W].

Inherits from torchvision.tv_tensors.Image, so every torchvision v2 image transform dispatches to it automatically.

For 3D spatial transforms (flip, rotate, etc.) this type passes through unchanged.

Parameters:
  • data (Any) – Any data that can be turned into a tensor with torch.as_tensor().

  • dtype (torch.dtype, optional) – Desired data type.

  • device (torch.device, optional) – Desired device.

  • requires_grad (bool, optional) – Whether autograd should record operations.

Return type:

Self

class vision3d.tensors.CameraIntrinsics(data, *, image_size, dtype=None, device=None, requires_grad=None)[source]#

Bases: TVTensor

Tensor subclass for camera intrinsic matrices with shape [N, 3, 3].

Each matrix maps from camera-frame 3D coordinates to pixel coordinates.

Image-space transforms (resize, crop, etc.) update these matrices. 3D spatial transforms pass them through unchanged.

Parameters:
  • data (Any) – Any data that can be turned into a tensor with torch.as_tensor().

  • image_size (tuple) – Height and width of the corresponding images as (h, w). Required by geometric transforms (e.g. resize) that need to compute scale factors.

  • dtype (torch.dtype, optional) – Desired data type.

  • device (torch.device, optional) – Desired device.

  • requires_grad (bool, optional) – Whether autograd should record operations.

Return type:

Self
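
The intrinsic update under a resize amounts to left-multiplying each matrix by a scaling matrix, so that focal lengths and principal points track the new pixel grid. A torch-only sketch (the helper `resize_intrinsics` is illustrative, not the library's transform):

```python
import torch

def resize_intrinsics(K: torch.Tensor, image_size, new_size) -> torch.Tensor:
    """Rescale [N, 3, 3] intrinsics when images go from (h, w) to (new_h, new_w)."""
    h, w = image_size
    new_h, new_w = new_size
    S = torch.diag(torch.tensor([new_w / w, new_h / h, 1.0]))
    return S @ K  # scales fx, cx by the x factor and fy, cy by the y factor

K = torch.tensor([[[1000.0, 0.0, 960.0],
                   [0.0, 1000.0, 540.0],
                   [0.0, 0.0, 1.0]]])
K_half = resize_intrinsics(K, (1080, 1920), (540, 960))  # halve both dimensions
```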

class vision3d.tensors.PointCloud3D(data, *, dtype=None, device=None, requires_grad=None)[source]#

Bases: TVTensor

Tensor subclass for 3D point clouds with shape [N, 3+C].

The first 3 columns are (x, y, z) coordinates. Additional columns are per-point features (e.g. intensity, color, normals).

Parameters:
  • data (Any) – Any data that can be turned into a tensor with torch.as_tensor().

  • dtype (torch.dtype, optional) – Desired data type. If omitted, will be inferred from data.

  • device (torch.device, optional) – Desired device. If omitted and data is a Tensor, the device is taken from it. Otherwise, the point cloud is constructed on the CPU.

  • requires_grad (bool, optional) – Whether autograd should record operations. If omitted and data is a Tensor, the value is taken from it. Otherwise, defaults to False.

Return type:

Self
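
Splitting coordinates from features follows directly from the column layout; a torch-only sketch of how a geometric transform (here an X-axis flip) acts on the coordinate columns while features ride along unchanged:

```python
import torch

# [N, 3 + C]: x, y, z followed by one intensity feature (C = 1).
points = torch.tensor([[0.0, 1.0, 2.0, 0.9],
                       [3.0, 4.0, 5.0, 0.1]])
xyz, features = points[:, :3], points[:, 3:]

# Flip along X: negate the x column only, keep features as-is.
flipped = torch.cat([xyz * torch.tensor([-1.0, 1.0, 1.0]), features], dim=1)
```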

vision3d.tensors.wrap(wrappee, *, like, **kwargs)[source]#

Convert a Tensor into the same TVTensor subclass as like.

If like is a BoundingBoxes3D, the format of like is assigned to wrappee unless overridden via kwargs.

Parameters:
  • wrappee (Tensor) – The tensor to convert.

  • like (TVTensor) – The reference. wrappee will be converted into the same subclass as like.

  • kwargs (Any) – Can contain "format" if like is a BoundingBoxes3D. Ignored otherwise.

Returns:

A TVTensor of the same subclass as like.

Return type:

TVTensor
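
Subclass-preserving wrapping of this kind is typically built on Tensor.as_subclass. A simplified torch-only sketch of the idea (this is not the library's implementation, and MyTVTensor is a hypothetical stand-in for a TVTensor subclass; the real wrap also propagates metadata such as the BoundingBoxes3D format):

```python
import torch

class MyTVTensor(torch.Tensor):
    """Hypothetical stand-in for a TVTensor subclass (illustrative only)."""

def wrap(wrappee: torch.Tensor, *, like: torch.Tensor) -> torch.Tensor:
    # Reinterpret wrappee as the same subclass as `like`, sharing storage.
    return wrappee.as_subclass(type(like))

ref = torch.zeros(3).as_subclass(MyTVTensor)
out = wrap(torch.ones(3), like=ref)
```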