vision3d.datasets#

3D object detection dataset loaders.

Functions

collate_fn(batch)

Collate a batch of (inputs, targets) without stacking.

Classes

CameraInputs

Camera-only sample: images, intrinsics, and extrinsics always present.

FusionInputs

Fusion sample: lidar plus multi-camera, every field present.

Kitti3D(root[, train, transforms, download])

KITTI 3D Dataset.

LidarInputs

Lidar-only sample: points always present.

NuScenes3D(root[, version, split, transforms])

nuScenes 3D object detection dataset.

SampleInputs

Per-frame model inputs; base type with all fields optional.

SampleTargets

Per-frame ground-truth annotations.

class vision3d.datasets.CameraInputs[source]#

Bases: SampleInputs

Camera-only sample: images, intrinsics, and extrinsics always present.

class vision3d.datasets.FusionInputs[source]#

Bases: SampleInputs

Fusion sample: lidar plus multi-camera, every field present.

class vision3d.datasets.Kitti3D(root, train=True, transforms=None, download=False)[source]#

Bases: Dataset[tuple[FusionInputs, SampleTargets | None]]

KITTI 3D Dataset.

Returns samples in lidar frame (X-forward, Y-left, Z-up), converting from KITTI’s camera-frame annotations automatically.

Parameters:
  • root (str or pathlib.Path) –

    Root directory where data is downloaded to. Expects the following folder structure if download=False:

    <root>
        └── Kitti3D/
            └── raw/
                ├── training/
                │   ├── velodyne/
                │   ├── label_2/
                │   ├── calib/
                │   └── image_2/
                └── testing/
                    ├── velodyne/
                    ├── calib/
                    └── image_2/

  • train (bool, optional) – If True, use the training split; otherwise use the test split. Defaults to True.

  • transforms (Callable, optional) – A function/transform that takes an input sample and its target and returns a transformed version.

  • download (bool, optional) – If True, downloads the dataset from the internet and places it in the root directory. If the dataset is already downloaded, it is not downloaded again.

download()[source]#

Download the KITTI data if it doesn’t exist already.

Return type:

None
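A minimal usage sketch (the root path is an example, and the "points" field name is assumed from the SampleInputs/LidarInputs summaries above):

```python
from torch.utils.data import DataLoader

from vision3d.datasets import Kitti3D, collate_fn

# Assumes the KITTI archives are already under <root>/Kitti3D/raw/
# (otherwise pass download=True).
dataset = Kitti3D(root="data", train=True)

inputs, targets = dataset[0]  # (FusionInputs, SampleTargets | None)
# Samples are returned in the lidar frame: X-forward, Y-left, Z-up.
points = inputs["points"]

# Variable-size fields cannot be stacked, so use the provided collate_fn.
loader = DataLoader(dataset, batch_size=2, collate_fn=collate_fn)
```

targets is None on the test split (train=False), since KITTI does not publish test-set labels.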

class vision3d.datasets.LidarInputs[source]#

Bases: SampleInputs

Lidar-only sample: points always present.

class vision3d.datasets.NuScenes3D(root, version='v1.0-mini', split='train', transforms=None)[source]#

Bases: Dataset[tuple[FusionInputs, SampleTargets]]

nuScenes 3D object detection dataset.

Returns samples in the global frame with annotations as BoundingBoxes3D in XYZLWHY format (center x, y, z; length, width, height; yaw, extracted from the orientation quaternion). Multi-camera images, intrinsics, and extrinsics are returned for all 6 cameras.

Requires the nuscenes-devkit package.

Parameters:
  • root (str or pathlib.Path) – Root directory of the nuScenes dataset.

  • version (str) – Dataset version. Default: "v1.0-mini".

  • split (str) – One of "train" or "val". Default: "train".

  • transforms (Callable, optional) – A function/transform that takes an input sample and its target and returns a transformed version.
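A minimal usage sketch (the root path is an example, and the "images" field name is assumed from the CameraInputs summary above):

```python
from vision3d.datasets import NuScenes3D

# v1.0-mini is the small subset distributed for prototyping;
# the nuscenes-devkit package must be installed.
dataset = NuScenes3D(root="data/nuscenes", version="v1.0-mini", split="train")

inputs, targets = dataset[0]  # (FusionInputs, SampleTargets)
images = inputs["images"]     # one image per camera, all 6 cameras
```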

class vision3d.datasets.SampleInputs[source]#

Bases: TypedDict

Per-frame model inputs; base type with all fields optional.

Fields are ReadOnly so dataset-specific subclasses can narrow them from NotRequired to Required.

class vision3d.datasets.SampleTargets[source]#

Bases: TypedDict

Per-frame ground-truth annotations.

vision3d.datasets.collate_fn(batch)[source]#

Collate a batch of (inputs, targets) without stacking.

Variable-size tensors (point clouds, bounding boxes) cannot be stacked into a single tensor. This collate function groups them as tuples, matching torchvision’s detection collate pattern.

Parameters:

batch (list[tuple[Any, Any]]) – List of (inputs, targets) from the dataset.

Returns:

Tuple of (inputs_tuple, targets_tuple).

Return type:

tuple[tuple[Any, ...], tuple[Any, ...]]
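Since the docstring says this matches torchvision's detection collate pattern, the behavior can be sketched as follows (a minimal sketch, not the actual implementation):

```python
def collate_fn(batch):
    # Transpose a list of (inputs, targets) pairs into a pair of tuples,
    # leaving variable-size tensors (point clouds, boxes) unstacked.
    inputs, targets = tuple(zip(*batch))
    return inputs, targets
```

Pass it to torch.utils.data.DataLoader via its collate_fn argument to batch any of the datasets in this module.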