vision3d.datasets#
3D object detection dataset loaders.
Functions

- … – Collate a batch of …

Classes

- CameraInputs – Camera-only sample: images, intrinsics, and extrinsics always present.
- FusionInputs – Fusion sample: lidar plus multi-camera, every field present.
- Kitti3D – KITTI 3D Dataset.
- LidarInputs – Lidar-only sample: points always present.
- NuScenes3D – nuScenes 3D object detection dataset.
- SampleInputs – Per-frame model inputs; base type with all fields optional.
- SampleTargets – Per-frame ground-truth annotations.
- class vision3d.datasets.CameraInputs[source]#
Bases: SampleInputs

Camera-only sample: images, intrinsics, and extrinsics always present.
- class vision3d.datasets.FusionInputs[source]#
Bases: SampleInputs

Fusion sample: lidar plus multi-camera, every field present.
- class vision3d.datasets.Kitti3D(root, train=True, transforms=None, download=False)[source]#
Bases: Dataset[tuple[FusionInputs, SampleTargets | None]]

KITTI 3D Dataset.
Returns samples in lidar frame (X-forward, Y-left, Z-up), converting from KITTI’s camera-frame annotations automatically.
- Parameters:
  - root (str or pathlib.Path) – Root directory where data is downloaded to. Expects the following folder structure if download=False:

        <root>
        └── Kitti3D/
            └── raw/
                ├── training/
                |   ├── velodyne/
                |   ├── label_2/
                |   ├── calib/
                |   └── image_2/
                └── testing/
                    ├── velodyne/
                    ├── calib/
                    └── image_2/

  - train (bool, optional) – Use train split if true, else test split. Defaults to True.
  - transforms (Callable, optional) – A function/transform that takes an input sample and its target as entry and returns a transformed version.
  - download (bool, optional) – If true, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
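The lidar-frame convention above follows the usual KITTI axes (camera: x-right, y-down, z-forward; lidar: x-forward, y-left, z-up). A minimal sketch of the bare axis remapping, ignoring the per-frame calibration matrices (Tr_velo_to_cam, R0_rect) that the real conversion also applies:

```python
def camera_to_lidar_axes(x, y, z):
    """Remap a point from KITTI camera axes (x-right, y-down, z-forward)
    to lidar axes (x-forward, y-left, z-up).

    Pure axis permutation for illustration only; the dataset's actual
    conversion also applies the calibrated rectification and
    velodyne-to-camera transforms.
    """
    return (z, -x, -y)


# A point 2 m right of, 1 m below, and 5 m ahead of the camera
# lands 5 m forward, 2 m to the right (negative left), 1 m down:
print(camera_to_lidar_axes(2.0, 1.0, 5.0))  # → (5.0, -2.0, -1.0)
```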
- class vision3d.datasets.LidarInputs[source]#
Bases: SampleInputs

Lidar-only sample: points always present.
- class vision3d.datasets.NuScenes3D(root, version='v1.0-mini', split='train', transforms=None)[source]#
Bases: Dataset[tuple[FusionInputs, SampleTargets]]

nuScenes 3D object detection dataset.

Returns samples in the global frame with annotations as BoundingBoxes3D in XYZLWHY format (yaw extracted from quaternion). Multi-camera images, intrinsics, and extrinsics are returned for all 6 cameras.

Requires the nuscenes-devkit package.

- Parameters:
  - root (str or pathlib.Path) – Root directory of the nuScenes dataset.
  - version (str) – Dataset version. Default: "v1.0-mini".
  - split (str) – One of "train" or "val". Default: "train".
  - transforms (Callable, optional) – A function/transform that takes an input sample and its target as entry and returns a transformed version.
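The yaw extraction mentioned above can be sketched with the standard ZYX Euler decomposition of a unit quaternion (w, x, y, z order). This is a minimal stand-alone sketch, not the devkit's implementation:

```python
import math


def yaw_from_quaternion(w, x, y, z):
    """Extract the rotation about the z (up) axis from a unit quaternion
    using the standard ZYX Euler-angle decomposition."""
    return math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))


# A pure rotation of 90 degrees about z is (cos 45°, 0, 0, sin 45°):
theta = math.pi / 2
q = (math.cos(theta / 2), 0.0, 0.0, math.sin(theta / 2))
print(yaw_from_quaternion(*q))  # ≈ 1.5708 (π/2)
```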
- class vision3d.datasets.SampleInputs[source]#
Bases: TypedDict

Per-frame model inputs; base type with all fields optional.

Fields are ReadOnly so dataset-specific subclasses can narrow them from NotRequired to Required.

- Variables:
points (vision3d.tensors._point_cloud_3d.PointCloud3D) – Lidar point cloud for the frame.
images (vision3d.tensors._camera.CameraImages) – Multi-camera image tensor, one row per camera.
extrinsics (vision3d.tensors._camera.CameraExtrinsics) – Lidar-to-camera transforms, one row per camera.
intrinsics (vision3d.tensors._camera.CameraIntrinsics) – Per-camera pinhole intrinsic matrices.
- class vision3d.datasets.SampleTargets[source]#
Bases: TypedDict

Per-frame ground-truth annotations.
- Variables:
boxes (vision3d.tensors._bounding_boxes_3d.BoundingBoxes3D) – 3D bounding boxes in the lidar frame.
labels (torch.Tensor) – Integer class labels, one per box.