vision3d.metrics#
3D object detection evaluation metrics.
Classes

| APInterpolation | AP interpolation mode. |
| MeanAveragePrecision3D | 3D detection mAP metric. |
| MeanAveragePrecision3DResult | Structured result returned by MeanAveragePrecision3D.compute(). |
| Prediction3D | Per-frame detection output. |
| Target3D | Per-frame ground-truth annotations. |
- class vision3d.metrics.APInterpolation(*values)[source]#
Bases: Enum
AP interpolation mode.
- R40 = 'r40'#
40-point interpolation (modern KITTI default).
- R11 = 'r11'#
11-point interpolation (legacy KITTI, PASCAL VOC07).
- R101 = 'r101'#
101-point interpolation (COCO).
- ALL_POINTS = 'all_points'#
Exact area under the curve, evaluated at every recall change (PASCAL VOC 2010+).
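The modes above differ mainly in how many evenly spaced recall points the precision-recall curve is sampled at. A minimal sketch of the classic interpolation rule (illustrative only, not the library's internals; note that KITTI's R40 variant additionally skips the recall-0 sample):

```python
def interpolated_ap(recalls, precisions, num_points):
    """AP as the mean of interpolated precision at evenly spaced
    recall points in [0, 1]. Interpolated precision at recall r is
    the best precision achieved at any recall >= r."""
    total = 0.0
    for i in range(num_points):
        r = i / (num_points - 1)
        candidates = [p for rec, p in zip(recalls, precisions) if rec >= r]
        total += max(candidates, default=0.0)
    return total / num_points

# 11-point (VOC07-style) AP for a tiny PR curve.
ap11 = interpolated_ap([0.0, 0.5, 1.0], [1.0, 1.0, 0.5], 11)
```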
- class vision3d.metrics.MeanAveragePrecision3D(class_ids, iou_thresholds=(0.5, 0.7), ap_interpolation=APInterpolation.R40, range_bins=None)[source]#
Bases: object
3D detection mAP metric.
Matching is greedy by descending score, one prediction to one ground truth, with precision/recall accumulated globally across frames (KITTI convention).
- Parameters:
class_ids (list[int]) – Integer class IDs to score. Predictions and GTs with labels outside this set are ignored.
iou_thresholds (tuple[float, ...]) – IoU thresholds to report AP at. Default (0.5, 0.7).
ap_interpolation (APInterpolation) – Interpolation mode. Default APInterpolation.R40.
range_bins (tuple[tuple[float, float], ...] | None) – Optional distance bins [low, high) in meters from the sensor origin. When set, AP is also broken down per bin; boxes are bucketed by their center’s distance.
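The greedy matching convention described above can be sketched in a few lines (a simplification, assuming a precomputed prediction-to-GT IoU matrix; the real metric operates on BoundingBoxes3D tensors):

```python
def greedy_match(scores, iou, iou_threshold):
    """Visit predictions in descending score order; each may claim
    at most one still-unmatched ground truth whose IoU meets the
    threshold. Returns (true-positive, false-positive) indices."""
    matched_gt = set()
    tps, fps = [], []
    for i in sorted(range(len(scores)), key=lambda k: -scores[k]):
        # Best still-free ground truth for this prediction.
        best_j, best_iou = None, iou_threshold
        for j, v in enumerate(iou[i]):
            if j not in matched_gt and v >= best_iou:
                best_j, best_iou = j, v
        if best_j is None:
            fps.append(i)
        else:
            matched_gt.add(best_j)
            tps.append(i)
    return tps, fps

# Two predictions compete for one GT; the higher-scoring one wins.
tps, fps = greedy_match([0.9, 0.8], [[0.8], [0.7]], 0.5)
```

Because matching is one-to-one, the lower-scoring duplicate becomes a false positive even though its IoU clears the threshold.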
- update(preds, targets)[source]#
Accumulate one or more frames of predictions vs ground truth.
- Parameters:
preds (list[Prediction3D]) – List of per-frame Prediction3D dicts.
targets (list[Target3D]) – List of per-frame Target3D dicts.
- Raises:
ValueError – If preds and targets differ in length.
- Return type:
None
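When range_bins is set, each box is bucketed by its center's Euclidean distance from the sensor origin into half-open [low, high) bins. A minimal sketch of that bucketing rule (hypothetical helper, not part of the API):

```python
import math

def range_bin_for(center, bins):
    """Return the [low, high) bin containing the box center's
    distance from the origin, or None if it falls outside all bins."""
    dist = math.sqrt(sum(c * c for c in center))
    for low, high in bins:
        if low <= dist < high:
            return (low, high)
    return None

bins = ((0.0, 30.0), (30.0, 60.0))
near = range_bin_for((3.0, 4.0, 0.0), bins)  # distance 5.0
far = range_bin_for((0.0, 0.0, 80.0), bins)  # distance 80.0, past every bin
```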
- compute()[source]#
Compute the aggregated metric.
- Returns:
Populated MeanAveragePrecision3DResult.
- Return type:
MeanAveragePrecision3DResult
- class vision3d.metrics.MeanAveragePrecision3DResult[source]#
Bases: TypedDict
Structured result returned by MeanAveragePrecision3D.compute(). Undefined slots (buckets with no ground-truth boxes accumulated) are reported as -1.0 and callers can filter them with x >= 0.
- Variables:
mAP (float) – Overall mean AP, taken over every defined (class, iou, bin) bucket.
mAP_per_class (dict[int, float]) – AP per class, averaged over the other axes.
AP_per_iou (dict[float, float]) – AP per IoU threshold, averaged over the other axes.
AP_per_class_per_iou (dict[tuple[int, float], float]) – AP per (class, iou) pair, averaged over range bins (or a single value when range bucketing is disabled).
AP_per_range (dict[tuple[float, float], float]) – AP per range bin, averaged over the other axes. Only present when range_bins was set on the metric.
AP_per_class_per_range (dict[tuple[int, tuple[float, float]], float]) – AP per (class, range_bin) pair, averaged over IoU thresholds. Only present when range_bins was set on the metric.
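Since undefined buckets are reported as -1.0, consumers typically filter them out before averaging. A small sketch with hypothetical helper names:

```python
def defined_only(ap_dict):
    """Keep only buckets that actually accumulated ground truth."""
    return {k: v for k, v in ap_dict.items() if v >= 0}

def mean_defined(ap_dict):
    """Average over defined buckets; -1.0 when nothing is defined."""
    vals = list(defined_only(ap_dict).values())
    return sum(vals) / len(vals) if vals else -1.0

# Class 1 had no ground-truth boxes, so its slot is undefined.
per_class = {0: 0.62, 1: -1.0, 2: 0.38}
```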
- class vision3d.metrics.Prediction3D[source]#
Bases: TypedDict
Per-frame detection output.
- Variables:
boxes (vision3d.tensors._bounding_boxes_3d.BoundingBoxes3D) – [N, K] predicted 3D bounding boxes; K depends on the box format.
scores (torch.Tensor) – [N] confidence scores.
labels (torch.Tensor) – [N] integer class labels.
- class vision3d.metrics.Target3D[source]#
Bases: TypedDict
Per-frame ground-truth annotations.
- Variables:
boxes (vision3d.tensors._bounding_boxes_3d.BoundingBoxes3D) – [M, K] ground-truth 3D bounding boxes.
labels (torch.Tensor) – [M] integer class labels.
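These two dicts are what update() consumes, one pair per frame. A minimal consistency check of the per-frame shape contract, using plain lists as stand-ins for the tensor fields (a sketch; the real fields are BoundingBoxes3D and torch.Tensor):

```python
def check_frame(pred, target):
    """Verify that the N prediction fields and M target fields
    are aligned, mirroring the [N]/[M] shapes documented above."""
    n = len(pred["boxes"])
    if not (len(pred["scores"]) == n == len(pred["labels"])):
        raise ValueError("prediction fields must all have length N")
    if len(target["boxes"]) != len(target["labels"]):
        raise ValueError("target fields must all have length M")
    return n, len(target["boxes"])

# Two predictions and one ground truth, 7-value boxes as placeholders.
n, m = check_frame(
    {"boxes": [[0] * 7, [0] * 7], "scores": [0.9, 0.4], "labels": [1, 2]},
    {"boxes": [[0] * 7], "labels": [1]},
)
```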