data.dataset

ImageDataset(metadata, root=None, transform=None, col_path='path', col_label='identity', load_label=True)

PyTorch-style dataset for image datasets.

Parameters:

Name Type Description Default
metadata DataFrame

A pandas dataframe containing image metadata.

required
root str | None

Root directory if paths in metadata are relative. If None, absolute paths in metadata are used.

None
transform Callable | None

A function that takes in an image and returns its transformed version.

None
col_path str

Column name in the metadata containing image file paths.

'path'
col_label str

Column name in the metadata containing class labels.

'identity'
load_label bool

If False, __getitem__ returns only the image instead of an (image, label) tuple.

True
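
The interaction between root and the paths in the col_path column can be sketched as follows. This is a minimal illustration under the description above, not the library's code; the helper name resolve_path is hypothetical.

```python
import os


def resolve_path(path, root=None):
    """Join a metadata path with root when root is given; otherwise use it as-is."""
    # Hypothetical helper: mirrors the documented behaviour of the root parameter.
    return path if root is None else os.path.join(root, path)


# Relative path from the metadata table, resolved against a root directory.
relative = resolve_path("images/a/image_1", root="ExampleDataset")

# With root=None the path is assumed to be absolute already.
absolute = resolve_path("/data/images/a/image_1")
```
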

Attributes:

Name Type Description
labels array

An integer array with the ordinal encoding of the labels.

labels_string array

A string array with the original labels.

labels_map dict

A mapping between labels and their ordinal encoding.

num_classes int

Return the number of unique classes in the dataset.
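
How labels, labels_string, labels_map and num_classes relate can be sketched in plain Python. This sketch assumes that classes are numbered in sorted order and that the map goes from original label to integer code; both are assumptions, and the library may order or orient the mapping differently. The helper name encode_labels is hypothetical.

```python
def encode_labels(labels_string):
    """Return (labels, labels_map) for a sequence of original labels."""
    # Assumption: integer codes follow the sorted order of the unique labels.
    classes = sorted(set(labels_string))
    labels_map = {name: code for code, name in enumerate(classes)}
    labels = [labels_map[name] for name in labels_string]
    return labels, labels_map


labels_string = ["a", "a", "b"]
labels, labels_map = encode_labels(labels_string)
num_classes = len(labels_map)
```
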

WildlifeDataset(metadata, root=None, transform=None, img_load='full', col_path='path', col_label='identity', load_label=True)

Bases: ImageDataset

PyTorch-style dataset for datasets from the wildlife-datasets library.

Parameters:

Name Type Description Default
metadata DataFrame

A pandas dataframe containing image metadata.

required
root str | None

Root directory if paths in metadata are relative. If None, absolute paths in metadata are used.

None
transform Callable | None

A function that takes in an image and returns its transformed version.

None
img_load str

Method to load images. Options: 'full', 'full_mask', 'full_hide', 'bbox', 'bbox_mask', 'bbox_hide', and 'crop_black'.

'full'
col_path str

Column name in the metadata containing image file paths.

'path'
col_label str

Column name in the metadata containing class labels.

'identity'
load_label bool

If False, __getitem__ returns only the image instead of an (image, label) tuple.

True

Attributes:

Name Type Description
labels array

An integer array with the ordinal encoding of the labels.

labels_string array

A string array with the original labels.

labels_map dict

A mapping between labels and their ordinal encoding.

num_classes int

Return the number of unique classes in the dataset.

FeatureDataset(features, metadata, col_label='identity', load_label=True)

PyTorch-style dataset for extracted features. Couples features with metadata.

Parameters:

Name Type Description Default
features list

List, np.array, or tensor of features. The order of the features must match the rows of metadata.

required
metadata DataFrame

A pandas dataframe containing features metadata.

required
col_label str

Column name in the metadata containing class labels.

'identity'
load_label bool

If False, __getitem__ returns only the feature instead of a (feature, label) tuple.

True

Attributes:

Name Type Description
labels array

An integer array with the ordinal encoding of the labels.

labels_string array

A string array with the original labels.

labels_map dict

A mapping between labels and their ordinal encoding.

num_classes int

Return the number of unique classes in the dataset.
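
The coupling of features with labels by position, and the effect of load_label, can be illustrated with a small stand-in class. This is a sketch only: PairedFeatures is a hypothetical name, not the library class, and it takes already encoded labels rather than a metadata dataframe.

```python
class PairedFeatures:
    """Hypothetical stand-in: couples features with labels by position."""

    def __init__(self, features, labels, load_label=True):
        # Features and labels are matched by index, so lengths must agree.
        if len(features) != len(labels):
            raise ValueError("features and labels must have the same length")
        self.features = features
        self.labels = labels
        self.load_label = load_label

    def __len__(self):
        return len(self.features)

    def __getitem__(self, idx):
        # Return a (feature, label) tuple, or only the feature if load_label is False.
        if self.load_label:
            return self.features[idx], self.labels[idx]
        return self.features[idx]


features = [[0.1, 0.9], [0.8, 0.2], [0.4, 0.6]]
labels = [0, 0, 1]

with_labels = PairedFeatures(features, labels)
without_labels = PairedFeatures(features, labels, load_label=False)
```

Because the pairing is purely positional, reordering or filtering the metadata without applying the same operation to the features would silently mismatch them.
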

FeatureDatabase(features, metadata, col_label='identity', load_label=True)

Bases: FeatureDataset

Alias for FeatureDataset

Wildlife dataset

WildlifeDataset is a class for creating PyTorch-style image datasets that integrates datasets provided by the wildlife-datasets library. It implements the __len__ and __getitem__ methods, so it can be used with standard PyTorch dataloaders for training and inference.

Metadata dataframe

The key part of WildlifeDataset is the metadata dataframe, which holds all information about the images in the dataset. A typical dataset from wildlife-datasets has the following metadata table:

image_id identity path split bbox segmentation
image_1 a images/a/image_1 train bbox compressed rle
image_2 a images/a/image_2 test bbox compressed rle
image_3 b images/b/image_3 train bbox compressed rle

The columns image_id, identity, and path are required; the other columns are optional. In the table above, bbox is a bounding box in the form [x, y, width, height] and can be stored either as a list or as a string. compressed rle is a segmentation mask in the compressed RLE format described by pycocotools.
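
Since a bbox may arrive either as a list or as its string form, code that consumes the column has to normalise it first. A minimal sketch using only the standard library is shown below; the helper names parse_bbox and to_corner_box are hypothetical, and the library's own parsing may differ.

```python
import ast


def parse_bbox(bbox):
    """Accept a bbox stored as a list or as its string form; return four floats."""
    if isinstance(bbox, str):
        # literal_eval safely parses strings such as "[10, 20, 100, 50]".
        bbox = ast.literal_eval(bbox)
    x, y, width, height = (float(v) for v in bbox)
    return x, y, width, height


def to_corner_box(bbox):
    """Convert [x, y, width, height] to (left, upper, right, lower)."""
    x, y, width, height = parse_bbox(bbox)
    return x, y, x + width, y + height


box_from_list = to_corner_box([10, 20, 100, 50])
box_from_string = to_corner_box("[10, 20, 100, 50]")
```

The (left, upper, right, lower) form is the one expected by Pillow's Image.crop, which makes it a convenient target when cropping to a bounding box.
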

Loading methods

If the metadata table has the optional bbox or segmentation columns, alternative image loading methods can be used.

Argument Loading effect
full Full image
full_mask Full image with redacted background
full_hide Full image with redacted foreground
bbox Bounding box crop
bbox_mask Bounding box crop with redacted background
bbox_hide Bounding box crop with redacted foreground
crop_black Black background crop, if there is one

Image loading methods
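
The effect of several of these options can be illustrated on a toy array with NumPy. This is a sketch under assumptions, not the library's implementation: "redacted" is taken to mean filled with zeros (black), the segmentation is assumed to be already decoded into a boolean mask, and the image is a small single-channel array.

```python
import numpy as np

# Toy 4x4 single-channel image with pixel values 1..16.
image = np.arange(1, 17).reshape(4, 4)

# Decoded segmentation mask: True marks the animal (foreground).
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True

# Bounding box in [x, y, width, height] form.
x, y, width, height = 1, 1, 2, 2

full_mask = np.where(mask, image, 0)  # full image, background set to zero
full_hide = np.where(mask, 0, image)  # full image, foreground set to zero
bbox = image[y:y + height, x:x + width]  # bounding box crop
bbox_mask = full_mask[y:y + height, x:x + width]  # crop with background set to zero
bbox_hide = full_hide[y:y + height, x:x + width]  # crop with foreground set to zero
```

Note that array indexing is row-first, so the crop slices y before x.
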

Example

from wildlife_tools.data.dataset import WildlifeDataset
import pandas as pd

metadata = pd.read_csv('ExampleDataset/metadata.csv')
dataset = WildlifeDataset(metadata, 'ExampleDataset')

# View first image in the dataset.
image, label = dataset[0]