data.dataset
ImageDataset(metadata, root=None, transform=None, col_path='path', col_label='identity', load_label=True)
PyTorch-style dataset for image datasets.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
metadata | DataFrame | A pandas dataframe containing image metadata. | required |
root | str \| None | Root directory if paths in metadata are relative. If None, paths in metadata are treated as absolute. | None |
transform | Callable \| None | A function that takes in an image and returns its transformed version. | None |
col_path | str | Column name in the metadata containing image file paths. | 'path' |
col_label | str | Column name in the metadata containing class labels. | 'identity' |
load_label | bool | If False, `__getitem__` returns only the image instead of an (image, label) tuple. | True |
Attributes:
Name | Type | Description |
---|---|---|
labels | array | An integer array with the ordinal encoding of the labels. |
labels_string | array | A string array of the original labels. |
labels_map | dict | A mapping between the original labels and their ordinal encoding. |
num_classes | int | The number of unique classes in the dataset. |
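The label attributes above are tied together by an ordinal encoding of the label column. A minimal plain-Python sketch of that relationship, assuming labels are numbered in order of first appearance and that `labels_map` maps each original label to its integer code (the library may order codes or orient the mapping differently); `encode_labels` is a hypothetical helper, not part of the library:

```python
from typing import Dict, List, Sequence, Tuple


def encode_labels(raw_labels: Sequence[str]) -> Tuple[List[int], List[str], Dict[str, int]]:
    """Hypothetical helper: ordinal-encode labels in order of first appearance.

    Returns (labels, labels_string, labels_map), mirroring the attribute names
    documented above. This is an illustration, not the library's code.
    """
    labels_string = [str(label) for label in raw_labels]
    labels_map: Dict[str, int] = {}
    for label in labels_string:
        # Assign the next free integer the first time a label is seen.
        if label not in labels_map:
            labels_map[label] = len(labels_map)
    labels = [labels_map[label] for label in labels_string]
    return labels, labels_string, labels_map


labels, labels_string, labels_map = encode_labels(["a", "a", "b", "c", "b"])
num_classes = len(labels_map)  # number of unique classes
print(labels)       # [0, 0, 1, 2, 1]
print(labels_map)   # {'a': 0, 'b': 1, 'c': 2}
print(num_classes)  # 3
```

With pandas, `pd.factorize` yields the same integer codes, since by default it also numbers values in order of first appearance.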
WildlifeDataset(metadata, root=None, transform=None, img_load='full', col_path='path', col_label='identity', load_label=True)
Bases: ImageDataset
PyTorch-style dataset for datasets from the wildlife-datasets library.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
metadata | DataFrame | A pandas dataframe containing image metadata. | required |
root | str \| None | Root directory if paths in metadata are relative. If None, paths in metadata are treated as absolute. | None |
transform | Callable \| None | A function that takes in an image and returns its transformed version. | None |
img_load | str | Method used to load images. Options: 'full', 'full_mask', 'full_hide', 'bbox', 'bbox_mask', 'bbox_hide', and 'crop_black'. | 'full' |
col_path | str | Column name in the metadata containing image file paths. | 'path' |
col_label | str | Column name in the metadata containing class labels. | 'identity' |
load_label | bool | If False, `__getitem__` returns only the image instead of an (image, label) tuple. | True |
Attributes:
Name | Type | Description |
---|---|---|
labels | array | An integer array with the ordinal encoding of the labels. |
labels_string | array | A string array of the original labels. |
labels_map | dict | A mapping between the original labels and their ordinal encoding. |
num_classes | int | The number of unique classes in the dataset. |
FeatureDataset(features, metadata, col_label='identity', load_label=True)
PyTorch-style dataset for extracted features. Couples features with their metadata.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
features | list | A list, np.array, or tensor of features. The i-th feature must correspond to the i-th row of metadata. | required |
metadata | DataFrame | A pandas dataframe containing the metadata for the features. | required |
col_label | str | Column name in the metadata containing class labels. | 'identity' |
load_label | bool | If False, `__getitem__` returns only the feature instead of a (feature, label) tuple. | True |
Attributes:
Name | Type | Description |
---|---|---|
labels | array | An integer array with the ordinal encoding of the labels. |
labels_string | array | A string array of the original labels. |
labels_map | dict | A mapping between the original labels and their ordinal encoding. |
num_classes | int | The number of unique classes in the dataset. |
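To make the coupling of features and metadata concrete, here is a simplified, hypothetical stand-in for FeatureDataset. It assumes metadata is a list of row dicts instead of a pandas DataFrame (to stay dependency-free) and that labels are encoded in order of first appearance; `ToyFeatureDataset` is illustrative only and is not the library's implementation:

```python
from typing import Any, Dict, List, Sequence


class ToyFeatureDataset:
    """Hypothetical, simplified stand-in for FeatureDataset (not the library's code).

    Couples features[i] with metadata row i and exposes the map-style
    __len__/__getitem__ protocol used by PyTorch dataloaders.
    """

    def __init__(self, features: Sequence[Any], metadata: List[Dict[str, Any]],
                 col_label: str = "identity", load_label: bool = True):
        if len(features) != len(metadata):
            raise ValueError("features and metadata must have the same length")
        self.features = features
        self.metadata = metadata
        self.load_label = load_label
        # Ordinal-encode labels in order of first appearance (an assumption).
        self.labels_string = [str(row[col_label]) for row in metadata]
        self.labels_map: Dict[str, int] = {}
        for label in self.labels_string:
            self.labels_map.setdefault(label, len(self.labels_map))
        self.labels = [self.labels_map[s] for s in self.labels_string]

    @property
    def num_classes(self) -> int:
        return len(self.labels_map)

    def __len__(self) -> int:
        return len(self.features)

    def __getitem__(self, idx: int):
        feature = self.features[idx]
        # With load_label=False only the feature is returned.
        if self.load_label:
            return feature, self.labels[idx]
        return feature


features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
metadata = [{"identity": "a"}, {"identity": "b"}, {"identity": "a"}]
dataset = ToyFeatureDataset(features, metadata)
print(len(dataset), dataset.num_classes)  # 3 2
print(dataset[1])  # ([0.3, 0.4], 1)
unlabeled = ToyFeatureDataset(features, metadata, load_label=False)
print(unlabeled[1])  # [0.3, 0.4]
```

The length check guards the requirement that the i-th feature matches the i-th metadata row; a mismatch there would silently pair features with the wrong labels.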
FeatureDatabase(features, metadata, col_label='identity', load_label=True)
Wildlife dataset
WildlifeDataset is a class for creating PyTorch-style image datasets, and it allows integration of datasets provided by the wildlife-datasets library. It implements the `__len__` and `__getitem__` methods, which allows using standard PyTorch dataloaders for training and inference.
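PyTorch's `torch.utils.data.DataLoader` accepts any map-style dataset that implements `__len__` and `__getitem__`. To show why those two methods are enough, here is a stdlib-only sketch of sequential batching; `iterate_batches` and `SquaresDataset` are hypothetical, and the sketch ignores shuffling, worker processes, and the tensor collation that the real DataLoader performs:

```python
from typing import Any, Iterator, List


def iterate_batches(dataset: Any, batch_size: int) -> Iterator[List[Any]]:
    """Hypothetical sketch of sequential batching over a map-style dataset.

    Only len(dataset) and dataset[i] are used, which is why implementing
    __len__ and __getitem__ is enough for index-based loading.
    """
    batch = []
    for idx in range(len(dataset)):
        batch.append(dataset[idx])
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly smaller batch (like drop_last=False)
        yield batch


class SquaresDataset:
    """Toy map-style dataset returning (value, label) pairs."""

    def __len__(self) -> int:
        return 5

    def __getitem__(self, idx: int):
        return idx * idx, idx % 2


batches = list(iterate_batches(SquaresDataset(), batch_size=2))
print(batches)  # [[(0, 0), (1, 1)], [(4, 0), (9, 1)], [(16, 0)]]
```

With PyTorch installed, the rough equivalent is `DataLoader(dataset, batch_size=2)`, which additionally collates each batch into tensors.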
Metadata dataframe
The key part of WildlifeDataset is the metadata dataframe, which includes all information about the images in the dataset. A typical dataset from the wildlife-datasets library has the following metadata table:
image_id | identity | path | split | bbox | segmentation |
---|---|---|---|---|---|
image_1 | a | images/a/image_1 | train | bbox | compressed rle |
image_2 | a | images/a/image_2 | test | bbox | compressed rle |
image_3 | b | images/b/image_3 | train | bbox | compressed rle |
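The table above can be built directly as a pandas DataFrame. A sketch with made-up bbox values and the segmentation column omitted; it also shows that a list-valued bbox column comes back as strings after a CSV round trip, which is one common way string-encoded boxes arise when metadata is later loaded with `pd.read_csv`:

```python
import io

import pandas as pd

# Example metadata mirroring the table above (bbox values are made up).
metadata = pd.DataFrame({
    "image_id": ["image_1", "image_2", "image_3"],
    "identity": ["a", "a", "b"],
    "path": ["images/a/image_1", "images/a/image_2", "images/b/image_3"],
    "split": ["train", "test", "train"],
    "bbox": [[10, 20, 100, 50], [0, 0, 64, 64], [5, 5, 30, 40]],
})

# Round-trip through CSV, as when the table is saved and read back.
buffer = io.StringIO()
metadata.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

print(type(metadata.loc[0, "bbox"]).__name__)  # list
print(type(reloaded.loc[0, "bbox"]).__name__)  # str
print(reloaded.loc[0, "bbox"])  # [10, 20, 100, 50]
```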
The columns `image_id`, `identity`, and `path` are required; other columns are optional. In the table above, `bbox` is a bounding box in the form [x, y, width, height] and can be stored either as a list or as a string. `compressed rle` is a segmentation mask in the compressed RLE format used by pycocotools.
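Because `bbox` may arrive either as a list or as its string form, a loader has to normalize it before cropping. A minimal sketch, assuming the string is a Python/JSON-style list literal such as "[10, 20, 100, 50]"; `bbox_to_crop_box` is a hypothetical helper that converts [x, y, width, height] into the (left, upper, right, lower) box that Pillow's `Image.crop` expects:

```python
import ast
from typing import Sequence, Tuple, Union

BBox = Union[str, Sequence[float]]


def bbox_to_crop_box(bbox: BBox) -> Tuple[float, float, float, float]:
    """Hypothetical helper: convert [x, y, width, height] to (left, upper, right, lower).

    Accepts the bbox either as a list/tuple or as its string form.
    """
    if isinstance(bbox, str):
        # literal_eval safely parses list literals without executing code.
        bbox = ast.literal_eval(bbox)
    x, y, width, height = bbox
    return (x, y, x + width, y + height)


print(bbox_to_crop_box([10, 20, 100, 50]))    # (10, 20, 110, 70)
print(bbox_to_crop_box("[10, 20, 100, 50]"))  # (10, 20, 110, 70)
```

With Pillow, `image.crop(bbox_to_crop_box(row["bbox"]))` would then cut out the bounding box region.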
Loading methods
If the metadata table has the optional `bbox` or `segmentation` columns, alternative image loading methods can be used.
Argument | Loading effect |
---|---|
full | Full image |
full_mask | Full image with redacted background |
full_hide | Full image with redacted foreground |
bbox | Bounding box crop |
bbox_mask | Bounding box crop with redacted background |
bbox_hide | Bounding box crop with redacted foreground |
crop_black | Image cropped to remove the black background, if there is one |
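The loading methods differ only in whether a foreground mask is applied or inverted and whether the result is cropped. A NumPy sketch of those semantics, assuming the segmentation has already been decoded into a boolean H x W mask (for example with pycocotools) and that "redacted" means set to black (zero); `load_variant` is hypothetical and is not the library's implementation:

```python
import numpy as np

METHODS = {"full", "full_mask", "full_hide", "bbox", "bbox_mask", "bbox_hide", "crop_black"}


def load_variant(image: np.ndarray, mask: np.ndarray, bbox, method: str) -> np.ndarray:
    """Hypothetical sketch of the loading methods in the table above.

    image: H x W x C uint8 array; mask: H x W boolean foreground mask;
    bbox: [x, y, width, height]. Redaction is assumed to mean zeroing pixels.
    """
    if method not in METHODS:
        raise ValueError(f"unknown method: {method}")
    if method.endswith("_mask"):
        image = image * mask[..., None]    # keep foreground, black out background
    elif method.endswith("_hide"):
        image = image * ~mask[..., None]   # keep background, black out foreground
    if method.startswith("bbox"):
        x, y, w, h = (int(v) for v in bbox)
        image = image[y:y + h, x:x + w]
    elif method == "crop_black":
        # Crop to the smallest box containing all non-black pixels.
        nonblack = image.any(axis=-1)
        rows = np.flatnonzero(nonblack.any(axis=1))
        cols = np.flatnonzero(nonblack.any(axis=0))
        if rows.size:  # leave all-black images unchanged
            image = image[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
    return image


image = np.full((4, 6, 3), 200, dtype=np.uint8)  # uniform gray image
mask = np.zeros((4, 6), dtype=bool)
mask[1:3, 2:5] = True                            # foreground region
bbox = [2, 1, 3, 2]                              # x, y, width, height

print(load_variant(image, mask, bbox, "full").shape)  # (4, 6, 3)
print(load_variant(image, mask, bbox, "bbox").shape)  # (2, 3, 3)
masked = load_variant(image, mask, bbox, "full_mask")
print(load_variant(masked, mask, bbox, "crop_black").shape)  # (2, 3, 3)
```

Applying `crop_black` to a `full_mask` result approximates a bounding box crop derived from the mask itself, which is why both calls above yield the same shape.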
Example
```python
import pandas as pd
from wildlife_tools.data.dataset import WildlifeDataset

# Load the metadata table; relative paths in it are resolved against root.
metadata = pd.read_csv('ExampleDataset/metadata.csv')
dataset = WildlifeDataset(metadata, 'ExampleDataset')

# View the first image in the dataset.
image, label = dataset[0]
```