Testing machine learning methods
The main goal of the package is to provide a simple way to test machine learning methods on multiple wildlife re-identification datasets.
from wildlife_datasets import datasets, loader, metrics
Data preparation
The datasets need to be downloaded first. Assuming that the MacaqueFaces dataset has already been downloaded, we load it:
d = loader.load_dataset(datasets.MacaqueFaces, 'data', 'dataframes')
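If the dataset has not been downloaded yet, it can be fetched first; a minimal sketch, assuming the get_data download helper and our choice of 'data/MacaqueFaces' as the target directory:
# Download MacaqueFaces into data/MacaqueFaces (assumes the get_data helper)
datasets.MacaqueFaces.get_data('data/MacaqueFaces')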
The package contains tools for creating splits. The following code creates a closed-set split (the identities in the training and testing sets are the same) with 80% of the samples in the training set:
from wildlife_datasets import splits
splitter = splits.ClosedSetSplit(0.8)
idx_train, idx_test = splitter.split(d.df)[0]
df_train = d.df.loc[idx_train]
image_id ... date
80 80 ... 2015-06-01
134 134 ... 2015-07-23
14 14 ... 2014-10-06
96 96 ... 2015-06-03
150 150 ... 2014-09-18
... ... ... ...
6240 6240 ... 2014-11-25
6205 6205 ... 2015-07-03
6202 6202 ... 2014-02-19
6277 6277 ... 2014-03-21
6134 6134 ... 2014-02-19
[5024 rows x 4 columns]
and similarly the testing set:
df_test = d.df.loc[idx_test]
image_id ... date
93 93 ... 2015-07-23
89 89 ... 2014-12-19
183 183 ... 2015-07-08
103 103 ... 2014-12-19
175 175 ... 2014-07-16
... ... ... ...
6262 6262 ... 2015-07-03
6101 6101 ... 2015-07-01
6149 6149 ... 2014-02-19
6175 6175 ... 2014-02-19
6113 6113 ... 2015-07-01
[1256 rows x 4 columns]
Photos in which the animal was not identified are ignored when creating the split. Therefore, the union of the training and testing sets may be smaller than the whole dataset. It is also possible to create custom splits.
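Besides the closed-set split above, other strategies are available. For example, an open-set split leaves some identities out of the training set entirely; a minimal sketch, assuming the OpenSetSplit class takes the training ratio and the ratio of identities reserved for the testing set:
# Open-set split (assumed signature): 80% of samples in the training set,
# 10% of identities appearing only in the testing set
splitter_open = splits.OpenSetSplit(0.8, 0.1)
idx_train, idx_test = splitter_open.split(d.df)[0]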
Write your ML method
Now write your method. As a trivial baseline, we create a prediction model that always predicts the name Dan:
y_pred = ['Dan']*len(df_test)
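A slightly stronger constant baseline would predict the most frequent identity in the training set instead of a hard-coded name. A minimal sketch using only pandas (y_pred_mode is our own name; the evaluation below keeps the Dan predictor):
# Predict the identity that occurs most often in the training set
most_common = df_train['identity'].value_counts().idxmax()
y_pred_mode = [most_common]*len(df_test)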
Evaluate the method
We implemented a scikit-learn-like interface for evaluation metrics. We can compute the accuracy:
y_true = df_test['identity']
metrics.accuracy(y_true, y_pred)
0.030254777070063694
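Accuracy is simply the fraction of correct predictions, so the value can be cross-checked by hand; a short sketch with numpy:
import numpy as np

# Fraction of predictions matching the true identities
np.mean(df_test['identity'].to_numpy() == np.array(y_pred))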
Mass evaluation
For mass evaluation of the developed method on multiple wildlife re-identification datasets, we first load the datasets:
ds = loader.load_datasets(
    [datasets.IPanda50, datasets.MacaqueFaces],
    'data',
    'dataframes'
)
and then run the analogous code in a loop, predicting the identity of the first training sample rather than a hard-coded name:
for d in ds:
    idx_train, idx_test = splitter.split(d.df)[0]
    df_train, df_test = d.df.loc[idx_train], d.df.loc[idx_test]
    # Constant baseline: predict the identity of the first training sample
    y_pred = [df_train.iloc[0]['identity']]*len(df_test)
    y_true = df_test['identity']
    print(metrics.accuracy(y_true, y_pred))
0.030254777070063694
0.03781818181818182
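To keep track of which score belongs to which dataset, the per-dataset accuracies can be collected into a labeled structure; a minimal sketch, assuming the dataset class name (e.g. MacaqueFaces) is a sufficient label:
import pandas as pd

results = {}
for d in ds:
    idx_train, idx_test = splitter.split(d.df)[0]
    df_train, df_test = d.df.loc[idx_train], d.df.loc[idx_test]
    y_pred = [df_train.iloc[0]['identity']]*len(df_test)
    # Label each score by the dataset class name
    results[type(d).__name__] = metrics.accuracy(df_test['identity'], y_pred)
print(pd.Series(results, name='accuracy'))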