Baseline results
This section shows how the toolkit can be used in realistic pipelines for preparing data, training models, and extracting features. Specifically, we provide guidelines for replicating the main results of the accompanying paper, together with baseline results. This includes training and inference with the MegaDescriptor flavours.
Prepare datasets
Preparation includes resizing images, cropping bounding boxes, and cropping the black backgrounds of segmented images. If multiple identities are present in one image (e.g. in the ATRW dataset), we crop them and split them into separate images. More details about preparing datasets, resizing, and splits can be found in this notebook.
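As a rough illustration of these steps (the notebook is authoritative), the sketch below crops an optional bounding box, trims black backgrounds with PIL's `getbbox()`, and resizes. The `prepare_image` helper and the `[x, y, w, h]` bounding-box layout are assumptions for illustration, not the toolkit's own preprocessing code.

```python
# Hedged sketch of the preparation steps; helper name and bounding-box
# layout are illustrative assumptions.
from PIL import Image


def prepare_image(path, bbox=None, size=518):
    img = Image.open(path).convert("RGB")
    if bbox is not None:
        # Crop an [x, y, w, h] bounding box.
        x, y, w, h = bbox
        img = img.crop((x, y, x + w, y + h))
    # getbbox() returns the extent of non-black pixels, which trims
    # the black background left around segmented animals.
    nonblack = img.getbbox()
    if nonblack is not None:
        img = img.crop(nonblack)
    return img.resize((size, size))
```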
We save and use two sets of images:
- 518×518 images, used for inference with CLIP, DINOv2, and MegaDescriptor-L-384
- 256×256 images, used for inference with MegaDescriptor-L/B/S/T-224
Dataset splits:
- Observations are split approximately into 80% training and 20% test sets.
- Each identity is present in both the training and the test set; images with unknown identities are discarded.
- The training sets are aggregated into a single dataset used for training the MegaDescriptors.
- The test set of each dataset is used for evaluation.
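The split logic above can be illustrated with plain pandas. This is a minimal sketch, assuming an `identity` metadata column and an `"unknown"` marker; it is not the dedicated splitting code used to produce the published splits.

```python
import pandas as pd


def closed_set_split(df: pd.DataFrame, train_ratio: float = 0.8, seed: int = 0):
    # Discard images with unknown identities.
    df = df[df["identity"] != "unknown"]
    # Sample ~80% of each identity for training so that every class
    # appears in both the training and the test set.
    train = df.groupby("identity", group_keys=False).apply(
        lambda g: g.sample(frac=train_ratio, random_state=seed)
    )
    test = df.drop(train.index)
    return train, test
```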
Training
Metadata for the aggregated dataset used for training can be found here.
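For orientation, metric-learning training in the spirit of the paper can be sketched with a timm Swin backbone and an ArcFace head; the training notebooks hold the actual recipe. The backbone name, the use of `ArcFaceLoss` from pytorch-metric-learning, the identity count, and the hyperparameters below are all illustrative assumptions.

```python
# Hedged training sketch; not the exact recipe from the paper.
import timm
import torch
from pytorch_metric_learning.losses import ArcFaceLoss

num_identities = 1000  # placeholder for the aggregated training set
backbone = timm.create_model(
    "swin_large_patch4_window12_384", num_classes=0, pretrained=True
)
loss_fn = ArcFaceLoss(
    num_classes=num_identities, embedding_size=backbone.num_features
)
# The ArcFace head has its own learnable weights, so it joins the optimizer.
optimizer = torch.optim.AdamW(
    list(backbone.parameters()) + list(loss_fn.parameters()), lr=1e-4
)

# One illustrative step on dummy data standing in for the training set.
images = torch.randn(2, 3, 384, 384)
labels = torch.randint(0, num_identities, (2,))
loss = loss_fn(backbone(images), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```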
Inference
In general, we use the DeepFeatures feature extractor, cosine similarity, and a 1-NN KnnClassifier. We provide metadata for each dataset and results for each model.
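The same pipeline can be mirrored with plain timm and torch, which is what DeepFeatures, cosine similarity, and KnnClassifier amount to for k=1. The `BVRA/MegaDescriptor-L-384` Hub name is assumed from the table below, and the dummy tensors stand in for real query and database datasets.

```python
# Minimal sketch of feature extraction + cosine 1-NN; dummy tensors
# stand in for extracted dataset images.
import timm
import torch
import torch.nn.functional as F

model = timm.create_model(
    "hf-hub:BVRA/MegaDescriptor-L-384", num_classes=0, pretrained=True
).eval()

query_imgs = torch.randn(4, 3, 384, 384)
db_imgs = torch.randn(16, 3, 384, 384)
db_labels = torch.randint(0, 5, (16,))

with torch.no_grad():
    q = F.normalize(model(query_imgs))  # L2-normalised embeddings
    d = F.normalize(model(db_imgs))

similarity = q @ d.T                            # cosine similarity matrix
predictions = db_labels[similarity.argmax(dim=1)]  # 1-NN labels
```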
Notebooks and weights
| Model | Training | Inference | Weights |
|---|---|---|---|
| MegaDescriptor-L-384 | notebook | notebook | HuggingFace Hub |
| MegaDescriptor-L-224 | notebook | notebook | HuggingFace Hub |
| MegaDescriptor-B-224 | notebook | notebook | HuggingFace Hub |
| MegaDescriptor-S-224 | notebook | notebook | HuggingFace Hub |
| MegaDescriptor-T-224 | notebook | notebook | HuggingFace Hub |
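Each flavour can be pulled from the HuggingFace Hub through timm. A brief sketch, assuming the repositories live under the `BVRA/` namespace with the names from the table:

```python
import timm

# Assumed Hub repository names; adjust if the weights live elsewhere.
for name in ["MegaDescriptor-L-384", "MegaDescriptor-L-224",
             "MegaDescriptor-B-224", "MegaDescriptor-S-224",
             "MegaDescriptor-T-224"]:
    model = timm.create_model(f"hf-hub:BVRA/{name}", num_classes=0, pretrained=True)
    print(name, model.num_features)  # embedding dimensionality
```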