Baseline results
This section shows how the toolkit can be used in realistic pipelines for preparing data, training models, and extracting features. Specifically, we provide guidelines for replicating the main results of the accompanying paper, together with baseline results. This includes training and inference with the MegaDescriptor flavours.
Prepare datasets
Preparation includes resizing images, cropping bounding boxes, and cropping the black backgrounds of segmented images. If multiple identities are present in one image (e.g. in the ATRW dataset), we crop them and split them into separate images. More details about preparing datasets, resizing, and splits can be found in this notebook.
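As a rough illustration of these steps (the notebook is authoritative), the sketch below crops an optional bounding box, trims black backgrounds with PIL's `getbbox()`, and resizes. The `prepare_image` helper and the `[x, y, w, h]` bounding-box layout are assumptions for illustration, not the toolkit's own preprocessing code.

```python
# Hedged sketch of the preparation steps; helper name and bounding-box
# layout are illustrative assumptions.
from PIL import Image


def prepare_image(path, bbox=None, size=518):
    img = Image.open(path).convert("RGB")
    if bbox is not None:
        # Crop an [x, y, w, h] bounding box.
        x, y, w, h = bbox
        img = img.crop((x, y, x + w, y + h))
    # getbbox() returns the extent of non-black pixels, which trims
    # the black background left around segmented animals.
    nonblack = img.getbbox()
    if nonblack is not None:
        img = img.crop(nonblack)
    return img.resize((size, size))
```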
We save and use two sets of images:
- 518×518 images, used for inference with CLIP, DINOv2, and MegaDescriptor-L-384
- 256×256 images, used for inference with MegaDescriptor-L/B/S/T-224
Dataset splits:
- Observations are split approximately into 80% training and 20% test sets.
- Each identity is present in both the training and the test set; images with unknown identities are discarded.
- The training sets are aggregated into a single dataset used for training the MegaDescriptors.
- The test set of each dataset is used for evaluation.
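The split logic above can be illustrated with plain pandas. This is a minimal sketch, assuming an `identity` metadata column and an `"unknown"` marker; it is not the dedicated splitting code used to produce the published splits.

```python
import pandas as pd


def closed_set_split(df: pd.DataFrame, train_ratio: float = 0.8, seed: int = 0):
    # Discard images with unknown identities.
    df = df[df["identity"] != "unknown"]
    # Sample ~80% of each identity for training so that every class
    # appears in both the training and the test set.
    train = df.groupby("identity", group_keys=False).apply(
        lambda g: g.sample(frac=train_ratio, random_state=seed)
    )
    test = df.drop(train.index)
    return train, test
```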
Training
Metadata for the aggregated dataset used for training can be found here.
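For orientation, metric-learning training in the spirit of the paper can be sketched with a timm Swin backbone and an ArcFace head; the training notebooks hold the actual recipe. The backbone name, the use of `ArcFaceLoss` from pytorch-metric-learning, the identity count, and the hyperparameters below are all illustrative assumptions.

```python
# Hedged training sketch; not the exact recipe from the paper.
import timm
import torch
from pytorch_metric_learning.losses import ArcFaceLoss

num_identities = 1000  # placeholder for the aggregated training set
backbone = timm.create_model(
    "swin_large_patch4_window12_384", num_classes=0, pretrained=True
)
loss_fn = ArcFaceLoss(
    num_classes=num_identities, embedding_size=backbone.num_features
)
# The ArcFace head has its own learnable weights, so it joins the optimizer.
optimizer = torch.optim.AdamW(
    list(backbone.parameters()) + list(loss_fn.parameters()), lr=1e-4
)

# One illustrative step on dummy data standing in for the training set.
images = torch.randn(2, 3, 384, 384)
labels = torch.randint(0, num_identities, (2,))
loss = loss_fn(backbone(images), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```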
Inference
In general, we use the DeepFeatures feature extractor, cosine similarity, and a 1-NN KnnClassifier. We provide metadata for each dataset and results for each model.
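The same pipeline can be mirrored with plain timm and torch, which is what DeepFeatures, cosine similarity, and KnnClassifier amount to for k=1. The `BVRA/MegaDescriptor-L-384` Hub name is assumed from the table below, and the dummy tensors stand in for real query and database datasets.

```python
# Minimal sketch of feature extraction + cosine 1-NN; dummy tensors
# stand in for extracted dataset images.
import timm
import torch
import torch.nn.functional as F

model = timm.create_model(
    "hf-hub:BVRA/MegaDescriptor-L-384", num_classes=0, pretrained=True
).eval()

query_imgs = torch.randn(4, 3, 384, 384)
db_imgs = torch.randn(16, 3, 384, 384)
db_labels = torch.randint(0, 5, (16,))

with torch.no_grad():
    q = F.normalize(model(query_imgs))  # L2-normalised embeddings
    d = F.normalize(model(db_imgs))

similarity = q @ d.T                            # cosine similarity matrix
predictions = db_labels[similarity.argmax(dim=1)]  # 1-NN labels
```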
Notebooks and weights
| Model | Training | Inference | Weights |
|---|---|---|---|
| MegaDescriptor-L-384 | notebook | notebook | HuggingFace Hub |
| MegaDescriptor-L-224 | notebook | notebook | HuggingFace Hub |
| MegaDescriptor-B-224 | notebook | notebook | HuggingFace Hub |
| MegaDescriptor-S-224 | notebook | notebook | HuggingFace Hub |
| MegaDescriptor-T-224 | notebook | notebook | HuggingFace Hub |
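Each flavour can be pulled from the HuggingFace Hub through timm. A brief sketch, assuming the repositories live under the `BVRA/` namespace with the names from the table:

```python
import timm

# Assumed Hub repository names; adjust if the weights live elsewhere.
for name in ["MegaDescriptor-L-384", "MegaDescriptor-L-224",
             "MegaDescriptor-B-224", "MegaDescriptor-S-224",
             "MegaDescriptor-T-224"]:
    model = timm.create_model(f"hf-hub:BVRA/{name}", num_classes=0, pretrained=True)
    print(name, model.num_features)  # embedding dimensionality
```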