
Inference

This module is concerned with making predictions on unseen data, using nearest-neighbour techniques on either feature vectors or a similarity matrix.

KnnClassifier

Reference

Predict the label of each query as the majority vote over the labels of its k nearest matches in the database. If the vote is tied at the given k, the prediction from k-1 is used. The input is a similarity matrix of shape n_query x n_database.
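The voting rule can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the library's implementation: the function name `knn_predict` is hypothetical, it uses `np.argsort` in place of `torch.topk`, and it assumes `labels[j]` is the label of database column `j`. A tie at k falls back to k-1, repeatedly, down to k=1, which cannot tie.

```python
import numpy as np


def knn_predict(similarity: np.ndarray, labels: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical sketch: k-NN majority vote over a similarity matrix.

    similarity has shape (n_query, n_database); labels[j] is the label of
    database column j (an assumption of this sketch).
    """
    # Columns sorted by decreasing similarity, keeping the k best per query.
    nearest = np.argsort(-similarity, axis=1, kind="stable")[:, :k]
    neighbour_labels = labels[nearest]

    preds = []
    for row in neighbour_labels:
        # Use the largest kk <= k whose vote is not tied (kk=1 never ties).
        for kk in range(k, 0, -1):
            vals, counts = np.unique(row[:kk], return_counts=True)
            top = np.sort(counts)[::-1]
            if len(top) == 1 or top[0] > top[1]:
                preds.append(vals[np.argmax(counts)])
                break
    return np.array(preds)


similarity = np.array([[0.9, 0.8, 0.7, 0.1],
                       [0.2, 0.9, 0.3, 0.8]])
labels = np.array(["cat", "dog", "dog", "cat"])
print(knn_predict(similarity, labels, k=1))  # ['cat' 'dog']
print(knn_predict(similarity, labels, k=2))  # ['cat' 'dog'] (both queries tie at k=2)
print(knn_predict(similarity, labels, k=3))  # ['dog' 'dog']
```

The toy data shows how a larger k can overturn the single nearest match, and how a tie at k=2 falls back to the k=1 answer.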

Parameters:

k (int, default 1): number of nearest database entries used for the majority voting.

database_labels (np.ndarray | None, default None): labels of the database entries, one per column of the similarity matrix. If provided, predictions are decoded to these (e.g. string) labels.

Returns:

1D array of length n_query with one prediction per query: a column index of the similarity matrix, or the corresponding database label if database_labels is provided.

Source code in inference/classifier.py
import numpy as np
import torch


class KnnClassifier():
    '''
    Predict query label as the majority vote of the labels of its k nearest matches in database.
    If there is a tie at given k, prediction from k-1 is used.
    Input is similarity matrix with `n_query` x `n_database` shape.

    Args:
        k: use k nearest in database for the majority voting.
        database_labels: labels of the database entries, one per column of the similarity matrix.
            If provided, the vote is taken over these (e.g. string) labels and they are returned.
    Returns:
        1D array with length `n_query` of predicted database labels, or of column indices
        of the similarity matrix if `database_labels` is None.
    '''

    def __init__(self, k: int = 1, database_labels: np.ndarray | None = None):
        self.k = k
        self.database_labels = database_labels

    def __call__(self, similarity):
        similarity = torch.tensor(similarity, dtype=torch.float64)
        # Column indices of the k most similar database entries for each query.
        _, idx = similarity.topk(k=self.k, dim=1)
        candidates = idx.numpy()

        # Vote over labels: column indices are unique within a row,
        # so a vote over the indices alone would tie for every k > 1.
        if self.database_labels is not None:
            candidates = np.asarray(self.database_labels)[candidates]
        return self.aggregate(candidates)[:, self.k - 1]

    def aggregate(self, predictions):
        '''
        Aggregates array of nearest neighbours to a single prediction for each k.
        If there is a tie at given k, prediction from k-1 is used.

        Args:
            predictions: array of shape [n_query, k] of nearest neighbours (labels or indices),
                ordered by decreasing similarity.
        Returns:
            array of shape [n_query, k] of predictions. Column dimensions are predictions for [k=1, k=2 ... k=k]
        '''
        predictions = np.asarray(predictions)
        n_query, max_k = predictions.shape
        results = np.empty((n_query, max_k), dtype=predictions.dtype)

        for i, row in enumerate(predictions):
            for k in range(1, max_k + 1):
                vals, counts = np.unique(row[:k], return_counts=True)
                # Compare the two largest vote counts (k=1 can never tie).
                top = np.sort(counts)[::-1]
                if len(top) > 1 and top[0] == top[1]:
                    results[i, k - 1] = results[i, k - 2]  # tie: keep the k-1 prediction
                else:
                    results[i, k - 1] = vals[np.argmax(counts)]
        return results

aggregate(predictions)

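Aggregates an array of nearest neighbours into a single prediction for each k. If there is a tie at a given k, the prediction from k-1 is used.

Parameters:

predictions: array of shape [n_query, k] of nearest neighbours, ordered by decreasing similarity.

Returns:

Array of shape [n_query, k] of predictions. Column j holds the prediction for k = j + 1, i.e. the columns are [k=1, k=2, ..., k=k].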
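To make the per-k output concrete, the sketch below computes the predictions for every k from a single row of neighbour labels, reusing the k-1 prediction whenever the two largest vote counts are equal. The helper `votes_per_k` is hypothetical and mirrors the documented behaviour for one row; it is not the library code.

```python
import numpy as np


def votes_per_k(row: np.ndarray) -> list:
    """Hypothetical helper: majority-vote prediction for every k = 1..len(row).

    row holds neighbour labels ordered by decreasing similarity.
    """
    preds = []
    for k in range(1, len(row) + 1):
        vals, counts = np.unique(row[:k], return_counts=True)
        top = np.sort(counts)[::-1]  # vote counts, largest first
        if len(top) > 1 and top[0] == top[1]:
            preds.append(preds[-1])  # tie at k: keep the k-1 prediction
        else:
            preds.append(str(vals[np.argmax(counts)]))
    return preds


print(votes_per_k(np.array(["a", "b", "b", "a", "c"])))  # ['a', 'a', 'b', 'b', 'b']
print(votes_per_k(np.array(["a", "a", "b", "c"])))       # ['a', 'a', 'a', 'a']
```

In the second row, "a" leads with two votes against single votes for "b" and "c" at k=4, so there is no tie even though the two smallest counts are equal.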


KnnMatcher

Reference

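Find the nearest match for each query in an existing database of features. Combines CosineSimilarity and KnnClassifier: the cosine similarity between the query features and database.features is passed to a KnnClassifier that returns labels from database.labels_string.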
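A minimal end-to-end sketch of what the matcher combines, for k=1. It assumes cosine similarity is the dot product of L2-normalised feature vectors; the `CosineSimilarity` class itself is not shown on this page. The function names and toy features are hypothetical. With k=1 the vote reduces to taking the label of the most similar database entry, and a list of feature batches is concatenated first, as `KnnMatcher` does.

```python
import numpy as np


def cosine_similarity(query: np.ndarray, database: np.ndarray) -> np.ndarray:
    """Cosine similarity between every query row and every database row."""
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    d = database / np.linalg.norm(database, axis=1, keepdims=True)
    return q @ d.T  # shape (n_query, n_database)


def match_1nn(query: np.ndarray, db_features: np.ndarray, db_labels: np.ndarray) -> np.ndarray:
    """Hypothetical 1-NN matcher: label of the most similar database entry."""
    sim = cosine_similarity(query, db_features)
    return db_labels[np.argmax(sim, axis=1)]


db_features = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
db_labels = np.array(["individual_1", "individual_2", "individual_3"])
query = [np.array([[0.9, 0.1]]), np.array([[0.1, 2.0]])]  # list of feature batches
print(match_1nn(np.concatenate(query), db_features, db_labels))  # ['individual_1' 'individual_2']
```

Normalising both sides makes the score depend only on the direction of the feature vectors, so the second query matches "individual_2" despite its larger magnitude.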

Source code in inference/classifier.py
class KnnMatcher():
    ''' 
    Find nearest match to query in existing database of features.
    Combines CosineSimilarity and KnnClassifier.
    '''

    def __init__(self, database, k=1):
        self.similarity = CosineSimilarity()
        self.database = database
        self.classifier = KnnClassifier(database_labels=self.database.labels_string, k=k)


    def __call__(self, query):
        if isinstance(query, list):
            query = np.concatenate(query)

        if not isinstance(query, np.ndarray):
            raise ValueError('Query should be array or list of features.')

        sim_matrix = self.similarity(query, self.database.features)['cosine']
        return self.classifier(sim_matrix)