The entity matching vocabulary.
Plain-language definitions of the terms you’ll meet across this site and the field. Each links to a fuller explanation.
Active learning
Active learning is a training strategy in which the model chooses which examples it wants labeled (typically the pairs it is most uncertain about), so it usually reaches high accuracy with far fewer labels than it would need if pairs were labeled at random.
Blocking
Blocking is the stage of entity matching that generates a small set of candidate record pairs likely to match, so that the matcher never has to compare every possible pair.
Deduplication
Deduplication is entity matching applied within a single dataset, finding and resolving records that refer to the same entity so each real-world thing appears once.
Entity matching
Entity matching is the task of identifying records that refer to the same real-world entity, across or within datasets, even when those records are not identical.
Matching
Matching is the stage of entity matching that decides, for each candidate pair produced by blocking, whether the two records refer to the same real-world entity.
Record linkage
Record linkage is the process of identifying records across two or more datasets that refer to the same entity. The term is most common in statistics and healthcare.
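To make blocking and matching concrete, here is a minimal Python sketch of the two stages applied to deduplication within one small dataset. Everything in it is an assumption made for illustration: the toy records, the blocking rule (first three letters of the name plus the city), the string-similarity matcher, and the 0.6 threshold. Real systems typically use more robust blocking and a trained matcher.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical toy records; ids, names, and cities are invented for illustration.
RECORDS = [
    {"id": 1, "name": "Acme Corporation", "city": "Berlin"},
    {"id": 2, "name": "ACME Corp.", "city": "Berlin"},
    {"id": 3, "name": "Globex GmbH", "city": "Munich"},
    {"id": 4, "name": "Acme Bakery", "city": "Hamburg"},
    {"id": 5, "name": "Globex", "city": "Munich"},
]


def blocking_key(record):
    """Assumed blocking rule: first three letters of the name plus the city."""
    return (record["name"][:3].lower(), record["city"].lower())


def block(records):
    """Blocking: group records by key and pair up only records sharing a key."""
    buckets = defaultdict(list)
    for record in records:
        buckets[blocking_key(record)].append(record)
    pairs = []
    for bucket in buckets.values():
        pairs.extend(combinations(bucket, 2))
    return pairs


def similarity(a, b):
    """Character-level similarity of the two names, between 0.0 and 1.0."""
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()


def match(pairs, threshold=0.6):
    """Matching: decide for each candidate pair whether it is the same entity."""
    return [(a["id"], b["id"]) for a, b in pairs if similarity(a, b) >= threshold]


candidates = block(RECORDS)
matches = match(candidates)

all_pairs = len(RECORDS) * (len(RECORDS) - 1) // 2
print(f"{len(candidates)} candidate pairs out of {all_pairs} possible")
print("matched id pairs:", matches)
```

With these five records, blocking passes only 2 of the 10 possible pairs to the matcher. The same two stages apply to record linkage; the only difference is that candidate pairs are drawn across two datasets instead of within one.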
Want the applied version?
See how these concepts turn into a working pipeline.