Schema Matching using Duplicates

Alexander Bilke & Felix Naumann
Most data integration applications require a matching between the schemas of the respective data sets. We show how the existence of duplicates within these data sets can be exploited to automatically identify matching attributes. We describe an algorithm that first discovers duplicates among data sets with unaligned schemas and then uses these duplicates to perform schema matching between schemas with opaque column names. Discovering duplicates among data sets with unaligned schemas is more difficult than...
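The second step described in the abstract, using known duplicates to match attributes, can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes the duplicate pairs have already been found, and it swaps in a generic string similarity (Python's `difflib.SequenceMatcher`) and a greedy one-to-one assignment. The names `field_similarity`, `match_attributes`, the `threshold` parameter, and the sample records are all hypothetical, chosen only for illustration.

```python
from difflib import SequenceMatcher
from itertools import product


def field_similarity(a, b):
    """Generic string similarity in [0, 1]; an assumed stand-in measure."""
    return SequenceMatcher(None, str(a).lower(), str(b).lower()).ratio()


def match_attributes(duplicates, threshold=0.5):
    """Match columns of two tables using record pairs believed to be duplicates.

    duplicates: list of (row_a, row_b) dict pairs, each pair assumed to
    describe the same real-world entity under different, opaque column names.
    Returns a dict mapping columns of the first table to columns of the second.
    """
    cols_a = list(duplicates[0][0])
    cols_b = list(duplicates[0][1])

    # Average the field similarity of every column pair over all duplicates.
    scores = {}
    for col_a, col_b in product(cols_a, cols_b):
        total = sum(field_similarity(ra[col_a], rb[col_b]) for ra, rb in duplicates)
        scores[(col_a, col_b)] = total / len(duplicates)

    # Greedy one-to-one assignment: take the best-scoring pairs first.
    matching = {}
    used_b = set()
    for (col_a, col_b), score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if score < threshold:
            break
        if col_a in matching or col_b in used_b:
            continue
        matching[col_a] = col_b
        used_b.add(col_b)
    return matching


# Hypothetical duplicate pairs with opaque column names.
duplicates = [
    ({"A1": "John Smith", "A2": "Berlin", "A3": "1975"},
     {"B1": "Berlin", "B2": "1975", "B3": "Jon Smith"}),
    ({"A1": "Maria Garcia", "A2": "Madrid", "A3": "1982"},
     {"B1": "Madrid", "B2": "1982", "B3": "Maria Garcia"}),
]

matching = match_attributes(duplicates)
print(matching)
```

Because the column names carry no meaning here, the match rests entirely on how similar the field values of duplicate records are. Averaging over several duplicates lets a small spelling difference in one record be outweighed by agreement in the others.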