
Identity Mapping and De-Duplicating

In an enterprise, merging master data, such as customer data, from multiple sources is a common problem. Typically, there is no single key, i.e. no shared key, that identifies a customer across the different sources. You have to match records based on the similarity of strings, such as names and addresses. In this session, we are going to check how the different string-comparison algorithms included in SQL Server 2008 R2 and SQL Server 2012 (Denali) work. We are going to use the Soundex Transact-SQL function, the four algorithms that come with Master Data Services in R2 (Levenshtein, Jaccard, Jaro-Winkler and Ratcliff-Obershelp), and the Fuzzy Lookup transformation from Integration Services. Finally, we are going to introduce how SQL Server 2012 Data Quality Services (DQS) helps us here. We are also going to tackle the performance problems of merging based on string matching.
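To make the named measures concrete, here is a minimal Python sketch of textbook versions of them, assuming plain strings with no case folding, trimming, or other normalization. It is not the SQL Server, Master Data Services, or Integration Services implementation; their tokenization, normalization, and scoring details may differ. All function names are illustrative. The Jaccard variant compares sets of character bigrams (q = 2) and the Jaro-Winkler prefix weight is p = 0.1; both are assumptions chosen for illustration. For Ratcliff-Obershelp the sketch relies on Python's `difflib.SequenceMatcher`, whose "gestalt pattern matching" is a refinement of that algorithm rather than an exact copy.

```python
from difflib import SequenceMatcher

# American Soundex digits; vowels and H, W, Y have no digit.
_SOUNDEX = {c: d for d, group in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                                  "4": "L", "5": "MN", "6": "R"}.items()
            for c in group}


def soundex(name: str) -> str:
    """Classic American Soundex: first letter plus three digits."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    code, prev = letters[0], _SOUNDEX.get(letters[0], "")
    for c in letters[1:]:
        digit = _SOUNDEX.get(c, "")
        if digit and digit != prev:
            code += digit
        if c not in "HW":  # H and W do not separate equal codes
            prev = digit
    return (code + "000")[:4]


def levenshtein(a: str, b: str) -> int:
    """Fewest single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Edit distance rescaled to 0..1, where 1 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def jaccard(a: str, b: str, q: int = 2) -> float:
    """|A & B| / |A | B| over the sets of character q-grams."""
    grams_a = {a[i:i + q] for i in range(len(a) - q + 1)}
    grams_b = {b[i:i + q] for i in range(len(b) - q + 1)}
    union = grams_a | grams_b
    if not union:  # both strings are shorter than q
        return float(a == b)
    return len(grams_a & grams_b) / len(union)


def jaro(a: str, b: str) -> float:
    """Jaro similarity: matches within a window, penalized by transpositions."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    used_b = [False] * len(b)
    matched_a = []
    for i, ca in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not used_b[j] and b[j] == ca:
                used_b[j] = True
                matched_a.append(ca)
                break
    m = len(matched_a)
    if m == 0:
        return 0.0
    matched_b = [cb for cb, used in zip(b, used_b) if used]
    t = sum(x != y for x, y in zip(matched_a, matched_b)) // 2
    return (m / len(a) + m / len(b) + (m - t) / m) / 3


def jaro_winkler(a: str, b: str, p: float = 0.1) -> float:
    """Boost the Jaro score for a common prefix of up to four characters."""
    j = jaro(a, b)
    prefix = 0
    for ca, cb in zip(a[:4], b[:4]):
        if ca != cb:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def ratcliff_obershelp(a: str, b: str) -> float:
    """difflib's gestalt matching, a refinement of Ratcliff/Obershelp."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


if __name__ == "__main__":
    for x, y in [("MARTHA", "MARHTA"), ("Robert", "Rupert"), ("Smith", "Smyth")]:
        print(f"{x} / {y}: soundex {soundex(x)}/{soundex(y)}, "
              f"levenshtein {levenshtein_similarity(x, y):.3f}, "
              f"jaccard {jaccard(x, y):.3f}, jaro-winkler {jaro_winkler(x, y):.3f}, "
              f"ratcliff-obershelp {ratcliff_obershelp(x, y):.3f}")
```

Soundex only answers whether two names sound alike, while the other four measures return graded scores between 0 and 1 that a matching job would compare against a threshold. Scoring every record against every other record grows quadratically with the number of records, which is why performance becomes a concern for large merges.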
Presented by Dejan Sarka at SQLBits XI
  • Speaker Bio
    Dejan Sarka, SQL Server MVP, focuses on the development of database and business intelligence applications. Besides working on projects, he spends about half of his time on training and mentoring. He is the founder of the Slovenian SQL Server and .NET Users Group. Dejan Sarka is the main author or coauthor of eleven books about databases and SQL Server. He also developed two courses for Solid Quality Mentors: Data Modeling Essentials and Data Mining with SQL Server.
    http://blogs.solidq.com/dsarka/Home.aspx?language=english