More and more of the data that organizations collect is stored in an encoded, essentially binary, format. This can be images, audio files, video, or even common formats like Word and Excel files. These files contain a lot of important information, but the formatting must be stripped out before users can search them effectively.

This presentation starts with a discussion of the three types of data in SQL Server to set the framework. It demos and explains:
  • structured data
  • semi-structured data
  • unstructured data
The talk then looks at how unstructured data is stored in SQL Server, with a brief look at FILESTREAM and FileTable. There is a short discussion of full-text search, including the changes in SQL Server 2012, before moving on to the IFilter interfaces, which are used to search binary data while ignoring the encoding.

There are demos of the basics of CONTAINS and FREETEXT searches, along with some of the more advanced options, such as customizable NEAR and weighting of search terms. The talk finishes with a short look at the new semantic search feature in SQL Server 2012.
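
As a rough illustration of the kinds of full-text queries mentioned above, the sketch below shows CONTAINS, FREETEXT, the customizable NEAR introduced in SQL Server 2012, and term weighting through CONTAINSTABLE. It is not taken from the session's demo code: the table dbo.Documents, its columns, and the search terms are hypothetical, and the queries assume a full-text index already exists on the DocContent column.

```sql
-- Assumption: a hypothetical table dbo.Documents (DocId int primary key,
-- DocContent varbinary(max), FileExtension nvarchar(8)) with a full-text
-- index on DocContent that uses FileExtension as its TYPE COLUMN, so the
-- matching IFilter can extract text from the binary content.

-- CONTAINS: match a specific word.
SELECT DocId
FROM dbo.Documents
WHERE CONTAINS(DocContent, N'"index"');

-- FREETEXT: match on the meaning of the phrase rather than exact wording.
SELECT DocId
FROM dbo.Documents
WHERE FREETEXT(DocContent, N'rebuilding a fragmented index');

-- Customizable NEAR (SQL Server 2012): both terms within 5 terms of each
-- other, in the order given (TRUE enforces the order).
SELECT DocId
FROM dbo.Documents
WHERE CONTAINS(DocContent, N'NEAR((backup, restore), 5, TRUE)');

-- Weighted terms: ISABOUT influences the rank returned by CONTAINSTABLE.
SELECT d.DocId, ft.[RANK]
FROM CONTAINSTABLE(
         dbo.Documents,
         DocContent,
         N'ISABOUT (backup WEIGHT (0.8), restore WEIGHT (0.2))'
     ) AS ft
JOIN dbo.Documents AS d
    ON d.DocId = ft.[KEY]
ORDER BY ft.[RANK] DESC;
```

CONTAINSTABLE is used for the weighted query because the WEIGHT values only affect the returned rank, which the CONTAINS predicate does not expose.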
Presented by Steve Jones at SQLBits XI
Slide Deck 2.1 MB
Demo Code 8 KB
MP4 Video Med 175 MB
MP4 Video HD 482 MB