Big Data & Politics – Analysing Free-text Docs in HDInsight
In this session we will walk through how we collected States of Jersey Hansard transcripts from the web, analysed them using HDInsight and loaded them into a data warehouse to be queried and visualised.
The transcripts are unstructured, free-text documents with all the errors and inconsistencies a human can devise! So how do we do it? How can we impose some structure on them and turn them into something we can work with?
Technologies we will cover include:
- Data Quality Services
- SQL Server Tabular/DAX
This session will introduce these technologies, show how they can be applied to this kind of problem, and explain how to get started with them.
Charles has nearly ten years' experience, first as a .NET software developer and then as a BI consultant working across the full stack of Microsoft technologies. Being based in Jersey, he mainly works in Financial Services, which often takes him to the UK or Europe.
His current professional interests are Big Data and Data Science, and he helps organise a local tech meetup group, techtribes.je. He can be found on Twitter (@charles_jsy) and LinkedIn.