The analysis of raw data requires us to find and understand complex patterns in that data.  We all have a toolbox of techniques and methodologies that we use; the more tools we have, the better we are at the job of analysis.  Some of these tools, such as data mining, are well known.  This talk covers some of the less well-known techniques that are still directly applicable to this kind of analytics.  Last year at SQLBits I gave a two-hour session on four such topics:
  • Monte Carlo simulations (MCS)
  • Nyquist’s Theorem
  • Benford’s Law
  • Simpson’s paradox    
I will not assume that you attended last year’s talk, although if you did and enjoyed it, you will very likely enjoy this one!  This session focuses on more of these invaluable techniques.  For example, we’ll talk about:
  • Dark Data
  • Probability calculations
  • RFI    
In each case I try to give you an understanding, not of the maths behind these techniques, but of how they work, why they work and (most importantly) why it is to your advantage to know about them.  I have genuinely chosen only techniques that I have found invaluable in my commercial work. 
Presented by Mark Whitehorn at SQLBits XV