Sam has spent the past 15+ years developing, testing and deploying a wide range of machine-learning-based systems that solve complex problems involving large datasets, and has real-world experience in:
● Machine learning pipelines: building automated end-to-end data and ML pipelines using CI/CD tools such as VSTS, covering data ingest -> data prep (feature engineering and selection) -> model build -> model serve & feature store (cloud, mobile, edge) -> A/B testing.
● Machine learning methodologies: deep learning, regression, decision trees/forests, Bayesian nets, very fast exact/approximate nearest neighbour search, clustering methods, handling overfitting (a priori residual variance estimates, cross validation, Monte Carlo, White's Reality Check), handling multiple testing (Bonferroni, False Discovery Rate), optimisation theory and implementation (Quasi-Newton, simulated annealing, genetic algorithms, tabu search).
● Time series modelling: stochastic processes, chaotic time series, transfer functions, neural networks, LLR, input selection.
● Data preparation: denoising, feature extraction, feature engineering, feature selection, unbiased estimators, data cleansing, dimension reduction, EDA, sampling, stratified sampling.
● Programming: 20 years of experience; proficient in R, Python and .NET.
● Modern data architectures/processing: Spark, GPU-based processing.
Sessions