Research

Research Interests

I am broadly interested in the development and application of machine learning methods. As a postdoctoral researcher at the Alan Turing Institute, I focused on developing AI-based tools for data wrangling, in an effort to automate the tedious manual tasks of data preparation and data cleaning that often precede a machine learning analysis. I've worked on deep generative models, change point detection, data parsing, matrix factorization, multiclass SVMs, and sparse regression, among other things. Because my research often focuses on developing methods that work well in the real world, I have also created easy-to-use software packages for most of my research projects.

For more about me, check out my industry resume or academic CV.

Publications

Journal articles:

  • AI Assistants: A Framework for Semi-Automated Data Wrangling (PDF)
    IEEE Transactions on Knowledge and Data Engineering, 35(9):9295–9306.
  • Wrangling Messy CSV Files by Detecting Row and Type Patterns (HTML, PDF)
    Data Mining and Knowledge Discovery, 33:1799–1820.
  • GenSVM: A Generalized Multiclass Support Vector Machine (PDF)
    Journal of Machine Learning Research, 17(224):1–42.
    Code: C, R, Python

Conference proceedings:

  • On Memorization in Probabilistic Deep Generative Models (PDF)
    Advances in Neural Information Processing Systems, 34:27916–27928.
  • Probabilistic Sequential Matrix Factorization (PDF)
    Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, (PMLR 130):3484–3492.

Preprints:

  • An Evaluation of Change Point Detection Algorithms (PDF)
    arXiv preprint 2003.06222.
  • Fast Meta-Learning for Adaptive Hierarchical Classifier Design (PDF)
    arXiv preprint 1711.03512.
    Code: Python
  • SparseStep: Approximating the Counting Norm for Sparse Regularization (PDF)
    arXiv preprint 1701.06967.
    Code: R

Dissertation:

  • Algorithms for Multiclass Classification and Regularized Regression (PDF)
    Erasmus University Rotterdam.

Software

I aim to make my research accessible by providing software packages for the methods I develop.

  • CleverCSV. Implements the method from this paper. PyPI - GitHub.
  • SmartSVM. Implements the SmartSVM classifier from this paper. PyPI - GitHub.
  • SparseStep. Implements the SparseStep method from this paper. CRAN - GitHub.
  • GenSVM. Implements the GenSVM method from this paper. PyPI - CRAN - GitHub.
  • Abed. Tool for benchmarking ML methods on compute clusters. PyPI - GitHub.
  • SyncRNG. The same random numbers in R and Python. CRAN - PyPI - GitHub.

Teaching

Lecturer:

  • Programming – part-time lecturer, set up and pioneered the use of Autolab for this course (2015, 2016)

Thesis Supervision:

Teaching assistant: