Research

Research Interests

I am broadly interested in the development and application of machine learning methods. In my current position as a postdoctoral researcher at the Alan Turing Institute, I focus on developing AI-based tools for data wrangling, in an effort to automate the tedious manual tasks of data preparation and data cleaning that often precede a machine learning analysis. I've worked on change point detection, data parsing, matrix factorization, multiclass SVMs, and sparse regression, among other things. Because my research often focuses on developing methods that work well in the real world, I have also created easy-to-use software packages for most of my research projects.

For more about me, check out my industry resume or academic CV.

Publications

Journal articles:

  • Wrangling Messy CSV Files by Detecting Row and Type Patterns (HTML, PDF)
    Data Mining and Knowledge Discovery.
  • GenSVM: A Generalized Multiclass Support Vector Machine (PDF)
    Journal of Machine Learning Research, 17(224):1–42.
    Code: C, R, Python

Conference proceedings:

  • Probabilistic Sequential Matrix Factorization (PDF)
    Accepted for publication at AISTATS.

Preprints:

  • On Memorization in Probabilistic Deep Generative Models (PDF)
    arXiv preprint 2106.03216.
  • An Evaluation of Change Point Detection Algorithms (PDF)
    arXiv preprint 2003.06222.
  • Fast Meta-Learning for Adaptive Hierarchical Classifier Design (PDF)
    arXiv preprint 1711.03512.
    Code: Python
  • SparseStep: Approximating the Counting Norm for Sparse Regularization (PDF)
    arXiv preprint 1701.06967.
    Code: R

Dissertation:

  • Algorithms for Multiclass Classification and Regularized Regression (PDF)
    Erasmus University Rotterdam.

Software

I aim to make my research accessible by providing software packages for the methods I develop.

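  • CleverCSV. Implements the method from "Wrangling Messy CSV Files by Detecting Row and Type Patterns". PyPI - GitHub.
  • SmartSVM. Implements the SmartSVM classifier from "Fast Meta-Learning for Adaptive Hierarchical Classifier Design". PyPI - GitHub.
  • SparseStep. Implements the SparseStep method from "SparseStep: Approximating the Counting Norm for Sparse Regularization". CRAN - GitHub.
  • GenSVM. Implements the GenSVM method from "GenSVM: A Generalized Multiclass Support Vector Machine". PyPI - CRAN - GitHub.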
  • Abed. Tool for benchmarking ML methods on compute clusters. PyPI - GitHub.
  • SyncRNG. The same random numbers in R and Python. CRAN - PyPI - GitHub.

Teaching

Lecturer:

  • Programming – part-time lecturer, set up and pioneered the use of Autolab for this course (2015, 2016)

Thesis Supervision:

Teaching assistant: