Research

Research Interests

I am broadly interested in the development and application of machine learning methods. Currently, I focus on developing AI-based tools for data wrangling, with the aim of automating the tedious data preparation and data cleaning tasks that often precede a machine learning analysis. My past research covered multiclass classification (typically involving SVMs), meta-learning, and hierarchical classifier design. I have also worked on regularized regression methods, where I am mainly interested in optimization algorithms for non-convex problems. On the practical side, I have developed software packages for most of my research projects, as well as a command-line tool that automates benchmarking of machine learning methods on distributed architectures.

Publications

Journal articles:

  • Wrangling Messy CSV Files by Detecting Row and Type Patterns (HTML, PDF)
    Data Mining and Knowledge Discovery.
  • GenSVM: A Generalized Multiclass Support Vector Machine (PDF)
    Journal of Machine Learning Research, 17(224):1-42.
    Code: C, R, Python

Preprints:

  • Fast Meta-Learning for Adaptive Hierarchical Classifier Design (PDF)
    arXiv preprint 1711.03512.
    Code: Python
  • SparseStep: Approximating the Counting Norm for Sparse Regularization (PDF)
    arXiv preprint 1701.06967.
    Code: R

Dissertation:

  • Algorithms for Multiclass Classification and Regularized Regression (PDF)
    Erasmus University Rotterdam.

Software

I aim to make my research accessible by providing software packages for the methods I develop.

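  • CleverCSV. Implements the method from "Wrangling Messy CSV Files by Detecting Row and Type Patterns". PyPI - GitHub.
  • SmartSVM. Implements the SmartSVM classifier from "Fast Meta-Learning for Adaptive Hierarchical Classifier Design". PyPI - GitHub.
  • SparseStep. Implements the SparseStep method from "SparseStep: Approximating the Counting Norm for Sparse Regularization". CRAN - GitHub.
  • GenSVM. Implements the GenSVM method from "GenSVM: A Generalized Multiclass Support Vector Machine". PyPI - CRAN - GitHub.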
  • Abed. Tool for benchmarking ML methods on compute clusters. PyPI - GitHub.
  • SyncRNG. The same random numbers in R and Python. CRAN - PyPI - GitHub.

Teaching

Lecturer:

  • Programming – part-time lecturer, set up and pioneered the use of Autolab for this course (2015, 2016)

Thesis Supervision:

Teaching assistant: