## Research Interests

I am broadly interested in the development and application of machine learning methods. In my current position as a postdoctoral researcher at the Alan Turing Institute, I focus on developing AI-based tools for data wrangling, in an effort to automate the tedious manual tasks of data preparation and data cleaning that often precede a machine learning analysis. I've worked on change point detection, data parsing, matrix factorization, multiclass SVMs, and sparse regression, among other things. Because my research often focuses on developing methods that work well in the real world, I have also created easy-to-use software packages for most of my research projects.

For more about me, check out my industry resume or academic CV.

## Publications

Journal articles:

- Wrangling Messy CSV Files by Detecting Row and Type Patterns (HTML; PDF)

  G. J. J. van den Burg, A. Nazabal, and C. Sutton. *Data Mining and Knowledge Discovery*, 2019. Code: Python – Reproducible Research Repo

- GenSVM: A Generalized Multiclass Support Vector Machine (PDF)

  G. J. J. van den Burg and P. J. F. Groenen. *Journal of Machine Learning Research*, 17(224):1–42, 2016.

Conference proceedings:

- Probabilistic Sequential Matrix Factorization (PDF)

  Ö. D. Akyildiz*, G. J. J. van den Burg*, T. Damoulas, and M. J. F. Steel. *Accepted for publication at AISTATS*, 2021.

Preprints:

- On Memorization in Probabilistic Deep Generative Models (PDF)

  G. J. J. van den Burg and C. K. I. Williams. *arXiv preprint 2106.03216*, 2021.

- An Evaluation of Change Point Detection Algorithms (PDF)

  G. J. J. van den Burg and C. K. I. Williams. *arXiv preprint 2003.06222*, 2020.

- Fast Meta-Learning for Adaptive Hierarchical Classifier Design (PDF)

  G. J. J. van den Burg and A. O. Hero. *arXiv preprint 1711.03512*, 2017. Code: Python

- SparseStep: Approximating the Counting Norm for Sparse Regularization (PDF)

  G. J. J. van den Burg, P. J. F. Groenen, and A. Alfons. *arXiv preprint 1701.06967*, 2017. Code: R

Dissertation:

- Algorithms for Multiclass Classification and Regularized Regression (PDF)

  G. J. J. van den Burg. *Erasmus University Rotterdam*, 2018.

## Software

I aim to make my research accessible by providing software packages for the methods I develop.

- *CleverCSV*. Implements the method from this paper. PyPI - GitHub.
- *SmartSVM*. Implements the SmartSVM classifier from this paper. PyPI - GitHub.
- *SparseStep*. Implements the SparseStep method from this paper. CRAN - GitHub.
- *GenSVM*. Implements the GenSVM method from this paper. PyPI - CRAN - GitHub.
- *Abed*. Tool for benchmarking ML methods on compute clusters. PyPI - GitHub.
- *SyncRNG*. The same random numbers in R and Python. CRAN - PyPI - GitHub.

## Teaching

Lecturer:

- Programming – part-time lecturer, set up and pioneered the use of Autolab for this course (2015, 2016)

Thesis Supervision:

- Supervised two MSc thesis students in Econometrics, among whom:
  - G. van Rooij, Clustering Stores of Retailers via Consumer Behavior, 2017.

- Supervised four BSc thesis students in Econometrics, among whom:
  - L.W. Hoogenboom, Recommender System Optimization through Collaborative Filtering, 2016.
  - E.L.J. Mathol, Neighborhood-based Collaborative Filtering: Providing the best recommendations, 2016.
  - M.L. Jongsma, Categorised Neighborhood-based Collaborative Filtering, 2016.

Teaching assistant:

- Programming (2015, 2016)
- Applied Econometrics (2015, 2016)
- Mathematical Models (2014, 2015)
- Data Analysis (2014, 2015)