# Research

## Research Interests

I am broadly interested in the development and application of machine learning methods. In my position as a postdoctoral researcher at the Alan Turing Institute, I focused on developing AI-based tools for data wrangling, in an effort to automate the tedious manual tasks of data preparation and data cleaning that often precede a machine learning analysis. I've worked on deep generative models, change point detection, data parsing, matrix factorization, multiclass SVMs, and sparse regression, among other things. Because my research is often focused on developing methods that work well in the real world, I have also created easy-to-use software packages for most of my research projects.

For more about me, check out my industry resume or academic CV.

## Publications

Journal articles:

- AI Assistants: A Framework for Semi-Automated Data Wrangling (PDF)

  T. Petricek, G. J. J. van den Burg, A. Nazábal, T. Ceritli, E. Jiménez-Ruiz, and C. K. I. Williams. *IEEE Transactions on Knowledge and Data Engineering*, 35(9):9295–9306, 2023.

- Wrangling Messy CSV Files by Detecting Row and Type Patterns (HTML; PDF)

  G. J. J. van den Burg, A. Nazábal, and C. Sutton. *Data Mining and Knowledge Discovery*, 33:1799–1820, 2019. Code: Python – Reproducible Research Repo

- GenSVM: A Generalized Multiclass Support Vector Machine (PDF)

  G. J. J. van den Burg and P. J. F. Groenen. *Journal of Machine Learning Research*, 17(224):1–42, 2016.

Conference proceedings:

- On Memorization in Probabilistic Deep Generative Models (PDF)

  G. J. J. van den Burg and C. K. I. Williams. *Advances in Neural Information Processing Systems*, 34:27916–27928, 2021.

- Probabilistic Sequential Matrix Factorization (PDF)

  Ö. D. Akyildiz\*, G. J. J. van den Burg\*, T. Damoulas, and M. J. F. Steel. *Proceedings of The 24th International Conference on Artificial Intelligence and Statistics*, PMLR 130:3484–3492, 2021.

Preprints:

- An Evaluation of Change Point Detection Algorithms (PDF)

  G. J. J. van den Burg and C. K. I. Williams. *arXiv preprint 2003.06222*, 2020.

- Fast Meta-Learning for Adaptive Hierarchical Classifier Design (PDF)

  G. J. J. van den Burg and A. O. Hero. *arXiv preprint 1711.03512*, 2017. Code: Python

- SparseStep: Approximating the Counting Norm for Sparse Regularization (PDF)

  G. J. J. van den Burg, P. J. F. Groenen, and A. Alfons. *arXiv preprint 1701.06967*, 2017. Code: R

Dissertation:

- Algorithms for Multiclass Classification and Regularized Regression (PDF)

  G. J. J. van den Burg. *Erasmus University Rotterdam*, 2018.

## Software

I aim to make my research accessible by providing software packages for the methods I develop.

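- *CleverCSV*. Implements the method from the paper "Wrangling Messy CSV Files by Detecting Row and Type Patterns". PyPI - GitHub.
- *SmartSVM*. Implements the SmartSVM classifier from the paper "Fast Meta-Learning for Adaptive Hierarchical Classifier Design". PyPI - GitHub.
- *SparseStep*. Implements the SparseStep method from the paper "SparseStep: Approximating the Counting Norm for Sparse Regularization". CRAN - GitHub.
- *GenSVM*. Implements the GenSVM method from the paper "GenSVM: A Generalized Multiclass Support Vector Machine". PyPI - CRAN - GitHub.
- *Abed*. Tool for benchmarking ML methods on compute clusters. PyPI - GitHub.
- *SyncRNG*. The same random numbers in R and Python. CRAN - PyPI - GitHub.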

## Teaching

Lecturer:

- Programming – part-time lecturer, set up and pioneered the use of Autolab for this course (2015, 2016)

Thesis Supervision:

- Supervised two MSc thesis students in Econometrics, among whom:
  - G. van Rooij, Clustering Stores of Retailers via Consumer Behavior, 2017.

- Supervised four BSc thesis students in Econometrics, among whom:
  - L.W. Hoogenboom, Recommender System Optimization through Collaborative Filtering, 2016.
  - E.L.J. Mathol, Neighborhood-based Collaborative Filtering: Providing the best recommendations, 2016.
  - M.L. Jongsma, Categorised Neighborhood-based Collaborative Filtering, 2016.

Teaching assistant:

- Programming (2015, 2016)
- Applied Econometrics (2015, 2016)
- Mathematical Models (2014, 2015)
- Data Analysis (2014, 2015)