Research

Research Interests

I am broadly interested in the development and application of machine learning methods. In my position as a postdoctoral researcher at the Alan Turing Institute I focused on developing AI-based tools for data wrangling, in an effort to automate the tedious manual tasks of data preparation and data cleaning that often precede a machine learning analysis. I've worked on deep generative models, change point detection, data parsing, matrix factorization, multiclass SVMs, and sparse regression, among other things. Because my research is often focused on developing methods that work well in the real world, I have also created easy-to-use software packages for most of my research projects.

For more about me, check out my industry resume or academic CV.

Publications

Journal articles:

AI Assistants: A Framework for Semi-Automated Data Wrangling (PDF)
T. Petricek and G. J. J. van den Burg and A. Nazábal and T. Ceritli and E. Jiménez-Ruiz and C. K. I. Williams
IEEE Transactions on Knowledge and Data Engineering, 35(9):9295–9306, 2023.

▸ Show abstract
Wrangling Messy CSV Files by Detecting Row and Type Patterns (HTML; PDF)
G. J. J. van den Burg and A. Nazábal and C. Sutton
Data Mining and Knowledge Discovery, 33:1799–1820, 2019.

Code: Python – Reproducible Research Repo

▸ Show abstract
GenSVM: A Generalized Multiclass Support Vector Machine (PDF)
G. J. J. van den Burg and P. J. F. Groenen
Journal of Machine Learning Research, 17(224):1–42, 2016.

Code: C – R – Python

▸ Show abstract

Conference proceedings:

On Memorization in Probabilistic Deep Generative Models (PDF)
G. J. J. van den Burg and C. K. I. Williams
Advances in Neural Information Processing Systems, 34:27916–27928, 2021.

Code: Reproducible Research Repo

▸ Show abstract
Probabilistic Sequential Matrix Factorization (PDF)
Ö. D. Akyildiz* and G. J. J. van den Burg* and T. Damoulas and M. J. F. Steel
Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, (PMLR 130):3484–3492, 2021.

Code: Reproducible Research Repo

▸ Show abstract

Preprints:

An Evaluation of Change Point Detection Algorithms (PDF)
G. J. J. van den Burg and C. K. I. Williams
arXiv preprint 2003.06222, 2020.

Code: Python (dataset) – Reproducible Research Repo

▸ Show abstract
Fast Meta-Learning for Adaptive Hierarchical Classifier Design (PDF)
G. J. J. van den Burg and A. O. Hero
arXiv preprint 1711.03512, 2017.

Code: Python

▸ Show abstract
SparseStep: Approximating the Counting Norm for Sparse Regularization (PDF)
G. J. J. van den Burg and P. J. F. Groenen and A. Alfons
arXiv preprint 1701.06967, 2017.

Code: R

▸ Show abstract

Dissertation:

Algorithms for Multiclass Classification and Regularized Regression (PDF)
G. J. J. van den Burg
Erasmus University Rotterdam, 2018.

▸ Show abstract

Software

I aim to make my research accessible by providing software packages for the methods I develop.

CleverCSV. Implements the method from this paper. PyPI - GitHub.
SmartSVM. Implements the SmartSVM classifier from this paper. PyPI - GitHub.
SparseStep. Implements the SparseStep method from this paper. CRAN - GitHub.
GenSVM. Implements the GenSVM method from this paper. PyPI - CRAN - GitHub.
Abed. Tool for benchmarking ML methods on compute clusters. PyPI - GitHub.
SyncRNG. The same random numbers in R and Python. CRAN - PyPI - GitHub.

Teaching

Lecturer:

Programming – part-time lecturer, set up and pioneered the use of Autolab for this course (2015, 2016)

Thesis Supervision:

Supervised two MSc thesis students in Econometrics, among whom:
- G. van Rooij, Clustering Stores of Retailers via Consumer Behavior, 2017.
Supervised four BSc thesis students in Econometrics, among whom:
- L.W. Hoogenboom, Recommender System Optimization through Collaborative Filtering, 2016.
- E.L.J. Mathol, Neighborhood-based Collaborative Filtering: Providing the best recommendations, 2016.
- M.L. Jongsma, Categorised Neighborhood-based Collaborative Filtering, 2016.

Teaching assistant:

Programming (2015, 2016)
Applied Econometrics (2015, 2016)
Mathematical Models (2014, 2015)
Data Analysis (2014, 2015)