## Research Interests

I am broadly interested in the development and application of machine learning methods. Currently, I focus on developing AI-based tools for data wrangling, in an effort to automate the tedious data preparation and data cleaning tasks that often precede a machine learning analysis. My past research has been on multiclass classification (typically involving SVMs), as well as meta-learning and hierarchical classifier design. I've also worked on regularized regression methods, where I'm mainly interested in optimization algorithms for non-convex problems. On the more practical side, I have developed software packages for most of my research projects, as well as a command-line tool to automate benchmarking of machine learning methods on distributed architectures.

## Publications

Journal articles:

- Wrangling Messy CSV Files by Detecting Row and Type Patterns (HTML; PDF)

  G. J. J. van den Burg, A. Nazabal, and C. Sutton. *Data Mining and Knowledge Discovery*, 2019. Code: Python – Reproducible Research Repo.

- GenSVM: A Generalized Multiclass Support Vector Machine (PDF)

  G. J. J. van den Burg and P. J. F. Groenen. *Journal of Machine Learning Research*, 17(224):1–42, 2016.

Preprints:

- Fast Meta-Learning for Adaptive Hierarchical Classifier Design (PDF)

  G. J. J. van den Burg and A. O. Hero. *arXiv preprint 1711.03512*, 2017. Code: Python.

- SparseStep: Approximating the Counting Norm for Sparse Regularization (PDF)

  G. J. J. van den Burg, P. J. F. Groenen, and A. Alfons. *arXiv preprint 1701.06967*, 2017. Code: R.

Dissertation:

- Algorithms for Multiclass Classification and Regularized Regression (PDF)

  G. J. J. van den Burg. *Erasmus University Rotterdam*, 2018.

## Software

I aim to make my research accessible by providing software packages for the methods I develop.

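- *CleverCSV*. Implements the method from the paper "Wrangling Messy CSV Files by Detecting Row and Type Patterns". PyPI - GitHub.
- *SmartSVM*. Implements the SmartSVM classifier from the paper "Fast Meta-Learning for Adaptive Hierarchical Classifier Design". PyPI - GitHub.
- *SparseStep*. Implements the SparseStep method from the paper "SparseStep: Approximating the Counting Norm for Sparse Regularization". CRAN - GitHub.
- *GenSVM*. Implements the GenSVM method from the paper "GenSVM: A Generalized Multiclass Support Vector Machine". PyPI - CRAN - GitHub.
- *Abed*. Tool for benchmarking ML methods on compute clusters. PyPI - GitHub.
- *SyncRNG*. The same random numbers in R and Python. CRAN - PyPI - GitHub.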

## Teaching

Lecturer:

- Programming – part-time lecturer, set up and pioneered the use of Autolab for this course (2015, 2016)

Thesis Supervision:

- Supervised two MSc thesis students in Econometrics, among whom:
  - G. van Rooij, Clustering Stores of Retailers via Consumer Behavior, 2017.
- Supervised four BSc thesis students in Econometrics, among whom:
  - L.W. Hoogenboom, Recommender System Optimization through Collaborative Filtering, 2016.
  - E.L.J. Mathol, Neighborhood-based Collaborative Filtering: Providing the best recommendations, 2016.
  - M.L. Jongsma, Categorised Neighborhood-based Collaborative Filtering, 2016.

Teaching assistant:

- Programming (2015, 2016)
- Applied Econometrics (2015, 2016)
- Mathematical Models (2014, 2015)
- Data Analysis (2014, 2015)