Using a C library as a NumPy extension

GitHub: numpy_c_skeleton

Introducing an easy-to-use skeleton package

For one of my academic papers – GenSVM – I implemented the method in a C library for speed. Although this library comes with a number of executables, I wanted to create a Python package for this method as well to make it easier to use. Since the library uses a lot of linear algebra methods, the Python package needed to work nicely with NumPy arrays as well.

Structuring the package and writing a setup.py file that compiles both the C library and the Cython extension, and installs everything properly, turned out to be quite a bit harder than expected. After a fair amount of trial and error, I arrived at a setup.py file that works. Below, I’ll present a skeleton package that wraps a C library as a NumPy extension for use in Python. You can find the complete package on GitHub, ready to be extended for your own use.

The following directory structure is what I used:

├── Makefile
├── numpy_c_skeleton
│   ├── core.py
│   └── __init__.py
├── README.rst
├── setup.py
├── src
│   ├── c_package
│   │   ├── c_package_helper.c
│   │   ├── include
│   │   │   └── code.h
│   │   └── src
│   │       └── code.c
│   ├── cython_wrapper.pxd
│   └── cython_wrapper.pyx
└── tests
    └── test_package.py

All the Python code is in the numpy_c_skeleton directory, the Cython code is in the top level of the src directory, and the C library is in the src/c_package directory. If you're using Git, it might be a good idea to add the C library as a Git submodule. For this illustration, the C code has only a single function, which computes the sum of a NumPy array, but it should be straightforward to extend this to your own C library. In the skeleton package each file is extensively documented to describe what it does; here is a brief description:

  • numpy_c_skeleton/core.py is the Python code that you want to expose to the users. This is just as in a regular Python package and you’re not restricted to having only one file here. For this example package, the core.py file defines a single function which uses the code from the Cython wrapper.
  • src/cython_wrapper.pxd is a Cython pxd file, which is like a C header file. This is where you declare each external function or structure that you want to use, and where it comes from.
  • src/cython_wrapper.pyx is the main Cython wrapper file. This contains a function that takes a Python NumPy array and calls the external function from the C helper file on it.
  • src/c_package/c_package_helper.c adds a layer between the C library and the Cython wrapper code. It’s not strictly necessary but it can help in exposing only those parts of your C library to Cython that you want, or to make sure you don’t have to change anything in your library to work nicely with Cython.
  • src/c_package/include contains the header files for your C library. In this case there is only one file, which contains the declaration of the function defined in src/c_package/src/code.c.
  • src/c_package/src contains the source files for your C library. Note that this code can be agnostic to the fact that NumPy is used for the arrays. This makes it easy to reuse a library in a Python package.
  • tests/test_package.py contains unit tests, which show how the function from the core.py file can be used.
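
To make the role of core.py a bit more concrete, here is a minimal sketch of what such a function could look like. This is not the code from the skeleton: the names compute_sum and _wrapper_sum_stand_in are hypothetical, and the stand-in replaces the compiled Cython function so that the snippet runs without building anything. Converting the input to a C-contiguous float64 array is an assumption about what the C side expects (a plain double pointer and a length).

```python
import numpy as np


def _wrapper_sum_stand_in(arr):
    # Hypothetical stand-in for the compiled Cython function. In the real
    # package this would be imported from the cython_wrapper extension.
    return float(arr.sum())


def compute_sum(values, _wrapper=_wrapper_sum_stand_in):
    """Sum a one-dimensional array-like by handing it to the wrapper."""
    # Assumption: the C code reads a contiguous block of doubles, so we
    # convert (and copy only if necessary) before crossing the boundary.
    arr = np.ascontiguousarray(values, dtype=np.float64)
    if arr.ndim != 1:
        raise ValueError("expected a one-dimensional array")
    return _wrapper(arr)


if __name__ == '__main__':
    print(compute_sum([1, 2, 3]))
    # A strided view is not contiguous; it is copied before the call.
    print(compute_sum(np.arange(10.0)[::2]))
```

Doing the validation and conversion in Python keeps both the Cython wrapper and the C library small: they can assume well-formed input and never need to know about strides or dtypes.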

As mentioned, the setup.py file can be a bit tricky to get right in terms of compilation of the extension. In the end, I settled on the following:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import os
import re
import numpy

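from setuptools import setup
# Note: numpy.distutils is deprecated since NumPy 1.23 and is not available
# on Python 3.12 and later, so this setup.py needs an older toolchain.
from numpy.distutils.misc_util import Configuration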

# Set this to True to enable building extensions using Cython. Set it to False
# to build extensions from the C file (that was previously generated using
# Cython). Set it to 'auto' to build with Cython if available, otherwise from
# the C file.
USE_CYTHON = 'auto'

# If we are in a release, we never use Cython directly
IS_RELEASE = os.path.exists('PKG-INFO')
if IS_RELEASE:
    USE_CYTHON = False

# If we do want to use Cython, we double check if it is available
if USE_CYTHON:
    try:
        from Cython.Build import cythonize
    except ImportError:
        if USE_CYTHON == 'auto':
            USE_CYTHON = False
        else:
            raise

def configuration():

    # This is the numpy Configuration class
    config = Configuration('numpy_c_skeleton', '', None)

    # Wrapper code in Cython uses the .pyx extension if we want to USE_CYTHON, 
    # otherwise it ends in .c.
    wrapper_ext = '*.pyx' if USE_CYTHON else '*.c'

    # Sources include the C/Cython code from the wrapper and the source code of 
    # the C library
    sources = [
            os.path.join('src', wrapper_ext),
            os.path.join('src', 'c_package', 'src', '*.c'),
            ]

    # Dependencies are the header files of the C library and any potential 
    # helper code between the library and the Cython code
    depends = [
            os.path.join('src', 'c_package', 'include', '*.h'),
            os.path.join('src', 'c_package', 'c_package_helper.c')
            ]

    # Register the extension
    config.add_extension('cython_wrapper',
            sources=sources,
            include_dirs=[
                os.path.join('src', 'c_package'),
                os.path.join('src', 'c_package', 'include'),
                numpy.get_include(),
                ],
            depends=depends)

    # Cythonize if necessary
    if USE_CYTHON:
        config.ext_modules = cythonize(config.ext_modules)

    return config


def read(fname):
    # Read a file that lives next to setup.py, closing it when done
    with open(os.path.join(os.path.dirname(__file__), fname)) as fid:
        return fid.read()


if __name__ == '__main__':

    # Pull the version from the package __init__.py
    with open(os.path.join('numpy_c_skeleton', '__init__.py')) as fid:
        version = re.search("__version__ = '([^']+)'", fid.read()).group(1)

    # load the configuration
    attr = configuration().todict()

    # Add the other setup attributes
    attr['description'] = 'Python package skeleton for numpy C extension'
    attr['long_description'] = read('README.rst')
    attr['packages'] = ['numpy_c_skeleton']
    attr['version'] = version
    attr['author'] = "G.J.J. van den Burg"
    attr['author_email'] = "gertjanvandenburg@gmail.com"
    attr['license'] = 'GPL v2'
    attr['install_requires'] = ['numpy']
    attr['zip_safe'] = True

    # Finally, run the setup command
    setup(**attr)

This setup.py file uses Cython to compile the *.pyx files when in development mode. In release mode, which is detected by the presence of a PKG-INFO file, the previously “cythonized” C files are used instead. This way the cythonize function is run only when needed, and Cython is not required to install a release. Note further that the regular setup() function arguments are supplied as entries in the attr dictionary.
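
The decision between building from the *.pyx files and building from the generated C files can be summarised in a small function. This is only a sketch of the logic in the setup.py above, not part of the skeleton itself: the function name resolve_use_cython and its arguments are hypothetical, with cython_available standing in for whether the Cython import succeeds.

```python
def resolve_use_cython(use_cython, is_release, cython_available):
    """Return True if the extension should be built from the .pyx files.

    use_cython       -- True, False, or 'auto' (the USE_CYTHON setting)
    is_release       -- True if a PKG-INFO file is present
    cython_available -- True if `from Cython.Build import cythonize` works
    """
    # A release always builds from the previously generated C files
    if is_release:
        return False
    # Cython was explicitly switched off
    if not use_cython:
        return False
    if cython_available:
        return True
    # Cython is missing: 'auto' falls back to the C files, True is an error
    if use_cython == 'auto':
        return False
    raise ImportError("Cython was requested but is not installed")


if __name__ == '__main__':
    for setting in (True, False, 'auto'):
        for available in (True, False):
            try:
                result = resolve_use_cython(setting, False, available)
            except ImportError as err:
                result = 'error: %s' % err
            print(setting, available, result)
```

The only combination that fails is USE_CYTHON = True without Cython installed, which is the intended behaviour: an explicit request should not silently fall back to possibly outdated C files.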

This is it! I hope this skeleton package helps you with adding your C/C++ libraries to a Python package. If so, you can “star” the repository on GitHub to let me know.