Attila's Blog

About writing your own programming language in C, Go and Swift

Generic containers in C

Sep 17, 2018 Comments

Categories: aspl

We have a working parser that is capable of parsing the four basic arithmetic operators (+, -, *, /), negation (prefix -) and grouping (()). We could continue adding new features to the parser until it fully supports the ASPL language but it would be very boring to go this way. It’s going to be much more interesting to implement the full pipeline, and finally see our virtual machine executing some code, am I right? Today we start working towards a fully operational virtual machine. By the time we finish, we will have our own arithmetic calculator that supports addition, subtraction, multiplication, and division. This may seem like a small step but don’t worry, we will keep adding new features to it until we implement every feature of ASPL.

First things first. Before we start implementing the virtual machine we need some basic data containers that support dynamic allocation, and automatic resizing. Ideally, it would be also great to implement it only once and reuse it with every data type that needs storing in a container.

Why?

  • Because I don’t want to deal with low-level memory manipulation every time I need an array of objects
  • I am also lazy and don’t like repeating the same code to support multiple data types
  • C does not have any support for this
  • And most importantly, because it is fun to implement it for yourself

But wait, C doesn’t have generics, how are we going to do this? Well, if you know C, you can do it… with the help of the preprocessor and some macro magic :)

Let me show you how.

Let’s create a new folder called data_structures. Then create a header inside it, called generic_array.h. First, we will need some macros that will make it possible to implement our generic array.

As I already mentioned there is no such thing as generics in C, so we have to cheat a little bit. The idea behind “generic” containers in C is that we will need one data structure for every supported type (every data type we would like to store in these generic containers). Obviously, if we have to copy the whole code every time we want to add support for a new type, it would be a disaster and also we could not really call that a “generic” container.

We can do better than that. How? We will write some macros that will generate our data types during preprocessing. And some macros that will also generate functions that operate on these automatically generated data structures.

#define GA_MAKE_FUNCTION(return_type, prefix, name, ...) static return_type CONCATENATE(prefix, name) (__VA_ARGS__)
#define GA_CALL_FUNCTION(prefix, name, ...) CONCATENATE(prefix, name)(__VA_ARGS__)
#define GA_ARRAY_TYPE(array_name) CONCATENATE(array_name, _t)

GA_MAKE_FUNCTION is a helper macro that will generate a series of different function signatures based on the parameters it receives. It uses the CONCATENATE macro that is defined in commons.h. CONCATENATE expects two parameters, and simply concatenates them into one string.

GA_MAKE_FUNCTION expects some parameters:

  • return_type is the return type of the generated function
  • prefix is a prefix unique to the given data type. We need this so that every supported data type will have different “method” names. For clarity, the prefix will be the name of the data type in every case.
  • name is the name of the function, a name that described what the function does with the data container.
  • and finally ... is a variadic parameter, GA_MAKE_FUNCTION will send any other parameters into the function’s parameter list.

GA_CALL_FUNCTION generates a function call by concatenating the prefix with the name to form the name of the function to be called and sends every other parameter into the parameter list of the generated function call.

Finally, GA_ARRAY_TYPE is just to append a _t string to the end of the array type.

Now, one word of caution about the header guards in generic_array.h. Go and have a look first. You will see that the header guards are only protecting the three macros we just described, and the typedef and all the definitions starting with GA_MAKE_FUNCTION are left unprotected.

This is a crucial piece in our implementation. If everything were to be protected by header guards it would be impossible to generate a series of data types and functions during compilation because the guards would prevent the protected areas to be included more than once. This means, if we were to generate a series of functions with different names from the same code, we have to leave those pieces unprotected.

Let’s see the template of the data type for our generic container.

typedef struct GA_ARRAY_TYPE(__GENERIC_ARRAY_STRUCT_NAME__) {
    size_t capacity;
    size_t count;
    __GENERIC_ARRAY_TYPE__* values;
} __GENERIC_ARRAY_NAME__;

__GENERIC_ARRAY_STRUCT_NAME__ contains the name of the generic data type in the struct namespace. __GENERIC_ARRAY_TYPE__ is the actual data type we want to store in the container. __GENERIC_ARRAY_NAME__ is the name of the generic data type in the global namespace. As you can see, it stores the current count, the capacity, and a dynamic array of values. Now let’s see the functions that work on these containers.

GA_MAKE_FUNCTION(void, __GENERIC_ARRAY_STRUCT_NAME__, _init, __GENERIC_ARRAY_NAME__* array) {
    array->capacity = 0;
    array->count = 0;
    array->values = NULL;
}

GA_MAKE_FUNCTION(void, __GENERIC_ARRAY_STRUCT_NAME__, _free, __GENERIC_ARRAY_NAME__* array) {
    if (array != NULL) {
        if (array->count > 0) {
            DELETE_DYNAMIC_ARRAY(__GENERIC_ARRAY_TYPE__, array->values, array->capacity);
        }

        GA_CALL_FUNCTION(__GENERIC_ARRAY_STRUCT_NAME__, _init, array);
    }
}

GA_MAKE_FUNCTION(void,
                 __GENERIC_ARRAY_STRUCT_NAME__,
                 _free_with_deleter,
                 __GENERIC_ARRAY_NAME__* array,
                 void (* deleter)(__GENERIC_ARRAY_TYPE__ value)) {
    if (array != NULL) {
        if (array->count > 0) {
            if (deleter != NULL) {
                for (int i = 0; i < array->count; i++) {
                    deleter(*(array->values + i));
                }
            }

            DELETE_DYNAMIC_ARRAY(__GENERIC_ARRAY_TYPE__, array->values, array->capacity);
        }

        GA_CALL_FUNCTION(__GENERIC_ARRAY_STRUCT_NAME__, _init, array);
    }
}

GA_MAKE_FUNCTION(void,
                 __GENERIC_ARRAY_STRUCT_NAME__,
                 _add_value,
                 __GENERIC_ARRAY_NAME__* array,
                 __GENERIC_ARRAY_TYPE__ value) {
    if (array->capacity < array->count + 1) {
        size_t oldCapacity = array->capacity;
        array->capacity = memory_grow_capacity(oldCapacity);
        array->values = GROW_DYNAMIC_ARRAY(__GENERIC_ARRAY_TYPE__, array->values, array->count, array->capacity);
    }

    array->values[array->count++] = value;
}

GA_MAKE_FUNCTION(__GENERIC_ARRAY_TYPE__,
                 __GENERIC_ARRAY_STRUCT_NAME__,
                 _get_value,
                 __GENERIC_ARRAY_NAME__* array,
                 size_t index) {
    if (array == NULL || index > array->count - 1) {
        return (__GENERIC_ARRAY_TYPE__) 0;
    }

    return *(array->values + index);
}

GA_MAKE_FUNCTION(size_t, __GENERIC_ARRAY_STRUCT_NAME__, _count, __GENERIC_ARRAY_NAME__* array) {
    return array->count;
}

GA_MAKE_FUNCTION(void,
                 __GENERIC_ARRAY_STRUCT_NAME__,
                 _set,
                 __GENERIC_ARRAY_NAME__* array,
                 size_t index,
                 __GENERIC_ARRAY_TYPE__ value) {
    if (array == NULL || index > array->count - 1) {
        return;
    }

    array->values[index] = value;
}

The first function _init will initialize the dynamic container. The next one, _free is responsible for freeing all dynamically allocated values that are held in the container. _free_with_deleter is a variation of _free, in case you want to provide a custom “deleter” function for the data held in the container, you should use this version. _add_value appends a new value to the end of the array. _get_value retrieves the element stored at the indexth index. _count returns the item count in the array. _set sets (overrides) an existing position with a new value.

GROW_DYNAMIC_ARRAY and DELETE_DYNAMIC_ARRAY are two new macros in memory_management.h and memory_grow_capacity is a new function.

#define GROW_DYNAMIC_ARRAY(type, pointer, currentSize, newSize) (type*)memory_reallocate((pointer), sizeof(type) * (currentSize), sizeof(type) * (newSize))
#define DELETE_DYNAMIC_ARRAY(type, pointer, capacity)           memory_reallocate((pointer), sizeof(type) * (capacity), 0)

extern size_t memory_grow_capacity(size_t current_size);

memory_grow_capacity is used to check if the current container can store more data, and if not it gives back the new size of the container. It is checking if the current size is less than an arbitrarily chosen threshold GROW_CAPACITY_THRESHOLD, which is 8 in our case. If the current capacity is not enough to store more data it will calculate the new size of the container by multiplying the current size with a chosen factor. In our case it is GROW_CAPACITY_FACTOR, and if a resize is needed we double the current capacity. GROW_DYNAMIC_ARRAY is responsible for the actual resizing via memory_reallocate. DELETE_DYNAMIC_ARRAY is a macro to delete the dynamic container along with all the data it contains.

All right, the generic container is basically ready, but how do we use it? First, we need to add some supported data types. If you go back and have a good look on the struct we created, you will see that it contains three macros that we have not defined yet. Let’s do that now.

Here is how we add support for any kind of data to be stored in our generic container.

Let’s say we want to store uint8_t data type in the container.

//
// Created by Attila Haluska on 9/7/18.
//

#ifndef ASPL_C_UINT8_ARRAY_H
#define ASPL_C_UINT8_ARRAY_H

#if defined __GENERIC_ARRAY_TYPE__
    #undef __GENERIC_ARRAY_TYPE__
#endif

#if defined __GENERIC_ARRAY_STRUCT_NAME__
    #undef __GENERIC_ARRAY_STRUCT_NAME__
#endif

#if defined __GENERIC_ARRAY_NAME__
    #undef __GENERIC_ARRAY_NAME__
#endif

#define __GENERIC_ARRAY_TYPE__ uint8_t
#define __GENERIC_ARRAY_NAME__ Uint8Array
#define __GENERIC_ARRAY_STRUCT_NAME__ uint8_array

#include "data_structures/generic_array.h"

#endif //ASPL_C_UINT8_ARRAY_H

We just need to define the three missing macros we talked about previously. As we may have several of these, let’s make sure we undefine them first in case they were defined elsewhere. Then, define them again with the data type and container name we want. Finally, include generic_array.h to make sure the preprocessor will use the newly defined macros and generate our dynamic container and its methods for us. If we want to support more data types, we just have to repeat the process.

Let’s add support for uint64_t as well.

//
// Created by Attila Haluska on 9/7/18.
//

#ifndef ASPL_C_UINT64_ARRAY_H
#define ASPL_C_UINT64_ARRAY_H

#if defined __GENERIC_ARRAY_TYPE__
    #undef __GENERIC_ARRAY_TYPE__
#endif

#if defined __GENERIC_ARRAY_STRUCT_NAME__
    #undef __GENERIC_ARRAY_STRUCT_NAME__
#endif

#if defined __GENERIC_ARRAY_NAME__
    #undef __GENERIC_ARRAY_NAME__
#endif

#define __GENERIC_ARRAY_TYPE__ uint64_t
#define __GENERIC_ARRAY_NAME__ Uint64Array
#define __GENERIC_ARRAY_STRUCT_NAME__ uint64_array

#include "data_structures/generic_array.h"

#endif //ASPL_C_UINT64_ARRAY_H

We repeated the process, and by defining the same three macros we easily added support for another data type.

It’s great, isn’t it? Now, how do we actually use these new data structures to store data?

It’s easy. First, include the header specific to the data type you want to store. (If you include generic_array.h to store data, it won’t work. So, let’s include #include "data_structures/uint8_array.h" and #include "data_structures/uint64_array.h" in our main.c to test the new containers.

Put this somewhere into main.c:

Uint8Array uint8Array;
Uint64Array uint64Array;

uint8_array_init(&uint8Array);
uint64_array_init(&uint64Array);

for (int i = 0; i < 10; i++) {
    uint8_array_add_value(&uint8Array, (uint8_t)i);
    uint64_array_add_value(&uint64Array, (uint64_t)i);
}

size_t uint8ArrayCount = uint8_array_count(&uint8Array);
size_t uint64ArrayCount = uint64_array_count(&uint64Array);

fprintf(stdout, "Uint8Array count: %ld\n", uint8ArrayCount);
fprintf(stdout, "Uint64Array count: %ld\n", uint64ArrayCount);

for (size_t i = 0; i < uint64ArrayCount; i++) {
    fprintf(stdout, "Uint8Array[%ld] = %d\n", i, uint8_array_get_value(&uint8Array, i));
    fprintf(stdout, "Uint64Array[%ld] = %lld\n", i, uint64_array_get_value(&uint64Array, i));
}

uint8_array_free(&uint8Array);
uint64_array_free(&uint64Array);

As you see, the preprocessor generated all the functions we need to actually use the containers, the IDE correctly completes our code, and if you compile and run the code, it works just fine.

We will use this to store all kinds of data dynamically later when we start working on the virtual machine.

Grab the source code of the current state of the project if you like.

Next time we will explore what bytecode is and how virtual machines execute them. We will also implement support for our very first set of bytecodes.

Stay tuned. I’ll be back.

← Checking for memory leaks with Valgrind What is bytecode →

comments powered by Disqus