3. The Buffer Object

Copyright 2020 Andy Curtis & Daniel Curtis

How it compares to other languages

hello_buffer.c

#include "buffer.h"

int main( int argc, char *argv[] ) {
  buffer_t *bh = buffer_init(10);
  buffer_sets(bh, "Hello");
  buffer_appendc(bh, ' ');
  buffer_appendf(bh, "%s!", "Buffer");
  /* print Hello Buffer! followed with a newline */
  printf( "%s\n", buffer_data(bh) );
  buffer_destroy(bh);
  return 0;
}

building:

git clone https://github.com/contactandyc/realtime.git
cd illustrations/buffer/1_buffer
gcc hello_buffer.c buffer.c -o hello_buffer

output

$ ./hello_buffer
Hello Buffer!
$

Instead of gcc hello..., you can run

make

make will build and run hello_buffer along with a few more example programs. If git, gcc, or make fail, you may need to install them. Refer to your favorite search engine and find a tutorial on how to install them on your platform.

If the code above were written in other languages, it might look like

Python

bh = "Hello"
bh += " "
bh += "{0}!".format("Buffer")
print(bh)

Javascript

var bh = "Hello";
bh += " ";
var buffer = "Buffer";
bh += buffer + "!";
console.log(bh);

C++

#include <string>
#include <iostream>

int main(int argc, char *argv[]) {
  std::string s = string("");
  s = std::string("Hello");
  s += std::string(" ");
  s += std::string("Buffer") + std::string("!");
  std::cout << s << std::endl;
  return 0;
}

and so on.

buffer.h is part of this project (it'll later be changed to ac_buffer.h). ac stands for another c library. C doesn't have a built-in mechanism to handle growing strings or arrays. Anyone who has done C will have created an object like buffer or reused someone else's code. I'm just attempting to create standardized objects which others can use.

A bit of history and setup

To build solid C code, you need to build reusable objects. Those objects will then often be used to build larger objects. The buffer has been one of the most used objects throughout my career. In other languages, it would generally replace the string and the array. This object will change in its final form as this object ultimately will build upon another object, which I've called pool that is also one of my goto objects. The pool will be described after the section on linked structures. This object plays an important role in printing binary search trees (described in the linked structures section).

The interface for buffer is found in buffer.h. This interface must be included for the c compiler to be able to prepare the code to be converted to a binary properly. The implementation for buffer is found in buffer.c, and in our example above, we include that in the gcc compilation. It isn't my goal to explain C in this book, but I will cover some of the language aspects as I go.

#include "buffer.h"

Every good C program that is capable of being executed has the main function. The main function is typically implemented with the following two parameters to allow command-line arguments to be passed to the program. argc represents the number of arguments (the name of the program is the 1st argument). argv represents the arguments. argv[0] references the name of the program as called from the command line. The main function returns an integer. If the program executes successfully, it should return 0.

int main( int argc, char *argv[] ) {
  ...
  return 0;
}

Most of the objects you will create throughout the book will be named objectnamet. Functions that pertain to the object will be named objectnamefunctionname. Many of the objects can be created through objectnameinit and destroyed using objectname_destroy. Sometimes there will be alternate init functions, and less frequently alternate destroy functions. In general, if you init an object, you are expected to destroy the object later. Objects will not just automatically disappear (unless the pool is used, which we will discuss later).

The init function expects that you provide a hint as to how much memory your application might need. If you specify too many bytes, your program will consume extra memory needlessly. If you specify too small of a number, the buffer object will automatically grow as needed.

buffer_t *bh = buffer_init(10);
...
buffer_destroy(bh);

Set the buffer (bh) to have the string "Hello"

buffer_sets(bh, "Hello");

Append a space to the buffer (bh)

buffer_appendc(bh, ' ');

Append a formatted string where %s gets replaced with the word "Buffer" to the buffer (bh)

buffer_appendf(bh, "%s!", "Buffer");

This is just a comment in C. You can also have comments that extend to the end of the line using // comment

/* print Hello Buffer! followed with a newline */

Print the contents of the buffer (bh) by using the buffer_data(bh) to access its contents.

printf( "%s\n", buffer_data(bh) );

The buffer interface

Buffer defines several other functions which will be described shortly. Here is the full interface used for this tutorial.

#ifndef _buffer_H
#define _buffer_H

#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

struct buffer_s;
typedef struct buffer_s buffer_t;

/* initial size of the buffer, buffer will grow as needed */
buffer_t *buffer_init(size_t initial_size);

/* clear the buffer */
void buffer_clear(buffer_t *h);

/* get the contents of the buffer */
char *buffer_data(buffer_t *h);
/* get the length of the buffer */
size_t buffer_length(buffer_t *h);

/* append bytes to the current buffer */
void buffer_append(buffer_t *h, const void *data, size_t length);
/* append a string to the current buffer */
void buffer_appends(buffer_t *h, const char *s);
/* append a character to the current buffer */
void buffer_appendc(buffer_t *h, char ch);
/* append a character n times to the current buffer */
void buffer_appendn(buffer_t *h, char ch, ssize_t n);
/* append bytes in current buffer using va_args and formatted string */
void buffer_appendvf(buffer_t *h, const char *fmt, va_list args);
/* append bytes in current buffer using a formatted string -similar to printf */
void buffer_appendf(buffer_t *h, const char *fmt, ...);

/* set bytes to the current buffer */
void buffer_set(buffer_t *h, const void *data, size_t length);
/* set a string to the current buffer */
void buffer_sets(buffer_t *h, const char *s);
/* set a character to the current buffer */
void buffer_setc(buffer_t *h, char ch);
/* set a character n times to the current buffer */
void buffer_setn(buffer_t *h, char ch, ssize_t n);
/* set bytes in current buffer using va_args and formatted string */
void buffer_setvf(buffer_t *h, const char *fmt, va_list args);
/* set bytes in current buffer using a formatted string -similar to printf */
void buffer_setf(buffer_t *h, const char *fmt, ...);

/* resize the buffer and return a pointer to the beginning of the buffer.  This
   will retain the original data in the buffer for up to length bytes. */
void *buffer_resize(buffer_t *h, size_t length);
/* grow the buffer by length bytes and return pointer to the new memory.  This
   will retain the original data in the buffer for up to length bytes. */
void *buffer_append_alloc(buffer_t *h, size_t length);
/* resize the buffer and return a pointer to the beginning of the buffer.  This
   will NOT retain the original data in the buffer for up to length bytes. */
void *buffer_alloc(buffer_t *h, size_t length);

/* destroy the buffer */
void buffer_destroy(buffer_t *h);

#endif

Notice there aren't any details as to how the buffer is implemented in the interface. It is very important to separate interfaces from implementations. If you do this, you continually improve upon the interface. If hardware changes over time, you can reimplement the implementation part as needed without affecting the rest of the codebases. Getting interfaces right is somewhat of an art. There is a balance between getting specific work done and creating reusable objects. One element of the coding style that I use is that you can use grep to find every line where a given object is used. I'm not alone in using this coding style. Projects like libcurl, libuv (behind Google Chrome), and others use similar approaches.

The implementation

buffer.c

#include "buffer.h"

#include <stdlib.h>
#include <string.h>

struct buffer_s {
  char *data;
  size_t length;
  size_t size;
};

buffer_t *buffer_init(size_t initial_size) {
  buffer_t *h = (buffer_t *)malloc(sizeof(buffer_t));
  h->data = (char *)malloc(initial_size + 1);
  h->data[0] = 0;
  h->length = 0;
  h->size = initial_size;
  return h;
}

void buffer_clear(buffer_t *h) {
  h->length = 0;
  h->data[0] = 0;
}

void buffer_destroy(buffer_t *h) {
  free(h->data);
  free(h);
}

char *buffer_data(buffer_t *h) { return h->data; }
size_t buffer_length(buffer_t *h) { return h->length; }

static inline void _buffer_grow(buffer_t *h, size_t length) {
  size_t len = (length + 50) + (h->size >> 3);
  char *data = (char *)malloc(len + 1);
  memcpy(data, h->data, h->length + 1);
  free(h->data);
  h->data = data;
  h->size = len;
}

void *buffer_resize(buffer_t *h, size_t length) {
  if (length > h->size)
    _buffer_grow(h, length);
  h->length = length;
  h->data[h->length] = 0;
  return h->data;
}

void *buffer_append_alloc(buffer_t *h, size_t length) {
  if (length + h->length > h->size)
    _buffer_grow(h, length + h->length);
  char *r = h->data + h->length;
  h->length += length;
  r[h->length] = 0;
  return r;
}

static inline void _buffer_append(buffer_t *h, const void *data,
                                  size_t length) {
  if (h->length + length > h->size)
    _buffer_grow(h, h->length + length);

  memcpy(h->data + h->length, data, length);
  h->length += length;
  h->data[h->length] = 0;
}

void buffer_append(buffer_t *h, const void *data, size_t length) {
  _buffer_append(h, data, length);
}

void buffer_appends(buffer_t *h, const char *s) {
  _buffer_append(h, s, strlen(s));
}

void buffer_appendc(buffer_t *h, char ch) {
  if (h->length + 1 > h->size)
    _buffer_grow(h, h->length + 1);

  char *d = h->data + h->length;
  *d++ = ch;
  *d = 0;
  h->length++;
}

void buffer_appendn(buffer_t *h, char ch, ssize_t n) {
  if (n <= 0)
    return;

  if (h->length + n > h->size)
    _buffer_grow(h, h->length + n);

  char *d = h->data + h->length;
  memset(d, ch, n);
  d += n;
  *d = 0;
  h->length += n;
}

static inline void _buffer_alloc(buffer_t *h, size_t length) {
  size_t len = (length + 50) + (h->size >> 3);
  free(h->data);
  h->data = (char *)malloc(len + 1);
  h->size = len;
}

void *buffer_alloc(buffer_t *h, size_t length) {
  if (length > h->size)
    _buffer_alloc(h, length);
  h->length = length;
  h->data[h->length] = 0;
  return h->data;
}

static inline void _buffer_set(buffer_t *h, const void *data, size_t length) {
  if (length > h->size)
    _buffer_alloc(h, length);
  memcpy(h->data, data, length);
  h->length = length;
  h->data[length] = 0;
}

void buffer_set(buffer_t *h, const void *data, size_t length) {
  _buffer_set(h, data, length);
}

void buffer_sets(buffer_t *h, const char *s) { _buffer_set(h, s, strlen(s)); }

void buffer_setc(buffer_t *h, char c) { _buffer_set(h, &c, 1); }

void buffer_setn(buffer_t *h, char ch, ssize_t n) {
  h->length = 0;
  buffer_appendn(h, ch, n);
}

void buffer_setvf(buffer_t *h, const char *fmt, va_list args) {
  h->length = 0;
  buffer_appendvf(h, fmt, args);
}

void buffer_setf(buffer_t *h, const char *fmt, ...) {
  h->length = 0;
  va_list args;
  va_start(args, fmt);
  buffer_appendvf(h, fmt, args);
  va_end(args);
}

void buffer_appendvf(buffer_t *h, const char *fmt, va_list args) {
  va_list args_copy;
  va_copy(args_copy, args);
  size_t leftover = h->size - h->length;
  char *r = h->data + h->length;
  int n = vsnprintf(r, leftover, fmt, args_copy);
  if (n < 0)
    abort();
  va_end(args_copy);
  if (n < leftover)
    h->length += n;
  else {
    _buffer_grow(h, h->length + n);
    r = h->data + h->length;
    va_copy(args_copy, args);
    int n2 = vsnprintf(r, n + 1, fmt, args_copy);
    if (n != n2)
      abort(); // should never happen!
    va_end(args_copy);
    h->length += n;
  }
}

void buffer_appendf(buffer_t *h, const char *fmt, ...) {
  va_list args;
  va_start(args, fmt);
  buffer_appendvf(h, fmt, args);
  va_end(args);
}

Most objects and code will manipulate data. Buffer isn't an exception. It is often the case that I will design software around data. The buffert structure is the input to all of the functions except bufferinit, which creates it. In C, it is common to name a structure names and then typedef that to be namet. names and namet are the same thing. To refer to names, you must include the keyword struct before it. The namet doesn't require the struct keyword to precede the name (it doesn't work).

The code in buffer.h indicates that the buffert is a new type (struct buffers) and that the members of buffers are hidden. By hiding the members of buffers in the implementation, one can force users to only work through the interface.

struct buffer_s;
typedef struct buffer_s buffer_t;

The buffers structure is defined as follows in buffer.c. buffers maintains both a length and a size (and data). data is always size bytes in length (+1). length is the number of bytes that are currently in use by the buffer within data.

struct buffer_s {
  char *data;
  size_t length;
  size_t size;
};

The bufferinit function will initialize the object as being empty (length=0), with data being assigned to the initialsize (+1). In C, strings are terminated with a zero byte. I've found it handy in the buffer object to keep a zero character at the end of the buffer.

buffer_t *buffer_init(size_t initial_size) {
  buffer_t *h = (buffer_t *)malloc(sizeof(buffer_t));
  /* notice the +1 to allow for zero terminator */
  h->data = (char *)malloc(initial_size + 1);
  h->data[0] = 0;
  h->length = 0;
  h->size = initial_size;
  return h;
}

Objects will typically be allocated as

objectnamet *h = (objectname *)malloc(sizeof(objectname_t));

This allocates the space needed (the size of the buffer_t structure) from the heap.

buffer_t *h = (buffer_t *)malloc(sizeof(buffer_t));

The data member needs to be initially allocated to be initial_size + 1 bytes.

h->data = (char *)malloc(initial_size + 1);

Because the initial value of length is zero, set the first byte to zero.

h->data[0] = 0;

Set the initial length to zero

h->length = 0;

Set the size to the initialsize passed into the bufferinit function.

h->size = initial_size;

Return the buffer_t object now that it has been properly initialized.

return h;

You may notice that I don't check the return from malloc. If malloc fails, it will return NULL. The line after both malloc calls will cause the program to abort, which is the only thing I'd want to do anyway. For this reason, there isn't a point in having a line like if(!h) abort();.

The bufferdestroy method frees up the resources used by bufferinit.

void buffer_destroy(buffer_t *h) {
  free(h->data);
  free(h);
}

In C, you must keep track of the pointer to memory that you allocate through malloc so that it can be freed later using free. We will develop objects which will keep track of allocation for us. These objects will follow a common pattern, which reduces the chance that end users will make a mistake in this regard.

The buffer maintains a single contiguous block of memory that may change as you add more data to it (if the current block of memory is too small, for example). Clearing the buffer doesn't free memory. It only resets the length counter to 0 and sets the zero terminator to the 0th byte.

void buffer_clear(buffer_t *h) {
  h->length = 0;
  h->data[0] = 0;
}

To access the data, call buffer_data. The implementation simply returns the data member from the structure.

char *buffer_data(buffer_t *h) { return h->data; }

Similarly, you can get the length of your data by calling buffer_length.

size_t buffer_length(buffer_t *h) { return h->length; }

To set bytes into the buffer, you can use buffer_set, which is implemented as follows. The buffer doesn't make a distinction between binary data and text. You can set (or append) binary data or text through this function.

void buffer_set(buffer_t *h, const void *data, size_t length) {
  _buffer_set(h, data, length);
}

The function above simply calls an internal inlined function to set bytes into the buffer.

static inline void _buffer_set(buffer_t *h, const void *data, size_t length) {
  if (length > h->size)
    _buffer_alloc(h, length);
  memcpy(h->data, data, length);
  h->length = length;
  h->data[length] = 0;
}

If length is greater than the size member (h->size), then the buffer needs to be resized using bufferalloc

if (length > h->size)
  _buffer_alloc(h, length);

The inline bufferalloc method.

static inline void _buffer_alloc(buffer_t *h, size_t length) {
  size_t len = (length + 50) + (h->size >> 3);
  free(h->data);
  h->data = (char *)malloc(len + 1);
  h->size = len;
}

Our new length will be the requested length + 50 + (the old size / 8). size_t is a type that is defined in stdio.h.

size_t len = (length + 50) + (h->size >> 3);

Because we are setting the value, we don't need to retain the old data. We can just free it.

free(h->data);

Allocate data to the new size and set the size parameter accordingly.

h->data = (char *)malloc(len + 1);
h->size = len;

In bufferset again, copy data which is length bytes long into h->data.

memcpy(h->data, data, length);

Set the length of the buffer to be length and set the zero terminator.

h->length = length;
h->data[length] = 0;

Similar to bufferset, bufferappend just calls bufferappend.

void buffer_append(buffer_t *h, const void *data, size_t length) {
  _buffer_append(h, data, length);
}

The key differences are that the new size will be h->length + length instead of just length. buffergrow makes a copy of the old data if the memory has to be reallocated.

static inline void _buffer_append(buffer_t *h, const void *data,
                                  size_t length) {
  if (h->length + length > h->size)
    _buffer_grow(h, h->length + length);

  memcpy(h->data + h->length, data, length);
  h->length += length;
  h->data[h->length] = 0;
}

The new length is the same as bufferalloc (length+50+(h->size/8)). The new memory is malloced into data, and the current h->data is copied to the new memory before h->data is freed.

static inline void _buffer_grow(buffer_t *h, size_t length) {
  size_t len = (length + 50) + (h->size >> 3);
  char *data = (char *)malloc(len + 1);
  memcpy(data, h->data, h->length + 1);
  free(h->data);
  h->data = data;
  h->size = len;
}

buffersets uses _bufferset for its implementation using strlen to determine length.

void buffer_sets(buffer_t *h, const char *s) { _buffer_set(h, s, strlen(s)); }

bufferappends appends a string. bufferappends calls bufferappend with the strlen of s.

void buffer_appends(buffer_t *h, const char *s) {
  _buffer_append(h, s, strlen(s));
}

buffersetc uses _bufferset and takes the address of c and passes a length of 1.

void buffer_setc(buffer_t *h, char c) { _buffer_set(h, &c, 1); }

This method is optimized a bit since we are only adding exactly one byte. Appending is done far more frequently than setting, so it is optimized.

void buffer_appendc(buffer_t *h, char ch) {
  if (h->length + 1 > h->size)
    _buffer_grow(h, h->length + 1);

  char *d = h->data + h->length;
  *d++ = ch;
  *d = 0;
  h->length++;
}

buffersetn sets the length to zero (clearing buffer) and then calls bufferappendn.

void buffer_setn(buffer_t *h, char ch, ssize_t n) {
  h->length = 0;
  buffer_appendn(h, ch, n);
}

buffer_appendn appends ch n times to the buffer. Using memset instead of memcpy to repeat character n times.

void buffer_appendn(buffer_t *h, char ch, ssize_t n) {
  if (n <= 0)
    return;

  if (h->length + n > h->size)
    _buffer_grow(h, h->length + n);

  char *d = h->data + h->length;
  memset(d, ch, n);
  d += n;
  *d = 0;
  h->length += n;
}

buffersetvf sets the length to zero (clearing buffer) and then calls bufferappendvf. va_list is defined in stdarg.h

void buffer_setvf(buffer_t *h, const char *fmt, va_list args) {
  h->length = 0;
  buffer_appendvf(h, fmt, args);
}

buffersetf sets the length to zero (clearing buffer). It then associates a valist variable to the parameter fmt (which is considered a format string). The format string is ultimately used much like printf (search for "c printf tutorial" to find more about it). bufferappendvf is called, and then the args variable's resources are freed up. You can make custom functions that set/append to a buffer in much the same way as the setf/appendf methods. Of course, you will use bufferclear before buffer_appendvf if you wish to implement a custom set method.

void buffer_setf(buffer_t *h, const char *fmt, ...) {
  h->length = 0;
  va_list args;
  va_start(args, fmt);
  buffer_appendvf(h, fmt, args);
  va_end(args);
}

bufferappendf is identical to buffersetf, except that the length isn't reset (h->length = 0 doesn't exist).

void buffer_appendf(buffer_t *h, const char *fmt, ...) {
  va_list args;
  va_start(args, fmt);
  buffer_appendvf(h, fmt, args);
  va_end(args);
}

This method is used to append a formatted string to a buffer and probably one of the more complex functions in buffer. vsnprintf returns the number of bytes needed to print a given string. Because the buffer has reserve memory, the leftover (or reserved) memory is frequently enough, and the buffer doesn't need to grow. I'll break it down afterward.

void buffer_appendvf(buffer_t *h, const char *fmt, va_list args) {
  va_list args_copy;
  va_copy(args_copy, args);
  size_t leftover = h->size - h->length;
  char *r = h->data + h->length;
  int n = vsnprintf(r, leftover, fmt, args_copy);
  if (n < 0)
    abort();
  va_end(args_copy);
  if (n < leftover) {
    r[n] = 0;
    h->length += n;
  } else {
    _buffer_grow(h, h->length + n);
    r = h->data + h->length;
    va_copy(args_copy, args);
    int n2 = vsnprintf(r, n + 1, fmt, args_copy);
    if (n != n2)
      abort(); // should never happen!
    va_end(args_copy);
    h->length += n;
  }
}

Before calling vsnprintf, a copy of the args should be made as follows. The va_end should be called once done with the copy of the args.

va_list args_copy;
va_copy(args_copy, args);
... // call vsnprintf
va_end(args_copy);

Get the number of bytes leftover in the buffer. Maybe vsnprintf will only have to be called once!

size_t leftover = h->size - h->length;

Get a pointer to the end of the current data (to append the string)

char *r = h->data + h->length;

vsnprintf requires a buffer (r), and a length of the buffer (leftover), and the fmt, and valist (argscopy). It returns the number of bytes needed to be successful. If n is -1, something bad went wrong. Just abort (maybe this isn't the best decision???)

int n = vsnprintf(r, leftover, fmt, args_copy);
if (n < 0)
  abort();

If n is less than leftover, then the write was successful, and we are done. Add n bytes to the overall length.

if (n < leftover)
  h->length += n;

If n is greater or equal to leftover, the buffer size must be increased to the current length + n bytes.

else {
  _buffer_grow(h, h->length + n);

Get a pointer to the end of the current data (to append the string)

  r = h->data + h->length;

Copy the args like before.

  va_copy(args_copy, args);

The memory needed for this to complete is 1 byte more than n because vsnprintf returns the number of bytes - 1 for the zero terminator. Once this is done, add n bytes to length.

  int n2 = vsnprintf(r, n + 1, fmt, args_copy);
  if (n != n2)
    abort(); // should never happen!
  va_end(args_copy);
  h->length += n;

buffer_alloc resizes the buffer to length bytes and returns you a pointer to the data. The old data may be overridden.

void *buffer_alloc(buffer_t *h, size_t length) {
  if (length > h->size)
    _buffer_alloc(h, length);
  h->length = length;
  h->data[h->length] = 0;
  return h->data;
}

buffer_resize allows you to alter the size of the buffer and then returns a pointer to the data. The resize will not overwrite old data (except if resizing to a smaller length). If resizing to a smaller length, a zero is written to the length byte.

void *buffer_resize(buffer_t *h, size_t length) {
  if (length > h->size)
    _buffer_grow(h, length);
  h->length = length;
  h->data[h->length] = 0;
  return h->data;
}

bufferappendalloc is like buffer_resize, except that it allocates length more bytes and returns a pointer to those new bytes. It is safe to assume that the byte after length is set to zero.

void *buffer_append_alloc(buffer_t *h, size_t length) {
  if (length + h->length > h->size)
    _buffer_grow(h, length + h->length);
  char *r = h->data + h->length;
  h->length += length;
  r[h->length] = 0;
  return r;
}

If you followed everything in this, that is great. Otherwise, it might make more sense as the buffer object is used throughout the project. Next up is /2linkedstructures.md

Table of Contents (if viewing on Github)