Most modern programming languages have some form of automated garbage collection mechanism that takes up the tasks of memory management on behalf of the programmer. This typically happens by identifying objects that are no longer referenced by the program in execution and marking them for removal. The heap space of the memory must be recycled because, in the process of program execution, the entire memory may be used up unless garbage is collected in time.
Garbage collection in Go is based upon tricolor algorithms. This algorithm is the backbone for many other languages as well. This Go programming tutorial explains the concept behind tricolor algorithms and how Golang uses it to implement garbage collection mechanisms.
Reading: Understanding Functions in Go
Memory Management in Go
One way to garbage collection in Go is to leave the task of memory management up to the programmer, who hardcodes the clean up process and deletes any objects that go out of scope. This is typically seen in C/C++, where we use free and delete to do the clean up. The problem is that the programmer who is occupied with the logic of the code now has the extra burden of writing a clean-up procedure. This practically becomes an uphill task, as the lines of code increases. However, this manual memory management scheme is not without its advantages because programmers can now optimize the clean-up process to the teeth.
As mentioned, it is not an easy thing to achieve, however; the programmer might miss a step at some critical point. So, it was with this thought that we decided, why not provide an automatic garbage collector that can identify the unreferenced objects at runtime and do the necessary clean up automatically without the intervention of the programmer? This is not a super optimum scheme, but can do a lot to relieve the programmer’s burden. The algorithm used for automatic memory management may not be perfect today but has evolved into something that has garnered trust among developers. In light of this, many compiled languages, like Java, Goand pythonhave built in garbage collection routines in them.
Implementing automatic garbage collection is easier said than done, however. The hardest part is not in the algorithm, but in imbibing the scheme that works seamlessly and silently without affecting the performance of the program. While Java uses threads, Golang uses goroutines to achieve the same effect. Many algorithms are proposed for implementing automatic garbage collection, such as tri-color marking, Marks and Sweep, Cheney’s copy algorithm, Knuth’s Lisp 2 algorithm, and so forth. Understand that there is no one perfect algorithm to do the job and most programming languages use one or more of these schemes with some intricate modifications. Golang, for its part, uses tricolor with a mark and sweep algorithm.
Reading: Understanding Garbage Collection in Go
What is the Tricolor Mark and Sweep Algorithm in Go?
The Tricolor Mark and Sweep algorithm in Go works concurrently with the write barrier. It is necessary that it works concurrently because automatic garbage collection is an overhead that works behind the scenes during program execution. When a Go program runs, the Go scheduler schedules both the application and the garbage collector. This is not very difficult to do thanks to the use of goroutines; goroutines are like threads that can execute concurrently. You can read more about them in our Golang tutorial: Introduction to Goroutines in Go.
Now, the first task of any garbage collection algorithm is to identify all unreachable objects and mark them to be cleared from memory. These unreachable objects are called garbage in the system and the memory space it occupies must be deallocated to make it reusable. The second task is to reclaim the space. The first task is relatively the hardest.
Tricolor Marks and Sweep in Golang
This Tricolor Marks and Sweep algorithm divides objects in the heap into three sets according to the color assigned, such as black, white, and gray. The white set objects are the candidates for garbage collection. The black set objects guarantee to have no reference to any objects in the white set. But note: objects in the white set may have reference to the objects in the black set; this has no effect on the operation of the garbage collector. The objects in the gray set might have reference to the objects in the white set.
Here, Go programmers must understand that the gray set is the intermediary set and no black set object can directly be white or black under any circumstance. They must go through the gray set from where it is decided whether to put in the black or white sets. This means that when the garbage collection cycle begins, all objects are marked as white to begin with, then gray, and, as the algorithm operates, the garbage collector visits the root object and colors them black or white accordingly. The root of the object essentially means the objects that the application can directly access.
Tricolor Scheme in Go
The steps in the Tricolor Scheme can grossly be summed up as follows:
- Step 1: Pick an object from the gray set.
- Step 2: Gray all the object references and move it to the black set
- Step 3: Repeat step 1 and 2 until the gray set is empty.
- Step 4: Put all other objects in the black set if they are reachable from the root otherwise put them into the white set.
- Step 5: The objects in the white set can be garbage collected.
Let’s illustrate the algorithm in a simple manner.
The nodes in the graph below indicate the objects in memory and the arrows indicate references to other objects:
The three different sets of colors are: black, gray, and white. The algorithm begins by putting all objects/nodes in the directed graph above in a queue and colors them white. The color of the node changes as nodes are traversed. The BFS – Breadth First Search or any other search scheme may be applied.
Let’s start from A (as root node), marked as gray, which means it is reachable and we are processing it at the moment. Pull out A from the queue. We can reach C, D, and B from A. Again, BFS – Breadth First Search may be applied here.
Marking A as black means it is safe and not to be garbage collected.
then, pull out C, the next element from the queue. We can reach F and D from C, which, in turn, has root A. Therefore, we mark C as black.
Next, from D we get B, so mark D as black.
Similarly, we can reach B and F, so subsequently mark them black as well.
As we pull out G and E, we can see that it cannot reach from any of the root. So it remains white.
Therefore, G and E in the white set are candidates for garbage collection.
Note garbage collection runs in cycles. If any objects in the gray set become unreachable at some point, the garbage collector deals with it in the next cycle. The running application called the mutator has a function called write barrier, which is invoked whenever a pointer in the heap is modified. Any modification in the heap means the object concerned needs to be reconsidered, therefore put it in the gray list. It is the mutator which ensures that the black set has no reference to the white set.
Reading: How to Use Pointers in Go
Final Thoughts on the Go Tricolor Algorithm
The actual garbage collection process is more complicated then stated here, but you perhaps get an idea for how the Tricolor scheme is used with the garbage collection routine in Go. As the garbage collection cycle runs in the background at its suitable time, Golang programmers can force clean-up in code by invoking the runtime.GC() statement. This, however, is not a good idea because it may block the caller and the entire program may be halted, especially when the program is busy with many objects. In such a situation, the garbage collector has a hard time identifying objects and marking them as black, gray, or white in the rapidly changing scenario. So the best idea is to leave it to the expert automatic garbage collector running in the background and focus on the program logic at hand.
read more Go and Golang programming tutorials.