Here’s a fun question: what would Go look like embedded inside of Haskell?
By “Go” here I don’t mean the full language itself but some kind of core calculus with the features that people really like about Go: namely, its synchronization primitives. This core calculus should at least have the ability to spawn new goroutines and the ability to pass data between goroutines using channels.
By “embedding” a language in Haskell, I mean that I want to re-use Haskell’s constructs for most things and only write what is specific to the language. That means only writing code involving goroutines and channels. Everything else---variables, functions, control structures---are Haskell’s. The interpreter we’re going to write for this language---let’s call it “HaskGo”---will look like a virtual machine executing a small instruction set. This makes the interpreter quite simple, since there’s no AST to traverse: HaskGo programs are just lists of instructions.
It’s not just for fun that I wanted to embed Go in Haskell. Embedding a language inside of Haskell is an easy way to model its actual implementation. There’s no lexer or parser to create, just the interpreter and the instruction set. Embedding a language in Haskell allows us to easily experiment with extending the language with new extensions. For example, determining what might channel combinators in Go look like, as we will see later on, is as easy as writing a few example programs and changing the interpreter---all within one language, Haskell.
Writing HaskGo with Free Monads
The standard way of embedding a language in Haskell is through free monads.
I’m not going to dwell too much on how free monads work since there are already
a lot of good resources that provide an introduction. I’m particularly fond of
Gabriel Gonzalez’s blog post on the topic, linked
here.
The short of it is that free monads are so named because from any functor, we
can derive a monad for it, for free.1 Free monads are useful for creating
embedded languages because they allow you to write monadic expressions that look
like regular imperative code---just like how expressions in the IO
monad look
like imperative code---but actually denote a list of instructions for our
language’s interpeter / virtual machine to execute.
With that preamble, let’s get to the code. Here’s the datatype representing the instruction set of our language:
As promised, the instruction set is simple. GoRun
denotes the instruction to
spawn off a new goroutine. GoMakeChan
denotes the creation of a new channel.
GoPutChan
and GoGetChan
denote putting and and getting values from a
channel respectively. GoSelect
implements Go’s select
control structure,
which allows the user to wait on a list of channels instead of just one. If
there is a branch waiting on a channel that is not blocked, select
jumps to
that branch; otherwise it blocks until one of the branches is unblocked.
GoPrint
denotes an instruction to print something on the screen. GoProgram
is a type synonym for the free monad for our GoCmd
instruction set.
GoChan
, the type of the channel value that the user has access to, is merely
an Int
identifier that the runtime uses, as we will see below.
Some things to note. All of the constructors have a next
argument. This
argument allows the free monad to build a list of instructions. GoMakeChan
and
GoGetChan
have (GoChan -> next)
and (GoExpr -> next)
arguments instead of
just next
because the next instruction for these expects a new variable bound
to the value of the new channel or value extracted from the channel
respectively. To test your understanding of how free monads work, as yourself
why the goroutine argument in GoRun
has type GoProgram ()
instead of just
next
.
The channels aren’t typed---or, rather, are unityped: they only pass GoExpr
s
between goroutines. The channels can only store one value at a time---this is
not apparent in the datatype, but it will be reflected in the implementation of
the interpreter. We can add channel types and channel buffers to the language
quite easily, but we’re going to punt on those for now.
Here’s some free monad boilerplate. These are the monadic functions we will use to build HaskGo programs. I will present these with no further comment---read Gabriel’s blog post linked above to understand how this works!
An interesting thing to note is that select
as implemented above is strictly
more expressive than in Go, as the list of branches on which it waits is just a
regular value. That means it can block on a list of branches given as an
argument to a function, something that is impossible in Go.
Here’s what a HaskGo program looks like, using the functions above. The following is an elegant solution to the classic producer-consumer problem:
While distinctly Haskell in syntax, if you squint a little it does indeed have
Go-like synchronization primitives: you can create a channel and bind a
variable to it (i.e. channels are first-class); you can spawn off goroutines
just with the go
command; and you can get and put values into channels.
Now for the interpreter proper. The interpreter will be, essentially, a simulation of the Go runtime. What kind of data does the HaskGo runtime need to keep track of? The runtime must handle goroutines and channels, so we must at least have a list of each. We also need a map from channels to values put in them. For scheduling purposes we also need a ready queue, as well as wait queues for each channel to keep track of goroutines blocked on it. So the runtime state should look something like this:
This should be almost self-explanatory. curGoroutine
is the current goroutine
scheduled; nextGoroutine
and nextPort
are the identifiers of the next
created goroutine and channel respectively. goroutines
is a map from
goroutines to the program they are executing; portVals
is a map from channels
to the (possible) values they hold. readyQueue
and waitQueues
keep track of
scheduling information as described above. (Why the runtime state keeps data on
ports instead of channels will become clear later when we extend HaskGo.)
Finally, the HaskGo interpreter. First, some helper functions:
Notice that the interpreter is in a monad transformer stack consisting of IO
(to print to stdout
), State
(to track runtime state) and Except
(to handle
errors). scheduleNextGoroutine
is, as its name implies, the scheduling
function. It implements a basic round-robin scheduler. wakeFromWaitQueue
and
addToWaitQueue
either removes or adds a goroutine from channel wait queues.
The only thing that might give pause is the cont
argument to
addToWaitQueue
, which contains the rest of the computation (the continuation)
that the goroutine needs to execute. If we do not update this, the next time
the goroutine is scheduled it will restart execution of the computation from
the beginning!
Once these helper functions are in place, the actual interpreter is quite straightforward.
GoRun
and GoMakeChan
create goroutines and channels respectively. There’s
just a lot of code to update the runtime state, but otherwise these are pretty
straightfoward.
Next are GoPutChan
and GoGetChan
.
These are duals of each other. GoPutChan
puts a value in the channel if it is
empty, and blocks the goroutine otherwise. GoGetChan
extracts a value from the
channel if it is full, and blocks the goroutine otherwise. When either operation
is successful (i.e. doesn’t block), we wake goroutines that are blocked on the
channel.
A subtlety: notice that when the goroutine is blocked on the channel, it stores
prog
as the continuation instead of next
. This is so because prog
contains
the current instruction being executed, and we want to try executing the
instruction again when it is woken up. For example, if executing GoGetChan
blocks on an empty channel, we want to execute GoGetChan
again when the
goroutine is woken up and the channel is full.
select
is probably the most complicated.
There are two cases:
-
If there are branches that are unblocked, we pick the first one, and essentially execute a
GoGetChan
command: we extract the value from the channel we block on, apply the value to the branch code, and then execute it. -
If all branches are blocked, we block the goroutine.
Another subtlety: notice that we need to bind the next
argument to the
executed branch. Otherwise the rest of the instructions after select
won’t be
executed!
Finally here’s GoPrint
, which is trivial:
We now have a HaskGo interpreter. Let’s write a helper function to initialize the runtime state and run the interpreter:
Let’s see the output of the producer-consumer HaskGo program from above:
Works as expected!
How about something more complicated? Here’s a sieve of Eratosthenes!2
Let’s break down what’s going on here. source
just puts integers into a
channel in sequence, up to some maximum number. sink
extracts a value from
chan
channel, and then once it does it spawns off a pfilter
goroutine,
creates a new channel, and then extracts values from that. pfilter
extracts
values from a left
chan, checks if it is a multiple of p
, and if it isn’t
passes it to the right
channel.
Do you see what’s going on? Every time sink
extracts a prime number from
chan
, it creates a new filter between it and source
to filter out multiples
of the prime number just extracted. It looks like this:
source <--> pfilter 2 <--> pfilter 3 <--> ... <--> sink
And thus:
As you can see, this style of programming leads to particularly elegant solutions, even though concurrency isn’t strictly necessary to solve them. I’m feeling a little philosophical so I’m going to draw a more general lesson here. Most programs we write are structured to reflect an increasingly archaic model of hardware: a single-core processor executing instructions one at a time. We resort to concurrency only when the problem demands it---as in writing web servers or databases, for example---or to exploit multicore processors and make our programs parallelized. But what if we write concurrent programs simply because they’re the most elegant way of solving the problem at hand? How will we change how we code if, instead of treating concurrency as a tool of last resort or as an optimization, we use it pervasively? Is this aversion to concurrency another incarnation of the von Neumann bottleneck?
I’m glad Go has expanded the Overton window, so to speak, when it comes to what counts as mainstream synchronization primitives. But the design space is obviously still wide and sparse, and I’m sure there are more interesting questions to be answered in the offing.
I digress---now let’s talk about how to make channels more interesting.
HaskGo with Channel Combinators
Here’s where our choice to implement an embedded language instead of implementing one from scratch will pay dividends. Let’s say we want to implement channel combinators---functions that take a channel, does some transformation on it, and return a new channel. What kinds of combinators do we want to write? At a very high level, we can see channels as semantically being streams of values. So let’s start off with some combinators from elementary functional programming.
Here’s a first pass at chan_map
, which like regular old map
applies a
function “element-wise” to every value in the channel:
Hm. As we’ve seen in the implementation of the HaskGo interpreter, a channel
value is really just an identifier. There is no way to extract values from it
outside of the interpreter. So chan_map
actually needs to be a HaskGo
program.
Here’s a second pass at chan_map
:
A little better, though definitely not right either. This will only work as
intended if chan_map
is scheduled to run before any other goroutine that
extracts a value from chan
. Also, this will only work at most once, since it
only extracts a value from chan
once.
What we really need chan_map
to do is to spawn a new goroutine, one that
repeatedly extracts values from chan
, applies the value to f
, and then
pushes the output of f
into a new channel. In this way, goroutines pushing
values should do it to the old channel, and goroutines extracting values should
do so from the new channel.
This works, albeit awkwardly. Here’s one way to use it:
This will work as intended and print out 101 to 200. However, notice that the
two channels here are being treated as unidirectional: one channel is only
receiving input (i.e. only calls to the channel are putchan
) while the other
channel is only returning output (i.e. only calls to the channel are getchan
).
chan_map
spawned off a goroutine to pipe the input from inchan
to outchan
,
but that is (and should be) invisible to the user. It would be nice if instead
of having two unidirectional channels, we would have just one bidirectional
channel — that is, just a regular channel. We want something like this:
The source
goroutine puts values 1 to 100 in the channel, and then when
sink
extracts values out of the channel it will magically receive 101 to 200.
As currently implemented, chan_map
will not work as intended here. How will we
implement this feature?
The trick is as follows: we’re going to change what a channel value means. Currently, a channel is a key into a runtime map that may or may not hold a value. Now instead of a single key into a runtime map, instead it will be a pair of keys: one key will be used to push values into the map, and the other key will be used to get values from it.
Some diagrams might help with the explanation.
The diagram above should be pretty self-explanatory. For this to work properly
the source
function needs to be passed the inchan
channel and the sink
channel needs to be passed the outchan
channel. chan_map
currently returns
outchan
. To make it so that we can pass the same channel to both source
and
sink
, we’re going to change chan_map
so that it returns a “virtual” channel:
when a goroutine pushes a value to it, the goroutine is really pushing a value
to inchan
; when a goroutine extracts a value from it, the goroutine is really
extracting a value out of outchan
. So under the hood the same thing is
happening, but the interface to the user is a bit nicer. Like so:
Let’s see how this change looks like in code.
As mentioned above, we’re changing channel values to be pairs of keys, one of
the “inport” of the channel and another for the “outport” of the channel. Hence
the name “port” instead of “channel” in the runtime code. For the helper
functions, nothing really changes except the type annotations. Since GoChan
is now a pair these should instead take in ChanPort
arguments.
Now to changes to the interpreter proper.
There’s not much to change. GoMakeChan
should now return a pair. GoPutChan
should put values in the first port of the channel. GoGetChan
should put
values in the second port of the channel. That’s pretty much it!
Now, we can finally write chan_map
.
And voila!
Here’s a particularly useful combinator, chan_partition
, and a nice
use for it.
And as you’d expect:
In fact, you can basically just go through Haskell’s Data.List library and implement most of the functions there as channel combinators. It’s a fun exercise---I recommend doing it.
A nice improvement for the future would be to do some kind of deforestation analysis on channel combinators. For example, consider the channel
Instead of creating two intermediate goroutines, we can optimize this to just have one:
I think we can get away with having at most one intermediate goroutine that does the transformation between input and output channels, no matter how many channel transformations we bind together. But it would involve adding extra state to the runtime that keeps track of what goroutines feed from and push to which channels.
That’s it for now. If you want to see the full source code for HaskGo, see this gist. Until next time!
Footnotes
-
“Free” as in beer (gratis). I’m a little fuzzy on the category theory details, but I think free monads are also free, as in speech (libre). Here is a discussion on StackOverflow to whet your appetite. ↩
-
NB: I originally encountered this implementation of the sieve in a class for which I was a TA, Cornell’s CS 4410 (Operating Systems). It was a testcase for a project that involved implementing a threading library in C. The original C code had to manually manage channels using semaphores. The HaskGo version is a lot simpler, since the semaphores are managed by the runtime: you can see calls to
addToWaitQueue
andwakeFromWaitQueue
in the interpreter as calls toP()
andV()
to channel semaphores. ↩