Let’s talk about what Crystal does with your code: how it represents types in memory, how it does method lookup at runtime, how it does method dispatch, etc. When using a programming language it’s always useful to know this in order to structure our code in the most efficient way, and to precisely understand what our code will be transformed to.
How types are represented in memory
For talking about type representations we will use C and LLVM IR code, so be sure to check the reference.
Crystal has built-in types, user defined types and unions.
These are Nil, Bool, Char and the various number types (Int32, Float64, etc.), Symbol, Pointer, Tuple, StaticArray, Enum, Proc and Class.
Let’s check how a Bool is represented. For this, let’s write this small program:
To see the generated LLVM we can use this command:
crystal build test.cr --emit llvm-ir --prelude=empty
--emit llvm-ir flag tells the compiler to dump the resulting LLVM IR code to a test.ll file.
--prelude=empty tells the compiler to not use
the default prelude file, which, for example,
initializes the GC.
In this way we can get a very simple and clean LLVM IR code file with just the code we write:
The gist is in
__crystal_main: we can see the compiler allocates an
i1 in the stack for
x and then stores
true in it.
That is, the compiler represents a Bool as a single bit, which is pretty efficient.
Let’s do the same for an int:
x this time we will get:
In LLVM, and i32 is an int represented with 32 bits, which, again, is pretty efficient and what we would expect the representation
Int32 to be.
That is, even though Crystal is object-oriented and an Int32 behaves like an object (it has methods), its internal representation is as efficient as possible.
Let’s see Symbol now:
Let’s see the full LLVM IR code this time:
Three things are important here. First, we can see that a Symbol is represented as
i32, that is, with four bytes. Second,
we can see
x is assigned a value of 0 and
y is assigned a value of 1. Third, we can see some constants at the top:
Basically, a Symbol in Crystal is just a name assigned to a unique number. A Symbol can’t be created dynamically
String#to_sym) and the only way to create them is with their literal value, so the compiler can know
all the symbols used across the program. The compiler assigns a number to each of them, starting from zero, and it also
builds a table that map their number to a string, to be able to implement
Symbol#to_s in a very efficient way.
This makes symbols very attractive to use for small groups of constants, because it’s like using magic numbers but with names instead.
A Pointer is a generic type that represents a typed pointer to some memory location. For example:
If you look at the generated LLVM IR code you will see a bunch of code. First,
x is represented like this:
Again, this is just a pointer to an int32, as it should be. Next you will see a call to
malloc (will ask memory from the GC
using the regular prelude) and
memset to clear the memory, and then some instructions to assign 1 in that memory address.
This is not very important for the subject of this blog post, so we skip it, but it’s important to know that generated code
is very similar to what would be generated in C.
A Tuple is a fixed-size, immutable sequence of values, where the types at each position are known at compile time.
Pieces of LLVM IR code for the above are:
As we can see, a tuple is represented as an LLVM structure, which just packs values sequentially. This representation of tuples allows us, for example, to decompose an Int32 into its bytes, in this way:
A StaticArray is a fixed-size, mutable sequence of values of a same type, allocated on the stack and passed by value. The prelude includes safe ways to create them, but since we are using a bare-bones prelude an unsafe (will be initialized to data containing garbage) way to create them is this:
Its LLVM representation:
We won’t talk much more about this type because it’s not used that much, mostly for IO buffers and such: Array is the recommended type for all other operations.
Here’s an enum:
An enum is, in a way, similar to Symbol: numbers associated to names so we can use names in our code instead of magic numbers. As expected, an enum is represented as an i32, that is four bytes, unless specified otherwise in its declaration:
The nice thing about enums is that you can print them and you get their name, not their value:
This is done in a different way than with Symbol, using compile-time reflection and macros.
But, basically, an enum’s
to_s method is generated only when needed. But it’s nice that an enum is memory and speed efficient
and also comfortable to use and to debug with (like, you get names instead of numbers when printing them).
A Proc is a function pointer with an optional closure data information. For example:
This is a function pointer that receives an Int32 and returns an Int32. Since it doesn’t capture any local variables it’s not a closure. But the compiler still represents it like this:
That is a pair of pointers: one containing the pointer to the real function, another one containing a pointer to the closured data.
The LLVM IR code for the above is:
A bit harder to digest than the above examples, but it’s basically assining a pointer to
~fun_literal_1 in the first
null in the second. If our Proc captures a local variable:
The LLVM IR code changes:
This is even harder to digest, but basically some memory is asked that will contain the value of the variable
the Proc receives it and uses it. In this case the memory is asked with
malloc, but with the regular prelude the memory
will be allocated by the GC and released when no longer needed.
Classes are objects too:
Not surprisingly, a class is represented as an Int32:
Because classes can’t be created at runtime, and the compiler knows all classes, it assigns a type id to them and that way it can identify them.
Users can define classes and structs. The difference is that newing a class allocates it on the heap, and a pointer to that data is passed across variables and methods, while newing a struct allocates that memory on the stack and the whole struct’s value is passed, copied, across variables and methods.
Let’s try it:
The LLVM IR code contains:
Mmm… wait! A Point has just two instance variables,
@y, both of type Int32, so why there’s another
there? Well, it turns out Crystal adds an Int32 to store a type id associated with the class. This doesn’t make much sense
right now, but when we’ll talk about how unions are represented it will make more sense.
Let’s see the same for a struct:
The LLVM IR code contains:
In this case a struct doesn’t contain the extra Int32 field for the type id.
Now comes the fun part: unions!
Crystal supports unions of arbitrary types. For example you can have a variable that has either an Int32 or a Bool:
At the end of the
if the variable
x will either be
false, which makes it type an Int32 or a Bool.
The Crystal way to talk about a union is using a pipe, like this:
Int32 | Bool. In the LLVM IR code we can find:
We can see that the representation of this particular union is an LLVM structure containing two fields. The first one will contain the type id of the value. The second one is the value itself, which is a bit array as large as the largest type in that union (due to some alignment concerns, the size is extended to 64 bits boundaries in 64 bit architectures). In C it would be:
The first field, the type id, will be used by the compiler when you invoke a method on
So, it would seem that here ends the story about how union types are represented. However, there are some unions that are very common: nilable types.
We didn’t talk about Nil previously, but since it can only contain a single value, and you can’t use
void for a value,
its represented as i1:
Let’s make now a union of nil and a class:
If we check the LLVM IR code we will see this for x:
So a union of
Point | Nil, where Point is a class, is represented in the same was as the Point class. How can
we tell if x is Nil or Point? Easy: a null pointer means it’s Nil, a non-null pointer means it’s a Point.
In fact, all unions that only involve classes and/or nil are always represented as a single pointer. If it’s a null pointer, it’s Nil. Otherwise, if the union contains many possible classes, we can know the type with the first member of the value, an Int32, remember? Having all of these unions be represented as pointers makes the code much more efficient, as pointers fit in registers and occupy very little memory.
However, a union of Nil and a struct will always be represented as a tagged union, like the
Int32 | Bool case.
But these unions are much less common.
Now that we understand how types are represented and how, at runtime, we can know what type is contained in a union, let’s talk about method dispatch.
Although Crystal is object-oriented, method lookup and dispatch work very different than other object-oriented languages. For example, there are no virtual tables and no metadata stored for types (except that type id field we talked before). We try to minimize the runtime data needed for a program to work, and also maximize its speed of execution, sometimes sacrificing the resulting binary size (which doesn’t grow a lot, either). For example, let’s consider this class hierarchy:
Wow, a big class hierarchy and even an included module, and two definitions for
foo. By looking at the code,
can you know which
foo method will get invoked in this case?
Moo#foo, right? Yes, indeed. Well, it turns out the compiler knows this too, and if you take a look
at the generated code you will see something like this:
What happens if we create an instance of Bar and we invoke
foo on it too:
Now the LLVM IR code contains this:
Oops, isn’t there a duplicated definition of
foo there? Well, yes. You can think as if the compiler
copied foo’s definition into each class, and so there will be, indeed, many copies of the same method.
But this doesn’t matter much: most methods are not big, and method call speed is much more important. Furthermore,
small methods get inlined anyway in an optimized build, and there’s even an LLVM transformation pass to
detect duplicated functions and merge them.
Of course, the story changes a bit if
Moo#foo invokes an instance method or uses an instance variable. In this
case the “duplicated” methods will actually be different, again, as if we copied each method definition
into each type that finally contains it. This makes method call as efficient as possible, at the cost
of (possibly) increasing executable size. But end users are usually more concerned about speed than
All of the above is possible because the compiler knows the exact type of
Bar.new. What happens
if the compiler doesn’t know this? Let’s start with a simple union where types are not classes
in the same hierarchy:
This time the compiler will generate code that more or less does this: before invoking
check what type is
obj. This can be known by loading the first field (the type id) of the pointer
that represents the object. Then based on this we invoke one method or another. The decision for this
is just one memory load and a comparison: very efficient. For a bigger union it would still be one memory
load or just reading the type id field of a union, and then many comparisons. But… wouldn’t a lookup
table be faster?
Well, it turns out LLVM is pretty smart, and when it detects many comparisons it can sometimes build a lookup table for us. For this to work better, the numbers inside the lookup table must be close to each other (imagine a lookup table for the values 1 and 1000000, it would take a lot of space so LLVM would decide to do comparisons in that case). Luckily, we assign type ids in a way that helps LLVM achieve this.
When we say
big unions chances are that that union contains classes of the same hierarchy: you usually
build a class hierarchy to make all types follow a certain rule, respond to a similar set of methods.
Although you can do this without a class hierarchy, they are a very common way of structuring code.
Consider this class hierarchy:
Considering these types only, the compiler assigns type ids in a post-order way: first Baz gets assigned 1, then Qux gets assigned 2, then Bar gets assigned 3, and finally Foo gets assigned 4. Also, the compiler tracks the range of type ids of a type’s subtypes, including itself, so for Bar it also assigns the range 1-3, and for Foo it assigns the range 1-4.
Now, consider this:
First, the compiler will type
Foo+, meaning it can be Foo or one of its sublcasses (read
more about this here).
In this case, there will be only two different method instantiations: one for
Foo+ and one for
Baz and Qux don’t redefine that method. To know which one we need to call, we load the type id. Then, instead
of having to say “if the type id is that of Bar, or Baz or Qux, call Bar#foo, otherwise call Foo#foo`, we
can simply check if the type id is in the range previously assigned to Bar (1-3): just two comparisons.
This range check also works with
is_a?. When you do
obj.is_a?(Foo), and maybe
obj is an Int32 or, Foo,
or one of its subclasses, we can solve this with at most two comparisons.
Finally, an interesting aspect of Crystal is that method dispatch happens based on possibly many types:
And this also works if the type of all arguments is not known at compile time. But… this blog post is getting a bit long and complex by now: there are many more micro-optimizations that we apply to your code to make it as efficient as possible. So don’t be afraid to use Crystal to its full potential :-)