Here we’ll continue explaining how Crystal assigns types to each variable and expression of your program. This post is a bit long, but in the end it’s just about making Crystal behave in the way that is most intuitive for the programmer, and as similar as possible to Ruby.
We’ll start with literals, C functions and some primitives. Then we’ll continue with flow control structures, like `if`, `while`, and blocks. Then we’ll talk about the special `NoReturn` type and type filters.
Literals
Literals have a type of their own, known by the compiler:
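As a small sketch of this, `typeof` can be used to ask the compiler for the type of a few common literals (the selection of literals here is ours):

```crystal
# typeof gives the compile-time type of an expression.
p typeof(nil)     # => Nil
p typeof(true)    # => Bool
p typeof(1)       # => Int32
p typeof(1.5)     # => Float64
p typeof('a')     # => Char
p typeof("hello") # => String
p typeof(:hello)  # => Symbol
```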
C functions
When you define a C function you must tell the compiler its types:
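A minimal sketch of such a declaration, assuming we bind C’s `abs` (the lib name `LibExample` is made up for this example):

```crystal
# LibExample is a hypothetical name; abs is the C standard library function.
lib LibExample
  fun abs(value : Int32) : Int32
end

# The declared return type is what the compiler uses for the call.
result = LibExample.abs(-5)
p typeof(result) # => Int32
```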
allocate
The allocate primitive gives you an uninitialized instance of an object:
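For instance, with a hypothetical `Point` class, something along these lines:

```crystal
class Point
  def initialize(@x : Int32, @y : Int32)
  end
end

# allocate skips initialize entirely; the compiler still knows the type.
point = Point.allocate
p typeof(point) # => Point
```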
You don’t normally invoke it directly. Instead, you invoke `new`, which the compiler automatically generates as something like this:
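A rough hand-written equivalent could look as follows. The method is called `my_new` here (a made-up name) so it doesn’t clash with the real `new`, and `Point` is again a hypothetical class:

```crystal
class Point
  getter x, y

  def initialize(@x : Int32, @y : Int32)
  end

  # Roughly what the generated `new` does: allocate, initialize, return.
  def self.my_new(x, y)
    instance = allocate
    instance.initialize(x, y)
    instance
  end
end

point = Point.my_new(1, 2)
p point.x # => 1
```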
A similar primitive is `Pointer#malloc`, which gives you a typed pointer to a memory region:
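A short sketch, assuming we want room for ten `Int32` values:

```crystal
# The element type is part of the pointer's type.
ptr = Pointer(Int32).malloc(10)
ptr[0] = 42

p typeof(ptr) # => Pointer(Int32)
p ptr[0]      # => 42
```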
Variables
Next, when you assign an expression to a variable, the variable will be bound to that expression’s type (if the expression’s type changes, so does the variable’s type).
The compiler tries to be as smart as possible when you use variables. For example, you can assign multiple times to a variable:
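A four-line sketch of this (the values are only illustrative):

```crystal
a = 1       # a is an Int32
a.abs       # fine: Int32 has abs
a = "hello" # a is now a String
a.size      # fine: String has size
```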
To achieve this, the compiler remembers which expression was the last one assigned to a variable. In the above example, after the first line the compiler knows that `a` has type `Int32`, so a call to `abs` is valid. In the third line we assign a `String` to it, so the compiler remembers this and, on the fourth line, it’s perfectly valid to invoke `size` on it.
Additionally, the compiler remembers that both an `Int32` and a `String` were assigned to `a`. When generating LLVM code, the compiler will represent `a` as a union type that can be Int32 or String. It would be something like this in C:
    struct Int32OrString {
      int type_id;
      union {
        int int_value;
        string string_value;
      } data;
    }
This might seem inefficient if we continually assign different types to the same variable. However, the compiler knows that when you invoked `abs`, `a` was an Int32, so it never checks the `type_id` field: it directly uses the `int_value` field. LLVM notices this and optimizes this out, so in the generated code there will never be a union (the `type_id` field is never read).
Going back to Ruby, if you assign a variable multiple times in a row, the last value (and type) is the one that counts for subsequent calls. Crystal mimics this behaviour. A variable then just becomes a name for the last expression that was assigned to it.
If
Let’s take a piece of Ruby code and analyze it:
In Ruby, the only line that can fail at runtime is the last one. The first call to `abs` will never fail, as an `Int32` was assigned to `a`. The first call to `size` will also never fail, as a `String` was assigned to `a`. However, after the `if`, `a` can either be an `Int32` or a `String`.
So Crystal tries to keep this intuitive reasoning about `a`’s type. When a variable is assigned inside an `if`’s then or else branch, the compiler knows that it will continue to have that type until the `if` ends or until it is assigned a new expression. When an `if` ends, the compiler will let `a` have the union of the types of the last expressions assigned to it in each branch.
The last line in Crystal will give a compiler error: “undefined method ‘size’ for Int32”. That’s because even though `String` has a `size` method, `Int32` doesn’t.
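Putting this together, here is a sketch of how such a snippet behaves in Crystal. `some_condition` is our stand-in for any Bool that is only known at runtime, and the failing call is left commented out so the rest compiles:

```crystal
some_condition = 2 > 1 # stands in for any Bool only known at runtime

if some_condition
  a = 1
  a.abs  # a is an Int32 here
else
  a = "hello"
  a.size # a is a String here
end

p typeof(a) # => (Int32 | String)
# a.size    # compile error: undefined method 'size' for Int32
```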
In designing the language we had two choices: make the above a compile-time error (like now) or just make it a runtime error (like in Ruby). We believe it’s better to make it a compile-time error. In some cases you might know better than the compiler and be sure that a variable has the type you think it has. But in other cases the compiler will let you know that you overlooked a case or some logic, and you’ll thank it for that.
The `if` has some more cases to take into account. For example, the variable `a` might not exist before the `if`. In this case, if it’s not assigned in one of the branches, at the end of the `if` it will also contain the `Nil` type if it’s read:
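A minimal sketch, where `some_condition` is again just a stand-in (false here, so the assignment never runs):

```crystal
some_condition = 1 > 2 # stand-in for a runtime Bool

if some_condition
  a = 1
end

# a was only assigned in one branch, so it may also be nil.
p typeof(a) # => (Int32 | Nil)
```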
This, again, mimics Ruby’s behaviour.
Finally, an `if`’s type is the union of the last expressions in both branches. If a branch is missing, it’s considered to have a `Nil` type.
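To see both cases, we can assign the `if` itself to a variable (the names `b` and `c` are ours):

```crystal
some_condition = 2 > 1 # stand-in for a runtime Bool

b = if some_condition
      1
    else
      "hello"
    end

# No else branch: the missing branch counts as Nil.
c = if some_condition
      1
    end

p typeof(b) # => (Int32 | String)
p typeof(c) # => (Int32 | Nil)
```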
While
A `while` is in a way similar to an `if`:
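A small sketch of the idea, with `some_condition` as a stand-in that happens to be false so the body never runs:

```crystal
some_condition = 1 > 2 # stand-in for a runtime Bool

a = 1
while some_condition
  a = "hello"
end

# The body may run zero times, so Int32 is still possible.
p typeof(a) # => (Int32 | String)
```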
That’s because `some_condition` might be falsy the first time.
However, since a `while` is a loop there are some more things to consider. For example, the last expression assigned to a variable inside a `while` determines the type of that variable in the next iteration. In this way, the type at the beginning of the loop will be a union of the type before the loop and the type at the end of the loop body:
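For example, in the following sketch (the counter `i` is only there to make the loop terminate):

```crystal
a = 1
i = 0
while i < 2
  # Int32 on the first iteration, String on later ones:
  p typeof(a) # prints (Int32 | String) on every iteration
  # a.size    # would not compile: Int32 has no size
  a = "hello"
  i += 1
end
```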
Some other things to consider inside a `while` are `break` and `next`. A `break` adds the types right before the `break` to the types at the exit of the `while`:
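A sketch of this, with made-up values and a counter so that the `break` is actually reached:

```crystal
a = 1
i = 0
while i < 3
  i += 1
  a = "hello"
  break if i == 2 # a is a String right before the break
  a = 1.5
end

# After the loop a can be an Int32 (zero iterations), a Float64
# (end of the body) or a String (because of the break).
p a # => "hello"
```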
A `next` adds the types right before it to the types at the beginning of the `while`:
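Again as a sketch with made-up values:

```crystal
a = 1
i = 0
while i < 3
  i += 1
  # Here a can be an Int32 (before the loop), a String (from the next)
  # or a Float64 (end of the body), so only common methods are allowed.
  a.to_s
  if i == 1
    a = "hello"
    next # the String flows back to the beginning of the loop
  end
  a = 1.5
end

p a # => 1.5
```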
Blocks
Blocks are very similar to a `while`: they can be executed zero or more times. So the logic for variables’ types is very similar to that of a `while`.
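A short sketch using `Int32#times`, whose block may run any number of times as far as the compiler is concerned:

```crystal
a = 1
3.times do
  a = "hello"
end

# The compiler doesn't assume the block ran, so Int32 is still possible.
p typeof(a) # => (Int32 | String)
p a         # => "hello"
```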
NoReturn
There’s a mysterious type in Crystal called `NoReturn`. One example of something that has this type is C’s `exit` function:
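A binding for it might be declared like this (the lib name `LibExample` is made up for this sketch, and the call is only typed, never executed):

```crystal
# LibExample is a hypothetical name for the binding.
lib LibExample
  fun exit(status : Int32) : NoReturn
end

# typeof only asks for the type; it doesn't run the call.
p typeof(LibExample.exit(0)) # => NoReturn
```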
Another very useful method that is `NoReturn` is `raise`: raising an exception.
The type basically means: after this point there’s nothing else. Nothing gets returned, and nothing that comes afterwards is executed (of course, a `rescue` will be executed if there’s one surrounding the code, but the normal path won’t be executed).
The compiler knows about `NoReturn`. For example, take a look at the following code:
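A sketch of such a case, where `some_condition` is a stand-in that is false here so nothing is raised:

```crystal
some_condition = 1 > 2 # stand-in for a runtime Bool

if some_condition
  raise "some_condition was true"
else
  a = 2
end

# The then branch never reaches this point, so a is not nilable.
p typeof(a) # => Int32
p a.abs     # => 2
```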
Remember that after an `if` a variable’s type is the union of the types of both branches. However, since the first branch ends there, because `raise` is `NoReturn`, the compiler knows that code after the `if`, if that branch is taken, will never be executed. So it can definitely say: `a` will only have the type of the `else` branch.
The same logic applies when you have `return`, `break` or `next` inside an `if`.
Also, when you define a method whose body has type `NoReturn`, that method is in turn `NoReturn`:
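For instance, with a hypothetical helper called `fail_with`:

```crystal
# The body always raises, so the method's type is NoReturn.
def fail_with(message)
  raise "failed: #{message}"
end

# typeof only asks for the type; it doesn't run the call.
p typeof(fail_with("oops")) # => NoReturn
```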
Union of NoReturn
Remember that an `if`’s type is the union of the last expressions of the `if`’s branches.
What is the type of the following `if` (and consequently of the `a` variable)?
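A sketch of such an `if`, with `some_condition` as a stand-in that is false here so nothing is raised:

```crystal
some_condition = 1 > 2 # stand-in for a runtime Bool

a = if some_condition
      raise "oops"
    else
      1
    end

p typeof(a) # => Int32
```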
Well, the `then` branch is definitely `NoReturn`. The `else` branch is definitely `Int32`. We could conclude then that `a` has type `NoReturn` or `Int32`.
However, `NoReturn` means that nothing gets executed afterwards. So `a` can only be `Int32` at the end of the previous snippet, and that’s how the compiler behaves.
With this we can implement a little method called `not_nil!`. Here it is:
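A possible implementation is sketched below. It is named `my_not_nil!` (a made-up name) so that it doesn’t replace the method the standard library already provides, and the error message is also our own:

```crystal
class Object
  # Any non-nil value simply returns itself.
  def my_not_nil!
    self
  end
end

struct Nil
  # For nil the method never returns: its type is NoReturn.
  def my_not_nil!
    raise "nil assertion failed"
  end
end

some_condition = 2 > 1 # stand-in for a runtime Bool
a = some_condition ? 1 : nil

p typeof(a)             # => (Int32 | Nil)
p typeof(a.my_not_nil!) # => Int32
p a.my_not_nil!.abs     # => 1
```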
`a`’s type is `Int32` or `Nil`. One thing that we didn’t say yet is that when you have a union type and you invoke a method on it, and all types respond to that method, the resulting type is the union of the return types of that method for each type.
In this case, `a.not_nil!` will have the type `Int32` if `a` is `Int32`, or `NoReturn` if it’s `Nil` (because of the `raise`). Combining these types just gives `Int32`, so the above code is perfectly valid. And that’s how you can discard `Nil` from a variable and turn it into a runtime exception if it turns out to be `nil`. No special language construct is needed. Everything is done with the logic explained so far.
Type filters
Now, what if we want to execute a method on a variable whose type is `Int32` or `Nil`, but only if that variable is `Int32`? If it’s `Nil`, we don’t want to do anything.
We can’t use `not_nil!`, because that will raise a runtime exception when nil.
We can define another method, `try`:
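One way to write it is sketched below, under the made-up name `my_try` so it doesn’t clash with the standard library’s own `try`:

```crystal
class Object
  # Non-nil values are passed to the block.
  def my_try
    yield self
  end
end

struct Nil
  # nil ignores the block and stays nil.
  def my_try(&)
    nil
  end
end

some_condition = 2 > 1 # stand-in for a runtime Bool
a = some_condition ? -1 : nil

b = a.my_try &.abs
p typeof(b) # => (Int32 | Nil)
p b         # => 1
```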
(if you are not sure what `&.abs` means, read this)
Since doing something depending on whether a value is `Nil` or not is so common, Crystal provides another way to do the above. This was briefly explained here, but now we’ll explain it better and combine it with the previous explanations.
If a variable is an `if`’s condition, the compiler assumes the variable is not `nil` inside the `then` branch:
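A minimal sketch (the `result` variable is only there so the value can be inspected afterwards):

```crystal
some_condition = 2 > 1 # stand-in for a runtime Bool
a = some_condition ? -1 : nil

result = 0
if a
  # Inside this branch a is an Int32, so abs is valid.
  result = a.abs
end

p result # => 1
```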
This makes sense: if `a` is truthy then it means it is not `nil`. Not only this, but the compiler also keeps that filtered type for `a` after the `if`, combined with whatever type `a` has in the `else` branch. For example:
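A sketch of this, with `some_condition` false so the `else` branch is the one that runs:

```crystal
some_condition = 1 > 2 # stand-in for a runtime Bool
a = some_condition ? 1 : nil

if a
  a.abs # a is an Int32 here
else
  a = 2 # a was nil, now it is an Int32
end

# Both branches leave a as an Int32.
p typeof(a) # => Int32
p a.abs     # => 2
```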
Just like a programmer expects the above to always work in Ruby (never raise an “undefined method” error at runtime), so it works in Crystal.
We call the above a “type filter”: `a`’s type got filtered inside the `if`’s `then` branch by removing `Nil` from the possible types `a` can have.
Another type filter happens when you do `is_a?`:
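For example, in this sketch (values and names are ours):

```crystal
some_condition = 2 > 1 # stand-in for a runtime Bool
a = some_condition ? -1 : "hello"

result = 0
if a.is_a?(Int32)
  # a is filtered to Int32 here, so abs is valid.
  result = a.abs
end

p result # => 1
```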
And another type filter happens when you do `responds_to?`:
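A similar sketch, this time filtering by the methods a type has:

```crystal
some_condition = 2 > 1 # stand-in for a runtime Bool
a = some_condition ? -1 : "hello"

result = 0
if a.responds_to?(:abs)
  # Only Int32 responds to abs, so a is an Int32 here.
  result = a.abs
end

p result # => 1
```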
These are special methods, known by the compiler, and that’s why the compiler is able to filter the types. By contrast, the method `nil?` is not special right now, so the following won’t work:
We’ll probably make `nil?` a special method too, so it’s more consistent with the rest of the language and the above works. We’ll also probably make the unary `!` method special, not overloadable, so you could do:
Conclusion
In conclusion, as was said at the beginning of this post, we want Crystal to behave as much like Ruby as possible: if something is intuitive and makes sense to the programmer, the compiler should understand it too. For example:
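A sketch of the kind of method we have in mind (the call at the end is our own usage example):

```crystal
def foo(x)
  return unless x
  x.abs # x can't be nil here
end

some_condition = 2 > 1 # stand-in for a runtime Bool
value = some_condition ? -1 : nil

p foo(value) # => 1
p foo(nil)   # => nil
```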
The above shouldn’t give you a compile time error. The programmer knows that if `x` was `nil` inside `foo`, the method returns. It follows that `x` can never be `nil` afterwards so it’s ok to invoke `abs` on it. How does the compiler know this?
Well, first, the compiler rewrites an `unless` to an `if`:
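Written out by hand, the rewritten method would look roughly like this:

```crystal
def foo(x)
  if x
    # then branch: x is not nil here
  else
    return # else branch: the method ends here
  end
  x.abs
end

p foo(-1)  # => 1
p foo(nil) # => nil
```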
Next, inside the `then` branch of the `if` we know that `x` is not `nil`. Inside the `else` branch the method returns, so we don’t care about the type of `x` afterwards. So, after the `if`, `x` can only be of type `Int32`. This is idiomatic code in Ruby, and so it is in Crystal if we carefully follow the language rules.
We still have to talk about methods and instance variables, but this post is already long enough so that will have to be explained in a following post. Stay tuned!