authors/asterite.jpg Ary Borenzweig 27 Apr 2014

Type inference rules

Here we’ll continue explaining how Crystal assigns types to each variable and expression of your program. This post is a bit long, but in the end it’s just about making Crystal behave in the most intuitive way for the programmer, to make it behave as similar as possible to Ruby.

We’ll start with literals, C functions and some primitives. Then we’ll continue with flow control structures, like if, while, and blocks. Then we’ll talk about the special NoReturn type and type filters.

Literals

Literals have a type of their own, known by the compiler:

true     # Boolean
1        # Int32
"hello"  # String
1.5      # Float64

C functions

When you define a C function you must tell the compiler its types:

lib C
  fun sleep(seconds : UInt32) : UInt32
end

sleep(1_u32) # sleep has type UInt32

allocate

The allocate primitive gives you an uninitialized instance of an object:

class Foo
  def initialize(@x)
  end
end

Foo.allocate # Foo.allocate has type Foo

You don’t normally invoke it directly. Instead, you invoke new, which is automatically generated by the compiler to something like this:

class Foo
  def self.new(x)
    foo = allocate
    foo.initialize(x)
    foo
  end

  def initialize(@x)
  end
end

Foo.new(1)

A similar primitive is Pointer#malloc, which gives you a typed pointer to a memory region:

Pointer(Int32).malloc(10) # has type Pointer(Int32)

Variables

Next, when you assign an expression to a variable, the variable will be bound to that expression’s type (if the expresision’s type changes, so the variable’s type changes).

a = 1 # 1 is Int32, so a is Int32

The compiler tries to be as smart as possible when you use variables. For example, you can assign multiple times to a variable:

a = 1       # a is Int32
a.abs       # ok, Int32 has a method 'abs'
a = "hello" # a is now String
a.size    # ok, String has a method 'size'

To achieve this, the compiler remembers which expression was the last one assigned to a variable. In the above example, after the first line the compiler knows that a has type Int32, so a call to abs is valid. In the third line we assign a String to it, so the compiler remembers this and, on the fourth line, it’s perfectly valid to invoke size on it.

Additionally, the compiler remembers that both an Int32 and a String were assigned to a. When generating LLVM code, the compiler will represent a as a union type that can be Int32 or String. It would be something like this in C:

struct Int32OrString {
  int type_id;
  union {
    int int_value;
    string string_value;
  } data;
}

This might seem inefficient if we continually assign different types to the same variable. However, the compiler knows that when you invoked abs, a was an Int32, so it never checks the type_id field: it directly uses the int_value field. LLVM notices this and optimizes this out, so in the generated code there will never be a union (the type_id field is never read).

Going back to Ruby, if you assign a variable multiple times in a row, the last value (and type) is the one that counts for subsequent calls. Crystal mimics this behaviour. A variable then just becomes a name for the last expression that we assigned to.

If

Let’s take a piece of Ruby code and analyze it:

if some_condition
  a = 1
  a.abs
else
  a = "hello"
  a.size
end
a.size

In Ruby, the only line that can fail at runtime is the last one. The first call to abs will never fail, as an Int32 was assigned to a. The first call to size will also never fail, as a String was assigned to a. However, after the if, a can either be an Int32 or a String.

So Crystal tries to keep this intuitive reasoning about a’s type. When a variable is assigned inside an if’s then or else branch, the compiler knows that it will continue to have that type until the if ends or until it is assigned a new expression. When an if ends, the compiler will let a have the type of the last expressions that it was assigned to in each branch.

The last line in Crystal will give a compiler error: “undefined method ‘size’ for Int32”. That’s because even though String has a size method, Int32 doesn’t.

In designing the language we had two choices: make the above a compile-time error (like now) or just make it a runtime error (like in Ruby). We believe it’s better to make it a compile-time error. In some cases you might know better than the compiler and you will be sure that a variable has the type that you might think. But in some cases the compiler will let you know that you overlooked a case or some logic, and you’ll thank it for that.

The if has some more cases to take into account. For example, the variable a might not exist before the if. In this case, if it’s not assigned in one of the branches, at the end of the if it will also contain the Nil type if it’s read:

if some_condition
  a = 1
end
a # here a is Int32 or Nil

This, again, mimics Ruby’s behaviour.

Finally, an if’s type is the union of the last expressions in both branches. If a branch is missing, it’s considered to have a Nil type.

While

A while is in a way similar to an if:

a = 1
while some_condition
  a = "hello"
end
a # here a is Int32 or String

That’s because some_condition might be falsy the first time.

However, since a while is a loop there are some more things to consider. For example, the last expression assigned to a variable inside a while determines the type of that variable in the next iteration. In this way, the type at the beginning of the loop will be a union of the type before the loop and the type after the loop:

a = 1
while some_condition
  a           # here a is actually Int32 or String
  a = false   # here a is Bool
  a = "hello" # here a is String
  a.size    # ok, a is String
end
a             # here a is Int32 or String

Some other things to consider inside a while are break and next. A break makes the types right before the break add to the type at the exit of the while:

a = 1
while some_condition
  a             # here a is Int32 or Bool
  if some_other_condition
    a = "hello" # we break, so at the exit a can also be String
    break
  end
  a = false     # here a is Bool
end
a               # here a is Int32 or String or Bool

A next adds the type to the beginning of the while:

a = 1
while some_condition
  a             # here a is Int32 or String or Bool
  if some_other_condition
    a = "hello" # we next, so in the next iteration a can be String
    next
  end
  a = false     # here a is Bool
end
a               # here a is Int32 or String or Bool

Blocks

Blocks are very similar to a while: they can be executed zero or more times. So the logic for variables’ types is very similar to that of a while.

NoReturn

There’s a mysterious type in Crystal called NoReturn. One such example is C’s exit function:

lib C
  fun exit(status : Int32) : NoReturn
end

C.exit(1)    # this is NoReturn
puts "hello" # this will never be executed

Another very useful method that is NoReturn is raise: raising an exception.

The type basically means: after this point there’s nothing else. Nothing gets returned, and nothing that comes afterwars is executed (of course, a rescue will be executed if there’s one surrounding the code, but the normal path won’t be executed).

The compiler knows about NoReturn. For example, take a look at the following code:

a = some_int
if a == 1
  a = "hello"
  puts a.size # ok
  raise "Boom!"
else
  a = 2
end
a # here a can only be Int32

Remember that after an if a variable’s type is the union of the types of both branches. However, since the first branch ends there, because raise is NoReturn, the compiler knows that code after the if, if that branch is taken, will never be executed. So it can definitely say: a will only have the type of the else branch.

The same logic applies when you have return, break or next inside an if.

Also, when you define a method whose type is NoReturn, that method is in turn NoReturn:

def raise_boom
  raise "Boom!"
end

if some_condition
  a = 1
else
  raise_boom
end
a.abs # ok

Union of NoReturn

Remember that an if’s type is the union of the last expressions of the if’s branches.

What type has the following if (and consequently the a variable)?

a = if some_condition
      raise "Boom!"
    else
      1
    end
a # a is...?

Well, the then branch is definitely NoReturn. The else branch is definitely Int32. We could conclude then that a has type NoReturn or Int32. However, NoReturn means that nothing gets executed afterwards. So a can only be Int32 at the end of the previous snippet, and that’s how the compiler behaves.

With this we can implement a little method called not_nil!. Here it is:

class Object
  def not_nil!
    self
  end
end

class Nil
  def not_nil!
    raise "Nil assertion failed"
  end
end

a = some_condition ? 1 : nil
a.not_nil!.abs # compiles!

a’s type is Int32 or Nil. One thing that we didn’t say yet is that when you have a union type and you invoke a method on it, and all types respond to that method, the resulting type is the union of the types of each method.

In this case, a.not_nil! will have the type Int32 if a is Int32, or NoReturn if it’s Nil (because of the raise). Combining these types just gives Int32, so the above code is perfectly valid. And that’s how you can discard Nil from a variable and turn it into a runtime exception if it turns out to be nil. No special language construct is needed. All is made with the logic explained so far.

Type filters

Now, what if we want to execute a method on a variable whose type is Int32 or Nil, but only if that variable is Int32. If it’s Nil, we don’t want to do anything.

We can’t use not_nil!, because that will raise a runtime exception when nil.

We can define another method, try:

class Object
  def try
    yield self
  end
end

class Nil
  def try(&block)
    nil
  end
end

a = some_condition ? 1 : nil
b = a.try &.abs # b is Int32 or Nil

(if you are not sure what &.abs means, read this)

Since doing something depending on whether a value is Nil or not is so common, Crystal provides another way to do the above. This was shortly explained here, but now we’ll explain it better and combine it with the previous explanations.

If a variable is an if’s condition, the compiler assumes the variable is not nil inside the then branch:

a = some_condition ? 1 : nil # a is Int32 or Nil
if a
  a.abs                      # a is Int32
end

This makes sense: if a is truthy then it means it is not nil. Not only this, but the compiler also makes a’s type be that one after the if, combined with whatever type a has in the else branch. For example:

a = some_condition ? 1 : nil
if a
  a.abs   # ok, here a is Int32
else
  a = 1   # here a is Int32
end
a.abs     # ok, a can only be Int32 here

Just like a programmer expects the above to always work in Ruby (never raise an “undefined method” error in runtime), so it works in Crystal.

We call the above a “type filter”: a’s type got filtered inside the if’s then branch by removing Nil from the possible types a can have.

Another type filter happens when you do is_a?:

a = some_condition ? 1 : nil
if a.is_a?(Int32)
  a.abs # ok
end

And another type filter happens when you do responds_to?:

a = some_condition ? 1 : nil
if a.responds_to?(:abs)
  a.abs # ok
end

These are special methods, known by the compiler, and that’s why the compiler is able to filter the types. On the contrary, the method nil? is not special right now so the following won’t work:

a = some_condition ? 1 : nil
if a.nil?
else
  a.abs # should be ok, but now gives error
end

We’ll probably make nil? a special method too, so it’s more consistent with the rest of the language and the above works. We’ll also probably make the unary ! method special, not overloadable, so you could do:

a = some_condition ? 1 : nil
if !a
else
  a.abs # should be ok, but now gives error
end

Conclusion

In conclusion, as was said in the beginning of this post, we want Crystal to behave as much as possible as Ruby, and if something is intuitive and makes sense for the programmer to make the compiler understand it too. For example:

def foo(x)
  return unless x

  x.abs # ok
end

a = some_condition ? 1 : nil
b = foo(a)

The above shouldn’t give you a compile time error. The programmer knows that if x was nil inside foo, the method returns. It follows that x can never be nil afterwards so it’s ok to invoke abs on it. How does the compiler know this?

Well, first, the compiler rewrites an unless to an if:

def foo(x)
  if x
  else
    return
  end

  x.abs # ok
end

Next, inside the then branch of the if we know that x is not nil. Inside the else branch the method returns, so we don’t care about the type of x afterwards. So, after the if, x can only be of type Int32. This is idiomatic code in Ruby, and so it is in Crystal if we carefully follow the language rules.

We still have to talk about methods and instance variables, but this post is already long enough so that will have to be explained in a following post. Stay tuned!

Here we’ll continue explaining how Crystal assigns types to each variable and expression of your program. This post is a bit long, but in the end it’s just about making Crystal behave in the most intuitive way for the programmer, to make it behave as similar as possible to Ruby.

We’ll start with literals, C functions and some primitives. Then we’ll continue with flow control structures, like if, while, and blocks. Then we’ll talk about the special NoReturn type and type filters.

Literals

Literals have a type of their own, known by the compiler:

true     # Boolean
1        # Int32
"hello"  # String
1.5      # Float64

C functions

When you define a C function you must tell the compiler its types:

lib C
  fun sleep(seconds : UInt32) : UInt32
end

sleep(1_u32) # sleep has type UInt32

allocate

The allocate primitive gives you an uninitialized instance of an object:

class Foo
  def initialize(@x)
  end
end

Foo.allocate # Foo.allocate has type Foo

You don’t normally invoke it directly. Instead, you invoke new, which is automatically generated by the compiler to something like this:

class Foo
  def self.new(x)
    foo = allocate
    foo.initialize(x)
    foo
  end

  def initialize(@x)
  end
end

Foo.new(1)

A similar primitive is Pointer#malloc, which gives you a typed pointer to a memory region:

Pointer(Int32).malloc(10) # has type Pointer(Int32)

Variables

Next, when you assign an expression to a variable, the variable will be bound to that expression’s type (if the expresision’s type changes, so the variable’s type changes).

a = 1 # 1 is Int32, so a is Int32

The compiler tries to be as smart as possible when you use variables. For example, you can assign multiple times to a variable:

a = 1       # a is Int32
a.abs       # ok, Int32 has a method 'abs'
a = "hello" # a is now String
a.size    # ok, String has a method 'size'

To achieve this, the compiler remembers which expression was the last one assigned to a variable. In the above example, after the first line the compiler knows that a has type Int32, so a call to abs is valid. In the third line we assign a String to it, so the compiler remembers this and, on the fourth line, it’s perfectly valid to invoke size on it.

Additionally, the compiler remembers that both an Int32 and a String were assigned to a. When generating LLVM code, the compiler will represent a as a union type that can be Int32 or String. It would be something like this in C:

struct Int32OrString {
  int type_id;
  union {
    int int_value;
    string string_value;
  } data;
}

This might seem inefficient if we continually assign different types to the same variable. However, the compiler knows that when you invoked abs, a was an Int32, so it never checks the type_id field: it directly uses the int_value field. LLVM notices this and optimizes this out, so in the generated code there will never be a union (the type_id field is never read).

Going back to Ruby, if you assign a variable multiple times in a row, the last value (and type) is the one that counts for subsequent calls. Crystal mimics this behaviour. A variable then just becomes a name for the last expression that we assigned to.

If

Let’s take a piece of Ruby code and analyze it:

if some_condition
  a = 1
  a.abs
else
  a = "hello"
  a.size
end
a.size

In Ruby, the only line that can fail at runtime is the last one. The first call to abs will never fail, as an Int32 was assigned to a. The first call to size will also never fail, as a String was assigned to a. However, after the if, a can either be an Int32 or a String.

So Crystal tries to keep this intuitive reasoning about a’s type. When a variable is assigned inside an if’s then or else branch, the compiler knows that it will continue to have that type until the if ends or until it is assigned a new expression. When an if ends, the compiler will let a have the type of the last expressions that it was assigned to in each branch.

The last line in Crystal will give a compiler error: “undefined method ‘size’ for Int32”. That’s because even though String has a size method, Int32 doesn’t.

In designing the language we had two choices: make the above a compile-time error (like now) or just make it a runtime error (like in Ruby). We believe it’s better to make it a compile-time error. In some cases you might know better than the compiler and you will be sure that a variable has the type that you might think. But in some cases the compiler will let you know that you overlooked a case or some logic, and you’ll thank it for that.

The if has some more cases to take into account. For example, the variable a might not exist before the if. In this case, if it’s not assigned in one of the branches, at the end of the if it will also contain the Nil type if it’s read:

if some_condition
  a = 1
end
a # here a is Int32 or Nil

This, again, mimics Ruby’s behaviour.

Finally, an if’s type is the union of the last expressions in both branches. If a branch is missing, it’s considered to have a Nil type.

While

A while is in a way similar to an if:

a = 1
while some_condition
  a = "hello"
end
a # here a is Int32 or String

That’s because some_condition might be falsy the first time.

However, since a while is a loop there are some more things to consider. For example, the last expression assigned to a variable inside a while determines the type of that variable in the next iteration. In this way, the type at the beginning of the loop will be a union of the type before the loop and the type after the loop:

a = 1
while some_condition
  a           # here a is actually Int32 or String
  a = false   # here a is Bool
  a = "hello" # here a is String
  a.size    # ok, a is String
end
a             # here a is Int32 or String

Some other things to consider inside a while are break and next. A break makes the types right before the break add to the type at the exit of the while:

a = 1
while some_condition
  a             # here a is Int32 or Bool
  if some_other_condition
    a = "hello" # we break, so at the exit a can also be String
    break
  end
  a = false     # here a is Bool
end
a               # here a is Int32 or String or Bool

A next adds the type to the beginning of the while:

a = 1
while some_condition
  a             # here a is Int32 or String or Bool
  if some_other_condition
    a = "hello" # we next, so in the next iteration a can be String
    next
  end
  a = false     # here a is Bool
end
a               # here a is Int32 or String or Bool

Blocks

Blocks are very similar to a while: they can be executed zero or more times. So the logic for variables’ types is very similar to that of a while.

NoReturn

There’s a mysterious type in Crystal called NoReturn. One such example is C’s exit function:

lib C
  fun exit(status : Int32) : NoReturn
end

C.exit(1)    # this is NoReturn
puts "hello" # this will never be executed

Another very useful method that is NoReturn is raise: raising an exception.

The type basically means: after this point there’s nothing else. Nothing gets returned, and nothing that comes afterwars is executed (of course, a rescue will be executed if there’s one surrounding the code, but the normal path won’t be executed).

The compiler knows about NoReturn. For example, take a look at the following code:

a = some_int
if a == 1
  a = "hello"
  puts a.size # ok
  raise "Boom!"
else
  a = 2
end
a # here a can only be Int32

Remember that after an if a variable’s type is the union of the types of both branches. However, since the first branch ends there, because raise is NoReturn, the compiler knows that code after the if, if that branch is taken, will never be executed. So it can definitely say: a will only have the type of the else branch.

The same logic applies when you have return, break or next inside an if.

Also, when you define a method whose type is NoReturn, that method is in turn NoReturn:

def raise_boom
  raise "Boom!"
end

if some_condition
  a = 1
else
  raise_boom
end
a.abs # ok

Union of NoReturn

Remember that an if’s type is the union of the last expressions of the if’s branches.

What type has the following if (and consequently the a variable)?

a = if some_condition
      raise "Boom!"
    else
      1
    end
a # a is...?

Well, the then branch is definitely NoReturn. The else branch is definitely Int32. We could conclude then that a has type NoReturn or Int32. However, NoReturn means that nothing gets executed afterwards. So a can only be Int32 at the end of the previous snippet, and that’s how the compiler behaves.

With this we can implement a little method called not_nil!. Here it is:

class Object
  def not_nil!
    self
  end
end

class Nil
  def not_nil!
    raise "Nil assertion failed"
  end
end

a = some_condition ? 1 : nil
a.not_nil!.abs # compiles!

a’s type is Int32 or Nil. One thing that we didn’t say yet is that when you have a union type and you invoke a method on it, and all types respond to that method, the resulting type is the union of the types of each method.

In this case, a.not_nil! will have the type Int32 if a is Int32, or NoReturn if it’s Nil (because of the raise). Combining these types just gives Int32, so the above code is perfectly valid. And that’s how you can discard Nil from a variable and turn it into a runtime exception if it turns out to be nil. No special language construct is needed. All is made with the logic explained so far.

Type filters

Now, what if we want to execute a method on a variable whose type is Int32 or Nil, but only if that variable is Int32. If it’s Nil, we don’t want to do anything.

We can’t use not_nil!, because that will raise a runtime exception when nil.

We can define another method, try:

class Object
  def try
    yield self
  end
end

class Nil
  def try(&block)
    nil
  end
end

a = some_condition ? 1 : nil
b = a.try &.abs # b is Int32 or Nil

(if you are not sure what &.abs means, read this)

Since doing something depending on whether a value is Nil or not is so common, Crystal provides another way to do the above. This was shortly explained here, but now we’ll explain it better and combine it with the previous explanations.

If a variable is an if’s condition, the compiler assumes the variable is not nil inside the then branch:

a = some_condition ? 1 : nil # a is Int32 or Nil
if a
  a.abs                      # a is Int32
end

This makes sense: if a is truthy then it means it is not nil. Not only this, but the compiler also makes a’s type be that one after the if, combined with whatever type a has in the else branch. For example:

a = some_condition ? 1 : nil
if a
  a.abs   # ok, here a is Int32
else
  a = 1   # here a is Int32
end
a.abs     # ok, a can only be Int32 here

Just like a programmer expects the above to always work in Ruby (never raise an “undefined method” error in runtime), so it works in Crystal.

We call the above a “type filter”: a’s type got filtered inside the if’s then branch by removing Nil from the possible types a can have.

Another type filter happens when you do is_a?:

a = some_condition ? 1 : nil
if a.is_a?(Int32)
  a.abs # ok
end

And another type filter happens when you do responds_to?:

a = some_condition ? 1 : nil
if a.responds_to?(:abs)
  a.abs # ok
end

These are special methods, known by the compiler, and that’s why the compiler is able to filter the types. On the contrary, the method nil? is not special right now so the following won’t work:

a = some_condition ? 1 : nil
if a.nil?
else
  a.abs # should be ok, but now gives error
end

We’ll probably make nil? a special method too, so it’s more consistent with the rest of the language and the above works. We’ll also probably make the unary ! method special, not overloadable, so you could do:

a = some_condition ? 1 : nil
if !a
else
  a.abs # should be ok, but now gives error
end

Conclusion

In conclusion, as was said in the beginning of this post, we want Crystal to behave as much as possible as Ruby, and if something is intuitive and makes sense for the programmer to make the compiler understand it too. For example:

def foo(x)
  return unless x

  x.abs # ok
end

a = some_condition ? 1 : nil
b = foo(a)

The above shouldn’t give you a compile time error. The programmer knows that if x was nil inside foo, the method returns. It follows that x can never be nil afterwards so it’s ok to invoke abs on it. How does the compiler know this?

Well, first, the compiler rewrites an unless to an if:

def foo(x)
  if x
  else
    return
  end

  x.abs # ok
end

Next, inside the then branch of the if we know that x is not nil. Inside the else branch the method returns, so we don’t care about the type of x afterwards. So, after the if, x can only be of type Int32. This is idiomatic code in Ruby, and so it is in Crystal if we carefully follow the language rules.

We still have to talk about methods and instance variables, but this post is already long enough so that will have to be explained in a following post. Stay tuned!

comments powered by Disqus