In the last summer, I developed my first programming language SAMPL. Measured against my technical skills at that time, it was a clear success. I was able to implement an interpreter and a compiler for a self-designed functional programming language with only the knowledge to implement an interpreter for a toy language in Cornell CS 3110. I was particularly proud of the module system and generics in that language.
However, that language has some fundamental limitations. Type inference isn’t great because it requires type annotation for all functional arguments; the code I used to I solve generic parameter inference looks long and hacky; more importantly, the error messages are really unhelpful which makes debugging very painful. What’s more, I don’t like the name of that language. In the last blog post, I mentioned that I named it “Sound and Modern Programming Language” just to make it contain the substring SAM, which seems very unnatural.
Given that giant heap of legacy issues, I decided to start from scratch again to design and implement a better programming language in my mind, with the additional knowledge in programming language theory thanks to Cornell CS 4110 and past experience of designing a language. I decided to name it SAMLANG because I can’t think of other names. In this week, I archived the old language’s repo, open sourced the new language’s repo on GitHub and published its official documentation. Now it’s the time to discuss some of my design choices.
Influence of Design Choices and My Justification
Since I would start from scratch, I had the liberty to design a completely new type system. The first decision would be whether the typing rule would be a nominal or structural. I initially favored structural because Flow’s type system is structural and it works great with the React components typing. Nevertheless, as soon as I dug deeper into the typing rules, I found some fundamental difficulties related to how extension function works.
If the type system is structural, then the user may choose to write:
Then deciding which function to call requires iterating over the entire typing environment, which
is inefficient and slow. It also places some burden on the user to reason which function gets
called. In a nominal type system, the search is confined to a single module and it will simply be a
string search in an immutable map in
O(lg n) time.
Therefore, I decided to use the nominal typing rule to simplify the implementation. It also makes type inference a lot easier because the type constraints with objects are more explicit under my design. In order to simplify the object literal type checking, I decided that the object definition is only fully visible inside the defining module but opaque outside of the module. In order to make the typing rule consistent, same applies to variant type. I personally think this is a valid simplification, since it’s also a bad practice to expose the fields publicly in other programming languages.
The greatest challenge is type inference. Cornell’s course notes say that we should use a set of constraints to gradually solve the type, but the examples given are quite simple.
With some experimentation and thoughts, I introduced undecided type to represent an expression
without clear type in a limited context and free type to represent a free constraint. Then I used a
union-find data structure to keep track of all the aliasing relation detected by the type-checker
between different temporarily undecided types. Right now in one place, it requires an
iteration over all the typing constraints, which I hope I can optimize it away later.
Even if we require type annotation for the top-level functions, it still doesn’t help in this case.
It’s also not very good to make the value
identity generic in this case, because some of my
planned compilation target (like Java) does not support generic values.
There are two options: make the undecided type
unit or die with a compile-time error. I chose the
latter. The former option may look fine in this case, but it misses an important mistake. Suppose
the user define a (bad) generic function and use it in this way:
The random function clearly has a problem: it defines a generic type that is never used. If we
random is called, the user can catch his/her mistake.