SlideShare a Scribd company logo
Crystal internals
Part 1
Is a compiler a hard thing?
At Manas we usually do webapps
Let’s talk about webapps...
Let’s talk about webapps...
● HTML/CSS/JS
● React/Angular/Knockout
● Ruby/Erlang/Elixir
● Database (mysql/postgres)
● Elasticsearch
● Redis/Sidekiq/Background-jobs
● Docker, capistrano, deploy, servers
Let’s talk about webapps...
● HTML/CSS/JS
● React/Angular/Knockout
● Ruby/Erlang/Elixir
● Database (mysql/postgres)
● Elasticsearch
● Redis/Sidekiq/Background-jobs
● Docker, capistrano, deploy, servers
Easy…?
Let’s talk about compilers...
● HTML/CSS/JS
● React/Angular/Knockout
● Ruby/Erlang/Elixir
● Database (mysql/postgres)
● Elasticsearch
● Redis/Sidekiq/Background-jobs
● Docker, capistrano, deploy, servers
Easy!
Let’s talk about compilers...
Let’s talk about compilers...
No, let’s talk about usual programs
No, let’s talk about usual programs
INPUT -> [PROCESSING…] -> OUTPUT
No, let’s talk about compilers
SOURCE CODE -> [PROCESSING…] -> EXECUTABLE
No, let’s talk about compilers
SOURCE CODE -> [PROCESSING…] -> EXECUTABLE
How do we go from source code to an executable?
Traditional stages of a compiler
class Foo
def bar
1 + 2
end
end
● Lexer: [“class”, “Foo”, “;”, “def”, “bar”, “;”, “1”, “+”, “2”, “;”, “end”, “;”, “end”]
● Parser: ClassDef(“Foo”, body: [Def.new(“bar”)])
● Semantic (a.k.a “type check”): make sure there are no type errors
● Codegen: generate machine code
Let’s start with the codegen phase
Goal: generate efficient assembly code for many architectures (32 bits, 64 bits,
intel, arm, etc.)
● Generating assembly code is hard
● Generating efficient assembly code is harder
● Generating assembly code for many architectures is hard/tedious/boring
Let’s start with the codegen phase
Goal: generate efficient assembly code for many architectures (32 bits, 64 bits,
intel, arm, etc.)
● Generating assembly code is hard
● Generating efficient assembly code is harder
● Generating assembly code for many architectures is hard/tedious/boring
Thus: writing a compiler is HARD! :-(
Let’s start with the codegen phase
Goal: generate efficient assembly code for many architectures (32 bits, 64 bits,
intel, arm, etc.)
● Generating assembly code is hard
● Generating efficient assembly code is harder
● Generating assembly code for many architectures is hard/tedious/boring
Thus: writing a compiler is HARD! :-(
Well, not anymore...
Crystal internals (part 1)
Crystal internals (part 1)
Codegen
With LLVM, we generate LLVM IR (internal representation) instead of assembly,
and LLVM takes care of generating efficient assembly code for us!
The hardest part is solved :-)
define i32 @add(i32 %x, i32 %y) {
%0 = add i32 %x, %y
ret i32 %0
}
Codegen: LLVM (example)
LLVM provides a nice API to generate IR
require "llvm"
mod = LLVM::Module.new("main")
mod.functions.add("add", [LLVM::Int32, LLVM::Int32], LLVM::Int32) do |func|
func.basic_blocks.append do |builder|
res = builder.add(func.params[0], func.params[1])
builder.ret(res)
end
end
puts mod
● Lexer
● Parser
● Semantic
Remaining phases
● Kind of easy: go char by char until we get a keyword, identifier, number, etc.
● We won’t go into implementation details...
Lexer
● Kind of easy: go token by token and create a tree of expressions
● This tree is called AST: Abstract Syntax Tree
● An AST is like a directed, acyclic graph
● We won’t go into implementation details...
Parser
● This is the fundamental piece of the compiler
● It takes an AST as input and analyzes it
● Analysis can result in:
○ Declaring types: for example “class Foo; end” will declare a type Foo
○ Checking methods: for example “Foo.bar” will check that “Foo” is a declared type and that the
method “bar” exists in it, and has the correct arity and types
○ Giving each non-dead expression in the program a type
○ Gathering some info for the codegen phase: for example know the local variables of a method,
and their type
Semantic
● The interesting part of the compiler is the semantic phase
● It’s just about processing an AST
● In Crystal’s compiler you just need to know one language: Crystal!
● No HTML/CSS/JS/JSX/etc.
● No untyped, dynamic languages: no Ruby/Erlang/Elixir. Type safe!
● Stuff is processed in memory
● No databases, no Elasticsearch, no Redis
Semantic
● The interesting part of the compiler is the semantic phase
● It’s just about processing an AST
● In Crystal’s compiler you just need to know one language: Crystal!
● No HTML/CSS/JS/JSX/etc.
● No untyped, dynamic languages: no Ruby/Erlang/Elixir. Type safe!
● Stuff is processed in memory
● No databases, no Elasticsearch, no Redis
Writing a compiler is easier than writing a web app! ^_^
Semantic
● The interesting part of the compiler is the semantic phase
● It’s just about processing an AST
● In Crystal’s compiler you just need to know one language: Crystal!
● No HTML/CSS/JS/JSX/etc.
● No untyped, dynamic languages: no Ruby/Erlang/Elixir. Type safe!
● Stuff is processed in memory
● No databases, no Elasticsearch, no Redis
Writing a compiler is easier than writing a web app! ^_^
(Or at least it’s more fun :-P)
Semantic
Crystal internals (part 1)
Directory layout
● src/compiler/crystal
○ command/
○ syntax/
○ semantic/
○ macros/
○ codegen/
○ tools/
○ compiler.cr
○ types.cr
○ program.cr
Directory layout
● src/compiler/crystal
○ command/ : the command line interface
○ syntax/ : lexer, parser, ast, visitor, transformer
○ semantic/ : type declaration, method lookup, etc.
○ macros/ : macro expansion logic
○ codegen/ : codegen
○ tools/ : doc generator, formatter, init
○ compiler.cr : combines syntax + semantic + codegen
○ types.cr : all possible types in Crystal (Int32, String, unions, custom types, etc.)
○ program.cr : holds definitions of a program (holds Int32, String, etc.)
Directory layout
● src/compiler/crystal : ~43K LOC
○ command/ : ~300LOC
○ syntax/ : ~10K LOC
○ semantic/ : ~12K LOC
○ macros/ : ~2K LOC
○ codegen/ : ~6K LOC
○ tools/ : ~7K LOC
○ compiler.cr : ~300LOC
○ types.cr :~2K LOC
○ program.cr : ~300 LOC
Directory layout
● src/compiler/crystal : ~43K LOC
○ command/ : ~300LOC
○ syntax/ : ~10K LOC
○ semantic/ : ~12K LOC
○ macros/ : ~2K LOC
○ codegen/ : ~6K LOC
○ tools/ : ~7K LOC
○ compiler.cr : ~300LOC
○ types.cr :~2K LOC
○ program.cr : ~300 LOC
About 14K LOC to analyze source code.
Directory layout
● src/compiler/crystal : ~43K LOC
○ command/ : ~300LOC
○ syntax/ : ~10K LOC
○ semantic/ : ~12K LOC
○ macros/ : ~2K LOC
○ codegen/ : ~6K LOC
○ tools/ : ~7K LOC
○ compiler.cr : ~300LOC
○ types.cr :~2K LOC
○ program.cr : ~300 LOC
About 14K LOC to analyze source code.
One big Rails app at Manas has 14K LOC in “./app”
Directory layout
● src/compiler/crystal : ~43K LOC
○ command/ : ~300LOC
○ syntax/ : ~10K LOC
○ semantic/ : ~12K LOC
○ macros/ : ~2K LOC
○ codegen/ : ~6K LOC
○ tools/ : ~7K LOC
○ compiler.cr : ~300LOC
○ types.cr :~2K LOC
○ program.cr : ~300 LOC
About 14K LOC to analyze source code.
One big Rails app at Manas has 14K LOC in “./app”
A compiler can’t be that hard! ;-)
Show me the code
Show me the code
# src/compiler/crystal/compiler.cr
def compile(source : Source | Array(Source), output_filename : String) : Result
source = [source] unless source.is_a?(Array)
program = new_program(source)
node = parse program, source
node = program.semantic node, @stats
codegen program, node, source, output_filename unless @no_codegen
Result.new program, node
end
Show me the code
# src/compiler/crystal/compiler.cr
def compile(source : Source | Array(Source), output_filename : String) : Result
source = [source] unless source.is_a?(Array)
program = new_program(source)
node = parse program, source
node = program.semantic node, @stats
codegen program, node, source, output_filename unless @no_codegen
Result.new program, node
end
Show me the code
# src/compiler/crystal/compiler.cr
def compile(source : Source | Array(Source), output_filename : String) : Result
source = [source] unless source.is_a?(Array)
program = new_program(source)
node = parse program, source
node = program.semantic node, @stats
codegen program, node, source, output_filename unless @no_codegen
Result.new program, node
end
What is a program?
Program
● Holds all types and top-level methods for a given compilation
● For example, if I compile “class Foo; end” and you compile “class Bar; end”,
the first program will have a type named “Foo”, and the second one won’t (but
it will have a type named “Bar”)
● It lets us test the compiler more easily, because we can use different Program
instances for each snippet of code that we want to test
● In contrast of having global variables holding all of a program’s data
● A Program is passed around in all phases of a compilation (except lexing and
parsing, which don’t need semantic info)
Show me the code
# src/compiler/crystal/compiler.cr
def compile(source : Source | Array(Source), output_filename : String) : Result
source = [source] unless source.is_a?(Array)
program = new_program(source)
node = parse program, source # from source to Crystal::ASTNode
node = program.semantic node, @stats
codegen program, node, source, output_filename unless @no_codegen
Result.new program, node
end
What is a program?
Show me the code
# src/compiler/crystal/compiler.cr
def compile(source : Source | Array(Source), output_filename : String) : Result
source = [source] unless source.is_a?(Array)
program = new_program(source)
node = parse program, source
node = program.semantic node, @stats # Semantic! :-)
codegen program, node, source, output_filename unless @no_codegen
Result.new program, node
end
What is a program?
Semantic
● The entry point for semantic analysis is in
src/compiler/crystal/semantic.cr
● Other files are in src/compiler/crystal/semantic/
● The file semantic.cr has comments that explain the overall algorithm :-)
Semantic: overall algorithm
● top level: declare classes, modules, macros, defs and other top-level stuff
● new methods: create `new` methods for every `initialize` method
● type declarations: process type declarations like `@x : Int32`
● check abstract defs: check that abstract defs are implemented
● class_vars_initializers: process initializers like `@@x = 1`
● instance_vars_initializers: process initializers like `@x = 1`
● main: process "main" code, calls and method bodies (the whole program).
● cleanup: remove dead code and other simplifications
● check recursive structs: check that structs are not recursive (impossible to
codegen)
Semantic: overall algorithm
Note!
● This algorithm didn’t come from the Skies
(nor from a textbook, nor from a paper)
● It’s not written in stone!
● It can definitely be improved: readability,
performance, etc.
Note!
● It’s actually more like this…
Semantic: overall algorithm
Semantic
But before looking at each phase, we need to learn about the most useful pattern
for analyzing an AST...
The Visitor pattern
require "compiler/crystal/syntax"
class SumVisitor < Crystal::Visitor
getter sum = 0
def visit(node : Crystal::NumberLiteral)
@sum += node.value.to_i
end
def visit(node : Crystal::ASTNode)
true # true: continue visiting children nodes
end
end
ast = Crystal::Parser.parse("foo(1 + 2, 3, [4])")
visitor = SumVisitor.new
ast.accept(visitor)
puts visitor.sum
The Visitor pattern
● We define a visit method for each node of interest
● We process the nodes
● We return true if we want to process children, false otherwise
● Example: if we only want to process class declarations, we could just define
visit(node : Crystal::ClassDef) and define some logic there (and return true,
because of nested class definitions)
● A visitor abstracts over the way nodes are composed
● ...though in many cases, for semantic purposes, we need and use the way a
node is composed (for example, to analyze a call we need to know the
argument types, so we check the arguments, not all children in a generic way)
Semantic: overall algorithm
● top level: declare classes, modules, macros, defs and other top-level stuff
● new methods
● type declarations
● check abstract defs
● class_vars_initializers
● instance_vars_initializers
● main
● cleanup
● check recursive structs
Top level: declare classes, modules, macros, defs...
# src/compiler/crystal/semantic/top_level_visitor.cr
class Crystal::TopLevelVisitor < Crystal::SemanticVisitor
# ...
end
● Located at semantic_visitor.cr
● This is a base visitor used in most of the phases of the semantic analysis
● It keeps track of the “current type”
● For example in “class Foo; class Bar; baz; end; end”, “current type” starts at
the top-level (the Program). When “class Foo” is found, the current type
becomes “Foo” (we search “Foo” in the current type). When “class Bar” is
found, the current type becomes “Foo::Bar” (we search “Bar” in the current
type). When “baz” is found, it will be looked up inside the current type.
● But initially there’s no “Foo” inside the current type (the Program). Who
defines it? … The top-level visitor!
Crystal::SemanticVisitor
● Located at top_level_visitor.cr
● Defines classes, methods, etc.
● Given “class Foo; class Bar; baz; end; end”...
● current_type starts at Program
● When “class Foo” is found (ClassDef), we check if “Foo” exists in the current
type. If not, we create it. If it exists with a different type (if it’s a module), we
give an error.
● We attach this type “Foo” to the AST node ClassDef. SemnticVisitor will use
this in every subsequent phase.
● … the “baz” call is not analyzed here (unless it’s a macro)
Crystal::TopLevelVisitor
Crystal::TopLevelVisitor
● Many other things done in this visitor: methods and macros are added to
types, aliases and enums are defined, etc.
● Question: why are methods and macros defined at this phase?
● The “inherited” macro hook must be processed as soon as “Bar <
Foo” and “Baz < Foo” are found
● The macro expands to “do_something”, which must expand to
“def foo; 1; end”
● This must happen before we continue processing Baz’s body:
“def foo; 3; end” must win and be the method found when doing
“Baz.new.foo”
● Conclusion: methods, macros and hooks must be defined in the
first pass, when defining types. Additionally, macros might be
looked up in types in this same pass (like “do_something”)
● SemanticVisitor takes care to look up and expand calls that
resolve to macro calls
When should macros be defined and expanded
class Foo
macro inherited
do_something
end
macro do_something
def foo; 1; end
end
end
class Bar < Foo; end
class Baz < Foo
def foo; 3; end
end
puts Bar.new.foo # => 1
puts Baz.new.foo # => 3
Method overloads
● Crystal methods are very powerful! For example: optional type restrictions,
different number of arguments, default arguments, splat, etc.
● When methods are added to types we need to:
○ Know if a method replaces (redefines) an old method
○ Track whether a method is “stricter” than another method, to quickly know, given a call
argument types, in which order they are going to be tested
Method restrictions
def foo(x : Int32)
puts 1
end
def foo(x)
puts 2
end
foo(1)
foo('a')
● Given foo(1), both methods match it. However, the first overload
should be invoked because it has a stronger restriction than the
second overload.
● If we define the methods in a different order, it still works the
same
● This is because an argument with a type restriction is stronger than
one without one. We say that the first one is a restriction of the
second one (we should probably rename this to use stronger)
● This applies to types too: Int32 is stronger than Int32 |
String. And Bar is stronger than Foo, if Bar < Foo.
● Given two methods with the same name, if all arguments of a
method are stronger than the others’, the whole method is stronger
and should come first. Each type stores an ordered list of methods
indexed by method name, with this notion.
● If the methods are both stronger than each other, they have the
same restriction.
Method restrictions
def foo(x : Int32)
puts 1
end
def foo(x)
puts 2
end
foo(1)
foo('a')
● This logic is located at restrictions.cr
● A lot of cases to consider: generics, tuples, splats, etc.
● The code and algorithms could probably use a simpler, unified logic
and a cleanup, but first all of these concepts and definitions must be
defined much more formally
Semantic: overall algorithm
● top level
● new methods: create `new` methods for every `initialize` method
● type declarations
● check abstract defs
● class_vars_initializers
● instance_vars_initializers
● main
● cleanup
● check recursive structs
● Located at new.cr
● TopLevelVisitor creates a `new` class method for every `initialize` method it
finds (the logic for this is also in new.cr)
● Classes that end up without an `initialize` need a default, argless `self.new`
method
● This phase is a bit messy right now because of some missing things related to
generics…
Semantic: new methods
class Foo
def initialize(x : Int32)
@x = x
end
# Generated from the above
def self.new(x : Int32)
instance = allocate
instance.initialize(x)
if instance.responds_to?(:finalize)
::GC.add_finalizer(instance)
end
end
end
Semantic: new methods
Semantic: overall algorithm
● top level
● new methods
● type declarations: process type declarations like `@x : Int32`
● check abstract defs
● class_vars_initializers
● instance_vars_initializers
● main
● cleanup
● check recursive structs
● Located at type_declaration_processor.cr (and
type_declaration_visitor.cr and type_guess_visitor.cr)
● Combines info gathered by these two visitors to declare the type of instance
and class variables.
● TypeDeclarationVisitor deals with explicit type declarations
● TypeGuessVisitor tries to “guess” the type of instance and class variables
without an explicit type annotations (for example @x = 1 and @x =
Foo.new)
Semantic: type declarations
Semantic: overall algorithm
● top level
● new methods
● type declarations
● check abstract defs: check that abstract defs are implemented
● class_vars_initializers
● instance_vars_initializers
● main
● cleanup
● check recursive structs
● Located at abstract_def_checker.cr
● Not a visitor, but traverses all types, and for those that have abstract defs
checks that subclasses or including modules defined those methods
Semantic: check abstract defs

More Related Content

What's hot (20)

Dart programming language
Dart programming languageDart programming language
Dart programming language
Aniruddha Chakrabarti
 
C# language
C# languageC# language
C# language
Akanksha Shukla
 
Dart
DartDart
Dart
Jana Moudrá
 
Dart
DartDart
Dart
Pritam Tirpude
 
R ext world/ useR! Kiev
R ext world/ useR!  KievR ext world/ useR!  Kiev
R ext world/ useR! Kiev
Ruslan Shevchenko
 
Python for Swift
Python for SwiftPython for Swift
Python for Swift
LINE Corporation
 
Dart
DartDart
Dart
Raoul-Gabriel Urma
 
Getting started with Perl XS and Inline::C
Getting started with Perl XS and Inline::CGetting started with Perl XS and Inline::C
Getting started with Perl XS and Inline::C
daoswald
 
Ruby
RubyRuby
Ruby
Aizat Faiz
 
Hunt for dead code
Hunt for dead codeHunt for dead code
Hunt for dead code
Damien Seguy
 
The Hitchhiker's Guide to Faster Builds. Viktor Kirilov. CoreHard Spring 2019
The Hitchhiker's Guide to Faster Builds. Viktor Kirilov. CoreHard Spring 2019The Hitchhiker's Guide to Faster Builds. Viktor Kirilov. CoreHard Spring 2019
The Hitchhiker's Guide to Faster Builds. Viktor Kirilov. CoreHard Spring 2019
corehard_by
 
Glance rebol
Glance rebolGlance rebol
Glance rebol
crazyaxe
 
How to build SDKs in Go
How to build SDKs in GoHow to build SDKs in Go
How to build SDKs in Go
Diwaker Gupta
 
Dart presentation
Dart presentationDart presentation
Dart presentation
Lucas Leal
 
Sugar Presentation - YULHackers March 2009
Sugar Presentation - YULHackers March 2009Sugar Presentation - YULHackers March 2009
Sugar Presentation - YULHackers March 2009
spierre
 
HTTP Potpourri
HTTP PotpourriHTTP Potpourri
HTTP Potpourri
Kevin Hakanson
 
Ruby Programming Language - Introduction
Ruby Programming Language - IntroductionRuby Programming Language - Introduction
Ruby Programming Language - Introduction
Kwangshin Oh
 
DSL's with Groovy
DSL's with GroovyDSL's with Groovy
DSL's with Groovy
paulbowler
 
PHP7
PHP7PHP7
PHP7
Alexandre André
 
Groovy DSL
Groovy DSLGroovy DSL
Groovy DSL
NexThoughts Technologies
 

Viewers also liked (10)

Concurrencia en Crystal
Concurrencia en CrystalConcurrencia en Crystal
Concurrencia en Crystal
Crystal Language
 
Crystal internals (part 1)
Crystal internals (part 1)Crystal internals (part 1)
Crystal internals (part 1)
Crystal Language
 
Por qué Crystal? Why Crystal Language?
Por qué Crystal? Why Crystal Language?Por qué Crystal? Why Crystal Language?
Por qué Crystal? Why Crystal Language?
Crystal Language
 
Crystal: herramientas, uso y creación.
Crystal: herramientas, uso y creación.Crystal: herramientas, uso y creación.
Crystal: herramientas, uso y creación.
Crystal Language
 
Crystal internals (part 1)
Crystal internals (part 1)Crystal internals (part 1)
Crystal internals (part 1)
Ary Borenszweig
 
Blogger
BloggerBlogger
Blogger
Carlos Morales
 
NEMA's electroindustry January 2010
NEMA's electroindustry January 2010NEMA's electroindustry January 2010
NEMA's electroindustry January 2010
Eric Hsieh
 
Flip Men Season 2 Premier Gala Article pic revised
Flip Men Season 2 Premier Gala Article pic revisedFlip Men Season 2 Premier Gala Article pic revised
Flip Men Season 2 Premier Gala Article pic revised
Jannette Newton
 
A Report about SCM & Logistics
A Report about SCM & LogisticsA Report about SCM & Logistics
A Report about SCM & Logistics
Santosh Khadka
 
Programa curricular primaria
Programa curricular primariaPrograma curricular primaria
Programa curricular primaria
Manuel Ramirez Sanchez
 
Crystal internals (part 1)
Crystal internals (part 1)Crystal internals (part 1)
Crystal internals (part 1)
Crystal Language
 
Por qué Crystal? Why Crystal Language?
Por qué Crystal? Why Crystal Language?Por qué Crystal? Why Crystal Language?
Por qué Crystal? Why Crystal Language?
Crystal Language
 
Crystal: herramientas, uso y creación.
Crystal: herramientas, uso y creación.Crystal: herramientas, uso y creación.
Crystal: herramientas, uso y creación.
Crystal Language
 
Crystal internals (part 1)
Crystal internals (part 1)Crystal internals (part 1)
Crystal internals (part 1)
Ary Borenszweig
 
NEMA's electroindustry January 2010
NEMA's electroindustry January 2010NEMA's electroindustry January 2010
NEMA's electroindustry January 2010
Eric Hsieh
 
Flip Men Season 2 Premier Gala Article pic revised
Flip Men Season 2 Premier Gala Article pic revisedFlip Men Season 2 Premier Gala Article pic revised
Flip Men Season 2 Premier Gala Article pic revised
Jannette Newton
 
A Report about SCM & Logistics
A Report about SCM & LogisticsA Report about SCM & Logistics
A Report about SCM & Logistics
Santosh Khadka
 

Similar to Crystal internals (part 1) (20)

Apidays Paris 2023 - Forget TypeScript, Choose Rust to build Robust, Fast and...
Apidays Paris 2023 - Forget TypeScript, Choose Rust to build Robust, Fast and...Apidays Paris 2023 - Forget TypeScript, Choose Rust to build Robust, Fast and...
Apidays Paris 2023 - Forget TypeScript, Choose Rust to build Robust, Fast and...
apidays
 
(1) c sharp introduction_basics_dot_net
(1) c sharp introduction_basics_dot_net(1) c sharp introduction_basics_dot_net
(1) c sharp introduction_basics_dot_net
Nico Ludwig
 
[Td 2015] what is new in visual c++ 2015 and future directions(ulzii luvsanba...
[Td 2015] what is new in visual c++ 2015 and future directions(ulzii luvsanba...[Td 2015] what is new in visual c++ 2015 and future directions(ulzii luvsanba...
[Td 2015] what is new in visual c++ 2015 and future directions(ulzii luvsanba...
Sang Don Kim
 
C Language
C LanguageC Language
C Language
Syed Zaid Irshad
 
A Life of breakpoint
A Life of breakpointA Life of breakpoint
A Life of breakpoint
Hajime Morrita
 
Ruxmon.2013-08.-.CodeBro!
Ruxmon.2013-08.-.CodeBro!Ruxmon.2013-08.-.CodeBro!
Ruxmon.2013-08.-.CodeBro!
Christophe Alladoum
 
Lecture 1 introduction to language processors
Lecture 1  introduction to language processorsLecture 1  introduction to language processors
Lecture 1 introduction to language processors
Rebaz Najeeb
 
Road to sbt 1.0 paved with server
Road to sbt 1.0   paved with serverRoad to sbt 1.0   paved with server
Road to sbt 1.0 paved with server
Eugene Yokota
 
ArangoDB
ArangoDBArangoDB
ArangoDB
ArangoDB Database
 
Road to sbt 1.0: Paved with server (2015 Amsterdam)
Road to sbt 1.0: Paved with server (2015 Amsterdam)Road to sbt 1.0: Paved with server (2015 Amsterdam)
Road to sbt 1.0: Paved with server (2015 Amsterdam)
Eugene Yokota
 
Power Leveling your TypeScript
Power Leveling your TypeScriptPower Leveling your TypeScript
Power Leveling your TypeScript
Offirmo
 
TI1220 Lecture 14: Domain-Specific Languages
TI1220 Lecture 14: Domain-Specific LanguagesTI1220 Lecture 14: Domain-Specific Languages
TI1220 Lecture 14: Domain-Specific Languages
Eelco Visser
 
mloc.js 2014 - JavaScript and the browser as a platform for game development
mloc.js 2014 - JavaScript and the browser as a platform for game developmentmloc.js 2014 - JavaScript and the browser as a platform for game development
mloc.js 2014 - JavaScript and the browser as a platform for game development
David Galeano
 
Software Development Automation With Scripting Languages
Software Development Automation With Scripting LanguagesSoftware Development Automation With Scripting Languages
Software Development Automation With Scripting Languages
Ionela
 
Language Server Protocol - Why the Hype?
Language Server Protocol - Why the Hype?Language Server Protocol - Why the Hype?
Language Server Protocol - Why the Hype?
mikaelbarbero
 
Structure-Compiler-phases information about basics of compiler. Pdfpdf
Structure-Compiler-phases information  about basics of compiler. PdfpdfStructure-Compiler-phases information  about basics of compiler. Pdfpdf
Structure-Compiler-phases information about basics of compiler. Pdfpdf
ovidlivi91
 
Adventures in Thread-per-Core Async with Redpanda and Seastar
Adventures in Thread-per-Core Async with Redpanda and SeastarAdventures in Thread-per-Core Async with Redpanda and Seastar
Adventures in Thread-per-Core Async with Redpanda and Seastar
ScyllaDB
 
Build Great Networked APIs with Swift, OpenAPI, and gRPC
Build Great Networked APIs with Swift, OpenAPI, and gRPCBuild Great Networked APIs with Swift, OpenAPI, and gRPC
Build Great Networked APIs with Swift, OpenAPI, and gRPC
Tim Burks
 
Compiler_Lecture1.pdf
Compiler_Lecture1.pdfCompiler_Lecture1.pdf
Compiler_Lecture1.pdf
AkarTaher
 
C-sharping.docx
C-sharping.docxC-sharping.docx
C-sharping.docx
LenchoMamudeBaro
 
Apidays Paris 2023 - Forget TypeScript, Choose Rust to build Robust, Fast and...
Apidays Paris 2023 - Forget TypeScript, Choose Rust to build Robust, Fast and...Apidays Paris 2023 - Forget TypeScript, Choose Rust to build Robust, Fast and...
Apidays Paris 2023 - Forget TypeScript, Choose Rust to build Robust, Fast and...
apidays
 
(1) c sharp introduction_basics_dot_net
(1) c sharp introduction_basics_dot_net(1) c sharp introduction_basics_dot_net
(1) c sharp introduction_basics_dot_net
Nico Ludwig
 
[Td 2015] what is new in visual c++ 2015 and future directions(ulzii luvsanba...
[Td 2015] what is new in visual c++ 2015 and future directions(ulzii luvsanba...[Td 2015] what is new in visual c++ 2015 and future directions(ulzii luvsanba...
[Td 2015] what is new in visual c++ 2015 and future directions(ulzii luvsanba...
Sang Don Kim
 
Lecture 1 introduction to language processors
Lecture 1  introduction to language processorsLecture 1  introduction to language processors
Lecture 1 introduction to language processors
Rebaz Najeeb
 
Road to sbt 1.0 paved with server
Road to sbt 1.0   paved with serverRoad to sbt 1.0   paved with server
Road to sbt 1.0 paved with server
Eugene Yokota
 
Road to sbt 1.0: Paved with server (2015 Amsterdam)
Road to sbt 1.0: Paved with server (2015 Amsterdam)Road to sbt 1.0: Paved with server (2015 Amsterdam)
Road to sbt 1.0: Paved with server (2015 Amsterdam)
Eugene Yokota
 
Power Leveling your TypeScript
Power Leveling your TypeScriptPower Leveling your TypeScript
Power Leveling your TypeScript
Offirmo
 
TI1220 Lecture 14: Domain-Specific Languages
TI1220 Lecture 14: Domain-Specific LanguagesTI1220 Lecture 14: Domain-Specific Languages
TI1220 Lecture 14: Domain-Specific Languages
Eelco Visser
 
mloc.js 2014 - JavaScript and the browser as a platform for game development
mloc.js 2014 - JavaScript and the browser as a platform for game developmentmloc.js 2014 - JavaScript and the browser as a platform for game development
mloc.js 2014 - JavaScript and the browser as a platform for game development
David Galeano
 
Software Development Automation With Scripting Languages
Software Development Automation With Scripting LanguagesSoftware Development Automation With Scripting Languages
Software Development Automation With Scripting Languages
Ionela
 
Language Server Protocol - Why the Hype?
Language Server Protocol - Why the Hype?Language Server Protocol - Why the Hype?
Language Server Protocol - Why the Hype?
mikaelbarbero
 
Structure-Compiler-phases information about basics of compiler. Pdfpdf
Structure-Compiler-phases information  about basics of compiler. PdfpdfStructure-Compiler-phases information  about basics of compiler. Pdfpdf
Structure-Compiler-phases information about basics of compiler. Pdfpdf
ovidlivi91
 
Adventures in Thread-per-Core Async with Redpanda and Seastar
Adventures in Thread-per-Core Async with Redpanda and SeastarAdventures in Thread-per-Core Async with Redpanda and Seastar
Adventures in Thread-per-Core Async with Redpanda and Seastar
ScyllaDB
 
Build Great Networked APIs with Swift, OpenAPI, and gRPC
Build Great Networked APIs with Swift, OpenAPI, and gRPCBuild Great Networked APIs with Swift, OpenAPI, and gRPC
Build Great Networked APIs with Swift, OpenAPI, and gRPC
Tim Burks
 
Compiler_Lecture1.pdf
Compiler_Lecture1.pdfCompiler_Lecture1.pdf
Compiler_Lecture1.pdf
AkarTaher
 

Recently uploaded (20)

Wilcom Embroidery Studio Crack Free Latest 2025
Wilcom Embroidery Studio Crack Free Latest 2025Wilcom Embroidery Studio Crack Free Latest 2025
Wilcom Embroidery Studio Crack Free Latest 2025
Web Designer
 
Download MathType Crack Version 2025???
Download MathType Crack  Version 2025???Download MathType Crack  Version 2025???
Download MathType Crack Version 2025???
Google
 
Time Estimation: Expert Tips & Proven Project Techniques
Time Estimation: Expert Tips & Proven Project TechniquesTime Estimation: Expert Tips & Proven Project Techniques
Time Estimation: Expert Tips & Proven Project Techniques
Livetecs LLC
 
NYC ACE 08-May-2025-Combined Presentation.pdf
NYC ACE 08-May-2025-Combined Presentation.pdfNYC ACE 08-May-2025-Combined Presentation.pdf
NYC ACE 08-May-2025-Combined Presentation.pdf
AUGNYC
 
Autodesk Inventor Crack (2025) Latest
Autodesk Inventor    Crack (2025) LatestAutodesk Inventor    Crack (2025) Latest
Autodesk Inventor Crack (2025) Latest
Google
 
Robotic Process Automation (RPA) Software Development Services.pptx
Robotic Process Automation (RPA) Software Development Services.pptxRobotic Process Automation (RPA) Software Development Services.pptx
Robotic Process Automation (RPA) Software Development Services.pptx
julia smits
 
Mobile Application Developer Dubai | Custom App Solutions by Ajath
Mobile Application Developer Dubai | Custom App Solutions by AjathMobile Application Developer Dubai | Custom App Solutions by Ajath
Mobile Application Developer Dubai | Custom App Solutions by Ajath
Ajath Infotech Technologies LLC
 
Beyond the code. Complexity - 2025.05 - SwiftCraft
Beyond the code. Complexity - 2025.05 - SwiftCraftBeyond the code. Complexity - 2025.05 - SwiftCraft
Beyond the code. Complexity - 2025.05 - SwiftCraft
Dmitrii Ivanov
 
Download 4k Video Downloader Crack Pre-Activated
Download 4k Video Downloader Crack Pre-ActivatedDownload 4k Video Downloader Crack Pre-Activated
Download 4k Video Downloader Crack Pre-Activated
Web Designer
 
Top Magento Hyvä Theme Features That Make It Ideal for E-commerce.pdf
Top Magento Hyvä Theme Features That Make It Ideal for E-commerce.pdfTop Magento Hyvä Theme Features That Make It Ideal for E-commerce.pdf
Top Magento Hyvä Theme Features That Make It Ideal for E-commerce.pdf
evrigsolution
 
Mastering Selenium WebDriver: A Comprehensive Tutorial with Real-World Examples
Mastering Selenium WebDriver: A Comprehensive Tutorial with Real-World ExamplesMastering Selenium WebDriver: A Comprehensive Tutorial with Real-World Examples
Mastering Selenium WebDriver: A Comprehensive Tutorial with Real-World Examples
jamescantor38
 
From Vibe Coding to Vibe Testing - Complete PowerPoint Presentation
From Vibe Coding to Vibe Testing - Complete PowerPoint PresentationFrom Vibe Coding to Vibe Testing - Complete PowerPoint Presentation
From Vibe Coding to Vibe Testing - Complete PowerPoint Presentation
Shay Ginsbourg
 
[gbgcpp] Let's get comfortable with concepts
[gbgcpp] Let's get comfortable with concepts[gbgcpp] Let's get comfortable with concepts
[gbgcpp] Let's get comfortable with concepts
Dimitrios Platis
 
Best HR and Payroll Software in Bangladesh - accordHRM
Best HR and Payroll Software in Bangladesh - accordHRMBest HR and Payroll Software in Bangladesh - accordHRM
Best HR and Payroll Software in Bangladesh - accordHRM
accordHRM
 
How to Install and Activate ListGrabber Plugin
How to Install and Activate ListGrabber PluginHow to Install and Activate ListGrabber Plugin
How to Install and Activate ListGrabber Plugin
eGrabber
 
A Comprehensive Guide to CRM Software Benefits for Every Business Stage
A Comprehensive Guide to CRM Software Benefits for Every Business StageA Comprehensive Guide to CRM Software Benefits for Every Business Stage
A Comprehensive Guide to CRM Software Benefits for Every Business Stage
SynapseIndia
 
Deploying & Testing Agentforce - End-to-end with Copado - Ewenb Clark
Deploying & Testing Agentforce - End-to-end with Copado - Ewenb ClarkDeploying & Testing Agentforce - End-to-end with Copado - Ewenb Clark
Deploying & Testing Agentforce - End-to-end with Copado - Ewenb Clark
Peter Caitens
 
wAIred_LearnWithOutAI_JCON_14052025.pptx
wAIred_LearnWithOutAI_JCON_14052025.pptxwAIred_LearnWithOutAI_JCON_14052025.pptx
wAIred_LearnWithOutAI_JCON_14052025.pptx
SimonedeGijt
 
Orion Context Broker introduction 20250509
Orion Context Broker introduction 20250509Orion Context Broker introduction 20250509
Orion Context Broker introduction 20250509
Fermin Galan
 
Surviving a Downturn Making Smarter Portfolio Decisions with OnePlan - Webina...
Surviving a Downturn Making Smarter Portfolio Decisions with OnePlan - Webina...Surviving a Downturn Making Smarter Portfolio Decisions with OnePlan - Webina...
Surviving a Downturn Making Smarter Portfolio Decisions with OnePlan - Webina...
OnePlan Solutions
 
Wilcom Embroidery Studio Crack Free Latest 2025
Wilcom Embroidery Studio Crack Free Latest 2025Wilcom Embroidery Studio Crack Free Latest 2025
Wilcom Embroidery Studio Crack Free Latest 2025
Web Designer
 
Download MathType Crack Version 2025???
Download MathType Crack  Version 2025???Download MathType Crack  Version 2025???
Download MathType Crack Version 2025???
Google
 
Time Estimation: Expert Tips & Proven Project Techniques
Time Estimation: Expert Tips & Proven Project TechniquesTime Estimation: Expert Tips & Proven Project Techniques
Time Estimation: Expert Tips & Proven Project Techniques
Livetecs LLC
 
NYC ACE 08-May-2025-Combined Presentation.pdf
NYC ACE 08-May-2025-Combined Presentation.pdfNYC ACE 08-May-2025-Combined Presentation.pdf
NYC ACE 08-May-2025-Combined Presentation.pdf
AUGNYC
 
Autodesk Inventor Crack (2025) Latest
Autodesk Inventor    Crack (2025) LatestAutodesk Inventor    Crack (2025) Latest
Autodesk Inventor Crack (2025) Latest
Google
 
Robotic Process Automation (RPA) Software Development Services.pptx
Robotic Process Automation (RPA) Software Development Services.pptxRobotic Process Automation (RPA) Software Development Services.pptx
Robotic Process Automation (RPA) Software Development Services.pptx
julia smits
 
Mobile Application Developer Dubai | Custom App Solutions by Ajath
Mobile Application Developer Dubai | Custom App Solutions by AjathMobile Application Developer Dubai | Custom App Solutions by Ajath
Mobile Application Developer Dubai | Custom App Solutions by Ajath
Ajath Infotech Technologies LLC
 
Beyond the code. Complexity - 2025.05 - SwiftCraft
Beyond the code. Complexity - 2025.05 - SwiftCraftBeyond the code. Complexity - 2025.05 - SwiftCraft
Beyond the code. Complexity - 2025.05 - SwiftCraft
Dmitrii Ivanov
 
Download 4k Video Downloader Crack Pre-Activated
Download 4k Video Downloader Crack Pre-ActivatedDownload 4k Video Downloader Crack Pre-Activated
Download 4k Video Downloader Crack Pre-Activated
Web Designer
 
Top Magento Hyvä Theme Features That Make It Ideal for E-commerce.pdf
Top Magento Hyvä Theme Features That Make It Ideal for E-commerce.pdfTop Magento Hyvä Theme Features That Make It Ideal for E-commerce.pdf
Top Magento Hyvä Theme Features That Make It Ideal for E-commerce.pdf
evrigsolution
 
Mastering Selenium WebDriver: A Comprehensive Tutorial with Real-World Examples
Mastering Selenium WebDriver: A Comprehensive Tutorial with Real-World ExamplesMastering Selenium WebDriver: A Comprehensive Tutorial with Real-World Examples
Mastering Selenium WebDriver: A Comprehensive Tutorial with Real-World Examples
jamescantor38
 
From Vibe Coding to Vibe Testing - Complete PowerPoint Presentation
From Vibe Coding to Vibe Testing - Complete PowerPoint PresentationFrom Vibe Coding to Vibe Testing - Complete PowerPoint Presentation
From Vibe Coding to Vibe Testing - Complete PowerPoint Presentation
Shay Ginsbourg
 
[gbgcpp] Let's get comfortable with concepts
[gbgcpp] Let's get comfortable with concepts[gbgcpp] Let's get comfortable with concepts
[gbgcpp] Let's get comfortable with concepts
Dimitrios Platis
 
Best HR and Payroll Software in Bangladesh - accordHRM
Best HR and Payroll Software in Bangladesh - accordHRMBest HR and Payroll Software in Bangladesh - accordHRM
Best HR and Payroll Software in Bangladesh - accordHRM
accordHRM
 
How to Install and Activate ListGrabber Plugin
How to Install and Activate ListGrabber PluginHow to Install and Activate ListGrabber Plugin
How to Install and Activate ListGrabber Plugin
eGrabber
 
A Comprehensive Guide to CRM Software Benefits for Every Business Stage
A Comprehensive Guide to CRM Software Benefits for Every Business StageA Comprehensive Guide to CRM Software Benefits for Every Business Stage
A Comprehensive Guide to CRM Software Benefits for Every Business Stage
SynapseIndia
 
Deploying & Testing Agentforce - End-to-end with Copado - Ewenb Clark
Deploying & Testing Agentforce - End-to-end with Copado - Ewenb ClarkDeploying & Testing Agentforce - End-to-end with Copado - Ewenb Clark
Deploying & Testing Agentforce - End-to-end with Copado - Ewenb Clark
Peter Caitens
 
wAIred_LearnWithOutAI_JCON_14052025.pptx
wAIred_LearnWithOutAI_JCON_14052025.pptxwAIred_LearnWithOutAI_JCON_14052025.pptx
wAIred_LearnWithOutAI_JCON_14052025.pptx
SimonedeGijt
 
Orion Context Broker introduction 20250509
Orion Context Broker introduction 20250509Orion Context Broker introduction 20250509
Orion Context Broker introduction 20250509
Fermin Galan
 
Surviving a Downturn Making Smarter Portfolio Decisions with OnePlan - Webina...
Surviving a Downturn Making Smarter Portfolio Decisions with OnePlan - Webina...Surviving a Downturn Making Smarter Portfolio Decisions with OnePlan - Webina...
Surviving a Downturn Making Smarter Portfolio Decisions with OnePlan - Webina...
OnePlan Solutions
 

Crystal internals (part 1)

  • 2. Is a compiler a hard thing?
  • 3. At Manas we usually do webapps
  • 4. Let’s talk about webapps...
  • 5. Let’s talk about webapps... ● HTML/CSS/JS ● React/Angular/Knockout ● Ruby/Erlang/Elixir ● Database (mysql/postgres) ● Elasticsearch ● Redis/Sidekiq/Background-jobs ● Docker, capistrano, deploy, servers
  • 6. Let’s talk about webapps... ● HTML/CSS/JS ● React/Angular/Knockout ● Ruby/Erlang/Elixir ● Database (mysql/postgres) ● Elasticsearch ● Redis/Sidekiq/Background-jobs ● Docker, capistrano, deploy, servers Easy…?
  • 7. Let’s talk about compilers... ● HTML/CSS/JS ● React/Angular/Knockout ● Ruby/Erlang/Elixir ● Database (mysql/postgres) ● Elasticsearch ● Redis/Sidekiq/Background-jobs ● Docker, capistrano, deploy, servers Easy!
  • 8. Let’s talk about compilers...
  • 9. Let’s talk about compilers...
  • 10. No, let’s talk about usual programs
  • 11. No, let’s talk about usual programs INPUT -> [PROCESSING…] -> OUTPUT
  • 12. No, let’s talk about compilers SOURCE CODE -> [PROCESSING…] -> EXECUTABLE
  • 13. No, let’s talk about compilers SOURCE CODE -> [PROCESSING…] -> EXECUTABLE How do we go from source code to an executable?
  • 14. Traditional stages of a compiler class Foo def bar 1 + 2 end end ● Lexer: [“class”, “Foo”, “;”, “def”, “bar”, “;”, “1”, “+”, “2”, “;”, “end”, “;”, “end”] ● Parser: ClassDef(“Foo”, body: [Def.new(“bar”)]) ● Semantic (a.k.a “type check”): make sure there are no type errors ● Codegen: generate machine code
  • 15. Let’s start with the codegen phase Goal: generate efficient assembly code for many architectures (32 bits, 64 bits, intel, arm, etc.) ● Generating assembly code is hard ● Generating efficient assembly code is harder ● Generating assembly code for many architectures is hard/tedious/boring
  • 16. Let’s start with the codegen phase Goal: generate efficient assembly code for many architectures (32 bits, 64 bits, intel, arm, etc.) ● Generating assembly code is hard ● Generating efficient assembly code is harder ● Generating assembly code for many architectures is hard/tedious/boring Thus: writing a compiler is HARD! :-(
  • 17. Let’s start with the codegen phase Goal: generate efficient assembly code for many architectures (32 bits, 64 bits, intel, arm, etc.) ● Generating assembly code is hard ● Generating efficient assembly code is harder ● Generating assembly code for many architectures is hard/tedious/boring Thus: writing a compiler is HARD! :-( Well, not anymore...
  • 20. Codegen With LLVM, we generate LLVM IR (internal representation) instead of assembly, and LLVM takes care of generating efficient assembly code for us! The hardest part is solved :-)
  • 21. define i32 @add(i32 %x, i32 %y) { %0 = add i32 %x, %y ret i32 %0 } Codegen: LLVM (example)
  • 22. LLVM provides a nice API to generate IR require "llvm" mod = LLVM::Module.new("main") mod.functions.add("add", [LLVM::Int32, LLVM::Int32], LLVM::Int32) do |func| func.basic_blocks.append do |builder| res = builder.add(func.params[0], func.params[1]) builder.ret(res) end end puts mod
  • 23. ● Lexer ● Parser ● Semantic Remaining phases
  • 24. ● Kind of easy: go char by char until we get a keyword, identifier, number, etc. ● We won’t go into implementation details... Lexer
  • 25. ● Kind of easy: go token by token and create a tree of expressions ● This tree is called AST: Abstract Syntax Tree ● An AST is like a directed, acyclic graph ● We won’t go into implementation details... Parser
  • 26. ● This is the fundamental piece of the compiler ● It takes an AST as input and analyzes it ● Analysis can result in: ○ Declaring types: for example “class Foo; end” will declare a type Foo ○ Checking methods: for example “Foo.bar” will check that “Foo” is a declared type and that the method “bar” exists in it, and has the correct arity and types ○ Giving each non-dead expression in the program a type ○ Gathering some info for the codegen phase: for example know the local variables of a method, and their type Semantic
  • 27. ● The interesting part of the compiler is the semantic phase ● It’s just about processing an AST ● In Crystal’s compiler you just need to know one language: Crystal! ● No HTML/CSS/JS/JSX/etc. ● No untyped, dynamic languages: no Ruby/Erlang/Elixir. Type safe! ● Stuff is processed in memory ● No databases, no Elasticsearch, no Redis Semantic
  • 28. ● The interesting part of the compiler is the semantic phase ● It’s just about processing an AST ● In Crystal’s compiler you just need to know one language: Crystal! ● No HTML/CSS/JS/JSX/etc. ● No untyped, dynamic languages: no Ruby/Erlang/Elixir. Type safe! ● Stuff is processed in memory ● No databases, no Elasticsearch, no Redis Writing a compiler is easier than writing a web app! ^_^ Semantic
  • 29. ● The interesting part of the compiler is the semantic phase ● It’s just about processing an AST ● In Crystal’s compiler you just need to know one language: Crystal! ● No HTML/CSS/JS/JSX/etc. ● No untyped, dynamic languages: no Ruby/Erlang/Elixir. Type safe! ● Stuff is processed in memory ● No databases, no Elasticsearch, no Redis Writing a compiler is easier than writing a web app! ^_^ (Or at least it’s more fun :-P) Semantic
  • 31. Directory layout ● src/compiler/crystal ○ command/ ○ syntax/ ○ semantic/ ○ macros/ ○ codegen/ ○ tools/ ○ compiler.cr ○ types.cr ○ program.cr
  • 32. Directory layout ● src/compiler/crystal ○ command/ : the command line interface ○ syntax/ : lexer, parser, ast, visitor, transformer ○ semantic/ : type declaration, method lookup, etc. ○ macros/ : macro expansion logic ○ codegen/ : codegen ○ tools/ : doc generator, formatter, init ○ compiler.cr : combines syntax + semantic + codegen ○ types.cr : all possible types in Crystal (Int32, String, unions, custom types, etc.) ○ program.cr : holds definitions of a program (holds Int32, String, etc.)
  • 33. Directory layout ● src/compiler/crystal : ~43K LOC ○ command/ : ~300LOC ○ syntax/ : ~10K LOC ○ semantic/ : ~12K LOC ○ macros/ : ~2K LOC ○ codegen/ : ~6K LOC ○ tools/ : ~7K LOC ○ compiler.cr : ~300LOC ○ types.cr :~2K LOC ○ program.cr : ~300 LOC
  • 34. Directory layout ● src/compiler/crystal : ~43K LOC ○ command/ : ~300LOC ○ syntax/ : ~10K LOC ○ semantic/ : ~12K LOC ○ macros/ : ~2K LOC ○ codegen/ : ~6K LOC ○ tools/ : ~7K LOC ○ compiler.cr : ~300LOC ○ types.cr :~2K LOC ○ program.cr : ~300 LOC About 14K LOC to analyze source code.
  • 35. Directory layout ● src/compiler/crystal : ~43K LOC ○ command/ : ~300LOC ○ syntax/ : ~10K LOC ○ semantic/ : ~12K LOC ○ macros/ : ~2K LOC ○ codegen/ : ~6K LOC ○ tools/ : ~7K LOC ○ compiler.cr : ~300LOC ○ types.cr :~2K LOC ○ program.cr : ~300 LOC About 14K LOC to analyze source code. One big Rails app at Manas has 14K LOC in “./app”
  • 36. Directory layout ● src/compiler/crystal : ~43K LOC ○ command/ : ~300LOC ○ syntax/ : ~10K LOC ○ semantic/ : ~12K LOC ○ macros/ : ~2K LOC ○ codegen/ : ~6K LOC ○ tools/ : ~7K LOC ○ compiler.cr : ~300LOC ○ types.cr :~2K LOC ○ program.cr : ~300 LOC About 14K LOC to analyze source code. One big Rails app at Manas has 14K LOC in “./app” A compiler can’t be that hard! ;-)
  • 37. Show me the code
  • 38. Show me the code # src/compiler/crystal/compiler.cr def compile(source : Source | Array(Source), output_filename : String) : Result source = [source] unless source.is_a?(Array) program = new_program(source) node = parse program, source node = program.semantic node, @stats codegen program, node, source, output_filename unless @no_codegen Result.new program, node end
  • 39. Show me the code # src/compiler/crystal/compiler.cr def compile(source : Source | Array(Source), output_filename : String) : Result source = [source] unless source.is_a?(Array) program = new_program(source) node = parse program, source node = program.semantic node, @stats codegen program, node, source, output_filename unless @no_codegen Result.new program, node end
  • 40. Show me the code # src/compiler/crystal/compiler.cr def compile(source : Source | Array(Source), output_filename : String) : Result source = [source] unless source.is_a?(Array) program = new_program(source) node = parse program, source node = program.semantic node, @stats codegen program, node, source, output_filename unless @no_codegen Result.new program, node end What is a program?
  • 41. Program ● Holds all types and top-level methods for a given compilation ● For example, if I compile “class Foo; end” and you compile “class Bar; end”, the first program will have a type named “Foo”, and the second one won’t (but it will have a type named “Bar”) ● It lets us test the compiler more easily, because we can use different Program instances for each snippet of code that we want to test ● In contrast of having global variables holding all of a program’s data ● A Program is passed around in all phases of a compilation (except lexing and parsing, which don’t need semantic info)
  • 42. Show me the code # src/compiler/crystal/compiler.cr def compile(source : Source | Array(Source), output_filename : String) : Result source = [source] unless source.is_a?(Array) program = new_program(source) node = parse program, source # from source to Crystal::ASTNode node = program.semantic node, @stats codegen program, node, source, output_filename unless @no_codegen Result.new program, node end What is a program?
  • 43. Show me the code # src/compiler/crystal/compiler.cr def compile(source : Source | Array(Source), output_filename : String) : Result source = [source] unless source.is_a?(Array) program = new_program(source) node = parse program, source node = program.semantic node, @stats # Semantic! :-) codegen program, node, source, output_filename unless @no_codegen Result.new program, node end What is a program?
  • 44. Semantic ● The entry point for semantic analysis is in src/compiler/crystal/semantic.cr ● Other files are in src/compiler/crystal/semantic/ ● The file semantic.cr has comments that explain the overall algorithm :-)
  • 45. Semantic: overall algorithm ● top level: declare classes, modules, macros, defs and other top-level stuff ● new methods: create `new` methods for every `initialize` method ● type declarations: process type declarations like `@x : Int32` ● check abstract defs: check that abstract defs are implemented ● class_vars_initializers: process initializers like `@@x = 1` ● instance_vars_initializers: process initializers like `@x = 1` ● main: process "main" code, calls and method bodies (the whole program). ● cleanup: remove dead code and other simplifications ● check recursive structs: check that structs are not recursive (impossible to codegen)
  • 46. Semantic: overall algorithm Note! ● This algorithm didn’t come from the Skies (nor from a textbook, nor from a paper) ● It’s not written in stone! ● It can definitely be improved: readability, performance, etc.
  • 47. Note! ● It’s actually more like this… Semantic: overall algorithm
  • 48. Semantic But before looking at each phase, we need to learn about the most useful pattern for analyzing an AST...
  • 50. require "compiler/crystal/syntax" class SumVisitor < Crystal::Visitor getter sum = 0 def visit(node : Crystal::NumberLiteral) @sum += node.value.to_i end def visit(node : Crystal::ASTNode) true # true: continue visiting children nodes end end ast = Crystal::Parser.parse("foo(1 + 2, 3, [4])") visitor = SumVisitor.new ast.accept(visitor) puts visitor.sum
  • 51. The Visitor pattern ● We define a visit method for each node of interest ● We process the nodes ● We return true if we want to process children, false otherwise ● Example: if we only want to process class declarations, we could just define visit(node : Crystal::ClassDef) and define some logic there (and return true, because of nested class definitions) ● A visitor abstracts over the way nodes are composed ● ...though in many cases, for semantic purposes, we need and use the way a node is composed (for example, to analyze a call we need to know the argument types, so we check the arguments, not all children in a generic way)
  • 52. Semantic: overall algorithm ● top level: declare classes, modules, macros, defs and other top-level stuff ● new methods ● type declarations ● check abstract defs ● class_vars_initializers ● instance_vars_initializers ● main ● cleanup ● check recursive structs
  • 53. Top level: declare classes, modules, macros, defs... # src/compiler/crystal/semantic/top_level_visitor.cr class Crystal::TopLevelVisitor < Crystal::SemanticVisitor # ... end
  • 54. ● Located at semantic_visitor.cr ● This is a base visitor used in most of the phases of the semantic analysis ● It keeps track of the “current type” ● For example in “class Foo; class Bar; baz; end; end”, “current type” starts at the top-level (the Program). When “class Foo” is found, the current type becomes “Foo” (we search “Foo” in the current type). When “class Bar” is found, the current type becomes “Foo::Bar” (we search “Bar” in the current type). When “baz” is found, it will be looked up inside the current type. ● But initially there’s no “Foo” inside the current type (the Program). Who defines it? … The top-level visitor! Crystal::SemanticVisitor
  • 55. ● Located at top_level_visitor.cr ● Defines classes, methods, etc. ● Given “class Foo; class Bar; baz; end; end”... ● current_type starts at Program ● When “class Foo” is found (ClassDef), we check if “Foo” exists in the current type. If not, we create it. If it exists with a different type (if it’s a module), we give an error. ● We attach this type “Foo” to the AST node ClassDef. SemnticVisitor will use this in every subsequent phase. ● … the “baz” call is not analyzed here (unless it’s a macro) Crystal::TopLevelVisitor
  • 56. Crystal::TopLevelVisitor ● Many other things done in this visitor: methods and macros are added to types, aliases and enums are defined, etc. ● Question: why are methods and macros defined at this phase?
  • 57. ● The “inherited” macro hook must be processed as soon as “Bar < Foo” and “Baz < Foo” are found ● The macro expands to “do_something”, which must expand to “def foo; 1; end” ● This must happen before we continue processing Baz’s body: “def foo; 3; end” must win and be the method found when doing “Baz.new.foo” ● Conclusion: methods, macros and hooks must be defined in the first pass, when defining types. Additionally, macros might be looked up in types in this same pass (like “do_something”) ● SemanticVisitor takes care to look up and expand calls that resolve to macro calls When should macros be defined and expanded class Foo macro inherited do_something end macro do_something def foo; 1; end end end class Bar < Foo; end class Baz < Foo def foo; 3; end end puts Bar.new.foo # => 1 puts Baz.new.foo # => 3
  • 58. Method overloads ● Crystal methods are very powerful! For example: optional type restrictions, different number of arguments, default arguments, splat, etc. ● When methods are added to types we need to: ○ Know if a method replaces (redefines) an old method ○ Track whether a method is “stricter” than another method, to quickly know, given a call argument types, in which order they are going to be tested
  • 59. Method restrictions def foo(x : Int32) puts 1 end def foo(x) puts 2 end foo(1) foo('a') ● Given foo(1), both methods match it. However, the first overload should be invoked because it has a stronger restriction than the second overload. ● If we define the methods in a different order, it still works the same ● This is because an argument with a type restriction is stronger than one without one. We say that the first one is a restriction of the second one (we should probably rename this to use stronger) ● This applies to types too: Int32 is stronger than Int32 | String. And Bar is stronger than Foo, if Bar < Foo. ● Given two methods with the same name, if all arguments of a method are stronger than the others’, the whole method is stronger and should come first. Each type stores an ordered list of methods indexed by method name, with this notion. ● If the methods are both stronger than each other, they have the same restriction.
  • 60. Method restrictions def foo(x : Int32) puts 1 end def foo(x) puts 2 end foo(1) foo('a') ● This logic is located at restrictions.cr ● A lot of cases to consider: generics, tuples, splats, etc. ● The code and algorithms could probably use a simpler, unified logic and a cleanup, but first all of these concepts and definitions must be defined much more formally
  • 61. Semantic: overall algorithm ● top level ● new methods: create `new` methods for every `initialize` method ● type declarations ● check abstract defs ● class_vars_initializers ● instance_vars_initializers ● main ● cleanup ● check recursive structs
  • 62. ● Located at new.cr ● TopLevelVisitor creates a `new` class method for every `initialize` method it finds (the logic for this is also in new.cr) ● Classes that end up without an `initialize` need a default, argless `self.new` method ● This phase is a bit messy right now because of some missing things related to generics… Semantic: new methods
  • 63. class Foo def initialize(x : Int32) @x = x end # Generated from the above def self.new(x : Int32) instance = allocate instance.initialize(x) if instance.responds_to?(:finalize) ::GC.add_finalizer(instance) end end end Semantic: new methods
  • 64. Semantic: overall algorithm ● top level ● new methods ● type declarations: process type declarations like `@x : Int32` ● check abstract defs ● class_vars_initializers ● instance_vars_initializers ● main ● cleanup ● check recursive structs
  • 65. ● Located at type_declaration_processor.cr (and type_declaration_visitor.cr and type_guess_visitor.cr) ● Combines info gathered by these two visitors to declare the type of instance and class variables. ● TypeDeclarationVisitor deals with explicit type declarations ● TypeGuessVisitor tries to “guess” the type of instance and class variables without an explicit type annotations (for example @x = 1 and @x = Foo.new) Semantic: type declarations
  • 66. Semantic: overall algorithm ● top level ● new methods ● type declarations ● check abstract defs: check that abstract defs are implemented ● class_vars_initializers ● instance_vars_initializers ● main ● cleanup ● check recursive structs
  • 67. ● Located at abstract_def_checker.cr ● Not a visitor, but traverses all types, and for those that have abstract defs checks that subclasses or including modules defined those methods Semantic: check abstract defs
  翻译: