From 3753f344c75af77aeaa8491b22a8ca890cb471d8 Mon Sep 17 00:00:00 2001 From: Adam Date: Sat, 23 Apr 2022 21:06:29 +0200 Subject: Better css margins, adamski.wtf -> adast.xyz --- dst/SPO notes.html | 132 ++++++++++++++++++++++++++--------------------------- 1 file changed, 66 insertions(+), 66 deletions(-) (limited to 'dst/SPO notes.html') diff --git a/dst/SPO notes.html b/dst/SPO notes.html index 178a9cb..cbffc3e 100644 --- a/dst/SPO notes.html +++ b/dst/SPO notes.html @@ -3,12 +3,12 @@ -Exam — adamski.wtf +Exam — adast.xyz -

adamski.wtf

-

Exam

+

adast.xyz

+

Exam

-

Language processors

+

Language processors

Most important problem in computing : increasing programmer productivity to improve speed of software release.

@@ -36,7 +36,7 @@ to improve speed of software release.

You can gain deeper understanding of programming through learning about compilers.

-

Programming languages

+

Programming languages

-

Why are there so many programming languages?

+

Why are there so many programming languages?

Programming has evolved over time because different programming languages can be desinged for different types of programs.

-

Types of programming languages

+

Types of programming languages

-

EBNF

+

EBNF

A non-standardised extension to BNF notation.

@@ -403,9 +403,9 @@ N ::= a | B | ...

EBNF can also be rewritten to standard BNF fairly easily.

-

Context free grammars

+

Context free grammars

-

Properties of grammars (non-terminals)

+

Properties of grammars (non-terminals)

-

Grammar transformations

+

Grammar transformations

-

From tokens to parse tree

+

From tokens to parse tree

Parsing is finding the structure in a stream of tokens. This is done by the parser.

@@ -430,15 +430,15 @@ N ::= a | B | ... - top-down (Leftmost derivation / LL), constructed in pre-order. - bottom-up (Rightmost derivation in reverse / LR), constructed in post-order.

-

First sets

+

First sets

The set of all terminal symbols, in order, derived from a sentence S. The empty string λ is never included.

-

Follow sets

+

Follow sets

The set of terminals that can follow a non-terminal A, where the EOF symbol is $.

-

LL(1) grammars

+

LL(1) grammars

-

LR grammars

+

LR grammars

-

Lexical analysis

+

Lexical analysis

-

Developing a scanner

+

Developing a scanner

The scanner/lexer is what reads through source code character by character and divides them into lexemes and tokens. This includes whitespace which can have meaning for languages like Python.

@@ -466,7 +466,7 @@ N ::= a | B | ...

How do we know how to implement a scanner? Read the language specification. Included in the specification is typically tokens/lexemes as regular expressions and reserved words.

-

Lexical elements

+

Lexical elements

-

Generating scanners

+

Generating scanners

Something about REs and the different types of FAs are equivalent. There is an algorithm for transforming NDFA with epsilon into DFAs. DFAs are easily implementable by computers.

A scanner generator a program that takes tokens as REs and will generate a scanner based on a FAs in whatever language. JLex/Lex/CSharpLex are tools for generating a scanner that recognizes tokens.

-

Performance considerations

+

Performance considerations

Performance of scanners is important for production compilers. Size of scanner sometimes matters. Note: modern scanners use explicit control, not table, because bigger tables may perform worse on modern architectures.

-

Top down parsing

+

Top down parsing

The job of the syntax analysis, is to read to source text and determine it’s phase structure. This can be used to construct the AST. The input to the scanner is the stream of characters which output a stream of tokens to the parser or reports errors. The parser produces the AST or reports errors again.

@@ -532,7 +532,7 @@ The cat sees a rat

This is equivalent to the parse tree that would be produced from the sentence. So Top down is a straight-forward parsing algorithm. The parse structure corresponds to the set of procedures.

-

Recursive decent parsing

+

Recursive decent parsing

Each non-terminal will have a parse method. There will also need to be a method that asks for the next token. This function will either return the this token or give an error in the event of an unexepected token.

@@ -546,7 +546,7 @@ The cat sees a rat

This is a little more involved when there are alternatives. This can be implemented using if-elseif or a switch structure that checks for the alternatives.

-

Algo to convert EBNF into a RD parser

+

Algo to convert EBNF into a RD parser

Foreach non-terminal in the grammar, create a parse method. Parsing a terminal just uses the ‘accept’ method. Parsing of non-terminals is just method calls to those non-terminal’s parse methods in order. If there are repetitions, (0 .. more) you use a while loop. As long as the current token is in the first set, you keep parsing it. Use a switch case of if-else for alternatives.

@@ -554,21 +554,21 @@ The cat sees a rat

If we want the parser to return an AST each of the parse methods should return their little bit of the AST instead of void. (Think of our visitor methods).

-

The visitor pattern (briefly)

+

The visitor pattern (briefly)

For OOP the visitor pattern enbales the definition of a new operator on an object structure without changing classes of the objects.

-

LL(1) grammars

+

LL(1) grammars

A grammar which can be parsed with a top-down parse with a lookahead (in the input stream of tokens) of one token.

Lecture 7 @ 32:00 for formal definition of LL(1) grammar.

-

Table-driven LL(1) parsers

+

Table-driven LL(1) parsers

A table is generated that simulates the way the regular parser’s stack works. This presumably doesn’t consume as much memory as a traditional parser’s stack. The table is constructed with non-terminals and the lookahead terminals. The table tells which rule to use in the grammar. 49:00 Lecture 7. ANTLR an example of a Java parser generator, that allows LL(*) grammars and is table-driven.

-

Bottom up parsing

+

Bottom up parsing

LR languages are more powerful than LL. - LR(0), SLR(1), LALR(1), LR(k) grammars. @@ -588,7 +588,7 @@ The cat sees a rat

There are tools that can automate generating the bottom up parser. 50:00 Lecture 8.

-

Terminology 15:00 Lecture 8

+

Terminology 15:00 Lecture 8

-

The shift reduce process 18:00 Lecture 8

+

The shift reduce process 18:00 Lecture 8

This is how bottom up parser work. They have a stack and a table of operations. This can also be represented as a knitting game.

@@ -613,36 +613,36 @@ If there is an empty state in the lookup table, we know we have an error.

How is the parse table built? The table is coded as a DFA (28:00 Lecture 8).

-

LR(0) conflicts

+

LR(0) conflicts

This algo doesn’t always work. A LR(0) conflict is a situation (DFA state) in which there is more than one possible action for the algo. There are two kinds of conflicts: - Shift-reduce: can’t decide between shift or reduce. - Reduce-reduce: can’t decide between two or more reductions for different grammar rules.

-

The shift-reduce problem

+

The shift-reduce problem

This is when you can either shift or reduce. This is a sign that the grammar is not LR(0). There are tools that can be used to input a grammar and output what kind of grammar it is.

-

LR(0) vs SLR(1)

+

LR(0) vs SLR(1)

SLR(1) looks at the symbol after the handle (follow set). This means fewer reduce-actoins and therefore removes a lot of potential conflicts.

-

LR(1)

+

LR(1)

While SLR just uses the follow set for each production. LR(1) looks at the follow set for each handle. This gives even more reductions. LR(1) parse tables are very big. Expensive but almost all common languages are LR(1).

-

LALR(1)

+

LALR(1)

A variant of LR(1) that gives smaller parse tables. This is done by combining some states in the parse tree. In practice most practical languages are LALR(1).

-

Parser conflict resolution

+

Parser conflict resolution

Parsing conflicts are commonly caused by ambiguous grammars. In practice, these ambiguities are solved using a resolution rule however this may or may not do what you want (fx: dangling-else).

Parsing conflicts must be fixed. You cannot rely on the resolution rule.

-

Abstract syntax trees (AST)

+

Abstract syntax trees (AST)

Like a parse tree but with some details omitted.

@@ -653,18 +653,18 @@ Synthesized and inherited attributes.

It should be possible to unparse an AST (reconstruct the source from an AST).

-

Single-pass vs multi-pass compilers

+

Single-pass vs multi-pass compilers

Single-pass makes a single pass over source text, parsing, analyzing, and generating code all at once. This means that the contextual analyzer and code generator are called from the parser implementation.

Multi-pass makes several passes over the program. This is the modern approach. The output of a preceeding phase is stored in a data structure and used by subsequent phases. This is more readable and maintainable as the code is more broken down into independant modules.

-

Visitor pattern

+

Visitor pattern

35:00 Lecture 9.

-

Contextual analysis

+

Contextual analysis

Identification and type checking are combined into a depth-first traversal of the AST.

- + -- cgit v1.2.3-70-g09d2