diff options
Diffstat (limited to 'dst/SPO notes.html')
-rw-r--r-- | dst/SPO notes.html | 132 |
1 files changed, 66 insertions, 66 deletions
diff --git a/dst/SPO notes.html b/dst/SPO notes.html index 178a9cb..cbffc3e 100644 --- a/dst/SPO notes.html +++ b/dst/SPO notes.html @@ -3,12 +3,12 @@ <meta charset="UTF-8"> <meta name="viewport" content="width=device-width, initial-scale=1.0"> -<title>Exam — adamski.wtf</title> +<title>Exam — adast.xyz</title> <link rel="icon" type="image/png" href="favicon.png"> <link rel="stylesheet" href="styles/style.css"> -<p class="header"><a href="/">adamski.wtf</a></p> -<h1 id="Exam">Exam</h1> +<p class="header"><a href="/">adast.xyz</a></p> +<h1 id="exam">Exam</h1> <ul> <li>15 minute video presentation</li> @@ -16,7 +16,7 @@ <li>Chosen subject appears on digital exam</li> </ul> -<h1 id="Language%20processors">Language processors</h1> +<h1 id="language-processors">Language processors</h1> <p>Most important problem in computing : increasing programmer productivity to improve speed of software release.</p> @@ -36,7 +36,7 @@ to improve speed of software release.</p> <p>You can gain deeper understanding of programming through learning about compilers.</p> -<h2 id="Programming%20languages">Programming languages</h2> +<h2 id="programming-languages">Programming languages</h2> <ul> <li>Set of rules to tell a computer what operations to perform</li> @@ -45,11 +45,11 @@ to improve speed of software release.</p> <li>Symbols, words, rules of grammar, rules of semantics</li> </ul> -<h3 id="Why%20are%20there%20so%20many%20programming%20languages?">Why are there so many programming languages?</h3> +<h3 id="why-are-there-so-many-programming-languages">Why are there so many programming languages?</h3> <p>Programming has evolved over time because different programming languages can be desinged for different types of programs.</p> -<h3 id="Types%20of%20programming%20languages">Types of programming languages</h3> +<h3 id="types-of-programming-languages">Types of programming languages</h3> <ul> <li>Low level - high level 1st-2nd gen</li> @@ -65,7 +65,7 @@ to improve speed of software release.</p> <p>We are in the “multi-paradigm era”, Anders Hejlsberg says languages in of the future will be a mix of all existing paradigms.</p> -<h3 id="What%20is%20a%20good%20language?">What is a good language?</h3> +<h3 id="what-is-a-good-language">What is a good language?</h3> <ul> <li>Formerly: run-time performance</li> @@ -93,13 +93,13 @@ to improve speed of software release.</p> - Type rules - Semantics</p> -<h3 id="Does%20syntax%20matter?">Does syntax matter?</h3> +<h3 id="does-syntax-matter">Does syntax matter?</h3> <p>Syntax is the visible part of the language. The paradigm is the next most visible. Most invisible part is the language semantics. But clear semantics usually means simple and efficient implementations.</p> <p>Yes. Syntax matters. </p> -<h2 id="Language%20Processors">Language Processors</h2> +<h2 id="language-processors-1">Language Processors</h2> <ul> <li>Text Editors / IDEs</li> @@ -111,11 +111,11 @@ to improve speed of software release.</p> <li>Interpreters</li> </ul> -<h3 id="Interpreters">Interpreters</h3> +<h3 id="interpreters">Interpreters</h3> <p>Take source code and input and produces output. For example, web browser.</p> -<h3 id="Compilation">Compilation</h3> +<h3 id="compilation">Compilation</h3> <p>A 2-step process. Source program is inputted to compiler and a new program target program is outputted from the compiler. However this is much more complicated. Lexical-, syntax-, semantic analysis, code generation (IL)?. Sometimes you may generate high-level code. For example, taking source and converting it to C code.</p> @@ -126,7 +126,7 @@ to improve speed of software release.</p> <p>Symbol tables can be accessed from several phases.</p> -<h3 id="Compiler%20design">Compiler design</h3> +<h3 id="compiler-design">Compiler design</h3> <p>Compiler design is strong affected and strong affects programming language design. Advantages of good compiler: easier to learn, read, understand. Fewer bugs. More reliable. More diagnostic messages and tools.</p> @@ -151,11 +151,11 @@ to improve speed of software release.</p> <li>Code-generation tools</li> </ul> -<h1 id="Programming%20language%20evolution%20and%20tombstone%20diagrams">Programming language evolution and tombstone diagrams</h1> +<h1 id="programming-language-evolution-and-tombstone-diagrams">Programming language evolution and tombstone diagrams</h1> <p>Most new languages are invented out of frustration. When a new language is created it tends to be because it would be an improvement over some existing language. For example, Fortran was invented to improve programmer productivity.</p> -<h2 id="Programming%20language%20design">Programming language design</h2> +<h2 id="programming-language-design">Programming language design</h2> <ol> <li>Create ideas @@ -173,7 +173,7 @@ to improve speed of software release.</p> <p>The programming language critera table can be used to generate ideas for or evaluate a language’s design. Reusability and relocatability of code has become a more and more important criteria over time. </p> -<h2 id="Programming%20language%20history">Programming language history</h2> +<h2 id="programming-language-history">Programming language history</h2> <p>40-50s: Assembly languages made programming more readable more usable. Came with a compiler that could translate assembly to machine operations.</p> @@ -189,7 +189,7 @@ to improve speed of software release.</p> <p>10s: Multi-paradigm (OO + FP)</p> -<h2 id="Tombstone%20diagrams%20-%20gravstens%20diagramer">Tombstone diagrams - gravstens diagramer</h2> +<h2 id="tombstone-diagrams---gravstens-diagramer">Tombstone diagrams - gravstens diagramer</h2> <p>A notation consisting of puzzle pieces that can be used to reason about language processors and programs.</p> @@ -210,9 +210,9 @@ to improve speed of software release.</p> <p>Bootstrapping can be used as a way to generate more efficient compilers.</p> -<h1 id="The%20ac%20language%20and%20compiler">The ac language and compiler</h1> +<h1 id="the-ac-language-and-compiler">The ac language and compiler</h1> -<h2 id="Phases%20of%20a%20compiler">Phases of a compiler</h2> +<h2 id="phases-of-a-compiler">Phases of a compiler</h2> <p>Phases: - Syntax analysis -> Error reports | AST @@ -239,7 +239,7 @@ to improve speed of software release.</p> <p>Modern compilers all make use of symbol tables. This can generally be used in any phase of compilation but it is now best practice to only access it from type checker / contextual analysis.</p> -<h2 id="Phases%20of%20a%20simple%20compiler">Phases of a simple compiler</h2> +<h2 id="phases-of-a-simple-compiler">Phases of a simple compiler</h2> <ul> <li>Syntax analysis @@ -273,7 +273,7 @@ to improve speed of software release.</p> <p>Code generation: formulation of target-machine instructions that represent the semantics of the source program.</p> -<h2 id="The%20ac%20language%20(adding%20calculator)">The ac language (adding calculator)</h2> +<h2 id="the-ac-language-adding-calculator">The ac language (adding calculator)</h2> <p>A simple language for arithmetic. Monolithic scope, meaning a variable can only be declared once. Target of translation: dc (desk calculator)</p> @@ -286,7 +286,7 @@ b = a + 3.2 p b // print b </code></pre> -<h3 id="Formal%20definition%20of%20ac">Formal definition of ac</h3> +<h3 id="formal-definition-of-ac">Formal definition of ac</h3> <ul> <li>Syntax specification @@ -301,17 +301,17 @@ p b // print b </ul></li> </ul> -<h2 id="Context%20free%20grammar%20(CFG)">Context free grammar (CFG)</h2> +<h2 id="context-free-grammar-cfg">Context free grammar (CFG)</h2> <p>A set of production rules for a language. Two types of symbols: terminals and non-terminals. Terminals cannot be rewritten (id, num, etc). Non-terminals are rules like (prog, scope, etc).</p> <p>A syntax tree / parse tree can be used to show the production rules of some source code in a tree structure.</p> -<h2 id="Regular%20expressions%20(RE)">Regular expressions (RE)</h2> +<h2 id="regular-expressions-re">Regular expressions (RE)</h2> <p>For specifying tokens. For example: id, num, “+”, “-”, etc.</p> -<h1 id="Language%20specifications">Language specifications</h1> +<h1 id="language-specifications">Language specifications</h1> <p>A communication device between people who need a common understanding of the programming language or PL. For example, the language desinger, implementor, user, etc.</p> @@ -334,7 +334,7 @@ p b // print b - Semantics (dynamic semantics) - Informal or formal</p> -<h2 id="Syntax%20terminology">Syntax terminology</h2> +<h2 id="syntax-terminology">Syntax terminology</h2> <ul> <li>Sentence: string of characters over some alphabet</li> @@ -343,14 +343,14 @@ p b // print b <li>Token: a category of lexemes (e.g. identifier)</li> </ul> -<h2 id="Formal%20definition%20of%20languages">Formal definition of languages</h2> +<h2 id="formal-definition-of-languages">Formal definition of languages</h2> <ul> <li>Generators: a device that generates senteces of a language.</li> <li>Recognizer: reads input strings over the alphabet and decides whether the input strings belong to the language.</li> </ul> -<h2 id="Syntax%20specification">Syntax specification</h2> +<h2 id="syntax-specification">Syntax specification</h2> <p>Syntax is specified suing a CFG. - a finite set of terminals (tokens) @@ -372,7 +372,7 @@ N ::= a | B | ... <p>In BNF, syntactic lists are described using recursion.</p> -<h2 id="Derivations">Derivations</h2> +<h2 id="derivations">Derivations</h2> <p>This is where you take some source code and you step through the CFG with it to see which rules are rewritten to deeper rules, until you get the end result. This can also be represented as a parse tree.</p> @@ -380,7 +380,7 @@ N ::= a | B | ... <p>Leftmost derivation always choses the leftmost possible non-terminal and vice-versa for rightmost derivation.</p> -<h2 id="Ambiguity%20in%20grammars">Ambiguity in grammars</h2> +<h2 id="ambiguity-in-grammars">Ambiguity in grammars</h2> <p>A grammar is ambiguous if a string (sentence) can be defined by more than one different parse trees. Ususally ambiguity is undesirable and CFGs are rewritten to disallow this behaviour. This rewritting process can be done systematically.</p> @@ -391,7 +391,7 @@ N ::= a | B | ... <li>Ambiguity in grammars is an undecidable property.</li> </ul> -<h2 id="EBNF">EBNF</h2> +<h2 id="ebnf">EBNF</h2> <p>A non-standardised extension to BNF notation.</p> @@ -403,9 +403,9 @@ N ::= a | B | ... <p>EBNF can also be rewritten to standard BNF fairly easily.</p> -<h1 id="Context%20free%20grammars">Context free grammars</h1> +<h1 id="context-free-grammars">Context free grammars</h1> -<h2 id="Properties%20of%20grammars%20(non-terminals)">Properties of grammars (non-terminals)</h2> +<h2 id="properties-of-grammars-non-terminals">Properties of grammars (non-terminals)</h2> <ul> <li>Left-recursive: <code>N ::= N ...</code></li> @@ -414,7 +414,7 @@ N ::= a | B | ... <li>Useless: if a non-terminal can never produce a string of terminal symbols.</li> </ul> -<h2 id="Grammar%20transformations">Grammar transformations</h2> +<h2 id="grammar-transformations">Grammar transformations</h2> <ul> <li>Left factorization: <code>X Y | X Z</code> is transformed to <code>X (Y | Z)</code></li> @@ -422,7 +422,7 @@ N ::= a | B | ... <li>Substitution of non-terminals: for clearer CFGs </li> </ul> -<h2 id="From%20tokens%20to%20parse%20tree">From tokens to parse tree</h2> +<h2 id="from-tokens-to-parse-tree">From tokens to parse tree</h2> <p>Parsing is finding the structure in a stream of tokens. This is done by the parser.</p> @@ -430,15 +430,15 @@ N ::= a | B | ... - top-down (Leftmost derivation / LL), constructed in pre-order. - bottom-up (Rightmost derivation in reverse / LR), constructed in post-order.</p> -<h2 id="First%20sets">First sets</h2> +<h2 id="first-sets">First sets</h2> <p>The set of all terminal symbols, in order, derived from a sentence S. The empty string λ is never included.</p> -<h2 id="Follow%20sets">Follow sets</h2> +<h2 id="follow-sets">Follow sets</h2> <p>The set of terminals that can follow a non-terminal A, where the EOF symbol is $.</p> -<h2 id="LL(1)%20grammars">LL(1) grammars</h2> +<h2 id="ll1-grammars">LL(1) grammars</h2> <ul> <li>Cannot be left-recursive.</li> @@ -447,7 +447,7 @@ N ::= a | B | ... <li>λ-free grammars are very easy as you only need to check for the first set.</li> </ul> -<h2 id="LR%20grammars">LR grammars</h2> +<h2 id="lr-grammars">LR grammars</h2> <ul> <li>Can be parsed using an LR algorithm.</li> @@ -456,9 +456,9 @@ N ::= a | B | ... <li>LR parsing methods are the most commonly used for automatic tools today.</li> </ul> -<h1 id="Lexical%20analysis">Lexical analysis</h1> +<h1 id="lexical-analysis">Lexical analysis</h1> -<h2 id="Developing%20a%20scanner">Developing a scanner</h2> +<h2 id="developing-a-scanner">Developing a scanner</h2> <p>The scanner/lexer is what reads through source code character by character and divides them into lexemes and tokens. This includes whitespace which can have meaning for languages like Python. </p> @@ -466,7 +466,7 @@ N ::= a | B | ... <p>How do we know how to implement a scanner? Read the language specification. Included in the specification is typically tokens/lexemes as regular expressions and reserved words.</p> -<h3 id="Lexical%20elements">Lexical elements</h3> +<h3 id="lexical-elements">Lexical elements</h3> <ul> <li>Character set (ASCII vs Unicode)</li> @@ -485,11 +485,11 @@ N ::= a | B | ... <p>The audience of your PL matters for lexical design, for example if target audience are C programmers they will prefer curly braces over begin-end keywords. It is a good idea to keep your design simple though. </p> -<h3 id="Regular%20grammars&#47;expressions">Regular grammars/expressions</h3> +<h3 id="regular-grammarsexpressions">Regular grammars/expressions</h3> <p>It is generally better to use regular expressions instead of production rules where possible. Regular expressions are typically easy to implement. REs can be implemented by finite state machines (FSM). These automatas can then be easily implemented in code and as such make the job of implementing a scanner/lexer a little easier. </p> -<h3 id="Implement%20a%20scanner%20based%20on%20RE%20by%20hand">Implement a scanner based on RE by hand</h3> +<h3 id="implement-a-scanner-based-on-re-by-hand">Implement a scanner based on RE by hand</h3> <ol> <li><p>Express the ‘lexical’ grammar as RE. Sometimes it is easier to start with (E)BNF and do the necessary transformations.</p></li> @@ -504,18 +504,18 @@ N ::= a | B | ... <li>Most compilers are developed using a generated scanner.</li> </ul> -<h2 id="Generating%20scanners">Generating scanners</h2> +<h2 id="generating-scanners">Generating scanners</h2> <p>Something about REs and the different types of FAs are equivalent. There is an algorithm for transforming NDFA with epsilon into DFAs. DFAs are easily implementable by computers.</p> <p>A scanner generator a program that takes tokens as REs and will generate a scanner based on a FAs in whatever language. JLex/Lex/CSharpLex are tools for generating a scanner that recognizes tokens.</p> -<h2 id="Performance%20considerations">Performance considerations</h2> +<h2 id="performance-considerations">Performance considerations</h2> <p>Performance of scanners is important for production compilers. Size of scanner sometimes matters. Note: modern scanners use explicit control, not table, because bigger tables may perform worse on modern architectures.</p> -<h1 id="Top%20down%20parsing">Top down parsing</h1> +<h1 id="top-down-parsing">Top down parsing</h1> <p>The job of the syntax analysis, is to read to source text and determine it’s phase structure. This can be used to construct the AST. The input to the scanner is the stream of characters which output a stream of tokens to the parser or reports errors. The parser produces the AST or reports errors again.</p> @@ -532,7 +532,7 @@ The cat sees a rat <p>This is equivalent to the parse tree that would be produced from the sentence. So Top down is a straight-forward parsing algorithm. The parse structure corresponds to the set of procedures. </p> -<h2 id="Recursive%20decent%20parsing">Recursive decent parsing</h2> +<h2 id="recursive-decent-parsing">Recursive decent parsing</h2> <p>Each non-terminal will have a parse method. There will also need to be a method that asks for the next token. This function will either return the this token or give an error in the event of an unexepected token.</p> @@ -546,7 +546,7 @@ The cat sees a rat <p>This is a little more involved when there are alternatives. This can be implemented using if-elseif or a switch structure that checks for the alternatives.</p> -<h2 id="Algo%20to%20convert%20EBNF%20into%20a%20RD%20parser">Algo to convert EBNF into a RD parser</h2> +<h2 id="algo-to-convert-ebnf-into-a-rd-parser">Algo to convert EBNF into a RD parser</h2> <p>Foreach non-terminal in the grammar, create a parse method. Parsing a terminal just uses the ‘accept’ method. Parsing of non-terminals is just method calls to those non-terminal’s parse methods in order. If there are repetitions, (0 .. more) you use a while loop. As long as the current token is in the first set, you keep parsing it. Use a switch case of if-else for alternatives. </p> @@ -554,21 +554,21 @@ The cat sees a rat <p>If we want the parser to return an AST each of the parse methods should return their little bit of the AST instead of void. (Think of our visitor methods).</p> -<h2 id="The%20visitor%20pattern%20(briefly)">The visitor pattern (briefly)</h2> +<h2 id="the-visitor-pattern-briefly">The visitor pattern (briefly)</h2> <p>For OOP the visitor pattern enbales the definition of a new operator on an object structure without changing classes of the objects.</p> -<h2 id="LL(1)%20grammars-2">LL(1) grammars</h2> +<h2 id="ll1-grammars-1">LL(1) grammars</h2> <p>A grammar which can be parsed with a top-down parse with a lookahead (in the input stream of tokens) of one token.</p> <p>Lecture 7 @ 32:00 for formal definition of LL(1) grammar.</p> -<h3 id="Table-driven%20LL(1)%20parsers">Table-driven LL(1) parsers</h3> +<h3 id="table-driven-ll1-parsers">Table-driven LL(1) parsers</h3> <p>A table is generated that simulates the way the regular parser’s stack works. This presumably doesn’t consume as much memory as a traditional parser’s stack. The table is constructed with non-terminals and the lookahead terminals. The table tells which rule to use in the grammar. 49:00 Lecture 7. ANTLR an example of a Java parser generator, that allows LL(*) grammars and is table-driven.</p> -<h1 id="Bottom%20up%20parsing">Bottom up parsing</h1> +<h1 id="bottom-up-parsing">Bottom up parsing</h1> <p>LR languages are more powerful than LL. - LR(0), SLR(1), LALR(1), LR(k) grammars. @@ -588,7 +588,7 @@ The cat sees a rat <p>There are tools that can automate generating the bottom up parser. 50:00 Lecture 8.</p> -<h2 id="Terminology%2015:00%20Lecture%208">Terminology 15:00 Lecture 8</h2> +<h2 id="terminology-1500-lecture-8">Terminology 15:00 Lecture 8</h2> <ul> <li>A reduction transforms <code>uwv to uAv</code> if <code>A -> w</code> is a production.</li> @@ -596,7 +596,7 @@ The cat sees a rat <li>Handle (informally: a production than can be used in reverse without getting stuck).</li> </ul> -<h2 id="The%20shift%20reduce%20process%2018:00%20Lecture%208">The shift reduce process 18:00 Lecture 8</h2> +<h2 id="the-shift-reduce-process-1800-lecture-8">The shift reduce process 18:00 Lecture 8</h2> <p>This is how bottom up parser work. They have a stack and a table of operations. This can also be represented as a knitting game.</p> @@ -613,36 +613,36 @@ If there is an empty state in the lookup table, we know we have an error.</p> <p>How is the parse table built? The table is coded as a DFA (28:00 Lecture 8).</p> -<h3 id="LR(0)%20conflicts">LR(0) conflicts</h3> +<h3 id="lr0-conflicts">LR(0) conflicts</h3> <p>This algo doesn’t always work. A LR(0) conflict is a situation (DFA state) in which there is more than one possible action for the algo. There are two kinds of conflicts: - Shift-reduce: can’t decide between shift or reduce. - Reduce-reduce: can’t decide between two or more reductions for different grammar rules.</p> -<h3 id="The%20shift-reduce%20problem">The shift-reduce problem</h3> +<h3 id="the-shift-reduce-problem">The shift-reduce problem</h3> <p>This is when you can either shift or reduce. This is a sign that the grammar is not LR(0). There are tools that can be used to input a grammar and output what kind of grammar it is.</p> -<h2 id="LR(0)%20vs%20SLR(1)">LR(0) vs SLR(1)</h2> +<h2 id="lr0-vs-slr1">LR(0) vs SLR(1)</h2> <p>SLR(1) looks at the symbol after the handle (follow set). This means fewer reduce-actoins and therefore removes a lot of potential conflicts.</p> -<h2 id="LR(1)">LR(1)</h2> +<h2 id="lr1">LR(1)</h2> <p>While SLR just uses the follow set for each production. LR(1) looks at the follow set for each handle. This gives even more reductions. LR(1) parse tables are very big. Expensive but almost all common languages are LR(1).</p> -<h2 id="LALR(1)">LALR(1)</h2> +<h2 id="lalr1">LALR(1)</h2> <p>A variant of LR(1) that gives smaller parse tables. This is done by combining some states in the parse tree. In practice most practical languages are LALR(1).</p> -<h2 id="Parser%20conflict%20resolution">Parser conflict resolution</h2> +<h2 id="parser-conflict-resolution">Parser conflict resolution</h2> <p>Parsing conflicts are commonly caused by ambiguous grammars. In practice, these ambiguities are solved using a resolution rule however this may or may not do what you want (fx: dangling-else).</p> <p>Parsing conflicts must be fixed. You cannot rely on the resolution rule.</p> -<h1 id="Abstract%20syntax%20trees%20(AST)">Abstract syntax trees (AST)</h1> +<h1 id="abstract-syntax-trees-ast">Abstract syntax trees (AST)</h1> <p>Like a parse tree but with some details omitted.</p> @@ -653,18 +653,18 @@ Synthesized and inherited attributes.</p> <p>It should be possible to unparse an AST (reconstruct the source from an AST).</p> -<h2 id="Single-pass%20vs%20multi-pass%20compilers">Single-pass vs multi-pass compilers</h2> +<h2 id="single-pass-vs-multi-pass-compilers">Single-pass vs multi-pass compilers</h2> <p>Single-pass makes a single pass over source text, parsing, analyzing, and generating code all at once. This means that the contextual analyzer and code generator are called from the parser implementation.</p> <p>Multi-pass makes several passes over the program. This is the modern approach. The output of a preceeding phase is stored in a data structure and used by subsequent phases. This is more readable and maintainable as the code is more broken down into independant modules.</p> -<h2 id="Visitor%20pattern">Visitor pattern</h2> +<h2 id="visitor-pattern">Visitor pattern</h2> <p>35:00 Lecture 9. </p> -<h2 id="Contextual%20analysis">Contextual analysis</h2> +<h2 id="contextual-analysis">Contextual analysis</h2> <p>Identification and type checking are combined into a depth-first traversal of the AST.</p> -<p class="footer"><a href="/">adamski.wtf</a></p> +<p class="footer"><a href="/">adast.xyz</a></p> </html> |