San 474/574
Languages and their Processors
Compiler Project
The compiler project for San 474 involves an extension of the Micro compiler. This is a compiler that is described in Chapter 2 of Crafting a Compiler with C by Fischer and LeBlanc. A compiler for this language will be provided to you. There are several versions of this compiler: one written in Pascal, one in C, another in C++, and a fourth written for Lex and Yacc. Each produces a simple virtual machine language that has the form of a three-address assembly language.
The components of the project, as suggested in the instructor's guide to this text, are the following:
Phase 1: Declarations. Define a new syntax and change the semantic processing of identifier references to require a previous declaration. The declaration should allow a compile time initialization. Give specific error messages if a variable is used before being declared, or if a declaration has the wrong syntax. Also, extend the language to include real literals and variables. (It currently allows only integers.) This will involve extending the scanner to handle real literals. Symbol table routines must be extended to store a type attribute with each identifier. All semantic routines which generate code must be changed to handle both real and integer values. That is, these routines must consider the type of the parameter they receive.
Phase 2: Multiplication, Division, and Logical Operators. Addition, multiplication, division (integer and real), remainder, logical not, logical and, and logical or to the language. (Integer division gives an integer answer.) New symbols must be scanned. The grammar should enforce operator precedence and associativity. Semantic routines must be extended to handle code generation for the new tokens. You should also allow mixed mode expressions. That is, real + integer should be permitted, and the code generated should convert the integer to a real and then add it with a real add. Also parenthesized expressions should be permitted.
Phase 3: Conditional and loop statements. Extend the language to include if and while statements. Nested looping constructs should be permitted. The scanner must recognize these new keywords. The grammar must be extended. Semantic routines must be extended to generate appropriate tests and jumps. Add comparison operators, i.e. >, >=, < <=, =, !=.
Phase 4: Procedures. Incorporate nested procedures (parameterless) into the language. The scanner and parser must be extended. The symbol table must now handle nested scopes. Thus, there can now be more than one variable named i. The semantic routines must be extended to generate code to transfer control at the point of call and in the procedure (at the beginning and end).
Phase 5: Parameters. Incorporate both value and reference parameters for procedures. The actual parameter should be a valid expression for value parameters. Obviously, an actual reference parameter will have to be a variable (l-value).
Include error messages where appropriate in each phase.
For each phase, you are required to hand in a revised grammar for the extended micro language, a test plan, the source code listing with the changes documented and highlighted, and completed test cases. The test plan should consist of a set of test cases with expected output from the compiler. The test cases should illustrate every feature of the additions, and should convince me that they are correct. The test cases will need to include some regression testing to prove that the new changes do not change the functionality of the remainder of the program.
Each phase will be equally weighted. Furthermore, each sub-phase will be equally weighted. That is, the grammar and test plan for phase X is worth the same as the executable code and test cases for phase X.
All documents should be printed. The test plan is an important component of the system. Preparing this early forces you to understand more precisely the functions of this phase. In addition to a prose description and categorization of the tests to be executed, you should include copies of files used to test this phase and expected outputs. Source code will be graded for documentation and style in addition to correctness. You are responsible to "prove" that your compiler operates correctly for each phase by means of the test cases submitted. Submit source code that will compile and run (or executable code) to test in addition to your own testing input and output.
Because of project is divided into five phases, it is important that you do not fall behind schedule. Items handed in late will be penalized 20% per day for the first three days late. (Weekends count as one day.) After three days, that item will not be accepted.
The due dates for various components of this project are:
Grammar and Test plan for phase 1 February 15
Executable code and completed tests for phase 1 February 22
Grammar and Test plan for phase 2 February 29
Executable code and completed tests for phase 2 March 7
Grammar and Test plan for phase 3 March 23
Executable code and completed tests for phase 3 March 30
Grammar and Test plan for phase 4 April 6
Executable code and completed tests for phase 4 April 13
Grammar and Test plan for phase 5 April 20
Executable code and completed tests for phase 5 April 27
This is a group project. Groups of two or three members are acceptable. All team members receive the same grade. All team members should be active in all phase of the project although a particular person may be primarily responsible for a specific phase. If one member of the team is not participating adequately, it is the responsibility of the other team members to confront that team member. If that fails during the course of the project, report the problem to me. Do not wait until the end of the semester to bring such problems to my attention.
This compiler may be built on any computer system that you desire. You will be provided with the initial compiler that implements a subset of Micro. Also, the grammar for the output language MU to be generated will be given to you.