I just finished the first programming assignment for the Stanford Compilers course, which was to implement a lexer for the Cool (classroom object oriented language) using Flex.
Flex is a C-language based lexical analyzer generator. It generates a valid c source file for the lexical rules you put together. The .flex
file format is basically a c file with some special syntax.
The bulk of this assignment was hunting for implementation details. I do not enjoy courses as much when they don't go through an assignment pre-amble in the actual lectures. Feels bad to be hunting through pdfs. Yes, yes, RTFM and all that -- I just feel that education should be optimized for speed of intake, not to "teach you how to hunt for stuff." Thats a valuable lesson I already learned literally 20 years ago. We're in a new age here.
Gotchas
make lexer
, not justmake
- I spent 20 minutes wondering if my changes to the basecool.flex
were actually taking effectA nice base case for the flex file
Pre-defined return types in a magic header not in your assignment folder (
/usr/class/cs143/include/PA2/cool-parse.h
)You can nest rules via
%x
directive -- directly applicable to string literal and comment parsingThe actual lexeme is in a global variable called
yytext
which is a null terminated c-stringWhen a lexeme is semantically relevant, you are expected to provide it to the parser by copying it into
cool_yylval
(which is a union, relevant members beingerror_msg
andsymbol
)They expect you to not just copy the string, but use 2 different global variables
inttable
,stringtable
which implement a customStringTable
class. These are definied instringtable.cc
and the only relevant method ischar* add_string(char *)
.
Testing
Right up front, I will tell you test.cl
has some minor syntax errors in it. Your parser should be able to handle them. You don't have to do anything thats particularly special for it to do so.
The output of running ./lexer
, with no rules defined, is just the entire output file. This makes it overwhelming to try to parse their supplied test.cl
. Start with an extremely simple file containing just =>
, on a line by itself. This is the only rule they provide.
The output should be:
#name "your_file.cl" #1 DARROW
The anatomy is the #name
preamble, which is just the file, followed by #{line number} {lexeme class} {lexeme if applicable}
. The operators and keywords and things are not expected to have a lexeme -- only things like INT_CONST
which will have its associated value (like 42
or 1
or whatever the number is).
Start adding individual things you know are valid Cool syntax to the file. Write rules for them. The output should be only the #{line number} {lexeme class} {lexeme if applicable}
. If its printing characters that aren't parsed, you've missed something in your ruleset.
Eventually switch to test.cl
when you feel you've got a handle. Do a pass with ./lexer test.cl | less
, scroll through the input and look for things that aren't parsed. Fix them.
Once everything is parsed, you can find a REFERENCE LEXER that I didn't see mentioned anywhere in usr/class/cs143/bin/reflexer
. Use it to parse the same file and output it to a test file like so `usr/class/cs143/bin/reflexer test.cl
ref_test.out
. Now do the same thing with your own lexer like so
lexer test.cl > own_test.out. Now diff them
diff own_test.out ref_test.out`. Fix, rinse, repeat.
Overall, I learned something and was pleased when finished. Could have gone smoother.
Resources
- Stanford Course - self paced
- My Original Course Post
- My lexer (don't cheat though)