The objectives for this project are to

  • Create finite state machines to recognize tokens from regular expressions
  • Implement the state machines with a C++ program
  • Produce a correct stream of tokens given arbitrary input

Notation and Definitions

A deterministic finite state machine is a quintuple $(S,\ I,\ f,\ s_0,\ F)$, where:

  • $S$ is a finite, non-empty set of states
  • $I$ is the input alphabet (a finite, non-empty set of symbols)
  • $f$ is the state-transition function: $f\ :\ S \times I \rightarrow S$
  • $s_0$ is an initial state, an element of $S$
  • $F$ is the set of final states, a (possibly empty) subset of $S$.

A finite state transducer is a 6-tuple $(S,\ I,\ O,\ f,\ g,\ s_0)$ as above except:

  • $O$ is the output alphabet (a finite, non-empty set of symbols)
  • $f$ is the state-transition function: $f\ :\ S \times I \rightarrow S$
  • $g$ is the output function: $g\ :\ S \times I \rightarrow O$


  1. What is the line of code in Lex.cpp that establishes <math>s_0</math>?
  2. Where are the elements of <math>S</math> enumerated? Give the first three as they appear in the code.
  3. What are the elements in the set <math>I</math>?
  4. Describe a way to refactor the code so that it would work for any FSM, not just the one that has the structure required for the Datalog lexer.
  5. The code implements the state-transition function <math>f\ :\ S \times I \rightarrow S</math> in two different ways depending on whether a non-empty token in <math>O</math> is to be emitted according to <math>g\ :\ S \times I \rightarrow O</math>. Match the state-transition function to the code below by circling, labeling, and explaining how the code components represent the state, S, obtain the input I, and represents the output O for the Comma state and the input I and output O for the SawColon state.
  6. Why is there no token emitted for the SawColon case in the switch statement?
 State Lex::nextState() {
    State result;
    char character;
    switch(state) {
        case Start:               result = getNextState(); break;
        case Comma:               emit(COMMA); result = getNextState(); break;
        case Period:              emit(PERIOD); result = getNextState(); break;
        case SawColon:
            character = input->getCurrentCharacter();
            if(character == '-') {
                result = Colon_Dash;
            } else { //Every other character
                throw "ERROR:: in case SawColon:, Expecting  '-' but found " + character + '.';
      case Colon_Dash:          emit(COLON_DASH); result = getNextState(); break;
cs-236/project-1-worksheet.txt · Last modified: 2015/01/05 14:06 by egm
Back to top
CC Attribution-Share Alike 4.0 International = chi`s home Valid CSS Driven by DokuWiki do yourself a favour and use a real browser - get firefox!! Recent changes RSS feed Valid XHTML 1.0