Attila's Blog

About writing your own programming language in C, Go and Swift

Parsing

Jun 25, 2018 Comments

Categories: aspl
Tags: parsing

Parsing is the part when the rules of the grammar are applied to the tokens. It is similar to what we, humans do when we learn the rules of our mother tongue in elementary school. The parser takes the token stream and turns it into a tree that reflects the rules of the grammar. This is the Abstract Syntax Tree (AST).

Just as with lexing, reporting errors is also part of the job. When the parser notices a token violating the rules of the grammar it reports it as a syntax error. It is the same as when the children in class fail to use proper grammar, and the teacher dutifully points out that they made a mistake.

Remember the tiny source code we used in the previous article?

Here it is:

int result = (12 + 56) / 3 * 2;

And the AST tree produced by the parser:

VariableDeclarationNode {
  Name {
    VariableExpressionNode {
      result
    }
  }
  Operator =
  BinaryInfixOperatorExpressionNode {
    BinaryInfixOperatorExpressionNode {
      BinaryInfixOperatorExpressionNode {
        IntegerExpressionNode {
          12
        }
        Operator +
        IntegerExpressionNode {
          56
        }
      }
      Operator /
      IntegerExpressionNode {
        3
      }
    }
    Operator *
    IntegerExpressionNode {
      2
    }
  }
}

As you can see it is indeed a tree. Let’s break it down. The root node is a variable declaration node with 3 children.

VariableDeclarationNode {
  Name {
    ...
  }
  Operator =
  BinaryInfixOperatorExpressionNode {
    ...
  }
}

The first child is the name node which is a variable expression node and its name is “result”.

VariableExpressionNode {
  result
}

The next child is the is the assignment operator.

Operator =

And the last child is the value of the assignment which is an infix binary expression. That means the expression has 2 operands and the operator is between them.

BinaryInfixOperatorExpressionNode {
  BinaryInfixOperatorExpressionNode {
    BinaryInfixOperatorExpressionNode {
      IntegerExpressionNode {
        12
      }
      Operator +
      IntegerExpressionNode {
        56
      }
    }
    Operator /
    IntegerExpressionNode {
      3
    }
  }
  Operator *
  IntegerExpressionNode {
    2
  }
}

This expression has 2 other binary expressions embedded in it. Of course this is just one way to visualize the AST tree, I chose it because it’s relatively easy to understand. The important thing is that each and every node type has its own data structure and after parsing there will be instances of these structures in the memory representing the AST tree.

They can be easily traversed starting with the root node of the tree, and this is exactly how they are used. Either the tree-walk interpreter, the static analyzer, or the bytecode compiler will traverse all nodes and does what it needs to in order to produce its output.

Next time I will briefly talk about static analysis.

Stay tuned. I’ll be back.

Tags: parsing

← Lexical Analysis Static Analysis →

comments powered by Disqus