It is time to start describing the grammar for ASPL. We start with basic arithmetic expressions that support adding, subtracting, multiplying, and dividing positive numbers only. Later we will add negative numbers and grouping as well. This will be the base of our parser.

First, let’s examine what our expressions are made of. We want to start really small, so this will be a tiny subset of the full grammar.

For now, we will only have:

. These are integer and floating point numbers**Literals**. These consist of literals and arithmetic operators for addition**Binary infix expressions**`+`

, subtraction`-`

, multiplication`*`

and division`/`

Based on these elements we can describe the grammar as follows:

```
expression := literal | infix_binary ;
literal := integer | floating_point ;
integer := digit+ ;
floating_point := digit+ ( "." digit+ )? ;
digit := "0" ... "9" ;
infix_binary := expression operator expression ;
operator := "+" | "-" | "*" | "/" ;
```

All right, we have the grammar, but what does it mean? Let’s see.

As I mentioned previously, every rule has a head and a body. The head is the name of the rule, and the body describes what the rule is supposed to do.

In our notation the head and body is separated by `:=`

. Nonterminals are all lowercase strings. Terminals are strings enclosed in quotes. Rules are terminated with `;`

at the end. We allow the selection between a series of productions with the `|`

symbol. `...`

is used to denote a range with both limits included.

We also borrow some stuff from regular expressions, so we have a familiar syntax. We use `(`

and `)`

for grouping. `*`

means the previous symbol or group may repeat zero or more times. `+`

is similar, it requires the preceding production to appear one or more times. And finally `?`

requires the preceding production to appear zero or one time.

Now we can fully understand our grammar. Let’s build an expression from scratch.

`expression := literal | infix_binary ;`

It means that an expression is either a `literal`

, or an `infix_binary`

, both of them are referring to rules.

Let’s pick `literal`

:

`literal := integer | floating_point ;`

We see that a literal is either an integer or a floating point number.

We select `floating_point`

this time:

`floating_point := digit+ ( "." digit+ )? ;`

Here comes an interesting bit. A floating point number is one or more digits (`digit+`

), optionally followed by a `.`

and at least one or more digits (`( "." digit+ )?`

).

If we choose `integer`

instead, we get this rule:

`integer := digit+ ;`

It’s much simpler than the `floating_point`

rule, it simply states that an integer is a sequence of one or more digits.

Now we need to know what a digit is:

`digit := "0" ... "9" ;`

A digit is a character in the range of `0 ... 9`

.

So far so good. Here comes the most important rule in our tiny grammar:

`infix_binary := expression operator expression ;`

Here is the point where the grammar gets recursive. An `infix_binary`

rule is combination of an `expression`

, an `operator`

, and another `expression`

, `expression`

being the first rule we started our grammar with. This means it can contain anything we described so far.

The only unknown rule left is `operator`

.

`operator := "+" | "-" | "*" | "/" ;`

It says an operator is one of the following symbols: `+`

, `-`

, `*`

, `/`

.

If you have sharp eyes you may notice now that we have a problem. This grammar is actually *ambiguous*. It means, it is possible to choose different productions that will lead to the same string at the end.

I give you an example.

Try creating this expression with our current grammar: `2 * 3 + 4`

. How many different ways can you produce it? Let’s see.

One way to do it:

- Start with the
`expression`

rule, and pick`infix_binary`

. - For the left-hand side of the
`infix_binary`

, pick`infix_binary`

again.- In the inner
`infix_binary`

, pick`integer`

(`2`

) for the left-hand side. - Pick
`*`

for the`operator`

. - Pick
`integer`

(`3`

) for the right-hand side.

- In the inner
- For the operator of the outer
`infix_binary`

, pick`+`

. - For the right-hand side of the outer
`infix_binary`

, pick`integer`

(`4`

).

Another way:

- Start with the
`expression`

rule, and pick`infix_binary`

. - Pick
`integer`

(`2`

) for the left-hand side. - Pick
`*`

for the`operator`

. - Pick
`infix_binary`

for the right-hand side.- In the inner
`infix_binary`

, pick`integer`

(`3`

) for the left-hand side. - Pick
`+`

for the`operator`

. - Pick
`integer`

(`4`

) for the right-hand side.

- In the inner

What did we get as a result of these 2 expressions?

`(2 * 3) + 4`

in the first expression and `2 * (3 + 4)`

in the second. It’s clear that these two are different, right?

Let’s execute them:

`(2 * 3) + 4 = 10`

versus `2 * (3 + 4) = 14`

, so we can say that

`(2 * 3) + 4`

≠ `2 * (3 + 4)`

This is a serious issue because it means that the parser may misunderstand the source code. We have to fix this.

This is exactly what we are going to do next time.

Stay tuned. I’ll be back.