Why I love Rust for tokenising and parsing

I am currently writing an analysis tool for SQL, sqleibniz, specifically for the
SQLite dialect.

The goal is to perform static analysis on SQL input, including syntax checks
and checks that the referenced tables, columns and functions exist. Combining
this with an embedded SQLite runtime and the ability to assert conditions in
that runtime creates a really great dev experience for SQL.

Furthermore, I want to be able to show the user high-quality error messages
with context, explanations and the ability to mute certain diagnostics.

This analysis includes the stages of lexical analysis/tokenisation, the
parsing of SQL according to the SQLite documentation and the analysis of the
resulting constructs.
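
To make the first of these stages concrete, here is a minimal sketch of a tokeniser in Rust. It is not the sqleibniz lexer: the `TokenKind` enum, the `tokenize` function and the three-keyword list are hypothetical and only illustrate the idea of turning raw input into a flat list of tokens.

```rust
/// Hypothetical token kinds; a real SQL token set is much larger.
#[derive(Debug, PartialEq, Clone)]
pub enum TokenKind {
    Keyword(String),
    Ident(String),
    Number(String),
    Semicolon,
    Unknown(char),
}

/// Lexical analysis: walks the input once and emits one token per lexeme.
pub fn tokenize(input: &str) -> Vec<TokenKind> {
    let mut tokens = Vec::new();
    let mut chars = input.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            // Whitespace only separates tokens and is skipped.
            chars.next();
        } else if c == ';' {
            chars.next();
            tokens.push(TokenKind::Semicolon);
        } else if c.is_ascii_digit() {
            // Consume a run of digits as one number token.
            let mut num = String::new();
            while let Some(&d) = chars.peek() {
                if !d.is_ascii_digit() {
                    break;
                }
                num.push(d);
                chars.next();
            }
            tokens.push(TokenKind::Number(num));
        } else if c.is_ascii_alphabetic() || c == '_' {
            // Consume a word, then decide whether it is a keyword.
            let mut word = String::new();
            while let Some(&a) = chars.peek() {
                if !(a.is_ascii_alphanumeric() || a == '_') {
                    break;
                }
                word.push(a);
                chars.next();
            }
            let upper = word.to_ascii_uppercase();
            // Assumption: a tiny, illustrative keyword list.
            let is_keyword = matches!(upper.as_str(), "SELECT" | "EXPLAIN" | "VACUUM");
            if is_keyword {
                tokens.push(TokenKind::Keyword(upper));
            } else {
                tokens.push(TokenKind::Ident(word));
            }
        } else {
            // Anything else is kept so a later stage can report it.
            chars.next();
            tokens.push(TokenKind::Unknown(c));
        }
    }
    tokens
}
```

Calling `tokenize("select 42;")` yields a keyword, a number and a semicolon token. Keeping unknown characters as tokens, rather than failing immediately, is one way to let a later stage attach a diagnostic to them.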

After completing the static analysis part of the project, I plan on writing an
LSP server for SQL, so stay tuned for that.

In the process, I need to write both a tokenizer and a parser for SQL. While I
am nowhere near completing sqleibniz, I have already made some discoveries
about Rust and the handy features the language provides for developing this
kind of software.

Macros

Macros work differently from language to language. However, they are mostly
used for the same reasons: code deduplication and less repetition.
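
As a minimal sketch of that deduplication idea in Rust, a declarative macro (`macro_rules!`) can stamp out several near-identical functions from a short description. The macro name `keyword_checks` and the generated functions are hypothetical and not taken from sqleibniz.

```rust
/// Hypothetical helper: generates one function per `name => "KEYWORD"` entry
/// instead of writing each function out by hand.
macro_rules! keyword_checks {
    ($($name:ident => $kw:literal),* $(,)?) => {
        $(
            /// Returns true if `input` equals the keyword, ignoring ASCII case.
            pub fn $name(input: &str) -> bool {
                input.eq_ignore_ascii_case($kw)
            }
        )*
    };
}

// Expands to three functions: is_select, is_explain and is_vacuum.
keyword_checks! {
    is_select => "SELECT",
    is_explain => "EXPLAIN",
    is_vacuum => "VACUUM",
}
```

Each generated function can then be called like a hand-written one, for example `is_select("select")`. Adding another keyword check is a one-line change to the macro invocation.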

Abstract Syntax Tree Nodes

A node for a statement in the sqleibniz implementation is defined as follows:

    #[derive(Debug)]
    /// holds all literal types, such as strings, numbers, etc.
    pub struct Literal {
        pub t: Token,
    }

Furthermore, all nodes are required to implement the Node trait. This trait
is returned by all parser functions and is later used to analyse the contents
of a statement:

    pub trait Node: std::fmt::Debug {
        fn token(&self) -> &Token;
    }

Code duplication

Thus every node not only has to be defined, but also needs an implementation
of the Node trait. This requires a lot of code duplication, and Rust has a
solution for that.

I want a macro that is able to:

- define a structure with a given identifier and a doc comment
- add arbitrary fields to the structure
- satisfy the Node trait by implementing fn token(&self) -> &Token

Let's take a look at the full code I need the macro to produce for the
Literal and the Explain nodes. While the first one has no fields other than
the Token field t, the second node requires an additional child field with
its own type.

    #[derive(Debug)]
    /// holds all literal types, such as strings, numbers, etc.
    pub struct Literal {
        /// predefined for all structures defined with the node! macro
        pub t: Token,
    }
    impl Node for Literal {
        fn token(&self) -> &Token {
            &self.t
        }
    }

    #[derive(Debug)]
    /// Explain stmt, see: https://www.sqlite.org/lang_explain.html
    pub struct Explain {
        /// predefined for all structures defined with the node! macro
        pub t: Token,
        pub child: Option<…>
        …
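
The three requirements listed earlier can be met with a declarative macro. What follows is a sketch under stated assumptions, not the sqleibniz implementation: the macro body, the stand-in `Token` struct with its `text` field, and the `Option<Box<dyn Node>>` type of the `child` field are all hypothetical. Only the `Node` trait and the struct names come from the text above.

```rust
/// Stand-in for the real token type (assumption: its fields are not shown above).
#[derive(Debug)]
pub struct Token {
    pub text: String,
}

pub trait Node: std::fmt::Debug {
    fn token(&self) -> &Token;
}

/// Hypothetical `node!` macro: defines a struct with a doc comment, the
/// predefined `t: Token` field plus any extra fields, and implements `Node`.
macro_rules! node {
    ($node_name:ident, $documentation:literal $(, $field_name:ident: $field_type:ty)*) => {
        #[derive(Debug)]
        #[doc = $documentation]
        pub struct $node_name {
            /// predefined for all structures defined with the node! macro
            pub t: Token,
            $(pub $field_name: $field_type,)*
        }

        impl Node for $node_name {
            fn token(&self) -> &Token {
                &self.t
            }
        }
    };
}

// No extra fields: only the predefined `t` field is generated.
node!(Literal, "holds all literal types, such as strings, numbers, etc.");

node!(
    Explain,
    "Explain stmt, see: https://www.sqlite.org/lang_explain.html",
    // Assumption: the child is stored as a boxed trait object.
    child: Option<Box<dyn Node>>
);
```

With these definitions, a `Literal` can be boxed and stored as the `child` of an `Explain`, and both answer `token()` through the same trait. Boxing the child as `dyn Node` is one way to let a node hold any other node type without the macro having to know about it.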
