Hello everyone,
In these blog-like posts I'll write about the current state of progress and what I'm working on; hopefully they will receive regular updates as I make progress. Feel free to post in this thread with comments, suggestions, insights, questions, etc.
Now that I have finished parsing expressions, I am working on parsing statements.
The issue I am tackling at the moment is dealing with whitespace between tokens and safely traversing the token array.
As ECI is written in C, there are no high-level language features like objects that can hold dynamic state, or safeguards that stop the program from crashing when it reads beyond the end of an array. So I need to come up with a method to access tokens safely and conveniently, and to skip whitespace when needed.
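One common way to get both safety and convenience in C is to bundle the array, its length and the current position into a small cursor struct, and route every token access through a bounds-checked helper. A minimal sketch, assuming a simplified token type; `struct TokenCursor`, `cursor_at` and `cursor_skip_whitespace` are hypothetical names, not part of ECI:

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for ECI's token type (hypothetical). */
enum TokenType { TOK_WORD, TOK_VARIABLE, TOK_OPERATOR, TOK_WHITESPACE, TOK_EOF };

struct Token {
    enum TokenType type;
    bool newline; /* for TOK_WHITESPACE: does it contain a line break? */
};

/* Hypothetical cursor: the token array, its length and the current position.
   Invariant: pos <= count. */
struct TokenCursor {
    struct Token *tokens;
    size_t count;
    size_t pos;
};

/* Return the token `offset` places after the current one,
   or NULL if that would fall outside the array. */
struct Token *cursor_at(const struct TokenCursor *cur, size_t offset) {
    if (cur->pos >= cur->count || offset >= cur->count - cur->pos) return NULL;
    return &cur->tokens[cur->pos + offset];
}

/* Advance past whitespace tokens. If stop_at_newline is true, stop on a
   whitespace token containing a newline (newlines end statements).
   Returns the token now under the cursor, or NULL at the end of the array. */
struct Token *cursor_skip_whitespace(struct TokenCursor *cur, bool stop_at_newline) {
    struct Token *tok;
    while ((tok = cursor_at(cur, 0)) != NULL && tok->type == TOK_WHITESPACE) {
        if (stop_at_newline && tok->newline) break;
        ++cur->pos;
    }
    return tok;
}
```

The parser then only ever has to check for NULL instead of trusting that an index is in range, and skipping the whitespace between modifier keywords like `global static` becomes a single call.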
I have thought about writing a peek function that automatically raises an error when asked to read past the end of the array, thanks to the somewhat smart error system in the parser, which is implemented via setjmp/longjmp (basically a beefed-up version of "goto" that can jump across functions).
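A minimal sketch of such a peek, assuming the error handler is a single `jmp_buf` that the parser's entry point arms with `setjmp`. The names `parse_error_jmp`, `parse_error_msg`, `raise_error`, `peek` and `type_two_ahead` are hypothetical, and ECI's real error system very likely carries more information than a message string:

```c
#include <setjmp.h>
#include <stddef.h>

/* Simplified stand-in for ECI's token type (hypothetical). */
enum TokenType { TOK_WORD, TOK_WHITESPACE, TOK_EOF };
struct Token { enum TokenType type; };

/* Hypothetical error context: the entry point calls setjmp() on this
   buffer, and any helper can longjmp() back to it. */
static jmp_buf parse_error_jmp;
static const char *parse_error_msg = NULL;

static void raise_error(const char *msg) {
    parse_error_msg = msg;
    longjmp(parse_error_jmp, 1); /* does not return */
}

/* Return tokens[pos + offset], or jump to the error handler
   instead of reading outside the array. */
static struct Token *peek(struct Token *tokens, size_t count, size_t pos, size_t offset) {
    if (pos >= count || offset >= count - pos)
        raise_error("unexpected end of input");
    return &tokens[pos + offset];
}

/* Example entry point: returns the type of the token two places ahead,
   or -1 if peeking ran off the end of the array. */
static int type_two_ahead(struct Token *tokens, size_t count, size_t pos) {
    if (setjmp(parse_error_jmp) != 0)
        return -1; /* an error was raised somewhere below */
    return (int)peek(tokens, count, pos, 2)->type;
}
```

One thing to keep in mind with this design: `longjmp` unwinds straight back to the `setjmp` point, so anything allocated in between (such as `statement.declaration` in the statement parser) leaks unless it is tracked somewhere the handler can free it.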
Here is the current code I have for parsing statements:
```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct Statement statement_get(struct Token *token, struct Token **next) {
    struct Statement statement;
    struct Token *next_token = NULL;
    bool function = false, declaration = false;
    if (token->type == TOK_WORD && kwd_is_declarator(token->keyword)) {
        function = token->keyword == KWD_FUNC;
        declaration = true;
    }
    if (declaration) {
        statement.type = SMT_DECLARATION;
        statement.declaration = malloc(sizeof *statement.declaration);
        if (statement.declaration == NULL) raise_mem("parsing declaration statement");
        statement.declaration->is_function = function;
        if (function) {
            // ...
        } else {
            // Variable declaration: start from the defaults
            statement.declaration->scope = SCO_AUTO;
            statement.declaration->is_static = false;
            statement.declaration->is_constant = false;
            statement.declaration->name = NULL;
            statement.declaration->initializer = NULL;
            // Metadata: consume the declarator keywords in front of the name
            do {
                if (!token->info) /* Not a keyword */ break;
                enum Keyword kwd = *(enum Keyword *)(token->info);
                if (!kwd_is_declarator(kwd)) break;
                switch (kwd) {
                    case KWD_GLOBAL:
                        statement.declaration->scope = SCO_GLOBAL;
                        break;
                    case KWD_LOCAL:
                        statement.declaration->scope = SCO_LOCAL;
                        break;
                    case KWD_STATIC:
                        statement.declaration->is_static = true;
                        break;
                    case KWD_CONST:
                        statement.declaration->is_constant = true;
                        break;
                    default:
                        break; // any other declarator keyword: nothing to record
                }
            } while (TOK_WORD == (++token)->type);
            // Name
            if (token->type != TOK_VARIABLE) raise_unexpected_token("a variable", token);
            statement.declaration->name = malloc(token->data_len + 1);
            if (!statement.declaration->name) raise_mem("storing variable name");
            strncpy(statement.declaration->name, token->data, token->data_len);
            // strncpy() does not terminate the copy when it stops at data_len
            statement.declaration->name[token->data_len] = '\0';
            // Initializer
            if (token[1].type != TOK_OPERATOR) goto next;
            if (token[1].op_info.sym != OPR_EQU) /* ... */;
            // ... parse expression and store it as initializer
        }
    } else {
        statement.type = SMT_EXPRESSION;
        statement.expression = malloc(sizeof *statement.expression);
        if (!statement.expression) raise_mem("parsing expression statement");
        // An expression statement runs until a newline or the end of input
        size_t token_count = 0;
        while (true) {
            if ((token[token_count].type == TOK_WHITESPACE && token[token_count].newline)
                || token[token_count].type == TOK_EOF) break;
            ++token_count;
        }
        *statement.expression = expression_get(token, token_count);
        next_token = token + token_count + 1;
    }
    // Set the next token
next:
    *next = next_token ? next_token : token + 1;
    return statement;
}
```
As the code shows, the tokens array is accessed directly, which is a safety hazard. I have implemented a basic safeguard in the form of dummy padding tokens at both the start and end of the array, but it only protects against reading one step beyond the valid range, so another solution is needed.
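For reference, the padding approach can be sketched as below. This assumes the dummy tokens are `TOK_EOF` sentinels (what the real padding tokens contain isn't stated), and `pad_tokens` is a hypothetical helper:

```c
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for ECI's token type (hypothetical). */
enum TokenType { TOK_WORD, TOK_WHITESPACE, TOK_EOF };
struct Token { enum TokenType type; };

/* Copy `count` tokens into a new array with one sentinel on each side.
   Returns a pointer to the first real token, so tokens[-1] and
   tokens[count] are both valid sentinels; free with free(tokens - 1).
   Returns NULL on allocation failure. */
struct Token *pad_tokens(const struct Token *src, size_t count) {
    struct Token *buf = malloc((count + 2) * sizeof *buf);
    if (buf == NULL) return NULL;
    buf[0].type = TOK_EOF;
    if (count > 0) memcpy(buf + 1, src, count * sizeof *buf);
    buf[count + 1].type = TOK_EOF;
    return buf + 1;
}
```

This makes the limitation concrete: `tokens[-1]` and `tokens[count]` are safe, but `tokens[-2]` or `tokens[count + 1]` are still out of bounds (undefined behavior), which is why a checked accessor is the more robust option.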
I'll post updates here on what strategy I am going to use. Thanks for reading my technical rambling 🙂