How to write a lexer in C - part 1
A lexer translates text into tokens that you can use in your application. With these tokens you could build your own programming language or a JSON parser, for example. Since I'm a fan of OOP, you'll see that I applied an OO approach. The code quality is good according to ChatGPT, and according to valgrind there are no memory leaks. In this part we're going to create:
- token struct
- token tests
- Makefile
Requirements:
- gcc
- make
Writing token.h
Create a new file called “token.h”.
Implement header protection:
#ifndef TOKEN_H_INCLUDED
#define TOKEN_H_INCLUDED
// Where the code goes
#endif
This prevents the file from being included twice.
Import the headers we need:
#include <string.h>
#include <stdlib.h>
Define config:
#define TOKEN_LEXEME_SIZE 256
This means a lexeme is limited to 255 characters plus the terminating null byte. Huge strings are not possible for now; dynamic memory allocation is too much to include in this tutorial.
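To make the size limit concrete, here is a minimal sketch of a bounded copy into a buffer of TOKEN_LEXEME_SIZE bytes. The helper name lexeme_copy is hypothetical and not part of the tutorial's token.h; it only illustrates that 256 bytes hold at most 255 characters, because the last byte is reserved for the terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOKEN_LEXEME_SIZE 256

/* Hypothetical helper (an assumption, not part of token.h): copies src into
 * a TOKEN_LEXEME_SIZE buffer and truncates input that is too long.
 * Returns the number of characters stored, excluding the '\0'. */
size_t lexeme_copy(char dst[TOKEN_LEXEME_SIZE], const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t len = strlen(src);
    if (len > TOKEN_LEXEME_SIZE - 1)
        len = TOKEN_LEXEME_SIZE - 1; /* leave room for the terminator */
    memcpy(dst, src, len);
    dst[len] = '\0';
    return len;
}
```

Comparing the return value with strlen(src) tells the caller whether the lexeme was truncated.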
Define the token struct:
typedef struct Token {
    char lexeme[TOKEN_LEXEME_SIZE];
    int line;
    int col;
    struct Token * next;
    struct Token * prev;
} Token;
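The next and prev pointers make Token a node in a doubly linked list. As a rough sketch of what that allows, the two helpers below walk the list in both directions. The names token_first and token_count are hypothetical and not part of this tutorial's API; the struct is repeated only so the snippet compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

#define TOKEN_LEXEME_SIZE 256

typedef struct Token {
    char lexeme[TOKEN_LEXEME_SIZE];
    int line;
    int col;
    struct Token *next;
    struct Token *prev;
} Token;

/* Hypothetical helper: follow prev pointers back to the head of the list. */
Token *token_first(Token *token)
{
    assert(token != NULL);
    while (token->prev != NULL)
        token = token->prev;
    return token;
}

/* Hypothetical helper: count every token in the list that contains token. */
size_t token_count(Token *token)
{
    size_t n = 0;
    for (Token *t = token_first(token); t != NULL; t = t->next)
        n++;
    return n;
}
```

Because every node knows its predecessor, any token can serve as a handle to the whole list.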
Implement the new function. This allocates a Token and sets default values. It returns NULL if the allocation fails.
Token * token_new(void){
    Token * token = malloc(sizeof(Token));
    if(token == NULL)
        return NULL; // Allocation failed
    memset(token->lexeme, 0, TOKEN_LEXEME_SIZE);
    token->line = 0;
    token->col = 0;
    token->next = NULL;
    token->prev = NULL;
    return token;
}
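Implement the init function. This creates a Token with the given parameters as values. A lexeme longer than 255 characters is truncated.
Token * token_init(Token * prev, const char * lexeme, int line, int col){
    Token * token = token_new();
    if(token == NULL)
        return NULL;
    token->line = line;
    token->col = col;
    if(prev != NULL){
        token->prev = prev;
        prev->next = token;
    }
    // Copy at most TOKEN_LEXEME_SIZE - 1 chars. token_new zeroed the buffer,
    // so the lexeme always stays null-terminated.
    strncpy(token->lexeme, lexeme, TOKEN_LEXEME_SIZE - 1);
    return token;
}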
Implement the free function. This is our destructor.
It will:
- find the first token, starting from the given token
- call itself for the following token(s)
void token_free(Token * token){
    if(token == NULL)
        return;
    // Find first token
    while(token->prev != NULL)
        token = token->prev;
    Token * next = token->next;
    if(next){
        // Detach next so the recursive call treats it as the first token
        next->prev = NULL;
        token_free(next);
    }
    free(token);
}
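The destructor above recurses once per token, so a very long token list means a deep call stack. As an alternative sketch (not the version used in this tutorial), the same cleanup can be written as a loop. The name token_free_all and the returned count are assumptions made for illustration; the struct is repeated so the snippet compiles on its own.

```c
#include <stddef.h>
#include <stdlib.h>

#define TOKEN_LEXEME_SIZE 256

typedef struct Token {
    char lexeme[TOKEN_LEXEME_SIZE];
    int line;
    int col;
    struct Token *next;
    struct Token *prev;
} Token;

/* Hypothetical iterative variant of token_free: rewinds to the first token,
 * then frees the list front to back. Returns how many tokens were freed. */
size_t token_free_all(Token *token)
{
    size_t freed = 0;
    if (token == NULL)
        return 0;
    while (token->prev != NULL)
        token = token->prev;
    while (token != NULL) {
        Token *next = token->next; /* save the link before freeing */
        free(token);
        token = next;
        freed++;
    }
    return freed;
}
```

The loop uses constant stack space regardless of how many tokens the list holds.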
Testing
Now it’s time to build some tests using assert.
Create a new file called “token_test.c”.
Add this:
#include <assert.h>
#include <stdio.h>
#include "token.h"

int main(void)
{
    // Check default values
    Token *token = token_new();
    assert(token != NULL);
    assert(token->next == NULL);
    assert(token->prev == NULL);
    assert(token->line == 0);
    assert(token->col == 0);
    assert(strlen(token->lexeme) == 0);
    // Test init function
    Token *token2 = token_init(token, "print", 1, 3);
    assert(token2 != NULL);
    assert(token->next == token2);
    assert(token2->prev == token);
    assert(token2->line == 1);
    assert(token2->col == 3);
    assert(!strcmp(token2->lexeme, "print"));
    // Frees the whole list, including the first token
    token_free(token2);
    printf("Tests successful\n");
    return 0;
}
Now we have a working application. Let’s make compilation easy using a Makefile.
Makefile
Create a file named “Makefile”. The two command lines must be indented with a tab, not spaces.
all: tests
tests:
	gcc token_test.c -o token_test
	./token_test
Run make
That’s it!
So, we created the Token, which is required for the lexer in the next part of this tutorial.
If something is not working or you need help, send a message.