These toolchains are made for experts building industrial-strength compilers. Even if you think you've got a syntactic design concept that's such hot shit you can't wait to get it bootstrapped, even if, hell, it proves Rice's theorem wrong, please write a simple interpreter for it first to prove your syntax works. In fact, I think one way to test your language's design is to have it mooch off an established VM like the JVM, CPython's VM or the CLR.

But if you wanna 'learn' compiler design, I beg you to roll your own backend. You don't need SSA or any of that shit. You don't need to super-optimize the output on the first try. Just write a tree-rewriting optimizer and that's that.
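For a sense of how small a tree-rewriting optimizer can be, here's a minimal OCaml sketch. The four-constructor `expr` type and the rewrite rules are hypothetical, made up for illustration (not Cephyr's actual AST): it simplifies children first, then applies constant folding and a few algebraic identities at each node.

```ocaml
(* Hypothetical toy AST, for illustration only. *)
type expr =
  | Int of int
  | Var of string
  | Add of expr * expr
  | Mul of expr * expr

(* Bottom-up pass: simplify the children, then try local rules on the node. *)
let rec simplify (e : expr) : expr =
  match e with
  | Int _ | Var _ -> e
  | Add (a, b) -> rewrite (Add (simplify a, simplify b))
  | Mul (a, b) -> rewrite (Mul (simplify a, simplify b))

(* Local rewrite rules; assumes the children are already simplified. *)
and rewrite (e : expr) : expr =
  match e with
  | Add (Int x, Int y) -> Int (x + y)          (* constant folding *)
  | Mul (Int x, Int y) -> Int (x * y)
  | Add (Int 0, x) | Add (x, Int 0) -> x       (* x + 0 = x *)
  | Mul (Int 1, x) | Mul (x, Int 1) -> x       (* x * 1 = x *)
  | Mul (Int 0, _) | Mul (_, Int 0) -> Int 0   (* only safe because this toy language has no side effects *)
  | _ -> e
```

For example, `simplify (Mul (Add (Var "x", Int 0), Int 1))` gives back `Var "x"`. Adding a rule is just adding a match arm, which is the whole appeal.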

Same goes for lexer/parser (LP) generators. From Yacc to ANTLR, they just make the experience harder and less rewarding. I know hand-rolling a lexer and parser is hard in a language like C, in which case, don't fucking use C lol. There's honestly no inherent value in using C for compiler design in 2024.
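Hand-rolling both halves really isn't much code in a language with pattern matching. Below is a minimal OCaml sketch of a hand-written lexer plus a recursive-descent parser for a hypothetical toy arithmetic grammar (integers, `+`, `*`, parentheses); the token and AST types are assumptions for illustration. Each grammar rule becomes one function returning the parsed tree and the leftover tokens.

```ocaml
(* Hypothetical token and AST types for a toy arithmetic language. *)
type token = INT of int | PLUS | STAR | LPAREN | RPAREN

type ast = Num of int | Plus of ast * ast | Times of ast * ast

(* Lexer: turn a string into a token list, skipping whitespace. *)
let lex (s : string) : token list =
  let n = String.length s in
  let rec go i acc =
    if i >= n then List.rev acc
    else
      match s.[i] with
      | ' ' | '\t' | '\n' -> go (i + 1) acc
      | '+' -> go (i + 1) (PLUS :: acc)
      | '*' -> go (i + 1) (STAR :: acc)
      | '(' -> go (i + 1) (LPAREN :: acc)
      | ')' -> go (i + 1) (RPAREN :: acc)
      | '0' .. '9' ->
          (* consume a run of digits *)
          let j = ref i in
          while !j < n && s.[!j] >= '0' && s.[!j] <= '9' do incr j done;
          go !j (INT (int_of_string (String.sub s i (!j - i))) :: acc)
      | c -> failwith (Printf.sprintf "unexpected character %C" c)
  in
  go 0 []

(* Recursive descent, one function per rule:
     expr   ::= term ('+' term)*
     term   ::= factor ('*' factor)*
     factor ::= INT | '(' expr ')'
   The *_tail loops build left-associative trees. *)
let rec parse_expr toks =
  let lhs, rest = parse_term toks in
  expr_tail lhs rest

and expr_tail lhs = function
  | PLUS :: rest ->
      let rhs, rest = parse_term rest in
      expr_tail (Plus (lhs, rhs)) rest
  | rest -> (lhs, rest)

and parse_term toks =
  let lhs, rest = parse_factor toks in
  term_tail lhs rest

and term_tail lhs = function
  | STAR :: rest ->
      let rhs, rest = parse_factor rest in
      term_tail (Times (lhs, rhs)) rest
  | rest -> (lhs, rest)

and parse_factor = function
  | INT n :: rest -> (Num n, rest)
  | LPAREN :: rest -> (
      match parse_expr rest with
      | e, RPAREN :: rest -> (e, rest)
      | _ -> failwith "expected ')'")
  | _ -> failwith "expected a number or '('"

(* Entry point: the whole input must be consumed. *)
let parse (s : string) : ast =
  match parse_expr (lex s) with
  | e, [] -> e
  | _ -> failwith "trailing tokens"
```

Precedence falls out of the grammar structure: `parse "1 + 2 * (3 + 4)"` yields `Plus (Num 1, Times (Num 2, Plus (Num 3, Num 4)))`.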

But C is still useful as the source language your compiler compiles. It's a very simple, straightforward and, more importantly, standardized language. You don't need to write a runtime for it, because on both UNIX and Windows the runtime is the OS itself! Just make sure you add a syscall interface, and then C runtimes like glibc and the CRT can easily be strapped on.

I’m going to do exactly this. I named my C compiler ‘Cephyr’. I have started over several times now. I am using OCaml.

I know my point about LP generators is preaching to the choir, and most people despise them, but I just don't understand why people love to use off-the-shelf IRs/ILs when doing so teaches you jack shit.

I recommend starting your language design with the IR: your own IR.
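As a rough idea of what "your own IR" can look like at the start, here's a minimal OCaml sketch of a hypothetical three-address IR where every instruction writes a fresh temporary, plus a lowering pass from a toy expression tree. All the type and function names here are assumptions for illustration, not any real project's IR.

```ocaml
(* Hypothetical source AST. *)
type expr = Int of int | Var of string | Add of expr * expr | Mul of expr * expr

(* Hypothetical three-address IR: each instruction defines one new temporary. *)
type operand = Const of int | Temp of int | Name of string

type instr =
  | IAdd of int * operand * operand (* t_dst <- a + b *)
  | IMul of int * operand * operand (* t_dst <- a * b *)

(* Lower an expression to straight-line code plus the operand holding its value. *)
let lower (e : expr) : instr list * operand =
  let next = ref 0 in
  let code = ref [] in
  let emit i = code := i :: !code in
  let fresh () = let t = !next in incr next; t in
  let rec go = function
    | Int n -> Const n
    | Var x -> Name x
    | Add (a, b) ->
        (* evaluate left operand first, then right *)
        let a = go a in
        let b = go b in
        let t = fresh () in
        emit (IAdd (t, a, b));
        Temp t
    | Mul (a, b) ->
        let a = go a in
        let b = go b in
        let t = fresh () in
        emit (IMul (t, a, b));
        Temp t
  in
  let result = go e in
  (List.rev !code, result)
```

Lowering `x + 2 * y` gives `[IMul (0, Const 2, Name "y"); IAdd (1, Name "x", Temp 0)]` with the result in `Temp 1`. Once the tree is flattened like this, the backend only ever sees a handful of instruction shapes.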

I don't just wanna focus on Cephyr. There's other stuff I wanna do, like Nock, and a PostScript interpreter in Rust (because Ghostscript has made me hard-reset 4-5 times; Ghostscript is just insecure, leaky garbage).

Anyway, tell me what you think about my 'take' on this. Of course I'm not saying you're 'less knowledgeable' for using LLVM or MLIR; I'm just saying they don't teach you stuff.

Still, some people just use LLVM and MLIR as a final 'portable' interface after doing the optimization on graphs and trees. I think there should be some sort of 'retargetable assembly' language. Like something with a fixed number of registers which… oh wait, that's just a VM!
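To make the "fixed number of registers" idea concrete, here's a minimal OCaml sketch of such a VM. The four registers, the five-instruction set and `run` are all hypothetical, chosen only for illustration; a real retargetable assembly would also need memory, calls and I/O.

```ocaml
(* Hypothetical "retargetable assembly": exactly four registers. *)
type reg = R0 | R1 | R2 | R3

type instr =
  | Li of reg * int         (* load immediate: r <- n *)
  | AddR of reg * reg * reg (* d <- a + b *)
  | SubR of reg * reg * reg (* d <- a - b *)
  | Jnz of reg * int        (* if r <> 0, jump to absolute instruction index *)
  | Halt

let index = function R0 -> 0 | R1 -> 1 | R2 -> 2 | R3 -> 3

(* Interpret a program from index 0 and return the final register file. *)
let run (prog : instr array) : int array =
  let regs = Array.make 4 0 in
  let get r = regs.(index r) in
  let set r v = regs.(index r) <- v in
  let rec step pc =
    match prog.(pc) with
    | Li (r, n) -> set r n; step (pc + 1)
    | AddR (d, a, b) -> set d (get a + get b); step (pc + 1)
    | SubR (d, a, b) -> set d (get a - get b); step (pc + 1)
    | Jnz (r, target) -> if get r <> 0 then step target else step (pc + 1)
    | Halt -> ()
  in
  step 0;
  regs

(* Example: sum 1..5 into R0, counting R1 down from 5. *)
let sum_1_to_5 =
  [| Li (R0, 0); Li (R1, 5); Li (R2, 1);
     AddR (R0, R0, R1); SubR (R1, R1, R2); Jnz (R1, 3);
     Halt |]
```

`(run sum_1_to_5).(0)` is 15. Porting this "assembly" to a new machine means porting the interpreter (or a translator from these five instructions), not the compiler frontend.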

Also, you don't necessarily need to translate to a super-low-level language. Just target C. GNU C, to be exact. Cyclone does that. In fact, I'm planning to bootstrap my functional language, which I named 'Xeph' and based on Zephyr ASDL, into GNU C as a test. Or I might use the JVM. I dunno. JVM languages are big these days.
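One reason GNU C specifically is a comfy target for a functional language: its statement expressions, `({ ...; value; })`, let a `let ... in` stay an expression in the output. Here's a minimal OCaml sketch of that translation; the tiny `expr` type, `emit` and `emit_program` are hypothetical, every value is assumed to be a C `long`, and variable names are assumed to be unique valid C identifiers (in C, `long x = x + 1;` refers to the *new* `x`, so shadowing `let`s would need renaming first).

```ocaml
(* Hypothetical core of a small functional language. *)
type expr =
  | Int of int
  | Var of string
  | Add of expr * expr
  | Let of string * expr * expr  (* let x = e1 in e2 *)
  | If of expr * expr * expr     (* if e1 <> 0 then e2 else e3 *)

(* Emit a C expression. Let becomes a GNU C statement expression, which is
   not ISO C, so this needs gcc (or clang, which supports the extension). *)
let rec emit (e : expr) : string =
  match e with
  | Int n -> string_of_int n
  | Var x -> x
  | Add (a, b) -> Printf.sprintf "(%s + %s)" (emit a) (emit b)
  | Let (x, e1, e2) -> Printf.sprintf "({ long %s = %s; %s; })" x (emit e1) (emit e2)
  | If (c, t, f) -> Printf.sprintf "(%s ? %s : %s)" (emit c) (emit t) (emit f)

(* Wrap an expression in a complete C program that prints its value. *)
let emit_program (e : expr) : string =
  Printf.sprintf
    "#include <stdio.h>\nint main(void) {\n  printf(\"%%ld\\n\", (long)%s);\n  return 0;\n}\n"
    (emit e)
```

`emit (Let ("x", Int 1, Add (Var "x", Int 2)))` produces `({ long x = 1; (x + 2); })`, and `gcc` can compile the output of `emit_program` directly.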

PS: Do you guys know any cool VMs I can target besides CPython and the JVM? Something with AOT compilation, perhaps?

Thanks.

  • porgamrer@programming.dev · 7 months ago

    But I don't see why, by that logic, any Turing-complete language wouldn't be in the group "portable", including any hardware-specific assembly. Because you can always implement a translator that has well-defined behaviour?

    What matters is the practical reality. Generally, languages are not portable when they don’t have well-defined behaviour, and when this causes their implementations to differ.

    And thanks to this low standard for portability, a lot of VMs and high-level languages are portable until you get to the FFI.

    e.g. is 6502 assembly portable now that C64 and NES emulators are commonplace?

    I would say yes! It’s just that portability is not the only thing required to make a VM spec useful.

    But if you lacked other options, you could theoretically build gcc for 6502 assembly once, and then use the same binary to bootstrap gcc on lots of different platforms, specifically thanks to the proliferation of NES emulators.

    This would also only work if there is a standard NES API available in all the emulators that is rich enough to back a portable libc implementation. I have no idea about that part.