As the title suggests, I have a few LLM models and want to see how they perform on different hardware (CPU-only instances, and GPUs: T4, V100, A100). The goal is to get an idea of the performance and the overall cost-efficiency (VM hourly rate vs. throughput).
Currently I've written a script that measures ms per token, RAM usage (via memory_profiler), and total time taken.
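Roughly, it looks like the sketch below (simplified; `generate` is a placeholder for whatever model call is being timed, and it's assumed to return the generated text plus the token count):

```python
import time
from memory_profiler import memory_usage

def benchmark(generate, prompt, n_runs=3):
    """Time `generate(prompt)` a few times; report ms/token, peak RAM, total seconds."""
    results = []
    for _ in range(n_runs):
        start = time.perf_counter()
        # memory_usage samples RSS while the call runs; with max_usage=True it
        # returns the peak in MiB, and retval=True also returns the call's result.
        peak_mib, (text, n_tokens) = memory_usage(
            (generate, (prompt,)), max_usage=True, retval=True
        )
        elapsed = time.perf_counter() - start
        results.append({
            "total_s": elapsed,
            "ms_per_token": (elapsed * 1000) / max(n_tokens, 1),
            "peak_ram_mib": peak_mib,
        })
    return results
```

(For the GPU instances I'd presumably also need to track VRAM separately, since memory_profiler only sees host RAM.)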
I wanted to check if there are better methods or tools for this. Thanks!