Taalas Prints LLMs on Chips, Claims 10x Performance Boost
Taalas has developed a method to 'print' large language models (LLMs) onto ASIC chips. Its first chip, which runs Llama 3.1 8B, achieves an inference rate of 17,000 tokens per second, and the company claims a 10x advantage over GPU-based systems in cost, electricity use, and speed. This technolo