AI Hardware

A collection of 2 issues

Taalas Prints LLMs on Chips, Claims 10x Performance Boost

Taalas has developed a method to "print" large language models (LLMs) directly onto ASIC chips. Its first chip, which runs Llama 3.1 8B, reaches an inference rate of 17,000 tokens per second and is claimed to be 10x cheaper, less electricity-intensive, and faster than GPU-based systems. This technolo…

Taalas Achieves Breakthrough with Llama 3.1 8B at 17,000 Tokens/Second

Taalas, a Canadian hardware startup, announced on February 20, 2026 that it is serving the Llama 3.1 8B model at 17,000 tokens per second. The milestone is enabled by aggressive quantization techniques and positions Taalas as a key player in AI hardware. The company's nex…