SparkyMcUnicorn 6 days ago \| on: Running Qwen3 on your macbook, using MLX, to vibe ...

You can use the 0.6B model as the draft model for speculative decoding with the larger models. It speeds up 32B but slows down 30B-A3B dramatically.
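
For anyone unfamiliar with the technique, here is a rough toy sketch of greedy speculative decoding. It is not MLX or Qwen3 code: `speculative_decode`, `greedy_decode`, `toy_target`, and `toy_draft` are hypothetical names made up for illustration, and the "models" are plain functions over integer tokens. The idea is that a cheap draft model guesses a few tokens ahead and the big target model checks them in one pass, so any gain depends on the draft being much cheaper than the target and on its guesses being accepted often.

```python
from typing import Callable, List, Tuple

# A "model" here is just a function from a token prefix to the next token (greedy).
Model = Callable[[List[int]], int]


def greedy_decode(model: Model, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Plain one-token-at-a-time decoding, used as the baseline."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       max_new_tokens: int, num_draft: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding sketch. Returns (tokens, verification_rounds)."""
    tokens = list(prompt)
    rounds = 0
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1. The draft model proposes num_draft tokens autoregressively.
        proposal: List[int] = []
        for _ in range(num_draft):
            proposal.append(draft(tokens + proposal))

        # 2. The target checks every proposed position. A real implementation
        #    would score all positions in a single batched forward pass; this toy
        #    calls the target per position and counts the whole check as one round.
        rounds += 1
        accepted: List[int] = []
        for i in range(num_draft + 1):
            expected = target(tokens + proposal[:i])
            accepted.append(expected)  # the target's token is always the one kept
            if i == num_draft or expected != proposal[i]:
                break  # stop at the first mismatch, or after the bonus token
        tokens.extend(accepted)
    return tokens[:len(prompt) + max_new_tokens], rounds


def toy_target(prefix: List[int]) -> int:
    # Assumed toy "large model": counts upward modulo 10.
    return (prefix[-1] + 1) % 10


def toy_draft(prefix: List[int]) -> int:
    # Assumed toy "small model": agrees with the target except after token 2.
    return 0 if prefix[-1] == 2 else (prefix[-1] + 1) % 10


if __name__ == "__main__":
    out, rounds = speculative_decode(toy_target, toy_draft, [0], max_new_tokens=12)
    print(out, "verification rounds:", rounds)
```

With greedy verification the output is identical to what the target would produce on its own; only the number of target passes changes. Each round also pays for `num_draft` draft-model steps, which is why a low acceptance rate or an already fast target can make the whole thing slower.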