Using the Lamborghini of inference engines for serverless Llama 3 | Hacker News


		Using the Lamborghini of inference engines for serverless Llama 3 (modal.com)
		1 point by birdculture 14 days ago | 1 comment

gnabgib 14 days ago

Title: Serve an interactive language model app with latency-optimized TensorRT-LLM
