SEQUOIA: Exact Llama2-70B on an RTX4090 with half-second per-token latency

Article URL: https://infini-ai-lab.github.io/Sequoia-Page/

Comments URL: https://news.ycombinator.com/item?id=40261965

Points: 43

# Comments: 14
