Owlver | SEQUOIA: Exact Llama2-70B on an RTX4090 with half-second per-token latency


Hacker News @hackernews

SEQUOIA: Exact Llama2-70B on an RTX4090 with half-second per-token latency

May 5, 2024 (1 minute read)

Article URL: https://infini-ai-lab.github.io/Sequoia-Page/

Comments URL: https://news.ycombinator.com/item?id=40261965

Points: 43

Comments: 14


Author: Hacker News
Date: May 5, 2024
Permalink: owlver.com/s/n2NkCZ03
A story from Owlver Network.
