
Posts by Zack Angelo

just realized bsky doesn't support gifs lol

1 year ago
Post image

functions can even compose, here's the model using the output of one as the input into another
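The chaining described in the post could be sketched like this: the runtime executes a model-emitted plan where one call's output feeds the next. The tool names, the `$0` result-reference convention, and the dispatch loop are all hypothetical, not mixlayer's actual API.

```python
# Hypothetical sketch of tool-call composition: the model calls one tool,
# then uses its result as the argument to a second tool.

def get_temperature(city: str) -> float:
    """Pretend weather lookup (canned data for the sketch)."""
    return {"Austin": 35.0, "Boston": 22.0}[city]

def celsius_to_fahrenheit(c: float) -> float:
    return c * 9 / 5 + 32

TOOLS = {
    "get_temperature": get_temperature,
    "celsius_to_fahrenheit": celsius_to_fahrenheit,
}

# A composed "plan" like a model might emit; "$0" refers to call 0's output.
plan = [
    {"tool": "get_temperature", "args": ["Austin"]},
    {"tool": "celsius_to_fahrenheit", "args": ["$0"]},
]

results = []
for call in plan:
    # Substitute any "$n" placeholder with the result of an earlier call.
    args = [results[int(a[1:])] if isinstance(a, str) and a.startswith("$") else a
            for a in call["args"]]
    results.append(TOOLS[call["tool"]](*args))

print(results[-1])  # 95.0
```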

1 year ago
Post image

one of the most slept on capabilities of newer AI models is the ability to call multiple tools in a single shot. here's the newest llama 70b running on mixlayer calling 4 tools (lookup weather in 3 cities and perform some arithmetic)
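A minimal sketch of what "4 tools in a single shot" looks like from the runtime's side: one assistant turn carries several tool calls (shaped loosely like the OpenAI-style `tool_calls` array), and the runtime dispatches them all before the next model turn. The tool names and canned data are illustrative.

```python
import json

def lookup_weather(city):
    # Canned weather data for the sketch.
    return {"Paris": "12C", "Tokyo": "18C", "Austin": "30C"}.get(city, "?")

def calculate(expression):
    # Toy arithmetic evaluator; fine here because input is hard-coded.
    return eval(expression, {"__builtins__": {}})

TOOLS = {"lookup_weather": lookup_weather, "calculate": calculate}

# One assistant message carrying four tool calls at once.
assistant_turn = json.loads("""
[
  {"name": "lookup_weather", "arguments": {"city": "Paris"}},
  {"name": "lookup_weather", "arguments": {"city": "Tokyo"}},
  {"name": "lookup_weather", "arguments": {"city": "Austin"}},
  {"name": "calculate", "arguments": {"expression": "17 * 24 + 3"}}
]
""")

tool_results = [TOOLS[c["name"]](**c["arguments"]) for c in assistant_turn]
print(tool_results)  # ['12C', '18C', '30C', 411]
```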

1 year ago
LLM Reasoning 101 - Mixlayer Large Language Models (LLMs) can be made better at complex reasoning tasks through techniques like few-shot prompting and Chain of Thought (CoT) reasoning, which allow smaller models to match the perf...

Want to play around with chain of thought and some other prompting techniques? I put up a few Mixlayer demos on Meta's Llama 3.1 8b in this blog post. www.mixlayer.com/blog/2024-12...
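For anyone who hasn't seen the technique, a few-shot chain-of-thought prompt is just a worked example that shows its reasoning, followed by the real question. The example problem below is made up for illustration, not taken from the linked post.

```python
# Minimal few-shot chain-of-thought prompt: one solved example with
# visible reasoning, then the question we actually want answered.
few_shot_cot = """\
Q: A train travels 60 km in 1.5 hours. What is its average speed?
A: Let's think step by step. Speed = distance / time = 60 / 1.5 = 40.
The answer is 40 km/h.

Q: A car travels 150 km in 2.5 hours. What is its average speed?
A: Let's think step by step."""

print(few_shot_cot)
```

The trailing "Let's think step by step" nudges the model to continue with its own reasoning before committing to an answer.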

1 year ago

weird that the instruction-tuned Llama3 8b models are downloaded less often than the original?

1 year ago

I doubt they switch to a lower-precision model, but would not be surprised if they start using a quantized or fp8 KV cache. Much easier to switch out dynamically in response to load than the model weights.
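Back-of-envelope arithmetic for why an fp8 KV cache is attractive under load: halving bytes per element halves cache memory. The dimensions below are roughly Llama-3-70B-like but are assumptions for illustration, not exact specs.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch, bytes_per_elem):
    # 2x accounts for storing both keys and values.
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem

# Assumed, Llama-3-70B-ish shape (GQA with 8 KV heads).
args = dict(layers=80, kv_heads=8, head_dim=128, seq_len=8192, batch=16)
fp16 = kv_cache_bytes(**args, bytes_per_elem=2)
fp8 = kv_cache_bytes(**args, bytes_per_elem=1)
print(f"fp16: {fp16 / 2**30:.1f} GiB, fp8: {fp8 / 2**30:.1f} GiB")
# fp16: 40.0 GiB, fp8: 20.0 GiB
```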

1 year ago
Extending the Context Length to 1M Tokens! API Documentation (Chinese) HuggingFace Demo ModelScope Demo Introduction After the release of Qwen2.5, we heard the community’s demand for processing longer contexts. In recent months, we have made m...

Crazy to think that a 1M token context window will be the norm soon.

Doesn't look like this model has made it onto HF yet (just a space, no weights), curious to learn more about the sparse attention mechanism.

qwenlm.github.io/blog/qwen2.5...
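One common sparse-attention building block is a sliding-window mask, where each query attends only to a fixed number of recent keys; this is a generic sketch of how sparsity cuts the quadratic cost, not necessarily the mechanism Qwen2.5-1M actually uses.

```python
def sliding_window_mask(seq_len, window):
    """mask[i][j] is True iff position i may attend to position j."""
    return [[(i - window < j <= i) for j in range(seq_len)]
            for i in range(seq_len)]

mask = sliding_window_mask(seq_len=6, window=3)

# Each query attends to at most `window` keys, so attention cost grows
# linearly with sequence length instead of quadratically.
print(sum(sum(row) for row in mask))  # 15 attended pairs vs 21 for full causal
```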

1 year ago

woke up in a 3am fit of terror last night bc I dreamt I left an 8x a100 gpu cluster running by accident 🫠

1 year ago