Gemma 4 is now actually usable in llama.cpp. KV cache and tokenizer bugs fixed. You can run it on consumer GPUs without melting your VRAM. #LocalAI #Gemma4 #LlamaCpp
https://bymachine.news/gemma-4-llama-cpp-fixes-kv-cache
So after another few hours of tinkering and bending I got #translategemma #LLM multi-modal text-in-picture translation working, with #llamacpp as the model server. Custom #python REST API and a simple web interface. It is not as straightforward as spinning up llamacpp and […]
Today, just as an exercise, I built a #python #fastapi REST API and web UI on top of #llamacpp and the #translategemma #LLM. I had to bend a few things, but it works in the end, and I have to admit the translation quality is quite good, at least with the 12b version of translategemma.
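A minimal sketch of the server-side call such a translation API makes. It assumes a local llama.cpp server (`llama-server`) exposing its OpenAI-compatible `/v1/chat/completions` endpoint on port 8080; the FastAPI routing layer and web UI from the post are omitted, and the prompt wording is illustrative, not the author's code.

```python
# Sketch: calling a llama.cpp server to translate text.
# Assumes llama-server's OpenAI-compatible endpoint on localhost:8080.
import json
import urllib.request

LLAMA_URL = "http://127.0.0.1:8080/v1/chat/completions"

def build_messages(text: str, target_lang: str = "en") -> list:
    """Chat payload asking the model for a bare translation."""
    return [
        {"role": "system",
         "content": f"Translate the user's text into {target_lang}. "
                    "Reply with the translation only."},
        {"role": "user", "content": text},
    ]

def translate(text: str, target_lang: str = "en") -> str:
    """POST the chat request and return the model's reply text."""
    payload = json.dumps({"messages": build_messages(text, target_lang)}).encode()
    req = urllib.request.Request(
        LLAMA_URL, data=payload, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

A real multi-modal pipeline would additionally extract the text regions from the image before this step; the chat call itself stays the same.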
Had some more fun this afternoon (the bad weather left me no choice).
A local LLM is nice, but without web search it's not great.
We had already played around with Wikipedia searches on a live stream.
Now the whole Web can be queried :)
#python #streamlit #duckduckgo #llamacpp #qwen35
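A sketch of the prompt assembly behind a setup like this. Fetching is stubbed out here; in the post's stack the hits would come from DuckDuckGo and the prompt would go to a llama.cpp-served Qwen model, so the field names below are illustrative.

```python
# Sketch: turn web-search hits into a grounded prompt for a local LLM.
def format_context(results: list) -> str:
    """Render search hits as a numbered context block."""
    blocks = []
    for i, hit in enumerate(results, start=1):
        blocks.append(f"[{i}] {hit['title']}\n{hit['snippet']}\nSource: {hit['url']}")
    return "\n\n".join(blocks)

def build_prompt(question: str, results: list) -> str:
    """Ask the model to answer only from the retrieved results."""
    return (
        "Answer using only the web results below; cite sources as [n].\n\n"
        f"{format_context(results)}\n\nQuestion: {question}"
    )
```

The "cite sources as [n]" instruction is what lets the UI link each claim back to the page it came from.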
llama.cpp keeps evolving…!
A Web UI is now built in, so apparently you can run models straight from the browser without Ollama.
One more option for quickly trying out local LLMs. I'm curious how far this can close the gap with Ollama in practice…!
What is everyone using to manage their local LLMs?
#AI #ローカルLLM #llamacpp #OSS
www.zeeklog.com/llama-cppzhong-da-geng-x...
Run 70B+ AI models on your 32GB Mac? Hypura makes it possible by streaming tensors from NVMe, pushing local inference boundaries. See how this clever engineering bypasses OOM errors!
thepixelspulse.com/posts/nvme-streaming-mac...
#hypura #applesilicon #llamacpp
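The general idea behind streaming weights from disk can be shown with a memory map: the OS pages data in on access instead of loading the whole file into RAM up front. Hypura's actual engineering is surely more involved (prefetch, eviction, NVMe queue tuning); this only illustrates the mmap principle with a dummy file.

```python
# Sketch: on-demand paging of a large "weights" file via mmap.
import mmap
import os
import tempfile

# Create a stand-in weights file (1 MiB of repeating dummy bytes).
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(bytes(range(256)) * 4096)
    path = f.name

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # Only touched pages become resident; this slice faults in one page.
    chunk = bytes(mm[512:520])
    mm.close()

os.unlink(path)
```

llama.cpp itself mmaps GGUF files for the same reason; the hard part Hypura tackles is keeping the NVMe reads fast enough to feed inference.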
DLLM - D Language 🤖 on 🦙.cpp
All D 🤪. No Python 🐍. No bindings 🩹. Just vibes 🌴.
🔗 github.com/DannyArends/DL…
#dlang #llm #llamacpp #opensource #AI
Just uploaded an experimental patch for the llama.cpp webui
I needed more control over the model's reasoning, so I added a toggle in the WebUI to manage it. You can disable it entirely or set it to different levels (Low, Medium, High).
It's still very […]
The current state of the GPU market for AI is more diverse than many assume
whyaiman.substack.com/p/the-curren...
#AI #GPUComputing #Llamacpp #localai #IntelArc
yzma 1.11 is out, with more of what you need:
- Support for latest llama.cpp (>97% of functions covered)
- ROCm backend+benchmarks
- @officialarduino.bsky.social Uno Q install info
Go get it right now!
github.com/hybridgroup/...
#golang #llamacpp #yzma #arduino #unoq
I built a CLI to make it easier to run local LLMs with llama.cpp.
A few weeks later I have another tool using it to read my email and apply labels with a local model.
No credits, just scripts and a model on my laptop.
This is what I wanted local LLMs for.
#llm #localllm #llamacpp #devtools
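The fiddly part of a script like that is mapping the model's free-text reply onto a fixed label set. A sketch under assumptions: the label names and the fallback are hypothetical, not the author's tool.

```python
# Sketch: constrain a local model's reply to a known set of mail labels.
ALLOWED_LABELS = {"newsletter", "invoice", "personal", "spam"}

def parse_label(model_reply: str, fallback: str = "personal") -> str:
    """Map a reply like 'Label: Invoice.' onto an allowed label."""
    cleaned = model_reply.strip().lower()
    for label in ALLOWED_LABELS:
        if label in cleaned:
            return label
    return fallback  # model rambled; fail safe instead of mislabeling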
Wow! My MLX vs llama.cpp benchmark hit #9 on r/LocalLLaMA today. Did not expect that.
Takeaway: benchmark your actual scenarios; do not rely on just the tok/s counter in your UI. I ran into a caching bug specific to Qwen 3.5 (35B-A3B) on MLX. Effective tokens/s is what we actually experience.
#MLX #LlamaCpp #Qwen
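The "effective tokens/s" point as a formula: divide tokens received by total wall time, so prompt processing and cache-miss stalls count, not just the decode loop the UI counter measures. The numbers below are made up for illustration.

```python
# Effective throughput: tokens the user received per second of waiting.
def effective_tps(generated_tokens: int, wall_seconds: float) -> float:
    return generated_tokens / wall_seconds

# A run that decodes at 40 tok/s but spends 5 s on prefill:
decode_tps = 40.0
gen_tokens = 400
wall = gen_tokens / decode_tps + 5.0  # 10 s decoding + 5 s prefill = 15 s
eff = effective_tps(gen_tokens, wall)  # well below the 40 tok/s counter
```

A per-model caching bug like the Qwen one shows up here as extra wall time even while the decode counter looks healthy.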
Gopherbot blinking
Less than 2 weeks until Embedded World & I will be at large on the expo floor! Let's chat about all things Go with microcontrollers, computer vision, & machine learning. Just look for Gopherbot.
#golang #tinygo #ew26 #embedded #computerVision #ml #openCV #llamacpp #yzma
yzma 1.10 is out with improvements like:
- install info for @officialarduino.bsky.social UNO Q and @raspberrypi.com
- experimental 'VLM' type
- improved yzma cmd so 'go install' works with latest
Go and get it!
github.com/hybridgroup/...
#golang #llama #llamacpp #ml #llm #vlm #cuda #vulkan
winbuzzer.com/2026/02/21/g...
Open-Source llama.cpp Finds Long-Term Home at Hugging Face
#AI #HuggingFace #AIInference #OpenSourceAI #OnDeviceAI #GGML #LlamaCpp #LocalAI #GeorgiGerganov
llama.cpp Creator Joins Hugging Face, Cementing the Open-Source AI Inference Stack
awesomeagents.ai/news/ggml-llama-cpp-join...
#LlamaCpp #HuggingFace #OpenSource
"Running local LLMs and VLMs on the Arduino UNO Q with yzma"
@golang.org running on the @officialarduino.bsky.social #unoq can be your new tiny edge inference device!
projecthub.arduino.cc/marc-edgeimp...
#golang #yzma #llama #llamacpp #llm #vlm #arduino #ml
The robots never sleep and neither do we! yzma 1.9 is out to handle breaking changes in upstream llama.cpp & since we were at it, auto-detect CUDA on installation and other useful changes too.
github.com/hybridgroup/...
#golang #llamacpp #llama #ml #nvidia #cuda
Just released yzma 1.8 with what Go coders need:
- latest llama.cpp features/models such as ModelFitParams
- @raspberrypi.com & #nvidia Jetson Orin quick installs
- more benchmarks
go get it right now!
github.com/hybridgroup/...
#golang #ml #llama #llamacpp #cuda #vulkan #raspberrypi #nvidia
#llamacpp announcing #anthropic Messages API the same day as GLM4.7-Flash releases is just chef's kiss 🧠🤖🧑🍳😘💋.
huggingface.co/blog/ggml-or...
Launching TempFS: a prototype for orchestrating temporary containers for GGUF models using Node.js and Podman.
The project focuses on ephemeral environments and automatic resource cleanup.
github.com/FabioSmuu/Te...
#AI #GGUF #LLM #LlamaCPP #NodeJS #Podman #Containers #Ubuntu #Sandboxing #Automation
yzma 1.4.1 has just been released for compatibility with llama.cpp b7628+
Available now!
github.com/hybridgroup/...
#golang #llama #llamacpp #llm #vlm #tlm #slm #vla
yzma 1.4 is out, the last release of the year. Support for split models and a few new features that just got added to llamacpp too. Enjoy high performance local inference from Go!
github.com/hybridgroup/...
#golang #llamacpp
Live now: build a lean llama.cpp server image, wire it with Compose, park it behind Caddy. Plus GHCR tip so your VPS does not cry. Read:
demystifai.substack.com/p/small-lang...
#Docker #llamaCpp #Caddy #SLM #RAG #DevOps
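A Compose sketch of the shape described: llama.cpp's server container with Caddy in front as the TLS reverse proxy. The model path, ports, and mounts are placeholders, not the article's exact setup.

```yaml
services:
  llama:
    image: ghcr.io/ggml-org/llama.cpp:server   # official server image
    command: ["-m", "/models/model.gguf", "--host", "0.0.0.0", "--port", "8080"]
    volumes:
      - ./models:/models
  caddy:
    image: caddy:2
    ports: ["80:80", "443:443"]
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile
    depends_on:
      - llama
```

Keeping the llama service off the published ports and routing only through Caddy is what keeps the VPS from serving the raw API to the internet.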
Reach native speed with MacOS llama.cpp container inference
buff.ly/CCR6tLn
#podman #llamacpp #macos
Had some fun building and using a next-token prediction app to demonstrate AI concepts while delivering some #AITraining
#AI #EnterpriseAI #NextTokenPredictor #AIEducation #AIClassroom #CriticalThinking #LlamaCPP
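A toy version of that classroom demo: a bigram "model" that predicts the next token from counts in a tiny corpus. Real next-token prediction uses a neural LM over a learned distribution, but the pick-the-likeliest-continuation step is the same idea.

```python
# Toy next-token predictor: bigram counts over a whitespace-tokenized corpus.
from collections import Counter, defaultdict

def train_bigrams(corpus: str) -> dict:
    """Count which token follows which."""
    counts = defaultdict(Counter)
    tokens = corpus.split()
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts: dict, token: str):
    """Most frequent continuation seen after `token`, or None."""
    if token not in counts:
        return None
    return counts[token].most_common(1)[0][0]

model = train_bigrams("the cat sat on the mat the cat ran")
```

Showing learners the raw counts next to the prediction makes the "it's just a probability distribution over next tokens" point concrete.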
A complete guide to LLM quantization! INT4 cuts memory by 87.5%, FP8 raises throughput by 43%. GPTQ vs AWQ vs GGUF comparison, Llama 3 quantization benchmarks, under 2% quality loss down to Q4! Plus pruning + knowledge distillation for model compression, per-hardware recommended strategies, and QLoRA fine-tuning!
#AWQ #FP8 #GGUF #GPTQ #INT4 #INT8 #KnowledgeDistillation #Llama3 #llamacpp
doyouknow.kr/618/llm-quan...
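The headline memory figure is simple bit arithmetic: INT4 weights use 4 of FP32's 32 bits, an 8x reduction. A rough weights-only estimate for a 70B-parameter model, ignoring quantization overhead such as per-block scales:

```python
# Weight-memory estimate from parameter count and bits per weight.
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    return n_params * bits_per_weight / 8 / 1e9

fp32 = weight_memory_gb(70e9, 32)   # 280 GB
int4 = weight_memory_gb(70e9, 4)    # 35 GB
saving = 1 - int4 / fp32            # 0.875 -> the 87.5% figure
```

In practice GGUF Q4 variants land slightly above 4.0 bits per weight because of the scale metadata, so real files are a bit larger than this back-of-envelope number.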
We're moving at the speed of thought, so yzma v1.0 beta2 is out!
Better, faster, and more benchmarks to show it too.
Run local models using Go with your CPU, CUDA, or Vulkan.
You know what to do!
github.com/hybridgroup/...
#golang #llama #llamacpp
#LlamaCpp on Apple Silicon
#MacOS
https://www.youtube.com/watch?app=desktop&v=2t9XrPcAiHg