Kimi 2.6 is now available on @hf.co 🔥🎉
huggingface.co/moonshotai/K...
✨ 1T MoE / 32B active / 256K context
✨ Agent Swarm: 300 sub-agents × 4,000 steps
✨ Modified MIT
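A "1T total / 32B active" spec comes from Mixture-of-Experts routing: each token activates only a small subset of experts. A rough illustration of top-k gating (the sizes, gate, and top-k below are made up for the sketch, not Kimi's actual architecture):

```python
import numpy as np

def moe_route(x, gate_w, top_k=2):
    """Pick top_k experts per token from router logits (softmax gate)."""
    logits = x @ gate_w                                  # [tokens, n_experts]
    top = np.argsort(logits, axis=-1)[:, -top_k:]        # chosen expert indices
    sel = np.take_along_axis(logits, top, axis=-1)
    # renormalize gate weights over the selected experts only
    w = np.exp(sel - sel.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return top, w

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))        # 4 tokens, hidden size 16
gate_w = rng.normal(size=(16, 8))   # router for 8 experts
experts, weights = moe_route(x, gate_w)
print(experts.shape)                # (4, 2): 2 of 8 experts per token
```

Only the selected experts' FFNs run per token, which is why active parameters stay far below the total count.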
Posts by Adina Yakup
MOSS-VL 🔥 Vision model from Open MOSS
Model: huggingface.co/collections/...
Demo: huggingface.co/spaces/OpenM...
✨ 11B - Apache 2.0
✨ Cross-attention + XRoPE (3D: time, height, width)
✨ Beats Qwen3-VL-8B by 8.3 pts on VSI-bench
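The post doesn't detail XRoPE, but multi-axis rotary embeddings are commonly built by splitting the head dimension into one chunk per axis and rotating each chunk by its own position index. A minimal sketch of that general idea (dimensions and the even split are illustrative, not MOSS-VL's actual layout):

```python
import numpy as np

def rope_1d(x, pos, base=10000.0):
    """Standard rotary embedding over the last dim of x for given positions."""
    d = x.shape[-1]
    inv_freq = base ** (-np.arange(0, d, 2) / d)        # [d/2]
    ang = np.asarray(pos)[:, None] * inv_freq[None, :]  # [n, d/2]
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin                # 2D rotation per pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def rope_3d(x, t, h, w):
    """Split head dim into 3 chunks; rotate each by its own axis index."""
    d = x.shape[-1] // 3
    parts = [rope_1d(x[..., i * d:(i + 1) * d], p) for i, p in enumerate((t, h, w))]
    return np.concatenate(parts, axis=-1)

q = np.random.default_rng(1).normal(size=(5, 48))   # 5 patches, head dim 48
t = [0, 0, 0, 1, 1]; h = [0, 0, 1, 0, 1]; w = [0, 1, 0, 0, 1]
print(rope_3d(q, t, h, w).shape)                    # (5, 48)
```

Because each pair is just rotated, vector norms are preserved; only relative phase between positions changes.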
Baidu just released ERNIE-Image on Hugging Face 🔥
Model: huggingface.co/collections/...
Demo: huggingface.co/spaces/baidu...
✨ 8B DiT - Image/Image Turbo
✨ Apache 2.0
✨ Strong text rendering for posters & UI-style images
✨ Structured outputs (comics, multi-panel scenes)
A new large-scale RGB-D dataset from Ant Group: LingBot-Depth 🤖
huggingface.co/datasets/rob...
✨ 3M+ samples / 2.7TB
✨ Real-world + simulation + VLA robotics data
✨ Raw sensor depth + ground truth
LongCat-AudioDiT 🔊 New TTS from Meituan LongCat team
huggingface.co/meituan-long...
✨ 1B & 3.5B - MIT license
✨ Diffusion + non-AR generation
✨ Operates directly in waveform latent space
✨ Simpler pipeline (no mel-spectrograms)
Matrix-Game 3.0 🔥 Real-time interactive world models from Skywork
huggingface.co/Skywork/Matr...
✨ MIT license
✨ 720p @ 40FPS with a 5B model
✨ Minute-long memory consistency
✨ Unreal + AAA + real-world data
✨ Scales up to 28B MoE
daVinci-LLM 🔥 The SII-GAIR team just shared the full training pipeline on @hf.co
huggingface.co/SII-GAIR-NLP...
✨ 3B, competitive with 7B models
✨ 8T-token transparent training
✨ 200+ ablation studies
✨ Data Darwinism (L0–L9) framework
LongCat-Next 🐱 A multimodal foundation model released by Meituan
huggingface.co/meituan-long...
✨ 74B total - 3B active - MIT
✨ One token space for all modalities
✨ DiNA paradigm for unified learning
✨ Seeing, creating, talking all in one
PrismAudio 🔥 RL-powered video-to-audio (V2A) framework from Alibaba FunAudio team
Model: huggingface.co/FunAudioLLM/...
Paper: huggingface.co/papers/2511....
Demo: huggingface.co/spaces/FunAu...
✨ 4 modules: semantic, temporal, aesthetic & spatial
✨ Multi-dimensional RL for better audio alignment
✨ Builds on ThinkSound’s CoT V2A
LongCat-Flash-Prover 🐱 A 560B MoE from Meituan just dropped on @hf.co
huggingface.co/meituan-long...
✨ MIT license
✨ Native Formal Reasoning: Auto-formalize, sketch & verify proofs live in Lean4
✨ 97.1% pass rate on MiniF2F-Test & 70.8% on ProverBench
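For readers unfamiliar with Lean4, "auto-formalize, sketch & verify" means the model emits machine-checkable statements that the Lean kernel confirms. A toy example of the kind of goal involved (illustrative only, not taken from MiniF2F):

```lean
-- A MiniF2F-style arithmetic goal, stated and discharged in Lean 4.
-- `decide` asks the kernel to evaluate the decidable proposition.
theorem toy_example : 1 + 2 + 3 + 4 = 10 := by decide
```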
daVinci-MagiHuman 🎬 Human Centric Audio-Video Generative Model by GAIR
Model: huggingface.co/GAIR/daVinci...
Paper: huggingface.co/GAIR/daVinci...
✨ 15B – Fully open source!
✨ 5-sec 1080p video in 38s on one H100
✨ Supports 6 languages
✨ Unified model with text + video + audio
AutoMathText-V2 🔥 A 2.46T token STEM dataset from Shanghai QI ZHI Institute
huggingface.co/datasets/Ope...
✨ Optimized for Math & STEM
✨ Triple deduplication: Exact → Fuzzy → Semantic
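The exact → fuzzy → semantic cascade can be sketched as a filter pipeline. This toy version uses SHA-256 for the exact stage and difflib for the fuzzy stage; a production pipeline at this scale would use MinHash/LSH and embedding cosine similarity, and the semantic stage is only noted in comments here:

```python
import hashlib
from difflib import SequenceMatcher

def dedup(docs, fuzzy_thr=0.9):
    """Toy three-stage dedup: exact hash -> fuzzy ratio -> (semantic stubbed).

    Stage 3 would embed each doc and drop near-neighbors by cosine
    similarity; omitted here to keep the sketch dependency-free.
    """
    kept, seen_hashes = [], set()
    for d in docs:
        h = hashlib.sha256(d.encode()).hexdigest()      # stage 1: exact
        if h in seen_hashes:
            continue
        # stage 2: fuzzy near-duplicate check against kept docs
        if any(SequenceMatcher(None, d, k).ratio() >= fuzzy_thr for k in kept):
            continue
        seen_hashes.add(h)
        kept.append(d)
    return kept

docs = ["the quick brown fox", "the quick brown fox",
        "the quick brown fox!", "unrelated text"]
print(dedup(docs))   # exact dup and near-dup both removed
```

Ordering the stages cheapest-first matters: hashing filters the bulk before the quadratic fuzzy comparisons run.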
Another OCR model just dropped 🔥 (so many OCRs lately!)
dots.mocr from RedNote Hi Lab looks really impressive on the benchmarks.
huggingface.co/collections/...
✨ 3B
✨ Multilingual support
✨ Converts charts, diagrams, and UI layouts directly into SVG code
Qianfan-OCR 🔥 New end-to-end document intelligence model from Baidu is now available on @hf.co
huggingface.co/baidu/Qianfa...
huggingface.co/papers/2603....
✨ 4B - Apache 2.0
✨ OCR across 192 languages
✨ 1 page/sec on a single A100
✨ Trained on 1,024 Kunlun P800 chips
Xperience 10M 🔥 One of the largest egocentric multimodal datasets from Ropedia_ai
huggingface.co/datasets/rop...
✨ 10M interactions
✨ 10k hours egocentric recordings
✨ RGB + audio + depth + SLAM + hand & body mocap + IMU
✨ Structured 3D/4D annotations
MiroThinker 1.7 🔥 New open research agents
huggingface.co/collections/...
✨ 1.7 & 1.7 mini (30B)
✨ 256K context length
✨ 300 tool calls per task
✨ Qwen3-based + custom agent (MiroThinker-H1)
Fish Audio S2 Pro 🔊
huggingface.co/collections/...
✨ Fine-grained prosody & emotion control
✨ Supports 80+ languages
✨ Low-latency streaming & long context inference
Yuan3.0 Ultra 🔥 A 1T multimodal LLM from YuanLab
huggingface.co/YuanLabAI
✨ 64K context
✨ Enterprise-ready: RAG, summarization, Text-to-SQL
✨ 103-layer MoE w/ LAEP (49% efficiency boost)
IQuest-Coder-V1 Update! 7B & 14B series now on @hf.co 🔥
huggingface.co/collections/...
✨ 7B/14B - Base, instruct, thinking
✨ Optimized for tool use & CLI agents
✨ 128k context length
Step 3.5 Flash 🔥 New MoE model from StepFun
huggingface.co/stepfun-ai/S...
huggingface.co/stepfun-ai/S...
✨ Base & Base-Midtrain
✨ 196B total/11B active - Apache 2.0
✨ 256K context
✨ High-speed reasoning & agentic tasks
Qwen 3.5 Small Model Series just dropped on @hf.co 🔥
huggingface.co/collections/...
✨ 0.8B/2B/4B/9B
✨ Apache 2.0
✨ 262K→1M token context
MiniMax M2.5 is now available on @hf.co
huggingface.co/MiniMaxAI/Mi...
✨ 229B - Modified MIT license
✨ 37% faster than M2.1
✨ ~$1/hour at 100 TPS
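The "~$1/hour at 100 TPS" figure translates into a per-token cost with simple arithmetic (both numbers are the post's approximations, not an official rate card):

```python
# Back-of-envelope: what "~$1/hour at 100 TPS" implies per million tokens.
price_per_hour = 1.0                     # USD, approximate
tokens_per_sec = 100
tokens_per_hour = tokens_per_sec * 3600  # 360,000 tokens/hour
usd_per_million = price_per_hour / tokens_per_hour * 1_000_000
print(round(usd_per_million, 2))         # ~2.78 USD per 1M generated tokens
```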
Ovis2.6-30B-A3B 🚀 The latest multimodal LLM from the AIDC team at Alibaba
huggingface.co/AIDC-AI/Ovis...
✨ 64K context + 2880×2880 resolution
✨ MoE 30B/3B active
✨ Apache 2.0
✨ “Think with Image”: active visual reasoning
Ring-1T-2.5 🔥 1T reasoning model based on hybrid linear attention from Ant Group
huggingface.co/inclusionAI/...
✨ MIT license
✨ 128K → 256K context (YaRN)
✨ Hybrid MLA + Lightning Linear Attention (1:7)
✨ Agentic: works natively with Claude Code & OpenClaw
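A 128K → 256K extension via YaRN works by rescaling RoPE frequencies: low-frequency (long-wavelength) dimensions are interpolated to cover the longer context while high-frequency ones are left alone, with a ramp in between. A simplified sketch of that idea (the cutoffs and ramp below are illustrative, not Ring's actual settings):

```python
import numpy as np

def yarn_inv_freq(dim, scale=2.0, base=10000.0, lo=32, hi=1024):
    """Simplified YaRN-style frequency adjustment for RoPE context extension.

    Dims whose wavelength is below `lo` tokens are kept as-is; above `hi`
    they are divided by `scale` (2.0 for a 2x context stretch); a linear
    ramp blends the two regimes in between.
    """
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)
    wavelen = 2 * np.pi / inv_freq
    ramp = np.clip((wavelen - lo) / (hi - lo), 0.0, 1.0)  # 0 = keep, 1 = interpolate
    return inv_freq * (1 - ramp) + (inv_freq / scale) * ramp

f = yarn_inv_freq(128)
print(f.shape)   # (64,) adjusted inverse frequencies
```

The intuition: short-wavelength dims already encode local order well and should not be stretched, while long-wavelength dims must be slowed down so positions beyond the original window stay distinguishable.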
RynnBrain 🤖 A physics-aware embodied brain for robots from Alibaba DAMO
huggingface.co/collections/...
✨ 2B/8B/30B (3B active)
✨ Apache 2.0
✨ Understands egocentric scenes with strong spatial awareness
✨ Tracks objects and motion over time
MiniCPM-SALA 🚀 Hybrid model combining Sparse + Linear Attention from OpenBMB
huggingface.co/openbmb/Mini...
✨ 25% Sparse + 75% Linear Attention
✨ Up to 3.5× faster inference
✨ 1M+ tokens on RTX 5090 / A6000D
✨ Apache 2.0
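A 25% sparse / 75% linear split usually means a fixed per-layer schedule, e.g. one sparse-attention layer in every four. The placement below is a guess for illustration, not OpenBMB's published layout:

```python
def sala_layer_plan(n_layers=32, sparse_every=4):
    """Illustrative layer schedule: 1 sparse-attention layer per 4 layers
    (25% sparse, 75% linear), matching the announced ratio."""
    return ["sparse" if i % sparse_every == 0 else "linear"
            for i in range(n_layers)]

plan = sala_layer_plan()
print(plan.count("sparse") / len(plan))   # 0.25
```

Interleaving a few full/sparse-attention layers among linear ones is a common way to keep long-range recall while the linear layers carry most of the speedup.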
While Seedance 2.0’s videos are all over the timeline, DeepSeek quietly pushed a new model update in its app.
GLM-5, Ming-flash-omni from Ant Group , MiniCPM-SALA from OpenBMB, and the upcoming MiniMax M2.5 keep the heat on 🔥
Spring Festival is around the corner, no one’s sleeping!
Ming-flash-omni 2.0 🚀 New open omni-MLLM released by Ant Group
huggingface.co/inclusionAI/...
✨ MIT license
✨ MoE - 100B/6B active
✨ Zero-shot voice cloning + controllable audio
✨ Fine-grained visual knowledge grounding