(@dmitryontech) Bsky

HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds

huggingface.co/papers/2604....

6 days ago

NVIDIA GPUs with 12 GB of VRAM YouTube video by AIProgrammingHardware

youtu.be/m-h8zDkd13w

6 days ago

NVIDIA GPUs with 12 GB of video memory

javaeeeee.medium.com/nvidia-gpus-...

6 days ago

Seedance 2.0: Advancing Video Generation for World Complexity

huggingface.co/papers/2604....

1 week ago

ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents

huggingface.co/papers/2604....

1 week ago

Uni-ViGU: Towards Unified Video Generation and Understanding via A Diffusion-Based Video Generator

huggingface.co/papers/2604....

1 week ago

VisionFoundry: Teaching VLMs Visual Perception with Synthetic Images

huggingface.co/papers/2604....

1 week ago

How to Build a Production-Ready Claude Code Skill | Towards Data Science. What I learned building and distributing my first Skill from scratch.

How to Build a Production-Ready Claude Code Skill in @towardsdatascience.com

towardsdatascience.com/how-to-build...

1 week ago

MegaStyle: Constructing Diverse and Scalable Style Dataset via Consistent Text-to-Image Style Mapping

huggingface.co/papers/2604....

1 week ago

FP4 Explore, BF16 Train: Diffusion Reinforcement Learning via Efficient Rollout Scaling

huggingface.co/papers/2604....

2 weeks ago

Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding

huggingface.co/papers/2604....

2 weeks ago

TriAttention: Efficient Long Reasoning with Trigonometric KV Compression

huggingface.co/papers/2604....

2 weeks ago

Agentic-MME: What Agentic Capability Really Brings to Multimodal Intelligence?

huggingface.co/papers/2604....

2 weeks ago

NVIDIA Ada Lovelace Architecture for AI and Deep Learning YouTube video by AIProgrammingHardware

youtu.be/Ehdrt7v0TsM

2 weeks ago

NVIDIA Ada Lovelace architecture for AI and Deep Learning

javaeeeee.medium.com/nvidia-ada-l...

2 weeks ago

CORAL: Towards Autonomous Multi-Agent Evolution for Open-Ended Discovery

huggingface.co/papers/2604....

2 weeks ago

How Vision Language Models Are Trained from “Scratch” | Towards Data Science. A deep dive into exactly how text-only language models are finetuned to *see* images.

How Vision Language Models Are Trained from “Scratch”

towardsdatascience.com/how-vision-l...

3 weeks ago

MiroEval: Benchmarking Multimodal Deep Research Agents in Process and Outcome

huggingface.co/papers/2603....

3 weeks ago

How to Use Ollama to Run Large Language Models Locally – Real Python. Learn how to use Ollama to run large language models locally. Install it, pull models, and start chatting from your terminal without needing API keys.

How to Use Ollama to Run Large Language Models Locally

realpython.com/ollama/

3 weeks ago

GEMS: Agent-Native Multimodal Generation with Memory and Skills

huggingface.co/papers/2603....

3 weeks ago

DreamLite: A Lightweight On-Device Unified Model for Image Generation and Editing

huggingface.co/papers/2603....

3 weeks ago

Voxtral TTS

huggingface.co/papers/2603....

3 weeks ago

Beyond Code Generation: AI for the Full Data Science Workflow | Towards Data Science. Using Codex and MCP to connect Google Drive, GitHub, BigQuery, and analysis in one real workflow.

Beyond Code Generation: AI for the Full Data Science Workflow in @towardsdatascience.com

towardsdatascience.com/beyond-code-...

3 weeks ago

PixelSmile: Toward Fine-Grained Facial Expression Editing

huggingface.co/papers/2603....

3 weeks ago

Zero-Waste Agentic RAG: Designing Caching Architectures to Minimize Latency and LLM Costs at Scale | Towards Data Science. Reducing LLM costs by 30% with validation-aware, multi-tier caching.

Zero-Waste Agentic RAG: Designing Caching Architectures to Minimize Latency and LLM Costs at Scale in @towardsdatascience.com

towardsdatascience.com/zero-waste-a...

4 weeks ago

UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience

huggingface.co/papers/2603....

4 weeks ago

PEARL: Personalized Streaming Video Understanding Model

huggingface.co/papers/2603....

4 weeks ago

Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model

huggingface.co/papers/2603....

4 weeks ago

A Visual Guide to Attention Variants in Modern LLMs. From MHA and GQA to MLA, sparse attention, and hybrid architectures.

A Visual Guide to Attention Variants in Modern LLMs

open.substack.com/pub/sebastia...

1 month ago

Astrolabe: Steering Forward-Process Reinforcement Learning for Distilled Autoregressive Video Models

huggingface.co/papers/2603....

1 month ago
