Hashtag
#cuda
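Did one Google paper crash the memory giants? If you don't understand the "GPU memory wall," how can you be an engineer in the AI era! - Tony Bai. Permanent link: https://tonybai.com/2026/03/28/ai-engineer-gpu-introduction-course. Hello everyone, I'm Tony Bai. Something extremely dramatic happened in the tech world recently. When the US stock market opened this Wednesday, the giants of the global storage industry, Micron, Western […]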

#技术志 #AIModel #AI模型 #ArtificialIntelligence #AttentionMechanism #ComputeBound #ComputingPower #CUDA #FlashAttention #FP8 #Go


From MNIST to Transformer. Part 4. Gradient Descent. Training our model. We live in an era when AI has become available to everyone. But behind ...

#cuda #c++ #ml


#CUDA: Use the #Custom-installer option to redirect the Toolkit path to E:\.


#CUDA 13.2 cu130: Uninstall the toolkit and related drivers via Control Panel. Look for entries labeled #NVIDIA-CUDA or #CUDA-Toolkit.

Join us at IWOCL 2026 for Paulius Velesko's keynote, chipStar: OpenCL as a Portability Layer for CUDA/HIP Applications

Keynote at IWOCL 2026: Paulius Velesko presents chipStar — compiling unmodified CUDA/HIP code into OpenCL & SPIR-V fat binaries that run on Intel, AMD, NVIDIA, ARM, and RISC-V hardware. No recompilation needed.

Join us at IWOCL 2026, May 6–8 in Heilbronn […]

[Original post on fosstodon.org]


To use or set #variables in Windows 11 command files (batch scripts), and to avoid typing long pathnames for #Python 3.14t, #Python 3.14, and #CUDA 13.2, you can use the #set command for temporary sessions or #setx for permanent changes.
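
As a rough illustration of the difference between the two commands, the sketch below builds the corresponding batch-script lines from Python. The variable name `CUDA_HOME` and the install path are hypothetical placeholders, not values taken from the post. `set` only affects the current cmd.exe session, while `setx` stores the value in the user environment, so it shows up only in sessions opened afterwards.

```python
def batch_variable_lines(name: str, value: str) -> dict[str, str]:
    """Return the cmd.exe lines that define `name` temporarily or persistently."""
    return {
        # `set` changes the variable only for the current cmd.exe session.
        "session": f'set "{name}={value}"',
        # `setx` saves the variable for the user; new sessions will see it.
        "persistent": f'setx {name} "{value}"',
    }


# Hypothetical variable name and install location, for illustration only.
lines = batch_variable_lines("CUDA_HOME", r"E:\CUDA\v13.2")
print(lines["session"])
print(lines["persistent"])
```

Inside a batch script the variable can then be referenced as `%CUDA_HOME%` instead of the full path.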

DRTriton: Large-Scale Synthetic Data Reinforcement Learning for Triton Kernel Generation

Developing efficient CUDA kernels is a fundamental yet challenging task in the generative AI industry. Recent researches leverage Large Language Models (LLMs) to automatically convert PyTorch refer…

#Triton #CUDA #LLM

hgpu.org?p=30706

AutoKernel: Autonomous GPU Kernel Optimization via Iterative Agent-Driven Search

Writing high-performance GPU kernels is among the most labor-intensive tasks in machine learning systems engineering. We present AutoKernel, an open-source framework that applies an autonomous agen…

#CUDA #Triton #Package

hgpu.org?p=30703

Original post on hgpu.org

AutoKernel: Autonomous GPU Kernel Optimization via Iterative Agent-Driven Search Writing high-performance GPU kernels is among the most labor-intensive tasks in machine learning systems engineering...

#Computer #science #CUDA #paper #Machine #learning #nVidia #nVidia #B200 #nVidia #H100

Origin […]


GPUmonty uses CUDA to simulate the light emitted by material spiraling into black holes, running relativistic radiative transfer 10x faster than CPU codes.

https://github.com/black-hole-group/gpumonty

#BlackHoles #CUDA #Astrophysics


ICYMI: NVIDIA driver 595.58.03 released as the big new recommended stable driver for Linux

#CUDA #GeForce #Linux #LinuxGaming #NVIDIA #OpenGL #PCGaming #RTXOn #Vulkan

www.gamingonlinux.com/2026/03/nvid...


How to verify in Windows 11 that the #cuda.tile (cuTile) library was properly installed with #CUDA 13.2 for a headless GPU #GeForce-RTX-5060?
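
One possible first check is whether Python can locate the package at all. This is a minimal sketch using only the standard library. It assumes the library is importable under the module name `cuda.tile`, as the hashtag in the post suggests. It confirms only that the package is found, not that a kernel actually runs on the GPU.

```python
import importlib.util


def module_is_installed(dotted_name: str) -> bool:
    """Return True if `dotted_name` can be located on the import path."""
    try:
        # For a dotted name, find_spec imports the parent package first.
        return importlib.util.find_spec(dotted_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package (such as `cuda`) is itself missing.
        return False


# Module name taken from the post's hashtag; treat it as an assumption.
print("cuda.tile installed:", module_is_installed("cuda.tile"))
```

A `True` result would still need to be followed by running a small kernel to confirm that the driver, toolkit, and GPU work together.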


Which #AI stack for robotics development in #windows11 with #CUDA 13.2, #python 3.14, and #PyTorch 2.10.0?

MobileKernelBench: Can LLMs Write Efficient Kernels for Mobile Devices?

Large language models (LLMs) have demonstrated remarkable capabilities in code generation, yet their potential for generating kernels specifically for mobile devices remains largely unexplored. In …

#CUDA #LLM #CodeGeneration

hgpu.org?p=30695

SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GPU Kernels Against Hardware Limits

As agentic AI systems become increasingly capable of generating and optimizing GPU kernels, progress is constrained by benchmarks that reward speedup over software baselines rather than proximity t…

#CUDA #Triton #Benchmarking #Package

hgpu.org?p=30694

LLMQ: Efficient Lower-Precision LLM Training for Consumer GPUs

We present LLMQ, an end-to-end CUDA/C++ implementation for medium-sized language-model training, e.g. 3B to 32B parameters, on affordable, commodity GPUs. These devices are characterized by low mem…

#CUDA #LLM #Package

hgpu.org?p=30692

Original post on hgpu.org

MobileKernelBench: Can LLMs Write Efficient Kernels for Mobile Devices? Large language models (LLMs) have demonstrated remarkable capabilities in code generation, yet their potential for generating...

#Computer #science #CUDA #paper #Benchmarking #Code #generation #LLM #nVidia #nVidia #A100 […]


Which command-line, non-Microsoft #C-compiler to use with #CUDA 13.2, #python 3.14, and #PyTorch 2.10.0, with a headless GPU #GeForce-RTX-5060 and processor #AMD-Ryzen-9-9900X, in #windows11 for #AI development?


📝 [CUDA] How to check a GPU's Compute Capability:...

Problem overview: you don't know the Compute Capability of your CUDA-capable GPU. CUDA (Compute …

🔗 https://aitroublesolution.com/?p=2664

#CUDA #NVIDIA #GPU
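
The linked article's own method is not visible in this preview. As one possible approach, assuming PyTorch is installed with CUDA support, the Compute Capability can be queried with `torch.cuda.get_device_capability`. The sketch returns `None` when PyTorch or a usable GPU is missing, so it can be run on any machine.

```python
def compute_capability(device_index: int = 0):
    """Return (major, minor) for a CUDA device, or None if it cannot be queried."""
    try:
        import torch  # assumed to be installed; handled below if it is not
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(device_index)


print("Compute Capability:", compute_capability())
```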


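📝 [CUDA] Keep multiple versions side by side and switch between them with update-alternatives!...

Problem overview: errors caused by CUDA version conflicts. In AI development, especially when training deep learning models or running inference, different frame…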

🔗 https://aitroublesolution.com/?p=2666

#CUDA #NVIDIA #GPU


Ξ dragonized
ξ #imported-torch
ξ #CUDA available
ξ #GPU Name: NVIDIA GeForce RTX 5060
ξ #PyTorch CUDA version: 13.0
ξ #Tensor on GPU: tensor([1.0, 2.0], device="cuda:0")
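
The post does not show the script behind this output. The sketch below is one guess at checks that would produce a similar report with PyTorch. The function name `cuda_report` and the dictionary keys are made up for illustration. It returns early if PyTorch is not installed, and skips the GPU-specific entries when CUDA is unavailable.

```python
def cuda_report() -> dict:
    """Collect the same kinds of facts the post lists, where they are available."""
    report = {"imported_torch": False}
    try:
        import torch
    except ImportError:
        return report
    report["imported_torch"] = True
    report["cuda_available"] = torch.cuda.is_available()
    # CUDA version PyTorch was built against (None for CPU-only builds).
    report["pytorch_cuda_version"] = torch.version.cuda
    if report["cuda_available"]:
        report["gpu_name"] = torch.cuda.get_device_name(0)
        report["tensor_on_gpu"] = torch.tensor([1.0, 2.0], device="cuda:0")
    return report


for key, value in cuda_report().items():
    print(f"{key}: {value}")
```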
