#MXFP4
Diagnosing FP4 inference: a layer-wise and block-wise sensitivity analysis of NVFP4 and MXFP4

Quantization addresses the high resource demand of large language models (LLMs) by alleviating memory pressure and bandwidth congestion and providing significantly scaled compute power with a tole…

#LLM #FP4 #NVFP4 #MXFP4 #Precision #AMD #NVIDIA

hgpu.org?p=30661

What's MXFP4? The 4-Bit Secret Powering OpenAI's GPT-OSS Models on Modest Hardware
A blog post by Rakshit Aralimatti on Hugging Face

With #ChatGPT5, #OpenAI shifted part of its priorities toward efficiency and cost savings.

The new auto-routing helps select the best model for the prompt (with some success).

But another big change is the support for #MXFP4, which could have a lasting impact on all future models.

#AI #ChatGPT

OpenAI gpt-oss LLMs use MXFP4: smaller, faster, cheaper

How OpenAI used a new data type to cut inference costs by 75%

The decision to use #MXFP4 makes models smaller, faster and, more importantly, cheaper for everyone involved.

MXFP4 is a micro-scaling block floating-point format defined by the Open Compute Project (OCP).

www.theregister.com/2025/08/10/o...
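For context on how such a micro-scaling format works: the OCP MX specification groups tensor values into 32-element blocks, each sharing one power-of-two (E8M0) scale, with each element stored as 4-bit FP4 (E2M1, representable magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6). A minimal Python sketch of that block quantization follows; it is illustrative only (real kernels pack bits and follow the spec's rounding rules), and the function names are my own:

```python
import math

# Positive magnitudes representable in FP4 E2M1 (2 exponent bits, 1 mantissa bit)
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_mxfp4_block(block):
    """Quantize one block (32 elements in the real format) to MXFP4:
    a shared power-of-two scale plus one FP4 value per element."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return 1.0, [0.0] * len(block)
    # Shared E8M0 scale: align the block max with the largest E2M1 exponent (2, since 6 = 1.5 * 2**2)
    shared_exp = math.floor(math.log2(amax)) - 2
    scale = 2.0 ** shared_exp
    quantized = []
    for v in block:
        sign = -1.0 if v < 0 else 1.0
        # Scale the magnitude into FP4 range, clamp, and snap to the nearest grid point
        mag = min(abs(v) / scale, 6.0)
        nearest = min(E2M1_GRID, key=lambda g: abs(g - mag))
        quantized.append(sign * nearest)
    return scale, quantized

def dequantize_mxfp4_block(scale, quantized):
    """Reconstruct approximate values from the shared scale and FP4 elements."""
    return [scale * q for q in quantized]
```

Because the scale is a pure power of two, dequantization is just an exponent shift, which is part of why MXFP4 is cheap in hardware; the per-block sharing is also why block-wise sensitivity analyses (as in the paper above) matter.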
