
Posts by Matteo Robino

I don't like the instruction following of any of the Gemini models at all; they're models for people with easy/medium tasks who just one-shot every request

4 months ago 0 0 0 0

Choudhury and Kim et al., "Accelerating Vision Transformers With Adaptive Patch Sizes"

Transformer patches don't need to be of uniform size -- choose sizes based on entropy --> faster training/inference. Are scale-spaces gonna make a comeback?
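The entropy idea can be sketched very roughly: score how "busy" each image region is and give detailed regions smaller patches. A minimal plain-Python sketch with hypothetical thresholds and patch sizes, not the paper's actual method:

```python
import math
import random
from collections import Counter

def patch_entropy(values, bins=16):
    """Shannon entropy (bits) of the intensity histogram of a flat list of pixels in [0, 1)."""
    counts = Counter(min(int(v * bins), bins - 1) for v in values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def choose_patch_size(values, small=8, large=32, threshold=2.0):
    """High-entropy (textured) regions get small patches; flat regions get large ones."""
    return small if patch_entropy(values) > threshold else large

rng = random.Random(0)
flat = [0.5] * 1024                             # uniform region: entropy 0
textured = [rng.random() for _ in range(1024)]  # noise: entropy near log2(16) = 4 bits

print(choose_patch_size(flat), choose_patch_size(textured))  # 32 8
```

A flat region falls into a single histogram bucket (entropy 0) and gets the coarse 32-pixel patch, while the textured region spreads over all buckets and gets the fine 8-pixel patch.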

6 months ago 16 4 3 1
A scatter plot titled “AIME25 — Total Memory vs. Accuracy (Qwen3)” compares model accuracy (%) against total memory usage (weights + KV cache, in GB) for various Qwen3 model sizes and quantization levels.

Axes:
	•	X-axis: Total Memory (Weight + KV Cache) [GB] (log scale, ranging roughly from 1 to 100)
	•	Y-axis: Accuracy (%), ranging from 0 to 75

Legend:
	•	Colors: model sizes —
	•	0.6B (yellow)
	•	1.7B (orange)
	•	4B (salmon)
	•	8B (pink)
	•	14B (purple)
	•	32B (blue)
	•	Shapes: precision levels —
	•	Circle: 16-bit
	•	Triangle: 8-bit
	•	Square: 4-bit
	•	Marker size: context length —
	•	Small: 2k tokens
	•	Large: 30k tokens

Main trend:
Larger models (rightward and darker colors) achieve higher accuracy but require significantly more memory. Smaller models (left, yellow/orange) stay below 30% accuracy. Compression (8-bit or 4-bit) lowers memory usage but can reduce accuracy slightly.

Inset zoom (upper center):
A close-up box highlights the 8B (8-bit) and 14B (4-bit) models showing their proximity in accuracy despite differing memory footprints.

Overall, the chart demonstrates scaling behavior for Qwen3 models—accuracy grows with total memory and model size, with diminishing returns beyond the 14B range.


Is 32B-4bit equal to 16B-8bit? It depends on the task:

* math: precision matters
* knowledge: effective parameter count matters more
* 4B-8bit is the threshold: above it, prefer quantization; below it, prefer more parameters
* parallel TTC (test-time compute) only works above 4B-8bit
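As a back-of-the-envelope check on the chart's x-axis, total memory is roughly quantized weights plus the KV cache, which scales with context length. A minimal sketch; the layer count, KV-head count, and head dimension below are illustrative Qwen3-8B-like numbers, not confirmed specs:

```python
def total_memory_gb(n_params_b, weight_bits, n_layers, n_kv_heads, head_dim,
                    context_len, kv_bits=16):
    """Rough total footprint: quantized weights plus an FP16 KV cache."""
    weight_bytes = n_params_b * 1e9 * weight_bits / 8
    # Factor of 2 covers both the K and the V tensors per layer.
    kv_bytes = 2 * n_layers * n_kv_heads * head_dim * context_len * kv_bits / 8
    return (weight_bytes + kv_bytes) / 1e9

# Hypothetical 8B model at 8-bit weights (36 layers, 8 KV heads, head_dim 128), 30k context:
print(round(total_memory_gb(8, 8, 36, 8, 128, 30_000), 1))  # 12.4
```

This makes the chart's marker-size encoding concrete: at 30k tokens the KV cache adds several GB on top of the weights, which is why the large markers sit noticeably to the right of the small ones.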

arxiv.org/abs/2510.10964

6 months ago 31 8 3 0
Luc Julia at the Senate: autopsy of a huge pile of NONSENSE (YouTube video by Monsieur Phi)

NEW VIDEO! I dissect the case of Luc Julia, the renowned co-creator of Siri and global AI expert, celebrated in the media and recently heard before the Senate. The result is harsh, but it was necessary.

youtu.be/e5kDHL-nnh4

8 months ago 396 163 45 54
Eleven-minute race for food: how aid points in Gaza became ‘death traps’ – a visual story

Hundreds of people have died while seeking food since delivery was taken over by the Gaza Humanitarian Foundation in May. But Palestinians facing extreme hunger have no choice but to take the risk.

I implore you to read this.

8 months ago 88 57 8 5

Why would you ride in a car driven by a human? Do you have some sort of death wish?

9 months ago 44 5 6 1

Last month I did a little experiment.

I wanted to see how the exact same post would perform on both X (Twitter) and Bluesky.

The results were...interesting...

[Thread]

1 year ago 2489 1407 125 347

New LinkedIn wall background, thanks

1 year ago 0 0 0 0

Want strong SSL, but not the complexity of DINOv2?

CAPI: Cluster and Predict Latents Patches for Improved Masked Image Modeling.

1 year ago 49 10 1 1
DINOv2: Learning Robust Visual Features without Supervision The recent breakthroughs in natural language processing for model pretraining on large quantities of data have opened the way for similar foundation models in computer vision. These models could...

Outstanding Finalist 2: “DINOv2: Learning Robust Visual Features without Supervision," by Maxime Oquab, Timothée Darcet, Théo Moutakanni et al. 5/n openreview.net/forum?id=a68...

1 year ago 8 3 2 0

What for?

1 year ago 0 0 1 0

I have the impression the graph is telling us the opposite, though, no?

1 year ago 0 0 0 0
GitHub - verlab/accelerated_features: Implementation of XFeat (CVPR 2024). Do you need robust and fast local feature extraction? You are in the right place!

XFeat: Accelerated Features for Lightweight Image Matching

code: github.com/verlab/accel...
paper: arxiv.org/abs/2404.19174
project: www.verlab.dcc.ufmg.br/descriptors/...

1 year ago 3 1 0 0
Originally the default wallpaper of Microsoft's Windows XP, this photo shows green rolling hills with a vibrant blue sky and white clouds in the background. Charles O'Rear took the photo in California, USA.

We've always been a fan of blueskies.

51 years ago 11815 2095 673 651

Free speech on twitter:

1 year ago 109 5 5 0

My deep learning course at the University of Geneva is available online. 1000+ slides, ~20h of screencasts. Full of examples in PyTorch.

fleuret.org/dlc/

And my "Little Book of Deep Learning" is available as a phone-formatted pdf (nearing 700k downloads!)

fleuret.org/lbdl/

1 year ago 1250 247 46 17
GitHub - davidgasquez/docs-to-llmstxt: 🤖 Compile docs into text files for LLMs 🤖

Inspired by @simonwillison.net's llm-docs repo, I made a similar one that compiles project docs into single TXT files that can be fed to LLMs.

Right now it only has the atproto docs, but it has already been useful to me for answering random questions about the project.

github.com/davidgasquez...

1 year ago 64 6 4 0