Many people want to try visual models for study, hobby, or personal projects.
We've added a subscription tier to match!
We just launched the 'Plus Tier' in MediaSage with huge discounts.
We're also ready to grant Plus tier access to active contributors in our forum. 😊
media.nurie.ai
#MediaSage #VLMs #AiImage #AiVideo
It's not easy to describe what we see in words.
So you can also try Picture-to-Prompt.
My colleague drew this on the wall for Christmas; I wonder if I can reuse it somehow.
Check out my result.
You can also try it here - media.nurie.ai
#Image2Prompt #Picture2Prompt #VLMs #MediaSage #NURIEAI #NURIE
🎄 Merry Christmas from @vlmrun.bsky.social!
Grateful to our customers and partners for trusting us with the most demanding visual workloads: documents, images, and video at scale.
Here’s to a bigger year turning pixels into production systems.
#genai #multimodal #vlms #infrastructure
#NVIDIA Robotics Research and Development Digest (#R²D²) explores novel approaches to improving #robot @manipulation skills via research efforts that use reasoning #LLMs, sim-and-real co-training & #VLMs for designing tools.
developer.nvidia.com/blog/r2d2-im...
🚗 Call for Papers — #COMMTR Special Issue
"Foundation Models for Intelligent Control in Autonomous Driving Traffic Systems"
COMMTR welcomes submissions exploring how #LLMs, #VLMs, and #multimodalfoundationmodels are advancing autonomous driving and intelligent traffic systems. 🤖
* @imperialcollegeldn.bsky.social @ic-cep.bsky.social @imperialsci.bsky.social @ucl.ac.uk
#IDSSD #AI #NLP #LLMs #VLMs #AIforSustainability #AIforNature #AIforClimate #SDG13 #SDGs #GlobalGoals #2030Agenda #AIforGood #AI4SDGs #Science4Policy #ResponsibleAI #SustainableAI #DigitalPublicInfrastructure
🐇Into the Rabbit Hull — Part 1: A Deep Dive into DINOv2🧠
Our latest Deeper Learning blog post is an #interpretability deep dive into one of today’s leading vision foundation models: DINOv2.
📖Read now: bit.ly/4nNfq8D
Stay tuned — Part 2 coming soon.
#AI #VLMs #DINOv2
SURE-VQA: Systematic Understanding of Robustness Evaluation in Medical VQA Tasks
Kim-Celine Kahl, Selen Erkan, Jeremias Traub et al.
Action editor: Weijian Deng
https://openreview.net/forum?id=qjNdGpgpV8
#vlm #vlms #visual
New #OpenAccess research in #RESSystematicEnt
Descriptron: Artificial intelligence for automating taxonomic species descriptions with a user-friendly software package
doi.org/10.1111/syen.70005
#AnalyticalAI #AI #VLMs #ViTs #LLMs #Taxonomy
@gkergoat.bsky.social @wiley.com
I2C-UHU-PEGASUS at FungiCLEF 2025: Multimodal Pipeline for Rare Fungal Species Classification Using Fine-Tuned VLMs and Ecological Context
#FungiCLEF
#FungalSpecies
#VLMs
#FewShotLearning
#FungalClassification
#RareSpecies
#MultimodalAI
#DeepLearning
#TransferLearning
ceur-ws.org/Vol-4038/pap...
🚀 Big news from #CLiCit2025!
Our PhD student Davide Testa presented MAIA 🎞️🧠🇮🇹 — the first Italian benchmark to test Vision-Language Models on multimodal reasoning & robustness.
📄 You can read the paper in the CLiC-it pre-proceedings: clic2025.unica.it/wp-content/u...
#AI #NLProc #VLMs
Introducing FineVision, the largest dataset for training VLMs. An incredible resource with 24M samples that democratizes access to high-quality data!
youtu.be/VxXd48jOdJs
#IA #LLMs #FineVision #HuggingFace #MachineLearning #VLMs
"We conduct a head-to-head comparison of 30 cutting-edge general-purpose and medical-specialized VLMs. The results show that the current state-of-the-art #VLMs perform poorly on PET report generation task, falling considerably short of fulfilling practical needs."
arxiv.org/abs/2508.040...
We're thrilled to have Ahmet Iscen of Google DeepMind with us tomorrow to talk about his work on #VLMs. Join us online! 'VLM-driven data & context curation for visual understanding'
🗓️Tuesday July 8th: 11am CEST
Registration & more info: tinyurl.com/yz3rvz3z
🙌 Huge thanks to the team:
Muhammad Sohail Danish, Muhammad Akhtar Munir, Syed Roshaan Ali Shah, Kartik Kuckreja, Fahad Khan, Paolo Fraccaro, Alexandre Lacoste, Salman Khan
Follow for updates!
#ICCV2025 #VLMs #AI4EO #RemoteSensing #GeospatialAI #MachineLearning #Benchmarking
It was an honor to give a keynote lecture about #VLMs for Bio-Image #DataScience at the @helmholtzimaging.bsky.social #HIconference2025.
You can find my slides #openaccess 🔬🖥️🚀
doi.org/10.5281/zeno...
The conversation linked Text-to-LoRA to similar adaptation efforts in other domains, like Vision Language Models (VLMs). This highlights a broader trend in making large models more flexible. #VLMs 5/6
🤔 "Overall, while modern #VLMs demonstrate promise in basic and recognition-heavy tasks, their applicability to real-world diagnostics is currently limited by weak visual signal, unreliable numeracy, and shallow reasoning chains."
www.arxiv.org/abs/2505.18915
New in the Deeper Learning blog: Kempner researchers show how VLMs speak the same semantic language across images and text.
bit.ly/KempnerVLM
by @isabelpapad.bsky.social ,Chloe Huangyuan Su, @thomasfel.bsky.social, Stephanie Gil, and @shamkakade.bsky.social
#AI #ML #VLMs #SAEs
🧵 7/7
📢 Shoutout to my amazing co-authors and to ServiceNow Research and Mila for making this happen! 🚀
📄 Read the full paper: arxiv.org/abs/2502.15210
#PairBench #LLMs #VLMs #GenAI #AutoEval
So happy to share this: secret instructions hidden in images can drastically alter the output of VLMs and lead to misdiagnosis. Models differ in susceptibility, though, with Claude3.5 apparently being much better aligned to ethical outputs than GPT4o.
#aisafety
#LLMs
#VLMs
Dr. Stewart Worrall (The University of Sydney)
Dr. Ignacio Alvarez (Intel Labs)
Maria Lyssenko (BOSCH)
Andra Petrovai (TU Cluj-Napoca)
#IV2025 #IEEE #ITS #FoundationModels #AutonomousDriving
#Perception #IntelligentVehicles
#LLMs #VLMs
#IntelligentTransportationSystems #3DVision
Llama 3.2 Vision — A Deep Dive Vision-Language Models (VLMs) allow LLMs to “see”, but how d...
graphcore-research.github.io/posts/llama-vision/
#posts #transformers #LLMs #VLMs
Is anyone aware of the latest 🍿 on patch/pixel-level image-text alignment? Ideally something that does not use/need segmentation masks. #VLMs #multimodal
SDXL was much more open to non-standard inpainting edits than Flux.
I need a more scientific approach to this one to be sure.
#buildinpublic #vlms #stablediffusion #flux #sdxl
🎮 BALROG: Benchmarking AI with Games
📊 BALROG tests #LLMs & #VLMs on 6 games:
🧭 BabyAI (navigation)
🛠️ Crafter (crafting/survival)
📜 TextWorld (puzzles)
🎲 Baba Is #AI (rule manipulation)
🔍 #Llama3 70B outperforms #GPT4 in tasks like Baba Is AI. Open models excel in text over visuals.
#Gaming
Discovered GeoDE at today's Sundai Club—a Princeton dataset showing how 'stove' or 'house' are perceived differently worldwide. Exciting potential for culturally-aware ML projects with foundation models ( #CLIP, #VLMs). Check it out: geodiverse-data-collection.cs.princeton.edu #MachineLearning #AI