Benchmark Tests MLLM Web Understanding: Reasoning, Robustness, Safety
WebRSSBench, a benchmark for multimodal LLMs, defines eight web tasks and evaluates twelve models, exposing gaps in compositional reasoning and reduced robustness to layout changes. getnews.me/benchmark-tests-mllm-web... #webrssbench #multimodalllm