LongProc Benchmark Shows Gaps in Long-Context Model Performance
The new LongProc benchmark tests procedural long-context tasks with outputs of up to 8,000 tokens. It evaluates 23 models, including GPT-4o, whose performance dropped on the toughest tier. Read more: getnews.me/longproc-benchmark-shows... #longproc #lclm