TF-Bench Introduces Type Inference Benchmark for LLM Program Reasoning
TF-Bench, a new benchmark for type inference in System F, shows Claude-3.7-sonnet reaching 55.85% accuracy on the pure variant, revealing gaps in LLM reasoning. getnews.me/tf-bench-introduces-type... #tfsbench #typeinference #llm