Chris Pal (@chrisjpal) Bsky

2025 BERT is NeoBERT! We have fully pre-trained a next-generation encoder for 2.1T tokens with the latest advances in data, training, and architecture. This is a heroic effort from my PhD student, Lola Le Breton, in collaboration with Quentin Fournier and Mariam El Mezouar (1/n)

1 year ago

🎉 Excited to introduce BigDocs!
An open, transparent multimodal dataset designed for:
📄 Documents
🌐 Web content
🖥️ GUI understanding
👨‍💻 Code generation from images
We’re also launching BigDocs-Bench:
➡️ Document, Web, GUI Visual reasoning
➡️ Converting images into JSON, Markdown, LaTeX, SVG, and more!

1 year ago

I think it is the AC's job to detect when a reviewer's criticisms have been addressed but the reviewer has checked out, and then to throw out that reviewer's score. I think this should be made an official policy of ICLR. Reviewers who do this are being far worse than just rude.

1 year ago

This could easily be a photo of exactly what you describe.

1 year ago

LLMs have a lot of potential for science, but scientists can be particularly sensitive to factuality, nuances, and hallucinations. The new ScholarQABench benchmark in this paper looks pretty useful for the community to monitor progress on LLMs for science. arxiv.org/html/2411.14199

1 year ago
