LLMs can repair code, but often miss the broader context developers use every day.
We propose a 3-layer knowledge injection framework that incrementally feeds LLMs with bug, repository, and project knowledge.
Preprint of our ASE '25 paper: arxiv.org/pdf/2506.24015
Posts by Preetha Chatterjee
Error analysis reveals that unresolved bugs are not randomly distributed; they cluster around specific bug types and higher complexity profiles. In particular, Program Anomaly, Network, and GUI bugs remain the most challenging for both models.
Evaluated on 314 real-world Python bugs, our framework shows consistent gains in both #fixed and Pass@k for Llama 3.3 and GPT-4o-mini, demonstrating a 23% improvement over prior work.
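Pass@k here presumably refers to the standard unbiased estimator from HumanEval-style evaluation (Chen et al., 2021): the probability that at least one of k samples, drawn from n generated patches of which c pass the tests, is correct. A minimal sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k: 1 - C(n-c, k) / C(n, k).
    n = patches generated, c = patches that pass the tests,
    k = samples drawn per problem."""
    if n - c < k:
        # Fewer than k incorrect patches exist, so any k-sample
        # draw must contain at least one correct patch.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# 10 patches generated, 3 correct, sampling 1:
print(round(pass_at_k(10, 3, 1), 4))  # → 0.3
```

Per-problem scores are then averaged across the benchmark to get the reported Pass@k.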
This layered approach offers several advantages:
✅ Fixes simpler bugs with minimal input, conserving tokens and computation.
✅ Scales context progressively, injecting more information only when necessary.
✅ Enables analysis of bug types & complexity.
1️⃣ Bug Knowledge (e.g., immediate code and test context)
2️⃣ Repository Knowledge (e.g., related files, dependencies, commit history)
3️⃣ Project Knowledge (e.g., documentation, past bug fixes)
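The incremental injection loop above can be sketched as follows. This is illustrative only, not the paper's implementation: the `generate` and `passes_tests` callables, the `bug` dict fields, and the prompt wording are all hypothetical stand-ins.

```python
def build_prompt(bug: dict, layers: list[str]) -> str:
    """Assemble a repair prompt from the requested knowledge layers."""
    parts = [f"Fix the following bug:\n{bug['code']}"]
    if "bug" in layers:       # Layer 1: immediate code and test context
        parts.append(f"Failing test:\n{bug['test']}")
    if "repo" in layers:      # Layer 2: related files, deps, commit history
        parts.append(f"Related files:\n{bug['related_files']}")
    if "project" in layers:   # Layer 3: documentation, past bug fixes
        parts.append(f"Docs & past fixes:\n{bug['docs']}")
    return "\n\n".join(parts)

def repair(bug, generate, passes_tests):
    """Try the cheapest context first; escalate only if the fix fails."""
    for layers in (["bug"], ["bug", "repo"], ["bug", "repo", "project"]):
        patch = generate(build_prompt(bug, layers))
        if passes_tests(patch):
            return patch, layers  # record which layer sufficed
    return None, None
```

Recording which layer sufficed is what enables the per-bug-type and complexity analysis mentioned above.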
🌍 The future of #icse is global!
🇧🇷 ICSE 2026 – Brazil #icse2026
🇮🇪 ICSE 2027 – Ireland #icse2027
🌺 ICSE 2028 – Hawaii #icse2028
We can't wait to see you there! Pack your ideas and your passport. 🧳✈️
💡 If you are building, evaluating, or relying on LLMs for software development, please ask yourself: Did it warn you about the hidden security risk?
As a preliminary solution to this problem, we built a CLI tool prototype that integrates static analysis with LLM prompting, aiming to make AI code suggestions more secure by design.
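The pipeline idea can be sketched as below: run a lightweight static check first, then prepend its findings to the LLM prompt so the model must address them. The patterns, messages, and prompt wording are invented for illustration; the actual prototype's analyzer and prompts are described in the paper.

```python
import re

# Toy risk patterns standing in for a real static analyzer.
RISK_PATTERNS = {
    r"\beval\(": "use of eval() on untrusted input",
    r"subprocess\.\w+\(.*shell=True": "shell injection via shell=True",
    r"pickle\.loads?\(": "unsafe deserialization with pickle",
}

def static_findings(code: str) -> list[str]:
    """Return human-readable findings for each matched risk pattern."""
    return [msg for pat, msg in RISK_PATTERNS.items() if re.search(pat, code)]

def build_secure_prompt(code: str) -> str:
    """Embed static-analysis findings in the prompt so the LLM
    explains each risk instead of silently suggesting code."""
    findings = static_findings(code)
    warning = "\n".join(f"- {f}" for f in findings) or "- none detected"
    return (
        "Review this code. Known static-analysis findings:\n"
        f"{warning}\n\nCode:\n{code}\n"
        "Explain each risk and propose a safe fix."
    )
```

Forcing the findings into the prompt sidesteps the <40% detection rate above: the model no longer has to notice the vulnerability on its own, only to explain and fix it.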
However, when LLMs do warn you, they tend to offer more complete explanations, including potential causes of the vulnerability, exploits, and even fixes.
We evaluated GPT-4, Claude 3, and Llama 3 across 300 real-world Stack Overflow posts containing vulnerable code.
The results?
⚠️ <40% of vulns flagged
⚠️ As low as 12.6% when code was obfuscated
⚠️ Common issues (e.g., unsanitized input) often missed unless explicitly prompted
LLMs are great at generating code, but are they silently spreading vulnerabilities? TLDR: Yes.
In our latest EMSE paper, we look into: when developers unknowingly share vulnerable code with LLMs, do these models proactively raise security red flags? 🧵
👉 Read the paper: arxiv.org/abs/2502.14202
Delighted to share that our paper, led by my PhD advisee Ramtin Ehsani, “Towards Detecting Prompt Knowledge Gaps for Improved LLM-guided Issue Resolution,” has been accepted to the Research Track of MSR 2025.
Preprint: soar-lab.github.io//papers/MSR2...
I can now run a GPT-4 class model on my laptop
(The exact same laptop that could just about run a GPT-3 class model 20 months ago)
The new Llama 3.3 70B is a striking example of the huge efficiency gains we've seen in the last two years
simonwillison.net/2024/Dec/9/l...
Congrats!!
#NeurIPS2024 paper 3, Assemblage - the dataset of source-to-binary projects compiled from GitHub that you've dreamed of but never had before! Collab with @krismicinski.bsky.social and a multi-year effort to get to @NeurIPSConf @BoozAllen arxiv.org/abs/2405.03991
🎉 Thrilled to share that our paper (with Ramtin Ehsani and @rezapour.bsky.social) has been accepted at NLBSE'25, co-located with @icseconf.bsky.social! 🎉
Our work shows promise in improving toxicity detection in OSS using moral values & psycholinguistic cues. Preprint coming soon.
Can you please add me here