This is the output I want. What input do I need? Today at ICSE - International Conference on Software Engineering, Tural Mammadov presented his work on Modelizer - the framework that learns from synthesized program executions to predict inputs from outputs and vice versa: dl.acm.org/doi/10.1145/...
Posts by Andreas Zeller
I'm gonna need a bigger suitcase #ICSE2026
In the #ICSE2026 Wednesday 14:00 session, I will be giving my Harlan D. Mills Award talk (likely at 14:10 already). Enjoy! conf.researchr.org/details/icse...
On my way to Rio de Janeiro, visiting #ICSE2026 - here with Tural Mammadov. See you soon!
"If you don't test your software, someone else will." Others may discover bugs and vulnerabilities. Here’s where Fandango, CISPA’s new tool for automated software testing, comes in. More from CISPA-Faculty @andreaszeller.bsky.social youtube.com/shorts/LuPgQ...
Visiting #FSE2026 in Montreal? Do not miss our #Fandango tutorial on Sunday, July 5, where we show how to systematically generate inputs and interactions for comprehensive software testing (with Pepe Zamudio, Marius Smytzek, and Alexander Liggesmeyer). Find Fandango at fandango-fuzzer.github.io
The IEEE Computer Society interviewed me on my past and the Future of Automated Debugging and Software Testing. Enjoy! www.computer.org/publications...
Homepage of Andreas Zeller, now with the text "help build better software" rather than "help _developers_ build better software"
For decades, my mission was to help developers build better software. Now I help anyone, including AI: andreas-zeller.info
FLAT: Formal Languages as Types And Their Applications in Testing FENGMIN ZHU, CISPA Helmholtz Center for Information Security, Germany ANDREAS ZELLER, CISPA Helmholtz Center for Information Security, Germany Programmers regularly use strings to encode many types of data, such as Unix file paths, URLs, and email addresses. They are conceptually different, but existing mainstream programming languages treat them as the same string type. This is problematic: the type system allows, for instance, malicious HTML text to be passed to a function expecting an email address. To distinguish conceptually different string types and to avoid potential vulnerabilities, we regard formal languages as types (FLAT), thereby restricting the set of valid strings using context-free grammars and, if needed, semantic constraints. Applying this type-based approach, we offer a unified solution for string API documentation, input validation, malicious input detection, language-based fuzzing, and test oracles, all at once, based on user-annotated formal language types and, if necessary, pre- and post-conditions. We implement this idea and present FLAT-PY, a testing framework for Python. By attaching annotations directly to Python code, FLAT-PY automatically performs runtime type checking via code instrumentation and reports any detected type errors as soon as possible. We conducted case studies on real Python code fragments: FLAT-PY can detect logical bugs from random inputs generated by a language-based fuzzer, relying on a reasonable number of user annotations.
In a call "retrieve(account: string)", nobody checks the contents of "account". What if we could specify its type not just as a string, but as a formal language - say, a regex "[0-9]+"? In our new paper, we do exactly this - for better type checking and even test generation: doi.acm.org?doi=3799978
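A rough illustration of the idea in this post, as a minimal sketch only: it is not FLAT-PY's actual annotation syntax or API. The helper `lang`, the checker `account_number`, and the body of `retrieve` are hypothetical names invented for this example; the only thing taken from the post is the regex "[0-9]+" and the call shape "retrieve(account: string)".

```python
import re
from typing import Callable


def lang(pattern: str) -> Callable[[str], str]:
    """Build a runtime checker for a 'language type' (hypothetical helper).

    The returned function accepts a string only if the WHOLE string
    matches `pattern`, and raises TypeError otherwise.
    """
    compiled = re.compile(pattern)

    def check(value: str) -> str:
        if compiled.fullmatch(value) is None:
            raise TypeError(f"{value!r} is not in the language {pattern!r}")
        return value

    return check


# Hypothetical language type: account numbers are non-empty digit strings.
account_number = lang(r"[0-9]+")


def retrieve(account: str) -> str:
    # Check the *contents* of the string, not just that it is a string.
    account = account_number(account)
    return f"record for account {account}"


if __name__ == "__main__":
    print(retrieve("12345"))
    try:
        retrieve("<script>alert(1)</script>")
    except TypeError as err:
        print("rejected:", err)
```

A regex only covers regular languages; the paper's abstract speaks of context-free grammars plus semantic constraints, which this sketch does not attempt.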
Brad Pitt in front of a classroom (AI-generated)
My successor as a professor will be some AI video tutor with the appearance of Brad Pitt, available 24/7, unlimited patience, personalized towards each student, the ability to teach any subject ever discussed in a textbook, and a cost of < $1/hour. Good thing I can still do research! (Now wait...)
IEEE Computer Society Harlan D. Mills Award and Talk by Andreas Zeller Should Computer Scientists Experiment Less? On the past, present, and future of software engineering research
More information at conf.researchr.org/details/icse...
"Should Computer Scientists Experiment Less?" This is the title of my upcoming Harlan D. Mills Award Talk at ICSE 2026 on the past, present, and future of Software Engineering research. Looking forward to lots of productive discussions!
conf.researchr.org/details/icse...
Impact award! I am happy to report that my ICSE 2006 paper "Mining metrics to predict component failures," with Nachi Nagappan and Thomas Ball, has been selected to receive a retrospective ICSE SEIP Most Influential Paper Award. Read it here: dl.acm.org/doi/10.1145/...
Over the past decade, the automated generation of test inputs has made significant advances. Modern fuzzers and test generators easily produce complex input formats that do systematically cover the input and execution space. Testing protocols, though, has remained a frontier for automated testing, as a test generator has to interact with the program under test, producing messages that conform to the current state of the system. In this paper, we introduce language-based protocol testing, the first approach to specify, automatically test, and systematically cover the full state and input space of protocol implementations. We specify protocols as interaction grammars—an extension of context-free grammars that tag each message element with the communication party that is in charge of producing it. Interaction grammars embed classical state models by unifying states, messages, and transitions all into nonterminals, and can be used for producing interactions as well as parsing them, making them ideally suited for testing protocols. Additional constraints over grammar elements allow us to specify and test semantic features such as binary message formats, checksums, encodings, and the many ways that message features induce states and vice versa. To evaluate the effectiveness of language-based protocol testing, we have implemented it as part of the FANDANGO test generator. We specify several protocols as interaction grammars, including features such as human-readable interactions (SMTP), bit-level encodings (DNS), and dynamic port assignments (FTP), and use them to test the corresponding protocol implementations. By systematically covering the interaction grammar and solving the associated constraints, FANDANGO achieves comprehensive coverage of the protocol interactions, resulting in high code coverage and a thorough assessment of the program under test.
With more and more AI-generated code, comprehensive system testing becomes more important than ever. Our new paper "Language-Based Protocol Testing" (with Alexander Liggesmeyer and Pepe Zamudio), shows how to specify and test all details of how programs interact: arxiv.org/abs/2509.20308
On my way to Savannah, Georgia to an IFIP WG 4.3 meeting, where I’ll present our work on Parameterized Compiler Testing (a joint work with my fantastic co-workers Addison Crump and Alexi Turcotte)
#Fandango 1.1 is now available! With this release, #Fandango becomes a full-fledged _protocol fuzzer_, happily exploring states and messages of protocols such as FTP or DNS. Thanks to José, Valentin, Alexander, and Marius for their hard work!
Find Fandango at fandango-fuzzer.github.io
Andreas Zeller and PhD students
About time: A multi-celebration for becoming a member of Academia Europaea, my SIGSOFT Influential Educator Award, my 60th birthday, becoming an IEEE Fellow, _and_ getting the 2026 IEEE Harlan D. Mills Award. With cake and fizzy drinks!
Starting this year, I will only review for conferences that get rid of a "bidding" phase, as allowing reviewers to bid on papers they want to review opens too many opportunities for manipulation and collusion. For details, see andreas-zeller.info/2025/12/07/R... #nobidding
I am happy to report that I have been named the recipient of the 2026 Harlan D. Mills Award "For sustained contributions to software debugging, program analysis, mining software repositories, and automated test generation." This is a big award – thanks to all!
www.computer.org/volunteering...
Fault localization aims to identify code regions responsible for failures. Traditional techniques primarily correlate statement execution with failures; however, program behavior involves diverse execution features, including variable values, branch conditions, and definition-use pairs, which can provide richer diagnostic insights. This paper comprehensively investigates execution features for fault understanding, addressing two complementary goals. First, we conduct an empirical study of 310 bugs across 20 projects, analyzing 17 execution features and assessing their correlation with failure outcomes. Our findings suggest that fault localization benefits from a broader range of execution features: (1) Scalar pairs exhibit the strongest correlation with failures; (2) Beyond line executions, def-use pairs and functions executed are key indicators for fault localization; and (3) Combining multiple features enhances effectiveness compared to relying on individual features. Second, building on these insights, we introduce a debugging approach that learns relevant features from labeled test outcomes. The approach extracts fine-grained execution features and trains a decision tree to differentiate passing and failing runs. The trained model generates fault diagnoses that explain the underlying causes of failures. Our evaluation demonstrates that the generated diagnoses achieve high predictive accuracy. These interpretable diagnoses empower developers to debug software efficiently by providing deeper insights into failures.
How do execution features relate to failures? In this new ACM TOSEM paper, Marius Smytzek, Martin Eberlein, Lars Grunske, and I analyze which execution features beyond code coverage correlate best with failures and lead to accurate explanations of failure causes: dl.acm.org/doi/10.1145/...
Four hours later, I _think_ I have fixed things again - reinstalled Python and all its packages, rebuilt Spotlight and Mail indexes, cleared macOS caches, subscribed to Creator Studio, and now back to these lost mails… Today I hate you, Apple.
* Mail has lost all my emails sent since Monday
* Mail search is broken too
* Search in reminders cannot find anything
* New Keynote is full of ads!?
* Invoke Python-3.13, get 3.14 instead - venvs are messed up
* LaTeX "minted" crashes (likely b/c Python)
So glad I'm an expert in debugging /sarcasm
Fuzzing software becomes much more effective if you can generate _valid_ inputs. We have now built the first approach to _statically_ extract complete and precise input grammars from parser code, producing syntactically valid and diverse inputs by construction. Enjoy! dl.acm.org/doi/10.1145/...
After a visit to Max Planck Institute for Security and Privacy (MPI-SP) in Bochum, seeing my awesome colleagues @thorstenholz.bsky.social, @mboehme.bsky.social, Mathias Payer, and many more, now on my way to Paris to celebrate ten years of @softwareheritage.org with the great Roberto Di Cosmo
Correction: It's 2,000+ *en*-dashes ("--"), but actually 5,800 *em*-dashes ("---")
$ cd ~/Papers/
$ grep -e '[ ~]-- ' */*.tex | wc -l
2258
$
A researcher used more than 2,000 em-dashes in his papers, revealing AI-based manipulation in 400+ papers since 1985. Professor Zeller claims he "typed" these dashes into the paper by using "two hyphens" and a "typesetting" system.
Fun fact: This is my tenth test of time award :-) We will give a keynote at the FSE 2026 conference. @acm.org @sigsoft.bsky.social
Happy New Year! I am thrilled to report that Jacek Śliwerski, Tom Zimmermann, and I won the ACM SIGSOFT 2026 Impact Award 🏆 for "When do changes induce fixes?" (MSR 2005). The paper introduced the popular SZZ algorithm for linking change histories and bug databases: dl.acm.org/doi/10.1145/...
Problem: Reviewers did not read the paper.
Solution: Write a detailed rebuttal and point to all the places in the paper that answer their questions.
New problem: Reviewers did not read the rebuttal.
The talk is now online:
* Video: www.youtube.com/watch?v=tBO_...
* Slides: andreas-zeller.info/assets/Shoul...
Enjoy! -- Andreas