
Posts by Andreas Zeller

Post image

This is the output I want. What input do I need? Today at ICSE - International Conference on Software Engineering, Tural Mammadov presented his work on Modelizer - the framework that learns from synthesized program executions to predict inputs from outputs and vice versa: dl.acm.org/doi/10.1145/...

4 days ago
Post image

I’m gonna need a bigger suitcase #ICSE2026

6 days ago
Video

In the #ICSE2026 Wednesday 14:00 session, I will be giving my Harlan D. Mills Award talk (likely at 14:10 already). Enjoy! conf.researchr.org/details/icse...

1 week ago
Post image

On my way to Rio de Janeiro, visiting #ICSE2026 - here with Tural Mammadov. See you soon!

1 week ago
Andreas Zeller on software testing and its optimization (YouTube video by CISPA)

"If you don't test your software, someone else will." Others may discover bugs and vulnerabilities. Here’s where Fandango, CISPA’s new tool for automated software testing, comes in. More from CISPA faculty @andreaszeller.bsky.social youtube.com/shorts/LuPgQ...

1 week ago
Fuzzing with Fandango

Visiting #FSE2026 in Montreal? Do not miss our #Fandango tutorial on Sunday, July 5, where we show how to systematically generate inputs and interactions for comprehensive software testing (with Pepe Zamudio, Marius Smytzek, and Alexander Liggesmeyer). Find Fandango at fandango-fuzzer.github.io

2 weeks ago
Preview
The Future of Automated Debugging and Software Testing with Harlan D. Mills Award Winner Andreas Zeller

The IEEE Computer Society interviewed me on my past and the Future of Automated Debugging and Software Testing. Enjoy! www.computer.org/publications...

1 month ago
Homepage of Andreas Zeller, now with the text "help build better software" rather than "help _developers_ build better software"


For decades, my mission was to help developers build better software. Now I help anyone, including AI: andreas-zeller.info

1 month ago
FLAT: Formal Languages as Types and Their Applications in Testing
Fengmin Zhu and Andreas Zeller, CISPA Helmholtz Center for Information Security, Germany

Programmers regularly use strings to encode many types of data, such as Unix file paths, URLs, and email addresses. They are conceptually different, but existing mainstream programming languages treat them as the same string type. This is problematic: the type system allows, for instance, malicious HTML text to be passed to a function expecting an email address. To distinguish conceptually different string types and to avoid potential vulnerabilities, we regard formal languages as types (FLAT), thereby restricting the set of valid strings using context-free grammars and, if needed, semantic constraints. Applying this type-based approach, we offer a unified solution for string API documentation, input validation, malicious input detection, language-based fuzzing, and test oracles, all at once, based on user-annotated formal language types and, if necessary, pre- and post-conditions. We implement this idea and present FLAT-PY, a testing framework for Python. By attaching annotations directly to Python code, FLAT-PY automatically performs runtime type checking via code instrumentation and reports any detected type errors as soon as possible. We conducted case studies on real Python code fragments: FLAT-PY can detect logical bugs from random inputs generated by a language-based fuzzer, relying on a reasonable number of user annotations.


In a call "retrieve(account: string)", nobody checks the contents of "account". What if we could specify its type not just as a string, but as a formal language - say, a regex "[0-9]+"? In our new paper, we do exactly this - for better type checking and even test generation: doi.acm.org?doi=3799978
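The post's idea (typing a string parameter with a formal language such as a regex) can be sketched in a few lines of plain Python. This is a hypothetical illustration of the concept only, not the FLAT-PY API; the names LanguageType, Account, and retrieve are made up:

```python
import re

class LanguageType:
    """A string 'type' restricted to a formal language (here: a regex)."""
    def __init__(self, pattern: str):
        self.regex = re.compile(pattern)

    def check(self, value: str) -> str:
        # Reject any string that lies outside the language.
        if not self.regex.fullmatch(value):
            raise TypeError(f"{value!r} does not match {self.regex.pattern!r}")
        return value

# An "account" is not just any string, but a string in the language [0-9]+.
Account = LanguageType(r"[0-9]+")

def retrieve(account: str) -> str:
    account = Account.check(account)  # runtime check of the language type
    return f"retrieving {account}"

retrieve("12345")           # accepted: matches [0-9]+
# retrieve("<script>...")   # would raise TypeError: not a valid account
```

A fuzzer can use the same regex in reverse, generating only strings from the language to test retrieve with valid inputs by construction.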

1 month ago
Brad Pitt in front of a classroom (AI-generated)


My successor as a professor will be some AI video tutor with the appearance of Brad Pitt, available 24/7, unlimited patience, personalized towards each student, the ability to teach any subject ever discussed in a textbook, and a cost of < $1/hour. Good thing I can still do research! (Now wait...)

1 month ago
IEEE Computer Society Harlan D. Mills Award and Talk by Andreas Zeller
Should Computer Scientists Experiment Less? On the past, present, and future of software engineering research


More information at conf.researchr.org/details/icse...

1 month ago
IEEE Computer Society Harlan D. Mills Award and Talk by Andreas Zeller: Should Computer Scientists Experiment Less? On the past, present, and future of software engineering research (ICSE 2026 Main Plenaries). This year, ICSE 2026 innovates with an expanded Main Plenaries program, bringing a total of four exceptional keynote talks to the main conference stage.

"Should Computer Scientists Experiment Less?" This is the title of my upcoming Harlan D. Mills Award Talk at ICSE 2026 on the past, present, and future of Software Engineering research. Looking forward to lots of productive discussions!
conf.researchr.org/details/icse...

1 month ago
Preview
Mining metrics to predict component failures | Proceedings of the 28th international conference on Software engineering

Impact award! I am happy to report that my ICSE 2006 paper "Mining metrics to predict component failures," with Nachi Nagappan and Thomas Ball, has been selected to receive a retrospective ICSE SEIP Most Influential Paper Award. Read it here: dl.acm.org/doi/10.1145/...

1 month ago
Over the past decade, the automated generation of test inputs has made significant advances. Modern fuzzers and test generators easily produce complex input formats that do systematically cover the input and execution space. Testing protocols, though, has remained a frontier for automated testing, as a test generator has to interact with the program under test, producing messages that conform to the current state of the system.

In this paper, we introduce language-based protocol testing, the first approach to specify, automatically test, and systematically cover the full state and input space of protocol implementations. We specify protocols as interaction grammars—an extension of context-free grammars that tag each message element with the communication party that is in charge of producing it. Interaction grammars embed classical state models by unifying states, messages, and transitions all into nonterminals, and can be used for producing interactions as well as parsing them, making them ideally suited for testing protocols. Additional constraints over grammar elements allow us to specify and test semantic features such as binary message formats, checksums, encodings, and the many ways that message features induce states and vice versa.

To evaluate the effectiveness of language-based protocol testing, we have implemented it as part of the FANDANGO test generator. We specify several protocols as interaction grammars, including features such as human-readable interactions (SMTP), bit-level encodings (DNS), and dynamic port assignments (FTP), and use them to test the corresponding protocol implementations. By systematically covering the interaction grammar and solving the associated constraints, FANDANGO achieves comprehensive coverage of the protocol interactions, resulting in high code coverage and a thorough assessment of the program under test.


With more and more AI-generated code, comprehensive system testing becomes more important than ever. Our new paper "Language-Based Protocol Testing" (with Alexander Liggesmeyer and Pepe Zamudio) shows how to specify and test all details of how programs interact: arxiv.org/abs/2509.20308
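The interaction grammars described in the abstract can be illustrated with a toy sketch: a context-free grammar whose terminals are tagged with the party that produces them, from which complete client/server exchanges can be generated. The grammar encoding and the SMTP-like messages below are invented for illustration and are not Fandango's actual specification language:

```python
import random

# Toy interaction grammar: alternatives are lists of items; each terminal
# is a (party, message) pair naming who produces it; strings in angle
# brackets are nonterminals to be expanded.
GRAMMAR = {
    "<session>": [["<greeting>", "<mail>", "<quit>"]],
    "<greeting>": [[("server", "220 ready\r\n"),
                    ("client", "HELO test\r\n"),
                    ("server", "250 ok\r\n")]],
    "<mail>": [[("client", "MAIL FROM:<a@b>\r\n"),
                ("server", "250 ok\r\n")]],
    "<quit>": [[("client", "QUIT\r\n"),
                ("server", "221 bye\r\n")]],
}

def produce(symbol="<session>"):
    """Expand a nonterminal into a list of (party, message) pairs."""
    interaction = []
    for item in random.choice(GRAMMAR[symbol]):
        if isinstance(item, tuple):        # tagged terminal: emit as-is
            interaction.append(item)
        else:                              # nonterminal: expand recursively
            interaction.extend(produce(item))
    return interaction

for party, message in produce():
    print(f"{party}: {message.strip()}")
```

A tester would send the client-tagged messages to the program under test and parse its replies against the server-tagged parts, so states, messages, and transitions all live in one grammar.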

1 month ago
Post image

On my way to Savannah, Georgia to an IFIP WG 4.3 meeting, where I’ll present our work on Parameterized Compiler Testing (joint work with my fantastic co-workers Addison Crump and Alexi Turcotte)

1 month ago
Video

#Fandango 1.1 is now available! With this release, #Fandango becomes a full-fledged _protocol fuzzer_, happily exploring states and messages of protocols such as FTP or DNS. Thanks to José, Valentin, Alexander, and Marius for their hard work!
Find Fandango at fandango-fuzzer.github.io

1 month ago
Andreas Zeller and PhD students


About time: A multi-celebration for becoming a member of Academia Europaea, my SIGSOFT Influential Educator Award, my 60th birthday, becoming an IEEE Fellow, _and_ getting the 2026 IEEE Harlan D. Mills Award. With cake and fizzy drinks!

2 months ago
Reviewer-Author Collusion Rings and How to Fight Them In 2012, I attended a physical meeting of the program committee responsible for selecting the best scientific papers for the ESEC/FSE 2013 conference in Saint Petersburg, Russia. This meeting was part...

Starting this year, I will only review for conferences that get rid of a "bidding" phase, as allowing reviewers to bid on papers they want to review opens too many opportunities for manipulation and collusion. For details, see andreas-zeller.info/2025/12/07/R... #nobidding

2 months ago
Post image

I am happy to report that I have been named the recipient of the

2026 Harlan D. Mills award

"For sustained contributions to software debugging, program analysis, mining software repositories, and automated test generation." This is a big award – thanks to all!
www.computer.org/volunteering...

2 months ago
Fault localization aims to identify code regions responsible for failures. Traditional techniques primarily correlate statement execution with failures; however, program behavior involves diverse execution features, including variable values, branch conditions, and definition-use pairs, which can provide richer diagnostic insights.

This paper comprehensively investigates execution features for fault understanding, addressing two complementary goals. First, we conduct an empirical study of 310 bugs across 20 projects, analyzing 17 execution features and assessing their correlation with failure outcomes. Our findings suggest that fault localization benefits from a broader range of execution features: (1) Scalar pairs exhibit the strongest correlation with failures; (2) Beyond line executions, def-use pairs and functions executed are key indicators for fault localization; and (3) Combining multiple features enhances effectiveness compared to relying on individual features.

Second, building on these insights, we introduce a debugging approach that learns relevant features from labeled test outcomes. The approach extracts fine-grained execution features and trains a decision tree to differentiate passing and failing runs. The trained model generates fault diagnoses that explain the underlying causes of failures.

Our evaluation demonstrates that the generated diagnoses achieve high predictive accuracy. These interpretable diagnoses empower developers to debug software efficiently by providing deeper insights into failures.


How do execution features relate to failures? In this new ACM TOSEM paper, Marius Smytzek, Martin Eberlein, Lars Grunske, and I analyze which execution features beyond code coverage correlate best with failures and lead to accurate explanations of failure causes: dl.acm.org/doi/10.1145/...
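The learning step described in the abstract can be illustrated with a minimal sketch: record boolean execution features per test run and pick the feature that best separates passing from failing runs (effectively a one-node decision tree). The feature names and runs below are invented for illustration and are not the paper's data or tooling:

```python
# Each run records which execution features were observed, plus the outcome.
runs = [
    ({"line_42_executed": True,  "x_negative": True},  "FAIL"),
    ({"line_42_executed": True,  "x_negative": False}, "PASS"),
    ({"line_42_executed": False, "x_negative": False}, "PASS"),
    ({"line_42_executed": True,  "x_negative": True},  "FAIL"),
]

def best_predictor(runs):
    """Return the feature whose presence best predicts failure."""
    features = runs[0][0].keys()
    def accuracy(feature):
        # Fraction of runs where "feature observed" == "run failed".
        hits = sum((feats[feature] and outcome == "FAIL") or
                   (not feats[feature] and outcome == "PASS")
                   for feats, outcome in runs)
        return hits / len(runs)
    return max(features, key=accuracy)

print(best_predictor(runs))  # prints x_negative: it separates runs perfectly
```

Here line_42_executed also occurs in a passing run, so x_negative wins; a real decision tree would extend this by combining several such features into nested conditions.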

2 months ago

Four hours later, I _think_ I have fixed things again - reinstalled Python and all its packages, rebuilt Spotlight and Mail indexes, cleared macOS caches, subscribed to Creator Studio, and now back to these lost mails… Today I hate you, Apple.

2 months ago

* Mail has lost all my emails sent since Monday
* Mail search is broken too
* Search in reminders cannot find anything
* New Keynote is full of ads!?
* Invoke Python-3.13, get 3.14 instead - venvs are messed up
* LaTeX "minted" crashes (likely b/c Python)

So glad I'm an expert in debugging /sarcasm

2 months ago
Inferring Input Grammars from Code with Symbolic Parsing | ACM Transactions on Software Engineering and Methodology Generating effective test inputs for a software system requires that these inputs be valid, as they will otherwise be rejected without reaching actual functionality. In the absence of a specification ...

Fuzzing software becomes much more effective if you can generate _valid_ inputs. We have now built the first approach to _statically_ extract complete and precise input grammars from parser code, producing syntactically valid and diverse inputs by construction. Enjoy! dl.acm.org/doi/10.1145/...

2 months ago
Post image

After a visit to Max Planck Institute for Security and Privacy (MPI-SP) in Bochum, seeing my awesome colleagues @thorstenholz.bsky.social, @mboehme.bsky.social, Mathias Payer, and many more, now on my way to Paris to celebrate ten years of @softwareheritage.org with the great Roberto Di Cosmo

2 months ago

Correction: It's 2,000+ *en*-dashes ("--"), but actually 5,800 *em*-dashes ("---")

3 months ago
$ cd ~/Papers/
$ grep -e '[ ~]-- ' */*.tex | wc -l
    2258
$


A researcher used more than 2,000 em-dashes in his papers, revealing AI-based manipulation in 400+ papers since 1985. Professor Zeller claims he "typed" these dashes into the paper by using "two hyphens" and a "typesetting" system.

3 months ago

Fun fact: This is my tenth test of time award :-) We will give a keynote at the FSE 2026 conference. @acm.org @sigsoft.bsky.social

3 months ago
When do changes induce fixes? | ACM SIGSOFT Software Engineering Notes As a software system evolves, programmers make changes that sometimes cause problems. We analyze CVS archives for fix-inducing changes---changes that lead to problems, indicated by fixes. We show how ...

Happy New Year! I am thrilled to report that Jacek Śliwerski, Tom Zimmermann, and I won the ACM SIGSOFT 2026 Impact Award 🏆 for "When do changes induce fixes?" (MSR 2005). The paper introduced the popular SZZ algorithm for linking change histories and bug databases: dl.acm.org/doi/10.1145/...

3 months ago

Problem: Reviewers did not read the paper.
Solution: Write a detailed rebuttal and point to all the places in the paper that answer their questions.
New problem: Reviewers did not read the rebuttal.

3 months ago
IPN Colloquium 15 12 2025 Andreas Zeller (YouTube video by IPN, ICT Research Platform Nederland)

The talk is now online:

* Video: www.youtube.com/watch?v=tBO_...
* Slides: andreas-zeller.info/assets/Shoul...

Enjoy! -- Andreas

4 months ago