Another edition of my security course has finished. We moved to Data Analytics for CyberSec (marcusbotacin.github.io/teaching/dat...) (prev. ML-Based Cyberdefense: marcusbotacin.github.io/teaching/ml-1), but the spirit of the course remains the same. Click to check what we achieved this semester!
Posts by Marcus Botacin
The video for my talk "Hardware is the New Software: The Next-Gen AntiViruses and how your hardware will self-secure your system!" is available at: www.youtube.com/watch?v=P3p-...
The recording of the talk is available at: www.youtube.com/watch?v=0Nke...
Wed. Oct. 29th, 4:30pm ET: "Malware Detection under Concept Drift: Science and Engineering" - Marcus Botacin - Texas A&M ceri.as/marcus
My most recent talk at @HouSecCon "Hardware is the New Software: The Next-Gen AntiViruses and how your hardware will self-secure your system!" See the slides at marcusbotacin.github.io/talks/housec25
[New Paper] "Making Acoustic Side-Channel Attacks on Noisy Keyboards Viable with LLM-Assisted Spectrograms Typo Correction" usenix.org/conference/w... We published this week at @wootsecurity.bsky.social
And I will be talking there!
Want to know more? Check our work!
And there are pretty significant cases of dataset imbalance in popular malware datasets, such as DREBIN. See the results for more than 5K runs with different configurations:
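As a toy illustration of the kind of imbalance check involved (my own sketch, not the paper's code, and not real DREBIN data), one can measure how skewed a labeled dataset is:

```python
from collections import Counter

def imbalance_ratio(labels):
    """Ratio between the largest and smallest class counts.
    1.0 means perfectly balanced; large values signal imbalance."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

# Hypothetical toy labels (0 = goodware, 1 = malware)
labels = [0] * 90 + [1] * 10
print(imbalance_ratio(labels))  # prints 9.0
```

A model trained on such a split can look accurate while having learned very little about the minority class, which is exactly the failure mode that later shows up as spurious drift alarms.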
This includes false positives (in the drift detection report). We are able to pinpoint, for instance, when an FP occurs because the model did not learn enough due to class imbalance.
The result is that this approach can explain what is happening at every drift point.
We created an entire taxonomy of when drift happens and when it does not, for the more formally inclined.
We also identified that concept drift is directional, i.e., only expansions towards the border cause true drift in the main classifier. Therefore, by measuring directionality we can predict whether a concept expansion will cause a drift in the future and even anticipate it (early retrain).
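A minimal geometric sketch of this directionality idea (my own 1-D illustration, not the paper's implementation): the concept is an interval, the classifier's decision boundary is a single threshold above it, and a new sample's position relative to both tells the cases apart.

```python
def classify_drift(sample, concept_lo, concept_hi, boundary):
    """Toy 1-D illustration (hypothetical values, not the paper's code).
    The 'concept' is the interval [concept_lo, concept_hi]; the learned
    decision boundary is a threshold expected to sit above the concept."""
    inside_concept = concept_lo <= sample <= concept_hi
    crosses_boundary = sample > boundary
    if inside_concept and crosses_boundary:
        # Concept unchanged, yet the sample flips class: the line is misplaced.
        return "false-positive drift (boundary misplaced)"
    if not inside_concept and crosses_boundary:
        # Concept expanded towards and past the boundary: true drift.
        return "true drift (concept expanded past the boundary)"
    if not inside_concept:
        # Expansion away from the boundary: no drift yet, early-retrain candidate.
        return "concept expansion, no drift yet (early-retrain candidate)"
    return "no drift"
```

For example, with a concept of [0, 10] and a boundary at 8, a sample at 9 signals a misplaced boundary, while a sample at 12 signals true drift.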
We detect these cases via an architecture of external meta-models that can be applied to any internal ML model. They measure the concepts while the main model measures the boundaries. True drifts show up as changes in both the meta-models and the boundary, whereas false ones affect only the boundary.
Our insight is that there is a difference between the concept (circles) and the decision boundary (lines) of a classifier. Sometimes samples cross the boundary because of concept expansion (true drift), but sometimes because the line is misplaced (false-positive drift). We want to tell these cases apart.
[New Paper] "Towards Explainable Drift Detection and Early Retrain in ML-Based Malware Detection Pipelines" - My first paper with a student as lead author. Congrats to Jayesh on his presentation today at DIMVA! Check the paper here: marcusbotacin.github.io/publication/...
See you in the next offering!
All the vulnerabilities were disclosed to the developers. Many of them (unfortunately not all) answered and even fixed them, which is great!
I recorded some of the classes, if you are interested: www.youtube.com/watch?v=E8qV...
But don't worry. The students were able to patch many of those vulnerabilities and to verify many other patches, such as these escapes:
In a more sophisticated attack, one team was able to abuse an Intent to move the window to the foreground while screenshotting it via accessibility services.
The previous attack was run against a mobile app. What happens when the app is protected by a password? Well, students could brute-force it.
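As a hedged sketch of what such a brute-force boils down to (against a hypothetical local check, not the actual app, which students drove via its real unlock interface), enumerate the whole keyspace until one candidate succeeds:

```python
from itertools import product

def check_pin(pin):
    """Stand-in for the app's unlock check (hypothetical secret; a real
    attack hits the app's UI or API, viable when there is no rate limit)."""
    return pin == "4271"

def brute_force(length=4, digits="0123456789"):
    """Try every candidate PIN of the given length until the check passes."""
    for combo in product(digits, repeat=length):
        candidate = "".join(combo)
        if check_pin(candidate):
            return candidate
    return None

print(brute_force())  # prints 4271, after at most 10**4 attempts
```

A 4-digit PIN gives only 10,000 candidates, which is why unthrottled checks fall in seconds.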
In the worst case, one could remotely trigger user deletion by manipulating client-side requests.
So why not set it to the maximum possible value?
Another classical attack: MITM. One team identified an application (a game) whose credits were set on the client side and never validated.
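A minimal sketch of why this breaks (hypothetical field names and values, not the real game's protocol): once an interceptor sits between client and server, any client-supplied JSON body can be rewritten in flight, and a server that trusts it will accept the forged value.

```python
import json

def mitm_rewrite(raw_request, field="credits", new_value=999999):
    """Toy MITM step: parse an intercepted JSON body and overwrite a
    client-supplied field before forwarding it to the server."""
    body = json.loads(raw_request)
    body[field] = new_value
    return json.dumps(body)

# Hypothetical intercepted request body
intercepted = '{"player": "team7", "credits": 10}'
print(mitm_rewrite(intercepted))  # credits is now 999999
```

The fix is the usual one: treat every client value as untrusted and recompute or validate credits server-side.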
OK, sometimes the students go overboard with how much payload they add to the requests...
Or to steal cookies. That moment when your students come to you with a panel of stolen session cookies...
In a more elaborate attack, one could use XSS to turn an input form into a complete keylogger.
More than one team found XSS cases, in diverse websites.
Another classical problem identified by the teams was XSS, which can still be widely found online.