Academics Build AI-Powered Android Vulnerability Discovery and Validation Tool

Called A2, the framework mimics human analysis to identify vulnerabilities in Android applications and then validates them. This article appeared first on SecurityWeek.

Two academic researchers from Nanjing University and the University of Sydney have created a framework that relies on AI for the discovery and validation of vulnerabilities in Android applications.

Called A2, the system mirrors human experts’ analysis and validation activities by first reasoning about an application’s security and then validating each potential flaw through exploitation attempts.

During the first phase, Agentic Vulnerability Discovery, semantic code understanding is combined with traditional security tools to generate vulnerability hypotheses. The second phase, Agentic Vulnerability Validation, plans, executes, and verifies exploitation operations to validate each hypothesis.

In their threat model, the academics considered attackers who can reverse-engineer an application's APK, observe its runtime behavior, and inject inputs through Android's interaction channels.

“They do not control the Android platform, kernel, or hardware. Attacks requiring rooted devices, custom firmware, or hardware side channels are out of scope. Adversaries instead focus on application-layer vulnerabilities introduced by developers or insecure library use,” they note in their research paper (PDF).

When fed an APK, A2 uses LLMs to analyze the code and generate speculative vulnerability findings. It also uses warnings from static application security testing (SAST) tools to generate additional findings, and consolidates all discoveries using an aggregator.
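The article does not describe how A2's aggregator merges LLM findings with SAST warnings. As a minimal sketch, assuming each finding carries a vulnerability type and an affected component, an aggregator could deduplicate on that pair and record which sources reported it. The `Finding` fields, the `aggregate` function, and the source labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    source: str      # hypothetical label: "llm" or a SAST tool name
    vuln_type: str   # illustrative category, e.g. "weak-crypto"
    component: str   # affected class or manifest component
    detail: str = ""


def aggregate(llm_findings, sast_findings):
    """Merge both finding streams, deduplicating on (vuln_type, component).

    The first finding seen for each key is kept, and the sorted list of
    sources that reported it is attached so later stages can see
    corroboration between the LLM and the SAST tools.
    """
    merged = {}
    sources = {}
    for f in list(llm_findings) + list(sast_findings):
        key = (f.vuln_type, f.component)
        if key not in merged:
            merged[key] = f
            sources[key] = set()
        sources[key].add(f.source)
    return [(merged[k], sorted(sources[k])) for k in merged]


# Usage example with made-up findings.
llm = [
    Finding("llm", "exported-component", "com.example.Main"),
    Finding("llm", "weak-crypto", "com.example.Crypto"),
]
sast = [
    Finding("sast", "weak-crypto", "com.example.Crypto"),
    Finding("sast", "sql-injection", "com.example.Db"),
]
merged = aggregate(llm, sast)
for finding, reported_by in merged:
    print(finding.vuln_type, finding.component, reported_by)
```

Deduplicating on a coarse key keeps the downstream validation workload proportional to distinct issues rather than to raw tool output.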

In the next phase, each finding is passed to a PoC planner that generates tasks and their expected outcomes. Each task is then executed, and a validator checks the outcomes and feeds the results back for iterative refinement until either the vulnerability is validated or the retry limit is reached.

During the analysis phase, A2 decompiles the APK, filters out third-party libraries, extracts manifest details, and processes the code and manifest data. When third-party tools are integrated, it normalizes their diverse outputs into a common format so they can be aggregated downstream.
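Two of these preprocessing steps, dropping library code and normalizing tool output, can be sketched as follows. This assumes library code is recognized by package prefix and that each tool emits a dictionary of fields; the prefix list, the field names, and both functions are hypothetical and not taken from A2.

```python
# Hypothetical package prefixes treated as third-party library code.
THIRD_PARTY_PREFIXES = ("androidx.", "android.support.", "kotlin.", "okhttp3.", "com.squareup.")


def is_third_party(class_name: str) -> bool:
    # str.startswith accepts a tuple and matches any of its prefixes.
    return class_name.startswith(THIRD_PARTY_PREFIXES)


def app_classes(class_names):
    """Keep only classes that appear to belong to the app itself."""
    return [c for c in class_names if not is_third_party(c)]


def normalize(tool: str, raw: dict) -> dict:
    """Map one tool-specific warning onto a shared schema (field names assumed)."""
    return {
        "source": tool,
        "type": raw.get("rule") or raw.get("type", "unknown"),
        "location": raw.get("class") or raw.get("file", ""),
        "message": raw.get("message", ""),
    }


# Usage example with made-up inputs.
kept = app_classes(["com.example.Main", "androidx.core.App", "okhttp3.Call"])
warning = normalize("toolA", {"rule": "R1", "class": "com.example.Main", "message": "weak hash"})
print(kept, warning)
```

Prefix matching is a crude heuristic (obfuscated or repackaged libraries slip through), which is one reason a real pipeline may need something more robust.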

Next, the PoC planner analyzes each bug's characteristics to devise a validation plan and weed out false positives, then assigns the tasks to the executor, which performs the validation steps across “code execution, device control, file system, static analysis, UI interaction, log analysis, APK generation, and web server management,” the researchers explain.
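One way to picture the planner's output is as a list of tasks, each tied to one of the eight tool categories the researchers name and to an expected outcome the validator can later check. This is an illustrative data model only; the `Tool`, `Task`, and `Plan` types and the example tasks are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Tool(Enum):
    # The eight capability areas quoted by the researchers.
    CODE_EXECUTION = "code execution"
    DEVICE_CONTROL = "device control"
    FILE_SYSTEM = "file system"
    STATIC_ANALYSIS = "static analysis"
    UI_INTERACTION = "UI interaction"
    LOG_ANALYSIS = "log analysis"
    APK_GENERATION = "APK generation"
    WEB_SERVER = "web server management"


@dataclass
class Task:
    description: str
    tool: Tool
    expected_outcome: str  # what the validator should independently observe


@dataclass
class Plan:
    finding_id: str
    tasks: list = field(default_factory=list)

    def is_checkable(self) -> bool:
        # A plan helps the validator only if every task states an observable outcome.
        return bool(self.tasks) and all(t.expected_outcome.strip() for t in self.tasks)


# Usage example: a made-up plan for an exported-component finding.
plan = Plan("F1", [
    Task("Build an attacker app that targets the exported component", Tool.APK_GENERATION, "Attacker APK builds"),
    Task("Install and launch the attacker app", Tool.DEVICE_CONTROL, "Target component starts"),
    Task("Search device logs for the protected action", Tool.LOG_ANALYSIS, "Protected action appears in log"),
])
print(plan.is_checkable())
```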

Finally, the validator independently verifies each PoC outcome rather than accepting the task executor's reported success, relying on its own observations to confirm that the expected results occurred.

If execution fails or the validator rejects success claims, feedback is sent to the PoC planner, which revises the strategy and retries. If all tasks pass validation, the process ends.
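The plan, execute, and validate cycle with feedback and a retry limit can be sketched as a simple loop. In A2 the planner, executor, and validator are LLM-driven agents; here they are hypothetical callables, and the `max_attempts` default of 3 is an assumption. The key property mirrored from the article is that success is decided by the validator's own observation, not by the executor's report.

```python
from typing import Callable, List, Optional, Tuple


def validate_finding(
    finding: str,
    plan: Callable[[str, Optional[str]], List[dict]],
    execute: Callable[[dict], dict],
    observe: Callable[[dict], str],
    max_attempts: int = 3,  # assumed retry limit
) -> Tuple[bool, int]:
    """Return (validated, attempts_used) for one finding."""
    feedback = None
    for attempt in range(1, max_attempts + 1):
        tasks = plan(finding, feedback)
        failed = None if tasks else "planner produced no tasks"
        for task in tasks:
            report = execute(task)    # executor's claim; never trusted as proof of success
            observed = observe(task)  # validator's independent observation
            if report.get("error") or observed != task["expected"]:
                failed = f"task {task['name']!r}: expected {task['expected']!r}, observed {observed!r}"
                break
        if failed is None:
            return True, attempt      # every task passed independent validation
        feedback = failed             # sent back so the planner can revise its strategy
    return False, max_attempts


# Usage example with toy stand-ins: the executor always claims success,
# but only the second, feedback-informed plan matches what is observed.
device_log = "CRASH"

def toy_plan(finding, feedback):
    return [{"name": "trigger", "expected": "CRASH" if feedback else "LEAK"}]

def toy_execute(task):
    return {"success": True}

def toy_observe(task):
    return device_log

result = validate_finding("F1", toy_plan, toy_execute, toy_observe)
gave_up = validate_finding("F1", lambda f, fb: [{"name": "t", "expected": "X"}],
                           toy_execute, toy_observe, max_attempts=2)
print(result, gave_up)
```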

The academics used Gemini to produce 82 speculative vulnerability findings but excluded 19 of them. Of the remaining 63 findings, 56 were true positives, each validated with complete proof-of-concept (PoC) code.
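The reported figures translate into a simple ratio. This short calculation uses only the numbers above; "share of true positives" is our own label for 56 out of the 63 findings that remained after exclusion.

```python
# Figures reported for the Gemini run.
generated = 82        # speculative findings produced
excluded = 19         # findings the researchers excluded
true_positives = 56   # findings validated with complete PoC code

remaining = generated - excluded          # findings actually evaluated
share_valid = true_positives / remaining  # fraction of evaluated findings that were real

print(f"{remaining} evaluated, {share_valid:.1%} true positives")
# prints "63 evaluated, 88.9% true positives"
```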

Examining A2's computational costs and efficiency across O3, Gemini, and ChatGPT, the researchers estimate that detection alone costs well under $1 per APK, while the full validation pipeline can cost up to $26.85 per vulnerability with Gemini (median: $8.94).

The researchers tested the framework on a real-world dataset of 160 APKs. Of the 136 speculative vulnerabilities reported during the detection phase, 60 were validated as exploitable security defects, while 29 were marked as false positives. The solution also identified bugs outside its validation scope.

Manual review showed that only three of the 60 validated bugs were false positives. The remaining 57 issues were cryptographic, access control, and input validation flaws that were responsibly disclosed.
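The real-world results can be summarized the same way. The calculation below uses only the counts reported above; the labels "validated share" and "confirmed precision" are our own.

```python
reported = 136             # speculative vulnerabilities from the detection phase
validated = 60             # validated by A2 as exploitable
flagged_fp = 29            # marked by A2 as false positives
confirmed = validated - 3  # manual review found 3 validated bugs were false positives

validated_share = validated / reported         # share of reports A2 validated
flagged_share = flagged_fp / reported          # share of reports A2 rejected
confirmed_precision = confirmed / validated    # share of validations that held up

print(f"validated {validated_share:.1%}, rejected {flagged_share:.1%}, "
      f"confirmed {confirmed_precision:.1%} of validations")
# prints "validated 44.1%, rejected 21.3%, confirmed 95.0% of validations"
```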

According to the academics, A2 is a step forward toward automated security analysis for Android, as it achieves higher coverage than existing tools, but it still comes with multiple limitations related to scope, LLM reasoning reliability, and context.
