WAP is a source code static analysis and data mining tool to detect and correct input validation vulnerabilities in web applications written in PHP (version 4.0 or higher) with a low rate of false positives. This tool semantically analyses the source code. More precisely, it does taint analysis (data-flow analysis) to detect the input validation vulnerabilities. The aim of the taint analysis is to track malicious inputs inserted by entry points ($_GET, $_POST arrays) and to verify if they reach some sensitive sink (PHP functions that can be exploited by malicious input). After the detection, the tool uses data mining to confirm if the vulnerabilities are real or false positives. At the end, the real vulnerabilities are corrected with the insertion of the fixes (small pieces of code) in the source code.
WAP detects and corrects the following vulnerabilities:
- SQL Injection (SQLI)
- Cross-site scripting (XSS)
- Remote File Inclusion (RFI)
- Local File Inclusion (LFI)
- Directory Traversal or Path Traversal (DT/PT)
- Source Code Disclosure (SCD)
- OS Command Injection (OSCI)
- PHP Code Injection
WAP is written in Java language and is constituted by three modules:
composed by the tree generator and taint analyzer. The tool has integrated a lexer and a parser generated by ANTLR, and based in a grammar and a tree grammar written to PHP language. The tree generator uses the lexer and the parser to build the AST (Abstract Sintatic Tree) to each PHP file. The taint analyzer performs the taint analysis navigating through the AST to detect potentials vulnerabilities.
False Positives Predictor
composed by a supervised trained data set with instances classified as being vulnerabilities and false positives and by the Logistic Regression machine learning algorithm. For each potential vulnerability detected by code analyzer, this module collects the presence of the attributes that define a false positive. Then, the Logistic Regression algorithm receives them and classifies the instance as being a false positive or not (real vulnerability).
Each real vulnerability is removed by correction of its source code. This module for the type of vulnerability selects the fix that removes the vulnerability and signalizes the places in the source code where the fix will be inserted. Then, the code is corrected with the insertion of the fixes and new files are created.