The Myth of False Positives in Static Application Security Testing

26 October, 2020 | 5 Min Read

Static application security testing (SAST) tools are notorious for presenting false positives, i.e., incorrect warnings. In this article, we explain the myth behind false positives, discuss two types of false positives, and show which of them future SAST tools must solve. The idea behind SAST is flawless. In theory.

SAST allows you to detect security vulnerabilities early in the development phase, with emphasis on the word early. SAST analyzes your software's source code during development, long before testing, deployment, and release. Every vulnerability detected during development saves money in the following development iterations and reduces the risk of an attack. So much for the theoretical side.

Now on to the practical side: legacy SAST tools are notorious for presenting false positives, i.e., spurious warnings that are not actionable. Developers and security analysts using SAST tools are typically forced to wade through endless lists of warnings, evaluating and ranking the findings one by one until, eventually, they start ignoring the warnings altogether.

The idea behind SAST tools is to trace data-flows along all execution paths of the program. Typically, SAST traces data-flow connections between so-called “sources” (method invocations that load user input) and “sinks” (method invocations that execute security-critical functionality based on user input).
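To make the source-to-sink idea concrete, here is a minimal sketch, assuming a hypothetical three-kind intermediate representation (`SOURCE`, `ASSIGN`, `SINK`) for straight-line code and Java 17. `TaintSketch`, `Stmt`, and `findFlows` are illustrative names, not part of any real SAST engine; real engines operate on full control-flow and call graphs.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy taint tracker for straight-line code; not a real SAST engine.
final class TaintSketch {

    enum Kind { SOURCE, ASSIGN, SINK }

    // One statement: "target = op(args)". For SINK statements the target is unused (null).
    record Stmt(Kind kind, String target, List<String> args) {}

    /** Returns the indices of sink statements that receive tainted data. */
    static List<Integer> findFlows(List<Stmt> program) {
        Set<String> tainted = new HashSet<>();
        List<Integer> findings = new ArrayList<>();
        for (int i = 0; i < program.size(); i++) {
            Stmt s = program.get(i);
            boolean argTainted = s.args().stream().anyMatch(tainted::contains);
            switch (s.kind()) {
                case SOURCE -> tainted.add(s.target()); // user input enters the program here
                case ASSIGN -> {
                    // Reassignment from clean data removes the taint (valid for straight-line code).
                    if (argTainted) tainted.add(s.target()); else tainted.remove(s.target());
                }
                case SINK -> { if (argTainted) findings.add(i); } // tainted data reaches a sink
            }
        }
        return findings;
    }

    public static void main(String[] args) {
        List<Stmt> program = List.of(
            new Stmt(Kind.SOURCE, "username", List.of()),      // @PathVariable username
            new Stmt(Kind.ASSIGN, "sql", List.of("username")), // "SELECT ... '" + username + "'"
            new Stmt(Kind.SINK, null, List.of("sql")));        // executeQuery(sql)
        System.out.println(findFlows(program)); // [2]
    }
}
```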

Example

SQL injection in a web application is the classic textbook example. The source is typically a method that receives the GET or POST parameters of an HTTP request. In a Spring application, for instance, any controller method annotated with `@GetMapping` acts as a source, such as:

```java
@GetMapping("/{username}")
public User getUserByName(@PathVariable String username) { ... }
```

The sink is a method that executes a SQL query without properly sanitizing its parameters, for instance, `executeQuery` in the following code:

```java
// ...
String sql = "SELECT id FROM users WHERE username='" + username + "'";
Connection connection = dataSource.getConnection();
ResultSet result = connection.createStatement().executeQuery(sql);
```

Here, the string variable `sql` is a concatenated query string that contains the user input `username` from the request. Using the user input directly in the SQL statement creates a serious SQL injection vulnerability. For example, an attacker can circumvent user authentication by submitting a crafted value such as `user' OR '1'='1` as the username, which causes the statement `SELECT id FROM users WHERE username='user' OR '1'='1'` to be executed. Because `'1'='1'` is always true, the query matches every row of the users table regardless of the username; code that reads only the first row of the result typically gets the first user, which is often the administrator.
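The concatenation can be reproduced in isolation. This minimal sketch uses a hypothetical helper `buildQuery` that mirrors the vulnerable snippet above and prints the SQL text produced for a benign and for a malicious username; no database is involved.

```java
// Hypothetical demo class; reproduces only the string concatenation, not the database call.
final class SqlInjectionDemo {

    // Same concatenation as the vulnerable snippet: user input is pasted verbatim between quotes.
    static String buildQuery(String username) {
        return "SELECT id FROM users WHERE username='" + username + "'";
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("alice"));
        // SELECT id FROM users WHERE username='alice'
        System.out.println(buildQuery("user' OR '1'='1"));
        // SELECT id FROM users WHERE username='user' OR '1'='1'
    }
}
```

The quote inside the payload closes the string literal early, so the rest of the input is parsed as SQL. Binding the value through a `PreparedStatement` placeholder (`?`) instead keeps the input out of the SQL text.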

The Two Types of False Positives

During static analysis, the analyzed program is never executed. Therefore, the analysis engine needs to model potential program behavior during execution. This includes modeling the call stack, branch conditions, and execution environments (hereafter, context). The engine has to make (frequently conservative) assumptions about the path the data actually takes.

For instance, a data-flow engine must decide which concrete method implementation an interface invocation resolves to at runtime, which branches can be executed, and which code locations are reachable at all in a given execution environment.
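The following sketch, with hypothetical types `QueryBuilder`, `EscapingBuilder`, and `RawBuilder` and a hypothetical `legacyMode` flag, shows why call resolution and branch reasoning matter: whether the user input is escaped depends on which implementation `choose` returns at runtime. Quote doubling serves only as a simple stand-in for sanitization here; parameterized queries are the usual remedy.

```java
// Hypothetical interface: turns user input into a SQL string.
interface QueryBuilder {
    String build(String input);
}

// Escapes single quotes by doubling them (standard SQL literal escaping; illustration only).
final class EscapingBuilder implements QueryBuilder {
    public String build(String input) {
        return "SELECT id FROM users WHERE username='" + input.replace("'", "''") + "'";
    }
}

// Concatenates the input unmodified: vulnerable to SQL injection.
final class RawBuilder implements QueryBuilder {
    public String build(String input) {
        return "SELECT id FROM users WHERE username='" + input + "'";
    }
}

final class DispatchDemo {
    // The concrete type is decided at runtime; an engine that cannot evaluate
    // legacyMode must assume either implementation of build() may be called.
    static QueryBuilder choose(boolean legacyMode) {
        return legacyMode ? new RawBuilder() : new EscapingBuilder();
    }

    public static void main(String[] args) {
        String payload = "user' OR '1'='1";
        System.out.println(choose(false).build(payload));
        // SELECT id FROM users WHERE username='user'' OR ''1''=''1'
        System.out.println(choose(true).build(payload));
        // SELECT id FROM users WHERE username='user' OR '1'='1'
    }
}
```

To stay sound, an engine that cannot determine `legacyMode` reports the path through `RawBuilder`. If that branch is never taken in practice, the report is a false positive.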

All of these assumptions are encoded into the engine and can lead to different kinds of false positives. We argue that they fall into two types: technical and contextual false positives.

Technical False Positives

Technical false positives are inherited from properties of the underlying data-flow engine. Typical examples arise when an analysis is not context-sensitive, flow-sensitive, or field-sensitive.

Context-sensitive analyses model the call stack, flow-sensitive analyses model the order of statements along the control flow, and field-sensitive analyses statically model the object-oriented programming style: objects referencing other objects through (im)mutable fields or class members, with each field tracked separately.
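Field sensitivity can be illustrated with a deliberately tiny model. The sketch assumes a hypothetical request object `req` whose `userInput` field holds user input, while a different field, `tableName`, is the one that reaches the sink. Taint facts are plain strings standing for access paths; `FieldSensitivityDemo` and `sinkReportsTaint` are illustrative names only.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical toy model contrasting a field-insensitive and a field-sensitive analysis.
final class FieldSensitivityDemo {

    // Taint facts are access paths: "req.userInput" (field-sensitive) or just "req" (field-insensitive).
    static boolean sinkReportsTaint(boolean fieldSensitive) {
        Set<String> tainted = new HashSet<>();
        // Statement 1: req.userInput = <user input>          (source)
        tainted.add(fieldSensitive ? "req.userInput" : "req");
        // Statement 2: executeQuery("SELECT id FROM " + req.tableName)   (sink reads another field)
        String readAtSink = fieldSensitive ? "req.tableName" : "req";
        return tainted.contains(readAtSink);
    }

    public static void main(String[] args) {
        System.out.println("field-insensitive reports: " + sinkReportsTaint(false)); // true (technical false positive)
        System.out.println("field-sensitive reports:   " + sinkReportsTaint(true));  // false
    }
}
```

Collapsing the whole object into one abstract value is cheaper, but it makes the clean `tableName` field look tainted as soon as any other field is.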

Modern analysis engines (at least state-of-the-art research tools) handle all of these details and are mostly free of false positives from a technical perspective.

Contextual False Positives

However, if you speak to developers and security analysts about SAST tools, you will frequently hear them say something along the lines of:

It is not possible to perform a SQL injection attack at this code location. The method will only be executed from within our own infrastructure. An attacker cannot execute this part of the program.

This command injection is a false positive. The injection is only possible if the DEBUG_MODE flag is set. In our production environment, we do not set the environment variable DEBUG_MODE and the code is never executed.
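The second quote can be sketched as follows, assuming a hypothetical diagnostic handler. `DebugOnlyEndpoint`, `diagnosticCommand`, and the `host` parameter are invented for illustration; the environment is passed in as a `Map` so the sketch is testable, where real code would call `System.getenv()`. The command is only built, never executed.

```java
import java.util.List;
import java.util.Map;

// Hypothetical handler: user-controlled "host" reaches a shell command only when DEBUG_MODE is set.
final class DebugOnlyEndpoint {

    static List<String> diagnosticCommand(Map<String, String> env, String host) {
        if (env.get("DEBUG_MODE") == null) {
            return List.of(); // production path: nothing would be executed
        }
        // Sink (not executed in this sketch): real code would pass this to new ProcessBuilder(...).start()
        return List.of("sh", "-c", "ping -c 1 " + host);
    }

    public static void main(String[] args) {
        String input = "example.com; cat /etc/passwd";
        System.out.println(diagnosticCommand(Map.of(), input));
        // []
        System.out.println(diagnosticCommand(Map.of("DEBUG_MODE", "1"), input));
        // [sh, -c, ping -c 1 example.com; cat /etc/passwd]
    }
}
```

A data-flow engine sees a path from `host` to the command sink and reports it. Whether the `DEBUG_MODE` branch is reachable depends on deployment configuration that does not appear in the source code, which is exactly the knowledge the developer in the quote has and the engine lacks.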

For most SAST users, none of the technical false positives mentioned above (those caused by a lack of context-, flow-, or field-sensitivity) come to mind first when they talk about false positives. The two false positives in the quotes above also have one thing in common: the developer and the analyst know more than the data-flow engine can derive from the source code alone.

This is a challenge that next-generation SAST tools need to address.

Contextual False Negatives

While everyone talks about the false positives of SAST solutions, SAST is also prone to false negatives. False negatives, however, are invisible to the user: no one cares about them except the attacker and, if they notice the attack, your CISO.

Many SAST solutions fail to detect vulnerabilities that span several components, for instance, a database. They fall short in modeling data that is stored in a database cell at one location of the program and loaded at a different location. If such a database flow is part of a potential security vulnerability, the SAST tool either misses the critical data flow and does not report the vulnerability, or it has to over-approximate heavily and mark every cell of the database as potentially manipulated.
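This pattern is commonly called second-order (or stored) SQL injection. The sketch below assumes an in-memory `Map` as a stand-in for a database table; `StoredInjectionDemo`, `register`, and `buildReportQuery` are hypothetical names. The user input is written during one request and read back, then concatenated into SQL, during another.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo of a stored (second-order) injection; the Map stands in for a database table.
final class StoredInjectionDemo {

    // Request 1: user input (source) is written to a "database cell".
    static void register(Map<Integer, String> db, int customerId, String displayName) {
        db.put(customerId, displayName);
    }

    // Request 2, elsewhere in the program: the stored value is read back and concatenated into SQL (sink).
    static String buildReportQuery(Map<Integer, String> db, int customerId) {
        return "SELECT * FROM orders WHERE customer='" + db.get(customerId) + "'";
    }

    public static void main(String[] args) {
        Map<Integer, String> db = new HashMap<>();
        register(db, 42, "x' OR '1'='1");
        System.out.println(buildReportQuery(db, 42));
        // SELECT * FROM orders WHERE customer='x' OR '1'='1'
    }
}
```

An engine that does not model the store loses the taint at `put` and misses the flow; an engine that treats every read from the store as tainted flags this query, but also every other query built from stored data.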

Analyzing data flows across several components is an even bigger issue in today's microservice cloud architectures, where functionality is split among multiple separate services. Lacking this cross-service context, legacy SAST tools cannot properly model the data flow between those services and, again, either miss vulnerabilities or over-approximate heavily.

CodeShield’s data-flow modeling feature aims to overcome this exact shortcoming of legacy SAST tools.