Mastering Static Analysis with Semgrep: A Comprehensive Lab Guide

School: George Mason University
Course: CYSE 430
Subject: Computer Science
Date: Dec 12, 2024
Uploaded by: MegaEmu3454

CYSE 411: Lab Static Analysis

This lab aims to help students understand the critical steps involved in static analysis of real code. It is an optional and individual activity; however, students who perform it and provide correct answers will receive 10/100 points to be applied in Assignment 2.

You can run the lab on your own computer; however, if you prefer, complete the Lab Setup Azure, which enables you to run the lab in a VM hosted in the Azure Cloud environment.

Submission: The student must submit the YouTube link to a video showing them performing all the steps provided in this lab. It is not compulsory to run it on Linux; you can also use Windows or Mac.

Static Application Security Testing (SAST)

Static Application Security Testing (SAST) is a methodology that analyzes application source code to identify security vulnerabilities (including, but not limited to, injection vulnerabilities, insecure functions, and cryptographic weaknesses). Typically, SAST includes both manual and automated testing techniques, which complement each other. For this lab, we selected Semgrep.

Semgrep is a fast, open-source static analysis tool that searches code, finds bugs, and enforces secure guardrails and coding standards. Semgrep supports 30+ languages and can run in an IDE, as a pre-commit check, and as part of CI/CD workflows.

Semgrep is faster than many traditional static analysis tools because it doesn't require compiling or building the project. It parses each source file and matches rule patterns against the resulting syntax tree, making it efficient for quick feedback loops.

Why it matters:It can be integrated into CI/CD pipelines without slowing down the buildprocess.Developers can get immediate feedback on potential security flaws.Use Cases in Security Static Analysis:Detecting injection vulnerabilities: SQL injection, command injection, etc.Identifying insecure configurations: Hardcoded secrets, insecure API calls.Enforcing secure coding practices: Input validation, error handling.Compliance checks: Ensures code adheres to industry standards like PCI-DSS or GDPR.To use semgrep, you must first install Python on your machine1. Next, weneed to install semgrep using the following commands2:sudo apt updatesudo apt install python3 python3-pip -ypip3 install semgrepAfter the installation, you must restart the VM (use the Azure Dashboardpage to perform this task). Next, you can check if the tool is corrected andinstalled, you must type this command:semgrep --version1Azure Ubuntu VM already comes with Python and several other common tools (Java, Git, etc.).2Remember, you need to run this command through the SSH client.CYSE 411: Lab Static Analysis2

The next step is to clone a repository developed for the course that contains essential JavaScript files with minor vulnerabilities. The project is https://github.com/kabartsjc/cyse411-lab-sast. Run this command to clone the repository to your VM:

    git clone https://github.com/kabartsjc/cyse411-lab-sast.git

Now, we can start to use semgrep for our static analysis. To do so, move to the project folder and run these commands:

    cd cyse411-lab-sast
    semgrep --config="p/javascript" . > report.txt

Pay attention to the token > report.txt: it makes the command save its output to a file, enabling you to read it later.

You can see that we use the prebuilt configuration "p/javascript". However, semgrep has several others, which provide support for other languages and rule sets. For example, a significant one includes security checks based on the OWASP Top 10.

The OWASP Top 10 [3] is a widely recognized list of the ten most critical security risks to web applications, published by the Open Worldwide Application Security Project (OWASP, formerly the Open Web Application Security Project). It guides developers, security professionals, and organizations in understanding, prioritizing, and mitigating common vulnerabilities in web applications. To run using this configuration, use this command, again saving the output to a file so it can be read in the next step:

    semgrep --config="https://semgrep.dev/p/owasp-top-ten" . > report-owasp.txt

[3] https://owasp.org/www-project-top-ten/

Now, we will analyze the report. You can use Vim to read the report on the console, or export the report [4] to your machine and read it with better editor support.

    vim report-owasp.txt

The report shows that semgrep found 11 code issues. The findings are organized by file. Our project has two JavaScript files: app1.js and app2.js. The report starts with the findings in app2.js, as shown in the following Figure.

[4] You can use SFTP tools. Bitvise has an embedded SFTP editor; you can find it on its main screen, below SSH.
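As a complement to reading the text report, the findings can also be grouped by file with a few lines of code. The sketch below assumes the scan was rerun with Semgrep's --json flag, whose output contains a results array with check_id, path, and start.line fields. The sample object, its rule IDs, and its line numbers are made up for illustration and are not the lab's actual findings.

```javascript
// Groups Semgrep findings by file path.
// Assumes `report` has the shape of `semgrep --json` output:
// { results: [ { check_id, path, start: { line } }, ... ] }
function groupFindingsByFile(report) {
  const byFile = new Map();
  for (const finding of report.results) {
    if (!byFile.has(finding.path)) {
      byFile.set(finding.path, []);
    }
    byFile.get(finding.path).push({
      rule: finding.check_id,
      line: finding.start.line,
    });
  }
  return byFile;
}

// Made-up sample data (hypothetical rule IDs and line numbers).
const sampleReport = {
  results: [
    { check_id: 'example.hardcoded-secret', path: 'app2.js', start: { line: 10 } },
    { check_id: 'example.tainted-sql', path: 'app2.js', start: { line: 20 } },
    { check_id: 'example.raw-html', path: 'app1.js', start: { line: 5 } },
  ],
};

const grouped = groupFindingsByFile(sampleReport);
for (const [file, findings] of grouped) {
  console.log(`${file}: ${findings.length} finding(s)`);
}
```

With a real scan, the report object would come from parsing the saved JSON file, for example with JSON.parse(fs.readFileSync('report.json', 'utf8')), where report.json is a hypothetical file name.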

Reading the report, we find code lines not used by the program, hard-coded credentials, tainted inputs, etc. For example, on line 76, you can find hard-coded credentials. Semgrep not only detects the problem but also recommends how to fix it; in this case, it suggests using environment variables.


Next, it detects a tainted SQL vulnerability. Checking the source code (app2.js), we can see that username and password are the sources, and the function db.query is the sink.
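To make the source-to-sink flow concrete, here is a minimal sketch of the kind of code that triggers this finding. It is not the actual content of app2.js: the db object is a stand-in that only records the SQL it receives, and the table and column names are assumptions.

```javascript
// Hypothetical stand-in for a database client: it only records the SQL text.
const db = {
  lastSql: null,
  query(sql) {
    this.lastSql = sql;
  },
};

// Source: username and password come straight from the request body.
// Sink: db.query receives a string built from them by concatenation.
function login(body) {
  const { username, password } = body;
  const sql =
    "SELECT * FROM users WHERE username = '" + username +
    "' AND password = '" + password + "'";
  db.query(sql);
  return sql;
}

const normalSql = login({ username: 'alice', password: 'secret' });
const taintedSql = login({ username: "' OR '1'='1", password: 'x' });

console.log(normalSql);
console.log(taintedSql); // the input has changed the structure of the query
```

Because the user-controlled values become part of the SQL text itself, the second call produces a query whose logic differs from what the developer wrote, which is exactly what the taint rule reports.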

Given the report, we need to fix each issue in our code.

Issue 1: Default session cookie name

The problem is that the application uses the default session cookie name. By default, express-session uses the cookie name connect.sid to store session IDs. Using this default name makes it easier for attackers to fingerprint your application, as it signals the use of Express with express-session. This can make your application more identifiable and susceptible to targeted attacks.

To mitigate this issue, a recommended solution is to change the cookie name to something unique. The name option in the session configuration allows you to specify the cookie name.
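A minimal sketch of this change is shown below. The helper function and the cookie name lab_session are assumptions made for illustration; name, secret, resave, and saveUninitialized are options of express-session.

```javascript
// Builds the options object that would be passed to express-session's
// session(...) middleware. Only `name` matters for this issue.
function buildSessionOptions(secret) {
  return {
    name: 'lab_session', // hypothetical custom name replacing the default connect.sid
    secret,              // supplied by the caller, not hard-coded here
    resave: false,
    saveUninitialized: false,
  };
}

const options = buildSessionOptions('placeholder-secret');
console.log(options.name);

// In the Express app (assuming express-session is installed):
//   const session = require('express-session');
//   app.use(session(buildSessionOptions(process.env.SESSION_SECRET)));
```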

Issue 2: Missing domain name

This issue highlights that the session middleware (express-session) is not explicitly setting the domain attribute for the session cookie. The domain attribute determines the cookie's scope, that is, which domains can access it. Without a specified domain, the session cookie is by default only available to the exact domain that issued it. Leaving this implicit can sometimes be insecure if not appropriately managed in scenarios like subdomains.

Explicitly setting the domain ensures better control, especially in multi-subdomain applications, where you might want cookies to be available to only a particular domain or across subdomains. After refactoring, the new code will be similar to the following.
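The sketch below illustrates how the domain attribute changes a cookie's scope, using a simplified version of the browser's domain-matching rule (RFC 6265, section 5.1.3), followed by a cookie option object with an explicit domain. The host names and the domain value are hypothetical, and the matching function ignores details such as IP addresses and public suffixes.

```javascript
// Simplified cookie domain matching.
// cookieDomain === undefined models a cookie set without a Domain attribute
// (a "host-only" cookie).
function cookieIsSentTo(requestHost, originHost, cookieDomain) {
  if (cookieDomain === undefined) {
    // Host-only: sent only to the exact host that issued the cookie.
    return requestHost === originHost;
  }
  // With a Domain attribute: sent to that domain and its subdomains.
  return requestHost === cookieDomain || requestHost.endsWith('.' + cookieDomain);
}

const hostOnlySameHost = cookieIsSentTo('app.example.com', 'app.example.com', undefined);
const hostOnlyOtherHost = cookieIsSentTo('admin.example.com', 'app.example.com', undefined);
const sharedAcrossSubdomains = cookieIsSentTo('admin.example.com', 'app.example.com', 'example.com');

console.log(hostOnlySameHost, hostOnlyOtherHost, sharedAcrossSubdomains);

// Cookie options for express-session with an explicit domain (assumed value):
const cookieOptions = { domain: 'app.example.com' };
```

The narrower the domain value, the fewer hosts receive the session cookie, so the value should be the most specific domain that still covers the application.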

Issue 3: Missing expiration

This issue points out that the expiration attribute for the session cookie is not set. The expires attribute controls when the session cookie should expire and be removed from the client. By default, express-session creates a session cookie, which means the cookie is deleted when the browser is closed. Without an explicit expiration time, there is no control over the cookie's duration, for example if you want it to persist across browser sessions (e.g., a "Remember Me" feature).

Setting the expires attribute helps control how long a session should remain valid, which can:
- Mitigate session hijacking risks by reducing the cookie's lifetime.
- Allow sessions to persist only for a controlled duration.

To fix this issue, you must explicitly define the expiration (the expires or maxAge option) in the cookie configuration of the session middleware.
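A minimal sketch of a cookie configuration with an explicit lifetime follows. It uses express-session's maxAge option (a duration in milliseconds) rather than expires, and it also sets httpOnly, path, and secure, which are cookie options of the same middleware. The one-hour lifetime and the domain are assumptions chosen for illustration.

```javascript
const ONE_HOUR_MS = 60 * 60 * 1000;

// Builds the `cookie` section of the express-session options.
// All concrete values below are illustrative assumptions.
function buildCookieOptions() {
  return {
    domain: 'app.example.com', // hypothetical domain
    maxAge: ONE_HOUR_MS,       // cookie expires one hour after it is set
    httpOnly: true,            // not readable from client-side JavaScript
    path: '/',                 // sent for every path of the site
    secure: true,              // only sent over HTTPS
  };
}

const cookie = buildCookieOptions();
const expiresAt = new Date(Date.now() + cookie.maxAge);
console.log(`A cookie set now would expire at ${expiresAt.toISOString()}`);

// In the Express app (assuming express-session is installed):
//   app.use(session({ /* name, secret, ... */ cookie: buildCookieOptions() }));
```

Note that with secure: true the browser only returns the cookie over HTTPS, so a test setup served over plain HTTP would not keep the session.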


Issue 4: HTTP Only not set

This warning indicates that the httpOnly attribute for the session cookie is not set. The httpOnly flag prevents cookies from being accessed via client-side JavaScript, which helps mitigate Cross-Site Scripting (XSS) attacks.

What this means:
- Without httpOnly: The session cookie could be accessed by malicious scripts injected into your application through XSS vulnerabilities.
- With httpOnly: The cookie is inaccessible to JavaScript running on the client, ensuring that even if XSS occurs, the session cookie is protected from being stolen.

The previous code already fixed this issue.

Issue 5: Path not set

This warning highlights that the path attribute for the session cookie is not set. The path attribute controls the URL path where the cookie will be sent. If not set, the default is /, meaning the cookie is sent for all paths in the domain.

By default, the cookie is sent to all paths in the domain (i.e., it is available throughout the entire site). However, in some cases, you may want to restrict the cookie to specific paths (e.g., login paths, admin sections).

Security Implication: While not necessarily a critical issue, setting a specific path can limit the cookie's scope, which helps manage session cookies more securely in multi-path applications.

Explicitly define the path attribute in the cookie configuration to control where the cookie should be sent. The previous code already fixed this issue.

Issue 6: Secure attribute is not set

This warning indicates that the secure attribute for the session cookie is not set. The secure flag ensures that the session cookie is only transmitted over HTTPS, protecting it from being exposed over insecure HTTP connections.

Without setting secure: true, the cookie may be sent over HTTP connections, which is a security risk, as attackers can potentially intercept the cookie through man-in-the-middle attacks. When secure: true is set, the cookie is only sent over HTTPS connections, ensuring encrypted communication and better protection for sensitive information.

The previous code already fixed this issue.

Issue 7: Hard-Coded Credentials

This warning indicates that Semgrep has detected a hard-coded secret in the source code. In this case, the secret used for session management (secret: 'mysecret') is hard-coded into the application. Hard-coding secrets directly into your source code is considered a security risk because it exposes the secret to anyone with access to the source code, increasing the likelihood of it being compromised.

Instead of hard-coding the session secret in the code, it is recommended that it be stored in an environment variable or managed with a secret management service like AWS Secrets Manager, HashiCorp Vault, or a Hardware Security Module (HSM).

A simple solution is to store the secret in an environment variable. For example, you can set the secret in a .env file:

.env file:

    SESSION_SECRET=mysecret

Our code must be refactored accordingly.
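A minimal sketch of that refactoring is shown below, assuming the variable is named SESSION_SECRET as in the .env example. The getSessionSecret helper is hypothetical; it fails fast when the variable is missing so the application never falls back to a default secret.

```javascript
// Reads the session secret from the environment instead of the source code.
// `env` defaults to process.env; it is a parameter only to make the function easy to test.
function getSessionSecret(env = process.env) {
  const secret = env.SESSION_SECRET;
  if (!secret) {
    throw new Error('SESSION_SECRET is not set');
  }
  return secret;
}

// Example with an explicit environment object (placeholder value).
const secret = getSessionSecret({ SESSION_SECRET: 'placeholder-value' });
console.log(`Secret loaded (${secret.length} characters)`);

// In the Express app (assuming express-session is installed):
//   app.use(session({ secret: getSessionSecret(), /* other options */ }));
```

Node.js does not read a .env file on its own. A common approach is the dotenv package (require('dotenv').config()), and recent Node.js versions also accept node --env-file=.env. The .env file itself should be kept out of version control.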


Issue 8: Tainted Code

This warning indicates that user input (in this case, the username and password variables) is being directly inserted into a SQL query string. This practice is highly vulnerable to SQL injection attacks, where an attacker can manipulate the query by injecting malicious SQL code, potentially allowing them to access or modify sensitive data in the database.

Tainted SQL String: The SQL query is manually constructed by embedding user input into the query string. This makes it possible for an attacker to inject SQL code (e.g., by entering username=' OR '1'='1'), which could compromise the database.

SQL Injection Risk: SQL injection is one of the most common and severe vulnerabilities, allowing attackers to bypass authentication, retrieve sensitive data, and perform harmful operations on the database.

The best way to prevent SQL injection is to use parameterized queries (prepared statements). Modern database libraries (including MySQL's Node.js client) support parameterized queries, which automatically handle escaping and sanitizing user input. The code is refactored to use a parameterized query.

Issue 9: XSS Vulnerability

This warning highlights a potential cross-site scripting (XSS) vulnerability due to the direct writing of user input (username) into the HTML response. The problem arises because the res.send() call writes the user-provided data into the HTML without any escaping, allowing an attacker to inject malicious scripts if the input is not properly sanitized.

If an attacker submits a specially crafted value for username (e.g., <script>alert('XSS')</script>), this could lead to XSS attacks where malicious JavaScript is executed in the context of the user's browser. Attackers can use XSS vulnerabilities to steal session cookies, perform actions on behalf of users, or deface a website.

Instead of injecting unescaped data directly into the HTML, you should escape the user input to prevent script execution. Additionally, rendering the page through a templating engine with res.render() is safer, as the engine can automatically escape data to prevent XSS.

Example Fix Using res.render() (with a Template Engine):
- Ensure you use a templating engine like EJS, Pug, or Handlebars in your Express app.
- Render the HTML page using res.render(), which will safely escape any user-provided data.

The refactored code is shown in the following Figure.


Remember to update welcome.ejs:

    <h1>Welcome, <%= username %></h1>

In EJS, the <%= %> tag HTML-escapes the value before inserting it into the page.

Issue 10: Raw HTML Injection

This warning highlights a potential Cross-Site Scripting (XSS) vulnerability because user-provided data (username) is directly inserted into the HTML without proper sanitization. By directly embedding user input in the HTML response (e.g., using res.send()), you bypass any HTML escaping mechanisms, potentially exposing the application to XSS attacks.

User input (username) is directly embedded within HTML tags (<h1>), which can lead to the execution of malicious JavaScript code if the user submits harmful content (e.g., <script>alert('XSS')</script>). If an attacker controls the input, they could inject malicious code that executes within the user's browser. This could allow them to steal session cookies, perform actions on behalf of the user, or deface the site.

To prevent XSS vulnerabilities:
- Sanitize HTML: Use a sanitization library such as DOMPurify to ensure user input is sanitized before being injected into HTML.
- Use Template Engines: Alternatively, use a templating engine like EJS, Pug, or Handlebars, which automatically escapes user-provided data.

The previous code already fixed this issue.
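To show what escaping does to a malicious value, here is a minimal hand-rolled helper. It is only an illustration of the idea: it is neither DOMPurify nor the escaping routine of any template engine, and in the lab application the template engine should do this work.

```javascript
// Replaces the characters that have special meaning in HTML with entities.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

const username = "<script>alert('XSS')</script>";
const page = `<h1>Welcome, ${escapeHtml(username)}</h1>`;
console.log(page);
```

After escaping, the browser displays the submitted text literally instead of interpreting it as a script element.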

Issue 11: SQL Injection

This warning indicates that user input (userId) is being directly inserted into a SQL query string without proper sanitization, which exposes the application to SQL injection vulnerabilities. Directly concatenating user input into a SQL query allows an attacker to manipulate the query and potentially gain unauthorized access to or modify the database. For example, an attacker might submit userId=1 OR 1=1, which could modify the query to return data for all users instead of just one.

Here, userId is directly embedded into the SQL query, making it vulnerable to SQL injection attacks if the input is not sanitized.

To prevent SQL injection, you must use parameterized queries. Most modern database libraries, including mysql in Node.js, support parameterized queries. This ensures that user input is treated as data and not executable code. The refactored code is shown next.

In this example, the ? is a placeholder that will be replaced with userId in a way that safely escapes any special characters, protecting the query from SQL injection.
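A minimal sketch of a parameterized lookup follows. It is not the lab's actual code: the db object is a stand-in that mimics the query(sql, values, callback) call shape of the mysql client and only records what it receives, and the table and column names are assumptions.

```javascript
// Hypothetical stand-in for a database client. It records the SQL text and the
// parameter values separately, the way a driver receives them.
const db = {
  calls: [],
  query(sql, params, callback) {
    this.calls.push({ sql, params });
    callback(null, []); // pretend the query returned no rows
  },
};

function findUserById(userId, callback) {
  // The ? placeholder keeps user input out of the SQL text.
  db.query('SELECT * FROM users WHERE id = ?', [userId], callback);
}

findUserById('1 OR 1=1', (err, rows) => {
  if (err) throw err;
  console.log(db.calls[0].sql);    // the SQL text is unchanged by the input
  console.log(db.calls[0].params); // the input travels as a separate value
});
```

The same approach applies to the login query from Issue 8: each user-supplied value gets its own ? placeholder and is passed in the values array.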