Are You Sure Your Regular Expression Is Secure? ☠️

Node.js security: regular expression denial of service

Regular expression Denial of Service

A regular expression is a pattern, written in a compact syntax, that a program uses to search or validate text. The regex engine checks the input against the pattern and, for some patterns, may have to try many different ways of matching before it can give an answer. Regular expression Denial of Service (ReDoS) can happen when you have a vulnerable regular expression on which your regex engine might take exponential time for certain inputs.

When you match such an input against a vulnerable regular expression, the event loop stays busy with the match and your server cannot respond to any other request until it finishes.

Let’s take an example. The regular expression below matches a string that starts with one or more groups of (one or more lowercase letters followed by any single character) and ends with one or more letters.

const str = 'a7a7Hcc'; // sample input
const regEx = /^([a-z]+.)+[A-Za-z]+$/;
const found = str.match(regEx); // match array, or null if there is no match
console.log(found);

Now let’s assume your server uses the above regex to validate input from a user. If a user passes something like “a7a7Hcc” there won’t be a problem: the match succeeds quickly. The problem starts when an attacker crafts an input that keeps the engine busy for so long that the server stops responding.

Let’s build an input that could make the engine run, for all practical purposes, forever.

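// Build 101 lowercase letters followed by a character that forces a mismatch.
let str = 'a';
for (let i = 0; i < 100; i++) {
  str += 'a';
}
str = str + '!';
const regEx = /^([a-z]+.)+[A-Za-z]+$/;
// Warning: this call will not finish in any reasonable time.
const found = str.match(regEx);
console.log(found); // effectively never reached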

So if the attacker can come up with something very long, like a hundred or so a characters with a single ! at the end (aaaaaaaa…a!), the match turns into a very long-running loop because there are so many paths to explore, and it will effectively take your server down 💥

Regex denial of service is triggered when the input does not match but Node.js can’t be certain of that until it has tried many paths through the input string.
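To see how quickly the cost grows, here is a small sketch that times the regex above against inputs of increasing length. The helper name timeFailedMatch and the chosen lengths are my own assumptions for illustration, and the exact timings will differ from machine to machine.

```javascript
const { performance } = require('node:perf_hooks');

// The vulnerable pattern from the example above.
const regEx = /^([a-z]+.)+[A-Za-z]+$/;

// Hypothetical helper: time one failed match against n 'a's followed by '!'.
function timeFailedMatch(n) {
  const input = 'a'.repeat(n) + '!';
  const start = performance.now();
  const matched = regEx.test(input);
  const elapsedMs = performance.now() - start;
  return { n, matched, elapsedMs };
}

// Keep n small on purpose: each extra character multiplies the work,
// so much larger values may not finish in a reasonable time.
const results = [10, 15, 20, 25, 30].map(timeFailedMatch);
for (const { n, matched, elapsedMs } of results) {
  console.log(`n=${n} matched=${matched} time=${elapsedMs.toFixed(2)}ms`);
}
```

Every run reports matched=false, but the time should climb steeply as n grows, which is the pattern an attacker relies on.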

A regex engine like the one in Node.js works by backtracking. Put simply, if a token in the pattern fails to match, the engine goes back to a previous position where it had a choice and takes a different path. The engine repeats this until it finds a match or has explored all possible paths.
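To get a feel for how many paths there can be, here is a rough count for the example regex. Each repetition of ([a-z]+.) eats at least two characters, so the sketch below counts the ways a run of n letters can be cut into chunks of two or more. The function countSplits is a hypothetical helper of mine, and the number is only a lower-bound illustration: a real engine also does extra work for every partial split it tries.

```javascript
// Hypothetical helper: number of ways ([a-z]+.)+ can cut a run of n lowercase
// letters (n >= 1) into chunks, where every chunk is at least two characters
// long (one or more letters for [a-z]+, plus one character for the dot).
function countSplits(n) {
  let prev = 1n; // ways to split 0 characters: nothing left to cut
  let curr = 0n; // ways to split 1 character: too short to form a chunk
  for (let k = 2; k <= n; k++) {
    // Either the last chunk has exactly two characters (a split of k - 2),
    // or it is longer and shrinking it by one gives a split of k - 1.
    [prev, curr] = [curr, curr + prev];
  }
  return curr;
}

console.log(String(countSplits(10)));  // 34
console.log(String(countSplits(101))); // 354224848179261915075
```

With 101 letters there are already more than 10^20 ways to split the input, which is why the match on the attack string never comes back.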

Many people assume that Node.js never blocks. But regex matching runs synchronously on the event loop, so even your favorite Node.js asynchronous feature won’t save you.
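Here is a minimal sketch of that blocking. It schedules a timer for 0 ms, runs a slow match right after, and reports how late the timer actually fired. The function measureTimerDelay and the input length 30 are assumptions chosen so the demo finishes quickly; they are not from any library.

```javascript
const { performance } = require('node:perf_hooks');

const regEx = /^([a-z]+.)+[A-Za-z]+$/;

// Hypothetical demo: schedule a timer for "as soon as possible", then run a
// slow regex synchronously and measure how late the timer actually fires.
function measureTimerDelay(n) {
  return new Promise((resolve) => {
    const scheduledAt = performance.now();
    let regexMs = 0;
    setTimeout(() => {
      resolve({ timerDelayMs: performance.now() - scheduledAt, regexMs });
    }, 0);
    // This runs on the event loop thread, so the timer callback has to wait.
    const start = performance.now();
    regEx.test('a'.repeat(n) + '!');
    regexMs = performance.now() - start;
  });
}

measureTimerDelay(30).then(({ timerDelayMs, regexMs }) => {
  console.log(`regex took ${regexMs.toFixed(1)} ms`);
  console.log(`0 ms timer fired after ${timerDelayMs.toFixed(1)} ms`);
});
```

The timer can never fire before the match is done, and the same goes for incoming HTTP requests. Wrapping the match in a Promise or an async function does not change this, because the regex still executes on the same thread.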

The solution can be to write safe regexes that avoid “evil” constructs: a repeated group (+ or *) whose contents are themselves repeated or contain overlapping alternatives (|). Examples of evil patterns:

  • (a+)+

  • ([a-zA-Z]+)*

  • (a|aa)+

  • (a|a?)+

  • (.*a){x} for x > 10
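As a rough illustration of the nested-quantifier case, the sketch below compares an anchored version of (a+)+ with a flat pattern that accepts exactly the same strings. The names evilRegex, safeRegex and timeTest are mine, and the input length is kept small so the slow case still finishes.

```javascript
const { performance } = require('node:perf_hooks');

const evilRegex = /^(a+)+$/; // nested quantifier: many ways to split a run of a's
const safeRegex = /^a+$/;    // accepts the same strings with only one way to match

// Hypothetical helper: run one match and report the result and elapsed time.
function timeTest(regex, input) {
  const start = performance.now();
  const matched = regex.test(input);
  return { matched, elapsedMs: performance.now() - start };
}

// A run of a's followed by '!' forces both patterns to fail.
const input = 'a'.repeat(22) + '!';
console.log('evil:', timeTest(evilRegex, input));
console.log('safe:', timeTest(safeRegex, input));
```

Both patterns give the same answer on every input, but the flat one leaves the engine nothing to backtrack over, so its running time should stay small however long the input gets.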

If you happen to have one of those in your regex you might want to test it before using it. There are some tools to check your regexps for safety, like safe-regex.

You can check whether a regular expression is safe by running node safe.js 'regex', where safe.js is a small script that passes its argument to safe-regex and prints the result:

$ node safe.js '(x+x+)+y'
false
$ node safe.js '(beep|boop)*'
true
$ node safe.js '(a+){10}'
false
$ node safe.js '\blocation\s*:[^:\n]+\b(Oakland|San Francisco)\b'
true

There are still some vulnerable expressions that safe-regex can’t catch, so you might still need to test your regexes against hostile inputs yourself, especially if you are accepting input from a user. In addition to this, if you are just doing a simple string match, use indexOf or the local equivalent. It will be cheaper and, for a fixed search string, will never take more than O(n), while regex matching can take O(2^n).
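For the simple case, a plain substring check could look like the sketch below. The helper containsLiteral is a hypothetical name; indexOf itself is the standard String method.

```javascript
// Hypothetical helper: literal substring check without any regex.
function containsLiteral(haystack, needle) {
  return haystack.indexOf(needle) !== -1;
}

console.log(containsLiteral('location: Oakland', 'Oakland')); // true
console.log(containsLiteral('a'.repeat(100) + '!', 'b'));     // false
```

Note that the attack string from earlier is harmless here: there is no pattern to backtrack over, only a scan through the input.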

— — — — — — — — — — — — — — — — — —

Follow me on LinkedIn to get the full advanced Node.js course I will release soon.