Meet the JavaScript AST

Apr 12, 2018

I was developing a tool to inlinify the external Web Worker references in a JavaScript file. It finds every external Web Worker reference such as new Worker('worker.js'), reads the content of worker.js, and rewrites the external reference as an inline one: new Worker(window.URL.createObjectURL(new Blob(["/* the code of worker.js */"]))). The tool sounds simple, and I first considered using a regular expression to extract the path from each external Web Worker reference. It turned out that a regular expression cannot cover many of the edge cases. In the end, a JavaScript AST (Abstract Syntax Tree) parser was the lifesaver.

First, let's take a look at the simplest scenario:

var myWorker = new Worker('worker.js');

This one is simple: the regular expression new Worker\('(.*\.js)'\) captures the external worker reference we need, worker.js.

But if the string is inside double quotes, that regular expression no longer matches.

var myWorker = new Worker("worker.js");

This is still easy to fix with a backreference: new Worker\((['"])(.*\.js)\1\) works for both single and double quotes. The path is now in the second capture group, because the first one holds the quote character.

But what if there is a query string?

var myWorker = new Worker('worker.js?v=1.0');

We need a longer regular expression for this, such as new Worker\((['"])(.*\.js)(\?.*)?\1\). But things get more complicated. Consider the following unusual syntax:

var myWorker = new Worker('worker.js?t=' + new Date().getTime());
var myWorker2 = new Worker('worker.js?isIE=' + (isIE() ? 'true' : 'false'));
var myWorker3 = new Worker(
    'worker.js' +
    '?v=1.0')

Unusual as they are, they are all valid JavaScript expressions, and each one contains an external reference to worker.js that should be inlinified.

I thought about not supporting these edge cases. But it got worse when I found that the regular expression also matches the following string definition. It is an edge case, but it is definitely a bug: the text is a string literal, not a Worker construction. This made me think a regular expression might not work at all.

var str = "new Worker('worker.js')"
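To make the failure modes concrete, here is a small sketch that runs the last regular expression from above against the samples in this post. The names pattern, samples and matchWorker are only for illustration.

```javascript
// The last pattern from the text: group 1 is the quote, group 2 the .js path,
// group 3 an optional query string.
const pattern = /new Worker\((['"])(.*\.js)(\?.*)?\1\)/;

const samples = {
  single: "var myWorker = new Worker('worker.js');",
  double: 'var myWorker = new Worker("worker.js");',
  query: "var myWorker = new Worker('worker.js?v=1.0');",
  concat: "var myWorker = new Worker('worker.js?t=' + new Date().getTime());",
  insideString: "var str = \"new Worker('worker.js')\"",
};

// Return the captured path, or null when the pattern does not match.
function matchWorker(source) {
  const m = pattern.exec(source);
  return m ? m[2] : null;
}

for (const [name, source] of Object.entries(samples)) {
  console.log(name, '->', matchWorker(source));
}
```

The first three samples yield worker.js. The concatenation sample is missed, because the closing quote is not followed directly by the closing parenthesis, and the string definition is matched even though no worker is created there.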

What I needed was not a regular expression but a JavaScript parser: something that parses the code, finds each new expression on Worker, and returns its literal argument.

I had heard that a JavaScript Abstract Syntax Tree represents the syntactic structure of JavaScript code. After testing on AST explorer, I was sure this was what I was looking for. An AST parser finds all new Worker expressions and their arguments, with accurate positions. For the three unusual examples above, it produces:

[
    ...
    {
        ...
        "init": {
            "type": "NewExpression",
            "start": 15,
            "end": 64,
            "callee": {
                "type": "Identifier",
                "start": 19,
                "end": 25,
                "name": "Worker"
            },
            "arguments": [
                {
                    "type": "BinaryExpression",
                    "start": 26,
                    "end": 63,
                    "left": {
                        "type": "Literal",
                        "start": 26,
                        "end": 40,
                        "value": "worker.js?t=",
                        "raw": "'worker.js?t='"
                    },
                    "operator": "+",
                    "right": {
                        ...
                    }
                }
            ]
        }
    },
    {
        ...
        "init": {
            "type": "NewExpression",
            "start": 82,
            "end": 141,
            "callee": {
                "type": "Identifier",
                "start": 86,
                "end": 92,
                "name": "Worker"
            },
            "arguments": [
                {
                    "type": "BinaryExpression",
                    "start": 93,
                    "end": 140,
                    "left": {
                        "type": "Literal",
                        "start": 93,
                        "end": 110,
                        "value": "worker.js?isIE=",
                        "raw": "'worker.js?isIE='"
                    },
                    "operator": "+",
                    "right": {
                        ...
                    }
                }
            ]
        }
    },
    {
        ...
        "init": {
            "type": "NewExpression",
            "start": 159,
            "end": 201,
            "callee": {
                "type": "Identifier",
                "start": 163,
                "end": 169,
                "name": "Worker"
            },
            "arguments": [
                {
                    "type": "BinaryExpression",
                    "start": 175,
                    "end": 200,
                    "left": {
                        "type": "Literal",
                        "start": 175,
                        "end": 186,
                        "value": "worker.js",
                        "raw": "'worker.js'"
                    },
                    "operator": "+",
                    "right": {
                        "type": "Literal",
                        "start": 193,
                        "end": 200,
                        "value": "?v=1.0",
                        "raw": "'?v=1.0'"
                    }
                }
            ]
        }
    }
]
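Given a tree like this, collecting the worker references is a matter of walking the nodes. The following is a minimal sketch, not the code of my tool: walk, leadingString and findWorkerReferences are hypothetical helper names, and the sample tree is hand-written so the snippet runs without any dependency. In practice the tree would come from a parser, for example acorn.parse.

```javascript
// Recursively visit every node of an ESTree-style AST (plain objects and arrays).
function walk(node, visit) {
  if (Array.isArray(node)) {
    node.forEach((child) => walk(child, visit));
  } else if (node && typeof node === 'object') {
    if (typeof node.type === 'string') visit(node);
    Object.values(node).forEach((child) => walk(child, visit));
  }
}

// Follow the left side of a "+" chain down to its first string literal, if any.
function leadingString(node) {
  while (node && node.type === 'BinaryExpression' && node.operator === '+') {
    node = node.left;
  }
  return node && node.type === 'Literal' && typeof node.value === 'string' ? node : null;
}

// Collect the file name and literal position of every `new Worker(<string> ...)`.
function findWorkerReferences(ast) {
  const found = [];
  walk(ast, (node) => {
    if (node.type !== 'NewExpression') return;
    if (node.callee.type !== 'Identifier' || node.callee.name !== 'Worker') return;
    const literal = leadingString(node.arguments[0]);
    if (literal) {
      // Drop any query string to get the file to read.
      found.push({ file: literal.value.split('?')[0], start: literal.start, end: literal.end });
    }
  });
  return found;
}

// Hand-written sample: the third node from the listing above, plus the
// string-definition case, which is a plain Literal and must be ignored.
const sampleAst = [
  {
    type: 'NewExpression', start: 159, end: 201,
    callee: { type: 'Identifier', start: 163, end: 169, name: 'Worker' },
    arguments: [{
      type: 'BinaryExpression', start: 175, end: 200,
      left: { type: 'Literal', start: 175, end: 186, value: 'worker.js', raw: "'worker.js'" },
      operator: '+',
      right: { type: 'Literal', start: 193, end: 200, value: '?v=1.0', raw: "'?v=1.0'" },
    }],
  },
  { type: 'Literal', value: "new Worker('worker.js')", raw: "\"new Worker('worker.js')\"" },
];

const workerRefs = findWorkerReferences(sampleAst);
console.log(workerRefs);
```

Because the string definition is a Literal node rather than a NewExpression, it is never reported, which is exactly the case the regular expression got wrong.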

Eventually, I chose to use Acorn as the JavaScript parser and finished my tool: https://github.com/js1016/worker-inlinify.
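For illustration only, the rewriting step described at the top of this post could be sketched as below. This is a simplified assumption about how the positions might be used, not the actual implementation of worker-inlinify: inlineWorkers and the readFile stub are hypothetical, and each range is assumed to cover the whole argument of new Worker(...).

```javascript
// Replace each [start, end) range with an inline Blob URL expression.
// Ranges are applied from last to first so earlier offsets stay valid.
function inlineWorkers(source, refs, readFile) {
  let output = source;
  [...refs].sort((a, b) => b.start - a.start).forEach((ref) => {
    // JSON.stringify yields a safely quoted and escaped string literal.
    const code = JSON.stringify(readFile(ref.file));
    const inline = `window.URL.createObjectURL(new Blob([${code}]))`;
    output = output.slice(0, ref.start) + inline + output.slice(ref.end);
  });
  return output;
}

const source = "var myWorker = new Worker('worker.js');";
const argument = "'worker.js'";
const start = source.indexOf(argument); // in practice taken from the AST node
const refs = [{ file: 'worker.js', start, end: start + argument.length }];

// Hypothetical stand-in for reading worker.js from disk.
const readFile = () => "postMessage('hi');";

const inlined = inlineWorkers(source, refs, readFile);
console.log(inlined);
```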
