I was developing a tool to inline the external Web Worker references in a JavaScript file. It finds every external Web Worker reference such as new Worker('worker.js'), reads the content of worker.js, and finally transforms the external reference into an inline one such as new Worker(window.URL.createObjectURL(new Blob(["/* the code of worker.js */"]))). The tool's job seems very simple, so I considered using a regular expression to extract the value of each external Web Worker reference. However, it turned out that a regular-expression solution cannot cover many edge cases. In the end, a JavaScript AST (Abstract Syntax Tree) parser became the lifesaver.
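As a rough illustration of the target form, the helper below builds the inline expression from a worker's source code. toInlineWorker is a hypothetical name, not part of worker-inlinify's API. Using JSON.stringify to turn the code into a string literal is an assumption of this sketch; it escapes quotes, backslashes and newlines so that the generated literal stays valid.

```javascript
// Hypothetical helper (not worker-inlinify's API): build the inline form of a
// `new Worker(...)` expression from the worker's source code.
function toInlineWorker(workerCode) {
  // JSON.stringify escapes quotes, backslashes and newlines, producing a
  // double-quoted string literal that can be embedded in the output code.
  const literal = JSON.stringify(workerCode);
  return `new Worker(window.URL.createObjectURL(new Blob([${literal}])))`;
}

console.log(toInlineWorker("self.onmessage = e => postMessage(e.data);"));
// new Worker(window.URL.createObjectURL(new Blob(["self.onmessage = e => postMessage(e.data);"])))
```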
First, let's take a look at the simplest scenario:
var myWorker = new Worker('worker.js');
That is quite simple: the regular expression new Worker\('(.*\.js)'\) captures the external worker reference we need, worker.js.
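The short sketch below checks the naive pattern (the variable names are illustrative). The capture group returns the file name, and the match fails as soon as the quote style changes:

```javascript
// The naive pattern: a single-quoted path ending in .js, captured in group 1.
const simplePattern = /new Worker\('(.*\.js)'\)/;

const singleQuoted = "var myWorker = new Worker('worker.js');".match(simplePattern);
console.log(singleQuoted[1]); // worker.js

// The same reference written with double quotes is not matched at all.
const doubleQuoted = 'var myWorker = new Worker("worker.js");'.match(simplePattern);
console.log(doubleQuoted); // null
```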
But if the string is enclosed in double quotes, that regular expression no longer matches:
var myWorker = new Worker("worker.js");
This is still easy to handle with a backreference. In new Worker\((['"])(.*\.js)\1\), the \1 requires the closing quote to be the same as the opening one, so the pattern works for both single and double quotes.
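A small sketch exercising the backreference version. Group 1 records which quote opened the argument, and \1 accepts only that same character as the closing quote, so a mismatched pair is rejected:

```javascript
// Group 1 captures the opening quote; \1 demands the same quote to close it.
const quotePattern = /new Worker\((['"])(.*\.js)\1\)/;

const inputs = [
  "var myWorker = new Worker('worker.js');",
  'var myWorker = new Worker("worker.js");',
  "var myWorker = new Worker('worker.js\");", // mismatched quotes
];

// Extract the captured file name (group 2), or null when nothing matches.
const files = inputs.map((src) => {
  const m = src.match(quotePattern);
  return m ? m[2] : null;
});
console.log(files); // [ 'worker.js', 'worker.js', null ]
```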
But what if there is a query string?
var myWorker = new Worker('worker.js?v=1.0');
We need a longer regular expression, such as new Worker\((['"])(.*\.js)(\?.*)?\1\). However, things get more complicated with the following unusual syntax:
var myWorker = new Worker('worker.js?t=' + new Date().getTime());
var myWorker2 = new Worker('worker.js?isIE=' + (isIE() ? 'true' : 'false'));
var myWorker3 = new Worker(
    'worker.js' +
    '?v=1.0')
Unusual as they are, these are all valid JavaScript expressions, and each contains an external reference to worker.js that should be inlined.
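The sketch below runs the query-string-aware pattern against the regular case and the three unusual ones. The last sample reproduces the multi-line example with its indentation. Two references are missed entirely. In the first, no quote directly followed by ) appears anywhere after worker.js. The third starts with a line break, but the pattern requires a quote right after new Worker(. The second is matched, but the match ends at the ) that closes (isIE() ? ...), so replacing it would leave a stray parenthesis behind.

```javascript
// The query-string-aware pattern from the text.
const queryPattern = /new Worker\((['"])(.*\.js)(\?.*)?\1\)/;

const samples = [
  "var myWorker = new Worker('worker.js?v=1.0');",
  "var myWorker = new Worker('worker.js?t=' + new Date().getTime());",
  "var myWorker2 = new Worker('worker.js?isIE=' + (isIE() ? 'true' : 'false'));",
  "var myWorker3 = new Worker(\n    'worker.js' +\n    '?v=1.0')",
];

// The exact text a regex-based tool would replace in each sample (or null).
const matched = samples.map((src) => {
  const m = src.match(queryPattern);
  return m ? m[0] : null;
});
console.log(matched);
// Expected, element by element:
//   "new Worker('worker.js?v=1.0')"                              (correct)
//   null                                                         (missed)
//   "new Worker('worker.js?isIE=' + (isIE() ? 'true' : 'false')" (wrong span)
//   null                                                         (missed)
```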
I considered simply not supporting these edge cases. But things got worse when I found that the regular expression also matches the following string definition. It is an edge case, but it is definitely a bug, and it made me suspect that a regular expression might not work at all.
var str = "new Worker('worker.js')"
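The false positive is easy to reproduce with a short sketch that reuses the query-string-aware pattern. The pattern only sees characters, so it cannot tell that this new Worker(...) sits inside a string literal and will never run:

```javascript
const pattern = /new Worker\((['"])(.*\.js)(\?.*)?\1\)/;

// A plain string that merely contains the text of a Worker constructor call.
const code = `var str = "new Worker('worker.js')"`;

const m = code.match(pattern);
console.log(m[2]); // worker.js, a false positive
```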
What I needed was not a regular expression but a JavaScript parser: something that parses the code, finds the new expressions whose callee is Worker, and gets their literal arguments.
I had heard that a JavaScript Abstract Syntax Tree represents the syntactic structure of JavaScript code. After testing on AST Explorer, I was sure this was exactly what I was looking for. An AST parser finds every new Worker expression and its arguments, with accurate source positions, as shown below for the three examples above:
[
  ...
  {
    ...
    "init": {
      "type": "NewExpression",
      "start": 15,
      "end": 64,
      "callee": {
        "type": "Identifier",
        "start": 19,
        "end": 25,
        "name": "Worker"
      },
      "arguments": [
        {
          "type": "BinaryExpression",
          "start": 26,
          "end": 63,
          "left": {
            "type": "Literal",
            "start": 26,
            "end": 40,
            "value": "worker.js?t=",
            "raw": "'worker.js?t='"
          },
          "operator": "+",
          "right": {
            ...
          }
        }
      ]
    }
  },
  {
    ...
    "init": {
      "type": "NewExpression",
      "start": 82,
      "end": 141,
      "callee": {
        "type": "Identifier",
        "start": 86,
        "end": 92,
        "name": "Worker"
      },
      "arguments": [
        {
          "type": "BinaryExpression",
          "start": 93,
          "end": 140,
          "left": {
            "type": "Literal",
            "start": 93,
            "end": 110,
            "value": "worker.js?isIE=",
            "raw": "'worker.js?isIE='"
          },
          "operator": "+",
          "right": {
            ...
          }
        }
      ]
    }
  },
  {
    ...
    "init": {
      "type": "NewExpression",
      "start": 159,
      "end": 201,
      "callee": {
        "type": "Identifier",
        "start": 163,
        "end": 169,
        "name": "Worker"
      },
      "arguments": [
        {
          "type": "BinaryExpression",
          "start": 175,
          "end": 200,
          "left": {
            "type": "Literal",
            "start": 175,
            "end": 186,
            "value": "worker.js",
            "raw": "'worker.js'"
          },
          "operator": "+",
          "right": {
            "type": "Literal",
            "start": 193,
            "end": 200,
            "value": "?v=1.0",
            "raw": "'?v=1.0'"
          }
        }
      ]
    }
  }
]
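Below is a minimal, dependency-free sketch of the search step. It assumes an ESTree-shaped tree like the output above, which is the node format Acorn produces. findWorkerReferences and leadingString are hypothetical names. The rule of taking the left-most string literal of a + chain and dropping its query string is an assumption about how the worker file could be resolved, not necessarily what worker-inlinify does.

```javascript
// Hypothetical sketch: find every `new Worker(...)` call in an ESTree AST
// (the node format shown above) and record the worker file and source range.
function findWorkerReferences(ast) {
  const found = [];

  // Follow the left operand of `+` chains down to the first string literal,
  // e.g. 'worker.js' + '?v=1.0' -> 'worker.js'. Anything else yields null.
  function leadingString(arg) {
    if (arg.type === 'Literal' && typeof arg.value === 'string') return arg.value;
    if (arg.type === 'BinaryExpression' && arg.operator === '+') return leadingString(arg.left);
    return null;
  }

  // Generic recursive walk over every object reachable from the root.
  function visit(node) {
    if (Array.isArray(node)) {
      node.forEach((child) => visit(child));
      return;
    }
    if (node === null || typeof node !== 'object') return;

    if (
      node.type === 'NewExpression' &&
      node.callee.type === 'Identifier' &&
      node.callee.name === 'Worker' &&
      node.arguments.length > 0
    ) {
      const url = leadingString(node.arguments[0]);
      if (url !== null) {
        // Drop any query string: 'worker.js?t=' -> 'worker.js'.
        found.push({ file: url.split('?')[0], start: node.start, end: node.end });
      }
    }

    for (const key of Object.keys(node)) visit(node[key]);
  }

  visit(ast);
  return found;
}

// Usage: a hand-written tree for the third example, positions copied from above.
const ast = {
  type: 'Program',
  body: [{
    type: 'VariableDeclaration',
    declarations: [{
      type: 'VariableDeclarator',
      id: { type: 'Identifier', name: 'myWorker3' },
      init: {
        type: 'NewExpression',
        start: 159,
        end: 201,
        callee: { type: 'Identifier', name: 'Worker' },
        arguments: [{
          type: 'BinaryExpression',
          operator: '+',
          left: { type: 'Literal', value: 'worker.js' },
          right: { type: 'Literal', value: '?v=1.0' },
        }],
      },
    }],
  }],
};

const [ref] = findWorkerReferences(ast);
console.log(ref.file, ref.start, ref.end); // worker.js 159 201
```

The string from the false-positive example parses as a single Literal node rather than a NewExpression, so a search over the tree never mistakes it for a constructor call.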
Eventually, I chose to use Acorn as the JavaScript parser and finished my tool: https://github.com/js1016/worker-inlinify.
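The sketch below covers the final rewrite step. It assumes the tool splices the replacement text into the source using the start/end offsets the parser reports. With Acorn, acorn.parse(source, { ecmaVersion: 2020 }) returns a tree in which every node carries these offsets. applyReplacements is a hypothetical helper, not the actual code of worker-inlinify. The edits are applied from the last offset to the first. This keeps the earlier offsets valid, because changing the tail of the string never moves the text before it.

```javascript
// Hypothetical helper: replace each [start, end) range of `source` with `text`.
// All offsets refer to the ORIGINAL source (as a parser reports them) and the
// ranges are assumed not to overlap.
function applyReplacements(source, replacements) {
  // Work from the highest start offset down so earlier offsets stay valid.
  const sorted = [...replacements].sort((a, b) => b.start - a.start);
  let result = source;
  for (const { start, end, text } of sorted) {
    result = result.slice(0, start) + text + result.slice(end);
  }
  return result;
}

// Example: the `new Worker('worker.js')` expression spans offsets 15..38.
const source = "var myWorker = new Worker('worker.js');";
const output = applyReplacements(source, [
  {
    start: 15,
    end: 38,
    text: 'new Worker(window.URL.createObjectURL(new Blob(["/* worker code */"])))',
  },
]);
console.log(output);
// var myWorker = new Worker(window.URL.createObjectURL(new Blob(["/* worker code */"])));
```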