In software development, choosing the most efficient method to parse large datasets is crucial for performance. I conducted an experiment to compare two different approaches for counting line breaks in a large text file using Node.js: using
indexOf and manual byte-by-byte checking.
I was recently developing a text file parser based on File Reader API in browsers. To prevent the heavy file parsing job from blocking the UI thread, using Web Worker to parse the file would be the best choice. Web Worker needs to parse the text file into lines and transfer each line to the main thread. We will encounter performance issues on transfer speed and memory usage when Web Worker needs to pass hundreds of thousands, or even millions of lines. Most browsers implement the structured clone algorithm that allows you to pass more complex types in/out of Web Worker such as File, Blob, ArrayBuffer, and JSON objects. However, when passing these types of data using
postMessage(), a copy is still made. Therefore, if you are passing a large 100MB file, there's a noticeable overhead in getting that file between the worker and the main thread. To solve this problem,
postMessage() method was designed to support passing Transferable Objects. When passing Transferable Objects, data is transferred from one context (worker thread or main thread) to another without additional memory consumption, and the transfer speed is really quick since no copy is made. However I still see notable performance issue when Web Worker needs to pass massive Transferable Objects.
FileReader API in HTML5 allows web browsers to access user files without uploading the files to the web servers. It not only lightens the load of web server but also saves the time of uploading files. It is very easy to use
FileReader.readAsText() to process a 300K log file. However, when the file size grows to 1GB, or even 2GB, then the page might crash in browser or not function correctly. This is because
readAsText() loads the full file into memory before processing it. As a result, the process memory exceeds the limitation. To prevent this issue, we should use
FileReader.readAsArrayBuffer() to stream the file when the web application needs to process huge files, so you only hold a part of the file in the memory.
We need to take special care of the performance while hooking up to the onscroll event as it fires more frequently with shorter interval time comparing to other events like mouse and keyboard. The FPS drops if your onscroll event contains elaborate logic or animations. In addition, user usually scrolls the mouse wheel continuously, it may aggravate the FPS drop, increase the browser CPU usage and impact the user experience. Developers sometimes don't notice such kind of issue since most development machines have better performance than normal user machines. Next, let's talk about how to prevent the performance issue caused by onscroll event.
The browser sends http header: If-Modified-Since and If-None-Match to check whehter a static file has been updated when it fetches the file again. If the file has not been updated, the server would send 304 Not Modified HTTP header message and the browser would use cached version instead of downloading the file again. This is the default cache mechanism of most web servers, however there is some room for improvement in the web performance. The default cache behavior still requires establishing HTTP connection between the client and the server in order to check whether the file has been modified. It may takes the browser tens to hundreds of milliseconds to talk to the server and it also consumes server resources.