Challenges for PAC script servers in an ecosystem where Chromium is ubiquitous

With Microsoft's release of the Chromium-based Edge and the Edge WebView2 Runtime, Chromium plays an increasingly important role on Windows desktops. On an ordinary workday, a user who opens Outlook to send and receive email, joins meetings in Teams, and browses the web with Edge already has three Chromium instances running simultaneously. In a corporate environment where the system is configured to use a PAC (Proxy Auto-Configuration) script, this many Chromium instances can place significant request load on the servers that host the script.

Introduction to PAC scripts

A PAC script is a JavaScript file that tells the browser which proxy server to use, or whether to connect directly, based on the URL being accessed. In a corporate setting, PAC scripts let administrators define proxy rules dynamically, which helps manage network access policies and improve network security and efficiency.

function FindProxyForURL(url, host) {
    // If it is an internal network address, connect directly
    if (shExpMatch(host, '*.internal.example.com') || isPlainHostName(host)) {
        return 'DIRECT';
    }
    // In other cases, use the specified proxy server
    return 'PROXY proxy.example.com:8080';
}

When a browser is configured to use a PAC script, it executes the script before attempting to access any URL to decide how to connect to the target website. Specifically, the browser calls the FindProxyForURL function defined in the PAC file, passing the current URL and hostname as arguments to this function. Based on the results returned by this function, the browser decides whether to directly access the target website (DIRECT) or to access it through a specified proxy server (such as PROXY proxy.example.com:8080).

In the example script above, the shExpMatch function checks whether the target hostname matches a shell-style pattern (here, the internal domain *.internal.example.com), while the isPlainHostName function checks whether the hostname contains no dots, which typically identifies a resource on the local network. If either condition is met, the script instructs the browser to connect directly to the target address. Otherwise, the browser connects through the proxy server at proxy.example.com, port 8080. This configuration keeps internal traffic away from the external proxy while routing external access through the corporate proxy server, which improves access speed and keeps outbound traffic under central control.
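To see how the example script decides, it can be run outside a browser. The sketch below is only an approximation: the browser normally supplies shExpMatch and isPlainHostName, so the two shims here are simplified stand-ins written for local testing, and the sample hostnames are made up for illustration.

```javascript
// Simplified stand-ins for PAC helpers that a browser normally provides.
// They are assumptions for local testing, not the browser's implementation.
function isPlainHostName(host) {
    // A "plain" hostname contains no dots.
    return !host.includes('.');
}

function shExpMatch(str, pattern) {
    // Escape regex metacharacters, then map shell wildcards:
    // '*' matches any run of characters, '?' matches exactly one.
    const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, '\\$&');
    const regex = new RegExp('^' + escaped.replace(/\*/g, '.*').replace(/\?/g, '.') + '$');
    return regex.test(str);
}

// The PAC script from the article, unchanged.
function FindProxyForURL(url, host) {
    if (shExpMatch(host, '*.internal.example.com') || isPlainHostName(host)) {
        return 'DIRECT';
    }
    return 'PROXY proxy.example.com:8080';
}

// Hypothetical sample requests.
const samples = [
    ['http://wiki.internal.example.com/', 'wiki.internal.example.com'],
    ['http://intranet/', 'intranet'],
    ['https://www.example.org/', 'www.example.org'],
];
for (const [url, host] of samples) {
    console.log(host, '->', FindProxyForURL(url, host));
}
```

With these shims, the first two hosts resolve to DIRECT and the third goes through the proxy. Note that the bare name internal.example.com does not match the pattern *.internal.example.com, so under this script it would be sent to the proxy.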

How to configure a PAC script in Windows

In Windows 10/11, you can enable the PAC script by going to Settings -> Network & internet -> Proxy and turning on Use setup script. Then, enter the PAC script address in the Script address field below.

Alternatively, you can enable the PAC script through the traditional Control Panel -> Internet Options -> Connections -> LAN settings by selecting Use automatic configuration script and entering the PAC script address in the Address field below.

Who retrieves the PAC script setting configured in Windows?

The proxy settings configured on Windows (whether to use automatic proxy detection, a PAC script, or a static proxy server) are stored in the registry under the key HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Internet Settings\Connections, in the binary value DefaultConnectionSettings.

Because this key lives under HKEY_CURRENT_USER, any application running as the user has permission to read the current Windows proxy settings. However, applications generally do not read the registry directly; they call the WinHttp API WinHttpGetIEProxyConfigForCurrentUser instead.

Even though Chromium has its own network stack responsible for handling various network protocols, including HTTP access, it still uses WinHttpGetIEProxyConfigForCurrentUser on Windows to obtain the current user's proxy settings.

Excerpt from net/proxy_resolution/win/proxy_config_service_win.h:

...
// Implementation of ProxyConfigService that retrieves the system proxy
// settings.
//
// It works by calling WinHttpGetIEProxyConfigForCurrentUser() to fetch the
// Internet Explorer proxy settings.
//
// We use two different strategies to notice when the configuration has
// changed:
//
// (1) Watch the internet explorer settings registry keys for changes. When
//     one of the registry keys pertaining to proxy settings has changed, we
//     call WinHttpGetIEProxyConfigForCurrentUser() again to read the
//     configuration's new value.
//
// (2) Do regular polling every 10 seconds during network activity to see if
//     WinHttpGetIEProxyConfigForCurrentUser() returns something different.
//
// Ideally strategy (1) should be sufficient to pick up all of the changes.
// However we still do the regular polling as a precaution in case the
// implementation details of  WinHttpGetIEProxyConfigForCurrentUser() ever
// change, or in case we got it wrong (and are not checking all possible
// registry dependencies).
...

Who downloads and uses PAC script?

Since any application can retrieve the user-configured PAC script setting, applications can choose to honor it for their own network communication. In practice, most applications that need network connectivity end up downloading and using the PAC script. On Windows, these applications fall into two main categories:

Chromium-based applications

The first category consists of applications built on the Chromium engine:

  1. Browsers based on Chromium, such as Chrome and Microsoft Edge.
  2. Desktop applications developed with Edge WebView2 Runtime, such as the plugin system in Outlook, the new version of Teams, PowerBI, Quick Assist, Clipchamp, etc.
  3. Desktop applications developed with the Chromium Embedded Framework, such as Adobe Acrobat, Spotify, etc.
  4. Desktop applications developed with Electron, such as VS Code, Atom, QQ, Slack, etc.

Below are examples of the User-agent strings that Chromium-based applications send when downloading the PAC script. Keep in mind that Edge WebView2 Runtime, CEF, and Electron all let applications customize the User-agent string they send.

Application: Edge / Apps using Edge WebView2
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36 Edg/122.0.0.0

Application: Chrome
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36

Application: VS Code
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Code/1.87.0 Chrome/118.0.5993.159 Electron/27.3.2 Safari/537.36

Application: Adobe Acrobat
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) ReaderServices/23.8.20555 Chrome/105.0.0.0 Safari/537.36

Application: QQ
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) QQ/9.9.7-21804 Chrome/120.0.6099.56 Electron/28.0.0 Safari/537.36

WinHttp AutoProxy Service

The second category is a built-in Windows system service: the WinHTTP Web Proxy Auto-Discovery Service. Applications on Windows that use WinINet or WinHttp as their network stack call this service, which executes the PAC script on their behalf to determine how to access a given URL. This category includes:

  1. The Internet Explorer browser or applications that have embedded the WebBrowser Control (the core of WebBrowser Control is also IE).
  2. Any application that utilizes the WinINet or WinHttp libraries, such as Word, Outlook, OneDrive, etc.
  3. Any .NET applications using System.Net.Http or System.Net.HttpWebRequest for network communications.

The User-agent carried by the WinHttp AutoProxy Service when downloading PAC script is: WinHttp-Autoproxy-Service/5.1.
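When reading the access logs of a PAC server, the User-agent strings listed above can be used to roughly attribute each download to a client family. The following is a minimal sketch of that idea. The function name classifyPacClient and the category labels are hypothetical, and because WebView2, CEF, and Electron applications can customize their User-agent, the result is only a best guess.

```javascript
// Hypothetical helper: guess which client family downloaded the PAC script,
// based only on the User-agent patterns shown in this article.
function classifyPacClient(userAgent) {
    if (userAgent.startsWith('WinHttp-Autoproxy-Service/')) {
        return 'WinHttp AutoProxy Service';
    }
    // Check the more specific tokens first: Electron and Edge User-agents
    // also contain a Chrome/ token.
    if (/\bElectron\/[\d.]+/.test(userAgent)) {
        return 'Electron app';
    }
    if (/\bEdg\/[\d.]+/.test(userAgent)) {
        return 'Edge or WebView2 app';
    }
    if (/\bChrome\/[\d.]+/.test(userAgent)) {
        return 'Chrome or other Chromium embedder';
    }
    return 'unknown';
}

console.log(classifyPacClient('WinHttp-Autoproxy-Service/5.1'));
console.log(classifyPacClient(
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) ' +
    'Code/1.87.0 Chrome/118.0.5993.159 Electron/27.3.2 Safari/537.36'));
```

Counting log entries per category gives a quick picture of how much of the PAC server's load comes from Chromium-based clients compared with the system service.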

Handling HTTP requests involves numerous details, especially when it comes to proxy configurations. In real-world development, the process becomes even more complex due to the need for a JavaScript engine to execute PAC scripts. Therefore, most developers and applications prefer to use pre-packaged libraries like WinHttp, WinINet, or Chromium to initiate HTTP requests. One major advantage of these libraries is their ability to automatically handle PAC scripts without requiring developers to delve into the complex logic behind them.

Technically speaking, it is entirely possible for developers to implement an HTTP library from scratch, including logic for downloading and processing PAC scripts. However, given the complexity of network programming knowledge and the additional workload involved (especially in implementing and maintaining a JavaScript engine to execute PAC scripts), this approach is very rare in practice. In most cases, for efficiency and stability, reusing existing, well-tested network libraries is a more reasonable choice.

Chromium's behavior in downloading PAC script

After extensive local testing, the behavior of Chromium in downloading PAC script has been summarized as follows:

  1. Each time a Chromium instance is launched, it sends 2-3 PAC script download requests and caches the PAC script. If Chromium continues to run, it will download the PAC script again approximately 12 hours later.
  2. Each Chromium instance monitors changes in Windows proxy settings. If the PAC script address changes, Chromium will send another 2-3 PAC script download requests to obtain the latest PAC script.
  3. If there is a JavaScript syntax error in the PAC script, Chromium will resend 2-3 PAC script download requests every 8 seconds.
  4. If there is a network change (switching networks, IP address change), Chromium will resend 2-3 PAC script download requests.

The section on downloading PAC scripts in Proxy support in Chrome adds the following details:

  1. Chromium does not use any HTTP caching mechanism (neither conditional revalidation nor freshness-based caching) when downloading PAC scripts.
  2. The PAC script download does not support interactive HTTP authentication (automatic authentication may work, but Chromium never shows a credentials prompt).
  3. The timeout for fetching the PAC script is 30 seconds.
  4. When fetching an explicitly configured PAC URL fails, Chromium will try to re-fetch it:
    • In exactly 8 seconds
    • 32 seconds after that
    • 2 minutes after that
    • Every 4 hours thereafter
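The retry schedule above can be turned into a small timeline. This is a sketch under a simplifying assumption: each delay is counted from the previous failure, and the time spent waiting for each attempt (up to the 30-second timeout) is ignored. The names retryOffsets, RETRY_DELAYS_MS, and STEADY_DELAY_MS are illustrative and are not taken from Chromium's source.

```javascript
const SECOND = 1000;

// Delays between consecutive re-fetch attempts after a failed PAC download,
// following the schedule listed above: 8 s, then 32 s, then 2 min.
const RETRY_DELAYS_MS = [8 * SECOND, 32 * SECOND, 120 * SECOND];
// After the initial delays are used up, retry every 4 hours.
const STEADY_DELAY_MS = 4 * 60 * 60 * SECOND;

// Returns the offsets (in ms since the first failure) of the first `count` retries.
function retryOffsets(count) {
    const offsets = [];
    let elapsed = 0;
    for (let i = 0; i < count; i++) {
        elapsed += i < RETRY_DELAYS_MS.length ? RETRY_DELAYS_MS[i] : STEADY_DELAY_MS;
        offsets.push(elapsed);
    }
    return offsets;
}

// Offsets of the first five retries, in seconds.
console.log(retryOffsets(5).map((ms) => ms / SECOND));
// prints [ 8, 40, 160, 14560, 28960 ]
```

Under this assumption, a client whose fetches keep failing makes three retries within the first three minutes and then backs off to one retry every four hours.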

The impact of Chromium's ubiquity on PAC script host servers

Given this behavior, 2-3 download requests per launch may not look like a big issue. However, now that Chromium is everywhere, those requests can add up and strain the servers that host PAC scripts. PAC script deployment is common in corporate environments, where a single enterprise often manages hundreds to thousands of client machines.

Let's assume that the following commonly used applications are running on the client side:

  1. Outlook
  2. Teams
  3. Edge
  4. Adobe Reader

This means four Chromium instances are running on the system at once. If the IT department pushes a new PAC script address through Group Policy or the registry, every Chromium instance on the machine detects the configuration change and sends 2-3 download requests, so each machine issues 8-12 requests to obtain the updated PAC script. If the change affects 5,000 PCs, the PAC script server may face a burst of 40,000 to 60,000 requests (5,000 x 8 to 5,000 x 12), which might overwhelm it. Requests that go unanswered time out after 30 seconds, and Chromium starts another round of downloads 8 seconds later, keeping the server under heavy load for the next few minutes.
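The arithmetic behind this estimate can be sketched as follows. The function estimateBurst and its parameter names are hypothetical. The inputs are the figures from the scenario above (5,000 PCs, four Chromium instances each, 2-3 requests per instance), and retries after timeouts are not included.

```javascript
// Hypothetical helper: estimate the request burst a PAC server sees when
// every Chromium instance on every machine re-downloads the script at once.
function estimateBurst({ machines, chromiumInstancesPerMachine, requestsPerInstance }) {
    const [minRequests, maxRequests] = requestsPerInstance;
    const perMachine = [
        chromiumInstancesPerMachine * minRequests,
        chromiumInstancesPerMachine * maxRequests,
    ];
    // Total across the fleet, as a [low, high] range.
    const total = perMachine.map((n) => n * machines);
    return { perMachine, total };
}

const burst = estimateBurst({
    machines: 5000,
    chromiumInstancesPerMachine: 4, // Outlook, Teams, Edge, Adobe Reader
    requestsPerInstance: [2, 3],    // observed range per Chromium instance
});
console.log(burst);
// prints { perMachine: [ 8, 12 ], total: [ 40000, 60000 ] }
```

Adding one more Chromium-based application per machine raises the range to 50,000-75,000 requests, which shows how quickly the burst grows with the number of embedded Chromium instances.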

Since Chromium does not use HTTP caching when downloading PAC scripts, there seems to be no straightforward solution to this problem. The only feasible options are to increase server capacity or to use load balancing to spread the traffic across multiple servers. However, because PAC script downloads mostly occur at application startup and when the PAC address changes, the servers hosting the PAC script may sit idle for long periods in between. Investing in additional hardware solely to absorb the burst caused by each script address change may therefore seem somewhat wasteful.

WinHttp AutoProxy Service: A thoughtful design

When Chromium was first developed, it was probably not anticipated that so many Chromium-based applications would run on a single machine, so downloading the PAC script from the server whenever Chromium starts or the PAC script address changes seemed natural.

Microsoft, with a focus on enterprise solutions, might have recognized the pain points of PAC script downloads in corporate environments earlier. It would clearly be unacceptable if every application using WinHttp or WinINet had to download the PAC script on its own.

Initially, the only application requiring network access was the IE browser. As more applications needed to connect to the internet, the WinHttpGetIEProxyConfigForCurrentUser API was created so that they could read the user's browser proxy settings and use the correct proxy. Eventually, the IE proxy settings evolved into Windows system-level proxy settings.

Once users could configure a PAC script or WPAD, embedding a JavaScript engine in an HTTP library just to parse proxy scripts was impractical, and having each application download the PAC script independently was undesirable. A better approach was a system service dedicated to downloading, parsing, and executing PAC scripts and to performing WPAD discovery, which led to the creation of the WinHttp AutoProxy Service.

WPAD is a separate subject and is not covered in depth in this article. In essence it is similar to a PAC script, except that the script URL is not specified explicitly but is detected by the system according to the WPAD discovery rules.

WinHttp provides the WinHttpGetProxyForUrl function and the WinHttpGetProxyForUrlEx function to query the WinHttp AutoProxy Service for the proxy server information needed to access a particular URL. The main difference between them is that WinHttpGetProxyForUrl is synchronous, whereas WinHttpGetProxyForUrlEx offers the ability to execute asynchronously.

WinHttp AutoProxy Service's behavior in downloading PAC script

After extensive local testing, the behavior of the WinHttp AutoProxy Service in downloading PAC script has been summarized as follows:

  1. When the WinHttp AutoProxy Service process starts, it issues a request to download a PAC script.

  2. When Windows is left idle for some time, the PAC server logs show that the WinHttp AutoProxy Service downloads the PAC script every half hour.

  3. The WinHttp AutoProxy Service also monitors changes in Windows proxy settings. If the PAC script address changes, it immediately downloads the PAC script.

  4. Applications can even specify a different PAC script address by setting the lpszAutoConfigUrl field of the WINHTTP_AUTOPROXY_OPTIONS structure passed to the WinHttpGetProxyForUrl function, prompting the WinHttp AutoProxy Service to download the specified PAC script.

Can Chromium use the WinHttp AutoProxy Service?

Yes, it can.

Use WinHttp AutoProxy Service on Chrome / Edge

Both Edge and Chrome can be told at startup to use the operating system's network stack for proxy resolution by adding the --winhttp-proxy-resolver command-line switch.

If you prefer not to add this switch at every startup, you can enable Edge's Group Policy setting "Use Windows proxy resolver", which makes Edge use the WinHttp AutoProxy Service for proxy resolution on Windows by default.

Use WinHttp AutoProxy Service on Edge WebView2 Runtime

From my testing, adding the --winhttp-proxy-resolver parameter also works when launching Edge WebView2 processes. Although the parameters for starting Edge WebView2 processes are often determined by the application code, fortunately, we can specify additional startup parameters through the WEBVIEW2_ADDITIONAL_BROWSER_ARGUMENTS environment variable. This allows us to add --winhttp-proxy-resolver to every process that uses Edge WebView2.
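As a rough illustration of the environment-variable approach, the sketch below builds an environment in which WEBVIEW2_ADDITIONAL_BROWSER_ARGUMENTS contains --winhttp-proxy-resolver while preserving any arguments that are already set. The helper name withWinHttpResolver is hypothetical, and the simple whitespace check assumes the existing value does not contain quoted arguments with spaces.

```javascript
// Hypothetical helper: return a copy of `env` in which
// WEBVIEW2_ADDITIONAL_BROWSER_ARGUMENTS includes --winhttp-proxy-resolver.
function withWinHttpResolver(env) {
    const NAME = 'WEBVIEW2_ADDITIONAL_BROWSER_ARGUMENTS';
    const FLAG = '--winhttp-proxy-resolver';
    const existing = (env[NAME] || '').trim();

    // Leave the value alone if the switch is already present.
    if (existing.split(/\s+/).includes(FLAG)) {
        return { ...env };
    }
    // Otherwise append the switch to whatever was already configured.
    return { ...env, [NAME]: existing ? existing + ' ' + FLAG : FLAG };
}

// Example: derive the environment for a child process from the current one.
const childEnv = withWinHttpResolver(process.env);
console.log(childEnv.WEBVIEW2_ADDITIONAL_BROWSER_ARGUMENTS);

// The result could then be passed to a launcher, for example:
//   require('child_process').spawn(appPath, [], { env: childEnv });
// where appPath is a placeholder for a WebView2-based application.
```

To cover every WebView2 application on a machine rather than a single launched process, the same variable would have to be set as a user-level or system-level environment variable instead.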

Electron and CEF

As for applications developed with Electron and CEF, there seems to be no corresponding environment variable to specify additional startup parameters like with Edge WebView2; it can only be done at the code level.

Note: https://crbug.com/1032820 mentions that --use-system-proxy-resolver may replace --winhttp-proxy-resolver in the future, and it also describes some limitations of using --winhttp-proxy-resolver with the current Chromium.