Generate and Analyze (encrypted) Web Browsing Traffic like a PRO

Despite the rise of encrypted protocols (e.g., DNS-over-HTTPS, TLS 1.3 and Encrypted Client Hello (ECH), QUIC), website fingerprinting has become an interesting research topic.
Why? Because we tend to forget that the application we use most often is our web browser. Whether it is simple web content consumption, reading your favorite news channels, listening to music on Spotify, watching cooking videos on YouTube, or even using Microsoft web applications to write project reports on Linux, your daily productivity depends heavily on a browser.
Accordingly, if a third party (e.g., your ISP, a malicious entity on the same public WiFi network, an authoritarian regime) can eavesdrop on your web browsing traffic and identify the domains you are visiting, they can profile you. And since content changes so often today, it no longer matters much what exactly you consume at a given time; it is enough to see where that content is coming from.

Today, even if you tunnel everything through the Tor network (in other words, you use the Tor Browser), it is still possible to identify which website/domain you are trying to visit with (relatively) high accuracy, using only the encrypted packet trace that any intermediate node might have captured on the way from your PC to the destination. Here, I don’t want to jump into machine learning and discuss how it is done; I would rather focus on how you can see, capture, and understand your web traffic, which can also be beneficial if you want to kickstart your machine learning + website fingerprinting research career.
