Mesh and Grid Routing
SOCKS and similar proxies provide only one level of indirection between the source and the destination. Proxy servers can be chained, but the chain is relatively static and significantly reduces performance. Each packet must relay through proxy servers that may not lie along the optimal route. A connection that takes a few milliseconds when made directly can take a few seconds when routed through a series of chained proxies. Moreover, an attacker at any of the proxies may be able to identify the source and destination, either by observing data content or by observing connection sequences.
Mesh and grid networks provide an alternative to chained proxies. Originally designed for high-availability networking, mesh and grid networks consist of a series of relays arranged in a predetermined pattern. A grid network places each node in a geometric grid pattern. Grids are typically portrayed as two-dimensional, with each node having four links to other nodes. However, grids can be three-dimensional (or n-dimensional), and nodes can be arranged in triangular, square, or even octagonal patterns. In contrast to the highly organized patterns that form grid networks, nodes in a mesh network may have different numbers of links, but each node is connected to at least two other nodes.
Regardless of the organization, redundancy is the driving concept behind grid and mesh networks. If any single node becomes unavailable or overworked, traffic can easily follow a different path, because these decentralized configurations always offer multiple paths. From an anonymity viewpoint, grids and meshes that operate at the network layer are ideal for obscuring the sender, the recipient, and the connection content. Each packet from the sender takes an unknown, likely changing route to the destination. An attacker at any one relay may never see the traffic, or may see only a portion of the communication.
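The redundancy argument can be illustrated with a minimal Python sketch. It assumes a simple two-dimensional grid in which each node links to its four neighbors; the helper names `grid_links` and `random_route` are hypothetical. The search visits neighbors in random order, so repeated routes between the same endpoints can differ, and it skips any node marked as down.

```python
import random
from collections import deque


def grid_links(width, height):
    """Build a 2-D grid where node (x, y) links to its up, down, left, and right neighbors."""
    links = {}
    for x in range(width):
        for y in range(height):
            neighbors = []
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                if 0 <= nx < width and 0 <= ny < height:
                    neighbors.append((nx, ny))
            links[(x, y)] = neighbors
    return links


def random_route(links, src, dst, down=frozenset(), rng=random):
    """Breadth-first search that visits neighbors in random order and skips
    nodes that are down. Returns a shortest available path, or None."""
    if src in down or dst in down:
        return None
    previous = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:   # walk back from the destination to the source
                path.append(node)
                node = previous[node]
            return path[::-1]
        neighbors = list(links[node])
        rng.shuffle(neighbors)        # different runs may choose different paths
        for nxt in neighbors:
            if nxt not in previous and nxt not in down:
                previous[nxt] = node
                queue.append(nxt)
    return None
```

For example, `random_route(grid_links(4, 4), (0, 0), (3, 3), down={(1, 1)})` still finds a seven-node path that avoids the failed node. A real mesh would also weigh link load and latency; this sketch only shows that multiple paths exist and can change from one packet to the next.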
Although a packet traversing a grid or mesh network must contain source and destination information, these addresses relate to the grid or mesh and not to the sender and recipient. For example, a secure grid network may assign each node its own public/private key pair. The sender encrypts the packet with the public key of the grid’s destination node. The grid routes the data to that node, where it is decrypted. After decryption, the packet is forwarded out of the grid to the final destination. An attacker within the grid cannot observe the transaction without compromising the grid’s destination node, and in a decentralized network that node may change between packets. JAP (Java Anon Proxy) provides anonymity for Web requests through the use of a mix network. A mix network is essentially a mesh network with random paths.
Onion Routing
In a mesh or grid network, the network determines the route. If a node is unavailable, then data is routed through a different path. Onion routing leaves the path selection to the sender. The basic onion routing approach is as follows:
- The sender acquires a list of available nodes in the onion network. The list includes public keys for each node.
- The sender selects a few nodes that will form the path through the network.
- The sender encodes the data and path using each node’s public key. The resulting encoded data must be decoded by the correct node in the correct order. In particular, no node can see further than one hop through the path.
- The data is sent to the first node, where one layer is decoded and the next node in the path is identified. Each node in turn decodes its layer and forwards the result until the unencrypted data is reached. The final node in the path connects to the destination and delivers the data.
Tor (The Onion Router) is a widely used onion routing network. Originally started by the Office of Naval Research in the mid-1990s, it quickly became a supported DARPA project. Tor is designed to provide anonymity from network analysis while maintaining bidirectional connectivity. It achieves this by supporting only connection-oriented network protocols, such as TCP (OSI layer 4). Each TCP connection establishes a path through the proxy network, and the path is maintained until the TCP connection is closed. Tor provides sender, destination, and link anonymity:
Sender Anonymity: Neither arbitrary nodes within the network nor the destination can identify the sender; the destination sees only the final Tor node’s network address.
Destination Anonymity: No Tor node except the final node knows the destination, and only the final node can view the content of the packet.
Link Anonymity: Because the sender randomly chooses the path for each TCP connection, a Tor node can observe only the volume of a single TCP transaction. There is not enough information to identify long-term patterns.
Tor provides general anonymity, but it does have a few limitations:
Source Identification: Although the data is encrypted, the first node can identify the source’s network address simply by comparing the connection’s address with the list of known Tor nodes. If the connection does not come from a known Tor node, then it comes from a source.
Tor Monitoring: An attacker who can monitor a large number of Tor nodes can statistically model traffic and potentially identify sources, destinations, and links.
Tor Nodes: By default, Tor version 2.0 usually uses 3 or 4 nodes per path, a number chosen as a tradeoff between anonymity and performance. Assuming truly random selection and a Tor network of 300 nodes, a single node can expect to be used as both a source and a destination approximately once out of every 900 connections. For a system that generates a high number of Tor connections, a single node may be able to correlate the observed source with the observed destination.
Path Reuse: Establishing a new path through the set of Tor nodes is slow. As a result, the path, called a circuit, may be reused for multiple TCP connections rather than built anew each time. This reuse makes volume analysis more likely to succeed.
DNS Leakage: Although many applications tunnel data through Tor, many others still resolve hostnames directly. An attacker (preferably at the destination) monitoring DNS queries may be able to correlate clients performing DNS lookups with Tor connections, so the DNS lookup can leak the sender’s identity. The ideal Tor client uses Tor for DNS lookups and periodically changes its circuit through the network.
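The onion routing steps listed earlier can be sketched in Python. This is an illustration built on loudly stated assumptions. Each relay shares a hypothetical symmetric key with the sender instead of publishing a public key. The cipher is a toy SHA-256 keystream that is not secure. Each layer is a small JSON record. What matters is the layering logic: the sender wraps the data from the innermost layer outward, and each relay's `peel` call (a hypothetical name, like `wrap`) reveals only the next hop.

```python
import hashlib
import json


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode. Illustration only, NOT secure."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_crypt(key: bytes, data: bytes) -> bytes:
    """XOR with the keystream; the same call encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def wrap(path, node_keys, destination, payload: bytes) -> bytes:
    """Sender side: build the onion from the innermost layer outward."""
    # Innermost layer: only the final node learns the real destination.
    blob = json.dumps({"next": destination, "data": payload.hex()}).encode()
    blob = xor_crypt(node_keys[path[-1]], blob)
    # Every earlier node learns only which relay comes next.
    for i in range(len(path) - 2, -1, -1):
        layer = json.dumps({"next": path[i + 1], "data": blob.hex()}).encode()
        blob = xor_crypt(node_keys[path[i]], layer)
    return blob


def peel(node, node_keys, blob: bytes):
    """Relay side: remove one layer and return (next hop, inner blob)."""
    layer = json.loads(xor_crypt(node_keys[node], blob))
    return layer["next"], bytes.fromhex(layer["data"])


if __name__ == "__main__":
    keys = {"A": b"key-a", "B": b"key-b", "C": b"key-c"}  # hypothetical relays
    onion = wrap(["A", "B", "C"], keys, "example.com:80", b"GET / HTTP/1.0")
    hop, blob = "A", onion
    while hop in keys:                  # each relay peels exactly one layer
        hop, blob = peel(hop, keys, blob)
    print(hop, blob)                    # example.com:80 b'GET / HTTP/1.0'
```

Real onion routing replaces the pre-shared toy keys with real cryptography, as the steps above describe with each node's public key, but the nesting is the same: relay A sees only "B", relay B sees only "C", and only C sees the destination.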
Chaffing
In 1997, AT&T researchers developed an anonymity system called Crowds. Their slogan, “Anonymity loves company,” succinctly captures a critical element of protecting identity. Even with the most sophisticated anonymity system, a user can still be tracked and identified if nobody else uses the system. The only user of a system, or someone using a system in a unique way, can easily be traced. The best anonymity systems are the ones used by many people. A packet with an undisclosed source and destination cannot be linked to other packets if there is a crowd of equally anonymous packets from many users. Unfortunately, a large volume of users may not always be available. Chaffing is a method of generating artificial traffic so that a solitary connection can be hidden by noise. The four types of chaffing are directional, volume, size, and sequential. In each of these approaches, less chaffing is required as more users employ a particular anonymity system.
Directional Chaffing
When observing network traffic, all packets flow from a source to a destination. Even if proxies or relays obscure the source and destination, the packet still follows a point-to-point direction. If very little traffic traverses a particular path or flows to a specific destination, then all traffic is likely related. This permits analysis of the link. Directional chaffing addresses the point-to-point issue by generating many fake packets that traverse many different paths. An observer will be unable to separate the true data from the noise.
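A minimal sketch of directional chaffing follows. It assumes packets are (destination, payload) pairs and that `relays` is a list of hypothetical relay addresses that silently discard decoys. In a real system the chaff marker would sit inside the encrypted payload so an observer cannot read it; here it is a plain field so the example is easy to inspect.

```python
import random


def directional_chaff(real_packets, relays, chaff_per_real=3, rng=random):
    """Mix real (destination, payload) packets with decoys aimed at random relays.

    The 'chaff' flag is shown in the clear only for illustration; in practice
    it would be hidden inside the encrypted payload."""
    batch = []
    for dest, payload in real_packets:
        batch.append({"dest": dest, "payload": payload, "chaff": False})
        for _ in range(chaff_per_real):
            # Decoy: same length as the real payload, random content, random relay.
            decoy = bytes(rng.getrandbits(8) for _ in range(len(payload)))
            batch.append({"dest": rng.choice(relays), "payload": decoy, "chaff": True})
    rng.shuffle(batch)  # send order no longer reveals which packet is real
    return batch
```

Matching the decoy length to the real payload keeps directional chaffing from being undone by simple size comparison; the size chaffing described later addresses that problem more generally.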
Volume Chaffing
When observing an anonymous network connection, the volume of traffic can reveal information about the content, even if the content is encrypted. To avoid volume analysis, volume chaffing generates arbitrary filler data that normalizes the overall volume. For example, all observed data could appear as a high, sustained volume even if the true content consists of low volume with irregular bursts. Similarly, the normalized rate can slow down high-volume traffic so that it appears to be low volume.
Volume chaffing should be used before a connection is established and after it terminates, which prevents an observer from identifying when a connection exists. Otherwise, if the chaffing appears only while a connection is present, an observer can identify the presence and duration of the connection.
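Volume chaffing can be sketched as a constant-rate sender: every tick emits exactly one fixed-size cell, carrying queued real data when there is some and filler otherwise. The tick model, the 512-byte cell size, and the `arrivals` mapping are assumptions chosen for illustration. Because filler flows before, during, and after the real data, the observer sees the same rate throughout, and a burst of real data is spread over several ticks rather than sent at once.

```python
from collections import deque


def constant_rate_schedule(arrivals, ticks, cell_size=512):
    """Emit exactly one cell_size cell per tick, real or filler.

    arrivals maps a tick number to the bytes of real data that become ready then.
    Returns a list of (tick, is_chaff, cell) tuples."""
    pending = deque()
    schedule = []
    for tick in range(ticks):
        data = arrivals.get(tick, b"")
        for i in range(0, len(data), cell_size):
            pending.append(data[i:i + cell_size])  # queue real data in cell-sized pieces
        if pending:
            # Pad a short final piece; a real protocol would also carry a length field.
            schedule.append((tick, False, pending.popleft().ljust(cell_size, b"\x00")))
        else:
            # Filler cell; once encrypted, it would look the same as real data.
            schedule.append((tick, True, bytes(cell_size)))
    return schedule
```

With `arrivals={3: b"x" * 1200, 10: b"y" * 100}` and 20 ticks, the schedule still contains 20 identical-size cells; only the sender knows that ticks 3, 4, 5, and 10 carried real data.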
Size Chaffing
The network and data link layers fragment data to fit into transmission blocks. For maximum throughput, each block is filled to the maximum data size, so the last fragment is commonly smaller than the previous fragments. An observer can use this information to determine the size of the data being transmitted. For example, if the MTU is 1,500 bytes and the third packet contains 100 bytes, then the total data size is likely 3,100 bytes (two 1,500-byte packets plus a 100-byte fragment). An observer may not know what the data contains but does know its size. Size chaffing varies the size of the transmitted blocks. This may take either of two forms: normalized or random transmit sizes. With normalized transmit sizes, all packets appear to be the same size: larger packets are cut to a smaller size, and smaller packets are padded to appear larger. Normalizing the packet size prevents size identification. In contrast, random transmit sizes constantly change the transmitted data size. Large packets may be split and small packets may be padded, but few sequential packets will appear to be the same size. As long as the observer cannot see the data content, either approach can defeat data size detection.
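The size leak and both forms of size chaffing can be sketched as follows. `fragment` reproduces the MTU example above, `normalized_sizes` implements normalized transmit sizes using an assumed 512-byte cell, and `random_sizes` implements random transmit sizes with assumed bounds of 200 and 1,500 bytes. All three function names are hypothetical, and the functions return only packet sizes, not packet contents.

```python
import random


def fragment(data_len, mtu=1500):
    """Leaky default: fill every block to the MTU, leaving a short final fragment."""
    sizes = [mtu] * (data_len // mtu)
    if data_len % mtu:
        sizes.append(data_len % mtu)
    return sizes


def normalized_sizes(data_len, cell=512):
    """Normalized sizes: split and pad so that every packet is exactly `cell` bytes."""
    count = max(1, -(-data_len // cell))  # ceiling division, at least one packet
    return [cell] * count


def random_sizes(data_len, low=200, high=1500, rng=random):
    """Random sizes: each packet carries a random amount of data; the last one
    is padded up to its chosen size, so it does not expose the remainder."""
    sizes, remaining = [], data_len
    while remaining > 0:
        size = rng.randint(low, high)
        remaining -= size
        sizes.append(size)
    return sizes
```

Here `fragment(3100)` yields `[1500, 1500, 100]`, while `normalized_sizes(3100)` yields seven 512-byte packets. Normalized sizes still reveal the packet count, and therefore the data size to within one cell; random sizes blur even that, at the cost of extra padding.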
Sequential Chaffing
Normally, packets are transmitted in sequential order. An attacker collecting packets can be fairly certain that the packets are in order. If the data is encrypted, then knowing the transmission sequence can aid in cracking the encryption. Sequential chaffing reorders packets before transmission.
Although two packets captured minutes apart may be in order, packets captured a few seconds apart are likely out of order. To crack any encryption, attackers must first identify the packet order. If 8 packets arrive in an unknown order, there are 40,320 possible orderings (8 factorial). This dramatically increases the difficulty.
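The following sketches sequential chaffing. It assumes the sender tags each packet with a sequence number that travels inside the encrypted payload (so only the receiver can read it) and shuffles packets within windows of eight. Windowed shuffling matches the behavior described above: packets sent close together are likely out of order, while packets far apart keep their relative order. The function names are hypothetical, and the final line computes the 8 factorial figure from the text.

```python
import math
import random


def reorder_for_transmit(packets, window=8, rng=random):
    """Tag packets with sequence numbers, then shuffle within each window.

    The sequence number is assumed to travel inside the encrypted payload,
    so only the receiver can use it."""
    tagged = list(enumerate(packets))   # (sequence number, payload)
    sent = []
    for start in range(0, len(tagged), window):
        block = tagged[start:start + window]
        rng.shuffle(block)              # nearby packets go out of order
        sent.extend(block)              # distant packets keep their relative order
    return sent


def reassemble(received):
    """Receiver side: restore the original order from the sequence numbers."""
    return [payload for _, payload in sorted(received, key=lambda item: item[0])]


# Orderings an observer must consider for one window of 8 packets.
print(math.factorial(8))  # 40320
```

A larger window multiplies the observer's work (12 packets already give 479,001,600 orderings), but it also forces the receiver to buffer more packets before it can deliver data in order.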
COMMON LIMITATIONS
Most anonymity systems are directed at protecting identification based on the network layer; however, factors beyond the network layer can lead to identification:
Higher-Level Information Leakage: Many higher-layer OSI protocols explicitly include identification information. Web sites use tracking cookies, email discloses addresses, and someone may post personal information to a forum. Even if the anonymity system uses encryption, the data must be decrypted before it can be used. Information leakage from higher-layer protocols can compromise an otherwise anonymous connection.
Timing Attacks: Automated systems frequently operate on specific schedules, and people may have particular hours when they are online. Analyzing occurrence rates and durations can separate a person from an automated process, isolate connection functionality, and identify probable geographic regions. An attacker can optimize his attack to take advantage of known timings, and people desiring anonymity can be tracked.
Habitual and Sequential Patterns: Similar to timing attacks, habitual behavior and sequential accesses can identify an individual. When using Tor, each Web connection may go through a different node. By tracking all connections from Tor nodes, Web administrators can identify (1) where the anonymous individual visited, (2) how long they stayed on each Web page, and (3) the estimated speed of the anonymous sender’s network connection. If the anonymous person repeats the pattern daily, then his activities can be tracked over time. Similar patterns can be formed by correlating the order in which links are accessed, the average time between accesses, and the amount of data transmitted and received.
Speed: Anonymity is not fast. Chaffing consumes network bandwidth, relays add noticeable connection delays, and cryptography for data privacy adds further delay.
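The timing and habitual-pattern analysis described above needs very little code. This sketch assumes a hypothetical Web server log of (timestamp, URL) pairs for requests that arrive from Tor exit nodes. It builds an hour-of-day histogram and a session fingerprint made of the ordered URLs plus inter-access gaps rounded to 30-second buckets. The function names, the bucket size, and the sample log entries are all illustrative assumptions. Identical fingerprints on different days suggest the same habitual visitor, even if the requests arrive through different exit nodes.

```python
from collections import Counter
from datetime import datetime


def active_hours(timestamps):
    """Timing analysis: count requests in each hour of the day (0-23)."""
    return Counter(ts.hour for ts in timestamps)


def session_fingerprint(requests, bucket_seconds=30):
    """Habitual pattern: ordered URLs plus inter-access gaps in coarse buckets.

    requests is a time-sorted list of (datetime, url) pairs from one session."""
    urls = tuple(url for _, url in requests)
    times = [ts for ts, _ in requests]
    gaps = tuple(
        round((later - earlier).total_seconds() / bucket_seconds)
        for earlier, later in zip(times, times[1:])
    )
    return urls, gaps


# Hypothetical log entries for two mornings.
day1 = [(datetime(2024, 5, 6, 8, 0, 0), "/news"),
        (datetime(2024, 5, 6, 8, 2, 0), "/mail"),
        (datetime(2024, 5, 6, 8, 2, 40), "/forum")]
day2 = [(datetime(2024, 5, 7, 8, 0, 5), "/news"),
        (datetime(2024, 5, 7, 8, 2, 10), "/mail"),
        (datetime(2024, 5, 7, 8, 2, 50), "/forum")]
print(session_fingerprint(day1) == session_fingerprint(day2))  # True
```

The coarse buckets are the key design choice: they absorb the few seconds of jitter that relays and human behavior introduce, while still preserving the habit.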
In addition to the common limitations, there are a few methods that directly attack and breach anonymity. An attacker can apply these deliberately:
Cookies: Many Web sites use cookies to track visitors. Even if the Web browser uses an anonymous network connection, cookies can permit tracking the user online.
Web Bugs: Besides Web browsers, many applications process URLs. Email clients and Instant Messaging (IM) systems, for example, can load images from remote Web sites simply by opening an email or receiving a link. By intentionally sending an anonymous target a Web bug, an attacker can force a breach of anonymity. If the bug is fetched directly, then the actual network address is disclosed. If the bug is fetched through the anonymity system, then timing analysis and information disclosure can still be used to attack the anonymity.
DNS Leaks: Many applications do not tunnel all data through the anonymous channel. DNS lookups are commonly performed directly. If an attacker sends a unique hostname to an anonymous target, then watching the DNS resolution can identify the target’s network address.
Trojans and Callback Bombs: An anonymous download may transfer hostile code to the anonymous sender or anonymous recipient. This code can establish a direct connection to a remote server, permitting identification of an otherwise anonymous node. Alternately, the malware can use the anonymous connection to transmit information that reveals an identity.
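One common defense against the DNS leaks described above is to hand the hostname to the proxy instead of resolving it locally. SOCKS5 (RFC 1928) supports this defense through the domain-name address type (0x03). The sketch below only builds the greeting and CONNECT messages. The function names are hypothetical, and the sketch assumes a SOCKS5 proxy, such as a local Tor client, that accepts unauthenticated connections. Because the client never calls a local resolver for this connection, the lookup happens at the far end of the anonymous path.

```python
import struct

SOCKS_VERSION = 0x05
CMD_CONNECT = 0x01
ATYP_DOMAINNAME = 0x03
METHOD_NO_AUTH = 0x00


def socks5_greeting() -> bytes:
    """Client greeting: version 5, one method offered, 'no authentication'."""
    return bytes([SOCKS_VERSION, 1, METHOD_NO_AUTH])


def socks5_connect_by_name(hostname: str, port: int) -> bytes:
    """CONNECT request carrying the hostname itself (ATYP 0x03), so the proxy
    performs the DNS lookup instead of the local resolver."""
    name = hostname.encode("idna")
    if not 0 < len(name) <= 255:
        raise ValueError("hostname must be 1-255 bytes")
    header = bytes([SOCKS_VERSION, CMD_CONNECT, 0x00, ATYP_DOMAINNAME, len(name)])
    return header + name + struct.pack("!H", port)  # port in network byte order
```

A client would send the greeting to the proxy's SOCKS port (Tor's default is 9050), read the two-byte method reply, and then send the CONNECT request. This approach does not help when an application performs its own lookups elsewhere, which is exactly the leak the text describes.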
Prevention
The OSI network layer permits communication between two nodes across the network through the use of network-wide addressing. Network addresses disclose more than routing information, however. In particular, an attacker can identify a specific individual’s location and identity. To remain anonymous online, defenders have a selection of tools. Moving network addresses and subnets can make location identification difficult. Blind drops, proxies, relays, and onion networks can obscure both the source and destination.
Chaffing can also deter network transfer analysis. Network anonymity is feasible, but it is vulnerable to disclosure through higher-layer protocols and usage. Attackers may identify individuals based on habitual patterns, transmitted content, and time-based analysis.