Create Presentation
Download Presentation

Download

Download Presentation

Global Network Positioning: A New Approach to Network Distance Prediction

165 Views
Download Presentation

Download Presentation
## Global Network Positioning: A New Approach to Network Distance Prediction

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -

**Global Network Positioning: A New Approach to Network**Distance Prediction Tze Sing Eugene Ng Department of Computer Science Carnegie Mellon University**MIT**CMU Stanford MIT Berkeley CMU Berkeley Stanford CMU MIT MIT CMU Stanford Stanford MIT MIT Berkeley Berkeley CMU CMU Stanford Berkeley Berkeley Stanford New Challenges • Large-scale distributed services and applications • Napster, Gnutella, End System Multicast, etc • Large number of configuration choices • K participants O(K2) e2e paths to consider**Role of Network Distance Prediction**• On-demand network measurement can be highly accurate, but • Not scalable • Slow • Network distance • Round-trip propagation and transmission delay • Relatively stable • Network distance can be predicted accurately without on-demand measurement • Fast and scalable first-order performance optimization • Refine as needed**Applying Network Distance**• Napster, Gnutella • Use directly in peer-selection • Quickly weed out 95% of likely bad choices • End System Multicast • Quickly build a good quality initial distribution tree • Refine with run-time measurements • Key: network distance prediction mechanism must be scalable, accurate, and fast**A/B**50ms HOPS Server Tracer Tracer Tracer State of the Art: IDMaps [Francis et al ‘99] • A network distance prediction service A B**IDMaps Benefits**• Significantly reduce measurement traffic compared to (# end hosts)2 measurements • End hosts can be simplistic**Challenging Issues**• Scalability • Topology data widely disseminated to HOPS servers • Requires more HOPS servers to scale with more client queries • Prediction speed/scalability • Communication overhead is O(K2) for distances among K hosts • Prediction accuracy • How accurate is the “Tracers/end hosts” topology model when the number of Tracers is small? • Deployment • Tracers/HOPS servers are sophisticated; probing end hosts may be viewed as intrusive**Global Network Positioning (GNP)**• Model the Internet as a geometric space (e.g. 3-D Euclidean) • Characterize the position of any end host with coordinates • Use computed distances to predict actual distances • Reduce distances to coordinates (x2,y2,z2) y (x1,y1,z1) x z (x4,y4,z4) (x3,y3,z3)**(x2,y2)**y L2 (x1,y1) L1 L1 x L3 L2 L3 (x3,y3) • Small number of distributed hosts called Landmarks measure inter-Landmark distances Landmark Operations • Compute Landmark coordinates by minimizing the overall discrepancy between measured distances and computed distances • Cast as a generic multi-dimensional global minimization problem Internet**Landmark Operations**• Landmark coordinates are disseminated to ordinary end hosts • A frame of reference • e.g. (2-D, (L1,x1,y1), (L2,x2,y2), (L3,x3,y3))**(x4,y4)**Ordinary Host Operations (x2,y2) y L2 • Each ordinary host measures its distances to the Landmarks, Landmarks just reflect pings (x1,y1) L1 L1 L3 x Internet L2 L3 (x3,y3) • Ordinary host computes its own coordinates relative to the Landmarks by minimizing the overall discrepancy between measured distances and computed distances • Cast as a generic multi-dimensional global minimization problem**GNP Advantages Over IDMaps**• High scalability and high speed • End host centric architecture, eliminates server bottleneck • Coordinates reduce O(K2) communication overhead to O(K*D) • Coordinates easily exchanged, predictions are locally and quickly computable by end hosts • Enable new applications • Structured nature of coordinates can be exploited • Simple deployment • Landmarks are simple, non-intrusive (compatible with firewalls)**Evaluation Methodology**• 19 Probes we control • 12 in North America, 5 in East Asia, 2 in Europe • Select IP addresses called Targets we do not control • Probes measure • Inter-Probe distances • Probe-to-Target distances • Each distance is the minimum RTT of 220 pings**T**T T T T T Evaluation Methodology (Cont’d) • Choose a subset of well-distributed Probes to be Landmarks, and use the rest for evaluation P2 (x1,y1) P1 P3 (x2, y2) P4**Computing Coordinates**• Multi-dimensional global minimization problem • Will discuss the objective function later • Simplex Downhill algorithm [Nelder & Mead ’65] • Simple and robust, few iterations required f(x) x**Data Sets**Global Set • 19 Probes • 869 Targets uniformly chosen from the IP address space • biased towards always-on and globally connected nodes • 44 Countries • 467 in USA, 127 in Europe, 84 in East Asia, 39 in Canada, …, 1 in Fiji, 65 unknown Abilene Set • 10 Probes are on Abilene • 127 Targets that are Abilene connected web servers**Performance Metrics**• Directional relative error • Symmetrically measure over and under predictions • Relative error = abs(Directional relative error) • Rank accuracy • % of correct prediction when choosing some number of shortest paths**???**Why the Difference? • IDMaps tends to heavily over-predict short distances • Consider (measured 50ms) • 22% of all paths in evaluation • IDMaps on average over-predicts by 150 % • GNP on average over-predicts by 30%**Basic Questions**• How to measure model error? • How to select Landmarks? • How does prediction accuracy change with the number of Landmarks? • What is geometric model to use? • How can we further improve GNP?**Measuring Model Error**is measured distance is computed distance is an error measuring function**Error Function**• Squared error • May not be good because one unit of error for short distances carry the same weight as one unit of error for long distances**More Error Functions**• Normalized error • Logarithmic transformation**Selecting N Landmarks**• Intuition: Landmarks should be well separated • Method 1: Clustering • start with 19 clusters, one probe per cluster • iteratively merge the two closest clusters until there are N clusters • choose the center of each cluster as the Landmarks • Method 2: Find “N-Medians” • choose the combination of N Probes that minimizes the total distance from each not chosen Probe to its nearest chosen Probe • Method 3: Maximum separation • choose the combination of N Probes that maximizes the total inter-Probe distances**K-Fold Validation**• Want more than just one set of N Landmarks to reduce noise • Select N+1 Landmarks based on a criterion • Eliminate one Landmark to get N Landmarks • i.e., N+1 different sets of N Landmarks that are close to the selection criterion**What Geometric Model to Use?**• Spherical surface, cylindrical surface • No better than 2-D Euclidean space • Euclidean space of varying dimensions**5**A C 1 B D 2-dimensional model 5 A C 1 B D 3-dimensional model Why Additional Dimensions Help? ISP A,B C,D AA B C D A 0 1 5 5 B 1 0 5 5 C 5 5 0 1 D 5 5 1 0**T**Reducing Measurement Overhead • Hypothesis: End hosts do not need to measure distances to all Landmarks to compute accurate coordinates P1 P3 P2 P5 P6 (x, y) P4**T**Reducing Measurement Overhead • Hypothesis: End hosts do not need to measure distances to all Landmarks to compute accurate coordinates P1 P3 P2 P5 P6 (x’, y’) P4**Removing Triangular Inequality Violations**• Remove Target (t) from data if • t in {a, b, c} • (a,c)/((a,b)+(b,c)) > threshold • Try two thresholds • 2.0; 647 of 869 Targets remain • 1.5; 392 of 869 Targets remain • Note: at 1.1, only 19 of 869 Targets remain!!!