Tuesday, July 07, 2009

Ping & Traceroute Troubleshooting Example

In this example, a ping to 186.9.17.153 returned a “TTL expired in transit” message. A ping’s TTL (time to live) will usually only expire if there is a routing loop in which the packet bounces between two routers on the way to the target. Each “bounce” decreases the TTL by one until it reaches zero, at which point the router holding the packet discards it and you get the timeout.
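
The TTL countdown described above can be sketched in a few lines of Python. This is only an illustration, not how any router implements it: the function name simulate_ttl, the "first-hop" placeholder, and the starting TTL of 128 are all assumptions made for the example.

```python
from itertools import chain, cycle

def simulate_ttl(path, start_ttl):
    """Follow a packet along `path`, decrementing its TTL at each router.

    Returns the address of the router at which the TTL ran out,
    or None if the packet cleared every router with TTL to spare.
    """
    ttl = start_ttl
    for router in path:
        ttl -= 1  # every router that handles the packet subtracts one
        if ttl <= 0:
            return router  # this router drops the packet and reports it
    return None

# Hypothetical first hop, followed by the two looping routers from the example.
looping_path = chain(["first-hop"], cycle(["186.40.64.94", "186.40.64.93"]))

# Assumption: the sender starts with a TTL of 128.
print(simulate_ttl(looping_path, 128))
```

Under these assumptions the packet expires at 186.40.64.94, which is the address that answers in the ping output below. A different starting TTL or hop count could just as easily land on the other router in the loop.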

The routing loop was confirmed by the traceroute, which showed the packet bouncing between the routers at 186.40.64.94 and 186.40.64.93.

G:\>ping 186.9.17.153

Pinging 186.9.17.153 with 32 bytes of data:

Reply from 186.40.64.94: TTL expired in transit.

Reply from 186.40.64.94: TTL expired in transit.

Reply from 186.40.64.94: TTL expired in transit.

Reply from 186.40.64.94: TTL expired in transit.

Ping statistics for 186.9.17.153:

Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),

Approximate round trip times in milli-seconds:

Minimum = 0ms, Maximum = 0ms, Average = 0ms

G:\>tracert 186.9.17.153

Tracing route to lostserver.confusion.net [186.9.17.153]

over a maximum of 30 hops:

1 <10>

2 60 ms 70 ms 60 ms rtr-2.confusion.net [186.40.64.94]

3 70 ms 71 ms 70 ms rtr-1.confusion.net [186.40.64.93]

4 60 ms 70 ms 60 ms rtr-2.confusion.net [186.40.64.94]

5 70 ms 70 ms 70 ms rtr-1.confusion.net [186.40.64.93]

6 60 ms 70 ms 61 ms rtr-2.confusion.net [186.40.64.94]

7 70 ms 70 ms 70 ms rtr-1.confusion.net [186.40.64.93]

8 60 ms 70 ms 60 ms rtr-2.confusion.net [186.40.64.94]

9 70 ms 70 ms 70 ms rtr-1.confusion.net [186.40.64.93]

...

...

...

Trace complete.
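
Spotting the repeating pattern in a trace like the one above can also be automated. The following is a minimal sketch that assumes the hop addresses have already been pulled out of the tracert output into a list; the function name find_routing_loop is hypothetical, and a repeated address is treated as a strong hint of a loop rather than proof.

```python
def find_routing_loop(hops):
    """Return the cycle of addresses that repeats in `hops`, or [] if none.

    `hops` is a list of hop addresses in trace order. Hops that timed
    out can be passed as None and are skipped.
    """
    first_seen = {}
    for index, address in enumerate(hops):
        if address is None:
            continue  # no reply for this hop, nothing to compare
        if address in first_seen:
            # Everything from the first sighting up to this repeat is the loop.
            return hops[first_seen[address]:index]
        first_seen[address] = index
    return []

# Hops 2 through 9 from the trace above.
trace = ["186.40.64.94", "186.40.64.93"] * 4
print(find_routing_loop(trace))
```

For the trace in this example the function reports the two routers, 186.40.64.94 and 186.40.64.93, as the loop.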

This problem was solved by resetting the routing process on both routers. The problem was initially triggered by an unstable network link that caused frequent routing recalculations. The constant activity eventually corrupted the routing tables of one of the routers.


Possible Reasons For Failed Traceroutes

Traceroutes can fail to reach their intended destination for a number of reasons, including:

o Traceroute packets are being blocked or rejected by a router in the path. The router immediately after the last visible one is usually the culprit. It’s usually good to check the routing table and/or other status of this next-hop device.

o The target server doesn’t exist on the network. It could be disconnected, or turned off. (!H or !N messages may be produced.)

o The network on which you expect the target host to reside doesn’t exist in the routing table of one of the routers in the path. (!H or !N messages may be produced.)

o You may have a typographical error in the IP address of the target server.

o You may have a routing loop in which packets bounce between two routers and never get to the intended destination.

o The packets don’t have a proper return path to your server. The last visible hop is the last hop from which the packets return correctly. The router immediately after the last visible one is the one at which the routing changes. It’s usually good to:

v Log on to the last visible router.

v Look at the routing table to determine what the next hop is to your intended traceroute target.

v Log on to this next hop router.

v Do a traceroute from this router to your intended target server.

v If this works: Routing to the target server is OK. Do a traceroute back to your source server. The traceroute will probably fail at the bad router on the return path.

v If it doesn’t work: Test the routing table and/or other status of all the hops between it and your intended target.

Note: If there is nothing blocking your traceroute traffic, then the last visible router of an incomplete trace is either the last good router on the path, or the last router that has a valid return path to the server issuing the traceroute.
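
The !H and !N markers mentioned in the list above are printed by Unix-style traceroute next to the round trip times and stand for "host unreachable" and "network unreachable". As a rough sketch, a saved trace line can be scanned for them as follows. The function name explain_flags and the sample line are made up for illustration, and the exact output layout varies between traceroute versions.

```python
import re

# Meaning of the two markers discussed in this article.
UNREACHABLE_FLAGS = {
    "!H": "host unreachable",
    "!N": "network unreachable",
}

def explain_flags(line):
    """Return the sorted meanings of any !H / !N markers found in `line`."""
    found = re.findall(r"![HN]", line)
    return sorted({UNREACHABLE_FLAGS[flag] for flag in found})

# Hypothetical traceroute line, not taken from the example above.
sample = " 4  192.0.2.1  12.3 ms !H  11.9 ms !H  12.1 ms !H"
print(explain_flags(sample))
```

A line with no markers returns an empty list, which means only that this particular hop did not report an unreachable host or network.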