Dear Maintainer,

As part of an effort to diagnose (and later to confirm the fix of) an ongoing network problem, I have kept an mtr session running for several weeks straight. The overall summary for one hop in that session currently reads as follows:

    Hostname     Loss  Rcv      Snt      Last  Best  Avg  Worst  StDev
    73.223.7.1   0.3%  4028069  4039341  68    7     12   60593  -2147483.75

The standard deviation is negative, which is meaningless AFAIK and therefore should not be possible. At a glance, the specific negative value looks like the result of an overflow.

I am not sure exactly what it would take to reproduce this problem. Presumably, unreasonably high worst-case ping times in an otherwise normal network environment would be at least a contributing factor. However, I am fairly sure I recall past sessions where this hop showed a Worst value of over 70000 milliseconds while the StDev value remained positive, so I am not sure that would be sufficient on its own.

This bug is of course extremely minor: even if it does occur reproducibly, the circumstances for it are rare, and it is unlikely to have more than a cosmetic effect when it does occur. Still, as it is a bug, I felt it was worth reporting. If there is anything I can do to help track this down, please don't hesitate to let me know.
Hi,

The variance, which is used to calculate the stdev, is stored in a 64-bit integer. However, what we store there is the sum of the squares of the differences from the average. So if you (sometimes) have a 70-second ping time, the square of 70000 milliseconds becomes 4900 million! Quite a lot, but unlikely to overflow a 64-bit value...

However, the calculation is done in microseconds... Thus your 70 seconds is 70 million microseconds, giving 4900 trillion (4.9 * 10^15) added to the running total every second or so (as long as the average remains around zero). That can overflow a 64-bit variable in human-observable time.

I've modified the code to do the calculations in milliseconds from now on. Since the values are squared, the factor of 1000 between microseconds and milliseconds buys us a factor of a million in margin. :-)

Roger.
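To put numbers on the arithmetic above, here is a small Python sketch. It assumes, as described, a signed 64-bit running total of squared deviations from the average; the helper names `samples_to_overflow` and `wrap_int64` are hypothetical and do not come from the mtr source. Python integers never overflow, so the sketch checks against the 64-bit limit explicitly and emulates two's-complement wraparound to show how an overflowed total turns negative.

```python
# Sketch (assumption): the variance accumulator is a signed 64-bit running
# total of squared deviations from the average, as described in the mail.
# The helper names below are hypothetical, not taken from the mtr source.

INT64_MAX = 2**63 - 1  # largest value a signed 64-bit integer can hold


def samples_to_overflow(deviation: int) -> int:
    """How many samples, each deviating from the average by `deviation`
    (in the accumulator's unit), fit in the total before it overflows."""
    return INT64_MAX // (deviation * deviation)


def wrap_int64(value: int) -> int:
    """Emulate two's-complement wraparound of a signed 64-bit integer."""
    value &= 2**64 - 1
    return value - 2**64 if value >= 2**63 else value


if __name__ == "__main__":
    dev_us = 70_000_000  # a 70 s deviation expressed in microseconds
    dev_ms = 70_000      # the same deviation expressed in milliseconds

    print(samples_to_overflow(dev_us))  # prints 1882
    print(samples_to_overflow(dev_ms))  # prints 1882320823

    # One sample too many: the wrapped total goes negative.
    total = (samples_to_overflow(dev_us) + 1) * dev_us * dev_us
    print(wrap_int64(total) < 0)  # prints True
```

With a 70-second deviation, the microsecond total has room for only 1882 such samples, while the millisecond total has room for about 1.88 billion, roughly the factor of a million mentioned above. Only the occasional very slow reply contributes a term that large, so how long a real session takes to overflow depends on how often such replies occur.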
The worst case here was only about 60000 milliseconds, but yes, that would explain the problem. It would also explain why I've seen 70000-millisecond "Worst" times without this problem: those sessions didn't last as long. IIRC they ran for about two weeks at most, and this one has been going for over six.

It's not a 100% fix in theory, but it should hide the problem for pretty much any case that's actually reasonable to support. Sounds good to me; thanks for the prompt response!
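As a rough check on "not a 100% fix in theory", the sketch below estimates how long a millisecond-based total could last under a deliberately pessimistic, hypothetical assumption: one sample per second, each deviating from the average by the full 60 seconds seen here. That cannot actually happen (if every reply were that slow, the average would rise to match), so the result is a lower bound on the time to overflow, not a prediction. `years_until_overflow` is an illustrative helper, not mtr code.

```python
# Pessimistic, hypothetical bound: every sample deviates from the average
# by the full deviation. Real traffic cannot do this, so treat the result
# as a lower bound on the time to overflow, not a prediction.

INT64_MAX = 2**63 - 1                  # signed 64-bit limit
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year in seconds


def years_until_overflow(deviation_ms: int, samples_per_second: float = 1.0) -> float:
    """Years before a signed 64-bit sum of squared millisecond
    deviations would overflow under the pessimistic assumption above."""
    samples = INT64_MAX // (deviation_ms * deviation_ms)
    return samples / samples_per_second / SECONDS_PER_YEAR


print(f"{years_until_overflow(60_000):.1f}")  # prints 81.2
```

Even in that extreme case the total would last about 81 years, which fits the view that the change hides the problem for any case reasonable to support.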
Control: fixed -1 0.85-1

So, if I'm not mistaken, this is https://github.com/traviscross/mtr/commit/bc39728995df74dd0ab78feea9a8ecfc53579fce and was ultimately included in 0.83. Marking the first Debian version that has the fix.

Bernhard