#696060 mtr: StDev overflowed to negative

Package:
mtr
Source:
mtr
Description:
Full screen ncurses and X11 traceroute tool
Submitter:
The Wanderer
Date:
2017-11-13 20:45:03 UTC
Severity:
minor
#696060#5
Date:
2012-12-16 13:30:28 UTC
From:
To:
Dear Maintainer,

As part of an effort to diagnose - and later to confirm the fix of - an
ongoing network problem, I have maintained an mtr session running for
several weeks straight.

The current overall summary for one hop in that session presently reads
as follows:

   Hostname    Loss  Rcv      Snt      Last   Best  Avg  Worst  StDev
   73.223.7.1  0.3%  4028069  4039341   68    7     12   60593  -2147483.75

The standard deviation value is negative, which is meaningless AFAIK,
and therefore should not be possible. The specific negative value in
question looks at a glance like the result of an overflow.
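
A quick arithmetic check of why the value looks like an overflow. This is only a sketch: the variable names are illustrative and nothing here is taken from the mtr source. It assumes, purely as a guess, that a 32-bit signed minimum was divided by 1000 and shown in single precision. The match does not by itself identify which variable overflowed.

```python
import struct

INT32_MIN = -2**31                    # -2147483648
scaled = INT32_MIN / 1000.0           # -2147483.648 as a double

# Round-trip through a 4-byte IEEE 754 float to mimic single precision
# (assumption: the displayed value passed through a C "float").
as_float32 = struct.unpack("<f", struct.pack("<f", scaled))[0]

print(f"{as_float32:.2f}")            # prints -2147483.75
```

Near 2 million, single-precision values are spaced 0.25 apart, so -2147483.648 rounds to -2147483.75, the figure shown in the StDev column.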

I am not clear on exactly what it would take to reproduce this problem.
Presumably, unreasonably high worst-case ping times in what is otherwise
a normal network environment would be at least a contributing factor.
However, I am relatively certain that I recall past sessions where this
hop has shown a Worst value of over 70000 milliseconds, but the StDev
value has remained positive; as such, I am not sure whether that would
be sufficient.

This bug is of course extremely minor, as even if it does occur
reproducibly, the circumstances for it are rare and it is unlikely to
have more than a cosmetic effect even when it does occur. However, as it
is still a bug, I felt it worth reporting anyway.

If there is anything I can do to help to track this down, please don't
hesitate to let me know.

#696060#10
Date:
2012-12-16 15:44:04 UTC
From:
To:
Hi,

The variance, which is used to calculate the stdev, is stored in a
64-bit integer.

However, what we store there are the squares of the differences from
the average. So if you have a 70-second ping time (sometimes), the
square of 70000 milliseconds becomes 4900 million! Quite a lot, but
unlikely to overflow a 64-bit value... However, the calculation is
done in microseconds. Thus your 70 seconds is 70 million
microseconds, giving 4900 trillion (4.9 * 10^15) added to the running
total every second or so (as long as the average remains around
zero). This can overflow a 64-bit variable in human-observable time.

I've modified the code to do the calculations in milliseconds from now
on. This should buy us a factor of a million of margin. :-)

	Roger.
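
The arithmetic in the message above can be checked with a short sketch. It assumes a signed 64-bit accumulator and a worst case in which every sample deviates from the average by a full 70 seconds. The helper name samples_until_overflow is hypothetical and is not part of mtr.

```python
INT64_MAX = 2**63 - 1   # largest value a signed 64-bit integer can hold

def samples_until_overflow(deviation, limit=INT64_MAX):
    """Number of squared deviations of this size that fit in the running sum."""
    return limit // (deviation * deviation)

samples_us = samples_until_overflow(70_000_000)  # 70 s expressed in microseconds
samples_ms = samples_until_overflow(70_000)      # 70 s expressed in milliseconds

print(samples_us)   # prints 1882
print(samples_ms)   # prints 1882320823
```

At roughly one sample per second, 1882 samples is about half an hour of sustained 70-second deviations, while the millisecond figure works out to roughly 60 years. A real session has mostly small deviations and only occasional large ones, so it takes far longer to reach the limit than this worst case suggests.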

#696060#15
Date:
2012-12-16 16:07:52 UTC
From:
To:
The case at hand was only about 60000 milliseconds, but yes, that would explain
the problem.

The fact that I've seen 70000-millisecond "Worst" times without seeing this
problem would then be explained by the fact that those sessions didn't last this
long; IIRC they were about two weeks at most, and this one is over six.

Not a 100% fix in theory, but it should hide the problem for pretty much any
case that's actually reasonable to support.

Sounds good to me; thanks for the prompt response!

#696060#20
Date:
2017-11-13 20:42:41 UTC
From:
To:
Control: fixed -1 0.85-1

So, if I'm not mistaken this is
https://github.com/traviscross/mtr/commit/bc39728995df74dd0ab78feea9a8ecfc53579fce
and was ultimately included in 0.83. Marking the first Debian version
that has the fix.

Bernhard