We use ntpd to keep time synced. Then somehow the machine uses two times, a "good" one set by ntpd, and an "internal drifted" one that should never be shown. Confusingly, ps shows the "wrong" START time: psz@pisa:~$ date Mon Jan 29 08:47:40 EST 2007 psz@pisa:~$ last -1 psz pts/6 y622.yt.maths.us Mon Jan 29 08:46 still logged in wtmp begins Mon Jan 1 08:54:56 2007 psz@pisa:~$ ps aux | grep -E 'psz|USER' USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND psz 14608 0.0 0.0 8568 2896 ? S 08:33 0:00 xterm -font 9x15 -bg #ffdab9 -fg black -T psz@pisa -n psz@pisa -sb -sl 10000 -ls psz 14611 0.0 0.0 2568 1436 pts/6 Ss 08:33 0:00 -bash psz 14626 0.0 0.0 1548 472 pts/6 S+ 08:34 0:00 grep -E psz|USER psz 14627 0.0 0.0 2496 848 pts/6 R+ 08:34 0:00 ps aux psz@pisa:~$ Cheers, Paul Szabo psz@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/ School of Mathematics and Statistics University of Sydney Australia
Again looking at BTS, this bug seems similar to #161633. Cheers, Paul Szabo psz@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/ School of Mathematics and Statistics University of Sydney Australia
"me too"; Paul, does this happen to you on a laptop machine, or otherwise?
Justin, departmental login server.... Cheers, Paul Szabo psz@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/ School of Mathematics and Statistics University of Sydney Australia
Same for us. You mentioned bug #161633, in which the submitter is clearly convinced that it is APM suspend time that isn't taken into account. Do you suspect some particular cause here, too? (Or, you aren't suspending your shell server, are you?) Justin
Justin, Yes, I "blame" ntpd. I only guess that it uses "suspend" to slow things down. (No, I wouldn't "manually" suspend my server.) Cheers, Paul Szabo psz@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/ School of Mathematics and Statistics University of Sydney Australia
I'm of the understanding that ntp uses adjtimex() to do it's stuff.
I suppose the next thing to work out is, is it a kernel problem or a ps problem. - Craig
I guess (strace) it uses some combination of /proc/uptime and
/proc/PID/stat
starttime %lu
The time in jiffies the process started
after system boot.
uptime (jiffies): sh -c 'cut -d" " -f22 /proc/$$/stat'
uptime (seconds): cut -d' ' -f1 /proc/uptime
$ date; ps aux |tail -1
Thu Feb 22 08:57:35 EST 2007
jpryzby 20446 0.0 0.0 2696 472 pts/1521 R+ 09:37 0:00 tail -1
$ echo `sh -c 'cut -d" " -f22 /proc/$$/stat'` / `cut -d' ' -f1 /proc/uptime` |bc -l
100.02309929333339513293
but,
09:06:13 up 118 days, 13:39, 9 users, load average: 0.00, 0.02, 0.00
Might have expected hz to be accurate to better than .02; on another machine:
up 15 days 99.99999777870238057132 (1000 more accurate)
I don't know if that can explain it.
merge 408879 414104 thanks Hi, earlier I was curious about this problem, too. I've seen it on several servers. Now I just retried and I couldn't reproduce it any longer, perhaps it was fixed somewhere, e.g. * kernel 2.6.21 * ntp 4.2.4 Andreas
tags 408879 unreproducible thankyou Ah, it might be related to the Hertz calculation, but the bug submitters are not using strange architectures so its not that. The problem here is I don't see it. My start times always seem to line up, when I check them. I've also used ntp for years. I'm not sure anyone is seeing this problem anymore, which if noone is then I'll close the bug. - Craig
I still see the problem: psz@pisa:~$ date; /bin/ps ux | grep /ps; date Sun Aug 10 14:45:24 EST 2008 psz 15724 0.0 0.0 2496 844 pts/3 R+ 14:38 0:00 /bin/ps ux psz 15725 0.0 0.0 1548 472 pts/3 S+ 14:38 0:00 grep /ps Sun Aug 10 14:45:24 EST 2008 I often observe this, mostly on that one machine "pisa". I do not think pisa is "special" in any way: has same hardware as some others, runs the same kernel, more-or-less the same daemons. Pisa is still running sarge: psz@pisa:~$ cat /etc/debian_version 3.1 psz@pisa:~$ uname -a Linux pisa.maths.usyd.edu.au 2.6.8-spm1.11 #1 SMP Mon Jul 23 18:53:40 EST 2007 i686 GNU/Linux This behaviour is not unique, some other (also sarge but hardware-wise completely different) machines also show the problem: root@p6283:~# date; /bin/ps ux | grep /ps; date Sun Aug 10 14:58:24 EST 2008 root 15027 0.0 0.1 2504 852 pts/1 R+ 14:57 0:00 /bin/ps ux root 15028 0.0 0.0 1556 484 pts/1 S+ 14:57 0:00 grep /ps Sun Aug 10 14:58:24 EST 2008 I do not see the problem on all my machines; some though exactly the same hardware as "pisa", never exhibited the problem (not while they were running sarge, not now under etch). I do not now see the problem on any etch machines. Thanks, Paul Paul Szabo psz@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/ School of Mathematics and Statistics University of Sydney Australia
Curious, just a standard type computer. There are some arches where the
jiffies don't quite work and I thought it was that.
Probably best to work out what numbers we use.
Start time is found in the 22nd field of /proc/<pid>/stat and is in
number of jiffies since the computer had booted. To get this to a wall
clock start time we add the boot time of the computer to the converted
start time. BUT, to get the boot time we need to subtract the uptime of
the computer from "now".
Or:
w_stime = (now - uptime) + ( j_stime / Hertz)
w_stime is wall clock start time
now is seconds since epoch for the time now
uptime is first column of /proc/uptime
j_stime is 22nd column of /proc/<pid>/uptime
Hertz is usually 100
Can you run the following commands, you will need to change the PID to
whatever process id you have.
PID=21789 ; ( date +"%s -" ; cut -f 1 -d ' ' /proc/uptime ; echo " + ("
; cut -f 22 -d ' ' /proc/$PID/stat ; echo ' / 100 )' ) | (tr '\n' ' ' ;
echo ) ; ps -p $PID -o pid,start_time,cmd
then take the first line of the output (not the ps bit) and then:
echo "THE OUTPUT" | bc | perl -e 'use Time::localtime; my $mytime=<>; ; print
ctime($mytime)."\n";'
So for me I get:
gonzo$ PID=21789 ; ( date +"%s -" ; cut -f 1 -d ' ' /proc/uptime ; echo " + (" ; cut -f 22 -d ' ' /proc/$PID/stat ; echo ' / 100 )' ) | (tr '\n' ' '
; echo ) ; ps -p $PID -o pid,start_time,cmd
1218419241 - 251503.45 + ( 24646725 / 100 )
PID START CMD
21789 10:23 sshd: csmall@pts/0
gonzo$ echo "1218419241 - 251503.45 + ( 24646725 / 100 )" | bc | perl -e 'use Time::localtime; my $mytime=<>; ; print > ctime($mytime)."\n";'
Mon Aug 11 10:23:24 2008
As you can see, I'm getting the same results (10:23) here. We may have
an under/overflow problem too.
- Craig
Running the commands suggested, I get:
psz@pisa:~$ date
Mon Aug 11 12:55:15 EST 2008
psz@pisa:~$ bash
psz@pisa:~$ PID=$$; ( date +"%s -" ; cut -f 1 -d ' ' /proc/uptime ; echo " + (" ; cut -f 22 -d ' ' /proc/$PID/stat ; echo ' /100 )' ) | (tr '\n' ' '; echo ) ; ps -p $PID -o pid,start_time,cmd
1218423322 - 2339691.29 + ( 233925387 /100 )
PID START CMD
24954 12:48 bash
psz@pisa:~$ echo '1218423322 - 2339691.29 + ( 233925387 /100 )' | bc | perl -e 'use Time::localtime; my $mytime=<>; ; print ctime($mytime)."\n";'
Mon Aug 11 12:48:03 2008
psz@pisa:~$ date
Mon Aug 11 12:55:46 EST 2008
psz@pisa:~$ exit
psz@pisa:~$ date
Mon Aug 11 12:55:49 EST 2008
Cheers,
Paul Szabo psz@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia
That's interesting, but its looking like its not a procps bug. Or perhaps there is something procps assumes that is incorrect. There is no overflow problems here, the manual calculation is agreeing with procps and The now() time is correct. The uptime is about 29 days, the computer was booted 11am 15th July Now you would expect that the uptime and the process start time would be pretty close to each other, because they are being done at the same time, or a second or so between them. But they are not: In fact, your process start time is *less* than the uptime, by about 438 seconds, or 7 minutes. That's 7 minutes. So, the kernel is giving the wrong start time.
Dear Craig, I suspect the latter (this is "idle speculation", without having delved into the sources). Does the start time given by the kernel need a correction? Is the bug in the kernel for lieing, or in ps for not applying that correction? Cheers, Paul Szabo psz@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/ School of Mathematics and Statistics University of Sydney Australia
tags 408879 help thankyou And that, is exactly my problem. Why do processes have a start time that is less than the uptime. Is there another field that needs a correction or not. So I have tagged this bug with help because I do need some help with this. To me it looks like a problem with the kernel but I'd like some confirmation: Why is the 22nd field in /proc/<PID>/stat bigger than the first field in /proc/uptime ? It shouldn't be, I think. - Craig
I have now updated my "problem" machine to lenny and 2.6.26 kernel: root@pisa:~# cat /etc/debian_version 5.0.1 root@pisa:~# uname -r 2.6.26-pk03.10-svr and the problem already is visible two hours after boot: root@pisa:~# date; /bin/ps -o lstart,command | grep /ps; date Wed May 13 11:40:18 EST 2009 Wed May 13 11:40:17 2009 /bin/ps -o lstart,command Wed May 13 11:40:17 2009 grep /ps Wed May 13 11:40:18 EST 2009 (I expect the time drift to grow steadily). Cheers, Paul Szabo psz@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/ School of Mathematics and Statistics University of Sydney Australia
One day after the lenny upgrade and last boot of the machine, there does not seem to be a great time drift. I was running something like while :; do date; /bin/ps -o lstart,command | grep /ps; date done and that showed: ... Thu May 14 13:18:42 EST 2009 Thu May 14 13:18:42 2009 /bin/ps -o lstart,command Thu May 14 13:18:42 EST 2009 Thu May 14 13:18:43 EST 2009 Thu May 14 13:18:43 2009 /bin/ps -o lstart,command Thu May 14 13:18:43 EST 2009 (above block repeated 22 times) Thu May 14 13:18:43 EST 2009 Thu May 14 13:18:42 2009 /bin/ps -o lstart,command <<< note bad time Thu May 14 13:18:43 EST 2009 Thu May 14 13:18:43 EST 2009 Thu May 14 13:18:43 2009 /bin/ps -o lstart,command Thu May 14 13:18:43 EST 2009 (above block repeated 8 times) Thu May 14 13:18:44 EST 2009 Thu May 14 13:18:44 2009 /bin/ps -o lstart,command Thu May 14 13:18:44 EST 2009 ... which would suggest something other than a few milliseconds drift. Rather curious; the "bad time" always positioned about the 3/4 of the elapsed second... Cheers, Paul Szabo psz@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/ School of Mathematics and Statistics University of Sydney Australia
I can still (2.6.24 kernel, lenny up-to-date) reproduce with command
(wrapped for readability, OK for cut-and-paste into xterm, ctrl-C when
had enough):
while :; do \
date; /bin/ps -o lstart,command | grep /ps; date; echo; \
done | \
perl -ne '
BEGIN { $/ = "\n\n" }
@x=grep(s/^(... ... .. ..:..:..).*$/$1/, split(/\n/));
$p=""; $z=0;
$z ||= $p gt $_, $p=$_ foreach @x;
print if $z'
that shows things like:
Wed Aug 25 12:36:44 EST 2010
Wed Aug 25 12:36:43 2010 /bin/ps -o lstart,command
Wed Aug 25 12:36:44 EST 2010
Should the tag "unreproducible" be removed?
Thanks, Paul
Paul Szabo psz@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia
Testing this old thing again at squeeze 6.0.4 with 2.6.32 kernel,
can still reproduce, the long command in previous message shows:
$ while :; do \
date; /bin/ps -o lstart,command | grep /ps; date; echo; \
done | \
perl -ne '
BEGIN { $/ = "\n\n" }
@x=grep(s/^(... ... .. ..:..:..).*$/$1/, split(/\n/));
$p=""; $z=0;
$z ||= $p gt $_, $p=$_ foreach @x;
print if $z'
Mon Jan 30 12:00:45 EST 2012
Mon Jan 30 12:00:44 2012 /bin/ps -o lstart,command
Mon Jan 30 12:00:44 2012 grep /ps
Mon Jan 30 12:00:45 EST 2012
Mon Jan 30 12:00:52 EST 2012
Mon Jan 30 12:00:51 2012 /bin/ps -o lstart,command
Mon Jan 30 12:00:51 2012 grep /ps
Mon Jan 30 12:00:52 EST 2012
...
Cheers, Paul
Paul Szabo psz@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia
Hi Paul, Thanks for letting me know this is still a problem. The only thing is, I couldn't see which version of procps you were using. We've had a lot of changes in the 3.3.x versions. Is it always just a second earlier? That is what I'm noticing. The relevant code is: t = getbtime() + pp->start_time / Hertz; getbtime is basically 'grep btime /proc/stat' start_time comes from column 23 of /proc/<PID>/stat divided by jiffies. It is looking like a kernel problem, or more specifically a procfs output problem. - Craig
Dear Craig, "dpkg -l" shows "ii procps 1:3.2.8-9 ..." (squeeze up-to-date). Yes, it seems so. Have not seen any other discrepancies for a long time now. I also think so. Please let me know if I should run some other tests. (Noting bug #657916 also.) Thanks, Paul Paul Szabo psz@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/ School of Mathematics and Statistics University of Sydney Australia
Hello Paul, I'm chasing up old procps bugs and noticed this one is still opened. It's been a long way since a 2.6 kernel and 3.2.x procps, wondering if you still get your odd start times? - Craig
this time with the right email alias.
this time with the right email alias.
Dear Craig, I have not noticed any large discrepancies lately. However, I wonder whether that is due to the machine clocks being more accurate, so not much adjustments by ntpd (the old machine where this issue was so obviously apparent, is long gone). In fact I still see: psz@bari:~$ date --rfc-3339=ns; ps -ouser,lstart,command | grep -E 'psz|USER' 2016-01-04 08:59:35.472909349+11:00 USER STARTED COMMAND psz Mon Jan 4 08:43:01 2016 -bash psz Mon Jan 4 08:59:34 2016 ps -ouser,lstart,command psz Mon Jan 4 08:59:34 2016 grep -E psz|USER and I wonder whether that is this issue, or just "rounding error". Cheers, Paul Paul Szabo psz@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/ School of Mathematics and Statistics University of Sydney Australia
Hi, This is a really old bug about ps start times. There has been a lot of changes with hardware, kernels and procps itself so I'm closing this bug. If it does still occur on current systems, we can reopen it or open another, but I'd say its fixed due to the update of those three things. - Craig
Sorry but the bug is still there! Now my machine is running running up-to-date bookworm, and I see (after just 12 days uptime): psz@enna:~$ date; ps -o user,lstart,command | grep -E 'psz.*grep|USER' Tue Apr 15 19:14:51 AEST 2025 USER STARTED COMMAND psz Tue Apr 15 19:14:50 2025 grep -E psz.*grep|USER Maybe you could kindly re-open? Thanks, Paul
Another machine (similarly modern hardware, with bookworm, and 12 days uptime) shows a discrepancy over half a second: psz@azaona:~$ date --rfc-3339=ns; ps -ouser,lstart,command | grep -E 'psz.*grep|USER' 2025-04-15 19:38:41.589754682+10:00 USER STARTED COMMAND psz Tue Apr 15 19:38:40 2025 grep -E psz.*grep|USER Surely worth re-opening, and fixing anytime in the next 20 years? Thanks, Paul
I've had another look at this, I've been looking at the angle from start time to boot time, but that might not be where the issue is. lstart comes from boot_time + TICS_BEGAN which is then converted to a real time with localtime and strftime boot_time comes from the btime field in /proc/stat TICS_BEGAN comes from field 22 of /proc/PID/stat (converted from Hz, or divided by 100). So, could btime be wrong? Or rather be correct at time of boot and then the system (once network is up and ntp is online) change its systen time? $ date +%s ; btime=`grep btime /proc/stat |cut -f 2 -d' '` ; utime=`cut -f 1 -d'.' /proc/uptime` ; date +%s ; now=$((utime + btime)) ; echo $now 1744714188 1744714188 1744714187 - Craig
Dear Craig, Thanks for re-opening. (Good that finally you were able to observe yourself.) I always suspected exactly that! Suspected, because I do not really know how any of this works. The way I understood things (this is probably totally wrong): Process times, including start times, are counted in "jiffies", some count that is monotonically increasing; and it is "known" that one jiffy is some fraction of a second. But that does not take into account time adjustments (say by ntpd). When time is adjusted, the "current jiffy" is left unchanged: not incremented when that would be appropriate, also never decremented as neither time nor jiffies are ever set backwards but some other slowdown mechanism is used. Either the set-forward should use some speedup mechanism similar to slowdown, or increment jiffies together with the time. Hope the above makes some sense. Cheers, Paul
Of course what I meant is that utime is wrong (not btime).