#408879 [ps] process start times fluctuate

Package:
procps
Source:
procps
Description:
/proc file system utilities
Submitter:
Paul Szabo
Date:
2025-04-15 23:39:02 UTC
Severity:
normal
Tags:
#408879#5
Date:
2007-01-28 22:03:35 UTC
From:
To:
We use ntpd to keep time synced. Then somehow the machine uses two times,
a "good" one set by ntpd, and an "internal drifted" one that should never
be shown. Confusingly, ps shows the "wrong" START time:

psz@pisa:~$ date
Mon Jan 29 08:47:40 EST 2007
psz@pisa:~$ last -1
psz      pts/6        y622.yt.maths.us Mon Jan 29 08:46   still logged in

wtmp begins Mon Jan  1 08:54:56 2007
psz@pisa:~$ ps aux | grep -E 'psz|USER'
USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
psz      14608  0.0  0.0  8568 2896 ?        S    08:33   0:00 xterm -font 9x15 -bg #ffdab9 -fg black -T psz@pisa -n psz@pisa -sb -sl 10000 -ls
psz      14611  0.0  0.0  2568 1436 pts/6    Ss   08:33   0:00 -bash
psz      14626  0.0  0.0  1548  472 pts/6    S+   08:34   0:00 grep -E psz|USER
psz      14627  0.0  0.0  2496  848 pts/6    R+   08:34   0:00 ps aux
psz@pisa:~$

Cheers,

Paul Szabo   psz@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia

#408879#10
Date:
2007-01-28 22:49:49 UTC
From:
To:
Again looking at BTS, this bug seems similar to #161633.

Cheers,

Paul Szabo   psz@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia

#408879#15
Date:
2007-02-18 02:54:26 UTC
From:
To:
"me too"; Paul, does this happen to you on a laptop machine, or
otherwise?

#408879#20
Date:
2007-02-18 10:29:05 UTC
From:
To:
Justin,
departmental login server....

Cheers,

Paul Szabo   psz@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia

#408879#25
Date:
2007-02-18 12:07:54 UTC
From:
To:
Same for us.  You mentioned bug #161633, in which the submitter is
clearly convinced that it is APM suspend time that isn't taken into
account.  Do you suspect some particular cause here, too?  (Or, you
aren't suspending your shell server, are you?)

Justin

#408879#30
Date:
2007-02-18 19:41:24 UTC
From:
To:
Justin,

Yes, I "blame" ntpd. I only guess that it uses "suspend" to slow
things down. (No, I wouldn't "manually" suspend my server.)

Cheers,

Paul Szabo   psz@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia

#408879#35
Date:
2007-02-18 20:38:00 UTC
From:
To:
I'm of the understanding that ntp uses adjtimex() to do it's stuff.
#408879#40
Date:
2007-02-18 22:50:36 UTC
From:
To:
I suppose the next thing to work out is, is it a kernel problem or a ps
problem.

 - Craig

#408879#45
Date:
2007-02-26 11:41:24 UTC
From:
To:
I guess (strace) it uses some combination of /proc/uptime and
/proc/PID/stat

              starttime %lu
				   The time in jiffies  the  process  started
				   after  system boot.

uptime (jiffies):	sh -c 'cut -d" " -f22 /proc/$$/stat'
uptime (seconds):	cut -d' ' -f1 /proc/uptime

$ date; ps aux |tail -1
Thu Feb 22 08:57:35 EST 2007
jpryzby  20446  0.0  0.0   2696   472 pts/1521 R+   09:37   0:00 tail -1

$ echo `sh -c 'cut -d" " -f22 /proc/$$/stat'` / `cut -d' ' -f1 /proc/uptime` |bc -l
100.02309929333339513293

but,
 09:06:13 up 118 days, 13:39,  9 users,  load average: 0.00, 0.02, 0.00

Might have expected hz to be accurate to better than .02; on another machine:

  up 15 days 99.99999777870238057132 (1000 more accurate)

I don't know if that can explain it.

#408879#52
Date:
2007-06-24 12:42:59 UTC
From:
To:
merge 408879 414104
thanks

Hi,

earlier I was curious about this problem, too. I've seen it on several
servers. Now I just retried and I couldn't reproduce it any longer,
perhaps it was fixed somewhere, e.g.
* kernel 2.6.21
* ntp 4.2.4


Andreas

#408879#59
Date:
2008-08-10 00:33:30 UTC
From:
To:
tags 408879 unreproducible
thankyou
Ah, it might be related to the Hertz calculation, but the bug submitters
are not using strange architectures so its not that.

The problem here is I don't see it. My start times always seem to line
up, when I check them.  I've also used ntp for years.

I'm not sure anyone is seeing this problem anymore, which if noone is
then I'll close the bug.

 - Craig

#408879#66
Date:
2008-08-10 05:13:50 UTC
From:
To:
I still see the problem:

psz@pisa:~$ date; /bin/ps ux | grep /ps; date
Sun Aug 10 14:45:24 EST 2008
psz      15724  0.0  0.0  2496  844 pts/3    R+   14:38   0:00 /bin/ps ux
psz      15725  0.0  0.0  1548  472 pts/3    S+   14:38   0:00 grep /ps
Sun Aug 10 14:45:24 EST 2008

I often observe this, mostly on that one machine "pisa". I do not think
pisa is "special" in any way: has same hardware as some others, runs the
same kernel, more-or-less the same daemons. Pisa is still running sarge:

psz@pisa:~$ cat /etc/debian_version
3.1
psz@pisa:~$ uname -a
Linux pisa.maths.usyd.edu.au 2.6.8-spm1.11 #1 SMP Mon Jul 23 18:53:40 EST 2007 i686 GNU/Linux

This behaviour is not unique, some other (also sarge but hardware-wise
completely different) machines also show the problem:

root@p6283:~# date; /bin/ps ux | grep /ps; date
Sun Aug 10 14:58:24 EST 2008
root     15027  0.0  0.1  2504  852 pts/1    R+   14:57   0:00 /bin/ps ux
root     15028  0.0  0.0  1556  484 pts/1    S+   14:57   0:00 grep /ps
Sun Aug 10 14:58:24 EST 2008

I do not see the problem on all my machines; some though exactly the
same hardware as "pisa", never exhibited the problem (not while they
were running sarge, not now under etch). I do not now see the problem
on any etch machines.

Thanks, Paul

Paul Szabo   psz@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia

#408879#71
Date:
2008-08-11 01:53:41 UTC
From:
To:
Curious, just a standard type computer. There are some arches where the
jiffies don't quite work and I thought it was that.

Probably best to work out what numbers we use.

Start time is found in the 22nd field of /proc/<pid>/stat and is in
number of jiffies since the computer had booted. To get this to a wall
clock start time we add the boot time of the computer to the converted
start time. BUT, to get the boot time we need to subtract the uptime of
the computer from "now".

Or:
 w_stime = (now - uptime) + ( j_stime / Hertz)

w_stime is wall clock start time
now is seconds since epoch for the time now
uptime is first column of /proc/uptime
j_stime is 22nd column of /proc/<pid>/uptime
Hertz is usually 100

Can you run the following commands, you will need to change the PID to
whatever process id you have.

PID=21789 ; ( date +"%s -" ; cut -f 1 -d ' ' /proc/uptime ; echo " + ("
; cut -f 22 -d ' ' /proc/$PID/stat ; echo ' / 100 )' ) | (tr '\n' ' ' ;
echo ) ; ps -p $PID -o pid,start_time,cmd

then take the first line of the output (not the ps bit)  and then:

echo "THE OUTPUT" | bc | perl -e 'use Time::localtime; my $mytime=<>; ; print
ctime($mytime)."\n";'

So for me I get:

gonzo$ PID=21789 ; ( date +"%s -" ; cut -f 1 -d ' ' /proc/uptime ; echo " + (" ; cut -f 22 -d ' ' /proc/$PID/stat ; echo ' / 100 )' ) | (tr '\n' ' '
; echo ) ; ps -p $PID -o pid,start_time,cmd
1218419241 - 251503.45  + ( 24646725  / 100 )
  PID START CMD
21789 10:23 sshd: csmall@pts/0
gonzo$ echo "1218419241 - 251503.45  + ( 24646725  / 100 )" | bc | perl -e 'use Time::localtime; my $mytime=<>; ; print > ctime($mytime)."\n";'
  Mon Aug 11 10:23:24 2008

As you can see, I'm getting the same results (10:23) here. We may have
an under/overflow problem too.

 - Craig

#408879#78
Date:
2008-08-11 02:58:54 UTC
From:
To:
Running the commands suggested, I get:

psz@pisa:~$ date
Mon Aug 11 12:55:15 EST 2008
psz@pisa:~$ bash
psz@pisa:~$ PID=$$; ( date +"%s -" ; cut -f 1 -d ' ' /proc/uptime ; echo " + (" ; cut -f 22 -d ' ' /proc/$PID/stat ; echo ' /100 )' ) | (tr '\n' ' '; echo ) ; ps -p $PID -o pid,start_time,cmd
1218423322 - 2339691.29  + ( 233925387  /100 )
  PID START CMD
24954 12:48 bash
psz@pisa:~$ echo '1218423322 - 2339691.29  + ( 233925387  /100 )' | bc | perl -e 'use Time::localtime; my $mytime=<>; ; print  ctime($mytime)."\n";'
Mon Aug 11 12:48:03 2008
psz@pisa:~$ date
Mon Aug 11 12:55:46 EST 2008
psz@pisa:~$ exit
psz@pisa:~$ date
Mon Aug 11 12:55:49 EST 2008

Cheers,

Paul Szabo   psz@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia

#408879#83
Date:
2008-08-12 00:03:16 UTC
From:
To:
That's interesting, but its looking like its not a procps bug.  Or
perhaps there is something procps assumes that is incorrect.

There is no overflow problems here, the manual calculation is agreeing
with procps
and
 The now() time is correct.
 The uptime is about 29 days, the computer was booted 11am 15th July

Now you would expect that the uptime and the process start time would be
pretty close to each other, because they are being done at the same
time, or a second or so between them. But they are not:

In fact, your process start time is *less* than the uptime, by about 438
seconds, or 7 minutes.
That's 7 minutes.

So, the kernel is giving the wrong start time.

#408879#88
Date:
2008-08-12 01:04:27 UTC
From:
To:
Dear Craig,

I suspect the latter (this is "idle speculation", without having delved
into the sources).

Does the start time given by the kernel need a correction? Is the bug
in the kernel for lieing, or in ps for not applying that correction?

Cheers,

Paul Szabo   psz@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia

#408879#93
Date:
2008-08-12 07:23:51 UTC
From:
To:
tags 408879 help
thankyou

And that, is exactly my problem.  Why do processes have a start time
that is less than the uptime.  Is there another field that needs a
correction or not.

So I have tagged this bug with help because I do need some help with
this.  To me it looks like a problem with the kernel but I'd like some
confirmation:

Why is the 22nd field in /proc/<PID>/stat bigger than the first field in
/proc/uptime ?  It shouldn't be, I think.

 - Craig

#408879#102
Date:
2009-05-13 03:03:24 UTC
From:
To:
I have now updated my "problem" machine to lenny and 2.6.26 kernel:

root@pisa:~# cat /etc/debian_version
5.0.1
root@pisa:~# uname -r
2.6.26-pk03.10-svr

and the problem already is visible two hours after boot:

root@pisa:~# date; /bin/ps -o lstart,command | grep /ps; date
Wed May 13 11:40:18 EST 2009
Wed May 13 11:40:17 2009 /bin/ps -o lstart,command
Wed May 13 11:40:17 2009 grep /ps
Wed May 13 11:40:18 EST 2009

(I expect the time drift to grow steadily).

Cheers,

Paul Szabo   psz@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia

#408879#107
Date:
2009-05-14 03:56:59 UTC
From:
To:
One day after the lenny upgrade and last boot of the machine, there
does not seem to be a great time drift. I was running something like

while :; do
  date; /bin/ps -o lstart,command | grep /ps; date
done

and that showed:

...

Thu May 14 13:18:42 EST 2009
Thu May 14 13:18:42 2009 /bin/ps -o lstart,command
Thu May 14 13:18:42 EST 2009

Thu May 14 13:18:43 EST 2009
Thu May 14 13:18:43 2009 /bin/ps -o lstart,command
Thu May 14 13:18:43 EST 2009
(above block repeated 22 times)

Thu May 14 13:18:43 EST 2009
Thu May 14 13:18:42 2009 /bin/ps -o lstart,command   <<< note bad time
Thu May 14 13:18:43 EST 2009

Thu May 14 13:18:43 EST 2009
Thu May 14 13:18:43 2009 /bin/ps -o lstart,command
Thu May 14 13:18:43 EST 2009
(above block repeated 8 times)

Thu May 14 13:18:44 EST 2009
Thu May 14 13:18:44 2009 /bin/ps -o lstart,command
Thu May 14 13:18:44 EST 2009

...

which would suggest something other than a few milliseconds drift.
Rather curious; the "bad time" always positioned about the 3/4 of
the elapsed second...

Cheers,

Paul Szabo   psz@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia

#408879#112
Date:
2010-08-25 02:43:47 UTC
From:
To:
I can still (2.6.24 kernel, lenny up-to-date) reproduce with command
(wrapped for readability, OK for cut-and-paste into xterm, ctrl-C when
had enough):

  while :; do \
    date; /bin/ps -o lstart,command | grep /ps; date; echo; \
  done | \
  perl -ne '
    BEGIN { $/ = "\n\n" }
    @x=grep(s/^(... ... .. ..:..:..).*$/$1/, split(/\n/));
    $p=""; $z=0;
    $z ||= $p gt $_, $p=$_ foreach @x;
    print if $z'

that shows things like:

Wed Aug 25 12:36:44 EST 2010
Wed Aug 25 12:36:43 2010 /bin/ps -o lstart,command
Wed Aug 25 12:36:44 EST 2010

Should the tag "unreproducible" be removed?

Thanks, Paul

Paul Szabo   psz@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia

#408879#117
Date:
2012-01-30 01:06:27 UTC
From:
To:
Testing this old thing again at squeeze 6.0.4 with 2.6.32 kernel,
can still reproduce, the long command in previous message shows:


$ while :; do \
    date; /bin/ps -o lstart,command | grep /ps; date; echo; \
  done | \
  perl -ne '
    BEGIN { $/ = "\n\n" }
    @x=grep(s/^(... ... .. ..:..:..).*$/$1/, split(/\n/));
    $p=""; $z=0;
    $z ||= $p gt $_, $p=$_ foreach @x;
    print if $z'
Mon Jan 30 12:00:45 EST 2012
Mon Jan 30 12:00:44 2012 /bin/ps -o lstart,command
Mon Jan 30 12:00:44 2012 grep /ps
Mon Jan 30 12:00:45 EST 2012

Mon Jan 30 12:00:52 EST 2012
Mon Jan 30 12:00:51 2012 /bin/ps -o lstart,command
Mon Jan 30 12:00:51 2012 grep /ps
Mon Jan 30 12:00:52 EST 2012

...

Cheers, Paul

Paul Szabo   psz@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia

#408879#122
Date:
2012-01-30 22:02:45 UTC
From:
To:
Hi Paul,
  Thanks for letting me know this is still a problem. The only thing is,
I couldn't see which version of procps you were using.  We've had a lot
of changes in the 3.3.x versions.

Is it always just a second earlier? That is what I'm noticing.

The relevant code is:
 t = getbtime() + pp->start_time / Hertz;

getbtime is basically 'grep btime /proc/stat'
start_time comes from column 23 of /proc/<PID>/stat divided by jiffies.

It is looking like a kernel problem, or more specifically a procfs
output problem.

 - Craig

#408879#127
Date:
2012-01-30 22:25:55 UTC
From:
To:
Dear Craig,

"dpkg -l" shows "ii procps 1:3.2.8-9 ..." (squeeze up-to-date).

Yes, it seems so. Have not seen any other discrepancies for a long time
now.

I also think so. Please let me know if I should run some other tests.
(Noting bug #657916 also.)

Thanks, Paul

Paul Szabo   psz@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia

#408879#138
Date:
2016-01-03 02:01:21 UTC
From:
To:
Hello Paul,
  I'm chasing up old procps bugs and noticed this one is still opened.
It's been a long way since a 2.6 kernel and 3.2.x procps, wondering
if you still get your odd start times?

 - Craig

#408879#143
Date:
2016-01-03 10:26:56 UTC
From:
To:
this time with the right email alias.
#408879#146
Date:
2016-01-03 10:26:56 UTC
From:
To:
this time with the right email alias.
#408879#151
Date:
2016-01-03 22:01:54 UTC
From:
To:
Dear Craig,

I have not noticed any large discrepancies lately. However, I wonder
whether that is due to the machine clocks being more accurate, so not
much adjustments by ntpd (the old machine where this issue was so
obviously apparent, is long gone). In fact I still see:

psz@bari:~$ date --rfc-3339=ns; ps -ouser,lstart,command | grep -E 'psz|USER'
2016-01-04 08:59:35.472909349+11:00
USER                      STARTED COMMAND
psz      Mon Jan  4 08:43:01 2016 -bash
psz      Mon Jan  4 08:59:34 2016 ps -ouser,lstart,command
psz      Mon Jan  4 08:59:34 2016 grep -E psz|USER

and I wonder whether that is this issue, or just "rounding error".

Cheers, Paul

Paul Szabo   psz@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia

#408879#156
Date:
2025-04-15 07:53:54 UTC
From:
To:
Hi,
  This is a really old bug about ps start times.
There has been a lot of changes with hardware, kernels
and procps itself so I'm closing this bug.

If it does still occur on current systems, we can
reopen it or open another, but I'd say its fixed
due to the update of those three things.

 - Craig

#408879#169
Date:
2025-04-15 09:23:41 UTC
From:
To:
Sorry but the bug is still there! Now my machine is running running
up-to-date bookworm, and I see (after just 12 days uptime):

psz@enna:~$ date; ps -o user,lstart,command | grep -E 'psz.*grep|USER'
Tue Apr 15 19:14:51 AEST 2025
USER                      STARTED COMMAND
psz      Tue Apr 15 19:14:50 2025 grep -E psz.*grep|USER

Maybe you could kindly re-open?

Thanks, Paul

#408879#174
Date:
2025-04-15 09:44:08 UTC
From:
To:
Another machine (similarly modern hardware, with bookworm, and 12 days
uptime) shows a discrepancy over half a second:

psz@azaona:~$ date --rfc-3339=ns; ps -ouser,lstart,command | grep -E 'psz.*grep|USER'
2025-04-15 19:38:41.589754682+10:00
USER                      STARTED COMMAND
psz      Tue Apr 15 19:38:40 2025 grep -E psz.*grep|USER

Surely worth re-opening, and fixing anytime in the next 20 years?

Thanks, Paul

#408879#181
Date:
2025-04-15 10:51:58 UTC
From:
To:
I've had another look at this, I've been looking at the angle from start
time to boot time, but that might not be where the issue is.

lstart comes from boot_time + TICS_BEGAN which is then converted to a real
time with localtime and strftime
boot_time comes from the btime field in /proc/stat
TICS_BEGAN comes from field 22 of /proc/PID/stat (converted from Hz, or
divided by 100).

So, could btime be wrong? Or rather be correct at time of boot and then the
system (once network is up and ntp is online) change its systen time?
$ date +%s ; btime=`grep btime /proc/stat |cut -f 2 -d' '` ; utime=`cut -f
1 -d'.' /proc/uptime` ; date +%s ; now=$((utime + btime)) ; echo $now
1744714188
1744714188
1744714187

 - Craig

#408879#186
Date:
2025-04-15 23:36:31 UTC
From:
To:
Dear Craig,

Thanks for re-opening.
(Good that finally you were able to observe yourself.)

I always suspected exactly that!

Suspected, because I do not really know how any of this works.
The way I understood things (this is probably totally wrong):

Process times, including start times, are counted in "jiffies", some
count that is monotonically increasing; and it is "known" that one jiffy
is some fraction of a second. But that does not take into account time
adjustments (say by ntpd). When time is adjusted, the "current jiffy" is
left unchanged: not incremented when that would be appropriate, also
never decremented as neither time nor jiffies are ever set backwards but
some other slowdown mechanism is used. Either the set-forward should use
some speedup mechanism similar to slowdown, or increment jiffies
together with the time.

Hope the above makes some sense.

Cheers, Paul

#408879#191
Date:
2025-04-15 23:37:40 UTC
From:
To:
Of course what I meant is that utime is wrong (not btime).