#945055 huge CPU temperature increase from 5.2 to 5.5 ... and when using intel_pstate

Package:
linux
Source:
linux
Submitter:
Christoph Anton Mitterer
Date:
2021-05-02 14:21:03 UTC
Severity:
important
Tags:
#945055#5
Date:
2019-11-19 04:56:45 UTC
From:
To:
Hi.

With the most recent upgrade, the CPU seems to run at considerably
higher temperatures.


Shortly after a fresh boot, but long enough so that everything has
started up, and the system being basically idle it looks like this:

microcode updated early to revision 0xca, date = 2019-09-26

iwlwifi-virtual-0
Adapter: Virtual device
temp1:            N/A

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +58.0°C  (high = +100.0°C, crit = +100.0°C)
Core 0:        +57.0°C  (high = +100.0°C, crit = +100.0°C)
Core 1:        +57.0°C  (high = +100.0°C, crit = +100.0°C)

CMB1-acpi-0
Adapter: ACPI interface
in0:          16.60 V
curr1:         0.00 A




And with the same kernel but older microcode:

microcode updated early to revision 0xc6, date = 2019-08-14

iwlwifi-virtual-0
Adapter: Virtual device
temp1:            N/A

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +55.0°C  (high = +100.0°C, crit = +100.0°C)
Core 0:        +53.0°C  (high = +100.0°C, crit = +100.0°C)
Core 1:        +54.0°C  (high = +100.0°C, crit = +100.0°C)

CMB1-acpi-0
Adapter: ACPI interface
in0:          16.60 V
curr1:         0.00 A



top shows basically no CPU utilisation (in both cases) but the fan
goes up and the CPU is constantly noticeably hot, neither of which was
the case previously when the system was idle.
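
To make idle readings like the two above comparable across boots, the
package temperature can be pulled out of the `sensors` output and logged
with a timestamp. This is only a minimal sketch, assuming the coretemp
line format shown above; the function name `package_temp`, the log file
name and the 5-second interval are illustrative and not from the report:

```shell
#!/bin/sh
# Print the "Package id 0" temperature (e.g. 58.0) from `sensors` output
# read on stdin. Assumes the coretemp line format shown in this report.
package_temp() {
    awk '/^Package id 0:/ { gsub(/[^0-9.]/, "", $4); print $4; exit }'
}

# Usage (hypothetical log file name), one sample every 5 seconds:
#   while sleep 5; do
#       printf '%s %s\n' "$(date +%T)" "$(sensors | package_temp)"
#   done >> "temps-$(uname -r).log"
```

Logging per kernel version keeps the runs apart, so the idle baseline
of one boot can be compared against another.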




Now another strange thing:
With the NEW microcode, once some load was put on the system, even when
this is gone and the CPU back to basically no utilisation, the temperatures
are at a *much* higher level and stay there, for whichever reason:

iwlwifi-virtual-0
Adapter: Virtual device
temp1:        +34.0°C

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +76.0°C  (high = +100.0°C, crit = +100.0°C)
Core 0:        +70.0°C  (high = +100.0°C, crit = +100.0°C)
Core 1:        +69.0°C  (high = +100.0°C, crit = +100.0°C)

CMB1-acpi-0
Adapter: ACPI interface
in0:          16.59 V
curr1:         0.00 A



Is this a problem with the microcode or a kinda expected side-effect
of the security workarounds?



Cheers,
Chris.

#945055#10
Date:
2019-11-19 18:30:06 UTC
From:
To:
...

...

I need the output of "cat /proc/cpuinfo" and also of "grep . /sys/devices/system/cpu/vulnerabilities/*" please. We need to know exactly what your processor is, and what got enabled.

Alternatively, you can report this directly upstream at:
https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files/issues

They will need the same information I requested.

There is nothing expected about it, as far as I know.

#945055#15
Date:
2019-11-22 00:20:26 UTC
From:
To:
All below running with:
# dmesg | head -n1
[    0.000000] microcode: microcode updated early to revision 0xca, date = 2019-09-26

# uname -a
Linux heisenberg 5.3.0-2-amd64 #1 SMP Debian 5.3.9-2 (2019-11-12) x86_64 GNU/Linux



# cat /proc/cpuinfo
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 142
model name	: Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz
stepping	: 9
microcode	: 0xca
cpu MHz		: 992.983
cache size	: 4096 KB
physical id	: 0
siblings	: 4
core id		: 0
cpu cores	: 2
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 22
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa itlb_multihit
bogomips	: 5799.77
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 6
model		: 142
model name	: Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz
stepping	: 9
microcode	: 0xca
cpu MHz		: 999.945
cache size	: 4096 KB
physical id	: 0
siblings	: 4
core id		: 1
cpu cores	: 2
apicid		: 2
initial apicid	: 2
fpu		: yes
fpu_exception	: yes
cpuid level	: 22
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa itlb_multihit
bogomips	: 5799.77
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

processor	: 2
vendor_id	: GenuineIntel
cpu family	: 6
model		: 142
model name	: Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz
stepping	: 9
microcode	: 0xca
cpu MHz		: 994.502
cache size	: 4096 KB
physical id	: 0
siblings	: 4
core id		: 0
cpu cores	: 2
apicid		: 1
initial apicid	: 1
fpu		: yes
fpu_exception	: yes
cpuid level	: 22
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa itlb_multihit
bogomips	: 5799.77
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

processor	: 3
vendor_id	: GenuineIntel
cpu family	: 6
model		: 142
model name	: Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz
stepping	: 9
microcode	: 0xca
cpu MHz		: 999.951
cache size	: 4096 KB
physical id	: 0
siblings	: 4
core id		: 1
cpu cores	: 2
apicid		: 3
initial apicid	: 3
fpu		: yes
fpu_exception	: yes
cpuid level	: 22
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa itlb_multihit
bogomips	: 5799.77
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:



# grep . /sys/devices/system/cpu/vulnerabilities/*
/sys/devices/system/cpu/vulnerabilities/itlb_multihit:KVM: Mitigation: Split huge pages
/sys/devices/system/cpu/vulnerabilities/l1tf:Mitigation: PTE Inversion; VMX: conditional cache flushes, SMT vulnerable
/sys/devices/system/cpu/vulnerabilities/mds:Mitigation: Clear CPU buffers; SMT vulnerable
/sys/devices/system/cpu/vulnerabilities/meltdown:Mitigation: PTI
/sys/devices/system/cpu/vulnerabilities/spec_store_bypass:Mitigation: Speculative Store Bypass disabled via prctl and seccomp
/sys/devices/system/cpu/vulnerabilities/spectre_v1:Mitigation: usercopy/swapgs barriers and __user pointer sanitization
/sys/devices/system/cpu/vulnerabilities/spectre_v2:Mitigation: Full generic retpoline, IBPB: conditional, IBRS_FW, STIBP: conditional, RSB filling
/sys/devices/system/cpu/vulnerabilities/tsx_async_abort:Mitigation: Clear CPU buffers; SMT vulnerable


In the meantime, I've especially observed this situation that after
some higher load, the CPU stays at pretty high temps, even though what
produced the load has stopped and top shows basically nothing, just now
it's e.g. on:
$ sensors
coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +77.0°C  (high = +100.0°C, crit = +100.0°C)
Core 0:        +76.0°C  (high = +100.0°C, crit = +100.0°C)
Core 1:        +73.0°C  (high = +100.0°C, crit = +100.0°C)

iwlwifi-virtual-0
Adapter: Virtual device
temp1:        +32.0°C

CMB1-acpi-0
Adapter: ACPI interface
in0:          16.59 V
curr1:         0.00 A


Sometimes it starts to reduce again, for no apparent reason, but right
now it already runs hot for several minutes.


I'll try the older 5.2 now (with the current microcode) to see whether
that changes anything.


Cheers,
Chris.

#945055#20
Date:
2019-11-22 01:34:59 UTC
From:
To:
After some further tests I think it could be actually independent of
the microcode and rather be kernel 5.3's fault.


First perhaps some notes on that notebook:
It's a Fujitsu LIFEBOOK U757 with:
model name	: Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz

When I got the system in around 2017, IIRC, I already had considerable
CPU overheating problems, often seeing messages like:
Nov 17 14:41:22 heisenberg kernel: [   36.347425] mce: CPU2: Core temperature above threshold, cpu clock throttled (total events = 1)
Nov 17 14:41:22 heisenberg kernel: [   36.347426] mce: CPU0: Core temperature above threshold, cpu clock throttled (total events = 1)
Nov 17 14:41:22 heisenberg kernel: [   36.347427] mce: CPU1: Package temperature above threshold, cpu clock throttled (total events = 1)
Nov 17 14:41:22 heisenberg kernel: [   36.347427] mce: CPU3: Package temperature above threshold, cpu clock throttled (total events = 1)
Nov 17 14:41:22 heisenberg kernel: [   36.347429] mce: CPU0: Package temperature above threshold, cpu clock throttled (total events = 1)
Nov 17 14:41:22 heisenberg kernel: [   36.347531] mce: CPU2: Package temperature above threshold, cpu clock throttled (total events = 1)
Nov 17 14:41:22 heisenberg kernel: [   36.348423] mce: CPU2: Core temperature/speed normal
Nov 17 14:41:22 heisenberg kernel: [   36.348424] mce: CPU0: Core temperature/speed normal
Nov 17 14:41:22 heisenberg kernel: [   36.348461] mce: CPU1: Package temperature/speed normal
Nov 17 14:41:22 heisenberg kernel: [   36.348461] mce: CPU3: Package temperature/speed normal
Nov 17 14:41:22 heisenberg kernel: [   36.348498] mce: CPU2: Package temperature/speed normal
Nov 17 14:41:22 heisenberg kernel: [   36.348568] mce: CPU0: Package temperature/speed normal

Just with many thousands of events and temperatures reaching pretty
exactly 100°C.

Fujitsu support had no real solution, claiming it wouldn't happen under
Windows.
Eventually the solution was to disable the turbo:
/sys/devices/system/cpu/intel_pstate/no_turbo = 1

(and all my previous tests as well as the ones from this mail have that
set).
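
For reference, the turbo setting above can be applied from a root
shell. The following is only a sketch: the sysfs path is the one quoted
above, while the helper name `set_no_turbo` and its optional file
argument are hypothetical and only there to make it easy to try out:

```shell
#!/bin/sh
# Path quoted in this report; present only when intel_pstate is active.
NO_TURBO=/sys/devices/system/cpu/intel_pstate/no_turbo

# Write 1 (turbo off) or 0 (turbo on) to the given file.
# Writing to the real sysfs path needs root.
set_no_turbo() {
    value=$1
    file=${2:-$NO_TURBO}
    case $value in
        0|1) ;;
        *) echo "value must be 0 or 1" >&2; return 2 ;;
    esac
    [ -w "$file" ] || { echo "cannot write $file" >&2; return 1; }
    printf '%s\n' "$value" > "$file"
}

# Usage: set_no_turbo 1    # disable turbo until the next reboot
```

The value written to sysfs does not survive a reboot, so it has to be
reapplied at boot if it should stay in effect.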

Since then I rarely see the temperature warnings from above, and if I
do, it's usually only exactly one event during boot.


I guess the cooling of that slim ultrabook is just not designed well
enough to transport enough heat away if the turbo is on.


One further constant thing is that playback of videos always leads to
considerable CPU utilisation (and higher temperatures), much worse than
the previous ~2012 lifebook the university bought me.

I've never found a real solution for that,... video decoding
acceleration is enabled and seems to work but still,...
Suspicion was that it might be some issue in cinnamon, cause the
cinnamon process also gets quite high CPU usage when I play back
videos.


But when I reported this ticket, things had gotten much worse (and
it had already been like that for a week before I noticed it or took
action); especially when the system was basically idle, the temps
were also much higher and the fan running much louder/faster.

And as I've described before, there are situations when it gets really
hot (80° and more) and doesn't cool down again a lot, even if the system
becomes idle again.






Just before I did some more testing with different kernel/microcode
combinations:

**************************************************************
With 5.2.17 0xca-2019-09-26:

 Event/s PID      %CPU  PR  NI Task               Init Function
   59.82    1730   0.2   0   0 Xorg               hrtimer_wakeup
   55.83    1730   0.2   0   0 Xorg               it_real_fn
   14.96    3203   0.5   0   0 gnome-terminal-    hrtimer_wakeup
   13.96    3086   0.0   0   0 diodon             hrtimer_wakeup
    7.98    3065   0.2   0   0 cinnamon           tick_sched_timer
    3.99    3203   0.5   0   0 gnome-terminal-    tick_sched_timer
    2.99      32   0.0   0   0 [kworker/1:1]      intel_uncore_fw_release_timer
    1.99     509   0.0   0   0 [kworker/u8:4]     intel_uncore_fw_release_timer
    1.99    1730   0.2   0   0 Xorg               intel_uncore_fw_release_timer
    1.99      45   0.0   0   0 [kworker/2:1]      intel_uncore_fw_release_timer
    1.00     847   0.0   0   0 gmain              hrtimer_wakeup
    1.00    2861   0.0   0   0 gmain              hrtimer_wakeup
    1.00     236   0.0   0   0 [kworker/3:2]      intel_uncore_fw_release_timer
    1.00     751   0.0   0   0 haveged            hrtimer_wakeup
    1.00    2920   0.0   0   0 gmain              hrtimer_wakeup
    1.00    1730   0.2   0   0 Xorg               tick_sched_timer
    1.00    3065   0.3   0   0 cinnamon           intel_uncore_fw_release_timer
173 Total events, 172.48 events/sec (kernel:  7.98, userspace: 164.51)

 Event/s PID      %CPU  PR  NI Task               Init Function
   68.00    1730   0.5   0   0 Xorg               hrtimer_wakeup
   65.00    1730   0.5   0   0 Xorg               it_real_fn
   20.00    3203   0.3   0   0 gnome-terminal-    hrtimer_wakeup
   14.00    3086   0.0   0   0 diodon             hrtimer_wakeup
    7.00    3065   0.4   0   0 cinnamon           tick_sched_timer
    5.00      32   0.0   0   0 [kworker/1:1]      intel_uncore_fw_release_timer
    4.00    1730   0.5   0   0 Xorg               tick_sched_timer
    4.00    1730   0.5   0   0 Xorg               intel_uncore_fw_release_timer
    2.00     751   0.0   0   0 haveged            hrtimer_wakeup
    1.00    2695   0.0   0   0 ssh-agent          hrtimer_wakeup
    1.00    3074   0.0   0   0 gdbus              tick_sched_timer
    1.00    3203   0.3   0   0 gnome-terminal-    tick_sched_timer
    1.00     145   0.0   0   0 [kworker/0:2]      tick_sched_timer
    1.00    3104   0.0   0   0 gdbus              tick_sched_timer
194 Total events, 194.00 events/sec (kernel:  6.00, userspace: 188.00)

 Event/s PID      %CPU  PR  NI Task               Init Function
   63.00    1730   0.5   0   0 Xorg               hrtimer_wakeup
   61.00    1730   0.5   0   0 Xorg               it_real_fn
   14.00    3086   0.1   0   0 diodon             hrtimer_wakeup
   13.00    3203   0.3   0   0 gnome-terminal-    hrtimer_wakeup
   12.00    3065   0.5   0   0 cinnamon           tick_sched_timer
    7.00      32   0.0   0   0 [kworker/1:1]      intel_uncore_fw_release_timer
    5.00    1730   0.5   0   0 Xorg               intel_uncore_fw_release_timer
    3.00    1730   0.5   0   0 Xorg               tick_sched_timer
    2.00     173   0.0   0   0 [kworker/u8:3]     intel_uncore_fw_release_timer
    2.00    3203   0.3   0   0 gnome-terminal-    tick_sched_timer
    2.00     236   0.5   0   0 [kworker/3:2]      intel_uncore_fw_release_timer
    1.00     145   0.0   0   0 [kworker/0:2]      intel_uncore_fw_release_timer
    1.00     751   0.0   0   0 haveged            hrtimer_wakeup
    1.00    3086   0.0   0   0 diodon             tick_sched_timer
187 Total events, 187.00 events/sec (kernel: 12.00, userspace: 175.00)

 Event/s PID      %CPU  PR  NI Task               Init Function
   60.00    1730   0.2   0   0 Xorg               hrtimer_wakeup
   55.00    1730   0.2   0   0 Xorg               it_real_fn
   16.00    3086   0.1   0   0 diodon             hrtimer_wakeup
   14.00    3203   0.2   0   0 gnome-terminal-    hrtimer_wakeup
    7.00    3065   0.2   0   0 cinnamon           tick_sched_timer
    5.00      32   0.0   0   0 [kworker/1:1]      intel_uncore_fw_release_timer
    2.00    1730   0.2   0   0 Xorg               tick_sched_timer
    1.00    3203   0.2   0   0 gnome-terminal-    tick_sched_timer
    1.00     751   0.0   0   0 haveged            hrtimer_wakeup
    1.00    3065   0.3   0   0 cinnamon           hrtimer_wakeup
    1.00    1730   0.2   0   0 Xorg               intel_uncore_fw_release_timer
    1.00      32   0.0   0   0 [kworker/1:1]      tick_sched_timer
    1.00    3065   0.4   0   0 cinnamon           intel_uncore_fw_release_timer
165 Total events, 165.00 events/sec (kernel:  6.00, userspace: 159.00)

 Event/s PID      %CPU  PR  NI Task               Init Function
   65.00    1730   0.5   0   0 Xorg               hrtimer_wakeup
   61.00    1730   0.5   0   0 Xorg               it_real_fn
   34.00    3065   1.3   0   0 cinnamon           tick_sched_timer
   19.00    3086   0.1   0   0 diodon             hrtimer_wakeup
   14.00    3203   0.2   0   0 gnome-terminal-    hrtimer_wakeup
   11.00     842   0.0   0   0 NetworkManager     tick_sched_timer
    8.00     907   0.0   0   0 gdbus              tick_sched_timer
    8.00     841   0.0   0   0 dbus-daemon        tick_sched_timer
    5.00    3074   0.7   0   0 gdbus              tick_sched_timer
    5.00      32   0.0   0   0 [kworker/1:1]      intel_uncore_fw_release_timer
    4.00    3104   0.2   0   0 gdbus              tick_sched_timer
    2.00    3203   0.2   0   0 gnome-terminal-    tick_sched_timer
    2.00     236   0.0   0   0 [kworker/3:2]      intel_uncore_fw_release_timer
    2.00     841   0.0   0   0 <...>              tick_sched_timer
    2.00    3086   0.1   0   0 diodon             tick_sched_timer
    2.00    1730   0.5   0   0 Xorg               tick_sched_timer
    2.00    3088   0.0   0   0 nm-applet          tick_sched_timer
    2.00     888   0.0   0   0 wpa_supplicant     hrtimer_wakeup
    2.00    1730   0.5   0   0 Xorg               intel_uncore_fw_release_timer
    1.00     751   0.0   0   0 haveged            hrtimer_wakeup
251 Total events, 251.00 events/sec (kernel:  7.00, userspace: 244.00)

 Event/s PID      %CPU  PR  NI Task               Init Function
   59.30    1730   0.5   0   0 Xorg               hrtimer_wakeup
   57.29    1730   0.5   0   0 Xorg               it_real_fn
   23.12    3203   0.0   0   0 gnome-terminal-    hrtimer_wakeup
   16.08    3086   0.0   0   0 diodon             hrtimer_wakeup
    7.04      32   0.0   0   0 [kworker/1:1]      intel_uncore_fw_release_timer
    7.04    3065   0.3   0   0 cinnamon           tick_sched_timer
    3.02    1730   0.5   0   0 Xorg               tick_sched_timer
    3.02    1730   0.5   0   0 Xorg               intel_uncore_fw_release_timer
    2.01     751   0.0   0   0 haveged            hrtimer_wakeup
    1.01    3203   0.0   0   0 gnome-terminal-    tick_sched_timer
178 Total events, 178.89 events/sec (kernel:  7.04, userspace: 171.86)

^C Event/s PID      %CPU  PR  NI Task               Init Function
  141.84    1730   0.0   0   0 Xorg               hrtimer_wakeup
  120.57    1730   0.0   0   0 Xorg               it_real_fn
   85.11    3203   1.8   0   0 gnome-terminal-    hrtimer_wakeup
   28.37    3203   1.8   0   0 gnome-terminal-    tick_sched_timer
   14.18     751   0.0   0   0 haveged            hrtimer_wakeup
   14.18      32   0.0   0   0 [kworker/1:1]      intel_uncore_fw_release_timer
    7.09    2718   0.0   0   0 <...>              hrtimer_wakeup
    7.09    7530   0.0   0   0 [kworker/u8:8]     tick_sched_timer
    7.09    3065   0.8   0   0 cinnamon           hrtimer_wakeup
    7.09    1730   0.0   0   0 Xorg               intel_uncore_fw_release_timer
61 Total events, 432.62 events/sec (kernel: 21.28, userspace: 411.35)


=> I thought I had seen much higher numbers for hrtimer_wakeup when
running 5.3, but that didn't turn out to be the case



on an idle system (DE/cinnamon running but no real load)

root@heisenberg:~# sensors
coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +54.0°C  (high = +100.0°C, crit = +100.0°C)
Core 0:        +52.0°C  (high = +100.0°C, crit = +100.0°C)
Core 1:        +52.0°C  (high = +100.0°C, crit = +100.0°C)

iwlwifi-virtual-0
Adapter: Virtual device
temp1:        +33.0°C



update-initramfs -u -k all barely hits 70°C

**************************************************************




**************************************************************
With 5.3.9+0xCA-2019-09-26

eventstat didn't show considerably higher numbers for e.g.
hrtimer_wakeup
which I thought I had seen at first.


But now idle system (again cinnamon running) seems to run much hotter,
barely getting below 60°:

# sensors
coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +66.0°C  (high = +100.0°C, crit = +100.0°C)
Core 0:        +61.0°C  (high = +100.0°C, crit = +100.0°C)
Core 1:        +61.0°C  (high = +100.0°C, crit = +100.0°C)

iwlwifi-virtual-0
Adapter: Virtual device
temp1:        +33.0°C

CMB1-acpi-0
Adapter: ACPI interface
in0:          16.58 V
curr1:         0.00 A



Here I did some apt/aptitude stuff to get an older intel-microcode from
stable or oldstable.
After that (installation of packages and update-initramfs) it took (I'd
say) noticeably longer (not extremely much, but noticeable) till the CPU
cooled down to the (still higher base level of) idle temps from above
(~60-68°)

Running update-initramfs -k all -u lets the temps go easily above 70°
up to 85°.


Interestingly sometimes it cools down again rather fast (but still only
to the 60° range).
Sometimes it doesn't.

Especially video playback seems to be a killer.
Playing a:
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 720x304 [SAR 152:151 DAR 360:151], 529 kb/s, SAR 181:180 DAR 181:76, 25 fps, 25 tbr, 25k tbn, 50 tbc (default)
in full screen lets the CPU heat up to:
# sensors
coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +93.0°C  (high = +100.0°C, crit = +100.0°C)
Core 0:        +83.0°C  (high = +100.0°C, crit = +100.0°C)
Core 1:        +81.0°C  (high = +100.0°C, crit = +100.0°C)

iwlwifi-virtual-0
Adapter: Virtual device
temp1:        +33.0°C

CMB1-acpi-0
Adapter: ACPI interface
in0:          16.57 V
curr1:         0.00 A

and it took quite a while to cool down, even though I've stopped the
video for a minute or so already.

**************************************************************




**************************************************************
With 5.3.9+0xb-2019-04-01:

I.e. current kernel, but even older microcode (the last one where I
thought it was ok was 3.20191112.1 ... but that might be just a
coincidence, since on Nov 14 2019 I installed the kernel 5.3 packages,
and that is roughly around 3.20191113.1 (Fri, 15 Nov 2019), when I
started to slowly notice the CPU temperature issues).


Idle temp seems to be around:

# sensors
iwlwifi-virtual-0
Adapter: Virtual device
temp1:        +33.0°C

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +71.0°C  (high = +100.0°C, crit = +100.0°C)
Core 0:        +66.0°C  (high = +100.0°C, crit = +100.0°C)
Core 1:        +65.0°C  (high = +100.0°C, crit = +100.0°C)

CMB1-acpi-0
Adapter: ACPI interface
in0:          16.57 V
curr1:         0.00 A


so here I concluded that maybe 5.3 is the offender... and not the
microcode!?


Installing the current microcode again and afterwards doing:
update-initramfs -k all -u
leads to temps around that:

$ sensors
iwlwifi-virtual-0
Adapter: Virtual device
temp1:        +33.0°C

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +79.0°C  (high = +100.0°C, crit = +100.0°C)
Core 0:        +74.0°C  (high = +100.0°C, crit = +100.0°C)
Core 1:        +79.0°C  (high = +100.0°C, crit = +100.0°C)

CMB1-acpi-0
Adapter: ACPI interface
in0:          16.57 V
curr1:         0.00 A


staying long at around:
iwlwifi-virtual-0
Adapter: Virtual device
temp1:        +33.0°C

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +74.0°C  (high = +100.0°C, crit = +100.0°C)
Core 0:        +68.0°C  (high = +100.0°C, crit = +100.0°C)
Core 1:        +66.0°C  (high = +100.0°C, crit = +100.0°C)

CMB1-acpi-0
Adapter: ACPI interface
in0:          16.57 V
curr1:         0.00 A

even though the initrd creation is already long over and top shows
nothing else.

**************************************************************


I'm now back at running 5.2.17-1 (2019-10-06) from the linux-image-5.2.0-3-amd64-unsigned
package with the most recent intel-microcode package version.

Temperatures seem good (in the sense: as from before I noticed issues).

So my conclusion would be 5.3 is the bad boy...

Shall we reassign it to src:linux?


Cheers,
Chris.

#945055#25
Date:
2019-11-24 01:36:02 UTC
From:
To:
Please send the output of the grep line:
grep . /sys/devices/system/cpu/vulnerabilities/*

With the 5.2 kernel and the new microcode (20191115).

Thank you!

#945055#30
Date:
2019-11-25 02:30:11 UTC
From:
To:
/sys/devices/system/cpu/vulnerabilities/l1tf:Mitigation: PTE Inversion; VMX: conditional cache flushes, SMT vulnerable
/sys/devices/system/cpu/vulnerabilities/mds:Mitigation: Clear CPU buffers; SMT vulnerable
/sys/devices/system/cpu/vulnerabilities/meltdown:Mitigation: PTI
/sys/devices/system/cpu/vulnerabilities/spec_store_bypass:Mitigation: Speculative Store Bypass disabled via prctl and seccomp
/sys/devices/system/cpu/vulnerabilities/spectre_v1:Mitigation: usercopy/swapgs barriers and __user pointer sanitization
/sys/devices/system/cpu/vulnerabilities/spectre_v2:Mitigation: Full generic retpoline, IBPB: conditional, IBRS_FW, STIBP: conditional, RSB filling


during:

root@heisenberg:~# uname -a
Linux heisenberg 5.2.0-3-amd64 #1 SMP Debian 5.2.17-1 (2019-10-06) x86_64 GNU/Linux
root@heisenberg:~# dmesg | head -n1
[    0.000000] microcode: microcode updated early to revision 0xca, date = 2019-09-26

(which I think is the one from 20191115, right?)

Cheers,
Chris.

#945055#35
Date:
2019-12-13 23:43:30 UTC
From:
To:
Reassigning to the kernel, since the problem is likely there.
#945055#46
Date:
2019-12-13 23:49:44 UTC
From:
To:
btw: 5.3.15-1 seems to be still affected.


While I see mostly "normal" temperatures right after boot (and when
everything has settled)... after some point in time, temperatures go
up and remain at high levels, e.g.:
  Package id 0:  +81.0°C  (high = +100.0°C, crit = +100.0°C)
  Core 0:        +75.0°C  (high = +100.0°C, crit = +100.0°C)
  Core 1:        +74.0°C  (high = +100.0°C, crit = +100.0°C)
even though top shows an effectively idle system.


Cheers,
Chris.

#945055#51
Date:
2019-12-16 15:18:39 UTC
From:
To:
I should perhaps add that there is some slight indication that this
might be graphics related.

Cause when I switch from the running X/Cinnamon (with the CPU at
average temperatures between 75-70°C, while top doesn't show really
much) to the virtual kernel console, temperatures go down drastically
(to around 56°C).


Cheers,
Chris.

#945055#56
Date:
2019-12-31 21:56:01 UTC
From:
To:
Raising severity, since current kernels are completely unusable on
at least some hardware (i.e. the one I use here), since the temperature
just explodes.
I'd say grave is justified already by the potential hardware damage to
systems running at 100 °C even under little actual load, not to mention
the fact that one effectively cannot upgrade to >5.2 kernels and thus
misses any security updates.


I've just checked the 5.4 packages from sid and the described issue
still occurs.


It seems to me that it's likely somehow graphics related, cause if I do
nothing (i.e. the screen also does nothing) the temperatures go down to
acceptable ~60°C .. but if I just scroll up and down e.g. in my email
client's mail list (which is just the list of subject/from/etc.
lines), the temperature goes up to 80°C

And still, as previously described, even if I stop the actions that
caused the temperatures going up (like no longer scrolling up/down) it
takes quite a while till CPU temperatures go down again (eventually
they do).


Downgrading to 5.2 and everything's back to normal.


Cheers,
Chris.

#945055#63
Date:
2020-01-08 01:58:35 UTC
From:
To:
Hey.

I've forwarded this to lkml.

My most recent post in that thread[0] contains a pretty elaborate test
series comparing kernel 5.2 vs. 5.4 (each with intel_pstate=disable and
without), each on Cinnamon and GNOME Classic... under different
scenarios (idle system and several videos played back).

My personal conclusion would be that something changed between 5.2 and
5.3, which made temperatures and CPU utilisation considerably worse for
Cinnamon,... and not so much, but still noticeably for GNOME.


Apart from that however, there seems to be additionally something wrong
with Cinnamon, as it performs much worse with video playback than GNOME
does - even under 5.2.

So I've additionally created a ticket there at Cinnamon[1].


[0] https://lore.kernel.org/lkml/c7b7e81b14380709c3d63033b0e67ee12b737b55.camel@scientia.net/
[1] https://github.com/linuxmint/cinnamon/issues/9085#issuecomment-570654676

#945055#70
Date:
2020-01-09 22:28:38 UTC
From:
To:
Hey.

According to https://gitlab.freedesktop.org/drm/intel/issues/953 the
bug was introduced by:
drm/i915/gen8+: Add RC6 CTX corruption WA (d4360736a7c0a6326e3bbdf7d41181f6ed03d9a6)

which, AFAIU, is actually a security fix.


There seem to be some patches, but not sure when they'll be "final" (if
ever)... without opening the security issue again.


Also this would just fix my imminent showstopper of Cinnamon running at
extreme temperatures when being effectively idle.

As my test series shows the following issues likely remain:
- Cinnamon performs noticeably worse with video playback than GNOME
  even under 5.2 (where the offending commit isn't there)
- vaapi performs worse than xv (which I guess it shouldn't)
- intel_pstate makes the system hotter too.


Thanks,
Chris.

#945055#77
Date:
2020-01-23 12:49:29 UTC
From:
To:
Hey.

The offending patch is apparently:
drm/i915/gen8+: Add RC6 CTX corruption WA

which is contained in:
$ git log --oneline --all | grep "drm/i915/gen8+: Add RC6 CTX corruption WA"
5013e6d917ac drm/i915/gen8+: Add RC6 CTX corruption WA
2248a28384fe drm/i915/gen8+: Add RC6 CTX corruption WA
d4360736a7c0 drm/i915/gen8+: Add RC6 CTX corruption WA
255ed51599de drm/i915/gen8+: Add RC6 CTX corruption WA
1a5a64e0bde8 drm/i915/gen8+: Add RC6 CTX corruption WA
00194ecfb32c drm/i915/gen8+: Add RC6 CTX corruption WA
284d38667f7e drm/i915/gen8+: Add RC6 CTX corruption WA
7e34f4e4aad3 drm/i915/gen8+: Add RC6 CTX corruption WA

$ git describe --contains 5013e6d917ac 2248a28384fe d4360736a7c0 255ed51599de 1a5a64e0bde8 00194ecfb32c 284d38667f7e 7e34f4e4aad3
v3.16.77~12
v5.5-rc1~28^2~19
v5.3.11~20
v4.19.84~28
v4.14.154~28
v4.9.201~2
v4.4.201~2
v5.4-rc8~28^2~1


The issue seems to affect *all* the i915/gen8+ GPUs, preventing them from
entering sleep states.

It seems a patch is available at
https://gitlab.freedesktop.org/drm/intel/issues/614
and according to
https://gitlab.freedesktop.org/drm/intel/issues/953#note_385488 these
seem to be the final versions(?).

Apparently it's however kinda stuck to get them in a stable release
(not really sure why), so could you possibly cherry pick the patch,
since that issue is really a major showstopper for all affected people.


Cheers,
Chris.

#945055#94
Date:
2020-01-23 12:59:53 UTC
From:
To:
Oh, I've just seen that the fixing commit seems to be already part of
5.5-rc1:

$ git log --oneline --all | grep "drm/i915/gt: Schedule request retirement when timeline idles"
311770173fac drm/i915/gt: Schedule request retirement when timeline idles

$ git describe --contains  311770173fac
v5.5-rc1~28^2^2~6

#945055#103
Date:
2020-03-15 04:58:00 UTC
From:
To:
Guess that issue can be closed as wontfix.
Despite a patch being available for nearly two months now, in an issue
that causes complete breakage of affected systems, there seems to be no
intentions to pick it up or release a recent enough kernel which would
contain it already.

Upstream's apparently also unwilling to submit this to -stable kernels
so yeah... people affected by it should probably switch hardware or OS.


Cheers,
Chris.

#945055#118
Date:
2020-04-17 17:02:31 UTC
From:
To:
This is now neither "fixed" nor "found" in any 5.5 version.  Please
update the versions properly.

This is also tagged "patch" but without a direct link to the patch(es)
that are supposed to fix it.  (Linking to the upstream bug report is
not specific enough.)

Ben.

#945055#125
Date:
2020-04-17 19:19:29 UTC
From:
To:
Hey.


For several months now, I've been chasing a tremendous increase in heat
(CPU/GPU) and power usage on my notebook.

It basically started after upgrading from 5.2 to 5.3, at least I
haven't explicitly noted any grave changes from before 5.2 to 5.2.
The issue (actually there might be several) persists until at least 5.4
and 5.5.

Things are so bad that I can hear the fan go up considerably (and
temps rise up to 90°C) just by typing this mail in the mail client
(while it goes back to a still insane 65°C idle when not typing... ok,
idle here(!) is with firefox running).
Similar things when I scroll through a terminal window, Alt-Tab cycle
between windows, and so on.


Testing is a bit difficult for me, as I couldn't come up with an easy
way to reproducibly generate real world load (like this typing, or
scrolling terminal windows), yet I tried to do an extensive test
series, which I think will illustrate some things.


Not really sure what the normal average or idle temps of that CPU are,
but I guess getting at average >80°C by just typing shouldn't be the
case.




1) Previous tests
*****************
When first searching for the reason of the temperature increase, I
had opened several tickets:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=945055
https://lore.kernel.org/lkml/d05aba2742ae42783788c954e2a380e7fcb10830.camel@scientia.net/

Finally to find (by coincidence):
https://gitlab.freedesktop.org/drm/intel/issues/614
when reporting:
https://gitlab.freedesktop.org/drm/intel/-/issues/953
myself.

At first I thought #614 would be the bug, but the fix for that went
into 5.5-rc, and in fact, with 5.5.x I do see the GPU entering RC6
sleep states again, yet the temperature of my system is still crazy.




2) Testing Environment
**********************
(for these new tests here)
- Fujitsu Lifebook U757
- most recent BIOS version (1.25) in the tests below (I had used an
  older one in previous tests from the links)
- 32GB memory, some Sandisk SSD
- Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz
- microcode: sig=0x806e9, pf=0x80, revision=0xca
- Debian sid, all packages (except some totally unrelated stuff) at
  their newest versions in unstable
- all used kernels are stock kernels from Debian
- I do use full dm-crypt encryption of the system, but that shouldn't
  be a cause for the problems, I guess.
- in my /etc/sysfs.conf I have:
  devices/system/cpu/intel_pstate/no_turbo = 1
  basically since I have that laptop... with turbo enabled I always got
  these:
  Apr  5 18:27:07 heisenberg kernel: [ 9884.510420] mce: CPU3: Package temperature above threshold, cpu clock throttled (total events = 2609)
  Apr  5 18:27:07 heisenberg kernel: [ 9884.510422] mce: CPU1: Package temperature above threshold, cpu clock throttled (total events = 2609)
  Apr  5 18:27:07 heisenberg kernel: [ 9884.510465] mce: CPU0: Package temperature above threshold, cpu clock throttled (total events = 2609)
  Apr  5 18:27:07 heisenberg kernel: [ 9884.510467] mce: CPU2: Package temperature above threshold, cpu clock throttled (total events = 2609)
  Apr  5 18:27:07 heisenberg kernel: [ 9884.511427] mce: CPU3: Package temperature/speed normal
  Apr  5 18:27:07 heisenberg kernel: [ 9884.511430] mce: CPU0: Package temperature/speed normal
  Apr  5 18:27:07 heisenberg kernel: [ 9884.511431] mce: CPU1: Package temperature/speed normal
  Apr  5 18:27:07 heisenberg kernel: [ 9884.511436] mce: CPU2: Package temperature/speed normal
  => so for the tests with intel_pstate not being disabled, turbo mode
     was always disabled
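
To double-check which turbo setting is actually in effect before a test
run, something like the following could be used. It is only a sketch: the
function name turbo_state and its optional sysfs-root argument are made up
for illustration (the argument exists purely so the function can be tried
against a fake directory tree). It assumes the no_turbo attribute lives
under /sys at the same relative path as the sysfs.conf entry above, and
that the intel_pstate directory is absent when booting with
intel_pstate=disable.

```shell
# Report whether turbo is disabled via intel_pstate.
# $1 (optional, hypothetical): sysfs root, defaults to /sys.
turbo_state() {
    f="${1:-/sys}/devices/system/cpu/intel_pstate/no_turbo"
    if [ ! -r "$f" ]; then
        # Assumption: no such file means intel_pstate is not the active driver.
        echo "intel_pstate not active (no $f)"
        return 1
    fi
    case "$(cat "$f")" in
        1) echo "turbo disabled" ;;
        0) echo "turbo enabled" ;;
        *) echo "unexpected value in $f" ;;
    esac
}
```

Calling turbo_state without arguments on the running system prints one of
the three messages.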




3) How tests were made
**********************
I've tested with the following combinations:
- kernels 5.2.17 and 5.5.13
- with and without intel_pstate=disable
- with Cinnamon and GNOME Shell in classic mode

For all tests the notebook was placed in the same position and ran with
the same commands for tests, no other major processes (like firefox or
so) were running, just the respective bare desktop environment
(cinnamon or gnome shell classic), cron/anacron were stopped.

I always took temperature measurements with the output from sensors and
CSV output from powertop (which contains all the sleep states and high
energy users).

Temperature and powertop measurements were started at basically the
same time. powertop running for n iterations each 20s.
But since powertop takes a while to start the temperature measurements
are effectively shorter.
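
The pairing of the timeout values and the powertop iteration counts in
the commands below follows from that 20s interval (80s and -i 4, 160s
and -i 8, 240s and -i 12). As a small sketch of the arithmetic, with
powertop_iterations being a made-up helper name:

```shell
# Number of powertop iterations needed to cover a given number of seconds.
# $1: duration in seconds, $2 (optional): seconds per iteration (default 20).
powertop_iterations() {
    secs=$1
    interval=${2:-20}
    # Round up so the powertop run is never shorter than the requested time.
    echo $(( (secs + interval - 1) / interval ))
}
```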


a) deep-idle
For these tests I've waited very long (like 5 minutes or more) for the
system to cool down.
Measurements with, e.g.:
export NAME="5.2.17/ipstate-disable/thermald-no/gnome-shell-classic/deep-idle" ; timeout 80 sh -c "while true; do sleep 1; sensors; done | grep °C > ${NAME}.temp"
and
export NAME="5.2.17/ipstate-disable/thermald-no/cinnamon/deep-idle" ; powertop -i 4  --csv=${NAME}.powertop.csv


b) idle
Basically the same as (a), just not waiting so long to cool down.
Effectively I've always produced some load (with the fan and CPU temp
noticeably going up over 65°C), then stopped and waited for a minute.


c) winmove
After waiting a while for the CPU to cool down, starting the
measurement and then moving a terminal window fast and constantly in
circles over the whole screen.

The measurement is split in 3 phases. During warmup and main I've moved
the window like crazy. As soon as the cooldown phase began, I've
stopped that and did nothing more until the measurement finished.

Measurements with, e.g.:
export NAME="5.2.17/ipstate-disable/thermald-no/cinnamon/winmove" ; powertop -i 3 --csv=${NAME}.warmup.powertop.csv ; beep ; powertop -i 4 --csv=${NAME}.main.powertop.csv ; beep ; powertop -i 4 --csv=${NAME}.cooldown.powertop.csv
export NAME="5.2.17/ipstate-disable/thermald-no/gnome-shell-classic/winmove" ; powertop -i 3 --csv=${NAME}.warmup.powertop.csv ; beep ; powertop -i 4 --csv=${NAME}.main.powertop.csv ; beep ; powertop -i 4 --csv=${NAME}.cooldown.powertop.csv


d) verify, verify-data
These measure the system while running a poorly written shell script of
mine. The script reads a list of regular files from find and tries to
verify the SHA512 sum of each file, potentially stored as an XATTR on it.
The script is poorly written, does quite a number of forks, pipes, and
so on, which seemed good for these tests.

The difference between verify and verify-data is the directory on which
I let find run.

with verify: I did it on /home/, where many files don't have my hash
XATTRs set

with verify-data: I did it on a dir where basically all files have
them set, and as such there's more going into actual SHA512 calculation

Measurements with, e.g.:
export NAME="5.2.17/ipstate-disable/thermald-no/gnome-shell-classic/verifyxattr" ; timeout 160 sh -c "while true; do sleep 1; sensors; done | grep °C > ${NAME}.warmup+main.temp" ; beep
export NAME="5.2.17/ipstate-disable/thermald-no/cinnamon/verifyxattr" ; powertop -i 8 --csv=${NAME}.warmup+main.powertop.csv ; bee


e) mpv-gpu-vaapi
Playing back a:
    Stream #0:0: Video: h264 (High), yuv420p(tv, bt709, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 59.94 fps, 59.94 tbr, 1k tbn, 119.88 tbc (default)
video via:
mpv someVideo.mkv -ao=null
in fullscreen.
.config/mpv/mpv.conf has:
script-opts=osc-deadzonesize=0
hwdec=auto

So the video plays back with:
(+) Video --vid=1 (*) (h264 1920x1080 59.940fps)
 (+) Audio --aid=1 --alang=eng (*) (opus 2ch 48000Hz)
Using hardware decoding (vaapi-copy).
AO: [null] 48000Hz stereo 2ch floatp
VO: [gpu] 1920x1080 nv12
AV: 00:00:01 / 00:36:23 (0%) A-V:  0.000

i.e. gpu and vaapi-copy

I've made two phases:
warmup+main: starting the video, going immediately to fullscreen
cooldown: as soon as the beep for it came, stopping mpv

Measurements with, e.g.:
export NAME="5.2.17/ipstate-disable/thermald-no/gnome-shell-classic/mpv-gpu-vaapi" ; timeout 80 sh -c "while true; do sleep 1; sensors; done | grep °C > ${NAME}.warmup+main.temp" ; beep ; timeout 80 sh -c "while true; do sleep 1; sensors; done | grep °C > ${NAME}.cooldown.temp"
export NAME="5.2.17/ipstate-disable/thermald-no/cinnamon/mpv-gpu-vaapi" ; powertop -i 4 --csv=${NAME}.warmup+main.powertop.csv ; beep ; powertop -i 4 --csv=${NAME}.cooldown.powertop.csv


f) unhide-brute
Running the unhide program in brute mode (using the C version, not the
ruby version).
unhide seems to do a lot of forking, which also seems to make the CPU go
crazy in terms of temperature (at least in some cases).

Measurements with, e.g.:
export NAME="5.5.13/ipstate-active-hwp/thermald-no/cinnamon/unhide-brute" ; timeout 240 sh -c "while true; do sleep 1; sensors; done | grep °C > ${NAME}.warmup+main.temp" ; beep
export NAME="5.5.13/ipstate-active-hwp/thermald-no/cinnamon/unhide-brute" ; powertop -i 12 --csv=${NAME}.warmup+main.powertop.csv ; beep




4) Results
**********
For ease of use I've placed all my original test files and derived ones
in a git repo:
https://github.com/calestyo/cpu-tests


At first, I'm just looking at the bare temperatures of the Package,
which I've extracted to ./pack-temps for each of the tests.
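
For anyone who wants to redo that extraction from the raw .temp files, a
rough sketch follows. It is not necessarily how the files in ./pack-temps
were produced. The function name pack_temp_summary is made up, and it
assumes the .temp files contain the "Package id 0:" lines exactly as
sensors prints them (as in the outputs quoted earlier in this bug).

```shell
# Summarise the "Package id 0" readings in files of `sensors | grep °C` lines.
# Prints the number of samples plus mean and maximum temperature in °C.
# Reads stdin when no file is given.
pack_temp_summary() {
    LC_ALL=C awk '
        /^Package id 0:/ {
            t = $4 + 0        # "+58.0°C" -> 58 (conversion stops at the "°")
            sum += t; n++
            if (n == 1 || t > max) max = t
        }
        END {
            if (n) printf "n=%d mean=%.1f max=%.1f\n", n, sum / n, max
        }
    ' "$@"
}
```

Usage would be e.g. pack_temp_summary "${NAME}.temp" with NAME set as in
the measurement commands above.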


In the following I use ips = intel_pstate.



a) deep-idle and idle
cinnamon_deep-idle.svg
=> 5.2 with ips=off runs ~2 °C cooler than 5.5 with ips=on
                         ~4 °C cooler than 5.5 with ips=off
                         ~3 °C cooler than 5.2 with ips=on
=> so from hottest to coolest:
   5.5/ips=off 5.2/ips=on  5.5/ips=on  5.2/ips=off

cinnamon_idle.svg
=> at least it's quite obvious that 5.2 with ips=off runs coolest
   compared to 5.5 with ips=off it's 4-5 °C...
   compared to 5.5 with ips=on  it's still quite noticeable in the
   beginning (could be testing though)
   it's also noticeably cooler than 5.2 with ips=on
=> so from hottest to coolest:
   5.5/ips=off 5.2/ips=on  5.5/ips=on  5.2/ips=off
   (at least in the end)


gnome-shell-classic_deep-idle.svg
=> I'd say the results are too close to deduce anything valuable
here,... but 5.5 looks better here (and ips=on looks even better than
having it disabled)

gnome-shell-classic_idle.svg
=> similarly here, the numbers are rather close, but 5.5 with ips=on
looks better... could however depend on the testing and the difference
is only ~2°C


=> One also sees, cinnamon runs considerably hotter than
gnome-shell-classic, none of them have any special applets or so running
(just task bar, workspace switcher, clock).



b) winmove.*
cinnamon_winmove.warmup.svg
cinnamon_winmove.main.svg
cinnamon_winmove.cooldown.svg
=> 5.2/ips=off runs considerably cooler than everything else, something
   around 10-15°C?
=> 5.5/ips=on is clearly the worst
=> 5.2/ips=on and 5.5/ips=off are similar, but 5.2 seems still a bit
   better during warmup and main

In this test it seems:
=> ips=off is considerably cooler/better for each kernel
=> 5.2 is considerably better than, or at least equal to, the best of 5.5


gnome-shell-classic_winmove.warmup.svg
gnome-shell-classic_winmove.main.svg
gnome-shell-classic_winmove.cooldown.svg
=> again, g-s-c seems to do much better than cinnamon, but
=> 5.5/ips=off seems worst
   5.5/ips=on  seems slightly best
   5.2/* in the mid-range
=> but again, the numbers are pretty close so this could be just from
   testing



c) mpv-gpu-vaapi
mpv-gpu-vaapi.warmup+main.svg
=> most likely the blue line (5.2/ips=off/cinnamon) is just bogus,
   could redo it if someone needs
difficult to say something,...
=> all end up at 95-100°C, the ones from cinnamon much faster
=> the only thing I'd personally deduce from these is that hardware
   acceleration has some severe problem in my setup. I'd have expected
   that playing back a video in fullscreen should be no problem at all
   for the GPU

mpv-gpu-vaapi.cooldown.svg
=> the only clear thing, I guess, is that 5.5/ips=off is worst with
cinnamon



d) verifyxattr and verifyxattr-data

cinnamon_verifyxattr.warmup+main.svg
cinnamon_verifyxattr-data.warmup+main.svg
=> the ones with ips=on are considerably worse...
=> interestingly, for verifyxattr 5.2/ips=off is better with
   5.5/ips=off being the 2nd
=> but for verifyxattr-data (which ought to be more intensive in actual
   SHA512 computation and less in just forking), it's vice-versa
   and 5.5/ips=off is better than 5.2/ips=off

=> seems like a hint that forking and/or process switches or similar
   things could cause the temperature issues


gnome-shell-classic_verifyxattr.warmup+main.svg
gnome-shell-classic_verifyxattr-data.warmup+main.svg
similarly:
=> the ones with ips=on are considerably worse...
=> interestingly for verifyxattr 5.2 and 5.5 with ips=off
   are more or less the same
=> but for verifyxattr-data 5.5 is noticeably better


=> these differences cannot be directly explained by some GPU issues,
   at least not to my knowledge, since not much graphical output was
   produced



e) unhide-brute
cinnamon_unhide-brute.warmup+main.svg
gnome-shell-classic_unhide-brute.warmup+main.svg
=> both are nearly the same, except that under g-s-c, 5.5/ips=on
   is noticeably worse than under 5.2/ips=on
=> again, ips=off is *much* better than ips=on
=> 5.2 is considerably better than 5.5


=> these differences cannot be directly explained by some GPU issues,
   at least not to my knowledge, since not much graphical output was
   produced




I think overall conclusions are mostly:
Especially when the temperatures vary greatly, then
- 5.2 is much better than 5.5
- ips=off is much better than, or at least similar to, ips=on





5) the powertop files
*********************
So far I've only glanced at them, trying to deduce anything
meaningful.
My hope would have been that some experts here have more experience in
reading them. ;-)


Looking for example at the files from my unhide-brute tests, comparing
5.2/ips=on/cinnamon with 5.5/ips=on/cinnamon, it seemed sometimes that:
Timer;tick_sched_timer
kWork;intel_atomic_commit_work
kWork;free_work

might be offenders... but that's all not so obvious (at least to me).




6) Observations / Other
***********************
a) One thing I've noted sometimes, but not always:
When the system was under "some" load that caused extreme
temperatures... even when I stopped that load, temperatures didn't
always go back.
I mean it's clear that cooling takes a while, but sometimes things went
on for 5 mins or more.


b) Sometimes it might have seemed, that putting the system to suspend
cured the symptoms for a while,... but not always and I haven't tested
this in both kernels and ips=* variations.


c) While writing this email I'm in 5.5/ips=off/g-s-c ...
   (not sure whether ips=on would have been much better)
   The short time idle-temperature is already pretty bad (~65°C)
   (firefox, which seems to be a bad temperature offender, runs
   though).
   Closing FF and the idle temp goes to around ~58-59 (would probably
   go lower if I wait for longer)
   But when I now start moving around the mouse pointer, everything
   stays roughly the same... but I had also situations where just by
   moving the pointer, temperatures went to 80°C or higher,... stopped
   moving the pointer, and they fell again.

   When I now press&hold a key in a gnome-terminal window, say
   constantly writing "n" to it... nothing changes.

   When I do the same in Evolution's mail compose window...
   temperatures go up to 74°C.
   Correspondingly I see
   /usr/lib/x86_64-linux-gnu/webkit2gtk-4.0/WebKitWebProcess
   going through the roof in powertop.

   Now one could argue there's just something fishy in
   Evolution/WebKit, but from what I remember it's by far not that bad
   (if at all) under 5.2.


d) Similarly, if I just move around the pointer via the touchpad:
   1.71 W     13,2 ms/s     436,3        Interrupt      PS/2 Touchpad / Keyboard / Mouse
   becomes the top power consumer with quite a lot it seems?


e) At least until 5.5 (which fixed the "i915 GPU doesn't go to RC6"
   issue), I quite often saw the temperatures go crazy, while top
   didn't show that much CPU utilisation.
   Well it's quite clear if the issue was only in the GPU, but even
   with that fixed it still seemed at least sometimes during my tests,
   that I saw extreme temperatures while top didn't show even close to
   100% CPU utilisation.

f) right now, eventstat shows something like this:
Event/s PID      %CPU  PR  NI Task               Init Function
  81.92   10050   2.1   0   0 WebKitWebProces    hrtimer_wakeup
  51.95    1600   0.1   0   0 Xorg               it_real_fn
  49.95    1600   0.1   0   0 Xorg               hrtimer_wakeup
  46.95   10050   2.1   0   0 WebKitWebProces    tick_sched_timer
  32.97    3184   0.7   0   0 gnome-terminal-    hrtimer_wakeup
  28.97   63710   0.0   0   0 [kworker/0:2-event intel_uncore_fw_release_timer
  11.99    2831   0.1   0   0 gnome-shell        hrtimer_wakeup
   8.99    3417   0.1   0   0 evolution          hrtimer_wakeup
   5.00    1600   0.1   0   0 Xorg               tick_sched_timer
   5.00   71575   0.3   0   0 top                it_real_fn
   4.00    3184   0.7   0   0 gnome-terminal-    tick_sched_timer
   4.00   59885   0.1   0   0 diodon             hrtimer_wakeup
   3.00   71575   0.3   0   0 top                tick_sched_timer
   2.00   72584   0.0   0   0 <unknown>          tick_sched_timer
   2.00    2831   0.1   0   0 gnome-shell        timerfd_tmrproc
   1.00   72585   0.0   0   0 sleep              hrtimer_wakeup
   1.00   66991   0.0   0 -20 [kworker/u9:2-i915 tick_sched_timer
   1.00     750   0.0   0   0 haveged            hrtimer_wakeup

i.e. hrtimer_wakeup and tick_sched_timer appear quite often in the top
list.
Anything that changed there after 5.2?
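
To make such listings easier to compare between kernels, the events per
second could be summed per init function. The following is only a
sketch: events_by_function is a made-up name, and it assumes the
eventstat rows are saved one per line (not mail-wrapped), with the rate
in the first column and the function name in the last.

```shell
# Sum the Event/s column per init function from saved eventstat rows.
# Reads stdin when no file is given; prints "total function", largest first.
events_by_function() {
    LC_ALL=C awk '
        # Only data rows: first field numeric, at least the 7 columns shown.
        $1 ~ /^[0-9]+(\.[0-9]+)?$/ && NF >= 7 {
            total[$NF] += $1
        }
        END {
            for (fn in total) printf "%8.2f %s\n", total[fn], fn
        }
    ' "$@" | LC_ALL=C sort -rn
}
```

Running it on a 5.2 and a 5.5 capture and diffing the results might show
whether the timer wakeups really increased.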

Attaching with strace to that WebKitWebProcess:
$ strace -p 10050
strace: Process 10050 attached
restart_syscall(<... resuming interrupted read ...>) = 0
recvmsg(3, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(3, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}], 3, 0) = 0 (Timeout)
recvmsg(3, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(3, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}], 3, 0) = 0 (Timeout)
recvmsg(3, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(3, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}], 3, 0) = 0 (Timeout)
recvmsg(3, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(3, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}], 3, 0) = 0 (Timeout)
recvmsg(3, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(3, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}], 3, 14) = 0 (Timeout)

in a very fast "loop"...
Interestingly, printing that out, which is really! fast... doesn't seem
to increase the temperature much.

When I press&hold a key in evolution strace output shows a lot of
these:
recvmsg(3, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(3, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}], 3, 0) = 1 ([{fd=4, revents=POLLIN}])
read(4, "\2\0\0\0\0\0\0\0", 16)         = 8
write(4, "\1\0\0\0\0\0\0\0", 8)         = 8
write(15, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1
write(15, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1
write(15, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1
write(15, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1
write(10, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x55c670f5d690, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7f74c0005ab0, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x55c670eff418, FUTEX_WAKE_PRIVATE, 1) = 1
madvise(0x7f74af5ea000, 4096, MADV_NORMAL) = 0
madvise(0x7f74af5ea000, 4096, MADV_DODUMP) = 0
write(15, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1
write(15, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1
write(10, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x55c670f5d690, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7f74c0005ab0, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x55c670eff418, FUTEX_WAKE_PRIVATE, 1) = 1
write(15, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1
write(15, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1
write(10, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x55c670f5d690, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7f74c0005ab0, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x55c670eff418, FUTEX_WAKE_PRIVATE, 1) = 1
write(10, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x55c670f5d690, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7f74c0005ab0, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x55c670eff418, FUTEX_WAKE_PRIVATE, 1) = 1
write(15, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1
recvmsg(3, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(3, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}], 3, 0) = 1 ([{fd=4, revents=POLLIN}])
read(4, "\2\0\0\0\0\0\0\0", 16)         = 8
write(4, "\1\0\0\0\0\0\0\0", 8)         = 8
write(15, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1
memfd_create("WebKitSharedMemory", MFD_CLOEXEC) = 23
ftruncate(23, 18585600)                 = 0
mmap(NULL, 18585600, PROT_READ|PROT_WRITE, MAP_SHARED, 23, 0) = 0x7f74ae346000
fcntl(23, F_DUPFD_CLOEXEC, 0)           = 18
munmap(0x7f74ae346000, 18585600)        = 0
close(23)                               = 0
write(15, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1
write(15, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1
write(15, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1
write(15, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1
write(15, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1
write(15, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1
write(15, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1
write(15, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1
write(10, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x55c670f5d690, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7f74c0005ab0, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x55c670eff418, FUTEX_WAKE_PRIVATE, 1) = 1
madvise(0x7f74af5e9000, 4096, MADV_NORMAL) = 0
madvise(0x7f74af5e9000, 4096, MADV_DODUMP) = 0
write(15, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1
write(15, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1
write(10, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x55c670f5d690, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7f74c0005ab0, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x55c670eff418, FUTEX_WAKE_PRIVATE, 1) = 1
write(15, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1
write(15, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1
write(10, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x55c670f5d690, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x55c670f5d690, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x55c670f5d690, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7f74c0005ab0, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x55c670eff418, FUTEX_WAKE_PRIVATE, 1) = 1
write(10, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x55c670f5d690, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7f74c0005ab0, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x55c670eff418, FUTEX_WAKE_PRIVATE, 1) = 1
write(15, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1
recvmsg(3, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(3, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}], 3, 0) = 1 ([{fd=4, revents=POLLIN}])
read(4, "\2\0\0\0\0\0\0\0", 16)         = 8
write(4, "\1\0\0\0\0\0\0\0", 8)         = 8
write(15, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1
memfd_create("WebKitSharedMemory", MFD_CLOEXEC) = 24
ftruncate(24, 18585600)                 = 0
mmap(NULL, 18585600, PROT_READ|PROT_WRITE, MAP_SHARED, 24, 0) = 0x7f74ae346000
fcntl(24, F_DUPFD_CLOEXEC, 0)           = 18
munmap(0x7f74ae346000, 18585600)        = 0
close(24)                               = 0
write(15, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1
write(15, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1
write(15, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1
write(15, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1
write(15, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1
write(15, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1
write(15, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1
recvmsg(3, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(3, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}], 3, 0) = 1 ([{fd=4, revents=POLLIN}])
read(4, "\2\0\0\0\0\0\0\0", 16)         = 8
write(4, "\1\0\0\0\0\0\0\0", 8)         = 8
write(4, "\1\0\0\0\0\0\0\0", 8)         = 8
write(4, "\1\0\0\0\0\0\0\0", 8)         = 8
write(4, "\1\0\0\0\0\0\0\0", 8)         = 8
write(15, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1
write(10, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x55c670f5d690, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7f74c0005ab0, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x55c670eff418, FUTEX_WAKE_PRIVATE, 1) = 1
write(15, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1
write(15, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1
write(10, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x55c670f5d690, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7f74c0005ab0, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x55c670eff418, FUTEX_WAKE_PRIVATE, 1) = 1
write(15, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1
write(15, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1
write(10, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x55c670f5d690, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7f74c0005ab0, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x55c670eff418, FUTEX_WAKE_PRIVATE, 1) = 1
write(10, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x55c670f5d690, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7f74c0005ab0, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x55c670eff418, FUTEX_WAKE_PRIVATE, 1) = 1
write(15, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1
recvmsg(3, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(3, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}], 3, 0) = 1 ([{fd=4, revents=POLLIN}])
read(4, "\5\0\0\0\0\0\0\0", 16)         = 8
write(4, "\1\0\0\0\0\0\0\0", 8)         = 8
write(15, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1
memfd_create("WebKitSharedMemory", MFD_CLOEXEC) = 24
ftruncate(24, 18585600)                 = 0
mmap(NULL, 18585600, PROT_READ|PROT_WRITE, MAP_SHARED, 24, 0) = 0x7f74ae346000
fcntl(24, F_DUPFD_CLOEXEC, 0)           = 18

Not sure if that's normal...
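
To get a feeling for which calls dominate without reading the whole
trace, the saved strace output could be tallied per syscall name. Again
just a sketch: syscall_counts is a made-up helper, and it assumes the
trace was saved to a file or is piped in. It only looks at lines that
start with "name(", so the wrapped continuation lines in the excerpts
above are ignored. (strace's own -c option produces a similar summary
directly.)

```shell
# Count how often each syscall appears in saved strace output.
# Reads stdin when no file is given; prints "count name", most frequent first.
syscall_counts() {
    LC_ALL=C awk '
        # A syscall line starts with an identifier immediately followed by "(".
        match($0, /^[a-z_][a-z0-9_]*\(/) {
            name = substr($0, 1, RLENGTH - 1)
            count[name]++
        }
        END { for (s in count) printf "%d %s\n", count[s], s }
    ' "$@" | LC_ALL=C sort -k1,1nr -k2,2
}
```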


Anyway, back to my needle-in-the-haystack-search:
diodon, a small clipboard helper, has also been showing up in powertop
for a while; not with that many mW, but still much more than I'd have
expected from a little tool that does basically nothing.
Attaching to it with strace again reveals a lot of polling:

writev(7, [{iov_base="\203(\3\0\336\5\0\0\2\0\0\0", iov_len=12}, {iov_base=NULL, iov_len=0}, {iov_base="", iov_len=0}], 3) = 12
poll([{fd=7, events=POLLIN}], 1, -1)    = 1 ([{fd=7, revents=POLLIN}])
recvmsg(7, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\1(!T\7\0\0\0\336\5\0\0[\201S\2\0\0]\0\0\0!\1\0\0]\0\0\0!\1"..., iov_len=4096}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 60
poll([{fd=3, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}], 3, 0) = 1 ([{fd=3, revents=POLLIN}])
read(3, "\1\0\0\0\0\0\0\0", 16)         = 8
recvmsg(7, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(7, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}], 3, 494) = 0 (Timeout)
recvmsg(7, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(7, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}], 3, 3) = 0 (Timeout)
poll([{fd=7, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=7, revents=POLLOUT}])
writev(7, [{iov_base="\22\0\7\0\2\0\300\4L\2\0\0L\2\0\0\10\0\0\0\1\0\0\0a\1\0\0", iov_len=28}, {iov_base=NULL, iov_len=0}, {iov_base="", iov_len=0}], 3) = 28
poll([{fd=7, events=POLLIN}], 1, -1)    = 1 ([{fd=7, revents=POLLIN}])
recvmsg(7, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\34\0\"T\2\0\300\4L\2\0\0\207f\24\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", iov_len=4096}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 32
recvmsg(7, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=7, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=7, revents=POLLOUT}])
writev(7, [{iov_base="\27\0\2\0\1\0\0\0", iov_len=8}, {iov_base=NULL, iov_len=0}, {iov_base="", iov_len=0}], 3) = 8
poll([{fd=7, events=POLLIN}], 1, -1)    = 1 ([{fd=7, revents=POLLIN}])
recvmsg(7, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\1\0#T\0\0\0\0\2\0@\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", iov_len=4096}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 32
poll([{fd=7, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=7, revents=POLLOUT}])
writev(7, [{iov_base="\30\0\6\0\2\0\300\4\1\0\0\0M\2\0\0\230\1\0\0\207f\24\2", iov_len=24}, {iov_base=NULL, iov_len=0}, {iov_base="", iov_len=0}], 3) = 24
recvmsg(7, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(7, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}], 3, 494) = 1 ([{fd=7, revents=POLLIN}])
recvmsg(7, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\34\0$T\2\0\300\4\230\1\0\0\214f\24\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=4096}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 64
recvmsg(7, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)

All this again with 5.5/ips=off/g-c-s.
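
To put numbers on the polling, a trace like the one above could be
summarised with something like the following. This is only a rough
sketch: it assumes the trace was saved to a file with one syscall per
line (i.e. without the mail's line wrapping), and the function name and
sample lines are just for illustration.

```python
import re
from collections import Counter

# Syscall name at the start of a strace line, e.g. "poll(" -> "poll".
SYSCALL_RE = re.compile(r"^(\w+)\(")

def summarize_strace(lines):
    """Count syscalls, EAGAIN results and poll() timeouts in strace output."""
    calls = Counter()
    eagain = Counter()
    poll_timeouts = 0
    for line in lines:
        match = SYSCALL_RE.match(line)
        if not match:
            continue  # continuation lines, signal/exit messages, ...
        name = match.group(1)
        calls[name] += 1
        if "EAGAIN" in line:
            eagain[name] += 1
        if name == "poll" and line.rstrip().endswith("(Timeout)"):
            poll_timeouts += 1
    return calls, eagain, poll_timeouts

# A few lines from the trace above, re-joined to one syscall per line.
sample = [
    'poll([{fd=7, events=POLLIN}], 1, -1)    = 1 ([{fd=7, revents=POLLIN}])',
    r'read(3, "\1\0\0\0\0\0\0\0", 16)         = 8',
    'recvmsg(7, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)',
    'recvmsg(7, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)',
    'poll([{fd=3, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}], 3, 494) = 0 (Timeout)',
]
calls, eagain, poll_timeouts = summarize_strace(sample)
print(calls.most_common(), dict(eagain), poll_timeouts)
```

For a real run one would pass the lines of the saved trace file instead
of `sample`; `strace -c` gives a similar per-syscall count directly.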


g) Looking at powertop:
  1.56 W      9,8 ms/s     397,8        Interrupt      PS/2 Touchpad / Keyboard / Mouse
  836 mW      1,6 ms/s     213,9        kWork          dbs_work_handler
  804 mW      2,7 ms/s     205,5        Timer          tick_sched_timer
  594 mW     30,2 ms/s     144,6        Process        [PID 2831] /usr/bin/gnome-shell
  585 mW     35,6 ms/s     140,9        Process        [PID 1600] /usr/lib/xorg/Xorg :0 -seat seat0 -auth /var/run/lightdm/root/:0 -nolisten
(this is on an effectively idle system)

tick_sched_timer and dbs_work_handler appear there quite often at the
top.
And the keyboard/mouse/touchpad, too.
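
To compare such powertop snapshots between kernels, the rows could be
parsed into numbers roughly like this. It's a sketch only: it assumes
the columns are power estimate, usage, events/s, category and
description, that each entry is on one line, and that decimals may use
either "," or "." (as in the output above). The names are made up for
illustration.

```python
import re

ROW_RE = re.compile(
    r"^\s*(?P<power>[\d.,]+)\s*(?P<unit>[mu]?W)\s+"
    r"(?P<usage>[\d.,]+)\s*(?P<usage_unit>\S+)\s+"
    r"(?P<events>[\d.,]+)\s+"
    r"(?P<category>\S+)\s+"
    r"(?P<description>.+?)\s*$"
)
DIVISOR = {"W": 1, "mW": 1000, "uW": 1000000}

def to_float(text):
    # Accept a decimal comma; thousands separators are not handled.
    return float(text.replace(",", "."))

def parse_rows(lines):
    """Turn powertop overview rows into dicts; non-matching lines are skipped."""
    rows = []
    for line in lines:
        match = ROW_RE.match(line)
        if not match:
            continue
        rows.append({
            "watts": to_float(match["power"]) / DIVISOR[match["unit"]],
            "events_per_s": to_float(match["events"]),
            "category": match["category"],
            "description": match["description"],
        })
    return rows

SAMPLE = [
    "  1.56 W      9,8 ms/s     397,8        Interrupt      PS/2 Touchpad / Keyboard / Mouse",
    "  836 mW      1,6 ms/s     213,9        kWork          dbs_work_handler",
    "  804 mW      2,7 ms/s     205,5        Timer          tick_sched_timer",
    "  594 mW     30,2 ms/s     144,6        Process        [PID 2831] /usr/bin/gnome-shell",
    "  585 mW     35,6 ms/s     140,9        Process        [PID 1600] /usr/lib/xorg/Xorg :0 -seat seat0",
]
rows = parse_rows(SAMPLE)
for row in sorted(rows, key=lambda r: r["events_per_s"], reverse=True):
    print(row["events_per_s"], row["watts"], row["category"], row["description"])
```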




7) Conclusions
**************
Well not that many except:
- video acceleration seems not to be really working
- cinnamon is generally worse than gnome-shell-classic (which of course
  doesn't have to be a kernel issue, but it still seems to have gotten
  worse with >5.2 ... and it might be a pointer to what's wrong in the
  kernel)
- when the temperature differs more greatly in the measurements, then
  5.2 typically seems much better than 5.5,
  and ips=off, too.


My tests are obviously somewhat limited. None of them simulates the
"normal" usage, like just switching between windows (well, the winmove
test does to some extent), scrolling up and down in a window and so
on... and these use cases also greatly increase the temperature, not
rarely to over 80°C.
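
For capturing temperatures during such "normal" usage, something like
the following could log them in the background. It's only a sketch: it
assumes the sensors are exposed via the kernel's hwmon sysfs files
(temp*_input in millidegrees Celsius under /sys/class/hwmon), and the
function names are made up for illustration.

```python
import time
from pathlib import Path

def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return {"<chip>/<label>": degrees Celsius} for all readable sensors."""
    temps = {}
    for hwmon in sorted(Path(root).glob("hwmon*")):
        name_file = hwmon / "name"
        chip = name_file.read_text().strip() if name_file.exists() else hwmon.name
        for temp_input in sorted(hwmon.glob("temp*_input")):
            label_file = temp_input.with_name(
                temp_input.name.replace("_input", "_label"))
            if label_file.exists():
                label = label_file.read_text().strip()
            else:
                label = temp_input.name
            try:
                millideg = int(temp_input.read_text().strip())
            except (OSError, ValueError):
                continue  # sensor currently gives no value (shown as N/A)
            temps[f"{chip}/{label}"] = millideg / 1000.0
    return temps

def log_temps(samples, interval=1.0, root="/sys/class/hwmon"):
    """Print one timestamped line per sample and return all readings."""
    readings = []
    for i in range(samples):
        temps = read_hwmon_temps(root)
        readings.append((time.time(), temps))
        print(time.strftime("%H:%M:%S"), temps)
        if i + 1 < samples:
            time.sleep(interval)
    return readings
```

E.g. `log_temps(60, interval=1.0)` while Alt-Tab-ing or scrolling would
give one minute of readings that can then be plotted or compared.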


Any ideas what could cause all this? Context switches? Spectre&friends
protections that were added after 5.2? Interrupts? Something related to
polling?

Or any ideas what to do in terms of further tests? Other kernel
options? Or other tools (things like eventstat and so on)?


Any help would be appreciated, cause right now my laptop is more of an
oven and it starts to literally burn my legs when I work with it.



Thanks,
Chris.

#945055#130
Date:
2020-04-17 19:19:34 UTC
From:
To:
I've made some further very extensive tests in the meantime, but these
were mostly for clearly GPU related stuff, i.e. the problem that the
temperatures go through the roof when playing back any video.
These were reported here:
https://gitlab.freedesktop.org/drm/intel/-/issues/953#note_463451

But I haven't made any plots/conclusions for that new set of tests yet
(will keep this ticket updated once I'm done).



As for the general (I mean even when doing non-graphics intensive stuff
like the unhide-brute or sha512 sum verify tests that I've described
above) extreme temperature increase since >5.2 that I see, ... what I
would try next is whether
mitigations=off changes anything (it didn't for video playback).


Also I found out about the nice features of perf record and perf
report.
I've played a bit with that already and the first "results" showed that
when I do anything (like just typing at the keyboard, quickly moving
up/down in e.g. Evolution's mail list, or just Alt-Tab-ing between
windows), the number of events recorded there increases by orders of
magnitude(!!).


I'd be thankful for any guidance on what to actually test to better
nail down the problem I see.

Thanks!

#945055#135
Date:
2020-04-17 19:40:21 UTC
From:
To:

Hey Ben.

Took a while till I got the mail that the bug was unarchived so I
didn't update everything immediately.
found-in-version was based on my guess that the problems I see since
versions > 5.2 were caused by
https://gitlab.freedesktop.org/drm/intel/issues/614

That bug was a regression introduced by a security fix that prevented
the GPU from entering RC6 sleep states.

perf showed me that I was affected by it, so I assumed the fix (which
was introduced in 5.5rc-something) would solve everything.

It didn't, as my further test series, which I've just sent to this
Debian bug as well, showed.


Even with 5.5 I see a tremendous temperature increase.



Unfortunately I'm by far not enough of an expert to really tell where
the problem comes from (I'd say there may even be different problems
involved)... and I'd also need guidance on what to actually test, to
better nail it down.


When I saw that the problem still occurs with 5.5, I made another test
series and reported it first at lkml:
https://lore.kernel.org/lkml/ce8097694ddfab616616f8f81521495d99c74416.camel@scientia.net/T/#u

When I got no response, I updated my older ticket at intel-drm:
https://gitlab.freedesktop.org/drm/intel/-/issues/953


My tests would indicate that there are a number of temperature
problems, in short:

- GPU intensive stuff (like playing videos)
- GPU stuff which shouldn't be intensive at all (e.g. moving around
windows)

but also:
- supposedly non-GPU-intensive stuff (like Alt-Tab-ing between windows,
scrolling up/down in lists in the GUI)
- stuff which doesn't even do graphics at all (see the unhide-brute and
(SHA)-verify tests I've made).



For the GPU-intensive stuff (specifically that I hit 100°C when I play
any videos) there is:
https://gitlab.freedesktop.org/drm/intel/issues/956
(intel-drm folks had asked me to put it in a separate issue)


For the general stuff (e.g. unhide brute or SHA512 verification running
much hotter), there is:
- the post to lkml
- https://bugzilla.kernel.org/show_bug.cgi?id=207245
- and, for the case of intel_pstate being enabled, there's also:
https://bugzilla.kernel.org/show_bug.cgi?id=207247


The different tickets also contain descriptions of symptoms I've seen,
e.g. where temperatures go through the roof even when just moving
windows, Alt-Tab-switching between them, scrolling up/down in a window,
and so on.


See especially the plots in the git repo I've provided, which show how
much higher the temperature is from 5.2 to 5.5 (and for each of them
with intel_pstate being on or off).



Any help on what to test would be highly appreciated.


I did some preliminary tests with perf record while e.g. scrolling
up/down in a GUI window (I used the mail list in Evolution), during
which the temperatures go up to ~80°C ...
These indicated that during that, the number of events recorded by
perf record grows by an order of magnitude.

I haven't had time yet to make more systematic tests.


Thanks,
Chris.

#945055#150
Date:
2020-04-20 02:23:10 UTC
From:
To:
I've upgraded to 5.5.17 (again the stock Debian sid package), and all
future tests with 5.5.x will be with this.

Problems unchanged.




I've also checked 5.5.17 with intel_pstate being enabled but at the
same time using:

iommu=off mitigations=off pci=nomsi


I didn't repeat all tests as extensively as they're in the git repo,
but I've played back a video with mpv and did some casual work
(Alt-Tab-switching between windows, scrolling up/down in some windows,
etc.).

None of these seem to help in terms of my CPU temperature going through
the roof.
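
To rule out that one of the options simply wasn't active during such a
test, the running kernel's command line could be checked with something
like this. It's a sketch only: the helper names and the example command
line are hypothetical, and quoted values containing spaces are not
handled.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {option: value}; bare flags map to None.

    If an option is repeated, the last occurrence wins here.
    """
    options = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        options[key] = value if sep else None
    return options

def not_active(cmdline, wanted):
    """Return the wanted options that are missing or set to another value."""
    active = parse_cmdline(cmdline)
    missing = object()  # sentinel, distinct from a bare flag's None
    return [key for key, value in wanted.items()
            if active.get(key, missing) != value]

# The options used for the test above.
WANTED = {"iommu": "off", "mitigations": "off", "pci": "nomsi"}

# Hypothetical example command line, for illustration only.
example = "root=/dev/sda1 ro quiet iommu=off mitigations=off pci=nomsi"
print(not_active(example, WANTED))
```

On the running system the input would be the contents of /proc/cmdline,
e.g. `not_active(open("/proc/cmdline").read(), WANTED)`.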

#945055#157
Date:
2021-05-02 11:59:08 UTC
From:
To:
Hi

This bug was filed for a very old kernel or the bug is old itself
without resolution.

If you can reproduce it with

- the current version in unstable/testing
- the latest kernel from backports

please reopen the bug, see https://www.debian.org/Bugs/server-control
for details.

Regards,
Salvatore

#945055#165
Date:
2021-05-02 13:40:20 UTC
From:
To:
This bug has actually NOT been fixed.

It's NOT the one with the GPU not going into the RC6 state.

Cheers,
Chris.

#945055#170
Date:
2021-05-02 13:56:49 UTC
From:
To:
Hi,

In this case I'm reopening the bug again. But I suggest pinging
upstream again in this case, because without progress/movement/ideas
upstream we cannot do anything here downstream.

Regards,
Salvatore, trying to do some maintenance on open src:linux bugs
without progress.

#945055#177
Date:
2021-05-02 14:19:47 UTC
From:
To:
I'm afraid that upstream has shown pretty clearly that they have
basically no interest in looking into that issue (guess Intel only
spends money on stuff they still sell).

Just look at the bug reports I've linked to (and the subsequent ones
linked from there).
I put in many hours of testing and made many plots from which it should
be clear that something is quite wrong. But no further reaction.



For me, I found a workaround:

The CPU/GPU should (according to upstream devs) be very well capable of
driving a HiDPI screen and doing FullHD playback there (actually
the developer said it should easily do several such streams).
And the notebook in fact has a HiDPI screen.

Now, starting some time after kernel 5.2, with HiDPI resolutions
enabled, the issues I've described show up:
- even little GPU loads like moving windows in circles lead to
extremely high temperatures of ~70-90°C
- video playback in fullscreen generally hits 100°C
- even non-GPU-related load seems to have caused such issues, as my
tests showed.


At some point, by chance I reduced the screen resolution to "just"
1920x1080 (from the HiDPI default of 3840x2160).

That immediately solved all issues.
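
For a rough sense of scale (pixel count is only a crude proxy for the
work involved and says nothing about the actual cause), the HiDPI mode
has four times as many pixels as 1920x1080:

```python
# Pixel counts of the two modes mentioned above.
hidpi = 3840 * 2160    # HiDPI default
full_hd = 1920 * 1080  # reduced resolution

ratio = hidpi / full_hd
print(hidpi, full_hd, ratio)
```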


Well, actually, e.g. Cinnamon still seems to run at higher CPU
temperatures than e.g. GNOME Classic (and I'm not only talking about
the idle temperature, but also when doing things like moving windows),
but it's all so low now that I can live with it.

Under HiDPI, the difference between Cinnamon and GNOME Classic was
sometimes quite considerable, IIRC.



That being said, I think you may further lower the severity if you
wish, but it perhaps makes sense to keep the bug open for a while (I
guess until the CPU/GPU is so old that nobody is likely to still use
it), so that people have an easier chance of finding it (and the
workaround)?


Cheers,
Chris.