Hi.

With the most recent upgrade, the CPU seems to run at considerably higher temperatures.

Shortly after a fresh boot, but long enough that everything has started up and the system is basically idle, it looks like this:
microcode updated early to revision 0xca, date = 2019-09-26
iwlwifi-virtual-0
Adapter: Virtual device
temp1: N/A
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +58.0°C (high = +100.0°C, crit = +100.0°C)
Core 0: +57.0°C (high = +100.0°C, crit = +100.0°C)
Core 1: +57.0°C (high = +100.0°C, crit = +100.0°C)
CMB1-acpi-0
Adapter: ACPI interface
in0: 16.60 V
curr1: 0.00 A

And with the same kernel but older microcode:
microcode updated early to revision 0xc6, date = 2019-08-14
iwlwifi-virtual-0
Adapter: Virtual device
temp1: N/A
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +55.0°C (high = +100.0°C, crit = +100.0°C)
Core 0: +53.0°C (high = +100.0°C, crit = +100.0°C)
Core 1: +54.0°C (high = +100.0°C, crit = +100.0°C)
CMB1-acpi-0
Adapter: ACPI interface
in0: 16.60 V
curr1: 0.00 A

top shows basically no CPU utilisation (in both cases), but the fan spins up and the CPU is constantly noticeably hot, neither of which was the case previously when the system was idle.

Now another strange thing:
With the NEW microcode, once some load has been put on the system, the temperatures are at a *much* higher level and stay there, for whichever reason, even when the load is gone and the CPU is back to basically no utilisation:
iwlwifi-virtual-0
Adapter: Virtual device
temp1: +34.0°C
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +76.0°C (high = +100.0°C, crit = +100.0°C)
Core 0: +70.0°C (high = +100.0°C, crit = +100.0°C)
Core 1: +69.0°C (high = +100.0°C, crit = +100.0°C)
CMB1-acpi-0
Adapter: ACPI interface
in0: 16.59 V
curr1: 0.00 A

Is this a problem with the microcode or a kinda expected side-effect of the security workarounds?

Cheers,
Chris.
I need the output of "cat /proc/cpuinfo" and also of "grep . /sys/devices/system/cpu/vulnerabilities/*" please. We need to know exactly what your processor is, and what got enabled.

Alternatively, you can report this directly upstream at:
https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files/issues
They will need the same information I requested.

There is nothing expected about it, as far as I know.
All below running with: # dmesg | head -n1 [ 0.000000] microcode: microcode updated early to revision 0xca, date = 2019-09-26 # uname -a Linux heisenberg 5.3.0-2-amd64 #1 SMP Debian 5.3.9-2 (2019-11-12) x86_64 GNU/Linux # cat /proc/cpuinfo processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 142 model name : Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz stepping : 9 microcode : 0xca cpu MHz : 992.983 cache size : 4096 KB physical id : 0 siblings : 4 core id : 0 cpu cores : 2 apicid : 0 initial apicid : 0 fpu : yes fpu_exception : yes cpuid level : 22 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa itlb_multihit bogomips : 5799.77 clflush size : 64 cache_alignment : 64 address sizes : 39 bits physical, 48 bits virtual power management: processor : 1 vendor_id : GenuineIntel cpu family : 6 model : 142 model name : Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz stepping : 9 microcode : 0xca cpu MHz : 999.945 cache size : 4096 KB physical id : 0 siblings : 4 core id : 1 cpu cores : 2 apicid : 2 initial apicid : 2 fpu : yes fpu_exception : yes cpuid level : 22 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm 
pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa itlb_multihit bogomips : 5799.77 clflush size : 64 cache_alignment : 64 address sizes : 39 bits physical, 48 bits virtual power management: processor : 2 vendor_id : GenuineIntel cpu family : 6 model : 142 model name : Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz stepping : 9 microcode : 0xca cpu MHz : 994.502 cache size : 4096 KB physical id : 0 siblings : 4 core id : 0 cpu cores : 2 apicid : 1 initial apicid : 1 fpu : yes fpu_exception : yes cpuid level : 22 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa 
itlb_multihit bogomips : 5799.77 clflush size : 64 cache_alignment : 64 address sizes : 39 bits physical, 48 bits virtual power management: processor : 3 vendor_id : GenuineIntel cpu family : 6 model : 142 model name : Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz stepping : 9 microcode : 0xca cpu MHz : 999.951 cache size : 4096 KB physical id : 0 siblings : 4 core id : 1 cpu cores : 2 apicid : 3 initial apicid : 3 fpu : yes fpu_exception : yes cpuid level : 22 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa itlb_multihit bogomips : 5799.77 clflush size : 64 cache_alignment : 64 address sizes : 39 bits physical, 48 bits virtual power management: # grep . 
/sys/devices/system/cpu/vulnerabilities/*
/sys/devices/system/cpu/vulnerabilities/itlb_multihit:KVM: Mitigation: Split huge pages
/sys/devices/system/cpu/vulnerabilities/l1tf:Mitigation: PTE Inversion; VMX: conditional cache flushes, SMT vulnerable
/sys/devices/system/cpu/vulnerabilities/mds:Mitigation: Clear CPU buffers; SMT vulnerable
/sys/devices/system/cpu/vulnerabilities/meltdown:Mitigation: PTI
/sys/devices/system/cpu/vulnerabilities/spec_store_bypass:Mitigation: Speculative Store Bypass disabled via prctl and seccomp
/sys/devices/system/cpu/vulnerabilities/spectre_v1:Mitigation: usercopy/swapgs barriers and __user pointer sanitization
/sys/devices/system/cpu/vulnerabilities/spectre_v2:Mitigation: Full generic retpoline, IBPB: conditional, IBRS_FW, STIBP: conditional, RSB filling
/sys/devices/system/cpu/vulnerabilities/tsx_async_abort:Mitigation: Clear CPU buffers; SMT vulnerable

In the meantime, I've especially observed the situation that after some higher load, the CPU stays at pretty high temps, even though what produced the load has stopped and top shows basically nothing. Just now it's e.g. at:
$ sensors
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +77.0°C (high = +100.0°C, crit = +100.0°C)
Core 0: +76.0°C (high = +100.0°C, crit = +100.0°C)
Core 1: +73.0°C (high = +100.0°C, crit = +100.0°C)
iwlwifi-virtual-0
Adapter: Virtual device
temp1: +32.0°C
CMB1-acpi-0
Adapter: ACPI interface
in0: 16.59 V
curr1: 0.00 A

Sometimes it starts to go down again, for no apparent reason, but right now it has already been running hot for several minutes.

I'll try the older 5.2 now (with the current microcode) to see whether that changes anything.

Cheers,
Chris.
After some further tests I think it could be actually independent of
the microcode and rather be kernel 5.3's fault.
First perhaps some notes on that notebook:
It's a Fujitsu LIFEBOOK U757 with:
model name : Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz
When I got the system in around 2017, IIRC, I already had considerable
CPU overheating problems seeing often messages like:
Nov 17 14:41:22 heisenberg kernel: [ 36.347425] mce: CPU2: Core temperature above threshold, cpu clock throttled (total events = 1)
Nov 17 14:41:22 heisenberg kernel: [ 36.347426] mce: CPU0: Core temperature above threshold, cpu clock throttled (total events = 1)
Nov 17 14:41:22 heisenberg kernel: [ 36.347427] mce: CPU1: Package temperature above threshold, cpu clock throttled (total events = 1)
Nov 17 14:41:22 heisenberg kernel: [ 36.347427] mce: CPU3: Package temperature above threshold, cpu clock throttled (total events = 1)
Nov 17 14:41:22 heisenberg kernel: [ 36.347429] mce: CPU0: Package temperature above threshold, cpu clock throttled (total events = 1)
Nov 17 14:41:22 heisenberg kernel: [ 36.347531] mce: CPU2: Package temperature above threshold, cpu clock throttled (total events = 1)
Nov 17 14:41:22 heisenberg kernel: [ 36.348423] mce: CPU2: Core temperature/speed normal
Nov 17 14:41:22 heisenberg kernel: [ 36.348424] mce: CPU0: Core temperature/speed normal
Nov 17 14:41:22 heisenberg kernel: [ 36.348461] mce: CPU1: Package temperature/speed normal
Nov 17 14:41:22 heisenberg kernel: [ 36.348461] mce: CPU3: Package temperature/speed normal
Nov 17 14:41:22 heisenberg kernel: [ 36.348498] mce: CPU2: Package temperature/speed normal
Nov 17 14:41:22 heisenberg kernel: [ 36.348568] mce: CPU0: Package temperature/speed normal
Just with many thousands of events and temperatures reaching pretty
exactly 100°C.
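To put a number on how often this happens, the throttle messages can be counted from the kernel log. The following is only a rough sketch; the function name throttle_summary is made up for illustration, and it assumes the log lines have the same wording as the ones quoted above:

```shell
# throttle_summary (hypothetical helper): read kernel log lines on stdin
# and count the "temperature above threshold" messages, split into
# Core and Package events. The "temperature/speed normal" lines are ignored.
throttle_summary() {
    awk '/mce: CPU[0-9]+: (Core|Package) temperature above threshold/ {
        # find the word that says whether this is a Core or Package event
        for (i = 1; i <= NF; i++)
            if ($i == "Core" || $i == "Package") { n[$i]++; break }
    }
    END {
        printf "Core=%d Package=%d\n", n["Core"], n["Package"]
    }'
}
```

It could be fed with e.g. "dmesg | throttle_summary" or with a saved kern.log.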
Fujitsu support had no real solution, claiming it wouldn't happen under
Windows.
Eventually the solution was to disable the turbo:
/sys/devices/system/cpu/intel_pstate/no_turbo = 1
(and all my previous tests as well as the ones from this mail have that
set).
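For anyone wanting to check the same setting on their own machine, here is a small sketch. The helper name turbo_state is hypothetical; the sysfs path is the one quoted above and only exists when the intel_pstate driver is in use:

```shell
# turbo_state (hypothetical helper): print "disabled" if no_turbo is 1,
# "enabled" otherwise, or "unknown" if the file cannot be read.
# The path can be overridden via $1, which is handy for testing.
turbo_state() {
    f=${1:-/sys/devices/system/cpu/intel_pstate/no_turbo}
    [ -r "$f" ] || { echo "unknown"; return 1; }
    if [ "$(cat "$f")" = "1" ]; then
        echo "disabled"
    else
        echo "enabled"
    fi
}

# To disable turbo until the next reboot, as root:
#   echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
```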
Since then I rarely see the temperature warnings from above, and if I do,
it's usually only exactly one event during boot.
I guess the cooling of that slim ultrabook is just not designed well
enough to transport enough heat away if the turbo is on.
One further constant thing is that playback of videos always leads to
considerable CPU utilisation (and higher temperatures), much worse than
on the previous ~2012 lifebook the university bought me.
I've never found a real solution for that,... video decoding
acceleration is enabled and seems to work but still,...
Suspicion was that it might be some issue in cinnamon, cause the
cinnamon process also gets quite high CPU usage when I play back
videos.
But when I reported this ticket, things had gotten much worse (and
it had been like that for a week already before I noticed it and took
action): especially when the system was basically idle, the temps
were much higher and the fan was running much louder/faster.
And as I've described before, there are situations when it gets really
hot (80° and more) and doesn't cool down much again, even when the system
becomes idle again.
Just before I did some more testing with different kernel/microcode
combinations:
**************************************************************
With 5.2.17 0xca-2019-09-26:
Event/s PID %CPU PR NI Task Init Function
59.82 1730 0.2 0 0 Xorg hrtimer_wakeup
55.83 1730 0.2 0 0 Xorg it_real_fn
14.96 3203 0.5 0 0 gnome-terminal- hrtimer_wakeup
13.96 3086 0.0 0 0 diodon hrtimer_wakeup
7.98 3065 0.2 0 0 cinnamon tick_sched_timer
3.99 3203 0.5 0 0 gnome-terminal- tick_sched_timer
2.99 32 0.0 0 0 [kworker/1:1] intel_uncore_fw_release_timer
1.99 509 0.0 0 0 [kworker/u8:4] intel_uncore_fw_release_timer
1.99 1730 0.2 0 0 Xorg intel_uncore_fw_release_timer
1.99 45 0.0 0 0 [kworker/2:1] intel_uncore_fw_release_timer
1.00 847 0.0 0 0 gmain hrtimer_wakeup
1.00 2861 0.0 0 0 gmain hrtimer_wakeup
1.00 236 0.0 0 0 [kworker/3:2] intel_uncore_fw_release_timer
1.00 751 0.0 0 0 haveged hrtimer_wakeup
1.00 2920 0.0 0 0 gmain hrtimer_wakeup
1.00 1730 0.2 0 0 Xorg tick_sched_timer
1.00 3065 0.3 0 0 cinnamon intel_uncore_fw_release_timer
173 Total events, 172.48 events/sec (kernel: 7.98, userspace: 164.51)
Event/s PID %CPU PR NI Task Init Function
68.00 1730 0.5 0 0 Xorg hrtimer_wakeup
65.00 1730 0.5 0 0 Xorg it_real_fn
20.00 3203 0.3 0 0 gnome-terminal- hrtimer_wakeup
14.00 3086 0.0 0 0 diodon hrtimer_wakeup
7.00 3065 0.4 0 0 cinnamon tick_sched_timer
5.00 32 0.0 0 0 [kworker/1:1] intel_uncore_fw_release_timer
4.00 1730 0.5 0 0 Xorg tick_sched_timer
4.00 1730 0.5 0 0 Xorg intel_uncore_fw_release_timer
2.00 751 0.0 0 0 haveged hrtimer_wakeup
1.00 2695 0.0 0 0 ssh-agent hrtimer_wakeup
1.00 3074 0.0 0 0 gdbus tick_sched_timer
1.00 3203 0.3 0 0 gnome-terminal- tick_sched_timer
1.00 145 0.0 0 0 [kworker/0:2] tick_sched_timer
1.00 3104 0.0 0 0 gdbus tick_sched_timer
194 Total events, 194.00 events/sec (kernel: 6.00, userspace: 188.00)
Event/s PID %CPU PR NI Task Init Function
63.00 1730 0.5 0 0 Xorg hrtimer_wakeup
61.00 1730 0.5 0 0 Xorg it_real_fn
14.00 3086 0.1 0 0 diodon hrtimer_wakeup
13.00 3203 0.3 0 0 gnome-terminal- hrtimer_wakeup
12.00 3065 0.5 0 0 cinnamon tick_sched_timer
7.00 32 0.0 0 0 [kworker/1:1] intel_uncore_fw_release_timer
5.00 1730 0.5 0 0 Xorg intel_uncore_fw_release_timer
3.00 1730 0.5 0 0 Xorg tick_sched_timer
2.00 173 0.0 0 0 [kworker/u8:3] intel_uncore_fw_release_timer
2.00 3203 0.3 0 0 gnome-terminal- tick_sched_timer
2.00 236 0.5 0 0 [kworker/3:2] intel_uncore_fw_release_timer
1.00 145 0.0 0 0 [kworker/0:2] intel_uncore_fw_release_timer
1.00 751 0.0 0 0 haveged hrtimer_wakeup
1.00 3086 0.0 0 0 diodon tick_sched_timer
187 Total events, 187.00 events/sec (kernel: 12.00, userspace: 175.00)
Event/s PID %CPU PR NI Task Init Function
60.00 1730 0.2 0 0 Xorg hrtimer_wakeup
55.00 1730 0.2 0 0 Xorg it_real_fn
16.00 3086 0.1 0 0 diodon hrtimer_wakeup
14.00 3203 0.2 0 0 gnome-terminal- hrtimer_wakeup
7.00 3065 0.2 0 0 cinnamon tick_sched_timer
5.00 32 0.0 0 0 [kworker/1:1] intel_uncore_fw_release_timer
2.00 1730 0.2 0 0 Xorg tick_sched_timer
1.00 3203 0.2 0 0 gnome-terminal- tick_sched_timer
1.00 751 0.0 0 0 haveged hrtimer_wakeup
1.00 3065 0.3 0 0 cinnamon hrtimer_wakeup
1.00 1730 0.2 0 0 Xorg intel_uncore_fw_release_timer
1.00 32 0.0 0 0 [kworker/1:1] tick_sched_timer
1.00 3065 0.4 0 0 cinnamon intel_uncore_fw_release_timer
165 Total events, 165.00 events/sec (kernel: 6.00, userspace: 159.00)
Event/s PID %CPU PR NI Task Init Function
65.00 1730 0.5 0 0 Xorg hrtimer_wakeup
61.00 1730 0.5 0 0 Xorg it_real_fn
34.00 3065 1.3 0 0 cinnamon tick_sched_timer
19.00 3086 0.1 0 0 diodon hrtimer_wakeup
14.00 3203 0.2 0 0 gnome-terminal- hrtimer_wakeup
11.00 842 0.0 0 0 NetworkManager tick_sched_timer
8.00 907 0.0 0 0 gdbus tick_sched_timer
8.00 841 0.0 0 0 dbus-daemon tick_sched_timer
5.00 3074 0.7 0 0 gdbus tick_sched_timer
5.00 32 0.0 0 0 [kworker/1:1] intel_uncore_fw_release_timer
4.00 3104 0.2 0 0 gdbus tick_sched_timer
2.00 3203 0.2 0 0 gnome-terminal- tick_sched_timer
2.00 236 0.0 0 0 [kworker/3:2] intel_uncore_fw_release_timer
2.00 841 0.0 0 0 <...> tick_sched_timer
2.00 3086 0.1 0 0 diodon tick_sched_timer
2.00 1730 0.5 0 0 Xorg tick_sched_timer
2.00 3088 0.0 0 0 nm-applet tick_sched_timer
2.00 888 0.0 0 0 wpa_supplicant hrtimer_wakeup
2.00 1730 0.5 0 0 Xorg intel_uncore_fw_release_timer
1.00 751 0.0 0 0 haveged hrtimer_wakeup
251 Total events, 251.00 events/sec (kernel: 7.00, userspace: 244.00)
Event/s PID %CPU PR NI Task Init Function
59.30 1730 0.5 0 0 Xorg hrtimer_wakeup
57.29 1730 0.5 0 0 Xorg it_real_fn
23.12 3203 0.0 0 0 gnome-terminal- hrtimer_wakeup
16.08 3086 0.0 0 0 diodon hrtimer_wakeup
7.04 32 0.0 0 0 [kworker/1:1] intel_uncore_fw_release_timer
7.04 3065 0.3 0 0 cinnamon tick_sched_timer
3.02 1730 0.5 0 0 Xorg tick_sched_timer
3.02 1730 0.5 0 0 Xorg intel_uncore_fw_release_timer
2.01 751 0.0 0 0 haveged hrtimer_wakeup
1.01 3203 0.0 0 0 gnome-terminal- tick_sched_timer
178 Total events, 178.89 events/sec (kernel: 7.04, userspace: 171.86)
^C Event/s PID %CPU PR NI Task Init Function
141.84 1730 0.0 0 0 Xorg hrtimer_wakeup
120.57 1730 0.0 0 0 Xorg it_real_fn
85.11 3203 1.8 0 0 gnome-terminal- hrtimer_wakeup
28.37 3203 1.8 0 0 gnome-terminal- tick_sched_timer
14.18 751 0.0 0 0 haveged hrtimer_wakeup
14.18 32 0.0 0 0 [kworker/1:1] intel_uncore_fw_release_timer
7.09 2718 0.0 0 0 <...> hrtimer_wakeup
7.09 7530 0.0 0 0 [kworker/u8:8] tick_sched_timer
7.09 3065 0.8 0 0 cinnamon hrtimer_wakeup
7.09 1730 0.0 0 0 Xorg intel_uncore_fw_release_timer
61 Total events, 432.62 events/sec (kernel: 21.28, userspace: 411.35)
=> I thought I had seen much higher numbers for hrtimer_wakeup when
running 5.3, but that didn't turn out to be the case.
On an idle system (DE/cinnamon running but no real load):
root@heisenberg:~# sensors
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +54.0°C (high = +100.0°C, crit = +100.0°C)
Core 0: +52.0°C (high = +100.0°C, crit = +100.0°C)
Core 1: +52.0°C (high = +100.0°C, crit = +100.0°C)
iwlwifi-virtual-0
Adapter: Virtual device
temp1: +33.0°C
update-initramfs -u -k all barely hits 70°C
**************************************************************
**************************************************************
With 5.3.9+0xCA-2019-09-26
eventstat didn't show considerably higher numbers for e.g.
hrtimer_wakeup,
which I thought I had seen at first.
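To compare eventstat samples between two kernels less by eye, the per-function rates can be added up. This is just a sketch under the assumption that the samples look like the eventstat output quoted in this mail (Event/s in the first column, PID in the second, the init function last); sum_events_by_function is a made-up name:

```shell
# sum_events_by_function (hypothetical helper): read eventstat output on
# stdin and print the summed Event/s per "Init Function", sorted by name.
# Header lines and the "Total events" lines are skipped because their
# first two columns are not both numeric.
sum_events_by_function() {
    awk '$1 ~ /^[0-9]+(\.[0-9]+)?$/ && $2 ~ /^[0-9]+$/ && NF >= 7 {
        sum[$NF] += $1
    }
    END {
        for (f in sum) printf "%s %.2f\n", f, sum[f]
    }' | sort
}
```

Run once per kernel on a saved eventstat log, the two outputs can then be compared with diff.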
But now idle system (again cinnamon running) seems to run much hotter,
barely getting below 60°:
# sensors
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +66.0°C (high = +100.0°C, crit = +100.0°C)
Core 0: +61.0°C (high = +100.0°C, crit = +100.0°C)
Core 1: +61.0°C (high = +100.0°C, crit = +100.0°C)
iwlwifi-virtual-0
Adapter: Virtual device
temp1: +33.0°C
CMB1-acpi-0
Adapter: ACPI interface
in0: 16.58 V
curr1: 0.00 A
Here I did some apt/aptitude stuff to get an older intel-microcode from
stable or oldstable.
After that (installation of packages and update-initramfs) it took (I'd
say) noticeably longer (not extremely much, but noticeable) till the CPU
cooled down to the (still higher base level of) idle temps from above
(~60-68°).
Running update-initramfs -k all -u lets the temps easily go above 70°,
up to 85°.
Interestingly sometimes it cools down again rather fast (but still only
to the 60° range).
Sometimes it doesn't.
Especially video playback seems to be a killer.
Playing a:
Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p,
720x304 [SAR 152:151 DAR 360:151], 529 kb/s, SAR 181:180 DAR 181:76, 25
fps, 25 tbr, 25k tbn, 50 tbc (default)
in full screen lets the CPU heat up to:
# sensors
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +93.0°C (high = +100.0°C, crit = +100.0°C)
Core 0: +83.0°C (high = +100.0°C, crit = +100.0°C)
Core 1: +81.0°C (high = +100.0°C, crit = +100.0°C)
iwlwifi-virtual-0
Adapter: Virtual device
temp1: +33.0°C
CMB1-acpi-0
Adapter: ACPI interface
in0: 16.57 V
curr1: 0.00 A
and it took quite a while to cool down, even though I've stopped the
video for a minute or so already.
**************************************************************
**************************************************************
With 5.3.9+0xb-2019-04-01:
I.e. current kernel, but even older microcode (the last one where I
thought it was ok was 3.20191112.1 ... but that might be just a
coincidence, since on Nov 14 2019 I installed the kernel 5.3 packages,
and that is roughly around 3.20191113.1 (Fri, 15 Nov 2019), when I
started to slowly notice the CPU temperature issues).
Idle temp seems to be around:
# sensors
iwlwifi-virtual-0
Adapter: Virtual device
temp1: +33.0°C
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +71.0°C (high = +100.0°C, crit = +100.0°C)
Core 0: +66.0°C (high = +100.0°C, crit = +100.0°C)
Core 1: +65.0°C (high = +100.0°C, crit = +100.0°C)
CMB1-acpi-0
Adapter: ACPI interface
in0: 16.57 V
curr1: 0.00 A
so here I concluded that maybe 5.3 is the offender... and not the
microcode!?
Installing the current microcode again and afterwards doing:
update-initramfs -k all -u
leads to temps around that:
$ sensors
iwlwifi-virtual-0
Adapter: Virtual device
temp1: +33.0°C
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +79.0°C (high = +100.0°C, crit = +100.0°C)
Core 0: +74.0°C (high = +100.0°C, crit = +100.0°C)
Core 1: +79.0°C (high = +100.0°C, crit = +100.0°C)
CMB1-acpi-0
Adapter: ACPI interface
in0: 16.57 V
curr1: 0.00 A
staying long at around:
iwlwifi-virtual-0
Adapter: Virtual device
temp1: +33.0°C
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +74.0°C (high = +100.0°C, crit = +100.0°C)
Core 0: +68.0°C (high = +100.0°C, crit = +100.0°C)
Core 1: +66.0°C (high = +100.0°C, crit = +100.0°C)
CMB1-acpi-0
Adapter: ACPI interface
in0: 16.57 V
curr1: 0.00 A
even though the initrd creation is already long over and top shows
nothing else.
**************************************************************
I'm now back at running 5.2.17-1 (2019-10-06) from the linux-image-5.2.0-3-amd64-unsigned
package with the most recent intel-microcode package version.
Temperatures seem good (in the sense: as they were before I noticed the issues).
So my conclusion would be 5.3 is the bad boy...
Shall we reassign it to src:linux?
Cheers,
Chris.
Please send the output of the grep line:
grep . /sys/devices/system/cpu/vulnerabilities/*
with the 5.2 kernel and the new microcode (20191115).

Thank you!
/sys/devices/system/cpu/vulnerabilities/l1tf:Mitigation: PTE Inversion; VMX: conditional cache flushes, SMT vulnerable
/sys/devices/system/cpu/vulnerabilities/mds:Mitigation: Clear CPU buffers; SMT vulnerable
/sys/devices/system/cpu/vulnerabilities/meltdown:Mitigation: PTI
/sys/devices/system/cpu/vulnerabilities/spec_store_bypass:Mitigation: Speculative Store Bypass disabled via prctl and seccomp
/sys/devices/system/cpu/vulnerabilities/spectre_v1:Mitigation: usercopy/swapgs barriers and __user pointer sanitization
/sys/devices/system/cpu/vulnerabilities/spectre_v2:Mitigation: Full generic retpoline, IBPB: conditional, IBRS_FW, STIBP: conditional, RSB filling

during:
root@heisenberg:~# uname -a
Linux heisenberg 5.2.0-3-amd64 #1 SMP Debian 5.2.17-1 (2019-10-06) x86_64 GNU/Linux
root@heisenberg:~# dmesg | head -n1
[ 0.000000] microcode: microcode updated early to revision 0xca, date = 2019-09-26

(which I think is the one from 20191115, right?)

Cheers,
Chris.
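Compared with the 5.3 output earlier in this report, the 5.2 list lacks the itlb_multihit and tsx_async_abort entries. A difference like that can be spotted mechanically if both outputs are saved to files. The following is only a sketch; only_in_first and the file names vuln-5.3.txt / vuln-5.2.txt are made up for illustration:

```shell
# only_in_first A B (hypothetical helper): print the lines of file A that
# do not appear verbatim in file B.
#   -F  treat the patterns as fixed strings, not regular expressions
#   -x  match whole lines only
#   -f  read the patterns (here: the lines of B) from a file
#   -v  invert, i.e. keep the lines of A that matched none of them
only_in_first() {
    grep -vxFf "$2" "$1"
}

# Example (as root on each kernel, then compare):
#   grep . /sys/devices/system/cpu/vulnerabilities/* > vuln-5.3.txt
#   grep . /sys/devices/system/cpu/vulnerabilities/* > vuln-5.2.txt
#   only_in_first vuln-5.3.txt vuln-5.2.txt
```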
Reassigning to the kernel, since the problem is likely there.
btw: 5.3.15-1 seems to be still affected.

While I see mostly "normal" temperatures right after boot (and when everything has settled)... after some point in time, temperatures go up and remain at high levels, e.g.:
Package id 0: +81.0°C (high = +100.0°C, crit = +100.0°C)
Core 0: +75.0°C (high = +100.0°C, crit = +100.0°C)
Core 1: +74.0°C (high = +100.0°C, crit = +100.0°C)
even though top shows an effectively idle system.

Cheers,
Chris.
I should perhaps add that there is some slight indication that this might be graphics related. Cause when I switch from the running X/Cinnamon (with the CPU at average temperatures between 75-70°C, while top doesn't show really much) to the virtual kernel console, temperatures go down drastically (to around 56°C). Cheers, Chris.
Raising severity, since current kernels are completely unusable on at least some hardware (i.e. the one I use here), because the temperature just explodes.

I'd say grave is justified already by potential hardware damage to systems running at 100 °C even at little actual load, not to mention the fact that one effectively cannot upgrade to >5.2 kernels and thus misses any security updates.

I've just checked the 5.4 packages from sid and the described issue still occurs.

It seems to me that it's likely somehow graphics related, because if I do nothing (i.e. the screen also does nothing) the temperatures go down to an acceptable ~60°C, but if I just scroll up and down e.g. in my email client's mail list (which is just the list of subject/from/etc. lines), the temperature goes up to 80°C.

And still, as previously described, even if I stop the actions that caused the temperatures to go up (like no longer scrolling up/down), it takes quite a while till CPU temperatures go down again (eventually they do).

Downgrading to 5.2 and everything's back to normal.

Cheers,
Chris.
Hey.

I've forwarded this to lkml.

My most recent post in that thread[0] contains a pretty elaborate test series comparing kernel 5.2 vs. 5.4 (each with intel_pstate=disable and without), each on Cinnamon and GNOME Classic... under different scenarios (idle system and several videos played back).

My personal conclusion would be that something changed between 5.2 and 5.3 which made temperatures and CPU utilisation considerably worse for Cinnamon,... and not so much, but still noticeably, for GNOME.

Apart from that however, there seems to be additionally something wrong with Cinnamon, as it performs much worse with video playback than GNOME does, even under 5.2.
So I've additionally created a ticket there at Cinnamon[1].

[0] https://lore.kernel.org/lkml/c7b7e81b14380709c3d63033b0e67ee12b737b55.camel@scientia.net/
[1] https://github.com/linuxmint/cinnamon/issues/9085#issuecomment-570654676
Hey.

According to https://gitlab.freedesktop.org/drm/intel/issues/953 the bug was introduced by:
drm/i915/gen8+: Add RC6 CTX corruption WA (d4360736a7c0a6326e3bbdf7d41181f6ed03d9a6)
which, AFAIU, is actually a security fix.

There seem to be some patches, but I'm not sure when they'll be "final" (if ever)... without opening the security issue again.

Also this would just fix my imminent showstopper of Cinnamon running at extreme temperatures when being effectively idle.

As my test series shows, the following issues likely remain:
- Cinnamon performs noticeably worse with video playback than GNOME even under 5.2 (where the offending commit isn't there)
- vaapi performs worse than xv (which I guess it shouldn't)
- intel_pstate makes the system hotter too.

Thanks,
Chris.
Hey.

The offending patch is apparently:
drm/i915/gen8+: Add RC6 CTX corruption WA
which is contained in:
$ git log --oneline --all | grep "drm/i915/gen8+: Add RC6 CTX corruption WA"
5013e6d917ac drm/i915/gen8+: Add RC6 CTX corruption WA
2248a28384fe drm/i915/gen8+: Add RC6 CTX corruption WA
d4360736a7c0 drm/i915/gen8+: Add RC6 CTX corruption WA
255ed51599de drm/i915/gen8+: Add RC6 CTX corruption WA
1a5a64e0bde8 drm/i915/gen8+: Add RC6 CTX corruption WA
00194ecfb32c drm/i915/gen8+: Add RC6 CTX corruption WA
284d38667f7e drm/i915/gen8+: Add RC6 CTX corruption WA
7e34f4e4aad3 drm/i915/gen8+: Add RC6 CTX corruption WA

$ git describe --contains 5013e6d917ac 2248a28384fe d4360736a7c0 255ed51599de 1a5a64e0bde8 00194ecfb32c 284d38667f7e 7e34f4e4aad3
v3.16.77~12
v5.5-rc1~28^2~19
v5.3.11~20
v4.19.84~28
v4.14.154~28
v4.9.201~2
v4.4.201~2
v5.4-rc8~28^2~1

The issue seems to affect *all* the i915/gen8+ GPUs, preventing them from entering sleep states.

It seems a patch is available at https://gitlab.freedesktop.org/drm/intel/issues/614 and according to https://gitlab.freedesktop.org/drm/intel/issues/953#note_385488 these seem to be the final versions(?).

Apparently it's however kinda stuck on its way into a stable release (not really sure why), so could you possibly cherry-pick the patch, since that issue is really a major showstopper for all affected people.

Cheers,
Chris.
Oh, I've just seen that the fixing commit seems to be already part of 5.5-rc1:
$ git log --oneline --all | grep "drm/i915/gt: Schedule request retirement when timeline idles"
311770173fac drm/i915/gt: Schedule request retirement when timeline idles
$ git describe --contains 311770173fac
v5.5-rc1~28^2^2~6
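With the tags from git describe at hand, a given upstream version can be compared against the first release carrying a commit. This is only a sketch: version_ge is a made-up helper, it relies on GNU sort -V, it handles plain x.y.z versions only (no -rc suffixes), and distribution kernels may carry backports, so the result is no more than a hint:

```shell
# version_ge A B (hypothetical helper): succeed if version A >= version B.
# sort -V (GNU coreutils) orders version strings; if B sorts first
# (or both are equal), then A is at least B.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Example: is upstream 5.3.15 at or after v5.3.11, the 5.3.y tag that
# git describe reported for the RC6 workaround?
#   version_ge 5.3.15 5.3.11 && echo "workaround included"
```

Note that on Debian "uname -r" prints the ABI name (e.g. 5.3.0-2-amd64); the upstream version appears in the "uname -a" output instead (e.g. Debian 5.3.9-2).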
Guess that issue can be closed as wontfix.

Despite a patch being available for nearly two months now for an issue that causes complete breakage of affected systems, there seem to be no intentions to pick it up or to release a recent enough kernel which would already contain it.

Upstream's apparently also unwilling to submit this to -stable kernels, so yeah... people affected by it should probably switch hardware or OS.

Cheers,
Chris.
This is now neither "fixed" nor "found" in any 5.5 version. Please update the versions properly. This is also tagged "patch" but without a direct link to the patch(es) that are supposed to fix it. (Linking to the upstream bug report is not specific enough.) Ben.
Hey.

For several months now, I've been chasing a tremendous increase in heat (CPU/GPU) and power usage on my notebook.

It basically started after upgrading from 5.2 to 5.3; at least I hadn't noticed any grave changes between earlier kernels and 5.2. The issue (actually there might be several) persists until at least 5.4 and 5.5.

Things are so bad that when I just type this mail, I can hear the fan go up considerably (and temps go up to 90°C), while it goes back to a still insane 65°C idle when not typing... ok, idle here(!) is with firefox running.
Similar things happen when I scroll through a terminal window, Alt-Tab cycle between windows, and so on.

Testing is a bit difficult for me, as I couldn't come up with an easy way to reproducibly generate real world load (like this typing, or scrolling terminal windows), yet I tried to do an extensive test series, which I think will illustrate some things.

Not really sure what the normal average or idle temps of that CPU are, but I guess reaching an average of >80°C by just typing shouldn't be the case.


1) Previous tests
*****************
When first searching for the reason of the temperature increase, I had opened several tickets:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=945055
https://lore.kernel.org/lkml/d05aba2742ae42783788c954e2a380e7fcb10830.camel@scientia.net/

Finally I found (by coincidence):
https://gitlab.freedesktop.org/drm/intel/issues/614
when reporting:
https://gitlab.freedesktop.org/drm/intel/-/issues/953
myself.

At first I thought #614 would be the bug, but the fix for that went into 5.5-rc, and in fact, with 5.5.x I do see the GPU entering RC6 sleep states again, yet the temperature of my system is still crazy.
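Whether the GPU actually reaches RC6 can be checked roughly by sampling the i915 residency counter twice. The sketch below makes some assumptions: the counter is taken to live at /sys/class/drm/card0/power/rc6_residency_ms (card number and path may differ on other systems and kernels), and rc6_percent / sample_rc6 are made-up helper names:

```shell
# rc6_percent BEFORE_MS AFTER_MS INTERVAL_MS (hypothetical helper):
# print the share of the interval spent in RC6, in percent.
rc6_percent() {
    awk -v b="$1" -v a="$2" -v i="$3" 'BEGIN {
        if (i <= 0) { print "n/a"; exit }
        printf "%.1f\n", (a - b) * 100 / i
    }'
}

# sample_rc6 [SECONDS] [FILE] (hypothetical helper): read the counter,
# wait, read it again and print the residency percentage.
# The default path is an assumption, see the text above.
sample_rc6() {
    secs=${1:-5}
    f=${2:-/sys/class/drm/card0/power/rc6_residency_ms}
    before=$(cat "$f") || return 1
    sleep "$secs"
    after=$(cat "$f") || return 1
    rc6_percent "$before" "$after" "$((secs * 1000))"
}
```

A value close to 0 on an otherwise idle desktop would suggest that the GPU is not entering RC6 at all.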
2) Testing Environment ********************** (for these new tests here) - Fujitsu Lifebook U757 - most recent BIOS version (1.25) in the tests below (I've had used an older one in previous tests from the links) - 32GB memory, some Sandisk SSD - Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz - microcode: sig=0x806e9, pf=0x80, revision=0xca - Debian sid, all packages (unless some totally unrelated stuff at their newest versions in unstable) - all used kernels are stock kernels from Debian - I do use full dm-crypt encryption of the system, but that shouldn't be a cause for the problems, I guess. - in my /etc/sysfs.conf I have: devices/system/cpu/intel_pstate/no_turbo = 1 basically since I have that laptop... with turbo enabled I always got these: Apr 5 18:27:07 heisenberg kernel: [ 9884.510420] mce: CPU3: Package temperature above threshold, cpu clock throttled (total events = 2609) Apr 5 18:27:07 heisenberg kernel: [ 9884.510422] mce: CPU1: Package temperature above threshold, cpu clock throttled (total events = 2609) Apr 5 18:27:07 heisenberg kernel: [ 9884.510465] mce: CPU0: Package temperature above threshold, cpu clock throttled (total events = 2609) Apr 5 18:27:07 heisenberg kernel: [ 9884.510467] mce: CPU2: Package temperature above threshold, cpu clock throttled (total events = 2609) Apr 5 18:27:07 heisenberg kernel: [ 9884.511427] mce: CPU3: Package temperature/speed normal Apr 5 18:27:07 heisenberg kernel: [ 9884.511430] mce: CPU0: Package temperature/speed normal Apr 5 18:27:07 heisenberg kernel: [ 9884.511431] mce: CPU1: Package temperature/speed normal Apr 5 18:27:07 heisenberg kernel: [ 9884.511436] mce: CPU2: Package temperature/speed normal => so for the tests with ipntel_pstate not being disabled, turbo mode was always disabled 3) How tests were made ********************** I've tested with the following combinations: - kernels 5.2.17 and 5.5.13 - with and without intel_pstate=disable - with Cinnamon and GNOME Shell in classic mode For all tests the 
notebook was placed in the same position and the same commands were run for the tests; no other major processes (like firefox or so) were running, just the respective bare desktop environment (cinnamon or gnome shell classic), and cron/anacron were stopped.

I always took temperature measurements with the output from sensors and CSV output from powertop (which contains all the sleep states and high energy users). Temperature and powertop measurements were started at basically the same time, with powertop running for n iterations of 20s each. But since powertop takes a while to start, the temperature measurements are effectively shorter.

a) deep-idle
For these tests I waited very long (like 5 minutes or more) for the system to cool down.
Measurements with, e.g.:
export NAME="5.2.17/ipstate-disable/thermald-no/gnome-shell-classic/deep-idle" ; timeout 80 sh -c "while true; do sleep 1; sensors; done | grep °C > ${NAME}.temp"
and
export NAME="5.2.17/ipstate-disable/thermald-no/cinnamon/deep-idle" ; powertop -i 4 --csv=${NAME}.powertop.csv

b) idle
Basically the same as (a), just not waiting so long to cool down. Effectively I always produced some load (with the fan and CPU temp noticeably going up over 65°C), then stopped and waited for a minute.

c) winmove
After waiting a while for the CPU to cool down, I started the measurement and then moved a terminal window fast and constantly in circles over the whole screen. The measurement is split into 3 phases. During warmup and main I moved the window like crazy. As soon as the cooldown phase began, I stopped that and did nothing more until the measurement finished.
Measurements with, e.g.:
export NAME="5.2.17/ipstate-disable/thermald-no/cinnamon/winmove" ; powertop -i 3 --csv=${NAME}.warmup.powertop.csv ; beep ; powertop -i 4 --csv=${NAME}.main.powertop.csv ; beep ; powertop -i 4 --csv=${NAME}.cooldown.powertop.csv
export NAME="5.2.17/ipstate-disable/thermald-no/gnome-shell-classic/winmove" ; powertop -i 3 --csv=${NAME}.warmup.powertop.csv ; beep ; powertop -i 4 --csv=${NAME}.main.powertop.csv ; beep ; powertop -i 4 --csv=${NAME}.cooldown.powertop.csv

d) verify, verify-data
These measure a run of a poorly written shell script of mine. The script reads a list of regular files from find and tries to verify the SHA512 sum of each file, potentially stored as an XATTR on it. The script is poorly written and does quite a number of forks, pipes, and so on, which seemed good for these tests.
The difference between verify and verify-data is the directory on which I let find run.
with verify: I did it on /home/, where many files don't have my hash XATTRs set
with verify-data: I did it on a dir where basically all files have them set, and as such more time goes into the actual SHA512 calculation
Measurements with, e.g.:
export NAME="5.2.17/ipstate-disable/thermald-no/gnome-shell-classic/verifyxattr" ; timeout 160 sh -c "while true; do sleep 1; sensors; done | grep °C > ${NAME}.warmup+main.temp" ; beep
export NAME="5.2.17/ipstate-disable/thermald-no/cinnamon/verifyxattr" ; powertop -i 8 --csv=${NAME}.warmup+main.powertop.csv ; beep

e) mpv-gpu-vaapi
Playing back a:
Stream #0:0: Video: h264 (High), yuv420p(tv, bt709, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 59.94 fps, 59.94 tbr, 1k tbn, 119.88 tbc (default)
video via:
mpv someVideo.mkv -ao=null
in fullscreen.
.config/mpv/mpv.conf has:
script-opts=osc-deadzonesize=0
hwdec=auto
So the video plays back with:
(+) Video --vid=1 (*) (h264 1920x1080 59.940fps)
(+) Audio --aid=1 --alang=eng (*) (opus 2ch 48000Hz)
Using hardware decoding (vaapi-copy).
AO: [null] 48000Hz stereo 2ch floatp
VO: [gpu] 1920x1080 nv12
AV: 00:00:01 / 00:36:23 (0%) A-V: 0.000
i.e. gpu and vaapi-copy

I've made two phases:
warmup+main: starting the video, going immediately to fullscreen
cooldown: as soon as the beep for it came, stopping mpv
Measurements with, e.g.:
export NAME="5.2.17/ipstate-disable/thermald-no/gnome-shell-classic/mpv-gpu-vaapi" ; timeout 80 sh -c "while true; do sleep 1; sensors; done | grep °C > ${NAME}.warmup+main.temp" ; beep ; timeout 80 sh -c "while true; do sleep 1; sensors; done | grep °C > ${NAME}.cooldown.temp"
export NAME="5.2.17/ipstate-disable/thermald-no/cinnamon/mpv-gpu-vaapi" ; powertop -i 4 --csv=${NAME}.warmup+main.powertop.csv ; beep ; powertop -i 4 --csv=${NAME}.cooldown.powertop.csv

f) unhide-brute
Running the unhide program in brute mode (using the C version, not the ruby version). unhide seems to do a lot of forking, which also seems to make the CPU go crazy in terms of temperature (at least in some cases).
Measurements with, e.g.:
export NAME="5.5.13/ipstate-active-hwp/thermald-no/cinnamon/unhide-brute" ; timeout 240 sh -c "while true; do sleep 1; sensors; done | grep °C > ${NAME}.warmup+main.temp" ; beep
export NAME="5.5.13/ipstate-active-hwp/thermald-no/cinnamon/unhide-brute" ; powertop -i 12 --csv=${NAME}.warmup+main.powertop.csv ; beep

4) Results
**********
For ease of use I've placed all my original test files and derived ones in a git repo:
https://github.com/calestyo/cpu-tests

At first, I'm just looking at the bare temperatures of the Package, which I've extracted to ./pack-temps for each of the tests.
I use ips = intel_pstate
in the following a) deep-idle and idle cinnamon_deep-idle.svg => 5.2 with ips=off runs ~2 °C cooler than 5.5 with ips=on ~4 °C cooler than 5.5 with ips=off ~3 °C cooler than 5.2 with ips=on => so from hottest to coolest: 5.5/ips=off 5.2/ips=on 5.5/ips=on 5.2/ips=off cinnamon_idle.svg => at least it's quite obvious that 5.2 with ips=off runs coolest compared to 5.5 with ips=off it's 4-5 °C... compared to 5.5 with ips=on it's still quite noticeable in the beginning (could be testing though) it's also noticeable cooler than 5.2 with ips=on => so from hottest to coolest: 5.5/ips=off 5.2/ips=on 5.5/ips=on 5.2/ips=off (at least in the end) gnome-shell-classic_deep-idle.svg => I'd say the results are too close, as one could deduce anything valuable here,... but 5.5 looks better here (and ips=on looks even better than having it disabled) gnome-shell-classic_idle.svg => similarly here, the numbers are rather close, but 5.5 with ips=on looks better... could however depend on the testing and the difference is only ~2°C => One also sees, cinnamon runs considerably hotter than gnome-shell- classic, none of them have any special applets or so running (just task bar, workspace switcher, clock). b) winmove.* cinnamon_winmove.warmup.svg cinnamon_winmove.main.svg cinnamon_winmove.cooldown.svg => 5.2/ips=off runs considerably cooler than everything else, something around 10-15°C? 
=> 5.5/ips=on is clearly the worst => 5.2/ips=on and 5.5/ips=off are similar, but 5.2 seems still a bit better during warmup and main In this test it seems: => ips=off is considerably cooler/better for each kernel => 5.2 is considerably better or at least equal than the best of 5.5 gnome-shell-classic_winmove.warmup.svg gnome-shell-classic_winmove.main.svg gnome-shell-classic_winmove.cooldown.svg => again, g-s-c seems to do much better than cinnamon, but => 5.5/ips=off seems worst 5.5/ips=on seems slightly best 5.2/* in the mid-range => but again, the numbers are pretty close so this could be just from testing c) mpv-gpu-vaapi mpv-gpu-vaapi.warmup+main.svg => most likely the blue line (5.2/ips=off/cinnamon) is just bogus, could redo it if someone needs difficult to say something,... => all end up at 95-100°C, the ones from cinnamon much faster => the only thing I'd personally deduce from these is, that hardware acceleration has some severe problem in my setup. I'd have expected that playing back a video in fullscreen should be not problem at all for the GPU mpv-gpu-vaapi.cooldown.svg => the only clear thing, I guess, is that 5.5/ips=off is worst with cinnamon d) verifyxattr and verifyxattr-data cinnamon_verifyxattr.warmup+main.svg cinnamon_verifyxattr-data.warmup+main.svg => the ones with ips=on are considerably worse... => interestingly, for verifyxattr 5.2/ips=off is better with 5.5/ips=off being the 2nd => but for verifyxattr-data (which is ought to be more actual SHA512 computation intensive and less just-forking), it's vice-versa and 5.5/ips=off is better than 5.2/ips=off => seems like a hint that forking and/or process switches or similar things could cause the temperature issues gnome-shell-classic_verifyxattr.warmup+main.svg gnome-shell-classic_verifyxattr-data.warmup+main.svg similarly: => the ones with ips=on are considerably worse... 
=> interestingly for verifyxattr 5.2 and 5.5 with ips=off are more or less the same => but for verifyxattr-data 5.5 is noticeably better => these differences cannot be directly explained by some GPU issues, at least not to my knowledge, since not much graphical output was produced e) unhide-brute cinnamon_unhide-brute.warmup+main.svg gnome-shell-classic_unhide-brute.warmup+main.svg => both are nearly the same, except that under g-s-c, 5.5/ips=on is noticeable worse than under 5.2/ips=on => again, ips=off is *much* better than ips=on => 5.2 is considerably better than 5.5 => these differences cannot be directly explained by some GPU issues, at least not to my knowledge, since not much graphical output was produced I think overall conclusions are mostly: Especially when the temperatures vary greatly, then - 5.2 is much better than 5.5 - ips=off is much better or at least similar to ips=on 5) the powertop files ********************* So far I've only taken a glance on them in trying to deduce anything meaningful. My hope would have been that some experts here have more experience on reading them. ;-) Looking for example at the files from my unhide-brute tests, comparing 5.2/ips=on/cinnamon with 5.5/ips=on/cinnamon, it seemed sometimes that: Timer;tick_sched_timer kWork;intel_atomic_commit_work kWork;free_work might be offenders... but that's all not so obvious (at least to me). 6) Observations / Other *********************** a) One thing I've noted sometimes, but not always: When the system was under "some" load that caused extreme temperatures... even when I stopped that load, temperatures didn't always go back. I mean it's clear that cooling takes a while, but sometimes things went on for 5 mins or more. b) Sometimes it might have seemed, that putting the system to suspend cured the symptoms for a while,... but not always and I haven't tested this in both kernels and ips=* variations. c) While writing this email I'm in 5.5/ips=off/g-s-c ... 
(not sure whether ips=on would have been much better). The short-time idle temperature is already pretty bad (~65°C) (firefox, which seems to be a bad temperature offender, is running though). After closing FF, the idle temp goes to around ~58-59°C (it would probably go lower if I waited longer). When I now start moving the mouse pointer around, everything stays roughly the same... but I've also had situations where, just by moving the pointer, temperatures went to 80°C or higher; when I stopped moving the pointer, they fell again. When I now press&hold a key in a gnome-terminal window, say constantly writing "n" to it... nothing changes. When I do the same in Evolution's mail compose window... temperatures go up to 74°C. Correspondingly I see /usr/lib/x86_64-linux-gnu/webkit2gtk-4.0/WebKitWebProcess going through the roof in powertop. Now one could argue there's just something fishy in Evolution/WebKit, but from what I remember it's by far not that bad (if at all) under 5.2.

d) Similarly, if I just move around the pointer via the touchpad:
1.71 W 13,2 ms/s 436,3 Interrupt PS/2 Touchpad / Keyboard / Mouse
becomes the top power consumer, with quite a lot, it seems?

e) At least until 5.5 (which fixed the issue of the i915 GPU not going to RC6), I quite often saw the temperatures go crazy while top didn't show that much CPU utilisation. Well, that would be understandable if the issue was only in the GPU, but even with that fixed it still seemed, at least sometimes during my tests, that I saw extreme temperatures while top didn't show even close to 100% CPU utilisation.

f) Right now, eventstat shows something like this:
Event/s   PID  %CPU PR  NI Task               Init Function
  81.92 10050   2.1  0   0 WebKitWebProces    hrtimer_wakeup
  51.95  1600   0.1  0   0 Xorg               it_real_fn
  49.95  1600   0.1  0   0 Xorg               hrtimer_wakeup
  46.95 10050   2.1  0   0 WebKitWebProces    tick_sched_timer
  32.97  3184   0.7  0   0 gnome-terminal-    hrtimer_wakeup
  28.97 63710   0.0  0   0 [kworker/0:2-event intel_uncore_fw_release_timer
  11.99  2831   0.1  0   0 gnome-shell        hrtimer_wakeup
   8.99  3417   0.1  0   0 evolution          hrtimer_wakeup
   5.00  1600   0.1  0   0 Xorg               tick_sched_timer
   5.00 71575   0.3  0   0 top                it_real_fn
   4.00  3184   0.7  0   0 gnome-terminal-    tick_sched_timer
   4.00 59885   0.1  0   0 diodon             hrtimer_wakeup
   3.00 71575   0.3  0   0 top                tick_sched_timer
   2.00 72584   0.0  0   0 <unknown>          tick_sched_timer
   2.00  2831   0.1  0   0 gnome-shell        timerfd_tmrproc
   1.00 72585   0.0  0   0 sleep              hrtimer_wakeup
   1.00 66991   0.0  0 -20 [kworker/u9:2-i915 tick_sched_timer
   1.00   750   0.0  0   0 haveged            hrtimer_wakeup
Note that hrtimer_wakeup and tick_sched_timer appear quite often in the top list. Did anything change there after 5.2?

Attaching with strace to that WebKitWebProcess:
$ strace -p 10050
strace: Process 10050 attached
restart_syscall(<... 
resuming interrupted read ...>) = 0 recvmsg(3, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable) recvmsg(3, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}], 3, 0) = 0 (Timeout) recvmsg(3, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable) recvmsg(3, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}], 3, 0) = 0 (Timeout) recvmsg(3, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable) recvmsg(3, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}], 3, 0) = 0 (Timeout) recvmsg(3, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable) recvmsg(3, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}], 3, 0) = 0 (Timeout) recvmsg(3, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable) recvmsg(3, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}], 3, 14) = 0 (Timeout) in a very fast "loop"... Interestingly, printing that out, which is really! fast... doesn't seem to increase the temperature much. 
When I press&hold a key in evolution strace output shows a lot of these: recvmsg(3, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable) recvmsg(3, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}], 3, 0) = 1 ([{fd=4, revents=POLLIN}]) read(4, "\2\0\0\0\0\0\0\0", 16) = 8 write(4, "\1\0\0\0\0\0\0\0", 8) = 8 write(15, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1 write(15, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1 write(15, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1 write(15, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1 write(10, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x55c670f5d690, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x7f74c0005ab0, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x55c670eff418, FUTEX_WAKE_PRIVATE, 1) = 1 madvise(0x7f74af5ea000, 4096, MADV_NORMAL) = 0 madvise(0x7f74af5ea000, 4096, MADV_DODUMP) = 0 write(15, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1 write(15, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1 write(10, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x55c670f5d690, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x7f74c0005ab0, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x55c670eff418, FUTEX_WAKE_PRIVATE, 1) = 1 write(15, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1 write(15, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1 write(10, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x55c670f5d690, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x7f74c0005ab0, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x55c670eff418, FUTEX_WAKE_PRIVATE, 1) = 1 write(10, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x55c670f5d690, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x7f74c0005ab0, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x55c670eff418, FUTEX_WAKE_PRIVATE, 1) = 1 write(15, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x7f74d8000b60, 
FUTEX_WAKE_PRIVATE, 1) = 1 recvmsg(3, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable) recvmsg(3, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}], 3, 0) = 1 ([{fd=4, revents=POLLIN}]) read(4, "\2\0\0\0\0\0\0\0", 16) = 8 write(4, "\1\0\0\0\0\0\0\0", 8) = 8 write(15, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1 memfd_create("WebKitSharedMemory", MFD_CLOEXEC) = 23 ftruncate(23, 18585600) = 0 mmap(NULL, 18585600, PROT_READ|PROT_WRITE, MAP_SHARED, 23, 0) = 0x7f74ae346000 fcntl(23, F_DUPFD_CLOEXEC, 0) = 18 munmap(0x7f74ae346000, 18585600) = 0 close(23) = 0 write(15, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1 write(15, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1 write(15, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1 write(15, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1 write(15, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1 write(15, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1 write(15, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1 write(15, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1 write(10, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x55c670f5d690, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x7f74c0005ab0, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x55c670eff418, FUTEX_WAKE_PRIVATE, 1) = 1 madvise(0x7f74af5e9000, 4096, MADV_NORMAL) = 0 madvise(0x7f74af5e9000, 4096, MADV_DODUMP) = 0 write(15, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1 write(15, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1 write(10, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x55c670f5d690, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x7f74c0005ab0, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x55c670eff418, 
FUTEX_WAKE_PRIVATE, 1) = 1 write(15, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1 write(15, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1 write(10, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x55c670f5d690, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x55c670f5d690, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable) futex(0x55c670f5d690, FUTEX_WAKE_PRIVATE, 1) = 0 futex(0x7f74c0005ab0, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x55c670eff418, FUTEX_WAKE_PRIVATE, 1) = 1 write(10, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x55c670f5d690, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x7f74c0005ab0, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x55c670eff418, FUTEX_WAKE_PRIVATE, 1) = 1 write(15, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1 recvmsg(3, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable) recvmsg(3, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}], 3, 0) = 1 ([{fd=4, revents=POLLIN}]) read(4, "\2\0\0\0\0\0\0\0", 16) = 8 write(4, "\1\0\0\0\0\0\0\0", 8) = 8 write(15, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1 memfd_create("WebKitSharedMemory", MFD_CLOEXEC) = 24 ftruncate(24, 18585600) = 0 mmap(NULL, 18585600, PROT_READ|PROT_WRITE, MAP_SHARED, 24, 0) = 0x7f74ae346000 fcntl(24, F_DUPFD_CLOEXEC, 0) = 18 munmap(0x7f74ae346000, 18585600) = 0 close(24) = 0 write(15, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1 write(15, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1 write(15, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1 write(15, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1 write(15, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1 write(15, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1 write(15, 
"\1\0\0\0\0\0\0\0", 8) = 8 futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1 recvmsg(3, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable) recvmsg(3, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}], 3, 0) = 1 ([{fd=4, revents=POLLIN}]) read(4, "\2\0\0\0\0\0\0\0", 16) = 8 write(4, "\1\0\0\0\0\0\0\0", 8) = 8 write(4, "\1\0\0\0\0\0\0\0", 8) = 8 write(4, "\1\0\0\0\0\0\0\0", 8) = 8 write(4, "\1\0\0\0\0\0\0\0", 8) = 8 write(15, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1 write(10, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x55c670f5d690, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x7f74c0005ab0, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x55c670eff418, FUTEX_WAKE_PRIVATE, 1) = 1 write(15, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1 write(15, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1 write(10, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x55c670f5d690, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x7f74c0005ab0, FUTEX_WAKE_PRIVATE, 1) = 0 futex(0x55c670eff418, FUTEX_WAKE_PRIVATE, 1) = 1 write(15, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1 write(15, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1 write(10, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x55c670f5d690, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x7f74c0005ab0, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x55c670eff418, FUTEX_WAKE_PRIVATE, 1) = 1 write(10, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x55c670f5d690, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x7f74c0005ab0, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x55c670eff418, FUTEX_WAKE_PRIVATE, 1) = 1 write(15, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1 recvmsg(3, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable) recvmsg(3, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, 
events=POLLIN}], 3, 0) = 1 ([{fd=4, revents=POLLIN}]) read(4, "\5\0\0\0\0\0\0\0", 16) = 8 write(4, "\1\0\0\0\0\0\0\0", 8) = 8 write(15, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x7f74d8000b60, FUTEX_WAKE_PRIVATE, 1) = 1 memfd_create("WebKitSharedMemory", MFD_CLOEXEC) = 24 ftruncate(24, 18585600) = 0 mmap(NULL, 18585600, PROT_READ|PROT_WRITE, MAP_SHARED, 24, 0) = 0x7f74ae346000 fcntl(24, F_DUPFD_CLOEXEC, 0) = 18 Not sure if that's normal... Anyway, back to my needle-in-the-haystack-search: diodon, which is a small little clipboard helper also shows up in powertop since a while, not it that much mW, but still much more I'd have expected from a little tool that does basically nothing. Attaching to it with strace again reveals a lot of polling: writev(7, [{iov_base="\203(\3\0\336\5\0\0\2\0\0\0", iov_len=12}, {iov_base=NULL, iov_len=0}, {iov_base="", iov_len=0}], 3) = 12 poll([{fd=7, events=POLLIN}], 1, -1) = 1 ([{fd=7, revents=POLLIN}]) recvmsg(7, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\1(!T\7\0\0\0\336\5\0\0[\201S\2\0\0]\0\0\0!\1\0\0]\ 0\0\0!\1"..., iov_len=4096}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 60 poll([{fd=3, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}], 3, 0) = 1 ([{fd=3, revents=POLLIN}]) read(3, "\1\0\0\0\0\0\0\0", 16) = 8 recvmsg(7, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable) recvmsg(7, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=3, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}], 3, 494) = 0 (Timeout) recvmsg(7, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable) recvmsg(7, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=3, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}], 3, 3) = 0 (Timeout) poll([{fd=7, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=7, revents=POLLOUT}]) writev(7, [{iov_base="\22\0\7\0\2\0\300\4L\2\0\0L\2\0\0\10\0\0\0\1\0\0\0a\1\0\0", iov_len=28}, {iov_base=NULL, iov_len=0}, 
{iov_base="", iov_len=0}], 3) = 28 poll([{fd=7, events=POLLIN}], 1, -1) = 1 ([{fd=7, revents=POLLIN}]) recvmsg(7, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\34\0\"T\2\0\300\4L\2\0\0\207f\24\2\0\0\0\0\0\0\0\0 \0\0\0\0\0\0\0\0", iov_len=4096}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 32 recvmsg(7, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=7, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=7, revents=POLLOUT}]) writev(7, [{iov_base="\27\0\2\0\1\0\0\0", iov_len=8}, {iov_base=NULL, iov_len=0}, {iov_base="", iov_len=0}], 3) = 8 poll([{fd=7, events=POLLIN}], 1, -1) = 1 ([{fd=7, revents=POLLIN}]) recvmsg(7, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\1\0#T\0\0\0\0\2\0@\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0 \0\0\0\0\0", iov_len=4096}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 32 poll([{fd=7, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=7, revents=POLLOUT}]) writev(7, [{iov_base="\30\0\6\0\2\0\300\4\1\0\0\0M\2\0\0\230\1\0\0\207f\24\2", iov_len=24}, {iov_base=NULL, iov_len=0}, {iov_base="", iov_len=0}], 3) = 24 recvmsg(7, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable) recvmsg(7, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=3, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}], 3, 494) = 1 ([{fd=7, revents=POLLIN}]) recvmsg(7, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\34\0$T\2\0\300\4\230\1\0\0\214f\24\2\0\0\0\0\0\0\0 \0\0\0\0\0\0\0\0\0"..., iov_len=4096}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 64 recvmsg(7, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable) All this again with 5.5/ips=off/g-c-s. 

g) Looking at powertop:
1.56 W 9,8 ms/s 397,8 Interrupt PS/2 Touchpad / Keyboard / Mouse
836 mW 1,6 ms/s 213,9 kWork dbs_work_handler
804 mW 2,7 ms/s 205,5 Timer tick_sched_timer
594 mW 30,2 ms/s 144,6 Process [PID 2831] /usr/bin/gnome-shell
585 mW 35,6 ms/s 140,9 Process [PID 1600] /usr/lib/xorg/Xorg :0 -seat seat0 -auth /var/run/lightdm/root/:0 -nolisten
(this is on an effectively idle system)
tick_sched_timer and dbs_work_handler appear there quite often at the top. And the keyboard/mouse/touchpad, too.

7) Conclusions
**************
Well, not that many, except:
- video acceleration seems not to be really working
- cinnamon is generally worse than gnome-shell-classic (which of course doesn't have to be a kernel issue, but it still seems to have gotten worse with >5.2 ... and it might be a pointer to what's wrong in the kernel)
- when the temperatures differ more greatly between measurements, then 5.2 typically seems much better than 5.5, and ips=off much better than ips=on.

My tests are obviously somewhat limited. None of them simulates "normal" usage, like just switching between windows (well, the winmove test does to some extent), scrolling up and down in a window and so on... and these use cases also greatly increase the temperature, not rarely to over 80°C.

Any ideas what could cause all this? Context switches? Spectre&friends protections that were added after 5.2? Interrupts? Something related to polling?
Or any ideas what to do in terms of further tests? Other kernel options? Or other tools (things like eventstat and so on)?

Any help would be appreciated, because right now my laptop is more of an oven and it starts to literally burn my legs when I work with it.

Thanks,
Chris.
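For anyone wanting to redo the comparison: the .temp files written by the sensors loops in section 3 can be boiled down to a few numbers per run. The following is only a sketch; pkg_temp_stats is a made-up helper name, and it assumes the package line looks like the "Package id 0: +58.0°C ..." lines quoted at the start of this report.

```shell
#!/bin/sh
# Summarise the "Package id 0" readings from a log of `sensors` output.
# pkg_temp_stats is a hypothetical helper, not part of lm-sensors.
# Usage: pkg_temp_stats < "${NAME}.temp"
pkg_temp_stats() {
    LC_ALL=C awk '
        /^Package id 0:/ {
            t = $4                      # e.g. "+58.0°C"
            gsub(/[^0-9.]/, "", t)      # keep digits and decimal point (sign is dropped)
            t += 0                      # force numeric comparison
            if (n == 0 || t < min) min = t
            if (n == 0 || t > max) max = t
            sum += t; n++
        }
        END {
            if (n == 0) { print "n=0"; exit }
            printf "n=%d min=%.1f mean=%.1f max=%.1f\n", n, min, sum / n, max
        }'
}
```

Running it over each file in a test directory gives one line per run, which is easier to compare than eyeballing the raw logs. Because the sign is stripped, it would misreport readings below 0°C, which should not matter for this problem.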
I've made some further, very extensive tests in the meantime, but these were mostly for clearly GPU-related stuff, i.e. the problem that the temperatures go through the roof when playing back any video. These were reported here: https://gitlab.freedesktop.org/drm/intel/-/issues/953#note_463451 But I haven't made any plots/conclusions for that new set of tests yet (I will keep this ticket updated once I have). As for the general extreme temperature increase that I see since >5.2 (I mean even when doing non-graphics-intensive stuff like the unhide-brute or SHA512 sum verify tests that I've described above), what I would try next is whether mitigations=off changes anything (it didn't for video playback). Also, I found out about the nice features of perf record and perf report. I've played a bit with them already, and the first "results" showed that when I do anything (like just typing at the keyboard, quickly moving up/down in e.g. Evolution's mail list, or just Alt-Tab-ing between windows), the number of events recorded there increases by orders of magnitude(!!). I'd be thankful for any guidance on what to actually test to better nail down the problem I see. Thanks!
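One cheap thing to compare between kernels, alongside perf, might be the system-wide context-switch and interrupt rates, since /proc/stat exposes both as running counters (the ctxt line, and the first number on the intr line). A minimal sketch, where stat_rates is a made-up helper name and the snapshot file names are just examples:

```shell
#!/bin/sh
# Compute context switches/s and interrupts/s from two /proc/stat snapshots.
# stat_rates is a hypothetical helper. Usage: stat_rates BEFORE AFTER SECONDS
stat_rates() {
    awk -v secs="$3" '
        FNR == NR && $1 == "ctxt" { c0 = $2 }   # first file: counters at start
        FNR == NR && $1 == "intr" { i0 = $2 }
        FNR != NR && $1 == "ctxt" { c1 = $2 }   # second file: counters at end
        FNR != NR && $1 == "intr" { i1 = $2 }
        END { printf "ctxt/s=%d intr/s=%d\n", (c1 - c0) / secs, (i1 - i0) / secs }
    ' "$1" "$2"
}

# Example (not run here):
#   cat /proc/stat > before; sleep 10; cat /proc/stat > after
#   stat_rates before after 10
```

Taking one pair of snapshots while idle and one while, say, holding a key in Evolution, on both 5.2 and 5.5, would at least show whether these rates differ between the kernels at all.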
Hey Ben. It took a while till I got the mail that the bug was unarchived, so I didn't update everything immediately. found-in-version was based on my guess that the problems I've seen since versions > 5.2 were caused by https://gitlab.freedesktop.org/drm/intel/issues/614 That bug was a regression introduced by a security fix that prevented the GPU from entering RC6 sleep states. perf showed me that I was affected by it, so I assumed the fix (which was introduced in 5.5rc-something) would solve everything. It didn't, as my further test series, which I've just sent to this Debian bug as well, showed. Even with 5.5 I see a tremendous temperature increase. Unfortunately I'm by far not enough of an expert to really tell where the problem comes from (I'd say there may even be different problems involved)... and I'd also need guidance on what to actually test, to better nail it down. When I saw the problem still occurs with 5.5, I made another test series and reported it first at lkml: https://lore.kernel.org/lkml/ce8097694ddfab616616f8f81521495d99c74416.camel@scientia.net/T/#u When I got no response, I updated my older ticket at intel-drm: https://gitlab.freedesktop.org/drm/intel/-/issues/953 My tests would indicate that there are a number of temperature problems, in short: - GPU-intensive stuff (like playing videos) - GPU stuff which shouldn't be intensive at all (e.g. moving around windows) but also: - supposedly non-GPU-intensive stuff (like Alt-Tab-ing between windows, scrolling up/down in lists in the GUI) - stuff which doesn't even do graphics at all (see the unhide-brute and (SHA)-verify tests I've made). For the GPU-intensive stuff (specifically that I hit 100°C when I play any videos) there is: https://gitlab.freedesktop.org/drm/intel/issues/956 (intel-drm folks had asked me to put it in a separate issue) For the general stuff (e.g. unhide-brute or SHA512 verification running much hotter), there is: - the post to lkml - https://bugzilla.kernel.org/show_bug.cgi?id=207245 - and, since things are worse with intel_pstate enabled, there's also: https://bugzilla.kernel.org/show_bug.cgi?id=207247 The different tickets also contain descriptions of symptoms I've seen, e.g. where temperatures go through the roof even when just moving windows, Alt-Tab-switching between them, scrolling up/down in a window, and so on. See especially the plots in the git repo I've provided, which show how much higher the temperature is with 5.5 than with 5.2 (and for each of them with intel_pstate being on or off). Any help on what to test would be highly appreciated. I did some preliminary tests with perf record while e.g. scrolling up/down in a GUI window (I used the mail list in Evolution), during which the temperatures go up to ~80°C. These would indicate that, during that, the number of events recorded by perf record grows by an order of magnitude. I haven't had time yet to make more systematic tests. Thanks, Chris.
I've upgraded to 5.5.17 (again the stock Debian sid package), and all future tests with 5.5.x will be with this version. The problems are unchanged.

I've also checked 5.5.17 with intel_pstate enabled while at the same time booting with:
  iommu=off mitigations=off pci=nomsi

I didn't repeat all tests as extensively as those in the git repo, but I played back a video with mpv and did some casual work (Alt-Tab-switching between windows, scrolling up/down in some windows, etc.).

None of these parameters seems to help with my CPU temperature going through the roof.
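
When comparing boots like this, it can help to confirm that the parameters actually reached the running kernel. Below is a minimal sketch in Python, assuming a Linux system where /proc/cmdline is readable. The helper names (parse_cmdline, missing_params, check_running_kernel) are made up for illustration and are not part of any tool mentioned in this report. It splits on whitespace only, so quoted values containing spaces are not handled.

```python
from pathlib import Path
from typing import Dict, List, Optional

# The three parameters used in the test above.
EXPECTED = {"iommu": "off", "mitigations": "off", "pci": "nomsi"}


def parse_cmdline(cmdline: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into {name: value}.

    Bare flags such as "quiet" map to None. If a name occurs more than
    once, the last occurrence wins.
    """
    params: Dict[str, Optional[str]] = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def missing_params(cmdline: str, expected: Dict[str, str]) -> List[str]:
    """Return the expected name=value pairs that are absent or differ."""
    params = parse_cmdline(cmdline)
    return [f"{k}={v}" for k, v in expected.items() if params.get(k) != v]


def check_running_kernel(expected: Dict[str, str] = EXPECTED) -> List[str]:
    """Check the command line of the currently running kernel (Linux only)."""
    running = Path("/proc/cmdline").read_text()
    return missing_params(running, expected)
```

Calling check_running_kernel() after a reboot returns an empty list if all three parameters are in effect, and otherwise lists the ones that are missing or set to a different value.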
Hi,

This bug was filed for a very old kernel, or the bug itself is old and without resolution.

If you can reproduce it with
- the current version in unstable/testing
- the latest kernel from backports
please reopen the bug; see https://www.debian.org/Bugs/server-control for details.

Regards,
Salvatore
This bug has actually NOT been fixed. It's NOT the one with the GPU not going into the RC6 state.

Cheers,
Chris.
Hi,

In that case I'm reopening the bug again. But I suggest pinging upstream again, because without progress/movement/ideas upstream we cannot do anything here downstream.

Regards,
Salvatore, trying to do some maintenance on open src:linux bugs without progress.
I'm afraid that upstream has shown pretty clearly that they have basically no interest in looking into that issue (I guess Intel only spends money on stuff they still sell). Just look at the bug reports I've linked to (and the subsequent ones linked from there). I put in many hours of testing and made many plots from which it should be clear that something is quite wrong. But there was no further reaction.

For me, I found a workaround: The CPU/GPU would (according to upstream devs) be very well capable of driving a HiDPI screen and doing FullHD playback there (actually the developer said it should easily handle several such streams). And the notebook in fact has a HiDPI screen.

Now, starting some time after kernel 5.2, with HiDPI resolutions enabled, the issues I've described show up:
- even small GPU loads, like moving windows in circles, lead to extremely high temperatures of ~70-90°C
- video playback in fullscreen generally reaches 100°C
- even non-GPU-related load seemed to cause such issues, as my tests showed

At some point, by chance, I reduced the screen resolution to "just" 1920x1080 (from the HiDPI default of 3840x2160). That immediately solved all issues.

Well, actually, e.g. Cinnamon still seems to run at higher CPU temperatures than e.g. GNOME Classic (and I'm not only talking about the idle temperature, but also when doing things like moving windows), but it's all so low now that I can live with it. Under HiDPI, the difference between Cinnamon and GNOME Classic was sometimes quite considerable, IIRC.

That being said, I think you may further lower the severity if you wish, but it perhaps makes sense to keep the bug open for a while (I guess until the CPU/GPU is so old that nobody is likely to still use it), so that people have an easier chance of finding it (and the workaround)?

Cheers,
Chris.