#616301 xserver-xorg-video-radeon:screen goes black, system hangs after 2sec:[youtube(FF/Opera)-reset req.]

Package:
xserver-xorg-video-radeon
Source:
xserver-xorg-video-ati
Description:
X.Org X server -- AMD/ATI Radeon display driver
Submitter:
slyher
Date:
2011-10-06 07:24:03 UTC
Severity:
important
#616301#5
Date:
2011-03-03 10:41:26 UTC
From:
To:
Hi there.
After updating the packages on 2 March 2011, playing YouTube videos became
impossible. The screen goes black after a movie plays for 2-4 seconds; sound may
continue for a few seconds and then also stops.
I tried to log in to a console, but after a few tries the screen started blinking white
and the console 'froze'. The white color stayed, so I powered down the machine the
hard way.
I tried different browsers (Firefox and Opera) with the same effect.
I've attached information.
If more information is needed, don't hesitate to write.
Sincerely, Slyher.

#616301#10
Date:
2011-03-03 11:24:45 UTC
From:
To:
What packages were upgraded from/to which versions?
#616301#15
Date:
2011-03-03 11:54:16 UTC
From:
To:
On 03.03.2011 12:24, Michel Dänzer wrote:
Hi there.
Here is what I can tell from /var/log/apt/history.log:

Start-Date: 2011-03-01  17:27:00
Commandline: apt-get dist-upgrade
Upgrade: foomatic-db-engine:i386 (4.0.4-3, 4.0.7-1), dnsmasq-base:i386
(2.55-2+b1, 2.57-1), python-markupsafe:i386 (0.9.2-3, 0.12-1),
musescore-common:i386 (0.9.6+dfsg-1, 1.0+dfsg-2), musescore:i386
(0.9.6+dfsg-1, 1.0+dfsg-2), gpgv:i386 (1.4.10-4, 1.4.11-3),
foomatic-filters:i386 (4.0.5-6, 4.0.7-1), foomatic-filters-ppds:i386
(4.0.4-3, 4.0.7-1), musescore-soundfont-gm:i386 (0.9.6+dfsg-1,
1.0+dfsg-2), gnupg:i386 (1.4.10-4, 1.4.11-3)
End-Date: 2011-03-01  17:28:29

Start-Date: 2011-03-02  09:03:39
Commandline: apt-get dist-upgrade
Install: libclass-load-perl:i386 (0.06-1, automatic)
Upgrade: desktop-base:i386 (6.0.5, 6.0.6), libcupscgi1:i386 (1.4.5-3,
1.4.6-1), cups-client:i386 (1.4.5-3, 1.4.6-1), libcupsmime1:i386
(1.4.5-3, 1.4.6-1), libnspr4-0d:i386 (4.8.6-1, 4.8.7-2), cups-ppdc:i386
(1.4.5-3, 1.4.6-1), libcupsppdc1:i386 (1.4.5-3, 1.4.6-1),
cups-common:i386 (1.4.5-3, 1.4.6-1), libcups2:i386 (1.4.5-3, 1.4.6-1),
libgsf-1-common:i386 (1.14.19-2, 1.14.19-3), cups:i386 (1.4.5-3,
1.4.6-1), libcupsdriver1:i386 (1.4.5-3, 1.4.6-1), libgdata7:i386
(0.6.4-2, 0.6.4-3), cups-bsd:i386 (1.4.5-3, 1.4.6-1), libcupsimage2:i386
(1.4.5-3, 1.4.6-1), libdatetime-timezone-perl:i386 (1.23-1+2010n,
1.28-1+2011b), libgsf-1-114:i386 (1.14.19-2, 1.14.19-3),
libgdata-common:i386 (0.6.4-2, 0.6.4-3)
End-Date: 2011-03-02  09:05:00
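
For anyone comparing upgrade logs in this report, here is a minimal sketch of how the "Upgrade:" stanzas could be checked for graphics-stack packages. The helper names and the list of name fragments are assumptions made for illustration; they are not part of apt. The line-joining step is only needed because mail clients wrapped the log lines.

```python
import re

# Hypothetical helpers, not part of apt.
# One entry looks like: name:arch (old-version, new-version)
ENTRY = re.compile(r"([^\s:,()]+):(\w+) \(([^,()]+), ([^,()]+)\)")

# Assumed name fragments for the kernel/X/Mesa/Flash stack; adjust as needed.
GRAPHICS_HINTS = ("xserver-xorg", "libdrm", "mesa", "linux-image", "flashplugin")


def parse_upgrades(history_text):
    """Return (package, arch, old, new) tuples from all Upgrade: fields."""
    # Re-join lines wrapped by the mailer: a newline that is not followed
    # by a new "Field-Name:" is treated as a continuation.
    text = re.sub(r"\n(?![A-Z][A-Za-z-]*:)", " ", history_text)
    upgrades = []
    for line in text.splitlines():
        if line.startswith("Upgrade:"):
            upgrades.extend(ENTRY.findall(line))
    return upgrades


def graphics_related(upgrades):
    """Keep only upgrades whose package name contains one of the hints."""
    return [u for u in upgrades if any(h in u[0] for h in GRAPHICS_HINTS)]
```

Run on the two stanzas quoted above, this filter should come back empty, which matches the impression that nothing in these upgrades touches the kernel/X stack.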

#616301#20
Date:
2011-03-03 16:49:00 UTC
From:
To:
Hi.

I'm experiencing the same problem, i.e. whenever I open a youtube.com website with
video content my system hangs (black screen and monitor reports/gives warning of wrong
input frequencies).

Until at least two days ago I could play youtube videos without issues using iceweasel
with Adobe Flash Player (flashplugin-nonfree); but not today.
I also run 'aptitude update && aptitude upgrade' regularly but I doubt the following
packages have something to do with this.

/var/log/apt/history.log:

Start-Date: 2011-03-01  03:11:17
Upgrade: libsmbclient:amd64 (3.5.6~dfsg-3, 3.5.6~dfsg-3squeeze2), smbclient:amd64 (3.5.6~dfsg-3, 3.5.6~dfsg-3squeeze2), libwbclient0:amd64 (3.5.6~dfsg-3, 3.5.6~dfsg-3squeeze2), libavahi-glib1:amd64 (0.6.27-2, 0.6.27-2+squeeze1), libavahi-common-data:amd64 (0.6.27-2, 0.6.27-2+squeeze1), libavahi-core7:amd64 (0.6.27-2, 0.6.27-2+squeeze1), samba-common:amd64 (3.5.6~dfsg-3, 3.5.6~dfsg-3squeeze2), avahi-dnsconfd:amd64 (0.6.27-2, 0.6.27-2+squeeze1), avahi-daemon:amd64 (0.6.27-2, 0.6.27-2+squeeze1), libavahi-client3:amd64 (0.6.27-2, 0.6.27-2+squeeze1), samba-common-bin:amd64 (3.5.6~dfsg-3, 3.5.6~dfsg-3squeeze2), libavahi-common3:amd64 (0.6.27-2, 0.6.27-2+squeeze1)
End-Date: 2011-03-01  03:11:24

Start-Date: 2011-03-03  02:39:51
Upgrade: libpango1.0-common:amd64 (1.28.3-1+squeeze1, 1.28.3-1+squeeze2), libpango1.0-0:amd64 (1.28.3-1+squeeze1, 1.28.3-1+squeeze2)
End-Date: 2011-03-03  02:39:57


More interesting are the syslog entries when hitting the bug (abbreviated).

/var/log/syslog:

Mar  3 10:17:01 banshee /USR/SBIN/CRON[9664]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Mar  3 10:18:29 banshee kernel: [27296.676053] [drm:radeon_fence_wait] *ERROR* fence(ffff880061522f40:0x0031D254) 508ms timeout going to reset GPU
Mar  3 10:18:29 banshee kernel: [27296.676064] radeon 0000:01:05.0: GPU softreset
Mar  3 10:18:29 banshee kernel: [27296.676070] radeon 0000:01:05.0:   R_008010_GRBM_STATUS=0xE77324AC
Mar  3 10:18:29 banshee kernel: [27296.676076] radeon 0000:01:05.0:   R_008014_GRBM_STATUS2=0x00110103
Mar  3 10:18:29 banshee kernel: [27296.676081] radeon 0000:01:05.0:   R_000E50_SRBM_STATUS=0x20001040
Mar  3 10:18:29 banshee kernel: [27296.676090] radeon 0000:01:05.0:   R_008020_GRBM_SOFT_RESET=0x00007FEE
Mar  3 10:18:29 banshee kernel: [27296.676146] radeon 0000:01:05.0: R_008020_GRBM_SOFT_RESET=0x00000001
Mar  3 10:18:29 banshee kernel: [27296.676204] radeon 0000:01:05.0:   R_000E60_SRBM_SOFT_RESET=0x00000402
Mar  3 10:18:29 banshee kernel: [27296.676360] radeon 0000:01:05.0:   R_008010_GRBM_STATUS=0x00003030
Mar  3 10:18:29 banshee kernel: [27296.676365] radeon 0000:01:05.0:   R_008014_GRBM_STATUS2=0x00000003
Mar  3 10:18:29 banshee kernel: [27296.676370] radeon 0000:01:05.0:   R_000E50_SRBM_STATUS=0x20000040
<-- message loop (trying to softreset endlessly) -->
Mar  3 10:20:00 banshee kernel: [27387.493788] [drm:radeon_fence_wait] *ERROR* fence(ffff880062eb7540:0x0031D382) 516ms timeout
Mar  3 10:20:00 banshee kernel: [27387.493797] [drm:radeon_fence_wait] *ERROR* last signaled fence(0x0031D382)
Mar  3 10:20:03 banshee kernel: [27390.008063] [drm:radeon_fence_wait] *ERROR* fence(ffff880062eb7dc0:0x0031D384) 504ms timeout going to reset GPU
Mar  3 10:20:03 banshee kernel: [27390.008076] radeon 0000:01:05.0: GPU softreset
Mar  3 10:20:03 banshee kernel: [27390.008082] radeon 0000:01:05.0:   R_008010_GRBM_STATUS=0xA0003030
Mar  3 10:20:03 banshee kernel: [27390.008089] radeon 0000:01:05.0:   R_008014_GRBM_STATUS2=0x00000003
Mar  3 10:20:03 banshee kernel: [27390.008096] radeon 0000:01:05.0:   R_000E50_SRBM_STATUS=0x20000040
Mar  3 10:20:03 banshee kernel: [27390.008106] radeon 0000:01:05.0:   R_008020_GRBM_SOFT_RESET=0x00007FEE
Mar  3 10:20:03 banshee kernel: [27390.008164] radeon 0000:01:05.0: R_008020_GRBM_SOFT_RESET=0x00000001
Mar  3 10:20:03 banshee kernel: [27390.008224] radeon 0000:01:05.0:   R_000E60_SRBM_SOFT_RESET=0x00000402
Mar  3 10:20:03 banshee kernel: [27390.008382] radeon 0000:01:05.0:   R_008010_GRBM_STATUS=0x00003030
Mar  3 10:20:03 banshee kernel: [27390.008392] radeon 0000:01:05.0:   R_008014_GRBM_STATUS2=0x00000003
Mar  3 10:20:03 banshee kernel: [27390.008401] radeon 0000:01:05.0:   R_000E50_SRBM_STATUS=0x20000040
Mar  3 10:20:03 banshee kernel: [27390.009795] [drm:radeon_fence_wait] *ERROR* fence(ffff880062eb7dc0:0x0031D384) 512ms timeout
Mar  3 10:20:03 banshee kernel: [27390.009804] [drm:radeon_fence_wait] *ERROR* last signaled fence(0x0031D384)
Mar  3 10:20:04 banshee kernel: [27391.517962] SysRq : Emergency Remount R/O
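
To quantify the reset loop in a syslog like the one above, a small script along these lines could count fence timeouts and soft resets. This is a sketch only: the regular expressions are written against the message wording shown in this report and may not match other kernel versions, and `summarize` is a made-up helper name.

```python
import re

# Patterns derived from the log lines quoted in this report (assumption:
# other kernel versions may word these messages differently).
FENCE_TIMEOUT = re.compile(
    r"\[drm:radeon_fence_wait\] \*ERROR\* fence\(\w+:0x([0-9A-Fa-f]+)\) (\d+)ms timeout"
)
SOFTRESET = re.compile(r"radeon \S+ GPU softreset")


def summarize(lines):
    """Count GPU soft resets and collect (fence_seq, timeout_ms) pairs."""
    timeouts = []
    resets = 0
    for line in lines:
        m = FENCE_TIMEOUT.search(line)
        if m:
            timeouts.append((int(m.group(1), 16), int(m.group(2))))
        elif SOFTRESET.search(line):
            resets += 1
    return {"resets": resets, "timeouts": timeouts}
```

Feeding it the lines of /var/log/syslog (for example via `summarize(open("/var/log/syslog"))`) gives a quick view of how often the driver tried to reset and whether the fence sequence number advanced between attempts.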


Maybe Google changed the video format, and the new format triggers a bug in the graphics driver?

Kind regards,


Matthias

#616301#25
Date:
2011-03-03 18:50:25 UTC
From:
To:
I forgot to mention that other websites serving video via Adobe Flash do not trigger the bug - at least those I tried (http://www.tagesschau.de/, http://www.vimeo.com/, http://www.myvideo.de/).

Kind regards,


Matthias

#616301#30
Date:
2011-03-04 12:51:59 UTC
From:
To:
Hi,

Same thing here, opening any youtube video since March 3 crashes the whole
system using iceweasel 3.5.16-4. The crash has been consistent across two
identical machines and started occurring yesterday. The fact that we didn't
upgrade Flash, combined with the fact that yesterday I noticed some issues* on
youtube on another machine (without radeon), leads me to the conclusion that
the bug is triggered by some change youtube did to their player.

* red and green tinting all over the video and a misaligned image, as well as
  crashes of the plugin (only the plugin, not the browser or the system). This system has
  nothing in common with the affected systems (i915, x86, lenny).

Since this bug effectively DoSes the whole system by simply opening a URL, IMHO
it should be marked "critical" (and possibly reassigned to the kernel?).

The hardware we're having problems with is two x86_64 systems with:

VGA compatible controller: ATI Technologies Inc RV620 LE [Radeon HD 3450]

graphics cards.

I have tested with both 2.6.32-5-amd64 (2.6.32-30) and 2.6.37-2-amd64 from
sid (2.6.37-2), with the same result (crash). I also tried both
xserver-xorg-video-radeon 1:6.13.1-2+squeeze1 (squeeze) and 1:6.14.0-1
(sid), and both keep crashing.

The problem in my system manifests with flash plugin version 10.3 r162
(64-bit). Version 10.0 r42 (which was in /var/cache/flashplugin-nonfree) works
fine.


Following are the backtraces obtained with netconsole from 2.6.32 and
2.6.37:

2.6.32
======
[  826.144018] [drm:radeon_fence_wait] *ERROR* fence(ffff88010002a980:0x0000CA4C) 504ms timeout going to reset GPU
[  826.144029] radeon 0000:01:00.0: GPU softreset
[  826.144033] radeon 0000:01:00.0:   R_008010_GRBM_STATUS=0xE57024E0
[  826.144038] radeon 0000:01:00.0:   R_008014_GRBM_STATUS2=0x00110103
[  826.144042] radeon 0000:01:00.0:   R_000E50_SRBM_STATUS=0x200000C0
[  826.144052] radeon 0000:01:00.0:   R_008020_GRBM_SOFT_RESET=0x00007FEE
[  826.144107] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00000001
[  826.144166] radeon 0000:01:00.0:   R_000E60_SRBM_SOFT_RESET=0x00000402
(death and silence)


2.6.37
======
[  491.372026] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[  491.372034] ------------[ cut here ]------------
[  491.372059] WARNING: at /build/buildd-linux-2.6_2.6.37-2-amd64-bITS0h/linux-2.6-2.6.37/debian/build/source_amd64_none/drivers/gpu/drm/radeon/radeon_fence.c:244 radeon_fence_wait+0x235/0x2d3 [radeon]()
[  491.372065] Hardware name: OptiPlex 780
[  491.372068] GPU lockup (waiting for 0x00006E39 last fence id 0x00006E38)
[  491.372072] Modules linked in: gre netconsole configfs 8021q garp ppdev lp nf_conntrack_netlink nfnetlink kvm_intel kvm nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables xt_pkttype xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables fuse bridge stp coretemp loop snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm radeon snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device snd soundcore ttm drm_kms_helper drm i2c_algo_bit snd_page_alloc power_supply shpchp parport_pc joydev tpm_tis tpm dell_wmi sparse_keymap i2c_i801 tpm_bios processor wmi parport i2c_core thermal_sys dcdbas
[  491.373400] radeon 0000:01:00.0: GPU softreset
[  491.373403] radeon 0000:01:00.0:   R_008010_GRBM_STATUS=0xE57024E0
[  491.373406] radeon 0000:01:00.0:   R_008014_GRBM_STATUS2=0x00110103
[  491.373408] radeon 0000:01:00.0:   R_000E50_SRBM_STATUS=0x200000C0
[  491.373416] radeon 0000:01:00.0:   R_008020_GRBM_SOFT_RESET=0x00007FEE
[  491.388299] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00000001
[  491.404176] radeon 0000:01:00.0:   R_008010_GRBM_STATUS=0xA0003030
[  491.404179] radeon 0000:01:00.0:   R_008014_GRBM_STATUS2=0x00000003
[  491.404181] radeon 0000:01:00.0:   R_000E50_SRBM_STATUS=0x200080C0
[  491.405177] radeon 0000:01:00.0: GPU reset succeed
[  491.405200] BUG: unable to handle kernel paging request at ffffc900112013e0
[  491.405258] IP: [<ffffffffa046fa73>] rs600_gart_set_page+0x28/0x34 [radeon]
[  491.405304] PGD 12780d067 PUD 12780e067 PMD 11f44a067 PTE 0
[  491.405336] Oops: 0002 [#1] SMP
[  491.405354] last sysfs file: /sys/devices/virtual/dmi/id/chassis_asset_tag
[  491.405379] CPU 1
[  491.405388] Modules linked in: gre netconsole configfs 8021q garp ppdev lp nf_conntrack_netlink nfnetlink kvm_intel kvm nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables xt_pkttype xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables fuse bridge stp coretemp loop snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm radeon snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device snd soundcore ttm drm_kms_helper drm i2c_algo_bit snd_page_alloc power_supply shpchp parport_pc joydev tpm_tis tpm dell_wmi sparse_keymap i2c_i801 tpm_bios processor wmi parport i2c_core thermal_sys dcdbas pcspkr psmouse evdev button pci_hotplug serio_raw ext3 libahci ehci_hcd libata usbcore scsi_mod e1000e nls_base [last unloaded: ip_gre]
[  491.405920]
[  491.405928] Pid: 24, comm: kworker/1:1 Tainted: G        W   2.6.37-2-amd64 #1 Dell Inc. OptiPlex 780                 /0C27VV
[  491.405979] RIP: 0010:[<ffffffffa046fa73>]  [<ffffffffa046fa73>] rs600_gart_set_page+0x28/0x34 [radeon]
[  491.406020] RSP: 0018:ffff8801234a5d28  EFLAGS: 00010202
[  491.406040] RAX: ffffc90011200000 RBX: ffff88011fe96000 RCX: 0000000000000000
[  491.406064] RDX: 0000000037c11067 RSI: ffffc900112013e0 RDI: ffff88011fe96000
[  491.406087] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8801221aae00
[  491.406111] R10: 0000000000000286 R11: ffff8801221a96a8 R12: 000000000000027d
[  491.406135] R13: 000000000000027c R14: 000000000000027c R15: 0000000000000001
[  491.406158] FS:  0000000000000000(0000) GS:ffff8800cfa40000(0000) knlGS:0000000000000000
[  491.406186] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  491.406206] CR2: ffffc900112013e0 CR3: 000000011fec4000 CR4: 00000000000406e0
[  491.406229] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  491.406253] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  491.406277] Process kworker/1:1 (pid: 24, threadinfo ffff8801234a4000, task ffff8801234a8000)
[  491.406305] Stack:
[  491.406313]  ffffffffa044efaa ffff8801221a9648 ffff880122138b80 ffff8801208ba808
[  491.406349]  ffff8801221a9730 0000000000000002 0000000000000000 ffff8801221a9690
[  491.406385]  ffffffffa044d3cc ffff880120c8fd20 ffffffffa03b3aaa ffff8801221a9648
[  491.406419] Call Trace:
[  491.406439]  [<ffffffffa044efaa>] ? radeon_gart_unbind+0xec/0x11a [radeon]
[  491.406470]  [<ffffffffa044d3cc>] ? radeon_ttm_backend_unbind+0x14/0x1c [radeon]
[  491.406499]  [<ffffffffa03b3aaa>] ? ttm_tt_unbind+0x15/0x26 [ttm]
[  491.406524]  [<ffffffffa03b4668>] ? ttm_bo_cleanup_memtype_use+0x15/0x55 [ttm]
[  491.406551]  [<ffffffffa03b53bf>] ? ttm_bo_cleanup_refs+0x163/0x195 [ttm]
[  491.406575]  [<ffffffffa03b547c>] ? ttm_bo_delayed_delete+0x8b/0xfe [ttm]
[  491.406601]  [<ffffffffa03b54ef>] ? ttm_bo_delayed_workqueue+0x0/0x26 [ttm]
[  491.406626]  [<ffffffffa03b5501>] ? ttm_bo_delayed_workqueue+0x12/0x26 [ttm]
[  491.406652]  [<ffffffff8105b7e4>] ? process_one_work+0x1d1/0x2ee
[  491.406676]  [<ffffffff8105d273>] ? worker_thread+0x12d/0x247
[  491.406696]  [<ffffffff8105d146>] ? worker_thread+0x0/0x247
[  491.406715]  [<ffffffff8105d146>] ? worker_thread+0x0/0x247
[  491.406735]  [<ffffffff8106012f>] ? kthread+0x7a/0x82
[  491.406753]  [<ffffffff8100a824>] ? kernel_thread_helper+0x4/0x10
[  491.406775]  [<ffffffff810600b5>] ? kthread+0x0/0x82
[  491.406793]  [<ffffffff8100a820>] ? kernel_thread_helper+0x0/0x10
[  491.406814] Code: 5b ff e0 85 f6 48 8b 87 58 03 00 00 78 23 3b b7 40 03 00 00 77 1b c1 e6 03 48 81 e2 00 f0 ff ff 48 63 f6 48 83 ca 67 48 8d 34 30 <48> 89 16 31 c0 c3 b8 ea ff ff ff c3 41 54 49
[  491.407104] CR2: ffffc900112013e0
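
As a side note on reading the 2.6.37 oops, the register values can be cross-checked with a little arithmetic. The sketch below rests on a hand decoding of the Code: bytes (a shift left by 3, an OR with 0x67, then the faulting store through RSI), so treating RAX as the base of the mapped GART table and 8 bytes as the entry size is an inference from this dump, not something confirmed against the driver source. The helper names are invented for illustration.

```python
# Values copied from the 2.6.37 oops in this report.
RAX = 0xFFFFC90011200000  # inferred: base address of the mapped GART table
RSI = 0xFFFFC900112013E0  # target of the faulting store (same as CR2)
RDX = 0x37C11067          # value that was about to be written
R13 = 0x27C               # appears to hold the page index being unbound

ENTRY_SIZE = 8  # inferred from the "shift left by 3" in the Code: bytes


def gart_index(base, target, entry_size=ENTRY_SIZE):
    """Index of the table entry that the faulting store was aimed at."""
    offset = target - base
    if offset % entry_size:
        raise ValueError("target is not aligned to an entry boundary")
    return offset // entry_size


def split_entry(value):
    """Split an entry into its page-aligned address and low flag bits."""
    return value & ~0xFFF, value & 0xFFF


index = gart_index(RAX, RSI)
page_addr, flags = split_entry(RDX)
print(hex(index), hex(page_addr), hex(flags))
```

Under these assumptions the faulting store targets entry 0x27c, which agrees with the value in R13/R14, and the "PTE 0" line suggests the table mapping itself was not present at that address when the delayed cleanup ran after the reset.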

Regards,
Apollon

#616301#33
Date:
2011-03-04 14:36:17 UTC
From:
To:
Which versions of libgl1-mesa-dri have you tried?
#616301#36
Date:
2011-03-04 14:51:45 UTC
From:
To:
I was using the squeeze versions:

libdrm-radeon1 2.4.21-1~squeeze3
libgl1-mesa-dri 7.7.1-4

After upgrading to the current versions in sid:

libdrm-radeon1 2.4.23-3
libgl1-mesa-dri 7.10-4

it seems to be stable. I got a flash plugin crash once (out of 4 attempts), but
the system survived.

#616301#39
Date:
2011-03-04 16:36:21 UTC
From:
To:
So far, so good. If you can try 7.10-2 (which was still shipping the
classic r600 driver, whereas 7.10-4 is shipping the Gallium based one)
or other versions in between as well, that would be interesting.

#616301#44
Date:
2011-03-04 19:01:59 UTC
From:
To:
severity 616301 critical
thanks

My system locks up whenever I click on a YouTube video link since
yesterday. I can probably live without YouTube :), but in any case this
shouldn't happen.

This isn't an isolated case, nor is it on exotic, possibly faulty hardware.
It's on a standard 1½-year old Dell OptiPlex 780 desktop with a Radeon
HD card (one of the standard configurations) and this is on a stock
squeeze system.

The findings so far seem to suggest this is a Mesa issue; I'd probably
file it under "Linux kernel bugs" (or even DoS bugs) but I'm not sure
where to properly file such bugs in the post-KMS stack world.

Regards,
Faidon

#616301#51
Date:
2011-03-04 19:16:58 UTC
From:
To:
severity 616301 important
thanks

Faidon Liambotis <paravoid@debian.org> (04/03/2011):

No…

Plenty of other things shouldn't happen. That doesn't make it a
critical bug.

KiBi.

#616301#56
Date:
2011-03-04 20:04:29 UTC
From:
To:
With packages

libdrm-radeon1_2.4.23-2
libgl1-mesa-dri_7.10-2
libtalloc2_2.0.5-1

from http://snapshot.debian.org/archive/debian/20110210T084921Z/
iceweasel sometimes freezes when viewing youtube videos but so far this
doesn't trigger gpu softresets.

Kind regards,


Matthias

#616301#61
Date:
2011-03-04 21:44:45 UTC
From:
To:
Hi,

Herber Sylwester <slyher@oomkill.net> (03/03/2011):

hmm, nothing seems related. Any chance you upgraded something in the
kernel/X stack before that, and only restarted your X or rebooted
lately?

KiBi.

#616301#66
Date:
2011-03-04 21:58:00 UTC
From:
To:
(Trying to gather everyone in To/Cc.)

Hi,

Apollon Oikonomopoulos <apoikos@gmail.com> (04/03/2011):

while you're at it, could you try switching off KMS? That should get
you a working system in the interim.

If you have some more time, you might want to check what happens with
2.6.38rc* from experimental.
[…]

Thanks. To reply to Faidon's question on IRC: I guess we might want to
either reassign that bug to the kernel (affecting the driver + the
server, to make sure it gets noticed) and/or clone it for now,
until further investigation happens. Michel, any preference?

KiBi.

#616301#71
Date:
2011-03-05 01:50:39 UTC
From:
To:
No, not unless it will affect a large proportion of users.

If there is a kernel driver involved then it should be assigned to the
kernel.  Even without KMS, a Mesa driver should be considered untrusted
and should not be able to trigger a crash or hang.  With KMS, this
applies to the X driver too.

Ben.

#616301#76
Date:
2011-03-05 08:30:05 UTC
From:
To:
Hi,

Same thing here ... with basically same constellation in terms of
packages and update timing.

For info:

Workaround for YouTube: disable hardware acceleration in Flash Player (right-click /
Settings on some non-crashing Flash Player instance, e.g. on some other
website, using iceweasel).

So it may be related to radeon and video hardware acceleration ... !?

G+, Oliver

#616301#81
Date:
2011-03-05 13:06:22 UTC
From:
To:
Hi there.

On 04.03.2011 22:44, Cyril Brulebois wrote:


Maybe the upgrade was just a coincidence in timing.
After the upgrade I neither rebooted the machine nor restarted the X server.
I remember that on the 2nd I watched a few YouTube videos and it was fine.
I don't know if attaching history.log for February will do any good.
I did not make any manual changes to the system, especially the kernel
part.
Regards,
Herber Sylwester.

#616301#86
Date:
2011-03-05 15:23:51 UTC
From:
To:
Hi there.
On 04.03.2011 22:44, Cyril Brulebois wrote:

Start-Date: 2011-03-05  16:03:36
Commandline: apt-get dist-upgrade
Install: libglib2.0-bin:i386 (2.28.1-1+b1, automatic), gcc-4.5-base:i386
(4.5.2-4, automatic), libboost-program-options1.42.0:i386 (1.42.0-4,
automatic), libsoundtouch0:i386 (1.5.0-4, automatic)
Upgrade: browser-plugin-gnash:i386 (0.8.8-9, 0.8.9~git20110220-1),
libjna-java:i386 (3.2.4-2, 3.2.7-1), libstdc++6:i386 (4.4.5-12,
4.5.2-4), libwildmidi1:i386 (0.2.3.2-2, 0.2.3.4-1), rpm2cpio:i386
(4.8.1-6, 4.8.1-7), librpmbuild1:i386 (4.8.1-6, 4.8.1-7), libmpfr4:i386
(3.0.0-2, 3.0.0-7), python-mako:i386 (0.3.6-1, 0.4.0-1),
libportaudio2:i386 (19+svn20071022-3.2, 19+svn20101113-3), xsane:i386
(0.997-2+b1, 0.998-1), mobile-broadband-provider-info:i386 (20101106-1,
20110218-1), libk3b6:i386 (2.0.2-1, 2.0.2-1+b1), libglib2.0-dev:i386
(2.24.2-1, 2.28.1-1+b1), libutempter0:i386 (1.1.5-3, 1.1.5-4),
hplip-cups:i386 (3.10.6-2, 3.11.1-2), libx11-data:i386 (1.4.1-4,
1.4.1-5), libgfortran3:i386 (4.4.5-12, 4.5.2-4), iputils-ping:i386
(20100418-3, 20101006-1), hpijs:i386 (3.10.6-2, 3.11.1-2), hplip:i386
(3.10.6-2, 3.11.1-2), libsqlite3-dev:i386 (3.7.4-2, 3.7.5-1),
librpmio1:i386 (4.8.1-6, 4.8.1-7), librpm1:i386 (4.8.1-6, 4.8.1-7),
rpm-common:i386 (4.8.1-6, 4.8.1-7), libgomp1:i386 (4.4.5-12, 4.5.2-4),
libpcre3:i386 (8.02-1.1, 8.12-3), libx11-xcb1:i386 (1.4.1-4, 1.4.1-5),
libsqlite3-0:i386 (3.7.4-2, 3.7.5-1), k3b:i386 (2.0.2-1, 2.0.2-1+b1),
libgcc1:i386 (4.4.5-12, 4.5.2-4), konqueror-plugin-gnash:i386 (0.8.8-9,
0.8.9~git20110220-1), libhpmud0:i386 (3.10.6-2, 3.11.1-2), w3m:i386
(0.5.3-1, 0.5.3-2), mozilla-plugin-gnash:i386 (0.8.8-9,
0.8.9~git20110220-1), python-pysqlite2:i386 (2.6.0-1, 2.6.3-1),
libglib2.0-data:i386 (2.24.2-1, 2.28.1-1), libfreetype6-dev:i386
(2.4.2-2.1, 2.4.4-1), xdg-utils:i386 (1.1.0~rc1-1, 1.1.0~rc1-2),
rpm:i386 (4.8.1-6, 4.8.1-7), gstreamer0.10-plugins-bad:i386
(0.10.19-2+b2, 0.10.19-2.1), libgcc1-dbg:i386 (4.4.5-12, 4.5.2-4),
xsane-common:i386 (0.997-2, 0.998-1), libemail-mime-perl:i386 (1.906-1,
1.907-1), libfreetype6:i386 (2.4.2-2.1, 2.4.4-1), gnash:i386 (0.8.8-9,
0.8.9~git20110220-1), libglib2.0-0:i386 (2.24.2-1, 2.28.1-1+b1),
libk3b6-extracodecs:i386 (2.0.2-1, 2.0.2-1+b1), libsane-hpaio:i386
(3.10.6-2, 3.11.1-2), libx11-6:i386 (1.4.1-4, 1.4.1-5),
pkg-kde-tools:i386 (0.9.3, 0.9.5), klash:i386 (0.8.8-9,
0.8.9~git20110220-1), libx11-dev:i386 (1.4.1-4, 1.4.1-5), libobjc2:i386
(4.4.5-12, 4.5.2-4), gnash-common:i386 (0.8.8-9, 0.8.9~git20110220-1),
sqlite3:i386 (3.7.4-2, 3.7.5-1), binfmt-support:i386 (2.0.2, 2.0.3),
eject:i386 (2.1.5+deb1+cvs20081104-7.1, 2.1.5+deb1+cvs20081104-8),
libmms0:i386 (0.6.2-1, 0.6.2-2), hplip-data:i386 (3.10.6-2, 3.11.1-2)
End-Date: 2011-03-05  16:06:20

That did not help either.
I'm starting to think it may be caused by a hardware malfunction (wild
goose chase?).
I've noticed that the system does not hang: I'm able to go to a console
login and reboot. The screen ends 10 cm below the monitor, so I have to
guess some things. If it would be helpful, I can try to record the incident.
Sincerely,
Herber Sylwester.

#616301#89
Date:
2011-03-06 10:57:13 UTC
From:
To:
Good news, it might be possible to isolate an upstream fix for the
classic r600 driver and backport it to squeeze then. Can you try doing
that with git bisect?
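
For readers unfamiliar with the idea, the search that git bisect performs can be sketched as a plain binary search. Here it looks for the first commit that *fixes* the problem rather than the first that breaks it. The function and parameter names are hypothetical, and the sketch assumes the fix, once present, is never reverted later in the range.

```python
def first_fixed(commits, is_fixed):
    """Return the index of the first commit for which is_fixed() is true.

    Assumes commits[0] is known broken, commits[-1] is known fixed, and
    the history flips from broken to fixed exactly once in between.
    """
    lo, hi = 0, len(commits) - 1  # lo: known broken, hi: known fixed
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_fixed(commits[mid]):
            hi = mid  # fix is at mid or earlier
        else:
            lo = mid  # fix is after mid
    return hi
```

With git bisect itself, searching for a fix means the usual labels are inverted: revisions that still lock up are marked "good" and revisions that work are marked "bad", so that the reported "first bad commit" is the one containing the fix. Each `is_fixed` call here corresponds to one build-and-test cycle, so about log2(N) cycles are needed for N commits.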

#616301#92
Date:
2011-03-06 13:18:35 UTC
From:
To:
I'll look into it tomorrow; hopefully I'll find the time then.

If someone else also wants to work on this simultaneously or needs a workaround for squeeze:
Simply updating to "libgl1-mesa-dri_7.8.1-2" from
http://snapshot.debian.org/package/mesa/7.8.1-2/#libgl1-mesa-dri_7.8.1-2
should fix the issue.

Kind regards,


Matthias

#616301#97
Date:
2011-03-06 18:08:49 UTC
From:
To:
With or without KMS, the userspace acceleration drivers can certainly
cause GPU hangs if the 3D engine is programmed with some combination
of commands it doesn't like.

Alex

#616301#102
Date:
2011-03-06 18:36:21 UTC
From:
To:
You can't solve the halting problem but you can implement a watchdog,
can't you?

Ben.

#616301#107
Date:
2011-03-07 00:49:23 UTC
From:
To:
I haven't heard of many chips that won't hang given the wrong
instructions, whether it's a GPU or a keyboard controller.  This sounds like more
than a driver issue, rather a choice-of-driver issue.  How are you going to
have it both ways without an amount of care you have no time for?

Having interrupting access / a watchdog is nice if your driver can do that.

Ben Hutchings wrote:

#616301#112
Date:
2011-03-07 01:02:42 UTC
From:
To:
Of course.  This is why the kernel driver filters the commands going to
the GPU - the commands come from unprivileged applications (the Mesa
driver is just a shared library) and should not be trusted.
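
To make the idea of filtering untrusted command submissions concrete, here is a deliberately simplified toy model. The register names, the (register, offset) packet shape and the function name are all hypothetical and do not correspond to the real radeon command-stream checker; the point is only the shape of an allowlist plus bounds check.

```python
# Toy model only: register names and packet format are hypothetical and
# do not match the real radeon kernel driver.
ALLOWED_REGISTERS = {"DRAW_SETUP", "TEXTURE_BASE", "VERTEX_BASE"}


def validate_commands(commands, buffer_size):
    """Accept a submission only if every command is on the allowlist
    and every offset stays inside the caller's own buffer."""
    for register, offset in commands:
        if register not in ALLOWED_REGISTERS:
            return False  # privileged or unknown register
        if not 0 <= offset < buffer_size:
            return False  # would reach outside the caller's memory
    return True
```

A check of this kind can reject obviously invalid submissions, but it says nothing about combinations of individually valid commands that the hardware does not tolerate.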
[...]

Don't top-post.

Ben.

#616301#117
Date:
2011-03-07 02:10:18 UTC
From:
To:
Of course, if you ask and have that command.  Just ask; I'm actually not
planning on being in the channel long.  Someone might find me!  He he.

Question: can I ask what top-posting is?  I will look it up too.  The Debian
rules are getting rather long; I hadn't even heard of this one, even having read
a good part of them once.

Ben Hutchings wrote:

#616301#122
Date:
2011-03-07 09:05:45 UTC
From:
To:
We have lockup detection and asic reset support, but depending on the
lockup it may or may not be able to successfully reset the asic.
Also, as for the command buffer checking, we try to protect against
basic stupidity, but the chips are just too complex to check for every
possible scenario that might cause a hang.

Alex

#616301#127
Date:
2011-03-07 10:21:26 UTC
From:
To:
We do, we reset the GPU 10s after it hangs, but this depends on a lot
of things going our way.

Occasionally we do reset the GPU when we shouldn't as well.

However, if there is an issue in the kernel, ddx or mesa driver,
constant resets will pretty much DoS the GPU.
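
The behaviour described here, wait for a fence, reset on timeout, repeat, can be modelled in a few lines. This is a user-space toy, not the driver's logic: the function name, the use of `threading.Event` as a stand-in for a fence, and in particular the `max_resets` cap are assumptions added for illustration.

```python
import threading


def wait_with_watchdog(fence_signaled, reset_gpu, timeout_s, max_resets):
    """Wait for a fence; call reset_gpu() after each timeout.

    fence_signaled is a threading.Event standing in for a GPU fence.
    Returns the number of resets performed before the fence signaled.
    The max_resets cap is an assumption of this sketch, added so that a
    persistently hanging workload cannot keep the loop spinning forever.
    """
    resets = 0
    while not fence_signaled.wait(timeout_s):
        if resets >= max_resets:
            raise RuntimeError("no recovery after %d resets" % resets)
        reset_gpu()
        resets += 1
    return resets
```

Without some cap or back-off, a submission that hangs the GPU every time produces exactly the endless reset loop seen in the syslog earlier in this report.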

Dave.

#616301#132
Date:
2011-03-07 13:36:57 UTC
From:
To:
No, it seems clear at this point that it's a youtube change triggering a
pre-existing bug in the Mesa r600 driver.

Might be nice for the sake of completeness, though most likely it'll
just be generic symptoms from GPU lockups, which unfortunately don't say
much about what triggered them. The most promising approach at this
point seems bisecting which Mesa upstream change fixed the problem.

#616301#135
Date:
2011-03-08 18:06:13 UTC
From:
To:
Hi,

I've been trying to bisect upstream between mesa_7_7_1 and mesa-7.8,
without any success. A large part of the commit tree for r600 between
7.7.1 and 7.8 FTBFS with any debian libdrm-dev version I tried since
2.4.15-1, stating:

  In file included from radeon_common.h:4,
                   from radeon_screen.c:49:
  radeon_common_context.h:405: error: array type has incomplete element type

I've searched around a bit and this seems to be due to some libdrm API
changes.

This is taking too much time, I can't do this remotely since I have to
be able to reboot the machine and furthermore, this is my work machine.
Unless someone can provide some advice regarding the build process, I'll
have to give up :(.

Regards,
Apollon

#616301#138
Date:
2011-03-09 12:39:42 UTC
From:
To:
Can you provide a specific mesa commit hash where this happens?
#616301#141
Date:
2011-03-09 12:55:09 UTC
From:
To:

For example, a1b9c4e22a83d2125f66c3a3af3143bc0daee9a4 (2nd bisection
step) and the whole area around it.

For the record, the following mesa commits seem to make r600 conform to
the new API:

$ git log --oneline origin/master | grep "new libdrm_radeon api"
bd9e0eb radeon/r600: use new libdrm_radeon api
9373287 radeon/r600: use new libdrm_radeon api
b065aec radeon/r600: use new libdrm_radeon api
(Basically the same commit on different branches)

however, they are not cleanly backportable to the affected commits I
encountered.

Debian libdrm-dev versions prior to 2.4.18 do not ship
/usr/include/drm/drm.h which is needed, and versions from 2.4.18
(included) and onwards fail with the above message. Any hints as to the
version of libdrm required to compile these revisions?

Thanks,
Apollon

#616301#144
Date:
2011-03-09 13:18:24 UTC
From:
To:
You could try an older version of libdrm-dev with /usr/include/drm/drm.h
manually copied from a newer version. It should be backwards compatible.

#616301#147
Date:
2011-03-09 13:45:43 UTC
From:
To:
Hi,

I've encountered the same problem. :/
Seems like we need libdrm-dev 2.4.16 or some other version before that.

But I also tested libgl1-mesa-dri 7.6.1 from snapshot.debian.org which
works with current youtube videos and doesn't even crash Adobe Flash
player and Iceweasel when opening more than one such tab (unlike
libgl1-mesa-dri >= 7.8), so I'll try to bisect from that point to 7.7 too.
Apparently this gpu softreset issue is specific to the mesa 7.7
development line.

I attached my git-bisect logs for reference.
(git://git.debian.org/git/pkg-xorg/lib/mesa)

By the way:
The 'proper fix' in commit 5997501ca0d0c905025bc2a840e48e2176d64ea3
has only been partly applied to mesa-7.7.1 ... that seems odd.

Kind regards,

Matthias

#616301#152
Date:
2011-03-13 02:35:33 UTC
From:
To:
Note that the bug in question causes a *kernel* lockup, not just a GPU
hang. As indicated by the output in the bug log¹, we're getting a “BUG:
unable to handle kernel paging request” after a GPU reset on 2.6.37,
and a silent lockup on 2.6.32.

Regards,
Faidon

¹: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=616301#30