#616301 xserver-xorg-video-radeon: screen goes black, system hangs after 2sec: [youtube(FF/Opera)-reset req.]
- Package: xserver-xorg-video-radeon
- Source: xserver-xorg-video-ati
- Description: X.Org X server -- AMD/ATI Radeon display driver
- Submitter: slyher
- Date: 2011-10-06 07:24:03 UTC
- Severity: important
Hi there. After updating the packages on 2 March 2011, playing YouTube videos became impossible. The screen goes black after a video plays for 2-4 seconds; sound may continue for a few seconds and then also stops. I tried to log in on a console, but after a few tries the screen started blinking white and the console froze. The white color stayed, so I powered down the machine the hard way. I tried different browsers (Firefox and Opera) with the same effect. I've attached information. If more information is needed, don't hesitate to write. Sincerely, Slyher.
What packages were upgraded from/to which versions?
On 03.03.2011 12:24, Michel Dänzer wrote:
Hi there. Here is what I can tell from /var/log/apt/history.log:
Start-Date: 2011-03-01 17:27:00
Commandline: apt-get dist-upgrade
Upgrade: foomatic-db-engine:i386 (4.0.4-3, 4.0.7-1), dnsmasq-base:i386 (2.55-2+b1, 2.57-1), python-markupsafe:i386 (0.9.2-3, 0.12-1), musescore-common:i386 (0.9.6+dfsg-1, 1.0+dfsg-2), musescore:i386 (0.9.6+dfsg-1, 1.0+dfsg-2), gpgv:i386 (1.4.10-4, 1.4.11-3), foomatic-filters:i386 (4.0.5-6, 4.0.7-1), foomatic-filters-ppds:i386 (4.0.4-3, 4.0.7-1), musescore-soundfont-gm:i386 (0.9.6+dfsg-1, 1.0+dfsg-2), gnupg:i386 (1.4.10-4, 1.4.11-3)
End-Date: 2011-03-01 17:28:29
Start-Date: 2011-03-02 09:03:39
Commandline: apt-get dist-upgrade
Install: libclass-load-perl:i386 (0.06-1, automatic)
Upgrade: desktop-base:i386 (6.0.5, 6.0.6), libcupscgi1:i386 (1.4.5-3, 1.4.6-1), cups-client:i386 (1.4.5-3, 1.4.6-1), libcupsmime1:i386 (1.4.5-3, 1.4.6-1), libnspr4-0d:i386 (4.8.6-1, 4.8.7-2), cups-ppdc:i386 (1.4.5-3, 1.4.6-1), libcupsppdc1:i386 (1.4.5-3, 1.4.6-1), cups-common:i386 (1.4.5-3, 1.4.6-1), libcups2:i386 (1.4.5-3, 1.4.6-1), libgsf-1-common:i386 (1.14.19-2, 1.14.19-3), cups:i386 (1.4.5-3, 1.4.6-1), libcupsdriver1:i386 (1.4.5-3, 1.4.6-1), libgdata7:i386 (0.6.4-2, 0.6.4-3), cups-bsd:i386 (1.4.5-3, 1.4.6-1), libcupsimage2:i386 (1.4.5-3, 1.4.6-1), libdatetime-timezone-perl:i386 (1.23-1+2010n, 1.28-1+2011b), libgsf-1-114:i386 (1.14.19-2, 1.14.19-3), libgdata-common:i386 (0.6.4-2, 0.6.4-3)
End-Date: 2011-03-02 09:05:00
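For anyone else checking their own upgrade history, here is a minimal sketch of how the Upgrade: entries in a history.log excerpt like the one above could be listed and filtered for packages in the graphics stack. The function names and the STACK_PREFIXES list are hypothetical and chosen only for illustration; the log layout is assumed to match the excerpts pasted in this report.

```python
import re

# Package-name prefixes treated as "graphics stack" here. This list is an
# assumption for illustration, not an authoritative one.
STACK_PREFIXES = ("xserver-xorg", "libdrm", "libgl1-mesa", "linux-image", "flashplugin")

# Matches one entry such as "gpgv:i386 (1.4.10-4, 1.4.11-3)".
ENTRY = re.compile(r"([^\s:,()]+):(\w+) \(([^,()]+), ([^()]+)\)")

SAMPLE = """Start-Date: 2011-03-01 17:27:00
Commandline: apt-get dist-upgrade
Upgrade: gpgv:i386 (1.4.10-4, 1.4.11-3), gnupg:i386 (1.4.10-4, 1.4.11-3)
End-Date: 2011-03-01 17:28:29
"""


def upgrades(history_text):
    """Yield (start_date, package, arch, old, new) for every Upgrade: entry."""
    start = None
    for line in history_text.splitlines():
        if line.startswith("Start-Date:"):
            start = line.split(":", 1)[1].strip()
        elif line.startswith("Upgrade:"):
            for name, arch, old, new in ENTRY.findall(line):
                yield start, name, arch, old, new


def stack_upgrades(history_text):
    """Return only the upgrades whose package name starts with a stack prefix."""
    return [u for u in upgrades(history_text) if u[1].startswith(STACK_PREFIXES)]


if __name__ == "__main__":
    for row in upgrades(SAMPLE):
        print(row)
    print("graphics stack upgrades:", stack_upgrades(SAMPLE))
```

An empty result from stack_upgrades would be consistent with the observation in this thread that none of the recently upgraded packages look related.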
Hi. I'm experiencing the same problem, i.e. whenever I open a youtube.com website with video content my system hangs (black screen, and the monitor warns about wrong input frequencies). Until at least two days ago I could play youtube videos without issues using iceweasel with Adobe Flash Player (flashplugin-nonfree); but not today. I also run 'aptitude update && aptitude upgrade' regularly but I doubt the following packages have something to do with this.
/var/log/apt/history.log:
Start-Date: 2011-03-01 03:11:17
Upgrade: libsmbclient:amd64 (3.5.6~dfsg-3, 3.5.6~dfsg-3squeeze2), smbclient:amd64 (3.5.6~dfsg-3, 3.5.6~dfsg-3squeeze2), libwbclient0:amd64 (3.5.6~dfsg-3, 3.5.6~dfsg-3squeeze2), libavahi-glib1:amd64 (0.6.27-2, 0.6.27-2+squeeze1), libavahi-common-data:amd64 (0.6.27-2, 0.6.27-2+squeeze1), libavahi-core7:amd64 (0.6.27-2, 0.6.27-2+squeeze1), samba-common:amd64 (3.5.6~dfsg-3, 3.5.6~dfsg-3squeeze2), avahi-dnsconfd:amd64 (0.6.27-2, 0.6.27-2+squeeze1), avahi-daemon:amd64 (0.6.27-2, 0.6.27-2+squeeze1), libavahi-client3:amd64 (0.6.27-2, 0.6.27-2+squeeze1), samba-common-bin:amd64 (3.5.6~dfsg-3, 3.5.6~dfsg-3squeeze2), libavahi-common3:amd64 (0.6.27-2, 0.6.27-2+squeeze1)
End-Date: 2011-03-01 03:11:24
Start-Date: 2011-03-03 02:39:51
Upgrade: libpango1.0-common:amd64 (1.28.3-1+squeeze1, 1.28.3-1+squeeze2), libpango1.0-0:amd64 (1.28.3-1+squeeze1, 1.28.3-1+squeeze2)
End-Date: 2011-03-03 02:39:57
More interesting are the syslog entries when hitting the bug (abbreviated).
/var/log/syslog:
Mar 3 10:17:01 banshee /USR/SBIN/CRON[9664]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Mar 3 10:18:29 banshee kernel: [27296.676053] [drm:radeon_fence_wait] *ERROR* fence(ffff880061522f40:0x0031D254) 508ms timeout going to reset GPU
Mar 3 10:18:29 banshee kernel: [27296.676064] radeon 0000:01:05.0: GPU softreset
Mar 3 10:18:29 banshee kernel: [27296.676070] radeon 0000:01:05.0: R_008010_GRBM_STATUS=0xE77324AC
Mar 3 10:18:29 banshee kernel: [27296.676076] radeon 0000:01:05.0: R_008014_GRBM_STATUS2=0x00110103
Mar 3 10:18:29 banshee kernel: [27296.676081] radeon 0000:01:05.0: R_000E50_SRBM_STATUS=0x20001040
Mar 3 10:18:29 banshee kernel: [27296.676090] radeon 0000:01:05.0: R_008020_GRBM_SOFT_RESET=0x00007FEE
Mar 3 10:18:29 banshee kernel: [27296.676146] radeon 0000:01:05.0: R_008020_GRBM_SOFT_RESET=0x00000001
Mar 3 10:18:29 banshee kernel: [27296.676204] radeon 0000:01:05.0: R_000E60_SRBM_SOFT_RESET=0x00000402
Mar 3 10:18:29 banshee kernel: [27296.676360] radeon 0000:01:05.0: R_008010_GRBM_STATUS=0x00003030
Mar 3 10:18:29 banshee kernel: [27296.676365] radeon 0000:01:05.0: R_008014_GRBM_STATUS2=0x00000003
Mar 3 10:18:29 banshee kernel: [27296.676370] radeon 0000:01:05.0: R_000E50_SRBM_STATUS=0x20000040
<-- message loop (trying to softreset endlessly) -->
Mar 3 10:20:00 banshee kernel: [27387.493788] [drm:radeon_fence_wait] *ERROR* fence(ffff880062eb7540:0x0031D382) 516ms timeout
Mar 3 10:20:00 banshee kernel: [27387.493797] [drm:radeon_fence_wait] *ERROR* last signaled fence(0x0031D382)
Mar 3 10:20:03 banshee kernel: [27390.008063] [drm:radeon_fence_wait] *ERROR* fence(ffff880062eb7dc0:0x0031D384) 504ms timeout going to reset GPU
Mar 3 10:20:03 banshee kernel: [27390.008076] radeon 0000:01:05.0: GPU softreset
Mar 3 10:20:03 banshee kernel: [27390.008082] radeon 0000:01:05.0: R_008010_GRBM_STATUS=0xA0003030
Mar 3 10:20:03 banshee kernel: [27390.008089] radeon 0000:01:05.0: R_008014_GRBM_STATUS2=0x00000003
Mar 3 10:20:03 banshee kernel: [27390.008096] radeon 0000:01:05.0: R_000E50_SRBM_STATUS=0x20000040
Mar 3 10:20:03 banshee kernel: [27390.008106] radeon 0000:01:05.0: R_008020_GRBM_SOFT_RESET=0x00007FEE
Mar 3 10:20:03 banshee kernel: [27390.008164] radeon 0000:01:05.0: R_008020_GRBM_SOFT_RESET=0x00000001
Mar 3 10:20:03 banshee kernel: [27390.008224] radeon 0000:01:05.0: R_000E60_SRBM_SOFT_RESET=0x00000402
Mar 3 10:20:03 banshee kernel: [27390.008382] radeon 0000:01:05.0: R_008010_GRBM_STATUS=0x00003030
Mar 3 10:20:03 banshee kernel: [27390.008392] radeon 0000:01:05.0: R_008014_GRBM_STATUS2=0x00000003
Mar 3 10:20:03 banshee kernel: [27390.008401] radeon 0000:01:05.0: R_000E50_SRBM_STATUS=0x20000040
Mar 3 10:20:03 banshee kernel: [27390.009795] [drm:radeon_fence_wait] *ERROR* fence(ffff880062eb7dc0:0x0031D384) 512ms timeout
Mar 3 10:20:03 banshee kernel: [27390.009804] [drm:radeon_fence_wait] *ERROR* last signaled fence(0x0031D384)
Mar 3 10:20:04 banshee kernel: [27391.517962] SysRq : Emergency Remount R/O
Maybe Google changed the video format which triggers a bug in the graphics driver? Kind regards, Matthias
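To check whether a given syslog shows the same pattern, something like the following sketch could count the fence timeouts and softreset attempts. It assumes the message wording is exactly as in the excerpt above; the function name summarize and the result keys are hypothetical.

```python
import re

# Patterns follow the wording of the kernel messages quoted in this report.
FENCE_TIMEOUT = re.compile(
    r"\[drm:radeon_fence_wait\] \*ERROR\* "
    r"fence\((?P<fence>[0-9a-f]+):(?P<seq>0x[0-9A-Fa-f]+)\) "
    r"(?P<ms>\d+)ms timeout(?P<reset> going to reset GPU)?"
)
SOFTRESET = re.compile(r"radeon (?P<pci>[0-9a-f:.]+): GPU softreset")


def summarize(log_text):
    """Count fence timeouts and softreset attempts in a syslog excerpt."""
    timeouts = list(FENCE_TIMEOUT.finditer(log_text))
    resets = SOFTRESET.findall(log_text)  # list of PCI addresses
    return {
        "fence_timeouts": len(timeouts),
        "reset_requests": sum(1 for m in timeouts if m.group("reset")),
        "softresets": len(resets),
        "devices": sorted(set(resets)),
    }


if __name__ == "__main__":
    sample = (
        "Mar 3 10:18:29 banshee kernel: [27296.676053] [drm:radeon_fence_wait] "
        "*ERROR* fence(ffff880061522f40:0x0031D254) 508ms timeout going to reset GPU\n"
        "Mar 3 10:18:29 banshee kernel: [27296.676064] radeon 0000:01:05.0: GPU softreset\n"
    )
    print(summarize(sample))
```

A large softresets count over a short time span would match the "message loop" described above.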
I forgot to mention that other websites serving video via Adobe Flash do not trigger the bug - at least those I tried (http://www.tagesschau.de/, http://www.vimeo.com/, http://www.myvideo.de/). Kind regards, Matthias
Hi, Same thing here, opening any youtube video since March 3 crashes the whole system using iceweasel 3.5.16-4. The crash has been consistent across two identical machines and started occurring yesterday. The fact that we didn't upgrade Flash, combined with the fact that yesterday I noticed some issues* on youtube on another machine (without radeon), leads me to the conclusion that the bug is triggered by some change youtube did to their player. * red and green tinting all over the video and misaligned image, as well as crashes of the plugin (only, not the browser or the system). This system has nothing to do with the affected systems (i915, x86, lenny). Due to this bug actually DoS'ing the whole system by simply opening a URL, IMHO it should be marked "critical" (and possibly reassigned to the kernel?). The hardware we're having problems with is two x86_64 systems with: VGA compatible controller: ATI Technologies Inc RV620 LE [Radeon HD 3450] graphics cards. I have tested with both 2.6.32-5-amd64 (2.6.32-30) and 2.6.37-2-amd64 from sid (2.6.37-2) with the same result (crash). Also, I tried both xserver-xorg-video-radeon 1:6.13.1-2+squeeze1 (squeeze) and 1:6.14.0-1 (sid) and both keep crashing. The problem in my system manifests with flash plugin version 10.3 r162 (64-bit). Version 10.0 r42 (which was in /var/cache/flashplugin-nonfree) works fine.
Following are the backtraces obtained with netconsole from 2.6.32 and 2.6.37:
2.6.32
======
[ 826.144018] [drm:radeon_fence_wait] *ERROR* fence(ffff88010002a980:0x0000CA4C) 504ms timeout going to reset GPU
[ 826.144029] radeon 0000:01:00.0: GPU softreset
[ 826.144033] radeon 0000:01:00.0: R_008010_GRBM_STATUS=0xE57024E0
[ 826.144038] radeon 0000:01:00.0: R_008014_GRBM_STATUS2=0x00110103
[ 826.144042] radeon 0000:01:00.0: R_000E50_SRBM_STATUS=0x200000C0
[ 826.144052] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00007FEE
[ 826.144107] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00000001
[ 826.144166] radeon 0000:01:00.0: R_000E60_SRBM_SOFT_RESET=0x00000402
(death and silence)
2.6.37
======
[ 491.372026] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[ 491.372034] ------------[ cut here ]------------
[ 491.372059] WARNING: at /build/buildd-linux-2.6_2.6.37-2-amd64-bITS0h/linux-2.6-2.6.37/debian/build/source_amd64_none/drivers/gpu/drm/radeon/radeon_fence.c:244 radeon_fence_wait+0x235/0x2d3 [radeon]()
[ 491.372065] Hardware name: OptiPlex 780
[ 491.372068] GPU lockup (waiting for 0x00006E39 last fence id 0x00006E38)
[ 491.372072] Modules linked in: gre netconsole configfs 8021q garp ppdev lp nf_conntrack_netlink nfnetlink kvm_intel kvm nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables xt_pkttype xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables fuse bridge stp coretemp loop snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm radeon snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device snd soundcore ttm drm_kms_helper drm i2c_algo_bit snd_page_alloc power_supply shpchp parport_pc joydev tpm_tis tpm dell_wmi sparse_keymap i2c_i801 tpm_bios processor wmi parport i2c_core thermal_sys dcdbas
[ 491.373400] radeon 0000:01:00.0: GPU softreset
[ 491.373403] radeon 0000:01:00.0: R_008010_GRBM_STATUS=0xE57024E0
[ 491.373406] radeon 0000:01:00.0: R_008014_GRBM_STATUS2=0x00110103
[ 491.373408] radeon 0000:01:00.0: R_000E50_SRBM_STATUS=0x200000C0
[ 491.373416] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00007FEE
[ 491.388299] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00000001
[ 491.404176] radeon 0000:01:00.0: R_008010_GRBM_STATUS=0xA0003030
[ 491.404179] radeon 0000:01:00.0: R_008014_GRBM_STATUS2=0x00000003
[ 491.404181] radeon 0000:01:00.0: R_000E50_SRBM_STATUS=0x200080C0
[ 491.405177] radeon 0000:01:00.0: GPU reset succeed
[ 491.405200] BUG: unable to handle kernel paging request at ffffc900112013e0
[ 491.405258] IP: [<ffffffffa046fa73>] rs600_gart_set_page+0x28/0x34 [radeon]
[ 491.405304] PGD 12780d067 PUD 12780e067 PMD 11f44a067 PTE 0
[ 491.405336] Oops: 0002 [#1] SMP
[ 491.405354] last sysfs file: /sys/devices/virtual/dmi/id/chassis_asset_tag
[ 491.405379] CPU 1
[ 491.405388] Modules linked in: gre netconsole configfs 8021q garp ppdev lp nf_conntrack_netlink nfnetlink kvm_intel kvm nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables xt_pkttype xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables fuse bridge stp coretemp loop snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm radeon snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device snd soundcore ttm drm_kms_helper drm i2c_algo_bit snd_page_alloc power_supply shpchp parport_pc joydev tpm_tis tpm dell_wmi sparse_keymap i2c_i801 tpm_bios processor wmi parport i2c_core thermal_sys dcdbas pcspkr psmouse evdev button pci_hotplug serio_raw ext3 libahci ehci_hcd libata usbcore scsi_mod e1000e nls_base [last unloaded: ip_gre]
[ 491.405920]
[ 491.405928] Pid: 24, comm: kworker/1:1 Tainted: G W 2.6.37-2-amd64 #1 Dell Inc. OptiPlex 780 /0C27VV
[ 491.405979] RIP: 0010:[<ffffffffa046fa73>] [<ffffffffa046fa73>] rs600_gart_set_page+0x28/0x34 [radeon]
[ 491.406020] RSP: 0018:ffff8801234a5d28 EFLAGS: 00010202
[ 491.406040] RAX: ffffc90011200000 RBX: ffff88011fe96000 RCX: 0000000000000000
[ 491.406064] RDX: 0000000037c11067 RSI: ffffc900112013e0 RDI: ffff88011fe96000
[ 491.406087] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8801221aae00
[ 491.406111] R10: 0000000000000286 R11: ffff8801221a96a8 R12: 000000000000027d
[ 491.406135] R13: 000000000000027c R14: 000000000000027c R15: 0000000000000001
[ 491.406158] FS: 0000000000000000(0000) GS:ffff8800cfa40000(0000) knlGS:0000000000000000
[ 491.406186] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 491.406206] CR2: ffffc900112013e0 CR3: 000000011fec4000 CR4: 00000000000406e0
[ 491.406229] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 491.406253] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 491.406277] Process kworker/1:1 (pid: 24, threadinfo ffff8801234a4000, task ffff8801234a8000)
[ 491.406305] Stack:
[ 491.406313] ffffffffa044efaa ffff8801221a9648 ffff880122138b80 ffff8801208ba808
[ 491.406349] ffff8801221a9730 0000000000000002 0000000000000000 ffff8801221a9690
[ 491.406385] ffffffffa044d3cc ffff880120c8fd20 ffffffffa03b3aaa ffff8801221a9648
[ 491.406419] Call Trace:
[ 491.406439] [<ffffffffa044efaa>] ? radeon_gart_unbind+0xec/0x11a [radeon]
[ 491.406470] [<ffffffffa044d3cc>] ? radeon_ttm_backend_unbind+0x14/0x1c [radeon]
[ 491.406499] [<ffffffffa03b3aaa>] ? ttm_tt_unbind+0x15/0x26 [ttm]
[ 491.406524] [<ffffffffa03b4668>] ? ttm_bo_cleanup_memtype_use+0x15/0x55 [ttm]
[ 491.406551] [<ffffffffa03b53bf>] ? ttm_bo_cleanup_refs+0x163/0x195 [ttm]
[ 491.406575] [<ffffffffa03b547c>] ? ttm_bo_delayed_delete+0x8b/0xfe [ttm]
[ 491.406601] [<ffffffffa03b54ef>] ? ttm_bo_delayed_workqueue+0x0/0x26 [ttm]
[ 491.406626] [<ffffffffa03b5501>] ? ttm_bo_delayed_workqueue+0x12/0x26 [ttm]
[ 491.406652] [<ffffffff8105b7e4>] ? process_one_work+0x1d1/0x2ee
[ 491.406676] [<ffffffff8105d273>] ? worker_thread+0x12d/0x247
[ 491.406696] [<ffffffff8105d146>] ? worker_thread+0x0/0x247
[ 491.406715] [<ffffffff8105d146>] ? worker_thread+0x0/0x247
[ 491.406735] [<ffffffff8106012f>] ? kthread+0x7a/0x82
[ 491.406753] [<ffffffff8100a824>] ? kernel_thread_helper+0x4/0x10
[ 491.406775] [<ffffffff810600b5>] ? kthread+0x0/0x82
[ 491.406793] [<ffffffff8100a820>] ? kernel_thread_helper+0x0/0x10
[ 491.406814] Code: 5b ff e0 85 f6 48 8b 87 58 03 00 00 78 23 3b b7 40 03 00 00 77 1b c1 e6 03 48 81 e2 00 f0 ff ff 48 63 f6 48 83 ca 67 48 8d 34 30 <48> 89 16 31 c0 c3 b8 ea ff ff ff c3 41 54 49
[ 491.407104] CR2: ffffc900112013e0
Regards, Apollon
Which versions of libgl1-mesa-dri have you tried?
I was using the squeeze versions:
libdrm-radeon1 2.4.21-1~squeeze3
libgl1-mesa-dri 7.7.1-4
After upgrading to the current versions in sid:
libdrm-radeon1 2.4.23-3
libgl1-mesa-dri 7.10-4
it seems to be stable. I got a flash plugin crash once (out of 4 attempts), but the system survived.
So far, so good. If you can try 7.10-2 (which was still shipping the classic r600 driver, whereas 7.10-4 is shipping the Gallium based one) or other versions in between as well, that would be interesting.
severity 616301 critical
thanks
My system has locked up whenever I click on a YouTube video link since yesterday. I can probably live without YouTube :), but in any case this shouldn't happen. This isn't an isolated case, nor is it on exotic, possibly faulty hardware. It's on a standard 1½-year-old Dell OptiPlex 780 desktop with a Radeon HD card (one of the standard configurations) and this is on a stock squeeze system. The findings so far seem to suggest this is a Mesa issue; I'd probably file it under "Linux kernel bugs" (or even DoS bugs) but I'm not sure where to properly file such bugs in the post-KMS stack world. Regards, Faidon
severity 616301 important
thanks
Faidon Liambotis <paravoid@debian.org> (04/03/2011):
No… Plenty of other things shouldn't happen. That doesn't make it a critical bug. KiBi.
With packages libdrm-radeon1_2.4.23-2 libgl1-mesa-dri_7.10-2 libtalloc2_2.0.5-1 from http://snapshot.debian.org/archive/debian/20110210T084921Z/ iceweasel sometimes freezes when viewing youtube videos but so far this doesn't trigger gpu softresets. Kind regards, Matthias
Hi, Herber Sylwester <slyher@oomkill.net> (03/03/2011): hmm, nothing seems related. Any chance you upgraded something in the kernel/X stack before that, and only restarted your X or rebooted lately? KiBi.
(Trying to gather everyone in To/Cc.) Hi, Apollon Oikonomopoulos <apoikos@gmail.com> (04/03/2011): while you're at it, could you try switching off KMS? That should get you a working system in the interim. If you have some more time, you might want to check what happens with 2.6.38rc* from experimental. […] Thanks. To reply to Faidon's question on IRC, I guess we might want to either reassign that bug to the kernel (affecting the driver + the server, to make sure it gets noticed) and/or to clone it for now, until further investigation happens. Michel, any preference? KiBi.
No, not unless it will affect a large proportion of users. If there is a kernel driver involved then it should be assigned to the kernel. Even without KMS, a Mesa driver should be considered untrusted and should not be able to trigger a crash or hang. With KMS, this applies to the X driver too. Ben.
Hi, Same thing here, with basically the same constellation in terms of packages and update timing. For info, a workaround on YouTube: disable hardware acceleration in Flash Player (right-click / Settings on a non-crashing Flash Player instance on some other website, e.g. using iceweasel). So it might be related to radeon video hardware acceleration!? G+, Oliver
Hi there. On 04.03.2011 22:44, Cyril Brulebois wrote: Maybe the upgrade was just a time-related coincidence. After the upgrade I neither rebooted the machine nor restarted the X server. I remember that on the 2nd I watched a few YouTube videos and it was fine. I don't know if attaching history.log for February will do any good. I did not make any manual changes to the system, especially not to the kernel. Regards, Herber Sylwester.
Hi there.
On 04.03.2011 22:44, Cyril Brulebois wrote:
Start-Date: 2011-03-05 16:03:36
Commandline: apt-get dist-upgrade
Install: libglib2.0-bin:i386 (2.28.1-1+b1, automatic), gcc-4.5-base:i386 (4.5.2-4, automatic), libboost-program-options1.42.0:i386 (1.42.0-4, automatic), libsoundtouch0:i386 (1.5.0-4, automatic)
Upgrade: browser-plugin-gnash:i386 (0.8.8-9, 0.8.9~git20110220-1), libjna-java:i386 (3.2.4-2, 3.2.7-1), libstdc++6:i386 (4.4.5-12, 4.5.2-4), libwildmidi1:i386 (0.2.3.2-2, 0.2.3.4-1), rpm2cpio:i386 (4.8.1-6, 4.8.1-7), librpmbuild1:i386 (4.8.1-6, 4.8.1-7), libmpfr4:i386 (3.0.0-2, 3.0.0-7), python-mako:i386 (0.3.6-1, 0.4.0-1), libportaudio2:i386 (19+svn20071022-3.2, 19+svn20101113-3), xsane:i386 (0.997-2+b1, 0.998-1), mobile-broadband-provider-info:i386 (20101106-1, 20110218-1), libk3b6:i386 (2.0.2-1, 2.0.2-1+b1), libglib2.0-dev:i386 (2.24.2-1, 2.28.1-1+b1), libutempter0:i386 (1.1.5-3, 1.1.5-4), hplip-cups:i386 (3.10.6-2, 3.11.1-2), libx11-data:i386 (1.4.1-4, 1.4.1-5), libgfortran3:i386 (4.4.5-12, 4.5.2-4), iputils-ping:i386 (20100418-3, 20101006-1), hpijs:i386 (3.10.6-2, 3.11.1-2), hplip:i386 (3.10.6-2, 3.11.1-2), libsqlite3-dev:i386 (3.7.4-2, 3.7.5-1), librpmio1:i386 (4.8.1-6, 4.8.1-7), librpm1:i386 (4.8.1-6, 4.8.1-7), rpm-common:i386 (4.8.1-6, 4.8.1-7), libgomp1:i386 (4.4.5-12, 4.5.2-4), libpcre3:i386 (8.02-1.1, 8.12-3), libx11-xcb1:i386 (1.4.1-4, 1.4.1-5), libsqlite3-0:i386 (3.7.4-2, 3.7.5-1), k3b:i386 (2.0.2-1, 2.0.2-1+b1), libgcc1:i386 (4.4.5-12, 4.5.2-4), konqueror-plugin-gnash:i386 (0.8.8-9, 0.8.9~git20110220-1), libhpmud0:i386 (3.10.6-2, 3.11.1-2), w3m:i386 (0.5.3-1, 0.5.3-2), mozilla-plugin-gnash:i386 (0.8.8-9, 0.8.9~git20110220-1), python-pysqlite2:i386 (2.6.0-1, 2.6.3-1), libglib2.0-data:i386 (2.24.2-1, 2.28.1-1), libfreetype6-dev:i386 (2.4.2-2.1, 2.4.4-1), xdg-utils:i386 (1.1.0~rc1-1, 1.1.0~rc1-2), rpm:i386 (4.8.1-6, 4.8.1-7), gstreamer0.10-plugins-bad:i386 (0.10.19-2+b2, 0.10.19-2.1), libgcc1-dbg:i386 (4.4.5-12, 4.5.2-4), xsane-common:i386 (0.997-2, 0.998-1), libemail-mime-perl:i386 (1.906-1, 1.907-1), libfreetype6:i386 (2.4.2-2.1, 2.4.4-1), gnash:i386 (0.8.8-9, 0.8.9~git20110220-1), libglib2.0-0:i386 (2.24.2-1, 2.28.1-1+b1), libk3b6-extracodecs:i386 (2.0.2-1, 2.0.2-1+b1), libsane-hpaio:i386 (3.10.6-2, 3.11.1-2), libx11-6:i386 (1.4.1-4, 1.4.1-5), pkg-kde-tools:i386 (0.9.3, 0.9.5), klash:i386 (0.8.8-9, 0.8.9~git20110220-1), libx11-dev:i386 (1.4.1-4, 1.4.1-5), libobjc2:i386 (4.4.5-12, 4.5.2-4), gnash-common:i386 (0.8.8-9, 0.8.9~git20110220-1), sqlite3:i386 (3.7.4-2, 3.7.5-1), binfmt-support:i386 (2.0.2, 2.0.3), eject:i386 (2.1.5+deb1+cvs20081104-7.1, 2.1.5+deb1+cvs20081104-8), libmms0:i386 (0.6.2-1, 0.6.2-2), hplip-data:i386 (3.10.6-2, 3.11.1-2)
End-Date: 2011-03-05 16:06:20
This upgrade did not help either. I'm starting to think it may be caused by a hardware malfunction (wild goose chase?). I've noticed that the system does not hang: I'm able to go to a console login and reboot. The screen ends 10cm below the monitor, so I have to guess some things. If it would be helpful I may try to record the incident. Sincerely, Herber Sylwester.
Good news, it might be possible to isolate an upstream fix for the classic r600 driver and backport it to squeeze then. Can you try doing that with git bisect?
I'll look into it tomorrow; hopefully I will find time to do it then. If someone else also wants to work on this simultaneously or needs a workaround for squeeze: Simply updating to "libgl1-mesa-dri_7.8.1-2" from http://snapshot.debian.org/package/mesa/7.8.1-2/#libgl1-mesa-dri_7.8.1-2 should fix the issue. Kind regards, Matthias
With or without KMS, the userspace acceleration drivers can certainly cause GPU hangs if the 3D engine is programmed with some combination of commands it doesn't like. Alex
You can't solve the halting problem but you can implement a watchdog, can't you? Ben.
I haven't heard of many chips that won't hang given the wrong instructions, whether it's a GPU or a keyboard controller. Sounds like more than a driver issue but a choice of driver issue. How are you going to have it both ways without an amount of care you have no time for? Having interrupting access / a watchdog is nice if your driver can do that. Ben Hutchings wrote:
Of course. This is why the kernel driver filters the commands going to the GPU - the commands come from unprivileged applications (the Mesa driver is just a shared library) and should not be trusted. [...] Don't top-post. Ben.
Of course, if you ask and have that command. Just ask. I'm actually not planning on being in the channel long. Someone might find me! He he. Question: can I ask what top-post is? I will look it up too. Debian rules are getting rather long to even have heard about, even having read a good part of them once. Ben Hutchings wrote:
We have lockup detection and asic reset support, but depending on the lockup it may or may not be able to successfully reset the asic. Also, as for the command buffer checking, we try to protect against basic stupidity, but the chips are just too complex to check for every possible scenario that might cause a hang. Alex
We do, we reset the GPU 10s after it hangs, but this depends on a lot of things going our way. Occasionally we do reset the GPU when we shouldn't as well. However if there is an issue in the kernel, ddx or mesa driver, constant resets will pretty much DoS the GPU. Dave.
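As a rough illustration of the lockup detection discussed above, here is a toy model of a fence watchdog: if the fence being waited for has not been signaled and no fence has made progress for a timeout period, a reset is attempted. This is not the radeon kernel code; the class name, method names and return values are hypothetical, and the 10 second default simply mirrors the figure mentioned in this message.

```python
class FenceWatchdog:
    """Toy lockup detector keyed on fence sequence numbers (illustrative only)."""

    def __init__(self, timeout_s=10.0):
        self.timeout_s = timeout_s
        self.last_signaled = 0      # highest fence sequence seen as completed
        self.last_progress = 0.0    # time of the last completion or reset attempt
        self.resets = 0

    def signal(self, seq, now):
        """Record that the GPU completed the fence with sequence number seq."""
        if seq > self.last_signaled:
            self.last_signaled = seq
            self.last_progress = now

    def check(self, waiting_for, now):
        """Return 'done', 'waiting' or 'reset' for a waiter on fence waiting_for."""
        if waiting_for <= self.last_signaled:
            return "done"
        if now - self.last_progress >= self.timeout_s:
            self.resets += 1
            self.last_progress = now  # restart the timer after the reset attempt
            return "reset"
        return "waiting"


if __name__ == "__main__":
    dog = FenceWatchdog()
    dog.signal(0x6E38, now=0.0)
    for t in (5.0, 10.0, 12.0, 20.0):
        print(t, dog.check(0x6E39, now=t))
```

If the fence never signals, the model keeps returning "reset" every timeout period, which is the endless reset loop seen in the logs; it says nothing about whether a real reset would succeed.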
No, it seems clear at this point that it's a youtube change triggering a pre-existing bug in the Mesa r600 driver. Might be nice for the sake of completeness, though most likely it'll just be generic symptoms from GPU lockups, which unfortunately don't say much about what triggered them. The most promising approach at this point seems bisecting which Mesa upstream change fixed the problem.
Hi,
I've been trying to bisect upstream between mesa_7_7_1 and mesa-7.8,
without any success. A large part of the commit tree for r600 between
7.7.1 and 7.8 FTBFS with any debian libdrm-dev version I tried since
2.4.15-1, stating:
In file included from radeon_common.h:4,
from radeon_screen.c:49:
radeon_common_context.h:405: error: array type has incomplete element type
I've searched around a bit and this seems to be due to some libdrm API
changes.
This is taking too much time, I can't do this remotely since I have to
be able to reboot the machine and furthermore, this is my work machine.
Unless someone can provide some advice regarding the build process, I'll
have to give up :(.
Regards,
Apollon
Can you provide a specific mesa commit hash where this happens?
For example, a1b9c4e22a83d2125f66c3a3af3143bc0daee9a4 (2nd bisection step) and the whole area around it. For the record, the following mesa commits seem to make r600 conform to the new API:
$ git log --oneline origin/master | grep "new libdrm_radeon api"
bd9e0eb radeon/r600: use new libdrm_radeon api
9373287 radeon/r600: use new libdrm_radeon api
b065aec radeon/r600: use new libdrm_radeon api
(Basically the same commit on different branches.) However, they are not cleanly backportable to the affected commits I encountered. Debian libdrm-dev versions prior to 2.4.18 do not ship /usr/include/drm/drm.h which is needed, and versions from 2.4.18 (included) and onwards fail with the above message. Any hints as to the version of libdrm required to compile these revisions? Thanks, Apollon
You could try an older version of libdrm-dev with /usr/include/drm/drm.h manually copied from a newer version. It should be backwards compatible.
Hi, I've encountered the same problem. :/ Seems like we need libdrm-dev 2.4.16 or some other version before that. But I also tested libgl1-mesa-dri 7.6.1 from snapshot.debian.org which works with current youtube videos and doesn't even crash Adobe Flash player and Iceweasel when opening more than one such tab (unlike libgl1-mesa-dri >= 7.8), so I'll try to bisect from that point to 7.7 too. Apparently this gpu softreset issue is specific to the mesa 7.7 development line. I attached my git-bisect logs for reference. (git://git.debian.org/git/pkg-xorg/lib/mesa) By the way: The 'proper fix' in commit 5997501ca0d0c905025bc2a840e48e2176d64ea3 has only been partly applied to mesa-7.7.1 ... that seems odd. Kind regards, Matthias
Note that the bug in question causes a *kernel* lockup, not just a GPU hang. As indicated by the output in the bug log¹, we're getting a “BUG: unable to handle kernel paging request” after a GPU reset on 2.6.37, and a silent lockup on 2.6.32. Regards, Faidon ¹: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=616301#30