Star64_linux/drivers/gpu/drm/amd/amdgpu
Dennis Li 72e14ebf9f drm/amdgpu: annotate a false positive recursive locking
[  584.110304] ============================================
[  584.110590] WARNING: possible recursive locking detected
[  584.110876] 5.6.0-deli-v5.6-2848-g3f3109b0e75f #1 Tainted: G           OE
[  584.111164] --------------------------------------------
[  584.111456] kworker/38:1/553 is trying to acquire lock:
[  584.111721] ffff9b15ff0a47a0 (&adev->reset_sem){++++}, at: amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu]
[  584.112112]
               but task is already holding lock:
[  584.112673] ffff9b1603d247a0 (&adev->reset_sem){++++}, at: amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu]
[  584.113068]
               other info that might help us debug this:
[  584.113689]  Possible unsafe locking scenario:

[  584.114350]        CPU0
[  584.114685]        ----
[  584.115014]   lock(&adev->reset_sem);
[  584.115349]   lock(&adev->reset_sem);
[  584.115678]
                *** DEADLOCK ***

[  584.116624]  May be due to missing lock nesting notation

[  584.117284] 4 locks held by kworker/38:1/553:
[  584.117616]  #0: ffff9ad635c1d348 ((wq_completion)events){+.+.}, at: process_one_work+0x21f/0x630
[  584.117967]  #1: ffffac708e1c3e58 ((work_completion)(&con->recovery_work)){+.+.}, at: process_one_work+0x21f/0x630
[  584.118358]  #2: ffffffffc1c2a5d0 (&tmp->hive_lock){+.+.}, at: amdgpu_device_gpu_recover+0xae/0x1030 [amdgpu]
[  584.118786]  #3: ffff9b1603d247a0 (&adev->reset_sem){++++}, at: amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu]
[  584.119222]
               stack backtrace:
[  584.119990] CPU: 38 PID: 553 Comm: kworker/38:1 Kdump: loaded Tainted: G           OE     5.6.0-deli-v5.6-2848-g3f3109b0e75f #1
[  584.120782] Hardware name: Supermicro SYS-7049GP-TRT/X11DPG-QT, BIOS 3.1 05/23/2019
[  584.121223] Workqueue: events amdgpu_ras_do_recovery [amdgpu]
[  584.121638] Call Trace:
[  584.122050]  dump_stack+0x98/0xd5
[  584.122499]  __lock_acquire+0x1139/0x16e0
[  584.122931]  ? trace_hardirqs_on+0x3b/0xf0
[  584.123358]  ? cancel_delayed_work+0xa6/0xc0
[  584.123771]  lock_acquire+0xb8/0x1c0
[  584.124197]  ? amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu]
[  584.124599]  down_write+0x49/0x120
[  584.125032]  ? amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu]
[  584.125472]  amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu]
[  584.125910]  ? amdgpu_ras_error_query+0x1b8/0x2a0 [amdgpu]
[  584.126367]  amdgpu_ras_do_recovery+0x159/0x190 [amdgpu]
[  584.126789]  process_one_work+0x29e/0x630
[  584.127208]  worker_thread+0x3c/0x3f0
[  584.127621]  ? __kthread_parkme+0x61/0x90
[  584.128014]  kthread+0x12f/0x150
[  584.128402]  ? process_one_work+0x630/0x630
[  584.128790]  ? kthread_park+0x90/0x90
[  584.129174]  ret_from_fork+0x3a/0x50

Each adev has its own lock_class_key to avoid the false positive
recursive locking report.

v2:
1. register adev->lock_key with lockdep; otherwise lockdep
reports the warning below

[ 1216.705820] BUG: key ffff890183b647d0 has not been registered!
[ 1216.705924] ------------[ cut here ]------------
[ 1216.705972] DEBUG_LOCKS_WARN_ON(1)
[ 1216.705997] WARNING: CPU: 20 PID: 541 at kernel/locking/lockdep.c:3743 lockdep_init_map+0x150/0x210

v3:
switch to down_write_nest_lock to annotate the false deadlock
warning.

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-14 16:22:40 -04:00
amdgpu.h drm/amdkfd: option to disable system mem limit 2020-08-06 15:43:08 -04:00
amdgpu_acp.c
amdgpu_acp.h
amdgpu_acpi.c
amdgpu_afmt.c
amdgpu_amdkfd.c drm/amdgpu: unlock mutex on error 2020-08-07 17:31:26 -04:00
amdgpu_amdkfd.h drm/amdkfd: Add thermal throttling SMI event 2020-07-27 16:21:50 -04:00
amdgpu_amdkfd_arcturus.c
amdgpu_amdkfd_fence.c
amdgpu_amdkfd_gfx_v7.c drm/amdgpu: fix system hang issue during GPU reset 2020-07-27 16:21:37 -04:00
amdgpu_amdkfd_gfx_v8.c drm/amdgpu: fix system hang issue during GPU reset 2020-07-27 16:21:37 -04:00
amdgpu_amdkfd_gfx_v9.c drm/amdgpu: fix system hang issue during GPU reset 2020-07-27 16:21:37 -04:00
amdgpu_amdkfd_gfx_v9.h
amdgpu_amdkfd_gfx_v10.c drm/amdgpu: fix system hang issue during GPU reset 2020-07-27 16:21:37 -04:00
amdgpu_amdkfd_gfx_v10_3.c
amdgpu_amdkfd_gpuvm.c drm/amdkfd: option to disable system mem limit 2020-08-06 15:43:08 -04:00
amdgpu_atombios.c drm/amdgpu: move vram usage by vbios to mman (v2) 2020-08-04 17:29:29 -04:00
amdgpu_atombios.h
amdgpu_atomfirmware.c drm/amdgpu: move vram usage by vbios to mman (v2) 2020-08-04 17:29:29 -04:00
amdgpu_atomfirmware.h
amdgpu_atpx_handler.c
amdgpu_benchmark.c
amdgpu_bios.c
amdgpu_bo_list.c
amdgpu_bo_list.h
amdgpu_cgs.c
amdgpu_connectors.c
amdgpu_connectors.h
amdgpu_cs.c drm/amdgpu: fix system hang issue during GPU reset 2020-07-27 16:21:37 -04:00
amdgpu_csa.c
amdgpu_csa.h
amdgpu_ctx.c drm/amdgpu: fix system hang issue during GPU reset 2020-07-27 16:21:37 -04:00
amdgpu_ctx.h
amdgpu_debugfs.c drm/amdgpu: add debugfs interface for RAP test 2020-08-14 16:22:40 -04:00
amdgpu_debugfs.h
amdgpu_device.c drm/amdgpu: annotate a false positive recursive locking 2020-08-14 16:22:40 -04:00
amdgpu_df.h
amdgpu_discovery.c drm/amdgpu: move IP discovery data to mman 2020-08-04 17:29:29 -04:00
amdgpu_discovery.h
amdgpu_display.c
amdgpu_display.h
amdgpu_dma_buf.c drm/amdgpu: Enable P2P dmabuf over XGMI 2020-08-11 11:47:35 -04:00
amdgpu_dma_buf.h drm/amdgpu: Enable P2P dmabuf over XGMI 2020-08-11 11:47:35 -04:00
amdgpu_doorbell.h
amdgpu_dpm.c
amdgpu_dpm.h drm/amd/powerplay: add new sysfs interface for retrieving gpu metrics(V2) 2020-08-06 15:43:56 -04:00
amdgpu_drv.c drm/amdgpu: new ids flag for tmz (v2) 2020-08-06 15:45:52 -04:00
amdgpu_drv.h
amdgpu_encoders.c
amdgpu_fb.c
amdgpu_fence.c
amdgpu_fru_eeprom.c
amdgpu_fru_eeprom.h
amdgpu_gart.c
amdgpu_gart.h
amdgpu_gds.h
amdgpu_gem.c drm/amdgpu: fix system hang issue during GPU reset 2020-07-27 16:21:37 -04:00
amdgpu_gem.h
amdgpu_gfx.c drm/amdgpu: reconfigure spm golden settings on Navi1x after GFXOFF exit(v3) 2020-08-14 16:22:39 -04:00
amdgpu_gfx.h drm/amdgpu: add interface amdgpu_gfx_init_spm_golden for Navi1x 2020-08-14 16:22:29 -04:00
amdgpu_gmc.c drm/amdgpu: move stolen memory from gmc to mman 2020-08-04 17:29:29 -04:00
amdgpu_gmc.h drm/amdgpu: move stolen memory from gmc to mman 2020-08-04 17:29:29 -04:00
amdgpu_gtt_mgr.c
amdgpu_i2c.c
amdgpu_i2c.h
amdgpu_ib.c
amdgpu_ids.c
amdgpu_ids.h
amdgpu_ih.c
amdgpu_ih.h
amdgpu_ioc32.c
amdgpu_irq.c
amdgpu_irq.h
amdgpu_job.c drm/amdgpu: fix system hang issue during GPU reset 2020-07-27 16:21:37 -04:00
amdgpu_job.h
amdgpu_jpeg.c
amdgpu_jpeg.h
amdgpu_kms.c drm/amdgpu: new ids flag for tmz (v2) 2020-08-06 15:45:52 -04:00
amdgpu_mes.h
amdgpu_mmhub.c
amdgpu_mmhub.h
amdgpu_mn.c
amdgpu_mn.h
amdgpu_mode.h
amdgpu_nbio.c
amdgpu_nbio.h
amdgpu_object.c drm/amdgpu: handle bo size 0 in amdgpu_bo_create_kernel_at (v2) 2020-08-04 17:29:28 -04:00
amdgpu_object.h
amdgpu_pll.c
amdgpu_pll.h
amdgpu_pm.c drm/amd/powerplay: add new sysfs interface for retrieving gpu metrics(V2) 2020-08-06 15:43:56 -04:00
amdgpu_pm.h
amdgpu_pmu.c
amdgpu_pmu.h
amdgpu_psp.c drm/amdgpu: enable RAP TA load 2020-08-14 16:22:39 -04:00
amdgpu_psp.h drm/amdgpu: enable RAP TA load 2020-08-14 16:22:39 -04:00
amdgpu_rap.c drm/amdgpu: add debugfs interface for RAP test 2020-08-14 16:22:40 -04:00
amdgpu_rap.h drm/amdgpu: add debugfs interface for RAP test 2020-08-14 16:22:40 -04:00
amdgpu_ras.c drm/amdgpu: add debugfs node to toggle ras error cnt harvest 2020-08-14 16:12:47 -04:00
amdgpu_ras.h drm/amdgpu: bypass querying ras error count registers 2020-08-14 16:12:22 -04:00
amdgpu_ras_eeprom.c drm/amdgpu: added RAS EEPROM device support check 2020-08-04 17:29:18 -04:00
amdgpu_ras_eeprom.h drm/amdgpu: break GPU recovery once it's in bad state(v4) 2020-08-04 17:26:54 -04:00
amdgpu_ring.c
amdgpu_ring.h
amdgpu_rlc.c
amdgpu_rlc.h
amdgpu_sa.c
amdgpu_sched.c
amdgpu_sched.h
amdgpu_sdma.c
amdgpu_sdma.h
amdgpu_socbb.h
amdgpu_sync.c
amdgpu_sync.h
amdgpu_test.c
amdgpu_trace.h
amdgpu_trace_points.c
amdgpu_ttm.c drm/amdgpu: move vram usage by vbios to mman (v2) 2020-08-04 17:29:29 -04:00
amdgpu_ttm.h drm/amdgpu: move vram usage by vbios to mman (v2) 2020-08-04 17:29:29 -04:00
amdgpu_ucode.c drm/amdgpu: fix system hang issue during GPU reset 2020-07-27 16:21:37 -04:00
amdgpu_ucode.h drm/amdgpu: enable RAP TA load 2020-08-14 16:22:39 -04:00
amdgpu_umc.c drm/amdgpu: disable page reservation when amdgpu_bad_page_threshold = 0 2020-08-04 17:27:20 -04:00
amdgpu_umc.h
amdgpu_uvd.c
amdgpu_uvd.h
amdgpu_vce.c
amdgpu_vce.h
amdgpu_vcn.c
amdgpu_vcn.h
amdgpu_vf_error.c
amdgpu_vf_error.h
amdgpu_virt.c drm/amdgpu: move vram usage by vbios to mman (v2) 2020-08-04 17:29:29 -04:00
amdgpu_virt.h drm/amdgpu: fix system hang issue during GPU reset 2020-07-27 16:21:37 -04:00
amdgpu_vm.c drm/amdgpu: Enable P2P dmabuf over XGMI 2020-08-11 11:47:35 -04:00
amdgpu_vm.h
amdgpu_vm_cpu.c
amdgpu_vm_sdma.c
amdgpu_vram_mgr.c drm: amdgpu: Use the correct size when allocating memory 2020-08-10 17:19:12 -04:00
amdgpu_xgmi.c drm/amdgpu: unlock mutex on error 2020-08-07 17:31:26 -04:00
amdgpu_xgmi.h drm/amdgpu: fix system hang issue during GPU reset 2020-07-27 16:21:37 -04:00
arct_reg_init.c
athub_v1_0.c
athub_v1_0.h
athub_v2_0.c
athub_v2_0.h
athub_v2_1.c
athub_v2_1.h
atom.c drm/amdgpu: fix system hang issue during GPU reset 2020-07-27 16:21:37 -04:00
atom.h
atombios_crtc.c
atombios_crtc.h
atombios_dp.c
atombios_dp.h
atombios_encoders.c
atombios_encoders.h
atombios_i2c.c
atombios_i2c.h
cik.c
cik.h
cik_dpm.h
cik_ih.c
cik_ih.h
cik_sdma.c
cik_sdma.h
cikd.h
clearstate_ci.h
clearstate_defs.h
clearstate_gfx9.h
clearstate_gfx10.h
clearstate_si.h
clearstate_vi.h
cz_ih.c
cz_ih.h
dce_v6_0.c
dce_v6_0.h
dce_v8_0.c
dce_v8_0.h
dce_v10_0.c
dce_v10_0.h
dce_v11_0.c
dce_v11_0.h
dce_virtual.c
dce_virtual.h
df_v1_7.c
df_v1_7.h
df_v3_6.c
df_v3_6.h
emu_soc.c
gfx_v6_0.c
gfx_v6_0.h
gfx_v7_0.c
gfx_v7_0.h
gfx_v8_0.c drm/amdgpu: introduce a new parameter to configure how many KCQ we want(v5) 2020-08-04 17:27:29 -04:00
gfx_v8_0.h
gfx_v9_0.c drm/amdgpu: update gc golden register for arcturus 2020-08-10 17:26:52 -04:00
gfx_v9_0.h
gfx_v9_4.c
gfx_v9_4.h
gfx_v10_0.c drm/amdgpu: add interface amdgpu_gfx_init_spm_golden for Navi1x 2020-08-14 16:22:29 -04:00
gfx_v10_0.h
gfxhub_v1_0.c
gfxhub_v1_0.h
gfxhub_v1_1.c
gfxhub_v1_1.h
gfxhub_v2_0.c
gfxhub_v2_0.h
gfxhub_v2_1.c drm/amdgpu: Skip some registers config for SRIOV 2020-08-10 17:26:52 -04:00
gfxhub_v2_1.h
gmc_v6_0.c drm/amdgpu/gmc6: switch to using amdgpu_gmc_get_vbios_allocations 2020-08-04 17:29:28 -04:00
gmc_v6_0.h
gmc_v7_0.c drm/amdgpu/gmc7: switch to using amdgpu_gmc_get_vbios_allocations 2020-08-04 17:29:28 -04:00
gmc_v7_0.h
gmc_v8_0.c drm/amdgpu/gmc8: switch to using amdgpu_gmc_get_vbios_allocations 2020-08-04 17:29:29 -04:00
gmc_v8_0.h
gmc_v9_0.c drm/amdgpu/gmc9: switch to using amdgpu_gmc_get_vbios_allocations 2020-08-04 17:29:29 -04:00
gmc_v9_0.h
gmc_v10_0.c drm/amdgpu/gmc10: switch to using amdgpu_gmc_get_vbios_allocations 2020-08-04 17:29:29 -04:00
gmc_v10_0.h
iceland_ih.c
iceland_ih.h
iceland_sdma_pkt_open.h
jpeg_v1_0.c
jpeg_v1_0.h
jpeg_v2_0.c
jpeg_v2_0.h
jpeg_v2_5.c
jpeg_v2_5.h
jpeg_v3_0.c
jpeg_v3_0.h
Kconfig
kv_dpm.c
kv_dpm.h
kv_smc.c
Makefile drm/amdgpu: add debugfs interface for RAP test 2020-08-14 16:22:40 -04:00
mes_api_def.h
mes_v10_1.c
mes_v10_1.h
mmhub_v1_0.c
mmhub_v1_0.h
mmhub_v2_0.c drm/amdgpu: Skip some registers config for SRIOV 2020-08-10 17:26:52 -04:00
mmhub_v2_0.h
mmhub_v9_4.c
mmhub_v9_4.h
mmsch_v1_0.h
mmsch_v2_0.h
mmsch_v3_0.h
mxgpu_ai.c drm/amdgpu: fix system hang issue during GPU reset 2020-07-27 16:21:37 -04:00
mxgpu_ai.h
mxgpu_nv.c drm/amdgpu: fix system hang issue during GPU reset 2020-07-27 16:21:37 -04:00
mxgpu_nv.h
mxgpu_vi.c
mxgpu_vi.h
navi10_ih.c
navi10_ih.h
navi10_reg_init.c
navi10_sdma_pkt_open.h
navi12_reg_init.c
navi14_reg_init.c
nbio_v2_3.c
nbio_v2_3.h
nbio_v6_1.c
nbio_v6_1.h
nbio_v7_0.c
nbio_v7_0.h
nbio_v7_4.c drm/amdgpu: bypass querying ras error count registers 2020-08-14 16:12:22 -04:00
nbio_v7_4.h
nv.c drm/amdgpu: use mode1 reset by default for sienna_cichlid 2020-08-07 17:29:29 -04:00
nv.h
nvd.h
ObjectID.h
ppsmc.h
psp_gfx_if.h
psp_v3_1.c
psp_v3_1.h
psp_v10_0.c
psp_v10_0.h
psp_v11_0.c
psp_v11_0.h
psp_v12_0.c
psp_v12_0.h
r600_dpm.h
sdma_common.h
sdma_v2_4.c
sdma_v2_4.h
sdma_v3_0.c
sdma_v3_0.h
sdma_v4_0.c
sdma_v4_0.h
sdma_v5_0.c
sdma_v5_0.h
sdma_v5_2.c
sdma_v5_2.h
si.c drm/amdgpu/si: initial support for GPU reset 2020-07-28 09:22:57 -04:00
si.h
si_dma.c
si_dma.h
si_dpm.c
si_dpm.h
si_enums.h
si_ih.c
si_ih.h
si_smc.c
sid.h
sienna_cichlid_reg_init.c
sislands_smc.h
smu_v11_0_i2c.c
smu_v11_0_i2c.h
soc15.c
soc15.h
soc15_common.h
soc15d.h
ta_rap_if.h drm/amdgpu: add RAP TA header file 2020-08-14 16:22:39 -04:00
ta_ras_if.h
ta_xgmi_if.h
tonga_ih.c
tonga_ih.h
tonga_sdma_pkt_open.h
umc_v6_0.c
umc_v6_0.h
umc_v6_1.c
umc_v6_1.h
umc_v8_7.c drm/amdgpu: add support for umc 8.7 ras functions 2020-07-27 16:23:00 -04:00
umc_v8_7.h drm/amdgpu: add support for umc 8.7 ras functions 2020-07-27 16:23:00 -04:00
uvd_v3_1.c
uvd_v3_1.h
uvd_v4_2.c
uvd_v4_2.h
uvd_v5_0.c
uvd_v5_0.h
uvd_v6_0.c
uvd_v6_0.h
uvd_v7_0.c
uvd_v7_0.h
vce_v2_0.c
vce_v2_0.h
vce_v3_0.c
vce_v3_0.h
vce_v4_0.c
vce_v4_0.h
vcn_v1_0.c
vcn_v1_0.h
vcn_v2_0.c
vcn_v2_0.h
vcn_v2_5.c
vcn_v2_5.h
vcn_v3_0.c Revert "drm/amdgpu/vcn3.0: remove extra asic type check" 2020-07-27 16:22:23 -04:00
vcn_v3_0.h
vega10_ih.c
vega10_ih.h
vega10_reg_init.c
vega10_sdma_pkt_open.h
vega20_reg_init.c
vi.c
vi.h
vid.h