Commit graph

7991 commits

Author SHA1 Message Date
Stanley.Yang
5436ab94cd drm/amdkfd: fix set kfd node ras properties value
The ctx->features are new RAS implementation which
is only available for Vega20 and onwards, it is not
available for vega10, vega10 should follow legacy
ECC implementation.

Changed from V1:
    wrap function to initialize kfd node properties

Changed from V2:
    remove wrap function and SDMA SRAM ECC check

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-26 16:40:19 -04:00
Tao Zhou
c56c90f413 drm/amdgpu: add asd fw check before loading asd
asd is not ready for some ASICs in early stage, and psp->asd_fw is more generic than ASIC name in the check.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Jiansong Chen <Jiansong.Chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-26 16:40:19 -04:00
Alex Deucher
4d2997ab21 drm/amdgpu: add a wrapper for atom asic_init
This allows us to add asic specific workarounds for atom
asic init while keeping the adev specifics out of the
atombios parser code.

Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-26 16:40:19 -04:00
Alex Deucher
a71737313e drm/amdgpu: add pre_asic_init callback for navi
Nothing to do for this family.

Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-26 16:40:19 -04:00
Alex Deucher
b0a2db9b48 drm/amdgpu: add pre_asic_init callback for SOC15
We need to restore some registers prior to running asic
init to work around a firmware bug.

Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-26 16:40:18 -04:00
Alex Deucher
cff6c7f91a drm/amdgpu: add pre_asic_init callback for VI
Nothing to do for this family.

Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-26 16:40:18 -04:00
Alex Deucher
819515c7f3 drm/amdgpu: add pre_asic_init callback for CIK
Nothing to do for this family.

Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-26 16:40:18 -04:00
Alex Deucher
632d9f9492 drm/amdgpu: add pre_asic_init callback for SI
Nothing to do for this family.

Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-26 16:40:18 -04:00
Alex Deucher
9737a923c9 drm/amdgpu: add an asic callback for pre asic init
This callback can be used by asics that need to
do something special prior to calling atom asic init.

Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-26 16:40:18 -04:00
Alex Deucher
f8646661f7 drm/amdgpu: fix up DCHUBBUB_SDPIF_MMIO_CNTRL_0 handling
Properly define this register using a relative offset rather
than an absolute offset and use the proper SOC15 macros to
access it.  It's also DCN, not DCE, so remove it from the
DCE12 header.

No functional change.

Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-26 16:40:18 -04:00
Felix Kuehling
332f6e1e98 drm/amdkfd: call amdgpu_amdkfd_get_hive_id directly
No need to use a function pointer because the implementation is not
ASIC-specific.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-26 16:40:18 -04:00
Felix Kuehling
817154c1a2 drm/amdkfd: call amdgpu_amdkfd_get_unique_id directly
No need to use a function pointer because the implementation is not
ASIC-specific. This fixes missing support due to a missing function
pointer on Arcturus.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-26 16:40:18 -04:00
Youling Tang
a590a83d74 gpu: amd: Remove duplicate semicolons at the end of line
Remove duplicate semicolons at the end of line.

Signed-off-by: Youling Tang <tangyouling@loongson.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-26 16:40:18 -04:00
Jiansong Chen
d3bbba79eb drm/amdgpu/gfx10: refine mgcg setting
1. enable ENABLE_CGTS_LEGACY to fix specviewperf11 random hang.
2. remove obsolete RLC_CGTT_SCLK_OVERRIDE workaround.

Signed-off-by: Jiansong Chen <Jiansong.Chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-26 16:40:17 -04:00
Huang Rui
6127896f4a drm/amdkfd: implement the dGPU fallback path for apu (v6)
We still have a few iommu issues which need to address, so force raven
as "dgpu" path for the moment.

This is to add the fallback path to bypass IOMMU if IOMMU v2 is disabled
or ACPI CRAT table not correct.

v2: Use ignore_crat parameter to decide whether it will go with IOMMUv2.
v3: Align with existed thunk, don't change the way of raven, only renoir
    will use "dgpu" path by default.
v4: don't update global ignore_crat in the driver, and revise fallback
    function if CRAT is broken.
v5: refine acpi crat good but no iommu support case, and rename the
    title.
v6: fix the issue of dGPU initialized firstly, just modify the report
    value in the node_show().

Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-26 16:40:17 -04:00
Luben Tuikov
8aba21b751 drm/amdgpu: Embed drm_device into amdgpu_device (v3)
a) Embed struct drm_device into struct amdgpu_device.
b) Modify the inline-f drm_to_adev() accordingly.
c) Modify the inline-f adev_to_drm() accordingly.
d) Eliminate the use of drm_device.dev_private,
   in amdgpu.
e) Switch from using drm_dev_alloc() to
   drm_dev_init().
f) Add a DRM driver release function, which frees
   the container amdgpu_device after all krefs on
   the contained drm_device have been released.

v2: Split out adding adev_to_drm() into its own
    patch (previous commit), making this patch
    more succinct and clear. More detailed commit
    description.
v3: squash in fix to call drmm_add_final_kfree()
    to avoid a warning.

Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-26 16:40:17 -04:00
Jiansong Chen
82dff839c9 drm/amdgpu: disable runtime pm for navy_flounder
Disable runtime pm for navy_flounder temporarily.

Signed-off-by: Jiansong Chen <Jiansong.Chen@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-26 15:45:52 -04:00
Alex Deucher
b5b97cab55 drm/amdgpu: Fix buffer overflow in INFO ioctl
The values for "se_num" and "sh_num" come from the user in the ioctl.
They can be in the 0-255 range but if they're more than
AMDGPU_GFX_MAX_SE (4) or AMDGPU_GFX_MAX_SH_PER_SE (2) then it results in
an out of bounds read.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2020-08-26 15:45:51 -04:00
Jiansong Chen
75947544c8 drm/amdgpu: use MODE1 reset for navy_flounder by default
Switch default gpu reset method to MODE1 for navy_flounder.

Signed-off-by: Jiansong Chen <Jiansong.Chen@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-26 15:45:51 -04:00
Tao Zhou
14e4f3bd81 drm/amdgpu: add asd fw check before loading asd
asd is not ready for some ASICs in early stage, and psp->asd_fw is more generic than ASIC name in the check.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Jiansong Chen <Jiansong.Chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-26 15:27:52 -04:00
Jiansong Chen
de7a1b0b87 drm/amdgpu/gfx10: refine mgcg setting
1. enable ENABLE_CGTS_LEGACY to fix specviewperf11 random hang.
2. remove obsolete RLC_CGTT_SCLK_OVERRIDE workaround.

Signed-off-by: Jiansong Chen <Jiansong.Chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2020-08-26 15:17:16 -04:00
Luben Tuikov
4a580877bd drm/amdgpu: Get DRM dev from adev by inline-f
Add a static inline adev_to_drm() to obtain
the DRM device pointer from an amdgpu_device pointer.

Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-24 13:06:06 -04:00
Luben Tuikov
1348969ab6 drm/amdgpu: drm_device to amdgpu_device by inline-f (v2)
Get the amdgpu_device from the DRM device by use
of an inline function, drm_to_adev(). The inline
function resolves a pointer to struct drm_device
to a pointer to struct amdgpu_device.

v2: Use a typed visible static inline function
    instead of an invisible macro.

Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-24 13:06:06 -04:00
Prike.Liang
50166d1ce5 drm/amdgpu: enable HDP clock gatting
Enabe HDP SD/DS clock gatting in Renoir series.

Signed-off-by: Prike.Liang <Prike.Liang@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-24 13:06:05 -04:00
Prike.Liang
d844812b28 drm/amdgpu: enable ATHUB clock gatting
Enable ATHUB clock gatting set in Renoir series.

Signed-off-by: Prike.Liang <Prike.Liang@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-24 13:06:05 -04:00
Dennis Li
08ebb485f0 drm/amdgpu: annotate a false positive recursive locking
Re-apply commit 72e14ebf9f

[  584.110304] ============================================
[  584.110590] WARNING: possible recursive locking detected
[  584.110876] 5.6.0-deli-v5.6-2848-g3f3109b0e75f #1 Tainted: G           OE
[  584.111164] --------------------------------------------
[  584.111456] kworker/38:1/553 is trying to acquire lock:
[  584.111721] ffff9b15ff0a47a0 (&adev->reset_sem){++++}, at: amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu]
[  584.112112]
               but task is already holding lock:
[  584.112673] ffff9b1603d247a0 (&adev->reset_sem){++++}, at: amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu]
[  584.113068]
               other info that might help us debug this:
[  584.113689]  Possible unsafe locking scenario:

[  584.114350]        CPU0
[  584.114685]        ----
[  584.115014]   lock(&adev->reset_sem);
[  584.115349]   lock(&adev->reset_sem);
[  584.115678]
                *** DEADLOCK ***

[  584.116624]  May be due to missing lock nesting notation

[  584.117284] 4 locks held by kworker/38:1/553:
[  584.117616]  #0: ffff9ad635c1d348 ((wq_completion)events){+.+.}, at: process_one_work+0x21f/0x630
[  584.117967]  #1: ffffac708e1c3e58 ((work_completion)(&con->recovery_work)){+.+.}, at: process_one_work+0x21f/0x630
[  584.118358]  #2: ffffffffc1c2a5d0 (&tmp->hive_lock){+.+.}, at: amdgpu_device_gpu_recover+0xae/0x1030 [amdgpu]
[  584.118786]  #3: ffff9b1603d247a0 (&adev->reset_sem){++++}, at: amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu]
[  584.119222]
               stack backtrace:
[  584.119990] CPU: 38 PID: 553 Comm: kworker/38:1 Kdump: loaded Tainted: G           OE     5.6.0-deli-v5.6-2848-g3f3109b0e75f #1
[  584.120782] Hardware name: Supermicro SYS-7049GP-TRT/X11DPG-QT, BIOS 3.1 05/23/2019
[  584.121223] Workqueue: events amdgpu_ras_do_recovery [amdgpu]
[  584.121638] Call Trace:
[  584.122050]  dump_stack+0x98/0xd5
[  584.122499]  __lock_acquire+0x1139/0x16e0
[  584.122931]  ? trace_hardirqs_on+0x3b/0xf0
[  584.123358]  ? cancel_delayed_work+0xa6/0xc0
[  584.123771]  lock_acquire+0xb8/0x1c0
[  584.124197]  ? amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu]
[  584.124599]  down_write+0x49/0x120
[  584.125032]  ? amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu]
[  584.125472]  amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu]
[  584.125910]  ? amdgpu_ras_error_query+0x1b8/0x2a0 [amdgpu]
[  584.126367]  amdgpu_ras_do_recovery+0x159/0x190 [amdgpu]
[  584.126789]  process_one_work+0x29e/0x630
[  584.127208]  worker_thread+0x3c/0x3f0
[  584.127621]  ? __kthread_parkme+0x61/0x90
[  584.128014]  kthread+0x12f/0x150
[  584.128402]  ? process_one_work+0x630/0x630
[  584.128790]  ? kthread_park+0x90/0x90
[  584.129174]  ret_from_fork+0x3a/0x50

Each adev has owned lock_class_key to avoid false positive
recursive locking.

v2:
1. register adev->lock_key into lockdep, otherwise lockdep will
report the below warning

[ 1216.705820] BUG: key ffff890183b647d0 has not been registered!
[ 1216.705924] ------------[ cut here ]------------
[ 1216.705972] DEBUG_LOCKS_WARN_ON(1)
[ 1216.705997] WARNING: CPU: 20 PID: 541 at kernel/locking/lockdep.c:3743 lockdep_init_map+0x150/0x210

v3:
change to use down_write_nest_lock to annotate the false dead-lock
warning.

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-24 13:05:52 -04:00
Dennis Li
d95e8e97e2 drm/amdgpu: refine create and release logic of hive info
Change to dynamically create and release hive info object,
which help driver support more hives in the future.

v2:
Change to save hive object pointer in adev, to avoid locking
xgmi_mutex every time when calling amdgpu_get_xgmi_hive.

v3:
1. Change type of hive object pointer in adev from void* to
amdgpu_hive_info*.
2. remove unnecessary variable initialization.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-24 12:24:14 -04:00
Dennis Li
aac891685d drm/amdgpu: refine message print for devices of hive
Using dev_xxx instead of DRM_xxx/pr_xxx to indicate which device
of a hive is the message for.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-24 12:24:06 -04:00
Dennis Li
cbfd17f7ba drm/amdgpu: fix the nullptr issue when reenter GPU recovery
in single gpu system, if driver reenter gpu recovery,
amdgpu_device_lock_adev will return false, but hive is
nullptr now.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-24 12:23:54 -04:00
Dennis Li
6049db43d6 drm/amdgpu: change reset lock from mutex to rw_semaphore
clients don't need reset-lock for synchronization when no
GPU recovery.

v2:
change to return the return value of down_read_killable.

v3:
if GPU recovery begin, VF ignore FLR notification.

Reviewed-by: Monk Liu <monk.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-24 12:23:48 -04:00
Lukas Bulwahn
5049a05269 drm/amd/display: remove unintended executable mode
Besides the intended change, commit 4cc1178e16 ("drm/amdgpu: replace DRM
prefix with PCI device info for gfx/mmhub") also set the source files
mmhub_v1_0.c and gfx_v9_4.c to be executable, i.e., changed fromold mode
644 to new mode 755.

Commit 241b2ec931 ("drm/amd/display: Add dcn30 Headers (v2)") added the
four header files {dpcs,dcn}_3_0_0_{offset,sh_mask}.h as executable, i.e.,
mode 755.

Set to the usual modes for source and headers files and clean up those
mistakes. No functional change.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-24 12:23:02 -04:00
Dennis Li
53b3f8f40e drm/amdgpu: refine codes to avoid reentering GPU recovery
if other threads have holden the reset lock, recovery will
fail to try_lock. Therefore we introduce atomic hive->in_reset
and adev->in_gpu_reset, to avoid reentering GPU recovery.

v2:
drop "? true : false" in the definition of amdgpu_in_reset

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-24 12:22:56 -04:00
Dave Airlie
ebb21aa188 drm/ttm: drop bus.size from bus placement.
This is always calculated the same, and only used in a couple of places.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200811074658.58309-2-airlied@gmail.com
2020-08-24 17:06:08 +10:00
Dave Airlie
098754fe3c drm/ttm: init mem->bus in common code.
The drivers all do the same thing here.

Reviewed-by: Christian König <christian.koenig@amd.com> for both.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200811074658.58309-1-airlied@gmail.com
2020-08-24 17:00:48 +10:00
Gustavo A. R. Silva
df561f6688 treewide: Use fallthrough pseudo-keyword
Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-08-23 17:36:59 -05:00
Dave Airlie
ba9086a6df Merge tag 'amd-drm-fixes-5.9-2020-08-20' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
amd-drm-fixes-5.9-2020-08-20:

amdgpu:
- Fixes for Navy Flounder
- Misc display fixes
- RAS fix

amdkfd:
- SDMA fix for renoir

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200820041938.3928-1-alexander.deucher@amd.com
2020-08-21 10:17:52 +10:00
Jiansong Chen
da2446b66b Revert "drm/amdgpu: disable gfxoff for navy_flounder"
This reverts commit 9c9b17a7d1.
Newly released sdma fw (51.52) provides a fix for the issue.

Signed-off-by: Jiansong Chen <Jiansong.Chen@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-19 23:55:04 -04:00
Dave Airlie
485d41b092 Merge tag 'amd-drm-fixes-5.9-2020-08-12' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
amd-drm-fixes-5.9-2020-08-12:

amdgpu:
- Fix allocation size
- SR-IOV fixes
- Vega20 SMU feature state caching fix
- Fix custom pptable handling
- Arcturus golden settings update
- Several display fixes

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200813033610.4008-1-alexander.deucher@amd.com
2020-08-19 13:56:28 +10:00
Leo Liu
cdab4211f6 drm/amdgpu/jpeg: remove redundant check when it returns
Fix warning from kernel test robot
v2: remove the local variable as well

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-18 18:22:16 -04:00
jqdeng
8e1d88f948 drm/amdgpu: Limit the error info print rate
Use function printk_ratelimit to limit the print rate.

Signed-off-by: jqdeng <Emily.Deng@amd.com>
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-18 18:22:10 -04:00
jqdeng
9a1cddd637 drm/amdgpu: Fix repeatly flr issue
Only for no job running test case need to do recover in
flr notification.
For having job in mirror list, then let guest driver to
hit job timeout, and then do recover.

Signed-off-by: jqdeng <Emily.Deng@amd.com>
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-18 18:22:02 -04:00
Jiansong Chen
332d790365 Revert "drm/amdgpu: disable gfxoff for navy_flounder"
This reverts commit ba4e049e63.
Newly released sdma fw (51.52) provides a fix for the issue.

Signed-off-by: Jiansong Chen <Jiansong.Chen@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-18 18:20:34 -04:00
Luben Tuikov
9af5e21dac drm/scheduler: Remove priority macro INVALID (v2)
Remove DRM_SCHED_PRIORITY_INVALID. We no longer
carry around an invalid priority and cut it off
at the source.

Backwards compatibility behaviour of AMDGPU CTX
IOCTL passing in garbage for context priority
from user space and then mapping that to
DRM_SCHED_PRIORITY_NORMAL is preserved.

v2: Revert "res"  --> "r" and
           "prio" --> "priority".

Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-18 18:20:26 -04:00
Luben Tuikov
e2d732fdb7 drm/scheduler: Scheduler priority fixes (v2)
Remove DRM_SCHED_PRIORITY_LOW, as it was used
in only one place.

Rename and separate by a line
DRM_SCHED_PRIORITY_MAX to DRM_SCHED_PRIORITY_COUNT
as it represents a (total) count of said
priorities and it is used as such in loops
throughout the code. (0-based indexing is the
the count number.)

Remove redundant word HIGH in priority names,
and rename *KERNEL* to *HIGH*, as it really
means that, high.

v2: Add back KERNEL and remove SW and HW,
    in lieu of a single HIGH between NORMAL and KERNEL.

Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-18 18:20:17 -04:00
Huang Rui
34174b89bf drm/amdkfd: fix the wrong sdma instance query for renoir
Renoir only has one sdma instance, it will get failed once query the
sdma1 registers. So use switch-case instead of static register array.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-18 17:54:12 -04:00
Bhawanpreet Lakha
0a668aee0a drm/amdgpu: parse ta firmware for navy_flounder
Use the same case as sienna_cichlid

Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-18 17:53:50 -04:00
Guchun Chen
1a68d96f81 drm/amdgpu: fix NULL pointer access issue when unloading driver
When unloading driver by "modprobe -r amdgpu", one NULL pointer
dereference bug occurs in ras debugfs releasing. The cause is the
duplicated debugfs_remove, as drm debugfs_root dir has been cleaned
up already by drm_minor_unregister.

BUG: kernel NULL pointer dereference, address: 00000000000000a0
PGD 0 P4D 0
Oops: 0002 [#1] SMP PTI
CPU: 11 PID: 1526 Comm: modprobe Tainted: G           OE     5.6.0-guchchen #1
Hardware name: System manufacturer System Product Name/TUF Z370-PLUS GAMING II, BIOS 0411 09/21/2018
RIP: 0010:down_write+0x15/0x40
Code: eb de e8 7e 17 72 ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 53 48 89 fb e8 92
d8 ff ff 31 c0 ba 01 00 00 00 <f0> 48 0f b1 13 75 0f 65 48 8b 04 25 c0 8b 01 00 48 89 43 08 5b c3
RSP: 0018:ffffb1590386fcd0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 00000000000000a0 RCX: 0000000000000000
RDX: 0000000000000001 RSI: ffffffff85b2fcc2 RDI: 00000000000000a0
RBP: ffffb1590386fd30 R08: ffffffff85b2fcc2 R09: 000000000002b3c0
R10: ffff97a330618c40 R11: 00000000000005f6 R12: ffff97a3481beb40
R13: 00000000000000a0 R14: ffff97a3481beb40 R15: 0000000000000000
FS:  00007fb11a717540(0000) GS:ffff97a376cc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000a0 CR3: 00000004066d6006 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 simple_recursive_removal+0x63/0x370
 ? debugfs_remove+0x60/0x60
 debugfs_remove+0x40/0x60
 amdgpu_ras_fini+0x82/0x230 [amdgpu]
 ? __kernfs_remove.part.17+0x101/0x1f0
 ? kernfs_name_hash+0x12/0x80
 amdgpu_device_fini+0x1c0/0x580 [amdgpu]
 amdgpu_driver_unload_kms+0x3e/0x70 [amdgpu]
 amdgpu_pci_remove+0x36/0x60 [amdgpu]
 pci_device_remove+0x3b/0xb0
 device_release_driver_internal+0xe5/0x1c0
 driver_detach+0x46/0x90
 bus_remove_driver+0x58/0xd0
 pci_unregister_driver+0x29/0x90
 amdgpu_exit+0x11/0x25 [amdgpu]
 __x64_sys_delete_module+0x13d/0x210
 do_syscall_64+0x5f/0x250
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-18 17:00:46 -04:00
Jiansong Chen
9c9b17a7d1 drm/amdgpu: disable gfxoff for navy_flounder
gfxoff is temporarily disabled for navy_flounder,
since at present the feature has broken some basic
amdgpu test.

Signed-off-by: Jiansong Chen <Jiansong.Chen@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-18 16:58:59 -04:00
Maxime Ripard
d85ddd1318 Linux 5.9-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl85kWkeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGGPwIAJpEmEBkMoQ+KARK
 PaaVDQW9fwAlC1nThMpGv/m8Ym7KbfLkTgEJQiQyNv3pDDhyLP8jvcZcscIkfs4s
 56IMjFndRHWNeCVu9YPXWmAEp/WycZNC7YVPu0j1bI9VgvaHvbHOqUWzxB716RbY
 K4TFprJEA3sotNm0vdda2NgSlSup/0NVKiP2LwQPjkwH+Kf6/Ol1j2uxbWywEo75
 BdW5LreDtUoJ7W5BeX8GJ0IVgWdyxBV61eVbaINNY3EOPc7+uMGOgR9oHeGWRceH
 V4ELYww5yjizUDtKFvVTc/k0tj+Rq73mtOADdaF0YWItqxtDBvAcdKIpC0KYzVaa
 2fB+rts=
 =9Pnj
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRcEzekXsqa64kGDp7j7w1vZxhRxQUCXzvIXwAKCRDj7w1vZxhR
 xYXLAQC80uF6JkpBeNyuewyY7CDadDG1qDchDmYquwGVDnO+HwEAmvL84csLcxBy
 ah3UMOKUyWz5Sahlg48ZIaaUhRaulwE=
 =Lu/B
 -----END PGP SIGNATURE-----

Merge v5.9-rc1 into drm-misc-next

Sam needs 5.9-rc1 to have dev_err_probe in to merge some patches.

Signed-off-by: Maxime Ripard <maxime@cerno.tech>
2020-08-18 14:14:25 +02:00
Kevin Wang
4444457450 drm/amdgpu: add condition check for trace_amdgpu_cs()
v1:
add trace event enabled check to avoid nop loop when submit multi ibs
in amdgpu_cs_ioctl() function.

v2:
add a new wrapper function to trace all amdgpu cs ibs.

Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-17 14:06:37 -04:00