Star64_linux/drivers/edac
Robert Richter cb51a371d0 EDAC/ghes: Setup DIMM label from DMI and use it in error reports
The ghes driver reports errors with 'unknown label' even if the actual
DIMM label is known, e.g.:

 EDAC MC0: 1 CE Single-bit ECC on unknown label (node:0 card:0
   module:0 rank:1 bank:0 col:13 bit_pos:16 DIMM location:N0 DIMM_A0
   page:0x966a9b3 offset:0x0 grain:1 syndrome:0x0 - APEI location:
   node:0 card:0 module:0 rank:1 bank:0 col:13 bit_pos:16 DIMM
   location:N0 DIMM_A0 status(0x0000000000000400): Storage error in
   DRAM memory)

Fix this by using struct dimm_info's label string in error reports:

 EDAC MC0: 1 CE Single-bit ECC on N0 DIMM_A0 (node:0 card:0 module:0
   rank:1 bank:515 col:14 bit_pos:16 DIMM location:N0 DIMM_A0
   page:0x99223d8 offset:0x0 grain:1 syndrome:0x0 - APEI location:
   node:0 card:0 module:0 rank:1 bank:515 col:14 bit_pos:16 DIMM
   location:N0 DIMM_A0 status(0x0000000000000400): Storage error in
   DRAM memory)

The labels are initialized by reading the bank and device strings
from DMI. The label information can now also be read from sysfs. E.g. a
ThunderX2 system will show the following:

  /sys/devices/system/edac/mc/mc0/dimm0/dimm_label:N0 DIMM_A0
  /sys/devices/system/edac/mc/mc0/dimm1/dimm_label:N0 DIMM_B0
  /sys/devices/system/edac/mc/mc0/dimm2/dimm_label:N0 DIMM_C0
  /sys/devices/system/edac/mc/mc0/dimm3/dimm_label:N0 DIMM_D0
  /sys/devices/system/edac/mc/mc0/dimm4/dimm_label:N0 DIMM_E0
  /sys/devices/system/edac/mc/mc0/dimm5/dimm_label:N0 DIMM_F0
  /sys/devices/system/edac/mc/mc0/dimm6/dimm_label:N0 DIMM_G0
  /sys/devices/system/edac/mc/mc0/dimm7/dimm_label:N0 DIMM_H0
  /sys/devices/system/edac/mc/mc0/dimm8/dimm_label:N1 DIMM_I0
  /sys/devices/system/edac/mc/mc0/dimm9/dimm_label:N1 DIMM_J0
  /sys/devices/system/edac/mc/mc0/dimm10/dimm_label:N1 DIMM_K0
  /sys/devices/system/edac/mc/mc0/dimm11/dimm_label:N1 DIMM_L0
  /sys/devices/system/edac/mc/mc0/dimm12/dimm_label:N1 DIMM_M0
  /sys/devices/system/edac/mc/mc0/dimm13/dimm_label:N1 DIMM_N0
  /sys/devices/system/edac/mc/mc0/dimm14/dimm_label:N1 DIMM_O0
  /sys/devices/system/edac/mc/mc0/dimm15/dimm_label:N1 DIMM_P0
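
As a rough illustration of how such a label could be composed, the user-space sketch below joins a bank string and a device string with a single space. It is not the driver code: sketch_setup_label() and LABEL_LEN are hypothetical names, and the split of "N0 DIMM_A0" into bank "N0" and device "DIMM_A0" is an assumption based on the listing above. Leaving the label untouched when either string is missing is also an assumption of this sketch.

```c
#include <assert.h>
#include <stdio.h>

#define LABEL_LEN 32	/* assumed buffer size, for this sketch only */

/*
 * Hypothetical stand-in for the label setup step: join the DMI bank and
 * device strings with a space. If either string is missing or empty, the
 * label is left as it was, so a default set by the caller survives.
 * Returns 1 if the label was written, 0 otherwise.
 */
int sketch_setup_label(char *label, size_t size,
		       const char *bank, const char *device)
{
	assert(label != NULL && size > 0);

	if (!bank || !*bank || !device || !*device)
		return 0;

	/* snprintf() truncates and NUL-terminates if the label is too long */
	snprintf(label, size, "%s %s", bank, device);
	return 1;
}
```

Called with bank "N0" and device "DIMM_A0", the sketch stores "N0 DIMM_A0" in the label buffer, matching the dimm0 entry in the listing.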

Since dimm_label can be rewritten through sysfs, the new label will be
used in later error reports:

  # echo foobar >/sys/devices/system/edac/mc/mc0/dimm0/dimm_label
  # # some error injection here
  # dmesg | grep foobar
  [ 751.383533] EDAC MC0: 1 CE Single-bit ECC on foobar (node:0 card:0
  module:0 rank:1 bank:259 col:3 bit_pos:16 DIMM location:N0 DIMM_A0
  page:0x8c8dc74 offset:0x0 grain:1 syndrome:0x0 - APEI location:
  node:0 card:0 module:0 rank:1 bank:259 col:3 bit_pos:16 DIMM
  location:N0 DIMM_A0 status(0x0000000000000400): Storage error in DRAM
  memory)
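
The same sysfs file can also be read from a program. The following is a minimal user-space sketch, assuming the dimm_label file holds a single line of text; read_dimm_label() is a hypothetical helper, not part of the kernel or of any EDAC tool.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: read the first line of a dimm_label file into buf
 * and strip the trailing newline, if there is one.
 * Returns 0 on success, -1 if the file cannot be opened or is empty.
 */
int read_dimm_label(const char *path, char *buf, size_t size)
{
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;

	if (!fgets(buf, (int)size, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	/* cut the string at the first newline */
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}
```

Passing a path such as /sys/devices/system/edac/mc/mc0/dimm0/dimm_label would return the current label, including one written earlier with echo.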

 [ bp: Remove curly brackets around a single if-statement in dimm_setup_label(). ]

Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200528101307.23245-1-rrichter@marvell.com
2020-06-16 15:22:04 +02:00
altera_edac.c EDAC/altera: Use the Altera System Manager driver 2019-11-22 10:18:29 +01:00
altera_edac.h
amd64_edac.c Merge branch 'x86/entry' into ras/core 2020-06-11 15:17:57 +02:00
amd64_edac.h EDAC/amd64: Add AMD family 17h model 60h PCI IDs 2020-05-22 18:43:13 +02:00
amd64_edac_dbg.c
amd64_edac_inj.c
amd76x_edac.c
amd8111_edac.c
amd8111_edac.h
amd8131_edac.c EDAC/amd8131: Remove defined but not used bridge_str 2020-04-24 09:08:47 +02:00
amd8131_edac.h
armada_xp_edac.c EDAC/armada_xp: Fix some log messages 2020-04-14 11:28:09 +02:00
aspeed_edac.c EDAC/aspeed: Remove unneeded semicolon 2019-12-19 07:27:09 +01:00
bluefield_edac.c
cell_edac.c
cpc925_edac.c
debugfs.c
dmc520_edac.c EDAC: Add EDAC driver for DMC520 2020-02-19 21:00:27 +01:00
e7xxx_edac.c
e752x_edac.c
edac_device.c EDAC/device: Rework error logging API 2019-10-09 13:01:42 +02:00
edac_device.h EDAC/device: Rework error logging API 2019-10-09 13:01:42 +02:00
edac_device_sysfs.c
edac_mc.c EDAC: Drop the EDAC report status checks 2020-04-14 16:01:01 +02:00
edac_mc.h EDAC/mc: Determine mci pointer from the error descriptor 2020-02-17 13:05:10 +01:00
edac_mc_sysfs.c EDAC/mc: Remove per layer counters 2020-02-17 13:37:00 +01:00
edac_module.c
edac_module.h EDAC/mc: Change mci device removal to use put_device() 2020-02-17 12:32:44 +01:00
edac_pci.c
edac_pci.h
edac_pci_sysfs.c
fsl_ddr_edac.c
fsl_ddr_edac.h
ghes_edac.c EDAC/ghes: Setup DIMM label from DMI and use it in error reports 2020-06-16 15:22:04 +02:00
highbank_l2_edac.c
highbank_mc_edac.c
i7core_edac.c x86/mce: Fix all mce notifiers to update the mce->kflags bitmask 2020-04-14 15:59:26 +02:00
i10nm_base.c Merge branches 'edac-i10nm' and 'edac-misc' into edac-updates-for-5.8 2020-06-01 11:39:15 +02:00
i3000_edac.c remove ioremap_nocache and devm_ioremap_nocache 2020-01-06 09:45:59 +01:00
i3200_edac.c remove ioremap_nocache and devm_ioremap_nocache 2020-01-06 09:45:59 +01:00
i5000_edac.c EDAC: Replace EDAC_DIMM_PTR() macro with edac_get_dimm() function 2019-11-09 10:32:32 +01:00
i5100_edac.c EDAC: remove set but not used variable 'ecc_loc' 2019-12-16 13:54:02 -08:00
i5400_edac.c EDAC: Replace EDAC_DIMM_PTR() macro with edac_get_dimm() function 2019-11-09 10:32:32 +01:00
i7300_edac.c EDAC: Replace EDAC_DIMM_PTR() macro with edac_get_dimm() function 2019-11-09 10:32:32 +01:00
i82443bxgx_edac.c
i82860_edac.c
i82875p_edac.c
i82975x_edac.c remove ioremap_nocache and devm_ioremap_nocache 2020-01-06 09:45:59 +01:00
ie31200_edac.c remove ioremap_nocache and devm_ioremap_nocache 2020-01-06 09:45:59 +01:00
Kconfig treewide: replace '---help---' in Kconfig files with 'help' 2020-06-14 01:57:21 +09:00
layerscape_edac.c
Makefile EDAC: Add EDAC driver for DMC520 2020-02-19 21:00:27 +01:00
mce_amd.c x86/mce: Fix all mce notifiers to update the mce->kflags bitmask 2020-04-14 15:59:26 +02:00
mce_amd.h x86/mce/amd, edac: Remove report_gart_errors 2020-04-14 15:53:46 +02:00
mpc85xx_edac.c
mpc85xx_edac.h
mv64x60_edac.c
mv64x60_edac.h
octeon_edac-l2c.c
octeon_edac-lmc.c
octeon_edac-pc.c
octeon_edac-pci.c
pasemi_edac.c
pnd2_edac.c EDAC: Drop the EDAC report status checks 2020-04-14 16:01:01 +02:00
pnd2_edac.h
ppc4xx_edac.c
ppc4xx_edac.h
qcom_edac.c
r82600_edac.c
sb_edac.c EDAC: Drop the EDAC report status checks 2020-04-14 16:01:01 +02:00
sifive_edac.c A garden variety of small fixes all over the place. 2020-01-27 09:16:22 -08:00
skx_base.c Merge branches 'edac-i10nm' and 'edac-misc' into edac-updates-for-5.8 2020-06-01 11:39:15 +02:00
skx_common.c Merge branch 'x86/entry' into ras/core 2020-06-11 15:17:57 +02:00
skx_common.h Merge branches 'edac-i10nm' and 'edac-misc' into edac-updates-for-5.8 2020-06-01 11:39:15 +02:00
synopsys_edac.c EDAC/synopsys: Do not dump uninitialized pinf->col 2020-03-17 14:32:31 +01:00
thunderx_edac.c EDAC/thunderx: Make symbols static 2020-04-23 12:07:24 +02:00
ti_edac.c EDAC: Replace EDAC_DIMM_PTR() macro with edac_get_dimm() function 2019-11-09 10:32:32 +01:00
wq.c
x38_edac.c remove ioremap_nocache and devm_ioremap_nocache 2020-01-06 09:45:59 +01:00
xgene_edac.c EDAC/xgene: Remove set but not used address local var 2020-04-14 14:35:19 +02:00