devlink: Add health report functionality

Upon discovering an error, a driver can report it to the devlink health
mechanism via the devlink_health_report function, using the appropriate
reporter registered with devlink health. The driver can pass error-specific
context, which is delivered back to the driver as part of the dump /
recovery callbacks.

Once an error is reported, devlink health performs the following actions:
* A log is sent to the kernel trace events buffer
* Health status and statistics are updated for the reporter instance
* An object dump is taken and stored at the reporter instance (as long
  as no other dump is already stored)
* An auto recovery attempt is made, depending on:
  - Auto recovery configuration
  - Grace period vs. time since last recovery
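
The steps above can be sketched as a small userspace model. This is not
the code added by this commit: every name below (model_reporter,
model_health_report, MODEL_ERR_GRACE, the demo_* callbacks) is
hypothetical, the ordering of the steps and the return codes are
assumptions, and the trace event is replaced by a write to stderr. It only
illustrates how the stored dump and the grace period gate what happens on
each report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical userspace model; none of these names are kernel API. */
#define MODEL_ERR_GRACE (-1)	/* assumed code: recovery skipped (grace period) */

struct model_reporter;

struct model_reporter_ops {
	int (*dump)(struct model_reporter *r, char *buf, size_t len,
		    void *priv_ctx);
	int (*recover)(struct model_reporter *r, void *priv_ctx);
};

struct model_reporter {
	const struct model_reporter_ops *ops;
	bool auto_recover;		/* auto recovery configuration */
	uint64_t grace_period_ms;	/* minimum gap between recoveries */
	uint64_t last_recovery_ms;	/* time of last successful recovery */
	bool healthy;
	unsigned int error_count;
	unsigned int recovery_count;
	bool dump_valid;		/* true while a dump is stored */
	char dump[64];
};

static int model_health_report(struct model_reporter *r, const char *msg,
			       void *priv_ctx, uint64_t now_ms)
{
	int err;

	assert(r && r->ops && msg);

	/* Log the error (stand-in for a kernel trace event). */
	fprintf(stderr, "health report: %s\n", msg);

	/* Update health status and statistics. */
	r->healthy = false;
	r->error_count++;

	/* Take a dump only if no earlier dump is still stored. */
	if (!r->dump_valid && r->ops->dump &&
	    r->ops->dump(r, r->dump, sizeof(r->dump), priv_ctx) == 0)
		r->dump_valid = true;

	/* Auto recovery depends on configuration and the grace period. */
	if (!r->auto_recover || !r->ops->recover)
		return 0;
	if (r->recovery_count > 0 &&
	    now_ms - r->last_recovery_ms < r->grace_period_ms)
		return MODEL_ERR_GRACE;

	err = r->ops->recover(r, priv_ctx);
	if (err)
		return err;

	r->healthy = true;
	r->recovery_count++;
	r->last_recovery_ms = now_ms;
	return 0;
}

/* Example callbacks; priv_ctx is assumed to be a C string here. */
static int demo_dump(struct model_reporter *r, char *buf, size_t len,
		     void *priv_ctx)
{
	(void)r;
	snprintf(buf, len, "%s", (const char *)priv_ctx);
	return 0;
}

static int demo_recover(struct model_reporter *r, void *priv_ctx)
{
	(void)r;
	(void)priv_ctx;
	return 0;
}

static const struct model_reporter_ops demo_ops = {
	.dump = demo_dump,
	.recover = demo_recover,
};
```

In this model, a first report on a reporter with auto_recover set stores a
dump and recovers. A second report inside the grace period still bumps
error_count, but keeps the first dump and returns MODEL_ERR_GRACE without
calling the recover callback.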

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eran Ben Elisha 2019-01-17 23:59:12 +02:00 committed by David S. Miller
parent 880ee82f03
commit c7af343b4e
3 changed files with 164 additions and 0 deletions


@@ -641,6 +641,8 @@ devlink_health_reporter_destroy(struct devlink_health_reporter *reporter);
void *
devlink_health_reporter_priv(struct devlink_health_reporter *reporter);
int devlink_health_report(struct devlink_health_reporter *reporter,
			  const char *msg, void *priv_ctx);
#else
static inline struct devlink *devlink_alloc(const struct devlink_ops *ops,
@@ -979,6 +981,13 @@ devlink_health_reporter_priv(struct devlink_health_reporter *reporter)
{
	return NULL;
}
static inline int
devlink_health_report(struct devlink_health_reporter *reporter,
		      const char *msg, void *priv_ctx)
{
	return 0;
}
#endif
#endif /* _NET_DEVLINK_H_ */