blkcg: delay blkg destruction until after writeback has finished

Currently, blkcg destruction relies on a sequence of events:
  1. Destruction starts. blkcg_css_offline() is called and blkgs
     release their reference to the blkcg. This immediately destroys
     the cgwbs (writeback).
  2. With blkgs giving up their reference, the blkcg ref count should
     become zero and eventually call blkcg_css_free() which finally
     frees the blkcg.

Jiufei Xue reported that there is a race between blkcg_bio_issue_check()
and cgroup_rmdir(). To remedy this, blkg destruction becomes contingent
on the completion of all writeback associated with the blkcg. A count of
the number of cgwbs is maintained and once that goes to zero, blkg
destruction can follow. This should prevent premature blkg destruction
related to writeback.

The new process for blkcg cleanup is as follows:
  1. Destruction starts. blkcg_css_offline() is called which offlines
     writeback. Blkg destruction is delayed on the cgwb_refcnt count to
     avoid punting potentially large amounts of outstanding writeback
     to root while maintaining any ongoing policies. Here, the base
     cgwb_refcnt is put back.
  2. When the cgwb_refcnt becomes zero, blkcg_destroy_blkgs() is called
     and handles destruction of blkgs. This is where the css reference
     held by each blkg is released.
  3. Once the blkcg ref count goes to zero, blkcg_css_free() is called.
     This finally frees the blkg.

It seems in the past blk-throttle didn't do the most understandable
things with taking data from a blkg while associating with current. So,
the simplification and unification of what blk-throttle is doing caused
this.

Fixes: 08e18eab0c ("block: add bi_blkg to the bio for cgroups")
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Cc: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This commit is contained in:
Dennis Zhou (Facebook) 2018-08-31 16:22:43 -04:00 committed by Jens Axboe
parent 6b06546206
commit 59b57717ff
3 changed files with 94 additions and 8 deletions

View file

@ -56,6 +56,7 @@ struct blkcg {
struct list_head all_blkcgs_node;
#ifdef CONFIG_CGROUP_WRITEBACK
struct list_head cgwb_list;
refcount_t cgwb_refcnt;
#endif
};
@ -386,6 +387,49 @@ static inline struct blkcg *cpd_to_blkcg(struct blkcg_policy_data *cpd)
return cpd ? cpd->blkcg : NULL;
}
extern void blkcg_destroy_blkgs(struct blkcg *blkcg);
#ifdef CONFIG_CGROUP_WRITEBACK
/**
* blkcg_cgwb_get - get a reference for blkcg->cgwb_list
* @blkcg: blkcg of interest
*
* This is used to track the number of active wb's related to a blkcg.
*/
static inline void blkcg_cgwb_get(struct blkcg *blkcg)
{
refcount_inc(&blkcg->cgwb_refcnt);
}
/**
* blkcg_cgwb_put - put a reference for @blkcg->cgwb_list
* @blkcg: blkcg of interest
*
* This is used to track the number of active wb's related to a blkcg.
* When this count goes to zero, all active wb has finished so the
* blkcg can continue destruction by calling blkcg_destroy_blkgs().
* This work may occur in cgwb_release_workfn() on the cgwb_release
* workqueue.
*/
static inline void blkcg_cgwb_put(struct blkcg *blkcg)
{
if (refcount_dec_and_test(&blkcg->cgwb_refcnt))
blkcg_destroy_blkgs(blkcg);
}
#else
static inline void blkcg_cgwb_get(struct blkcg *blkcg) { }
static inline void blkcg_cgwb_put(struct blkcg *blkcg)
{
/* wb isn't being accounted, so trigger destruction right away */
blkcg_destroy_blkgs(blkcg);
}
#endif
/**
* blkg_path - format cgroup path of blkg
* @blkg: blkg of interest