fs: introduce new truncate sequence

Introduce a new truncate calling sequence into fs/mm subsystems. Rather than
setattr > vmtruncate > truncate, have filesystems call their truncate sequence
from ->setattr if filesystem specific operations are required. vmtruncate is
deprecated, and truncate_pagecache and inode_newsize_ok helpers introduced
previously should be used.

simple_setattr is introduced for simple in-ram filesystems to implement
the new truncate sequence. Eventually all filesystems should be converted
to implement a setattr, and the default code in notify_change should go
away.

simple_setsize is also introduced to perform just the ATTR_SIZE portion
of simple_setattr (ie. changing i_size and trimming pagecache).

To implement the new truncate sequence:
- filesystem-specific manipulations (e.g. freeing blocks) must be done in
  the setattr method rather than ->truncate.
- vmtruncate cannot be used by core code to trim blocks past i_size in
  the event of write failure after allocation, so this must be performed
  in the fs code.
- convert users of the helpers block_write_begin, nobh_write_begin,
  cont_write_begin, and *blockdev_direct_IO* to the _newtrunc-suffixed
  variants. These avoid calling vmtruncate to trim blocks (see previous).
- inode_setattr should not be used. generic_setattr is a new function
  to be used to copy simple attributes into the generic inode.
- make use of the better opportunity to handle errors with the new sequence.
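
The steps above can be sketched as a small userspace model. This is only an
illustration under stated assumptions: struct toy_inode, toy_newsize_ok,
toy_free_blocks and toy_setattr are hypothetical stand-ins, not kernel APIs,
and the block accounting is invented for the example. The ordering of block
freeing relative to the size update is also a simplification; a real
filesystem decides that for itself.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_BLOCK_SIZE 4096L
#define TOY_MAX_SIZE   (1L << 30)

/* Hypothetical stand-in for an inode: a size plus a count of allocated blocks. */
struct toy_inode {
	long i_size;
	long nr_blocks;
	int  fail_free;	/* test knob: make block freeing fail */
};

/* Plays the role of inode_newsize_ok: reject sizes the fs cannot hold. */
int toy_newsize_ok(const struct toy_inode *inode, long newsize)
{
	(void)inode;
	if (newsize < 0)
		return -EINVAL;
	if (newsize > TOY_MAX_SIZE)
		return -EFBIG;
	return 0;
}

/* Filesystem-specific work: free blocks beyond newsize. May fail. */
int toy_free_blocks(struct toy_inode *inode, long newsize)
{
	long keep = (newsize + TOY_BLOCK_SIZE - 1) / TOY_BLOCK_SIZE;

	if (inode->fail_free)
		return -EIO;
	if (inode->nr_blocks > keep)
		inode->nr_blocks = keep;
	return 0;
}

/* New-style sequence: everything that can fail runs before i_size changes. */
int toy_setattr(struct toy_inode *inode, long newsize)
{
	int error;

	assert(inode != NULL);

	error = toy_newsize_ok(inode, newsize);
	if (error)
		return error;

	if (newsize < inode->i_size) {
		error = toy_free_blocks(inode, newsize);
		if (error)
			return error;	/* i_size still holds the old value */
	}

	inode->i_size = newsize;	/* stands in for i_size_write + truncate_pagecache */
	return 0;
}
```

Because the filesystem-specific step runs inside the setattr path, its error
reaches the caller and the old size is still available when it runs.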

The big problem with the previous calling sequence is that the filesystem is
not called until i_size has already changed. This means it is not allowed to
fail the call, and it also does not know what the previous i_size was. In
addition, generic code calling vmtruncate to truncate allocated blocks on
error had no good way to return a meaningful error (or, for example, to
handle block deallocation atomically).
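
The limitation described above can be illustrated with a minimal userspace
model. It is a sketch only: struct old_inode, old_fs_truncate and
old_vmtruncate are hypothetical names that model just the ordering of the
old sequence, not the real vmtruncate implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an inode, with fields recording what the fs saw. */
struct old_inode {
	long i_size;
	long size_seen_by_fs;	/* size visible to the callback */
	int  fs_failed;		/* failure the callback could not report */
};

/* Old-style ->truncate: returns void and runs after i_size was updated. */
void old_fs_truncate(struct old_inode *inode)
{
	inode->size_seen_by_fs = inode->i_size;	/* only the new size is visible */
	inode->fs_failed = 1;			/* pretend freeing blocks failed */
}

/* Models the old ordering: change i_size first, then call the filesystem. */
int old_vmtruncate(struct old_inode *inode, long newsize)
{
	assert(inode != NULL);
	inode->i_size = newsize;
	old_fs_truncate(inode);
	return 0;	/* the fs failure never reaches the caller */
}
```

In this model the callback sees only the new size and its failure is lost,
which is the behaviour the new sequence is meant to avoid.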

Cc: Christoph Hellwig <hch@lst.de>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
npiggin@suse.de 2010-05-27 01:05:33 +10:00 committed by Al Viro
parent 7000d3c424
commit 7bb46a6734
8 changed files with 300 additions and 63 deletions

@@ -8,6 +8,7 @@
#include <linux/slab.h>
#include <linux/mount.h>
#include <linux/vfs.h>
#include <linux/quotaops.h>
#include <linux/mutex.h>
#include <linux/exportfs.h>
#include <linux/writeback.h>
@@ -325,6 +326,81 @@ int simple_rename(struct inode *old_dir, struct dentry *old_dentry,
	return 0;
}

/**
 * simple_setsize - handle core mm and vfs requirements for file size change
 * @inode: inode
 * @newsize: new file size
 *
 * Returns 0 on success, -error on failure.
 *
 * simple_setsize must be called with inode_mutex held.
 *
 * simple_setsize will check that the requested new size is OK (see
 * inode_newsize_ok), and then will perform the necessary i_size update
 * and pagecache truncation (if necessary). It will typically be called
 * from the filesystem's setattr function when ATTR_SIZE is passed in.
 *
 * The inode itself must have correct permissions and attributes to allow
 * i_size to be changed; this function then just checks that the new size
 * requested is valid.
 *
 * In the case of simple in-memory filesystems with inodes stored solely
 * in the inode cache, and file data in the pagecache, nothing more needs
 * to be done to satisfy a truncate request. Filesystems with on-disk
 * blocks, for example, will need to free them in the case of truncate. In
 * that case it may be easier not to use simple_setsize (but each of its
 * components will likely be required at some point to update pagecache
 * and inode etc).
 */
int simple_setsize(struct inode *inode, loff_t newsize)
{
	loff_t oldsize;
	int error;

	error = inode_newsize_ok(inode, newsize);
	if (error)
		return error;

	oldsize = inode->i_size;
	i_size_write(inode, newsize);
	truncate_pagecache(inode, oldsize, newsize);

	return error;
}
EXPORT_SYMBOL(simple_setsize);

/**
 * simple_setattr - setattr for simple in-memory filesystem
 * @dentry: dentry
 * @iattr: iattr structure
 *
 * Returns 0 on success, -error on failure.
 *
 * simple_setattr implements setattr for an in-memory filesystem which
 * does not store its own file data or metadata (eg. uses the page cache
 * and inode cache as its data store).
 */
int simple_setattr(struct dentry *dentry, struct iattr *iattr)
{
	struct inode *inode = dentry->d_inode;
	int error;

	error = inode_change_ok(inode, iattr);
	if (error)
		return error;

	if (iattr->ia_valid & ATTR_SIZE) {
		error = simple_setsize(inode, iattr->ia_size);
		if (error)
			return error;
	}

	generic_setattr(inode, iattr);

	return error;
}
EXPORT_SYMBOL(simple_setattr);

int simple_readpage(struct file *file, struct page *page)
{
	clear_highpage(page);