* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
slub: fix slab_pad_check()
slub: release kobject if sysfs_create_group failed in sysfs_slab_add
SLUB: fix ARCH_KMALLOC_MINALIGN cases 64 and 256
SLUB: Fix some coding style issues
SLUB: Drop write permission to /proc/slabinfo
slab: remove duplicate kmem_cache_init_late() declarations
slub: change kmem_cache->align to record the real alignment
slub: use size and objsize orders to disable debug flags
slub: add option to disable higher order debugging slabs
* 'osync_cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
fsync: wait for data writeout completion before calling ->fsync
vfs: Remove generic_osync_inode() and sync_page_range{_nolock}()
fat: Opencode sync_page_range_nolock()
pohmelfs: Use new syncing helper
xfs: Convert sync_page_range() to simple filemap_write_and_wait_range()
ocfs2: Update syncing after splicing to match generic version
ntfs: Use new syncing helpers and update comments
ext4: Remove syncing logic from ext4_file_write
ext3: Remove syncing logic from ext3_file_write
ext2: Update comment about generic_osync_inode
vfs: Introduce new helpers for syncing after writing to O_SYNC file or IS_SYNC inode
vfs: Rename generic_file_aio_write_nolock
ocfs2: Use __generic_file_aio_write instead of generic_file_aio_write_nolock
pohmelfs: Use __generic_file_aio_write instead of generic_file_aio_write_nolock
vfs: Remove syncing from generic_file_direct_write() and generic_file_buffered_write()
vfs: Export __generic_file_aio_write() and add some comments
vfs: Introduce filemap_fdatawait_range
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw:
GFS2: Whitespace fixes
GFS2: Remove unused sysfs file
GFS2: Be extra careful about deallocating inodes
GFS2: Remove no_formal_ino generating code
GFS2: Rename eattr.[ch] as xattr.[ch]
GFS2: Clean up of extended attribute support
GFS2: Add explanation of extended attr on-disk format
GFS2: Add "-o errors=panic|withdraw" mount options
GFS2: jumping to wrong label?
GFS2: free disk inode which is deleted by remote node -V2
GFS2: Add a document explaining GFS2's uevents
GFS2: Add sysfs link to device
GFS2: Replace assertion with proper error handling
GFS2: Improve error handling in inode allocation
GFS2: Add some more info to uevents
GFS2: Add online uevent to GFS2
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6:
udf: Fix possible corruption when close races with write
udf: Perform preallocation only for regular files
udf: Remove wrong assignment in udf_symlink
udf: Remove dead code
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: (21 commits)
fs/Kconfig: move nilfs2 outside misc filesystems
nilfs2: convert nilfs_bmap_lookup to an inline function
nilfs2: allow btree code to directly call dat operations
nilfs2: add update functions of virtual block address to dat
nilfs2: remove individual gfp constants for each metadata file
nilfs2: stop zero-fill of btree path just before free it
nilfs2: remove unused btree argument from btree functions
nilfs2: remove nilfs_dat_abort_start and nilfs_dat_abort_free
nilfs2: shorten freeze period due to GC in write operation v3
nilfs2: add more check routines in mount process
nilfs2: An unassigned variable is assigned to a never used structure member
nilfs2: use GFP_NOIO for bio_alloc instead of GFP_NOWAIT
nilfs2: stop using periodic write_super callback
nilfs2: clean up nilfs_write_super
nilfs2: fix disorder of nilfs_write_super in nilfs_sync_fs
nilfs2: remove redundant super block commit
nilfs2: implement nilfs_show_options to display mount options in /proc/mounts
nilfs2: always lookup disk block address before reading metadata block
nilfs2: use semaphore to protect pointer to a writable FS-instance
nilfs2: fix format string compile warning (ino_t)
...
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
cifs: consolidate reconnect logic in smb_init routines
cifs: Replace wrtPending with a real reference count
cifs: protect GlobalOplock_Q with its own spinlock
cifs: use tcon pointer in cifs_show_options
cifs: send IPv6 addr in upcall with colon delimiters
[CIFS] Fix checkpatch warnings
PATCH] cifs: fix broken mounts when a SSH tunnel is used (try #4)
[CIFS] Memory leak in ntlmv2 hash calculation
[CIFS] potential NULL dereference in parse_DFS_referrals()
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next-2.6: (21 commits)
sparc64: Initial niagara2 perf counter support.
sparc64: Perf counter 'nop' event is not constant.
sparc64: Provide a way to specify a perf counter overflow IRQ enable bit.
sparc64: Provide hypervisor tracing bit support for perf counters.
sparc64: Initial hw perf counter support.
sparc64: Implement a real set_perf_counter_pending().
sparc64: Use nmi_enter() and nmi_exit(), as needed.
sparc64: Provide extern decls for sparc_??u_type strings.
sparc64: Make touch_nmi_watchdog() actually work.
sparc64: Kill unnecessary cast in profile_timer_exceptions_notify().
sparc64: Manage NMI watchdog enabling like x86.
sparc: add basic support for 'perf'
sparc: convert /proc/io_map, /proc/dvma_map to seq_file
sparc, leon: sparc-leon specific SRMMU initialization and bootup fixes.
sparc,leon: Added support for AMBAPP bus.
sparc,leon: Introduce the sparc-leon CPU type.
sparc,leon: Redefine MMU register access asi if CONFIG_LEON
sparc,leon: CONFIG_SPARC_LEON option and leon specific files.
sparc64: cheaper asm/uaccess.h inclusion
SPARC: fix duplicate declaration
...
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1623 commits)
netxen: update copyright
netxen: fix tx timeout recovery
netxen: fix file firmware leak
netxen: improve pci memory access
netxen: change firmware write size
tg3: Fix return ring size breakage
netxen: build fix for INET=n
cdc-phonet: autoconfigure Phonet address
Phonet: back-end for autoconfigured addresses
Phonet: fix netlink address dump error handling
ipv6: Add IFA_F_DADFAILED flag
net: Add DEVTYPE support for Ethernet based devices
mv643xx_eth.c: remove unused txq_set_wrr()
ucc_geth: Fix hangs after switching from full to half duplex
ucc_geth: Rearrange some code to avoid forward declarations
phy/marvell: Make non-aneg speed/duplex forcing work for 88E1111 PHYs
drivers/net/phy: introduce missing kfree
drivers/net/wan: introduce missing kfree
net: force bridge module(s) to be GPL
Subject: [PATCH] appletalk: Fix skb leak when ipddp interface is not loaded
...
Fixed up trivial conflicts:
- arch/x86/include/asm/socket.h
converted to <asm-generic/socket.h> in the x86 tree. The generic
header has the same new #define's, so that works out fine.
- drivers/net/tun.c
fix conflict between 89f56d1e9 ("tun: reuse struct sock fields") that
switched over to using 'tun->socket.sk' instead of the redundantly
available (and thus removed) 'tun->sk', and 2b980dbd ("lsm: Add hooks
to the TUN driver") which added a new 'tun->sk' use.
Noted in 'next' by Stephen Rothwell.
* 'x86-xen-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: split __phys_addr out into separate file
xen: use stronger barrier after unlocking lock
xen: only enable interrupts while actually blocking for spinlock
xen: make -fstack-protector work under Xen
When we close a file, we remove preallocated blocks from it. But this
truncation was not protected by i_mutex and thus it could have raced with a
write through a different fd and cause crashes or even filesystem corruption.
Signed-off-by: Jan Kara <jack@suse.cz>
So far we preallocated blocks also for directories but that brings a
problem, when to get rid of preallocated blocks we don't need. So far
we removed them in udf_clear_inode() which has a disadvantage that
1) blocks are unavailable long after writing to a directory finished
and thus one can get out of space unnecessarily early
2) releasing blocks from udf_clear_inode is problematic because VFS
does not expect us to redirty inode there and it also slows down
memory reclaim.
So preallocate blocks only for regular files where we can drop preallocation
in udf_release_file.
Signed-off-by: Jan Kara <jack@suse.cz>
Recomputation of the pointer was wrong (it should have been just increment).
Luckily, we never use the computed value. Remove it.
Signed-off-by: Jan Kara <jack@suse.cz>
Those get reported in MC0_STATUS, see Table 92, F10h BKDG (31116, rev.
3.28) for more details.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
This is the MCE error code from the MCi_STATUS banks, bits [15:0] which
describe what type of error was encountered: GART TLB, Memory or Bus
error. The semantics of those bits are identical across all MCE banks so
decode those separately, irrespectively of MCE type.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
The MCi_STATUS registers have most field definitions in common so decode
them in the general path. Do not pass ecc_type along and compute it in
__amd64_decode_bus_error instead.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Move NB decoder along with required defines to EDAC MCE core. Add
registration routines for further decoding of the MCE info in the AMD64
EDAC module.
CC: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
This is in preparation of adding AMD-specific MCE decoding functionality
to the EDAC core. The error decoding macros originate from the AMD64
EDAC driver albeit in a simplified and cleaned up version here.
While at it, add macros to generate the error description strings and
use them in the error type decoders directly which removes a bunch of
code and makes the decoding functions much more readable. Also, fix
strings and shorten macro names.
Remove superfluous htlink_msgs.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Currenly vfs_fsync(_range) first calls filemap_fdatawrite to write out
the data, the calls into ->fsync to write out the metadata and then finally
calls filemap_fdatawait to wait for the data I/O to complete. What sounds
like a clever micro-optimization actually is nast trap for many filesystems.
For many modern filesystems i_size or other inode information is only
updated on I/O completion and we need to wait for I/O to finish before
we can write out the metadata. For old fashionen filesystems that
instanciate blocks during the actual write and also update the metadata
at that point it opens up a large window were we could expose uninitialized
blocks after a crash. While a few filesystems that need it already wait
for the I/O to finish inside their ->fsync methods it is rather suboptimal
as it is done under the i_mutex and also always for the whole file instead
of just a part as we could do for O_SYNC handling.
Here is a small audit of all fsync instances in the tree:
- spufs_mfc_fsync:
- ps3flash_fsync:
- vol_cdev_fsync:
- printer_fsync:
- fb_deferred_io_fsync:
- bad_file_fsync:
- simple_sync_file:
don't care - filesystems/drivers do't use the page cache or are
purely in-memory.
- simple_fsync:
- file_fsync:
- affs_file_fsync:
- fat_file_fsync:
- jfs_fsync:
- ubifs_fsync:
- reiserfs_dir_fsync:
- reiserfs_sync_file:
never touch pagecache themselves. We need to wait before if we do
not want to expose stale data after an allocation.
- afs_fsync:
- fuse_fsync_common:
do the waiting writeback itself in awkward ways, would benefit from
proper semantics
- block_fsync:
Does a filemap_write_and_wait on the block device inode. Because we
now have f_mapping that is the same inode we call it on in vfs_fsync.
So just removing it and letting the VFS do the work in one go would
be an improvement.
- btrfs_sync_file:
- cifs_fsync:
- xfs_file_fsync:
need the wait first and currently do it themselves. would benefit from
doing it outside i_mutex.
- coda_fsync:
- ecryptfs_fsync:
- exofs_file_fsync:
- shm_fsync:
only passes the fsync through to the lower layer
- ext3_sync_file:
doesn't seem to care, comments are confusing.
- ext4_sync_file:
would need the wait to work correctly for delalloc mode with late
i_size updates. Otherwise the ext3 comment applies.
currently implemens it's own writeback and wait in an odd way,
could benefit from doing it properly.
- gfs2_fsync:
not needed for journaled data mode, but probably harmless there.
Currently writes back data asynchronously itself. Needs some
major audit.
- hostfs_fsync:
just calls fsync/datasync on the host FD. Without the wait before
data might not even be inflight yet if we're unlucky.
- hpfs_file_fsync:
- ncp_fsync:
no-ops. Dangerous before and after.
- jffs2_fsync:
just calls jffs2_flush_wbuf_gc, not sure how this relates to data.
- nfs_fsync_dir:
just increments stats, claims all directory operations are synchronous
- nfs_file_fsync:
only writes out data??? Looks very odd.
- nilfs_sync_file:
looks like it expects all data done, but not sure from the code
- ntfs_dir_fsync:
- ntfs_file_fsync:
appear to do their own data writeback. Very convoluted code.
- ocfs2_sync_file:
does it's own data writeback, but no wait. probably needs the wait.
- smb_fsync:
according to a comment expects all pages written already, probably needs
the wait before.
This patch only changes vfs_fsync_range, removal of the wait in the methods
that have it is left to the filesystem maintainers. Note that most
filesystems really do need an audit for their fsync methods given the
gems found in this very brief audit.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
fat_cont_expand() is the only user of sync_page_range_nolock(). It's also the
only user of generic_osync_inode() which does not have a file open. So
opencode needed actions for FAT so that we can convert generic_osync_inode() to
a standard syncing path.
Update a comment about generic_osync_inode().
CC: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Jan Kara <jack@suse.cz>
Christoph Hellwig says that it is enough for XFS to call
filemap_write_and_wait_range() instead of sync_page_range() because we do
all the metadata syncing when forcing the log.
CC: Felix Blyakher <felixb@sgi.com>
CC: xfs@oss.sgi.com
CC: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Update ocfs2 specific splicing code to use generic syncing helper. The sync now
does not happen under rw_lock because generic_write_sync() acquires i_mutex
which ranks above rw_lock. That should not matter because standard fsync path
does not hold it either.
Acked-by: Joel Becker <Joel.Becker@oracle.com>
Acked-by: Mark Fasheh <mfasheh@suse.com>
CC: ocfs2-devel@oss.oracle.com
Signed-off-by: Jan Kara <jack@suse.cz>
Use new syncing helpers in .write and .aio_write functions. Also
remove superfluous syncing in ntfs_file_buffered_write() and update
comments about generic_osync_inode().
CC: Anton Altaparmakov <aia21@cantab.net>
CC: linux-ntfs-dev@lists.sourceforge.net
Signed-off-by: Jan Kara <jack@suse.cz>
The syncing is now properly handled by generic_file_aio_write() so
no special ext4 code is needed.
CC: linux-ext4@vger.kernel.org
CC: tytso@mit.edu
Signed-off-by: Jan Kara <jack@suse.cz>
Syncing is now properly done by generic_file_aio_write() so no special logic is
needed in ext3.
CC: linux-ext4@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Introduce new function for generic inode syncing (vfs_fsync_range) and use
it from fsync() path. Introduce also new helper for syncing after a sync
write (generic_write_sync) using the generic function.
Use these new helpers for syncing from generic VFS functions. This makes
O_SYNC writes to block devices acquire i_mutex for syncing. If we really
care about this, we can make block_fsync() drop the i_mutex and reacquire
it before it returns.
CC: Evgeniy Polyakov <zbr@ioremap.net>
CC: ocfs2-devel@oss.oracle.com
CC: Joel Becker <joel.becker@oracle.com>
CC: Felix Blyakher <felixb@sgi.com>
CC: xfs@oss.sgi.com
CC: Anton Altaparmakov <aia21@cantab.net>
CC: linux-ntfs-dev@lists.sourceforge.net
CC: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
CC: linux-ext4@vger.kernel.org
CC: tytso@mit.edu
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
generic_file_aio_write_nolock() is now used only by block devices and raw
character device. Filesystems should use __generic_file_aio_write() in case
generic_file_aio_write() doesn't suit them. So rename the function to
blkdev_aio_write() and move it to fs/blockdev.c.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Use the new helper. We have to submit data pages ourselves in case of O_SYNC
write because __generic_file_aio_write does not do it for us. OCFS2 developpers
might think about moving the sync out of i_mutex which seems to be easily
possible but that's out of scope of this patch.
CC: ocfs2-devel@oss.oracle.com
Acked-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Use new helper __generic_file_aio_write(). Since the fs takes care of syncing
by itself afterwards, there are no more changes needed.
CC: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: Jan Kara <jack@suse.cz>
generic_file_direct_write() and generic_file_buffered_write() called
generic_osync_inode() if it was called on O_SYNC file or IS_SYNC inode. But
this is superfluous since generic_file_aio_write() does the syncing as well.
Also XFS and OCFS2 which call these functions directly handle syncing
themselves. So let's have a single place where syncing happens:
generic_file_aio_write().
We slightly change the behavior by syncing only the range of file to which the
write happened for buffered writes but that should be all that is required.
CC: ocfs2-devel@oss.oracle.com
CC: Joel Becker <joel.becker@oracle.com>
CC: Felix Blyakher <felixb@sgi.com>
CC: xfs@oss.sgi.com
Signed-off-by: Jan Kara <jack@suse.cz>
Rename __generic_file_aio_write_nolock() to __generic_file_aio_write(), add
comments to write helpers explaining how they should be used and export
__generic_file_aio_write() since it will be used by some filesystems.
CC: ocfs2-devel@oss.oracle.com
CC: Joel Becker <joel.becker@oracle.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
This simple helper saves some filesystems conversion from byte offset
to page numbers and also makes the fdata* interface more complete.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
* 'x86-percpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, percpu: Collect hot percpu variables into one cacheline
x86, percpu: Fix DECLARE/DEFINE_PER_CPU_PAGE_ALIGNED()
x86, percpu: Add 'percpu_read_stable()' interface for cacheable accesses
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, highmem_32.c: Clean up comment
x86, pgtable.h: Clean up types
x86: Clean up dump_pagetable()