The situation: VFS inode X on a mounted ntfs volume is dirty. For
same inode X, the ntfs_inode is dirty and thus corresponding on-disk
inode, i.e. mft record, which is in a dirty PAGE_CACHE_PAGE belonging
to the table of inodes, i.e. $MFT, inode 0.
What happens:
Process 1: sys_sync()/umount()/whatever... calls
__sync_single_inode() for $MFT -> do_writepages() -> write_page for
the dirty page containing the on-disk inode X, the page is now locked
-> ntfs_write_mst_block() which clears PageUptodate() on the page to
prevent anyone else getting hold of it whilst it does the write out.
This is necessary as the on-disk inode needs "fixups" applied before
the write to disk which are removed again after the write and
PageUptodate is then set again. It then analyses the page looking
for dirty on-disk inodes and when it finds one it calls
ntfs_may_write_mft_record() to see if it is safe to write this
on-disk inode. This then calls ilookup5() to check if the
corresponding VFS inode is in icache(). This in turn calls ifind()
which waits on the inode lock via wait_on_inode whilst holding the
global inode_lock.
Process 2: pdflush results in a call to __sync_single_inode for the
same VFS inode X on the ntfs volume. This locks the inode (I_LOCK)
then calls write-inode -> ntfs_write_inode -> map_mft_record() ->
read_cache_page() for the page (in page cache of table of inodes
$MFT, inode 0) containing the on-disk inode. This page has
PageUptodate() clear because of Process 1 (see above) so
read_cache_page() blocks when it tries to take the page lock for the
page so it can call ntfs_read_page().
Thus Process 1 is holding the page lock on the page containing the
on-disk inode X and it is waiting on the inode X to be unlocked in
ifind() so it can write the page out and then unlock the page.
And Process 2 is holding the inode lock on inode X and is waiting for
the page to be unlocked so it can call ntfs_readpage() or discover
that Process 1 set PageUptodate() again and use the page.
Thus we have a deadlock due to ifind() waiting on the inode lock.
The solution: The fix is to use the newly introduced
ilookup5_nowait() which does not wait on the inode's lock and hence
avoids the deadlock. This is safe as we do not care about the VFS
inode and only use the fact that it is in the VFS inode cache and the
fact that the vfs and ntfs inodes are one struct in memory to find
the ntfs inode in memory if present. Also, the ntfs inode has its
own locking so it does not matter if the vfs inode is locked.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
if the requested vcn is inside it. Otherwise we get into problems
when we try to map an out of bounds vcn because we then try to map
the already mapped runlist fragment which causes
ntfs_mapping_pairs_decompress() to fail and return error. Update
ntfs_attr_find_vcn_nolock() accordingly.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
and ntfs_mapping_pairs_build() to allow the runlist encoding to be
partial which is desirable when filling holes in sparse attributes.
Update all callers.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
with a 64-bit variable and a int, i.e. 32-bit, constant. This causes
the higher order 32-bits of the 64-bit variable to be zeroed. To fix
this cast the 'const' to the same 64-bit type as 'var'.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
to be mounted and if this is the case do not allow (re)mounting
read-write. This is done by parsing hiberfil.sys if present.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
if the runlist was not mapped at all and a mapping error occured we
would leave the runlist locked on exit to the function so that the
next access to the same file would try to take the lock and deadlock.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
is active on the volume and we are mounting read-write or remounting
from read-only to read-write.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
Thus, relax the checking in fs/ntfs/super.c::is_boot_sector_ntfs() to
only emit a warning when the checksum is incorrect rather than
refusing the mount. Thanks to Bernd Casimir for pointing this
problem out.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
- Add ifdef NTFS_RW around write specific code if fs/ntfs/runlist.[hc] and
fs/ntfs/attrib.[hc].
- Minor bugfix to fs/ntfs/attrib.c::ntfs_attr_make_non_resident() where the
runlist was not freed in all error cases.
- Add fs/ntfs/runlist.[hc]::ntfs_rl_find_vcn_nolock().
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
and handle the case where an attribute is converted from resident
to non-resident by a concurrent file write.
- Reorder some operations when converting an attribute from resident
to non-resident (fs/ntfs/attrib.c) so it is safe wrt concurrent
->readpage and ->writepage.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
dropping the read lock and taking the write lock we were not checking
whether someone else did not already do the work we wanted to do.
- Rename ntfs_find_vcn_nolock() to ntfs_attr_find_vcn_nolock().
- Tidy up some comments in fs/ntfs/runlist.c.
- Add LCN_ENOMEM and LCN_EIO definitions to fs/ntfs/runlist.h.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
checked and set in the ntfs inode as done for compressed files
and the compressed size needs to be used for vfs inode->i_blocks
instead of the allocated size, again, as done for compressed files.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
definition of ntfs_export_ops from fs/ntfs/super.c to namei.c.
Also, declare ntfs_export_ops in fs/ntfs/ntfs.h.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
mft record for resident attributes (fs/ntfs/inode.c).
- Small readability cleanup to use "a" instead of "ctx->attr"
everywhere (fs/ntfs/inode.c).
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
warning in the do_div() call on sparc32. Thanks to Meelis Roos for the
report and analysis of the warning.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
helper ntfs_map_runlist_nolock() which is used by ntfs_map_runlist().
This allows us to map runlist fragments with the runlist lock already
held without having to drop and reacquire it around the call. Adapt
all callers.
- Change ntfs_find_vcn() to ntfs_find_vcn_nolock() which takes a locked
runlist. This allows us to find runlist elements with the runlist
lock already held without having to drop and reacquire it around the
call. Adapt all callers.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
enable bit which is set appropriately and a per inode sparse disable
bit which is preset on some system file inodes as appropriate.
- Enforce that sparse support is disabled on NTFS volumes pre 3.0.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
fs/ntfs/aops.c::ntfs_{prepare,commit}_write()() and re-enable it.
It should be safe now. (Famous last words...)
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
value afterwards. Cache the initialized_size in the same way and
protect access to the two sizes using the size_lock.
- Minor optimization to fs/ntfs/super.c::ntfs_statfs() and its helpers.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
cached value everywhere. Cache the initialized_size in the same way
and protect the critical region where the two sizes are read using the
new size_lock of the ntfs inode.
- Add the new size_lock to the ntfs_inode structure (fs/ntfs/inode.h)
and initialize it (fs/ntfs/inode.c).
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.
Let it rip!