At several places we modify EXT3_I(inode)->i_state without holding i_mutex
(ext3_release_file, ext3_bmap, ext3_journalled_writepage, ext3_do_update_inode,
...). These modifications are racy and we can lose updates to i_state. So
convert handling of i_state to use bitops which are atomic.
Signed-off-by: Jan Kara <jack@suse.cz>
Cleanup handling of S_NOQUOTA inode flag and document it a bit. The flag
does not have to be set under dqptr_sem. Only functions modifying inode's
dquot pointers have to check the flag under dqptr_sem before going forward
with the modification. This way we are sure that we cannot add new dquot
pointers to the inode which is just becoming a quota file.
The good thing about this cleanup is that there are no more places in quota
code which enforce i_mutex vs. dqptr_sem lock ordering (in particular that
dqptr_sem -> i_mutex of quota file). This should silence some (false) lockdep
warnings with ext4 + quota and generally make life of some filesystems easier.
Signed-off-by: Jan Kara <jack@suse.cz>
* 'for-linus' of git://git.open-osd.org/linux-open-osd:
exofs: groups support
exofs: Prepare for groups
exofs: Error recovery if object is missing from storage
exofs: convert io_state to use pages array instead of bio at input
exofs: RAID0 support
exofs: Define on-disk per-inode optional layout attribute
exofs: unindent exofs_sbi_read
exofs: Move layout related members to a layout structure
exofs: Recover in the case of read-passed-end-of-file
exofs: Micro-optimize exofs_i_info
exofs: debug print even less
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
sparc64: Make prom entry spinlock NMI safe.
sparc64: Kill off old sys_perfctr system call and state.
sparc: Update defconfigs.
sparc: Provide io{read,write}{16,32}be().
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide-next-2.6: (49 commits)
drivers/ide: Fix continuation line formats
ide: fixed section mismatch warning in cmd640.c
ide: ide_timing_compute() fixup
ide: make ide_get_best_pio_mode() static
via82cxxx: use ->pio_mode value to determine pair device speed
tx493xide: use ->pio_mode value to determine pair device speed
siimage: use ->pio_mode value to determine pair device speed
palm_bk3710: use ->pio_mode value to determine pair device speed
it821x: use ->pio_mode value to determine pair device speed
cs5536: use ->pio_mode value to determine pair device speed
cs5535: use ->pio_mode value to determine pair device speed
cmd64x: fix handling of address setup timings
amd74xx: use ->pio_mode value to determine pair device speed
alim15x3: fix handling of UDMA enable bit
alim15x3: fix handling of DMA timings
alim15x3: fix handling of command timings
alim15x3: fix handling of address setup timings
ide-timings: use ->pio_mode value to determine fastest PIO speed
ide: change ->set_dma_mode method parameters
ide: change ->set_pio_mode method parameters
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (52 commits)
init: Open /dev/console from rootfs
mqueue: fix typo "failues" -> "failures"
mqueue: only set error codes if they are really necessary
mqueue: simplify do_open() error handling
mqueue: apply mathematics distributivity on mq_bytes calculation
mqueue: remove unneeded info->messages initialization
mqueue: fix mq_open() file descriptor leak on user-space processes
fix race in d_splice_alias()
set S_DEAD on unlink() and non-directory rename() victims
vfs: add NOFOLLOW flag to umount(2)
get rid of ->mnt_parent in tomoyo/realpath
hppfs can use existing proc_mnt, no need for do_kern_mount() in there
Mirror MS_KERNMOUNT in ->mnt_flags
get rid of useless vfsmount_lock use in put_mnt_ns()
Take vfsmount_lock to fs/internal.h
get rid of insanity with namespace roots in tomoyo
take check for new events in namespace (guts of mounts_poll()) to namespace.c
Don't mess with generic_permission() under ->d_lock in hpfs
sanitize const/signedness for udf
nilfs: sanitize const/signedness in dealing with ->d_name.name
...
Fix up fairly trivial (famous last words...) conflicts in
drivers/infiniband/core/uverbs_main.c and security/tomoyo/realpath.c
* git://git.infradead.org/battery-2.6:
power_supply: bq27x00: fix voltage and current units
power_supply: bq27x00: add status and time properties
power_supply: bq27x00: add BQ27500 support
power_supply: bq27x00: fix temperature conversion
power_supply: bq27x00: remove unused struct fields
power_supply: bq27x00: remove double endian swap
da9030_battery: fix spelling in comment
wm97xx_battery: Clean up some warnings
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lrg/voltage-2.6: (27 commits)
Regulators: wm8400 - cleanup platform driver data handling
Regulators: wm8994 - clean up driver data after removal
Regulators: wm831x-xxx - clean up driver data after removal
Regulators: pcap-regulator - clean up driver data after removal
Regulators: max8660 - annotate probe and remove methods
Regulators: max1586 - annotate probe and remove methods
Regulators: lp3971 - fail if platform data was not supplied
Regulators: tps6507x-regulator - mark probe method as __devinit
Regulators: tps65023-regulator - mark probe method as __devinit
Regulators: twl-regulator - mark probe function as __devinit
Regulators: fixed - annotate probe and remove methods
Regulators: ab3100 - fix probe and remove annotations
Regulators: virtual - use sysfs attribute groups
twl6030: regulator: Configure STATE register instead of REMAP
regulator: Provide optional dummy regulator for consumers
regulator: Assume regulators are enabled if they don't report anything
regulator: Convert fixed voltage regulator to use enable_time()
regulator: Add WM8994 regulator support
regulator: enable max8649 regulator driver
regulator: trivial: fix typos in user-visible Kconfig text
...
* 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (151 commits)
vga_switcheroo: disable default y by new rules.
drm/nouveau: fix *staging* driver build with switcheroo off.
drm/radeon: fix typo in Makefile
vga_switcheroo: fix build on platforms with no ACPI
drm/radeon: Fix printf type warning in 64bit system.
drm/radeon/kms: bump the KMS version number for square tiling support.
vga_switcheroo: initial implementation (v15)
drm/radeon/kms: do not disable audio engine twice
Revert "drm/radeon/kms: disable HDMI audio for now on rv710/rv730"
drm/radeon/kms: do not preset audio stuff and start timer when not using audio
drm/radeon: r100/r200 ums: block ability for userspace app to trash 0 page and beyond
drm/ttm: fix function prototype to match implementation
drm/radeon: use ALIGN instead of open coding it
drm/radeon/kms: initialize set_surface_reg reg for rs600 asic
drm/i915: Use a dmi quirk to skip a broken SDVO TV output.
drm/i915: enable/disable LVDS port at DPMS time
drm/i915: check for multiple write domains in pin_and_relocate
drm/i915: clean-up i915_gem_flush_gpu_write_domain
drm/i915: reuse i915_gpu_idle helper
drm/i915: ensure lru ordering of fence_list
...
Fixed trivial conflicts in drivers/gpu/vga/Kconfig
Fix stop_machine_text_poke() to issue smp_mb() before exiting
waiting loop, and use cpu_relax() for waiting.
Changes in v2:
- Don't use ACCESS_ONCE().
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
Cc: Jason Baron <jbaron@redhat.com>
LKML-Reference: <20100304033850.3819.74590.stgit@localhost6.localdomain6>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Move @SRC right after FUNC in syntax according to syntax change
on command line help.
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: K.Prasad <prasad@linux.vnet.ibm.com>
LKML-Reference: <20100304033843.3819.10087.stgit@localhost6.localdomain6>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The slab tree adds a percpu variable usage case (commit
9dfc6e68bf "SLUB: Use this_cpu operations in
slub"), but the percpu tree removes the prefixing of percpu variables (commit
dd17c8f729 "percpu: remove per_cpu__ prefix"),
thus causing the following compilation error:
CC mm/slub.o
mm/slub.c: In function ‘alloc_kmem_cache_cpus’:
mm/slub.c:2078: error: implicit declaration of function ‘per_cpu_var’
mm/slub.c:2078: warning: assignment makes pointer from integer without a cast
make[1]: *** [mm/slub.o] Error 1
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
a) Fix sparse warning in ext4_ioctl()
b) Remove unneeded variable in mext_leaf_block()
c) Fix spelling typo in mext_check_arguments()
Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
If EXT4_IOC_MOVE_EXT ioctl is called with NULL donor_fd, fget() in
ext4_ioctl() gets inappropriate file structure for donor; so we need
to do this check earlier, before calling double_down_write_data_sem().
Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
If the leaf node has 2 extent space or fewer and EXT4_IOC_MOVE_EXT
ioctl is called with the file offset where after the 2nd extent
covers, mext_insert_across_blocks() always tries to insert extent into
the first extent. As a result, the file gets corrupted because of
wrong extent order. The patch fixes this problem.
Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The cpumask of the padata instance was used without allocated.
This caused boot crashes if CONFIG_CPUMASK_OFFSTACK is enabled.
This patch fixes this by doing proper allocation for this cpumask.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
There are duplicate macro definitions of in_range() in mballoc.h and
balloc.c. This consolidates these two definitions into ext4.h, and
changes extents.c to use in_range() as well.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger@sun.com>
More cleanup to convert open-coded calculations of the first block
number of a free extent to use ext4_grp_offs_to_block() instead.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger@sun.com>
This is a cleanup and simplification patch which takes some open-coded
calculations to calculate the first block number of a group and
converts them to use the (already defined) ext4_group_first_block_no()
function.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger@sun.com>
If the calling convention of ->timer_fn() and ->cleanup_fn() are unified
across hardware versions we can drop parameters to ioat_init_channel() and
unify ioat_is_dma_complete() implementations.
Both ->timer_fn() and ->cleanup_fn() are modified to expect a struct
dma_chan pointer.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The hardware automatically disables further interrupts after each event
until rearmed. This allows a delay to be injected between the occurence
of the interrupt and the running of the cleanup routine. The delay is
scaled by the descriptor backlog and then written to the INTRDELAY
register which specifies the number of microseconds to hold off
interrupt delivery after an interrupt event occurs. According to
powertop this reduces the interrupt rate from ~5000 intr/s to ~150
intr/s per without affecting throughput (simple dd to a raid6 array).
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Since ioat_cleanup_preamble() and the update of the last completed
descriptor are not synchronized there is a chance that two cleanup threads
can see descriptors to clean. If the first cleans up all pending
descriptors then the second will trigger the BUG_ON.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This makes sure that we pick the synchronous signals caused by a
processor fault over any pending regular asynchronous signals sent to
use by [t]kill().
This is not strictly required semantics, but it makes it _much_ easier
for programs like Wine that expect to find the fault information in the
signal stack.
Without this, if a non-synchronous signal gets picked first, the delayed
asynchronous signal will have its signal context pointing to the new
signal invocation, rather than the instruction that caused the SIGSEGV
or SIGBUS in the first place.
This is not all that pretty, and we're discussing making the synchronous
signals more explicit rather than have these kinds of implicit
preferences of SIGSEGV and friends. See for example
http://bugzilla.kernel.org/show_bug.cgi?id=15395
for some of the discussion. But in the meantime this is a simple and
fairly straightforward work-around, and the whole
if (x & Y)
x &= Y;
thing can be compiled into (and gcc does do it) just three instructions:
movq %rdx, %rax
andl $Y, %eax
cmovne %rax, %rdx
so it is at least a simple solution to a subtle issue.
Reported-and-tested-by: Pavel Vilim <wylda@volny.cz>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We forget to release page references we acquire in
ext4_da_block_invalidatepages. Luckily, this function gets called only if we
are not able to allocate blocks for delay-allocated data so that function
should better never be called.
Also cleanup handling of index variable.
Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
To avoid potential problems with an empty /dev open /dev/console
from rootfs instead of waiting to mount our root filesystem and
mounting it there. This effectively guarantees that there will
be a device node, and it won't be on a filesystem that we will
ever unmount, so there are no issues with leaving /dev/console
open and pinning the filesystem.
This is actually more effective than automatically mounting
devtmpfs on /dev because it removes removes the occasionally
problematic assumption that /dev/console exists from the boot
code.
With this patch I was able to throw busybox on my /boot partition
(which has no /dev directory) and boot into userspace without
problems.
The only possible negative consequence I can think of is that
someone out there deliberately used did not use a character device
that is major 5 minor 2 for /dev/console. Does anyone know of a
situation in which that could make sense?
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
... postponing assignments until they're needed. Doesn't change code size.
Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
It reduces code size:
text data bss dec hex filename
9925 72 16 10013 271d ipc/mqueue-BEFORE.o
9885 72 16 9973 26f5 ipc/mqueue-AFTER.o
Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Code size reduction:
text data bss dec hex filename
9941 72 16 10029 272d ipc/mqueue-BEFORE.o
9925 72 16 10013 271d ipc/mqueue-AFTER.o
Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
... and abort earlier if we couldn't allocate the message pointers array,
avoiding the u->mq_bytes accounting logic.
It reduces code size:
text data bss dec hex filename
9949 72 16 10037 2735 ipc/mqueue-BEFORE.o
9941 72 16 10029 272d ipc/mqueue-AFTER.o
Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
rehashing the negative placeholder opens a race with d_lookup();
we unhash it almost immediately (by d_move()), but the race
window is there. Since d_move() doesn't rely on target being
hashed, we don't need that d_rehash() at all.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Add a new UMOUNT_NOFOLLOW flag to umount(2). This is needed to prevent
symlink attacks in unprivileged unmounts (fuse, samba, ncpfs).
Additionally, return -EINVAL if an unknown flag is used (and specify
an explicitly unused flag: UMOUNT_UNUSED). This makes it possible for
the caller to determine if a flag is supported or not.
CC: Eugene Teo <eugene@redhat.com>
CC: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
It hadn't been needed since we'd sanitized the logics in
mark_mounts_for_expiry() (which, in turn, used to be a
rudiment of bad old times when namespace_sem was per-ns).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
passing *any* namespace root to __d_path() as root is equivalent
to just passing it {NULL, NULL}; no need to bother with finding
the root of our namespace in there.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>