Commit Graph

15064 Commits

Author SHA1 Message Date
Linus Torvalds
33f1de6931 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw:
  GFS2: Whitespace fixes
  GFS2: Remove unused sysfs file
  GFS2: Be extra careful about deallocating inodes
  GFS2: Remove no_formal_ino generating code
  GFS2: Rename eattr.[ch] as xattr.[ch]
  GFS2: Clean up of extended attribute support
  GFS2: Add explanation of extended attr on-disk format
  GFS2: Add "-o errors=panic|withdraw" mount options
  GFS2: jumping to wrong label?
  GFS2: free disk inode which is deleted by remote node -V2
  GFS2: Add a document explaining GFS2's uevents
  GFS2: Add sysfs link to device
  GFS2: Replace assertion with proper error handling
  GFS2: Improve error handling in inode allocation
  GFS2: Add some more info to uevents
  GFS2: Add online uevent to GFS2
2009-09-14 14:35:56 -07:00
Linus Torvalds
041d6d0be8 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6:
  udf: Fix possible corruption when close races with write
  udf: Perform preallocation only for regular files
  udf: Remove wrong assignment in udf_symlink
  udf: Remove dead code
2009-09-14 14:35:07 -07:00
Linus Torvalds
af8cb8aa38 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: (21 commits)
  fs/Kconfig: move nilfs2 outside misc filesystems
  nilfs2: convert nilfs_bmap_lookup to an inline function
  nilfs2: allow btree code to directly call dat operations
  nilfs2: add update functions of virtual block address to dat
  nilfs2: remove individual gfp constants for each metadata file
  nilfs2: stop zero-fill of btree path just before free it
  nilfs2: remove unused btree argument from btree functions
  nilfs2: remove nilfs_dat_abort_start and nilfs_dat_abort_free
  nilfs2: shorten freeze period due to GC in write operation v3
  nilfs2: add more check routines in mount process
  nilfs2: An unassigned variable is assigned to a never used structure member
  nilfs2: use GFP_NOIO for bio_alloc instead of GFP_NOWAIT
  nilfs2: stop using periodic write_super callback
  nilfs2: clean up nilfs_write_super
  nilfs2: fix disorder of nilfs_write_super in nilfs_sync_fs
  nilfs2: remove redundant super block commit
  nilfs2: implement nilfs_show_options to display mount options in /proc/mounts
  nilfs2: always lookup disk block address before reading metadata block
  nilfs2: use semaphore to protect pointer to a writable FS-instance
  nilfs2: fix format string compile warning (ino_t)
  ...
2009-09-14 14:34:33 -07:00
Linus Torvalds
6cdb5930a6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  cifs: consolidate reconnect logic in smb_init routines
  cifs: Replace wrtPending with a real reference count
  cifs: protect GlobalOplock_Q with its own spinlock
  cifs: use tcon pointer in cifs_show_options
  cifs: send IPv6 addr in upcall with colon delimiters
  [CIFS] Fix checkpatch warnings
  PATCH] cifs: fix broken mounts when a SSH tunnel is used (try #4)
  [CIFS] Memory leak in ntlmv2 hash calculation
  [CIFS] potential NULL dereference in parse_DFS_referrals()
2009-09-14 14:33:13 -07:00
Linus Torvalds
d7e9660ad9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1623 commits)
  netxen: update copyright
  netxen: fix tx timeout recovery
  netxen: fix file firmware leak
  netxen: improve pci memory access
  netxen: change firmware write size
  tg3: Fix return ring size breakage
  netxen: build fix for INET=n
  cdc-phonet: autoconfigure Phonet address
  Phonet: back-end for autoconfigured addresses
  Phonet: fix netlink address dump error handling
  ipv6: Add IFA_F_DADFAILED flag
  net: Add DEVTYPE support for Ethernet based devices
  mv643xx_eth.c: remove unused txq_set_wrr()
  ucc_geth: Fix hangs after switching from full to half duplex
  ucc_geth: Rearrange some code to avoid forward declarations
  phy/marvell: Make non-aneg speed/duplex forcing work for 88E1111 PHYs
  drivers/net/phy: introduce missing kfree
  drivers/net/wan: introduce missing kfree
  net: force bridge module(s) to be GPL
  Subject: [PATCH] appletalk: Fix skb leak when ipddp interface is not loaded
  ...

Fixed up trivial conflicts:

 - arch/x86/include/asm/socket.h

   converted to <asm-generic/socket.h> in the x86 tree.  The generic
   header has the same new #define's, so that works out fine.

 - drivers/net/tun.c

   fix conflict between 89f56d1e9 ("tun: reuse struct sock fields") that
   switched over to using 'tun->socket.sk' instead of the redundantly
   available (and thus removed) 'tun->sk', and 2b980dbd ("lsm: Add hooks
   to the TUN driver") which added a new 'tun->sk' use.

   Noted in 'next' by Stephen Rothwell.
2009-09-14 10:37:28 -07:00
Jan Kara
cbc8cc3352 udf: Fix possible corruption when close races with write
When we close a file, we remove preallocated blocks from it. But this
truncation was not protected by i_mutex and thus it could have raced with a
write through a different fd and cause crashes or even filesystem corruption.

Signed-off-by: Jan Kara <jack@suse.cz>
2009-09-14 19:13:01 +02:00
Jan Kara
81056dd044 udf: Perform preallocation only for regular files
So far we preallocated blocks also for directories but that brings a
problem, when to get rid of preallocated blocks we don't need. So far
we removed them in udf_clear_inode() which has a disadvantage that
1) blocks are unavailable long after writing to a directory finished
   and thus one can get out of space unnecessarily early
2) releasing blocks from udf_clear_inode is problematic because VFS
   does not expect us to redirty inode there and it also slows down
   memory reclaim.

So preallocate blocks only for regular files where we can drop preallocation
in udf_release_file.

Signed-off-by: Jan Kara <jack@suse.cz>
2009-09-14 19:13:00 +02:00
Jan Kara
7c6e3d1aae udf: Remove wrong assignment in udf_symlink
Recomputation of the pointer was wrong (it should have been just increment).
Luckily, we never use the computed value. Remove it.

Signed-off-by: Jan Kara <jack@suse.cz>
2009-09-14 19:13:00 +02:00
Jan Kara
5891d9dd2a udf: Remove dead code
Remove code that gets never used.

Signed-off-by: Jan Kara <jack@suse.cz>
2009-09-14 19:13:00 +02:00
Ryusuke Konishi
41f4db0f48 fs/Kconfig: move nilfs2 outside misc filesystems
Some people asked me questions like the following:

On Wed, 15 Jul 2009 13:11:21 +0200, Leon Woestenberg wrote:
> just wondering, any reasons why NILFS2 is one of the miscellaneous
> filesystems and, for example, btrfs, is not in Kconfig?

Actually, nilfs is NOT a filesystem came from other operating systems,
but a filesystem created purely for Linux.  Nor is it a flash
filesystem but that for generic block devices.

So, this moves nilfs outside the misc category as I responded in LKML
"Re: Why does NILFS2 hide under Miscellaneous filesystems?"
(Message-Id: <20090716.002526.93465395.ryusuke@osrg.net>).

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14 18:27:16 +09:00
Ryusuke Konishi
0f3fe33b39 nilfs2: convert nilfs_bmap_lookup to an inline function
The nilfs_bmap_lookup() is now a wrapper function of
nilfs_bmap_lookup_at_level().

This moves the nilfs_bmap_lookup() to a header file converting it to
an inline function and gives an opportunity for optimization.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14 18:27:16 +09:00
Ryusuke Konishi
2e0c2c7392 nilfs2: allow btree code to directly call dat operations
The current btree code is written so that btree functions call dat
operations via wrapper functions in bmap.c when they allocate, free,
or modify virtual block addresses.

This abstraction requires additional function calls and causes
frequent call of nilfs_bmap_get_dat() function since it is used in the
every wrapper function.

This removes the wrapper functions and makes them available from
btree.c and direct.c, which will increase the opportunity of
compiler optimization.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14 18:27:16 +09:00
Ryusuke Konishi
bd8169efae nilfs2: add update functions of virtual block address to dat
This is a preparation for the successive cleanup ("nilfs2: allow btree
to directly call dat operations").

This adds functions bundling a few operations to change an entry of
virtual block address on the dat file.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14 18:27:15 +09:00
Ryusuke Konishi
7a102b0923 nilfs2: remove individual gfp constants for each metadata file
This gets rid of NILFS_CPFILE_GFP, NILFS_SUFILE_GFP, NILFS_DAT_GFP,
and NILFS_IFILE_GFP.  All of these constants refer to NILFS_MDT_GFP,
and can be removed.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14 18:27:15 +09:00
Ryusuke Konishi
3218929dbd nilfs2: stop zero-fill of btree path just before free it
The btree path object is cleared just before it is freed.

This will remove the code doing the unnecessary clear operation.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14 18:27:15 +09:00
Ryusuke Konishi
6d28f7ea43 nilfs2: remove unused btree argument from btree functions
Even though many btree functions take a btree object as their first
argument, most of them are not used in their functions.

This sticky use of the btree argument is hurting code readability and
giving the possibility of inefficient code generation.

So, this removes the unnecessary btree arguments.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14 18:27:15 +09:00
Ryusuke Konishi
9ead986373 nilfs2: remove nilfs_dat_abort_start and nilfs_dat_abort_free
These functions are not called from any functions.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14 18:27:15 +09:00
Jiro SEKIBA
1cf58fa840 nilfs2: shorten freeze period due to GC in write operation v3
This is a re-revised patch to shorten freeze period.
This version include a fix of the bug Konishi-san mentioned last time.

When GC is runnning, GC moves live block to difference segments.
Copying live blocks into memory is done in a transaction,
however it is not necessarily to be in the transaction.
This patch will get the nilfs_ioctl_move_blocks() out from
transaction lock and put it before the transaction.

I ran sysbench fileio test against nilfs partition.
I copied some DVD/CD images and created snapshot to create live blocks
before starting the benchmark.

Followings are summary of rc8 and rc8 w/ the patch of per-request
statistics, which is min/max and avg.  I ran each test three times and
bellow is average of those numers.

According to this benchmark result, average time is slightly degrated.
However, worstcase (max) result is significantly improved.
This can address a few seconds write freeze.

- random write per-request performance of rc8
 min   0.843ms
 max 680.406ms
 avg   3.050ms
- random write per-request performance of rc8 w/ this patch
 min   0.843ms -> 100.00%
 max 380.490ms ->  55.90%
 avg   3.233ms -> 106.00%

- sequential write per-request performance of rc8
 min   0.736ms
 max 774.343ms
 avg   2.883ms
- sequential write per-request performance of rc8 w/ this patch
 min   0.720ms ->  97.80%
 max  644.280ms->  83.20%
 avg   3.130ms -> 108.50%

-----8<-----8<-----nilfs_cleanerd.conf-----8<-----8<-----
protection_period       150
selection_policy        timestamp       # timestamp in ascend order
nsegments_per_clean     2
cleaning_interval       2
retry_interval          60
use_mmap
log_priority            info
-----8<-----8<-----nilfs_cleanerd.conf-----8<-----8<-----

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14 18:27:15 +09:00
Zhu Yanhai
43be0ec038 nilfs2: add more check routines in mount process
nilfs2: Add more safeguard routines and protections in mount process,
which also makes nilfs2 report consistency error messages when
checkpoint number is invalid.

Signed-off-by: Zhu Yanhai <zhu.yanhai@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14 18:27:14 +09:00
Zhang Qiang
a4f0b9c5b4 nilfs2: An unassigned variable is assigned to a never used structure member
nilfs2: In procedure 'nilfs_get_sb()', when a nilfs filesysttem is
mounted for the first time, local variable 'nilfs->ns_last_cno' is
used before loading the latest checkpoint number from disk (in
'nilfs_fill_super'). 'nilfs->ns_last_cno' is assigned to 'sd.cno', but
'sd.cno' has never been used in the procedure.

Signed-off-by: Zhang Qiang <zhangqiang.buaa@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14 18:27:14 +09:00
Ryusuke Konishi
c1b353f04a nilfs2: use GFP_NOIO for bio_alloc instead of GFP_NOWAIT
Alberto Bertogli advised me about bio_alloc() use in nilfs:
On Sat, 13 Jun 2009 22:52:40 -0300, Alberto Bertogli wrote:
> By the way, those bio_alloc()s are using GFP_NOWAIT but it looks
> like they could use at least GFP_NOIO or GFP_NOFS, since the caller
> can (and sometimes do) sleep. The only caller is nilfs_submit_bh(),
> which calls nilfs_submit_seg_bio() which can sleep calling
> wait_for_completion().

This takes in the comment and replaces the use of GFP_NOWAIT flag with
GFP_NOIO.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14 18:27:14 +09:00
Jiro SEKIBA
1dfa27105a nilfs2: stop using periodic write_super callback
This removes nilfs_write_super and commit super block in nilfs
internal thread, instead of periodic write_super callback.

VFS layer calls ->write_super callback periodically.  However,
it looks like that calling back is ommited when disk I/O is busy.
And when cleanerd (nilfs GC) is runnig, disk I/O tend to be busy thus
nilfs superblock is not synchronized as nilfs designed.

To avoid it, syncing superblock by nilfs thread instead of pdflush.

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14 18:27:14 +09:00
Jiro SEKIBA
79efdd9411 nilfs2: clean up nilfs_write_super
Separate conditions that check if syncing super block and alternative
super block are required as inline functions to reuse the conditions.

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14 18:27:14 +09:00
Jiro SEKIBA
6233caa9d5 nilfs2: fix disorder of nilfs_write_super in nilfs_sync_fs
This fixes disorder of nilfs_write_super in nilfs_sync_fs.  Commiting
super block must be the end of the function so that every changes are
reflected.

->sync_fs() is not called frequently so this makes nilfs_sync_fs call
nilfs_commit_super instead of nilfs_write_super.

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14 18:27:14 +09:00
Jiro SEKIBA
ec5d66abdb nilfs2: remove redundant super block commit
This removes redundant super block commit.

nilfs_write_super will call nilfs_commit_super to store super block
into block device.  However, nilfs_put_super will call
nilfs_commit_super right after calling nilfs_write_super.  So calling
nilfs_write_super in nilfs_put_super would be redundant.

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14 18:27:13 +09:00
Jiro SEKIBA
b58a285ba4 nilfs2: implement nilfs_show_options to display mount options in /proc/mounts
This is a patch to display mount options in procfs.
Mount options will show up in the /proc/mounts as other fs does.

...
/dev/sda6 /mnt nilfs2 ro,relatime,barrier=off,cp=3,order=strict 0 0
...

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14 18:27:13 +09:00
Ryusuke Konishi
1435110467 nilfs2: always lookup disk block address before reading metadata block
The current metadata file code skips disk address lookup for its data
block if the buffer has a mapped flag.

This has a potential risk to cause read request to be performed
against the stale block address that GC moved, and it may lead to meta
data corruption.  The mapped flag is safe if the buffer has an
uptodate flag, otherwise it may prevent necessary update of disk
address in the next read.

This will avoid the potential problem by ensuring disk address lookup
before reading metadata block even for buffers with the mapped flag.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14 18:27:13 +09:00
Ryusuke Konishi
027d6404eb nilfs2: use semaphore to protect pointer to a writable FS-instance
will get rid of nilfs_get_writer() and nilfs_put_writer() pair used to
retain a writable FS-instance for a period.

The pair functions were making up some kind of recursive lock with a
mutex, but they became overkill since the commit
201913ed74.  Furthermore, they caused
the following lockdep warning because the mutex can be released by a
task which didn't lock it:

 =====================================
 [ BUG: bad unlock balance detected! ]
 -------------------------------------
 kswapd0/422 is trying to release lock (&nilfs->ns_writer_mutex) at:
 [<c1359ff5>] mutex_unlock+0x8/0xa
 but there are no more locks to release!

 other info that might help us debug this:
 no locks held by kswapd0/422.

 stack backtrace:
 Pid: 422, comm: kswapd0 Not tainted 2.6.31-rc4-nilfs #51
 Call Trace:
  [<c1358f97>] ? printk+0xf/0x18
  [<c104fea7>] print_unlock_inbalance_bug+0xcc/0xd7
  [<c11578de>] ? prop_put_global+0x3/0x35
  [<c1050195>] lock_release+0xed/0x1dc
  [<c1359ff5>] ? mutex_unlock+0x8/0xa
  [<c1359f83>] __mutex_unlock_slowpath+0xaf/0x119
  [<c1359ff5>] mutex_unlock+0x8/0xa
  [<d1284add>] nilfs_mdt_write_page+0xd8/0xe1 [nilfs2]
  [<c1092653>] shrink_page_list+0x379/0x68d
  [<c109171b>] ? isolate_pages_global+0xb4/0x18c
  [<c1092bd2>] shrink_list+0x26b/0x54b
  [<c10930be>] shrink_zone+0x20c/0x2a2
  [<c10936b7>] kswapd+0x407/0x591
  [<c1091667>] ? isolate_pages_global+0x0/0x18c
  [<c1040603>] ? autoremove_wake_function+0x0/0x33
  [<c10932b0>] ? kswapd+0x0/0x591
  [<c104033b>] kthread+0x69/0x6e
  [<c10402d2>] ? kthread+0x0/0x6e
  [<c1003e33>] kernel_thread_helper+0x7/0x1a

This patch uses a reader/writer semaphore instead of the own lock and
kills this warning.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14 18:27:13 +09:00
Heiko Carstens
b5696e5e0d nilfs2: fix format string compile warning (ino_t)
Unlike on most other architectures ino_t is an unsigned int on s390.
So add an explicit cast to avoid this compile warning:

fs/nilfs2/recovery.c: In function 'recover_dsync_blocks':
fs/nilfs2/recovery.c:555: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'ino_t'

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14 18:27:13 +09:00
Ryusuke Konishi
1b2f5a641b nilfs2: fix ignored error code in __nilfs_read_inode()
The __nilfs_read_inode function is ignoring the error code returned
from nilfs_read_inode_common(), and wrongly delivers a success code
(zero) when it escapes from the function in erroneous cases.

This adds the missing error handling.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14 18:27:12 +09:00
Steven Whitehouse
86d0063656 GFS2: Whitespace fixes
Reported-by: Daniel Walker <dwalker@fifo99.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-09-14 09:50:57 +01:00
Linus Torvalds
86d710146f Merge git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (87 commits)
  NFSv4: Disallow 'mount -t nfs4 -overs=2' and 'mount -t nfs4 -overs=3'
  NFS: Allow the "nfs" file system type to support NFSv4
  NFS: Move details of nfs4_get_sb() to a helper
  NFS: Refactor NFSv4 text-based mount option validation
  NFS: Mount option parser should detect missing "port="
  NFS: out of date comment regarding O_EXCL above nfs3_proc_create()
  NFS: Handle a zero-length auth flavor list
  SUNRPC: Ensure that sunrpc gets initialised before nfs, lockd, etc...
  nfs: fix compile error in rpc_pipefs.h
  nfs: Remove reference to generic_osync_inode from a comment
  SUNRPC: cache must take a reference to the cache detail's module on open()
  NFS: Use the DNS resolver in the mount code.
  NFS: Add a dns resolver for use with NFSv4 referrals and migration
  SUNRPC: Fix a typo in cache_pipefs_files
  nfs: nfs4xdr: optimize low level decoding
  nfs: nfs4xdr: get rid of READ_BUF
  nfs: nfs4xdr: simplify decode_exchange_id by reusing decode_opaque_inline
  nfs: nfs4xdr: get rid of COPYMEM
  nfs: nfs4xdr: introduce decode_sessionid helper
  nfs: nfs4xdr: introduce decode_verifier helper
  ...
2009-09-11 16:39:11 -07:00
Linus Torvalds
774a694f8c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (64 commits)
  sched: Fix sched::sched_stat_wait tracepoint field
  sched: Disable NEW_FAIR_SLEEPERS for now
  sched: Keep kthreads at default priority
  sched: Re-tune the scheduler latency defaults to decrease worst-case latencies
  sched: Turn off child_runs_first
  sched: Ensure that a child can't gain time over it's parent after fork()
  sched: enable SD_WAKE_IDLE
  sched: Deal with low-load in wake_affine()
  sched: Remove short cut from select_task_rq_fair()
  sched: Turn on SD_BALANCE_NEWIDLE
  sched: Clean up topology.h
  sched: Fix dynamic power-balancing crash
  sched: Remove reciprocal for cpu_power
  sched: Try to deal with low capacity, fix update_sd_power_savings_stats()
  sched: Try to deal with low capacity
  sched: Scale down cpu_power due to RT tasks
  sched: Implement dynamic cpu_power
  sched: Add smt_gain
  sched: Update the cpu_power sum during load-balance
  sched: Add SD_PREFER_SIBLING
  ...
2009-09-11 13:23:18 -07:00
Trond Myklebust
ab3bbaa8b2 Merge branch 'nfs-for-2.6.32' 2009-09-11 14:59:37 -04:00
Linus Torvalds
a9c86d4259 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6: (377 commits)
  ASoC: au1x: PSC-AC97 bugfixes
  ALSA: dummy - Increase MAX_PCM_SUBSTREAMS to 128
  ALSA: dummy - Add debug proc file
  ALSA: Add const prefix to proc helper functions
  ALSA: Re-export snd_pcm_format_name() function
  ALSA: hda - Use auto model for HP laptops with ALC268 codec
  ALSA: cs46xx - Fix minimum period size
  ASoC: Fix WM835x Out4 capture enumeration
  ALSA: Remove unneeded ifdef from sound/core.h
  ALSA: Remove struct snd_monitor_file from public sound/core.h
  ASoC: Remove unuused hw_read_t
  sound: oxygen: work around MCE when changing volume
  ALSA: dummy - Fake buffer allocations
  ALSA: hda/realtek: Added support for CLEVO M540R subsystem, 6 channel + digital
  ASoC: fix pxa2xx-ac97.c breakage
  ALSA: dummy - Fix the timer calculation in systimer mode
  ALSA: dummy - Add more description
  ALSA: dummy - Better jiffies handling
  ALSA: dummy - Support high-res timer mode
  ALSA: Release v1.0.21
  ...
2009-09-11 09:19:35 -07:00
Linus Torvalds
a12e4d304c Merge branch 'writeback' of git://git.kernel.dk/linux-2.6-block
* 'writeback' of git://git.kernel.dk/linux-2.6-block:
  writeback: check for registered bdi in flusher add and inode dirty
  writeback: add name to backing_dev_info
  writeback: add some debug inode list counters to bdi stats
  writeback: get rid of pdflush completely
  writeback: switch to per-bdi threads for flushing data
  writeback: move dirty inodes from super_block to backing_dev_info
  writeback: get rid of generic_sync_sb_inodes() export
2009-09-11 09:17:05 -07:00
Linus Torvalds
f6f7919086 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (57 commits)
  binfmt_elf: fix PT_INTERP bss handling
  TPM: Fixup boot probe timeout for tpm_tis driver
  sysfs: Add labeling support for sysfs
  LSM/SELinux: inode_{get,set,notify}secctx hooks to access LSM security context information.
  VFS: Factor out part of vfs_setxattr so it can be called from the SELinux hook for inode_setsecctx.
  KEYS: Add missing linux/tracehook.h #inclusions
  KEYS: Fix default security_session_to_parent()
  Security/SELinux: includecheck fix kernel/sysctl.c
  KEYS: security_cred_alloc_blank() should return int under all circumstances
  IMA: open new file for read
  KEYS: Add a keyctl to install a process's session keyring on its parent [try #6]
  KEYS: Extend TIF_NOTIFY_RESUME to (almost) all architectures [try #6]
  KEYS: Do some whitespace cleanups [try #6]
  KEYS: Make /proc/keys use keyid not numread as file position [try #6]
  KEYS: Add garbage collection for dead, revoked and expired keys. [try #6]
  KEYS: Flag dead keys to induce EKEYREVOKED [try #6]
  KEYS: Allow keyctl_revoke() on keys that have SETATTR but not WRITE perm [try #6]
  KEYS: Deal with dead-type keys appropriately [try #6]
  CRED: Add some configurable debugging [try #6]
  selinux: Support for the new TUN LSM hooks
  ...
2009-09-11 08:55:49 -07:00
Jens Axboe
500b067c5e writeback: check for registered bdi in flusher add and inode dirty
Also a debugging aid. We want to catch dirty inodes being added to
backing devices that don't do writeback.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11 09:20:26 +02:00
Jens Axboe
d993831fa7 writeback: add name to backing_dev_info
This enables us to track who does what and print info. Its main use
is catching dirty inodes on the default_backing_dev_info, so we can
fix that up.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11 09:20:26 +02:00
Jens Axboe
d0bceac747 writeback: get rid of pdflush completely
It is now unused, so kill it off.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11 09:20:25 +02:00
Jens Axboe
03ba3782e8 writeback: switch to per-bdi threads for flushing data
This gets rid of pdflush for bdi writeout and kupdated style cleaning.
pdflush writeout suffers from lack of locality and also requires more
threads to handle the same workload, since it has to work in a
non-blocking fashion against each queue. This also introduces lumpy
behaviour and potential request starvation, since pdflush can be starved
for queue access if others are accessing it. A sample ffsb workload that
does random writes to files is about 8% faster here on a simple SATA drive
during the benchmark phase. File layout also seems a LOT more smooth in
vmstat:

 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 0  1      0 608848   2652 375372    0    0     0 71024  604    24  1 10 48 42
 0  1      0 549644   2712 433736    0    0     0 60692  505    27  1  8 48 44
 1  0      0 476928   2784 505192    0    0     4 29540  553    24  0  9 53 37
 0  1      0 457972   2808 524008    0    0     0 54876  331    16  0  4 38 58
 0  1      0 366128   2928 614284    0    0     4 92168  710    58  0 13 53 34
 0  1      0 295092   3000 684140    0    0     0 62924  572    23  0  9 53 37
 0  1      0 236592   3064 741704    0    0     4 58256  523    17  0  8 48 44
 0  1      0 165608   3132 811464    0    0     0 57460  560    21  0  8 54 38
 0  1      0 102952   3200 873164    0    0     4 74748  540    29  1 10 48 41
 0  1      0  48604   3252 926472    0    0     0 53248  469    29  0  7 47 45

where vanilla tends to fluctuate a lot in the creation phase:

 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 1  1      0 678716   5792 303380    0    0     0 74064  565    50  1 11 52 36
 1  0      0 662488   5864 319396    0    0     4   352  302   329  0  2 47 51
 0  1      0 599312   5924 381468    0    0     0 78164  516    55  0  9 51 40
 0  1      0 519952   6008 459516    0    0     4 78156  622    56  1 11 52 37
 1  1      0 436640   6092 541632    0    0     0 82244  622    54  0 11 48 41
 0  1      0 436640   6092 541660    0    0     0     8  152    39  0  0 51 49
 0  1      0 332224   6200 644252    0    0     4 102800  728    46  1 13 49 36
 1  0      0 274492   6260 701056    0    0     4 12328  459    49  0  7 50 43
 0  1      0 211220   6324 763356    0    0     0 106940  515    37  1 10 51 39
 1  0      0 160412   6376 813468    0    0     0  8224  415    43  0  6 49 45
 1  1      0  85980   6452 886556    0    0     4 113516  575    39  1 11 54 34
 0  2      0  85968   6452 886620    0    0     0  1640  158   211  0  0 46 54

A 10 disk test with btrfs performs 26% faster with per-bdi flushing. A
SSD based writeback test on XFS performs over 20% better as well, with
the throughput being very stable around 1GB/sec, where pdflush only
manages 750MB/sec and fluctuates wildly while doing so. Random buffered
writes to many files behave a lot better as well, as does random mmap'ed
writes.

A separate thread is added to sync the super blocks. In the long term,
adding sync_supers_bdi() functionality could get rid of this thread again.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11 09:20:25 +02:00
Jens Axboe
66f3b8e2e1 writeback: move dirty inodes from super_block to backing_dev_info
This is a first step at introducing per-bdi flusher threads. We should
have no change in behaviour, although sb_has_dirty_inodes() is now
ridiculously expensive, as there's no easy way to answer that question.
Not a huge problem, since it'll be deleted in subsequent patches.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11 09:20:25 +02:00
Jens Axboe
d8a8559cd7 writeback: get rid of generic_sync_sb_inodes() export
This adds two new exported functions:

- writeback_inodes_sb(), which only attempts to writeback dirty inodes on
  this super_block, for WB_SYNC_NONE writeout.
- sync_inodes_sb(), which writes out all dirty inodes on this super_block
  and also waits for the IO to complete.

Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11 09:20:25 +02:00
James Morris
a3c8b97396 Merge branch 'next' into for-linus 2009-09-11 08:04:49 +10:00
Takashi Iwai
3827119e20 Merge branch 'topic/soundcore-preclaim' into for-linus
* topic/soundcore-preclaim:
  sound: make OSS device number claiming optional and schedule its removal
  sound: request char-major-* module aliases for missing OSS devices
  chrdev: implement __[un]register_chrdev()
2009-09-10 15:33:04 +02:00
Roland McGrath
9f0ab4a3f0 binfmt_elf: fix PT_INTERP bss handling
In fs/binfmt_elf.c, load_elf_interp() calls padzero() for .bss even if
the PT_LOAD has no PROT_WRITE and no .bss.  This generates EFAULT.

Here is a small test case.  (Yes, there are other, useful PT_INTERP
which have only .text and no .data/.bss.)

	----- ptinterp.S
	_start: .globl _start
		 nop
		 int3
	-----
	$ gcc -m32 -nostartfiles -nostdlib -o ptinterp ptinterp.S
	$ gcc -m32 -Wl,--dynamic-linker=ptinterp -o hello hello.c
	$ ./hello
	Segmentation fault  # during execve() itself

	After applying the patch:
	$ ./hello
	Trace trap  # user-mode execution after execve() finishes

If the ELF headers are actually self-inconsistent, then dying is fine.
But having no PROT_WRITE segment is perfectly normal and correct if
there is no segment with p_memsz > p_filesz (i.e. bss).  John Reiser
suggested checking for PROT_WRITE in the bss logic.  I think it makes
most sense to simply apply the bss logic only when there is bss.

This patch looks less trivial than it is due to some reindentation.
It just moves the "if (last_bss > elf_bss) {" test up to include the
partial-page bss logic as well as the more-pages bss logic.

Reported-by: John Reiser <jreiser@bitwagon.com>
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
2009-09-10 20:11:12 +10:00
Linus Torvalds
526b678093 Merge branch 'lookup-permissions-cleanup'
* lookup-permissions-cleanup:
  jffs2/jfs/xfs: switch over to 'check_acl' rather than 'permission()'
  ext[234]: move over to 'check_acl' permission model
  shmfs: use 'check_acl' instead of 'permission'
  Make 'check_acl()' a first-class filesystem op
  Simplify exec_permission_lite(), part 3
  Simplify exec_permission_lite() further
  Simplify exec_permission_lite() logic
  Do not call 'ima_path_check()' for each path component
2009-09-09 20:04:54 -07:00
Roland McGrath
752015d1b0 binfmt_elf: fix PT_INTERP bss handling
In fs/binfmt_elf.c, load_elf_interp() calls padzero() for .bss even if
the PT_LOAD has no PROT_WRITE and no .bss.  This generates EFAULT.

Here is a small test case.  (Yes, there are other, useful PT_INTERP
which have only .text and no .data/.bss.)

	----- ptinterp.S
	_start: .globl _start
		 nop
		 int3
	-----
	$ gcc -m32 -nostartfiles -nostdlib -o ptinterp ptinterp.S
	$ gcc -m32 -Wl,--dynamic-linker=ptinterp -o hello hello.c
	$ ./hello
	Segmentation fault  # during execve() itself

	After applying the patch:
	$ ./hello
	Trace trap  # user-mode execution after execve() finishes

If the ELF headers are actually self-inconsistent, then dying is fine.
But having no PROT_WRITE segment is perfectly normal and correct if
there is no segment with p_memsz > p_filesz (i.e. bss).  John Reiser
suggested checking for PROT_WRITE in the bss logic.  I think it makes
most sense to simply apply the bss logic only when there is bss.

This patch looks less trivial than it is due to some reindentation.
It just moves the "if (last_bss > elf_bss) {" test up to include the
partial-page bss logic as well as the more-pages bss logic.

Reported-by: John Reiser <jreiser@bitwagon.com>
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-09 20:03:47 -07:00
David P. Quigley
ddd29ec659 sysfs: Add labeling support for sysfs
This patch adds a setxattr handler to the file, directory, and symlink
inode_operations structures for sysfs. The patch uses hooks introduced in the
previous patch to handle the getting and setting of security information for
the sysfs inodes. As was suggested by Eric Biederman the struct iattr in the
sysfs_dirent structure has been replaced by a structure which contains the
iattr, secdata and secdata length to allow the changes to persist in the event
that the inode representing the sysfs_dirent is evicted. Because sysfs only
stores this information when a change is made all the optional data is moved
into one dynamically allocated field.

This patch addresses an issue where SELinux was denying virtd access to the PCI
configuration entries in sysfs. The lack of setxattr handlers for sysfs
required that a single label be assigned to all entries in sysfs. Granting virtd
access to every entry in sysfs is not an acceptable solution so fine grained
labeling of sysfs is required such that individual entries can be labeled
appropriately.

[sds:  Fixed compile-time warnings, coding style, and setting of inode security init flags.]

Signed-off-by: David P. Quigley <dpquigl@tycho.nsa.gov>
Signed-off-by: Stephen D. Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
2009-09-10 10:11:29 +10:00
David P. Quigley
b1ab7e4b2a VFS: Factor out part of vfs_setxattr so it can be called from the SELinux hook for inode_setsecctx.
This factors out the part of the vfs_setxattr function that performs the
setting of the xattr and its notification. This is needed so the SELinux
implementation of inode_setsecctx can handle the setting of the xattr while
maintaining the proper separation of layers.

Signed-off-by: David P. Quigley <dpquigl@tycho.nsa.gov>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
2009-09-10 10:11:22 +10:00