kernel-ark/lib
Denis Vlasenko 4277eedd79 vsprintf.c: optimizing, part 2: base 10 conversion speedup, v2
Optimize integer-to-string conversion in vsprintf.c for base 10.  This is
by far the most used conversion, and in some use cases it impacts
performance.  For example, top reads /proc/$PID/stat for every process, and
with 4000 processes decimal conversion alone takes noticeable time.

Using code from

http://www.cs.uiowa.edu/~jones/bcd/decimal.html
(with permission from the author, Douglas W. Jones)

binary-to-decimal-string conversion is done in groups of five digits at
once, using only additions/subtractions/shifts (with -O2; -Os throws in
some multiply instructions).
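
To illustrate the grouping idea only, here is a minimal userspace sketch. It is not the code from this patch: it uses plain division and modulo rather than the add/subtract/shift sequences described above, and the names put_chunk_reversed() and u64_to_dec() are hypothetical. It peels off five decimal digits at a time so that the expensive 64-bit operations run once per group instead of once per digit.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: emit exactly `ndigits` decimal digits of `chunk`,
 * least significant digit first. Returns a pointer past the last digit. */
static char *put_chunk_reversed(char *buf, uint32_t chunk, int ndigits)
{
	while (ndigits-- > 0) {
		*buf++ = (char)('0' + chunk % 10);
		chunk /= 10;
	}
	return buf;
}

/* Hypothetical helper: convert `v` to a NUL-terminated decimal string.
 * `out` must hold at least 21 bytes (20 digits plus the terminator).
 * Returns the number of digits written. */
static size_t u64_to_dec(uint64_t v, char *out)
{
	char tmp[25];	/* digits are collected here in reverse order */
	char *p = tmp;
	size_t len = 0;
	uint32_t top;

	/* One 64-bit divide per group of five digits; the per-digit
	 * work inside a group is done on a 32-bit value. */
	while (v >= 100000) {
		uint32_t chunk = (uint32_t)(v % 100000);

		v /= 100000;
		p = put_chunk_reversed(p, chunk, 5);
	}

	/* Most significant group: no leading zeros, at least one digit. */
	top = (uint32_t)v;
	do {
		*p++ = (char)('0' + top % 10);
		top /= 10;
	} while (top != 0);

	/* Reverse into the caller's buffer. */
	while (p != tmp)
		out[len++] = *--p;
	out[len] = '\0';
	return len;
}
```

A caller would pass a buffer of at least 21 bytes, for example char buf[21]; u64_to_dec(4120, buf);. Collecting digits in reverse and copying them out at the end is a choice made for this sketch, not a claim about how vsprintf.c lays out its buffers.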

On i386 arch gcc 4.1.2 -O2 generates ~500 bytes of code.

This patch is run-tested. A userspace benchmark/test is also attached.
I tested it on a PIII and on AMD64, and the new code is generally ~2.5 times
faster. On AMD64:

# ./vsprintf_verify-O2
Original decimal conv: .......... 151 ns per iteration
Patched decimal conv:  .......... 62 ns per iteration
Testing correctness
12895992590592 ok...        [Ctrl-C]
# ./vsprintf_verify-O2
Original decimal conv: .......... 151 ns per iteration
Patched decimal conv:  .......... 62 ns per iteration
Testing correctness
26025406464 ok...        [Ctrl-C]

A more realistic test: top from the busybox project was modified to
report how many microseconds it took to scan /proc (this does not account
for any processing done after that, such as sorting the process list),
and then I tested it with 4000 processes:

#!/bin/sh
i=4000
while test $i != 0; do
    sleep 30 &
    let i--
done
busybox top -b -n3 >/dev/null

on unpatched kernel:

top: 4120 processes took 102864 microseconds to scan
top: 4120 processes took 91757 microseconds to scan
top: 4120 processes took 92517 microseconds to scan
top: 4120 processes took 92581 microseconds to scan

on patched kernel:

top: 4120 processes took 75460 microseconds to scan
top: 4120 processes took 66451 microseconds to scan
top: 4120 processes took 67267 microseconds to scan
top: 4120 processes took 67618 microseconds to scan

The speedup comes from much faster generation of /proc/PID/stat
by sprintf() calls inside the kernel.

Signed-off-by: Douglas W Jones <jones@cs.uiowa.edu>
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16 09:05:52 -07:00
lzo Add LZO1X algorithm to the kernel 2007-07-10 17:51:13 -07:00
reed_solomon [RSLIB] Support non-canonical GF representations 2007-05-02 11:56:33 +01:00
zlib_deflate
zlib_inflate Fix ppp_deflate issues with recent zlib_inflate changes 2007-05-07 12:13:04 -07:00
.gitignore
audit.c [PATCH] audit signal recipients 2007-05-11 05:38:25 -04:00
bitmap.c
bitrev.c
bug.c generic bug: use show_regs() instead of dump_stack() 2007-07-16 09:05:51 -07:00
bust_spinlocks.c
check_signature.c uninline check_signature() 2007-07-16 09:05:50 -07:00
cmdline.c
cpumask.c Safer nr_node_ids and nr_node_ids determination and initial values 2007-05-07 12:12:51 -07:00
crc16.c
crc32.c
crc32defs.h
crc-ccitt.c
crc-itu-t.c CRC ITU-T V.41 2007-05-10 18:24:13 +02:00
ctype.c
debug_locks.c
dec_and_lock.c
devres.c iomap: implement pcim_iounmap_regions() 2007-04-28 14:15:58 -04:00
div64.c [S390]: Fix build on 31-bit. 2007-04-25 22:28:53 -07:00
dump_stack.c
extable.c
fault-inject.c simplify the stacktrace code 2007-05-08 11:14:58 -07:00
find_next_bit.c
gen_crc32table.c
genalloc.c
halfmd4.c
hexdump.c hexdump: more output formatting 2007-06-08 17:23:34 -07:00
hweight.c
idr.c lib: add idr_remove_all 2007-07-16 09:05:34 -07:00
inflate.c [PATCH] x86-64: deflate inflate_dynamic too 2007-05-02 19:27:15 +02:00
int_sqrt.c
iomap_copy.c
iomap.c iomap: make the default iomap functions fail softer 2007-05-04 20:44:23 -07:00
ioremap.c Detach sched.h from mm.h 2007-05-21 09:18:19 -07:00
irq_regs.c
Kconfig Add LZO1X algorithm to the kernel 2007-07-10 17:51:13 -07:00
Kconfig.debug SLUB: support slub_debug on by default 2007-07-16 09:05:36 -07:00
kernel_lock.c
klist.c
kobject_uevent.c the overdue removal of the mount/umount uevents 2007-04-27 10:57:31 -07:00
kobject.c sysfs: make kobj point to sysfs_dirent instead of dentry 2007-07-11 16:09:08 -07:00
kref.c kref: fix CPU ordering with respect to krefs 2007-04-27 10:57:29 -07:00
libcrc32c.c
list_debug.c
locking-selftest-hardirq.h
locking-selftest-mutex.h
locking-selftest-rlock-hardirq.h
locking-selftest-rlock-softirq.h
locking-selftest-rlock.h
locking-selftest-rsem.h
locking-selftest-softirq.h
locking-selftest-spin-hardirq.h
locking-selftest-spin-softirq.h
locking-selftest-spin.h
locking-selftest-wlock-hardirq.h
locking-selftest-wlock-softirq.h
locking-selftest-wlock.h
locking-selftest-wsem.h
locking-selftest.c
Makefile uninline check_signature() 2007-07-16 09:05:50 -07:00
parser.c [AFS]: Make the match_*() functions take const options. 2007-05-03 03:10:39 -07:00
percpu_counter.c percpu_counters: use for_each_online_cpu() 2007-07-16 09:05:41 -07:00
plist.c
prio_tree.c
radix-tree.c [LIB]: export radix_tree_preload() 2007-07-14 16:05:04 +10:00
random32.c
rbtree.c
reciprocal_div.c
rwsem-spinlock.c
rwsem.c
semaphore-sleepers.c
sha1.c
smp_processor_id.c
sort.c
spinlock_debug.c
string.c [STRING]: Move strcasecmp/strncasecmp to lib/string.c 2007-04-26 01:54:39 -07:00
swiotlb.c fix section mismatch warning in lib/swiotlb.c 2007-05-08 11:14:59 -07:00
textsearch.c
ts_bm.c
ts_fsm.c
ts_kmp.c
vsprintf.c vsprintf.c: optimizing, part 2: base 10 conversion speedup, v2 2007-07-16 09:05:52 -07:00