a2a892a236
Implementation:
===============
The encrypt/decrypt code is based on an x86 implementation I did a while
ago which I never published. This unpublished implementation does
include an assembler based key schedule and precomputed tables. For
simplicity and best acceptance, however, I took Gladman's in-kernel code
for table generation and key schedule for the kernel port of my
assembler code and modified this code to produce the key schedule as
required by my assembler implementation. File locations and Kconfig are
kept similar to the i586 AES assembler implementation.

It may seem a little bit strange to use 32 bit I/O and registers in the
assembler implementation but this gives the best code size. My
implementation takes one instruction more per round compared to
Gladman's x86 assembler but it doesn't require any stack for local
variables or saved registers and it is less serialized than Gladman's
code.

Note that all comparisons to Gladman's code were done after my code was
implemented. I used only FIPS PUB 197 for the implementation, so my
implementation is independent work.

If anybody has a better assembler solution for x86_64 I'll be pleased to
have my code replaced with the better solution.

Testing:
========
The implementation passes the in-kernel crypto testing module and I'm
running it without any problems on my laptop where it is mainly used for
dm-crypt.

Microbenchmark:
===============
The microbenchmark was done in userspace with similar compile flags as
used during kernel compile.

Encrypt/decrypt is about 35% faster than the generic C implementation.
As the generic C as well as my assembler implementation are both table
based, I don't really expect that there is much room for further
improvements, though I'll be glad to be corrected here.

The key schedule is about 5% slower than the generic C implementation.
This is due to the fact that some more work has to be done in the key
schedule routine to fit the schedule to the assembler implementation.

Code Size:
==========
Encrypt and decrypt are together about 2.1 Kbytes smaller than the
generic C implementation, which is important with regard to L1 cache
usage. The key schedule routine is about 100 bytes larger than the
generic C implementation.

Data Size:
==========
There's no difference in data size requirements between the assembler
implementation and the generic C implementation.

License:
========
Gladman's code is dual BSD/GPL whereas my assembler code is GPLv2 only
(I'm not going to change the license for my code). So I had to change
the module license for the x86_64 aes module from 'Dual BSD/GPL' to
'GPL' to reflect the most restrictive license within the module.

Signed-off-by: Andreas Steinmetz <ast@domdv.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
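The commit message says the file locations are kept similar to the i586 AES assembler implementation, and the Makefile below pulls in arch/x86_64/crypto/. As a rough sketch only, the kbuild fragment in that directory might look like the following. The config symbol and the object names here are assumptions chosen for illustration; they are not quoted from this commit.

```makefile
# Hypothetical arch/x86_64/crypto/Makefile.
# CONFIG_CRYPTO_AES_X86_64 and all object names below are assumed.

# Build the module (or built-in object) only when the option is enabled.
obj-$(CONFIG_CRYPTO_AES_X86_64) += aes-x86_64.o

# Composite object: the assembler encrypt/decrypt routines plus the C file
# that carries table generation and the key schedule.
aes-x86_64-objs := aes-x86_64-asm.o aes.o
```

Splitting the module into an assembler object and a C object would match the description above, where only encrypt/decrypt are written in assembler while table generation and key schedule stay in C.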
122 lines
3.8 KiB
Makefile
#
# x86_64/Makefile
#
# This file is included by the global makefile so that you can add your own
# architecture-specific flags and dependencies. Remember to have actions
# for "archclean" and "archdep" for cleaning up and making dependencies for
# this architecture
#
# This file is subject to the terms and conditions of the GNU General Public
# License. See the file "COPYING" in the main directory of this archive
# for more details.
#
# Copyright (C) 1994 by Linus Torvalds
#
# 19990713 Artur Skawina <skawina@geocities.com>
# Added '-march' and '-mpreferred-stack-boundary' support
# 20000913 Pavel Machek <pavel@suse.cz>
# Converted for x86_64 architecture
# 20010105 Andi Kleen, add IA32 compiler.
# ....and later removed it again....
#
# $Id: Makefile,v 1.31 2002/03/22 15:56:07 ak Exp $

#
# early bootup linking needs 32bit. You can either use real 32bit tools
# here or 64bit tools in 32bit mode.
#
IA32_CC := $(CC) $(CPPFLAGS) -m32 -O2 -fomit-frame-pointer
IA32_LD := $(LD) -m elf_i386
IA32_AS := $(CC) $(AFLAGS) -m32 -Wa,--32 -traditional -c
IA32_OBJCOPY := $(CROSS_COMPILE)objcopy
IA32_CPP := $(CROSS_COMPILE)gcc -m32 -E
export IA32_CC IA32_LD IA32_AS IA32_OBJCOPY IA32_CPP


LDFLAGS := -m elf_x86_64
OBJCOPYFLAGS := -O binary -R .note -R .comment -S
LDFLAGS_vmlinux :=

CHECKFLAGS += -D__x86_64__ -m64

cflags-$(CONFIG_MK8) += $(call cc-option,-march=k8)
cflags-$(CONFIG_MPSC) += $(call cc-option,-march=nocona)
CFLAGS += $(cflags-y)

CFLAGS += -mno-red-zone
CFLAGS += -mcmodel=kernel
CFLAGS += -pipe
# this makes reading assembly source easier, but produces worse code
# actually it makes the kernel smaller too.
CFLAGS += -fno-reorder-blocks
CFLAGS += -Wno-sign-compare
ifneq ($(CONFIG_DEBUG_INFO),y)
CFLAGS += -fno-asynchronous-unwind-tables
# -fweb shrinks the kernel a bit, but the difference is very small
# it also messes up debugging, so don't use it for now.
#CFLAGS += $(call cc-option,-fweb)
endif
# -funit-at-a-time shrinks the kernel .text considerably
# unfortunately it makes reading oopses harder.
CFLAGS += $(call cc-option,-funit-at-a-time)
# prevent gcc from generating any FP code by mistake
CFLAGS += $(call cc-option,-mno-sse -mno-mmx -mno-sse2 -mno-3dnow,)

head-y := arch/x86_64/kernel/head.o arch/x86_64/kernel/head64.o arch/x86_64/kernel/init_task.o

libs-y += arch/x86_64/lib/
core-y += arch/x86_64/kernel/ \
	arch/x86_64/mm/ \
	arch/x86_64/crypto/
core-$(CONFIG_IA32_EMULATION) += arch/x86_64/ia32/
drivers-$(CONFIG_PCI) += arch/x86_64/pci/
drivers-$(CONFIG_OPROFILE) += arch/x86_64/oprofile/

boot := arch/x86_64/boot

.PHONY: bzImage bzlilo install archmrproper \
	fdimage fdimage144 fdimage288 archclean

#Default target when executing "make"
all: bzImage

BOOTIMAGE := arch/x86_64/boot/bzImage
KBUILD_IMAGE := $(BOOTIMAGE)

bzImage: vmlinux
	$(Q)$(MAKE) $(build)=$(boot) $(BOOTIMAGE)

bzlilo: vmlinux
	$(Q)$(MAKE) $(build)=$(boot) BOOTIMAGE=$(BOOTIMAGE) zlilo

bzdisk: vmlinux
	$(Q)$(MAKE) $(build)=$(boot) BOOTIMAGE=$(BOOTIMAGE) zdisk

install fdimage fdimage144 fdimage288: vmlinux
	$(Q)$(MAKE) $(build)=$(boot) BOOTIMAGE=$(BOOTIMAGE) $@

archclean:
	$(Q)$(MAKE) $(clean)=$(boot)

prepare: include/asm-$(ARCH)/offset.h

arch/$(ARCH)/kernel/asm-offsets.s: include/asm include/linux/version.h \
	include/config/MARKER

include/asm-$(ARCH)/offset.h: arch/$(ARCH)/kernel/asm-offsets.s
	$(call filechk,gen-asm-offsets)

CLEAN_FILES += include/asm-$(ARCH)/offset.h

define archhelp
  echo '* bzImage - Compressed kernel image (arch/$(ARCH)/boot/bzImage)'
  echo ' install - Install kernel using'
  echo ' (your) ~/bin/installkernel or'
  echo ' (distribution) /sbin/installkernel or'
  echo ' install to $$(INSTALL_PATH) and run lilo'
endef

CLEAN_FILES += arch/$(ARCH)/boot/fdimage arch/$(ARCH)/boot/mtools.conf
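
The `$(call cc-option,...)` lines in the Makefile above add a compiler flag only when the compiler accepts it, with an optional fallback as the second argument. The helper itself lives in the top-level kernel Makefile and is not shown here, so the following is a simplified, standalone sketch of the idea rather than the kernel's definition. The names `try-cc-option`, `DEMO_CFLAGS` and `show` are hypothetical.

```makefile
# Simplified stand-in for a cc-option style helper (hypothetical name).
# $(1) is the flag to probe; $(2) is the fallback used when $(CC) rejects $(1).
# The probe compiles an empty C input from /dev/null and discards the output.
try-cc-option = $(shell if $(CC) $(1) -S -o /dev/null -xc /dev/null >/dev/null 2>&1; then echo "$(1)"; else echo "$(2)"; fi)

# Adds -march=k8 only when the compiler understands it, otherwise nothing.
DEMO_CFLAGS += $(call try-cc-option,-march=k8)

# Falls back to -O2 if the (made-up) flag is rejected.
DEMO_CFLAGS += $(call try-cc-option,-fno-such-flag-assumed,-O2)

.PHONY: show
show:
	@echo "DEMO_CFLAGS = $(DEMO_CFLAGS)"
```

Saved as a standalone Makefile, `make show` prints the flags that survived the probe. Which flags survive depends on the installed compiler, so no fixed output is given here.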