Commit graph

6833 commits

Author SHA1 Message Date
Andrew Morton
9316fcacb8 kernel/up.c: omit it if SMP=y, USE_GENERIC_SMP_HELPERS=n
Fix the sparc build - we were including `up.o' on SMP builds, when
CONFIG_USE_GENERIC_SMP_HELPERS=n.

Tested-by: Robert Reif <reif@earthlink.net>
Fixed-by: Robert Reif <reif@earthlink.net>
Cc: David Miller <davem@davemloft.net>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-14 09:42:11 -08:00
Peter Zijlstra
0d66bf6d35 mutex: implement adaptive spinning
Change mutex contention behaviour such that it will sometimes busy wait on
acquisition - moving its behaviour closer to that of spinlocks.

This concept got ported to mainline from the -rt tree, where it was originally
implemented for rtmutexes by Steven Rostedt, based on work by Gregory Haskins.

Testing with Ingo's test-mutex application (http://lkml.org/lkml/2006/1/8/50)
gave a 345% boost for VFS scalability on my testbox:

 # ./test-mutex-shm V 16 10 | grep "^avg ops"
 avg ops/sec:               296604

 # ./test-mutex-shm V 16 10 | grep "^avg ops"
 avg ops/sec:               85870

The key criteria for the busy wait is that the lock owner has to be running on
a (different) cpu. The idea is that as long as the owner is running, there is a
fair chance it'll release the lock soon, and thus we'll be better off spinning
instead of blocking/scheduling.

Since regular mutexes (as opposed to rtmutexes) do not atomically track the
owner, we add the owner in a non-atomic fashion and deal with the races in
the slowpath.

Furthermore, to ease the testing of the performance impact of this new code,
there is means to disable this behaviour runtime (without having to reboot
the system), when scheduler debugging is enabled (CONFIG_SCHED_DEBUG=y),
by issuing the following command:

 # echo NO_OWNER_SPIN > /debug/sched_features

This command re-enables spinning again (this is also the default):

 # echo OWNER_SPIN > /debug/sched_features

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-14 18:09:02 +01:00
Peter Zijlstra
41719b0309 mutex: preemption fixes
The problem is that dropping the spinlock right before schedule is a voluntary
preemption point and can cause a schedule, right after which we schedule again.

Fix this inefficiency by keeping preemption disabled until we schedule, do this
by explicity disabling preemption and providing a schedule() variant that
assumes preemption is already disabled.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-14 18:09:00 +01:00
Peter Zijlstra
93d81d1aca mutex: small cleanup
Remove a local variable by combining an assingment and test in one.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-14 18:08:58 +01:00
Gregory Haskins
398a153b16 sched: fix build error in kernel/sched_rt.c when RT_GROUP_SCHED && !SMP
Ingo found a build error in the scheduler when RT_GROUP_SCHED was
enabled, but SMP was not.  This patch rearranges the code such
that it is a little more streamlined and compiles under all permutations
of SMP, UP and RT_GROUP_SCHED.  It was boot tested on my 4-way x86_64
and it still passes preempt-test.

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
2009-01-14 09:10:04 -05:00
Gregory Haskins
b07430ac37 sched: de CPP-ify the scheduler code
Signed-off-by: Gregory Haskins <ghaskins@novell.com>
2009-01-14 08:55:39 -05:00
Heiko Carstens
d4e82042c4 [CVE-2009-0029] System call wrappers part 32
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:31 +01:00
Heiko Carstens
836f92adf1 [CVE-2009-0029] System call wrappers part 31
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:31 +01:00
Heiko Carstens
6559eed8ca [CVE-2009-0029] System call wrappers part 30
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:30 +01:00
Heiko Carstens
1e7bfb2134 [CVE-2009-0029] System call wrappers part 27
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:29 +01:00
Heiko Carstens
c4ea37c26a [CVE-2009-0029] System call wrappers part 26
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:29 +01:00
Heiko Carstens
e48fbb699f [CVE-2009-0029] System call wrappers part 24
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:28 +01:00
Heiko Carstens
5a8a82b1d3 [CVE-2009-0029] System call wrappers part 23
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:28 +01:00
Heiko Carstens
003d7ab479 [CVE-2009-0029] System call wrappers part 19
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:26 +01:00
Heiko Carstens
a6b42e83f2 [CVE-2009-0029] System call wrappers part 18
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:25 +01:00
Heiko Carstens
ca013e945b [CVE-2009-0029] System call wrappers part 17
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:25 +01:00
Heiko Carstens
a5f8fa9e9b [CVE-2009-0029] System call wrappers part 09
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:21 +01:00
Heiko Carstens
17da2bd90a [CVE-2009-0029] System call wrappers part 08
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:21 +01:00
Heiko Carstens
754fe8d297 [CVE-2009-0029] System call wrappers part 07
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:20 +01:00
Heiko Carstens
5add95d4f7 [CVE-2009-0029] System call wrappers part 06
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:20 +01:00
Heiko Carstens
362e9c07c7 [CVE-2009-0029] System call wrappers part 05
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:20 +01:00
Heiko Carstens
b290ebe2c4 [CVE-2009-0029] System call wrappers part 04
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:19 +01:00
Heiko Carstens
ae1251ab78 [CVE-2009-0029] System call wrappers part 03
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:19 +01:00
Heiko Carstens
dbf040d9d1 [CVE-2009-0029] System call wrappers part 02
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:19 +01:00
Heiko Carstens
58fd3aa288 [CVE-2009-0029] System call wrappers part 01
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:18 +01:00
Heiko Carstens
f627a741d2 [CVE-2009-0029] Make sys_syslog a conditional system call
Remove the -ENOSYS implementation for !CONFIG_PRINTK and use
the cond_syscall infrastructure instead.

Acked-by: Kyle McMartin <kyle@redhat.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:16 +01:00
Heiko Carstens
2ed7c03ec1 [CVE-2009-0029] Convert all system calls to return a long
Convert all system calls to return a long. This should be a NOP since all
converted types should have the same size anyway.
With the exception of sys_exit_group which returned void. But that doesn't
matter since the system call doesn't return.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:14 +01:00
Ingo Molnar
14819ea1e0 irq: export __set_irq_handler() and handle_level_irq()
Impact: build fix

ARM updates broke x86 allmodconfig builds:

 ERROR: "__set_irq_handler" [drivers/mfd/pcf50633-core.ko] undefined!
 ERROR: "handle_level_irq" [drivers/mfd/pcf50633-core.ko] undefined!

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-14 12:34:21 +01:00
Uwe Kleine-Koenig
32632920a7 ftrace, trivial: fix typo "resgister" -> "register"
Signed-off-by: Uwe Kleine-Koenig <ukleinek@strlen.de>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-14 12:11:49 +01:00
Frederic Weisbecker
e1d8aa9f1d tracing: add a new workqueue tracer
Impact: new tracer

The workqueue tracer provides some statistical informations
about each cpu workqueue thread such as the number of the
works inserted and executed since their creation. It can help
to evaluate the amount of work each of them have to perform.
For example it can help a developer to decide whether he should
choose a per cpu workqueue instead of a singlethreaded one.

It only traces statistical informations for now but it will probably later
provide event tracing too.

Such a tracer could help too, and be improved, to help rt priority sorted
workqueue development.

To have a snapshot of the workqueues state at any time, just do

cat /debugfs/tracing/trace_stat/workqueues

Ie:

  1    125        125       reiserfs/1
  1      0          0       scsi_tgtd/1
  1      0          0       aio/1
  1      0          0       ata/1
  1    114        114       kblockd/1
  1      0          0       kintegrityd/1
  1   2147       2147       events/1

  0      0          0       kpsmoused
  0    105        105       reiserfs/0
  0      0          0       scsi_tgtd/0
  0      0          0       aio/0
  0      0          0       ata_aux
  0      0          0       ata/0
  0      0          0       cqueue
  0      0          0       kacpi_notify
  0      0          0       kacpid
  0    149        149       kblockd/0
  0      0          0       kintegrityd/0
  0   1000       1000       khelper
  0   2270       2270       events/0

Changes in V2:

_ Drop the static array based on NR_CPU and dynamically allocate the stat array
  with num_possible_cpus() and other cpu mask facilities....
_ Trace workqueue insertion at a bit lower level (insert_work instead of queue_work) to handle
  even the workqueue barriers.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-14 12:11:43 +01:00
Frederic Weisbecker
002bb86d8d tracing/ftrace: separate events tracing and stats tracing engine
Impact: tracing's Api change

Currently, the stat tracing depends on the events tracing.
When you switch to a new tracer, the stats files of the previous tracer
will disappear. But it's more scalable to separate those two engines.
This way, we can keep the stat files of one or several tracers when we
want, without bothering of multiple tracer stat files or tracer switching.

To build/destroys its stats files, a tracer just have to call
register_stat_tracer/unregister_stat_tracer everytimes it wants to.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-14 12:11:37 +01:00
Shaohua Li
f00012074b ftrace, ia64: Add macro for ftrace_caller
Define FTRACE_ADDR. In IA64, a function pointer isn't a 'unsigned long' but a
'struct {unsigned long ip, unsigned long gp}'.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-14 12:11:18 +01:00
Shaohua Li
25aac9dc7c ftrace, ia64: explictly ignore a file in recordmcount.pl
In IA64, a function pointer isn't a 'unsigned long' but a
'struct {unsigned long ip, unsigned long gp}'. MCOUNT_ADDR is determined
at link time not compile time, so explictly ignore kernel/trace/ftrace.o
in recordmcount.pl.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-14 12:11:13 +01:00
Mandeep Singh Baines
baf48f6577 softlock: fix false panic which can occur if softlockup_thresh is reduced
At run-time, if softlockup_thresh is changed to a much lower value,
touch_timestamp is likely to be much older than the new softlock_thresh.

This will cause a false softlockup to be detected. If softlockup_panic
is enabled, the system will panic.

The fix is to touch all watchdogs before changing softlockup_thresh.

Signed-off-by: Mandeep Singh Baines <msb@google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-14 11:48:07 +01:00
Lai Jiangshan
e4fa4c9701 rcu: add __cpuinit to rcu_init_percpu_data()
Impact: reduce memory footprint

add __cpuinit to rcu_init_percpu_data(), and this function's text
will be discarded after boot when !CONFIG_HOTPLUG_CPU.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-14 11:26:40 +01:00
Linus Torvalds
28839855bf Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  smp_call_function_single(): be slightly less stupid, fix #2
  lockdep, mm: fix might_fault() annotation
2009-01-13 09:02:21 -08:00
Yinghai Lu
4a046d1754 x86: arch_probe_nr_irqs
Impact: save RAM with large NR_CPUS, get smaller nr_irqs

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Mike Travis <travis@sgi.com>
2009-01-12 17:39:24 -08:00
Arjan van de Ven
37a76bd4f1 async: fix __lowest_in_progress()
At 37000 feet somewhere near Greenland I woke up from a half-sleep with the
realisation that __lowest_in_progress() is buggy. After landing I checked
and there were indeed 2 problems with it; this patch fixes both:
* The order of the list checks was wrong
* The locking was not correct.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-12 16:39:58 -08:00
Linus Torvalds
12847095e9 Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  kernel/sched.c: add missing forward declaration for 'double_rq_lock'
  sched: partly revert "sched debug: remove NULL checking in print_cfs_rt_rq()"
  cpumask: fix CONFIG_NUMA=y sched.c
2009-01-12 16:29:00 -08:00
Ingo Molnar
6e96281412 smp_call_function_single(): be slightly less stupid, fix #2
fix m68k build failure:

 tip/kernel/up.c: In function 'smp_call_function_single':
 tip/kernel/up.c:16: error: dereferencing pointer to incomplete type
 make[2]: *** [kernel/up.o] Error 1

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-12 16:04:37 +01:00
Ingo Molnar
01e3eb8227 Revert "sched: improve preempt debugging"
This reverts commit 7317d7b87e.

This has been reported (and bisected) by Alexey Zaytsev and
Kamalesh Babulal to produce annoying warnings during bootup
on both x86 and powerpc.

kernel_locked() is not a valid test in IRQ context (we update the
BKL's ->lock_depth and the preempt count separately and non-atomicalyy),
so we cannot put it into the generic preempt debugging checks which
can run in IRQ contexts too.

Reported-and-bisected-by: Alexey Zaytsev <alexey.zaytsev@gmail.com>
Reported-and-bisected-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-12 13:00:50 +01:00
Ingo Molnar
e3ee1e1231 Merge commit 'v2.6.29-rc1' into timers/hrtimers
Conflicts:
	kernel/time/tick-common.c
2009-01-12 11:32:03 +01:00
Mike Travis
92296c6d6e cpumask, irq: non-x86 build failures
Ingo Molnar wrote:

> All non-x86 architectures fail to build:
>
> In file included from /home/mingo/tip/include/linux/random.h:11,
>                  from /home/mingo/tip/include/linux/stackprotector.h:6,
>                  from /home/mingo/tip/init/main.c:17:
> /home/mingo/tip/include/linux/irqnr.h:26:63: error: asm/irq_vectors.h: No such file or directory

Do not include asm/irq_vectors.h in generic code - it's not available
on all architectures.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-11 19:13:45 +01:00
Mike Travis
542d865bbe kstat: modify kstat_irqs_legacy to be variable sized
Impact: reduce memory usage.

Allocate kstat_irqs_legacy based on nr_cpu_ids to deal with this
memory usage bump when NR_CPUS bumped from 128 to 4096:

     8192   +253952    262144 +3100%  kstat_irqs_legacy(.bss)

This is only when CONFIG_SPARSE_IRQS=y.

Signed-off-by: Mike Travis <travis@sgi.com>
2009-01-11 19:13:42 +01:00
Mike Travis
9332fccded irq: initialize nr_irqs based on nr_cpu_ids
Impact: Reduce memory usage.

This is the second half of the changes to make the irq_desc_ptrs be
variable sized based on nr_cpu_ids.  This is done by adding a new
"max_nr_irqs" macro to irq_vectors.h (and a dummy in irqnr.h) to
return a max NR_IRQS value based on NR_CPUS or nr_cpu_ids.

This necessitated moving the define of MAX_IO_APICS to a separate
file (asm/apicnum.h) so it could be included without the baggage
of the other asm/apicdef.h declarations.

Signed-off-by: Mike Travis <travis@sgi.com>
2009-01-11 19:13:38 +01:00
Mike Travis
0fa0ebbf15 irq: allocate irq_desc_ptrs array based on nr_irqs
Impact: allocate irq_desc_ptrs in preparation for making it variable-sized.

This addresses this memory usage bump when NR_CPUS bumped from 128 to 4096:

    34816   +229376    264192  +658%  irq_desc_ptrs(.data.read_mostly)

The patch is split into two parts, the first simply allocates the
irq_desc_ptrs array.  Then next will deal with making it variable.
This is only when CONFIG_SPARSE_IRQS=y.

Signed-off-by: Mike Travis <travis@sgi.com>
2009-01-11 19:13:35 +01:00
Mike Travis
e2f4d06545 irq: use WARN() instead of WARN_ON().
Impact: cleanup WARN msg.

Ingo requested:
> While at it, could you please also convert this to a WARN() construct
> instead? (in a separate commit)

... and it shall be done.  ;-)

Signed-off-by: Mike Travis <travis@sgi.com>
2009-01-11 19:13:32 +01:00
Mike Travis
9594949b06 irq: change references from NR_IRQS to nr_irqs
Impact: preparation, cleanup, add KERN_INFO printk

Modify references from NR_IRQS to nr_irqs as the later will become
variable-sized based on nr_cpu_ids when CONFIG_SPARSE_IRQS=y.

Signed-off-by: Mike Travis <travis@sgi.com>
2009-01-11 19:13:29 +01:00
Mike Travis
d38b223c86 cpumask: reduce stack usage in find_lowest_rq
Impact: reduce stack usage, cleanup

Use a cpumask_var_t in find_lowest_rq() and clean up other old
cpumask_t calls.

Signed-off-by: Mike Travis <travis@sgi.com>
2009-01-11 19:13:22 +01:00
Mike Travis
802bf931f2 cpumask: fix bug in use cpumask_var_t in irq_desc
Impact: fix bug where new irq_desc uses old cpumask pointers which are freed.

As Yinghai pointed out, init_copy_one_irq_desc() copies the old desc to
the new desc overwriting the cpumask pointers.  Since the old_desc and
the cpumask pointers are freed, then memory corruption will occur if
these old pointers are used.

Move the allocation of these pointers to after the copy.

Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Yinghai Lu <yinghai@kernel.org>
2009-01-11 19:13:02 +01:00