Commit graph

377885 commits

Author SHA1 Message Date
Jiri Pirko
b79462a8b9 team: fix checks in team_get_first_port_txable_rcu()
should be checked if "cur" is txable, not "port".

Introduced by commit 6e88e1357c "team: use function team_port_txable()
for determing enabled and up port"

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-12 00:56:27 -07:00
Jiri Pirko
72df935d98 team: move add to port list before port enablement
team_port_enable() adds port to port_hashlist. Reader sees port
in team_get_port_by_index_rcu() and returns it, but
team_get_first_port_txable_rcu() tries to go through port_list, where the
port is not inserted yet -> NULL pointer dereference.
Fix this by reordering port_list and port_hashlist insertion.
Panic is easily triggeable when txing packets and adding/removing port
in a loop.

Introduced by commit 3d249d4c "net: introduce ethernet teaming device"

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-12 00:56:27 -07:00
Jiri Pirko
76c455decb team: check return value of team_get_port_by_index_rcu() for NULL
team_get_port_by_index_rcu() might return NULL due to race between port
removal and skb tx path. Panic is easily triggeable when txing packets
and adding/removing port in a loop.

introduced by commit 3d249d4ca "net: introduce ethernet teaming device"
and commit 753f993911 "team: introduce random mode" (for random mode)

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-12 00:56:27 -07:00
Daniel Borkmann
7a6e288d27 pktgen: ipv6: numa: consolidate skb allocation to pktgen_alloc_skb
We currently allow for numa-node aware skb allocation only within the
fill_packet_ipv4() path, but not in fill_packet_ipv6(). Consolidate that
code to a common allocation helper to enable numa-node aware skb
allocation for ipv6, and use it in both paths. This also makes both
functions a bit more readable.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-12 00:47:25 -07:00
Daniel Borkmann
da5bab079f net: udp4: move GSO functions to udp_offload
Similarly to TCP offloading and UDPv6 offloading, move all related
UDPv4 functions to udp_offload.c to make things more explicit. Also,
by this, we can make those functions static.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-12 00:47:25 -07:00
Jason Wang
19a6afb23e tuntap: set SOCK_ZEROCOPY flag during open
Commit 54f968d6ef
(tuntap: move socket to tun_file) forgets to set SOCK_ZEROCOPY flag, which will
prevent vhost_net from doing zercopy w/ tap. This patch fixes this by setting
it during file open.

Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-12 00:44:35 -07:00
Shawn Bohrer
946d3bd723 igmp: remove unnecessary in_device member zeroing
ip_mc_init_dev() is passed a freshly kzalloc'd in_device so it is
unnecessary to explicitly zero out the members.

Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-12 00:41:15 -07:00
Eric Dumazet
e989707135 igmp: hash a hash table to speedup ip_check_mc_rcu()
After IP route cache removal, multicast applications using
a lot of multicast addresses hit a O(N) behavior in ip_check_mc_rcu()

Add a per in_device hash table to get faster lookup.

This hash table is created only if the number of items in mc_list is
above 4.

Reported-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-12 00:25:23 -07:00
Eric Dumazet
64153ce0a7 net_sched: htb: do not setup default rate estimators
With a thousand htb classes, est_timer() spends ~5 million cpu cycles
and throws out cpu cache, because each htb class has a default
rate estimator (est 4sec 16sec).

Most users do not use default rate estimators, so switch htb
to not setup ones.

Add a module parameter (htb_rate_est) so that users relying
on this default rate estimator can revert the behavior.

echo 1 >/sys/module/sch_htb/parameters/htb_rate_est

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-12 00:14:21 -07:00
Simon Wunderlich
795d855d56 mac80211: Fix rate control mask matching call
The order of parameters was mixed up, introduced in commit
"mac80211: improve the rate control API"

Cc: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-06-12 09:12:43 +02:00
Simon Wunderlich
a6b368f6ca mac80211: abort CAC in stop_ap()
When a CAC is running and stop_ap is called (e.g. when hostapd is killed
while performing CAC), the CAC must be aborted immediately.
Otherwise ieee80211_stop_ap() will try to stop it when it's too late -
wdev->channel is already NULL and the abort event can not be generated.

Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-06-12 09:12:43 +02:00
Johannes Berg
35d865afbb mac80211: work around broken APs not including HT info
There are some APs, notably 2G/3G/4G Wifi routers, specifically the
"Onda PN51T", "Vodafone PocketWiFi 2", "ZTE MF60" and a similar
T-Mobile branded device [1] that erroneously don't include all the
needed information in (re)association response frames. Work around
this by assuming the information is the same as it was in the
beacon or probe response and using the data from there instead.

This fixes https://bugzilla.kernel.org/show_bug.cgi?id=58881.

[1] https://bbs.archlinux.org/viewtopic.php?pid=1277305

Note that this requires marking the first ieee802_11_parse_elems()
argument const, otherwise we'd get a compiler warning.

Cc: stable@vger.kernel.org
Reported-and-tested-by: Michal Zajac <manwe@manwe.pl>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-06-12 09:11:54 +02:00
Linus Torvalds
77293e215e Merge branch 'fixes-3.10' of git://git.infradead.org/users/willy/linux-nvme
Pull NVMe fixes from Matthew Wilcox.

* 'fixes-3.10' of git://git.infradead.org/users/willy/linux-nvme:
  NVMe: Add MSI support
  NVMe: Use dma_set_mask() correctly
  Return the result from user admin command IOCTL even in case of failure
  NVMe: Do not cancel command multiple times
  NVMe: fix error return code in nvme_submit_bio_queue()
  NVMe: check for integer overflow in nvme_map_user_pages()
  MAINTAINERS: update NVM EXPRESS DRIVER file list
  NVMe: Fix a signedness bug in nvme_trans_modesel_get_mp
  NVMe: Remove redundant version.h header include
2013-06-11 23:07:21 -07:00
Eric Dumazet
130d3d68b5 net_sched: psched_ratecfg_precompute() improvements
Before allowing 64bits bytes rates, refactor
psched_ratecfg_precompute() to get better comments
and increased accuracy.

rate_bps field is renamed to rate_bytes_ps, as we only
have to worry about bytes per second.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-11 22:39:47 -07:00
Olof Johansson
323226bbb3 mvebu fixes for v3.10 round 4
- mvebu
     - fix PCIe ranges property so NOR flash is visible
 
  - kirkwood
     - fix identification of 88f6282 so MPPs can be set correctly
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQEcBAABAgAGBQJRtjVFAAoJEAi3KVZQDZAe4g4IAJcgufr2WBHJtGwVAGiSQLCr
 gI3NjyfeJP98P7Bt/7ZyXBx5CmW3I6gVLN6Sctjpk01VISregbWv8Qos8iJx31hq
 gqVmPJvaJy3dwHvq15rFMzNgsZGBTVV85wdnhppfeRTtt0nmREeJsLmgI1c5z0bw
 u9BolnU1+sQn0S/TV/A4EZhyc18kAGzGqOKLOm3O4Ftrxncjq0idaSC5TOTDlA8Q
 1D5bDbmaXyu7J1lqnYbbIsn9E4MTxfjbcrVnK1pEf7qhhtrmfJWiqLuay5plo/43
 zMHefOc8rI74QUHxXuv6IS8QZcKKcfSJBpRCmxK6Ks74I9D6T9l7FxDb/2ckgJk=
 =gHgq
 -----END PGP SIGNATURE-----

Merge tag 'fixes-3.10-4' of git://git.infradead.org/users/jcooper/linux into fixes

From Jason Cooper, mvebu fixes for v3.10 round 4:
 - mvebu
    - fix PCIe ranges property so NOR flash is visible
 - kirkwood
    - fix identification of 88f6282 so MPPs can be set correctly

* tag 'fixes-3.10-4' of git://git.infradead.org/users/jcooper/linux:
  arm: mvebu: armada-xp-{gp,openblocks-ax3-4}: specify PCIe range
  ARM: Kirkwood: handle mv88f6282 cpu in __kirkwood_variant().

Signed-off-by: Olof Johansson <olof@lixom.net>
2013-06-11 17:05:56 -07:00
Alexander Shishkin
0c3f3dc68b usb: chipidea: fix id change handling
Re-enable chipidea irq even if there's no role changing to do. This is
a problem since b183c19f ("USB: chipidea: re-order irq handling to avoid
unhandled irqs"); when it manifests, chipidea irq gets disabled for good.

Cc: stable@vger.kernel.org # v3.7
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-06-11 16:18:05 -07:00
Alexander Shishkin
d343f4e8d6 usb: chipidea: fix no transceiver case
Since usb phy code does return ERR_PTR() values, make sure that we don't
end up dereferencing them. This is a problem, for example, on platforms
that don't register a phy for chipidea since b7fa5c2a ("usb: phy: return
-ENXIO when PHY layer isn't enabled").

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-06-11 16:18:04 -07:00
John W. Linville
3899ba90a4 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
Conflicts:
	drivers/net/wireless/ath/ath9k/debug.c
	net/mac80211/iface.c
2013-06-11 14:48:32 -04:00
Linus Torvalds
af180b81a3 Merge branch 'fixes' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm bugfixes from Gleb Natapov:
 "There is one more fix for MIPS KVM ABI here, MIPS and PPC build
  breakage fixes and a couple of PPC bug fixes"

* 'fixes' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm/ppc/booke64: Fix lazy ee handling in kvmppc_handle_exit()
  kvm/ppc/booke: Hold srcu lock when calling gfn functions
  kvm/ppc/booke64: Disable e6500 support
  kvm/ppc/booke64: Fix AltiVec interrupt numbers and build breakage
  mips/kvm: Use KVM_REG_MIPS and proper size indicators for *_ONE_REG
  kvm: Add definition of KVM_REG_MIPS
  KVM: add kvm_para_available to asm-generic/kvm_para.h
2013-06-11 11:16:43 -07:00
Yoshihiro YUNOMAE
58e8eedf18 tracing: Fix outputting formats of x86-tsc and counter when use trace_clock
Outputting formats of x86-tsc and counter should be a raw format, but after
applying the patch(2b6080f28c), the format was
changed to nanosec. This is because the global variable trace_clock_id was used.
When we use multiple buffers, clock_id of each sub-buffer should be used. Then,
this patch uses tr->clock_id instead of the global variable trace_clock_id.

[ Basically, this fixes a regression where the multibuffer code changed the
  trace_clock file to update tr->clock_id but the traces still use the old
  global trace_clock_id variable, negating the file's effect. The global
  trace_clock_id variable is obsolete and removed. - SR ]

Link: http://lkml.kernel.org/r/20130423013239.22334.7394.stgit@yunodevel

Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-06-11 13:58:46 -04:00
Solomon Peachy
8b3e7be437 cw1200: Fix an assorted pile of checkpatch warnings.
Signed-off-by: Solomon Peachy <pizza@shaftnet.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2013-06-11 12:48:10 -04:00
Solomon Peachy
19db577868 cw1200: Eliminate the ETF debug/engineering code.
This is only really useful for people who are bringing up new hardware
designs and have access to the proprietary vendor tools that interface
with this mode.

It'll live out of tree until it's rewritten to use a less kludgy interface.

Signed-off-by: Solomon Peachy <pizza@shaftnet.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2013-06-11 12:48:10 -04:00
Solomon Peachy
fa8eeae102 cw1200: Remove "ITP" debug subsystem.
This can live on as an out-of-tree patch for those that care.

Signed-off-by: Solomon Peachy <pizza@shaftnet.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2013-06-11 12:48:09 -04:00
Patrick McHardy
7cdbac71f9 netlink: fix error propagation in netlink_mmap()
Return the error if something went wrong instead of unconditionally
returning 0.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-11 02:52:47 -07:00
Eric Dumazet
45203a3b38 net_sched: add 64bit rate estimators
struct gnet_stats_rate_est contains u32 fields, so the bytes per second
field can wrap at 34360Mbit.

Add a new gnet_stats_rate_est64 structure to get 64bit bps/pps fields,
and switch the kernel to use this structure natively.

This structure is dumped to user space as a new attribute :

TCA_STATS_RATE_EST64

Old tc command will now display the capped bps (to 34360Mbit), instead
of wrapped values, and updated tc command will display correct
information.

Old tc command output, after patch :

eric:~# tc -s -d qd sh dev lo
qdisc pfifo 8001: root refcnt 2 limit 1000p
 Sent 80868245400 bytes 1978837 pkt (dropped 0, overlimits 0 requeues 0)
 rate 34360Mbit 189696pps backlog 0b 0p requeues 0

This patch carefully reorganizes "struct Qdisc" layout to get optimal
performance on SMP.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-11 02:51:03 -07:00
Daniel Borkmann
1abd165ed7 net: sctp: fix NULL pointer dereference in socket destruction
While stress testing sctp sockets, I hit the following panic:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
IP: [<ffffffffa0490c4e>] sctp_endpoint_free+0xe/0x40 [sctp]
PGD 7cead067 PUD 7ce76067 PMD 0
Oops: 0000 [#1] SMP
Modules linked in: sctp(F) libcrc32c(F) [...]
CPU: 7 PID: 2950 Comm: acc Tainted: GF            3.10.0-rc2+ #1
Hardware name: Dell Inc. PowerEdge T410/0H19HD, BIOS 1.6.3 02/01/2011
task: ffff88007ce0e0c0 ti: ffff88007b568000 task.ti: ffff88007b568000
RIP: 0010:[<ffffffffa0490c4e>]  [<ffffffffa0490c4e>] sctp_endpoint_free+0xe/0x40 [sctp]
RSP: 0018:ffff88007b569e08  EFLAGS: 00010292
RAX: 0000000000000000 RBX: ffff88007db78a00 RCX: dead000000200200
RDX: ffffffffa049fdb0 RSI: ffff8800379baf38 RDI: 0000000000000000
RBP: ffff88007b569e18 R08: ffff88007c230da0 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffff880077990d00 R14: 0000000000000084 R15: ffff88007db78a00
FS:  00007fc18ab61700(0000) GS:ffff88007fc60000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000020 CR3: 000000007cf9d000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
 ffff88007b569e38 ffff88007db78a00 ffff88007b569e38 ffffffffa049fded
 ffffffff81abf0c0 ffff88007db78a00 ffff88007b569e58 ffffffff8145b60e
 0000000000000000 0000000000000000 ffff88007b569eb8 ffffffff814df36e
Call Trace:
 [<ffffffffa049fded>] sctp_destroy_sock+0x3d/0x80 [sctp]
 [<ffffffff8145b60e>] sk_common_release+0x1e/0xf0
 [<ffffffff814df36e>] inet_create+0x2ae/0x350
 [<ffffffff81455a6f>] __sock_create+0x11f/0x240
 [<ffffffff81455bf0>] sock_create+0x30/0x40
 [<ffffffff8145696c>] SyS_socket+0x4c/0xc0
 [<ffffffff815403be>] ? do_page_fault+0xe/0x10
 [<ffffffff8153cb32>] ? page_fault+0x22/0x30
 [<ffffffff81544e02>] system_call_fastpath+0x16/0x1b
Code: 0c c9 c3 66 2e 0f 1f 84 00 00 00 00 00 e8 fb fe ff ff c9 c3 66 0f
      1f 84 00 00 00 00 00 55 48 89 e5 53 48 83 ec 08 66 66 66 66 90 <48>
      8b 47 20 48 89 fb c6 47 1c 01 c6 40 12 07 e8 9e 68 01 00 48
RIP  [<ffffffffa0490c4e>] sctp_endpoint_free+0xe/0x40 [sctp]
 RSP <ffff88007b569e08>
CR2: 0000000000000020
---[ end trace e0d71ec1108c1dd9 ]---

I did not hit this with the lksctp-tools functional tests, but with a
small, multi-threaded test program, that heavily allocates, binds,
listens and waits in accept on sctp sockets, and then randomly kills
some of them (no need for an actual client in this case to hit this).
Then, again, allocating, binding, etc, and then killing child processes.

This panic then only occurs when ``echo 1 > /proc/sys/net/sctp/auth_enable''
is set. The cause for that is actually very simple: in sctp_endpoint_init()
we enter the path of sctp_auth_init_hmacs(). There, we try to allocate
our crypto transforms through crypto_alloc_hash(). In our scenario,
it then can happen that crypto_alloc_hash() fails with -EINTR from
crypto_larval_wait(), thus we bail out and release the socket via
sk_common_release(), sctp_destroy_sock() and hit the NULL pointer
dereference as soon as we try to access members in the endpoint during
sctp_endpoint_free(), since endpoint at that time is still NULL. Now,
if we have that case, we do not need to do any cleanup work and just
leave the destruction handler.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-11 02:49:05 -07:00
Peter Pan(潘卫平)
b41abb42bf net: pass correct parameter to skb_headers_offset_update()
Since commit 1a37e412a022(net: Use 16bits for *_headers fields of struct
skbuff), skb->*_header are relative to skb->head,
so copy_skb_header() should not call skb_headers_offset_update() now,
and we should pass correct parameter to skb_headers_offset_update() in
pskb_expand_head() and skb_copy_expand().

Signed-off-by: Weiping Pan <panweiping3@gmail.com>
Reviewed-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-11 02:48:06 -07:00
Michael S. Tsirkin
288cfe78c8 vhost: fix ubuf_info cleanup
vhost_net_clear_ubuf_info didn't clear ubuf_info
after kfree, this could trigger double free.
Fix this and simplify this code to make it more robust: make sure
ubuf info is always freed through vhost_net_clear_ubuf_info.

Reported-by: Tommi Rantala <tt.rantala@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-11 02:46:21 -07:00
Michael S. Tsirkin
05c0535194 vhost: check owner before we overwrite ubuf_info
If device has an owner, we shouldn't touch ubuf_info
since it might be in use.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-11 02:46:21 -07:00
Bjørn Mork
c2020be3c3 qmi_wwan/cdc_ether: let qmi_wwan handle the Huawei E1820
Another QMI speaking Qualcomm based device, which should be
driven by qmi_wwan, while cdc_ether should ignore it.

Like on other Huawei devices, the wwan function can appear
either as a single vendor specific interface or as a CDC ECM
class function using separate control and data interfaces.
The ECM control interface protocol is 0xff, likely in an
attempt to indicate that vendor specific management is
required.

In addition to the near standard CDC class, Huawei also add
vendor specific AT management commands to their firmwares.
This is probably an attempt to support non-Windows systems
using standard class drivers.  Unfortunately, this part of
the firmware is often buggy.  Linux is much better off using
whatever native vendor specific management protocol the
device offers, and Windows uses, whenever possible. This
means QMI in the case of Qualcomm based devices.

The E1820 has been verified to work fine with QMI.

Matching on interface number is necessary to distiguish the
wwan function from serial functions in the single interface
mode, as both function types will have class/subclass/function
set to ff/ff/ff.

The control interface number does not change in CDC ECM mode,
so the interface number matching rule is sufficient to handle
both modes.  The cdc_ether blacklist entry is only relevant in
CDC ECM mode, but using a similar interface number based rule
helps document this as a transfer from one driver to another.

Other Huawei 02/06/ff devices are left with the cdc_ether driver
because we do not know whether they are based on Qualcomm chips.
The Huawei specific AT command management is known to be somewhat
hardware independent, and their usage of these class codes may
also be independent of the modem hardware.

Reported-by: Graham Inggs <graham.inggs@uct.ac.za>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-11 02:43:22 -07:00
Gao feng
da12c90e09 netlink: Add compare function for netlink_table
As we know, netlink sockets are private resource of
net namespace, they can communicate with each other
only when they in the same net namespace. this works
well until we try to add namespace support for other
subsystems which use netlink.

Don't like ipv4 and route table.., it is not suited to
make these subsytems belong to net namespace, Such as
audit and crypto subsystems,they are more suitable to
user namespace.

So we must have the ability to make the netlink sockets
in same user namespace can communicate with each other.

This patch adds a new function pointer "compare" for
netlink_table, we can decide if the netlink sockets can
communicate with each other through this netlink_table
self-defined compare function.

The behavior isn't changed if we don't provide the compare
function for netlink_table.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-11 02:39:42 -07:00
Dave Airlie
df63d3ecbc Merge tag 'drm-intel-fixes-2013-06-11' of git://people.freedesktop.org/~danvet/drm-intel into drm-fixes
Daniel writes:
Just tiny regression fixes here:
- Two fixes to fix sdvo hotplug which broke in the hpd storm detection
  work.
- One fix to patch-up the sdvo lvds regression fixer from the last pull -
  we need to prefer the vbt mode over edid modes.

* tag 'drm-intel-fixes-2013-06-11' of git://people.freedesktop.org/~danvet/drm-intel:
  drm/i915: prefer VBT modes for SVDO-LVDS over EDID
  drm/i915: Enable hotplug interrupts after querying hw capabilities.
  drm/i915: Fix hotplug interrupt enabling for SDVOC
2013-06-11 19:38:27 +10:00
Li RongQing
8249152c47 xen-netfront: use skb_partial_csum_set() to simplify the codes
use skb_partial_csum_set() to simplify the codes

Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-11 02:34:36 -07:00
Sergei Shtylyov
9f8c4265bd sh_eth: fix result of sh_eth_check_reset() on timeout
When  the first loop in sh_eth_check_reset() runs to its end, 'cnt' is 0, so the
following check for 'cnt < 0' fails to catch the timeout.  Fix the  condition in
this check, so that the timeout  is actually reported.
While at it, fix the grammar in the failure message...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-11 02:28:48 -07:00
Sebastian Siewior
2786aae7fc net/ti davinci_mdio: don't hold a spin lock while calling pm_runtime
was playing with suspend and run into this:

|BUG: sleeping function called from invalid context at drivers/base/power/runtime.c:891
|in_atomic(): 1, irqs_disabled(): 0, pid: 1963, name: bash
|6 locks held by bash/1963:
|CPU: 0 PID: 1963 Comm: bash Not tainted 3.10.0-rc4+ #50
|[<c0014fdc>] (unwind_backtrace+0x0/0xf8) from [<c0011da4>] (show_stack+0x10/0x14)
|[<c0011da4>] (show_stack+0x10/0x14) from [<c02e8680>] (__pm_runtime_idle+0xa4/0xac)
|[<c02e8680>] (__pm_runtime_idle+0xa4/0xac) from [<c0341158>] (davinci_mdio_suspend+0x6c/0x9c)
|[<c0341158>] (davinci_mdio_suspend+0x6c/0x9c) from [<c02e0628>] (platform_pm_suspend+0x2c/0x54)
|[<c02e0628>] (platform_pm_suspend+0x2c/0x54) from [<c02e52bc>] (dpm_run_callback.isra.3+0x2c/0x64)
|[<c02e52bc>] (dpm_run_callback.isra.3+0x2c/0x64) from [<c02e57e4>] (__device_suspend+0x100/0x22c)
|[<c02e57e4>] (__device_suspend+0x100/0x22c) from [<c02e67e8>] (dpm_suspend+0x68/0x230)
|[<c02e67e8>] (dpm_suspend+0x68/0x230) from [<c0072a20>] (suspend_devices_and_enter+0x68/0x350)
|[<c0072a20>] (suspend_devices_and_enter+0x68/0x350) from [<c0072f18>] (pm_suspend+0x210/0x24c)
|[<c0072f18>] (pm_suspend+0x210/0x24c) from [<c0071c74>] (state_store+0x6c/0xbc)
|[<c0071c74>] (state_store+0x6c/0xbc) from [<c02714dc>] (kobj_attr_store+0x14/0x20)
|[<c02714dc>] (kobj_attr_store+0x14/0x20) from [<c01341a0>] (sysfs_write_file+0x16c/0x19c)
|[<c01341a0>] (sysfs_write_file+0x16c/0x19c) from [<c00ddfe4>] (vfs_write+0xb4/0x190)
|[<c00ddfe4>] (vfs_write+0xb4/0x190) from [<c00de3a4>] (SyS_write+0x3c/0x70)
|[<c00de3a4>] (SyS_write+0x3c/0x70) from [<c000e2c0>] (ret_fast_syscall+0x0/0x48)

I don't see a reason why the pm_runtime call must be under the lock.
Further I don't understand why this is a spinlock and not mutex.

Cc: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-11 02:26:01 -07:00
David S. Miller
2e42206948 Merge branch 'bridge_flags'
Vlad Yasevich says:

====================
The following series adds 2 new flags to bridge.  One flag allows
the user to control whether mac learning is performed on the interface
or not.  By default mac learning is on.
The other flag allows the user to control whether unicast traffic
is flooded (send without an fdb) to a given unicast port.  Default is
on.

Changes since v4:
 - Implemented Stephen's suggestions.

Changes since v2:
 - removed unused "unlock" tag.

Changes since v1:
 - Integrated suggestion from MST to not impact RTM_NEWNEIGH and to
   skip lookups when learning is disabled.

Vlad Yasevich (2):
  bridge: Add flag to control mac learning.
  bridge: Add a flag to control unicast packet flood.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-11 02:04:43 -07:00
Vlad Yasevich
867a59436f bridge: Add a flag to control unicast packet flood.
Add a flag to control flood of unicast traffic.  By default, flood is
on and the bridge will flood unicast traffic if it doesn't know
the destination.  When the flag is turned off, unicast traffic
without an FDB will not be forwarded to the specified port.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-11 02:04:32 -07:00
Vlad Yasevich
9ba18891f7 bridge: Add flag to control mac learning.
Allow user to control whether mac learning is enabled on the port.
By default, mac learning is enabled.  Disabling mac learning will
cause new dynamic FDB entries to not be created for a particular port.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-11 02:04:32 -07:00
Scott Wood
7c11c0ccc7 kvm/ppc/booke64: Fix lazy ee handling in kvmppc_handle_exit()
EE is hard-disabled on entry to kvmppc_handle_exit(), so call
hard_irq_disable() so that PACA_IRQ_HARD_DIS is set, and soft_enabled
is unset.

Without this, we get warnings such as arch/powerpc/kernel/time.c:300,
and sometimes host kernel hangs.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-11 11:11:00 +03:00
Scott Wood
f1e89028f0 kvm/ppc/booke: Hold srcu lock when calling gfn functions
KVM core expects arch code to acquire the srcu lock when calling
gfn_to_memslot and similar functions.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-11 11:10:59 +03:00
Scott Wood
2b6398fcf2 kvm/ppc/booke64: Disable e6500 support
The previous patch made 64-bit booke KVM build again, but Altivec
support is still not complete, and we can't prevent the guest from
turning on Altivec (which can corrupt host state until state
save/restore is implemented).  Disable e6500 on KVM until this is
fixed.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-11 11:10:56 +03:00
Mihai Caraman
4edd1ae91b kvm/ppc/booke64: Fix AltiVec interrupt numbers and build breakage
Interrupt numbers defined for Book3E follows IVORs definition. Align
BOOKE_INTERRUPT_ALTIVEC_UNAVAIL and BOOKE_INTERRUPT_ALTIVEC_ASSIST to this
rule which also fixes the build breakage.
IVORs 32 and 33 are shared so reflect this in the interrupts naming.

This fixes a build break for 64-bit booke KVM.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-11 11:10:49 +03:00
Tomasz Figa
cd3fc1b9a3 ARM: SAMSUNG: pm: Adjust for pinctrl- and DT-enabled platforms
This patch makes legacy code on suspend/resume path being executed
conditionally, on non-DT platforms only, to fix suspend/resume of
DT-enabled systems, for which the code is inappropriate.

Signed-off-by: Tomasz Figa <t.figa@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
[olof: add #include <linux/of.h>]
Signed-off-by: Olof Johansson <olof@lixom.net>
2013-06-11 01:10:12 -07:00
David Daney
681865d48e mips/kvm: Use KVM_REG_MIPS and proper size indicators for *_ONE_REG
The API requires that the GET_ONE_REG and SET_ONE_REG ioctls have this
extra information encoded in the register identifiers.

Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-11 11:07:38 +03:00
David Daney
2a8fedd0c1 kvm: Add definition of KVM_REG_MIPS
We use 0x7000000000000000ULL as 0x6000000000000000ULL is reserved for
ARM64.

Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-11 11:06:34 +03:00
Haojian Zhuang
7e5955db45 ARM: prima2: fix incorrect panic usage
In prima2, some functions of checking DT is registered in initcall
level. If it doesn't match the compatible name of sirf, kernel
will panic. It blocks the usage of multiplatform on other verndor.

The error message is in below.

Knic - not syncing: unable to find compatible pwrc node in dtb
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.10.0-rc3-00006-gd7f26ea-dirty #86
[<c0013adc>] (unwind_backtrace+0x0/0xf8) from [<c0011430>] (show_stack+0x10/0x1)
[<c0011430>] (show_stack+0x10/0x14) from [<c026f724>] (panic+0x90/0x1e8)
[<c026f724>] (panic+0x90/0x1e8) from [<c03267fc>] (sirfsoc_of_pwrc_init+0x24/0x)
[<c03267fc>] (sirfsoc_of_pwrc_init+0x24/0x58) from [<c0320864>] (do_one_initcal)
[<c0320864>] (do_one_initcall+0x90/0x150) from [<c0320a20>] (kernel_init_freeab)
[<c0320a20>] (kernel_init_freeable+0xfc/0x1c4) from [<c026b9e8>] (kernel_init+0)
[<c026b9e8>] (kernel_init+0x8/0xe4) from [<c000e158>] (ret_from_fork+0x14/0x3c)

Signen-off-by: Haojian Zhuang <haojian.zhuang@linaro.org>
Signed-off-by: Olof Johansson <olof@lixom.net>
2013-06-11 00:11:31 -07:00
Nicolas Dichtel
ed13998c31 sock_diag: fix filter code sent to userspace
Filters need to be translated to real BPF code for userland, like SO_GETFILTER.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-10 22:23:32 -07:00
Cong Wang
30f3a40f9a net: remove last caller of skb_tail_offset() and itself
Similar to the following commits:

commit 00f97da17a (netpoll: fix position of network header)
commit 525cebedb3 (pktgen: Fix position of ip and udp header)

using skb_tail_offset() seems not correct since the offset
is based on head pointer.

With the last caller removed, skb_tail_offset() can be killed
finally.

Cc: Thomas Graf <tgraf@suug.ch>
Cc: Daniel Borkmann <dborkmann@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-10 22:22:23 -07:00
David S. Miller
0a4db187a9 Merge branch 'll_poll'
Eliezer Tamir says:

====================
This patch set adds the ability for the socket layer code to
poll directly on an Ethernet device's RX queue.
This eliminates the cost of the interrupt and context switch
and with proper tuning allows us to get very close to the HW latency.

This is a follow up to Jesse Brandeburg's Kernel Plumbers talk from
last year
http://www.linuxplumbersconf.org/2012/wp-content/uploads/2012/09/2012-lpc-Low-Latency-Sockets-slides-brandeburg.pdf

Patch 1 adds a napi_id and a hashing mechanism to lookup a napi by id.
Patch 2 adds an ndo_ll_poll method and the code that supports it.
Patch 3 adds support for busy-polling on UDP sockets.
Patch 4 adds support for TCP.
Patch 5 adds the ixgbe driver code implementing ndo_ll_poll.
Patch 6 adds additional statistics to the ixgbe driver for ndo_ll_poll.

Performance numbers:
     setup                         TCP_RR           UDP_RR
kernel  Config     C3/6 rx-usecs tps cpu% S.dem  tps cpu% S.dem
patched optimized  on   100      87k 3.13 11.4   94K 3.17 10.7
patched optimized  on   0        71k 3.12 14.0   84k 3.19 12.0
patched optimized  on   adaptive 80k 3.13 12.5   90k 3.46 12.2
patched typical    on   100      72  3.13 14.0   79k 3.17 12.8
patched typical    on   0        60k 2.13 16.5   71k 3.18 14.0
patched typical    on   adaptive 67k 3.51 16.7   75k 3.36 14.5
3.9     optimized  on   adaptive 25k 1.0  12.7   28k 0.98 11.2
3.9     typical    off  0        48k 1.09  7.3   52k 1.11 4.18
3.9     typical    0ff  adaptive 35k 1.12 4.08   38k 0.65 5.49
3.9     optimized  off  adaptive 40k 0.82 4.83   43k 0.70 5.23
3.9     optimized  off  0        57k 1.17 4.08   62k 1.04 3.95

Test setup details:
Machines: each with two Intel Xeon 2680 CPUs and X520 (82599) optical
NICs
Tests: Netperf tcp_rr and udp_rr, 1 byte (round trips per second)
Kernel: unmodified 3.9 and patched 3.9
Config: typical is derived from RH6.2, optimized is a stripped down
config.
Interrupt coalescing (ethtool rx-usecs) settings: 0=off, 1=adaptive,
100 us
When C3/6 states were turned on (via BIOS) the performance governor
was used.

These performance numbers were measured with v2 of the patch set.
Performance of the optimized config with an rx-usecs setting of 100
(the first line in the table above) was tracked during the evolution
of the patches and has never varied by more than 1%.

Design:
A global hash table that allows us to look up a struct napi by a
unique id was added.

A napi_id field was added both to struct sk_buff and struct sk.
This is used to track which NAPI we need to poll for a specific
socket.

The device driver marks every incoming skb with this id.
This is propagated to the sk when the socket is looked up in the
protocol handler.

When the socket code does not find any more data on the socket queue,
it now may call ndo_ll_poll which will crank the device's rx queue and
feed incoming packets to the stack directly from the context of the
socket.

A sysctl value (net.core4.low_latency_poll) controls how many
microseconds we busy-wait before giving up. (setting to 0 globally
disables busy-polling)

Locking:

1. Locking between napi poll and ndo_ll_poll:
Since what needs to be locked between a device's NAPI poll and
ndo_ll_poll, is highly device / configuration dependent, we do this
inside the Ethernet driver.
For example, when packets for high priority connections are sent to
separate rx queues, you might not need locking between napi poll and
ndo_ll_poll at all.

For ixgbe we only lock the RX queue.
ndo_ll_poll does not touch the interrupt state or the TX queues.
(earlier versions of this patchset did touch them,
but this design is simpler and works better.)

If a queue is actively polled by a socket (on another CPU) napi poll
will not service it, but will wait until the queue can be locked
and cleaned before doing a napi_complete().
If a socket can't lock the queue because another CPU has it,
either from napi or from another socket polling on the queue,
the socket code can busy wait on the socket's skb queue.

Ndo_ll_poll does not have preferential treatment for the data from the
calling socket vs. data from others, so if another CPU is polling,
you will see your data on this socket's queue when it arrives.

Ndo_ll_poll is called with local BHs disabled, so it won't race on
the same CPU with net_rx_action, which calls the napi poll method.

2. Napi_hash
The napi hash mechanism uses RCU.
napi_by_id() must be called under rcu_read_lock().
After a call to napi_hash_del(), caller must take care to wait an rcu
grace period before freeing the memory containing the napi struct.
(Ixgbe already had this because the queue vector structure uses rcu to
protect the statistics counters in it.)

how to test:

1. The patchset should apply cleanly to net-next.
(don't forget to configure INET_LL_RX_POLL).

2. The ethtool -c setting for rx-usecs should be on the order of 100.

3. Use ethtool -K to disable GRO and LRO
(You are encouraged to try it both ways. If you find that your
workload
does better with GRO on do tell us.)

4. Sysctl value net.core.low_latency_poll controls how long
(in us) to busy-wait for more data, You are encouraged to play
with this and see what works for you. The default is now 0 so you need
to
set it to turn the feature on. I recommend a value around 50.

4. benchmark thread and IRQ should be bound to separate cores.
Both cores should be on the same CPU NUMA node as the NIC.
When the app and the IRQ run on the same CPU  you get a small penalty.
If interrupt coalescing is set to a low value this penalty can be very
large.

5. If you suspect that your machine is not configured properly,
use numademo to make sure that the CPU to memory BW is OK.
numademo 128m memcpy local copy numbers should be more than
8GB/s on a properly configured machine.

Change log:
v10
- removed select/poll support. (we will work on this some more and try again)
v9
- correct sysctl proc_handler, reported by Eric Dumazet and Amir Vadai.
- more int -> bool changes, reported by Eric Dumazet.
- better mask testing in sock_poll(), reported by Eric Dumazet.

v8
- split out udp and select/poll into separate patches.
  what used to be patch 2/5 is now three patches.
- type corrections from Amir Vadai and Cong Wang:
  one unsigned long that was left when changing to cycles_t
  int -> bool
- more detailed patch descriptions.

v7
- suggested by Ben Hutchings and Eric Dumazet:
  type fixes, static for globals in net/core.c,
  avoid napi_id collisions in napi_hash_add()

v6
- many small fixes suggested by Eric Dumazet:
  data locality, typos, documentation
  protect napi_hash insert/delete with a spinlock (napi_gen_id is no
  longer atomic_t since it's only accessed with the spinlock held.)
- added IPv6 TCP and UDP support (only minimally tested)

v5
- corrections suggested by Ben Hutchings:
  fixed typos, moved the config option and sysctl value from IPv4 to net
- moved sk_mark_ll() to the protocol handlers
- removed global id mechanism, replaced with a hashed napi_id.
  based on code sample from Eric Dumazet
  Note that ixgbe_free_q_vector() already waits an rcu grace period
  before freeing the q_vector, so nothing additional needs to be done
  when adding a call to napi_hash_del().
- simple poll/select support

v4
- removed separate config option for TCP as suggested Eric Dumazet.
- added linux mib counter for packets received through the low latency path,
  as suggested by Andi Kleen.
- re-allow module unloading, remove module param, use a global generation id
  instead to prevent the use of a stale napi pointer, as suggested
  by Eric Dumazet
- updated Documentation/networking/ip-sysctl.txt text

v3
- coding style changes suggested by Dave Miller

v2
- the sysctl knob is now in microseconds. The default value is now 0 (off).
- for now the code depends at configure time on CONFIG_I86_TSC
- the napi reference in struct skb is now a union with the dma cookie
  since the former is only used on RX and the latter on TX,
  as suggested by Eric Dumazet.
- we do a better job at honoring non-blocking operations.
- removed busy-polling support for tcp_read_sock()
- remove dynamic disabling of GRO
- coding style fixes
- disallow unloading the device module after the feature has been used

Credit:
Jesse Brandeburg, Arun Chekhov Ilango, Julie Cummings,
Alexander Duyck, Eric Geisler, Jason Neighbors, Yadong Li,
Mike Polehn, Anil Vasudevan, Don Wood
Special thanks for finding bugs in earlier versions:
Willem de Bruijn and Andi Kleen
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-10 21:23:57 -07:00
Eliezer Tamir
7e15b90ff9 ixgbe: add extra stats for ndo_ll_poll
Add additional statistics to the ixgbe driver for ndo_ll_poll
Defined under LL_EXTENDED_STATS

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-10 21:22:36 -07:00