From raziebe@gmail.com Fri Jul 1 02:01:45 2005
Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Jul 2005 02:01:50 -0700 (PDT)
Received: from wproxy.gmail.com (wproxy.gmail.com [64.233.184.204]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j6191iH9029279 for ; Fri, 1 Jul 2005 02:01:45 -0700
Received: by wproxy.gmail.com with SMTP id i20so259692wra for ; Fri, 01 Jul 2005 02:00:12 -0700 (PDT)
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:reply-to:to:subject:mime-version:content-type:content-transfer-encoding:content-disposition; b=CFP/B2ras/rASvHjwg3bq1xvfsoRqcy3iIH53ToMxO/MNkCqWKBN1taqlp/ytn2ZBAunkqBzzqEkkL8l64vNCVonzGS4beI4hn3/V7Io4BLRClgBS8vqjMLuIJv4ek/3NeUTV96G/wmnZj4qNm3M0P3A2Vr/qSvibdNZfSwAjAc=
Received: by 10.54.101.2 with SMTP id y2mr1295995wrb; Fri, 01 Jul 2005 02:00:12 -0700 (PDT)
Received: by 10.54.122.5 with HTTP; Fri, 1 Jul 2005 02:00:11 -0700 (PDT)
Message-ID: <5d96567b05070102001fc7b677@mail.gmail.com>
Date: Fri, 1 Jul 2005 11:00:11 +0200
From: "Raz Ben-Jehuda(caro)"
Reply-To: "Raz Ben-Jehuda(caro)"
To: netdev@oss.sgi.com
Subject: general bonding question
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id j6191iH9029279
X-archive-position: 2584
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: raziebe@gmail.com
Precedence: bulk
X-list: netdev

To any of you who deal with the NIC bonding driver:

Is there a special reason why the bond interface does not use the slave device features, such as "scatter gather" or hardware checksum? I am thinking of fixing the driver, but I thought it would be better to ask for advice first.
--
Raz
Long Live the Penguin

From eric-madwifi-devel@lammerts.org Fri Jul 1 08:36:39 2005
Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Jul 2005 08:36:47 -0700 (PDT)
Received: from mail.ultrawaves.com ([64.135.31.50]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j61FacH9027189 for ; Fri, 1 Jul 2005 08:36:39 -0700
Received: from [10.20.30.16] (sweetums.ultrawaves [10.20.30.16]) by mail.ultrawaves.com (Postfix) with ESMTP id 589BD45C015; Fri, 1 Jul 2005 11:35:00 -0400 (EDT)
Message-ID: <42C562A4.9070501@lammerts.org>
Date: Fri, 01 Jul 2005 11:35:00 -0400
From: Eric Lammerts
User-Agent: Mozilla Thunderbird 1.0 (X11/20041206)
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: madwifi-devel@lists.sourceforge.net
Cc: tommy.christensen@tpack.net, herbert@gondor.apana.org.au, davem@davemloft.net, netdev@oss.sgi.com
Subject: problems with 2.6.12
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-archive-position: 2585
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: eric-madwifi-devel@lammerts.org
Precedence: bulk
X-list: netdev

Hello all,
I was having problems with 2.6.12 + madwifi in master mode. No packets were going out. With tcpdump I see DHCP requests coming in, with strace I see the dhcp daemon sending replies out, but they don't show up in tcpdump.

It's caused by this change:
http://oss.sgi.com/projects/netdev/archive/2005-05/msg00109.html

which btw also causes problems for other people:
http://marc.theaimsgroup.com/?l=linux-kernel&m=111853727810345&w=2

Madwifi doesn't call netif_carrier_on() in master mode, so Linux drops all packets. When I remove the dev_deactivate() line, it works fine again.

Should we fix madwifi or the kernel?
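
To make the reported failure mode concrete, here is a minimal user-space sketch in C. It is not kernel code: `toy_netdev`, `toy_init()`, `toy_link_event()` and `toy_xmit()` are hypothetical names, and the model assumes only what the report above states, namely that the transmit path is deactivated while carrier is off, so a driver that never signals carrier-on never gets packets out.

```c
#include <stdbool.h>

/* Toy model of a network device; NOT the kernel's struct net_device. */
struct toy_netdev {
	bool carrier_ok;	/* link state as last reported by the driver */
	bool queue_active;	/* whether the transmit path accepts packets */
	unsigned int sent;
	unsigned int dropped;
};

/* Put the toy device into a known state with the given carrier. */
void toy_init(struct toy_netdev *dev, bool carrier_ok)
{
	dev->carrier_ok = carrier_ok;
	dev->queue_active = carrier_ok;
	dev->sent = 0;
	dev->dropped = 0;
}

/*
 * Assumed policy (taken from the report, not from kernel source):
 * the transmit path is switched off whenever carrier goes away and
 * switched back on when the driver reports carrier again.
 */
void toy_link_event(struct toy_netdev *dev, bool carrier_ok)
{
	dev->carrier_ok = carrier_ok;
	dev->queue_active = carrier_ok;
}

/* Returns 0 if the packet was accepted, -1 if it was dropped. */
int toy_xmit(struct toy_netdev *dev)
{
	if (!dev->queue_active) {
		dev->dropped++;
		return -1;
	}
	dev->sent++;
	return 0;
}
```

In this model a device initialised with carrier off drops every packet until `toy_link_event(dev, true)` is called, which corresponds to the missing netif_carrier_on() call described in the report.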
Eric

From tommy.christensen@tpack.net Fri Jul 1 09:34:31 2005
Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Jul 2005 09:34:35 -0700 (PDT)
Received: from mail.tpack.net (ip18.tpack.net [213.173.228.18]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id j61GYUH9015022 for ; Fri, 1 Jul 2005 09:34:31 -0700
Received: (qmail 5899 invoked from network); 1 Jul 2005 16:32:59 -0000
Received: from unknown (HELO ?172.17.159.11?) (192.168.111.1) by 0 with SMTP; 1 Jul 2005 16:32:59 -0000
Message-ID: <42C57058.70806@tpack.net>
Date: Fri, 01 Jul 2005 18:33:28 +0200
From: Tommy Christensen
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.2) Gecko/20040803
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Eric Lammerts
CC: madwifi-devel@lists.sourceforge.net, herbert@gondor.apana.org.au, davem@davemloft.net, netdev@oss.sgi.com
Subject: Re: problems with 2.6.12
References: <42C562A4.9070501@lammerts.org>
In-Reply-To: <42C562A4.9070501@lammerts.org>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
X-archive-position: 2586
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: tommy.christensen@tpack.net
Precedence: bulk
X-list: netdev

Eric Lammerts wrote:
> Hello all,
> I was having problems with 2.6.12 + madwifi in master mode. No packets
> were going out. With tcpdump I see DHCP requests coming in, with strace
> I see the dhcp daemon sending replies out, but they don't show up in
> tcpdump.
>
> It's caused by this change:
> http://oss.sgi.com/projects/netdev/archive/2005-05/msg00109.html
>
> which btw also causes problems for other people:
> http://marc.theaimsgroup.com/?l=linux-kernel&m=111853727810345&w=2

Ouch. And vlan interfaces are having trouble as well.

> Madwifi doesn't call netif_carrier_on() in master mode, so Linux drops
> all packets. When I remove the dev_deactivate() line, it works fine again.
Netdevices are "born" with carrier on, so if your code doesn't call netif_carrier_off() or set dev->state directly, I don't see how you can end up in this state. Could you investigate this?

> Should we fix madwifi or the kernel?

The code is there for a reason, so hopefully we can work this out.

-Tommy

From tgraf@suug.ch Fri Jul 1 10:42:37 2005
Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Jul 2005 10:42:45 -0700 (PDT)
Received: from postel.suug.ch (postel.suug.ch [195.134.158.23]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j61HgXH9020965 for ; Fri, 1 Jul 2005 10:42:36 -0700
Received: by postel.suug.ch (Postfix, from userid 10001) id DA5F81C0F3; Fri, 1 Jul 2005 19:41:17 +0200 (CEST)
Date: Fri, 1 Jul 2005 19:41:17 +0200
From: Thomas Graf
To: Patrick McHardy
Cc: Patrick Jenkins , linux-kernel@vger.kernel.org, Maillist netdev
Subject: Re: [PATCH] multipath routing algorithm, better patch
Message-ID: <20050701174117.GW16076@postel.suug.ch>
References: <42C4919A.5000009@trash.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <42C4919A.5000009@trash.net>
X-archive-position: 2587
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: tgraf@suug.ch
Precedence: bulk
X-list: netdev

* Patrick McHardy <42C4919A.5000009@trash.net> 2005-07-01 02:43
> Multiple algorithms can be compiled in at once, so this patch is wrong.
> mp_alg is supplied by userspace:
>
>	if (rta->rta_mp_alg) {
>		mp_alg = *rta->rta_mp_alg;
>
>		if (mp_alg < IP_MP_ALG_NONE ||
>		    mp_alg > IP_MP_ALG_MAX)
>			goto err_inval;
>	}
>
> If it isn't set correctly its an iproute problem. Did you actually
> experience any problems?

Well, my patch for iproute2 to enable multipath algorithm selection is currently being merged to Stephen together with the ematch bits.
We had to work out a dependency on GNU flex first (the Berkeley version uses the same executable names) so the inclusion was delayed a bit.

From kaber@trash.net Fri Jul 1 12:35:52 2005
Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Jul 2005 12:35:55 -0700 (PDT)
Received: from kaber.coreworks.de ([62.206.217.67]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j61JZpH9030771 for ; Fri, 1 Jul 2005 12:35:52 -0700
Received: from localhost ([127.0.0.1]) by kaber.coreworks.de with esmtp (Exim 4.51) id 1DoRHH-0002t5-2M; Fri, 01 Jul 2005 21:34:19 +0200
Message-ID: <42C59ABA.1070305@trash.net>
Date: Fri, 01 Jul 2005 21:34:18 +0200
From: Patrick McHardy
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.7.8) Gecko/20050514 Debian/1.7.8-1
X-Accept-Language: en
MIME-Version: 1.0
To: Thomas Graf
CC: Patrick Jenkins , linux-kernel@vger.kernel.org, Maillist netdev
Subject: Re: [PATCH] multipath routing algorithm, better patch
References: <42C4919A.5000009@trash.net> <20050701174117.GW16076@postel.suug.ch>
In-Reply-To: <20050701174117.GW16076@postel.suug.ch>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 2588
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: kaber@trash.net
Precedence: bulk
X-list: netdev

Thomas Graf wrote:
> * Patrick McHardy <42C4919A.5000009@trash.net> 2005-07-01 02:43
>
>>If it isn't set correctly its an iproute problem. Did you actually
>>experience any problems?
>
> Well, my patch for iproute2 to enable multipath algorithm selection
> is currently being merged to Stephen together with the ematch bits.
> We had to work out a dependency on GNU flex first (the berkley
> version uses the same executable names) so the inclusion was
> delayed a bit.

So it's no problem but simply missing support. BTW, do you know if Stephen's new CVS repository is exported somewhere?
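
For readers who want to exercise the range check quoted earlier in this thread on its own, the following user-space C sketch mirrors it under stated assumptions: the `TOY_MP_ALG_*` values are placeholders rather than the kernel's definitions, `check_mp_alg()` is a hypothetical helper, and falling back to "none" when userspace supplies nothing is an assumption of the sketch.

```c
#include <stddef.h>

/* Placeholder algorithm IDs; the real values live in the kernel headers. */
enum {
	TOY_MP_ALG_NONE = 0,
	TOY_MP_ALG_RR,
	TOY_MP_ALG_RANDOM,
	TOY_MP_ALG_MAX = TOY_MP_ALG_RANDOM
};

/*
 * requested points at the value supplied by userspace, or is NULL when
 * no algorithm was given (assumed here to mean "keep the default").
 * Returns 0 and stores the chosen algorithm in *out, or -1 when the
 * supplied value is outside [TOY_MP_ALG_NONE, TOY_MP_ALG_MAX].
 */
int check_mp_alg(const int *requested, int *out)
{
	int mp_alg = TOY_MP_ALG_NONE;

	if (requested) {
		mp_alg = *requested;
		if (mp_alg < TOY_MP_ALG_NONE || mp_alg > TOY_MP_ALG_MAX)
			return -1;
	}
	*out = mp_alg;
	return 0;
}
```

The point made in the thread carries over: any value inside the range is accepted regardless of which algorithms are compiled in, so choosing a sensible value is left to the userspace tool.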
Regards
Patrick

From radheka.godse@intel.com Fri Jul 1 13:24:50 2005
Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Jul 2005 13:24:58 -0700 (PDT)
Received: from orsfmr003.jf.intel.com (fmr18.intel.com [134.134.136.17]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j61KOnH9001472 for ; Fri, 1 Jul 2005 13:24:50 -0700
Received: from orsfmr100.jf.intel.com (orsfmr100.jf.intel.com [10.7.209.16]) by orsfmr003.jf.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1 2004/09/17 17:50:56 root Exp $) with ESMTP id j61KN3xw003914; Fri, 1 Jul 2005 20:23:04 GMT
Received: from nwlxmail01.jf.intel.com (nwlxmail01.jf.intel.com [10.7.171.40]) by orsfmr100.jf.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2 2004/09/17 18:05:01 root Exp $) with ESMTP id j61KN3Pb018286; Fri, 1 Jul 2005 20:23:03 GMT
Received: from [134.134.3.92] ([134.134.3.92]) by nwlxmail01.jf.intel.com (8.12.10/8.12.9/MailSET/Hub) with ESMTP id j61KN3SL031915; Fri, 1 Jul 2005 13:23:03 -0700
Date: Fri, 1 Jul 2005 13:22:05 -0700 (PDT)
From: Radheka Godse
X-X-Sender: radheka@localhost.localdomain
To: fubar@us.ibm.com, bonding-devel@lists.sourceforge.net
cc: netdev@oss.sgi.com
Subject: [PATCH 2.6.13-rc1 0/17] bonding: Sysfs Support
Message-ID:
ReplyTo: "Radheka Godse"
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Scanned-By: MIMEDefang 2.44
X-archive-position: 2589
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: radheka.godse@intel.com
Precedence: bulk
X-list: netdev

This patch set is an updated version of the sysfs bonding patch sent previously in April. The patch is up to date with all recent bonding and kernel changes and applies cleanly to linux-2.6.13-rc1. It adds a sysfs entry for the "Xmit Hash Policy", a new bonding module parameter; incorporates feedback and bug fixes on all known issues; and has been tested on linux-2.6.13-rc1.

The interface is pretty simple.
Here are some notes on how it could be used:

The file /sys/class/net/bonding_masters contains the names of all the active bonds. To add or remove bonds, write a whitespace-delimited list of interface names to this file. For example:

    echo "bond1 bond2 bond3" > /sys/class/net/bonding_masters

will create three bonds with the given names. If any other bonds exist, they will be deleted.

    echo "bond0 bond2 bond3" > /sys/class/net/bonding_masters

would then create bond0 and remove bond1.

For each bond, there is a directory /sys/class/net//bonding. In this directory are a number of files which control the bond. The names of these files match the existing module parameters and do the same things.

The slaves file contains a whitespace-delimited list of interface names, which are slaves to the bond. This file behaves much the same as the "bonding_masters" file; just write a list of your desired interfaces to this file. Likewise, the arp_targets file contains a whitespace-delimited list of IP addresses and should be written to in a similar fashion. The other files contain single values (numeric and/or mnemonic).

Some caveats:
- slaves can only be assigned when the interface is up
- mode can only be changed when the interface is down
- Xmit hash policy can only be changed when the interface is down

Warnings and status messages will be logged and can be viewed with dmesg.

Example:

    modprobe bonding
    echo "bond0 bond1" > /sys/class/net/bonding_masters
    echo "6" > /sys/class/net/bond0/bonding/mode
    echo "1000" > /sys/class/net/bond0/bonding/miimon
    ifconfig bond0 192.168.0.1
    echo "eth0 eth1" > /sys/class/net/bond0/bonding/slaves
    # bond0 is now ready to use
    echo "1" > /sys/class/net/bond1/bonding/mode
    # ... and so on for bond1

These patches were generated against 2.6.12 with Jay's upstream patches. They apply cleanly to kernel 2.6.13-rc1.
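
The replace-the-whole-list behaviour of bonding_masters described above amounts to two set differences: names in the new list but not the old one are created, and names in the old list but not the new one are removed. The sketch below is user-space C written for illustration only; it is not the patch's implementation, and `bonds_to_create()`, `bonds_to_remove()`, `MAX_NAMES` and `NAME_LEN` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define MAX_NAMES 16	/* arbitrary limit chosen for the sketch */
#define NAME_LEN  16	/* arbitrary per-name buffer size for the sketch */

/* Split a whitespace-delimited list into names[]; returns the count. */
static int split_names(const char *list, char names[][NAME_LEN])
{
	int n = 0;

	while (n < MAX_NAMES) {
		size_t tok, len;

		list += strspn(list, " \t\n");	/* skip separators */
		tok = strcspn(list, " \t\n");	/* length of next name */
		if (tok == 0)
			break;
		len = tok < NAME_LEN ? tok : NAME_LEN - 1;
		memcpy(names[n], list, len);
		names[n][len] = '\0';
		n++;
		list += tok;
	}
	return n;
}

static int contains(char names[][NAME_LEN], int n, const char *name)
{
	for (int i = 0; i < n; i++)
		if (strcmp(names[i], name) == 0)
			return 1;
	return 0;
}

/* Write to out the names of list a that do not appear in list b. */
static void list_difference(const char *a, const char *b,
			    char *out, size_t outsz)
{
	char an[MAX_NAMES][NAME_LEN], bn[MAX_NAMES][NAME_LEN];
	int na = split_names(a, an);
	int nb = split_names(b, bn);

	if (outsz == 0)
		return;
	out[0] = '\0';
	for (int i = 0; i < na; i++) {
		if (contains(bn, nb, an[i]))
			continue;
		if (out[0] != '\0')
			strncat(out, " ", outsz - strlen(out) - 1);
		strncat(out, an[i], outsz - strlen(out) - 1);
	}
}

/* Bonds named in the written list that do not exist yet. */
void bonds_to_create(const char *current, const char *written,
		     char *out, size_t outsz)
{
	list_difference(written, current, out, outsz);
}

/* Existing bonds that are missing from the written list. */
void bonds_to_remove(const char *current, const char *written,
		     char *out, size_t outsz)
{
	list_difference(current, written, out, outsz);
}
```

Fed the two lists from the example in the notes, "bond1 bond2 bond3" as the current state and "bond0 bond2 bond3" as the written list, the helpers yield bond0 to create and bond1 to remove, matching the behaviour the notes describe.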
- Radheka Godse
- Mitch Williams

From radheka.godse@intel.com Fri Jul 1 13:31:46 2005
Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Jul 2005 13:31:48 -0700 (PDT)
Received: from orsfmr005.jf.intel.com (fmr20.intel.com [134.134.136.19]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j61KVjH9002272 for ; Fri, 1 Jul 2005 13:31:46 -0700
Received: from orsfmr101.jf.intel.com (orsfmr101.jf.intel.com [10.7.209.17]) by orsfmr005.jf.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1 2004/09/17 17:50:56 root Exp $) with ESMTP id j61KU8tp005569; Fri, 1 Jul 2005 20:30:08 GMT
Received: from nwlxmail01.jf.intel.com (nwlxmail01.jf.intel.com [10.7.171.40]) by orsfmr101.jf.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2 2004/09/17 18:05:01 root Exp $) with ESMTP id j61KU87G020793; Fri, 1 Jul 2005 20:30:08 GMT
Received: from [134.134.3.92] ([134.134.3.92]) by nwlxmail01.jf.intel.com (8.12.10/8.12.9/MailSET/Hub) with ESMTP id j61KU8SL032541; Fri, 1 Jul 2005 13:30:08 -0700
Date: Fri, 1 Jul 2005 13:29:20 -0700 (PDT)
From: Radheka Godse
X-X-Sender: radheka@localhost.localdomain
To: fubar@us.ibm.com, bonding-devel@lists.sourceforge.net
cc: netdev@oss.sgi.com
Subject: [PATCH 2.6.13-rc1 1/17] bonding: make some functions not static
Message-ID:
ReplyTo: "Radheka Godse"
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Scanned-By: MIMEDefang 2.44
X-archive-position: 2590
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: radheka.godse@intel.com
Precedence: bulk
X-list: netdev

This patch prepares for adding sysfs functionality to bonding by making some functions in bond_main non-static and adding their prototypes to the header.

diff -urN -X dontdiff linux-2.6.12post/drivers/net/bonding/bonding.h linux-2.6.12post-sysfs/drivers/net/bonding/bonding.h
--- linux-2.6.12post/drivers/net/bonding/bonding.h	2005-06-28 18:18:03.000000000 -0700
+++ linux-2.6.12post-sysfs/drivers/net/bonding/bonding.h	2005-06-30 13:58:27.000000000 -0700
@@ -255,6 +255,17 @@

 struct vlan_entry *bond_next_vlan(struct bonding *bond, struct vlan_entry *curr);
 int bond_dev_queue_xmit(struct bonding *bond, struct sk_buff *skb, struct net_device *slave_dev);
+void bond_deinit(struct net_device *bond_dev);
+int bond_release(struct net_device *bond_dev, struct net_device *slave_dev);
+int bond_sethwaddr(struct net_device *bond_dev, struct net_device *slave_dev);
+void bond_mii_monitor(struct net_device *bond_dev);
+void bond_loadbalance_arp_mon(struct net_device *bond_dev);
+void bond_activebackup_arp_mon(struct net_device *bond_dev);
+void bond_set_mode_ops(struct bonding *bond, int mode);
+int bond_parse_parm(char *mode_arg, struct bond_parm_tbl *tbl);
+const char *bond_mode_name(int mode);
+void bond_select_active_slave(struct bonding *bond);
+void bond_change_active_slave(struct bonding *bond, struct slave *new_active);

 #endif /* _LINUX_BONDING_H */

diff -urN -X dontdiff linux-2.6.12post/drivers/net/bonding/bond_main.c linux-2.6.12post-sysfs/drivers/net/bonding/bond_main.c
--- linux-2.6.12post/drivers/net/bonding/bond_main.c	2005-06-28 18:18:03.000000000 -0700
+++ linux-2.6.12post-sysfs/drivers/net/bonding/bond_main.c	2005-06-30 13:53:55.000000000 -0700
@@ -1453,7 +1453,7 @@
  *
  * Warning: Caller must hold curr_slave_lock for writing.
  */
-static void bond_change_active_slave(struct bonding *bond, struct slave *new_active)
+void bond_change_active_slave(struct bonding *bond, struct slave *new_active)
 {
 	struct slave *old_active = bond->curr_active_slave;

@@ -1527,7 +1527,7 @@
  *
  * Warning: Caller must hold curr_slave_lock for writing.
  */
-static void bond_select_active_slave(struct bonding *bond)
+void bond_select_active_slave(struct bonding *bond)
 {
 	struct slave *best_slave;

@@ -1595,7 +1595,7 @@

 /*---------------------------------- IOCTL ----------------------------------*/

-static int bond_sethwaddr(struct net_device *bond_dev, struct net_device *slave_dev)
+int bond_sethwaddr(struct net_device *bond_dev, struct net_device *slave_dev)
 {
 	dprintk("bond_dev=%p\n", bond_dev);
 	dprintk("slave_dev=%p\n", slave_dev);
@@ -2030,7 +2030,7 @@
  * for Bonded connections:
  * The first up interface should be left on and all others downed.
  */
-static int bond_release(struct net_device *bond_dev, struct net_device *slave_dev)
+int bond_release(struct net_device *bond_dev, struct net_device *slave_dev)
 {
 	struct bonding *bond = bond_dev->priv;
 	struct slave *slave, *oldcurrent;
@@ -2479,7 +2479,7 @@
 /*-------------------------------- Monitoring -------------------------------*/

 /* this function is called regularly to monitor each slave's link. */
-static void bond_mii_monitor(struct net_device *bond_dev)
+void bond_mii_monitor(struct net_device *bond_dev)
 {
 	struct bonding *bond = bond_dev->priv;
 	struct slave *slave, *oldcurrent;
@@ -2904,7 +2904,7 @@
  * arp is transmitted to generate traffic. see activebackup_arp_monitor for
  * arp monitoring in active backup mode.
  */
-static void bond_loadbalance_arp_mon(struct net_device *bond_dev)
+void bond_loadbalance_arp_mon(struct net_device *bond_dev)
 {
 	struct bonding *bond = bond_dev->priv;
 	struct slave *slave, *oldcurrent;
@@ -3042,7 +3042,7 @@
  * may have received.
  * see loadbalance_arp_monitor for arp monitoring in load balancing mode
  */
-static void bond_activebackup_arp_mon(struct net_device *bond_dev)
+void bond_activebackup_arp_mon(struct net_device *bond_dev)
 {
 	struct bonding *bond = bond_dev->priv;
 	struct slave *slave;
@@ -4484,7 +4484,7 @@
 /*
  * set bond mode specific net device operations
  */
-static inline void bond_set_mode_ops(struct bonding *bond, int mode)
+void bond_set_mode_ops(struct bonding *bond, int mode)
 {
 	struct net_device *bond_dev = bond->dev;

@@ -4603,7 +4603,7 @@
 /* De-initialize device specific data.
  * Caller must hold rtnl_lock.
  */
-static inline void bond_deinit(struct net_device *bond_dev)
+void bond_deinit(struct net_device *bond_dev)
 {
 	struct bonding *bond = bond_dev->priv;

@@ -4639,7 +4639,7 @@
  * Convert string input module parms. Accept either the
  * number of the mode or its string name.
  */
-static inline int bond_parse_parm(char *mode_arg, struct bond_parm_tbl *tbl)
+int bond_parse_parm(char *mode_arg, struct bond_parm_tbl *tbl)
 {
 	int i;

From radheka.godse@intel.com Fri Jul 1 13:40:34 2005
Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Jul 2005 13:40:38 -0700 (PDT)
Received: from orsfmr003.jf.intel.com (fmr18.intel.com [134.134.136.17]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j61KeYH9003213 for ; Fri, 1 Jul 2005 13:40:34 -0700
Received: from orsfmr100.jf.intel.com (orsfmr100.jf.intel.com [10.7.209.16]) by orsfmr003.jf.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1 2004/09/17 17:50:56 root Exp $) with ESMTP id j61Kcwxw011205; Fri, 1 Jul 2005 20:38:58 GMT
Received: from nwlxmail01.jf.intel.com (nwlxmail01.jf.intel.com [10.7.171.40]) by orsfmr100.jf.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2 2004/09/17 18:05:01 root Exp $) with ESMTP id j61KcwPb027451; Fri, 1 Jul 2005 20:38:58 GMT
Received: from [134.134.3.92] ([134.134.3.92]) by nwlxmail01.jf.intel.com (8.12.10/8.12.9/MailSET/Hub) with ESMTP id j61KcwSL001300; Fri, 1 Jul 2005 13:38:58 -0700
Date: Fri, 1 Jul 2005 13:38:11 -0700 (PDT)
From: Radheka Godse
X-X-Sender: radheka@localhost.localdomain
To: fubar@us.ibm.com, bonding-devel@lists.sourceforge.net
cc: netdev@oss.sgi.com
Subject: [PATCH 2.6.13-rc1 2/17] bonding: split bond creation into new function
Message-ID:
ReplyTo: "Radheka Godse"
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Scanned-By: MIMEDefang 2.44
X-archive-position: 2592
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: radheka.godse@intel.com
Precedence: bulk
X-list: netdev

This patch moves the work of creating a new bond into a separate function, instead of doing it inline in bonding_init. The function is non-static and its prototype is added to the header, for use by the sysfs interface.

Signed-off-by: Radheka Godse
Signed-off-by: Mitch Williams

diff -urN -X dontdiff linux-2.6.12post/drivers/net/bonding/bonding.h linux-2.6.12post-sysfs/drivers/net/bonding/bonding.h
--- linux-2.6.12post/drivers/net/bonding/bonding.h	2005-06-28 18:18:03.000000000 -0700
+++ linux-2.6.12post-sysfs/drivers/net/bonding/bonding.h	2005-06-30 13:58:27.000000000 -0700
@@ -255,6 +255,7 @@

 struct vlan_entry *bond_next_vlan(struct bonding *bond, struct vlan_entry *curr);
 int bond_dev_queue_xmit(struct bonding *bond, struct sk_buff *skb, struct net_device *slave_dev);
+int bond_create(char *name, struct bond_params *params, struct bonding **newbond);
 void bond_deinit(struct net_device *bond_dev);
 int bond_release(struct net_device *bond_dev, struct net_device *slave_dev);
 int bond_sethwaddr(struct net_device *bond_dev, struct net_device *slave_dev);
diff -urN -X dontdiff linux-2.6.12post/drivers/net/bonding/bond_main.c linux-2.6.12post-sysfs/drivers/net/bonding/bond_main.c
--- linux-2.6.12post/drivers/net/bonding/bond_main.c	2005-06-28 18:18:03.000000000 -0700
+++ linux-2.6.12post-sysfs/drivers/net/bonding/bond_main.c	2005-06-30 13:53:55.000000000 -0700
@@ -555,6 +555,7 @@
 static char *xmit_hash_policy = NULL;
 static int arp_interval = BOND_LINK_ARP_INTERV;
 static char *arp_ip_target[BOND_MAX_ARP_TARGETS] = { NULL, };
+struct bond_params bonding_defaults;

 module_param(max_bonds, int, 0);
 MODULE_PARM_DESC(max_bonds, "Max number of bonded devices");
@@ -4530,12 +4531,10 @@
  * Does not allocate but creates a /proc entry.
  * Allowed to fail.
  */
-static int __init bond_init(struct net_device *bond_dev, struct bond_params *params)
+static int bond_init(struct net_device *bond_dev, struct bond_params *params)
 {
 	struct bonding *bond = bond_dev->priv;

-	dprintk("Begin bond_init for %s\n", bond_dev->name);
-
 	/* initialize rwlocks */
 	rwlock_init(&bond->lock);
 	rwlock_init(&bond->curr_slave_lock);
@@ -4919,61 +5025,81 @@
 	return 0;
 }

+/* Create a new bond based on the specified name and bonding parameters.
+ * Caller must NOT hold rtnl_lock; we need to release it here before we
+ * set up our sysfs entries.
+ */
+int bond_create(char *name, struct bond_params *params, struct bonding **newbond)
+{
+	struct net_device *bond_dev;
+	int res;
+
+	rtnl_lock();
+	bond_dev = alloc_netdev(sizeof(struct bonding), name, ether_setup);
+	if (!bond_dev) {
+		printk(KERN_ERR DRV_NAME
+		       ": %s: eek! can't alloc netdev!\n",
+		       name);
+		res = -ENOMEM;
+		goto out_rtnl;
+	}
+
+	/* bond_init() must be called after dev_alloc_name() (for the
+	 * /proc files), but before register_netdevice(), because we
+	 * need to set function pointers.
+	 */
+
+	res = bond_init(bond_dev, params);
+	if (res < 0) {
+		goto out_netdev;
+	}
+
+	SET_MODULE_OWNER(bond_dev);
+
+	res = register_netdevice(bond_dev);
+	if (res < 0) {
+		goto out_bond;
+	}
+	if (newbond)
+		*newbond = bond_dev->priv;
+
+	rtnl_unlock(); /* allows sysfs registration of net device */
+	res = bond_create_sysfs_entry(bond_dev->priv);
+	goto done;
+out_bond:
+	bond_deinit(bond_dev);
+out_netdev:
+	free_netdev(bond_dev);
+out_rtnl:
+	rtnl_unlock();
+done:
+	return res;
+}
+
 static int __init bonding_init(void)
 {
-	struct bond_params params;
 	int i;
 	int res;
+	char new_bond_name[8]; /* Enough room for 999 bonds at init. */

 	printk(KERN_INFO "%s", version);

-	res = bond_check_params(&params);
+	res = bond_check_params(&bonding_defaults);
 	if (res) {
-		return res;
+		goto out;
 	}

-	rtnl_lock();
-
 #ifdef CONFIG_PROC_FS
 	bond_create_proc_dir();
 #endif
 	for (i = 0; i < max_bonds; i++) {
-		struct net_device *bond_dev;
-
-		bond_dev = alloc_netdev(sizeof(struct bonding), "", ether_setup);
-		if (!bond_dev) {
-			res = -ENOMEM;
-			goto out_err;
-		}
-
-		res = dev_alloc_name(bond_dev, "bond%d");
-		if (res < 0) {
-			free_netdev(bond_dev);
-			goto out_err;
-		}
-
-		/* bond_init() must be called after dev_alloc_name() (for the
-		 * /proc files), but before register_netdevice(), because we
-		 * need to set function pointers.
-		 */
-		res = bond_init(bond_dev, &params);
-		if (res < 0) {
-			free_netdev(bond_dev);
-			goto out_err;
-		}
-
-		SET_MODULE_OWNER(bond_dev);
-
-		res = register_netdevice(bond_dev);
-		if (res < 0) {
-			bond_deinit(bond_dev);
-			free_netdev(bond_dev);
-			goto out_err;
-		}
+		sprintf(new_bond_name, "bond%d",i);
+		res = bond_create(new_bond_name,&bonding_defaults, NULL);
+		if (res)
+			goto err;
 	}

-	rtnl_unlock();
 	register_netdevice_notifier(&bond_netdev_notifier);
 	register_inetaddr_notifier(&bond_inetaddr_notifier);

From radheka.godse@intel.com Fri Jul 1 13:42:00 2005
Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Jul 2005 13:42:02 -0700 (PDT)
Received: from orsfmr002.jf.intel.com (fmr17.intel.com [134.134.136.16]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j61Kg0H9004005 for ; Fri, 1 Jul 2005 13:42:00 -0700
Received: from orsfmr100.jf.intel.com (orsfmr100.jf.intel.com [10.7.209.16]) by orsfmr002.jf.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1 2004/09/17 17:50:56 root Exp $) with ESMTP id j61KeNq8017560; Fri, 1 Jul 2005 20:40:23 GMT
Received: from nwlxmail01.jf.intel.com (nwlxmail01.jf.intel.com [10.7.171.40]) by orsfmr100.jf.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2 2004/09/17 18:05:01 root Exp $) with ESMTP id j61KeNPb028405; Fri, 1 Jul 2005 20:40:23 GMT
Received: from [134.134.3.92] ([134.134.3.92]) by nwlxmail01.jf.intel.com (8.12.10/8.12.9/MailSET/Hub) with ESMTP id j61KeNSL001674; Fri, 1 Jul 2005 13:40:23 -0700
Date: Fri, 1 Jul 2005 13:39:31 -0700 (PDT)
From: Radheka Godse
X-X-Sender: radheka@localhost.localdomain
To: fubar@us.ibm.com, bonding-devel@lists.sourceforge.net
cc: netdev@oss.sgi.com
Subject: [PATCH 2.6.13-rc1 3/17] bonding: export some structs to bonding.h
Message-ID:
ReplyTo: "Radheka Godse"
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Scanned-By: MIMEDefang 2.44
X-archive-position: 2593
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: radheka.godse@intel.com
Precedence: bulk
X-list: netdev

This patch exposes some data structures for use by the sysfs interface.

Signed-off-by: Radheka Godse
Signed-off-by: Mitch Williams

diff -urN -X dontdiff linux-2.6.12post/drivers/net/bonding/bonding.h linux-2.6.12post-sysfs/drivers/net/bonding/bonding.h
--- linux-2.6.12post/drivers/net/bonding/bonding.h	2005-06-28 18:18:03.000000000 -0700
+++ linux-2.6.12post-sysfs/drivers/net/bonding/bonding.h	2005-06-30 13:58:27.000000000 -0700
@@ -152,6 +152,11 @@
 	u32 arp_targets[BOND_MAX_ARP_TARGETS];
 };

+struct bond_parm_tbl {
+	char *modename;
+	int mode;
+};
+
 struct vlan_entry {
 	struct list_head vlan_list;
 	u32 vlan_ip;
diff -urN -X dontdiff linux-2.6.12post/drivers/net/bonding/bond_main.c linux-2.6.12post-sysfs/drivers/net/bonding/bond_main.c
--- linux-2.6.12post/drivers/net/bonding/bond_main.c	2005-06-28 18:18:03.000000000 -0700
+++ linux-2.6.12post-sysfs/drivers/net/bonding/bond_main.c	2005-06-30 13:53:55.000000000 -0700
@@ -585,7 +585,7 @@
 static const char *version =
 	DRV_DESCRIPTION ": v" DRV_VERSION " (" DRV_RELDATE ")\n";

-static LIST_HEAD(bond_dev_list);
+LIST_HEAD(bond_dev_list);

 #ifdef CONFIG_PROC_FS
 static struct proc_dir_entry *bond_proc_dir = NULL;
@@ -604,18 +604,14 @@
  * command comes from an application using
  * another ABI version.
  */
-struct bond_parm_tbl {
-	char *modename;
-	int mode;
-};

-static struct bond_parm_tbl bond_lacp_tbl[] = {
+struct bond_parm_tbl bond_lacp_tbl[] = {
 	{ "slow", AD_LACP_SLOW},
 	{ "fast", AD_LACP_FAST},
 	{ NULL, -1},
 };

-static struct bond_parm_tbl bond_mode_tbl[] = {
+struct bond_parm_tbl bond_mode_tbl[] = {
 	{ "balance-rr", BOND_MODE_ROUNDROBIN},
 	{ "active-backup", BOND_MODE_ACTIVEBACKUP},
 	{ "balance-xor", BOND_MODE_XOR},
@@ -626,7 +622,7 @@
 	{ NULL, -1},
 };

-static struct bond_parm_tbl xmit_hashtype_tbl[] = {
+struct bond_parm_tbl xmit_hashtype_tbl[] = {
 	{ "layer2", BOND_XMIT_POLICY_LAYER2},
 	{ "layer3+4", BOND_XMIT_POLICY_LAYER34},
 	{ NULL, -1},
@@ -634,12 +630,11 @@

 /*-------------------------- Forward declarations ---------------------------*/

-static inline void bond_set_mode_ops(struct bonding *bond, int mode);
 static void bond_send_gratuitous_arp(struct bonding *bond);

 /*---------------------------- General routines -----------------------------*/

-static const char *bond_mode_name(int mode)
+const char *bond_mode_name(int mode)
 {
 	switch (mode) {
 	case BOND_MODE_ROUNDROBIN :

From radheka.godse@intel.com Fri Jul 1 13:39:09 2005
Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Jul 2005 13:39:23 -0700 (PDT)
Received: from orsfmr004.jf.intel.com (fmr19.intel.com [134.134.136.18]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j61Kd9H9003077 for ; Fri, 1 Jul 2005 13:39:09 -0700
Received: from orsfmr101.jf.intel.com (orsfmr101.jf.intel.com [10.7.209.17]) by orsfmr004.jf.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1 2004/09/17 17:50:56 root Exp $) with ESMTP id j61KbWkn012922; Fri, 1 Jul 2005 20:37:32 GMT
Received: from nwlxmail01.jf.intel.com (nwlxmail01.jf.intel.com [10.7.171.40]) by orsfmr101.jf.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2 2004/09/17 18:05:01 root Exp $) with ESMTP id j61KbW7G024918; Fri, 1 Jul 2005 20:37:32 GMT
Received: from [134.134.3.92] ([134.134.3.92]) by nwlxmail01.jf.intel.com (8.12.10/8.12.9/MailSET/Hub) with ESMTP id j61Kb7SL000669; Fri, 1 Jul 2005 13:37:32 -0700
Date: Fri, 1 Jul 2005 13:36:19 -0700 (PDT)
From: Radheka Godse
X-X-Sender: radheka@localhost.localdomain
To: fubar@us.ibm.com, bonding-devel@lists.sourceforge.net
cc: netdev@oss.sgi.com
Subject: [PATCH 2.6.13-rc1 1/17] bonding: make some functions not static
Message-ID:
ReplyTo: "Radheka Godse"
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Scanned-By: MIMEDefang 2.44
X-archive-position: 2591
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: radheka.godse@intel.com
Precedence: bulk
X-list: netdev

This patch prepares for adding sysfs functionality to bonding by making some functions in bond_main non-static and adding protos into the header.

Signed-off-by: Radheka Godse
Signed-off-by: Mitch Williams

diff -urN -X dontdiff linux-2.6.12post/drivers/net/bonding/bonding.h linux-2.6.12post-sysfs/drivers/net/bonding/bonding.h
--- linux-2.6.12post/drivers/net/bonding/bonding.h	2005-06-28 18:18:03.000000000 -0700
+++ linux-2.6.12post-sysfs/drivers/net/bonding/bonding.h	2005-06-30 13:58:27.000000000 -0700
@@ -255,6 +255,17 @@

 struct vlan_entry *bond_next_vlan(struct bonding *bond, struct vlan_entry *curr);
 int bond_dev_queue_xmit(struct bonding *bond, struct sk_buff *skb, struct net_device *slave_dev);
+void bond_deinit(struct net_device *bond_dev);
+int bond_release(struct net_device *bond_dev, struct net_device *slave_dev);
+int bond_sethwaddr(struct net_device *bond_dev, struct net_device *slave_dev);
+void bond_mii_monitor(struct net_device *bond_dev);
+void bond_loadbalance_arp_mon(struct net_device *bond_dev);
+void bond_activebackup_arp_mon(struct net_device *bond_dev);
+void bond_set_mode_ops(struct bonding *bond, int mode);
+int bond_parse_parm(char *mode_arg, struct bond_parm_tbl *tbl);
+const char *bond_mode_name(int mode);
+void bond_select_active_slave(struct bonding *bond);
+void bond_change_active_slave(struct bonding *bond, struct slave *new_active);

 #endif /* _LINUX_BONDING_H */

diff -urN -X dontdiff linux-2.6.12post/drivers/net/bonding/bond_main.c linux-2.6.12post-sysfs/drivers/net/bonding/bond_main.c
--- linux-2.6.12post/drivers/net/bonding/bond_main.c	2005-06-28 18:18:03.000000000 -0700
+++ linux-2.6.12post-sysfs/drivers/net/bonding/bond_main.c	2005-06-30 13:53:55.000000000 -0700
@@ -1453,7 +1453,7 @@
  *
  * Warning: Caller must hold curr_slave_lock for writing.
  */
-static void bond_change_active_slave(struct bonding *bond, struct slave *new_active)
+void bond_change_active_slave(struct bonding *bond, struct slave *new_active)
 {
 	struct slave *old_active = bond->curr_active_slave;

@@ -1527,7 +1527,7 @@
  *
  * Warning: Caller must hold curr_slave_lock for writing.
  */
-static void bond_select_active_slave(struct bonding *bond)
+void bond_select_active_slave(struct bonding *bond)
 {
 	struct slave *best_slave;

@@ -1595,7 +1595,7 @@

 /*---------------------------------- IOCTL ----------------------------------*/

-static int bond_sethwaddr(struct net_device *bond_dev, struct net_device *slave_dev)
+int bond_sethwaddr(struct net_device *bond_dev, struct net_device *slave_dev)
 {
 	dprintk("bond_dev=%p\n", bond_dev);
 	dprintk("slave_dev=%p\n", slave_dev);
@@ -2030,7 +2030,7 @@
  * for Bonded connections:
  * The first up interface should be left on and all others downed.
  */
-static int bond_release(struct net_device *bond_dev, struct net_device *slave_dev)
+int bond_release(struct net_device *bond_dev, struct net_device *slave_dev)
 {
 	struct bonding *bond = bond_dev->priv;
 	struct slave *slave, *oldcurrent;
@@ -2479,7 +2479,7 @@
 /*-------------------------------- Monitoring -------------------------------*/

 /* this function is called regularly to monitor each slave's link. */
-static void bond_mii_monitor(struct net_device *bond_dev)
+void bond_mii_monitor(struct net_device *bond_dev)
 {
 	struct bonding *bond = bond_dev->priv;
 	struct slave *slave, *oldcurrent;
@@ -2904,7 +2904,7 @@
  * arp is transmitted to generate traffic. see activebackup_arp_monitor for
  * arp monitoring in active backup mode.
  */
-static void bond_loadbalance_arp_mon(struct net_device *bond_dev)
+void bond_loadbalance_arp_mon(struct net_device *bond_dev)
 {
 	struct bonding *bond = bond_dev->priv;
 	struct slave *slave, *oldcurrent;
@@ -3042,7 +3042,7 @@
  * may have received.
  * see loadbalance_arp_monitor for arp monitoring in load balancing mode
  */
-static void bond_activebackup_arp_mon(struct net_device *bond_dev)
+void bond_activebackup_arp_mon(struct net_device *bond_dev)
 {
 	struct bonding *bond = bond_dev->priv;
 	struct slave *slave;
@@ -4484,7 +4484,7 @@
 /*
  * set bond mode specific net device operations
  */
-static inline void bond_set_mode_ops(struct bonding *bond, int mode)
+void bond_set_mode_ops(struct bonding *bond, int mode)
 {
 	struct net_device *bond_dev = bond->dev;

@@ -4603,7 +4603,7 @@
 /* De-initialize device specific data.
  * Caller must hold rtnl_lock.
  */
-static inline void bond_deinit(struct net_device *bond_dev)
+void bond_deinit(struct net_device *bond_dev)
 {
 	struct bonding *bond = bond_dev->priv;

@@ -4639,7 +4639,7 @@
  * Convert string input module parms. Accept either the
  * number of the mode or its string name.
*/ -static inline int bond_parse_parm(char *mode_arg, struct bond_parm_tbl *tbl) +int bond_parse_parm(char *mode_arg, struct bond_parm_tbl *tbl) { int i; From radheka.godse@intel.com Fri Jul 1 13:43:33 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Jul 2005 13:43:36 -0700 (PDT) Received: from orsfmr002.jf.intel.com (fmr17.intel.com [134.134.136.16]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j61KhXH9004840 for ; Fri, 1 Jul 2005 13:43:33 -0700 Received: from orsfmr100.jf.intel.com (orsfmr100.jf.intel.com [10.7.209.16]) by orsfmr002.jf.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1 2004/09/17 17:50:56 root Exp $) with ESMTP id j61Kfuq8018705; Fri, 1 Jul 2005 20:41:56 GMT Received: from nwlxmail01.jf.intel.com (nwlxmail01.jf.intel.com [10.7.171.40]) by orsfmr100.jf.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2 2004/09/17 18:05:01 root Exp $) with ESMTP id j61KftPb029624; Fri, 1 Jul 2005 20:41:55 GMT Received: from [134.134.3.92] ([134.134.3.92]) by nwlxmail01.jf.intel.com (8.12.10/8.12.9/MailSET/Hub) with ESMTP id j61KftSL002146; Fri, 1 Jul 2005 13:41:55 -0700 Date: Fri, 1 Jul 2005 13:41:08 -0700 (PDT) From: Radheka Godse X-X-Sender: radheka@localhost.localdomain To: fubar@us.ibm.com, bonding-devel@lists.sourceforge.net cc: netdev@oss.sgi.com Subject: [PATCH 2.6.13-rc1 4/17] bonding: return pointer to slave from enslave Message-ID: ReplyTo: "Radheka Godse" MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Scanned-By: MIMEDefang 2.44 X-archive-position: 2594 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: radheka.godse@intel.com Precedence: bulk X-list: netdev This patch changes the bond_enslave function so that it (optionally) returns a pointer to the slave struct for a new slave. This functionality is not used by the existing ioctl interface, but will be used by the sysfs interface.
This function is also made non-static and a proto is placed in the header. Signed-off-by: Radheka Godse Signed-off-by: Mitch Williams diff -urN -X dontdiff linux-2.6.12post/drivers/net/bonding/bonding.h linux-2.6.12post-sysfs/drivers/net/bonding/bonding.h --- linux-2.6.12post/drivers/net/bonding/bonding.h 2005-06-28 18:18:03.000000000 -0700 +++ linux-2.6.12post-sysfs/drivers/net/bonding/bonding.h 2005-06-30 13:58:27.000000000 -0700 @@ -262,6 +262,7 @@ int bond_dev_queue_xmit(struct bonding *bond, struct sk_buff *skb, struct net_device *slave_dev); int bond_create(char *name, struct bond_params *params, struct bonding **newbond); void bond_deinit(struct net_device *bond_dev); +int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev, struct slave **vassal); int bond_release(struct net_device *bond_dev, struct net_device *slave_dev); int bond_sethwaddr(struct net_device *bond_dev, struct net_device *slave_dev); void bond_mii_monitor(struct net_device *bond_dev); diff -urN -X dontdiff linux-2.6.12post/drivers/net/bonding/bond_main.c linux-2.6.12post-sysfs/drivers/net/bonding/bond_main.c --- linux-2.6.12post/drivers/net/bonding/bond_main.c 2005-06-28 18:18:03.000000000 -0700 +++ linux-2.6.12post-sysfs/drivers/net/bonding/bond_main.c 2005-06-30 13:53:55.000000000 -0700 @@ -1601,7 +1601,7 @@ } /* enslave device to bond device */ -static int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev) +int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev, struct slave **vassal) { struct bonding *bond = bond_dev->priv; struct slave *new_slave = NULL; @@ -1992,6 +1992,8 @@ new_slave->link != BOND_LINK_DOWN ? 
"n up" : " down"); /* enslave is successful */ + if (vassal) + *vassal=new_slave; return 0; /* Undo stages on error */ @@ -4049,6 +4051,7 @@ return -EINVAL; } + down_write(&(bonding_rwsem)); slave_dev = dev_get_by_name(ifr->ifr_slave); dprintk("slave_dev=%p: \n", slave_dev); @@ -4060,7 +4063,7 @@ switch (cmd) { case BOND_ENSLAVE_OLD: case SIOCBONDENSLAVE: - res = bond_enslave(bond_dev, slave_dev); + res = bond_enslave(bond_dev, slave_dev, NULL); break; case BOND_RELEASE_OLD: case SIOCBONDRELEASE: From radheka.godse@intel.com Fri Jul 1 13:45:45 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Jul 2005 13:45:53 -0700 (PDT) Received: from orsfmr002.jf.intel.com (fmr17.intel.com [134.134.136.16]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j61KjjH9006613 for ; Fri, 1 Jul 2005 13:45:45 -0700 Received: from orsfmr100.jf.intel.com (orsfmr100.jf.intel.com [10.7.209.16]) by orsfmr002.jf.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1 2004/09/17 17:50:56 root Exp $) with ESMTP id j61Ki8q8019807; Fri, 1 Jul 2005 20:44:08 GMT Received: from nwlxmail01.jf.intel.com (nwlxmail01.jf.intel.com [10.7.171.40]) by orsfmr100.jf.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2 2004/09/17 18:05:01 root Exp $) with ESMTP id j61Ki8Pb030802; Fri, 1 Jul 2005 20:44:08 GMT Received: from [134.134.3.92] ([134.134.3.92]) by nwlxmail01.jf.intel.com (8.12.10/8.12.9/MailSET/Hub) with ESMTP id j61Ki8SL002587; Fri, 1 Jul 2005 13:44:08 -0700 Date: Fri, 1 Jul 2005 13:43:21 -0700 (PDT) From: Radheka Godse X-X-Sender: radheka@localhost.localdomain To: fubar@us.ibm.com, bonding-devel@lists.sourceforge.net cc: netdev@oss.sgi.com Subject: [PATCH 2.6.13-rc1 5/17] bonding: reset RLB flag during ALB init Message-ID: ReplyTo: "Radheka Godse" MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Scanned-By: MIMEDefang 2.44 X-archive-position: 2595 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com 
X-original-sender: radheka.godse@intel.com Precedence: bulk X-list: netdev This patch makes the ALB init function explicitly clear the RLB flag if RLB is not used. This is required by the sysfs interfaces, which allows the user to change mode without destroying the bond. Signed-off-by: Radheka Godse Signed-off-by: Mitch Williams diff -urN -X dontdiff linux-2.6.12post/drivers/net/bonding/bond_alb.c linux-2.6.12post-sysfs/drivers/net/bonding/bond_alb.c --- linux-2.6.12post/drivers/net/bonding/bond_alb.c 2005-06-17 12:48:29.000000000 -0700 +++ linux-2.6.12post-sysfs/drivers/net/bonding/bond_alb.c 2005-06-28 18:21:35.000000000 -0700 @@ -1256,6 +1256,8 @@ tlb_deinitialize(bond); return res; } + } else { + bond->alb_info.rlb_enabled = 0; } return 0; From tgraf@suug.ch Fri Jul 1 13:47:50 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Jul 2005 13:47:53 -0700 (PDT) Received: from postel.suug.ch (postel.suug.ch [195.134.158.23]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j61KllH9007212 for ; Fri, 1 Jul 2005 13:47:49 -0700 Received: by postel.suug.ch (Postfix, from userid 10001) id 56F661C0F3; Fri, 1 Jul 2005 22:46:37 +0200 (CEST) Date: Fri, 1 Jul 2005 22:46:37 +0200 From: Thomas Graf To: Patrick McHardy Cc: Patrick Jenkins , linux-kernel@vger.kernel.org, Maillist netdev Subject: Re: [PATCH] multipath routing algorithm, better patch Message-ID: <20050701204637.GX16076@postel.suug.ch> References: <42C4919A.5000009@trash.net> <20050701174117.GW16076@postel.suug.ch> <42C59ABA.1070305@trash.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <42C59ABA.1070305@trash.net> X-archive-position: 2596 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: tgraf@suug.ch Precedence: bulk X-list: netdev * Patrick McHardy <42C59ABA.1070305@trash.net> 2005-07-01 21:34 > So its no problem but simply missing support. 
BTW, do you know if > Stephen's new CVS repository is exported somewhere? cvs -d :pserver:cvsanon@developer.osdl.org/repos cvs login cvs -d :pserver:cvsanon@developer.osdl.org/repos cvs co iproute2 From radheka.godse@intel.com Fri Jul 1 13:48:20 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Jul 2005 13:48:23 -0700 (PDT) Received: from orsfmr003.jf.intel.com (fmr18.intel.com [134.134.136.17]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j61KmJH9007295 for ; Fri, 1 Jul 2005 13:48:20 -0700 Received: from orsfmr101.jf.intel.com (orsfmr101.jf.intel.com [10.7.209.17]) by orsfmr003.jf.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1 2004/09/17 17:50:56 root Exp $) with ESMTP id j61Kkhxw015076; Fri, 1 Jul 2005 20:46:43 GMT Received: from nwlxmail01.jf.intel.com (nwlxmail01.jf.intel.com [10.7.171.40]) by orsfmr101.jf.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2 2004/09/17 18:05:01 root Exp $) with ESMTP id j61Kkh7G031178; Fri, 1 Jul 2005 20:46:43 GMT Received: from [134.134.3.92] ([134.134.3.92]) by nwlxmail01.jf.intel.com (8.12.10/8.12.9/MailSET/Hub) with ESMTP id j61KkhSL003239; Fri, 1 Jul 2005 13:46:43 -0700 Date: Fri, 1 Jul 2005 13:45:56 -0700 (PDT) From: Radheka Godse X-X-Sender: radheka@localhost.localdomain To: fubar@us.ibm.com, bonding-devel@lists.sourceforge.net cc: netdev@oss.sgi.com Subject: [PATCH 2.6.13-rc1 6/17] bonding: ALB init kmalloc inside spinlock bugfix Message-ID: ReplyTo: "Radheka Godse" MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Scanned-By: MIMEDefang 2.44 X-archive-position: 2597 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: radheka.godse@intel.com Precedence: bulk X-list: netdev This patch corrects a bug in ALB init where kmalloc, called while a spinlock is held, causes a stack dump in debug mode. Signed-off-by: Radheka Godse Signed-off-by: Mitch Williams diff -urN -X dontdiff
linux-2.6.12post/drivers/net/bonding/bond_alb.c linux-2.6.12post-sysfs/drivers/net/bonding/bond_alb.c --- linux-2.6.12post/drivers/net/bonding/bond_alb.c 2005-06-17 12:48:29.000000000 -0700 +++ linux-2.6.12post-sysfs/drivers/net/bonding/bond_alb.c 2005-06-28 18:21:35.000000000 -0700 @@ -198,20 +198,21 @@ { struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond)); int size = TLB_HASH_TABLE_SIZE * sizeof(struct tlb_client_info); + struct tlb_client_info *new_hashtbl; int i; spin_lock_init(&(bond_info->tx_hashtbl_lock)); - _lock_tx_hashtbl(bond); - - bond_info->tx_hashtbl = kmalloc(size, GFP_KERNEL); - if (!bond_info->tx_hashtbl) { + new_hashtbl = kmalloc(size, GFP_KERNEL); + if (!new_hashtbl) { printk(KERN_ERR DRV_NAME ": Error: %s: Failed to allocate TLB hash table\n", bond->dev->name); - _unlock_tx_hashtbl(bond); return -1; } + _lock_tx_hashtbl(bond); + + bond_info->tx_hashtbl = new_hashtbl; memset(bond_info->tx_hashtbl, 0, size); @@ -798,21 +801,22 @@ { struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond)); struct packet_type *pk_type = &(BOND_ALB_INFO(bond).rlb_pkt_type); + struct rlb_client_info *new_hashtbl; int size = RLB_HASH_TABLE_SIZE * sizeof(struct rlb_client_info); int i; spin_lock_init(&(bond_info->rx_hashtbl_lock)); - _lock_rx_hashtbl(bond); - - bond_info->rx_hashtbl = kmalloc(size, GFP_KERNEL); - if (!bond_info->rx_hashtbl) { + new_hashtbl = kmalloc(size, GFP_KERNEL); + if (!new_hashtbl) { printk(KERN_ERR DRV_NAME ": Error: %s: Failed to allocate RLB hash table\n", bond->dev->name); - _unlock_rx_hashtbl(bond); return -1; } + _lock_rx_hashtbl(bond); + + bond_info->rx_hashtbl = new_hashtbl; bond_info->rx_hashtbl_head = RLB_NULL_INDEX; From radheka.godse@intel.com Fri Jul 1 13:49:48 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Jul 2005 13:49:51 -0700 (PDT) Received: from orsfmr005.jf.intel.com (fmr20.intel.com [134.134.136.19]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j61KnSH9008077 for ; Fri, 1 Jul 2005 13:49:48 
-0700 Received: from orsfmr100.jf.intel.com (orsfmr100.jf.intel.com [10.7.209.16]) by orsfmr005.jf.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1 2004/09/17 17:50:56 root Exp $) with ESMTP id j61Klptp008705; Fri, 1 Jul 2005 20:47:51 GMT Received: from nwlxmail01.jf.intel.com (nwlxmail01.jf.intel.com [10.7.171.40]) by orsfmr100.jf.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2 2004/09/17 18:05:01 root Exp $) with ESMTP id j61KlpPb000733; Fri, 1 Jul 2005 20:47:51 GMT Received: from [134.134.3.92] ([134.134.3.92]) by nwlxmail01.jf.intel.com (8.12.10/8.12.9/MailSET/Hub) with ESMTP id j61KlpSL003353; Fri, 1 Jul 2005 13:47:51 -0700 Date: Fri, 1 Jul 2005 13:47:04 -0700 (PDT) From: Radheka Godse X-X-Sender: radheka@localhost.localdomain To: fubar@us.ibm.com, bonding-devel@lists.sourceforge.net cc: netdev@oss.sgi.com Subject: [PATCH 2.6.13-rc1 7/17] bonding: make sysfs consistent with ifenslave behavior Message-ID: ReplyTo: "Radheka Godse" MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Scanned-By: MIMEDefang 2.44 X-archive-position: 2598 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: radheka.godse@intel.com Precedence: bulk X-list: netdev For consistency with ifenslave, instead of exiting with an error, updated bonding sysfs to close and attempt to enslave an up adapter. Signed-off-by: Radheka Godse Signed-off-by: Mitch Williams diff -urN -X dontdiff linux-2.6.12post/drivers/net/bonding/bond_main.c linux-2.6.12post-sysfs/drivers/net/bonding/bond_main.c --- linux-2.6.12post/drivers/net/bonding/bond_main.c 2005-06-28 18:18:03.000000000 -0700 +++ linux-2.6.12post-sysfs/drivers/net/bonding/bond_main.c 2005-06-30 13:53:55.000000000 -0700 @@ -1665,10 +1665,19 @@ */ if ((slave_dev->flags & IFF_UP)) { printk(KERN_ERR DRV_NAME - ": Error: %s is up\n", - slave_dev->name); + ": %s: Warning: %s is up. 
Closing it " + "before adding to the bond.\n", + bond_dev->name, slave_dev->name); res = -EPERM; - goto err_undo_flags; + res = dev_close(slave_dev); + if (res) + { + printk(KERN_ERR DRV_NAME + ": %s: Error: Failed to close %s.\n", + bond_dev->name, slave_dev->name); + res = -EPERM; + goto err_undo_flags; + } } if (slave_dev->set_mac_address == NULL) { From radheka.godse@intel.com Fri Jul 1 13:51:16 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Jul 2005 13:51:22 -0700 (PDT) Received: from orsfmr004.jf.intel.com (fmr19.intel.com [134.134.136.18]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j61KpBH9008828 for ; Fri, 1 Jul 2005 13:51:16 -0700 Received: from orsfmr101.jf.intel.com (orsfmr101.jf.intel.com [10.7.209.17]) by orsfmr004.jf.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1 2004/09/17 17:50:56 root Exp $) with ESMTP id j61KnVkn019278; Fri, 1 Jul 2005 20:49:31 GMT Received: from nwlxmail01.jf.intel.com (nwlxmail01.jf.intel.com [10.7.171.40]) by orsfmr101.jf.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2 2004/09/17 18:05:01 root Exp $) with ESMTP id j61KnV7G032516; Fri, 1 Jul 2005 20:49:31 GMT Received: from [134.134.3.92] ([134.134.3.92]) by nwlxmail01.jf.intel.com (8.12.10/8.12.9/MailSET/Hub) with ESMTP id j61KnVSL003497; Fri, 1 Jul 2005 13:49:31 -0700 Date: Fri, 1 Jul 2005 13:48:44 -0700 (PDT) From: Radheka Godse X-X-Sender: radheka@localhost.localdomain To: fubar@us.ibm.com, bonding-devel@lists.sourceforge.net cc: netdev@oss.sgi.com Subject: [PATCH 2.6.13-rc1 8/17] bonding: SYSFS INTERFACE (large) Message-ID: ReplyTo: "Radheka Godse" MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Scanned-By: MIMEDefang 2.44 X-archive-position: 2599 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: radheka.godse@intel.com Precedence: bulk X-list: netdev This large patch adds the sysfs interface to channel bonding. 
It will allow users to add and remove bonds, add and remove slaves, and change all bonding parameters without using ifenslave. The ifenslave interface still works. Signed-off-by: Radheka Godse Signed-off-by: Mitch Williams diff -urN -X dontdiff linux-2.6.12post/drivers/net/bonding/bonding.h linux-2.6.12post-sysfs/drivers/net/bonding/bonding.h --- linux-2.6.12post/drivers/net/bonding/bonding.h 2005-06-28 18:18:03.000000000 -0700 +++ linux-2.6.12post-sysfs/drivers/net/bonding/bonding.h 2005-06-30 13:58:27.000000000 -0700 @@ -37,6 +37,7 @@ #include #include #include +#include #include "bond_3ad.h" #include "bond_alb.h" @@ -262,6 +259,13 @@ int bond_dev_queue_xmit(struct bonding *bond, struct sk_buff *skb, struct net_device *slave_dev); int bond_create(char *name, struct bond_params *params, struct bonding **newbond); void bond_deinit(struct net_device *bond_dev); +int bond_create_sysfs(void); +void bond_destroy_sysfs(void); +void bond_destroy_sysfs_entry(struct bonding *bond); +int bond_create_sysfs_entry(struct bonding *bond); +int bond_create_slave_symlinks(struct net_device *master, struct net_device *slave); +void bond_destroy_slave_symlinks(struct net_device *master, struct net_device *slave); +int bond_check_abi_ver(void); int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev, struct slave **vassal); int bond_release(struct net_device *bond_dev, struct net_device *slave_dev); int bond_sethwaddr(struct net_device *bond_dev, struct net_device *slave_dev); diff -urN -X dontdiff linux-2.6.12post/drivers/net/bonding/bond_main.c linux-2.6.12post-sysfs/drivers/net/bonding/bond_main.c --- linux-2.6.12post/drivers/net/bonding/bond_main.c 2005-06-28 18:18:03.000000000 -0700 +++ linux-2.6.12post-sysfs/drivers/net/bonding/bond_main.c 2005-06-30 13:53:55.000000000 -0700 @@ -591,6 +591,7 @@ static struct proc_dir_entry *bond_proc_dir = NULL; #endif +extern struct rw_semaphore bonding_rwsem; static u32 arp_target[BOND_MAX_ARP_TARGETS] = { 0, } ; static int 
arp_ip_count = 0; static int bond_mode = BOND_MODE_ROUNDROBIN; @@ -1994,6 +1994,9 @@ } } + res = bond_create_slave_symlinks(bond_dev, slave_dev); + if (res) + goto err_unset_master; printk(KERN_INFO DRV_NAME ": %s: enslaving %s as a%s interface with a%s link.\n", bond_dev->name, slave_dev->name, @@ -2165,6 +2176,9 @@ write_unlock_bh(&bond->lock); + /* must do this from outside any spinlocks */ + bond_destroy_slave_symlinks(bond_dev, slave_dev); + bond_del_vlans_from_slave(bond, slave_dev); /* If the mode USES_PRIMARY, then we should only remove its @@ -2256,6 +2263,7 @@ */ write_unlock_bh(&bond->lock); + bond_destroy_slave_symlinks(bond_dev, slave_dev); bond_del_vlans_from_slave(bond, slave_dev); /* If the mode USES_PRIMARY, then we should only remove its @@ -2436,6 +2444,22 @@ } } +int bond_check_abi_ver(void) +{ + int retval = 1; + + if (orig_app_abi_ver == -1) { + orig_app_abi_ver = BOND_ABI_VERSION; + app_abi_ver = BOND_ABI_VERSION; + } + else { + if (app_abi_ver == 0) + retval = 0; + } + + return retval; +} + static int bond_info_query(struct net_device *bond_dev, struct ifbond *info) { struct bonding *bond = bond_dev->priv; @@ -3569,7 +3593,10 @@ bond_remove_proc_entry(bond); bond_create_proc_entry(bond); #endif - + down_write(&(bonding_rwsem)); + bond_destroy_sysfs_entry(bond); + bond_create_sysfs_entry(bond); + up_write(&(bonding_rwsem)); return NOTIFY_DONE; } @@ -4101,6 +4128,7 @@ orig_app_abi_ver = prev_abi_ver; } + up_write(&(bonding_rwsem)); return res; } @@ -5000,18 +4990,22 @@ goto err; } + res = bond_create_sysfs(); + if (res) + goto err; + register_netdevice_notifier(&bond_netdev_notifier); register_inetaddr_notifier(&bond_inetaddr_notifier); - return 0; - -out_err: - /* free and unregister all bonds that were successfully added */ + goto out; +err: + rtnl_lock(); bond_free_all(); - + bond_destroy_sysfs(); rtnl_unlock(); - +out: return res; + } static void __exit bonding_exit(void) @@ -5021,6 +5053,7 @@ rtnl_lock(); bond_free_all(); + 
bond_destroy_sysfs(); rtnl_unlock(); } diff -urN -X dontdiff linux-2.6.12post/drivers/net/bonding/bond_sysfs.c linux-2.6.12post-sysfs/drivers/net/bonding/bond_sysfs.c --- linux-2.6.12post/drivers/net/bonding/bond_sysfs.c 1969-12-31 16:00:00.000000000 -0800 +++ linux-2.6.12post-sysfs/drivers/net/bonding/bond_sysfs.c 2005-06-30 13:54:22.000000000 -0700 @@ -0,0 +1,1493 @@ + +/* + * Copyright(c) 2004-2005 Intel Corporation. All rights reserved. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the + * Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY + * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License + * for more details. + * + * You should have received a copy of the GNU General Public License along + * with this program; if not, write to the Free Software Foundation, Inc., + * 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. + * + * The full GNU General Public License is included in this distribution in the + * file called LICENSE. + * + * + * Changes: + * + * 2004/12/12 - Mitch Williams + * - Initial creation of sysfs interface. + * + * 2005/06/22 - Radheka Godse + * - Added ifenslave -c type functionality to sysfs + * - Added sysfs files for attributes such as MII Status and + * 802.3ad aggregator that are displayed in /proc + * - Added "name value" format to sysfs "mode" and + * "lacp_rate", for e.g., "active-backup 1" or "slow 0" for + * consistency and ease of script parsing + * - Fixed reversal of octets in arp_ip_targets via sysfs + * - sysfs support to handle bond interface re-naming + * - Moved all sysfs entries into /sys/class/net instead of + * of using a standalone subsystem. 
+ * - Added sysfs symlinks between masters and slaves + * - Corrected bugs in sysfs unload path when creating bonds + * with existing interface names. + * - Removed redundant sysfs stat file since it duplicates slave info + * from the proc file + * - Fixed errors in sysfs show/store arp targets. + * - For consistency with ifenslave, instead of exiting + * with an error, updated bonding sysfs to + * close and attempt to enslave an up adapter. + * - Fixed NULL dereference when adding a slave interface + * that does not exist. + * - Added checks in sysfs bonding to reject invalid ip addresses + * - Synch up with post linux-2.6.12 bonding changes + * - Created sysfs bond attrib for xmit_hash_policy + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/* #define BONDING_DEBUG 1 */ +#include "bonding.h" +#define to_class_dev(obj) container_of(obj,struct class_device,kobj) +#define to_net_dev(class) container_of(class, struct net_device, class_dev) +#define to_bond(cd) ((struct bonding *)(to_net_dev(cd)->priv)) + +/*---------------------------- Declarations -------------------------------*/ + +/* Macros for real simple parsing of text. */ +#define eat_nonalnum(str,whence,max) \ + while (whence < max) {if (!isalnum(str[whence])) whence++; else break;}; +#define find_next_nonalpha(str,whence,max) \ + while (whence < max) {if (isalnum(str[whence])) whence++; else break;}; + +extern struct list_head bond_dev_list; +extern struct bond_params bonding_defaults; +extern struct bond_parm_tbl bond_mode_tbl[]; +extern struct bond_parm_tbl bond_lacp_tbl[]; +extern struct bond_parm_tbl xmit_hashtype_tbl[]; + +static struct class *netdev_class; +/*--------------------------- Data Structures -----------------------------*/ + +/* Bonding sysfs lock. Why can't we just use the subsystem lock? + * Because kobject_register tries to acquire the subsystem lock.
If + * we already hold the lock (which we would if the user was creating + * a new bond through the sysfs interface), we deadlock. + */ + +struct rw_semaphore bonding_rwsem; + + + + +/*------------------------------ Functions --------------------------------*/ + +/* + * "show" function for the bond_masters attribute. + * The class parameter is ignored. + */ +static ssize_t bonding_show_bonds(struct class *cls, char *buffer) +{ + int res = 0; + struct bonding *bond; + + down_read(&(bonding_rwsem)); + + list_for_each_entry(bond, &bond_dev_list, bond_list) { + res += sprintf(buffer + res, "%s ", + bond->dev->name); + if (res > (PAGE_SIZE - IFNAMSIZ)) { + dprintk("eek! too many bonds!\n"); + break; + } + } + res += sprintf(buffer + res, "\n"); + res++; + up_read(&(bonding_rwsem)); + return res; +} + +/* + * "store" function for the bond_masters attribute. This is what + * creates and deletes entire bonds. + * + * The class parameter is ignored. + * + * This function uses the eat_nonalnum and find_next_nonalpha macros, defined + * above. Why not use sscanf()? Scanf can get strings, but can't filter + * out inappropriate characters. For example, we can't have bonds named + * "foo/bar" or "foo*bar" or "Does this work?" as these aren't valid + * filenames. While we could use scanf to get strings and then validate + * them, this is quicker. + * The above examples give us these results: + * "foo/bar" gives two bonds, "foo" and "bar". + * "foo*bar" gives two bonds, "foo" and "bar". + * "Does this work?" gives three bonds, "Does", "this", and "work". + */ + +static ssize_t bonding_store_bonds(struct class *cls, const char *buffer, size_t count) +{ + char name[IFNAMSIZ]; + int i, res, found, pos = 0; + struct bonding *bond; + struct bonding *nxt; + + down_write(&(bonding_rwsem)); + /* First process adds */ + eat_nonalnum(buffer, pos, count); + /* Pos now points to the first alpha character.
*/ + i = pos; + find_next_nonalpha(buffer, i, count); + /* i now points to the next character past the end of the bond name. */ + if (i - pos >= IFNAMSIZ) { + printk(KERN_ERR DRV_NAME "Interface name %.*s too large! Ignoring.\n", + i - pos, buffer + pos); + up_write(&(bonding_rwsem)); + return -EPERM; + } + /* Copy the bond name so we can deal with it separately. */ + strncpy(name, buffer + pos, i - pos); + /* Don't forget the null terminator! */ + name[i - pos] = 0; + while (strlen(name)) { + /* Got a bond name in name. Is it already in the list? */ + found = 0; + list_for_each_entry_safe(bond, nxt, &bond_dev_list, bond_list) { + if (strnicmp(bond->dev->name, name, IFNAMSIZ) == 0) { + /* Temporarily set a meaningless flag. When + * we get done with the loop, we'll check all of these. + * If the bond doesn't have this flag set, then we need + * to remove the bond. If the flag has it set, then + * we can just clear the flag. + */ + bond->flags |= IFF_DYNAMIC; + found = 1; + break; /* Found it, so go to next name */ + } + } + if (found == 0) { + printk(KERN_INFO DRV_NAME ": %s is being created...\n", name); + res = bond_create(name, &bonding_defaults, &bond); + if (res) { + up_write(&(bonding_rwsem)); + printk(KERN_INFO DRV_NAME ": %s interface already exists. Bond creation failed.\n", name); + return res; + } + printk(KERN_INFO DRV_NAME ": %s created.\n", name); + /* Set the flag so we don't delete + * this interface in the loop below. + */ + bond->flags |= IFF_DYNAMIC; + } + /* Scan for the next name. i still has the location of + * the char just past the end of the last name we handled. + */ + pos = i; + eat_nonalnum(buffer, pos, count); + i = pos; + find_next_nonalpha(buffer, i, count); + if (i - pos >= IFNAMSIZ) { + printk(KERN_ERR DRV_NAME + ": %.*s interface name too large! 
Ignoring.\n", + i - pos, buffer + pos); + up_write(&(bonding_rwsem)); + return -EPERM; + } + strncpy(name, buffer + pos, i - pos); + name[i - pos] = 0; + } /* end of while loop and end of input */ + + /* Now we do deletes. Walk through the list, and check to see + * if the flag didn't get set. If it's set, clear it. If it's + * not set, delete the bond. + */ + list_for_each_entry_safe(bond, nxt, &bond_dev_list, bond_list) { + if (bond->flags & IFF_DYNAMIC) { + bond->flags &= ~IFF_DYNAMIC; + } else { + printk(KERN_INFO DRV_NAME ": %s is being deleted...\n", bond->dev->name); + rtnl_lock(); + unregister_netdevice(bond->dev); + bond_deinit(bond->dev); + bond_destroy_sysfs_entry(bond); + rtnl_unlock(); + printk(KERN_INFO DRV_NAME ": %s deleted.\n", bond->dev->name); + } + } + /* Always return either count or an error. If you return 0, you'll + * get called forever, which is bad. + */ + up_write(&(bonding_rwsem)); + return count; +} +/* class attribute for bond_masters file. This ends up in /sys/class/net */ +static CLASS_ATTR(bonding_masters, S_IWUSR | S_IRUGO, + bonding_show_bonds, bonding_store_bonds); + +int bond_create_slave_symlinks(struct net_device *master, struct net_device *slave) +{ + char linkname[IFNAMSIZ+7]; + int ret = 0; + + /* first, create a link from the slave back to the master */ + ret = sysfs_create_link(&(slave->class_dev.kobj), &(master->class_dev.kobj), + "master"); + if (ret) + return ret; + /* next, create a link from the master to the slave */ + sprintf(linkname,"slave_%s",slave->name); + ret = sysfs_create_link(&(master->class_dev.kobj), &(slave->class_dev.kobj), + linkname); + return ret; + +} + +void bond_destroy_slave_symlinks(struct net_device *master, struct net_device *slave) +{ + char linkname[IFNAMSIZ+7]; + + sysfs_remove_link(&(slave->class_dev.kobj), "master"); + sprintf(linkname,"slave_%s",slave->name); + sysfs_remove_link(&(master->class_dev.kobj), linkname); +} + + +/* + * Show the slaves in the current bond. 
+ */ +static ssize_t bonding_show_slaves(struct class_device *cd, char *buf) +{ + struct slave *slave; + int i, res = 0; + struct bonding *bond = to_bond(cd); + + down_read(&(bonding_rwsem)); + + read_lock_bh(&bond->lock); + bond_for_each_slave(bond, slave, i) { + res += sprintf(buf + res, "%s ", slave->dev->name); + if (res > (PAGE_SIZE - IFNAMSIZ)) { + dprintk("eek! too many slaves!\n"); + break; + } + } + read_unlock_bh(&bond->lock); + res += sprintf(buf + res, "\n"); + res++; + up_read(&(bonding_rwsem)); + return res; +} + +/* + * Set the slaves in the current bond. The bond interface must be + * up for this to succeed. + * This function is largely the same flow as bonding_update_bonds(). + */ +static ssize_t bonding_store_slaves(struct class_device *cd, const char *buffer, size_t count) +{ + char name[IFNAMSIZ]; + int i, j, res, found, pos = 0, ret = count; + struct slave *slave; + struct net_device *dev = 0; + struct bonding *bond = to_bond(cd); + + if (!bond_check_abi_ver()) + { + printk(KERN_ERR DRV_NAME + ": Error: your version of ifenslave is incompatible " + "with the sysfs interface in this version of bonding. " + "Upgrade ifenslave to version 1 or greater and reload " + "bonding.\n"); + return -EINVAL; + } + + down_write(&(bonding_rwsem)); + + /* Quick sanity check -- is the bond interface up? */ + if (!(bond->dev->flags & IFF_UP)) { + printk(KERN_ERR DRV_NAME + ": %s: Unable to update slaves because interface is down.\n", + bond->dev->name); + ret = -EPERM; + goto out; + } + + /* Note: We can't hold bond->lock here, as bond_create grabs it. */ + + /* First process adds */ + + /* Copy the first name we find. */ + eat_nonalnum(buffer, pos, count); + i = pos; + find_next_nonalpha(buffer, i, count); + if (i - pos >= IFNAMSIZ) { + printk(KERN_ERR DRV_NAME ": %s: Slave name %.*s too large! 
Ignoring.\n", + bond->dev->name, + i - pos, buffer + pos); + ret = -EPERM; + goto out; + } + strncpy(name, buffer + pos, i - pos); + name[i - pos] = 0; + + while (strlen(name)) { + /* Got a slave name in name. Is it already there? */ + found = 0; + read_lock_bh(&bond->lock); + bond_for_each_slave(bond, slave, j) { + if (strnicmp(slave->dev->name, name, IFNAMSIZ) == 0) { + /* Temporarily set a meaningless flag. When + * we get done with the loop, we'll check all of these. + * If the slave doesn't have this flag set, then we need + * to remove the slave. If the flag has it set, then + * we can just clear the flag. + */ + slave->original_flags |= IFF_DYNAMIC; + found = 1; + break; /* Found it, so go to next name */ + } + } + read_unlock_bh(&bond->lock); + if (found == 0) { + printk(KERN_INFO DRV_NAME ": %s: Adding slave %s.\n", + bond->dev->name, name); + dev = dev_get_by_name(name); + if (!dev) { + printk(KERN_INFO DRV_NAME + ": %s: Interface %s does not exist!\n", + bond->dev->name, name); + ret = -EPERM; + goto out; + } + else { + dev_put(dev); + } + + + if (dev->flags & IFF_SLAVE) { + printk(KERN_INFO DRV_NAME + ": %s: Interface %s is already enslaved!\n", + bond->dev->name, name); + ret = -EPERM; + goto out; + } + + /* If this is the first slave, then we need to set + the master's hardware address to be the same as the + slave's. */ + if (!(*((u32 *) & (bond->dev->dev_addr[0])))) { + memcpy(bond->dev->dev_addr, dev->dev_addr, + dev->addr_len); + } + + /* Set the slave's MTU to match the bond */ + if (dev->mtu != bond->dev->mtu) { + if (dev->change_mtu) { + res = dev->change_mtu(dev, + bond->dev->mtu); + if (res) { + ret = res; + goto out; + } + } else { + dev->mtu = bond->dev->mtu; + } + } + rtnl_lock(); + res = bond_enslave(bond->dev, dev, &slave); + rtnl_unlock(); + if (res) { + ret = res; + goto out; + } + slave->original_flags |= IFF_DYNAMIC; + } + /* Get the next name in the buffer. 
*/ + pos = i; + eat_nonalnum(buffer, pos, count); + i = pos; + find_next_nonalpha(buffer, i, count); + if (i - pos >= IFNAMSIZ) { + printk(KERN_ERR DRV_NAME + ": %s: Slave name %.*s too large! Ignoring.\n", + bond->dev->name, i - pos, buffer + pos); + ret = -EPERM; + goto out; + } + strncpy(name, buffer + pos, i - pos); + name[i - pos] = 0; + } /* End of while loop, and end of input. */ + + /* Now handle deletes. + * We can't use bond_for_each_slave here because we might modify + * the list when we're inside the loop. We can't hold the lock + * either because bond_release grabs it. + */ + slave = bond->first_slave; + i = bond->slave_cnt; + while (i > 0) { + BUG_ON(slave == 0); + if (slave->original_flags & IFF_DYNAMIC) { + slave->original_flags &= ~IFF_DYNAMIC; + slave = slave->next; + } else { + /* Didn't find the name of this slave in the list, so + * remove the slave. + */ + printk(KERN_INFO DRV_NAME + ": %s: Removing slave %s\n", + bond->dev->name, + slave->dev->name); + dev = slave->dev; + slave = slave->next; + rtnl_lock(); + res = bond_release(bond->dev, dev); + rtnl_unlock(); + if (res) { + ret = res; + goto out; + } + /* set the slave MTU to the default */ + if (dev->change_mtu) { + dev->change_mtu(dev, 1500); + } else { + dev->mtu = 1500; + } + } + i--; + } + +out: + up_write(&(bonding_rwsem)); + return ret; +} +static CLASS_DEVICE_ATTR(slaves, S_IRUGO | S_IWUSR, bonding_show_slaves, bonding_store_slaves); + +/* + * Show and set the bonding mode. The bond interface must be down to + * change the mode. 
+ */ +static ssize_t bonding_show_mode(struct class_device *cd, char *buf) +{ + int count; + struct bonding *bond = to_bond(cd); + + down_read(&(bonding_rwsem)); + count = sprintf(buf, "%s %d\n", + bond_mode_tbl[bond->params.mode].modename, + bond->params.mode) + 1; + up_read(&(bonding_rwsem)); + return count; +} + +static ssize_t bonding_store_mode(struct class_device *cd, const char *buf, size_t count) +{ + int new_value, ret = count; + struct bonding *bond = to_bond(cd); + + down_write(&(bonding_rwsem)); + + if (bond->dev->flags & IFF_UP) { + printk(KERN_ERR DRV_NAME + ": Unable to update mode of %s because interface is up.\n", + bond->dev->name); + ret = -EPERM; + goto out; + } + + new_value = bond_parse_parm((char *)buf, bond_mode_tbl); + if (new_value < 0) { + printk(KERN_ERR DRV_NAME + ": %s: Ignoring invalid mode value %.*s.\n", + bond->dev->name, + (int)strlen(buf) - 1, buf); + ret = -EINVAL; + goto out; + } else { + bond->params.mode = new_value; + bond_set_mode_ops(bond, bond->params.mode); + printk(KERN_INFO DRV_NAME ": %s: setting mode to %s (%d).\n", + bond->dev->name, bond_mode_tbl[new_value].modename, new_value); + } +out: + up_write(&(bonding_rwsem)); + return ret; +} +static CLASS_DEVICE_ATTR(mode, S_IRUGO | S_IWUSR, bonding_show_mode, bonding_store_mode); + +/* + * Show and set the bonding transmit hash method. The bond interface must be down to + * change the xmit hash policy. 
+ */ +static ssize_t bonding_show_xmit_hash(struct class_device *cd, char *buf) +{ + int count; + struct bonding *bond = to_bond(cd); + + down_read(&(bonding_rwsem)); + if ((bond->params.mode != BOND_MODE_XOR) && + (bond->params.mode != BOND_MODE_8023AD)) { + /* Not Applicable */ + count = sprintf(buf, "NA\n") + 1; + } else { + count = sprintf(buf, "%s %d\n", + xmit_hashtype_tbl[bond->params.xmit_policy].modename, + bond->params.xmit_policy) + 1; + } + up_read(&(bonding_rwsem)); + + return count; +} + +static ssize_t bonding_store_xmit_hash(struct class_device *cd, const char *buf, size_t count) +{ + int new_value, ret = count; + struct bonding *bond = to_bond(cd); + + down_write(&(bonding_rwsem)); + + if (bond->dev->flags & IFF_UP) { + printk(KERN_ERR DRV_NAME + ": %s: Interface is up. Unable to update xmit policy.\n", + bond->dev->name); + ret = -EPERM; + goto out; + } + + if ((bond->params.mode != BOND_MODE_XOR) && + (bond->params.mode != BOND_MODE_8023AD)) { + printk(KERN_ERR DRV_NAME + ": %s: Transmit hash policy is irrelevant in this mode.\n", + bond->dev->name); + ret = -EPERM; + goto out; + } + + new_value = bond_parse_parm((char *)buf, xmit_hashtype_tbl); + if (new_value < 0) { + printk(KERN_ERR DRV_NAME + ": %s: Ignoring invalid xmit hash policy value %.*s.\n", + bond->dev->name, + (int)strlen(buf) - 1, buf); + ret = -EINVAL; + goto out; + } else { + bond->params.xmit_policy = new_value; + bond_set_mode_ops(bond, bond->params.mode); + printk(KERN_INFO DRV_NAME ": %s: setting xmit hash policy to %s (%d).\n", + bond->dev->name, xmit_hashtype_tbl[new_value].modename, new_value); + } +out: + up_write(&(bonding_rwsem)); + return ret; +} +static CLASS_DEVICE_ATTR(xmit_hash_policy, S_IRUGO | S_IWUSR, bonding_show_xmit_hash, bonding_store_xmit_hash); + +/* + * Show and set the arp timer interval. There are two tricky bits + * here. First, if ARP monitoring is activated, then we must disable + * MII monitoring. 
Second, if the ARP timer isn't running, we must + * start it. + */ +static ssize_t bonding_show_arp_interval(struct class_device *cd, char *buf) +{ + int count; + struct bonding *bond = to_bond(cd); + + down_read(&(bonding_rwsem)); + count = sprintf(buf, "%d\n", bond->params.arp_interval) + 1; + up_read(&(bonding_rwsem)); + return count; +} + +static ssize_t bonding_store_arp_interval(struct class_device *cd, const char *buf, size_t count) +{ + int new_value, ret = count; + struct bonding *bond = to_bond(cd); + + down_write(&(bonding_rwsem)); + + sscanf(buf, "%d", &new_value); + if (new_value < 0) { + printk(KERN_ERR DRV_NAME + ": %s: Invalid arp_interval value %d not in range %d-%d; rejected.\n", + bond->dev->name, new_value, 1, INT_MAX); + ret = -EINVAL; + goto out; + } else { + printk(KERN_INFO DRV_NAME + ": %s: Setting ARP monitoring interval to %d.\n", + bond->dev->name, new_value); + bond->params.arp_interval = new_value; + if (bond->params.miimon) { + printk(KERN_INFO DRV_NAME + ": %s: ARP monitoring cannot be used with MII monitoring. " + "%s Disabling MII monitoring.\n", + bond->dev->name, bond->dev->name); + bond->params.miimon = 0; + /* Kill MII timer, else it brings bond's link down */ + if (bond->arp_timer.function) { + printk(KERN_INFO DRV_NAME + ": %s: Kill MII timer, else it brings bond's link down...\n", + bond->dev->name); + del_timer_sync(&bond->mii_timer); + } + } + if (!bond->params.arp_targets[0]) { + printk(KERN_INFO DRV_NAME + ": %s: ARP monitoring has been set up, " + "but no ARP targets have been specified.\n", + bond->dev->name); + } + if (bond->dev->flags & IFF_UP) { + /* If the interface is up, we may need to fire off + * the ARP timer. If the interface is down, the + * timer will get fired off when the open function + * is called. + */ + if (bond->arp_timer.function) { + /* The timer's already set up, so fire it off */ + mod_timer(&bond->arp_timer, jiffies + 1); + } else { + /* Set up the timer. 
*/ + init_timer(&bond->arp_timer); + bond->arp_timer.expires = jiffies + 1; + bond->arp_timer.data = + (unsigned long) bond->dev; + if (bond->params.mode == BOND_MODE_ACTIVEBACKUP) { + bond->arp_timer.function = + (void *) + &bond_activebackup_arp_mon; + } else { + bond->arp_timer.function = + (void *) + &bond_loadbalance_arp_mon; + } + add_timer(&bond->arp_timer); + } + } + } + +out: + up_write(&(bonding_rwsem)); + return ret; +} +static CLASS_DEVICE_ATTR(arp_interval, S_IRUGO | S_IWUSR , bonding_show_arp_interval, bonding_store_arp_interval); + +/* + * Show and set the arp targets. + */ +static ssize_t bonding_show_arp_targets(struct class_device *cd, char *buf) +{ + int i, res = 0; + struct bonding *bond = to_bond(cd); + + down_read(&(bonding_rwsem)); + + for (i = 0; (i < BOND_MAX_ARP_TARGETS) && bond->params.arp_targets[i]; + i++) { + res += sprintf(buf + res, "%u.%u.%u.%u", + NIPQUAD(bond->params.arp_targets[i])); + if ((i+1 < BOND_MAX_ARP_TARGETS) && bond->params.arp_targets[i+1]) + res += sprintf(buf + res, " "); + else + res += sprintf(buf + res, "\n"); + res++; + } + + up_read(&(bonding_rwsem)); + return res; +} + +static ssize_t bonding_store_arp_targets(struct class_device *cd, const char *buf, size_t count) +{ + int octet1, octet2, octet3, octet4, ret = count; + int newpos = 0, pos = 0, i = 0; + struct bonding *bond = to_bond(cd); + const char *delimiter; + + down_write(&(bonding_rwsem)); + + memset(bond->params.arp_targets, 0, sizeof(bond->params.arp_targets)); + if (sscanf(buf, "%d.%d.%d.%d%n", &octet1, &octet2, &octet3, &octet4, &pos) != 4) { + printk(KERN_ERR DRV_NAME + ": %s: Invalid arp targets.\n", bond->dev->name); + ret = -EINVAL; + goto out; + } + + /* check for delimiting white space */ + delimiter = buf + pos; + if(*delimiter != ' ' && *delimiter != '\n' ) { + printk(KERN_ERR DRV_NAME + ": %s: Invalid arp targets with missing whitespace.\n", bond->dev->name); + ret = -EINVAL; + goto out; + } + + while ((pos < count) && (i < 
BOND_MAX_ARP_TARGETS)) { + if ( (octet1 < 0) || (octet1 > 255) + || (octet2 < 0) || (octet2 > 255) + || (octet3 < 0) || (octet3 > 255) + || (octet4 < 0) || (octet4 > 255) + || ((octet1 == 0) && (octet2 == 0) && + (octet3 == 0) && (octet4 == 0)) + || ((octet1 == 255) && (octet2 == 255) && + (octet3 == 255) && (octet4 == 255)) + ) { + memset(bond->params.arp_targets, 0, sizeof(bond->params.arp_targets)); + printk(KERN_ERR DRV_NAME + ": %s: Unable to add arp target %d.%d.%d.%d\n", + bond->dev->name, octet1, octet2, octet3, octet4); + ret = -EINVAL; + goto out; + } else { + printk(KERN_INFO DRV_NAME + ": %s: Adding arp target %d.%d.%d.%d\n", + bond->dev->name, octet1, octet2, octet3, octet4); + bond->params.arp_targets[i] = htonl + ((((u32) octet1 & 0xff) << 24) | + (((u32) octet2 & 0xff) << 16) | + (((u32) octet3 & 0xff) << 8) | ((u32) octet4 & + 0xff)); + + i++; + } + + newpos = 0; + if (sscanf(buf + pos, "%d.%d.%d.%d%n", &octet1, &octet2, &octet3, + &octet4, &newpos) != 4 && newpos != 0) { + memset(bond->params.arp_targets, 0, sizeof(bond->params.arp_targets)); + printk(KERN_ERR DRV_NAME + ": %s: Invalid arp targets specified.\n", bond->dev->name); + ret = -EINVAL; + goto out; + } + if (newpos == 0) { + break; + } + + pos += newpos; + /* check for delimiting white space */ + delimiter = buf + pos; + if(*delimiter != ' ' && *delimiter != '\n' ) { + memset(bond->params.arp_targets, 0, sizeof(bond->params.arp_targets)); + printk(KERN_ERR DRV_NAME + ": %s: Invalid arp targets with missing whitespace.\n", bond->dev->name); + ret = -EINVAL; + goto out; + } + } +out: + up_write(&(bonding_rwsem)); + return ret; +} +static CLASS_DEVICE_ATTR(arp_ip_target, S_IRUGO | S_IWUSR , bonding_show_arp_targets, bonding_store_arp_targets); + +/* + * Show and set the up and down delays. These must be multiples of the + * MII monitoring value, and are stored internally as the multiplier. + * Thus, we must translate to MS for the real world. 
+ */ +static ssize_t bonding_show_downdelay(struct class_device *cd, char *buf) +{ + int count; + struct bonding *bond = to_bond(cd); + + down_read(&(bonding_rwsem)); + count = sprintf(buf, "%d\n", bond->params.downdelay * bond->params.miimon) + 1; + up_read(&(bonding_rwsem)); + return count; +} + +static ssize_t bonding_store_downdelay(struct class_device *cd, const char *buf, size_t count) +{ + int new_value, ret = count; + struct bonding *bond = to_bond(cd); + + down_write(&(bonding_rwsem)); + + if (!(bond->params.miimon)) { + printk(KERN_ERR DRV_NAME + ": %s: Unable to set down delay as MII monitoring is disabled\n", + bond->dev->name); + ret = -EPERM; + goto out; + } + + sscanf(buf, "%d", &new_value); + if (new_value < 0) { + printk(KERN_ERR DRV_NAME + ": %s: Invalid down delay value %d not in range %d-%d; rejected.\n", + bond->dev->name, new_value, 1, INT_MAX); + ret = -EINVAL; + goto out; + } else { + if ((new_value % bond->params.miimon) != 0) { + printk(KERN_WARNING DRV_NAME + ": %s: Warning: down delay (%d) is not a multiple " + "of miimon (%d), delay rounded to %d ms\n", + bond->dev->name, new_value, bond->params.miimon, + (new_value / bond->params.miimon) * + bond->params.miimon); + } + bond->params.downdelay = new_value / bond->params.miimon; + printk(KERN_INFO DRV_NAME ": %s: Setting down delay to %d.\n", + bond->dev->name, bond->params.downdelay * bond->params.miimon); + + } + +out: + up_write(&(bonding_rwsem)); + return ret; +} +static CLASS_DEVICE_ATTR(down_delay, S_IRUGO | S_IWUSR , bonding_show_downdelay, bonding_store_downdelay); + +static ssize_t bonding_show_updelay(struct class_device *cd, char *buf) +{ + int count; + struct bonding *bond = to_bond(cd); + + down_read(&(bonding_rwsem)); + count = sprintf(buf, "%d\n", bond->params.updelay * bond->params.miimon) + 1; + up_read(&(bonding_rwsem)); + return count; + +} + +static ssize_t bonding_store_updelay(struct class_device *cd, const char *buf, size_t count) +{ + int new_value, ret = count; + 
struct bonding *bond = to_bond(cd); + + down_write(&(bonding_rwsem)); + + if (!(bond->params.miimon)) { + printk(KERN_ERR DRV_NAME + ": %s: Unable to set up delay as MII monitoring is disabled\n", + bond->dev->name); + ret = -EPERM; + goto out; + } + + sscanf(buf, "%d", &new_value); + if (new_value < 0) { + printk(KERN_ERR DRV_NAME + ": %s: Invalid up delay value %d not in range %d-%d; rejected.\n", + bond->dev->name, new_value, 1, INT_MAX); + ret = -EINVAL; + goto out; + } else { + if ((new_value % bond->params.miimon) != 0) { + printk(KERN_WARNING DRV_NAME + ": %s: Warning: up delay (%d) is not a multiple " + "of miimon (%d), updelay rounded to %d ms\n", + bond->dev->name, new_value, bond->params.miimon, + (new_value / bond->params.miimon) * + bond->params.miimon); + } + bond->params.updelay = new_value / bond->params.miimon; + printk(KERN_INFO DRV_NAME ": %s: Setting up delay to %d.\n", + bond->dev->name, bond->params.updelay * bond->params.miimon); + + } + +out: + up_write(&(bonding_rwsem)); + return ret; +} +static CLASS_DEVICE_ATTR(up_delay, S_IRUGO | S_IWUSR , bonding_show_updelay, bonding_store_updelay); + +/* + * Show and set the LACP interval. Interface must be down, and the mode + * must be set to 802.3ad mode. 
+ */ +static ssize_t bonding_show_lacp(struct class_device *cd, char *buf) +{ + int count; + struct bonding *bond = to_bond(cd); + + down_read(&(bonding_rwsem)); + count = sprintf(buf, "%s %d\n", + bond_lacp_tbl[bond->params.lacp_fast].modename, + bond->params.lacp_fast) + 1; + up_read(&(bonding_rwsem)); + return count; +} + +static ssize_t bonding_store_lacp(struct class_device *cd, const char *buf, size_t count) +{ + int new_value, ret = count; + struct bonding *bond = to_bond(cd); + + down_write(&(bonding_rwsem)); + + if (bond->dev->flags & IFF_UP) { + printk(KERN_ERR DRV_NAME + ": %s: Unable to update LACP rate because interface is up.\n", + bond->dev->name); + ret = -EPERM; + goto out; + } + + if (bond->params.mode != BOND_MODE_8023AD) { + printk(KERN_ERR DRV_NAME + ": %s: Unable to update LACP rate because bond is not in 802.3ad mode.\n", + bond->dev->name); + ret = -EPERM; + goto out; + } + + new_value = bond_parse_parm((char *)buf, bond_lacp_tbl); + + if ((new_value == 1) || (new_value == 0)) { + bond->params.lacp_fast = new_value; + printk(KERN_INFO DRV_NAME + ": %s: Setting LACP rate to %s (%d).\n", + bond->dev->name, bond_lacp_tbl[new_value].modename, new_value); + } else { + printk(KERN_ERR DRV_NAME + ": %s: Ignoring invalid LACP rate value %.*s.\n", + bond->dev->name, (int)strlen(buf) - 1, buf); + ret = -EINVAL; + } +out: + up_write(&(bonding_rwsem)); + return ret; +} +static CLASS_DEVICE_ATTR(lacp_rate, S_IRUGO | S_IWUSR, bonding_show_lacp, bonding_store_lacp); + +/* + * Show and set the MII monitor interval. There are two tricky bits + * here. First, if MII monitoring is activated, then we must disable + * ARP monitoring. Second, if the timer isn't running, we must + * start it. 
+ */ +static ssize_t bonding_show_miimon(struct class_device *cd, char *buf) +{ + int count; + struct bonding *bond = to_bond(cd); + + down_read(&(bonding_rwsem)); + count = sprintf(buf, "%d\n", bond->params.miimon) + 1; + up_read(&(bonding_rwsem)); + return count; +} + +static ssize_t bonding_store_miimon(struct class_device *cd, const char *buf, size_t count) +{ + int new_value, ret = count; + struct bonding *bond = to_bond(cd); + + down_read(&(bonding_rwsem)); + + sscanf(buf, "%d", &new_value); + if (new_value < 0) { + printk(KERN_ERR DRV_NAME + ": %s: Invalid miimon value %d not in range %d-%d; rejected.\n", + bond->dev->name, new_value, 1, INT_MAX); + ret = -EINVAL; + goto out; + } else { + printk(KERN_INFO DRV_NAME + ": %s: Setting MII monitoring interval to %d.\n", + bond->dev->name, new_value); + bond->params.miimon = new_value; + if(bond->params.updelay) + printk(KERN_INFO DRV_NAME + ": %s: Note: Updating updelay (to %d) " + "since it is a multiple of the miimon value.\n", + bond->dev->name, + bond->params.updelay * bond->params.miimon); + if(bond->params.downdelay) + printk(KERN_INFO DRV_NAME + ": %s: Note: Updating downdelay (to %d) " + "since it is a multiple of the miimon value.\n", + bond->dev->name, + bond->params.downdelay * bond->params.miimon); + if (bond->params.arp_interval) { + printk(KERN_INFO DRV_NAME + ": %s: MII monitoring cannot be used with " + "ARP monitoring. Disabling ARP monitoring...\n", + bond->dev->name); + bond->params.arp_interval = 0; + /* Kill ARP timer, else it brings bond's link down */ + if (bond->mii_timer.function) { + printk(KERN_INFO DRV_NAME + ": %s: Kill ARP timer, else it brings bond's link down...\n", + bond->dev->name); + del_timer_sync(&bond->arp_timer); + } + } + + if (bond->dev->flags & IFF_UP) { + /* If the interface is up, we may need to fire off + * the MII timer. If the interface is down, the + * timer will get fired off when the open function + * is called. 
+ */ + if (bond->mii_timer.function) { + /* The timer's already set up, so fire it off */ + mod_timer(&bond->mii_timer, jiffies + 1); + } else { + /* Set up the timer. */ + init_timer(&bond->mii_timer); + bond->mii_timer.expires = jiffies + 1; + bond->mii_timer.data = + (unsigned long) bond->dev; + bond->mii_timer.function = + (void *) &bond_mii_monitor; + add_timer(&bond->mii_timer); + } + } + } +out: + up_read(&(bonding_rwsem)); + return ret; +} +static CLASS_DEVICE_ATTR(miimon, S_IRUGO | S_IWUSR, bonding_show_miimon, bonding_store_miimon); + +/* + * Show and set the primary slave. The store function is much + * simpler than the bonding_store_slaves function because it only needs to + * handle one interface name. + * The bond must be in a mode that supports a primary for this to be + * set. + */ +static ssize_t bonding_show_primary(struct class_device *cd, char *buf) +{ + int count = 0; + struct bonding *bond = to_bond(cd); + + down_read(&(bonding_rwsem)); + if (bond->primary_slave) + count = sprintf(buf, "%s\n", bond->primary_slave->dev->name) + 1; + + up_read(&(bonding_rwsem)); + return count; +} + +static ssize_t bonding_store_primary(struct class_device *cd, const char *buf, size_t count) +{ + int i; + struct slave *slave; + struct bonding *bond = to_bond(cd); + + down_write(&(bonding_rwsem)); + + write_lock_bh(&bond->lock); + if (!USES_PRIMARY(bond->params.mode)) { + printk(KERN_INFO DRV_NAME + ": %s: Unable to set primary slave; %s is in mode %d\n", + bond->dev->name, bond->dev->name, bond->params.mode); + } else { + bond_for_each_slave(bond, slave, i) { + if (strnicmp + (slave->dev->name, buf, + strlen(slave->dev->name)) == 0) { + printk(KERN_INFO DRV_NAME + ": %s: Setting %s as primary slave.\n", + bond->dev->name, slave->dev->name); + bond->primary_slave = slave; + bond_select_active_slave(bond); + goto out; + } + } + + /* if we got here, then we didn't match the name of any slave */ + + if (strlen(buf) == 0 || buf[0] == '\n') { + printk(KERN_INFO DRV_NAME + ": 
%s: Setting primary slave to None.\n", + bond->dev->name); + bond->primary_slave = 0; + bond_select_active_slave(bond); + } else { + printk(KERN_INFO DRV_NAME + ": %s: Unable to set %.*s as primary slave as it is not a slave.\n", + bond->dev->name, (int)strlen(buf) - 1, buf); + } + } +out: + write_unlock_bh(&bond->lock); + up_write(&(bonding_rwsem)); + return count; +} +static CLASS_DEVICE_ATTR(primary, S_IRUGO | S_IWUSR, bonding_show_primary, bonding_store_primary); + +/* + * Show and set the use_carrier flag. + */ +static ssize_t bonding_show_carrier(struct class_device *cd, char *buf) +{ + int count; + struct bonding *bond = to_bond(cd); + + down_read(&(bonding_rwsem)); + count = sprintf(buf, "%d\n", bond->params.use_carrier) + 1; + up_read(&(bonding_rwsem)); + return count; +} + +static ssize_t bonding_store_carrier(struct class_device *cd, const char *buf, size_t count) +{ + int new_value; + struct bonding *bond = to_bond(cd); + + down_write(&(bonding_rwsem)); + + sscanf(buf, "%d", &new_value); + if ((new_value == 0) || (new_value == 1)) { + bond->params.use_carrier = new_value; + printk(KERN_INFO DRV_NAME ": %s: Setting use_carrier to %d.\n", + bond->dev->name, new_value); + } else { + printk(KERN_INFO DRV_NAME + ": %s: Ignoring invalid use_carrier value %d.\n", + bond->dev->name, new_value); + } + up_write(&(bonding_rwsem)); + return count; +} +static CLASS_DEVICE_ATTR(use_carrier, S_IRUGO | S_IWUSR, bonding_show_carrier, bonding_store_carrier); + + +/* + * Show and set currently active_slave. + */ +static ssize_t bonding_show_active_slave(struct class_device *cd, char *buf) +{ + struct slave *curr; + struct bonding *bond = to_bond(cd); + int count; + + down_read(&(bonding_rwsem)); + + read_lock(&bond->curr_slave_lock); + curr = bond->curr_active_slave; + read_unlock(&bond->curr_slave_lock); + + if (USES_PRIMARY(bond->params.mode)) { + count = sprintf(buf, "%s\n", (curr) ? 
curr->dev->name : "None") + 1; + } + else { + count = sprintf(buf, "%s\n", "None") + 1; + } + up_read(&(bonding_rwsem)); + return count; +} + +static ssize_t bonding_store_active_slave(struct class_device *cd, const char *buf, size_t count) +{ + int i; + struct slave *slave; + struct slave *old_active = NULL; + struct slave *new_active = NULL; + struct bonding *bond = to_bond(cd); + + down_write(&(bonding_rwsem)); + + write_lock_bh(&bond->lock); + if (!USES_PRIMARY(bond->params.mode)) { + printk(KERN_INFO DRV_NAME + ": %s: Unable to change active slave; %s is in mode %d\n", + bond->dev->name, bond->dev->name, bond->params.mode); + } else { + bond_for_each_slave(bond, slave, i) { + if (strnicmp + (slave->dev->name, buf, + strlen(slave->dev->name)) == 0) { + old_active = bond->curr_active_slave; + new_active = slave; + if (new_active && (new_active == old_active)) { + /* do nothing */ + printk(KERN_INFO DRV_NAME + ": %s: %s is already the current active slave.\n", + bond->dev->name, slave->dev->name); + goto out; + } + else { + if ((new_active) && + (old_active) && + (new_active->link == BOND_LINK_UP) && + IS_UP(new_active->dev)) { + printk(KERN_INFO DRV_NAME + ": %s: Setting %s as active slave.\n", + bond->dev->name, slave->dev->name); + bond_change_active_slave(bond, new_active); + } + else { + printk(KERN_INFO DRV_NAME + ": %s: Could not set %s as active slave; " + "either %s is down or the link is down.\n", + bond->dev->name, slave->dev->name, + slave->dev->name); + } + goto out; + } + } + } + + /* if we got here, then we didn't match the name of any slave */ + + if (strlen(buf) == 0 || buf[0] == '\n') { + printk(KERN_INFO DRV_NAME + ": %s: Setting active slave to None.\n", + bond->dev->name); + bond->primary_slave = 0; + bond_select_active_slave(bond); + } else { + printk(KERN_INFO DRV_NAME + ": %s: Unable to set %.*s as active slave as it is not a slave.\n", + bond->dev->name, (int)strlen(buf) - 1, buf); + } + } +out: + write_unlock_bh(&bond->lock); + 
up_write(&(bonding_rwsem)); + return count; + +} +static CLASS_DEVICE_ATTR(active_slave, S_IRUGO | S_IWUSR, bonding_show_active_slave, bonding_store_active_slave); + + +/* + * Show link status of the bond interface. + */ +static ssize_t bonding_show_mii_status(struct class_device *cd, char *buf) +{ + int count; + struct slave *curr; + struct bonding *bond = to_bond(cd); + + down_read(&(bonding_rwsem)); + + read_lock(&bond->curr_slave_lock); + curr = bond->curr_active_slave; + read_unlock(&bond->curr_slave_lock); + + count = sprintf(buf, "%s\n", (curr) ? "up" : "down") + 1; + up_read(&(bonding_rwsem)); + return count; +} +static CLASS_DEVICE_ATTR(mii_status, S_IRUGO, bonding_show_mii_status, NULL); + + +/* + * Show current 802.3ad aggregator ID. + */ +static ssize_t bonding_show_ad_aggregator(struct class_device *cd, char *buf) +{ + int count = 0; + struct bonding *bond = to_bond(cd); + + down_read(&(bonding_rwsem)); + if (bond->params.mode == BOND_MODE_8023AD) { + struct ad_info ad_info; + count = sprintf(buf, "%d\n", (bond_3ad_get_active_agg_info(bond, &ad_info)) ? 0 : ad_info.aggregator_id) + 1; + } + up_read(&(bonding_rwsem)); + return count; +} +static CLASS_DEVICE_ATTR(ad_aggregator, S_IRUGO, bonding_show_ad_aggregator, NULL); + + +/* + * Show number of active 802.3ad ports. + */ +static ssize_t bonding_show_ad_num_ports(struct class_device *cd, char *buf) +{ + int count = 0; + struct bonding *bond = to_bond(cd); + + down_read(&(bonding_rwsem)); + if (bond->params.mode == BOND_MODE_8023AD) { + struct ad_info ad_info; + count = sprintf(buf, "%d\n", (bond_3ad_get_active_agg_info(bond, &ad_info)) ? 0: ad_info.ports) + 1; + } + up_read(&(bonding_rwsem)); + return count; +} +static CLASS_DEVICE_ATTR(ad_num_ports, S_IRUGO, bonding_show_ad_num_ports, NULL); + + +/* + * Show current 802.3ad actor key. 
+ */ +static ssize_t bonding_show_ad_actor_key(struct class_device *cd, char *buf) +{ + int count = 0; + struct bonding *bond = to_bond(cd); + + down_read(&(bonding_rwsem)); + if (bond->params.mode == BOND_MODE_8023AD) { + struct ad_info ad_info; + count = sprintf(buf, "%d\n", (bond_3ad_get_active_agg_info(bond, &ad_info)) ? 0 : ad_info.actor_key) + 1; + } + up_read(&(bonding_rwsem)); + return count; +} +static CLASS_DEVICE_ATTR(ad_actor_key, S_IRUGO, bonding_show_ad_actor_key, NULL); + + +/* + * Show current 802.3ad partner key. + */ +static ssize_t bonding_show_ad_partner_key(struct class_device *cd, char *buf) +{ + int count = 0; + struct bonding *bond = to_bond(cd); + + down_read(&(bonding_rwsem)); + if (bond->params.mode == BOND_MODE_8023AD) { + struct ad_info ad_info; + count = sprintf(buf, "%d\n", (bond_3ad_get_active_agg_info(bond, &ad_info)) ? 0 : ad_info.partner_key) + 1; + } + up_read(&(bonding_rwsem)); + return count; +} +static CLASS_DEVICE_ATTR(ad_partner_key, S_IRUGO, bonding_show_ad_partner_key, NULL); + + +/* + * Show current 802.3ad partner mac. 
+ */ +static ssize_t bonding_show_ad_partner_mac(struct class_device *cd, char *buf) +{ + int count = 0; + struct bonding *bond = to_bond(cd); + + down_read(&(bonding_rwsem)); + if (bond->params.mode == BOND_MODE_8023AD) { + struct ad_info ad_info; + if (!bond_3ad_get_active_agg_info(bond, &ad_info)) { + count = sprintf(buf,"%02x:%02x:%02x:%02x:%02x:%02x\n", + ad_info.partner_system[0], + ad_info.partner_system[1], + ad_info.partner_system[2], + ad_info.partner_system[3], + ad_info.partner_system[4], + ad_info.partner_system[5]) + 1; + } + } + up_read(&(bonding_rwsem)); + return count; +} +static CLASS_DEVICE_ATTR(ad_partner_mac, S_IRUGO, bonding_show_ad_partner_mac, NULL); + + + +static struct attribute *per_bond_attrs[] = { + &class_device_attr_slaves.attr, + &class_device_attr_mode.attr, + &class_device_attr_arp_interval.attr, + &class_device_attr_arp_ip_target.attr, + &class_device_attr_down_delay.attr, + &class_device_attr_up_delay.attr, + &class_device_attr_lacp_rate.attr, + &class_device_attr_xmit_hash_policy.attr, + &class_device_attr_miimon.attr, + &class_device_attr_primary.attr, + &class_device_attr_use_carrier.attr, + &class_device_attr_active_slave.attr, + &class_device_attr_mii_status.attr, + &class_device_attr_ad_aggregator.attr, + &class_device_attr_ad_num_ports.attr, + &class_device_attr_ad_actor_key.attr, + &class_device_attr_ad_partner_key.attr, + &class_device_attr_ad_partner_mac.attr, + NULL, +}; + +static struct attribute_group bonding_group = { + .name = "bonding", + .attrs = per_bond_attrs, +}; + +/* + * Initialize sysfs. This sets up the bonding_masters file in + * /sys/class/net. 
+ */ +int bond_create_sysfs(void) +{ + int ret = 0; + struct bonding *firstbond; + + init_rwsem(&bonding_rwsem); + + /* get the netdev class pointer */ + firstbond = container_of(bond_dev_list.next, struct bonding, bond_list); + if (!firstbond) + { + return -ENODEV; + + } + netdev_class = firstbond->dev->class_dev.class; + if (!netdev_class) + { + return -ENODEV; + } + ret = class_create_file(netdev_class, &class_attr_bonding_masters); + + return ret; + +} + +/* + * Remove /sys/class/net/bonding_masters. + */ +void bond_destroy_sysfs(void) +{ + if (netdev_class) + class_remove_file(netdev_class, &class_attr_bonding_masters); +} + +/* + * Initialize sysfs for each bond. This sets up and registers + * the 'bondctl' directory for each individual bond under /sys/class/net. + */ +int bond_create_sysfs_entry(struct bonding *bond) +{ + struct net_device *dev = bond->dev; + int err; + + err = sysfs_create_group(&(dev->class_dev.kobj), &bonding_group); + if (err) { + printk(KERN_EMERG "eek! didn't create group!\n"); + } + + return err; +} +/* + * Remove sysfs entries for each bond. 
+ */ +void bond_destroy_sysfs_entry(struct bonding *bond) +{ + struct net_device *dev = bond->dev; + + sysfs_remove_group(&(dev->class_dev.kobj), &bonding_group); +} + diff -urN -X dontdiff linux-2.6.12post/drivers/net/bonding/Makefile linux-2.6.12post-sysfs/drivers/net/bonding/Makefile --- linux-2.6.12post/drivers/net/bonding/Makefile 2005-06-17 12:48:29.000000000 -0700 +++ linux-2.6.12post-sysfs/drivers/net/bonding/Makefile 2005-06-28 18:22:23.000000000 -0700 @@ -4,5 +4,5 @@ obj-$(CONFIG_BONDING) += bonding.o -bonding-objs := bond_main.o bond_3ad.o bond_alb.o +bonding-objs := bond_main.o bond_3ad.o bond_alb.o bond_sysfs.o From radheka.godse@intel.com Fri Jul 1 13:52:16 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Jul 2005 13:52:18 -0700 (PDT) Received: from orsfmr002.jf.intel.com (fmr17.intel.com [134.134.136.16]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j61KqFH9009088 for ; Fri, 1 Jul 2005 13:52:15 -0700 Received: from orsfmr100.jf.intel.com (orsfmr100.jf.intel.com [10.7.209.16]) by orsfmr002.jf.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1 2004/09/17 17:50:56 root Exp $) with ESMTP id j61Kodq8022675; Fri, 1 Jul 2005 20:50:39 GMT Received: from nwlxmail01.jf.intel.com (nwlxmail01.jf.intel.com [10.7.171.40]) by orsfmr100.jf.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2 2004/09/17 18:05:01 root Exp $) with ESMTP id j61KodPb002162; Fri, 1 Jul 2005 20:50:39 GMT Received: from [134.134.3.92] ([134.134.3.92]) by nwlxmail01.jf.intel.com (8.12.10/8.12.9/MailSET/Hub) with ESMTP id j61KodSL003725; Fri, 1 Jul 2005 13:50:39 -0700 Date: Fri, 1 Jul 2005 13:49:52 -0700 (PDT) From: Radheka Godse X-X-Sender: radheka@localhost.localdomain To: fubar@us.ibm.com, bonding-devel@lists.sourceforge.net cc: netdev@oss.sgi.com Subject: [PATCH 2.6.13-rc1 9/17] bonding: get primary name from slave dev Message-ID: ReplyTo: "Radheka Godse" MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Scanned-By: MIMEDefang 2.44 
X-archive-position: 2600 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: radheka.godse@intel.com Precedence: bulk X-list: netdev This patch fixes a bug in the proc file handler in bonding. The name of the primary slave was taken from the command-line options, instead of from the dev structure of the primary slave itself. This caused an incorrect display if the primary is set or changed via sysfs. Signed-off-by: Radheka Godse Signed-off-by: Mitch Williams diff -urN -X dontdiff linux-2.6.12post/drivers/net/bonding/bond_main.c linux-2.6.12post-sysfs/drivers/net/bonding/bond_main.c --- linux-2.6.12post/drivers/net/bonding/bond_main.c 2005-06-28 18:18:03.000000000 -0700 +++ linux-2.6.12post-sysfs/drivers/net/bonding/bond_main.c 2005-06-30 13:53:55.000000000 -0700 @@ -3369,8 +3369,8 @@ if (USES_PRIMARY(bond->params.mode)) { seq_printf(seq, "Primary Slave: %s\n", - (bond->params.primary[0]) ? - bond->params.primary : "None"); + (bond->primary_slave) ? + bond->primary_slave->dev->name : "None"); seq_printf(seq, "Currently Active Slave: %s\n", (curr) ?
curr->dev->name : "None"); From radheka.godse@intel.com Fri Jul 1 13:53:16 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Jul 2005 13:53:19 -0700 (PDT) Received: from orsfmr004.jf.intel.com (fmr19.intel.com [134.134.136.18]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j61KrFH9009713 for ; Fri, 1 Jul 2005 13:53:16 -0700 Received: from orsfmr101.jf.intel.com (orsfmr101.jf.intel.com [10.7.209.17]) by orsfmr004.jf.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1 2004/09/17 17:50:56 root Exp $) with ESMTP id j61Kpdkn020138; Fri, 1 Jul 2005 20:51:39 GMT Received: from nwlxmail01.jf.intel.com (nwlxmail01.jf.intel.com [10.7.171.40]) by orsfmr101.jf.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2 2004/09/17 18:05:01 root Exp $) with ESMTP id j61Kpd7G001196; Fri, 1 Jul 2005 20:51:39 GMT Received: from [134.134.3.92] ([134.134.3.92]) by nwlxmail01.jf.intel.com (8.12.10/8.12.9/MailSET/Hub) with ESMTP id j61KpdSL003806; Fri, 1 Jul 2005 13:51:39 -0700 Date: Fri, 1 Jul 2005 13:50:52 -0700 (PDT) From: Radheka Godse X-X-Sender: radheka@localhost.localdomain To: fubar@us.ibm.com, bonding-devel@lists.sourceforge.net cc: netdev@oss.sgi.com Subject: [PATCH 2.6.13-rc1 10/17] bonding: added missing mode descriptions to modinfo Message-ID: ReplyTo: "Radheka Godse" MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Scanned-By: MIMEDefang 2.44 X-archive-position: 2601 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: radheka.godse@intel.com Precedence: bulk X-list: netdev This patch updates the modinfo description for mode module param to include all available modes as described in bonding.txt. 
Signed-off-by: Radheka Godse Signed-off-by: Mitch Williams diff -urN -X dontdiff linux-2.6.12post/drivers/net/bonding/bond_main.c linux-2.6.12post-sysfs/drivers/net/bonding/bond_main.c --- linux-2.6.12post/drivers/net/bonding/bond_main.c 2005-06-28 18:18:03.000000000 -0700 +++ linux-2.6.12post-sysfs/drivers/net/bonding/bond_main.c 2005-06-30 13:53:55.000000000 -0700 @@ -564,17 +564,24 @@ module_param(updelay, int, 0); MODULE_PARM_DESC(updelay, "Delay before considering link up, in milliseconds"); module_param(downdelay, int, 0); -MODULE_PARM_DESC(downdelay, "Delay before considering link down, in milliseconds"); +MODULE_PARM_DESC(downdelay, "Delay before considering link down, " + "in milliseconds"); module_param(use_carrier, int, 0); -MODULE_PARM_DESC(use_carrier, "Use netif_carrier_ok (vs MII ioctls) in miimon; 0 for off, 1 for on (default)"); +MODULE_PARM_DESC(use_carrier, "Use netif_carrier_ok (vs MII ioctls) in miimon; " + "0 for off, 1 for on (default)"); module_param(mode, charp, 0); -MODULE_PARM_DESC(mode, "Mode of operation : 0 for round robin, 1 for active-backup, 2 for xor"); +MODULE_PARM_DESC(mode, "Mode of operation : 0 for balance-rr, " + "1 for active-backup, 2 for balance-xor, " + "3 for broadcast, 4 for 802.3ad, 5 for balance-tlb, " + "6 for balance-alb"); module_param(primary, charp, 0); MODULE_PARM_DESC(primary, "Primary network device to use"); module_param(lacp_rate, charp, 0); -MODULE_PARM_DESC(lacp_rate, "LACPDU tx rate to request from 802.3ad partner (slow/fast)"); +MODULE_PARM_DESC(lacp_rate, "LACPDU tx rate to request from 802.3ad partner " + "(slow/fast)"); module_param(xmit_hash_policy, charp, 0); -MODULE_PARM_DESC(xmit_hash_policy, "XOR hashing method : 0 for layer 2 (default), 1 for layer 3+4"); +MODULE_PARM_DESC(xmit_hash_policy, "XOR hashing method: 0 for layer 2 (default)" + ", 1 for layer 3+4"); module_param(arp_interval, int, 0); MODULE_PARM_DESC(arp_interval, "arp interval in milliseconds"); module_param_array(arp_ip_target, 
charp, NULL, 0); From radheka.godse@intel.com Fri Jul 1 13:54:18 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Jul 2005 13:54:21 -0700 (PDT) Received: from orsfmr002.jf.intel.com (fmr17.intel.com [134.134.136.16]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j61KsHH9010291 for ; Fri, 1 Jul 2005 13:54:17 -0700 Received: from orsfmr101.jf.intel.com (orsfmr101.jf.intel.com [10.7.209.17]) by orsfmr002.jf.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1 2004/09/17 17:50:56 root Exp $) with ESMTP id j61Kqfq8023577; Fri, 1 Jul 2005 20:52:41 GMT Received: from nwlxmail01.jf.intel.com (nwlxmail01.jf.intel.com [10.7.171.40]) by orsfmr101.jf.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2 2004/09/17 18:05:01 root Exp $) with ESMTP id j61Kqf7G001818; Fri, 1 Jul 2005 20:52:41 GMT Received: from [134.134.3.92] ([134.134.3.92]) by nwlxmail01.jf.intel.com (8.12.10/8.12.9/MailSET/Hub) with ESMTP id j61KqfSL003880; Fri, 1 Jul 2005 13:52:41 -0700 Date: Fri, 1 Jul 2005 13:51:49 -0700 (PDT) From: Radheka Godse X-X-Sender: radheka@localhost.localdomain To: fubar@us.ibm.com, bonding-devel@lists.sourceforge.net cc: netdev@oss.sgi.com Subject: [PATCH 2.6.13-rc1 11/17] bonding: add driver name to log messages Message-ID: ReplyTo: "Radheka Godse" MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Scanned-By: MIMEDefang 2.44 X-archive-position: 2602 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: radheka.godse@intel.com Precedence: bulk X-list: netdev Trivial fix to include DRV_NAME in log messages printed to /var/log/messages. Signed-off-by: Radheka Godse Signed-off-by: Mitch Williams diff -urN -X dontdiff linux-2.6.12post/drivers/net/bonding/bond_3ad.c linux-2.6.12post-sysfs/drivers/net/bonding/bond_3ad.c --- linux-2.6.12post/drivers/net/bonding/bond_3ad.c 2005-06-28 18:18:03.000000000 -0700 +++ linux-2.6.12post-sysfs/drivers/net/bonding/bond_3ad.c 2005-06-28
18:21:34.000000000 -0700 @@ -1915,7 +1915,8 @@ struct aggregator *aggregator; if (bond == NULL) { - printk(KERN_ERR "The slave %s is not attached to its bond\n", slave->dev->name); + printk(KERN_ERR DRV_NAME ": %s: The slave %s is not attached to its bond\n", + slave->dev->master->name, slave->dev->name); return -1; } @@ -2085,7 +2086,8 @@ // clear the aggregator ad_clear_agg(temp_aggregator); if (select_new_active_agg) { - printk(KERN_INFO "Removing an active aggregator\n"); + printk(KERN_INFO DRV_NAME ": %s: Removing an active aggregator\n", + slave->dev->master->name); // select new active aggregator ad_agg_selection_logic(__get_first_agg(port)); } @@ -2230,8 +2232,9 @@ // if slave is null, the whole port is not initialized if (!port->slave) { - printk(KERN_WARNING DRV_NAME ": Warning: speed changed for uninitialized port on %s\n", - slave->dev->name); + printk(KERN_WARNING DRV_NAME ": Warning: %s: speed " + "changed for uninitialized port on %s\n", + slave->dev->master->name, slave->dev->name); return; } @@ -2257,8 +2260,9 @@ // if slave is null, the whole port is not initialized if (!port->slave) { - printk(KERN_WARNING DRV_NAME ": Warning: duplex changed for uninitialized port on %s\n", - slave->dev->name); + printk(KERN_WARNING DRV_NAME ": %s: Warning: duplex changed " + "for uninitialized port on %s\n", + slave->dev->master->name, slave->dev->name); return; } @@ -2363,7 +2367,8 @@ } if (bond_3ad_get_active_agg_info(bond, &ad_info)) { - printk(KERN_DEBUG "ERROR: bond_3ad_get_active_agg_info failed\n"); + printk(KERN_DEBUG DRV_NAME ": %s: Error: " + "bond_3ad_get_active_agg_info failed\n", dev->name); goto out; } @@ -2372,7 +2377,9 @@ if (slaves_in_agg == 0) { /*the aggregator is empty*/ - printk(KERN_DEBUG "ERROR: active aggregator is empty\n"); + printk(KERN_DEBUG DRV_NAME ": %s: Error: active " + "aggregator is empty\n", + dev->name); goto out; } diff -urN -X dontdiff linux-2.6.12post/drivers/net/bonding/bond_main.c 
linux-2.6.12post-sysfs/drivers/net/bonding/bond_main.c --- linux-2.6.12post/drivers/net/bonding/bond_main.c 2005-06-28 18:18:03.000000000 -0700 +++ linux-2.6.12post-sysfs/drivers/net/bonding/bond_main.c 2005-06-30 13:53:55.000000000 -0700 @@ -1884,10 +1884,10 @@ new_slave->dev->name); if (bond->params.mode == BOND_MODE_8023AD) { - printk(KERN_WARNING - "Operation of 802.3ad mode requires ETHTOOL " + printk(KERN_WARNING DRV_NAME + ": %s: Warning: Operation of 802.3ad mode requires ETHTOOL " "support in base driver for proper aggregator " - "selection.\n"); + "selection.\n", bond_dev->name); } } @@ -2716,8 +2706,11 @@ break; default: /* Should not happen */ - printk(KERN_ERR "bonding: Error: %s Illegal value (link=%d)\n", - slave->dev->name, slave->link); + printk(KERN_ERR DRV_NAME + ": %s: Error: %s Illegal value (link=%d)\n", + bond_dev->name, + slave->dev->name, + slave->link); goto out; } /* end of switch (slave->link) */ From radheka.godse@intel.com Fri Jul 1 13:55:25 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Jul 2005 13:55:30 -0700 (PDT) Received: from orsfmr004.jf.intel.com (fmr19.intel.com [134.134.136.18]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j61KtOH9010859 for ; Fri, 1 Jul 2005 13:55:25 -0700 Received: from orsfmr101.jf.intel.com (orsfmr101.jf.intel.com [10.7.209.17]) by orsfmr004.jf.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1 2004/09/17 17:50:56 root Exp $) with ESMTP id j61Krmkn020954; Fri, 1 Jul 2005 20:53:48 GMT Received: from nwlxmail01.jf.intel.com (nwlxmail01.jf.intel.com [10.7.171.40]) by orsfmr101.jf.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2 2004/09/17 18:05:01 root Exp $) with ESMTP id j61Krm7G002460; Fri, 1 Jul 2005 20:53:48 GMT Received: from [134.134.3.92] ([134.134.3.92]) by nwlxmail01.jf.intel.com (8.12.10/8.12.9/MailSET/Hub) with ESMTP id j61KrmSL003920; Fri, 1 Jul 2005 13:53:48 -0700 Date: Fri, 1 Jul 2005 13:53:01 -0700 (PDT) From: Radheka Godse X-X-Sender: 
radheka@localhost.localdomain To: fubar@us.ibm.com, bonding-devel@lists.sourceforge.net cc: netdev@oss.sgi.com Subject: [PATCH 2.6.13-rc1 12/17] bonding: prefix bondname to log messages Message-ID: ReplyTo: "Radheka Godse" MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Scanned-By: MIMEDefang 2.44 X-archive-position: 2603 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: radheka.godse@intel.com Precedence: bulk X-list: netdev This patch adds a <bondname>: prefix to log messages printed to /var/log/messages so that messages can be identified on a per-bond basis. Signed-off-by: Radheka Godse Signed-off-by: Mitch Williams diff -urN -X dontdiff linux-2.6.12post/drivers/net/bonding/bond_3ad.c linux-2.6.12post-sysfs/drivers/net/bonding/bond_3ad.c --- linux-2.6.12post/drivers/net/bonding/bond_3ad.c 2005-06-28 18:18:03.000000000 -0700 +++ linux-2.6.12post-sysfs/drivers/net/bonding/bond_3ad.c 2005-06-28 18:21:34.000000000 -0700 @@ -1378,8 +1378,9 @@ } } if (!curr_port) { // meaning: the port was related to an aggregator but was not on the aggregator port list - printk(KERN_WARNING DRV_NAME ": Warning: Port %d (on %s) was " + printk(KERN_WARNING DRV_NAME ": %s: Warning: Port %d (on %s) was " "related to aggregator %d but was not on its port list\n", + port->slave->dev->master->name, port->actor_port_number, port->slave->dev->name, port->aggregator->aggregator_identifier); } @@ -1450,7 +1451,8 @@ dprintk("Port %d joined LAG %d(new LAG)\n", port->actor_port_number, port->aggregator->aggregator_identifier); } else { - printk(KERN_ERR DRV_NAME ": Port %d (on %s) did not find a suitable aggregator\n", + printk(KERN_ERR DRV_NAME ": %s: Port %d (on %s) did not find a suitable aggregator\n", + port->slave->dev->master->name, port->actor_port_number, port->slave->dev->name); } } @@ -1582,8 +1584,9 @@ // check if any partner replys if (best_aggregator->is_individual) { - printk(KERN_WARNING DRV_NAME ": Warning: No
802.3ad response from the link partner " - "for any adapters in the bond\n"); + printk(KERN_WARNING DRV_NAME ": %s: Warning: No 802.3ad response from " + "the link partner for any adapters in the bond\n", + best_aggregator->slave->dev->master->name); } // check if there are more than one aggregator @@ -1991,7 +1994,9 @@ // if slave is null, the whole port is not initialized if (!port->slave) { - printk(KERN_WARNING DRV_NAME ": Trying to unbind an uninitialized port on %s\n", slave->dev->name); + printk(KERN_WARNING DRV_NAME ": Warning: %s: Trying to " + "unbind an uninitialized port on %s\n", + slave->dev->master->name, slave->dev->name); return; } @@ -2022,7 +2027,8 @@ dprintk("Some port(s) related to LAG %d - replaceing with LAG %d\n", aggregator->aggregator_identifier, new_aggregator->aggregator_identifier); if ((new_aggregator->lag_ports == port) && new_aggregator->is_active) { - printk(KERN_INFO DRV_NAME ": Removing an active aggregator\n"); + printk(KERN_INFO DRV_NAME ": %s: Removing an active aggregator\n", + aggregator->slave->dev->master->name); // select new active aggregator select_new_active_agg = 1; } @@ -2052,15 +2058,17 @@ ad_agg_selection_logic(__get_first_agg(port)); } } else { - printk(KERN_WARNING DRV_NAME ": Warning: unbinding aggregator, " - "and could not find a new aggregator for its ports\n"); + printk(KERN_WARNING DRV_NAME ": %s: Warning: unbinding aggregator, " + "and could not find a new aggregator for its ports\n", + slave->dev->master->name); } } else { // in case that the only port related to this aggregator is the one we want to remove select_new_active_agg = aggregator->is_active; // clear the aggregator ad_clear_agg(aggregator); if (select_new_active_agg) { - printk(KERN_INFO "Removing an active aggregator\n"); + printk(KERN_INFO DRV_NAME ": %s: Removing an active aggregator\n", + slave->dev->master->name); // select new active aggregator ad_agg_selection_logic(__get_first_agg(port)); } @@ -2133,7 +2141,8 @@ // select the active 
aggregator for the bond if ((port = __get_first_port(bond))) { if (!port->slave) { - printk(KERN_WARNING DRV_NAME ": Warning: bond's first port is uninitialized\n"); + printk(KERN_WARNING DRV_NAME ": %s: Warning: bond's first port is " + "uninitialized\n", bond->dev->name); goto re_arm; } @@ -2145,7 +2154,8 @@ // for each port run the state machines for (port = __get_first_port(bond); port; port = __get_next_port(port)) { if (!port->slave) { - printk(KERN_WARNING DRV_NAME ": Warning: Found an uninitialized port\n"); + printk(KERN_WARNING DRV_NAME ": %s: Warning: Found an uninitialized " + "port\n", bond->dev->name); goto re_arm; } @@ -2186,7 +2196,8 @@ port = &(SLAVE_AD_INFO(slave).port); if (!port->slave) { - printk(KERN_WARNING DRV_NAME ": Warning: port of slave %s is uninitialized\n", slave->dev->name); + printk(KERN_WARNING DRV_NAME ": %s: Warning: port of slave %s is " + "uninitialized\n", slave->dev->name, slave->dev->master->name); return; } @@ -2289,8 +2300,9 @@ // if slave is null, the whole port is not initialized if (!port->slave) { - printk(KERN_WARNING DRV_NAME ": Warning: link status changed for uninitialized port on %s\n", - slave->dev->name); + printk(KERN_WARNING DRV_NAME ": Warning: %s: link status changed for " + "uninitialized port on %s\n", + slave->dev->master->name, slave->dev->name); return; } @@ -2397,7 +2409,8 @@ } if (slave_agg_no >= 0) { - printk(KERN_ERR DRV_NAME ": Error: Couldn't find a slave to tx on for aggregator ID %d\n", agg_id); + printk(KERN_ERR DRV_NAME ": %s: Error: Couldn't find a slave to tx on " + "for aggregator ID %d\n", dev->name, agg_id); goto out; } diff -urN -X dontdiff linux-2.6.12post/drivers/net/bonding/bond_alb.c linux-2.6.12post-sysfs/drivers/net/bonding/bond_alb.c --- linux-2.6.12post/drivers/net/bonding/bond_alb.c 2005-06-17 12:48:29.000000000 -0700 +++ linux-2.6.12post-sysfs/drivers/net/bonding/bond_alb.c 2005-06-28 18:21:35.000000000 -0700 @@ -515,7 +515,8 @@ client_info->mac_dst); if (!skb) { 
printk(KERN_ERR DRV_NAME - ": Error: failed to create an ARP packet\n"); + ": %s: Error: failed to create an ARP packet\n", + client_info->slave->dev->master->name); continue; } @@ -525,7 +526,8 @@ skb = vlan_put_tag(skb, client_info->vlan_id); if (!skb) { printk(KERN_ERR DRV_NAME - ": Error: failed to insert VLAN tag\n"); + ": %s: Error: failed to insert VLAN tag\n", + client_info->slave->dev->master->name); continue; } } @@ -608,8 +610,9 @@ if (!client_info->slave) { printk(KERN_ERR DRV_NAME - ": Error: found a client with no channel in " - "the client's hash table\n"); + ": %s: Error: found a client with no channel in " + "the client's hash table\n", + bond->dev->name); continue; } /*update all clients using this src_ip, that are not assigned @@ -930,7 +933,8 @@ skb = vlan_put_tag(skb, vlan->vlan_id); if (!skb) { printk(KERN_ERR DRV_NAME - ": Error: failed to insert VLAN tag\n"); + ": %s: Error: failed to insert VLAN tag\n", + bond->dev->name); continue; } } @@ -959,11 +963,11 @@ s_addr.sa_family = dev->type; if (dev_set_mac_address(dev, &s_addr)) { printk(KERN_ERR DRV_NAME - ": Error: dev_set_mac_address of dev %s failed! ALB " + ": %s: Error: dev_set_mac_address of dev %s failed! 
ALB " "mode requires that the base driver support setting " "the hw address also when the network device's " "interface is open\n", - dev->name); + dev->master->name, dev->name); return -EOPNOTSUPP; } return 0; @@ -1113,9 +1117,9 @@ * of the new slave */ printk(KERN_ERR DRV_NAME - ": Error: the hw address of slave %s is not " + ": %s: Error: the hw address of slave %s is not " "unique - cannot enslave it!", - slave->dev->name); + bond->dev->name, slave->dev->name); return -EINVAL; } @@ -1161,16 +1165,16 @@ bond->alb_info.rlb_enabled); printk(KERN_WARNING DRV_NAME - ": Warning: the hw address of slave %s is in use by " + ": %s: Warning: the hw address of slave %s is in use by " "the bond; giving it the hw address of %s\n", - slave->dev->name, free_mac_slave->dev->name); + bond->dev->name, slave->dev->name, free_mac_slave->dev->name); } else if (has_bond_addr) { printk(KERN_ERR DRV_NAME - ": Error: the hw address of slave %s is in use by the " + ": %s: Error: the hw address of slave %s is in use by the " "bond; couldn't find a slave with a free hw address to " "give it (this should not have happened)\n", - slave->dev->name); + bond->dev->name, slave->dev->name); return -EFAULT; } diff -urN -X dontdiff linux-2.6.12post/drivers/net/bonding/bond_main.c linux-2.6.12post-sysfs/drivers/net/bonding/bond_main.c --- linux-2.6.12post/drivers/net/bonding/bond_main.c 2005-06-28 18:18:03.000000000 -0700 +++ linux-2.6.12post-sysfs/drivers/net/bonding/bond_main.c 2005-06-30 13:53:55.000000000 -0700 @@ -1621,8 +1621,8 @@ if (slave_dev->do_ioctl == NULL) { printk(KERN_WARNING DRV_NAME - ": Warning : no link monitoring support for %s\n", - slave_dev->name); + ": %s: Warning: no link monitoring support for %s\n", + bond_dev->name, slave_dev->name); } /* bond must be initialized by bond_open() before enslaving */ @@ -1643,17 +1643,17 @@ dprintk("%s: NETIF_F_VLAN_CHALLENGED\n", slave_dev->name); if (!list_empty(&bond->vlan_list)) { printk(KERN_ERR DRV_NAME - ": Error: cannot enslave VLAN 
" + ": %s: Error: cannot enslave VLAN " "challenged slave %s on VLAN enabled " - "bond %s\n", slave_dev->name, + "bond %s\n", bond_dev->name, slave_dev->name, bond_dev->name); return -EPERM; } else { printk(KERN_WARNING DRV_NAME - ": Warning: enslaved VLAN challenged " + ": %s: Warning: enslaved VLAN challenged " "slave %s. Adding VLANs will be blocked as " "long as %s is part of bond %s\n", - slave_dev->name, slave_dev->name, + bond_dev->name, slave_dev->name, slave_dev->name, bond_dev->name); bond_dev->features |= NETIF_F_VLAN_CHALLENGED; } @@ -1705,8 +1705,8 @@ */ if (!(slave_dev->flags & IFF_UP)) { printk(KERN_ERR DRV_NAME - ": Error: %s is not running\n", - slave_dev->name); + ": %s: Error: %s is not running\n", + bond_dev->name, slave_dev->name); res = -EINVAL; goto err_undo_flags; } @@ -1715,9 +1715,9 @@ (bond->params.mode == BOND_MODE_TLB) || (bond->params.mode == BOND_MODE_ALB)) { printk(KERN_ERR DRV_NAME - ": Error: to use %s mode, you must upgrade " + ": %s: Error: to use %s mode, you must upgrade " "ifenslave.\n", - bond_mode_name(bond->params.mode)); + bond_dev->name, bond_mode_name(bond->params.mode)); res = -EOPNOTSUPP; goto err_undo_flags; } @@ -1838,21 +1838,21 @@ * the messages for netif_carrier. */ printk(KERN_WARNING DRV_NAME - ": Warning: MII and ETHTOOL support not " + ": %s: Warning: MII and ETHTOOL support not " "available for interface %s, and " "arp_interval/arp_ip_target module parameters " "not specified, thus bonding will not detect " "link failures! 
see bonding.txt for details.\n", - slave_dev->name); + bond_dev->name, slave_dev->name); } else if (link_reporting == -1) { /* unable get link status using mii/ethtool */ printk(KERN_WARNING DRV_NAME - ": Warning: can't get link status from " + ": %s: Warning: can't get link status from " "interface %s; the network driver associated " "with this interface does not support MII or " "ETHTOOL link status reporting, thus miimon " "has no effect on this interface.\n", - slave_dev->name); + bond_dev->name, slave_dev->name); } } @@ -1879,9 +1894,9 @@ if (bond_update_speed_duplex(new_slave) && (new_slave->link != BOND_LINK_DOWN)) { printk(KERN_WARNING DRV_NAME - ": Warning: failed to get speed and duplex from %s, " + ": %s: Warning: failed to get speed and duplex from %s, " "assumed to be 100Mb/sec and Full.\n", - new_slave->dev->name); + bond_dev->name, new_slave->dev->name); if (bond->params.mode == BOND_MODE_8023AD) { printk(KERN_WARNING DRV_NAME @@ -2080,11 +2080,12 @@ ETH_ALEN); if (!mac_addr_differ && (bond->slave_cnt > 1)) { printk(KERN_WARNING DRV_NAME - ": Warning: the permanent HWaddr of %s " + ": %s: Warning: the permanent HWaddr of %s " "- %02X:%02X:%02X:%02X:%02X:%02X - is " "still in use by %s. 
Set the HWaddr of " "%s to a different address to avoid " "conflicts.\n", + bond_dev->name, slave_dev->name, slave->perm_hwaddr[0], slave->perm_hwaddr[1], @@ -2158,19 +2168,20 @@ bond_dev->features |= NETIF_F_VLAN_CHALLENGED; } else { printk(KERN_WARNING DRV_NAME - ": Warning: clearing HW address of %s while it " + ": %s: Warning: clearing HW address of %s while it " "still has VLANs.\n", - bond_dev->name); + bond_dev->name, bond_dev->name); printk(KERN_WARNING DRV_NAME - ": When re-adding slaves, make sure the bond's " - "HW address matches its VLANs'.\n"); + ": %s: When re-adding slaves, make sure the bond's " + "HW address matches its VLANs'.\n", + bond_dev->name); } } else if ((bond_dev->features & NETIF_F_VLAN_CHALLENGED) && !bond_has_challenged_slaves(bond)) { printk(KERN_INFO DRV_NAME - ": last VLAN challenged slave %s " + ": %s: last VLAN challenged slave %s " "left bond %s. VLAN blocking is removed\n", - slave_dev->name, bond_dev->name); + bond_dev->name, slave_dev->name, bond_dev->name); bond_dev->features &= ~NETIF_F_VLAN_CHALLENGED; } @@ -2327,12 +2329,13 @@ bond_dev->features |= NETIF_F_VLAN_CHALLENGED; } else { printk(KERN_WARNING DRV_NAME - ": Warning: clearing HW address of %s while it " + ": %s: Warning: clearing HW address of %s while it " "still has VLANs.\n", - bond_dev->name); + bond_dev->name, bond_dev->name); printk(KERN_WARNING DRV_NAME - ": When re-adding slaves, make sure the bond's " - "HW address matches its VLANs'.\n"); + ": %s: When re-adding slaves, make sure the bond's " + "HW address matches its VLANs'.\n", + bond_dev->name); } printk(KERN_INFO DRV_NAME @@ -2424,8 +2427,9 @@ &endptr, 0); if (*endptr) { printk(KERN_ERR DRV_NAME - ": Error: got invalid ABI " - "version from application\n"); + ": %s: Error: got invalid ABI " + "version from application\n", + bond_dev->name); return -EINVAL; } @@ -4496,8 +4500,9 @@ struct sk_buff *skb2 = skb_clone(skb, GFP_ATOMIC); if (!skb2) { printk(KERN_ERR DRV_NAME - ": Error: bond_xmit_broadcast(): 
" - "skb_clone() failed\n"); + ": %s: Error: bond_xmit_broadcast(): " + "skb_clone() failed\n", + bond_dev->name); continue; } @@ -4566,7 +4571,8 @@ default: /* Should never happen, mode already checked */ printk(KERN_ERR DRV_NAME - ": Error: Unknown bonding mode %d\n", + ": %s: Error: Unknown bonding mode %d\n", + bond_dev->name, mode); break; } From radheka.godse@intel.com Fri Jul 1 13:56:40 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Jul 2005 13:56:44 -0700 (PDT) Received: from orsfmr004.jf.intel.com (fmr19.intel.com [134.134.136.18]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j61KuKH9011318 for ; Fri, 1 Jul 2005 13:56:40 -0700 Received: from orsfmr100.jf.intel.com (orsfmr100.jf.intel.com [10.7.209.16]) by orsfmr004.jf.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1 2004/09/17 17:50:56 root Exp $) with ESMTP id j61Ksikn021391; Fri, 1 Jul 2005 20:54:44 GMT Received: from nwlxmail01.jf.intel.com (nwlxmail01.jf.intel.com [10.7.171.40]) by orsfmr100.jf.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2 2004/09/17 18:05:01 root Exp $) with ESMTP id j61KsiPb004944; Fri, 1 Jul 2005 20:54:44 GMT Received: from [134.134.3.92] ([134.134.3.92]) by nwlxmail01.jf.intel.com (8.12.10/8.12.9/MailSET/Hub) with ESMTP id j61KshSL004040; Fri, 1 Jul 2005 13:54:44 -0700 Date: Fri, 1 Jul 2005 13:53:51 -0700 (PDT) From: Radheka Godse X-X-Sender: radheka@localhost.localdomain To: fubar@us.ibm.com, bonding-devel@lists.sourceforge.net cc: netdev@oss.sgi.com Subject: [PATCH 2.6.13-rc1 13/17] bonding: make error messages more consistent Message-ID: ReplyTo: "Radheka Godse" MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Scanned-By: MIMEDefang 2.44 X-archive-position: 2604 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: radheka.godse@intel.com Precedence: bulk X-list: netdev This patch attempts to make error reporting more consistent, removes/adds 
newlines as appropriate, and adds the bonding: <bondname>: prefix where it is missing (especially in alb and ad messages). Signed-off-by: Radheka Godse Signed-off-by: Mitch Williams diff -urN -X dontdiff linux-2.6.12post/drivers/net/bonding/bond_3ad.c linux-2.6.12post-sysfs/drivers/net/bonding/bond_3ad.c --- linux-2.6.12post/drivers/net/bonding/bond_3ad.c 2005-06-28 18:18:03.000000000 -0700 +++ linux-2.6.12post-sysfs/drivers/net/bonding/bond_3ad.c 2005-06-28 18:21:34.000000000 -0700 @@ -1198,10 +1198,10 @@ // detect loopback situation if (!MAC_ADDRESS_COMPARE(&(lacpdu->actor_system), &(port->actor_system))) { // INFO_RECEIVED_LOOPBACK_FRAMES - printk(KERN_ERR DRV_NAME ": An illegal loopback occurred on adapter (%s)\n", - port->slave->dev->name); - printk(KERN_ERR "Check the configuration to verify that all Adapters " - "are connected to 802.3ad compliant switch ports\n"); + printk(KERN_ERR DRV_NAME ": %s: An illegal loopback occurred on " + "adapter (%s). Check the configuration to verify that all " + "Adapters are connected to 802.3ad compliant switch ports\n", + port->slave->dev->master->name, port->slave->dev->name); __release_rx_machine_lock(port); return; } diff -urN -X dontdiff linux-2.6.12post/drivers/net/bonding/bond_alb.c linux-2.6.12post-sysfs/drivers/net/bonding/bond_alb.c --- linux-2.6.12post/drivers/net/bonding/bond_alb.c 2005-06-17 12:48:29.000000000 -0700 +++ linux-2.6.12post-sysfs/drivers/net/bonding/bond_alb.c 2005-06-28 18:21:35.000000000 -0700 @@ -206,7 +206,7 @@ new_hashtbl = kmalloc(size, GFP_KERNEL); if (!new_hashtbl) { printk(KERN_ERR DRV_NAME - ": Error: %s: Failed to allocate TLB hash table\n", + ": %s: Error: Failed to allocate TLB hash table\n", bond->dev->name); return -1; } @@ -811,7 +798,7 @@ new_hashtbl = kmalloc(size, GFP_KERNEL); if (!new_hashtbl) { printk(KERN_ERR DRV_NAME - ": Error: %s: Failed to allocate RLB hash table\n", + ": %s: Error: Failed to allocate RLB hash table\n", bond->dev->name); return -1; } diff -urN -X dontdiff
linux-2.6.12post/drivers/net/bonding/bond_main.c linux-2.6.12post-sysfs/drivers/net/bonding/bond_main.c --- linux-2.6.12post/drivers/net/bonding/bond_main.c 2005-06-28 18:18:03.000000000 -0700 +++ linux-2.6.12post-sysfs/drivers/net/bonding/bond_main.c 2005-06-30 13:53:55.000000000 -0700 @@ -919,7 +919,7 @@ res = bond_add_vlan(bond, vid); if (res) { printk(KERN_ERR DRV_NAME - ": %s: Failed to add vlan id %d\n", + ": %s: Error: Failed to add vlan id %d\n", bond_dev->name, vid); } } @@ -953,7 +953,7 @@ res = bond_del_vlan(bond, vid); if (res) { printk(KERN_ERR DRV_NAME - ": %s: Failed to remove vlan id %d\n", + ": %s: Error: Failed to remove vlan id %d\n", bond_dev->name, vid); } } @@ -1690,11 +1698,10 @@ if (slave_dev->set_mac_address == NULL) { printk(KERN_ERR DRV_NAME - ": Error: The slave device you specified does " - "not support setting the MAC address.\n"); - printk(KERN_ERR + ": %s: Error: The slave device you specified does " + "not support setting the MAC address. " "Your kernel likely does not support slave " - "devices.\n"); + "devices.\n", bond_dev->name); res = -EOPNOTSUPP; goto err_undo_flags; @@ -2059,7 +2058,7 @@ if (!(slave_dev->flags & IFF_SLAVE) || (slave_dev->master != bond_dev)) { printk(KERN_ERR DRV_NAME - ": Error: %s: cannot release %s.\n", + ": %s: Error: cannot release %s.\n", bond_dev->name, slave_dev->name); return -EINVAL; } @@ -4758,7 +4757,7 @@ if (max_bonds < 1 || max_bonds > INT_MAX) { printk(KERN_WARNING DRV_NAME ": Warning: max_bonds (%d) not in range %d-%d, so it " - "was reset to BOND_DEFAULT_MAX_BONDS (%d)", + "was reset to BOND_DEFAULT_MAX_BONDS (%d)\n", max_bonds, 1, INT_MAX, BOND_DEFAULT_MAX_BONDS); max_bonds = BOND_DEFAULT_MAX_BONDS; } From radheka.godse@intel.com Fri Jul 1 13:57:41 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Jul 2005 13:57:44 -0700 (PDT) Received: from orsfmr005.jf.intel.com (fmr20.intel.com [134.134.136.19]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j61KvfH9011902 for ; 
Fri, 1 Jul 2005 13:57:41 -0700 Received: from orsfmr101.jf.intel.com (orsfmr101.jf.intel.com [10.7.209.17]) by orsfmr005.jf.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1 2004/09/17 17:50:56 root Exp $) with ESMTP id j61Ku5tp009751; Fri, 1 Jul 2005 20:56:05 GMT Received: from nwlxmail01.jf.intel.com (nwlxmail01.jf.intel.com [10.7.171.40]) by orsfmr101.jf.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2 2004/09/17 18:05:01 root Exp $) with ESMTP id j61Ku57G003690; Fri, 1 Jul 2005 20:56:05 GMT Received: from [134.134.3.92] ([134.134.3.92]) by nwlxmail01.jf.intel.com (8.12.10/8.12.9/MailSET/Hub) with ESMTP id j61Ku0SL004186; Fri, 1 Jul 2005 13:56:04 -0700 Date: Fri, 1 Jul 2005 13:55:08 -0700 (PDT) From: Radheka Godse X-X-Sender: radheka@localhost.localdomain To: fubar@us.ibm.com, bonding-devel@lists.sourceforge.net cc: netdev@oss.sgi.com Subject: [PATCH 2.6.13-rc1 14/17] bonding: spelling and whitespace correction Message-ID: ReplyTo: "Radheka Godse" MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Scanned-By: MIMEDefang 2.44 X-archive-position: 2605 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: radheka.godse@intel.com Precedence: bulk X-list: netdev Trivial patch to fix spelling errors and make some whitespace changes for readability. 
Signed-off-by: Radheka Godse Signed-off-by: Mitch Williams diff -urN -X dontdiff linux-2.6.12post/drivers/net/bonding/bond_alb.c linux-2.6.12post-sysfs/drivers/net/bonding/bond_alb.c --- linux-2.6.12post/drivers/net/bonding/bond_alb.c 2005-06-17 12:48:29.000000000 -0700 +++ linux-2.6.12post-sysfs/drivers/net/bonding/bond_alb.c 2005-06-28 18:21:35.000000000 -0700 @@ -1423,7 +1423,7 @@ read_lock(&bond->curr_slave_lock); bond_for_each_slave(bond, slave, i) { - alb_send_learning_packets(slave,slave->dev->dev_addr); + alb_send_learning_packets(slave, slave->dev->dev_addr); } read_unlock(&bond->curr_slave_lock); diff -urN -X dontdiff linux-2.6.12post/drivers/net/bonding/bonding.h linux-2.6.12post-sysfs/drivers/net/bonding/bonding.h --- linux-2.6.12post/drivers/net/bonding/bonding.h 2005-06-28 18:18:03.000000000 -0700 +++ linux-2.6.12post-sysfs/drivers/net/bonding/bonding.h 2005-06-30 13:58:27.000000000 -0700 @@ -165,7 +165,7 @@ }; struct slave { - struct net_device *dev; /* first - usefull for panic debug */ + struct net_device *dev; /* first - useful for panic debug */ struct slave *next; struct slave *prev; s16 delay; @@ -191,7 +191,7 @@ * beforehand. 
*/ struct bonding { - struct net_device *dev; /* first - usefull for panic debug */ + struct net_device *dev; /* first - useful for panic debug */ struct slave *first_slave; struct slave *curr_active_slave; struct slave *current_arp_slave; diff -urN -X dontdiff linux-2.6.12post/drivers/net/bonding/bond_main.c linux-2.6.12post-sysfs/drivers/net/bonding/bond_main.c --- linux-2.6.12post/drivers/net/bonding/bond_main.c 2005-06-28 18:18:03.000000000 -0700 +++ linux-2.6.12post-sysfs/drivers/net/bonding/bond_main.c 2005-06-30 13:53:55.000000000 -0700 @@ -2117,7 +2117,6 @@ /* release the slave from its bond */ bond_detach_slave(bond, slave); - if (bond->primary_slave == slave) { bond->primary_slave = NULL; } @@ -2436,7 +2435,6 @@ if (orig_app_abi_ver == -1) { orig_app_abi_ver = new_abi_ver; } - app_abi_ver = new_abi_ver; } @@ -4226,6 +4224,7 @@ bond_for_each_slave(bond, slave, i) { dprintk("s %p s->p %p c_m %p\n", slave, slave->prev, slave->dev->change_mtu); + res = dev_set_mtu(slave->dev, new_mtu); if (res) { @@ -4623,7 +4622,7 @@ */ bond_dev->features |= NETIF_F_VLAN_CHALLENGED; - /* don't acquire bond device's xmit_lock when + /* don't acquire bond device's xmit_lock when * transmitting */ bond_dev->features |= NETIF_F_LLTX; @@ -5035,7 +4976,6 @@ #ifdef CONFIG_PROC_FS bond_create_proc_dir(); #endif - for (i = 0; i < max_bonds; i++) { sprintf(new_bond_name, "bond%d",i); res = bond_create(new_bond_name,&bonding_defaults, NULL); From radheka.godse@intel.com Fri Jul 1 14:11:46 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Jul 2005 14:11:55 -0700 (PDT) Received: from orsfmr004.jf.intel.com (fmr19.intel.com [134.134.136.18]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j61LBOH9013745 for ; Fri, 1 Jul 2005 14:11:46 -0700 Received: from orsfmr101.jf.intel.com (orsfmr101.jf.intel.com [10.7.209.17]) by orsfmr004.jf.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1 2004/09/17 17:50:56 root Exp $) with ESMTP id j61L9kkn028420; Fri, 1 Jul 2005 21:09:46 
GMT Received: from nwlxmail01.jf.intel.com (nwlxmail01.jf.intel.com [10.7.171.40]) by orsfmr101.jf.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2 2004/09/17 18:05:01 root Exp $) with ESMTP id j61L9k7G011307; Fri, 1 Jul 2005 21:09:46 GMT Received: from [134.134.3.92] ([134.134.3.92]) by nwlxmail01.jf.intel.com (8.12.10/8.12.9/MailSET/Hub) with ESMTP id j61L9jSL006114; Fri, 1 Jul 2005 14:09:46 -0700 Date: Fri, 1 Jul 2005 14:08:52 -0700 (PDT) From: Radheka Godse X-X-Sender: radheka@localhost.localdomain To: fubar@us.ibm.com, bonding-devel@lists.sourceforge.net cc: netdev@oss.sgi.com Subject: [PATCH 2.6.13-rc1 15/17] bonding: include ARP and Xmit Hash Policy information in /proc file Message-ID: ReplyTo: "Radheka Godse" MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Scanned-By: MIMEDefang 2.44 X-archive-position: 2606 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: radheka.godse@intel.com Precedence: bulk X-list: netdev For bonds configured to do ARP monitoring, this patch displays polling interval and ip targets info in their respective proc files. For balance-XOR and 802.3ad modes, this patch displays Xmit Hash Policy. 
Signed-off-by: Radheka Godse Signed-off-by: Mitch Williams diff -urN -X dontdiff linux-2.6.12post/drivers/net/bonding/bond_main.c linux-2.6.12post-sysfs/drivers/net/bonding/bond_main.c --- linux-2.6.12post/drivers/net/bonding/bond_main.c 2005-06-28 18:18:03.000000000 -0700 +++ linux-2.6.12post-sysfs/drivers/net/bonding/bond_main.c 2005-06-30 13:53:55.000000000 -0700 @@ -3370,6 +3370,8 @@ { struct bonding *bond = seq->private; struct slave *curr; + int i; + u32 target; read_lock(&bond->curr_slave_lock); curr = bond->curr_active_slave; @@ -3378,6 +3366,13 @@ seq_printf(seq, "Bonding Mode: %s\n", bond_mode_name(bond->params.mode)); + if (bond->params.mode == BOND_MODE_XOR || + bond->params.mode == BOND_MODE_8023AD) { + seq_printf(seq, "Transmit Hash Policy: %s (%d)\n", + xmit_hashtype_tbl[bond->params.xmit_policy].modename, + bond->params.xmit_policy); + } + if (USES_PRIMARY(bond->params.mode)) { seq_printf(seq, "Primary Slave: %s\n", (bond->primary_slave) ? @@ -3394,6 +3403,24 @@ seq_printf(seq, "Down Delay (ms): %d\n", bond->params.downdelay * bond->params.miimon); + + // ARP information + if(bond->params.arp_interval > 0) { + seq_printf(seq, "ARP Polling Interval (ms): %d\n", + bond->params.arp_interval); + + seq_printf(seq, "ARP IP target/s (n.n.n.n form):"); + + for(i = 0; (i < BOND_MAX_ARP_TARGETS) && bond->params.arp_targets[i] ;i++) { + target = ntohl(bond->params.arp_targets[i]); + seq_printf(seq, " %d.%d.%d.%d", HIPQUAD(target)); + if((i+1 < BOND_MAX_ARP_TARGETS) && bond->params.arp_targets[i+1]) + seq_printf(seq, ","); + else + seq_printf(seq, "\n"); + } + } + if (bond->params.mode == BOND_MODE_8023AD) { struct ad_info ad_info; From radheka.godse@intel.com Fri Jul 1 14:12:41 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Jul 2005 14:12:45 -0700 (PDT) Received: from orsfmr005.jf.intel.com (fmr20.intel.com [134.134.136.19]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j61LCfH9014049 for ; Fri, 1 Jul 2005 14:12:41 -0700 Received: 
from orsfmr101.jf.intel.com (orsfmr101.jf.intel.com [10.7.209.17]) by orsfmr005.jf.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1 2004/09/17 17:50:56 root Exp $) with ESMTP id j61LB5tp012564; Fri, 1 Jul 2005 21:11:05 GMT Received: from nwlxmail01.jf.intel.com (nwlxmail01.jf.intel.com [10.7.171.40]) by orsfmr101.jf.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2 2004/09/17 18:05:01 root Exp $) with ESMTP id j61LB57G013097; Fri, 1 Jul 2005 21:11:05 GMT Received: from [134.134.3.92] ([134.134.3.92]) by nwlxmail01.jf.intel.com (8.12.10/8.12.9/MailSET/Hub) with ESMTP id j61LB5SL006345; Fri, 1 Jul 2005 14:11:05 -0700 Date: Fri, 1 Jul 2005 14:10:17 -0700 (PDT) From: Radheka Godse X-X-Sender: radheka@localhost.localdomain To: fubar@us.ibm.com, bonding-devel@lists.sourceforge.net cc: netdev@oss.sgi.com Subject: [PATCH 2.6.13-rc1 16/17] bonding: version, date and log update Message-ID: ReplyTo: "Radheka Godse" MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Scanned-By: MIMEDefang 2.44 X-archive-position: 2607 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: radheka.godse@intel.com Precedence: bulk X-list: netdev This patch updates the bonding version number and adds a few entries to the change log in bond_main. The major version number is changed to 3 because of the sysfs interface. 
Signed-off-by: Radheka Godse Signed-off-by: Mitch Williams diff -urN -X dontdiff linux-2.6.12post/drivers/net/bonding/bonding.h linux-2.6.12post-sysfs/drivers/net/bonding/bonding.h --- linux-2.6.12post/drivers/net/bonding/bonding.h 2005-06-28 18:18:03.000000000 -0700 +++ linux-2.6.12post-sysfs/drivers/net/bonding/bonding.h 2005-06-30 13:58:27.000000000 -0700 @@ -29,6 +29,10 @@ * 2005/05/05 - Jason Gabler * - added "xmit_policy" kernel parameter for alternate hashing policy * support for mode 2 + * + * 2005/06/21 - Mitch Williams + * Radheka Godse + * - Added bonding sysfs interface */ #ifndef _LINUX_BONDING_H @@ -41,8 +37,8 @@ #include "bond_3ad.h" #include "bond_alb.h" -#define DRV_VERSION "2.6.3" -#define DRV_RELDATE "June 8, 2005" +#define DRV_VERSION "3.0.0" +#define DRV_RELDATE "June 28, 2005" #define DRV_NAME "bonding" #define DRV_DESCRIPTION "Ethernet Channel Bonding Driver" diff -urN -X dontdiff linux-2.6.12post/drivers/net/bonding/bond_main.c linux-2.6.12post-sysfs/drivers/net/bonding/bond_main.c --- linux-2.6.12post/drivers/net/bonding/bond_main.c 2005-06-28 18:18:03.000000000 -0700 +++ linux-2.6.12post-sysfs/drivers/net/bonding/bond_main.c 2005-06-30 13:53:55.000000000 -0700 @@ -487,7 +487,36 @@ * * Added xmit_hash_policy_layer34() * - Modified by Jay Vosburgh to also support mode 4. * Set version to 2.6.3. - */ + * 2005/06/24 - Mitch Williams + * - Radheka Godse + * - Added bonding sysfs interface + * + * - pre-work: + * - Split out bond creation code to allow for future addition of + * sysfs interface. + * - Added extra optional parameter to bond_enslave to return a + * a pointer to the new slave. This will be used by future + * sysfs functionality. + * - Removed static declaration on some functions and data items. + * + * - Added sysfs support, including capability to add/remove/change + * any bond at runtime. 
+ * + * - Miscellaneous: + * - Added bonding: : prefix to sysfs log messages + * - added arp_ip_targets to /proc entry + * - trivial fix: added missing modes description to modinfo + * - Corrected bug in ALB init where kmalloc is called inside + * a held lock + * - Corrected behavior to maintain bond link when changing + * from arp monitor to miimon and vice versa + * - Added missing bonding: : prefix to alb, ad log messages + * - Fixed stack dump warnings seen if changing between miimon + * and arp monitoring when the bond interface is down. + * - Fixed stack dump warnings seen when enslaving an e100 + * driver + * Set version to 3.0.0 +*/ //#define BONDING_DEBUG 1 From radheka.godse@intel.com Fri Jul 1 14:16:24 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Jul 2005 14:16:26 -0700 (PDT) Received: from orsfmr002.jf.intel.com (fmr17.intel.com [134.134.136.16]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j61LGNH9015503 for ; Fri, 1 Jul 2005 14:16:23 -0700 Received: from orsfmr100.jf.intel.com (orsfmr100.jf.intel.com [10.7.209.16]) by orsfmr002.jf.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1 2004/09/17 17:50:56 root Exp $) with ESMTP id j61LElq8001904; Fri, 1 Jul 2005 21:14:47 GMT Received: from nwlxmail01.jf.intel.com (nwlxmail01.jf.intel.com [10.7.171.40]) by orsfmr100.jf.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2 2004/09/17 18:05:01 root Exp $) with ESMTP id j61LElPb018068; Fri, 1 Jul 2005 21:14:47 GMT Received: from [134.134.3.92] ([134.134.3.92]) by nwlxmail01.jf.intel.com (8.12.10/8.12.9/MailSET/Hub) with ESMTP id j61LElSL006679; Fri, 1 Jul 2005 14:14:47 -0700 Date: Fri, 1 Jul 2005 14:14:00 -0700 (PDT) From: Radheka Godse X-X-Sender: radheka@localhost.localdomain To: fubar@us.ibm.com, bonding-devel@lists.sourceforge.net cc: netdev@oss.sgi.com Subject: [PATCH 2.6.13-rc1 17/17] bonding: Optimization to read MII only when link status has changed. 
Message-ID: ReplyTo: "Radheka Godse" MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Scanned-By: MIMEDefang 2.44 X-archive-position: 2608 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: radheka.godse@intel.com Precedence: bulk X-list: netdev Enhanced bond_mii_monitor fn: to read MII only when link status has changed for an enslaved adapter. Signed-off-by: Radheka Godse diff -urN -X dontdiff linux-2.6.12post/drivers/net/bonding/bond_main.c linux-2.6.12post-sysfs/drivers/net/bonding/bond_main.c --- linux-2.6.12post/drivers/net/bonding/bond_main.c 2005-06-28 18:18:03.000000000 -0700 +++ linux-2.6.12post-sysfs/drivers/net/bonding/bond_main.c 2005-06-30 13:53:55.000000000 -0700 @@ -2652,6 +2652,7 @@ if (slave == oldcurrent) { do_failover = 1; } + bond_update_speed_duplex(slave); } else { slave->delay--; } @@ -2742,6 +2675,7 @@ } else { slave->delay--; } + bond_update_speed_duplex(slave); } break; default: @@ -2754,8 +2675,6 @@ goto out; } /* end of switch (slave->link) */ - bond_update_speed_duplex(slave); - if (bond->params.mode == BOND_MODE_8023AD) { if (old_speed != slave->speed) { bond_3ad_adapter_speed_changed(slave); ~ From zdenek@rcn.com Fri Jul 1 19:24:38 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Jul 2005 19:24:43 -0700 (PDT) Received: from smtp05.mrf.mail.rcn.net (smtp05.mrf.mail.rcn.net [207.172.4.64]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j622ObH9001243 for ; Fri, 1 Jul 2005 19:24:38 -0700 Received: from 209-150-50-115.c3-0.nwt-ubr3.sbo-nwt.ma.cable.rcn.com (HELO funex) (209.150.50.115) by smtp05.mrf.mail.rcn.net with SMTP; 01 Jul 2005 22:23:07 -0400 Message-Id: <3u3gb7$1mhk2i@smtp05.mrf.mail.rcn.net> X-IronPort-AV: i="3.93,252,1115006400"; d="scan'208"; a="57200722:sNHT42246956" X-Sender: zdenek@pop.rcn.com X-Mailer: QUALCOMM Windows Eudora Pro Version 4.0 Date: Fri, 01 Jul 2005 22:16:13 -0400 To: 
netdev@oss.sgi.com, linux-net@vger.kernel.org From: Zdenek Radouch Subject: controlling ARP Proxy scope? Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" X-archive-position: 2609 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: zdenek@rcn.com Precedence: bulk X-list: netdev I haven't been able to locate anything discussing how to control the scope of Linux proxy ARP. So, left with only a binary flag in /proc, and network definition on the interface, I assumed (perhaps naively) that the arp would proxy only for the addresses within the subnet defined for the interface (on which the proxy arp is turned on). However, that does not seem to be the case. I have an interface with address 10.1.2.219 and mask 255.255.255.248 with proxy arp turned on on this interface, and the machine is responding (I see that with tcpdump) to arp requests for address 10.1.2.1, i.e., an address outside of the proxy interface's subnet. Can anyone explain the behavior? What is the scope of the proxying? If the scope is not limited to the proxy interface's subnet, then how do I avoid proxying for addresses of machines facing the proxy server? This seems to be quite broken. I am running 2.4.25 kernel. I'll appreciate any pointers. Thanks! 
-Zdenek From dtor_core@ameritech.net Fri Jul 1 22:31:40 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 01 Jul 2005 22:31:49 -0700 (PDT) Received: from smtp101.sbc.mail.re2.yahoo.com (smtp101.sbc.mail.re2.yahoo.com [68.142.229.104]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id j625VdH9013788 for ; Fri, 1 Jul 2005 22:31:40 -0700 Received: (qmail 95981 invoked from network); 2 Jul 2005 05:30:05 -0000 Received: from unknown (HELO mail.corenet.homeip.net) (dtor?core@ameritech.net@68.253.32.177 with login) by smtp101.sbc.mail.re2.yahoo.com with SMTP; 2 Jul 2005 05:30:04 -0000 From: Dmitry Torokhov To: netdev@oss.sgi.com Subject: Re: [PATCH 2.6.13-rc1 8/17] bonding: SYSFS INTERFACE (large) Date: Sat, 2 Jul 2005 00:30:03 -0500 User-Agent: KMail/1.8.1 Cc: Radheka Godse , fubar@us.ibm.com, bonding-devel@lists.sourceforge.net References: In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200507020030.03635.dtor_core@ameritech.net> X-archive-position: 2610 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: dtor_core@ameritech.net Precedence: bulk X-list: netdev Hi, On Friday 01 July 2005 15:48, Radheka Godse wrote: > This large patch adds the sysfs interface to channel bonding. It will > allow users to add and remove bonds, add and remove slaves, and change > all bonding parameters without using ifenslave. > The ifenslave interface still works. ... Couple of comments: > @@ -3569,7 +3593,10 @@ > bond_remove_proc_entry(bond); > bond_create_proc_entry(bond); > #endif > - > + down_write(&(bonding_rwsem)); > + bond_destroy_sysfs_entry(bond); > + bond_create_sysfs_entry(bond); > + up_write(&(bonding_rwsem)); Space vs. tab indentation. > return NOTIFY_DONE; > } > > @@ -4101,6 +4128,7 @@ > orig_app_abi_ver = prev_abi_ver; > } > > + up_write(&(bonding_rwsem)); Why extra parens? 
> + * Changes: > + * > + * 2004/12/12 - Mitch Williams > + * - Initial creation of sysfs interface. > + * > + * 2005/06/22 - Radheka Godse > + * - Added ifenslave -c type functionality to sysfs > + * - Added sysfs files for attributes such as MII Status and > + * 802.3ad aggregator that are displayed in /proc > + * - Added "name value" format to sysfs "mode" and > + * "lacp_rate", for e.g., "active-backup 1" or "slow 0" for > + * consistency and ease of script parsing > + * - Fixed reversal of octets in arp_ip_targets via sysfs > + * - sysfs support to handle bond interface re-naming > + * - Moved all sysfs entries into /sys/class/net instead of > + * of using a standalone subsystem. > + * - Added sysfs symlinks between masters and slaves > + * - Corrected bugs in sysfs unload path when creating bonds > + * with existing interface names. > + * - Removed redundant sysfs stat file since it duplicates slave info > + * from the proc file > + * - Fixed errors in sysfs show/store arp targets. > + * - For consistency with ifenslave, instead of exiting > + * with an error, updated bonding sysfs to > + * close and attempt to enslave an up adapter. > + * - Fixed NULL dereference when adding a slave interface > + * that does not exist. > + * - Added checks in sysfs bonding to reject invalid ip addresses > + * - Synch up with post linux-2.6.12 bonding changes > + * - Created sysfs bond attrib for xmit_hash_policy I think we prefer to rely on SCMs to keep changelogs for new modules. > + > +static struct class *netdev_class; > +/*--------------------------- Data Structures -----------------------------*/ > + > +/* Bonding sysfs lock. Why can't we just use the subsytem lock? > + * Because kobject_register tries to acquire the subsystem lock. If > + * we already hold the lock (which we would if the user was creating > + * a new bond through the sysfs interface), we deadlock. > + */ > + > +struct rw_semaphore bonding_rwsem; klists were just added to the kernel proper. 
Does this sentiment still hold true? > + > +/* > + * "show" function for the bond_masters attribute. > + * The class parameter is ignored. > + */ > +static ssize_t bonding_show_bonds(struct class *cls, char *buffer) > +{ > + int res = 0; > + struct bonding *bond; > + > + down_read(&(bonding_rwsem)); Why extra parens? > + list_for_each_entry_safe(bond, nxt, &bond_dev_list, bond_list) { > + if (strnicmp(bond->dev->name, name, IFNAMSIZ) == 0) { > + /* Temporarily set a meaningless flag. When > + * we get done with the loop, we'll check all of these. > + * If the bond doesn't have this flag set, then we need > + * to remove the bond. If the flag has it set, then > + * we can just clear the flag. > + */ > + bond->flags |= IFF_DYNAMIC; > + found = 1; > + break; /* Found it, so go to next name */ > + } > + } Why is list_for_each_entry_safe used? No elements are being deleted in the loop... > + > + /* first, create a link from the slave back to the master */ > + ret = sysfs_create_link(&(slave->class_dev.kobj), &(master->class_dev.kobj), > + "master"); Extra parens again. > +static ssize_t bonding_show_arp_interval(struct class_device *cd, char *buf) > +{ > + int count; > + struct bonding *bond = to_bond(cd); > + > + down_read(&(bonding_rwsem)); > + count = sprintf(buf, "%d\n", bond->params.arp_interval) + 1; > + up_read(&(bonding_rwsem)); > + return count; > +} What does this lock really protect here? As far as I can see params will not go away... > + > + /* get the netdev class pointer */ > + firstbond = container_of(bond_dev_list.next, struct bonding, bond_list); > + if (!firstbond) > + { Open brace should go on the same line as if. Besides, here it is not needed at all... Thanks! 
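To illustrate the list_for_each_entry_safe remark above: the following is a minimal userspace sketch, not kernel code. It uses a hand-rolled singly linked list, and struct node, push(), count_matches() and remove_matches() are names made up for this illustration only. It shows why a read-only walk can use a plain iteration, while a walk that frees entries has to save the next pointer first, which is the job the _safe variant does in the kernel list macros.

```c
#include <stdlib.h>

/* Hypothetical list node for illustration; not the kernel's struct list_head. */
struct node {
	int val;
	struct node *next;
};

/* Prepend a value; returns the new head (or the old head if malloc fails). */
static struct node *push(struct node *head, int val)
{
	struct node *n = malloc(sizeof(*n));

	if (!n)
		return head;
	n->val = val;
	n->next = head;
	return n;
}

/* Read-only walk: nothing is freed, so reading cur->next each step is fine. */
static int count_matches(const struct node *head, int val)
{
	const struct node *cur;
	int count = 0;

	for (cur = head; cur; cur = cur->next)
		if (cur->val == val)
			count++;
	return count;
}

/*
 * Walk that deletes entries: cur may be freed inside the loop body, so the
 * next pointer is saved in nxt before the body runs.
 */
static struct node *remove_matches(struct node *head, int val)
{
	struct node **link = &head;	/* pointer to the link that reaches cur */
	struct node *cur, *nxt;

	for (cur = head; cur; cur = nxt) {
		nxt = cur->next;	/* saved before cur can be freed */
		if (cur->val == val) {
			*link = nxt;
			free(cur);
		} else {
			link = &cur->next;
		}
	}
	return head;
}
```

A loop like the quoted one, which only sets a flag and breaks, corresponds to count_matches() here and has no need for the saved next pointer.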
-- Dmitry From greg@kroah.com Sat Jul 2 01:16:00 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 02 Jul 2005 01:16:02 -0700 (PDT) Received: from perch.kroah.org (mail.kroah.org [69.55.234.183]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j628FxH9025464 for ; Sat, 2 Jul 2005 01:16:00 -0700 Received: from [192.168.0.10] (c-24-22-115-24.hsd1.or.comcast.net [24.22.115.24]) (authenticated) by perch.kroah.org (8.11.6/8.11.6) with ESMTP id j628EIq03553; Sat, 2 Jul 2005 01:14:19 -0700 Received: from greg by echidna.kroah.org with local (masqmail 0.2.19) id 1Dod8E-5Q6-00; Sat, 02 Jul 2005 01:13:46 -0700 Date: Sat, 2 Jul 2005 01:13:46 -0700 From: Greg KH To: Radheka Godse Cc: netdev@oss.sgi.com Subject: Re: [PATCH 2.6.13-rc1 8/17] bonding: SYSFS INTERFACE (large) Message-ID: <20050702081346.GA20789@kroah.com> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.8i X-archive-position: 2611 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: greg@kroah.com Precedence: bulk X-list: netdev On Fri, Jul 01, 2005 at 01:48:44PM -0700, Radheka Godse wrote: > This large patch adds the sysfs interface to channel bonding. It will > allow users to add and remove bonds, add and remove slaves, and change > all bonding parameters without using ifenslave. > The ifenslave interface still works. Have a short example of what the sysfs tree looks like, and what the new files contain and expect to be written to? > diff -urN -X dontdiff linux-2.6.12post/drivers/net/bonding/bonding.h > linux-2.6.12post-sysfs/drivers/net/bonding/bonding.h > --- linux-2.6.12post/drivers/net/bonding/bonding.h 2005-06-28 > 18:18:03.000000000 -0700 patch looks linewrapped :( > +/* Bonding sysfs lock. Why can't we just use the subsytem lock? The subsystem lock is no more. 
Well, it's still around, but almost no one uses it, due to the klist changes. You shouldn't need a lock either now. > + * Because kobject_register tries to acquire the subsystem lock. If > + * we already hold the lock (which we would if the user was creating > + * a new bond through the sysfs interface), we deadlock. > + */ > + > +struct rw_semaphore bonding_rwsem; > + > + > + > + > +/*------------------------------ Functions > --------------------------------*/ > + > +/* > + * "show" function for the bond_masters attribute. > + * The class parameter is ignored. > + */ > +static ssize_t bonding_show_bonds(struct class *cls, char *buffer) > +{ > + int res = 0; > + struct bonding *bond; > + > + down_read(&(bonding_rwsem)); > + > + list_for_each_entry(bond, &bond_dev_list, bond_list) { > + res += sprintf(buffer + res, "%s ", > + bond->dev->name); > + if (res > (PAGE_SIZE - IFNAMSIZ)) { > + dprintk("eek! too many bonds!\n"); > + break; > + } > + } > + res += sprintf(buffer + res, "\n"); > + res++; > + up_read(&(bonding_rwsem)); > + return res; This violates the 1-value-per-sysfs file rule. Please fix this up. > +static ssize_t bonding_show_slaves(struct class_device *cd, char *buf) > +{ > + struct slave *slave; > + int i, res = 0; > + struct bonding *bond = to_bond(cd); > + > + down_read(&(bonding_rwsem)); > + > + read_lock_bh(&bond->lock); > + bond_for_each_slave(bond, slave, i) { > + res += sprintf(buf + res, "%s ", slave->dev->name); > + if (res > (PAGE_SIZE - IFNAMSIZ)) { > + dprintk("eek! too many slaves!\n"); > + break; > + } > + } > + read_unlock_bh(&bond->lock); > + res += sprintf(buf + res, "\n"); > + res++; > + up_read(&(bonding_rwsem)); > + return res; > +} Same sysfs violation. 
I think other files also have problems :( thanks, greg k-h From hno@marasystems.com Sat Jul 2 14:24:07 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 02 Jul 2005 14:24:10 -0700 (PDT) Received: from filer.marasystems.com (marasystems.com [83.241.133.2]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j62LO6H9005944 for ; Sat, 2 Jul 2005 14:24:07 -0700 Received: from localhost (henrik@localhost) by filer.marasystems.com (8.11.6/8.11.6) with ESMTP id j62LLbV27158; Sat, 2 Jul 2005 23:21:37 +0200 Date: Sat, 2 Jul 2005 23:21:37 +0200 (CEST) From: Henrik Nordstrom To: Zdenek Radouch cc: netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: controlling ARP Proxy scope? In-Reply-To: <3u3gb7$1mhk2i@smtp05.mrf.mail.rcn.net> Message-ID: References: <3u3gb7$1mhk2i@smtp05.mrf.mail.rcn.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-archive-position: 2612 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hno@marasystems.com Precedence: bulk X-list: netdev On Fri, 1 Jul 2005, Zdenek Radouch wrote: > So, left with only a binary flag in /proc, and network definition on the > interface, > I assumed (perhaps naively) that the arp would proxy only for the addresses > within the subnet defined for the interface (on which the proxy arp is > turned on). > However, that does not seem to be the case. You may be able to tune this with either arp_filter or arp_ignore. > I have an interface with address 10.1.2.219 and mask 255.255.255.248 with > proxy arp turned on on this interface, and the machine is responding > (I see that with tcpdump) to arp requests for address 10.1.2.1, i.e., > an address outside of the proxy interface's subnet. Correct. > Can anyone explain the behavior? proxy_arp simply answers an ARP request whenever there is a route for the requested destination going out on an interface other than the one where the ARP request was seen. 
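As a rough illustration of the rule described above, here is a small userspace sketch in C. It is a simplified model under stated assumptions, not the kernel's ARP code: struct route, route_lookup(), should_proxy_arp() and the IP4() macro are hypothetical names invented for this example, and the routing table is reduced to prefix, mask and outgoing interface index.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Build an IPv4 address in host byte order from four octets. */
#define IP4(a, b, c, d) \
	(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
	 ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Hypothetical, simplified routing entry; not a kernel structure. */
struct route {
	uint32_t prefix;	/* network address, host byte order */
	uint32_t mask;		/* contiguous netmask, host byte order */
	int out_if;		/* index of the interface the route leaves by */
};

/* Longest-prefix match; returns NULL if no route covers addr. */
static const struct route *route_lookup(const struct route *tbl, size_t n,
					uint32_t addr)
{
	const struct route *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if ((addr & tbl[i].mask) != tbl[i].prefix)
			continue;
		/* For contiguous masks, a larger value means a longer prefix. */
		if (!best || tbl[i].mask > best->mask)
			best = &tbl[i];
	}
	return best;
}

/*
 * Answer an ARP request for 'target' seen on interface 'in_if' only if a
 * route to the target exists and it leaves through a different interface.
 * The subnet configured on in_if plays no other role in the decision.
 */
static bool should_proxy_arp(const struct route *tbl, size_t n,
			     uint32_t target, int in_if)
{
	const struct route *rt = route_lookup(tbl, n, target);

	return rt != NULL && rt->out_if != in_if;
}
```

With a table holding 10.1.2.216/29 on interface 1 (the subnet of 10.1.2.219 with mask 255.255.255.248) and an assumed 10.1.2.0/24 route on interface 0, a request for 10.1.2.1 arriving on interface 1 is answered, while a request for 10.1.2.217 is not. The /24 route is an assumption made for the example; the original report does not show the routing table.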
Regards Henrik From bunk@stusta.de Sat Jul 2 16:52:59 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 02 Jul 2005 16:53:05 -0700 (PDT) Received: from mailout.stusta.mhn.de (emailhub.stusta.mhn.de [141.84.69.5]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id j62NqrH9015684 for ; Sat, 2 Jul 2005 16:52:58 -0700 Received: (qmail 30658 invoked from network); 2 Jul 2005 23:51:10 -0000 Received: from r063144.stusta.swh.mhn.de (10.150.63.144) by mailout.stusta.mhn.de with SMTP; 2 Jul 2005 23:51:10 -0000 Received: by r063144.stusta.swh.mhn.de (Postfix, from userid 1000) id A17DFAFA5A; Sun, 3 Jul 2005 01:51:09 +0200 (CEST) Date: Sun, 3 Jul 2005 01:51:09 +0200 From: Adrian Bunk To: jgarzik@pobox.com Cc: jkmaline@cc.hut.fi, hostap@shmoo.com, netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: [-mm patch] net/ieee80211/: remove pci.h #include's Message-ID: <20050702235109.GJ5346@stusta.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.9i X-archive-position: 2613 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: bunk@stusta.de Precedence: bulk X-list: netdev I was wondering why editing pci.h triggered the rebuild of three files under net/, and as far as I can see, there's no reason for these three files to #include pci.h . 
Signed-off-by: Adrian Bunk --- This patch was already sent on: - 30 May 2005 - 1 May 2005 net/ieee80211/ieee80211_module.c | 1 - net/ieee80211/ieee80211_rx.c | 1 - net/ieee80211/ieee80211_tx.c | 1 - 3 files changed, 3 deletions(-) --- linux-2.6.12-rc3-mm1-full/net/ieee80211/ieee80211_module.c.old 2005-04-30 23:23:14.000000000 +0200 +++ linux-2.6.12-rc3-mm1-full/net/ieee80211/ieee80211_module.c 2005-04-30 23:23:18.000000000 +0200 @@ -40,7 +40,6 @@ #include #include #include -#include #include #include #include --- linux-2.6.12-rc3-mm1-full/net/ieee80211/ieee80211_tx.c.old 2005-04-30 23:23:25.000000000 +0200 +++ linux-2.6.12-rc3-mm1-full/net/ieee80211/ieee80211_tx.c 2005-04-30 23:23:32.000000000 +0200 @@ -33,7 +33,6 @@ #include #include #include -#include #include #include #include --- linux-2.6.12-rc3-mm1-full/net/ieee80211/ieee80211_rx.c.old 2005-04-30 23:23:42.000000000 +0200 +++ linux-2.6.12-rc3-mm1-full/net/ieee80211/ieee80211_rx.c 2005-04-30 23:23:46.000000000 +0200 @@ -23,7 +23,6 @@ #include #include #include -#include #include #include #include From yi.zhu@intel.com Sun Jul 3 22:54:25 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 03 Jul 2005 22:54:29 -0700 (PDT) Received: from fmsfmr001.fm.intel.com (fmr13.intel.com [192.55.52.67]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j645sOH9002416 for ; Sun, 3 Jul 2005 22:54:25 -0700 Received: from fmsfmr100.fm.intel.com (fmsfmr100.fm.intel.com [10.253.24.20]) by fmsfmr001.fm.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1 2004/09/17 17:50:56 root Exp $) with ESMTP id j645qnqU029685; Mon, 4 Jul 2005 05:52:49 GMT Received: from fmsmsxvs041.fm.intel.com (fmsmsxvs041.fm.intel.com [132.233.42.126]) by fmsfmr100.fm.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2 2004/09/17 18:05:01 root Exp $) with SMTP id j645qi9d031973; Mon, 4 Jul 2005 05:52:49 GMT Received: from debian.sh.intel.com ([172.16.219.38]) by fmsmsxvs041.fm.intel.com (SAVSMTP 3.1.7.47) with SMTP id M2005070322524716745 ; Sun, 
03 Jul 2005 22:52:48 -0700 Subject: Re: ipw2100: more cleanups From: Zhu Yi To: Pavel Machek Cc: jketreno@linux.intel.com, netdev@oss.sgi.com, jbohac@suse.cz, jbenc@suse.cz In-Reply-To: <20050513214025.GA1863@elf.ucw.cz> References: <20050513214025.GA1863@elf.ucw.cz> Content-Type: text/plain Organization: Intel Corp. Date: Mon, 04 Jul 2005 13:49:05 +0800 Message-Id: <1120456146.27821.85.camel@debian.sh.intel.com> Mime-Version: 1.0 X-Mailer: Evolution 2.2.2 Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 2.44 X-archive-position: 2615 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yi.zhu@intel.com Precedence: bulk X-list: netdev Content-Length: 304 Lines: 13 On Fri, 2005-05-13 at 23:40 +0200, Pavel Machek wrote: > More cleanups (relative to ipw2100-1.1.0 version). Highlights include > removing of two printk wrappers. [snip] > -static DEVICE_ATTR(bssinfo, S_IRUGO, show_bssinfo, NULL); Any special reason to remove these driver sysfs entries? Thanks, -yi From pavel@ucw.cz Mon Jul 4 00:06:36 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Jul 2005 00:06:40 -0700 (PDT) Received: from amd.ucw.cz (gprs189-60.eurotel.cz [160.218.189.60]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j6476WH9005409 for ; Mon, 4 Jul 2005 00:06:34 -0700 Received: by amd.ucw.cz (Postfix, from userid 8) id DD4EC8B8D2; Mon, 4 Jul 2005 09:05:06 +0200 (CEST) Date: Mon, 4 Jul 2005 09:05:06 +0200 From: Pavel Machek To: Zhu Yi Cc: jketreno@linux.intel.com, netdev@oss.sgi.com, jbohac@suse.cz, jbenc@suse.cz Subject: Re: ipw2100: more cleanups Message-ID: <20050704070506.GA15370@elf.ucw.cz> References: <20050513214025.GA1863@elf.ucw.cz> <1120456146.27821.85.camel@debian.sh.intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1120456146.27821.85.camel@debian.sh.intel.com> X-Warning: Reading this can be dangerous to your mental health. 
User-Agent: Mutt/1.5.9i
X-archive-position: 2616
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: pavel@suse.cz
Precedence: bulk
X-list: netdev
Content-Length: 589
Lines: 20

Hi!

> > More cleanups (relative to ipw2100-1.1.0 version). Highlights include
> > removing of two printk wrappers.
>
> [snip]
>
> > -static DEVICE_ATTR(bssinfo, S_IRUGO, show_bssinfo, NULL);
>
> Any special reason to remove these driver sysfs entries?

Umm, yes, they should not be needed and we do not want to maintain
them in /sys forever. /sys should be "one value per file", and
definitely not for debugging stuff like this. (See debugfs)

Anyway I think I did drop that patch from newer cleanup attempts.
Pavel
--
teflon -- maybe it is a trademark, but it should not be.

From dada1@cosmosbay.com Mon Jul 4 14:23:50 2005
Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Jul 2005 14:23:55 -0700 (PDT)
Received: from smtp.cegetel.net (mf00.sitadelle.com [212.94.174.67]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j64LNnH9008430 for ; Mon, 4 Jul 2005 14:23:50 -0700
Received: from [192.168.0.5] (unknown [84.5.72.184]) by smtp.cegetel.net (Postfix) with ESMTP id EB2911A4205; Mon, 4 Jul 2005 23:22:12 +0200 (CEST)
Message-ID: <42C9A884.1030906@cosmosbay.com>
Date: Mon, 04 Jul 2005 23:22:12 +0200
From: Eric Dumazet
User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206)
X-Accept-Language: fr, en
MIME-Version: 1.0
To: Michael Chan
Cc: "David S. Miller" , netdev@oss.sgi.com
Subject: Re: [TG3]: About hw coalescing infrastructure.
References: <20050511.141530.57445142.davem@davemloft.net> <42B982D4.9040704@cosmosbay.com> <1119467012.5325.15.camel@rh4>
In-Reply-To: <1119467012.5325.15.camel@rh4>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit
X-archive-position: 2617
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: dada1@cosmosbay.com
Precedence: bulk
X-list: netdev
Content-Length: 4321
Lines: 122

Michael Chan wrote:
> On Wed, 2005-06-22 at 17:25 +0200, Eric Dumazet wrote:
>
>
>>Is there anything I can try to tune the coalescing ?
>>Being able to handle 100 packets each interrupt instead of one or two would certainly help.
>>I dont mind about latency. But of course I would like not to drop packets :)
>>But maybe the BCM5702 is not able to delay an interrupt ?
>>
>
>
> On the 5702 that supports CLRTCKS mode, you need to play around with the
> following parameters in tg3.h. To reduce interrupts, you generally have
> to increase the values.
>
> #define LOW_RXCOL_TICKS_CLRTCKS 0x00000014
> #define LOW_TXCOL_TICKS_CLRTCKS 0x00000048
> #define LOW_RXMAX_FRAMES 0x00000005
> #define LOW_TXMAX_FRAMES 0x00000035
> #define DEFAULT_RXCOAL_TICK_INT_CLRTCKS 0x00000014
> #define DEFAULT_TXCOAL_TICK_INT_CLRTCKS 0x00000014
> #define DEFAULT_RXCOAL_MAXF_INT 0x00000005
> #define DEFAULT_TXCOAL_MAXF_INT 0x00000005
>
>
>

Thanks Michael

I tried various settings.

# ethtool -c eth0
Coalesce parameters for eth0:
Adaptive RX: off TX: off
stats-block-usecs: 1000000
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0

rx-usecs: 500
rx-frames: 20
rx-usecs-irq: 500
rx-frames-irq: 20

tx-usecs: 600
tx-frames: 53
tx-usecs-irq: 600
tx-frames-irq: 20

rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0

rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0

But it seems something is wrong because when network load becomes high I get :

Jul 4 23:01:19 dada1 kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jul 4 23:01:19 dada1 kernel: tg3: eth0: transmit timed out, resetting
Jul 4 23:01:24 dada1 kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jul 4 23:01:24 dada1 kernel: tg3: eth0: transmit timed out, resetting
Jul 4 23:01:29 dada1 kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jul 4 23:01:29 dada1 kernel: tg3: eth0: transmit timed out, resetting
Jul 4 23:01:34 dada1 kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jul 4 23:01:34 dada1 kernel: tg3: eth0: transmit timed out, resetting
Jul 4 23:01:39 dada1 kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jul 4 23:01:39 dada1 kernel: tg3: eth0: transmit timed out, resetting
Jul 4 23:01:44 dada1 kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jul 4 23:01:44 dada1 kernel: tg3: eth0: transmit timed out, resetting
Jul 4 23:01:49 dada1 kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jul 4 23:01:49 dada1 kernel: tg3: eth0: transmit timed out, resetting
Jul 4 23:01:54 dada1 kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jul 4 23:01:54 dada1 kernel: tg3: eth0: transmit timed out, resetting
Jul 4 23:01:59 dada1 kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jul 4 23:01:59 dada1 kernel: tg3: eth0: transmit timed out, resetting
Jul 4 23:02:04 dada1 kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jul 4 23:02:04 dada1 kernel: tg3: eth0: transmit timed out, resetting
Jul 4 23:02:09 dada1 kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jul 4 23:02:09 dada1 kernel: tg3: eth0: transmit timed out, resetting
Jul 4 23:02:14 dada1 kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jul 4 23:02:14 dada1 kernel: tg3: eth0: transmit timed out, resetting
Jul 4 23:02:19 dada1 kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jul 4 23:02:19 dada1 kernel: tg3: eth0: transmit timed out, resetting
Jul 4 23:02:24 dada1 kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jul 4 23:02:24 dada1 kernel: tg3: eth0: transmit timed out, resetting
Jul 4 23:02:29 dada1 kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jul 4 23:02:29 dada1 kernel: tg3: eth0: transmit timed out, resetting
Jul 4 23:02:34 dada1 kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jul 4 23:02:34 dada1 kernel: tg3: eth0: transmit timed out, resetting
Jul 4 23:02:39 dada1 kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jul 4 23:02:39 dada1 kernel: tg3: eth0: transmit timed out, resetting

Then the machine crashes at this point.

tg3.c :
#define DRV_MODULE_VERSION "3.31"
#define DRV_MODULE_RELDATE "June 8, 2005"

# ethtool -g eth0
Ring parameters for eth0:
Pre-set maximums:
RX: 511
RX Mini: 0
RX Jumbo: 255
TX: 0
Current hardware settings:
RX: 400
RX Mini: 0
RX Jumbo: 40
TX: 511

Thank you

Eric Dumazet

From davem@davemloft.net Mon Jul 4 14:29:02 2005
Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Jul 2005 14:29:08 -0700 (PDT)
Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j64LT2H9008976 for ; Mon, 4 Jul 2005 14:29:02 -0700
Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DpYS4-0000vU-AW; Mon, 04 Jul 2005 14:26:04 -0700
Date: Mon, 04 Jul 2005 14:26:04 -0700 (PDT)
Message-Id: <20050704.142604.18592223.davem@davemloft.net>
To: dada1@cosmosbay.com
Cc: mchan@broadcom.com, netdev@oss.sgi.com
Subject: Re: [TG3]: About hw coalescing infrastructure.
From: "David S. Miller"
In-Reply-To: <42C9A884.1030906@cosmosbay.com>
References: <42B982D4.9040704@cosmosbay.com> <1119467012.5325.15.camel@rh4> <42C9A884.1030906@cosmosbay.com>
X-Mailer: Mew version 4.2 on Emacs 21.4 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 2618
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@davemloft.net
Precedence: bulk
X-list: netdev
Content-Length: 599
Lines: 13

From: Eric Dumazet
Date: Mon, 04 Jul 2005 23:22:12 +0200

> But it seems something is wrong because when network load becomes high I get :
>
> Jul 4 23:01:19 dada1 kernel: NETDEV WATCHDOG: eth0: transmit timed out
> Jul 4 23:01:19 dada1 kernel: tg3: eth0: transmit timed out, resetting
> Jul 4 23:01:24 dada1 kernel: NETDEV WATCHDOG: eth0: transmit timed out
> Jul 4 23:01:24 dada1 kernel: tg3: eth0: transmit timed out, resetting
> Jul 4 23:01:29 dada1 kernel: NETDEV WATCHDOG: eth0: transmit timed out

Does this happen without changing the coalescing settings
at all?

From dada1@cosmosbay.com Mon Jul 4 14:41:11 2005
Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Jul 2005 14:41:16 -0700 (PDT)
Received: from smtp.cegetel.net (mf00.sitadelle.com [212.94.174.67]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j64LfAH9010131 for ; Mon, 4 Jul 2005 14:41:11 -0700
Received: from [192.168.0.5] (unknown [84.5.72.184]) by smtp.cegetel.net (Postfix) with ESMTP id EA0061A4211; Mon, 4 Jul 2005 23:39:35 +0200 (CEST)
Message-ID: <42C9AC97.1020902@cosmosbay.com>
Date: Mon, 04 Jul 2005 23:39:35 +0200
From: Eric Dumazet
User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206)
X-Accept-Language: fr, en
MIME-Version: 1.0
To: "David S. Miller"
Cc: mchan@broadcom.com, netdev@oss.sgi.com
Subject: Re: [TG3]: About hw coalescing infrastructure.
References: <42B982D4.9040704@cosmosbay.com> <1119467012.5325.15.camel@rh4> <42C9A884.1030906@cosmosbay.com> <20050704.142604.18592223.davem@davemloft.net>
In-Reply-To: <20050704.142604.18592223.davem@davemloft.net>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit
X-archive-position: 2619
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: dada1@cosmosbay.com
Precedence: bulk
X-list: netdev
Content-Length: 738
Lines: 20

David S. Miller wrote:
> From: Eric Dumazet
> Date: Mon, 04 Jul 2005 23:22:12 +0200
>
>
>>But it seems something is wrong because when network load becomes high I get :
>>
>>Jul 4 23:01:19 dada1 kernel: NETDEV WATCHDOG: eth0: transmit timed out
>>Jul 4 23:01:19 dada1 kernel: tg3: eth0: transmit timed out, resetting
>>Jul 4 23:01:24 dada1 kernel: NETDEV WATCHDOG: eth0: transmit timed out
>>Jul 4 23:01:24 dada1 kernel: tg3: eth0: transmit timed out, resetting
>>Jul 4 23:01:29 dada1 kernel: NETDEV WATCHDOG: eth0: transmit timed out
>
>
> Does this happen without changing the coalescing settings
> at all?
>
>

Not easy to answer because I have to rebuild a kernel and reboot. Will do that shortly.

From davem@davemloft.net Mon Jul 4 14:52:48 2005
Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Jul 2005 14:52:50 -0700 (PDT)
Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j64LqlH9011503 for ; Mon, 4 Jul 2005 14:52:48 -0700
Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DpYp5-0000xs-6f; Mon, 04 Jul 2005 14:49:51 -0700
Date: Mon, 04 Jul 2005 14:49:51 -0700 (PDT)
Message-Id: <20050704.144950.112302038.davem@davemloft.net>
To: dada1@cosmosbay.com
Cc: mchan@broadcom.com, netdev@oss.sgi.com
Subject: Re: [TG3]: About hw coalescing infrastructure.
From: "David S. Miller"
In-Reply-To: <42C9AC97.1020902@cosmosbay.com>
References: <42C9A884.1030906@cosmosbay.com> <20050704.142604.18592223.davem@davemloft.net> <42C9AC97.1020902@cosmosbay.com>
X-Mailer: Mew version 4.2 on Emacs 21.4 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 2620
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@davemloft.net
Precedence: bulk
X-list: netdev
Content-Length: 181
Lines: 7

From: Eric Dumazet
Date: Mon, 04 Jul 2005 23:39:35 +0200

> Not easy to answer because I have to rebuild a kernel and
> reboot. Will do that shortly.

Thanks.

From dada1@cosmosbay.com Mon Jul 4 15:33:32 2005
Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Jul 2005 15:33:37 -0700 (PDT)
Received: from smtp.cegetel.net (mf00.sitadelle.com [212.94.174.67]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j64MXVH9014548 for ; Mon, 4 Jul 2005 15:33:32 -0700
Received: from [192.168.0.5] (unknown [84.5.72.184]) by smtp.cegetel.net (Postfix) with ESMTP id 08FD11A419A; Tue, 5 Jul 2005 00:31:56 +0200 (CEST)
Message-ID: <42C9B8DB.7020403@cosmosbay.com>
Date: Tue, 05 Jul 2005 00:31:55 +0200
From: Eric Dumazet
User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206)
X-Accept-Language: fr, en
MIME-Version: 1.0
To: Eric Dumazet
Cc: "David S. Miller" , mchan@broadcom.com, netdev@oss.sgi.com
Subject: Re: [TG3]: About hw coalescing infrastructure.
References: <42B982D4.9040704@cosmosbay.com> <1119467012.5325.15.camel@rh4> <42C9A884.1030906@cosmosbay.com> <20050704.142604.18592223.davem@davemloft.net> <42C9AC97.1020902@cosmosbay.com>
In-Reply-To: <42C9AC97.1020902@cosmosbay.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit
X-archive-position: 2621
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: dada1@cosmosbay.com
Precedence: bulk
X-list: netdev
Content-Length: 1006
Lines: 34

Eric Dumazet wrote:
> David S. Miller wrote:
>
>> From: Eric Dumazet
>> Date: Mon, 04 Jul 2005 23:22:12 +0200
>>
>>
>>> But it seems something is wrong because when network load becomes
>>> high I get :
>>>
>>> Jul 4 23:01:19 dada1 kernel: NETDEV WATCHDOG: eth0: transmit timed out
>>> Jul 4 23:01:19 dada1 kernel: tg3: eth0: transmit timed out, resetting
>>> Jul 4 23:01:24 dada1 kernel: NETDEV WATCHDOG: eth0: transmit timed out
>>> Jul 4 23:01:24 dada1 kernel: tg3: eth0: transmit timed out, resetting
>>> Jul 4 23:01:29 dada1 kernel: NETDEV WATCHDOG: eth0: transmit timed out
>>
>>
>>
>> Does this happen without changing the coalescing settings
>> at all?
>>
>>
> Not easy to answer because I have to rebuild a kernel and reboot. Will
> do that shortly.
>
>
>

Oops, I forgot to tell you I applied the patch : [TG3]: Eliminate all hw IRQ handler spinlocks.
(Date: Fri, 03 Jun 2005 12:25:58 -0700 (PDT))

Maybe I should revert to stock 2.6.12 tg3 driver ?

Eric

From dada1@cosmosbay.com Mon Jul 4 15:49:35 2005
Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Jul 2005 15:49:37 -0700 (PDT)
Received: from smtp.cegetel.net (mf01.sitadelle.com [212.94.174.68]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j64MnYH9015751 for ; Mon, 4 Jul 2005 15:49:35 -0700
Received: from [192.168.0.5] (unknown [84.5.72.184]) by smtp.cegetel.net (Postfix) with ESMTP id 9772E31841A; Tue, 5 Jul 2005 00:47:59 +0200 (CEST)
Message-ID: <42C9BC9C.9070700@cosmosbay.com>
Date: Tue, 05 Jul 2005 00:47:56 +0200
From: Eric Dumazet
User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206)
X-Accept-Language: fr, en
MIME-Version: 1.0
To: Eric Dumazet
Cc: "David S. Miller" , mchan@broadcom.com, netdev@oss.sgi.com
Subject: Re: [TG3]: About hw coalescing infrastructure.
References: <42B982D4.9040704@cosmosbay.com> <1119467012.5325.15.camel@rh4> <42C9A884.1030906@cosmosbay.com> <20050704.142604.18592223.davem@davemloft.net> <42C9AC97.1020902@cosmosbay.com> <42C9B8DB.7020403@cosmosbay.com>
In-Reply-To: <42C9B8DB.7020403@cosmosbay.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit
X-archive-position: 2623
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: dada1@cosmosbay.com
Precedence: bulk
X-list: netdev
Content-Length: 458
Lines: 19

Eric Dumazet wrote:
>
>
> Oops, I forgot to tell you I applied the patch : [TG3]: Eliminate all
> hw IRQ handler spinlocks.
> (Date: Fri, 03 Jun 2005 12:25:58 -0700 (PDT))
>
> Maybe I should revert to stock 2.6.12 tg3 driver ?
>
> Eric
>
>

Oh well, I checked 2.6.13-rc1 and it contains this line in tg3_netif_stop() :

+ tp->dev->trans_start = jiffies; /* prevent tx timeout */

So I am trying driver 3.32 June 24, 2005, included in 2.6.13-rc1

From davem@davemloft.net Mon Jul 4 15:48:56 2005
Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Jul 2005 15:49:00 -0700 (PDT)
Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j64MmuH9015666 for ; Mon, 4 Jul 2005 15:48:56 -0700
Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DpZib-0000kO-1N; Mon, 04 Jul 2005 15:47:13 -0700
Date: Mon, 04 Jul 2005 15:47:12 -0700 (PDT)
Message-Id: <20050704.154712.63128211.davem@davemloft.net>
To: dada1@cosmosbay.com
Cc: mchan@broadcom.com, netdev@oss.sgi.com
Subject: Re: [TG3]: About hw coalescing infrastructure.
From: "David S. Miller"
In-Reply-To: <42C9B8DB.7020403@cosmosbay.com>
References: <20050704.142604.18592223.davem@davemloft.net> <42C9AC97.1020902@cosmosbay.com> <42C9B8DB.7020403@cosmosbay.com>
X-Mailer: Mew version 4.2 on Emacs 21.4 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 2622
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@davemloft.net
Precedence: bulk
X-list: netdev
Content-Length: 740
Lines: 17

From: Eric Dumazet
Date: Tue, 05 Jul 2005 00:31:55 +0200

> Oops, I forgot to tell you I applied the patch : [TG3]: Eliminate all hw IRQ handler spinlocks.
> (Date: Fri, 03 Jun 2005 12:25:58 -0700 (PDT))
>
> Maybe I should revert to stock 2.6.12 tg3 driver ?

Please don't ever do stuff like that :-( That makes the driver
version, and any other information you report completely meaningless
and useless. You've just wasted a lot of our time.

I have no idea to even know _WHICH_ IRQ spinlock patch you applied.

The one that ended up in Linus's tree has many bug fixes and
refinements. The ones which were posted on netdev had many bugs and
deadlocks which needed to be cured before pushing the change upstream.

From dada1@cosmosbay.com Mon Jul 4 15:57:19 2005
Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Jul 2005 15:57:20 -0700 (PDT)
Received: from smtp.cegetel.net (mf01.sitadelle.com [212.94.174.68]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j64MvIH9016997 for ; Mon, 4 Jul 2005 15:57:18 -0700
Received: from [192.168.0.5] (unknown [84.5.72.184]) by smtp.cegetel.net (Postfix) with ESMTP id A33C231840E; Tue, 5 Jul 2005 00:55:41 +0200 (CEST)
Message-ID: <42C9BE69.2070008@cosmosbay.com>
Date: Tue, 05 Jul 2005 00:55:37 +0200
From: Eric Dumazet
User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206)
X-Accept-Language: fr, en
MIME-Version: 1.0
To: "David S. Miller"
Cc: mchan@broadcom.com, netdev@oss.sgi.com
Subject: Re: [TG3]: About hw coalescing infrastructure.
References: <20050704.142604.18592223.davem@davemloft.net> <42C9AC97.1020902@cosmosbay.com> <42C9B8DB.7020403@cosmosbay.com> <20050704.154712.63128211.davem@davemloft.net>
In-Reply-To: <20050704.154712.63128211.davem@davemloft.net>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit
X-archive-position: 2624
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: dada1@cosmosbay.com
Precedence: bulk
X-list: netdev
Content-Length: 691
Lines: 25

David S. Miller wrote:
> Please don't ever do stuff like that :-( That makes the driver
> version, and any other information you report completely meaningless
> and useless. You've just wasted a lot of our time.
>

Yes. But if you dont want us to test your patches, dont send them to the list.

For your information, 2.6.13-rc1 locks too.

Very easy way to lock it :

ping -f some_destination.

> I have no idea to even know _WHICH_ IRQ spinlock patch you applied.
>
> The one that ended up in Linus's tree has many bug fixes and
> refinements. The ones which were posted on netdev had many bugs and
> deadlocks which needed to be cured before pushing the change upstream.
>
>

From dada1@cosmosbay.com Mon Jul 4 15:59:39 2005
Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Jul 2005 15:59:41 -0700 (PDT)
Received: from smtp.cegetel.net (mf01.sitadelle.com [212.94.174.68]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j64MxcH9017537 for ; Mon, 4 Jul 2005 15:59:38 -0700
Received: from [192.168.0.5] (unknown [84.5.72.184]) by smtp.cegetel.net (Postfix) with ESMTP id 65ED13183E9; Tue, 5 Jul 2005 00:58:03 +0200 (CEST)
Message-ID: <42C9BEF6.4080402@cosmosbay.com>
Date: Tue, 05 Jul 2005 00:57:58 +0200
From: Eric Dumazet
User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206)
X-Accept-Language: fr, en
MIME-Version: 1.0
To: Eric Dumazet
Cc: "David S. Miller" , mchan@broadcom.com, netdev@oss.sgi.com
Subject: Re: [TG3]: About hw coalescing infrastructure.
References: <20050704.142604.18592223.davem@davemloft.net> <42C9AC97.1020902@cosmosbay.com> <42C9B8DB.7020403@cosmosbay.com> <20050704.154712.63128211.davem@davemloft.net> <42C9BE69.2070008@cosmosbay.com>
In-Reply-To: <42C9BE69.2070008@cosmosbay.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit
X-archive-position: 2625
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: dada1@cosmosbay.com
Precedence: bulk
X-list: netdev
Content-Length: 188
Lines: 15

Eric Dumazet wrote:
>
> For your information, 2.6.13-rc1 locks too.
>
> Very easy way to lock it :
>
> ping -f some_destination.

Arg. False alarm. Sorry.

Time for me to sleep.

From davem@davemloft.net Mon Jul 4 16:02:11 2005
Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Jul 2005 16:02:15 -0700 (PDT)
Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j64N2BH9018411 for ; Mon, 4 Jul 2005 16:02:11 -0700
Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DpZvW-0000mX-9d; Mon, 04 Jul 2005 16:00:34 -0700
Date: Mon, 04 Jul 2005 16:00:34 -0700 (PDT)
Message-Id: <20050704.160034.55511095.davem@davemloft.net>
To: dada1@cosmosbay.com
Cc: mchan@broadcom.com, netdev@oss.sgi.com
Subject: Re: [TG3]: About hw coalescing infrastructure.
From: "David S. Miller"
In-Reply-To: <42C9BE69.2070008@cosmosbay.com>
References: <42C9B8DB.7020403@cosmosbay.com> <20050704.154712.63128211.davem@davemloft.net> <42C9BE69.2070008@cosmosbay.com>
X-Mailer: Mew version 4.2 on Emacs 21.4 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id j64N2BH9018411
X-archive-position: 2626
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@davemloft.net
Precedence: bulk
X-list: netdev
Content-Length: 788
Lines: 27

From: Eric Dumazet
Date: Tue, 05 Jul 2005 00:55:37 +0200

> David S. Miller wrote:
>
> > Please don't ever do stuff like that :-( That makes the driver
> > version, and any other information you report completely meaningless
> > and useless. You've just wasted a lot of our time.
> >
>
> Yes. But if you dont want us to test your patches, dont send them to
> the list.

The case here is that you didn't even mention that you had
the patch applied. If you don't say what changes you've
applied to the stock driver we can't properly debug anything.

> For your information, 2.6.13-rc1 locks too.
>
> Very easy way to lock it :
>
> ping -f some_destination.

SMP system? What platform and 570x chip revision?

Does the stock driver in 2.6.12 hang as well?

From davem@davemloft.net Mon Jul 4 16:03:15 2005
Received: with ECARTIS (v1.0.0; list netdev); Mon, 04 Jul 2005 16:03:18 -0700 (PDT)
Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j64N3FH9018782 for ; Mon, 4 Jul 2005 16:03:15 -0700
Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DpZwa-0000ms-Eu; Mon, 04 Jul 2005 16:01:40 -0700
Date: Mon, 04 Jul 2005 16:01:40 -0700 (PDT)
Message-Id: <20050704.160140.21591849.davem@davemloft.net>
To: dada1@cosmosbay.com
Cc: mchan@broadcom.com, netdev@oss.sgi.com
Subject: Re: [TG3]: About hw coalescing infrastructure.
From: "David S. Miller"
In-Reply-To: <42C9BEF6.4080402@cosmosbay.com>
References: <20050704.154712.63128211.davem@davemloft.net> <42C9BE69.2070008@cosmosbay.com> <42C9BEF6.4080402@cosmosbay.com>
X-Mailer: Mew version 4.2 on Emacs 21.4 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id j64N3FH9018782
X-archive-position: 2627
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@davemloft.net
Precedence: bulk
X-list: netdev
Content-Length: 247
Lines: 16

From: Eric Dumazet
Date: Tue, 05 Jul 2005 00:57:58 +0200

> Eric Dumazet wrote:
>
> > Very easy way to lock it :
> >
> > ping -f some_destination.
>
>
> Arg. False alarm. Sorry.
>
> Time for me to sleep.

Oh well...
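
The repeated "NETDEV WATCHDOG: eth0: transmit timed out" lines in this thread, and the trans_start line quoted from tg3_netif_stop(), both concern the transmit watchdog. The following is a minimal user-space sketch of that check, assuming the watchdog fires only when the TX queue is stopped and more than watchdog_timeo ticks have passed since trans_start. The names struct txdev, tx_timed_out() and refresh_trans_start() are hypothetical and are not kernel APIs; only the field names trans_start and watchdog_timeo are borrowed from the discussion. The log entries above arrive five seconds apart, which would be consistent with a five-second timeout, though the thread does not state the value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a device's TX watchdog state. */
struct txdev {
	unsigned long trans_start;    /* tick of the last transmit or refresh */
	unsigned long watchdog_timeo; /* allowed silence, in ticks */
	bool queue_stopped;           /* driver has stopped its TX queue */
};

/*
 * Returns true when this model would report "transmit timed out" at
 * tick `now`. The unsigned subtraction keeps the comparison valid
 * when the tick counter wraps around.
 */
bool tx_timed_out(const struct txdev *dev, unsigned long now)
{
	assert(dev != NULL);
	return dev->queue_stopped &&
	       (now - dev->trans_start) > dev->watchdog_timeo;
}

/*
 * Models the one-line change quoted in the thread: refreshing
 * trans_start when the queue is stopped on purpose, so the stop is
 * not mistaken for a hung transmitter.
 */
void refresh_trans_start(struct txdev *dev, unsigned long now)
{
	assert(dev != NULL);
	dev->trans_start = now;
}
```

In this model a deliberately stopped queue stays quiet for watchdog_timeo ticks after the refresh; it says nothing about why the real hardware stopped transmitting in the report above.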

From dada1@cosmosbay.com Tue Jul 5 00:40:30 2005
Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Jul 2005 00:40:35 -0700 (PDT)
Received: from smtp.cegetel.net (mf01.sitadelle.com [212.94.174.68]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j657eTH9029339 for ; Tue, 5 Jul 2005 00:40:30 -0700
Received: from [192.168.0.5] (84-4-148-22.dti.cegetel.net [84.4.148.22]) by smtp.cegetel.net (Postfix) with ESMTP id 2A0EC318580; Tue, 5 Jul 2005 09:38:53 +0200 (CEST)
Message-ID: <42CA390C.9000801@cosmosbay.com>
Date: Tue, 05 Jul 2005 09:38:52 +0200
From: Eric Dumazet
User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206)
X-Accept-Language: fr, en
MIME-Version: 1.0
To: "David S. Miller"
Cc: netdev@oss.sgi.com
Subject: [PATCH] loop unrolling in net/sched/sch_generic.c
References: <20050704.154712.63128211.davem@davemloft.net> <42C9BE69.2070008@cosmosbay.com> <42C9BEF6.4080402@cosmosbay.com> <20050704.160140.21591849.davem@davemloft.net>
In-Reply-To: <20050704.160140.21591849.davem@davemloft.net>
Content-Type: multipart/mixed; boundary="------------090908060004010709010307"
X-archive-position: 2628
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: dada1@cosmosbay.com
Precedence: bulk
X-list: netdev
Content-Length: 1730
Lines: 54

This is a multi-part message in MIME format.
--------------090908060004010709010307
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

[NET] : unroll a small loop in pfifo_fast_dequeue(). Compiler generates better code.
(Using skb_queue_empty() to test the queue is faster than trying to __skb_dequeue())
oprofile says this function uses now 0.29% instead of 1.22 %, on a x86_64 target.

Signed-off-by: Eric Dumazet

--------------090908060004010709010307
Content-Type: text/plain; name="patch.sch_generic"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="patch.sch_generic"

--- linux-2.6.12/net/sched/sch_generic.c 2005-06-17 21:48:29.000000000 +0200
+++ linux-2.6.12-ed/net/sched/sch_generic.c 2005-07-05 09:11:30.000000000 +0200
@@ -333,18 +333,23 @@
 static struct sk_buff *
 pfifo_fast_dequeue(struct Qdisc* qdisc)
 {
-	int prio;
 	struct sk_buff_head *list = qdisc_priv(qdisc);
 	struct sk_buff *skb;
 
-	for (prio = 0; prio < 3; prio++, list++) {
-		skb = __skb_dequeue(list);
-		if (skb) {
-			qdisc->q.qlen--;
-			return skb;
-		}
+	for (;;) {
+		if (!skb_queue_empty(list))
+			break;
+		list++;
+		if (!skb_queue_empty(list))
+			break;
+		list++;
+		if (!skb_queue_empty(list))
+			break;
+		return NULL;
 	}
-	return NULL;
+	skb = __skb_dequeue(list);
+	qdisc->q.qlen--;
+	return skb;
 }
 
 static int

--------------090908060004010709010307--

From tgraf@suug.ch Tue Jul 5 04:52:35 2005
Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Jul 2005 04:52:40 -0700 (PDT)
Received: from postel.suug.ch (postel.suug.ch [195.134.158.23]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j65BqWH9017561 for ; Tue, 5 Jul 2005 04:52:35 -0700
Received: by postel.suug.ch (Postfix, from userid 10001) id 003BA1C0EB; Tue, 5 Jul 2005 13:51:08 +0200 (CEST)
Date: Tue, 5 Jul 2005 13:51:08 +0200
From: Thomas Graf
To: Eric Dumazet
Cc: "David S. Miller" , netdev@oss.sgi.com
Subject: Re: [PATCH] loop unrolling in net/sched/sch_generic.c
Message-ID: <20050705115108.GE16076@postel.suug.ch>
References: <20050704.154712.63128211.davem@davemloft.net> <42C9BE69.2070008@cosmosbay.com> <42C9BEF6.4080402@cosmosbay.com> <20050704.160140.21591849.davem@davemloft.net> <42CA390C.9000801@cosmosbay.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <42CA390C.9000801@cosmosbay.com>
X-archive-position: 2629
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: tgraf@suug.ch
Precedence: bulk
X-list: netdev
Content-Length: 574
Lines: 12

* Eric Dumazet <42CA390C.9000801@cosmosbay.com> 2005-07-05 09:38
> [NET] : unroll a small loop in pfifo_fast_dequeue(). Compiler generates
> better code.
> (Using skb_queue_empty() to test the queue is faster than trying to
> __skb_dequeue())
> oprofile says this function uses now 0.29% instead of 1.22 %, on a
> x86_64 target.

I think this patch is pretty much pointless. __skb_dequeue() and
!skb_queue_empty() should produce almost the same code and as soon
as you disable profiling and debugging you'll see that the compiler
unrolls the loop itself if possible.

From tgraf@suug.ch Tue Jul 5 05:04:41 2005
Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Jul 2005 05:04:47 -0700 (PDT)
Received: from postel.suug.ch (postel.suug.ch [195.134.158.23]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j65C4eH9018676 for ; Tue, 5 Jul 2005 05:04:41 -0700
Received: by postel.suug.ch (Postfix, from userid 10001) id 0431F1C0F3; Tue, 5 Jul 2005 14:03:27 +0200 (CEST)
Date: Tue, 5 Jul 2005 14:03:27 +0200
From: Thomas Graf
To: Eric Dumazet
Cc: "David S. Miller" , netdev@oss.sgi.com
Subject: Re: [PATCH] loop unrolling in net/sched/sch_generic.c
Message-ID: <20050705120327.GF16076@postel.suug.ch>
References: <20050704.154712.63128211.davem@davemloft.net> <42C9BE69.2070008@cosmosbay.com> <42C9BEF6.4080402@cosmosbay.com> <20050704.160140.21591849.davem@davemloft.net> <42CA390C.9000801@cosmosbay.com> <20050705115108.GE16076@postel.suug.ch>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20050705115108.GE16076@postel.suug.ch>
X-archive-position: 2630
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: tgraf@suug.ch
Precedence: bulk
X-list: netdev
Content-Length: 706
Lines: 15

* Thomas Graf <20050705115108.GE16076@postel.suug.ch> 2005-07-05 13:51
> * Eric Dumazet <42CA390C.9000801@cosmosbay.com> 2005-07-05 09:38
> > [NET] : unroll a small loop in pfifo_fast_dequeue(). Compiler generates
> > better code.
> > (Using skb_queue_empty() to test the queue is faster than trying to
> > __skb_dequeue())
> > oprofile says this function uses now 0.29% instead of 1.22 %, on a
> > x86_64 target.
>
> I think this patch is pretty much pointless. __skb_dequeue() and
> !skb_queue_empty() should produce almost the same code and as soon
> as you disable profiling and debugging you'll see that the compiler
> unrolls the loop itself if possible.

... given one enables it of course.
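
The two shapes of pfifo_fast_dequeue() being argued over can be compared outside the kernel. What follows is a minimal user-space sketch, assuming a three-band priority FIFO like the one the patch walks. struct pkt, struct band, struct prio_fifo and the band_* helpers are hypothetical stand-ins for sk_buff, sk_buff_head and the skb queue helpers; this is not the kernel code, and it makes no claim about which form a given compiler turns into better assembly.

```c
#include <assert.h>
#include <stddef.h>

#define BANDS 3 /* the patch walks three priority lists */

/* Hypothetical stand-in for struct sk_buff. */
struct pkt {
	struct pkt *next;
	int id;
};

/* Hypothetical stand-in for struct sk_buff_head. */
struct band {
	struct pkt *head;
	struct pkt *tail;
};

struct prio_fifo {
	struct band bands[BANDS];
	unsigned int qlen;
};

int band_empty(const struct band *b)
{
	return b->head == NULL;
}

/* Append a packet to the band for priority `prio` (0 is served first). */
void band_push(struct prio_fifo *q, int prio, struct pkt *p)
{
	struct band *b;

	assert(prio >= 0 && prio < BANDS);
	b = &q->bands[prio];
	p->next = NULL;
	if (b->tail)
		b->tail->next = p;
	else
		b->head = p;
	b->tail = p;
	q->qlen++;
}

/* Remove and return the first packet of a band, or NULL if it is empty. */
struct pkt *band_pop(struct band *b)
{
	struct pkt *p = b->head;

	if (p) {
		b->head = p->next;
		if (!b->head)
			b->tail = NULL;
		p->next = NULL;
	}
	return p;
}

/* Loop form: try to pop from each band in turn. */
struct pkt *dequeue_loop(struct prio_fifo *q)
{
	int prio;

	for (prio = 0; prio < BANDS; prio++) {
		struct pkt *p = band_pop(&q->bands[prio]);

		if (p) {
			q->qlen--;
			return p;
		}
	}
	return NULL;
}

/* Unrolled form: test each band for emptiness first, pop only once. */
struct pkt *dequeue_unrolled(struct prio_fifo *q)
{
	struct band *b = q->bands;

	for (;;) {
		if (!band_empty(b))
			break;
		b++;
		if (!band_empty(b))
			break;
		b++;
		if (!band_empty(b))
			break;
		return NULL;
	}
	q->qlen--;
	return band_pop(b);
}
```

Both functions return packets in the same order; the only difference is whether the emptiness test and the unlink are separated. Whether that separation matters is exactly what the disassembly in the next message is meant to settle.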
From dada1@cosmosbay.com Tue Jul 5 06:06:04 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Jul 2005 06:06:08 -0700 (PDT) Received: from gw1.cosmosbay.com (gw1.cosmosbay.com [62.23.185.226]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j65D63H9023158 for ; Tue, 5 Jul 2005 06:06:04 -0700 Received: from [172.16.0.131] (edumazet-port [172.16.0.131]) by gw1.cosmosbay.com (8.13.3/8.13.3) with ESMTP id j65D4MJi029934; Tue, 5 Jul 2005 15:04:22 +0200 Message-ID: <42CA8555.9050607@cosmosbay.com> Date: Tue, 05 Jul 2005 15:04:21 +0200 From: Eric Dumazet User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206) X-Accept-Language: fr, en MIME-Version: 1.0 To: Thomas Graf CC: "David S. Miller" , netdev@oss.sgi.com Subject: Re: [PATCH] loop unrolling in net/sched/sch_generic.c References: <20050704.154712.63128211.davem@davemloft.net> <42C9BE69.2070008@cosmosbay.com> <42C9BEF6.4080402@cosmosbay.com> <20050704.160140.21591849.davem@davemloft.net> <42CA390C.9000801@cosmosbay.com> <20050705115108.GE16076@postel.suug.ch> In-Reply-To: <20050705115108.GE16076@postel.suug.ch> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-1.6 (gw1.cosmosbay.com [172.16.8.80]); Tue, 05 Jul 2005 15:04:23 +0200 (CEST) X-archive-position: 2631 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: dada1@cosmosbay.com Precedence: bulk X-list: netdev Content-Length: 4244 Lines: 86 Thomas Graf a écrit : > * Eric Dumazet <42CA390C.9000801@cosmosbay.com> 2005-07-05 09:38 > >>[NET] : unroll a small loop in pfifo_fast_dequeue(). Compiler generates >>better code. >> (Using skb_queue_empty() to test the queue is faster than trying to >> __skb_dequeue()) >> oprofile says this function uses now 0.29% instead of 1.22 %, on a >> x86_64 target. > > > I think this patch is pretty much pointless. 
__skb_dequeue() and
> !skb_queue_empty() should produce almost the same code and as soon
> as you disable profiling and debugging you'll see that the compiler
> unrolls the loop itself if possible.
>
>

OK. At least my compiler (gcc-3.3.1) does NOT unroll the loop :

Original 2.6.12 gives :

ffffffff802a9790 <pfifo_fast_dequeue>: /* pfifo_fast_dequeue total: 2904054  1.9531 */
 258371  0.1738 :ffffffff802a9790:   lea    0xc0(%rdi),%rcx
 273669  0.1841 :ffffffff802a9797:   xor    %esi,%esi
  12533  0.0084 :ffffffff802a9799:   mov    (%rcx),%rdx
 292315  0.1966 :ffffffff802a979c:   cmp    %rcx,%rdx
  11717  0.0079 :ffffffff802a979f:   je     ffffffff802a97d1
   4474  0.0030 :ffffffff802a97a1:   mov    %rdx,%rax
   6238  0.0042 :ffffffff802a97a4:   mov    (%rdx),%rdx
     41 2.8e-05 :ffffffff802a97a7:   decl   0x10(%rcx)
   6089  0.0041 :ffffffff802a97aa:   test   %rax,%rax
    126 8.5e-05 :ffffffff802a97ad:   movq   $0x0,0x10(%rax)
     39 2.6e-05 :ffffffff802a97b5:   mov    %rcx,0x8(%rdx)
   6974  0.0047 :ffffffff802a97b9:   mov    %rdx,(%rcx)
   2841  0.0019 :ffffffff802a97bc:   movq   $0x0,0x8(%rax)
    366 2.5e-04 :ffffffff802a97c4:   movq   $0x0,(%rax)
  14757  0.0099 :ffffffff802a97cb:   je     ffffffff802a97d1
    288 1.9e-04 :ffffffff802a97cd:   decl   0x40(%rdi)
     94 6.3e-05 :ffffffff802a97d0:   retq
 970400  0.6526 :ffffffff802a97d1:   inc    %esi
 982402  0.6607 :ffffffff802a97d3:   add    $0x18,%rcx
      4 2.7e-06 :ffffffff802a97d7:   cmp    $0x2,%esi
      1 6.7e-07 :ffffffff802a97da:   jle    ffffffff802a9799
  59754  0.0402 :ffffffff802a97dc:   xor    %eax,%eax
    561 3.8e-04 :ffffffff802a97de:   data16
                :ffffffff802a97df:   nop
                :ffffffff802a97e0:   retq

And new code (2.6.12-ed):

ffffffff802b1020 <pfifo_fast_dequeue>: /* pfifo_fast_dequeue total: 153139  0.2934 */
  27388  0.0525 :ffffffff802b1020:   lea    0xc0(%rdi),%rdx
  42091  0.0806 :ffffffff802b1027:   cmp    %rdx,0xc0(%rdi)
                :ffffffff802b102e:   jne    ffffffff802b1052
    474 9.1e-04 :ffffffff802b1030:   lea    0xd8(%rdi),%rdx
   5571  0.0107 :ffffffff802b1037:   cmp    %rdx,0xd8(%rdi)
      2 3.8e-06 :ffffffff802b103e:   jne    ffffffff802b1052
      1 1.9e-06 :ffffffff802b1040:   lea    0xf0(%rdi),%rdx
  20030  0.0384 :ffffffff802b1047:   xor    %eax,%eax
      6 1.1e-05 :ffffffff802b1049:   cmp    %rdx,0xf0(%rdi)
      6 1.1e-05 :ffffffff802b1050:   je     ffffffff802b1086
                :ffffffff802b1052:   mov    (%rdx),%rcx
  11796  0.0226 :ffffffff802b1055:   xor    %eax,%eax
                :ffffffff802b1057:   cmp    %rdx,%rcx
      8 1.5e-05 :ffffffff802b105a:   je     ffffffff802b1083
   3146  0.0060 :ffffffff802b105c:   mov    %rcx,%rax
     12 2.3e-05 :ffffffff802b105f:   mov    (%rcx),%rcx
    118 2.3e-04 :ffffffff802b1062:   decl   0x10(%rdx)
   4924  0.0094 :ffffffff802b1065:   movq   $0x0,0x10(%rax)
     65 1.2e-04 :ffffffff802b106d:   mov    %rdx,0x8(%rcx)
    725  0.0014 :ffffffff802b1071:   mov    %rcx,(%rdx)
  11493  0.0220 :ffffffff802b1074:   movq   $0x0,0x8(%rax)
    194 3.7e-04 :ffffffff802b107c:   movq   $0x0,(%rax)
   2995  0.0057 :ffffffff802b1083:   decl   0x40(%rdi)
  19607  0.0376 :ffffffff802b1086:   nop
   2487  0.0048 :ffffffff802b1087:   retq

Please give us the code your compiler produces, and explain to me how
disabling oprofile can change the generated assembly. :)
Debugging has no impact on this code either.

Thank you

Eric

From tgraf@suug.ch Tue Jul 5 06:49:21 2005
Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Jul 2005 06:49:28 -0700 (PDT)
Received: from postel.suug.ch (postel.suug.ch [195.134.158.23]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j65DnIH9026297 for ; Tue, 5 Jul 2005 06:49:21 -0700
Received: by postel.suug.ch (Postfix, from userid 10001) id C19731C0EB; Tue, 5 Jul 2005 15:48:05 +0200 (CEST)
Date: Tue, 5 Jul 2005 15:48:05 +0200
From: Thomas Graf
To: Eric Dumazet
Cc: "David S.
Miller" , netdev@oss.sgi.com Subject: Re: [PATCH] loop unrolling in net/sched/sch_generic.c Message-ID: <20050705134805.GH16076@postel.suug.ch> References: <20050704.154712.63128211.davem@davemloft.net> <42C9BE69.2070008@cosmosbay.com> <42C9BEF6.4080402@cosmosbay.com> <20050704.160140.21591849.davem@davemloft.net> <42CA390C.9000801@cosmosbay.com> <20050705115108.GE16076@postel.suug.ch> <42CA8555.9050607@cosmosbay.com> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <42CA8555.9050607@cosmosbay.com> X-archive-position: 2632 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: tgraf@suug.ch Precedence: bulk X-list: netdev Content-Length: 2708 Lines: 118 * Eric Dumazet <42CA8555.9050607@cosmosbay.com> 2005-07-05 15:04 > Thomas Graf a écrit : > >* Eric Dumazet <42CA390C.9000801@cosmosbay.com> 2005-07-05 09:38 > > > >>[NET] : unroll a small loop in pfifo_fast_dequeue(). Compiler generates > >>better code. > >> (Using skb_queue_empty() to test the queue is faster than trying to > >> __skb_dequeue()) > >> oprofile says this function uses now 0.29% instead of 1.22 %, on a > >> x86_64 target. > > > > > >I think this patch is pretty much pointless. __skb_dequeue() and > >!skb_queue_empty() should produce almost the same code and as soon > >as you disable profiling and debugging you'll see that the compiler > >unrolls the loop itself if possible. > > > > > > OK. At least my compiler (gcc-3.3.1) does NOT unroll the loop : Because you don't specify -funroll-loop [...] 
> Please give us the code your compiler produces,

Unrolled version:

pfifo_fast_dequeue:
	pushl	%esi
	xorl	%edx, %edx
	pushl	%ebx
	movl	12(%esp), %esi
	movl	128(%esi), %eax
	leal	128(%esi), %ecx
	cmpl	%ecx, %eax
	je	.L132
	movl	%eax, %edx
	movl	(%eax), %eax
	decl	8(%ecx)
	movl	$0, 8(%edx)
	movl	%ecx, 4(%eax)
	movl	%eax, 128(%esi)
	movl	$0, 4(%edx)
	movl	$0, (%edx)
.L132:
	testl	%edx, %edx
	je	.L131
	movl	96(%edx), %ebx
	movl	80(%esi), %eax
	decl	40(%esi)
	subl	%ebx, %eax
	movl	%eax, 80(%esi)
	movl	%edx, %eax
.L117:
	popl	%ebx
	popl	%esi
	ret
.L131:
	movl	20(%ecx), %eax
	leal	20(%ecx), %edx
	xorl	%ebx, %ebx
	cmpl	%edx, %eax
	je	.L137
	movl	%eax, %ebx
	movl	(%eax), %eax
	decl	8(%edx)
	movl	$0, 8(%ebx)
	movl	%edx, 4(%eax)
	movl	%eax, 20(%ecx)
	movl	$0, 4(%ebx)
	movl	$0, (%ebx)
.L137:
	testl	%ebx, %ebx
	je	.L147
.L146:
	movl	96(%ebx), %ecx
	movl	80(%esi), %eax
	decl	40(%esi)
	subl	%ecx, %eax
	movl	%eax, 80(%esi)
	movl	%ebx, %eax
	jmp	.L117
.L147:
	movl	40(%ecx), %eax
	leal	40(%ecx), %edx
	xorl	%ebx, %ebx
	cmpl	%edx, %eax
	je	.L142
	movl	%eax, %ebx
	movl	(%eax), %eax
	decl	8(%edx)
	movl	$0, 8(%ebx)
	movl	%edx, 4(%eax)
	movl	%eax, 40(%ecx)
	movl	$0, 4(%ebx)
	movl	$0, (%ebx)
.L142:
	xorl	%eax, %eax
	testl	%ebx, %ebx
	jne	.L146
	jmp	.L117

> and explain me how
> disabling oprofile can change the generated assembly. :)
> Debugging has no impact on this code either.

I just noticed that this is a local modification of my own, so in
the vanilla tree it indeed doesn't have any impact on the code
generated.

Still, your patch does not make sense to me. The latest tree
also includes my pfifo_fast changes, which modified the code to
maintain a backlog and made it easy to add more fifos at compile
time. If you want the loop unrolled then let the compiler do it
via -funroll-loops. This kind of optimization seems as unnecessary
to me as all the loopback optimizations.
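
A minimal userspace sketch of the two code shapes being compared in this exchange: a loop that attempts a dequeue on every band, and a hand-unrolled version that tests each band for emptiness and dequeues once at the end. The types `struct pkt` and `struct band` and all function names are hypothetical stand-ins invented for illustration; they are not the kernel's `sk_buff`, `sk_buff_head` or `pfifo_fast_dequeue()`, and the sketch says nothing about which form is faster.

```c
#include <assert.h>
#include <stddef.h>

#define BANDS 3	/* assumption: three priority bands, as in the thread */

/* Hypothetical stand-ins for sk_buff and sk_buff_head; not kernel types. */
struct pkt {
	struct pkt *next;
	int id;
};

struct band {
	struct pkt *head, *tail;
};

static int band_empty(const struct band *b)
{
	return b->head == NULL;
}

/* Append a packet to the tail of one band. */
static void band_push(struct band *b, struct pkt *p)
{
	p->next = NULL;
	if (b->tail)
		b->tail->next = p;
	else
		b->head = p;
	b->tail = p;
}

/* Remove and return the head of one band, or NULL if it is empty. */
static struct pkt *band_pop(struct band *b)
{
	struct pkt *p = b->head;

	if (!p)
		return NULL;
	b->head = p->next;
	if (!b->head)
		b->tail = NULL;
	p->next = NULL;
	return p;
}

/* Loop form: try a dequeue on each band in priority order. */
static struct pkt *dequeue_loop(struct band *bands)
{
	int prio;

	for (prio = 0; prio < BANDS; prio++) {
		struct pkt *p = band_pop(&bands[prio]);

		if (p)
			return p;
	}
	return NULL;
}

/* Hand-unrolled form: test emptiness first, dequeue once at the end. */
static struct pkt *dequeue_unrolled(struct band *bands)
{
	struct band *b = bands;

	assert(BANDS == 3);	/* the unrolling below is only valid for 3 bands */
	if (band_empty(b)) {
		b++;
		if (band_empty(b)) {
			b++;
			if (band_empty(b))
				return NULL;
		}
	}
	return band_pop(b);
}
```

Both functions return the same packet for the same queue state; the difference under discussion is only in the branches the compiler emits for each shape.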
From patjenk@wam.umd.edu Tue Jul 5 08:19:19 2005
Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Jul 2005 08:19:24 -0700 (PDT)
Received: from po2.wam.umd.edu (po2.wam.umd.edu [128.8.10.164]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j65FJIH9003084 for ; Tue, 5 Jul 2005 08:19:18 -0700
Received: from rac3.wam.umd.edu (IDENT:sendmail@rac3.wam.umd.edu [128.8.10.143]) by po2.wam.umd.edu (8.12.10/8.12.10) with ESMTP id j65FHeZD016122; Tue, 5 Jul 2005 11:17:40 -0400 (EDT)
Received: from localhost (patjenk@localhost) by rac3.wam.umd.edu (8.12.10/8.12.10) with ESMTP id j65FHZl1017138; Tue, 5 Jul 2005 11:17:37 -0400 (EDT)
X-Authentication-Warning: rac3.wam.umd.edu: patjenk owned process doing -bs
Date: Tue, 5 Jul 2005 11:17:35 -0400 (EDT)
From: Patrick Jenkins
To: linux-kernel@vger.kernel.org, netdev@oss.sgi.com
Subject: Re: [PATCH] multipath routing algorithm, better patch
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-archive-position: 2633
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: patjenk@wam.umd.edu
Precedence: bulk
X-list: netdev
Content-Length: 1199
Lines: 32

>Thomas Graf wrote:
>> * Patrick McHardy <42C4919A.5000009@trash.net> 2005-07-01 02:43
>>
>>>If it isn't set correctly its an iproute problem. Did you actually
>>>experience any problems?
>>
>> Well, my patch for iproute2 to enable multipath algorithm selection
>> is currently being merged to Stephen together with the ematch bits.
>> We had to work out a dependency on GNU flex first (the berkley
>> version uses the same executable names) so the inclusion was
>> delayed a bit.
>
>So its no problem but simply missing support. BTW, do you know if
>Stephen's new CVS repository is exported somewhere?

Yes, we did experience a problem. The routing doesn't work as advertised
(i.e. it won't utilize the multiple routes). Patrick is correct in saying
it's missing support.
From what you are saying it seems the problem will be corrected by a new feature in iproute2 that allows the user to select this ability. Is this correct? Also, does this mean my patch is not needed? It seems to me that it should be supported somehow in the kernel seeing as how everyone might not utilize iproute2. Once again, please cc me in the reply because I am not a member of the list. Thanks, Patrick Jenkins From dada1@cosmosbay.com Tue Jul 5 09:00:23 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Jul 2005 09:00:26 -0700 (PDT) Received: from gw1.cosmosbay.com (gw1.cosmosbay.com [62.23.185.226]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j65G0MH9006625 for ; Tue, 5 Jul 2005 09:00:22 -0700 Received: from [172.16.0.131] (edumazet-port [172.16.0.131]) by gw1.cosmosbay.com (8.13.3/8.13.3) with ESMTP id j65FwfLs005806; Tue, 5 Jul 2005 17:58:43 +0200 Message-ID: <42CAAE2F.5070807@cosmosbay.com> Date: Tue, 05 Jul 2005 17:58:39 +0200 From: Eric Dumazet User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206) X-Accept-Language: fr, en MIME-Version: 1.0 To: Thomas Graf CC: "David S. 
Miller" , netdev@oss.sgi.com Subject: Re: [PATCH] loop unrolling in net/sched/sch_generic.c References: <20050704.154712.63128211.davem@davemloft.net> <42C9BE69.2070008@cosmosbay.com> <42C9BEF6.4080402@cosmosbay.com> <20050704.160140.21591849.davem@davemloft.net> <42CA390C.9000801@cosmosbay.com> <20050705115108.GE16076@postel.suug.ch> <42CA8555.9050607@cosmosbay.com> <20050705134805.GH16076@postel.suug.ch> In-Reply-To: <20050705134805.GH16076@postel.suug.ch> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-1.6 (gw1.cosmosbay.com [172.16.8.80]); Tue, 05 Jul 2005 17:58:43 +0200 (CEST) X-archive-position: 2634 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: dada1@cosmosbay.com Precedence: bulk X-list: netdev Content-Length: 5028 Lines: 165 Thomas Graf a écrit : >>OK. At least my compiler (gcc-3.3.1) does NOT unroll the loop : > > > Because you don't specify -funroll-loop I'm using vanilla 2.6.12 : no -funroll-loop in it. Maybe in your tree, not on 99.9% of 2.6.12 trees. Are you suggesting everybody should use this compiler flag ? Something like : net/sched/Makefile: CFLAGS_sch_generic.o := -funroll-loops ? > > [...] 
>
>
>>Please give us the code your compiler produces,
>
>
> Unrolled version:
>
> pfifo_fast_dequeue:
> 	pushl	%esi
> 	xorl	%edx, %edx
> 	pushl	%ebx
> 	movl	12(%esp), %esi
> 	movl	128(%esi), %eax
> 	leal	128(%esi), %ecx
> 	cmpl	%ecx, %eax
> 	je	.L132
> 	movl	%eax, %edx
> 	movl	(%eax), %eax
> 	decl	8(%ecx)
> 	movl	$0, 8(%edx)
> 	movl	%ecx, 4(%eax)
> 	movl	%eax, 128(%esi)
> 	movl	$0, 4(%edx)
> 	movl	$0, (%edx)
> .L132:
> 	testl	%edx, %edx
> 	je	.L131
> 	movl	96(%edx), %ebx
> 	movl	80(%esi), %eax
> 	decl	40(%esi)
> 	subl	%ebx, %eax
> 	movl	%eax, 80(%esi)
> 	movl	%edx, %eax
> .L117:
> 	popl	%ebx
> 	popl	%esi
> 	ret
> .L131:
> 	movl	20(%ecx), %eax
> 	leal	20(%ecx), %edx
> 	xorl	%ebx, %ebx
> 	cmpl	%edx, %eax
> 	je	.L137
> 	movl	%eax, %ebx
> 	movl	(%eax), %eax
> 	decl	8(%edx)
> 	movl	$0, 8(%ebx)
> 	movl	%edx, 4(%eax)
> 	movl	%eax, 20(%ecx)
> 	movl	$0, 4(%ebx)
> 	movl	$0, (%ebx)
> .L137:
> 	testl	%ebx, %ebx
> 	je	.L147
> .L146:
> 	movl	96(%ebx), %ecx
> 	movl	80(%esi), %eax
> 	decl	40(%esi)
> 	subl	%ecx, %eax
> 	movl	%eax, 80(%esi)
> 	movl	%ebx, %eax
> 	jmp	.L117
> .L147:
> 	movl	40(%ecx), %eax
> 	leal	40(%ecx), %edx
> 	xorl	%ebx, %ebx
> 	cmpl	%edx, %eax
> 	je	.L142
> 	movl	%eax, %ebx
> 	movl	(%eax), %eax
> 	decl	8(%edx)
> 	movl	$0, 8(%ebx)
> 	movl	%edx, 4(%eax)
> 	movl	%eax, 40(%ecx)
> 	movl	$0, 4(%ebx)
> 	movl	$0, (%ebx)
> .L142:
> 	xorl	%eax, %eax
> 	testl	%ebx, %ebx
> 	jne	.L146
> 	jmp	.L117
>

OK thanks, but you don't give the code for my version :)
It is shorter and unrolled as you can see, and with nicely predicted branches.

00000fc0 <pfifo_fast_dequeue>:
     fc0:   56                      push   %esi
     fc1:   89 c1                   mov    %eax,%ecx
     fc3:   53                      push   %ebx
     fc4:   8d 98 a0 00 00 00       lea    0xa0(%eax),%ebx
     fca:   39 98 a0 00 00 00       cmp    %ebx,0xa0(%eax)
     fd0:   89 da                   mov    %ebx,%edx
     fd2:   75 22                   jne    ff6
     fd4:   8d 90 c4 00 00 00       lea    0xc4(%eax),%edx
     fda:   39 90 c4 00 00 00       cmp    %edx,0xc4(%eax)
     fe0:   89 d3                   mov    %edx,%ebx
     fe2:   75 12                   jne    ff6
     fe4:   8d 98 e8 00 00 00       lea    0xe8(%eax),%ebx
     fea:   31 f6                   xor    %esi,%esi
     fec:   39 98 e8 00 00 00       cmp    %ebx,0xe8(%eax)
     ff2:   89 da                   mov    %ebx,%edx
     ff4:   74 27                   je     101d
     ff6:   8b 32                   mov    (%edx),%esi
     ff8:   39 d6                   cmp    %edx,%esi
     ffa:   74 26                   je     1022
     ffc:   8b 06                   mov    (%esi),%eax
     ffe:   ff 4b 08                decl   0x8(%ebx)
    1001:   c7 46 08 00 00 00 00    movl   $0x0,0x8(%esi)
    1008:   89 50 04                mov    %edx,0x4(%eax)
    100b:   89 02                   mov    %eax,(%edx)
    100d:   c7 46 04 00 00 00 00    movl   $0x0,0x4(%esi)
    1014:   c7 06 00 00 00 00       movl   $0x0,(%esi)
    101a:   ff 49 28                decl   0x28(%ecx)
    101d:   5b                      pop    %ebx
    101e:   89 f0                   mov    %esi,%eax
    1020:   5e                      pop    %esi
    1021:   c3                      ret
    1022:   ff 49 28                decl   0x28(%ecx)
    1025:   31 f6                   xor    %esi,%esi
    1027:   eb f4                   jmp    101d

>
> I just noticed that this is a local modification of my own, so in
> the vanilla tree it indeed doesn't have any impact on the code
> generated.
>
> Still, your patch does not make sense to me. The latest tree
> also includes my pfifo_fast changes wich modified the code to
> maintain a backlog and made it easy to add more fifos at compile
> time. If you want the loop unrolled then let the compiler do it
> via -funroll-loop. These kind of optimization seem as uncessary
> to me as all the loopback optimizations.
>

I don't want to change compiler flags in my tree and lose this optimization when 2.6.13 is released.

I don't know about loopback optimization, I am not involved with this stuff, maybe you think I'm another guy?

It seems to me you give unrelated arguments. I don't know what your plans are, but mine were not to say you are writing bad code.

Just to give my performance analysis and feedback, I'm sorry if it hurts you.

Eric Dumazet

From dada1@cosmosbay.com Tue Jul 5 09:16:43 2005
Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Jul 2005 09:16:46 -0700 (PDT)
Received: from gw1.cosmosbay.com (gw1.cosmosbay.com [62.23.185.226]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j65GGgH9008304 for ; Tue, 5 Jul 2005 09:16:43 -0700
Received: from [172.16.0.131] (edumazet-port [172.16.0.131]) by gw1.cosmosbay.com (8.13.3/8.13.3) with ESMTP id j65GExUv006247; Tue, 5 Jul 2005 18:15:00 +0200
Message-ID: <42CAB203.7050700@cosmosbay.com>
Date: Tue, 05 Jul 2005 18:14:59 +0200
From: Eric Dumazet
User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206)
X-Accept-Language: fr, en
MIME-Version: 1.0
To: "David S. Miller"
CC: mchan@broadcom.com, netdev@oss.sgi.com
Subject: Re: [TG3]: About hw coalescing infrastructure.
References: <42C9B8DB.7020403@cosmosbay.com> <20050704.154712.63128211.davem@davemloft.net> <42C9BE69.2070008@cosmosbay.com> <20050704.160034.55511095.davem@davemloft.net>
In-Reply-To: <20050704.160034.55511095.davem@davemloft.net>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-1.6 (gw1.cosmosbay.com [172.16.8.80]); Tue, 05 Jul 2005 18:15:00 +0200 (CEST)
X-archive-position: 2635
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: dada1@cosmosbay.com
Precedence: bulk
X-list: netdev
Content-Length: 794
Lines: 31

David S. Miller wrote:
> SMP system? What platform and 570x chip revision?
>

dual opterons 248, Ethernet controller: Broadcom Corporation NetXtreme BCM5702 Gigabit Ethernet (rev 02)

> Does the stock driver in 2.6.12 hang as well?

The stock driver in 2.6.11 or 2.6.12 causes hangs (crashes?) too, but no kernel messages are logged on disk.
I think this is crashing in cache_grow() (slab code), but I am not sure because I don't have access to the console.

The 2.6.13-rc1 tg3 driver seems to survive (running for 18 hours now), and the cpu used by network activity was cut down :)

rx-usecs: 500
rx-frames: 30
rx-usecs-irq: 500
rx-frames-irq: 20

tx-usecs: 490
tx-frames: 53
tx-usecs-irq: 490
tx-frames-irq: 5

About 1200 IRQ/second now, with 30000 recv packs/s and 25000 xmit pack/s

Thank you
Eric

From tgraf@suug.ch Tue Jul 5 10:35:25 2005
Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Jul 2005 10:35:34 -0700 (PDT)
Received: from postel.suug.ch (postel.suug.ch [195.134.158.23]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j65HZOH9013591 for ; Tue, 5 Jul 2005 10:35:25 -0700
Received: by postel.suug.ch (Postfix, from userid 10001) id 5576C1C0EB; Tue, 5 Jul 2005 19:34:11 +0200 (CEST)
Date: Tue, 5 Jul 2005 19:34:11 +0200
From: Thomas Graf
To: Eric Dumazet
Cc: "David S. Miller" , netdev@oss.sgi.com
Subject: Re: [PATCH] loop unrolling in net/sched/sch_generic.c
Message-ID: <20050705173411.GK16076@postel.suug.ch>
References: <20050704.154712.63128211.davem@davemloft.net> <42C9BE69.2070008@cosmosbay.com> <42C9BEF6.4080402@cosmosbay.com> <20050704.160140.21591849.davem@davemloft.net> <42CA390C.9000801@cosmosbay.com> <20050705115108.GE16076@postel.suug.ch> <42CA8555.9050607@cosmosbay.com> <20050705134805.GH16076@postel.suug.ch> <42CAAE2F.5070807@cosmosbay.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <42CAAE2F.5070807@cosmosbay.com>
X-archive-position: 2636
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: tgraf@suug.ch
Precedence: bulk
X-list: netdev
Content-Length: 474
Lines: 12

* Eric Dumazet <42CAAE2F.5070807@cosmosbay.com> 2005-07-05 17:58
> I'm using vanilla 2.6.12 : no -funroll-loop in it. Maybe in your tree, not
> on 99.9% of 2.6.12 trees.
>
> Are you suggesting everybody should use this compiler flag ?
> I dont know about loopback optimization, I am not involved with this stuff,
> maybe you think I'm another guy ?
>
> It seems to me you give unrelated arguments.

Do as you wish, I don't feel like arguing about micro optimizations.

From leonid.grossman@neterion.com Tue Jul 5 11:32:47 2005
Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Jul 2005 11:32:50 -0700 (PDT)
Received: from ns1.s2io.com (ns1.s2io.com [142.46.200.198]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j65IWlH9017921 for ; Tue, 5 Jul 2005 11:32:47 -0700
Received: from guinness.s2io.com (sentry.s2io.com [142.46.200.199]) by ns1.s2io.com (8.12.10/8.12.10) with ESMTP id j65IVBcx012270; Tue, 5 Jul 2005 14:31:11 -0400 (EDT)
Received: from lgt40 ([10.16.16.64]) by guinness.s2io.com (8.12.6/8.12.6) with ESMTP id j65IV6xS005692; Tue, 5 Jul 2005 14:31:07 -0400 (EDT)
Message-Id: <200507051831.j65IV6xS005692@guinness.s2io.com>
From: "Leonid Grossman"
To:
Cc: , "'Raghavendra Koushik'"
Subject: Msi-X on Opterons?
Date: Tue, 5 Jul 2005 11:31:02 -0700
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
X-Mailer: Microsoft Office Outlook, Build 11.0.5510
Thread-Index: AcWBj60cBTBHZqLpTBm1j+9Oe4KOHQ==
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1409
X-Scanned-By: MIMEDefang 2.34
X-archive-position: 2637
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: leonid.grossman@neterion.com
Precedence: bulk
X-list: netdev
Content-Length: 247
Lines: 6

Has anyone had any luck getting MSI-X working on Opteron platforms?
The newer 8132-based systems support MSI-X, but we could not get it working
with our card there. It works fine on Xeon systems, so I wonder if the
implementation is Xeon-centric...
From davem@davemloft.net Tue Jul 5 14:23:54 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Jul 2005 14:24:00 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j65LNsH9031419 for ; Tue, 5 Jul 2005 14:23:54 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1Dpurq-0001qd-TQ; Tue, 05 Jul 2005 14:22:10 -0700 Date: Tue, 05 Jul 2005 14:22:10 -0700 (PDT) Message-Id: <20050705.142210.14973612.davem@davemloft.net> To: tgraf@suug.ch Cc: dada1@cosmosbay.com, netdev@oss.sgi.com Subject: Re: [PATCH] loop unrolling in net/sched/sch_generic.c From: "David S. Miller" In-Reply-To: <20050705173411.GK16076@postel.suug.ch> References: <20050705134805.GH16076@postel.suug.ch> <42CAAE2F.5070807@cosmosbay.com> <20050705173411.GK16076@postel.suug.ch> X-Mailer: Mew version 4.2 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2638 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 752 Lines: 19 From: Thomas Graf Date: Tue, 5 Jul 2005 19:34:11 +0200 > Do as you wish, I don't feel like argueing about micro optimizations. I bet the performance gain really comes from the mispredicted branches in the loop. For loops of fixed duration, say, 5 or 6 iterations or less, it totally defeats the branch prediction logic in most processors. By the time the chip moves the I-cache branch state to "likely" the loop has ended and we eat a mispredict. I think the original patch is OK, hand unrolling the loop in the C code. Adding -funroll-loops to the CFLAGS has lots of implications, and in particular the embedded folks might not be happy with some things that result from that. 
So I'll apply the original unrolling patch for now. From davem@davemloft.net Tue Jul 5 14:28:19 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Jul 2005 14:28:21 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j65LSJH9004901 for ; Tue, 5 Jul 2005 14:28:19 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DpuwE-0002hy-2J; Tue, 05 Jul 2005 14:26:42 -0700 Date: Tue, 05 Jul 2005 14:26:41 -0700 (PDT) Message-Id: <20050705.142641.08323224.davem@davemloft.net> To: dada1@cosmosbay.com Cc: netdev@oss.sgi.com Subject: Re: [PATCH] loop unrolling in net/sched/sch_generic.c From: "David S. Miller" In-Reply-To: <42CA390C.9000801@cosmosbay.com> References: <42C9BEF6.4080402@cosmosbay.com> <20050704.160140.21591849.davem@davemloft.net> <42CA390C.9000801@cosmosbay.com> X-Mailer: Mew version 4.2 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2639 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 423 Lines: 11 Eric, I've told you this before many times. Please do something so that your email client does not corrupt the patches. Once again, your email client turned all the tab characters into spaces and thus made the patch unusable. Even though you used an attachment, the tab-->space transformation still happened somehow. Please fix this, and make a serious mental note to prevent this somehow in the future, thanks a lot. 
From tgraf@suug.ch Tue Jul 5 14:35:12 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Jul 2005 14:35:15 -0700 (PDT) Received: from postel.suug.ch (postel.suug.ch [195.134.158.23]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j65LZBH9005929 for ; Tue, 5 Jul 2005 14:35:11 -0700 Received: by postel.suug.ch (Postfix, from userid 10001) id 259FD1C0F3; Tue, 5 Jul 2005 23:33:56 +0200 (CEST) Date: Tue, 5 Jul 2005 23:33:55 +0200 From: Thomas Graf To: "David S. Miller" Cc: dada1@cosmosbay.com, netdev@oss.sgi.com Subject: Re: [PATCH] loop unrolling in net/sched/sch_generic.c Message-ID: <20050705213355.GM16076@postel.suug.ch> References: <20050705134805.GH16076@postel.suug.ch> <42CAAE2F.5070807@cosmosbay.com> <20050705173411.GK16076@postel.suug.ch> <20050705.142210.14973612.davem@davemloft.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050705.142210.14973612.davem@davemloft.net> X-archive-position: 2640 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: tgraf@suug.ch Precedence: bulk X-list: netdev Content-Length: 248 Lines: 5 * David S. Miller <20050705.142210.14973612.davem@davemloft.net> 2005-07-05 14:22 > So I'll apply the original unrolling patch for now. The patch must be changed to use __qdisc_dequeue_head() instead of __skb_dequeue() or we screw up the backlog. 
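
A small userspace sketch of the concern raised above: if enqueue adds a packet's length to a byte counter, every dequeue has to subtract it again, so a removal that only touches the list leaves the counter stale. Everything here is a hypothetical model; `struct sched`, `raw_dequeue()` and `accounted_dequeue()` are invented names standing in for the roles the thread assigns to `__skb_dequeue()` and `__qdisc_dequeue_head()`, and the sketch assumes the backlog is a plain byte count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for sk_buff, sk_buff_head and Qdisc. */
struct pkt {
	struct pkt *next;
	unsigned int len;
};

struct fifo {
	struct pkt *head, *tail;
};

struct sched {
	struct fifo list;
	unsigned int backlog;	/* assumption: bytes currently queued */
};

/* Enqueue at the tail and account the packet's bytes. */
static void sched_enqueue(struct sched *s, struct pkt *p)
{
	assert(p != NULL);
	p->next = NULL;
	if (s->list.tail)
		s->list.tail->next = p;
	else
		s->list.head = p;
	s->list.tail = p;
	s->backlog += p->len;
}

/* List-only removal: knows nothing about the byte counter. */
static struct pkt *raw_dequeue(struct fifo *f)
{
	struct pkt *p = f->head;

	if (p == NULL)
		return NULL;
	f->head = p->next;
	if (f->head == NULL)
		f->tail = NULL;
	p->next = NULL;
	return p;
}

/* Accounting removal: list removal plus the matching backlog update. */
static struct pkt *accounted_dequeue(struct sched *s)
{
	struct pkt *p = raw_dequeue(&s->list);

	if (p != NULL)
		s->backlog -= p->len;
	return p;
}
```

Draining the model through `raw_dequeue()` alone empties the list while `backlog` still reports queued bytes, which is the kind of mismatch the reply warns about.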
From davem@davemloft.net Tue Jul 5 14:37:27 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Jul 2005 14:37:31 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j65LbRH9006333 for ; Tue, 5 Jul 2005 14:37:27 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1Dpv53-0004ba-4H; Tue, 05 Jul 2005 14:35:49 -0700 Date: Tue, 05 Jul 2005 14:35:48 -0700 (PDT) Message-Id: <20050705.143548.28788459.davem@davemloft.net> To: tgraf@suug.ch Cc: dada1@cosmosbay.com, netdev@oss.sgi.com Subject: Re: [PATCH] loop unrolling in net/sched/sch_generic.c From: "David S. Miller" In-Reply-To: <20050705213355.GM16076@postel.suug.ch> References: <20050705173411.GK16076@postel.suug.ch> <20050705.142210.14973612.davem@davemloft.net> <20050705213355.GM16076@postel.suug.ch> X-Mailer: Mew version 4.2 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2641 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 390 Lines: 10 From: Thomas Graf Date: Tue, 5 Jul 2005 23:33:55 +0200 > * David S. Miller <20050705.142210.14973612.davem@davemloft.net> 2005-07-05 14:22 > > So I'll apply the original unrolling patch for now. > > The patch must be changed to use __qdisc_dequeue_head() instead of > __skb_dequeue() or we screw up the backlog. 
Ok, good thing the patch didn't apply correctly anyways :) From dada1@cosmosbay.com Tue Jul 5 16:17:45 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Jul 2005 16:17:54 -0700 (PDT) Received: from smtp.cegetel.net (mf01.sitadelle.com [212.94.174.68]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j65NHiH9013123 for ; Tue, 5 Jul 2005 16:17:45 -0700 Received: from [192.168.0.5] (84-4-149-97.dti.cegetel.net [84.4.149.97]) by smtp.cegetel.net (Postfix) with ESMTP id 90AED31819E; Wed, 6 Jul 2005 01:16:01 +0200 (CEST) Message-ID: <42CB14B2.5090601@cosmosbay.com> Date: Wed, 06 Jul 2005 01:16:02 +0200 From: Eric Dumazet User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206) X-Accept-Language: fr, en MIME-Version: 1.0 To: "David S. Miller" , tgraf@suug.ch Cc: netdev@oss.sgi.com Subject: Re: [PATCH] loop unrolling in net/sched/sch_generic.c References: <20050705173411.GK16076@postel.suug.ch> <20050705.142210.14973612.davem@davemloft.net> <20050705213355.GM16076@postel.suug.ch> <20050705.143548.28788459.davem@davemloft.net> In-Reply-To: <20050705.143548.28788459.davem@davemloft.net> Content-Type: multipart/mixed; boundary="------------080605050609050601090102" X-archive-position: 2642 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: dada1@cosmosbay.com Precedence: bulk X-list: netdev Content-Length: 2313 Lines: 84 This is a multi-part message in MIME format. --------------080605050609050601090102 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit David S. Miller a écrit : > From: Thomas Graf > Date: Tue, 5 Jul 2005 23:33:55 +0200 > > >>* David S. Miller <20050705.142210.14973612.davem@davemloft.net> 2005-07-05 14:22 >> >>>So I'll apply the original unrolling patch for now. >> >>The patch must be changed to use __qdisc_dequeue_head() instead of >>__skb_dequeue() or we screw up the backlog. 
>
>
> Ok, good thing the patch didn't apply correctly anyways :)
>
>

Oh well, I was unaware of last changes in 2.6.13-rc1 :(

Given the fact that the PFIFO_FAST_BANDS macro was introduced, I wonder if the patch should be this one or not...
Should we assume PFIFO_FAST_BANDS will stay at 3 or what ?

[NET] : unroll a small loop in pfifo_fast_dequeue(). Compiler generates better code.
oprofile says this function uses now 0.29% instead of 1.22 %, on a x86_64 target.

Signed-off-by: Eric Dumazet

--------------080605050609050601090102
Content-Type: text/plain; name="patch.sch_generic"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="patch.sch_generic"

--- linux-2.6.13-rc1/net/sched/sch_generic.c 2005-07-06 00:46:53.000000000 +0200
+++ linux-2.6.13-rc1-ed/net/sched/sch_generic.c 2005-07-06 01:05:04.000000000 +0200
@@ -328,18 +328,31 @@
 
 static struct sk_buff *pfifo_fast_dequeue(struct Qdisc* qdisc)
 {
-	int prio;
 	struct sk_buff_head *list = qdisc_priv(qdisc);
 
-	for (prio = 0; prio < PFIFO_FAST_BANDS; prio++, list++) {
-		struct sk_buff *skb = __qdisc_dequeue_head(qdisc, list);
-		if (skb) {
-			qdisc->q.qlen--;
-			return skb;
+#if PFIFO_FAST_BANDS == 3
+	for (;;) {
+		if (!skb_queue_empty(list))
+			break;
+		list++;
+		if (!skb_queue_empty(list))
+			break;
+		list++;
+		if (!skb_queue_empty(list))
+			break;
+		return NULL;
 		}
-	}
-
-	return NULL;
+#else
+	int prio;
+	for (prio = 0;; list++) {
+		if (!skb_queue_empty(list))
+			break;
+		if (++prio == PFIFO_FAST_BANDS)
+			return NULL;
+	}
+#endif
+	qdisc->q.qlen--;
+	return __qdisc_dequeue_head(qdisc, list);
 }
 
 static int pfifo_fast_requeue(struct sk_buff *skb, struct Qdisc* qdisc)

--------------080605050609050601090102--

From tgraf@suug.ch Tue Jul 5 16:42:23 2005
Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Jul 2005 16:42:27 -0700 (PDT)
Received: from postel.suug.ch (postel.suug.ch [195.134.158.23]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j65NgMH9018222 for ; Tue, 5 Jul 2005 16:42:23 -0700
Received: by postel.suug.ch (Postfix, from userid 10001) id 7CCA11C0F3; Wed, 6 Jul 2005 01:41:04 +0200 (CEST)
Date: Wed, 6 Jul 2005 01:41:04 +0200
From: Thomas Graf
To: Eric Dumazet
Cc: "David S. Miller" , netdev@oss.sgi.com
Subject: Re: [PATCH] loop unrolling in net/sched/sch_generic.c
Message-ID: <20050705234104.GR16076@postel.suug.ch>
References: <20050705173411.GK16076@postel.suug.ch> <20050705.142210.14973612.davem@davemloft.net> <20050705213355.GM16076@postel.suug.ch> <20050705.143548.28788459.davem@davemloft.net> <42CB14B2.5090601@cosmosbay.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <42CB14B2.5090601@cosmosbay.com>
X-archive-position: 2643
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: tgraf@suug.ch
Precedence: bulk
X-list: netdev
Content-Length: 1555
Lines: 41

* Eric Dumazet <42CB14B2.5090601@cosmosbay.com> 2005-07-06 01:16
> Oh well, I was unaware of last changes in 2.6.13-rc1 :(

Ok, this clarifies a lot for me, I was under the impression
you knew about these changes.

> Given the fact that the PFIFO_FAST_BANDS macro was introduced, I wonder if
> the patch should be this one or not...
> Should we assume PFIFO_FAST_BANDS will stay at 3 or what ?

It is very unlikely to change within mainline but the idea
behind it is to allow it to be changed at compile time.

I still think we can fix this performance issue without
manually unrolling the loop or we should at least try to.
In the end gcc should notice the constant part of the loop
and move it out so basically the only difference should be the
additional prio++ and possibly a failing branch prediction.

What about this? I'm still not sure where exactly all the time
is lost so this is a shot in the dark.
Index: net-2.6/net/sched/sch_generic.c =================================================================== --- net-2.6.orig/net/sched/sch_generic.c +++ net-2.6/net/sched/sch_generic.c @@ -330,10 +330,11 @@ static struct sk_buff *pfifo_fast_dequeu { int prio; struct sk_buff_head *list = qdisc_priv(qdisc); + struct sk_buff *skb; - for (prio = 0; prio < PFIFO_FAST_BANDS; prio++, list++) { - struct sk_buff *skb = __qdisc_dequeue_head(qdisc, list); - if (skb) { + for (prio = 0; prio < PFIFO_FAST_BANDS; prio++) { + if (!skb_queue_empty(list + prio)) { + skb = __qdisc_dequeue_head(qdisc, list); qdisc->q.qlen--; return skb; } From davem@davemloft.net Tue Jul 5 16:46:47 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Jul 2005 16:46:51 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j65NkkH9018911 for ; Tue, 5 Jul 2005 16:46:47 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1Dpx67-0000ik-4w; Tue, 05 Jul 2005 16:45:03 -0700 Date: Tue, 05 Jul 2005 16:45:03 -0700 (PDT) Message-Id: <20050705.164503.104035718.davem@davemloft.net> To: tgraf@suug.ch Cc: dada1@cosmosbay.com, netdev@oss.sgi.com Subject: Re: [PATCH] loop unrolling in net/sched/sch_generic.c From: "David S. 
Miller" In-Reply-To: <20050705234104.GR16076@postel.suug.ch> References: <20050705.143548.28788459.davem@davemloft.net> <42CB14B2.5090601@cosmosbay.com> <20050705234104.GR16076@postel.suug.ch> X-Mailer: Mew version 4.2 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2644 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 680 Lines: 16 From: Thomas Graf Date: Wed, 6 Jul 2005 01:41:04 +0200 > I still think we can fix this performance issue without manually > unrolling the loop or we should at least try to. In the end gcc > should notice the constant part of the loop and move it out so > basically the only difference should the additional prio++ and > possibly a failing branch prediction. But the branch prediction is where I personally think a lot of the lossage is coming from. These can cost upwards of 20 or 30 processor cycles, easily. That's getting close to the cost of a L2 cache miss. I see the difficulties with this change now, why don't we revisit this some time in the future? From tgraf@suug.ch Tue Jul 5 16:56:18 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Jul 2005 16:56:21 -0700 (PDT) Received: from postel.suug.ch (postel.suug.ch [195.134.158.23]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j65NuHH9019820 for ; Tue, 5 Jul 2005 16:56:18 -0700 Received: by postel.suug.ch (Postfix, from userid 10001) id 8C7211C0F3; Wed, 6 Jul 2005 01:55:04 +0200 (CEST) Date: Wed, 6 Jul 2005 01:55:04 +0200 From: Thomas Graf To: "David S. 
Miller"
Cc: dada1@cosmosbay.com, netdev@oss.sgi.com
Subject: Re: [PATCH] loop unrolling in net/sched/sch_generic.c
Message-ID: <20050705235504.GS16076@postel.suug.ch>
References: <20050705.143548.28788459.davem@davemloft.net> <42CB14B2.5090601@cosmosbay.com> <20050705234104.GR16076@postel.suug.ch> <20050705.164503.104035718.davem@davemloft.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20050705.164503.104035718.davem@davemloft.net>
X-archive-position: 2645
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: tgraf@suug.ch
Precedence: bulk
X-list: netdev
Content-Length: 1265
Lines: 29

* David S. Miller <20050705.164503.104035718.davem@davemloft.net> 2005-07-05 16:45
> From: Thomas Graf
> Date: Wed, 6 Jul 2005 01:41:04 +0200
>
> > I still think we can fix this performance issue without manually
> > unrolling the loop or we should at least try to. In the end gcc
> > should notice the constant part of the loop and move it out so
> > basically the only difference should the additional prio++ and
> > possibly a failing branch prediction.
>
> But the branch prediction is where I personally think a lot
> of the lossage is coming from. These can cost upwards of 20
> or 30 processor cycles, easily. That's getting close to the
> cost of a L2 cache miss.

Absolutely. I think what happens is that we produce prediction
failures due to the logic within qdisc_dequeue_head(); I cannot
back this up with numbers, though.

> I see the difficulties with this change now, why don't we revisit
> this some time in the future?

Fine with me.

Eric, the patch I just posted should result in the same branch
prediction as your loop unrolling. The only additional overhead
we still have is the list + prio addition and an additional
conditional jump to do the loop. If you have the cycles etc. it would
be nice to compare it with your numbers.
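
The peek-before-dequeue loop discussed above can be modelled in user space. The following is a minimal sketch, not kernel code: `struct band`, `band_empty()`, `band_push()`, `band_pop()` and `dequeue_forward()` are hypothetical stand-ins for `struct sk_buff_head`, `skb_queue_empty()` and `__qdisc_dequeue_head()`, and packets are plain integers. It tests each band for emptiness first and then dequeues from the same band it tested.

```c
#include <assert.h>
#include <stddef.h>

#define NBANDS   3	/* stands in for PFIFO_FAST_BANDS */
#define BAND_CAP 8	/* arbitrary capacity of the toy queue */

/* Toy ring-buffer FIFO; a hypothetical stand-in for struct sk_buff_head. */
struct band {
	int pkts[BAND_CAP];
	size_t head;	/* index of the oldest packet */
	size_t len;	/* number of queued packets */
};

static int band_empty(const struct band *b)
{
	return b->len == 0;
}

/* Append a packet id; returns 0 on success, -1 if the band is full. */
static int band_push(struct band *b, int pkt)
{
	if (b->len == BAND_CAP)
		return -1;
	b->pkts[(b->head + b->len) % BAND_CAP] = pkt;
	b->len++;
	return 0;
}

/* Remove and return the oldest packet; the band must not be empty. */
static int band_pop(struct band *b)
{
	int pkt;

	assert(!band_empty(b));
	pkt = b->pkts[b->head];
	b->head = (b->head + 1) % BAND_CAP;
	b->len--;
	return pkt;
}

/*
 * Forward scan: peek at each band, lowest index (highest priority)
 * first, and dequeue only from the first band found non-empty.
 * Returns -1 when every band is empty.
 */
static int dequeue_forward(struct band *bands)
{
	int prio;

	for (prio = 0; prio < NBANDS; prio++) {
		if (!band_empty(&bands[prio]))
			return band_pop(&bands[prio]);
	}
	return -1;
}
```

Called repeatedly on bands holding 5 in band 0 and 20, 21 in band 2, `dequeue_forward()` hands back 5, then 20, then 21, then -1.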
From dada1@cosmosbay.com Tue Jul 5 17:34:01 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Jul 2005 17:34:03 -0700 (PDT) Received: from smtp.cegetel.net (mf00.sitadelle.com [212.94.174.67]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j660Y0H9022148 for ; Tue, 5 Jul 2005 17:34:00 -0700 Received: from [192.168.0.5] (84-4-149-97.dti.cegetel.net [84.4.149.97]) by smtp.cegetel.net (Postfix) with ESMTP id 8834C1A409E; Wed, 6 Jul 2005 02:32:24 +0200 (CEST) Message-ID: <42CB2698.2080904@cosmosbay.com> Date: Wed, 06 Jul 2005 02:32:24 +0200 From: Eric Dumazet User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206) X-Accept-Language: fr, en MIME-Version: 1.0 To: Thomas Graf Cc: "David S. Miller" , netdev@oss.sgi.com Subject: Re: [PATCH] loop unrolling in net/sched/sch_generic.c References: <20050705173411.GK16076@postel.suug.ch> <20050705.142210.14973612.davem@davemloft.net> <20050705213355.GM16076@postel.suug.ch> <20050705.143548.28788459.davem@davemloft.net> <42CB14B2.5090601@cosmosbay.com> <20050705234104.GR16076@postel.suug.ch> In-Reply-To: <20050705234104.GR16076@postel.suug.ch> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit X-archive-position: 2646 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: dada1@cosmosbay.com Precedence: bulk X-list: netdev Content-Length: 2279 Lines: 68 Thomas Graf a écrit : > I still think we can fix this performance issue without manually > unrolling the loop or we should at least try to. In the end gcc > should notice the constant part of the loop and move it out so > basically the only difference should the additional prio++ and > possibly a failing branch prediction. > > What about this? I'm still not sure where exactly all the time > is lost so this is a shot in the dark. 
>
> Index: net-2.6/net/sched/sch_generic.c
> ===================================================================
> --- net-2.6.orig/net/sched/sch_generic.c
> +++ net-2.6/net/sched/sch_generic.c
> @@ -330,10 +330,11 @@ static struct sk_buff *pfifo_fast_dequeu
>  {
>  	int prio;
>  	struct sk_buff_head *list = qdisc_priv(qdisc);
> +	struct sk_buff *skb;
>
> -	for (prio = 0; prio < PFIFO_FAST_BANDS; prio++, list++) {
> -		struct sk_buff *skb = __qdisc_dequeue_head(qdisc, list);
> -		if (skb) {
> +	for (prio = 0; prio < PFIFO_FAST_BANDS; prio++) {
> +		if (!skb_queue_empty(list + prio)) {
> +			skb = __qdisc_dequeue_head(qdisc, list);
>  			qdisc->q.qlen--;
>  			return skb;
>  		}
>
>

Hum... shouldn't it be:

+			skb = __qdisc_dequeue_head(qdisc, list + prio);

?

Anyway, the branch mispredictions come from the fact that most of the
packets are queued in the prio=2 list.

So each time this function is called, a non-unrolled version has to pay
2 to 5 branch mispredictions.

	if (!skb_queue_empty(list + prio)) /* branch not taken, mispredict when prio=0 */
	if (!skb_queue_empty(list + prio)) /* branch not taken, mispredict when prio=1 */
	if (!skb_queue_empty(list + prio)) /* branch taken (or not if queue is really empty), mispredict when prio=2 */

Maybe we can rewrite the whole thing without branches, examining prio
from PFIFO_FAST_BANDS-1 down to 0, at least for modern CPUs with
conditional mov (cmov):

	struct sk_buff_head *best = NULL;
	struct sk_buff_head *list = (struct sk_buff_head *)qdisc_priv(qdisc) + PFIFO_FAST_BANDS - 1;

	if (!skb_queue_empty(list)) best = list;
	list--;
	if (!skb_queue_empty(list)) best = list;
	list--;
	if (!skb_queue_empty(list)) best = list;
	if (best != NULL) {
		qdisc->q.qlen--;
		return __qdisc_dequeue_head(qdisc, best);
	}

This version should have one branch.
I will test this after some sleep :) See you Eric From tgraf@suug.ch Tue Jul 5 17:52:55 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Jul 2005 17:52:59 -0700 (PDT) Received: from postel.suug.ch (postel.suug.ch [195.134.158.23]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j660qsH9023252 for ; Tue, 5 Jul 2005 17:52:55 -0700 Received: by postel.suug.ch (Postfix, from userid 10001) id 6D5D71C0F2; Wed, 6 Jul 2005 02:51:40 +0200 (CEST) Date: Wed, 6 Jul 2005 02:51:40 +0200 From: Thomas Graf To: Eric Dumazet Cc: "David S. Miller" , netdev@oss.sgi.com Subject: Re: [PATCH] loop unrolling in net/sched/sch_generic.c Message-ID: <20050706005140.GT16076@postel.suug.ch> References: <20050705173411.GK16076@postel.suug.ch> <20050705.142210.14973612.davem@davemloft.net> <20050705213355.GM16076@postel.suug.ch> <20050705.143548.28788459.davem@davemloft.net> <42CB14B2.5090601@cosmosbay.com> <20050705234104.GR16076@postel.suug.ch> <42CB2698.2080904@cosmosbay.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <42CB2698.2080904@cosmosbay.com> X-archive-position: 2647 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: tgraf@suug.ch Precedence: bulk X-list: netdev Content-Length: 869 Lines: 27 * Eric Dumazet <42CB2698.2080904@cosmosbay.com> 2005-07-06 02:32 > Hum... shouldnt it be : > > + skb = __qdisc_dequeue_head(qdisc, list + prio); > Correct. > Anyway, the branches misprediction come from the fact that most of packets > are queued in the prio=2 list. > > So each time this function is called, a non unrolled version has to pay 2 > to 5 branches misprediction. > > if ((!skb_queue_empty(list + prio)) /* branch not taken, mispredict when > prio=0 */ The !expr implies an unlikely so the prediction should be right and equal to your unrolling version. 
> Maybe we can rewrite the whole thing without branches, examining prio from > PFIFO_FAST_BANDS-1 down to 0, at least for modern cpu with conditional mov > (cmov) This would break the whole thing, the qdisc is supposed to try and dequeue from the highest priority queue (prio=0) first. From dada1@cosmosbay.com Tue Jul 5 17:55:06 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Jul 2005 17:55:09 -0700 (PDT) Received: from smtp.cegetel.net (mf00.sitadelle.com [212.94.174.67]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j660t4H9023684 for ; Tue, 5 Jul 2005 17:55:05 -0700 Received: from [192.168.0.5] (84-4-149-97.dti.cegetel.net [84.4.149.97]) by smtp.cegetel.net (Postfix) with ESMTP id 200FA1A40F2; Wed, 6 Jul 2005 02:53:26 +0200 (CEST) Message-ID: <42CB2B84.50702@cosmosbay.com> Date: Wed, 06 Jul 2005 02:53:24 +0200 From: Eric Dumazet User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206) X-Accept-Language: fr, en MIME-Version: 1.0 To: Eric Dumazet Cc: Thomas Graf , "David S. 
Miller" , netdev@oss.sgi.com Subject: Re: [PATCH] loop unrolling in net/sched/sch_generic.c References: <20050705173411.GK16076@postel.suug.ch> <20050705.142210.14973612.davem@davemloft.net> <20050705213355.GM16076@postel.suug.ch> <20050705.143548.28788459.davem@davemloft.net> <42CB14B2.5090601@cosmosbay.com> <20050705234104.GR16076@postel.suug.ch> <42CB2698.2080904@cosmosbay.com> In-Reply-To: <42CB2698.2080904@cosmosbay.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit X-archive-position: 2648 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: dada1@cosmosbay.com Precedence: bulk X-list: netdev Content-Length: 2924 Lines: 87 Eric Dumazet a écrit : > > > Maybe we can rewrite the whole thing without branches, examining prio > from PFIFO_FAST_BANDS-1 down to 0, at least for modern cpu with > conditional mov (cmov) > > struct sk_buff_head *best = NULL; > struct sk_buff_head *list = qdisc_priv(qdisc)+PFIFO_FAST_BANDS-1; > if (skb_queue_empty(list)) best = list ; > list--; > if (skb_queue_empty(list)) best = list ; > list--; > if (skb_queue_empty(list)) best = list ; > if (best != NULL) { > qdisc->q.qlen--; > return __qdisc_dequeue_head(qdisc, best); > } > > This version should have one branch. 
> I will test this after some sleep :)
> See you
> Eric
>
>

(Sorry, still using 2.6.12, but the idea remains)

static struct sk_buff *
pfifo_fast_dequeue(struct Qdisc* qdisc)
{
	struct sk_buff_head *list = qdisc_priv(qdisc);
	struct sk_buff_head *best = NULL;

	list += 2;
	if (!skb_queue_empty(list))
		best = list;
	list--;
	if (!skb_queue_empty(list))
		best = list;
	list--;
	if (!skb_queue_empty(list))
		best = list;
	if (best) {
		qdisc->q.qlen--;
		return __skb_dequeue(best);
	}
	return NULL;
}

At least the compiler output seems promising :

0000000000000550 :
 550:	48 8d 97 f0 00 00 00	lea    0xf0(%rdi),%rdx
 557:	31 c9			xor    %ecx,%ecx
 559:	48 8d 87 c0 00 00 00	lea    0xc0(%rdi),%rax
 560:	48 39 97 f0 00 00 00	cmp    %rdx,0xf0(%rdi)
 567:	48 0f 45 ca		cmovne %rdx,%rcx
 56b:	48 8d 97 d8 00 00 00	lea    0xd8(%rdi),%rdx
 572:	48 39 97 d8 00 00 00	cmp    %rdx,0xd8(%rdi)
 579:	48 0f 45 ca		cmovne %rdx,%rcx
 57d:	48 39 87 c0 00 00 00	cmp    %rax,0xc0(%rdi)
 584:	48 0f 45 c8		cmovne %rax,%rcx
 588:	31 c0			xor    %eax,%eax
 58a:	48 85 c9		test   %rcx,%rcx
 58d:	74 32			je     5c1	// one conditional branch
 58f:	ff 4f 40		decl   0x40(%rdi)
 592:	48 8b 11		mov    (%rcx),%rdx
 595:	48 39 ca		cmp    %rcx,%rdx
 598:	74 27			je     5c1	// never taken branch : always predicted OK
 59a:	48 89 d0		mov    %rdx,%rax
 59d:	48 8b 12		mov    (%rdx),%rdx
 5a0:	ff 49 10		decl   0x10(%rcx)
 5a3:	48 c7 40 10 00 00 00	movq   $0x0,0x10(%rax)
 5aa:	00
 5ab:	48 89 4a 08		mov    %rcx,0x8(%rdx)
 5af:	48 89 11		mov    %rdx,(%rcx)
 5b2:	48 c7 40 08 00 00 00	movq   $0x0,0x8(%rax)
 5b9:	00
 5ba:	48 c7 00 00 00 00 00	movq   $0x0,(%rax)
 5c1:	90			nop
 5c2:	c3			retq

I will post some profiling results tomorrow.
Eric From tgraf@suug.ch Tue Jul 5 18:03:14 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Jul 2005 18:03:16 -0700 (PDT) Received: from postel.suug.ch (postel.suug.ch [195.134.158.23]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j6613DH9024750 for ; Tue, 5 Jul 2005 18:03:14 -0700 Received: by postel.suug.ch (Postfix, from userid 10001) id 973B41C0F3; Wed, 6 Jul 2005 03:02:00 +0200 (CEST) Date: Wed, 6 Jul 2005 03:02:00 +0200 From: Thomas Graf To: Eric Dumazet Cc: "David S. Miller" , netdev@oss.sgi.com Subject: Re: [PATCH] loop unrolling in net/sched/sch_generic.c Message-ID: <20050706010200.GU16076@postel.suug.ch> References: <20050705173411.GK16076@postel.suug.ch> <20050705.142210.14973612.davem@davemloft.net> <20050705213355.GM16076@postel.suug.ch> <20050705.143548.28788459.davem@davemloft.net> <42CB14B2.5090601@cosmosbay.com> <20050705234104.GR16076@postel.suug.ch> <42CB2698.2080904@cosmosbay.com> <42CB2B84.50702@cosmosbay.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <42CB2B84.50702@cosmosbay.com> X-archive-position: 2649 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: tgraf@suug.ch Precedence: bulk X-list: netdev Content-Length: 416 Lines: 10 * Eric Dumazet <42CB2B84.50702@cosmosbay.com> 2005-07-06 02:53 > (Sorry, still using 2.6.12, but the idea remains) I think you got me wrong, the whole point of this qdisc is to prioritize which means that we cannot dequeue from prio 1 as long as the queue in prio 0 is not empty. If you have no traffic at all for prio=0 and prio=1 then the best solution is to replace the qdisc on the device with a simple fifo. 
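
Whether scanning the bands in reverse order still honours priority can be checked exhaustively in user space. A minimal sketch, assuming band occupancy is reduced to an array of queue lengths; `pick_forward()`, `pick_reverse()` and `scans_agree()` are illustrative names, not kernel symbols. The reverse scan has no early exit, so a compiler may turn each assignment into a conditional move, though that depends on the compiler and the target.

```c
#include <assert.h>
#include <stddef.h>

#define NBANDS 3	/* stands in for PFIFO_FAST_BANDS */

/* Forward scan with early exit: index of the first non-empty band, or -1. */
static int pick_forward(const unsigned int qlen[NBANDS])
{
	int prio;

	assert(qlen != NULL);
	for (prio = 0; prio < NBANDS; prio++)
		if (qlen[prio] != 0)
			return prio;
	return -1;
}

/*
 * Reverse scan without early exit: every non-empty band overwrites
 * 'best', so the last writer, i.e. the lowest-numbered non-empty band,
 * wins. Returns -1 when every band is empty.
 */
static int pick_reverse(const unsigned int qlen[NBANDS])
{
	int best = -1;
	int prio;

	assert(qlen != NULL);
	for (prio = NBANDS - 1; prio >= 0; prio--)
		if (qlen[prio] != 0)
			best = prio;	/* candidate for a conditional move */
	return best;
}

/*
 * Compare both scans over every empty/non-empty combination of the
 * bands. Returns 1 if they always pick the same band, 0 otherwise.
 */
static int scans_agree(void)
{
	unsigned int mask;

	for (mask = 0; mask < (1u << NBANDS); mask++) {
		unsigned int qlen[NBANDS];
		int i;

		for (i = 0; i < NBANDS; i++)
			qlen[i] = (mask >> i) & 1u;
		if (pick_forward(qlen) != pick_reverse(qlen))
			return 0;
	}
	return 1;
}
```

With three bands there are only eight empty/non-empty combinations, so `scans_agree()` covers them all and returns 1: both scans always select the same band.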
From dada1@cosmosbay.com Tue Jul 5 18:06:14 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Jul 2005 18:06:16 -0700 (PDT) Received: from smtp.cegetel.net (mf00.sitadelle.com [212.94.174.67]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j6616DH9025527 for ; Tue, 5 Jul 2005 18:06:14 -0700 Received: from [192.168.0.5] (84-4-149-97.dti.cegetel.net [84.4.149.97]) by smtp.cegetel.net (Postfix) with ESMTP id E35A31A4145; Wed, 6 Jul 2005 03:04:37 +0200 (CEST) Message-ID: <42CB2E24.6010303@cosmosbay.com> Date: Wed, 06 Jul 2005 03:04:36 +0200 From: Eric Dumazet User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206) X-Accept-Language: fr, en MIME-Version: 1.0 To: Thomas Graf Cc: "David S. Miller" , netdev@oss.sgi.com Subject: Re: [PATCH] loop unrolling in net/sched/sch_generic.c References: <20050705173411.GK16076@postel.suug.ch> <20050705.142210.14973612.davem@davemloft.net> <20050705213355.GM16076@postel.suug.ch> <20050705.143548.28788459.davem@davemloft.net> <42CB14B2.5090601@cosmosbay.com> <20050705234104.GR16076@postel.suug.ch> <42CB2698.2080904@cosmosbay.com> <20050706005140.GT16076@postel.suug.ch> In-Reply-To: <20050706005140.GT16076@postel.suug.ch> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit X-archive-position: 2650 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: dada1@cosmosbay.com Precedence: bulk X-list: netdev Content-Length: 636 Lines: 18 Thomas Graf a écrit : >>Maybe we can rewrite the whole thing without branches, examining prio from >>PFIFO_FAST_BANDS-1 down to 0, at least for modern cpu with conditional mov >>(cmov) > > > This would break the whole thing, the qdisc is supposed to try and > dequeue from the highest priority queue (prio=0) first. > > I still dequeue a packet from the highest priority queue. 
But nothing prevents us from looking at the three queues in reverse
order, if that lets us avoid the conditional branches.

There is no memory penalty, since most of the time we were looking at
all three queues anyway, and the 3 sk_buff_head are in the same cache
line.

From tgraf@suug.ch Tue Jul 5 18:09:10 2005
Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Jul 2005 18:09:13 -0700 (PDT)
Received: from postel.suug.ch (postel.suug.ch [195.134.158.23]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j66199H9026127 for ; Tue, 5 Jul 2005 18:09:10 -0700
Received: by postel.suug.ch (Postfix, from userid 10001) id 3F8631C0F3; Wed, 6 Jul 2005 03:07:56 +0200 (CEST)
Date: Wed, 6 Jul 2005 03:07:56 +0200
From: Thomas Graf
To: Eric Dumazet
Cc: "David S. Miller" , netdev@oss.sgi.com
Subject: Re: [PATCH] loop unrolling in net/sched/sch_generic.c
Message-ID: <20050706010756.GV16076@postel.suug.ch>
References: <20050705173411.GK16076@postel.suug.ch> <20050705.142210.14973612.davem@davemloft.net> <20050705213355.GM16076@postel.suug.ch> <20050705.143548.28788459.davem@davemloft.net> <42CB14B2.5090601@cosmosbay.com> <20050705234104.GR16076@postel.suug.ch> <42CB2698.2080904@cosmosbay.com> <20050706005140.GT16076@postel.suug.ch> <42CB2E24.6010303@cosmosbay.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <42CB2E24.6010303@cosmosbay.com>
X-archive-position: 2651
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: tgraf@suug.ch
Precedence: bulk
X-list: netdev
Content-Length: 560
Lines: 17

* Eric Dumazet <42CB2E24.6010303@cosmosbay.com> 2005-07-06 03:04
> Thomas Graf wrote:
>
> >>Maybe we can rewrite the whole thing without branches, examining prio
> >>from PFIFO_FAST_BANDS-1 down to 0, at least for modern cpu with
> >>conditional mov (cmov)
> >
> >
> >This would break the whole thing, the qdisc is supposed to try and
> >dequeue
from the highest priority queue (prio=0) first. > > > > > > I still dequeue a packet from the highest priority queue. Ahh... sorry, I misread your patch, interesting idea. I'll be waiting for your numbers. From dada1@cosmosbay.com Tue Jul 5 18:10:37 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Jul 2005 18:10:39 -0700 (PDT) Received: from smtp.cegetel.net (mf00.sitadelle.com [212.94.174.67]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j661AaH9026585 for ; Tue, 5 Jul 2005 18:10:36 -0700 Received: from [192.168.0.5] (84-4-149-97.dti.cegetel.net [84.4.149.97]) by smtp.cegetel.net (Postfix) with ESMTP id 4E2891A4038; Wed, 6 Jul 2005 03:09:01 +0200 (CEST) Message-ID: <42CB2F2C.7050602@cosmosbay.com> Date: Wed, 06 Jul 2005 03:09:00 +0200 From: Eric Dumazet User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206) X-Accept-Language: fr, en MIME-Version: 1.0 To: Thomas Graf Cc: "David S. Miller" , netdev@oss.sgi.com Subject: Re: [PATCH] loop unrolling in net/sched/sch_generic.c References: <20050705173411.GK16076@postel.suug.ch> <20050705.142210.14973612.davem@davemloft.net> <20050705213355.GM16076@postel.suug.ch> <20050705.143548.28788459.davem@davemloft.net> <42CB14B2.5090601@cosmosbay.com> <20050705234104.GR16076@postel.suug.ch> <42CB2698.2080904@cosmosbay.com> <42CB2B84.50702@cosmosbay.com> <20050706010200.GU16076@postel.suug.ch> In-Reply-To: <20050706010200.GU16076@postel.suug.ch> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit X-archive-position: 2652 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: dada1@cosmosbay.com Precedence: bulk X-list: netdev Content-Length: 650 Lines: 21 Thomas Graf a écrit : > I think you got me wrong, the whole point of this qdisc > is to prioritize which means that we cannot dequeue from > prio 1 as long as the queue in prio 0 is not empty. 
If prio 0 is not empty, then the last

	if (!skb_queue_empty(list))
		best = list;

will set 'best' to the prio 0 list, and we dequeue the packet from this
prio 0 list, not from prio 1 or prio 2.

>
> If you have no traffic at all for prio=0 and prio=1 then
> the best solution is to replace the qdisc on the device
> with a simple fifo.

Yes, sure, but I know that already. Unfortunately I have some traffic
on prio=1 and prio=0 (about 5%).

Thank you
Eric

From zdenek@rcn.com Tue Jul 5 19:04:07 2005
Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Jul 2005 19:04:10 -0700 (PDT)
Received: from smtp05.mrf.mail.rcn.net (smtp05.mrf.mail.rcn.net [207.172.4.64]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j66246H9030373 for ; Tue, 5 Jul 2005 19:04:07 -0700
Received: from 209-150-50-115.c3-0.nwt-ubr3.sbo-nwt.ma.cable.rcn.com (HELO funex) (209.150.50.115) by smtp05.mrf.mail.rcn.net with SMTP; 05 Jul 2005 22:02:31 -0400
Message-Id: <3u3gb7$1no73u@smtp05.mrf.mail.rcn.net>
X-IronPort-AV: i="3.93,263,1115006400"; d="scan'208"; a="58465406:sNHT21009888"
X-Sender: zdenek@pop.rcn.com
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.0
Date: Tue, 05 Jul 2005 21:55:29 -0400
To: Henrik Nordstrom
From: Zdenek Radouch
Subject: Re: controlling ARP Proxy scope?
Cc: netdev@oss.sgi.com, linux-net@vger.kernel.org In-Reply-To: References: <3u3gb7$1mhk2i@smtp05.mrf.mail.rcn.net> <3u3gb7$1mhk2i@smtp05.mrf.mail.rcn.net> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" X-archive-position: 2653 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: zdenek@rcn.com Precedence: bulk X-list: netdev Content-Length: 2238 Lines: 56 At 11:21 PM 7/2/05 +0200, Henrik Nordstrom wrote: >On Fri, 1 Jul 2005, Zdenek Radouch wrote: > >> So, left with only a binary flag in /proc, and network definition on the >> interface, >> I assumed (perhaps naively) that the arp would proxy only for the addresses >> within the subnet defined for the interface (on which the proxy arp is >> turned on). >> However, that does not seem to be the case. > >You may be able to tune this with either arp_filter or arp_ignore. Unfortunately I can't. Not without adding more code to what is quite obviously a bunch of kludgy patches for an ill-conceived ARP proxy design. > >> I have an interface with address 10.1.2.219 and mask 255.255.255.248 with >> proxy arp turned on on this interface, and the machine is responding >> (I see that with tcpdump) to arp requests for address 10.1.2.1, i.e., >> an address outside of the proxy interface's subnet. > >Correct. > >> Can anyone explain the behavior? > >proxy_arp simply ARPs if there is a route for the requested destination >going out on another interface than where the ARP was seen. In my case, the proxy replies to a request seen on the very same interface to which the route points to. That's wrong no matter how you look at it; this is a route to which this node will be routing, i.e., this node will be ARPing for this route address - it itself should not reply to such requests, nor could it ever successfully do so. I find the idea to proxy based on routing tables quite questionable. 
It may work is some pretty trivial cases, but will very obviously fail with a more complex configuration. I have seven or eight networks attached to the node, and I certainly do not want to proxy for every single address one may find in the routing tables. It is equally mind boggling to me how this could ever work with a stack allowing source-based routing, that is, a stack allowing coexistence of multiple, possibly conflicting routing tables. Sounds to me like I am going to have to rewrite the module. It needs to be configured manually - the notion that it could work automagically, without external configuration is quite unrealistic, as one can see from the code in arp_filter and arp_ignore. Thanks for the pointers. -Zdenek From hno@marasystems.com Tue Jul 5 19:22:18 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Jul 2005 19:22:22 -0700 (PDT) Received: from filer.marasystems.com (marasystems.com [83.241.133.2]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j662MGH9031846 for ; Tue, 5 Jul 2005 19:22:17 -0700 Received: from localhost (henrik@localhost) by filer.marasystems.com (8.11.6/8.11.6) with ESMTP id j662KWt31633; Wed, 6 Jul 2005 04:20:32 +0200 Date: Wed, 6 Jul 2005 04:20:32 +0200 (CEST) From: Henrik Nordstrom To: Zdenek Radouch cc: netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: controlling ARP Proxy scope? 
In-Reply-To: <3u3gb7$1no73u@smtp05.mrf.mail.rcn.net> Message-ID: References: <3u3gb7$1mhk2i@smtp05.mrf.mail.rcn.net> <3u3gb7$1mhk2i@smtp05.mrf.mail.rcn.net> <3u3gb7$1no73u@smtp05.mrf.mail.rcn.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-archive-position: 2654 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hno@marasystems.com Precedence: bulk X-list: netdev Content-Length: 2747 Lines: 65 On Tue, 5 Jul 2005, Zdenek Radouch wrote: >> proxy_arp simply ARPs if there is a route for the requested destination >> going out on another interface than where the ARP was seen. > > In my case, the proxy replies to a request seen on the very same interface > to which the route points to. Are you really sure on this? This part has always worked fine for me with Linux proxy-arp and a large variety of different kernels. > I find the idea to proxy based on routing tables quite questionable. So do I. The manual proxy-arp entries method suits me much better, but is a pain due to lack of range support (probably why it got removed in 2.4) > It may work is some pretty trivial cases, but will very obviously fail > with a more complex configuration. Haven't managed to find a single situation not solveable yet.. and this involves pretty complex configurations.. I don't remember which of the sysctls mentioned earlier did the trick, but once enabled things starts to behave quite sanely even when there is multiple foreign networks unexpectedly carried on the same Ethernet. IIRC the settings I settled for was arp_ignore = 1 arp_announce = 1 > I have seven or eight networks attached to the node, and I certainly do > not want to proxy for every single address one may find in the routing > tables. Then don't. 
> It is equally mind boggling to me how this could ever work with a stack
> allowing source-based routing, that is, a stack allowing coexistence of
> multiple, possibly conflicting routing tables.

Why not?

> Sounds to me like I am going to have to rewrite the module. It needs to be
> configured manually

Well, for most setups it does work automagically. Just bring up the
interfaces with the same IP, route the network out on the "main" interface
having most hosts and host (or subnet) route the other out the other
interface. ARP then follows automatically.

But in messy networks, or when your routing table is not correct,
sysctls are needed to restrict when to respond, to stop you from responding
to ARP requests for outside/foreign networks.

It probably isn't very hard to bring back the support for published proxy-arp
entries if needed. But without range support it's a pain to maintain in
most setups requiring proxy-arp, as you then need an ARP entry for every
"other" station on each interface involved in proxy-arp, meaning that if
you proxy-arp a /24 network then you need 253 proxy-arp entries (one per
station, defining which interface it belongs on). In the normal situation
where you only act as a proxy-arp gateway for fewer than a handful of stations,
this is a significant administrative overhead compared to just configuring
routing, which is required anyway.
Regards Henrik From zdenek@rcn.com Tue Jul 5 21:41:23 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 05 Jul 2005 21:41:26 -0700 (PDT) Received: from smtp05.mrf.mail.rcn.net (smtp05.mrf.mail.rcn.net [207.172.4.64]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j664fMH9012561 for ; Tue, 5 Jul 2005 21:41:23 -0700 Received: from 209-150-50-115.c3-0.nwt-ubr3.sbo-nwt.ma.cable.rcn.com (HELO funex) (209.150.50.115) by smtp05.mrf.mail.rcn.net with SMTP; 06 Jul 2005 00:39:47 -0400 Message-Id: <3u3gb7$1npg80@smtp05.mrf.mail.rcn.net> X-IronPort-AV: i="3.93,263,1115006400"; d="scan'208"; a="58507520:sNHT22991424" X-Sender: zdenek@pop.rcn.com X-Mailer: QUALCOMM Windows Eudora Pro Version 4.0 Date: Wed, 06 Jul 2005 00:32:44 -0400 To: Henrik Nordstrom From: Zdenek Radouch Subject: Re: controlling ARP Proxy scope? Cc: netdev@oss.sgi.com, linux-net@vger.kernel.org In-Reply-To: References: <3u3gb7$1no73u@smtp05.mrf.mail.rcn.net> <3u3gb7$1mhk2i@smtp05.mrf.mail.rcn.net> <3u3gb7$1mhk2i@smtp05.mrf.mail.rcn.net> <3u3gb7$1no73u@smtp05.mrf.mail.rcn.net> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" X-archive-position: 2655 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: zdenek@rcn.com Precedence: bulk X-list: netdev Content-Length: 4535 Lines: 108 At 04:20 AM 7/6/05 +0200, Henrik Nordstrom wrote: >On Tue, 5 Jul 2005, Zdenek Radouch wrote: > >>> proxy_arp simply ARPs if there is a route for the requested destination >>> going out on another interface than where the ARP was seen. >> >> In my case, the proxy replies to a request seen on the very same interface >> to which the route points to. > >Are you really sure on this? This part has always worked fine for me with >Linux proxy-arp and a large variety of different kernels. Yes, I am very sure about it. 
Of all attached networks, only one is a 10.* network, and the address
answered, 10.1.2.1, happens to be the main router/gateway to the rest
of the world. My machine (with proxy arp) incorrectly answers the ARP
requests (for 10.1.2.1) from other machines on the 10.1.2.* subnet.
Once it steals the role of the gateway, my machine then forwards the
packets to its own default gateway, which is behind the proxied
machines. As a result of this mess, I see foreign traffic diverted
through my proxy server via a private network to another gateway on
the other side.

>
>> I find the idea to proxy based on routing tables quite questionable.
>
>So do I. The manual proxy-arp entries method suits me much better, but is
>a pain due to lack of range support (probably why it got removed in 2.4)
>
>> It may work in some pretty trivial cases, but will very obviously fail
>> with a more complex configuration.
>
>Haven't managed to find a single situation not solveable yet.. and this
>involves pretty complex configurations.. I don't remember which of the
>sysctls mentioned earlier did the trick, but once enabled things starts to
>behave quite sanely even when there is multiple foreign networks
>unexpectedly carried on the same Ethernet. IIRC the settings I settled for
>was
>
>   arp_ignore = 1
>   arp_announce = 1
>
>> I have seven or eight networks attached to the node, and I certainly do
>> not want to proxy for every single address one may find in the routing
>> tables.
>
>Then don't.

Well, how do I tell it that I want to proxy for all machines on the
192.168.13.128/29 net attached to eth0.5, but not for any of the
machines on 192.168.2.0/24 attached to eth0.6?

It just occurred to me that if I misunderstood the semantics, the setup
may be wrong. I assumed that turning proxy arp on for interface X
would make interface X answer (proxy) the ARP queries. Is that
correct? (The alternative would be that turning the bit on for interface X
would make all other interfaces answer on behalf of X.)
> >> It is equally mind boggling to me how this could ever work with a stack >> allowing source-based routing, that is, a stack allowing coexistence of >> multiple, possibly conflicting routing tables. > >Why not? Because in rule-based routing, a table entries are only valid when the corresponding rule hits, based on the source address. In the absence of a source address as is the case of an ARP request, how could you possibly determine which of the routing tables should be consulted to decide whether the ARP query should be answered? > >> Sounds to me like I am going to have to rewrite the module. It needs to be >> configured manually > >Well, for most setups it does work automagically. Just bring up the >interfaces with the same IP, route the network out on the "main" interface >having most hosts and host (or subnet) route the other out the other >interface. ARP then follows automatically. > >But in messy networks or when your routing table is not correct then >sysctls is needed to restrict when to respond to stop you from responding >to ARP requests to outside/foreign networks. > >Probably isn't very hard to bring back the support for published proxy-arp >entries if needed. But without range support it's a pain to maitain in >most setups requiring proxy-arp as you then need an ARP entry for every >"other" station on each interface involved in proxy-arp, meaning that if >you proxy-arp a /24 network then you need 253 proxy-arp entries (one per >station, defining which interface it belongs on). In the normal situation >that you only act as a proxy-arp gateway for less than a handful stations >this is a significant administrative overhead compared to just configuring >routing which is required anyway. I agree that you wouldn't want to enter discrete addresses. But it could be a simple command using the standard subnet notation: arp_proxy --add 10.1.2.0/24 [11:22:33:44:55:66] (optional Eth address for 3rd party proxy ARP). 
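To make the idea concrete, here is a minimal sketch of the matching such a range-based table would need. Everything in it is hypothetical: there is no `arp_proxy` tool, and `proxy_range`, `parse_cidr` and `should_proxy` are names invented for illustration, not kernel interfaces. It assumes the semantics discussed above, that an entry applies only to requests arriving on the interface it was configured for.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical table entry: proxy for net/mask on one interface only. */
struct proxy_range {
    const char *ifname;   /* interface on which the ARP request arrives */
    uint32_t    net;      /* network address, host byte order */
    uint32_t    mask;     /* netmask, host byte order */
};

/* Parse "a.b.c.d/len" into net and mask; returns 0 on success, -1 on error. */
static int parse_cidr(const char *cidr, uint32_t *net, uint32_t *mask)
{
    char buf[INET_ADDRSTRLEN];
    const char *slash = strchr(cidr, '/');
    struct in_addr a;
    char *end;
    long len;

    if (!slash || (size_t)(slash - cidr) >= sizeof(buf))
        return -1;
    memcpy(buf, cidr, (size_t)(slash - cidr));
    buf[slash - cidr] = '\0';
    if (inet_pton(AF_INET, buf, &a) != 1)
        return -1;
    len = strtol(slash + 1, &end, 10);
    if (end == slash + 1 || *end != '\0' || len < 0 || len > 32)
        return -1;
    *mask = len ? ~(uint32_t)0 << (32 - len) : 0;
    *net = ntohl(a.s_addr) & *mask;
    return 0;
}

/* Answer only if the request came in on the entry's interface and the
 * target address falls inside the configured range. */
static int should_proxy(const struct proxy_range *tbl, size_t n,
                        const char *in_if, const char *target)
{
    struct in_addr a;
    uint32_t ip;

    assert(tbl != NULL || n == 0);
    if (inet_pton(AF_INET, target, &a) != 1)
        return 0;
    ip = ntohl(a.s_addr);
    for (size_t i = 0; i < n; i++)
        if (strcmp(tbl[i].ifname, in_if) == 0 &&
            (ip & tbl[i].mask) == tbl[i].net)
            return 1;
    return 0;
}
```

With one entry for 10.1.2.0/24 on eth0, a request for 10.1.2.77 seen on eth0 would be answered, while the same request seen on another interface, or a request for 10.1.3.1, would not.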
Regards -Zdenek From tgraf@suug.ch Wed Jul 6 05:43:27 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 06 Jul 2005 05:43:34 -0700 (PDT) Received: from postel.suug.ch (postel.suug.ch [195.134.158.23]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j66ChQH9031341 for ; Wed, 6 Jul 2005 05:43:27 -0700 Received: by postel.suug.ch (Postfix, from userid 10001) id C53701C0F3; Wed, 6 Jul 2005 14:42:06 +0200 (CEST) Date: Wed, 6 Jul 2005 14:42:06 +0200 From: Thomas Graf To: Eric Dumazet Cc: "David S. Miller" , netdev@oss.sgi.com Subject: Re: [PATCH] loop unrolling in net/sched/sch_generic.c Message-ID: <20050706124206.GW16076@postel.suug.ch> References: <20050705173411.GK16076@postel.suug.ch> <20050705.142210.14973612.davem@davemloft.net> <20050705213355.GM16076@postel.suug.ch> <20050705.143548.28788459.davem@davemloft.net> <42CB14B2.5090601@cosmosbay.com> <20050705234104.GR16076@postel.suug.ch> <42CB2698.2080904@cosmosbay.com> <42CB2B84.50702@cosmosbay.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <42CB2B84.50702@cosmosbay.com> X-archive-position: 2656 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: tgraf@suug.ch Precedence: bulk X-list: netdev Content-Length: 2350 Lines: 72 * Eric Dumazet <42CB2B84.50702@cosmosbay.com> 2005-07-06 02:53 A short recap after some coffee and sleep: The inital issue you brought up which you backed up with numbers is probably the cause of multiple wrong branch predictions due to the fact that I wrote skb = dequeue(); if (skb) which was assumed to be likely by the compiler. In your patch you fixed this with !skb_queue_empty() which fixed this wrong prediction and also acts as a little optimization due to skb_queue_empty() being really simple to implement for the compiler. 
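The two forms can be put side by side in a small userspace sketch. The list type and helpers below are stand-ins written for this illustration, not the kernel's sk_buff_head code; the only thing taken from the discussion is that an empty queue is a head pointing at itself. The sketch shows that both forms hand back the same element; which one predicts better depends on the compiler and CPU and is not something this code can demonstrate.

```c
#include <assert.h>
#include <stddef.h>

#define BANDS 3

/* Userspace stand-in for struct sk_buff_head: a circular list head. */
struct node {
    struct node *next, *prev;
};

static void queue_init(struct node *h)
{
    h->next = h->prev = h;
}

/* An empty queue is a head that points at itself. */
static int queue_empty(const struct node *h)
{
    return h->next == h;
}

static void enqueue_tail(struct node *h, struct node *n)
{
    assert(h && n);
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/* Returns NULL when the queue is empty. */
static struct node *dequeue_head(struct node *h)
{
    struct node *n = h->next;

    if (n == h)
        return NULL;
    h->next = n->next;
    n->next->prev = h;
    n->next = n->prev = NULL;
    return n;
}

/* Form 1: dequeue first, then test the returned pointer. */
static struct node *dequeue_then_test(struct node bands[BANDS])
{
    for (int prio = 0; prio < BANDS; prio++) {
        struct node *n = dequeue_head(&bands[prio]);

        if (n)
            return n;
    }
    return NULL;
}

/* Form 2: test for emptiness first, dequeue only on a hit. */
static struct node *test_then_dequeue(struct node bands[BANDS])
{
    for (int prio = 0; prio < BANDS; prio++)
        if (!queue_empty(&bands[prio]))
            return dequeue_head(&bands[prio]);
    return NULL;
}
```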
The patch I posted should give almost the same result. Apart from the additional wrong branch prediction for the loop, it always has one wrong prediction, which is when we hit a non-empty queue. In your unrolled version you could optimize it even more by taking advantage of the fact that prio=2 is the most likely non-empty queue, so you could change the check to likely() and save a wrong branch prediction for the common case at the cost of a branch misprediction if all queues are empty.

The patch I posted results in something like this:

pfifo_fast_dequeue:
        pushl   %ebx
        xorl    %ecx, %ecx
        movl    8(%esp), %ebx
        leal    128(%ebx), %edx
.L129:
        movl    (%edx), %eax
        cmpl    %edx, %eax
        jne     .L132           ; if (!skb_queue_empty())
        incl    %ecx
        addl    $20, %edx
        cmpl    $2, %ecx
        jle     .L129           ; end of loop
        xorl    %eax, %eax      ; all queues empty
.L117:
        popl    %ebx
        ret

I regard the miss here as acceptable for the increased flexibility we get. It can be optimized with loop unrolling, but my opinion is to try and avoid it if possible.

Now your second thought is quite interesting, although it heavily depends on the fact that prio=2 is the most often used band. It will be interesting to see some numbers.

> static struct sk_buff *
> pfifo_fast_dequeue(struct Qdisc* qdisc)
> {
>         struct sk_buff_head *list = qdisc_priv(qdisc);
>         struct sk_buff_head *best = NULL;
>
>         list += 2;
>         if (!skb_queue_empty(list))
>                 best = list;
>         list--;
>         if (!skb_queue_empty(list))
>                 best = list;
>         list--;
>         if (!skb_queue_empty(list))
>                 best = list;

Here is what I mean, a likely() should be even better.

> if (best) { > qdisc->q.qlen--; > return __skb_dequeue(best); > } > return NULL; > } From hno@marasystems.com Wed Jul 6 08:33:59 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 06 Jul 2005 08:34:04 -0700 (PDT) Received: from filer.marasystems.com (marasystems.com [83.241.133.2]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j66FXwH9020140 for ; Wed, 6 Jul 2005 08:33:58 -0700 Received: from localhost (henrik@localhost) by filer.marasystems.com (8.11.6/8.11.6) with ESMTP id j66FWGq06815; Wed, 6 Jul 2005 17:32:16 +0200 Date: Wed, 6 Jul 2005 17:32:16 +0200 (CEST) From: Henrik Nordstrom To: Zdenek Radouch cc: netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: controlling ARP Proxy scope? In-Reply-To: <3u3gb7$1npg80@smtp05.mrf.mail.rcn.net> Message-ID: References: <3u3gb7$1no73u@smtp05.mrf.mail.rcn.net> <3u3gb7$1mhk2i@smtp05.mrf.mail.rcn.net> <3u3gb7$1mhk2i@smtp05.mrf.mail.rcn.net> <3u3gb7$1no73u@smtp05.mrf.mail.rcn.net> <3u3gb7$1npg80@smtp05.mrf.mail.rcn.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-archive-position: 2657 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hno@marasystems.com Precedence: bulk X-list: netdev Content-Length: 2642 Lines: 62 On Wed, 6 Jul 2005, Zdenek Radouch wrote: > Well, how do I tell it that I want to proxy for all machines on the > 192.168.13.128/29 net attached to eth0.5, but not for any of > the machines on 192.168.2.0/24 attached to eth0.6 ? For whom going where? What is eth0.5? A tagged 802.1q vlan on eth0, or something else? (i.e. is it a interface of it's own according to ip link show, or just an alias?) > It just occured to me that if I misunderstood the semantics, the setup > may be wrong. I assumed that turning the proxy arp on on interface X > would make the interface X answer (proxy) the ARP queries. > Is that correct? Yes. 
>>> It is equally mind boggling to me how this could ever work with a stack >>> allowing source-based routing, that is, a stack allowing coexistence of >>> multiple, possibly conflicting routing tables. >> >> Why not? > > Because in rule-based routing, a table entries are only valid when the > corresponding > rule hits, based on the source address. In the absence of a source address > as is the case of an ARP request, how could you possibly determine which > of the routing tables should be consulted to decide whether the ARP query > should be answered? ARP requests do have valid source addresses, at least the normal ARP queries. The duplicate IP check ARP requests and a few other obscure cases don't. > I agree that you wouldn't want to enter discrete addresses. > But it could be a simple command using the standard subnet notation: > > arp_proxy --add 10.1.2.0/24 [11:22:33:44:55:66] > (optional Eth address for 3rd party proxy ARP). Which to my experience is all automatic from the routing table. But I do remember some slight complications in source routed networks but nothing major. Before arp_ignore existed I often used source routing as a substitute for keeping proxy-ARP at bay but I don't remember the exact details. I only experienced problems when there was other IP networks on the same Ethernet but not known to the box. In this case I had to enable the arp_ignore sysctl to stop answering "other" ARP queries for networks not defined on the same Ethernet interface. I also used the arp_announce to keep the source address of ARP responses under control. In addition in a setup which involved stations with IP addresses identical to that of one of my own other Ethernet interfaces I had to hack the kernel slightly to make it possible to ignore ARP duplicate IP check packets on the relevant interfaces. But this is very special setup involving NAT and other nastinesses which shouldn't be encountered in any reasonably normal network situation. 
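As a rough illustration of what arp_ignore = 1 changes, here is a small userspace model. It assumes the documented meanings of modes 0 and 1 for addresses owned by the box itself (0: reply for a local target address configured on any interface; 1: reply only if the target address is configured on the incoming interface). It does not model proxy ARP, arp_announce, or the kernel's actual code path, and the struct and function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One locally configured address and the interface it lives on. */
struct local_addr {
    const char *ifname;
    const char *addr;
};

/*
 * Model of the reply decision for a target address that is local:
 *   mode 0: reply if the target is configured on any interface
 *   mode 1: reply only if the target is configured on the incoming interface
 * Other modes are not modelled and never reply here.
 */
static int arp_reply_allowed(int mode, const struct local_addr *tbl, size_t n,
                             const char *in_if, const char *target)
{
    assert(tbl != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(tbl[i].addr, target) != 0)
            continue;
        if (mode == 0)
            return 1;
        if (mode == 1 && strcmp(tbl[i].ifname, in_if) == 0)
            return 1;
    }
    return 0;
}
```

In this model, a query for an address that lives on eth1 is answered on eth0 in mode 0 but not in mode 1, which is the kind of cross-interface answer described above.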
Regards Henrik From mitch.a.williams@intel.com Wed Jul 6 11:39:44 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 06 Jul 2005 11:40:49 -0700 (PDT) Received: from orsfmr002.jf.intel.com (fmr17.intel.com [134.134.136.16]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j66IdhH9009649 for ; Wed, 6 Jul 2005 11:39:43 -0700 Received: from orsfmr101.jf.intel.com (orsfmr101.jf.intel.com [10.7.209.17]) by orsfmr002.jf.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1 2004/09/17 17:50:56 root Exp $) with ESMTP id j66IbrYn026215; Wed, 6 Jul 2005 18:37:53 GMT Received: from nwlxmail01.jf.intel.com (nwlxmail01.jf.intel.com [10.7.171.40]) by orsfmr101.jf.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2 2004/09/17 18:05:01 root Exp $) with ESMTP id j66Ibrqx025701; Wed, 6 Jul 2005 18:37:53 GMT Received: from mawilli1-desk2.amr.corp.intel.com (mawilli1-desk2.amr.corp.intel.com [134.134.3.58]) by nwlxmail01.jf.intel.com (8.12.10/8.12.9/MailSET/Hub) with ESMTP id j66IboSL026134; Wed, 6 Jul 2005 11:37:50 -0700 Date: Wed, 6 Jul 2005 11:37:49 -0700 From: Mitch Williams X-X-Sender: mawilli1@mawilli1-desk2.amr.corp.intel.com To: Dmitry Torokhov cc: netdev@oss.sgi.com, Radheka Godse , fubar@us.ibm.com, bonding-devel@lists.sourceforge.net Subject: Re: [PATCH 2.6.13-rc1 8/17] bonding: SYSFS INTERFACE (large) In-Reply-To: <200507020030.03635.dtor_core@ameritech.net> Message-ID: References: <200507020030.03635.dtor_core@ameritech.net> ReplyTo: "Mitch Williams" MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Scanned-By: MIMEDefang 2.44 X-archive-position: 2658 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mitch.a.williams@intel.com Precedence: bulk X-list: netdev Content-Length: 1242 Lines: 40 On Sat, 2 Jul 2005, Dmitry Torokhov wrote: > > Couple of comments: [snip] > > + > > +static struct class *netdev_class; > > +/*--------------------------- Data Structures 
-----------------------------*/ > > + > > +/* Bonding sysfs lock. Why can't we just use the subsytem lock? > > + * Because kobject_register tries to acquire the subsystem lock. If > > + * we already hold the lock (which we would if the user was creating > > + * a new bond through the sysfs interface), we deadlock. > > + */ > > + > > +struct rw_semaphore bonding_rwsem; > > klists were just added to the kernel proper. Does this sentiment still > holds true? Thanks for reviewing this patch, Dmitry. We appreciate your efforts, and we'll make the changes you pointed out. In this case, we hold the lock on access to all bonding-owned sysfs files, because it's possible for changes to one file to alter the contents and/or presence of another file. Consider: 1) process 'foo' opens /sys/class/net/bond1/mode 2) process 'bar' opens /sys/class/net/bonding_masters 3) process 'bar' writes to bonding_masters and removes bond1 4) process 'foo' tries to write 5) Boom. Or rather, oops. Thus, we have this lock. I don't think that klists will help here. 
-Mitch From mitch.a.williams@intel.com Wed Jul 6 11:55:42 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 06 Jul 2005 11:55:45 -0700 (PDT) Received: from orsfmr005.jf.intel.com (fmr20.intel.com [134.134.136.19]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j66ItgH9011667 for ; Wed, 6 Jul 2005 11:55:42 -0700 Received: from orsfmr101.jf.intel.com (orsfmr101.jf.intel.com [10.7.209.17]) by orsfmr005.jf.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1 2004/09/17 17:50:56 root Exp $) with ESMTP id j66Is4hn008811; Wed, 6 Jul 2005 18:54:04 GMT Received: from nwlxmail01.jf.intel.com (nwlxmail01.jf.intel.com [10.7.171.40]) by orsfmr101.jf.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2 2004/09/17 18:05:01 root Exp $) with ESMTP id j66Is4qx007518; Wed, 6 Jul 2005 18:54:04 GMT Received: from mawilli1-desk2.amr.corp.intel.com (mawilli1-desk2.amr.corp.intel.com [134.134.3.58]) by nwlxmail01.jf.intel.com (8.12.10/8.12.9/MailSET/Hub) with ESMTP id j66IrSSL027116; Wed, 6 Jul 2005 11:53:38 -0700 Date: Wed, 6 Jul 2005 11:53:13 -0700 From: Mitch Williams X-X-Sender: mawilli1@mawilli1-desk2.amr.corp.intel.com To: Greg KH cc: Radheka Godse , netdev@oss.sgi.com Subject: Re: [PATCH 2.6.13-rc1 8/17] bonding: SYSFS INTERFACE (large) In-Reply-To: <20050702081346.GA20789@kroah.com> Message-ID: References: <20050702081346.GA20789@kroah.com> ReplyTo: "Mitch Williams" MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Scanned-By: MIMEDefang 2.44 X-archive-position: 2659 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mitch.a.williams@intel.com Precedence: bulk X-list: netdev Content-Length: 1367 Lines: 42 On Sat, 2 Jul 2005, Greg KH wrote: > > This violates the 1-value-per-sysfs file rule. Please fix this up. > Thanks for looking at our patch, Greg. We're aware of the "one value" rule, but we really couldn't find any way to do what we wanted to do any other way. 
The kernel docs do indicate that it is "socially acceptable to express an array of values of the same type", which this certainly is.

In this particular case, the file /sys/class/net/bonding_masters contains the names of all of the bonds in the system. By default, the module creates a single bond when it loads, thus:

    $ cat bonding_masters
    bond0

You can add and remove bonds just by writing to the file. In keeping with the "array of types" concept, you must write the names of all active bonds back to the file. Thus,

    $ echo "bond0 bond1" > bonding_masters

retains bond0 and adds bond1. Likewise,

    $ echo "bond1 bond2" > bonding_masters

retains bond1, deletes bond0, and adds bond2.

The slaves file in each bond's directory acts the same way, and is used to add or remove slaves from each individual bond.

We discussed this design extensively before implementation, but really couldn't come up with anything as elegant or easy to understand as this scheme. Since it really is an array of similar values, we are hoping that it will be viewed as socially acceptable.

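The write semantics described above amount to a set difference between the current bonds and the names written. The following is a userspace sketch of that logic only, not the code from the patch: `bond_set`, `parse_names` and `diff_sets` are hypothetical names, and the fixed limits are arbitrary choices for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_BONDS 16
#define NAME_LEN  16   /* 15 characters plus the terminating NUL */

struct bond_set {
    char names[MAX_BONDS][NAME_LEN];
    int  count;
};

static int set_contains(const struct bond_set *s, const char *name)
{
    for (int i = 0; i < s->count; i++)
        if (strcmp(s->names[i], name) == 0)
            return 1;
    return 0;
}

/* Split a whitespace-separated list of names, dropping duplicates.
 * Names longer than 15 characters are split by sscanf; a real parser
 * would reject them. Returns -1 if more than MAX_BONDS names are given. */
static int parse_names(const char *buf, struct bond_set *out)
{
    char name[NAME_LEN];
    int used;

    out->count = 0;
    while (sscanf(buf, " %15s%n", name, &used) == 1) {
        buf += used;
        if (set_contains(out, name))
            continue;
        if (out->count == MAX_BONDS)
            return -1;
        strcpy(out->names[out->count++], name);
    }
    return 0;
}

/* to_add = wanted minus current, to_del = current minus wanted. */
static void diff_sets(const struct bond_set *current,
                      const struct bond_set *wanted,
                      struct bond_set *to_add, struct bond_set *to_del)
{
    assert(current && wanted && to_add && to_del);
    to_add->count = to_del->count = 0;
    for (int i = 0; i < wanted->count; i++)
        if (!set_contains(current, wanted->names[i]))
            strcpy(to_add->names[to_add->count++], wanted->names[i]);
    for (int i = 0; i < current->count; i++)
        if (!set_contains(wanted, current->names[i]))
            strcpy(to_del->names[to_del->count++], current->names[i]);
}
```

Feeding it a current set of "bond0 bond1" and a written string of "bond1 bond2" yields bond2 to add and bond0 to delete, matching the second echo example.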
-Mitch From shemminger@osdl.org Wed Jul 6 12:03:48 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 06 Jul 2005 12:03:52 -0700 (PDT) Received: from smtp.osdl.org (smtp.osdl.org [65.172.181.4]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j66J3lH9014430 for ; Wed, 6 Jul 2005 12:03:47 -0700 Received: from shell0.pdx.osdl.net (fw.osdl.org [65.172.181.6]) by smtp.osdl.org (8.12.8/8.12.8) with ESMTP id j66J1vjA021193 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Wed, 6 Jul 2005 12:01:57 -0700 Received: from dxpl.pdx.osdl.net (dxpl.pdx.osdl.net [10.8.0.74]) by shell0.pdx.osdl.net (8.13.1/8.11.6) with ESMTP id j66J1vBh016852; Wed, 6 Jul 2005 12:01:57 -0700 Date: Wed, 6 Jul 2005 12:02:02 -0700 From: Stephen Hemminger To: Mitch Williams Cc: Dmitry Torokhov , netdev@oss.sgi.com, Radheka Godse , fubar@us.ibm.com, bonding-devel@lists.sourceforge.net Subject: Re: [PATCH 2.6.13-rc1 8/17] bonding: SYSFS INTERFACE (large) Message-ID: <20050706120202.6a048b14@dxpl.pdx.osdl.net> In-Reply-To: References: <200507020030.03635.dtor_core@ameritech.net> Organization: Open Source Development Lab X-Mailer: Sylpheed-Claws 1.0.5 (GTK+ 1.2.10; x86_64-unknown-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-MIMEDefang-Filter: osdl$Revision: 1.111 $ X-Scanned-By: MIMEDefang 2.36 X-archive-position: 2660 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev Content-Length: 1459 Lines: 43 On Wed, 6 Jul 2005 11:37:49 -0700 Mitch Williams wrote: > > > On Sat, 2 Jul 2005, Dmitry Torokhov wrote: > > > > > Couple of comments: > [snip] > > > > + > > > +static struct class *netdev_class; > > > +/*--------------------------- Data 
Structures -----------------------------*/ > > > + > > > +/* Bonding sysfs lock. Why can't we just use the subsytem lock? > > > + * Because kobject_register tries to acquire the subsystem lock. If > > > + * we already hold the lock (which we would if the user was creating > > > + * a new bond through the sysfs interface), we deadlock. > > > + */ > > > + > > > +struct rw_semaphore bonding_rwsem; > > > > klists were just added to the kernel proper. Does this sentiment still > > holds true? > > > Thanks for reviewing this patch, Dmitry. We appreciate your efforts, and > we'll make the changes you pointed out. > > In this case, we hold the lock on access to all bonding-owned sysfs files, > because it's possible for changes to one file to alter the contents and/or > presence of another file. Consider: > > 1) process 'foo' opens /sys/class/net/bond1/mode > 2) process 'bar' opens /sys/class/net/bonding_masters > 3) process 'bar' writes to bonding_masters and removes bond1 > 4) process 'foo' tries to write > 5) Boom. Or rather, oops. > > Thus, we have this lock. I don't think that klists will help here. You need to use kobject ref counting then, not the semaphore. 
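A minimal userspace sketch of the reference-counting idea, assuming C11 atomics: the helpers below only play the role that kobject_get() and kobject_put() play in the kernel and are not the kobject API. `bond_obj` and its fields are invented for the example, and the `released` flag stands in for actually freeing the object.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for a refcounted object such as a bond. */
struct bond_obj {
    atomic_int refs;
    bool       unlinked;   /* no longer listed, but memory still valid */
    bool       released;   /* set by the last put; stands in for free() */
};

static void bond_init(struct bond_obj *b)
{
    atomic_init(&b->refs, 1);   /* reference owned by the listing */
    b->unlinked = false;
    b->released = false;
}

static void bond_get(struct bond_obj *b)
{
    atomic_fetch_add(&b->refs, 1);
}

/* Drop one reference; the last holder triggers the release. */
static void bond_put(struct bond_obj *b)
{
    if (atomic_fetch_sub(&b->refs, 1) == 1)
        b->released = true;
}

/* "bar" removes the bond: unlink it and drop the listing's reference. */
static void bond_remove(struct bond_obj *b)
{
    b->unlinked = true;
    bond_put(b);
}

/* "foo" writes through a reference it took at open time. */
static int bond_write(struct bond_obj *b)
{
    assert(!b->released);          /* holds while the caller has a ref */
    return b->unlinked ? -1 : 0;   /* fail cleanly instead of oopsing */
}
```

In the foo/bar scenario, foo's reference keeps the object valid after bar removes the bond, so foo's write can return an error rather than touch freed memory; the object is released only when foo drops its reference.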
From dtor_core@ameritech.net Wed Jul 6 12:10:45 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 06 Jul 2005 12:10:49 -0700 (PDT) Received: from web81301.mail.yahoo.com (web81301.mail.yahoo.com [206.190.37.76]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id j66JAiH9016798 for ; Wed, 6 Jul 2005 12:10:45 -0700 Received: (qmail 74130 invoked by uid 60001); 6 Jul 2005 19:09:06 -0000 Message-ID: <20050706190906.74128.qmail@web81301.mail.yahoo.com> Received: from [167.167.7.254] by web81301.mail.yahoo.com via HTTP; Wed, 06 Jul 2005 12:09:06 PDT Date: Wed, 6 Jul 2005 12:09:06 -0700 (PDT) From: Dmitry Torokhov Subject: Re: [PATCH 2.6.13-rc1 8/17] bonding: SYSFS INTERFACE (large) To: Mitch Williams Cc: netdev@oss.sgi.com, Radheka Godse , fubar@us.ibm.com, bonding-devel@lists.sourceforge.net In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit X-archive-position: 2661 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: dtor_core@ameritech.net Precedence: bulk X-list: netdev Content-Length: 1834 Lines: 56 Mitch Williams wrote: > On Sat, 2 Jul 2005, Dmitry Torokhov wrote: > > > > > Couple of comments: > [snip] > > > > + > > > +static struct class *netdev_class; > > > +/*--------------------------- Data Structures > -----------------------------*/ > > > + > > > +/* Bonding sysfs lock. Why can't we just use the subsytem lock? > > > + * Because kobject_register tries to acquire the subsystem lock. If > > > + * we already hold the lock (which we would if the user was creating > > > + * a new bond through the sysfs interface), we deadlock. > > > + */ > > > + > > > +struct rw_semaphore bonding_rwsem; > > > > klists were just added to the kernel proper. Does this sentiment still > > holds true? > > > Thanks for reviewing this patch, Dmitry. We appreciate your efforts, and > we'll make the changes you pointed out. 
>
> In this case, we hold the lock on access to all bonding-owned sysfs files,
> because it's possible for changes to one file to alter the contents and/or
> presence of another file. Consider:
>
> 1) process 'foo' opens /sys/class/net/bond1/mode
> 2) process 'bar' opens /sys/class/net/bonding_masters
> 3) process 'bar' writes to bonding_masters and removes bond1
> 4) process 'foo' tries to write
> 5) Boom. Or rather, oops.
>
> Thus, we have this lock. I don't think that klists will help here.
>

A semaphore will not help in the scenario you described:

1) process 'bar' opens /sys/class/net/bonding_masters
2) process 'foo' opens /sys/class/net/bond1/mode
3) process 'bar' starts to write to bonding_masters and acquires the semaphore
4) process 'foo' tries to write and waits for the semaphore to become available
5) process 'bar' finishes writing and removes bond1
6) process 'foo' acquires the semaphore and continues with the write
7) Boom. Or rather, oops.

--
Dmitry

From greg@kroah.com Wed Jul 6 13:21:05 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 06 Jul 2005 13:21:07 -0700 (PDT) Received: from perch.kroah.org (mail.kroah.org [69.55.234.183]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j66KL4H9028234 for ; Wed, 6 Jul 2005 13:21:05 -0700 Received: from [192.168.0.10] (c-24-22-115-24.hsd1.or.comcast.net [24.22.115.24]) (authenticated) by perch.kroah.org (8.11.6/8.11.6) with ESMTP id j66KJ2q31217; Wed, 6 Jul 2005 13:19:02 -0700 Received: from greg by echidna.kroah.org with local (masqmail 0.2.19) id 1DqFwe-4rB-00; Wed, 06 Jul 2005 12:52:32 -0700 Date: Wed, 6 Jul 2005 12:52:32 -0700 From: Greg KH To: Mitch Williams Cc: Radheka Godse , netdev@oss.sgi.com Subject: Re: [PATCH 2.6.13-rc1 8/17] bonding: SYSFS INTERFACE (large) Message-ID: <20050706195232.GB18359@kroah.com> References: <20050702081346.GA20789@kroah.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.8i X-archive-position:
2662 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: greg@kroah.com Precedence: bulk X-list: netdev Content-Length: 2129 Lines: 66

On Wed, Jul 06, 2005 at 11:53:13AM -0700, Mitch Williams wrote:
>
>
> On Sat, 2 Jul 2005, Greg KH wrote:
>
> >
> > This violates the 1-value-per-sysfs file rule. Please fix this up.
> >
>
> Thanks for looking at our patch, Greg. We're aware of the "one value"
> rule, but we really couldn't find any way to do what we wanted to do
> any other way. The kernel docs do indicate that it is "socially
> acceptable to express an array of values of the same type",
> which this certainly is.
>
> In this particular case, the file /sys/class/net/bonding_masters contains
> the names of all of the bonds in the system. By default, the module
> creates a single bond when it loads, thus:
>
> $ cat bonding_masters
> bond0

And, if you have a _lot_ of bonds, you will not show them all, right? That would not work well if you read the file, and then append a new one and write it back.

> You can add and remove bonds just by writing to the file. In keeping with
> the "array of types" concept, you must write the names of all active bonds
> back to the file. Thus,
>
> $ echo "bond0 bond1" > bonding_masters
>
> retains bond0 and adds bond1. Likewise,
>
> $ echo "bond1 bond2" > bonding_masters
>
> retains bond1, deletes bond0, and adds bond2.
>
> The slaves file in each bond's directory acts the same way, and is used to
> add or remove slaves from each individual bond.
>
> We discussed this design extensively before implementation, but really
> couldn't come up with anything as elegant or easy to understand as this
> scheme. Since it really is an array of similar values, we are hoping that
> it will be viewed as socially acceptable.

No.

How about this:
    bond_add - write to this to add a new bond, one value only.
    bond_remove - write to this to remove a bond that is present.
    bonds/bond0
    bonds/bond1
    bonds/bond2
    ...
        - list of bonds currently present. If you want, you
          could make those bondX files directories, and put
          other info about the individual bonds in there, if you
          need it (I know nothing about the bonding interface,
          sorry.)

Would that work?

thanks,

greg k-h

From szilm@presinet.com Wed Jul 6 15:29:00 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 06 Jul 2005 15:29:05 -0700 (PDT) Received: from presinet-main.presinet.com (presinet.com [209.53.156.1]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j66MT0H9004874 for ; Wed, 6 Jul 2005 15:29:00 -0700 Received: from [10.10.1.152] (10.10.1.152 [10.10.1.152]) by presinet-main.presinet.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2658.3) id 3HW05PPG; Wed, 6 Jul 2005 15:23:12 -0700 Mime-Version: 1.0 (Apple Message framework v730) Content-Transfer-Encoding: 7bit Message-Id: Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed To: netdev@oss.sgi.com From: Stuart Zilm Subject: policy routing inconsistency Date: Wed, 6 Jul 2005 15:27:19 -0700 X-Mailer: Apple Mail (2.730) X-archive-position: 2663 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: szilm@presinet.com Precedence: bulk X-list: netdev Content-Length: 1382 Lines: 39

While trying to do some policy routing recently I discovered an inconsistency in the behavior that selects the source address by route for locally generated outgoing packets. It seems that while routing occurs through the full policy database (rules and routes), the route's source address is always looked up in the main routing table.

For example:

DST=10.10.1.1
SRC=10.10.1.2
ip address add $SRC dev eth0 # this is a secondary address on the interface

# this works - the source selected is $SRC
ip route add $DST dev eth0 src $SRC # implicit table main

# this fails - the source selected is chosen from main
ip route del $DST dev eth0 src $SRC # implicit table main - NOTE: if this route remains, this source address will be chosen (from table main!)
ip route add $SRC dev eth0 src $SRC table 1
ip rule add fwmark 1 table 1
iptables -t mangle -A OUTPUT -d $DST -j MARK --set-mark 1

I expected my source to come from the route that matches and routes my packets. Instead, it seems like there is a separate lookup done on table main directly to select the source. The behavior is the same on linux 2.4.30 and 2.6.8 kernels.

Is this done intentionally? What I hoped to achieve was the ability to have two routes to the same host, using different source addresses and select routes based on packet marks. Is that possible?

Stuart Zilm
PresiNET Systems

From mwallis@serialmonkey.com Wed Jul 6 16:36:25 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 06 Jul 2005 16:36:31 -0700 (PDT) Received: from mail.qvalent.com (qvfw1.qvalent.com [202.7.65.65]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j66NaMH9013604 for ; Wed, 6 Jul 2005 16:36:25 -0700 Received: from qvwallis (unknown [192.168.40.150]) by mail.qvalent.com (Postfix) with ESMTP id DCECA53 for ; Thu, 7 Jul 2005 10:23:28 +1000 (EST) From: "Mark Wallis" To: "'NetDev'" Subject: FW: ieee80211 patches Date: Thu, 7 Jul 2005 09:30:34 +1000 MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Mailer: Microsoft Office Outlook, Build 11.0.6353 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2527 Thread-Index: AcWBG9pEhVptwaQXReSIqwktGFpe4gBZsb5g Message-Id: <20050707002328.DCECA53@mail.qvalent.com> X-archive-position: 2664 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to:
netdev-bounce@oss.sgi.com X-original-sender: mwallis@serialmonkey.com Precedence: bulk X-list: netdev Content-Length: 938 Lines: 33 Hi everyone, On 28/06/2005, at 10:23 PM, Jiri Benc wrote: Our patches against latest ieee80211 branch of netdev tree can be found at http://forge.novell.com/modules/xfmod/cvs/cvsbrowse.php/ieee80211/patches-up stream/ (it is possible to download a tarball from this link too). Should we be expecting to see these patches in the net-dev GIT repository anytime soon ? We are basing our new Ralink driver off the ieee80211 common stack but haven't seen any commits in there for awhile (even know patches have been posted here on net-dev). Is there are more appropriate way for us to be keeping up with the latest ieee80211 stack other than the net-dev GIT repository ? Another repository we are not aware of. I apologise is this is a silly question and there is another repository that exists that we just don't know of. ---- Regards, Mark Wallis rt2x00 Project Leader mwallis@serialmonkey.com http://rt2x00.serialmonkey.com From zdenek@rcn.com Wed Jul 6 19:55:16 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 06 Jul 2005 19:55:22 -0700 (PDT) Received: from smtp05.mrf.mail.rcn.net (smtp05.mrf.mail.rcn.net [207.172.4.64]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j672tFH9026492 for ; Wed, 6 Jul 2005 19:55:15 -0700 Received: from 209-150-50-115.c3-0.nwt-ubr3.sbo-nwt.ma.cable.rcn.com (HELO funex) (209.150.50.115) by smtp05.mrf.mail.rcn.net with SMTP; 06 Jul 2005 22:53:37 -0400 Message-Id: <3u3gb7$1o5sb5@smtp05.mrf.mail.rcn.net> X-IronPort-AV: i="3.93,267,1115006400"; d="scan'208"; a="58913125:sNHT23007186" X-Sender: zdenek@pop.rcn.com X-Mailer: QUALCOMM Windows Eudora Pro Version 4.0 Date: Wed, 06 Jul 2005 22:46:33 -0400 To: Henrik Nordstrom From: Zdenek Radouch Subject: Re: controlling ARP Proxy scope? 
Cc: netdev@oss.sgi.com, linux-net@vger.kernel.org In-Reply-To: References: <3u3gb7$1npg80@smtp05.mrf.mail.rcn.net> <3u3gb7$1no73u@smtp05.mrf.mail.rcn.net> <3u3gb7$1mhk2i@smtp05.mrf.mail.rcn.net> <3u3gb7$1mhk2i@smtp05.mrf.mail.rcn.net> <3u3gb7$1no73u@smtp05.mrf.mail.rcn.net> <3u3gb7$1npg80@smtp05.mrf.mail.rcn.net> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" X-archive-position: 2665 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: zdenek@rcn.com Precedence: bulk X-list: netdev Content-Length: 3279 Lines: 69 At 05:32 PM 7/6/05 +0200, Henrik Nordstrom wrote: >On Wed, 6 Jul 2005, Zdenek Radouch wrote: > >> Well, how do I tell it that I want to proxy for all machines on the >> 192.168.13.128/29 net attached to eth0.5, but not for any of >> the machines on 192.168.2.0/24 attached to eth0.6 ? > >For whom going where? The setup is actually quite complex. I have a linear array of machines communicating over a proprietary (L1 and L2) protocol. The machines are connected point-to-point, and they use a proprietary VLAN protocol below IP in order to be able to control bandwidth (QoS). One of these VLANs (vsc1) is meant to provide external access. For purposes of redundant access, each node has two external addresses Ai and Bi on vsc1, set up as aliases vsc1:0 and vsc1:1. The two addresses are meant to allow access from the left end of the array (Ai) and from the right end of the array (Bi), so even in case of a line failure one could access all of the nodes. Only the two edge nodes (left-most and right-most) are connected to the [outside] customer network - they are connected via Ethernet (specifically 802.1q vlans) interfaces eth0.1 or eth0.2. Additionally, each node has multiple CPUs communicating on a private network (Ethernet/802.1q) via interface eth0.5 or eth0.6. 
You can ignore the point-to-point nature of the array interconnect - I have designed an L2 layer that hides this, making it appear as a bus-style network except there is no ARP (it is not needed). So in the left-most node for example, I have the following interfaces: eth0.1 A1/32 // connection from the customer LAN to the array (left access) eth0.5 Private1/29 // private intra-node interconnect vsc0 Private2/28 // private array interconnect vsc1:0 A1/28 // public access to the array (from the left) vsc1:1 B1/28 // public access to the array (from the right) The routing table has, in addition to the obvious, a default route via eth0.1 to an address Ax (router attached to the left-most node). [It is really supposed to have two rule-based routes to be able to return the packet in the direction it came, but for some reason the ip rule command does not work, and I have not had a chance to debug that yet]. The purpose of the proxy ARP is to proxy for the ARP-less nodes (on vsc1) hidden behind this node, when they are accessed from the left, i.e., using the Ai addresses, via eth0.1 which is the only public interface here. So I turned on the proxy ARP on eth0.1. My question was, if the proxy ARP is based on the routing table, then how do I do I tell it that I want to proxy only for the Ai addresses, and not for the Bi or any of the private addresses? And the problem I was observing is that this node proxies for the Ax address, when the requests for it are seen on eth0.1. These are legitimate requests of nodes out there trying to talk to the router, not to my array. > >I only experienced problems when there was other IP networks on the same >Ethernet but not known to the box. In this case I had to enable the >arp_ignore sysctl to stop answering "other" ARP queries for networks not >defined on the same Ethernet interface. This may actually be my problem. The Ax address the proxy ARP wrongly answers is not part of the eth0.1 subnet. 
Regards -Zdenek From linville@bilbo.tuxdriver.com Thu Jul 7 07:27:34 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 07 Jul 2005 07:27:38 -0700 (PDT) Received: from ra.tuxdriver.com (ra.tuxdriver.com [24.172.12.4]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j67ERXH9009991 for ; Thu, 7 Jul 2005 07:27:34 -0700 Received: from bilbo.tuxdriver.com (azure.tuxdriver.com [24.172.12.5]) by ra.tuxdriver.com (8.13.3/8.13.3) with ESMTP id j67EMBOI021618 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 7 Jul 2005 10:22:12 -0400 Received: from bilbo.tuxdriver.com (localhost.localdomain [127.0.0.1]) by bilbo.tuxdriver.com (8.13.1/8.13.1) with ESMTP id j67EPmPM010078; Thu, 7 Jul 2005 10:25:48 -0400 Received: (from linville@localhost) by bilbo.tuxdriver.com (8.13.1/8.13.1/Submit) id j67EPmZS010077; Thu, 7 Jul 2005 10:25:48 -0400 Date: Thu, 7 Jul 2005 10:25:48 -0400 From: "John W. Linville" To: Greg KH Cc: Mitch Williams , Radheka Godse , netdev@oss.sgi.com Subject: Re: [PATCH 2.6.13-rc1 8/17] bonding: SYSFS INTERFACE (large) Message-ID: <20050707142544.GA9418@tuxdriver.com> Mail-Followup-To: Greg KH , Mitch Williams , Radheka Godse , netdev@oss.sgi.com References: <20050702081346.GA20789@kroah.com> <20050706195232.GB18359@kroah.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050706195232.GB18359@kroah.com> User-Agent: Mutt/1.4.1i X-archive-position: 2667 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: linville@tuxdriver.com Precedence: bulk X-list: netdev Content-Length: 887 Lines: 30 On Wed, Jul 06, 2005 at 12:52:32PM -0700, Greg KH wrote: > On Wed, Jul 06, 2005 at 11:53:13AM -0700, Mitch Williams wrote: > > scheme. Since it really is an array of similar values, we are hoping that > > it will be viewed as socially acceptable. > > No. 
>
> How about this:
>     bond_add - write to this to add a new bond, one value only.
>     bond_remove - write to this to remove a bond that is present.
>     bonds/bond0
>     bonds/bond1
>     bonds/bond2
>     ...
>         - list of bonds currently present. If you want, you
>           could make those bondX files directories, and put
>           other info about the individual bonds in there, if you
>           need it (I know nothing about the bonding interface,
>           sorry.)
>
> Would that work?

I like that suggestion. It keeps the interface creation/deletion a
little more independent of each other.

John
-- 
John W. Linville
linville@tuxdriver.com

From webmaster@rapidforum.com Thu Jul 7 08:54:26 2005
Received: with ECARTIS (v1.0.0; list netdev); Thu, 07 Jul 2005 08:54:29 -0700 (PDT)
Received: from rapidforum.com (www.rapidforum.com [80.237.244.2]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id j67FsPH9022523 for ; Thu, 7 Jul 2005 08:54:26 -0700
Received: (qmail 24479 invoked by uid 1004); 7 Jul 2005 15:52:47 -0000
Received: from p549f0a42.dip0.t-ipconnect.de (HELO ?84.159.10.66?) (dragony@84.159.10.66) by www.rapidforum.com with SMTP; 7 Jul 2005 15:52:47 -0000
Message-ID: <42CD4F8F.9060503@rapidforum.com>
Date: Thu, 07 Jul 2005 17:51:43 +0200
From: Christian Schmid
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.7) Gecko/20050414
X-Accept-Language: de, en
MIME-Version: 1.0
To: netdev@oss.sgi.com
Subject: Bug still exists...
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
X-archive-position: 2668
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: webmaster@rapidforum.com
Precedence: bulk
X-list: netdev
Content-Length: 484
Lines: 9

Hi.

The vm slow-down bug starting at 4000 sockets still exists. I now have
finally traced the issue down. It depends on the number of sockets AND
the bandwidth you need. So I suppose it depends on the number of packets
flowing through the NIC.
I managed to completely remove the bug by using two cards and the nexthop-parameter to use eth0 and eth1 with the same gateway-ip. Maybe interrupt-problems? Slow-down appears on intel and broadcom cards. Other cards not tested. Chris From akepner@sgi.com Thu Jul 7 13:20:13 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 07 Jul 2005 13:20:21 -0700 (PDT) Received: from omx1.americas.sgi.com (omx1-ext.sgi.com [192.48.179.11]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j67KKAH9002865 for ; Thu, 7 Jul 2005 13:20:11 -0700 Received: from cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2]) by omx1.americas.sgi.com (8.12.10/8.12.9/linux-outbound_gateway-1.1) with ESMTP id j67KIXxT027098 for ; Thu, 7 Jul 2005 15:18:33 -0500 Received: from [192.168.2.20] (mtv-vpn-sw-corp-0-42.corp.sgi.com [134.15.0.42]) by cthulhu.engr.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id j67KIUdP45231961; Thu, 7 Jul 2005 13:18:31 -0700 (PDT) Date: Thu, 7 Jul 2005 13:14:26 -0700 (PDT) From: Arthur Kepner X-X-Sender: akepner@resonance.WorkGroup To: Herbert Xu cc: netdev@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [RFC/PATCH] "safer ipv4 reassembly" (fwd) Message-ID: MIME-Version: 1.0 Content-Type: MULTIPART/MIXED; BOUNDARY="32512-1443077357-1120766375=:24321" Content-ID: X-archive-position: 2669 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: akepner@sgi.com Precedence: bulk X-list: netdev Content-Length: 18215 Lines: 301 This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. --32512-1443077357-1120766375=:24321 Content-Type: TEXT/PLAIN; CHARSET=US-ASCII Content-ID: Pine.LNX.4.61.0507071300211.24321@resonance.WorkGroup Version 2 of the rfc/patch is attached. It has been changed as indicated in the commentary below. 
Diffstat:
 include/linux/sysctl.h     |    1
 net/ipv4/ip_fragment.c     |  195 +++++++++++++++++++++++++++++++++++++++++++++
 net/ipv4/sysctl_net_ipv4.c |   11 ++

Signed-off-by: Arthur Kepner

On Tue, 28 Jun 2005, Arthur Kepner wrote:

>
> On Sun, 26 Jun 2005, Herbert Xu wrote:
> >
> > > +struct ipc {
> > > ......
> > > + struct rcu_head rcu;
> >
> > Is RCU worth it here? The only time we'd be taking the locks on this
> > is when the first fragment of a packet comes in. At that point we'll
> > be taking write_lock(&ipfrag_lock) anyway.
> >

Yes, I think rcu is worth it here. The reason is that to not use
rcu would necessitate grabbing the (global) ipfrag_lock an
additional time, when we free an ipc.

Adding an "ipc" to the hashtable could be done under the ipfrag_lock,
as you mention. But removing an ipc shouldn't be done at the same
time that fragments are destroyed, because the common case is that
another fragment queue will soon be created for the same
(src,dst,proto). Better to save the ipc for a while to avoid freeing
and then immediately recreating it.

Since the freeing of the ipc has to be deferred until well after the
last associated fragment queue has been freed, we can't take advantage
of the fact that the ipfrag_lock is held when the fragment queue is
freed. So when finally freeing the ipc, we can either grab the global
ipfrag_lock again, or use some other, finer-grained lock to protect
the ipc_hash entries. I'd prefer to avoid introducing new uses of
global locks.

If we use the finer-grained ipc_hash[].lock locks then rcu allows us
to avoid taking any locks in ipc_find when we create a new fragment
chain and there already happens to be an ipc for the associated
(src,dst,proto). (I suspect this would be a fairly common case.)

> > The only other use of RCU in your patch is ip_count. That should be
> > changed to be done in ip_defrag instead. At that point you can simply
> > find the ipc by dereferencing ipq, so no need for __ipc_find and hence
> > RCU.
> >
> > The reason you need to change it in this way is because you can't make
> > assumptions about ip_rcv_finish being the first place where a packet
> > is defragmented. With connection tracking enabled conntrack is the first
> > place where defragmentation occurs.
> >
> .....

This has been fixed. ip_input.c isn't changed by this version of
the patch. But there's the caveat that I mentioned earlier:

>
> There is a (big) advantage to doing this in ip_defrag() - this
> becomes a no-op for non-fragmented datagrams. The disadvantage
> is that there could be a situation where you receive:
>
> 1) first fragment of datagram X [for a particular (src,dst,proto)]
> 2) a zillion non-fragmented datagrams [for the same (src,dst,proto)]
> 3) last fragment of datagram X [for (src,dst,proto)]
>
> and no "disorder" would be detected for the datagrams associated
> with (src,dst,proto), even though the ip id could have wrapped in the
> meantime. This seems like a very uncommon case, however.
> >
> > > +#define IPC_HASHSZ IPQ_HASHSZ
> > > +static struct {
> > > + struct hlist_head head;
> > > + spinlock_t lock;
> > > +} ipc_hash[IPC_HASHSZ];
> >
> > I'd store ipc entries in the main ipq hash table since they can use
> > the same keys for lookup as ipq entries. You just need to set protocol
> > to zero and map the user to values specific to ipc for ipc entries.
> > One mapping would be to set the top bit of user for ipc entries, e.g.
> >
> > #define IP_DEFRAG_IPC 0x80000000
> > ipc->user = ipq->user | IP_DEFRAG_IPC;
> >
> > Of course you also need to make sure that the two structures share
> > the leading elements. You can then use the user field to distinguish
> > between ipc/ipq entries.
>

I thought about this point, but I dislike reusing the same structure
for such different purposes, so left this unchanged.

Comments?
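
The "disorder" check discussed above comes down to asking whether a
per-(src,dst,proto) counter has moved more than some allowed count since
a fragment queue was created. What follows is a minimal userspace
sketch of such a test, not the code from the attached patch. It assumes
a 32-bit counter and relies on unsigned subtraction so that the
comparison stays valid when the counter wraps; the names in_window,
bottom, size and seq are illustrative.

```c
#include <stdint.h>

/*
 * Illustrative only: returns 1 if seq lies in the half-open window
 * [bottom, bottom + size), computed modulo 2^32, and 0 otherwise.
 * The unsigned difference (seq - bottom) wraps around cleanly, so a
 * window that straddles the 0xffffffff -> 0 rollover still works.
 */
static int in_window(uint32_t bottom, uint32_t size, uint32_t seq)
{
	return (uint32_t)(seq - bottom) < size;
}
```

With bottom = 0xfffffffe and size = 5, a seq of 1 is still inside the
window (the distance is 3), while a seq one below bottom is far outside
it, so a queue whose counter ran past the window would be treated as
stale.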
-- Arthur --32512-1443077357-1120766375=:24321 Content-Type: TEXT/PLAIN; CHARSET=US-ASCII; NAME="ip_reasm.patch.2" Content-Transfer-Encoding: BASE64 Content-ID: Pine.LNX.4.61.0507071259350.24321@resonance.WorkGroup Content-Description: ip_reasm.patch.2 Content-Disposition: ATTACHMENT; FILENAME="ip_reasm.patch.2" ZGlmZiAtcnVwIGxpbnV4Lm9yaWcvaW5jbHVkZS9saW51eC9zeXNjdGwuaCBsaW51eC9pbmNsdWRl L2xpbnV4L3N5c2N0bC5oDQotLS0gbGludXgub3JpZy9pbmNsdWRlL2xpbnV4L3N5c2N0bC5oCTIw MDUtMDctMDYgMTI6MzI6MDMuMjI0NTQ2OTUzIC0wNzAwDQorKysgbGludXgvaW5jbHVkZS9saW51 eC9zeXNjdGwuaAkyMDA1LTA3LTA3IDA5OjU2OjQyLjg1NDg0NTA4OSAtMDcwMA0KQEAgLTM0Myw2 ICszNDMsNyBAQCBlbnVtDQogCU5FVF9UQ1BfQklDX0JFVEE9MTA4LA0KIAlORVRfSVBWNF9JQ01Q X0VSUk9SU19VU0VfSU5CT1VORF9JRkFERFI9MTA5LA0KIAlORVRfVENQX0NPTkdfQ09OVFJPTD0x MTAsDQorCU5FVF9JUFY0X1JFQVNNX0NPVU5UPTExMSwNCiB9Ow0KIA0KIGVudW0gew0KZGlmZiAt cnVwIGxpbnV4Lm9yaWcvbmV0L2lwdjQvaXBfZnJhZ21lbnQuYyBsaW51eC9uZXQvaXB2NC9pcF9m cmFnbWVudC5jDQotLS0gbGludXgub3JpZy9uZXQvaXB2NC9pcF9mcmFnbWVudC5jCTIwMDUtMDct MDYgMTI6MzA6NTEuMDMzMzgwODMwIC0wNzAwDQorKysgbGludXgvbmV0L2lwdjQvaXBfZnJhZ21l bnQuYwkyMDA1LTA3LTA3IDA5OjU2OjQyLjg1Njc5ODIzNCAtMDcwMA0KQEAgLTU2LDYgKzU2LDgg QEANCiBpbnQgc3lzY3RsX2lwZnJhZ19oaWdoX3RocmVzaCA9IDI1NioxMDI0Ow0KIGludCBzeXNj dGxfaXBmcmFnX2xvd190aHJlc2ggPSAxOTIqMTAyNDsNCiANCitpbnQgc3lzY3RsX2lwX3JlYXNz ZW1ibHlfY291bnQ7DQorDQogLyogSW1wb3J0YW50IE5PVEUhIEZyYWdtZW50IHF1ZXVlIG11c3Qg YmUgZGVzdHJveWVkIGJlZm9yZSBNU0wgZXhwaXJlcy4NCiAgKiBSRkM3OTEgaXMgd3JvbmcgcHJv cG9zaW5nIHRvIHByb2xvbmdhdGUgdGltZXIgZWFjaCBmcmFnbWVudCBhcnJpdmFsIGJ5IFRUTC4N CiAgKi8NCkBAIC02OSw2ICs3MSwyNSBAQCBzdHJ1Y3QgaXBmcmFnX3NrYl9jYg0KIA0KICNkZWZp bmUgRlJBR19DQihza2IpCSgoc3RydWN0IGlwZnJhZ19za2JfY2IqKSgoc2tiKS0+Y2IpKQ0KIA0K Ky8qIHN0cnVjdCBpcGMgY29udGFpbnMgYSBjb3VudCBvZiB0aGUgbnVtYmVyIG9mIElQIGRhdGFn cmFtcyANCisgKiByZWNlaXZlZCBmb3IgYSAoc2FkZHIsIGRhZGRyLCBwcm90b2NvbCkgdHVwbGUg LSBidXQgb25lIG9mIA0KKyAqIHRoZXNlIHN0cnVjdHVyZXMgZXhpc3RzIGZvciBhIGdpdmVuIChz 
YWRkciwgZGFkZHIsIHByb3RvY29sKSANCisgKiBpZiBhbmQgb25seSBpZiB0aGVyZSBpcyBhIHF1 ZXVlIG9mIElQIGZyYWdtZW50cyBhc3NvY2lhdGVkIA0KKyAqIHdpdGggdGhhdCAzLXR1cGxlIGFu ZCBzeXNjdGxfaXBfcmVhc3NlbWJseV9jb3VudCBpcyBub24temVyby4NCisgKi8NCitzdHJ1Y3Qg aXBjIHsNCisJc3RydWN0IGhsaXN0X25vZGUJbm9kZTsNCisJdTMyCQkJc2FkZHI7DQorCXUzMgkJ CWRhZGRyOw0KKwl1OAkJCXByb3RvY29sOw0KKwlhdG9taWNfdAkJcmVmY250OwkvKiBob3cgbWFu eSBpcHFzIGhvbGQgcmVmcyB0byB1cyAqLw0KKwlhdG9taWNfdAkJc2VxOwkvKiBob3cgbWFueSBp cCBmcmFnbWVudHMgZm9yIHRoaXMgDQorCQkJCQkgKiAoc2FkZHIsZGFkZHIscHJvdG9jb2wpIHNp bmNlIHdlIA0KKwkJCQkJICogd2VyZSBjcmVhdGVkICovDQorCXN0cnVjdCB0aW1lcl9saXN0CXRp bWVyOw0KKwlzdHJ1Y3QgcmN1X2hlYWQJCXJjdTsNCit9Ow0KKw0KIC8qIERlc2NyaWJlIGFuIGVu dHJ5IGluIHRoZSAiaW5jb21wbGV0ZSBkYXRhZ3JhbXMiIHF1ZXVlLiAqLw0KIHN0cnVjdCBpcHEg ew0KIAlzdHJ1Y3QgaXBxCSpuZXh0OwkJLyogbGlua2VkIGxpc3QgcG9pbnRlcnMJCQkqLw0KQEAg LTkyLDYgKzExMywxNCBAQCBzdHJ1Y3QgaXBxIHsNCiAJc3RydWN0IGlwcQkqKnBwcmV2Ow0KIAlp bnQJCWlpZjsNCiAJc3RydWN0IHRpbWV2YWwJc3RhbXA7DQorCXN0cnVjdCBpcGMJKmlwYzsNCisJ YXRvbWljX3QJc2VxOwkJDQorCS8qIGlwcS0+c2VxIGRlZmluZXMgdGhlICJib3R0b20iIG9mIHRo ZSB3aW5kb3cgb2Ygc2VxdWVuY2UgbnVtYmVycyANCisJICogdGhhdCBhcmUgdmFsaWQgZm9yIHRo aXMgZnJhZ21lbnQgLSB0aGUgInRvcCIgb2YgdGhlIHdpbmRvdyBpcyANCisJICogKGlwcS0+c2Vx ICsgc3lzY3RsX2lwX3JlYXNzZW1ibHlfY291bnQpLiBpcHEtPnNlcSBpcyBpbml0aWFsaXplZA0K KwkgKiB0byB0aGUgdmFsdWUgaW4gdGhlIGFzc29jaWF0ZWQgaXBjIHdoZW4gdGhlIGZyYWdtZW50 IHF1ZXVlIGlzIA0KKwkgKiBjcmVhdGVkLCBhbmQgaW5jcmVtZW50ZWQgZWFjaCB0aW1lIGEgZnJh Z21lbnQgaXMgYWRkZWQgdG8gdGhlIA0KKwkgKiBxdWV1ZSAqLw0KIH07DQogDQogLyogSGFzaCB0 YWJsZS4gKi8NCkBAIC0xMDUsNiArMTM0LDEyIEBAIHN0YXRpYyB1MzIgaXBmcmFnX2hhc2hfcm5k Ow0KIHN0YXRpYyBMSVNUX0hFQUQoaXBxX2xydV9saXN0KTsNCiBpbnQgaXBfZnJhZ19ucXVldWVz ID0gMDsNCiANCisjZGVmaW5lIElQQ19IQVNIU1oJSVBRX0hBU0hTWg0KK3N0YXRpYyBzdHJ1Y3Qg ew0KKwlzdHJ1Y3QgaGxpc3RfaGVhZCBoZWFkOw0KKwlzcGlubG9ja190IGxvY2s7DQorfSBpcGNf aGFzaFtJUENfSEFTSFNaXTsNCisNCiBzdGF0aWMgX19pbmxpbmVfXyB2b2lkIF9faXBxX3VubGlu 
ayhzdHJ1Y3QgaXBxICpxcCkNCiB7DQogCWlmKHFwLT5uZXh0KQ0KQEAgLTEyMSw2ICsxNTYsMTEg QEAgc3RhdGljIF9faW5saW5lX18gdm9pZCBpcHFfdW5saW5rKHN0cnVjdA0KIAl3cml0ZV91bmxv Y2soJmlwZnJhZ19sb2NrKTsNCiB9DQogDQorc3RhdGljIHVuc2lnbmVkIGludCBpcGNoYXNoZm4o dTMyIHNhZGRyLCB1MzIgZGFkZHIsIHU4IHByb3QpDQorew0KKwlyZXR1cm4gamhhc2hfM3dvcmRz KHByb3QsIHNhZGRyLCBkYWRkciwgMCkgJiAoSVBDX0hBU0hTWiAtIDEpOw0KK30NCisNCiBzdGF0 aWMgdW5zaWduZWQgaW50IGlwcWhhc2hmbih1MTYgaWQsIHUzMiBzYWRkciwgdTMyIGRhZGRyLCB1 OCBwcm90KQ0KIHsNCiAJcmV0dXJuIGpoYXNoXzN3b3JkcygodTMyKWlkIDw8IDE2IHwgcHJvdCwg c2FkZHIsIGRhZGRyLA0KQEAgLTIzMSw4ICsyNzEsMTYgQEAgc3RhdGljIF9faW5saW5lX18gdm9p ZCBpcHFfcHV0KHN0cnVjdCBpcA0KICAqLw0KIHN0YXRpYyB2b2lkIGlwcV9raWxsKHN0cnVjdCBp cHEgKmlwcSkNCiB7DQorCXN0cnVjdCBpcGMgKmNwID0gaXBxLT5pcGM7DQorDQogCWlmIChkZWxf dGltZXIoJmlwcS0+dGltZXIpKQ0KIAkJYXRvbWljX2RlYygmaXBxLT5yZWZjbnQpOw0KKwlpZiAo Y3ApIHsNCisJCWF0b21pY19kZWMoJmNwLT5yZWZjbnQpOw0KKwkJLyogbm8gcGFydGljdWxhciBy ZWFzb24gdG8gdXNlIHN5c2N0bF9pcGZyYWdfdGltZSANCisJCSAqIGZvciB0aGlzIHRpbWVyICov DQorCQltb2RfdGltZXIoJmNwLT50aW1lciwgamlmZmllcyArIHN5c2N0bF9pcGZyYWdfdGltZSk7 DQorCX0NCiANCiAJaWYgKCEoaXBxLT5sYXN0X2luICYgQ09NUExFVEUpKSB7DQogCQlpcHFfdW5s aW5rKGlwcSk7DQpAQCAtMzQ4LDEwICszOTYsMTEwIEBAIHN0YXRpYyBzdHJ1Y3QgaXBxICppcF9m cmFnX2ludGVybih1bnNpZ24NCiAJcmV0dXJuIHFwOw0KIH0NCiANCitzdGF0aWMgaW5saW5lIHZv aWQgX19pcGNfZGVzdHJveShzdHJ1Y3QgcmN1X2hlYWQgKmhlYWQpDQorew0KKwlrZnJlZShjb250 YWluZXJfb2YoaGVhZCwgc3RydWN0IGlwYywgcmN1KSk7DQorfQ0KKw0KK3N0YXRpYyB2b2lkIGlw Y19kZXN0cm95KHVuc2lnbmVkIGxvbmcgYXJnKSANCit7DQorCXN0cnVjdCBpcGMgKmNwID0gKHN0 cnVjdCBpcGMgKikgYXJnOw0KKwl1bnNpZ25lZCBpbnQgaGFzaCA9IGlwY2hhc2hmbihjcC0+c2Fk ZHIsIGNwLT5kYWRkciwgY3AtPnByb3RvY29sKTsNCisNCisJc3Bpbl9sb2NrKCZpcGNfaGFzaFto YXNoXS5sb2NrKTsNCisJQlVHX09OKChhdG9taWNfcmVhZCgmY3AtPnJlZmNudCkpIDwgMCk7DQor DQorCWlmIChhdG9taWNfcmVhZCgmY3AtPnJlZmNudCkgPT0gMCkgew0KKwkJaGxpc3RfZGVsX3Jj dSgmY3AtPm5vZGUpOw0KKwkJY2FsbF9yY3UoJmNwLT5yY3UsIF9faXBjX2Rlc3Ryb3kpOw0KKwl9 
DQorCXNwaW5fdW5sb2NrKCZpcGNfaGFzaFtoYXNoXS5sb2NrKTsNCit9DQorDQorLyogDQorICog bXVzdCBob2xkIHNwaW5sb2NrIGZvciB0aGUgYXBwcm9wcmlhdGUgaGFzaCBsaXN0IGhlYWQgd2hl biANCisgKiBfX2lwY19jcmVhdGUgaXMgY2FsbGVkIA0KKyAqLw0KKw0KK3N0YXRpYyBpbmxpbmUg c3RydWN0IGlwYyAqX19pcGNfY3JlYXRlKHN0cnVjdCBpcGhkciAqaXBoLCANCisJCQkJICAgICAg IGNvbnN0IHVuc2lnbmVkIGludCBoYXNoKSANCit7DQorCXN0cnVjdCBpcGMgKmNwID0ga21hbGxv YyhzaXplb2Yoc3RydWN0IGlwYyksIEdGUF9BVE9NSUMpOw0KKwkvKiBYWFggc2hvdWxkIHdlIGFj Y291bnQgc2l6ZSB0byBpcF9mcmFnX21lbSA/Pz8gKi8NCisJaWYgKGNwKSB7DQorCQljcC0+c2Fk ZHIgPSBpcGgtPnNhZGRyOw0KKwkJY3AtPmRhZGRyID0gaXBoLT5kYWRkcjsNCisJCWNwLT5wcm90 b2NvbCA9IGlwaC0+cHJvdG9jb2w7DQorCQlhdG9taWNfc2V0KCZjcC0+c2VxLCAwKTsNCisJCWF0 b21pY19zZXQoJmNwLT5yZWZjbnQsIDEpOw0KKwkJSU5JVF9ITElTVF9OT0RFKCZjcC0+bm9kZSk7 DQorCQlobGlzdF9hZGRfaGVhZF9yY3UoJmNwLT5ub2RlLCAmaXBjX2hhc2hbaGFzaF0uaGVhZCk7 DQorCQlpbml0X3RpbWVyKCZjcC0+dGltZXIpOw0KKwkJY3AtPnRpbWVyLmRhdGEgPSAodW5zaWdu ZWQgbG9uZykgY3A7DQorCQljcC0+dGltZXIuZnVuY3Rpb24gPSBpcGNfZGVzdHJveTsNCisJfSBl bHNlIHsNCisJCU5FVERFQlVHKGlmIChuZXRfcmF0ZWxpbWl0KCkpIA0KKwkJCXByaW50ayhLRVJO X0VSUiAiX19pcGNfY3JlYXRlOiBubyBtZW1vcnkgbGVmdCAhXG4iKSk7DQorCX0NCisJcmV0dXJu IGNwOw0KK30NCisNCisvKiANCisgKiBtdXN0IGJlICJyY3Ugc2FmZSIgd2hlbiBfX2lwY19maW5k IGlzIGNhbGxlZCAtIGVpdGhlciB1c2UgDQorICogcmN1X3JlYWRfbG9jayAoaWYgeW91IGludGVu ZCBvbmx5IHRvIHJlYWQgdGhlIHJldHVybmVkIHN0cnVjdCkgDQorICogb3IgZ3JhYiB0aGUgc3Bp bmxvY2sgZm9yIHRoZSBhcHByb3ByaWF0ZSBoYXNoIGxpc3QgaGVhZCAoaWYgDQorICogeW91IG1p Z2h0IG1vZGlmeSB0aGUgcmV0dXJuZWQgc3RydWN0KSANCisgKi8NCitzdGF0aWMgaW5saW5lIHN0 cnVjdCBpcGMgKl9faXBjX2ZpbmQodTMyIHNhZGRyLCB1MzIgZGFkZHIsIHU4IHByb3RvY29sLCAN CisJCQkJICAgICBjb25zdCB1bnNpZ25lZCBpbnQgaGFzaCkNCit7DQorCXN0cnVjdCBobGlzdF9u b2RlICpwOw0KKw0KKwlobGlzdF9mb3JfZWFjaF9yY3UocCwgJmlwY19oYXNoW2hhc2hdLmhlYWQp IHsNCisJCXN0cnVjdCBpcGMgKiBjcCA9IChzdHJ1Y3QgaXBjICopcDsNCisJCWlmKGNwLT5zYWRk ciA9PSBzYWRkciAmJg0KKwkJICAgY3AtPmRhZGRyID09IGRhZGRyICYmDQorCQkgICBjcC0+cHJv 
dG9jb2wgPT0gcHJvdG9jb2wpIHsNCisJCQlyZXR1cm4gY3A7DQorCQl9DQorCX0NCisJcmV0dXJu IE5VTEw7DQorfQ0KKw0KK3N0YXRpYyBzdHJ1Y3QgaXBjICppcGNfZmluZChzdHJ1Y3QgaXBoZHIg KmlwaCkNCit7DQorCXN0cnVjdCBpcGMgKmNwOw0KKwl1bnNpZ25lZCBpbnQgaGFzaCA9IGlwY2hh c2hmbihpcGgtPnNhZGRyLCBpcGgtPmRhZGRyLCBpcGgtPnByb3RvY29sKTsNCisNCisJcmN1X3Jl YWRfbG9jaygpOw0KKwlpZigoY3AgPSBfX2lwY19maW5kKGlwaC0+c2FkZHIsIGlwaC0+ZGFkZHIs IA0KKwkJCSAgICBpcGgtPnByb3RvY29sLCBoYXNoKSkgIT0gTlVMTCkgew0KKwkJYXRvbWljX2lu YygmY3AtPnJlZmNudCk7DQorCQlyY3VfcmVhZF91bmxvY2soKTsNCisJCXJldHVybiBjcDsNCisJ fQ0KKwlyY3VfcmVhZF91bmxvY2soKTsNCisJc3Bpbl9sb2NrKCZpcGNfaGFzaFtoYXNoXS5sb2Nr KTsNCisJaWYoKGNwID0gX19pcGNfZmluZChpcGgtPnNhZGRyLCBpcGgtPmRhZGRyLCANCisJCQkg ICAgaXBoLT5wcm90b2NvbCwgaGFzaCkpICE9IE5VTEwpIHsNCisJCWF0b21pY19pbmMoJmNwLT5y ZWZjbnQpOw0KKwkJc3Bpbl91bmxvY2soJmlwY19oYXNoW2hhc2hdLmxvY2spOw0KKwkJcmV0dXJu IGNwOw0KKwl9DQorCWNwID0gX19pcGNfY3JlYXRlKGlwaCwgaGFzaCk7DQorCXNwaW5fdW5sb2Nr KCZpcGNfaGFzaFtoYXNoXS5sb2NrKTsNCisJcmV0dXJuIGNwOw0KK30NCisNCisNCiAvKiBBZGQg YW4gZW50cnkgdG8gdGhlICdpcHEnIHF1ZXVlIGZvciBhIG5ld2x5IHJlY2VpdmVkIElQIGRhdGFn cmFtLiAqLw0KIHN0YXRpYyBzdHJ1Y3QgaXBxICppcF9mcmFnX2NyZWF0ZSh1bnNpZ25lZCBoYXNo LCBzdHJ1Y3QgaXBoZHIgKmlwaCwgdTMyIHVzZXIpDQogew0KIAlzdHJ1Y3QgaXBxICpxcDsNCisJ c3RydWN0IGlwYyAqY3AgPSBOVUxMOw0KKw0KKwlpZiAoc3lzY3RsX2lwX3JlYXNzZW1ibHlfY291 bnQgJiYgKGNwID0gaXBjX2ZpbmQoaXBoKSkgPT0gTlVMTCkNCisJCXJldHVybiBOVUxMOw0KIA0K IAlpZiAoKHFwID0gZnJhZ19hbGxvY19xdWV1ZSgpKSA9PSBOVUxMKQ0KIAkJZ290byBvdXRfbm9t ZW07DQpAQCAtMzY2LDYgKzUxNCwxMCBAQCBzdGF0aWMgc3RydWN0IGlwcSAqaXBfZnJhZ19jcmVh dGUodW5zaWduDQogCXFwLT5tZWF0ID0gMDsNCiAJcXAtPmZyYWdtZW50cyA9IE5VTEw7DQogCXFw LT5paWYgPSAwOw0KKwlxcC0+aXBjID0gY3A7DQorCWlmIChzeXNjdGxfaXBfcmVhc3NlbWJseV9j b3VudCAmJiBjcCkgew0KKwkJYXRvbWljX3NldCgmcXAtPnNlcSwgYXRvbWljX3JlYWQoJmNwLT5z ZXEpKTsNCisJfQ0KIA0KIAkvKiBJbml0aWFsaXplIGEgdGltZXIgZm9yIHRoaXMgZW50cnkuICov DQogCWluaXRfdGltZXIoJnFwLT50aW1lcik7DQpAQCAtMzgxLDYgKzUzMywzOSBAQCBvdXRfbm9t 
ZW06DQogCXJldHVybiBOVUxMOw0KIH0NCiANCitzdGF0aWMgaW5saW5lIGludCBpbl93aW5kb3co aW50IGJvdHRvbSwgaW50IHNpemUsIGludCBzZXEpIHsNCisJcmV0dXJuICgoKHNlcSAtIGJvdHRv bSkgPj0gMCkgJiYgKChzZXEgLSAoYm90dG9tICsgc2l6ZSkpIDwgMCkpOw0KK30NCisNCitzdGF0 aWMgaW50IF9faXBfcmVhc3NlbWJseV9jb3VudF9jaGVjayhjb25zdCBzdHJ1Y3QgaXBoZHIgKmlw aCwgc3RydWN0IGlwcSAqcXApDQorew0KKwlzdHJ1Y3QgaXBjICpjcCA9IHFwLT5pcGM7DQorCWlu dCBjc2VxLCBxc2VxOw0KKw0KKwkvKiBxcC0+aXBjIG1heSBiZSBOVUxMIGlmIHN5c2N0bF9pcF9y ZWFzc2VtYmx5X2NvdW50IHdhcyBvZmYgDQorCSAqIGF0IHRoZSB0aW1lIHRoZSBmcmFnbWVudCBx dWV1ZSB3YXMgY3JlYXRlZCAqLw0KKwlpZiAoY3AgPT0gTlVMTCkNCisJCXJldHVybiAwOw0KKw0K Kwljc2VxID0gYXRvbWljX2luY19yZXR1cm4oJmNwLT5zZXEpOw0KKwlxc2VxID0gYXRvbWljX2lu Y19yZXR1cm4oJnFwLT5zZXEpOw0KKw0KKwlpZiAoIWluX3dpbmRvdyhxc2VxLCBzeXNjdGxfaXBf cmVhc3NlbWJseV9jb3VudCwgY3NlcSkpIHsNCisJCWF0b21pY19pbmMoJnFwLT5yZWZjbnQpOw0K KwkJcmVhZF91bmxvY2soJmlwZnJhZ19sb2NrKTsNCisJCXNwaW5fbG9jaygmcXAtPmxvY2spOw0K KwkJaWYgKCEocXAtPmxhc3RfaW4mQ09NUExFVEUpKQ0KKwkJCWlwcV9raWxsKHFwKTsNCisJCXNw aW5fdW5sb2NrKCZxcC0+bG9jayk7DQorCQlpcHFfcHV0KHFwLCBOVUxMKTsNCisJCUlQX0lOQ19T VEFUU19CSChJUFNUQVRTX01JQl9SRUFTTUZBSUxTKTsNCisJCXJlYWRfbG9jaygmaXBmcmFnX2xv Y2spOw0KKwkJcmV0dXJuIDE7DQorCX0NCisJcmV0dXJuIDA7DQorfQ0KKw0KKw0KIC8qIEZpbmQg dGhlIGNvcnJlY3QgZW50cnkgaW4gdGhlICJpbmNvbXBsZXRlIGRhdGFncmFtcyIgcXVldWUgZm9y DQogICogdGhpcyBJUCBkYXRhZ3JhbSwgYW5kIGNyZWF0ZSBuZXcgb25lLCBpZiBub3RoaW5nIGlz IGZvdW5kLg0KICAqLw0KQEAgLTQwMCw2ICs1ODUsMTAgQEAgc3RhdGljIGlubGluZSBzdHJ1Y3Qg aXBxICppcF9maW5kKHN0cnVjdA0KIAkJICAgcXAtPmRhZGRyID09IGRhZGRyCSYmDQogCQkgICBx cC0+cHJvdG9jb2wgPT0gcHJvdG9jb2wgJiYNCiAJCSAgIHFwLT51c2VyID09IHVzZXIpIHsNCisJ CQlpZiAoc3lzY3RsX2lwX3JlYXNzZW1ibHlfY291bnQgJiYNCisJCQkJX19pcF9yZWFzc2VtYmx5 X2NvdW50X2NoZWNrKGlwaCwgcXApKSB7DQorCQkJCWJyZWFrOw0KKwkJCX0NCiAJCQlhdG9taWNf aW5jKCZxcC0+cmVmY250KTsNCiAJCQlyZWFkX3VubG9jaygmaXBmcmFnX2xvY2spOw0KIAkJCXJl dHVybiBxcDsNCkBAIC02NzksOSArODY4LDE1IEBAIHN0cnVjdCBza19idWZmICppcF9kZWZyYWco 
c3RydWN0IHNrX2J1ZmYNCiANCiB2b2lkIGlwZnJhZ19pbml0KHZvaWQpDQogew0KKwlpbnQgaTsN CiAJaXBmcmFnX2hhc2hfcm5kID0gKHUzMikgKChudW1fcGh5c3BhZ2VzIF4gKG51bV9waHlzcGFn ZXM+PjcpKSBeDQogCQkJCSAoamlmZmllcyBeIChqaWZmaWVzID4+IDYpKSk7DQogDQorCWZvciAo aSA9IDA7IGkgPCBJUENfSEFTSFNaOyBpKysgKSB7DQorCQlJTklUX0hMSVNUX0hFQUQoJmlwY19o YXNoW2ldLmhlYWQpOw0KKwkJc3Bpbl9sb2NrX2luaXQoJmlwY19oYXNoW2ldLmxvY2spOw0KKwl9 DQorDQogCWluaXRfdGltZXIoJmlwZnJhZ19zZWNyZXRfdGltZXIpOw0KIAlpcGZyYWdfc2VjcmV0 X3RpbWVyLmZ1bmN0aW9uID0gaXBmcmFnX3NlY3JldF9yZWJ1aWxkOw0KIAlpcGZyYWdfc2VjcmV0 X3RpbWVyLmV4cGlyZXMgPSBqaWZmaWVzICsgc3lzY3RsX2lwZnJhZ19zZWNyZXRfaW50ZXJ2YWw7 DQpkaWZmIC1ydXAgbGludXgub3JpZy9uZXQvaXB2NC9zeXNjdGxfbmV0X2lwdjQuYyBsaW51eC9u ZXQvaXB2NC9zeXNjdGxfbmV0X2lwdjQuYw0KLS0tIGxpbnV4Lm9yaWcvbmV0L2lwdjQvc3lzY3Rs X25ldF9pcHY0LmMJMjAwNS0wNy0wNiAxMjozNDoxNi42ODU4Njk1MzggLTA3MDANCisrKyBsaW51 eC9uZXQvaXB2NC9zeXNjdGxfbmV0X2lwdjQuYwkyMDA1LTA3LTA3IDA5OjU2OjQyLjg1Nzc3NDgw NiAtMDcwMA0KQEAgLTMwLDYgKzMwLDcgQEAgZXh0ZXJuIGludCBzeXNjdGxfaXBmcmFnX2xvd190 aHJlc2g7DQogZXh0ZXJuIGludCBzeXNjdGxfaXBmcmFnX2hpZ2hfdGhyZXNoOyANCiBleHRlcm4g aW50IHN5c2N0bF9pcGZyYWdfdGltZTsNCiBleHRlcm4gaW50IHN5c2N0bF9pcGZyYWdfc2VjcmV0 X2ludGVydmFsOw0KK2V4dGVybiBpbnQgc3lzY3RsX2lwX3JlYXNzZW1ibHlfY291bnQ7DQogDQog LyogRnJvbSBpcF9vdXRwdXQuYyAqLw0KIGV4dGVybiBpbnQgc3lzY3RsX2lwX2R5bmFkZHI7DQpA QCAtNTAsNiArNTEsNyBAQCBleHRlcm4gaW50IGluZXRfcGVlcl9nY19taW50aW1lOw0KIGV4dGVy biBpbnQgaW5ldF9wZWVyX2djX21heHRpbWU7DQogDQogI2lmZGVmIENPTkZJR19TWVNDVEwNCitz dGF0aWMgaW50IHplcm87DQogc3RhdGljIGludCB0Y3BfcmV0cjFfbWF4ID0gMjU1OyANCiBzdGF0 aWMgaW50IGlwX2xvY2FsX3BvcnRfcmFuZ2VfbWluW10gPSB7IDEsIDEgfTsNCiBzdGF0aWMgaW50 IGlwX2xvY2FsX3BvcnRfcmFuZ2VfbWF4W10gPSB7IDY1NTM1LCA2NTUzNSB9Ow0KQEAgLTY0Myw2 ICs2NDUsMTUgQEAgY3RsX3RhYmxlIGlwdjRfdGFibGVbXSA9IHsNCiAJCS5zdHJhdGVneQk9ICZz eXNjdGxfamlmZmllcw0KIAl9LA0KIAl7DQorCQkuY3RsX25hbWUJPSBORVRfSVBWNF9SRUFTTV9D T1VOVCwNCisJCS5wcm9jbmFtZQk9ICJpcF9yZWFzc2VtYmx5X2NvdW50IiwNCisJCS5kYXRhCQk9 
ICZzeXNjdGxfaXBfcmVhc3NlbWJseV9jb3VudCwNCisJCS5tYXhsZW4JCT0gc2l6ZW9mKGludCks DQorCQkubW9kZQkJPSAwNjQ0LA0KKwkJLnByb2NfaGFuZGxlcgk9ICZwcm9jX2RvaW50dmVjX21p bm1heCwNCisJCS5leHRyYTEJCT0gJnplcm8NCisJfSwNCisJew0KIAkJLmN0bF9uYW1lCT0gTkVU X1RDUF9OT19NRVRSSUNTX1NBVkUsDQogCQkucHJvY25hbWUJPSAidGNwX25vX21ldHJpY3Nfc2F2 ZSIsDQogCQkuZGF0YQkJPSAmc3lzY3RsX3RjcF9ub21ldHJpY3Nfc2F2ZSwNCg== --32512-1443077357-1120766375=:24321-- From davem@davemloft.net Thu Jul 7 14:19:12 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 07 Jul 2005 14:19:18 -0700 (PDT) Received: from sunset.davemloft.net ([216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j67LJBH9008028 for ; Thu, 7 Jul 2005 14:19:12 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DqdkE-0002uA-UF; Thu, 07 Jul 2005 14:17:18 -0700 Date: Thu, 07 Jul 2005 14:17:18 -0700 (PDT) Message-Id: <20050707.141718.85410359.davem@davemloft.net> To: tgraf@suug.ch Cc: dada1@cosmosbay.com, netdev@oss.sgi.com Subject: Re: [PATCH] loop unrolling in net/sched/sch_generic.c From: "David S. Miller" In-Reply-To: <20050706124206.GW16076@postel.suug.ch> References: <42CB2698.2080904@cosmosbay.com> <42CB2B84.50702@cosmosbay.com> <20050706124206.GW16076@postel.suug.ch> X-Mailer: Mew version 4.2 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2670 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 1381 Lines: 33 From: Thomas Graf Date: Wed, 6 Jul 2005 14:42:06 +0200 > The inital issue you brought up which you backed up with numbers > is probably the cause of multiple wrong branch predictions due > to the fact that I wrote skb = dequeue(); if (skb) which was > assumed to be likely by the compiler. 
In your patch you fixed
> this with !skb_queue_empty() which fixed this wrong prediction
> and also acts as a little optimization due to skb_queue_empty()
> being really simple to implement for the compiler.

As an aside, this reminds me that as part of my quest to make
sk_buff smaller, I intend to walk across the tree and change
all tests of the form:

	if (!skb_queue_len(list))

into:

	if (skb_queue_empty(list))

It would be really nice, after the above transformation and some
others, to get rid of sk_buff_head->qlen. Why? Because that also
allows us to remove the skb->list member as well, as its only
reason for existing is so that the SKB queue removal routines can
decrement the queue length.

That's kind of silly, and most SKB lists in the kernel do not care
about the queue length at all. Rather, they care about empty and
non-empty. The cases that do care (mostly packet schedulers) can
keep track of the queue length themselves in their private data
structures. When they remove packets, they _know_ which queue to
decrement the queue length of.

From tgraf@suug.ch Thu Jul 7 14:36:15 2005
Received: with ECARTIS (v1.0.0; list netdev); Thu, 07 Jul 2005 14:36:19 -0700 (PDT)
Received: from postel.suug.ch (postel.suug.ch [195.134.158.23]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j67LaFH9009605 for ; Thu, 7 Jul 2005 14:36:15 -0700
Received: by postel.suug.ch (Postfix, from userid 10001) id B07871C0F3; Thu, 7 Jul 2005 23:34:50 +0200 (CEST)
Date: Thu, 7 Jul 2005 23:34:50 +0200
From: Thomas Graf
To: "David S.
Miller" Cc: dada1@cosmosbay.com, netdev@oss.sgi.com Subject: Re: [PATCH] loop unrolling in net/sched/sch_generic.c Message-ID: <20050707213450.GB16076@postel.suug.ch> References: <42CB2698.2080904@cosmosbay.com> <42CB2B84.50702@cosmosbay.com> <20050706124206.GW16076@postel.suug.ch> <20050707.141718.85410359.davem@davemloft.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050707.141718.85410359.davem@davemloft.net> X-archive-position: 2671 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: tgraf@suug.ch Precedence: bulk X-list: netdev Content-Length: 851 Lines: 23 * David S. Miller <20050707.141718.85410359.davem@davemloft.net> 2005-07-07 14:17 > As an aside, this reminds me that as part of my quest to make > sk_buff smaller, I intend to walk across the tree and change > all tests of the form: > > if (!skb_queue_len(list)) > > into: > > if (skb_queue_empty(list)) > > [...] > > The cases that do care (mostly packet schedulers) can > keep track of the queue length themselves in their private data > structures. When they remove packets, they _know_ which queue to > decrement the queue length of. Since I'm changing the classful qdiscs to use a generic API for queue management anyway I could take care of this if you want. WRT the leaf qdiscs it's a bit more complicated since we have to change the new API to take a new struct which includes the qlen and the sk_buff_head but not a problem either. 
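
The idea of a struct that carries both the list head and its own qlen
can be sketched in plain userspace C. This is only an illustration
under stated assumptions: struct pkt and struct pkt_list are
hypothetical stand-ins for sk_buff and a sk_buff_head without a length
field, and struct sched_queue, sched_enqueue and sched_dequeue are
made-up names, not an existing or proposed kernel API.

```c
#include <stddef.h>

/* Hypothetical stand-in for sk_buff: only the list linkage. */
struct pkt {
	struct pkt *next;
};

/* Hypothetical stand-in for a list head that no longer has qlen. */
struct pkt_list {
	struct pkt *head;
	struct pkt *tail;
};

/* Private per-scheduler queue: the list plus a length it maintains. */
struct sched_queue {
	struct pkt_list list;
	unsigned int qlen;
};

/* Most callers only need this empty/non-empty test. */
static int pkt_list_empty(const struct pkt_list *l)
{
	return l->head == NULL;
}

/* Append p and account for it in the owner's private counter. */
static void sched_enqueue(struct sched_queue *q, struct pkt *p)
{
	p->next = NULL;
	if (q->list.tail)
		q->list.tail->next = p;
	else
		q->list.head = p;
	q->list.tail = p;
	q->qlen++;
}

/* Remove the first packet; the caller knows which counter to drop. */
static struct pkt *sched_dequeue(struct sched_queue *q)
{
	struct pkt *p = q->list.head;

	if (!p)
		return NULL;
	q->list.head = p->next;
	if (!q->list.head)
		q->list.tail = NULL;
	p->next = NULL;
	q->qlen--;
	return p;
}
```

Because the enqueue and dequeue helpers take the owning sched_queue, the
packet itself needs no back-pointer to its list in order to keep the
length correct, which is the property that would let skb->list go away.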
From raghavendra.koushik@neterion.com Thu Jul 7 15:17:39 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 07 Jul 2005 15:17:51 -0700 (PDT) Received: from linux.site (adsl-67-120-213-161.dsl.sntc01.pacbell.net [67.120.213.161]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j67MHYH9012776 for ; Thu, 7 Jul 2005 15:17:35 -0700 Received: by linux.site (Postfix, from userid 0) id A8AC689826; Thu, 7 Jul 2005 15:05:06 -0700 (PDT) To: jgarzik@pobox.com, netdev@oss.sgi.com Cc: raghavendra.koushik@neterion.com, ravinandan.arakali@neterion.com, leonid.grossman@neterion.com, rapuru.sriram@neterion.com From: raghavendra.koushik@neterion.com Subject: [PATCH 2.6.12.1 1/12] S2io: Code cleanup Message-Id: <20050707220506.A8AC689826@linux.site> Date: Thu, 7 Jul 2005 15:05:06 -0700 (PDT) X-archive-position: 2672 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: raghavendra.koushik@neterion.com Precedence: bulk X-list: netdev Content-Length: 142306 Lines: 4202 Hi, We are submitting a series of 12 patches to support our Xframe I and Xframe II line of products. The patches can be categorized as follows: Patches 1-8 : Changes applicable to both Xframe I and II Patches 9-11: Xframe II specific features Patch 12: Addresses issues found during testing cycle. Please review the patches and let us know your comments. Starting with patch 1 below. This patch involves cosmetic changes(tabs and indentation, regrouping of transmit and receive data structures, typecasting, code cleanup). 
Signed-off-by: Ravinandan Arakali Signed-off-by: Raghavendr Koushik --- diff -urpN vanilla_kernel/drivers/net/s2io-regs.h linux-2.6.12-rc6/drivers/net/s2io-regs.h --- vanilla_kernel/drivers/net/s2io-regs.h 2005-06-06 08:22:29.000000000 -0700 +++ linux-2.6.12-rc6/drivers/net/s2io-regs.h 2005-06-27 06:21:32.000000000 -0700 @@ -77,19 +77,18 @@ typedef struct _XENA_dev_config { #define ADAPTER_ECC_EN BIT(55) u64 serr_source; -#define SERR_SOURCE_PIC BIT(0) -#define SERR_SOURCE_TXDMA BIT(1) -#define SERR_SOURCE_RXDMA BIT(2) +#define SERR_SOURCE_PIC BIT(0) +#define SERR_SOURCE_TXDMA BIT(1) +#define SERR_SOURCE_RXDMA BIT(2) #define SERR_SOURCE_MAC BIT(3) #define SERR_SOURCE_MC BIT(4) #define SERR_SOURCE_XGXS BIT(5) -#define SERR_SOURCE_ANY (SERR_SOURCE_PIC | \ - SERR_SOURCE_TXDMA | \ - SERR_SOURCE_RXDMA | \ - SERR_SOURCE_MAC | \ - SERR_SOURCE_MC | \ - SERR_SOURCE_XGXS) - +#define SERR_SOURCE_ANY (SERR_SOURCE_PIC | \ + SERR_SOURCE_TXDMA | \ + SERR_SOURCE_RXDMA | \ + SERR_SOURCE_MAC | \ + SERR_SOURCE_MC | \ + SERR_SOURCE_XGXS) u8 unused_0[0x800 - 0x120]; diff -urpN vanilla_kernel/drivers/net/s2io.c linux-2.6.12-rc6/drivers/net/s2io.c --- vanilla_kernel/drivers/net/s2io.c 2005-06-06 08:22:29.000000000 -0700 +++ linux-2.6.12-rc6/drivers/net/s2io.c 2005-06-27 06:21:32.000000000 -0700 @@ -11,29 +11,28 @@ * See the file COPYING in this distribution for more information. * * Credits: - * Jeff Garzik : For pointing out the improper error condition - * check in the s2io_xmit routine and also some - * issues in the Tx watch dog function. Also for - * patiently answering all those innumerable + * Jeff Garzik : For pointing out the improper error condition + * check in the s2io_xmit routine and also some + * issues in the Tx watch dog function. Also for + * patiently answering all those innumerable * questions regaring the 2.6 porting issues. * Stephen Hemminger : Providing proper 2.6 porting mechanism for some * macros available only in 2.6 Kernel. 
- * Francois Romieu : For pointing out all code part that were + * Francois Romieu : For pointing out all code part that were * deprecated and also styling related comments. - * Grant Grundler : For helping me get rid of some Architecture + * Grant Grundler : For helping me get rid of some Architecture * dependent code. * Christopher Hellwig : Some more 2.6 specific issues in the driver. - * + * * The module loadable parameters that are supported by the driver and a brief * explaination of all the variables. - * rx_ring_num : This can be used to program the number of receive rings used - * in the driver. - * rx_ring_len: This defines the number of descriptors each ring can have. This + * rx_ring_num : This can be used to program the number of receive rings used + * in the driver. + * rx_ring_len: This defines the number of descriptors each ring can have. This * is also an array of size 8. * tx_fifo_num: This defines the number of Tx FIFOs thats used int the driver. - * tx_fifo_len: This too is an array of 8. Each element defines the number of + * tx_fifo_len: This too is an array of 8. Each element defines the number of * Tx descriptors that can be associated with each corresponding FIFO. - * in PCI Configuration space. ************************************************************************/ #include @@ -56,19 +55,19 @@ #include #include -#include #include #include +#include /* local include */ #include "s2io.h" #include "s2io-regs.h" /* S2io Driver name & version. */ -static char s2io_driver_name[] = "s2io"; -static char s2io_driver_version[] = "Version 1.7.7.1"; +static char s2io_driver_name[] = "Neterion"; +static char s2io_driver_version[] = "Version 1.7.7"; -/* +/* * Cards with following subsystem_id have a link state indication * problem, 600B, 600C, 600D, 640B, 640C and 640D. * macro below identifies these cards given the subsystem_id. 
@@ -85,9 +84,13 @@ static char s2io_driver_version[] = "Ver static inline int rx_buffer_level(nic_t * sp, int rxb_size, int ring) { int level = 0; - if ((sp->pkt_cnt[ring] - rxb_size) > 16) { + mac_info_t *mac_control; + + mac_control = &sp->mac_control; + if ((mac_control->rings[ring].pkt_cnt - rxb_size) > 16) { level = LOW; - if ((sp->pkt_cnt[ring] - rxb_size) < MAX_RXDS_PER_BLOCK) { + if ((mac_control->rings[ring].pkt_cnt - rxb_size) < + MAX_RXDS_PER_BLOCK) { level = PANIC; } } @@ -152,8 +155,7 @@ static char ethtool_stats_keys[][ETH_GST #define S2IO_TEST_LEN sizeof(s2io_gstrings) / ETH_GSTRING_LEN #define S2IO_STRINGS_LEN S2IO_TEST_LEN * ETH_GSTRING_LEN - -/* +/* * Constants to be programmed into the Xena's registers, to configure * the XAUI. */ @@ -195,8 +197,7 @@ static u64 default_dtx_cfg[] = { END_SIGN }; - -/* +/* * Constants for Fixing the MacAddress problem seen mostly on * Alpha machines. */ @@ -226,6 +227,8 @@ static unsigned int rx_ring_num = 1; static unsigned int rx_ring_sz[MAX_RX_RINGS] = {[0 ...(MAX_RX_RINGS - 1)] = 0 }; static unsigned int Stats_refresh_time = 4; +static unsigned int rts_frm_len[MAX_RX_RINGS] = + {[0 ...(MAX_RX_RINGS - 1)] = 0 }; static unsigned int rmac_pause_time = 65535; static unsigned int mc_pause_threshold_q0q3 = 187; static unsigned int mc_pause_threshold_q4q7 = 187; @@ -236,9 +239,9 @@ static unsigned int rmac_util_period = 5 static unsigned int indicate_max_pkts; #endif -/* +/* * S2IO device table. - * This table lists all the devices that this driver supports. + * This table lists all the devices that this driver supports. 
*/ static struct pci_device_id s2io_tbl[] __devinitdata = { {PCI_VENDOR_ID_S2IO, PCI_DEVICE_ID_S2IO_WIN, @@ -246,9 +249,9 @@ static struct pci_device_id s2io_tbl[] _ {PCI_VENDOR_ID_S2IO, PCI_DEVICE_ID_S2IO_UNI, PCI_ANY_ID, PCI_ANY_ID}, {PCI_VENDOR_ID_S2IO, PCI_DEVICE_ID_HERC_WIN, - PCI_ANY_ID, PCI_ANY_ID}, - {PCI_VENDOR_ID_S2IO, PCI_DEVICE_ID_HERC_UNI, - PCI_ANY_ID, PCI_ANY_ID}, + PCI_ANY_ID, PCI_ANY_ID}, + {PCI_VENDOR_ID_S2IO, PCI_DEVICE_ID_HERC_UNI, + PCI_ANY_ID, PCI_ANY_ID}, {0,} }; @@ -267,8 +270,8 @@ static struct pci_driver s2io_driver = { /** * init_shared_mem - Allocation and Initialization of Memory * @nic: Device private variable. - * Description: The function allocates all the memory areas shared - * between the NIC and the driver. This includes Tx descriptors, + * Description: The function allocates all the memory areas shared + * between the NIC and the driver. This includes Tx descriptors, * Rx descriptors and the statistics block. */ @@ -278,11 +281,11 @@ static int init_shared_mem(struct s2io_n void *tmp_v_addr, *tmp_v_addr_next; dma_addr_t tmp_p_addr, tmp_p_addr_next; RxD_block_t *pre_rxd_blk = NULL; - int i, j, blk_cnt; + int i, j, blk_cnt, rx_sz, tx_sz; int lst_size, lst_per_page; struct net_device *dev = nic->dev; #ifdef CONFIG_2BUFF_MODE - unsigned long tmp; + u64 tmp; buffAdd_t *ba; #endif @@ -307,28 +310,34 @@ static int init_shared_mem(struct s2io_n } lst_size = (sizeof(TxD_t) * config->max_txds); + tx_sz = lst_size * size; lst_per_page = PAGE_SIZE / lst_size; for (i = 0; i < config->tx_fifo_num; i++) { int fifo_len = config->tx_cfg[i].fifo_len; int list_holder_size = fifo_len * sizeof(list_info_hold_t); - nic->list_info[i] = kmalloc(list_holder_size, GFP_KERNEL); - if (!nic->list_info[i]) { + mac_control->fifos[i].list_info = kmalloc(list_holder_size, + GFP_KERNEL); + if (!mac_control->fifos[i].list_info) { DBG_PRINT(ERR_DBG, "Malloc failed for list_info\n"); return -ENOMEM; } - memset(nic->list_info[i], 0, list_holder_size); + 
memset(mac_control->fifos[i].list_info, 0, list_holder_size); } for (i = 0; i < config->tx_fifo_num; i++) { int page_num = TXD_MEM_PAGE_CNT(config->tx_cfg[i].fifo_len, lst_per_page); - mac_control->tx_curr_put_info[i].offset = 0; - mac_control->tx_curr_put_info[i].fifo_len = + mac_control->fifos[i].tx_curr_put_info.offset = 0; + mac_control->fifos[i].tx_curr_put_info.fifo_len = config->tx_cfg[i].fifo_len - 1; - mac_control->tx_curr_get_info[i].offset = 0; - mac_control->tx_curr_get_info[i].fifo_len = + mac_control->fifos[i].tx_curr_get_info.offset = 0; + mac_control->fifos[i].tx_curr_get_info.fifo_len = config->tx_cfg[i].fifo_len - 1; + mac_control->fifos[i].fifo_no = i; + mac_control->fifos[i].nic = nic; + mac_control->fifos[i].max_txds = MAX_SKB_FRAGS; + for (j = 0; j < page_num; j++) { int k = 0; dma_addr_t tmp_p; @@ -344,16 +353,15 @@ static int init_shared_mem(struct s2io_n while (k < lst_per_page) { int l = (j * lst_per_page) + k; if (l == config->tx_cfg[i].fifo_len) - goto end_txd_alloc; - nic->list_info[i][l].list_virt_addr = + break; + mac_control->fifos[i].list_info[l].list_virt_addr = tmp_v + (k * lst_size); - nic->list_info[i][l].list_phy_addr = + mac_control->fifos[i].list_info[l].list_phy_addr = tmp_p + (k * lst_size); k++; } } } - end_txd_alloc: /* Allocation and initialization of RXDs in Rings */ size = 0; @@ -366,21 +374,26 @@ static int init_shared_mem(struct s2io_n return FAILURE; } size += config->rx_cfg[i].num_rxd; - nic->block_count[i] = + mac_control->rings[i].block_count = config->rx_cfg[i].num_rxd / (MAX_RXDS_PER_BLOCK + 1); - nic->pkt_cnt[i] = - config->rx_cfg[i].num_rxd - nic->block_count[i]; + mac_control->rings[i].pkt_cnt = + config->rx_cfg[i].num_rxd - mac_control->rings[i].block_count; } + size = (size * (sizeof(RxD_t))); + rx_sz = size; for (i = 0; i < config->rx_ring_num; i++) { - mac_control->rx_curr_get_info[i].block_index = 0; - mac_control->rx_curr_get_info[i].offset = 0; - mac_control->rx_curr_get_info[i].ring_len = + 
mac_control->rings[i].rx_curr_get_info.block_index = 0; + mac_control->rings[i].rx_curr_get_info.offset = 0; + mac_control->rings[i].rx_curr_get_info.ring_len = config->rx_cfg[i].num_rxd - 1; - mac_control->rx_curr_put_info[i].block_index = 0; - mac_control->rx_curr_put_info[i].offset = 0; - mac_control->rx_curr_put_info[i].ring_len = + mac_control->rings[i].rx_curr_put_info.block_index = 0; + mac_control->rings[i].rx_curr_put_info.offset = 0; + mac_control->rings[i].rx_curr_put_info.ring_len = config->rx_cfg[i].num_rxd - 1; + mac_control->rings[i].nic = nic; + mac_control->rings[i].ring_no = i; + blk_cnt = config->rx_cfg[i].num_rxd / (MAX_RXDS_PER_BLOCK + 1); /* Allocating all the Rx blocks */ @@ -394,32 +407,36 @@ static int init_shared_mem(struct s2io_n &tmp_p_addr); if (tmp_v_addr == NULL) { /* - * In case of failure, free_shared_mem() - * is called, which should free any - * memory that was alloced till the + * In case of failure, free_shared_mem() + * is called, which should free any + * memory that was alloced till the * failure happened. 
*/ - nic->rx_blocks[i][j].block_virt_addr = + mac_control->rings[i].rx_blocks[j].block_virt_addr = tmp_v_addr; return -ENOMEM; } memset(tmp_v_addr, 0, size); - nic->rx_blocks[i][j].block_virt_addr = tmp_v_addr; - nic->rx_blocks[i][j].block_dma_addr = tmp_p_addr; + mac_control->rings[i].rx_blocks[j].block_virt_addr = + tmp_v_addr; + mac_control->rings[i].rx_blocks[j].block_dma_addr = + tmp_p_addr; } /* Interlinking all Rx Blocks */ for (j = 0; j < blk_cnt; j++) { - tmp_v_addr = nic->rx_blocks[i][j].block_virt_addr; + tmp_v_addr = + mac_control->rings[i].rx_blocks[j].block_virt_addr; tmp_v_addr_next = - nic->rx_blocks[i][(j + 1) % + mac_control->rings[i].rx_blocks[(j + 1) % blk_cnt].block_virt_addr; - tmp_p_addr = nic->rx_blocks[i][j].block_dma_addr; + tmp_p_addr = + mac_control->rings[i].rx_blocks[j].block_dma_addr; tmp_p_addr_next = - nic->rx_blocks[i][(j + 1) % + mac_control->rings[i].rx_blocks[(j + 1) % blk_cnt].block_dma_addr; pre_rxd_blk = (RxD_block_t *) tmp_v_addr; - pre_rxd_blk->reserved_1 = END_OF_BLOCK; /* last RxD + pre_rxd_blk->reserved_1 = END_OF_BLOCK; /* last RxD * marker. */ #ifndef CONFIG_2BUFF_MODE @@ -432,43 +449,43 @@ static int init_shared_mem(struct s2io_n } #ifdef CONFIG_2BUFF_MODE - /* + /* * Allocation of Storages for buffer addresses in 2BUFF mode * and the buffers as well. 
*/ for (i = 0; i < config->rx_ring_num; i++) { blk_cnt = config->rx_cfg[i].num_rxd / (MAX_RXDS_PER_BLOCK + 1); - nic->ba[i] = kmalloc((sizeof(buffAdd_t *) * blk_cnt), + mac_control->rings[i].ba = kmalloc((sizeof(buffAdd_t *) * blk_cnt), GFP_KERNEL); - if (!nic->ba[i]) + if (!mac_control->rings[i].ba) return -ENOMEM; for (j = 0; j < blk_cnt; j++) { int k = 0; - nic->ba[i][j] = kmalloc((sizeof(buffAdd_t) * + mac_control->rings[i].ba[j] = kmalloc((sizeof(buffAdd_t) * (MAX_RXDS_PER_BLOCK + 1)), GFP_KERNEL); - if (!nic->ba[i][j]) + if (!mac_control->rings[i].ba[j]) return -ENOMEM; while (k != MAX_RXDS_PER_BLOCK) { - ba = &nic->ba[i][j][k]; + ba = &mac_control->rings[i].ba[j][k]; - ba->ba_0_org = kmalloc + ba->ba_0_org = (void *) kmalloc (BUF0_LEN + ALIGN_SIZE, GFP_KERNEL); if (!ba->ba_0_org) return -ENOMEM; - tmp = (unsigned long) ba->ba_0_org; + tmp = (u64) ba->ba_0_org; tmp += ALIGN_SIZE; - tmp &= ~((unsigned long) ALIGN_SIZE); + tmp &= ~((u64) ALIGN_SIZE); ba->ba_0 = (void *) tmp; - ba->ba_1_org = kmalloc + ba->ba_1_org = (void *) kmalloc (BUF1_LEN + ALIGN_SIZE, GFP_KERNEL); if (!ba->ba_1_org) return -ENOMEM; - tmp = (unsigned long) ba->ba_1_org; + tmp = (u64) ba->ba_1_org; tmp += ALIGN_SIZE; - tmp &= ~((unsigned long) ALIGN_SIZE); + tmp &= ~((u64) ALIGN_SIZE); ba->ba_1 = (void *) tmp; k++; } @@ -482,9 +499,9 @@ static int init_shared_mem(struct s2io_n (nic->pdev, size, &mac_control->stats_mem_phy); if (!mac_control->stats_mem) { - /* - * In case of failure, free_shared_mem() is called, which - * should free any memory that was alloced till the + /* + * In case of failure, free_shared_mem() is called, which + * should free any memory that was alloced till the * failure happened. 
*/ return -ENOMEM; @@ -494,15 +511,14 @@ static int init_shared_mem(struct s2io_n tmp_v_addr = mac_control->stats_mem; mac_control->stats_info = (StatInfo_t *) tmp_v_addr; memset(tmp_v_addr, 0, size); - DBG_PRINT(INIT_DBG, "%s:Ring Mem PHY: 0x%llx\n", dev->name, (unsigned long long) tmp_p_addr); return SUCCESS; } -/** - * free_shared_mem - Free the allocated Memory +/** + * free_shared_mem - Free the allocated Memory * @nic: Device private variable. * Description: This function is to free all memory locations allocated by * the init_shared_mem() function and return it to the kernel. @@ -532,15 +548,18 @@ static void free_shared_mem(struct s2io_ lst_per_page); for (j = 0; j < page_num; j++) { int mem_blks = (j * lst_per_page); - if (!nic->list_info[i][mem_blks].list_virt_addr) + if (!mac_control->fifos[i].list_info[mem_blks]. + list_virt_addr) break; pci_free_consistent(nic->pdev, PAGE_SIZE, - nic->list_info[i][mem_blks]. + mac_control->fifos[i]. + list_info[mem_blks]. list_virt_addr, - nic->list_info[i][mem_blks]. + mac_control->fifos[i]. + list_info[mem_blks]. list_phy_addr); } - kfree(nic->list_info[i]); + kfree(mac_control->fifos[i].list_info); } #ifndef CONFIG_2BUFF_MODE @@ -549,10 +568,12 @@ static void free_shared_mem(struct s2io_ size = SIZE_OF_BLOCK; #endif for (i = 0; i < config->rx_ring_num; i++) { - blk_cnt = nic->block_count[i]; + blk_cnt = mac_control->rings[i].block_count; for (j = 0; j < blk_cnt; j++) { - tmp_v_addr = nic->rx_blocks[i][j].block_virt_addr; - tmp_p_addr = nic->rx_blocks[i][j].block_dma_addr; + tmp_v_addr = mac_control->rings[i].rx_blocks[j]. + block_virt_addr; + tmp_p_addr = mac_control->rings[i].rx_blocks[j]. 
+ block_dma_addr; if (tmp_v_addr == NULL) break; pci_free_consistent(nic->pdev, size, @@ -565,35 +586,21 @@ static void free_shared_mem(struct s2io_ for (i = 0; i < config->rx_ring_num; i++) { blk_cnt = config->rx_cfg[i].num_rxd / (MAX_RXDS_PER_BLOCK + 1); - if (!nic->ba[i]) - goto end_free; for (j = 0; j < blk_cnt; j++) { int k = 0; - if (!nic->ba[i][j]) { - kfree(nic->ba[i]); - goto end_free; - } + if (!mac_control->rings[i].ba[j]) + continue; while (k != MAX_RXDS_PER_BLOCK) { - buffAdd_t *ba = &nic->ba[i][j][k]; - if (!ba || !ba->ba_0_org || !ba->ba_1_org) - { - kfree(nic->ba[i]); - kfree(nic->ba[i][j]); - if(ba->ba_0_org) - kfree(ba->ba_0_org); - if(ba->ba_1_org) - kfree(ba->ba_1_org); - goto end_free; - } + buffAdd_t *ba = &mac_control->rings[i].ba[j][k]; kfree(ba->ba_0_org); kfree(ba->ba_1_org); k++; } - kfree(nic->ba[i][j]); + kfree(mac_control->rings[i].ba[j]); } - kfree(nic->ba[i]); + if (mac_control->rings[i].ba) + kfree(mac_control->rings[i].ba); } -end_free: #endif if (mac_control->stats_mem) { @@ -604,12 +611,12 @@ end_free: } } -/** - * init_nic - Initialization of hardware +/** + * init_nic - Initialization of hardware * @nic: device peivate variable - * Description: The function sequentially configures every block - * of the H/W from their reset values. - * Return Value: SUCCESS on success and + * Description: The function sequentially configures every block + * of the H/W from their reset values. + * Return Value: SUCCESS on success and * '-1' on failure (endian settings incorrect). 
*/ @@ -625,12 +632,13 @@ static int init_nic(struct s2io_nic *nic struct config_param *config; int mdio_cnt = 0, dtx_cnt = 0; unsigned long long mem_share; + int mem_size; mac_control = &nic->mac_control; config = &nic->config; - /* Initialize swapper control register */ - if (s2io_set_swapper(nic)) { + /* to set the swapper control on the card */ + if(s2io_set_swapper(nic)) { DBG_PRINT(ERR_DBG,"ERROR: Setting Swapper failed\n"); return -1; } @@ -638,8 +646,8 @@ static int init_nic(struct s2io_nic *nic /* Remove XGXS from reset state */ val64 = 0; writeq(val64, &bar0->sw_reset); - val64 = readq(&bar0->sw_reset); msleep(500); + val64 = readq(&bar0->sw_reset); /* Enable Receiving broadcasts */ add = &bar0->mac_cfg; @@ -659,18 +667,18 @@ static int init_nic(struct s2io_nic *nic val64 = dev->mtu; writeq(vBIT(val64, 2, 14), &bar0->rmac_max_pyld_len); - /* - * Configuring the XAUI Interface of Xena. + /* + * Configuring the XAUI Interface of Xena. * *************************************** - * To Configure the Xena's XAUI, one has to write a series - * of 64 bit values into two registers in a particular - * sequence. Hence a macro 'SWITCH_SIGN' has been defined - * which will be defined in the array of configuration values - * (default_dtx_cfg & default_mdio_cfg) at appropriate places - * to switch writing from one regsiter to another. We continue + * To Configure the Xena's XAUI, one has to write a series + * of 64 bit values into two registers in a particular + * sequence. Hence a macro 'SWITCH_SIGN' has been defined + * which will be defined in the array of configuration values + * (default_dtx_cfg & default_mdio_cfg) at appropriate places + * to switch writing from one regsiter to another. We continue * writing these values until we encounter the 'END_SIGN' macro. 
- * For example, After making a series of 21 writes into - * dtx_control register the 'SWITCH_SIGN' appears and hence we + * For example, After making a series of 21 writes into + * dtx_control register the 'SWITCH_SIGN' appears and hence we * start writing into mdio_control until we encounter END_SIGN. */ while (1) { @@ -751,8 +759,8 @@ static int init_nic(struct s2io_nic *nic DBG_PRINT(INIT_DBG, "Fifo partition at: 0x%p is: 0x%llx\n", &bar0->tx_fifo_partition_0, (unsigned long long) val64); - /* - * Initialization of Tx_PA_CONFIG register to ignore packet + /* + * Initialization of Tx_PA_CONFIG register to ignore packet * integrity checking. */ val64 = readq(&bar0->tx_pa_cfg); @@ -769,54 +777,54 @@ static int init_nic(struct s2io_nic *nic } writeq(val64, &bar0->rx_queue_priority); - /* - * Allocating equal share of memory to all the + /* + * Allocating equal share of memory to all the * configured Rings. */ val64 = 0; + mem_size = 64; for (i = 0; i < config->rx_ring_num; i++) { switch (i) { case 0: - mem_share = (64 / config->rx_ring_num + - 64 % config->rx_ring_num); + mem_share = (mem_size / config->rx_ring_num + + mem_size % config->rx_ring_num); val64 |= RX_QUEUE_CFG_Q0_SZ(mem_share); continue; case 1: - mem_share = (64 / config->rx_ring_num); + mem_share = (mem_size / config->rx_ring_num); val64 |= RX_QUEUE_CFG_Q1_SZ(mem_share); continue; case 2: - mem_share = (64 / config->rx_ring_num); + mem_share = (mem_size / config->rx_ring_num); val64 |= RX_QUEUE_CFG_Q2_SZ(mem_share); continue; case 3: - mem_share = (64 / config->rx_ring_num); + mem_share = (mem_size / config->rx_ring_num); val64 |= RX_QUEUE_CFG_Q3_SZ(mem_share); continue; case 4: - mem_share = (64 / config->rx_ring_num); + mem_share = (mem_size / config->rx_ring_num); val64 |= RX_QUEUE_CFG_Q4_SZ(mem_share); continue; case 5: - mem_share = (64 / config->rx_ring_num); + mem_share = (mem_size / config->rx_ring_num); val64 |= RX_QUEUE_CFG_Q5_SZ(mem_share); continue; case 6: - mem_share = (64 / 
config->rx_ring_num); + mem_share = (mem_size / config->rx_ring_num); val64 |= RX_QUEUE_CFG_Q6_SZ(mem_share); continue; case 7: - mem_share = (64 / config->rx_ring_num); + mem_share = (mem_size / config->rx_ring_num); val64 |= RX_QUEUE_CFG_Q7_SZ(mem_share); continue; } } writeq(val64, &bar0->rx_queue_cfg); - /* - * Initializing the Tx round robin registers to 0. - * Filling Tx and Rx round robin registers as per the - * number of FIFOs and Rings is still TODO. + /* Initializing the Tx round robin registers to 0 + * filling tx and rx round robin registers as per + * the number of FIFOs and Rings is still TODO */ writeq(0, &bar0->tx_w_round_robin_0); writeq(0, &bar0->tx_w_round_robin_1); @@ -824,30 +832,30 @@ static int init_nic(struct s2io_nic *nic writeq(0, &bar0->tx_w_round_robin_3); writeq(0, &bar0->tx_w_round_robin_4); - /* + /* * TODO - * Disable Rx steering. Hard coding all packets be steered to - * Queue 0 for now. + * Disable Rx steering. Hard coding all packets to be steered to + * Queue 0 for now. */ val64 = 0x8080808080808080ULL; writeq(val64, &bar0->rts_qos_steering); /* UDP Fix */ val64 = 0; - for (i = 1; i < 8; i++) + for (i = 0; i < 8; i++) writeq(val64, &bar0->rts_frm_len_n[i]); - /* Set rts_frm_len register for fifo 0 */ - writeq(MAC_RTS_FRM_LEN_SET(dev->mtu + 22), - &bar0->rts_frm_len_n[0]); + /* Set the default rts frame length for ring0 */ + writeq(MAC_RTS_FRM_LEN_SET(dev->mtu+22), + &bar0->rts_frm_len_n[0]); - /* Enable statistics */ + /* Program statistics memory */ writeq(mac_control->stats_mem_phy, &bar0->stat_addr); val64 = SET_UPDT_PERIOD(Stats_refresh_time) | STAT_CFG_STAT_RO | STAT_CFG_STAT_EN; writeq(val64, &bar0->stat_cfg); - /* + /* * Initializing the sampling rate for the device to calculate the * bandwidth utilization. 
*/ @@ -856,11 +864,12 @@ static int init_nic(struct s2io_nic *nic writeq(val64, &bar0->mac_link_util); - /* - * Initializing the Transmit and Receive Traffic Interrupt + /* + * Initializing the Transmit and Receive Traffic Interrupt * Scheme. */ - /* TTI Initialization. Default Tx timer gets us about + /* + * TTI Initialization. Default Tx timer gets us about * 250 interrupts per sec. Continuous interrupts are enabled * by default. */ @@ -879,7 +888,7 @@ static int init_nic(struct s2io_nic *nic val64 = TTI_CMD_MEM_WE | TTI_CMD_MEM_STROBE_NEW_CMD; writeq(val64, &bar0->tti_command_mem); - /* + /* * Once the operation completes, the Strobe bit of the command * register will be reset. We poll for this particular condition * We wait for a maximum of 500ms for the operation to complete, @@ -916,7 +925,7 @@ static int init_nic(struct s2io_nic *nic val64 = RTI_CMD_MEM_WE | RTI_CMD_MEM_STROBE_NEW_CMD; writeq(val64, &bar0->rti_command_mem); - /* + /* * Once the operation completes, the Strobe bit of the command * register will be reset. We poll for this particular condition * We wait for a maximum of 500ms for the operation to complete, @@ -925,7 +934,7 @@ static int init_nic(struct s2io_nic *nic time = 0; while (TRUE) { val64 = readq(&bar0->rti_command_mem); - if (!(val64 & TTI_CMD_MEM_STROBE_NEW_CMD)) { + if (!(val64 & RTI_CMD_MEM_STROBE_NEW_CMD)) { break; } if (time > 10) { @@ -937,15 +946,15 @@ static int init_nic(struct s2io_nic *nic msleep(50); } - /* - * Initializing proper values as Pause threshold into all + /* + * Initializing proper values as Pause threshold into all * the 8 Queues on Rx side. 
*/ writeq(0xffbbffbbffbbffbbULL, &bar0->mc_pause_thresh_q0q3); writeq(0xffbbffbbffbbffbbULL, &bar0->mc_pause_thresh_q4q7); /* Disable RMAC PAD STRIPPING */ - add = &bar0->mac_cfg; + add = (void *) &bar0->mac_cfg; val64 = readq(&bar0->mac_cfg); val64 &= ~(MAC_CFG_RMAC_STRIP_PAD); writeq(RMAC_CFG_KEY(0x4C0D), &bar0->rmac_cfg_key); @@ -954,8 +963,8 @@ static int init_nic(struct s2io_nic *nic writel((u32) (val64 >> 32), (add + 4)); val64 = readq(&bar0->mac_cfg); - /* - * Set the time value to be inserted in the pause frame + /* + * Set the time value to be inserted in the pause frame * generated by xena. */ val64 = readq(&bar0->rmac_pause_cfg); @@ -963,7 +972,7 @@ static int init_nic(struct s2io_nic *nic val64 |= RMAC_PAUSE_HG_PTIME(nic->mac_control.rmac_pause_time); writeq(val64, &bar0->rmac_pause_cfg); - /* + /* * Set the Threshold Limit for Generating the pause frame * If the amount of data in any Queue exceeds ratio of * (mac_control.mc_pause_threshold_q0q3 or q4q7)/256 @@ -987,8 +996,8 @@ static int init_nic(struct s2io_nic *nic } writeq(val64, &bar0->mc_pause_thresh_q4q7); - /* - * TxDMA will stop Read request if the number of read split has + /* + * TxDMA will stop Read request if the number of read split has * exceeded the limit pointed by shared_splits */ val64 = readq(&bar0->pic_control); @@ -998,14 +1007,14 @@ static int init_nic(struct s2io_nic *nic return SUCCESS; } -/** - * en_dis_able_nic_intrs - Enable or Disable the interrupts +/** + * en_dis_able_nic_intrs - Enable or Disable the interrupts * @nic: device private variable, * @mask: A mask indicating which Intr block must be modified and, * @flag: A flag indicating whether to enable or disable the Intrs. * Description: This function will either disable or enable the interrupts - * depending on the flag argument. The mask argument can be used to - * enable/disable any Intr block. + * depending on the flag argument. The mask argument can be used to + * enable/disable any Intr block. * Return Value: NONE. 
*/ @@ -1023,20 +1032,20 @@ static void en_dis_able_nic_intrs(struct temp64 = readq(&bar0->general_int_mask); temp64 &= ~((u64) val64); writeq(temp64, &bar0->general_int_mask); - /* + /* * Disabled all PCIX, Flash, MDIO, IIC and GPIO - * interrupts for now. - * TODO + * interrupts for now. + * TODO */ writeq(DISABLE_ALL_INTRS, &bar0->pic_int_mask); - /* + /* * No MSI Support is available presently, so TTI and * RTI interrupts are also disabled. */ } else if (flag == DISABLE_INTRS) { - /* - * Disable PIC Intrs in the general - * intr mask register + /* + * Disable PIC Intrs in the general + * intr mask register */ writeq(DISABLE_ALL_INTRS, &bar0->pic_int_mask); temp64 = readq(&bar0->general_int_mask); @@ -1054,27 +1063,27 @@ static void en_dis_able_nic_intrs(struct temp64 = readq(&bar0->general_int_mask); temp64 &= ~((u64) val64); writeq(temp64, &bar0->general_int_mask); - /* - * Keep all interrupts other than PFC interrupt + /* + * Keep all interrupts other than PFC interrupt * and PCC interrupt disabled in DMA level. 
*/ val64 = DISABLE_ALL_INTRS & ~(TXDMA_PFC_INT_M | TXDMA_PCC_INT_M); writeq(val64, &bar0->txdma_int_mask); - /* - * Enable only the MISC error 1 interrupt in PFC block + /* + * Enable only the MISC error 1 interrupt in PFC block */ val64 = DISABLE_ALL_INTRS & (~PFC_MISC_ERR_1); writeq(val64, &bar0->pfc_err_mask); - /* - * Enable only the FB_ECC error interrupt in PCC block + /* + * Enable only the FB_ECC error interrupt in PCC block */ val64 = DISABLE_ALL_INTRS & (~PCC_FB_ECC_ERR); writeq(val64, &bar0->pcc_err_mask); } else if (flag == DISABLE_INTRS) { - /* - * Disable TxDMA Intrs in the general intr mask - * register + /* + * Disable TxDMA Intrs in the general intr mask + * register */ writeq(DISABLE_ALL_INTRS, &bar0->txdma_int_mask); writeq(DISABLE_ALL_INTRS, &bar0->pfc_err_mask); @@ -1092,15 +1101,15 @@ static void en_dis_able_nic_intrs(struct temp64 = readq(&bar0->general_int_mask); temp64 &= ~((u64) val64); writeq(temp64, &bar0->general_int_mask); - /* - * All RxDMA block interrupts are disabled for now - * TODO + /* + * All RxDMA block interrupts are disabled for now + * TODO */ writeq(DISABLE_ALL_INTRS, &bar0->rxdma_int_mask); } else if (flag == DISABLE_INTRS) { - /* - * Disable RxDMA Intrs in the general intr mask - * register + /* + * Disable RxDMA Intrs in the general intr mask + * register */ writeq(DISABLE_ALL_INTRS, &bar0->rxdma_int_mask); temp64 = readq(&bar0->general_int_mask); @@ -1117,8 +1126,8 @@ static void en_dis_able_nic_intrs(struct temp64 = readq(&bar0->general_int_mask); temp64 &= ~((u64) val64); writeq(temp64, &bar0->general_int_mask); - /* - * All MAC block error interrupts are disabled for now + /* + * All MAC block error interrupts are disabled for now * except the link status change interrupt. 
* TODO */ @@ -1131,8 +1140,8 @@ static void en_dis_able_nic_intrs(struct val64 &= ~((u64) RMAC_LINK_STATE_CHANGE_INT); writeq(val64, &bar0->mac_rmac_err_mask); } else if (flag == DISABLE_INTRS) { - /* - * Disable MAC Intrs in the general intr mask register + /* + * Disable MAC Intrs in the general intr mask register */ writeq(DISABLE_ALL_INTRS, &bar0->mac_int_mask); writeq(DISABLE_ALL_INTRS, @@ -1151,14 +1160,14 @@ static void en_dis_able_nic_intrs(struct temp64 = readq(&bar0->general_int_mask); temp64 &= ~((u64) val64); writeq(temp64, &bar0->general_int_mask); - /* + /* * All XGXS block error interrupts are disabled for now - * TODO + * TODO */ writeq(DISABLE_ALL_INTRS, &bar0->xgxs_int_mask); } else if (flag == DISABLE_INTRS) { - /* - * Disable MC Intrs in the general intr mask register + /* + * Disable MC Intrs in the general intr mask register */ writeq(DISABLE_ALL_INTRS, &bar0->xgxs_int_mask); temp64 = readq(&bar0->general_int_mask); @@ -1174,9 +1183,9 @@ static void en_dis_able_nic_intrs(struct temp64 = readq(&bar0->general_int_mask); temp64 &= ~((u64) val64); writeq(temp64, &bar0->general_int_mask); - /* - * All MC block error interrupts are disabled for now - * TODO + /* + * All MC block error interrupts are disabled for now. + * TODO */ writeq(DISABLE_ALL_INTRS, &bar0->mc_int_mask); } else if (flag == DISABLE_INTRS) { @@ -1198,14 +1207,14 @@ static void en_dis_able_nic_intrs(struct temp64 = readq(&bar0->general_int_mask); temp64 &= ~((u64) val64); writeq(temp64, &bar0->general_int_mask); - /* + /* * Enable all the Tx side interrupts - * writing 0 Enables all 64 TX interrupt levels + * writing 0 Enables all 64 TX interrupt levels */ writeq(0x0, &bar0->tx_traffic_mask); } else if (flag == DISABLE_INTRS) { - /* - * Disable Tx Traffic Intrs in the general intr mask + /* + * Disable Tx Traffic Intrs in the general intr mask * register. 
*/ writeq(DISABLE_ALL_INTRS, &bar0->tx_traffic_mask); @@ -1225,8 +1234,8 @@ static void en_dis_able_nic_intrs(struct /* writing 0 Enables all 8 RX interrupt levels */ writeq(0x0, &bar0->rx_traffic_mask); } else if (flag == DISABLE_INTRS) { - /* - * Disable Rx Traffic Intrs in the general intr mask + /* + * Disable Rx Traffic Intrs in the general intr mask * register. */ writeq(DISABLE_ALL_INTRS, &bar0->rx_traffic_mask); @@ -1237,20 +1246,42 @@ static void en_dis_able_nic_intrs(struct } } -/** - * verify_xena_quiescence - Checks whether the H/W is ready +static int check_prc_pcc_state(u64 val64, int flag) +{ + int ret = 0; + + if (flag == FALSE) { + if (!(val64 & ADAPTER_STATUS_RMAC_PCC_IDLE) && + ((val64 & ADAPTER_STATUS_RC_PRC_QUIESCENT) == + ADAPTER_STATUS_RC_PRC_QUIESCENT)) { + ret = 1; + } + } else { + if (((val64 & ADAPTER_STATUS_RMAC_PCC_IDLE) == + ADAPTER_STATUS_RMAC_PCC_IDLE) && + (!(val64 & ADAPTER_STATUS_RC_PRC_QUIESCENT) || + ((val64 & ADAPTER_STATUS_RC_PRC_QUIESCENT) == + ADAPTER_STATUS_RC_PRC_QUIESCENT))) { + ret = 1; + } + } + + return ret; +} +/** + * verify_xena_quiescence - Checks whether the H/W is ready * @val64 : Value read from adapter status register. * @flag : indicates if the adapter enable bit was ever written once * before. * Description: Returns whether the H/W is ready to go or not. Depending - * on whether adapter enable bit was written or not the comparison + * on whether adapter enable bit was written or not the comparison * differs and the calling function passes the input argument flag to * indicate this. 
- * Return: 1 If xena is quiescence + * Return: 1 If xena is quiescence * 0 If Xena is not quiescence */ -static int verify_xena_quiescence(u64 val64, int flag) +static int verify_xena_quiescence(nic_t *sp, u64 val64, int flag) { int ret = 0; u64 tmp64 = ~((u64) val64); @@ -1262,25 +1293,7 @@ static int verify_xena_quiescence(u64 va ADAPTER_STATUS_PIC_QUIESCENT | ADAPTER_STATUS_MC_DRAM_READY | ADAPTER_STATUS_MC_QUEUES_READY | ADAPTER_STATUS_M_PLL_LOCK | ADAPTER_STATUS_P_PLL_LOCK))) { - if (flag == FALSE) { - if (!(val64 & ADAPTER_STATUS_RMAC_PCC_IDLE) && - ((val64 & ADAPTER_STATUS_RC_PRC_QUIESCENT) == - ADAPTER_STATUS_RC_PRC_QUIESCENT)) { - - ret = 1; - - } - } else { - if (((val64 & ADAPTER_STATUS_RMAC_PCC_IDLE) == - ADAPTER_STATUS_RMAC_PCC_IDLE) && - (!(val64 & ADAPTER_STATUS_RC_PRC_QUIESCENT) || - ((val64 & ADAPTER_STATUS_RC_PRC_QUIESCENT) == - ADAPTER_STATUS_RC_PRC_QUIESCENT))) { - - ret = 1; - - } - } + ret = check_prc_pcc_state(val64, flag); } return ret; @@ -1289,12 +1302,12 @@ static int verify_xena_quiescence(u64 va /** * fix_mac_address - Fix for Mac addr problem on Alpha platforms * @sp: Pointer to device specifc structure - * Description : + * Description : * New procedure to clear mac address reading problems on Alpha platforms * */ -static void fix_mac_address(nic_t * sp) +void fix_mac_address(nic_t * sp) { XENA_dev_config_t __iomem *bar0 = sp->bar0; u64 val64; @@ -1302,20 +1315,21 @@ static void fix_mac_address(nic_t * sp) while (fix_mac[i] != END_SIGN) { writeq(fix_mac[i++], &bar0->gpio_control); + udelay(10); val64 = readq(&bar0->gpio_control); } } /** - * start_nic - Turns the device on + * start_nic - Turns the device on * @nic : device private variable. - * Description: - * This function actually turns the device on. Before this function is - * called,all Registers are configured from their reset states - * and shared memory is allocated but the NIC is still quiescent. On + * Description: + * This function actually turns the device on. 
Before this function is + * called,all Registers are configured from their reset states + * and shared memory is allocated but the NIC is still quiescent. On * calling this function, the device interrupts are cleared and the NIC is * literally switched on by writing into the adapter control register. - * Return Value: + * Return Value: * SUCCESS on success and -1 on failure. */ @@ -1324,8 +1338,8 @@ static int start_nic(struct s2io_nic *ni XENA_dev_config_t __iomem *bar0 = nic->bar0; struct net_device *dev = nic->dev; register u64 val64 = 0; - u16 interruptible, i; - u16 subid; + u16 interruptible; + u16 subid, i; mac_info_t *mac_control; struct config_param *config; @@ -1334,7 +1348,7 @@ static int start_nic(struct s2io_nic *ni /* PRC Initialization and configuration */ for (i = 0; i < config->rx_ring_num; i++) { - writeq((u64) nic->rx_blocks[i][0].block_dma_addr, + writeq((u64) mac_control->rings[i].rx_blocks[0].block_dma_addr, &bar0->prc_rxd0_n[i]); val64 = readq(&bar0->prc_ctrl_n[i]); @@ -1353,7 +1367,7 @@ static int start_nic(struct s2io_nic *ni writeq(val64, &bar0->rx_pa_cfg); #endif - /* + /* * Enabling MC-RLDRAM. After enabling the device, we timeout * for around 100ms, which is approximately the time required * for the device to be ready for operation. @@ -1363,27 +1377,27 @@ static int start_nic(struct s2io_nic *ni SPECIAL_REG_WRITE(val64, &bar0->mc_rldram_mrs, UF); val64 = readq(&bar0->mc_rldram_mrs); - msleep(100); /* Delay by around 100 ms. */ + msleep(100); /* Delay by around 100 ms. */ /* Enabling ECC Protection. */ val64 = readq(&bar0->adapter_control); val64 &= ~ADAPTER_ECC_EN; writeq(val64, &bar0->adapter_control); - /* - * Clearing any possible Link state change interrupts that + /* + * Clearing any possible Link state change interrupts that * could have popped up just before Enabling the card. 
*/ val64 = readq(&bar0->mac_rmac_err_reg); if (val64) writeq(val64, &bar0->mac_rmac_err_reg); - /* - * Verify if the device is ready to be enabled, if so enable + /* + * Verify if the device is ready to be enabled, if so enable * it. */ val64 = readq(&bar0->adapter_status); - if (!verify_xena_quiescence(val64, nic->device_enabled_once)) { + if (!verify_xena_quiescence(nic, val64, nic->device_enabled_once)) { DBG_PRINT(ERR_DBG, "%s: device is not ready, ", dev->name); DBG_PRINT(ERR_DBG, "Adapter status reads: 0x%llx\n", (unsigned long long) val64); @@ -1395,12 +1409,12 @@ static int start_nic(struct s2io_nic *ni RX_MAC_INTR; en_dis_able_nic_intrs(nic, interruptible, ENABLE_INTRS); - /* + /* * With some switches, link might be already up at this point. - * Because of this weird behavior, when we enable laser, - * we may not get link. We need to handle this. We cannot - * figure out which switch is misbehaving. So we are forced to - * make a global change. + * Because of this weird behavior, when we enable laser, + * we may not get link. We need to handle this. We cannot + * figure out which switch is misbehaving. So we are forced to + * make a global change. */ /* Enabling Laser. */ @@ -1415,17 +1429,17 @@ static int start_nic(struct s2io_nic *ni val64 |= 0x0000800000000000ULL; writeq(val64, &bar0->gpio_control); val64 = 0x0411040400000000ULL; - writeq(val64, (void __iomem *) bar0 + 0x2700); + writeq(val64, (void __iomem *) ((u8 *) bar0 + 0x2700)); } - /* - * Don't see link state interrupts on certain switches, so + /* + * Don't see link state interrupts on certain switches, so * directly scheduling a link state task from here. */ schedule_work(&nic->set_link_task); - /* - * Here we are performing soft reset on XGXS to + /* + * Here we are performing soft reset on XGXS to * force link down. 
Since link is already up, we will get * link state change interrupt after this reset */ @@ -1442,12 +1456,12 @@ static int start_nic(struct s2io_nic *ni return SUCCESS; } -/** - * free_tx_buffers - Free all queued Tx buffers +/** + * free_tx_buffers - Free all queued Tx buffers * @nic : device private variable. - * Description: + * Description: * Free all queued Tx buffers. - * Return Value: void + * Return Value: void */ static void free_tx_buffers(struct s2io_nic *nic) @@ -1465,7 +1479,7 @@ static void free_tx_buffers(struct s2io_ for (i = 0; i < config->tx_fifo_num; i++) { for (j = 0; j < config->tx_cfg[i].fifo_len - 1; j++) { - txdp = (TxD_t *) nic->list_info[i][j]. + txdp = (TxD_t *) mac_control->fifos[i].list_info[j]. list_virt_addr; skb = (struct sk_buff *) ((unsigned long) txdp-> @@ -1481,16 +1495,16 @@ static void free_tx_buffers(struct s2io_ DBG_PRINT(INTR_DBG, "%s:forcibly freeing %d skbs on FIFO%d\n", dev->name, cnt, i); - mac_control->tx_curr_get_info[i].offset = 0; - mac_control->tx_curr_put_info[i].offset = 0; + mac_control->fifos[i].tx_curr_get_info.offset = 0; + mac_control->fifos[i].tx_curr_put_info.offset = 0; } } -/** - * stop_nic - To stop the nic +/** + * stop_nic - To stop the nic * @nic ; device private variable. - * Description: - * This function does exactly the opposite of what the start_nic() + * Description: + * This function does exactly the opposite of what the start_nic() * function does. This function is called to stop the device. * Return Value: * void. @@ -1520,11 +1534,11 @@ static void stop_nic(struct s2io_nic *ni } } -/** - * fill_rx_buffers - Allocates the Rx side skbs +/** + * fill_rx_buffers - Allocates the Rx side skbs * @nic: device private variable - * @ring_no: ring number - * Description: + * @ring_no: ring number + * Description: * The function allocates Rx side skbs and puts the physical * address of these buffers into the RxD buffer pointers, so that the NIC * can DMA the received frame into these locations. 
@@ -1532,8 +1546,8 @@ static void stop_nic(struct s2io_nic *ni * 1. single buffer, * 2. three buffer and * 3. Five buffer modes. - * Each mode defines how many fragments the received frame will be split - * up into by the NIC. The frame is split into L3 header, L4 Header, + * Each mode defines how many fragments the received frame will be split + * up into by the NIC. The frame is split into L3 header, L4 Header, * L4 payload in three buffer mode and in 5 buffer mode, L4 payload itself * is split into 3 fragments. As of now only single buffer mode is * supported. @@ -1541,7 +1555,7 @@ static void stop_nic(struct s2io_nic *ni * SUCCESS on success or an appropriate -ve value on failure. */ -static int fill_rx_buffers(struct s2io_nic *nic, int ring_no) +int fill_rx_buffers(struct s2io_nic *nic, int ring_no) { struct net_device *dev = nic->dev; struct sk_buff *skb; @@ -1549,14 +1563,13 @@ static int fill_rx_buffers(struct s2io_n int off, off1, size, block_no, block_no1; int offset, offset1; u32 alloc_tab = 0; - u32 alloc_cnt = nic->pkt_cnt[ring_no] - - atomic_read(&nic->rx_bufs_left[ring_no]); + u32 alloc_cnt; mac_info_t *mac_control; struct config_param *config; #ifdef CONFIG_2BUFF_MODE RxD_t *rxdpnext; int nextblk; - unsigned long tmp; + u64 tmp; buffAdd_t *ba; dma_addr_t rxdpphys; #endif @@ -1566,17 +1579,18 @@ static int fill_rx_buffers(struct s2io_n mac_control = &nic->mac_control; config = &nic->config; - + alloc_cnt = mac_control->rings[ring_no].pkt_cnt - + atomic_read(&nic->rx_bufs_left[ring_no]); size = dev->mtu + HEADER_ETHERNET_II_802_3_SIZE + HEADER_802_2_SIZE + HEADER_SNAP_SIZE; while (alloc_tab < alloc_cnt) { - block_no = mac_control->rx_curr_put_info[ring_no]. + block_no = mac_control->rings[ring_no].rx_curr_put_info. block_index; - block_no1 = mac_control->rx_curr_get_info[ring_no]. + block_no1 = mac_control->rings[ring_no].rx_curr_get_info. 
block_index; - off = mac_control->rx_curr_put_info[ring_no].offset; - off1 = mac_control->rx_curr_get_info[ring_no].offset; + off = mac_control->rings[ring_no].rx_curr_put_info.offset; + off1 = mac_control->rings[ring_no].rx_curr_get_info.offset; #ifndef CONFIG_2BUFF_MODE offset = block_no * (MAX_RXDS_PER_BLOCK + 1) + off; offset1 = block_no1 * (MAX_RXDS_PER_BLOCK + 1) + off1; @@ -1585,7 +1599,7 @@ static int fill_rx_buffers(struct s2io_n offset1 = block_no1 * (MAX_RXDS_PER_BLOCK) + off1; #endif - rxdp = nic->rx_blocks[ring_no][block_no]. + rxdp = mac_control->rings[ring_no].rx_blocks[block_no]. block_virt_addr + off; if ((offset == offset1) && (rxdp->Host_Control)) { DBG_PRINT(INTR_DBG, "%s: Get and Put", dev->name); @@ -1594,15 +1608,15 @@ static int fill_rx_buffers(struct s2io_n } #ifndef CONFIG_2BUFF_MODE if (rxdp->Control_1 == END_OF_BLOCK) { - mac_control->rx_curr_put_info[ring_no]. + mac_control->rings[ring_no].rx_curr_put_info. block_index++; - mac_control->rx_curr_put_info[ring_no]. - block_index %= nic->block_count[ring_no]; - block_no = mac_control->rx_curr_put_info - [ring_no].block_index; + mac_control->rings[ring_no].rx_curr_put_info. + block_index %= mac_control->rings[ring_no].block_count; + block_no = mac_control->rings[ring_no].rx_curr_put_info. + block_index; off++; off %= (MAX_RXDS_PER_BLOCK + 1); - mac_control->rx_curr_put_info[ring_no].offset = + mac_control->rings[ring_no].rx_curr_put_info.offset = off; rxdp = (RxD_t *) ((unsigned long) rxdp->Control_2); DBG_PRINT(INTR_DBG, "%s: Next block at: %p\n", @@ -1610,30 +1624,30 @@ static int fill_rx_buffers(struct s2io_n } #ifndef CONFIG_S2IO_NAPI spin_lock_irqsave(&nic->put_lock, flags); - nic->put_pos[ring_no] = + mac_control->rings[ring_no].put_pos = (block_no * (MAX_RXDS_PER_BLOCK + 1)) + off; spin_unlock_irqrestore(&nic->put_lock, flags); #endif #else if (rxdp->Host_Control == END_OF_BLOCK) { - mac_control->rx_curr_put_info[ring_no]. + mac_control->rings[ring_no].rx_curr_put_info. 
block_index++; - mac_control->rx_curr_put_info[ring_no]. - block_index %= nic->block_count[ring_no]; - block_no = mac_control->rx_curr_put_info - [ring_no].block_index; + mac_control->rings[ring_no].rx_curr_put_info.block_index + %= mac_control->rings[ring_no].block_count; + block_no = mac_control->rings[ring_no].rx_curr_put_info + .block_index; off = 0; DBG_PRINT(INTR_DBG, "%s: block%d at: 0x%llx\n", dev->name, block_no, (unsigned long long) rxdp->Control_1); - mac_control->rx_curr_put_info[ring_no].offset = + mac_control->rings[ring_no].rx_curr_put_info.offset = off; - rxdp = nic->rx_blocks[ring_no][block_no]. + rxdp = mac_control->rings[ring_no].rx_blocks[block_no]. block_virt_addr; } #ifndef CONFIG_S2IO_NAPI spin_lock_irqsave(&nic->put_lock, flags); - nic->put_pos[ring_no] = (block_no * + mac_control->rings[ring_no].put_pos = (block_no * (MAX_RXDS_PER_BLOCK + 1)) + off; spin_unlock_irqrestore(&nic->put_lock, flags); #endif @@ -1645,27 +1659,27 @@ static int fill_rx_buffers(struct s2io_n if (rxdp->Control_2 & BIT(0)) #endif { - mac_control->rx_curr_put_info[ring_no]. + mac_control->rings[ring_no].rx_curr_put_info. offset = off; goto end; } #ifdef CONFIG_2BUFF_MODE - /* - * RxDs Spanning cache lines will be replenished only - * if the succeeding RxD is also owned by Host. It - * will always be the ((8*i)+3) and ((8*i)+6) - * descriptors for the 48 byte descriptor. The offending + /* + * RxDs Spanning cache lines will be replenished only + * if the succeeding RxD is also owned by Host. It + * will always be the ((8*i)+3) and ((8*i)+6) + * descriptors for the 48 byte descriptor. The offending * decsriptor is of-course the 3rd descriptor. */ - rxdpphys = nic->rx_blocks[ring_no][block_no]. + rxdpphys = mac_control->rings[ring_no].rx_blocks[block_no]. block_dma_addr + (off * sizeof(RxD_t)); if (((u64) (rxdpphys)) % 128 > 80) { - rxdpnext = nic->rx_blocks[ring_no][block_no]. + rxdpnext = mac_control->rings[ring_no].rx_blocks[block_no]. 
block_virt_addr + (off + 1); if (rxdpnext->Host_Control == END_OF_BLOCK) { nextblk = (block_no + 1) % - (nic->block_count[ring_no]); - rxdpnext = nic->rx_blocks[ring_no] + (mac_control->rings[ring_no].block_count); + rxdpnext = mac_control->rings[ring_no].rx_blocks [nextblk].block_virt_addr; } if (rxdpnext->Control_2 & BIT(0)) @@ -1694,11 +1708,11 @@ static int fill_rx_buffers(struct s2io_n rxdp->Control_1 |= RXD_OWN_XENA; off++; off %= (MAX_RXDS_PER_BLOCK + 1); - mac_control->rx_curr_put_info[ring_no].offset = off; + mac_control->rings[ring_no].rx_curr_put_info.offset = off; #else - ba = &nic->ba[ring_no][block_no][off]; + ba = &mac_control->rings[ring_no].ba[block_no][off]; skb_reserve(skb, BUF0_LEN); - tmp = (unsigned long) skb->data; + tmp = (u64) skb->data; tmp += ALIGN_SIZE; tmp &= ~ALIGN_SIZE; skb->data = (void *) tmp; @@ -1722,8 +1736,9 @@ static int fill_rx_buffers(struct s2io_n rxdp->Host_Control = (u64) ((unsigned long) (skb)); rxdp->Control_1 |= RXD_OWN_XENA; off++; - mac_control->rx_curr_put_info[ring_no].offset = off; + mac_control->rings[ring_no].rx_curr_put_info.offset = off; #endif + atomic_inc(&nic->rx_bufs_left[ring_no]); alloc_tab++; } @@ -1733,9 +1748,9 @@ static int fill_rx_buffers(struct s2io_n } /** - * free_rx_buffers - Frees all Rx buffers + * free_rx_buffers - Frees all Rx buffers * @sp: device private variable. - * Description: + * Description: * This function will free all Rx buffers allocated by host. * Return Value: * NONE. @@ -1759,7 +1774,8 @@ static void free_rx_buffers(struct s2io_ for (i = 0; i < config->rx_ring_num; i++) { for (j = 0, blk = 0; j < config->rx_cfg[i].num_rxd; j++) { off = j % (MAX_RXDS_PER_BLOCK + 1); - rxdp = sp->rx_blocks[i][blk].block_virt_addr + off; + rxdp = mac_control->rings[i].rx_blocks[blk]. 
+ block_virt_addr + off; #ifndef CONFIG_2BUFF_MODE if (rxdp->Control_1 == END_OF_BLOCK) { @@ -1794,7 +1810,7 @@ static void free_rx_buffers(struct s2io_ HEADER_SNAP_SIZE, PCI_DMA_FROMDEVICE); #else - ba = &sp->ba[i][blk][off]; + ba = &mac_control->rings[i].ba[blk][off]; pci_unmap_single(sp->pdev, (dma_addr_t) rxdp->Buffer0_ptr, BUF0_LEN, @@ -1814,10 +1830,10 @@ static void free_rx_buffers(struct s2io_ } memset(rxdp, 0, sizeof(RxD_t)); } - mac_control->rx_curr_put_info[i].block_index = 0; - mac_control->rx_curr_get_info[i].block_index = 0; - mac_control->rx_curr_put_info[i].offset = 0; - mac_control->rx_curr_get_info[i].offset = 0; + mac_control->rings[i].rx_curr_put_info.block_index = 0; + mac_control->rings[i].rx_curr_get_info.block_index = 0; + mac_control->rings[i].rx_curr_put_info.offset = 0; + mac_control->rings[i].rx_curr_get_info.offset = 0; atomic_set(&sp->rx_bufs_left[i], 0); DBG_PRINT(INIT_DBG, "%s:Freed 0x%x Rx Buffers on ring%d\n", dev->name, buf_cnt, i); @@ -1827,7 +1843,7 @@ static void free_rx_buffers(struct s2io_ /** * s2io_poll - Rx interrupt handler for NAPI support * @dev : pointer to the device structure. - * @budget : The number of packets that were budgeted to be processed + * @budget : The number of packets that were budgeted to be processed * during one pass through the 'Poll" function. * Description: * Comes into picture only if NAPI support has been incorporated. It does @@ -1837,160 +1853,35 @@ static void free_rx_buffers(struct s2io_ * 0 on success and 1 if there are No Rx packets to be processed. 
*/ -#ifdef CONFIG_S2IO_NAPI +#if defined(CONFIG_S2IO_NAPI) static int s2io_poll(struct net_device *dev, int *budget) { nic_t *nic = dev->priv; - XENA_dev_config_t __iomem *bar0 = nic->bar0; - int pkts_to_process = *budget, pkt_cnt = 0; - register u64 val64 = 0; - rx_curr_get_info_t get_info, put_info; - int i, get_block, put_block, get_offset, put_offset, ring_bufs; -#ifndef CONFIG_2BUFF_MODE - u16 val16, cksum; -#endif - struct sk_buff *skb; - RxD_t *rxdp; + int pkt_cnt = 0, org_pkts_to_process; mac_info_t *mac_control; struct config_param *config; -#ifdef CONFIG_2BUFF_MODE - buffAdd_t *ba; -#endif + XENA_dev_config_t *bar0 = (XENA_dev_config_t *) nic->bar0; + u64 val64; + int i; mac_control = &nic->mac_control; config = &nic->config; - if (pkts_to_process > dev->quota) - pkts_to_process = dev->quota; + nic->pkts_to_process = *budget; + if (nic->pkts_to_process > dev->quota) + nic->pkts_to_process = dev->quota; + org_pkts_to_process = nic->pkts_to_process; val64 = readq(&bar0->rx_traffic_int); writeq(val64, &bar0->rx_traffic_int); for (i = 0; i < config->rx_ring_num; i++) { - get_info = mac_control->rx_curr_get_info[i]; - get_block = get_info.block_index; - put_info = mac_control->rx_curr_put_info[i]; - put_block = put_info.block_index; - ring_bufs = config->rx_cfg[i].num_rxd; - rxdp = nic->rx_blocks[i][get_block].block_virt_addr + - get_info.offset; -#ifndef CONFIG_2BUFF_MODE - get_offset = (get_block * (MAX_RXDS_PER_BLOCK + 1)) + - get_info.offset; - put_offset = (put_block * (MAX_RXDS_PER_BLOCK + 1)) + - put_info.offset; - while ((!(rxdp->Control_1 & RXD_OWN_XENA)) && - (((get_offset + 1) % ring_bufs) != put_offset)) { - if (--pkts_to_process < 0) { - goto no_rx; - } - if (rxdp->Control_1 == END_OF_BLOCK) { - rxdp = - (RxD_t *) ((unsigned long) rxdp-> - Control_2); - get_info.offset++; - get_info.offset %= - (MAX_RXDS_PER_BLOCK + 1); - get_block++; - get_block %= nic->block_count[i]; - mac_control->rx_curr_get_info[i]. 
- offset = get_info.offset; - mac_control->rx_curr_get_info[i]. - block_index = get_block; - continue; - } - get_offset = - (get_block * (MAX_RXDS_PER_BLOCK + 1)) + - get_info.offset; - skb = - (struct sk_buff *) ((unsigned long) rxdp-> - Host_Control); - if (skb == NULL) { - DBG_PRINT(ERR_DBG, "%s: The skb is ", - dev->name); - DBG_PRINT(ERR_DBG, "Null in Rx Intr\n"); - goto no_rx; - } - val64 = RXD_GET_BUFFER0_SIZE(rxdp->Control_2); - val16 = (u16) (val64 >> 48); - cksum = RXD_GET_L4_CKSUM(rxdp->Control_1); - pci_unmap_single(nic->pdev, (dma_addr_t) - rxdp->Buffer0_ptr, - dev->mtu + - HEADER_ETHERNET_II_802_3_SIZE + - HEADER_802_2_SIZE + - HEADER_SNAP_SIZE, - PCI_DMA_FROMDEVICE); - rx_osm_handler(nic, val16, rxdp, i); - pkt_cnt++; - get_info.offset++; - get_info.offset %= (MAX_RXDS_PER_BLOCK + 1); - rxdp = - nic->rx_blocks[i][get_block].block_virt_addr + - get_info.offset; - mac_control->rx_curr_get_info[i].offset = - get_info.offset; - } -#else - get_offset = (get_block * (MAX_RXDS_PER_BLOCK + 1)) + - get_info.offset; - put_offset = (put_block * (MAX_RXDS_PER_BLOCK + 1)) + - put_info.offset; - while (((!(rxdp->Control_1 & RXD_OWN_XENA)) && - !(rxdp->Control_2 & BIT(0))) && - (((get_offset + 1) % ring_bufs) != put_offset)) { - if (--pkts_to_process < 0) { - goto no_rx; - } - skb = (struct sk_buff *) ((unsigned long) - rxdp->Host_Control); - if (skb == NULL) { - DBG_PRINT(ERR_DBG, "%s: The skb is ", - dev->name); - DBG_PRINT(ERR_DBG, "Null in Rx Intr\n"); - goto no_rx; - } - - pci_unmap_single(nic->pdev, (dma_addr_t) - rxdp->Buffer0_ptr, - BUF0_LEN, PCI_DMA_FROMDEVICE); - pci_unmap_single(nic->pdev, (dma_addr_t) - rxdp->Buffer1_ptr, - BUF1_LEN, PCI_DMA_FROMDEVICE); - pci_unmap_single(nic->pdev, (dma_addr_t) - rxdp->Buffer2_ptr, - dev->mtu + BUF0_LEN + 4, - PCI_DMA_FROMDEVICE); - ba = &nic->ba[i][get_block][get_info.offset]; - - rx_osm_handler(nic, rxdp, i, ba); - - get_info.offset++; - mac_control->rx_curr_get_info[i].offset = - get_info.offset; - rxdp = - 
nic->rx_blocks[i][get_block].block_virt_addr + - get_info.offset; - - if (get_info.offset && - (!(get_info.offset % MAX_RXDS_PER_BLOCK))) { - get_info.offset = 0; - mac_control->rx_curr_get_info[i]. - offset = get_info.offset; - get_block++; - get_block %= nic->block_count[i]; - mac_control->rx_curr_get_info[i]. - block_index = get_block; - rxdp = - nic->rx_blocks[i][get_block]. - block_virt_addr; - } - get_offset = - (get_block * (MAX_RXDS_PER_BLOCK + 1)) + - get_info.offset; - pkt_cnt++; + rx_intr_handler(&mac_control->rings[i]); + pkt_cnt = org_pkts_to_process - nic->pkts_to_process; + if (!nic->pkts_to_process) { + /* Quota for the current iteration has been met */ + goto no_rx; } -#endif } if (!pkt_cnt) pkt_cnt = 1; @@ -2010,7 +1901,7 @@ static int s2io_poll(struct net_device * en_dis_able_nic_intrs(nic, RX_TRAFFIC_INTR, ENABLE_INTRS); return 0; - no_rx: +no_rx: dev->quota -= pkt_cnt; *budget -= pkt_cnt; @@ -2023,277 +1914,213 @@ static int s2io_poll(struct net_device * } return 1; } -#else -/** +#endif + +/** * rx_intr_handler - Rx interrupt handler * @nic: device private variable. - * Description: - * If the interrupt is because of a received frame or if the + * Description: + * If the interrupt is because of a received frame or if the * receive ring contains fresh as yet un-processed frames,this function is - * called. It picks out the RxD at which place the last Rx processing had - * stopped and sends the skb to the OSM's Rx handler and then increments + * called. It picks out the RxD at which place the last Rx processing had + * stopped and sends the skb to the OSM's Rx handler and then increments * the offset. * Return Value: * NONE. 
*/ - -static void rx_intr_handler(struct s2io_nic *nic) +static void rx_intr_handler(ring_info_t *ring_data) { + nic_t *nic = ring_data->nic; struct net_device *dev = (struct net_device *) nic->dev; - XENA_dev_config_t *bar0 = (XENA_dev_config_t *) nic->bar0; + XENA_dev_config_t __iomem *bar0 = nic->bar0; + int get_block, get_offset, put_block, put_offset, ring_bufs; rx_curr_get_info_t get_info, put_info; RxD_t *rxdp; struct sk_buff *skb; -#ifndef CONFIG_2BUFF_MODE - u16 val16, cksum; -#endif - register u64 val64 = 0; - int get_block, get_offset, put_block, put_offset, ring_bufs; - int i, pkt_cnt = 0; - mac_info_t *mac_control; - struct config_param *config; -#ifdef CONFIG_2BUFF_MODE - buffAdd_t *ba; +#ifndef CONFIG_S2IO_NAPI + int pkt_cnt = 0; #endif + register u64 val64; - mac_control = &nic->mac_control; - config = &nic->config; - - /* - * rx_traffic_int reg is an R1 register, hence we read and write back - * the samevalue in the register to clear it. + /* + * rx_traffic_int reg is an R1 register, hence we read and write + * back the same value in the register to clear it */ - val64 = readq(&bar0->rx_traffic_int); - writeq(val64, &bar0->rx_traffic_int); + val64 = readq(&bar0->rx_traffic_int); + writeq(val64, &bar0->rx_traffic_int); - for (i = 0; i < config->rx_ring_num; i++) { - get_info = mac_control->rx_curr_get_info[i]; - get_block = get_info.block_index; - put_info = mac_control->rx_curr_put_info[i]; - put_block = put_info.block_index; - ring_bufs = config->rx_cfg[i].num_rxd; - rxdp = nic->rx_blocks[i][get_block].block_virt_addr + + get_info = ring_data->rx_curr_get_info; + get_block = get_info.block_index; + put_info = ring_data->rx_curr_put_info; + put_block = put_info.block_index; + ring_bufs = get_info.ring_len+1; + rxdp = ring_data->rx_blocks[get_block].block_virt_addr + get_info.offset; -#ifndef CONFIG_2BUFF_MODE - get_offset = (get_block * (MAX_RXDS_PER_BLOCK + 1)) + - get_info.offset; - spin_lock(&nic->put_lock); - put_offset = nic->put_pos[i]; -
spin_unlock(&nic->put_lock); - while ((!(rxdp->Control_1 & RXD_OWN_XENA)) && - (((get_offset + 1) % ring_bufs) != put_offset)) { - if (rxdp->Control_1 == END_OF_BLOCK) { - rxdp = (RxD_t *) ((unsigned long) - rxdp->Control_2); - get_info.offset++; - get_info.offset %= - (MAX_RXDS_PER_BLOCK + 1); - get_block++; - get_block %= nic->block_count[i]; - mac_control->rx_curr_get_info[i]. - offset = get_info.offset; - mac_control->rx_curr_get_info[i]. - block_index = get_block; - continue; - } - get_offset = - (get_block * (MAX_RXDS_PER_BLOCK + 1)) + - get_info.offset; - skb = (struct sk_buff *) ((unsigned long) - rxdp->Host_Control); - if (skb == NULL) { - DBG_PRINT(ERR_DBG, "%s: The skb is ", - dev->name); - DBG_PRINT(ERR_DBG, "Null in Rx Intr\n"); - return; - } - val64 = RXD_GET_BUFFER0_SIZE(rxdp->Control_2); - val16 = (u16) (val64 >> 48); - cksum = RXD_GET_L4_CKSUM(rxdp->Control_1); - pci_unmap_single(nic->pdev, (dma_addr_t) - rxdp->Buffer0_ptr, - dev->mtu + - HEADER_ETHERNET_II_802_3_SIZE + - HEADER_802_2_SIZE + - HEADER_SNAP_SIZE, - PCI_DMA_FROMDEVICE); - rx_osm_handler(nic, val16, rxdp, i); - get_info.offset++; - get_info.offset %= (MAX_RXDS_PER_BLOCK + 1); - rxdp = - nic->rx_blocks[i][get_block].block_virt_addr + - get_info.offset; - mac_control->rx_curr_get_info[i].offset = - get_info.offset; - pkt_cnt++; - if ((indicate_max_pkts) - && (pkt_cnt > indicate_max_pkts)) - break; + get_offset = (get_block * (MAX_RXDS_PER_BLOCK + 1)) + + get_info.offset; +#ifndef CONFIG_S2IO_NAPI + spin_lock(&nic->put_lock); + put_offset = ring_data->put_pos; + spin_unlock(&nic->put_lock); +#else + put_offset = (put_block * (MAX_RXDS_PER_BLOCK + 1)) + + put_info.offset; +#endif + while ((!(rxdp->Control_1 & RXD_OWN_XENA)) && +#ifdef CONFIG_2BUFF_MODE + (!(rxdp->Control_2 & BIT(0))) && +#endif + (((get_offset + 1) % ring_bufs) != put_offset)) { + skb = (struct sk_buff *) ((unsigned long)rxdp->Host_Control); + if (skb == NULL) { + DBG_PRINT(ERR_DBG, "%s: The skb is ", + dev->name); +
DBG_PRINT(ERR_DBG, "Null in Rx Intr\n"); + return; } +#ifndef CONFIG_2BUFF_MODE + pci_unmap_single(nic->pdev, (dma_addr_t) + rxdp->Buffer0_ptr, + dev->mtu + + HEADER_ETHERNET_II_802_3_SIZE + + HEADER_802_2_SIZE + + HEADER_SNAP_SIZE, + PCI_DMA_FROMDEVICE); #else - get_offset = (get_block * (MAX_RXDS_PER_BLOCK + 1)) + + pci_unmap_single(nic->pdev, (dma_addr_t) + rxdp->Buffer0_ptr, + BUF0_LEN, PCI_DMA_FROMDEVICE); + pci_unmap_single(nic->pdev, (dma_addr_t) + rxdp->Buffer1_ptr, + BUF1_LEN, PCI_DMA_FROMDEVICE); + pci_unmap_single(nic->pdev, (dma_addr_t) + rxdp->Buffer2_ptr, + dev->mtu + BUF0_LEN + 4, + PCI_DMA_FROMDEVICE); +#endif + rx_osm_handler(ring_data, rxdp); + get_info.offset++; + ring_data->rx_curr_get_info.offset = get_info.offset; - spin_lock(&nic->put_lock); - put_offset = nic->put_pos[i]; - spin_unlock(&nic->put_lock); - while (((!(rxdp->Control_1 & RXD_OWN_XENA)) && - !(rxdp->Control_2 & BIT(0))) && - (((get_offset + 1) % ring_bufs) != put_offset)) { - skb = (struct sk_buff *) ((unsigned long) - rxdp->Host_Control); - if (skb == NULL) { - DBG_PRINT(ERR_DBG, "%s: The skb is ", - dev->name); - DBG_PRINT(ERR_DBG, "Null in Rx Intr\n"); - return; - } - - pci_unmap_single(nic->pdev, (dma_addr_t) - rxdp->Buffer0_ptr, - BUF0_LEN, PCI_DMA_FROMDEVICE); - pci_unmap_single(nic->pdev, (dma_addr_t) - rxdp->Buffer1_ptr, - BUF1_LEN, PCI_DMA_FROMDEVICE); - pci_unmap_single(nic->pdev, (dma_addr_t) - rxdp->Buffer2_ptr, - dev->mtu + BUF0_LEN + 4, - PCI_DMA_FROMDEVICE); - ba = &nic->ba[i][get_block][get_info.offset]; - - rx_osm_handler(nic, rxdp, i, ba); - - get_info.offset++; - mac_control->rx_curr_get_info[i].offset = - get_info.offset; - rxdp = - nic->rx_blocks[i][get_block].block_virt_addr + - get_info.offset; + rxdp = ring_data->rx_blocks[get_block].block_virt_addr + + get_info.offset; + if (get_info.offset && + (!(get_info.offset % MAX_RXDS_PER_BLOCK))) { + get_info.offset = 0; + ring_data->rx_curr_get_info.offset + = get_info.offset; + get_block++; + get_block %= 
ring_data->block_count; + ring_data->rx_curr_get_info.block_index + = get_block; + rxdp = ring_data->rx_blocks[get_block].block_virt_addr; + } - if (get_info.offset && - (!(get_info.offset % MAX_RXDS_PER_BLOCK))) { - get_info.offset = 0; - mac_control->rx_curr_get_info[i]. - offset = get_info.offset; - get_block++; - get_block %= nic->block_count[i]; - mac_control->rx_curr_get_info[i]. - block_index = get_block; - rxdp = - nic->rx_blocks[i][get_block]. - block_virt_addr; - } - get_offset = - (get_block * (MAX_RXDS_PER_BLOCK + 1)) + + get_offset = (get_block * (MAX_RXDS_PER_BLOCK + 1)) + get_info.offset; - pkt_cnt++; - if ((indicate_max_pkts) - && (pkt_cnt > indicate_max_pkts)) - break; - } -#endif +#ifdef CONFIG_S2IO_NAPI + nic->pkts_to_process -= 1; + if (!nic->pkts_to_process) + break; +#else + pkt_cnt++; if ((indicate_max_pkts) && (pkt_cnt > indicate_max_pkts)) break; +#endif } } -#endif -/** + +/** * tx_intr_handler - Transmit interrupt handler * @nic : device private variable - * Description: - * If an interrupt was raised to indicate DMA complete of the - * Tx packet, this function is called. It identifies the last TxD - * whose buffer was freed and frees all skbs whose data have already + * Description: + * If an interrupt was raised to indicate DMA complete of the + * Tx packet, this function is called. It identifies the last TxD + * whose buffer was freed and frees all skbs whose data have already * DMA'ed into the NICs internal memory. 
* Return Value: * NONE */ -static void tx_intr_handler(struct s2io_nic *nic) +static void tx_intr_handler(fifo_info_t *fifo_data) { + nic_t *nic = fifo_data->nic; XENA_dev_config_t __iomem *bar0 = nic->bar0; struct net_device *dev = (struct net_device *) nic->dev; tx_curr_get_info_t get_info, put_info; struct sk_buff *skb; TxD_t *txdlp; - register u64 val64 = 0; - int i; u16 j, frg_cnt; - mac_info_t *mac_control; - struct config_param *config; - - mac_control = &nic->mac_control; - config = &nic->config; + register u64 val64 = 0; - /* - * tx_traffic_int reg is an R1 register, hence we read and write - * back the samevalue in the register to clear it. + /* + * tx_traffic_int reg is an R1 register, hence we read and write + * back the same value in the register to clear it */ val64 = readq(&bar0->tx_traffic_int); writeq(val64, &bar0->tx_traffic_int); - for (i = 0; i < config->tx_fifo_num; i++) { - get_info = mac_control->tx_curr_get_info[i]; - put_info = mac_control->tx_curr_put_info[i]; - txdlp = (TxD_t *) nic->list_info[i][get_info.offset]. - list_virt_addr; - while ((!(txdlp->Control_1 & TXD_LIST_OWN_XENA)) && - (get_info.offset != put_info.offset) && - (txdlp->Host_Control)) { - /* Check for TxD errors */ - if (txdlp->Control_1 & TXD_T_CODE) { - unsigned long long err; - err = txdlp->Control_1 & TXD_T_CODE; - DBG_PRINT(ERR_DBG, "***TxD error %llx\n", - err); - } - - skb = (struct sk_buff *) ((unsigned long) - txdlp->Host_Control); - if (skb == NULL) { - DBG_PRINT(ERR_DBG, "%s: Null skb ", - dev->name); - DBG_PRINT(ERR_DBG, "in Tx Free Intr\n"); - return; - } - nic->tx_pkt_count++; + get_info = fifo_data->tx_curr_get_info; + put_info = fifo_data->tx_curr_put_info; + txdlp = (TxD_t *) fifo_data->list_info[get_info.offset]. 
+ list_virt_addr; + while ((!(txdlp->Control_1 & TXD_LIST_OWN_XENA)) && + (get_info.offset != put_info.offset) && + (txdlp->Host_Control)) { + /* Check for TxD errors */ + if (txdlp->Control_1 & TXD_T_CODE) { + unsigned long long err; + err = txdlp->Control_1 & TXD_T_CODE; + DBG_PRINT(ERR_DBG, "***TxD error %llx\n", + err); + } + + skb = (struct sk_buff *) ((unsigned long) + txdlp->Host_Control); + if (skb == NULL) { + DBG_PRINT(ERR_DBG, "%s: Null skb ", + __FUNCTION__); + DBG_PRINT(ERR_DBG, "in Tx Free Intr\n"); + return; + } - frg_cnt = skb_shinfo(skb)->nr_frags; + frg_cnt = skb_shinfo(skb)->nr_frags; + nic->tx_pkt_count++; - /* For unfragmented skb */ - pci_unmap_single(nic->pdev, (dma_addr_t) - txdlp->Buffer_Pointer, - skb->len - skb->data_len, - PCI_DMA_TODEVICE); - if (frg_cnt) { - TxD_t *temp = txdlp; - txdlp++; - for (j = 0; j < frg_cnt; j++, txdlp++) { - skb_frag_t *frag = - &skb_shinfo(skb)->frags[j]; - pci_unmap_page(nic->pdev, - (dma_addr_t) - txdlp-> - Buffer_Pointer, - frag->size, - PCI_DMA_TODEVICE); - } - txdlp = temp; + pci_unmap_single(nic->pdev, (dma_addr_t) + txdlp->Buffer_Pointer, + skb->len - skb->data_len, + PCI_DMA_TODEVICE); + if (frg_cnt) { + TxD_t *temp; + temp = txdlp; + txdlp++; + for (j = 0; j < frg_cnt; j++, txdlp++) { + skb_frag_t *frag = + &skb_shinfo(skb)->frags[j]; + pci_unmap_page(nic->pdev, + (dma_addr_t) + txdlp-> + Buffer_Pointer, + frag->size, + PCI_DMA_TODEVICE); } - memset(txdlp, 0, - (sizeof(TxD_t) * config->max_txds)); - - /* Updating the statistics block */ - nic->stats.tx_packets++; - nic->stats.tx_bytes += skb->len; - dev_kfree_skb_irq(skb); - - get_info.offset++; - get_info.offset %= get_info.fifo_len + 1; - txdlp = (TxD_t *) nic->list_info[i] - [get_info.offset].list_virt_addr; - mac_control->tx_curr_get_info[i].offset = - get_info.offset; + txdlp = temp; } + memset(txdlp, 0, + (sizeof(TxD_t) * fifo_data->max_txds)); + + /* Updating the statistics block */ + nic->stats.tx_packets++; + nic->stats.tx_bytes += skb->len; 
+ dev_kfree_skb_irq(skb); + + get_info.offset++; + get_info.offset %= get_info.fifo_len + 1; + txdlp = (TxD_t *) fifo_data->list_info + [get_info.offset].list_virt_addr; + fifo_data->tx_curr_get_info.offset = + get_info.offset; } spin_lock(&nic->tx_lock); @@ -2302,13 +2129,13 @@ static void tx_intr_handler(struct s2io_ spin_unlock(&nic->tx_lock); } -/** +/** * alarm_intr_handler - Alarm Interrrupt handler * @nic: device private variable - * Description: If the interrupt was neither because of Rx packet or Tx + * Description: If the interrupt was neither because of Rx packet or Tx * complete, this function is called. If the interrupt was to indicate - * a loss of link, the OSM link status handler is invoked for any other - * alarm interrupt the block that raised the interrupt is displayed + * a loss of link, the OSM link status handler is invoked for any other + * alarm interrupt the block that raised the interrupt is displayed * and a H/W reset is issued. * Return Value: * NONE @@ -2339,7 +2166,7 @@ static void alarm_intr_handler(struct s2 /* * Also as mentioned in the latest Errata sheets if the PCC_FB_ECC * Error occurs, the adapter will be recycled by disabling the - * adapter enable bit and enabling it again after the device + * adapter enable bit and enabling it again after the device * becomes Quiescent. */ val64 = readq(&bar0->pcc_err_reg); @@ -2355,18 +2182,18 @@ static void alarm_intr_handler(struct s2 /* Other type of interrupts are not being handled now, TODO */ } -/** +/** * wait_for_cmd_complete - waits for a command to complete. - * @sp : private member of the device structure, which is a pointer to the + * @sp : private member of the device structure, which is a pointer to the * s2io_nic structure. - * Description: Function that waits for a command to Write into RMAC - * ADDR DATA registers to be completed and returns either success or - * error depending on whether the command was complete or not. 
+ * Description: Function that waits for a command to Write into RMAC + * ADDR DATA registers to be completed and returns either success or + * error depending on whether the command was complete or not. * Return value: * SUCCESS on success and FAILURE on failure. */ -static int wait_for_cmd_complete(nic_t * sp) +int wait_for_cmd_complete(nic_t * sp) { XENA_dev_config_t __iomem *bar0 = sp->bar0; int ret = FAILURE, cnt = 0; @@ -2386,17 +2213,17 @@ static int wait_for_cmd_complete(nic_t * return ret; } -/** - * s2io_reset - Resets the card. +/** + * s2io_reset - Resets the card. * @sp : private member of the device structure. * Description: Function to Reset the card. This function then also - * restores the previously saved PCI configuration space registers as + * restores the previously saved PCI configuration space registers as * the card reset also resets the configuration space. * Return value: * void. */ -static void s2io_reset(nic_t * sp) +void s2io_reset(nic_t * sp) { XENA_dev_config_t __iomem *bar0 = sp->bar0; u64 val64; @@ -2405,10 +2232,10 @@ static void s2io_reset(nic_t * sp) val64 = SW_RESET_ALL; writeq(val64, &bar0->sw_reset); - /* - * At this stage, if the PCI write is indeed completed, the - * card is reset and so is the PCI Config space of the device. - * So a read cannot be issued at this stage on any of the + /* + * At this stage, if the PCI write is indeed completed, the + * card is reset and so is the PCI Config space of the device. + * So a read cannot be issued at this stage on any of the * registers to ensure the write into "sw_reset" register * has gone through. * Question: Is there any system call that will explicitly force @@ -2421,10 +2248,17 @@ static void s2io_reset(nic_t * sp) /* Restore the PCI state saved during initializarion. 
*/ pci_restore_state(sp->pdev); + s2io_init_pci(sp); msleep(250); + /* Set swapper to enable I/O register access */ + s2io_set_swapper(sp); + + /* Reset device statistics maintained by OS */ + memset(&sp->stats, 0, sizeof (struct net_device_stats)); + /* SXE-002: Configure link and activity LED to turn it off */ subid = sp->pdev->subsystem_device; if ((subid & 0xFF) >= 0x07) { @@ -2432,29 +2266,29 @@ static void s2io_reset(nic_t * sp) val64 |= 0x0000800000000000ULL; writeq(val64, &bar0->gpio_control); val64 = 0x0411040400000000ULL; - writeq(val64, (void __iomem *) bar0 + 0x2700); + writeq(val64, (void __iomem *) ((u8 *) bar0 + 0x2700)); } sp->device_enabled_once = FALSE; } /** - * s2io_set_swapper - to set the swapper controle on the card - * @sp : private member of the device structure, + * s2io_set_swapper - to set the swapper controle on the card + * @sp : private member of the device structure, * pointer to the s2io_nic structure. - * Description: Function to set the swapper control on the card + * Description: Function to set the swapper control on the card * correctly depending on the 'endianness' of the system. * Return value: * SUCCESS on success and FAILURE on failure. */ -static int s2io_set_swapper(nic_t * sp) +int s2io_set_swapper(nic_t * sp) { struct net_device *dev = sp->dev; XENA_dev_config_t __iomem *bar0 = sp->bar0; u64 val64, valt, valr; - /* + /* * Set proper endian settings and verify the same by reading * the PIF Feed-back register. 
*/ @@ -2506,8 +2340,9 @@ static int s2io_set_swapper(nic_t * sp) i++; } if(i == 4) { + unsigned long long x = val64; DBG_PRINT(ERR_DBG, "Write failed, Xmsi_addr "); - DBG_PRINT(ERR_DBG, "reads:0x%llx\n",val64); + DBG_PRINT(ERR_DBG, "reads:0x%llx\n", x); return FAILURE; } } @@ -2515,8 +2350,8 @@ static int s2io_set_swapper(nic_t * sp) val64 &= 0xFFFF000000000000ULL; #ifdef __BIG_ENDIAN - /* - * The device by default set to a big endian format, so a + /* + * The device by default set to a big endian format, so a * big endian driver need not set anything. */ val64 |= (SWAPPER_CTRL_TXP_FE | @@ -2532,9 +2367,9 @@ static int s2io_set_swapper(nic_t * sp) SWAPPER_CTRL_STATS_FE | SWAPPER_CTRL_STATS_SE); writeq(val64, &bar0->swapper_ctrl); #else - /* + /* * Initially we enable all bits to make it accessible by the - * driver, then we selectively enable only those bits that + * driver, then we selectively enable only those bits that * we want to set. */ val64 |= (SWAPPER_CTRL_TXP_FE | @@ -2556,8 +2391,8 @@ static int s2io_set_swapper(nic_t * sp) #endif val64 = readq(&bar0->swapper_ctrl); - /* - * Verifying if endian settings are accurate by reading a + /* + * Verifying if endian settings are accurate by reading a * feedback register. */ val64 = readq(&bar0->pif_rd_swapper_fb); @@ -2577,25 +2412,25 @@ static int s2io_set_swapper(nic_t * sp) * Functions defined below concern the OS part of the driver * * ********************************************************* */ -/** +/** * s2io_open - open entry point of the driver * @dev : pointer to the device structure. * Description: * This function is the open entry point of the driver. It mainly calls a * function to allocate Rx buffers and inserts them into the buffer - * descriptors and then enables the Rx part of the NIC. + * descriptors and then enables the Rx part of the NIC. * Return value: * 0 on success and an appropriate (-)ve integer as defined in errno.h * file on failure. 
*/ -static int s2io_open(struct net_device *dev) +int s2io_open(struct net_device *dev) { nic_t *sp = dev->priv; int err = 0; - /* - * Make sure you have link off by default every time + /* + * Make sure you have link off by default every time * Nic is initialized */ netif_carrier_off(dev); @@ -2605,27 +2440,34 @@ static int s2io_open(struct net_device * if (s2io_card_up(sp)) { DBG_PRINT(ERR_DBG, "%s: H/W initialization failed\n", dev->name); - return -ENODEV; + err = -ENODEV; + goto hw_init_failed; } /* After proper initialization of H/W, register ISR */ - err = request_irq((int) sp->irq, s2io_isr, SA_SHIRQ, + err = request_irq((int) sp->pdev->irq, s2io_isr, SA_SHIRQ, sp->name, dev); if (err) { - s2io_reset(sp); DBG_PRINT(ERR_DBG, "%s: ISR registration failed\n", dev->name); - return err; + goto isr_registration_failed; } if (s2io_set_mac_addr(dev, dev->dev_addr) == FAILURE) { DBG_PRINT(ERR_DBG, "Set Mac Address Failed\n"); - s2io_reset(sp); - return -ENODEV; + err = -ENODEV; + goto setting_mac_address_failed; } netif_start_queue(dev); return 0; + +setting_mac_address_failed: + free_irq(sp->pdev->irq, dev); +isr_registration_failed: + s2io_reset(sp); +hw_init_failed: + return err; } /** @@ -2641,16 +2483,15 @@ static int s2io_open(struct net_device * * file on failure. */ -static int s2io_close(struct net_device *dev) +int s2io_close(struct net_device *dev) { nic_t *sp = dev->priv; - flush_scheduled_work(); netif_stop_queue(dev); /* Reset card, kill tasklet and free Tx and Rx buffers. */ s2io_card_down(sp); - free_irq(dev->irq, dev); + free_irq(sp->pdev->irq, dev); sp->device_close_flag = TRUE; /* Device is shut down. */ return 0; } @@ -2668,7 +2509,7 @@ static int s2io_close(struct net_device * 0 on success & 1 on failure. 
*/ -static int s2io_xmit(struct sk_buff *skb, struct net_device *dev) +int s2io_xmit(struct sk_buff *skb, struct net_device *dev) { nic_t *sp = dev->priv; u16 frg_cnt, frg_len, i, queue, queue_len, put_off, get_off; @@ -2686,22 +2527,24 @@ static int s2io_xmit(struct sk_buff *skb mac_control = &sp->mac_control; config = &sp->config; - DBG_PRINT(TX_DBG, "%s: In S2IO Tx routine\n", dev->name); + DBG_PRINT(TX_DBG, "%s: In Neterion Tx routine\n", dev->name); spin_lock_irqsave(&sp->tx_lock, flags); - if (atomic_read(&sp->card_state) == CARD_DOWN) { - DBG_PRINT(ERR_DBG, "%s: Card going down for reset\n", + DBG_PRINT(TX_DBG, "%s: Card going down for reset\n", dev->name); spin_unlock_irqrestore(&sp->tx_lock, flags); - return 1; + dev_kfree_skb(skb); + return 0; } queue = 0; - put_off = (u16) mac_control->tx_curr_put_info[queue].offset; - get_off = (u16) mac_control->tx_curr_get_info[queue].offset; - txdp = (TxD_t *) sp->list_info[queue][put_off].list_virt_addr; - queue_len = mac_control->tx_curr_put_info[queue].fifo_len + 1; + put_off = (u16) mac_control->fifos[queue].tx_curr_put_info.offset; + get_off = (u16) mac_control->fifos[queue].tx_curr_get_info.offset; + txdp = (TxD_t *) mac_control->fifos[queue].list_info[put_off]. 
+ list_virt_addr; + + queue_len = mac_control->fifos[queue].tx_curr_put_info.fifo_len + 1; /* Avoid "put" pointer going beyond "get" pointer */ if (txdp->Host_Control || (((put_off + 1) % queue_len) == get_off)) { DBG_PRINT(ERR_DBG, "Error in xmit, No free TXDs.\n"); @@ -2721,9 +2564,9 @@ static int s2io_xmit(struct sk_buff *skb frg_cnt = skb_shinfo(skb)->nr_frags; frg_len = skb->len - skb->data_len; - txdp->Host_Control = (unsigned long) skb; txdp->Buffer_Pointer = pci_map_single (sp->pdev, skb->data, frg_len, PCI_DMA_TODEVICE); + txdp->Host_Control = (unsigned long) skb; if (skb->ip_summed == CHECKSUM_HW) { txdp->Control_2 |= (TXD_TX_CKO_IPV4_EN | TXD_TX_CKO_TCP_EN | @@ -2748,11 +2591,12 @@ static int s2io_xmit(struct sk_buff *skb txdp->Control_1 |= TXD_GATHER_CODE_LAST; tx_fifo = mac_control->tx_FIFO_start[queue]; - val64 = sp->list_info[queue][put_off].list_phy_addr; + val64 = mac_control->fifos[queue].list_info[put_off].list_phy_addr; writeq(val64, &tx_fifo->TxDL_Pointer); val64 = (TX_FIFO_LAST_TXD_NUM(frg_cnt) | TX_FIFO_FIRST_LIST | TX_FIFO_LAST_LIST); + #ifdef NETIF_F_TSO if (mss) val64 |= TX_FIFO_SPECIAL_FUNC; @@ -2763,8 +2607,8 @@ static int s2io_xmit(struct sk_buff *skb val64 = readq(&bar0->general_int_status); put_off++; - put_off %= mac_control->tx_curr_put_info[queue].fifo_len + 1; - mac_control->tx_curr_put_info[queue].offset = put_off; + put_off %= mac_control->fifos[queue].tx_curr_put_info.fifo_len + 1; + mac_control->fifos[queue].tx_curr_put_info.offset = put_off; /* Avoid "put" pointer going beyond "get" pointer */ if (((put_off + 1) % queue_len) == get_off) { @@ -2785,13 +2629,13 @@ static int s2io_xmit(struct sk_buff *skb * @irq: the irq of the device. * @dev_id: a void pointer to the dev structure of the NIC. * @pt_regs: pointer to the registers pushed on the stack. - * Description: This function is the ISR handler of the device. It - * identifies the reason for the interrupt and calls the relevant - * service routines. 
As a contongency measure, this ISR allocates the + * Description: This function is the ISR handler of the device. It + * identifies the reason for the interrupt and calls the relevant + * service routines. As a contongency measure, this ISR allocates the * recv buffers, if their numbers are below the panic value which is * presently set to 25% of the original number of rcv buffers allocated. * Return value: - * IRQ_HANDLED: will be returned if IRQ was handled by this routine + * IRQ_HANDLED: will be returned if IRQ was handled by this routine * IRQ_NONE: will be returned if interrupt is not from our device */ static irqreturn_t s2io_isr(int irq, void *dev_id, struct pt_regs *regs) @@ -2799,9 +2643,7 @@ static irqreturn_t s2io_isr(int irq, voi struct net_device *dev = (struct net_device *) dev_id; nic_t *sp = dev->priv; XENA_dev_config_t __iomem *bar0 = sp->bar0; -#ifndef CONFIG_S2IO_NAPI - int i, ret; -#endif + int i; u64 reason = 0; mac_info_t *mac_control; struct config_param *config; @@ -2809,13 +2651,13 @@ static irqreturn_t s2io_isr(int irq, voi mac_control = &sp->mac_control; config = &sp->config; - /* + /* * Identify the cause for interrupt and call the appropriate * interrupt handler. Causes for the interrupt could be; * 1. Rx of packet. * 2. Tx complete. * 3. Link down. - * 4. Error in any functional blocks of the NIC. + * 4. Error in any functional blocks of the NIC. 
*/ reason = readq(&bar0->general_int_status); @@ -2824,12 +2666,6 @@ static irqreturn_t s2io_isr(int irq, voi return IRQ_NONE; } - /* If Intr is because of Tx Traffic */ - if (reason & GEN_INTR_TXTRAFFIC) { - tx_intr_handler(sp); - } - - /* If Intr is because of an error */ if (reason & (GEN_ERROR_INTR)) alarm_intr_handler(sp); @@ -2844,17 +2680,26 @@ static irqreturn_t s2io_isr(int irq, voi #else /* If Intr is because of Rx Traffic */ if (reason & GEN_INTR_RXTRAFFIC) { - rx_intr_handler(sp); + for (i = 0; i < config->rx_ring_num; i++) { + rx_intr_handler(&mac_control->rings[i]); + } } #endif - /* - * If the Rx buffer count is below the panic threshold then - * reallocate the buffers from the interrupt handler itself, + /* If Intr is because of Tx Traffic */ + if (reason & GEN_INTR_TXTRAFFIC) { + for (i = 0; i < config->tx_fifo_num; i++) + tx_intr_handler(&mac_control->fifos[i]); + } + + /* + * If the Rx buffer count is below the panic threshold then + * reallocate the buffers from the interrupt handler itself, * else schedule a tasklet to reallocate the buffers. */ #ifndef CONFIG_S2IO_NAPI for (i = 0; i < config->rx_ring_num; i++) { + int ret; int rxb_size = atomic_read(&sp->rx_bufs_left[i]); int level = rx_buffer_level(sp, rxb_size, i); @@ -2879,29 +2724,33 @@ static irqreturn_t s2io_isr(int irq, voi } /** - * s2io_get_stats - Updates the device statistics structure. + * s2io_get_stats - Updates the device statistics structure. * @dev : pointer to the device structure. * Description: - * This function updates the device statistics structure in the s2io_nic + * This function updates the device statistics structure in the s2io_nic * structure and returns a pointer to the same. * Return value: * pointer to the updated net_device_stats structure. 
*/ -static struct net_device_stats *s2io_get_stats(struct net_device *dev) +struct net_device_stats *s2io_get_stats(struct net_device *dev) { nic_t *sp = dev->priv; mac_info_t *mac_control; struct config_param *config; + mac_control = &sp->mac_control; config = &sp->config; - sp->stats.tx_errors = mac_control->stats_info->tmac_any_err_frms; - sp->stats.rx_errors = mac_control->stats_info->rmac_drop_frms; - sp->stats.multicast = mac_control->stats_info->rmac_vld_mcst_frms; + sp->stats.tx_errors = + le32_to_cpu(mac_control->stats_info->tmac_any_err_frms); + sp->stats.rx_errors = + le32_to_cpu(mac_control->stats_info->rmac_drop_frms); + sp->stats.multicast = + le32_to_cpu(mac_control->stats_info->rmac_vld_mcst_frms); sp->stats.rx_length_errors = - mac_control->stats_info->rmac_long_frms; + le32_to_cpu(mac_control->stats_info->rmac_long_frms); return (&sp->stats); } @@ -2910,8 +2759,8 @@ static struct net_device_stats *s2io_get * s2io_set_multicast - entry point for multicast address enable/disable. * @dev : pointer to the device structure * Description: - * This function is a driver entry point which gets called by the kernel - * whenever multicast addresses must be enabled/disabled. This also gets + * This function is a driver entry point which gets called by the kernel + * whenever multicast addresses must be enabled/disabled. This also gets * called to set/reset promiscuous mode. Depending on the deivce flag, we * determine, if multicast address must be enabled or if promiscuous mode * is to be disabled etc. 
@@ -3011,7 +2860,7 @@ static void s2io_set_multicast(struct ne writeq(RMAC_ADDR_DATA0_MEM_ADDR(dis_addr), &bar0->rmac_addr_data0_mem); writeq(RMAC_ADDR_DATA1_MEM_MASK(0ULL), - &bar0->rmac_addr_data1_mem); + &bar0->rmac_addr_data1_mem); val64 = RMAC_ADDR_CMD_MEM_WE | RMAC_ADDR_CMD_MEM_STROBE_NEW_CMD | RMAC_ADDR_CMD_MEM_OFFSET @@ -3040,8 +2889,7 @@ static void s2io_set_multicast(struct ne writeq(RMAC_ADDR_DATA0_MEM_ADDR(mac_addr), &bar0->rmac_addr_data0_mem); writeq(RMAC_ADDR_DATA1_MEM_MASK(0ULL), - &bar0->rmac_addr_data1_mem); - + &bar0->rmac_addr_data1_mem); val64 = RMAC_ADDR_CMD_MEM_WE | RMAC_ADDR_CMD_MEM_STROBE_NEW_CMD | RMAC_ADDR_CMD_MEM_OFFSET @@ -3060,12 +2908,12 @@ static void s2io_set_multicast(struct ne } /** - * s2io_set_mac_addr - Programs the Xframe mac address + * s2io_set_mac_addr - Programs the Xframe mac address * @dev : pointer to the device structure. * @addr: a uchar pointer to the new mac address which is to be set. - * Description : This procedure will program the Xframe to receive + * Description : This procedure will program the Xframe to receive * frames with new Mac Address - * Return value: SUCCESS on success and an appropriate (-)ve integer + * Return value: SUCCESS on success and an appropriate (-)ve integer * as defined in errno.h file on failure. */ @@ -3076,10 +2924,10 @@ int s2io_set_mac_addr(struct net_device register u64 val64, mac_addr = 0; int i; - /* + /* * Set the new MAC address as the new unicast filter and reflect this * change on the device address registered with the OS. It will be - * at offset 0. + * at offset 0. */ for (i = 0; i < ETH_ALEN; i++) { mac_addr <<= 8; @@ -3103,12 +2951,12 @@ int s2io_set_mac_addr(struct net_device } /** - * s2io_ethtool_sset - Sets different link parameters. + * s2io_ethtool_sset - Sets different link parameters. * @sp : private member of the device structure, which is a pointer to the * s2io_nic structure. 
* @info: pointer to the structure with parameters given by ethtool to set * link information. * Description: - * The function sets different link parameters provided by the user onto + * The function sets different link parameters provided by the user onto * the NIC. * Return value: * 0 on success. @@ -3130,7 +2978,7 @@ static int s2io_ethtool_sset(struct net_ } /** - * s2io_ethtol_gset - Return link specific information. + * s2io_ethtol_gset - Return link specific information. * @sp : private member of the device structure, pointer to the * s2io_nic structure. * @info : pointer to the structure with parameters given by ethtool @@ -3162,8 +3010,8 @@ static int s2io_ethtool_gset(struct net_ } /** - * s2io_ethtool_gdrvinfo - Returns driver specific information. - * @sp : private member of the device structure, which is a pointer to the + * s2io_ethtool_gdrvinfo - Returns driver specific information. + * @sp : private member of the device structure, which is a pointer to the * s2io_nic structure. * @info : pointer to the structure with parameters given by ethtool to * return driver information. @@ -3191,9 +3039,9 @@ static void s2io_ethtool_gdrvinfo(struct /** * s2io_ethtool_gregs - dumps the entire space of Xfame into the buffer. - * @sp: private member of the device structure, which is a pointer to the + * @sp: private member of the device structure, which is a pointer to the * s2io_nic structure. - * @regs : pointer to the structure with parameters given by ethtool for + * @regs : pointer to the structure with parameters given by ethtool for * dumping the registers. * @reg_space: The input argumnet into which all the registers are dumped. * Description: @@ -3222,11 +3070,11 @@ static void s2io_ethtool_gregs(struct ne /** * s2io_phy_id - timer function that alternates adapter LED. 
- * @data : address of the private member of the device structure, which + * @data : address of the private member of the device structure, which * is a pointer to the s2io_nic structure, provided as an u32. - * Description: This is actually the timer function that alternates the - * adapter LED bit of the adapter control bit to set/reset every time on - * invocation. The timer is set for 1/2 a second, hence tha NIC blinks + * Description: This is actually the timer function that alternates the + * adapter LED bit of the adapter control bit to set/reset every time on + * invocation. The timer is set for 1/2 a second, hence tha NIC blinks * once every second. */ static void s2io_phy_id(unsigned long data) @@ -3254,12 +3102,12 @@ static void s2io_phy_id(unsigned long da * s2io_ethtool_idnic - To physically identify the nic on the system. * @sp : private member of the device structure, which is a pointer to the * s2io_nic structure. - * @id : pointer to the structure with identification parameters given by + * @id : pointer to the structure with identification parameters given by * ethtool. * Description: Used to physically identify the NIC on the system. - * The Link LED will blink for a time specified by the user for + * The Link LED will blink for a time specified by the user for * identification. - * NOTE: The Link has to be Up to be able to blink the LED. Hence + * NOTE: The Link has to be Up to be able to blink the LED. Hence * identification is possible only if it's link is up. 
* Return value: * int , returns 0 on success @@ -3289,9 +3137,9 @@ static int s2io_ethtool_idnic(struct net } mod_timer(&sp->id_timer, jiffies); if (data) - msleep(data * 1000); + msleep_interruptible(data * HZ); else - msleep(0xFFFFFFFF); + msleep_interruptible(MAX_FLICKER_TIME); del_timer_sync(&sp->id_timer); if (CARDS_WITH_FAULTY_LINK_INDICATORS(subid)) { @@ -3304,7 +3152,8 @@ static int s2io_ethtool_idnic(struct net /** * s2io_ethtool_getpause_data -Pause frame frame generation and reception. - * @sp : private member of the device structure, which is a pointer to the * s2io_nic structure. + * @sp : private member of the device structure, which is a pointer to the + * s2io_nic structure. * @ep : pointer to the structure with pause parameters given by ethtool. * Description: * Returns the Pause frame generation and reception capability of the NIC. @@ -3328,7 +3177,7 @@ static void s2io_ethtool_getpause_data(s /** * s2io_ethtool_setpause_data - set/reset pause frame generation. - * @sp : private member of the device structure, which is a pointer to the + * @sp : private member of the device structure, which is a pointer to the * s2io_nic structure. * @ep : pointer to the structure with pause parameters given by ethtool. * Description: @@ -3339,7 +3188,7 @@ static void s2io_ethtool_getpause_data(s */ static int s2io_ethtool_setpause_data(struct net_device *dev, - struct ethtool_pauseparam *ep) + struct ethtool_pauseparam *ep) { u64 val64; nic_t *sp = dev->priv; @@ -3360,13 +3209,13 @@ static int s2io_ethtool_setpause_data(st /** * read_eeprom - reads 4 bytes of data from user given offset. - * @sp : private member of the device structure, which is a pointer to the + * @sp : private member of the device structure, which is a pointer to the * s2io_nic structure. * @off : offset at which the data must be written * @data : Its an output parameter where the data read at the given - * offset is stored. + * offset is stored. 
* Description: - * Will read 4 bytes of data from the user given offset and return the + * Will read 4 bytes of data from the user given offset and return the * read data. * NOTE: Will allow to read only part of the EEPROM visible through the * I2C bus. @@ -3407,7 +3256,7 @@ static int read_eeprom(nic_t * sp, int o * s2io_nic structure. * @off : offset at which the data must be written * @data : The data that is to be written - * @cnt : Number of bytes of the data that are actually to be written into + * @cnt : Number of bytes of the data that are actually to be written into * the Eeprom. (max of 3) * Description: * Actually writes the relevant part of the data value into the Eeprom @@ -3444,7 +3293,7 @@ static int write_eeprom(nic_t * sp, int /** * s2io_ethtool_geeprom - reads the value stored in the Eeprom. * @sp : private member of the device structure, which is a pointer to the * s2io_nic structure. - * @eeprom : pointer to the user level structure provided by ethtool, + * @eeprom : pointer to the user level structure provided by ethtool, * containing all relevant information. * @data_buf : user defined value to be written into Eeprom. * Description: Reads the values stored in the Eeprom at given offset @@ -3455,7 +3304,7 @@ static int write_eeprom(nic_t * sp, int */ static int s2io_ethtool_geeprom(struct net_device *dev, - struct ethtool_eeprom *eeprom, u8 * data_buf) + struct ethtool_eeprom *eeprom, u8 * data_buf) { u32 data, i, valid; nic_t *sp = dev->priv; @@ -3480,7 +3329,7 @@ static int s2io_ethtool_geeprom(struct n * s2io_ethtool_seeprom - tries to write the user provided value in Eeprom * @sp : private member of the device structure, which is a pointer to the * s2io_nic structure. - * @eeprom : pointer to the user level structure provided by ethtool, + * @eeprom : pointer to the user level structure provided by ethtool, * containing all relevant information. * @data_buf ; user defined value to be written into Eeprom. 
* Description: @@ -3528,8 +3377,8 @@ static int s2io_ethtool_seeprom(struct n } /** - * s2io_register_test - reads and writes into all clock domains. - * @sp : private member of the device structure, which is a pointer to the + * s2io_register_test - reads and writes into all clock domains. + * @sp : private member of the device structure, which is a pointer to the * s2io_nic structure. * @data : variable that returns the result of each of the test conducted b * by the driver. @@ -3546,8 +3395,8 @@ static int s2io_register_test(nic_t * sp u64 val64 = 0; int fail = 0; - val64 = readq(&bar0->pcc_enable); - if (val64 != 0xff00000000000000ULL) { + val64 = readq(&bar0->pif_rd_swapper_fb); + if (val64 != 0x123456789abcdefULL) { fail = 1; DBG_PRINT(INFO_DBG, "Read Test level 1 fails\n"); } @@ -3591,13 +3440,13 @@ static int s2io_register_test(nic_t * sp } /** - * s2io_eeprom_test - to verify that EEprom in the xena can be programmed. + * s2io_eeprom_test - to verify that EEprom in the xena can be programmed. * @sp : private member of the device structure, which is a pointer to the * s2io_nic structure. * @data:variable that returns the result of each of the test conducted by * the driver. * Description: - * Verify that EEPROM in the xena can be programmed using I2C_CONTROL + * Verify that EEPROM in the xena can be programmed using I2C_CONTROL * register. * Return value: * 0 on success. @@ -3662,14 +3511,14 @@ static int s2io_eeprom_test(nic_t * sp, /** * s2io_bist_test - invokes the MemBist test of the card . - * @sp : private member of the device structure, which is a pointer to the + * @sp : private member of the device structure, which is a pointer to the * s2io_nic structure. - * @data:variable that returns the result of each of the test conducted by + * @data:variable that returns the result of each of the test conducted by * the driver. * Description: * This invokes the MemBist test of the card. We give around * 2 secs time for the Test to complete. 
If it's still not complete - * within this peiod, we consider that the test failed. + * within this peiod, we consider that the test failed. * Return value: * 0 on success and -1 on failure. */ @@ -3698,13 +3547,13 @@ static int s2io_bist_test(nic_t * sp, ui } /** - * s2io-link_test - verifies the link state of the nic - * @sp ; private member of the device structure, which is a pointer to the + * s2io-link_test - verifies the link state of the nic + * @sp ; private member of the device structure, which is a pointer to the * s2io_nic structure. * @data: variable that returns the result of each of the test conducted by * the driver. * Description: - * The function verifies the link state of the NIC and updates the input + * The function verifies the link state of the NIC and updates the input * argument 'data' appropriately. * Return value: * 0 on success. @@ -3723,13 +3572,13 @@ static int s2io_link_test(nic_t * sp, ui } /** - * s2io_rldram_test - offline test for access to the RldRam chip on the NIC - * @sp - private member of the device structure, which is a pointer to the + * s2io_rldram_test - offline test for access to the RldRam chip on the NIC + * @sp - private member of the device structure, which is a pointer to the * s2io_nic structure. - * @data - variable that returns the result of each of the test + * @data - variable that returns the result of each of the test * conducted by the driver. * Description: - * This is one of the offline test that tests the read and write + * This is one of the offline test that tests the read and write * access to the RldRam chip on the NIC. * Return value: * 0 on success. @@ -3834,7 +3683,7 @@ static int s2io_rldram_test(nic_t * sp, * s2io_nic structure. * @ethtest : pointer to a ethtool command specific structure that will be * returned to the user. - * @data : variable that returns the result of each of the test + * @data : variable that returns the result of each of the test * conducted by the driver. 
* Description: * This function conducts 6 tests ( 4 offline and 2 online) to determine @@ -3852,23 +3701,18 @@ static void s2io_ethtool_test(struct net if (ethtest->flags == ETH_TEST_FL_OFFLINE) { /* Offline Tests. */ - if (orig_state) { + if (orig_state) s2io_close(sp->dev); - s2io_set_swapper(sp); - } else - s2io_set_swapper(sp); if (s2io_register_test(sp, &data[0])) ethtest->flags |= ETH_TEST_FL_FAILED; s2io_reset(sp); - s2io_set_swapper(sp); if (s2io_rldram_test(sp, &data[3])) ethtest->flags |= ETH_TEST_FL_FAILED; s2io_reset(sp); - s2io_set_swapper(sp); if (s2io_eeprom_test(sp, &data[1])) ethtest->flags |= ETH_TEST_FL_FAILED; @@ -3952,20 +3796,19 @@ static void s2io_get_ethtool_stats(struc tmp_stats[i++] = le32_to_cpu(stat_info->rmac_err_tcp); } -static int s2io_ethtool_get_regs_len(struct net_device *dev) +int s2io_ethtool_get_regs_len(struct net_device *dev) { return (XENA_REG_SPACE); } -static u32 s2io_ethtool_get_rx_csum(struct net_device * dev) +u32 s2io_ethtool_get_rx_csum(struct net_device * dev) { nic_t *sp = dev->priv; return (sp->rx_csum); } - -static int s2io_ethtool_set_rx_csum(struct net_device *dev, u32 data) +int s2io_ethtool_set_rx_csum(struct net_device *dev, u32 data) { nic_t *sp = dev->priv; @@ -3976,19 +3819,17 @@ static int s2io_ethtool_set_rx_csum(stru return 0; } - -static int s2io_get_eeprom_len(struct net_device *dev) +int s2io_get_eeprom_len(struct net_device *dev) { return (XENA_EEPROM_SPACE); } -static int s2io_ethtool_self_test_count(struct net_device *dev) +int s2io_ethtool_self_test_count(struct net_device *dev) { return (S2IO_TEST_LEN); } - -static void s2io_ethtool_get_strings(struct net_device *dev, - u32 stringset, u8 * data) +void s2io_ethtool_get_strings(struct net_device *dev, + u32 stringset, u8 * data) { switch (stringset) { case ETH_SS_TEST: @@ -3999,13 +3840,12 @@ static void s2io_ethtool_get_strings(str sizeof(ethtool_stats_keys)); } } - static int s2io_ethtool_get_stats_count(struct net_device *dev) { return 
(S2IO_STAT_LEN); } -static int s2io_ethtool_op_set_tx_csum(struct net_device *dev, u32 data) +int s2io_ethtool_op_set_tx_csum(struct net_device *dev, u32 data) { if (data) dev->features |= NETIF_F_IP_CSUM; @@ -4047,21 +3887,18 @@ static struct ethtool_ops netdev_ethtool }; /** - * s2io_ioctl - Entry point for the Ioctl + * s2io_ioctl - Entry point for the Ioctl * @dev : Device pointer. * @ifr : An IOCTL specefic structure, that can contain a pointer to * a proprietary structure used to pass information to the driver. * @cmd : This is used to distinguish between the different commands that * can be passed to the IOCTL functions. * Description: - * This function has support for ethtool, adding multiple MAC addresses on - * the NIC and some DBG commands for the util tool. - * Return value: - * Currently the IOCTL supports no operations, hence by default this - * function returns OP NOT SUPPORTED value. + * Currently there are no special functionality supported in IOCTL, hence + * function always return EOPNOTSUPPORTED */ -static int s2io_ioctl(struct net_device *dev, struct ifreq *rq, int cmd) +int s2io_ioctl(struct net_device *dev, struct ifreq *rq, int cmd) { return -EOPNOTSUPP; } @@ -4077,7 +3914,7 @@ static int s2io_ioctl(struct net_device * file on failure. */ -static int s2io_change_mtu(struct net_device *dev, int new_mtu) +int s2io_change_mtu(struct net_device *dev, int new_mtu) { nic_t *sp = dev->priv; XENA_dev_config_t __iomem *bar0 = sp->bar0; @@ -4085,7 +3922,7 @@ static int s2io_change_mtu(struct net_de if (netif_running(dev)) { DBG_PRINT(ERR_DBG, "%s: Must be stopped to ", dev->name); - DBG_PRINT(ERR_DBG, "change its MTU \n"); + DBG_PRINT(ERR_DBG, "change its MTU\n"); return -EBUSY; } @@ -4109,9 +3946,9 @@ static int s2io_change_mtu(struct net_de * @dev_adr : address of the device structure in dma_addr_t format. * Description: * This is the tasklet or the bottom half of the ISR. 
This is - * an extension of the ISR which is scheduled by the scheduler to be run + * an extension of the ISR which is scheduled by the scheduler to be run * when the load on the CPU is low. All low priority tasks of the ISR can - * be pushed into the tasklet. For now the tasklet is used only to + * be pushed into the tasklet. For now the tasklet is used only to * replenish the Rx buffers in the Rx buffer descriptors. * Return value: * void. @@ -4167,14 +4004,14 @@ static void s2io_set_link(unsigned long } subid = nic->pdev->subsystem_device; - /* - * Allow a small delay for the NICs self initiated + /* + * Allow a small delay for the NICs self initiated * cleanup to complete. */ msleep(100); val64 = readq(&bar0->adapter_status); - if (verify_xena_quiescence(val64, nic->device_enabled_once)) { + if (verify_xena_quiescence(nic, val64, nic->device_enabled_once)) { if (LINK_IS_UP(val64)) { val64 = readq(&bar0->adapter_control); val64 |= ADAPTER_CNTL_EN; @@ -4225,8 +4062,9 @@ static void s2io_card_down(nic_t * sp) register u64 val64 = 0; /* If s2io_set_link task is executing, wait till it completes. */ - while (test_and_set_bit(0, &(sp->link_state))) + while (test_and_set_bit(0, &(sp->link_state))) { msleep(50); + } atomic_set(&sp->card_state, CARD_DOWN); /* disable Tx and Rx traffic on the NIC */ @@ -4238,7 +4076,7 @@ static void s2io_card_down(nic_t * sp) /* Check if the device is Quiescent and then Reset the NIC */ do { val64 = readq(&bar0->adapter_status); - if (verify_xena_quiescence(val64, sp->device_enabled_once)) { + if (verify_xena_quiescence(sp, val64, sp->device_enabled_once)) { break; } @@ -4277,8 +4115,8 @@ static int s2io_card_up(nic_t * sp) return -ENODEV; } - /* - * Initializing the Rx buffers. For now we are considering only 1 + /* + * Initializing the Rx buffers. 
For now we are considering only 1 * Rx ring and initializing buffers into 30 Rx blocks */ mac_control = &sp->mac_control; @@ -4316,12 +4154,12 @@ static int s2io_card_up(nic_t * sp) return 0; } -/** +/** * s2io_restart_nic - Resets the NIC. * @data : long pointer to the device private structure * Description: * This function is scheduled to be run by the s2io_tx_watchdog - * function after 0.5 secs to reset the NIC. The idea is to reduce + * function after 0.5 secs to reset the NIC. The idea is to reduce * the run time of the watch dog routine which is run holding a * spin lock. */ @@ -4339,10 +4177,11 @@ static void s2io_restart_nic(unsigned lo netif_wake_queue(dev); DBG_PRINT(ERR_DBG, "%s: was reset by Tx watchdog timer\n", dev->name); + } -/** - * s2io_tx_watchdog - Watchdog for transmit side. +/** + * s2io_tx_watchdog - Watchdog for transmit side. * @dev : Pointer to net device structure * Description: * This function is triggered if the Tx Queue is stopped @@ -4370,7 +4209,7 @@ static void s2io_tx_watchdog(struct net_ * @len : length of the packet * @cksum : FCS checksum of the frame. * @ring_no : the ring from which this RxD was extracted. - * Description: + * Description: * This function is called by the Tx interrupt serivce routine to perform * some OS related operations on the SKB before passing it to the upper * layers. It mainly checks if the checksum is OK, if so adds it to the @@ -4380,35 +4219,63 @@ static void s2io_tx_watchdog(struct net_ * Return value: * SUCCESS on success and -1 on failure. 
*/ -#ifndef CONFIG_2BUFF_MODE -static int rx_osm_handler(nic_t * sp, u16 len, RxD_t * rxdp, int ring_no) -#else -static int rx_osm_handler(nic_t * sp, RxD_t * rxdp, int ring_no, - buffAdd_t * ba) -#endif +static int rx_osm_handler(ring_info_t *ring_data, RxD_t * rxdp) { + nic_t *sp = ring_data->nic; struct net_device *dev = (struct net_device *) sp->dev; - struct sk_buff *skb = - (struct sk_buff *) ((unsigned long) rxdp->Host_Control); + struct sk_buff *skb = (struct sk_buff *) + ((unsigned long) rxdp->Host_Control); + int ring_no = ring_data->ring_no; u16 l3_csum, l4_csum; #ifdef CONFIG_2BUFF_MODE - int buf0_len, buf2_len; + int buf0_len = RXD_GET_BUFFER0_SIZE(rxdp->Control_2); + int buf2_len = RXD_GET_BUFFER2_SIZE(rxdp->Control_2); + int get_block = ring_data->rx_curr_get_info.block_index; + int get_off = ring_data->rx_curr_get_info.offset; + buffAdd_t *ba = &ring_data->ba[get_block][get_off]; unsigned char *buff; +#else + u16 len = (u16) ((RXD_GET_BUFFER0_SIZE(rxdp->Control_2)) >> 48);; #endif + skb->dev = dev; + if (rxdp->Control_1 & RXD_T_CODE) { + unsigned long long err = rxdp->Control_1 & RXD_T_CODE; + DBG_PRINT(ERR_DBG, "%s: Rx error Value: 0x%llx\n", + dev->name, err); + } - l3_csum = RXD_GET_L3_CKSUM(rxdp->Control_1); - if ((rxdp->Control_1 & TCP_OR_UDP_FRAME) && (sp->rx_csum)) { + /* Updating statistics */ + rxdp->Host_Control = 0; + sp->rx_pkt_count++; + sp->stats.rx_packets++; +#ifndef CONFIG_2BUFF_MODE + sp->stats.rx_bytes += len; +#else + sp->stats.rx_bytes += buf0_len + buf2_len; +#endif + +#ifndef CONFIG_2BUFF_MODE + skb_put(skb, len); +#else + buff = skb_push(skb, buf0_len); + memcpy(buff, ba->ba_0, buf0_len); + skb_put(skb, buf2_len); +#endif + + if ((rxdp->Control_1 & TCP_OR_UDP_FRAME) && + (sp->rx_csum)) { + l3_csum = RXD_GET_L3_CKSUM(rxdp->Control_1); l4_csum = RXD_GET_L4_CKSUM(rxdp->Control_1); if ((l3_csum == L3_CKSUM_OK) && (l4_csum == L4_CKSUM_OK)) { - /* + /* * NIC verifies if the Checksum of the received * frame is Ok or not and 
accordingly returns * a flag in the RxD. */ skb->ip_summed = CHECKSUM_UNNECESSARY; } else { - /* - * Packet with erroneous checksum, let the + /* + * Packet with erroneous checksum, let the * upper layers deal with it. */ skb->ip_summed = CHECKSUM_NONE; @@ -4417,44 +4284,14 @@ static int rx_osm_handler(nic_t * sp, Rx skb->ip_summed = CHECKSUM_NONE; } - if (rxdp->Control_1 & RXD_T_CODE) { - unsigned long long err = rxdp->Control_1 & RXD_T_CODE; - DBG_PRINT(ERR_DBG, "%s: Rx error Value: 0x%llx\n", - dev->name, err); - } -#ifdef CONFIG_2BUFF_MODE - buf0_len = RXD_GET_BUFFER0_SIZE(rxdp->Control_2); - buf2_len = RXD_GET_BUFFER2_SIZE(rxdp->Control_2); -#endif - - skb->dev = dev; -#ifndef CONFIG_2BUFF_MODE - skb_put(skb, len); - skb->protocol = eth_type_trans(skb, dev); -#else - buff = skb_push(skb, buf0_len); - memcpy(buff, ba->ba_0, buf0_len); - skb_put(skb, buf2_len); skb->protocol = eth_type_trans(skb, dev); -#endif - #ifdef CONFIG_S2IO_NAPI netif_receive_skb(skb); #else netif_rx(skb); #endif - dev->last_rx = jiffies; - sp->rx_pkt_count++; - sp->stats.rx_packets++; -#ifndef CONFIG_2BUFF_MODE - sp->stats.rx_bytes += len; -#else - sp->stats.rx_bytes += buf0_len + buf2_len; -#endif - atomic_dec(&sp->rx_bufs_left[ring_no]); - rxdp->Host_Control = 0; return SUCCESS; } @@ -4465,13 +4302,13 @@ static int rx_osm_handler(nic_t * sp, Rx * @link : inidicates whether link is UP/DOWN. * Description: * This function stops/starts the Tx queue depending on whether the link - * status of the NIC is is down or up. This is called by the Alarm - * interrupt handler whenever a link change interrupt comes up. + * status of the NIC is is down or up. This is called by the Alarm + * interrupt handler whenever a link change interrupt comes up. * Return value: * void. 
*/ -static void s2io_link(nic_t * sp, int link) +void s2io_link(nic_t * sp, int link) { struct net_device *dev = (struct net_device *) sp->dev; @@ -4488,8 +4325,25 @@ static void s2io_link(nic_t * sp, int li } /** - * s2io_init_pci -Initialization of PCI and PCI-X configuration registers . - * @sp : private member of the device structure, which is a pointer to the + * get_xena_rev_id - to identify revision ID of xena. + * @pdev : PCI Dev structure + * Description: + * Function to identify the Revision ID of xena. + * Return value: + * returns the revision ID of the device. + */ + +int get_xena_rev_id(struct pci_dev *pdev) +{ + u8 id = 0; + int ret; + ret = pci_read_config_byte(pdev, PCI_REVISION_ID, (u8 *) & id); + return id; +} + +/** + * s2io_init_pci -Initialization of PCI and PCI-X configuration registers . + * @sp : private member of the device structure, which is a pointer to the * s2io_nic structure. * Description: * This function initializes a few of the PCI and PCI-X configuration registers @@ -4500,15 +4354,15 @@ static void s2io_link(nic_t * sp, int li static void s2io_init_pci(nic_t * sp) { - u16 pci_cmd = 0; + u16 pci_cmd = 0, pcix_cmd = 0; /* Enable Data Parity Error Recovery in PCI-X command register. */ pci_read_config_word(sp->pdev, PCIX_COMMAND_REGISTER, - &(sp->pcix_cmd)); + &(pcix_cmd)); pci_write_config_word(sp->pdev, PCIX_COMMAND_REGISTER, - (sp->pcix_cmd | 1)); + (pcix_cmd | 1)); pci_read_config_word(sp->pdev, PCIX_COMMAND_REGISTER, - &(sp->pcix_cmd)); + &(pcix_cmd)); /* Set the PErr Response bit in PCI command register. */ pci_read_config_word(sp->pdev, PCI_COMMAND, &pci_cmd); @@ -4517,34 +4371,36 @@ static void s2io_init_pci(nic_t * sp) pci_read_config_word(sp->pdev, PCI_COMMAND, &pci_cmd); /* Set MMRB count to 1024 in PCI-X Command register. 
*/ - sp->pcix_cmd &= 0xFFF3; - pci_write_config_word(sp->pdev, PCIX_COMMAND_REGISTER, (sp->pcix_cmd | (0x1 << 2))); /* MMRBC 1K */ + pcix_cmd &= 0xFFF3; + pci_write_config_word(sp->pdev, PCIX_COMMAND_REGISTER, + (pcix_cmd | (0x1 << 2))); /* MMRBC 1K */ pci_read_config_word(sp->pdev, PCIX_COMMAND_REGISTER, - &(sp->pcix_cmd)); + &(pcix_cmd)); /* Setting Maximum outstanding splits based on system type. */ - sp->pcix_cmd &= 0xFF8F; - - sp->pcix_cmd |= XENA_MAX_OUTSTANDING_SPLITS(0x1); /* 2 splits. */ + pcix_cmd &= 0xFF8F; + pcix_cmd |= XENA_MAX_OUTSTANDING_SPLITS(0x1); /* 2 splits. */ pci_write_config_word(sp->pdev, PCIX_COMMAND_REGISTER, - sp->pcix_cmd); + pcix_cmd); pci_read_config_word(sp->pdev, PCIX_COMMAND_REGISTER, - &(sp->pcix_cmd)); + &(pcix_cmd)); + /* Forcibly disabling relaxed ordering capability of the card. */ - sp->pcix_cmd &= 0xfffd; + pcix_cmd &= 0xfffd; pci_write_config_word(sp->pdev, PCIX_COMMAND_REGISTER, - sp->pcix_cmd); + pcix_cmd); pci_read_config_word(sp->pdev, PCIX_COMMAND_REGISTER, - &(sp->pcix_cmd)); + &(pcix_cmd)); } MODULE_AUTHOR("Raghavendra Koushik "); MODULE_LICENSE("GPL"); module_param(tx_fifo_num, int, 0); -module_param_array(tx_fifo_len, int, NULL, 0); module_param(rx_ring_num, int, 0); -module_param_array(rx_ring_sz, int, NULL, 0); +module_param_array(tx_fifo_len, uint, NULL, 0); +module_param_array(rx_ring_sz, uint, NULL, 0); module_param(Stats_refresh_time, int, 0); +module_param_array(rts_frm_len, uint, NULL, 0); module_param(rmac_pause_time, int, 0); module_param(mc_pause_threshold_q0q3, int, 0); module_param(mc_pause_threshold_q4q7, int, 0); @@ -4554,15 +4410,16 @@ module_param(rmac_util_period, int, 0); #ifndef CONFIG_S2IO_NAPI module_param(indicate_max_pkts, int, 0); #endif + /** - * s2io_init_nic - Initialization of the adapter . + * s2io_init_nic - Initialization of the adapter . * @pdev : structure containing the PCI related information of the device. * @pre: List of PCI devices supported by the driver listed in s2io_tbl. 
* Description: * The function initializes an adapter identified by the pci_dec structure. - * All OS related initialization including memory and device structure and - * initlaization of the device private variable is done. Also the swapper - * control register is initialized to enable read and write into the I/O + * All OS related initialization including memory and device structure and + * initlaization of the device private variable is done. Also the swapper + * control register is initialized to enable read and write into the I/O * registers of the device. * Return value: * returns 0 on success and negative on failure. @@ -4573,7 +4430,6 @@ s2io_init_nic(struct pci_dev *pdev, cons { nic_t *sp; struct net_device *dev; - char *dev_name = "S2IO 10GE NIC"; int i, j, ret; int dma_flag = FALSE; u32 mac_up, mac_down; @@ -4583,9 +4439,9 @@ s2io_init_nic(struct pci_dev *pdev, cons mac_info_t *mac_control; struct config_param *config; - - DBG_PRINT(ERR_DBG, "Loading S2IO driver with %s\n", - s2io_driver_version); +#ifdef CONFIG_S2IO_NAPI + DBG_PRINT(ERR_DBG, "NAPI support has been enabled\n"); +#endif if ((ret = pci_enable_device(pdev))) { DBG_PRINT(ERR_DBG, @@ -4596,7 +4452,6 @@ s2io_init_nic(struct pci_dev *pdev, cons if (!pci_set_dma_mask(pdev, 0xffffffffffffffffULL)) { DBG_PRINT(INIT_DBG, "s2io_init_nic: Using 64bit DMA\n"); dma_flag = TRUE; - if (pci_set_consistent_dma_mask (pdev, 0xffffffffffffffffULL)) { DBG_PRINT(ERR_DBG, @@ -4636,21 +4491,17 @@ s2io_init_nic(struct pci_dev *pdev, cons memset(sp, 0, sizeof(nic_t)); sp->dev = dev; sp->pdev = pdev; - sp->vendor_id = pdev->vendor; - sp->device_id = pdev->device; sp->high_dma_flag = dma_flag; - sp->irq = pdev->irq; sp->device_enabled_once = FALSE; - strcpy(sp->name, dev_name); /* Initialize some PCI/PCI-X fields of the NIC. */ s2io_init_pci(sp); - /* + /* * Setting the device configuration parameters. 
- * Most of these parameters can be specified by the user during - * module insertion as they are module loadable parameters. If - * these parameters are not not specified during load time, they + * Most of these parameters can be specified by the user during + * module insertion as they are module loadable parameters. If + * these parameters are not not specified during load time, they * are initialized with default values. */ mac_control = &sp->mac_control; @@ -4664,6 +4515,10 @@ s2io_init_nic(struct pci_dev *pdev, cons config->tx_cfg[i].fifo_priority = i; } + /* mapping the QoS priority to the configured fifos */ + for (i = 0; i < MAX_TX_FIFOS; i++) + config->fifo_mapping[i] = fifo_map[config->tx_fifo_num][i]; + config->tx_intr_type = TXD_INT_TYPE_UTILZ; for (i = 0; i < config->tx_fifo_num; i++) { config->tx_cfg[i].f_no_snoop = @@ -4744,13 +4599,14 @@ s2io_init_nic(struct pci_dev *pdev, cons dev->do_ioctl = &s2io_ioctl; dev->change_mtu = &s2io_change_mtu; SET_ETHTOOL_OPS(dev, &netdev_ethtool_ops); + /* * will use eth_mac_addr() for dev->set_mac_address * mac address will be set every time dev->open() is called */ -#ifdef CONFIG_S2IO_NAPI +#if defined(CONFIG_S2IO_NAPI) dev->poll = s2io_poll; - dev->weight = 90; + dev->weight = 32; #endif dev->features |= NETIF_F_SG | NETIF_F_IP_CSUM; @@ -4777,22 +4633,14 @@ s2io_init_nic(struct pci_dev *pdev, cons goto set_swap_failed; } - /* Fix for all "FFs" MAC address problems observed on Alpha platforms */ + /* + * Fix for all "FFs" MAC address problems observed on + * Alpha platforms + */ fix_mac_address(sp); s2io_reset(sp); /* - * Setting swapper control on the NIC, so the MAC address can be read. - */ - if (s2io_set_swapper(sp)) { - DBG_PRINT(ERR_DBG, - "%s: S2IO: swapper settings are wrong\n", - dev->name); - ret = -EAGAIN; - goto set_swap_failed; - } - - /* * MAC address initialization. * For now only one mac address will be read and used. 
*/ @@ -4829,23 +4677,22 @@ s2io_init_nic(struct pci_dev *pdev, cons memcpy(dev->dev_addr, sp->def_mac_addr, ETH_ALEN); /* - * Initialize the tasklet status and link state flags + * Initialize the tasklet status and link state flags * and the card statte parameter */ atomic_set(&(sp->card_state), 0); sp->tasklet_status = 0; sp->link_state = 0; - /* Initialize spinlocks */ spin_lock_init(&sp->tx_lock); #ifndef CONFIG_S2IO_NAPI spin_lock_init(&sp->put_lock); #endif - /* - * SXE-002: Configure link and activity LED to init state - * on driver load. + /* + * SXE-002: Configure link and activity LED to init state + * on driver load. */ subid = sp->pdev->subsystem_device; if ((subid & 0xFF) >= 0x07) { @@ -4865,9 +4712,9 @@ s2io_init_nic(struct pci_dev *pdev, cons goto register_failed; } - /* - * Make Link state as off at this point, when the Link change - * interrupt comes the state will be automatically changed to + /* + * Make Link state as off at this point, when the Link change + * interrupt comes the state will be automatically changed to * the right state. */ netif_carrier_off(dev); @@ -4892,11 +4739,11 @@ s2io_init_nic(struct pci_dev *pdev, cons } /** - * s2io_rem_nic - Free the PCI device + * s2io_rem_nic - Free the PCI device * @pdev: structure containing the PCI related information of the device. - * Description: This function is called by the Pci subsystem to release a + * Description: This function is called by the Pci subsystem to release a * PCI device and free up all resource held up by the device. This could - * be in response to a Hot plug event or when the driver is to be removed + * be in response to a Hot plug event or when the driver is to be removed * from memory. 
*/ @@ -4920,7 +4767,6 @@ static void __devexit s2io_rem_nic(struc pci_disable_device(pdev); pci_release_regions(pdev); pci_set_drvdata(pdev, NULL); - free_netdev(dev); } @@ -4936,11 +4782,11 @@ int __init s2io_starter(void) } /** - * s2io_closer - Cleanup routine for the driver + * s2io_closer - Cleanup routine for the driver * Description: This function is the cleanup routine for the driver. It unregist * ers the driver. */ -static void s2io_closer(void) +void s2io_closer(void) { pci_unregister_driver(&s2io_driver); DBG_PRINT(INIT_DBG, "cleanup done\n"); diff -urpN vanilla_kernel/drivers/net/s2io.h linux-2.6.12-rc6/drivers/net/s2io.h --- vanilla_kernel/drivers/net/s2io.h 2005-06-06 08:22:29.000000000 -0700 +++ linux-2.6.12-rc6/drivers/net/s2io.h 2005-06-27 06:21:32.000000000 -0700 @@ -31,6 +31,9 @@ #define SUCCESS 0 #define FAILURE -1 +/* Maximum time to flicker LED when asked to identify NIC using ethtool */ +#define MAX_FLICKER_TIME 60000 /* 60 Secs */ + /* Maximum outstanding splits to be configured into xena. */ typedef enum xena_max_outstanding_splits { XENA_ONE_SPLIT_TRANSACTION = 0, @@ -45,10 +48,10 @@ typedef enum xena_max_outstanding_splits #define XENA_MAX_OUTSTANDING_SPLITS(n) (n << 4) /* OS concerned variables and constants */ -#define WATCH_DOG_TIMEOUT 5*HZ -#define EFILL 0x1234 -#define ALIGN_SIZE 127 -#define PCIX_COMMAND_REGISTER 0x62 +#define WATCH_DOG_TIMEOUT 15*HZ +#define EFILL 0x1234 +#define ALIGN_SIZE 127 +#define PCIX_COMMAND_REGISTER 0x62 /* * Debug related variables. @@ -61,7 +64,7 @@ typedef enum xena_max_outstanding_splits #define INTR_DBG 4 /* Global variable that defines the present debug level of the driver. */ -static int debug_level = ERR_DBG; /* Default level. */ +int debug_level = ERR_DBG; /* Default level. */ /* DEBUG message print. */ #define DBG_PRINT(dbg_level, args...) 
if(!(debug_level> 48) @@ -382,7 +408,7 @@ typedef struct _RxD_t { #endif } RxD_t; -/* Structure that represents the Rx descriptor block which contains +/* Structure that represents the Rx descriptor block which contains * 128 Rx descriptors. */ #ifndef CONFIG_2BUFF_MODE @@ -392,11 +418,11 @@ typedef struct _RxD_block { u64 reserved_0; #define END_OF_BLOCK 0xFEFFFFFFFFFFFFFFULL - u64 reserved_1; /* 0xFEFFFFFFFFFFFFFF to mark last + u64 reserved_1; /* 0xFEFFFFFFFFFFFFFF to mark last * Rxd in this blk */ u64 reserved_2_pNext_RxD_block; /* Logical ptr to next */ u64 pNext_RxD_Blk_physical; /* Buff0_ptr.In a 32 bit arch - * the upper 32 bits should + * the upper 32 bits should * be 0 */ } RxD_block_t; #else @@ -405,13 +431,13 @@ typedef struct _RxD_block { RxD_t rxd[MAX_RXDS_PER_BLOCK]; #define END_OF_BLOCK 0xFEFFFFFFFFFFFFFFULL - u64 reserved_1; /* 0xFEFFFFFFFFFFFFFF to mark last Rxd + u64 reserved_1; /* 0xFEFFFFFFFFFFFFFF to mark last Rxd * in this blk */ u64 pNext_RxD_Blk_physical; /* Phy ponter to next blk. */ } RxD_block_t; #define SIZE_OF_BLOCK 4096 -/* Structure to hold virtual addresses of Buf0 and Buf1 in +/* Structure to hold virtual addresses of Buf0 and Buf1 in * 2buf mode. */ typedef struct bufAdd { void *ba_0_org; @@ -423,8 +449,8 @@ typedef struct bufAdd { /* Structure which stores all the MAC control parameters */ -/* This structure stores the offset of the RxD in the ring - * from which the Rx Interrupt processor can start picking +/* This structure stores the offset of the RxD in the ring + * from which the Rx Interrupt processor can start picking * up the RxDs for processing. */ typedef struct _rx_curr_get_info_t { @@ -436,7 +462,7 @@ typedef struct _rx_curr_get_info_t { typedef rx_curr_get_info_t rx_curr_put_info_t; /* This structure stores the offset of the TxDl in the FIFO - * from which the Tx Interrupt processor can start picking + * from which the Tx Interrupt processor can start picking * up the TxDLs for send complete interrupt processing. 
*/ typedef struct { @@ -446,32 +472,96 @@ typedef struct { typedef tx_curr_get_info_t tx_curr_put_info_t; -/* Infomation related to the Tx and Rx FIFOs and Rings of Xena - * is maintained in this structure. - */ -typedef struct mac_info { -/* rx side stuff */ - /* Put pointer info which indictes which RxD has to be replenished +/* Structure that holds the Phy and virt addresses of the Blocks */ +typedef struct rx_block_info { + RxD_t *block_virt_addr; + dma_addr_t block_dma_addr; +} rx_block_info_t; + +/* pre declaration of the nic structure */ +typedef struct s2io_nic nic_t; + +/* Ring specific structure */ +typedef struct ring_info { + /* The ring number */ + int ring_no; + + /* + * Place holders for the virtual and physical addresses of + * all the Rx Blocks + */ + rx_block_info_t rx_blocks[MAX_RX_BLOCKS_PER_RING]; + int block_count; + int pkt_cnt; + + /* + * Put pointer info which indictes which RxD has to be replenished * with a new buffer. */ - rx_curr_put_info_t rx_curr_put_info[MAX_RX_RINGS]; + rx_curr_put_info_t rx_curr_put_info; - /* Get pointer info which indictes which is the last RxD that was + /* + * Get pointer info which indictes which is the last RxD that was * processed by the driver. */ - rx_curr_get_info_t rx_curr_get_info[MAX_RX_RINGS]; + rx_curr_get_info_t rx_curr_get_info; - u16 rmac_pause_time; - u16 mc_pause_threshold_q0q3; - u16 mc_pause_threshold_q4q7; +#ifndef CONFIG_S2IO_NAPI + /* Index to the absolute position of the put pointer of Rx ring */ + int put_pos; +#endif + +#ifdef CONFIG_2BUFF_MODE + /* Buffer Address store. */ + buffAdd_t **ba; +#endif + nic_t *nic; +} ring_info_t; +/* Fifo specific structure */ +typedef struct fifo_info { + /* FIFO number */ + int fifo_no; + + /* Maximum TxDs per TxDL */ + int max_txds; + + /* Place holder of all the TX List's Phy and Virt addresses. 
*/ + list_info_hold_t *list_info; + + /* + * Current offset within the tx FIFO where driver would write + * new Tx frame + */ + tx_curr_put_info_t tx_curr_put_info; + + /* + * Current offset within tx FIFO from where the driver would start freeing + * the buffers + */ + tx_curr_get_info_t tx_curr_get_info; + + nic_t *nic; +}fifo_info_t; + +/* Infomation related to the Tx and Rx FIFOs and Rings of Xena + * is maintained in this structure. + */ +typedef struct mac_info { /* tx side stuff */ /* logical pointer of start of each Tx FIFO */ TxFIFO_element_t __iomem *tx_FIFO_start[MAX_TX_FIFOS]; -/* Current offset within tx_FIFO_start, where driver would write new Tx frame*/ - tx_curr_put_info_t tx_curr_put_info[MAX_TX_FIFOS]; - tx_curr_get_info_t tx_curr_get_info[MAX_TX_FIFOS]; + /* Fifo specific structure */ + fifo_info_t fifos[MAX_TX_FIFOS]; + +/* rx side stuff */ + /* Ring specific structure */ + ring_info_t rings[MAX_RX_RINGS]; + + u16 rmac_pause_time; + u16 mc_pause_threshold_q0q3; + u16 mc_pause_threshold_q4q7; void *stats_mem; /* orignal pointer to allocated mem */ dma_addr_t stats_mem_phy; /* Physical address of the stat block */ @@ -485,12 +575,6 @@ typedef struct { int usage_cnt; } usr_addr_t; -/* Structure that holds the Phy and virt addresses of the Blocks */ -typedef struct rx_block_info { - RxD_t *block_virt_addr; - dma_addr_t block_dma_addr; -} rx_block_info_t; - /* Default Tunable parameters of the NIC. */ #define DEFAULT_FIFO_LEN 4096 #define SMALL_RXD_CNT 30 * (MAX_RXDS_PER_BLOCK+1) @@ -499,7 +583,20 @@ typedef struct rx_block_info { #define LARGE_BLK_CNT 100 /* Structure representing one instance of the NIC */ -typedef struct s2io_nic { +struct s2io_nic { +#ifdef CONFIG_S2IO_NAPI + /* + * Count of packets to be processed in a given iteration, it will be indicated + * by the quota field of the device structure when NAPI is enabled. 
+ */ + int pkts_to_process; +#endif + struct net_device *dev; + mac_info_t mac_control; + struct config_param config; + struct pci_dev *pdev; + void __iomem *bar0; + void __iomem *bar1; #define MAX_MAC_SUPPORTED 16 #define MAX_SUPPORTED_MULTICASTS MAX_MAC_SUPPORTED @@ -507,33 +604,17 @@ typedef struct s2io_nic { macaddr_t pre_mac_addr[MAX_MAC_SUPPORTED]; struct net_device_stats stats; - void __iomem *bar0; - void __iomem *bar1; - struct config_param config; - mac_info_t mac_control; int high_dma_flag; int device_close_flag; int device_enabled_once; - char name[32]; + char name[50]; struct tasklet_struct task; volatile unsigned long tasklet_status; - struct timer_list timer; - struct net_device *dev; - struct pci_dev *pdev; - u16 vendor_id; - u16 device_id; - u16 ccmd; - u32 cbar0_1; - u32 cbar0_2; - u32 cbar1_1; - u32 cbar1_2; - u32 cirq; - u8 cache_line; - u32 rom_expansion; - u16 pcix_cmd; - u32 irq; + /* Space to back up the PCI config space */ + u32 config_space[256 / sizeof(u32)]; + atomic_t rx_bufs_left[MAX_RX_RINGS]; spinlock_t tx_lock; @@ -558,27 +639,11 @@ typedef struct s2io_nic { u16 tx_err_count; u16 rx_err_count; -#ifndef CONFIG_S2IO_NAPI - /* Index to the absolute position of the put pointer of Rx ring. */ - int put_pos[MAX_RX_RINGS]; -#endif - - /* - * Place holders for the virtual and physical addresses of - * all the Rx Blocks - */ - rx_block_info_t rx_blocks[MAX_RX_RINGS][MAX_RX_BLOCKS_PER_RING]; - int block_count[MAX_RX_RINGS]; - int pkt_cnt[MAX_RX_RINGS]; - - /* Place holder of all the TX List's Phy and Virt addresses. */ - list_info_hold_t *list_info[MAX_TX_FIFOS]; - /* Id timer, used to blink NIC to physically identify NIC. */ struct timer_list id_timer; /* Restart timer, used to restart NIC if the device is stuck and - * a schedule task that will set the correct Link state once the + * a schedule task that will set the correct Link state once the * NIC's PHY has stabilized after a state change. 
*/ #ifdef INIT_TQUEUE @@ -589,12 +654,12 @@ typedef struct s2io_nic { struct work_struct set_link_task; #endif - /* Flag that can be used to turn on or turn off the Rx checksum + /* Flag that can be used to turn on or turn off the Rx checksum * offload feature. */ int rx_csum; - /* after blink, the adapter must be restored with original + /* after blink, the adapter must be restored with original * values. */ u64 adapt_ctrl_org; @@ -604,16 +669,12 @@ typedef struct s2io_nic { #define LINK_DOWN 1 #define LINK_UP 2 -#ifdef CONFIG_2BUFF_MODE - /* Buffer Address store. */ - buffAdd_t **ba[MAX_RX_RINGS]; -#endif int task_flag; #define CARD_DOWN 1 #define CARD_UP 2 atomic_t card_state; volatile unsigned long link_state; -} nic_t; +}; #define RESET_ERROR 1; #define CMD_ERROR 2; @@ -622,9 +683,10 @@ typedef struct s2io_nic { #ifndef readq static inline u64 readq(void __iomem *addr) { - u64 ret = readl(addr + 4); - ret <<= 32; - ret |= readl(addr); + u64 ret = 0; + ret = readl(addr + 4); + (u64) ret <<= 32; + (u64) ret |= readl(addr); return ret; } @@ -637,10 +699,10 @@ static inline void writeq(u64 val, void writel((u32) (val >> 32), (addr + 4)); } -/* In 32 bit modes, some registers have to be written in a +/* In 32 bit modes, some registers have to be written in a * particular order to expect correct hardware operation. The - * macro SPECIAL_REG_WRITE is used to perform such ordered - * writes. Defines UF (Upper First) and LF (Lower First) will + * macro SPECIAL_REG_WRITE is used to perform such ordered + * writes. Defines UF (Upper First) and LF (Lower First) will * be used to specify the required write order. */ #define UF 1 @@ -716,6 +778,7 @@ static inline void SPECIAL_REG_WRITE(u64 #define PCC_FB_ECC_ERR vBIT(0xff, 16, 8) /* Interrupt to indicate PCC_FB_ECC Error. */ +#define RXD_GET_VLAN_TAG(Control_2) (u16)(Control_2 & MASK_VLAN_TAG) /* * Prototype declaration. 
*/ @@ -725,36 +788,29 @@ static void __devexit s2io_rem_nic(struc static int init_shared_mem(struct s2io_nic *sp); static void free_shared_mem(struct s2io_nic *sp); static int init_nic(struct s2io_nic *nic); -#ifndef CONFIG_S2IO_NAPI -static void rx_intr_handler(struct s2io_nic *sp); -#endif -static void tx_intr_handler(struct s2io_nic *sp); +static void rx_intr_handler(ring_info_t *ring_data); +static void tx_intr_handler(fifo_info_t *fifo_data); static void alarm_intr_handler(struct s2io_nic *sp); static int s2io_starter(void); -static void s2io_closer(void); +void s2io_closer(void); static void s2io_tx_watchdog(struct net_device *dev); static void s2io_tasklet(unsigned long dev_addr); static void s2io_set_multicast(struct net_device *dev); -#ifndef CONFIG_2BUFF_MODE -static int rx_osm_handler(nic_t * sp, u16 len, RxD_t * rxdp, int ring_no); -#else -static int rx_osm_handler(nic_t * sp, RxD_t * rxdp, int ring_no, - buffAdd_t * ba); -#endif -static void s2io_link(nic_t * sp, int link); -static void s2io_reset(nic_t * sp); -#ifdef CONFIG_S2IO_NAPI +static int rx_osm_handler(ring_info_t *ring_data, RxD_t * rxdp); +void s2io_link(nic_t * sp, int link); +void s2io_reset(nic_t * sp); +#if defined(CONFIG_S2IO_NAPI) static int s2io_poll(struct net_device *dev, int *budget); #endif static void s2io_init_pci(nic_t * sp); -static int s2io_set_mac_addr(struct net_device *dev, u8 * addr); +int s2io_set_mac_addr(struct net_device *dev, u8 * addr); static irqreturn_t s2io_isr(int irq, void *dev_id, struct pt_regs *regs); -static int verify_xena_quiescence(u64 val64, int flag); +static int verify_xena_quiescence(nic_t *sp, u64 val64, int flag); static struct ethtool_ops netdev_ethtool_ops; static void s2io_set_link(unsigned long data); -static int s2io_set_swapper(nic_t * sp); -static void s2io_card_down(nic_t * nic); -static int s2io_card_up(nic_t * nic); - +int s2io_set_swapper(nic_t * sp); +static void s2io_card_down(nic_t *nic); +static int s2io_card_up(nic_t *nic); +int 
get_xena_rev_id(struct pci_dev *pdev); #endif /* _S2IO_H */ From raghavendra.koushik@neterion.com Thu Jul 7 15:23:01 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 07 Jul 2005 15:23:10 -0700 (PDT) Received: from linux.site (adsl-67-120-213-161.dsl.sntc01.pacbell.net [67.120.213.161]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j67MN1H9013553 for ; Thu, 7 Jul 2005 15:23:01 -0700 Received: by linux.site (Postfix, from userid 0) id C38D189828; Thu, 7 Jul 2005 15:10:33 -0700 (PDT) To: jgarzik@pobox.com, netdev@oss.sgi.com Cc: raghavendra.koushik@neterion.com, ravinandan.arakali@neterion.com, leonid.grossman@neterion.com, rapuru.sriram@neterion.com From: raghavendra.koushik@neterion.com Subject: [PATCH 2.6.12.1 2/12] S2io: Hardware fixes Message-Id: <20050707221033.C38D189828@linux.site> Date: Thu, 7 Jul 2005 15:10:33 -0700 (PDT) X-archive-position: 2673 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: raghavendra.koushik@neterion.com Precedence: bulk X-list: netdev Content-Length: 23258 Lines: 666
Hi,
The patch below addresses a few h/w-specific issues.
1. Check for an additional ownership bit on the Rx path before starting Rx processing.
2. Enable only 4 PCCs (Per Context Controllers) for Xframe I revisions less than 4.
3. Program the Rx and Tx round robin registers depending on the number of rings/FIFOs.
4. The use of Tx continuous interrupts is now a loadable parameter.
5. Reset the card if we get double-bit ECC errors.
6. The soft reset of XGXS that was done to force a link state change has been eliminated.
7. After a reset, clear the "parity error detected" bit, the PCI-X ECC status register, and the PCI_STATUS bit in the tx_pic_int register.
8. The error in the implementation of disabling allmulticast has been rectified.
9. Leave the PCI-X parameters MMRBC, OST etc. at their BIOS/system defaults.
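As a rough illustration of item 1 only, here is a minimal standalone sketch of a two-part ownership test on an Rx descriptor: the descriptor is treated as ready when the adapter's ownership bit is clear and the marker field no longer holds the mark value. Every name prefixed with sketch_/SKETCH_ and all bit positions below are assumptions made for this example; the driver's real masks are defined in s2io.h and may differ.

```c
#include <stdint.h>

/* Hypothetical bit layout, for illustration only (not taken from s2io.h). */
#define SKETCH_RXD_OWN_XENA   (1ULL << 56)  /* assumed ownership bit in Control_1 */
#define SKETCH_RXD_MARK       0x3ULL        /* assumed marker value */
#define SKETCH_GET_MARKER(c2) ((c2) >> 62)  /* assumed: marker in top 2 bits of Control_2 */

/* Hypothetical stand-in for the two control words of an Rx descriptor. */
struct sketch_rxd {
    uint64_t control_1;
    uint64_t control_2;
};

/*
 * Returns 1 when the host may process the descriptor:
 *  - the adapter no longer owns it (ownership bit clear), and
 *  - the marker field differs from the mark value.
 * Returns 0 otherwise.
 */
static int sketch_rxd_is_up2dt(const struct sketch_rxd *rxdp)
{
    int host_owns = !(rxdp->control_1 & SKETCH_RXD_OWN_XENA);
    int marker_cleared = SKETCH_GET_MARKER(rxdp->control_2) != SKETCH_RXD_MARK;

    return host_owns && marker_cleared;
}
```

An Rx loop would presumably call such a helper on each descriptor and stop at the first one that is not ready, so that a descriptor the adapter is still writing is never handed to the stack.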
Signed-off-by: Ravinandan Arakali
Signed-off-by: Raghavendra Koushik
---
diff -uprN vanilla_kernel/drivers/net/s2io-regs.h linux-2.6.12-rc6/drivers/net/s2io-regs.h
--- vanilla_kernel/drivers/net/s2io-regs.h	2005-06-27 06:25:09.000000000 -0700
+++ linux-2.6.12-rc6/drivers/net/s2io-regs.h	2005-06-27 06:25:50.000000000 -0700
@@ -62,6 +62,7 @@ typedef struct _XENA_dev_config {
 #define ADAPTER_STATUS_RMAC_REMOTE_FAULT BIT(6)
 #define ADAPTER_STATUS_RMAC_LOCAL_FAULT BIT(7)
 #define ADAPTER_STATUS_RMAC_PCC_IDLE vBIT(0xFF,8,8)
+#define ADAPTER_STATUS_RMAC_PCC_FOUR_IDLE vBIT(0x0F,8,8)
 #define ADAPTER_STATUS_RC_PRC_QUIESCENT vBIT(0xFF,16,8)
 #define ADAPTER_STATUS_MC_DRAM_READY BIT(24)
 #define ADAPTER_STATUS_MC_QUEUES_READY BIT(25)
@@ -245,6 +246,7 @@ typedef struct _XENA_dev_config {
 #define STAT_TRSF_PER(n) TBD
 #define PER_SEC 0x208d5
 #define SET_UPDT_PERIOD(n) vBIT((PER_SEC*n),32,32)
+#define SET_UPDT_CLICKS(val) vBIT(val, 32, 32)
 	u64 stat_addr;
@@ -289,6 +291,7 @@ typedef struct _XENA_dev_config {
 	u64 pcc_err_reg;
 #define PCC_FB_ECC_DB_ERR vBIT(0xFF, 16, 8)
+#define PCC_ENABLE_FOUR vBIT(0x0F,0,8)
 	u64 pcc_err_mask;
 	u64 pcc_err_alarm;
@@ -690,6 +693,10 @@ typedef struct _XENA_dev_config {
 #define MC_ERR_REG_MIRI_CRI_ERR_0 BIT(22)
 #define MC_ERR_REG_MIRI_CRI_ERR_1 BIT(23)
 #define MC_ERR_REG_SM_ERR BIT(31)
+#define MC_ERR_REG_ECC_ALL_SNG (BIT(6) | \
+		BIT(7) | BIT(17) | BIT(19))
+#define MC_ERR_REG_ECC_ALL_DBL (BIT(14) | \
+		BIT(15) | BIT(18) | BIT(20))
 	u64 mc_err_mask;
 	u64 mc_err_alarm;
diff -uprN vanilla_kernel/drivers/net/s2io.c linux-2.6.12-rc6/drivers/net/s2io.c
--- vanilla_kernel/drivers/net/s2io.c	2005-06-27 06:25:09.000000000 -0700
+++ linux-2.6.12-rc6/drivers/net/s2io.c	2005-06-27 06:25:50.000000000 -0700
@@ -67,6 +67,16 @@
 static char s2io_driver_name[] = "Neterion";
 static char s2io_driver_version[] = "Version 1.7.7";
+static inline int RXD_IS_UP2DT(RxD_t *rxdp)
+{
+	int ret;
+
+	ret = ((!(rxdp->Control_1 & RXD_OWN_XENA)) &&
+		(GET_RXD_MARKER(rxdp->Control_2) != THE_RXD_MARK));
+
+	return ret;
+}
+
 /*
  * Cards with following subsystem_id have a link state indication
  * problem, 600B, 600C, 600D, 640B, 640C and 640D.
@@ -229,6 +239,7 @@ static unsigned int rx_ring_sz[MAX_RX_RI
 static unsigned int Stats_refresh_time = 4;
 static unsigned int rts_frm_len[MAX_RX_RINGS] =
     {[0 ...(MAX_RX_RINGS - 1)] = 0 };
+static unsigned int use_continuous_tx_intrs = 1;
 static unsigned int rmac_pause_time = 65535;
 static unsigned int mc_pause_threshold_q0q3 = 187;
 static unsigned int mc_pause_threshold_q4q7 = 187;
@@ -637,7 +648,7 @@ static int init_nic(struct s2io_nic *nic
 	mac_control = &nic->mac_control;
 	config = &nic->config;
-	/* to set the swapper control on the card */
+	/* to set the swapper controle on the card */
 	if(s2io_set_swapper(nic)) {
 		DBG_PRINT(ERR_DBG,"ERROR: Setting Swapper failed\n");
 		return -1;
@@ -755,6 +766,13 @@ static int init_nic(struct s2io_nic *nic
 	val64 |= BIT(0);	/* To enable the FIFO partition. */
 	writeq(val64, &bar0->tx_fifo_partition_0);
+	/*
+	 * Disable 4 PCCs for Xena1, 2 and 3 as per H/W bug
+	 * SXE-008 TRANSMIT DMA ARBITRATION ISSUE.
+	 */
+	if (get_xena_rev_id(nic->pdev) < 4)
+		writeq(PCC_ENABLE_FOUR, &bar0->pcc_enable);
+
 	val64 = readq(&bar0->tx_fifo_partition_0);
 	DBG_PRINT(INIT_DBG, "Fifo partition at: 0x%p is: 0x%llx\n",
 		  &bar0->tx_fifo_partition_0, (unsigned long long) val64);
@@ -822,37 +840,250 @@ static int init_nic(struct s2io_nic *nic
 	}
 	writeq(val64, &bar0->rx_queue_cfg);
-	/* Initializing the Tx round robin registers to 0
-	 * filling tx and rx round robin registers as per
-	 * the number of FIFOs and Rings is still TODO
-	 */
-	writeq(0, &bar0->tx_w_round_robin_0);
-	writeq(0, &bar0->tx_w_round_robin_1);
-	writeq(0, &bar0->tx_w_round_robin_2);
-	writeq(0, &bar0->tx_w_round_robin_3);
-	writeq(0, &bar0->tx_w_round_robin_4);
-
-	/*
-	 * TODO
-	 * Disable Rx steering. Hard coding all packets to be steered to
-	 * Queue 0 for now.
+	/*
+	 * Filling Tx round robin registers
+	 * as per the number of FIFOs
 	 */
-	val64 = 0x8080808080808080ULL;
-	writeq(val64, &bar0->rts_qos_steering);
+	switch (config->tx_fifo_num) {
+	case 1:
+		val64 = 0x0000000000000000ULL;
+		writeq(val64, &bar0->tx_w_round_robin_0);
+		writeq(val64, &bar0->tx_w_round_robin_1);
+		writeq(val64, &bar0->tx_w_round_robin_2);
+		writeq(val64, &bar0->tx_w_round_robin_3);
+		writeq(val64, &bar0->tx_w_round_robin_4);
+		break;
+	case 2:
+		val64 = 0x0000010000010000ULL;
+		writeq(val64, &bar0->tx_w_round_robin_0);
+		val64 = 0x0100000100000100ULL;
+		writeq(val64, &bar0->tx_w_round_robin_1);
+		val64 = 0x0001000001000001ULL;
+		writeq(val64, &bar0->tx_w_round_robin_2);
+		val64 = 0x0000010000010000ULL;
+		writeq(val64, &bar0->tx_w_round_robin_3);
+		val64 = 0x0100000000000000ULL;
+		writeq(val64, &bar0->tx_w_round_robin_4);
+		break;
+	case 3:
+		val64 = 0x0001000102000001ULL;
+		writeq(val64, &bar0->tx_w_round_robin_0);
+		val64 = 0x0001020000010001ULL;
+		writeq(val64, &bar0->tx_w_round_robin_1);
+		val64 = 0x0200000100010200ULL;
+		writeq(val64, &bar0->tx_w_round_robin_2);
+		val64 = 0x0001000102000001ULL;
+		writeq(val64, &bar0->tx_w_round_robin_3);
+		val64 = 0x0001020000000000ULL;
+		writeq(val64, &bar0->tx_w_round_robin_4);
+		break;
+	case 4:
+		val64 = 0x0001020300010200ULL;
+		writeq(val64, &bar0->tx_w_round_robin_0);
+		val64 = 0x0100000102030001ULL;
+		writeq(val64, &bar0->tx_w_round_robin_1);
+		val64 = 0x0200010000010203ULL;
+		writeq(val64, &bar0->tx_w_round_robin_2);
+		val64 = 0x0001020001000001ULL;
+		writeq(val64, &bar0->tx_w_round_robin_3);
+		val64 = 0x0203000100000000ULL;
+		writeq(val64, &bar0->tx_w_round_robin_4);
+		break;
+	case 5:
+		val64 = 0x0001000203000102ULL;
+		writeq(val64, &bar0->tx_w_round_robin_0);
+		val64 = 0x0001020001030004ULL;
+		writeq(val64, &bar0->tx_w_round_robin_1);
+		val64 = 0x0001000203000102ULL;
+		writeq(val64, &bar0->tx_w_round_robin_2);
+		val64 = 0x0001020001030004ULL;
+		writeq(val64, &bar0->tx_w_round_robin_3);
+		val64 = 0x0001000000000000ULL;
+		writeq(val64, &bar0->tx_w_round_robin_4);
+		break;
+	case 6:
+		val64 = 0x0001020304000102ULL;
+		writeq(val64, &bar0->tx_w_round_robin_0);
+		val64 = 0x0304050001020001ULL;
+		writeq(val64, &bar0->tx_w_round_robin_1);
+		val64 = 0x0203000100000102ULL;
+		writeq(val64, &bar0->tx_w_round_robin_2);
+		val64 = 0x0304000102030405ULL;
+		writeq(val64, &bar0->tx_w_round_robin_3);
+		val64 = 0x0001000200000000ULL;
+		writeq(val64, &bar0->tx_w_round_robin_4);
+		break;
+	case 7:
+		val64 = 0x0001020001020300ULL;
+		writeq(val64, &bar0->tx_w_round_robin_0);
+		val64 = 0x0102030400010203ULL;
+		writeq(val64, &bar0->tx_w_round_robin_1);
+		val64 = 0x0405060001020001ULL;
+		writeq(val64, &bar0->tx_w_round_robin_2);
+		val64 = 0x0304050000010200ULL;
+		writeq(val64, &bar0->tx_w_round_robin_3);
+		val64 = 0x0102030000000000ULL;
+		writeq(val64, &bar0->tx_w_round_robin_4);
+		break;
+	case 8:
+		val64 = 0x0001020300040105ULL;
+		writeq(val64, &bar0->tx_w_round_robin_0);
+		val64 = 0x0200030106000204ULL;
+		writeq(val64, &bar0->tx_w_round_robin_1);
+		val64 = 0x0103000502010007ULL;
+		writeq(val64, &bar0->tx_w_round_robin_2);
+		val64 = 0x0304010002060500ULL;
+		writeq(val64, &bar0->tx_w_round_robin_3);
+		val64 = 0x0103020400000000ULL;
+		writeq(val64, &bar0->tx_w_round_robin_4);
+		break;
+	}
+
+	/* Filling the Rx round robin registers as per the
+	 * number of Rings and steering based on QoS.
+	 */
+	switch (config->rx_ring_num) {
+	case 1:
+		val64 = 0x8080808080808080ULL;
+		writeq(val64, &bar0->rts_qos_steering);
+		break;
+	case 2:
+		val64 = 0x0000010000010000ULL;
+		writeq(val64, &bar0->rx_w_round_robin_0);
+		val64 = 0x0100000100000100ULL;
+		writeq(val64, &bar0->rx_w_round_robin_1);
+		val64 = 0x0001000001000001ULL;
+		writeq(val64, &bar0->rx_w_round_robin_2);
+		val64 = 0x0000010000010000ULL;
+		writeq(val64, &bar0->rx_w_round_robin_3);
+		val64 = 0x0100000000000000ULL;
+		writeq(val64, &bar0->rx_w_round_robin_4);
+
+		val64 = 0x8080808040404040ULL;
+		writeq(val64, &bar0->rts_qos_steering);
+		break;
+	case 3:
+		val64 = 0x0001000102000001ULL;
+		writeq(val64, &bar0->rx_w_round_robin_0);
+		val64 = 0x0001020000010001ULL;
+		writeq(val64, &bar0->rx_w_round_robin_1);
+		val64 = 0x0200000100010200ULL;
+		writeq(val64, &bar0->rx_w_round_robin_2);
+		val64 = 0x0001000102000001ULL;
+		writeq(val64, &bar0->rx_w_round_robin_3);
+		val64 = 0x0001020000000000ULL;
+		writeq(val64, &bar0->rx_w_round_robin_4);
+
+		val64 = 0x8080804040402020ULL;
+		writeq(val64, &bar0->rts_qos_steering);
+		break;
+	case 4:
+		val64 = 0x0001020300010200ULL;
+		writeq(val64, &bar0->rx_w_round_robin_0);
+		val64 = 0x0100000102030001ULL;
+		writeq(val64, &bar0->rx_w_round_robin_1);
+		val64 = 0x0200010000010203ULL;
+		writeq(val64, &bar0->rx_w_round_robin_2);
+		val64 = 0x0001020001000001ULL;
+		writeq(val64, &bar0->rx_w_round_robin_3);
+		val64 = 0x0203000100000000ULL;
+		writeq(val64, &bar0->rx_w_round_robin_4);
+
+		val64 = 0x8080404020201010ULL;
+		writeq(val64, &bar0->rts_qos_steering);
+		break;
+	case 5:
+		val64 = 0x0001000203000102ULL;
+		writeq(val64, &bar0->rx_w_round_robin_0);
+		val64 = 0x0001020001030004ULL;
+		writeq(val64, &bar0->rx_w_round_robin_1);
+		val64 = 0x0001000203000102ULL;
+		writeq(val64, &bar0->rx_w_round_robin_2);
+		val64 = 0x0001020001030004ULL;
+		writeq(val64, &bar0->rx_w_round_robin_3);
+		val64 = 0x0001000000000000ULL;
+		writeq(val64, &bar0->rx_w_round_robin_4);
+
+		val64 = 0x8080404020201008ULL;
+		writeq(val64, &bar0->rts_qos_steering);
+		break;
+	case 6:
+		val64 = 0x0001020304000102ULL;
+		writeq(val64, &bar0->rx_w_round_robin_0);
+		val64 = 0x0304050001020001ULL;
+		writeq(val64, &bar0->rx_w_round_robin_1);
+		val64 = 0x0203000100000102ULL;
+		writeq(val64, &bar0->rx_w_round_robin_2);
+		val64 = 0x0304000102030405ULL;
+		writeq(val64, &bar0->rx_w_round_robin_3);
+		val64 = 0x0001000200000000ULL;
+		writeq(val64, &bar0->rx_w_round_robin_4);
+
+		val64 = 0x8080404020100804ULL;
+		writeq(val64, &bar0->rts_qos_steering);
+		break;
+	case 7:
+		val64 = 0x0001020001020300ULL;
+		writeq(val64, &bar0->rx_w_round_robin_0);
+		val64 = 0x0102030400010203ULL;
+		writeq(val64, &bar0->rx_w_round_robin_1);
+		val64 = 0x0405060001020001ULL;
+		writeq(val64, &bar0->rx_w_round_robin_2);
+		val64 = 0x0304050000010200ULL;
+		writeq(val64, &bar0->rx_w_round_robin_3);
+		val64 = 0x0102030000000000ULL;
+		writeq(val64, &bar0->rx_w_round_robin_4);
+
+		val64 = 0x8080402010080402ULL;
+		writeq(val64, &bar0->rts_qos_steering);
+		break;
+	case 8:
+		val64 = 0x0001020300040105ULL;
+		writeq(val64, &bar0->rx_w_round_robin_0);
+		val64 = 0x0200030106000204ULL;
+		writeq(val64, &bar0->rx_w_round_robin_1);
+		val64 = 0x0103000502010007ULL;
+		writeq(val64, &bar0->rx_w_round_robin_2);
+		val64 = 0x0304010002060500ULL;
+		writeq(val64, &bar0->rx_w_round_robin_3);
+		val64 = 0x0103020400000000ULL;
+		writeq(val64, &bar0->rx_w_round_robin_4);
+
+		val64 = 0x8040201008040201ULL;
+		writeq(val64, &bar0->rts_qos_steering);
+		break;
+	}
 	/* UDP Fix */
 	val64 = 0;
 	for (i = 0; i < 8; i++)
 		writeq(val64, &bar0->rts_frm_len_n[i]);
-	/* Set the default rts frame length for ring0 */
-	writeq(MAC_RTS_FRM_LEN_SET(dev->mtu+22),
-	       &bar0->rts_frm_len_n[0]);
+	/* Set the default rts frame length for the rings configured */
+	val64 = MAC_RTS_FRM_LEN_SET(dev->mtu+22);
+	for (i = 0 ; i < config->rx_ring_num ; i++)
+		writeq(val64, &bar0->rts_frm_len_n[i]);
+
+	/* Set the frame length for the configured rings
+	 * desired by the user
+	 */
+	for (i = 0; i < config->rx_ring_num; i++) {
+		/* If rts_frm_len[i] == 0 then it is assumed that user not
+		 * specified frame length steering.
+		 * If the user provides the frame length then program
+		 * the rts_frm_len register for those values or else
+		 * leave it as it is.
+		 */
+		if (rts_frm_len[i] != 0) {
+			writeq(MAC_RTS_FRM_LEN_SET(rts_frm_len[i]),
+				&bar0->rts_frm_len_n[i]);
+		}
+	}
 	/* Program statistics memory */
 	writeq(mac_control->stats_mem_phy, &bar0->stat_addr);
 	val64 = SET_UPDT_PERIOD(Stats_refresh_time) |
-	    STAT_CFG_STAT_RO | STAT_CFG_STAT_EN;
+		STAT_CFG_STAT_RO | STAT_CFG_STAT_EN;
 	writeq(val64, &bar0->stat_cfg);
 	/*
@@ -876,13 +1107,14 @@ static int init_nic(struct s2io_nic *nic
 	val64 = TTI_DATA1_MEM_TX_TIMER_VAL(0x2078) |
 	    TTI_DATA1_MEM_TX_URNG_A(0xA) |
 	    TTI_DATA1_MEM_TX_URNG_B(0x10) |
-	    TTI_DATA1_MEM_TX_URNG_C(0x30) | TTI_DATA1_MEM_TX_TIMER_AC_EN |
-	    TTI_DATA1_MEM_TX_TIMER_CI_EN;
+	    TTI_DATA1_MEM_TX_URNG_C(0x30) | TTI_DATA1_MEM_TX_TIMER_AC_EN;
+	if (use_continuous_tx_intrs)
+		val64 |= TTI_DATA1_MEM_TX_TIMER_CI_EN;
 	writeq(val64, &bar0->tti_data1_mem);
 	val64 = TTI_DATA2_MEM_TX_UFC_A(0x10) |
 	    TTI_DATA2_MEM_TX_UFC_B(0x20) |
-	    TTI_DATA2_MEM_TX_UFC_C(0x40) | TTI_DATA2_MEM_TX_UFC_D(0x80);
+	    TTI_DATA2_MEM_TX_UFC_C(0x70) | TTI_DATA2_MEM_TX_UFC_D(0x80);
 	writeq(val64, &bar0->tti_data2_mem);
 	val64 = TTI_CMD_MEM_WE | TTI_CMD_MEM_STROBE_NEW_CMD;
@@ -926,10 +1158,11 @@ static int init_nic(struct s2io_nic *nic
 	writeq(val64, &bar0->rti_command_mem);
 	/*
-	 * Once the operation completes, the Strobe bit of the command
-	 * register will be reset. We poll for this particular condition
-	 * We wait for a maximum of 500ms for the operation to complete,
-	 * if it's not complete by then we return error.
+	 * Once the operation completes, the Strobe bit of the
+	 * command register will be reset. We poll for this
+	 * particular condition. We wait for a maximum of 500ms
+	 * for the operation to complete, if it's not complete
+	 * by then we return error.
 	 */
 	time = 0;
 	while (TRUE) {
@@ -1184,10 +1417,10 @@ static void en_dis_able_nic_intrs(struct
 			temp64 &= ~((u64) val64);
 			writeq(temp64, &bar0->general_int_mask);
 			/*
-			 * All MC block error interrupts are disabled for now.
-			 * TODO
+			 * Enable all MC Intrs.
 			 */
-			writeq(DISABLE_ALL_INTRS, &bar0->mc_int_mask);
+			writeq(0x0, &bar0->mc_int_mask);
+			writeq(0x0, &bar0->mc_err_mask);
 		} else if (flag == DISABLE_INTRS) {
 			/*
 			 * Disable MC Intrs in the general intr mask register
@@ -1246,23 +1479,41 @@ static void en_dis_able_nic_intrs(struct
 	}
 }
-static int check_prc_pcc_state(u64 val64, int flag)
+static int check_prc_pcc_state(u64 val64, int flag, int rev_id)
 {
 	int ret = 0;
 	if (flag == FALSE) {
-		if (!(val64 & ADAPTER_STATUS_RMAC_PCC_IDLE) &&
-		    ((val64 & ADAPTER_STATUS_RC_PRC_QUIESCENT) ==
-		     ADAPTER_STATUS_RC_PRC_QUIESCENT)) {
-			ret = 1;
+		if (rev_id >= 4) {
+			if (!(val64 & ADAPTER_STATUS_RMAC_PCC_IDLE) &&
+			    ((val64 & ADAPTER_STATUS_RC_PRC_QUIESCENT) ==
+			     ADAPTER_STATUS_RC_PRC_QUIESCENT)) {
+				ret = 1;
+			}
+		} else {
+			if (!(val64 & ADAPTER_STATUS_RMAC_PCC_FOUR_IDLE) &&
+			    ((val64 & ADAPTER_STATUS_RC_PRC_QUIESCENT) ==
+			     ADAPTER_STATUS_RC_PRC_QUIESCENT)) {
+				ret = 1;
+			}
 		}
 	} else {
-		if (((val64 & ADAPTER_STATUS_RMAC_PCC_IDLE) ==
-		     ADAPTER_STATUS_RMAC_PCC_IDLE) &&
-		    (!(val64 & ADAPTER_STATUS_RC_PRC_QUIESCENT) ||
-		     ((val64 & ADAPTER_STATUS_RC_PRC_QUIESCENT) ==
-		      ADAPTER_STATUS_RC_PRC_QUIESCENT))) {
-			ret = 1;
+		if (rev_id >= 4) {
+			if (((val64 & ADAPTER_STATUS_RMAC_PCC_IDLE) ==
+			     ADAPTER_STATUS_RMAC_PCC_IDLE) &&
+			    (!(val64 & ADAPTER_STATUS_RC_PRC_QUIESCENT) ||
+			     ((val64 & ADAPTER_STATUS_RC_PRC_QUIESCENT) ==
+			      ADAPTER_STATUS_RC_PRC_QUIESCENT))) {
+				ret = 1;
+			}
+		} else {
+			if (((val64 & ADAPTER_STATUS_RMAC_PCC_FOUR_IDLE) ==
+			     ADAPTER_STATUS_RMAC_PCC_FOUR_IDLE) &&
+			    (!(val64 & ADAPTER_STATUS_RC_PRC_QUIESCENT) ||
+			     ((val64 & ADAPTER_STATUS_RC_PRC_QUIESCENT) ==
+			      ADAPTER_STATUS_RC_PRC_QUIESCENT))) {
+				ret = 1;
+			}
 		}
 	}
@@ -1285,6 +1536,7 @@ static int verify_xena_quiescence(nic_t
 {
 	int ret = 0;
 	u64 tmp64 = ~((u64) val64);
+	int rev_id = get_xena_rev_id(sp->pdev);
 	if (!
 	    (tmp64 &
@@ -1293,7 +1545,7 @@ static int verify_xena_quiescence(nic_t
 	      ADAPTER_STATUS_PIC_QUIESCENT | ADAPTER_STATUS_MC_DRAM_READY |
 	      ADAPTER_STATUS_MC_QUEUES_READY | ADAPTER_STATUS_M_PLL_LOCK |
 	      ADAPTER_STATUS_P_PLL_LOCK))) {
-		ret = check_prc_pcc_state(val64, flag);
+		ret = check_prc_pcc_state(val64, flag, rev_id);
 	}
 	return ret;
@@ -1406,7 +1658,7 @@ static int start_nic(struct s2io_nic *ni
 	/* Enable select interrupts */
 	interruptible = TX_TRAFFIC_INTR | RX_TRAFFIC_INTR | TX_MAC_INTR |
-	    RX_MAC_INTR;
+	    RX_MAC_INTR | MC_INTR;
 	en_dis_able_nic_intrs(nic, interruptible, ENABLE_INTRS);
 	/*
@@ -1438,21 +1690,6 @@ static int start_nic(struct s2io_nic *ni
 	 */
 	schedule_work(&nic->set_link_task);
-	/*
-	 * Here we are performing soft reset on XGXS to
-	 * force link down. Since link is already up, we will get
-	 * link state change interrupt after this reset
-	 */
-	SPECIAL_REG_WRITE(0x80010515001E0000ULL, &bar0->dtx_control, UF);
-	val64 = readq(&bar0->dtx_control);
-	udelay(50);
-	SPECIAL_REG_WRITE(0x80010515001E00E0ULL, &bar0->dtx_control, UF);
-	val64 = readq(&bar0->dtx_control);
-	udelay(50);
-	SPECIAL_REG_WRITE(0x80070515001F00E4ULL, &bar0->dtx_control, UF);
-	val64 = readq(&bar0->dtx_control);
-	udelay(50);
-
 	return SUCCESS;
 }
@@ -1523,7 +1760,7 @@ static void stop_nic(struct s2io_nic *ni
 	/* Disable all interrupts */
 	interruptible = TX_TRAFFIC_INTR | RX_TRAFFIC_INTR | TX_MAC_INTR |
-	    RX_MAC_INTR;
+	    RX_MAC_INTR | MC_INTR;
 	en_dis_able_nic_intrs(nic, interruptible, DISABLE_INTRS);
 	/* Disable PRCs */
@@ -1738,6 +1975,7 @@ int fill_rx_buffers(struct s2io_nic *nic
 		off++;
 		mac_control->rings[ring_no].rx_curr_put_info.offset = off;
 #endif
+		rxdp->Control_2 |= SET_RXD_MARKER;
 		atomic_inc(&nic->rx_bufs_left[ring_no]);
 		alloc_tab++;
@@ -1966,11 +2204,8 @@ static void rx_intr_handler(ring_info_t
 	put_offset = (put_block * (MAX_RXDS_PER_BLOCK + 1)) +
 	    put_info.offset;
 #endif
-	while ((!(rxdp->Control_1 & RXD_OWN_XENA)) &&
-#ifdef CONFIG_2BUFF_MODE
-	       (!rxdp->Control_2 & BIT(0)) &&
-#endif
-	       (((get_offset + 1) % ring_bufs) != put_offset)) {
+	while (RXD_IS_UP2DT(rxdp) &&
+	       (((get_offset + 1) % ring_bufs) != put_offset)) {
 		skb = (struct sk_buff *) ((unsigned long)rxdp->Host_Control);
 		if (skb == NULL) {
 			DBG_PRINT(ERR_DBG, "%s: The skb is ",
@@ -2154,6 +2389,21 @@ static void alarm_intr_handler(struct s2
 		schedule_work(&nic->set_link_task);
 	}
+	/* Handling Ecc errors */
+	val64 = readq(&bar0->mc_err_reg);
+	writeq(val64, &bar0->mc_err_reg);
+	if (val64 & (MC_ERR_REG_ECC_ALL_SNG | MC_ERR_REG_ECC_ALL_DBL)) {
+		if (val64 & MC_ERR_REG_ECC_ALL_DBL) {
+			DBG_PRINT(ERR_DBG, "%s: Device indicates ",
+				  dev->name);
+			DBG_PRINT(ERR_DBG, "double ECC error!!\n");
+			netif_stop_queue(dev);
+			schedule_work(&nic->rst_timer_task);
+		} else {
+			/* Device can recover from Single ECC errors */
+		}
+	}
+
 	/* In case of a serious error, the device will be Reset. */
 	val64 = readq(&bar0->serr_source);
 	if (val64 & SERR_SOURCE_ANY) {
@@ -2227,7 +2477,7 @@ void s2io_reset(nic_t * sp)
 {
 	XENA_dev_config_t __iomem *bar0 = sp->bar0;
 	u64 val64;
-	u16 subid;
+	u16 subid, pci_cmd;
 	val64 = SW_RESET_ALL;
 	writeq(val64, &bar0->sw_reset);
@@ -2256,6 +2506,18 @@ void s2io_reset(nic_t * sp)
 	/* Set swapper to enable I/O register access */
 	s2io_set_swapper(sp);
+	/* Clear certain PCI/PCI-X fields after reset */
+	pci_read_config_word(sp->pdev, PCI_STATUS, &pci_cmd);
+	pci_cmd &= 0x7FFF; /* Clear parity err detect bit */
+	pci_write_config_word(sp->pdev, PCI_STATUS, pci_cmd);
+
+	/* Clearing PCIX Ecc status register */
+	pci_write_config_dword(sp->pdev, 0x68, 0);
+
+	val64 = readq(&bar0->txpic_int_reg);
+	val64 &= ~BIT(62); /* Clearing PCI_STATUS error reflected here */
+	writeq(val64, &bar0->txpic_int_reg);
+
 	/* Reset device statistics maintained by OS */
 	memset(&sp->stats, 0, sizeof (struct net_device_stats));
@@ -2798,6 +3060,8 @@ static void s2io_set_multicast(struct ne
 		/* Disable all Multicast addresses */
 		writeq(RMAC_ADDR_DATA0_MEM_ADDR(dis_addr),
 		       &bar0->rmac_addr_data0_mem);
+		writeq(RMAC_ADDR_DATA1_MEM_MASK(0x0),
+		       &bar0->rmac_addr_data1_mem);
 		val64 = RMAC_ADDR_CMD_MEM_WE |
 		    RMAC_ADDR_CMD_MEM_STROBE_NEW_CMD |
 		    RMAC_ADDR_CMD_MEM_OFFSET(sp->all_multi_pos);
@@ -4370,21 +4634,6 @@ static void s2io_init_pci(nic_t * sp)
 			      (pci_cmd | PCI_COMMAND_PARITY));
 	pci_read_config_word(sp->pdev, PCI_COMMAND, &pci_cmd);
-	/* Set MMRB count to 1024 in PCI-X Command register. */
-	pcix_cmd &= 0xFFF3;
-	pci_write_config_word(sp->pdev, PCIX_COMMAND_REGISTER,
-			      (pcix_cmd | (0x1 << 2))); /* MMRBC 1K */
-	pci_read_config_word(sp->pdev, PCIX_COMMAND_REGISTER,
-			     &(pcix_cmd));
-
-	/* Setting Maximum outstanding splits based on system type. */
-	pcix_cmd &= 0xFF8F;
-	pcix_cmd |= XENA_MAX_OUTSTANDING_SPLITS(0x1); /* 2 splits. */
-	pci_write_config_word(sp->pdev, PCIX_COMMAND_REGISTER,
-			      pcix_cmd);
-	pci_read_config_word(sp->pdev, PCIX_COMMAND_REGISTER,
-			     &(pcix_cmd));
-
 	/* Forcibly disabling relaxed ordering capability of the card. */
 	pcix_cmd &= 0xfffd;
 	pci_write_config_word(sp->pdev, PCIX_COMMAND_REGISTER,
@@ -4401,6 +4650,7 @@ module_param_array(tx_fifo_len, uint, NU
 module_param_array(rx_ring_sz, uint, NULL, 0);
 module_param(Stats_refresh_time, int, 0);
 module_param_array(rts_frm_len, uint, NULL, 0);
+module_param(use_continuous_tx_intrs, int, 1);
 module_param(rmac_pause_time, int, 0);
 module_param(mc_pause_threshold_q0q3, int, 0);
 module_param(mc_pause_threshold_q4q7, int, 0);
diff -uprN vanilla_kernel/drivers/net/s2io.h linux-2.6.12-rc6/drivers/net/s2io.h
--- vanilla_kernel/drivers/net/s2io.h	2005-06-27 06:25:09.000000000 -0700
+++ linux-2.6.12-rc6/drivers/net/s2io.h	2005-06-27 06:25:50.000000000 -0700
@@ -372,6 +372,10 @@ typedef struct _RxD_t {
 #define RXD_GET_L4_CKSUM(val) ((u16)(val) & 0xFFFF)
 	u64 Control_2;
+#define THE_RXD_MARK 0x3
+#define SET_RXD_MARKER vBIT(THE_RXD_MARK, 0, 2)
+#define GET_RXD_MARKER(ctrl) ((ctrl & SET_RXD_MARKER) >> 62)
+
 #ifndef CONFIG_2BUFF_MODE
 #define MASK_BUFFER0_SIZE vBIT(0x3FFF,2,14)
 #define SET_BUFFER0_SIZE(val) vBIT(val,2,14)

From davem@davemloft.net Thu Jul 7 15:26:12 2005
Received: with ECARTIS (v1.0.0; list netdev); Thu, 07 Jul 2005 15:26:17 -0700 (PDT)
Received: from sunset.davemloft.net ([216.27.180.168])
	by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j67MQBH9014292
	for ; Thu, 7 Jul 2005 15:26:11 -0700
Received: from localhost ([127.0.0.1] ident=davem)
	by sunset.davemloft.net with esmtp (Exim 4.50)
	id 1Dqen3-00033k-Qk; Thu, 07 Jul 2005 15:24:17 -0700
Date: Thu, 07 Jul 2005 15:24:17 -0700 (PDT)
Message-Id: <20050707.152417.59653729.davem@davemloft.net>
To: tgraf@suug.ch
Cc: dada1@cosmosbay.com, netdev@oss.sgi.com
Subject: Re: [PATCH] loop unrolling in net/sched/sch_generic.c
From: "David S.
	Miller"
In-Reply-To: <20050707213450.GB16076@postel.suug.ch>
References: <20050706124206.GW16076@postel.suug.ch>
	<20050707.141718.85410359.davem@davemloft.net>
	<20050707213450.GB16076@postel.suug.ch>
X-Mailer: Mew version 4.2 on Emacs 21.4 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 2674
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@davemloft.net
Precedence: bulk
X-list: netdev
Content-Length: 28488
Lines: 804

From: Thomas Graf
Date: Thu, 7 Jul 2005 23:34:50 +0200

> Since I'm changing the classful qdiscs to use a generic API
> for queue management anyway I could take care of this if you want.
> WRT the leaf qdiscs it's a bit more complicated since we have
> to change the new API to take a new struct which includes the qlen
> and the sk_buff_head but not a problem either.

Ok. I'm going to check something like the following into my tree.
It takes care of the obvious cases of direct binary test of queue
length being zero vs. non-zero.

This uncovered some seriously questionable stuff along the way.
For example, take a look at drivers/usb/net/usbnet.c:usbnet_stop().
That code seems to want to wait until all the SKB queues are empty,
but the way it is coded it only waits if all the queues have at
least one packet. I preserved the behavior there, but if someone
could verify my analysis and post a bug fix, I'd really appreciate it.

Thanks.
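
The usbnet_stop() observation above can be sketched in isolation. What follows is a minimal, self-contained illustration and not the usbnet code: struct toy_queue, toy_queue_empty() and the two wait_* helpers are hypothetical stand-ins that model only a queue length. It contrasts a wait condition that holds only while every queue is non-empty with one that holds while any queue is non-empty, which appears to be the intent.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct sk_buff_head: only the length matters. */
struct toy_queue {
	unsigned int qlen;
};

/* Models the idea behind skb_queue_empty(): true when nothing is queued. */
static bool toy_queue_empty(const struct toy_queue *q)
{
	return q->qlen == 0;
}

/* Keeps waiting only while ALL three queues hold at least one packet. */
static bool wait_if_all_nonempty(const struct toy_queue *a,
				 const struct toy_queue *b,
				 const struct toy_queue *c)
{
	return !toy_queue_empty(a) && !toy_queue_empty(b) &&
	       !toy_queue_empty(c);
}

/* Keeps waiting while ANY queue still holds a packet. */
static bool wait_if_any_nonempty(const struct toy_queue *a,
				 const struct toy_queue *b,
				 const struct toy_queue *c)
{
	return !toy_queue_empty(a) || !toy_queue_empty(b) ||
	       !toy_queue_empty(c);
}

/* Self-check: a single packet is pending on the second queue only. */
static void toy_demo(void)
{
	struct toy_queue a = { 0 }, b = { 1 }, c = { 0 };

	/* The "all non-empty" test gives up although b is not drained. */
	assert(!wait_if_all_nonempty(&a, &b, &c));
	/* The "any non-empty" test keeps waiting until b drains. */
	assert(wait_if_any_nonempty(&a, &b, &c));
}
```

With one packet left on a single queue, the first predicate already reports "stop waiting", while the second keeps waiting until every queue is drained. Whether usbnet_stop() should be changed along these lines is for someone who knows that driver to confirm.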
diff --git a/drivers/bluetooth/hci_vhci.c b/drivers/bluetooth/hci_vhci.c
--- a/drivers/bluetooth/hci_vhci.c
+++ b/drivers/bluetooth/hci_vhci.c
@@ -120,7 +120,7 @@ static unsigned int hci_vhci_chr_poll(st
 	poll_wait(file, &hci_vhci->read_wait, wait);
-	if (skb_queue_len(&hci_vhci->readq))
+	if (!skb_queue_empty(&hci_vhci->readq))
 		return POLLIN | POLLRDNORM;
 	return POLLOUT | POLLWRNORM;
diff --git a/drivers/isdn/hisax/isdnl1.c b/drivers/isdn/hisax/isdnl1.c
--- a/drivers/isdn/hisax/isdnl1.c
+++ b/drivers/isdn/hisax/isdnl1.c
@@ -279,7 +279,8 @@ BChannel_proc_xmt(struct BCState *bcs)
 	if (test_and_clear_bit(FLG_L1_PULL_REQ, &st->l1.Flags))
 		st->l1.l1l2(st, PH_PULL | CONFIRM, NULL);
 	if (!test_bit(BC_FLG_ACTIV, &bcs->Flag)) {
-		if (!test_bit(BC_FLG_BUSY, &bcs->Flag) && (!skb_queue_len(&bcs->squeue))) {
+		if (!test_bit(BC_FLG_BUSY, &bcs->Flag) &&
+		    skb_queue_empty(&bcs->squeue)) {
 			st->l2.l2l1(st, PH_DEACTIVATE | CONFIRM, NULL);
 		}
 	}
diff --git a/drivers/isdn/hisax/isdnl2.c b/drivers/isdn/hisax/isdnl2.c
--- a/drivers/isdn/hisax/isdnl2.c
+++ b/drivers/isdn/hisax/isdnl2.c
@@ -108,7 +108,8 @@ static int l2addrsize(struct Layer2 *l2)
 static void set_peer_busy(struct Layer2 *l2) {
 	test_and_set_bit(FLG_PEER_BUSY, &l2->flag);
-	if (skb_queue_len(&l2->i_queue) || skb_queue_len(&l2->ui_queue))
+	if (!skb_queue_empty(&l2->i_queue) ||
+	    !skb_queue_empty(&l2->ui_queue))
 		test_and_set_bit(FLG_L2BLOCK, &l2->flag);
 }
@@ -754,7 +755,7 @@ l2_restart_multi(struct FsmInst *fi, int
 		st->l2.l2l3(st, DL_ESTABLISH | INDICATION, NULL);
 	if ((ST_L2_7==state) || (ST_L2_8 == state))
-		if (skb_queue_len(&st->l2.i_queue) && cansend(st))
+		if (!skb_queue_empty(&st->l2.i_queue) && cansend(st))
 			st->l2.l2l1(st, PH_PULL | REQUEST, NULL);
 }
@@ -810,7 +811,7 @@ l2_connected(struct FsmInst *fi, int eve
 	if (pr != -1)
 		st->l2.l2l3(st, pr, NULL);
-	if (skb_queue_len(&st->l2.i_queue) && cansend(st))
+	if (!skb_queue_empty(&st->l2.i_queue) && cansend(st))
 		st->l2.l2l1(st, PH_PULL | REQUEST, NULL);
 }
@@ -1014,7 +1015,7 @@ l2_st7_got_super(struct FsmInst *fi, int
 			if(typ != RR) FsmDelTimer(&st->l2.t203, 9);
 			restart_t200(st, 12);
 		}
-		if (skb_queue_len(&st->l2.i_queue) && (typ == RR))
+		if (!skb_queue_empty(&st->l2.i_queue) && (typ == RR))
 			st->l2.l2l1(st, PH_PULL | REQUEST, NULL);
 	} else
 		nrerrorrecovery(fi);
@@ -1120,7 +1121,7 @@ l2_got_iframe(struct FsmInst *fi, int ev
 		return;
 	}
-	if (skb_queue_len(&st->l2.i_queue) && (fi->state == ST_L2_7))
+	if (!skb_queue_empty(&st->l2.i_queue) && (fi->state == ST_L2_7))
 		st->l2.l2l1(st, PH_PULL | REQUEST, NULL);
 	if (test_and_clear_bit(FLG_ACK_PEND, &st->l2.flag))
 		enquiry_cr(st, RR, RSP, 0);
@@ -1138,7 +1139,7 @@ l2_got_tei(struct FsmInst *fi, int event
 		test_and_set_bit(FLG_L3_INIT, &st->l2.flag);
 	} else
 		FsmChangeState(fi, ST_L2_4);
-	if (skb_queue_len(&st->l2.ui_queue))
+	if (!skb_queue_empty(&st->l2.ui_queue))
 		tx_ui(st);
 }
@@ -1301,7 +1302,7 @@ l2_pull_iqueue(struct FsmInst *fi, int e
 		FsmDelTimer(&st->l2.t203, 13);
 		FsmAddTimer(&st->l2.t200, st->l2.T200, EV_L2_T200, NULL, 11);
 	}
-	if (skb_queue_len(&l2->i_queue) && cansend(st))
+	if (!skb_queue_empty(&l2->i_queue) && cansend(st))
 		st->l2.l2l1(st, PH_PULL | REQUEST, NULL);
 }
@@ -1347,7 +1348,7 @@ l2_st8_got_super(struct FsmInst *fi, int
 	}
 	invoke_retransmission(st, nr);
 	FsmChangeState(fi, ST_L2_7);
-	if (skb_queue_len(&l2->i_queue) && cansend(st))
+	if (!skb_qu