Date: Wed, 21 Jun 2000 21:06:41 +1000
From: Andrew Morton
To: Keith Owens
CC: "netdev@oss.sgi.com"
Subject: Re: modular net drivers, take 2

Keith Owens wrote:
>
> On Wed, 21 Jun 2000 04:45:19 +0000,
> Andrew Morton wrote:
> >Plan M:
> >
> >sys_delete_module() doesn't do the vfree().  It schedules it for 5
> >seconds in the future.  Or provides a mechanism for userland to do this.
>
> Does not fix the problem where module open() code is running at the
> same time as module_exit() code which tears down the kernel structures
> that open needs.
> Freeing the code pages while they are in use is only
> one of the possible failure modes for this race.

Aaaah.  I can see it all clearly now.

The fundamental misdesign here is that register_netdevice() and
unregister_netdev() are called from within the module constructor and
destructor functions.

You see, right now a module "owns" a number of netdevices.  This is
wrong.  It should be that a module "owns" a driver (pci_driver) and that
driver "owns" a number of netdevices:

module_init()
        Registers the availability of the driver.  The module refcount
        remains zero.

driver *open_module("eepro100")
        Locates the driver and increments the module refcount.  No
        netdevices are open yet.  Netdevices should not be registered
        from within module_init()!  That's the current problem.

device *open_driver(driver *)
        Creates an interface instance and increments an 'opencount' in
        the driver.  Critically, register_netdevice() is called only now.

close_device(device *)
        Calls unregister_netdev().

close_driver(driver *)
        Fails if any devices are open; otherwise decrements the module
        refcount to zero.  Now no netdevices are registered, so nobody
        can access "eth0", device->open or anything else.

module_exit()
        Unregisters the driver's availability.  Safe.

We just need to prevent module_exit() from racing against open_driver().
Just a spinlock.

Too late for all of this.
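To make the proposed module -> driver -> netdevice ownership chain concrete, here is a minimal userspace sketch in C. It is an illustration under stated assumptions, not kernel code: struct driver, struct device, open_module(), open_driver(), close_device(), close_driver(), model_module_init() and model_module_exit() are hypothetical stand-ins named after the steps above; the `registered` counter stands in for register_netdevice()/unregister_netdev(); a C11 atomic_flag plays the role of the spinlock; and the refcount check that the module loader would normally make before unloading is folded into model_module_exit().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace model; none of these names are kernel APIs.
 * A module owns one driver, and the driver owns its interface instances. */
struct driver {
    const char *name;
    bool available;      /* advertised by module_init, withdrawn by module_exit */
    int module_refcount; /* taken by open_module(), dropped by close_driver() */
    int opencount;       /* number of open interface instances */
    int registered;      /* stand-in for netdevices passed to register_netdevice() */
};

struct device {
    struct driver *drv;
};

static struct driver eepro100 = { .name = "eepro100" };

/* A C11 spinlock serializing module_exit against the open/close paths. */
static atomic_flag registry_lock = ATOMIC_FLAG_INIT;

static void registry_acquire(void) {
    while (atomic_flag_test_and_set_explicit(&registry_lock, memory_order_acquire))
        ; /* spin until the flag was previously clear */
}

static void registry_release(void) {
    atomic_flag_clear_explicit(&registry_lock, memory_order_release);
}

/* module_init(): only advertise the driver; no netdevices, refcount stays 0. */
static void model_module_init(void) {
    registry_acquire();
    eepro100.available = true;
    registry_release();
}

/* open_module(): locate the driver by name and pin the module. */
static struct driver *open_module(const char *name) {
    struct driver *drv = NULL;
    registry_acquire();
    if (eepro100.available && strcmp(name, eepro100.name) == 0) {
        eepro100.module_refcount++;
        drv = &eepro100;
    }
    registry_release();
    return drv;
}

/* open_driver(): create an interface; register_netdevice() would happen only here. */
static struct device *open_driver(struct driver *drv) {
    struct device *dev = malloc(sizeof *dev);
    if (dev == NULL)
        return NULL;
    dev->drv = drv;
    registry_acquire();
    drv->opencount++;
    drv->registered++;
    registry_release();
    return dev;
}

/* close_device(): unregister_netdev() would happen here. */
static void close_device(struct device *dev) {
    struct driver *drv = dev->drv;
    registry_acquire();
    drv->registered--;
    drv->opencount--;
    registry_release();
    free(dev);
}

/* close_driver(): fails while any interface is open, else drops the module ref. */
static bool close_driver(struct driver *drv) {
    registry_acquire();
    bool ok = (drv->opencount == 0);
    if (ok)
        drv->module_refcount--;
    registry_release();
    return ok;
}

/* module_exit(): withdraw the driver; refused while anyone holds the module. */
static bool model_module_exit(void) {
    registry_acquire();
    bool ok = (eepro100.module_refcount == 0);
    if (ok) {
        assert(eepro100.registered == 0); /* no netdevice can still be reachable */
        eepro100.available = false;
    }
    registry_release();
    return ok;
}
```

A typical sequence is model_module_init(), open_module("eepro100"), open_driver(), then close_device(), close_driver() and finally model_module_exit(). While an interface is open, close_driver() returns false, and while the module refcount is nonzero model_module_exit() refuses, so in this model the teardown cannot overlap an open that is holding the lock.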