Message-ID: <395055A6.25FC7058@uow.edu.au>
Date: Wed, 21 Jun 2000 05:41:58 +0000
From: Andrew Morton
To: Keith Owens
CC: "netdev@oss.sgi.com"
Subject: Re: modular net drivers, take 2
References: Your message of "Wed, 21 Jun 2000 04:24:01 GMT." <39504361.81F03943@uow.edu.au> <13370.961564335@kao2.melbourne.sgi.com>

Keith Owens wrote:
>
> ...
> >sys_delete_module()
> >{
> >        ...
> >        spin_lock(&module_deletion_lock);
> >        blocked_cpus = 1 << smp_processor_id();
> >        while (blocked_cpus != ((1 << smp_num_cpus) - 1))
> >                ;
> >        {
> >                I think the only code which needs to go in
> >                here is the call to vfree(module).
>
> sys_delete_module() -> free_module() -> mod->cleanup() -> module_exit()
> which is entered with module_deletion_lock held.
> You just constrained all module cleanup code to never sleep - no chance.

I was proposing that the global CPU-grab only need surround the vfree().
It's more of a way of getting all CPUs into a known state than a lock.
So sys_delete_module would be more like:

sys_delete_module()
{
        free_module_no_vfree();

        spin_lock(&module_deletion_lock);

        /* Something like this... */
        for_each_task(tsk) {
                if (tsk->state == TASK_RUNNING)
                        tsk->need_resched = 1;
        }

        blocked_cpus = 1 << smp_processor_id();
        while (blocked_cpus != ((1 << smp_num_cpus) - 1))
                ;
        vfree(module);
        spin_unlock(&module_deletion_lock);
}

> For sys_delete_module() to "grab" the entire machine it has to exclude
> all processors from entering the module being unloaded (not too
> difficult),

OK, they're all known to be spinning on module_deletion_lock.

> to verify that no processor is currently executing the code
> pages (a bit harder)

OK, they're all running in wait_while_unloading()

> and that no suspended process or timer queue will
> ever pop its stack and return into those code pages (the really hard
> bit).

I think any code which got into a situation like this when the module
refcount is zero would be rather broken.
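
The rendezvous idea above (park every CPU at a known point, then free) can be sketched in userspace C. This is only an illustration under loud assumptions, not kernel code: pthreads stand in for CPUs, free() stands in for vfree(), and the body of wait_while_unloading() is a guess, since the thread names that function but never shows it. NCPUS, unload_pending, stop_workers, cpu_loop(), delete_module_sketch() and run_unload_demo() are hypothetical names introduced for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

#define NCPUS    4                      /* hypothetical stand-in for smp_num_cpus */
#define ALL_CPUS ((1u << NCPUS) - 1)

static atomic_uint blocked_cpus;        /* one bit per parked "CPU" */
static atomic_int unload_pending;       /* nonzero while a rendezvous is wanted */
static atomic_int stop_workers;         /* tells the worker threads to exit */
static _Atomic(int *) module;           /* stands in for the module's memory */

/* Assumed shape of the park point: announce ourselves, spin until released. */
static void wait_while_unloading(unsigned cpu)
{
        if (!atomic_load(&unload_pending))
                return;
        atomic_fetch_or(&blocked_cpus, 1u << cpu);
        while (atomic_load(&unload_pending))
                sched_yield();
        atomic_fetch_and(&blocked_cpus, ~(1u << cpu));
}

/* A "CPU": touches the module only between visits to the park point. */
static void *cpu_loop(void *arg)
{
        unsigned cpu = (unsigned)(uintptr_t)arg;
        volatile long sink = 0;

        while (!atomic_load(&stop_workers)) {
                wait_while_unloading(cpu);
                int *m = atomic_load(&module);
                if (m)
                        sink += *m;     /* read-only use of the module */
        }
        return NULL;
}

/* Returns the mask of parked CPUs that was observed just before the free. */
static unsigned delete_module_sketch(unsigned self)
{
        unsigned seen;

        atomic_store(&unload_pending, 1);
        atomic_fetch_or(&blocked_cpus, 1u << self);
        while ((seen = atomic_load(&blocked_cpus)) != ALL_CPUS)
                sched_yield();          /* wait until every other CPU has parked */

        free(atomic_exchange(&module, (int *)NULL));    /* the vfree() step */

        atomic_store(&unload_pending, 0);               /* release the others */
        while (atomic_load(&blocked_cpus) != (1u << self))
                sched_yield();          /* let them clear their bits first */
        atomic_fetch_and(&blocked_cpus, ~(1u << self));
        return seen;
}

unsigned run_unload_demo(void)
{
        pthread_t tid[NCPUS];
        int *m = malloc(sizeof *m);
        unsigned seen;

        if (!m)
                return 0;
        *m = 42;
        atomic_store(&stop_workers, 0);
        atomic_store(&module, m);

        for (uintptr_t i = 1; i < NCPUS; i++)
                pthread_create(&tid[i], NULL, cpu_loop, (void *)i);

        seen = delete_module_sketch(0); /* the calling thread plays CPU 0 */

        atomic_store(&stop_workers, 1);
        for (uintptr_t i = 1; i < NCPUS; i++)
                pthread_join(tid[i], NULL);
        return seen;
}
```

Calling run_unload_demo() should return the full CPU mask, meaning the free only happened once every worker was spinning in the park point. The sketch says nothing about the "really hard bit" raised in the quoted text: a sleeping task or timer whose stack still points into the module never reaches a park point here, so this only models the CPUs that are actually running.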