Received: with ECARTIS (v1.0.0; list pcp); Sun, 01 Jul 2007 22:48:22 -0700 (PDT) Received: from postoffice.aconex.com (mail.app.aconex.com [203.89.192.138]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l625mAtL007354 for ; Sun, 1 Jul 2007 22:48:12 -0700 Received: from edge.yarra.acx (unknown [203.89.192.141]) by postoffice.aconex.com (Postfix) with ESMTP id C4DA592C3C8; Mon, 2 Jul 2007 15:48:10 +1000 (EST) Subject: Re: Review: PCP & pmlogger take too long to start From: Nathan Scott Reply-To: nscott@aconex.com To: Michael Newton Cc: pcp@oss.sgi.com In-Reply-To: References: <1182996127.15488.102.camel@edge.yarra.acx> Content-Type: multipart/mixed; boundary="=-eMwRGrfpgKwjtxy6S0KQ" Organization: Aconex Date: Mon, 02 Jul 2007 15:47:18 +1000 Message-Id: <1183355238.15488.217.camel@edge.yarra.acx> Mime-Version: 1.0 X-Mailer: Evolution 2.6.3 X-archive-position: 1287 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: nscott@aconex.com Precedence: bulk X-list: pcp --=-eMwRGrfpgKwjtxy6S0KQ Content-Type: text/plain Content-Transfer-Encoding: 7bit Hi Michael, More complete review follows (thanks for implementing pmsleep btw)... On Fri, 2007-06-29 at 18:11 +1000, Michael Newton wrote: > > =========================================================================== > mgmt/pcp/man/man1/pmsleep.1 > =========================================================================== > > --- a/mgmt/pcp/man/man1/pmsleep.1 2006-06-17 00:58:24.000000000 > +1000 > +++ b/mgmt/pcp/man/man1/pmsleep.1 2007-06-29 15:48:28.024750676 > +1000 > @@ -0,0 +1,41 @@ > +'\"macro stdmacro > +.\" > +.\" Copyright (c) 2007 Silicon Graphics, Inc. All Rights Reserved. > +.\" > +.\" $Id$ > +.ie \(.g \{\ > +.\" ... groff (hack for khelpcenter, man2html, etc.) > +.TH PMSLEEP 1 "SGI" "Performance Co-Pilot" > +\} > +.el \{\ > +.if \nX=0 .ds x} PMSLEEP 1 "SGI" "Performance Co-Pilot" > +.if \nX=1 .ds x} PMSLEEP 1 "Performance Co-Pilot" > +.if \nX=2 .ds x} PMSLEEP 1 "" "\&" > +.if \nX=3 .ds x} PMSLEEP "" "" "\&" > +.TH \*(x} > +.rr X > +\} > +.SH NAME > +\f3pmsleep\f1 \- portable subsecond-capable sleep > +.\" literals use .B or \f3 > +.\" arguments use .I or \f2 > +.SH SYNOPSIS > +.B $PCP_BINADM_DIR/pmsleep > +.I interval > +.SH DESCRIPTION > +.B pmsleep > +sleeps for > +.I interval. > +The > +.I interval > +argument follows the syntax described in > +.BR PCPIntro (1) > +for > +.B \-t, > +and in the simplest form may be an unsigned integer > +or floating point constant > +(the implied units in this case are seconds). > + > +.PP > +The exit status is 0 for success, or 1 for a malformed command line. > +If the underlying nanosleep fails, an errno is returned. The SEE ALSO section could probably reference sleep(1) and nanosleep(2). > =========================================================================== > mgmt/pcp/src/pmcd/rc_pcp > =========================================================================== > > --- a/mgmt/pcp/src/pmcd/rc_pcp 2007-06-29 18:09:45.000000000 +1000 > +++ b/mgmt/pcp/src/pmcd/rc_pcp 2007-06-29 16:07:49.625951131 +1000 > @@ -100,6 +100,8 @@ > ;; > esac > > +SLEEPCMND="$PCP_BINADM_DIR/pmsleep 0.1" > + This variable just seems to be obfuscating the logic, I'd remove it. PCP_BINADM_DIR is set in the PATH in /etc/pcp.env, so full path isn't needed there. Since the 0.1 argument is important in several of the uses (as there is control flow assuming tenths of a second in places), the argument should be expanded close to the logic thats using it too. > _pmcd_logfile() > { > default=$RUNDIR/pmcd.log > @@ -383,16 +385,25 @@ > fi > $ECHO $PCP_ECHO_N "Waiting for PMCD to > terminate ...""$PCP_ECHO_C" > gone=0 > - for i in 1 2 3 4 5 6 > + i=0 > + j=0 > + while : > do > - sleep 3 > _get_pids_by_name pmcd >$tmp.tmp > if [ ! -s $tmp.tmp ] > then > gone=1 > break > fi > - $ECHO $PCP_ECHO_N ".""$PCP_ECHO_C" > + i=`expr $i + 1` > + if [ $i -ge 10 ] > + then > + i=0 > + [ $j -ge $delay ] && break > + j=`expr $j + 1` > + $ECHO $PCP_ECHO_N ".""$PCP_ECHO_C" > + fi > + $SLEEPCMND > done > if [ $gone != 1 ] # It just WON'T DIE, give up. Hmmm, thats not right. Firstly, $delay doesn't exist in this script (it looks like this script snippet has been incorrectly duplicated), so you'll get a shell-syntax-error if that branch (with $j) is taken. Secondly, that branch is b0rken in that it will only ever print one '.' character - its meant to print one dot every iteration (or every second now, I guess). You'll want to use an expr modulo (%) there. Thirdly, the while logic is strange - you could just use the normal loop control mechanism (rather than the "while :" and explicit break statements). (I've attached an alternate patch that implements these things). > =========================================================================== > mgmt/pcp/src/pmie/pmie_check.sh > =========================================================================== > =========================================================================== > mgmt/pcp/src/pmlogctl/pmlogger_check.sh > =========================================================================== These two scripts have the same sorts of problems, fixed in attached patch. > > =========================================================================== > mgmt/pcp/src/pmsleep/pmsleep.c > =========================================================================== > ... > +int > +main(int argc, char **argv) > +{ > + struct timespec rqt; > + struct timeval delta; > + int r = 0; > + char *msg; > + > + if (argc == 2) { > + if (pmParseInterval(argv[1], &delta, &msg) < 0) { > + fputs(msg, stderr); > + free(msg); > + } else { > + rqt.tv_sec = delta.tv_sec; > + rqt.tv_nsec = delta.tv_usec * 1000; > + if (0 != nanosleep(&rqt, NULL)) > + r = errno; > + > + exit(r); > + } > + } > + fprintf(stderr, "Usage: pmsleep [-v] interval\n"); There's no -v option. The 'r' variable isn't really necessary - see my attached patch that makes this slightly simpler. Was it intentional to print a usage message when a pmParseInterval error occurs? Seems a bit odd - *shrug*, not a big deal obviously. BTW, good work on finding the DSO agent speedup - that will help across the board (definately makes pmcd stop alot quicker for me). cheers. ps: the attached patch is an incremental patch (on top of yours), applied to my current git tree - so, you may see some fuzzy patch matching due to other pmie start script patches in particular. -- Nathan --=-eMwRGrfpgKwjtxy6S0KQ Content-Disposition: attachment; filename=update_faster_startup Content-Type: text/x-patch; name=update_faster_startup; charset=utf-8 Content-Transfer-Encoding: 7bit Index: pcp/man/man1/pmsleep.1 =================================================================== --- pcp.orig/man/man1/pmsleep.1 2007-07-02 15:33:26.988523000 +1000 +++ pcp/man/man1/pmsleep.1 2007-07-02 15:33:27.180523000 +1000 @@ -35,7 +35,11 @@ for and in the simplest form may be an unsigned integer or floating point constant (the implied units in this case are seconds). - -.PP +.SH DIAGNOSTICS The exit status is 0 for success, or 1 for a malformed command line. -If the underlying nanosleep fails, an errno is returned. +If the underlying +.B nanosleep (2) +system call fails, an errno is returned. +.SH SEE ALSO +.BR sleep (1), +.BR nanosleep (3). Index: pcp/src/pmcd/rc_pcp =================================================================== --- pcp.orig/src/pmcd/rc_pcp 2007-07-02 15:33:27.036523000 +1000 +++ pcp/src/pmcd/rc_pcp 2007-07-02 15:38:31.376523000 +1000 @@ -117,8 +117,6 @@ in ;; esac -SLEEPCMND="$PCP_BINADM_DIR/pmsleep 0.1" - _pmcd_logfile() { default=$RUNDIR/pmcd.log @@ -403,28 +401,16 @@ _shutdown() $PCP_KILLALL_PROG -TERM pmcd > /dev/null 2>&1 fi $ECHO $PCP_ECHO_N "Waiting for PMCD to terminate ...""$PCP_ECHO_C" - gone=0 - i=0 - j=0 - while : + delay=200 # tenths of a second + while [ $delay -gt 0 ] do _get_pids_by_name pmcd >$tmp.tmp - if [ ! -s $tmp.tmp ] - then - gone=1 - break - fi - i=`expr $i + 1` - if [ $i -ge 10 ] - then - i=0 - [ $j -ge $delay ] && break - j=`expr $j + 1` - $ECHO $PCP_ECHO_N ".""$PCP_ECHO_C" - fi - $SLEEPCMND + [ ! -s $tmp.tmp ] && break + pmsleep 0.1 + delay=`expr $delay - 1` + [ `expr $delay % 10` -ne 0 ] || $ECHO $PCP_ECHO_N ".""$PCP_ECHO_C" done - if [ $gone != 1 ] # It just WON'T DIE, give up. + if [ $delay -eq 0 ] # It just WON'T DIE, give up. then echo "Process ..." cat $tmp.tmp Index: pcp/src/pmsleep/pmsleep.c =================================================================== --- pcp.orig/src/pmsleep/pmsleep.c 2007-07-02 15:33:27.168523000 +1000 +++ pcp/src/pmsleep/pmsleep.c 2007-07-02 15:33:27.212523000 +1000 @@ -13,7 +13,6 @@ main(int argc, char **argv) { struct timespec rqt; struct timeval delta; - int r = 0; char *msg; if (argc == 2) { @@ -23,13 +22,11 @@ main(int argc, char **argv) } else { rqt.tv_sec = delta.tv_sec; rqt.tv_nsec = delta.tv_usec * 1000; - if (0 != nanosleep(&rqt, NULL)) - r = errno; - - exit(r); + return (nanosleep(&rqt, NULL) == 0) ? 0 : errno; } + } else { + fprintf(stderr, "Usage: pmsleep \n"); } - fprintf(stderr, "Usage: pmsleep [-v] interval\n"); exit(1); /*NOTREACHED*/ } Index: pcp/src/pmie/pmie_check.sh =================================================================== --- pcp.orig/src/pmie/pmie_check.sh 2007-07-02 15:33:27.072523000 +1000 +++ pcp/src/pmie/pmie_check.sh 2007-07-02 15:33:27.228523000 +1000 @@ -31,8 +31,6 @@ PMIE=pmie -SLEEPCMND="$PCP_BINADM_DIR/pmsleep 0.1" - # added to handle problem when /var/log/pcp is a symlink, as first # reported by Micah_Altman@harvard.edu in Nov 2001 # @@ -177,15 +175,13 @@ _lock() { # demand mutual exclusion # - fail=true rm -f $tmp.stamp - i=0 - while : + delay=200 # tenths of a second + while [ $delay -ne 0 ] do if pmlock -v $logfile.lock >$tmp.out then echo $logfile.lock >$tmp.lock - fail=false break else if [ ! -f $tmp.stamp ] @@ -199,12 +195,11 @@ _lock() rm -f $logfile.lock fi fi - [ $i -ge 200 ] && break #tenths of a sec - $SLEEPCMND - i=`expr $i + 1` + pmsleep 0.1 + delay=`expr $delay - 1` done - if $fail + if [ $delay -eq 0 ] then # failed to gain mutex lock # @@ -311,10 +306,8 @@ _check_pmie() # wait for maximum time of a connection and 20 requests # - delay=`expr $delay + 20 \* $x` - i=0 - j=0 - while : + delay=`expr \( $delay + 20 \* $x \) \* 10` # tenths of a second + while [ $delay -ne 0 ] do if [ -f $logfile ] then @@ -327,7 +320,7 @@ _check_pmie() then : else - $SLEEPCMND + pmsleep 0.1 $VERBOSE && echo " done" return 0 fi @@ -365,15 +358,10 @@ _check_pmie() return 1 fi fi - i=`expr $i + 1` - if [ $i -ge 10 ] - then - i=0 - [ $j -ge $delay ] && break - j=`expr $j + 1` - $VERBOSE && $PCP_ECHO_PROG $PCP_ECHO_N ".""$PCP_ECHO_C" - fi - $SLEEPCMND + pmsleep 0.1 + delay=`expr $delay - 1` + $VERBOSE && [ `expr $delay % 10` -eq 0 ] && \ + $PCP_ECHO_PROG $PCP_ECHO_N ".""$PCP_ECHO_C" done $VERBOSE || _message restart echo " timed out waiting!" @@ -681,14 +669,14 @@ then then $VERY_VERBOSE && ( echo; $PCP_ECHO_PROG $PCP_ECHO_N "+ $KILL -KILL `cat $tmp.pmies` ...""$PCP_ECHO_C" ) eval $KILL -KILL $pmielist >/dev/null 2>&1 - i=0 + delay=30 # tenths of a second while ps -f -p "$pmielist" >$tmp.alive 2>&1 do - if [ $i -lt 30 ] + if [ $delay -gt 0 ] then - $SLEEPCMND - i=`expr $i + 1` - continue; + pmsleep 0.1 + delay=`expr $delay - 1` + continue fi echo "$prog: Error: pmie process(es) will not die" cat $tmp.alive Index: pcp/src/pmlogctl/pmlogger_check.sh =================================================================== --- pcp.orig/src/pmlogctl/pmlogger_check.sh 2007-07-02 15:33:27.088523000 +1000 +++ pcp/src/pmlogctl/pmlogger_check.sh 2007-07-02 15:33:27.244523000 +1000 @@ -68,8 +68,6 @@ then PWDCMND=/bin/pwd fi -SLEEPCMND="$PCP_BINADM_DIR/pmsleep 0.1" - # default location # logfile=pmlogger.log @@ -211,10 +209,8 @@ _check_logger() # wait for maximum time of a connection and 20 requests # - delay=`expr $delay + 20 \* $x` - i=0 - j=0 - while : + delay=`expr \( $delay + 20 \* $x \) \* 10` # tenths of a second + while [ $delay -gt 0 ] do if [ -f $logfile ] then @@ -226,7 +222,7 @@ _check_logger() then : else - $SLEEPCMND + pmsleep 0.1 $VERBOSE && echo " done" return 0 fi @@ -263,15 +259,10 @@ _check_logger() return 1 fi fi - i=`expr $i + 1` - if [ $i -ge 10 ] - then - i=0 - [ $j -ge $delay ] && break - j=`expr $j + 1` - $VERBOSE && $PCP_ECHO_PROG $PCP_ECHO_N ".""$PCP_ECHO_C" - fi - $SLEEPCMND + pmsleep 0.1 + delay=`expr $delay - 1` + $VERBOSE && [ `expr $delay % 10` -eq 0 ] && \ + $PCP_ECHO_PROG $PCP_ECHO_N ".""$PCP_ECHO_C" done $VERBOSE || _message restart echo " timed out waiting!" @@ -403,10 +394,9 @@ s/^\([A-Za-z][A-Za-z0-9_]*\)=/export \1; else # demand mutual exclusion # - fail=true rm -f $tmp.stamp - i=0 - while : + delay=200 # tenths of a second + while [ $delay -gt 0 ] do if pmlock -v lock >$tmp.out then @@ -434,12 +424,11 @@ s/^\([A-Za-z][A-Za-z0-9_]*\)=/export \1; rm -f lock fi fi - [ $i -ge 200 ] && break #tenths of a sec - $SLEEPCMND - i=`expr $i + 1` + pmsleep 0.1 + delay=`expr $delay - 1` done - if $fail + if [ $delay -eq 0 ] then # failed to gain mutex lock # --=-eMwRGrfpgKwjtxy6S0KQ--