

 

(update 14. Jun 2001)

You may have seen, or even already used, some of the scripts I presented earlier. At my office, plans are now under way to replace the ISDN dial-up connection with a permanent connection. Inspired by this opportunity to drop the tedious and not always reliable diskettes from my SETI effort, I created two new, tightly intertwined scripts that automate the whole process, so that no manual action will be required there in future. Due to some more or less expected delays, I could not put them to use for months. In the meantime I have tested both scripts on several LINUX systems (kernel 2.2.14 and 2.2.19, glibc 2.1) as well as under Solaris 8 on SPARC, and previously with both a true Bourne shell (LINUX ash) and the usual (except on Debian) "fake" sh (a soft link to bash!). After some corrections they worked pretty well, but you will be the final tester anyway. Be careful not to overload either your LAN or the Berkeley server! I hope this won't happen, but careful observation is required when these scripts are first put to use.

I have two reasons for not choosing the simple cron entry solution mentioned in the client's readme file. First, I want to be prepared for another outage of the insane California power supply, for the often overloaded connection server in Berkeley, and for their bandwidth limit (30% of the total campus traffic). Therefore several directories are still used, so some clients can continue working even if connection attempts fail for a certain time. The second reason is a more personal love of documentation: only by decoupling the crunching phase from the transmission phase am I able to copy the result files into the upper directory before the transfer deletes them locally, just as I do now and as I did before, when no automatic connection was allowed.

A short explanation of the scripts follows. They are not hard to understand for experienced sh/ksh/bash script writers, but I think some hints are always welcome, or even necessary:

The first script launches the transmission script as a background task and is itself the only one that has to be entered in the crontab. It exports to the called script some variables that are useful in both. The UNIX 98 standard conformant ps options should work on any sufficiently modern UNIX system (for sure on Solaris since 7 and AIX since 4.3), such as current LINUX distributions, Solaris 2.6, AIX 4.2 and a number of others. As usual, the script first has to check (after launching the transfer script) whether any client processes are missing to reach your chosen minSetiNum. This value should be chosen reasonably: 1 for single-CPU machines in general, or even 2 if the machine has sufficient power. On servers with several CPUs that are not overloaded, it depends on their usage, the average load, what you are permitted to do, and your own judgment and attitude. For example, I once administered a SUN SPARCserver 1000 with 8 CPUs, and when monitoring the load I always saw 6 CPUs idle (!), one running at 50 % load on average, and one at about 5 %. Under such circumstances you may run any number from 5 to 9 client processes concurrently. 5 is the conservative choice (on average 1 spare CPU for unexpected peaks, without any compromise in performance). 9 is the brave choice: it relies on the high nice level, which should leave enough CPU power for all other processes, and it wastes no idle time at all, because even when one client finishes, the cron daemon will generally launch another one before the next one terminates.
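The check against minSetiNum could be sketched roughly as follows. This is not the code of the real script: the function names count_clients and missing_clients and the client name setiathome are assumptions made for illustration, and the comparison of the nice value that the real script performs is left out here.

```sh
#!/bin/sh
# Sketch only: count_clients, missing_clients and the client name
# "setiathome" are illustrative assumptions, not taken from the real script.
# awk -v needs a POSIX awk (use nawk or gawk on older systems).

# Count the lines of "pid command" output on stdin whose last field
# equals the client name given as $1.
count_clients() {
    awk -v name="$1" '$NF == name { n++ } END { print n + 0 }'
}

# Print how many clients still have to be started to reach $1 (wanted)
# when $2 are already running; never print a negative number.
missing_clients() {
    if [ "$2" -ge "$1" ]; then
        echo 0
    else
        expr "$1" - "$2"
    fi
}

# Possible use (UNIX 98 style ps options, empty headers):
#   running=`ps -e -o pid= -o comm= | count_clients setiathome`
#   missing_clients "${minSetiNum:-1}" "$running"
```

With minSetiNum set to 3 and two clients found in the ps output, missing_clients prints 1, so one more client would be started.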

Either way, it then seems reasonable simply to wait for any downloads of new work units by the transfer script; you may change this waiting time (three minutes) to suit your needs better. To process all work units in the right order, the oldest one is tried first. After each attempt and a short allowance for latency (older, slow CPUs or heavy load, though the latter should of course not happen!), either a new client has been launched or the attempt was in vain (a client is already running in that directory, which the client always checks for itself).
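Trying the oldest work unit first can be done by sorting on the modification time of the work unit files. The following is only a sketch under assumptions: the function name oldest_first, the layout with one subdirectory per client, and the file name work_unit.sah passed as second argument are illustrative and may differ from the real script.

```sh
#!/bin/sh
# Sketch only: oldest_first and the directory layout are assumptions.

# Print the subdirectories of $1 that contain a file named $2,
# ordered so that the directory with the oldest such file comes first.
# "ls -t" sorts by modification time, "-r" reverses to oldest first.
# Directory names containing newlines are not handled.
oldest_first() {
    ls -tr "$1"/*/"$2" 2>/dev/null | while read -r f; do
        dirname "$f"
    done
}

# Possible use:
#   for dir in `oldest_first "$HOME/seti" work_unit.sah`; do
#       ... try to start a client in "$dir" ...
#   done
```

If no subdirectory holds a work unit, the function prints nothing and the loop body is never entered.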

The iteration ends (or doesn't start at all) if either enough clients are running, or all available work units are already being served and not all configured processes could be started. The process ID has to be appended to the temporary file's name, because several instances of the script, launched by cron, could run in parallel. Keep in mind that only niced cruncher processes started by this script are recognized as running: if you start another one manually without exactly the same nice value as its first option, it is ignored and you get one process more than configured. Important hint: since the function match is supported only in nawk and gawk, you have to make sure the right awk is used for this script, or better, explicitly set nawk or gawk in the Nawk variable defined at the beginning of the script! Some other old-fashioned parameter conventions are now used, so that modern versions of tools like head, which are not always present, are not needed.
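Both hints can be illustrated with a small sketch. The helper names pick_awk and tmp_name and the file prefix are assumptions for illustration; only the Nawk variable and the idea of appending the process ID come from the description above.

```sh
#!/bin/sh
# Sketch only: pick_awk, tmp_name and the prefix below are assumptions.

# Print the name of the first awk that understands match(), trying
# nawk, then gawk, then plain awk; return 1 if none of them does.
pick_awk() {
    for a in nawk gawk awk; do
        if echo abc | "$a" '{ exit !match($0, /b/) }' 2>/dev/null; then
            echo "$a"
            return 0
        fi
    done
    return 1
}

# Append the process ID of the running script to the prefix $1, so that
# parallel instances started by cron do not share one temporary file.
tmp_name() {
    echo "$1.$$"
}

Nawk=`pick_awk`
tmpFile=`tmp_name /tmp/setiMg`
```

Setting Nawk by hand at the beginning of the script, as recommended above, remains the more predictable choice; the probe is only a fallback.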

The second script is a little simpler. It mainly tries to transmit any results present, one after another, and to download every missing work unit to the working directories. The long timeout (3 hours) is acceptable and, in my own experience, a good value, and it conforms to Berkeley's recommendations. Remember that the client itself retries a few times, as long as the connection is not completely down. This time, blocking by another process has to be detected explicitly, using as indicator a string that is identical in all known client versions: the script needs to know whether another instance is already trying in that directory, which will certainly happen during longer outages. This should be enough for all circumstances, at least I hope so! A minor bug was fixed (numbering of the copied result files began with 0 instead of 1).

Concluding remark: the number of directories to create depends solely on the average turnaround time for one work unit and on how long you want to be safe against connection problems. You might allow 6 to 7 days at most for processing as many units as there are directories. This is sufficient even for exceptionally long outages, such as the one caused by the vandalism incident at the beginning of 2001, but in general it also prevents results from being returned too long after their work units were issued. The minimum number of directories and the maximum minSetiNum value evidently limit each other: for practical reasons, minSetiNum always has to be lower than the number of directories used. Otherwise you can obviously use the simple way recommended in the readme.
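As a rough rule of thumb, the number of directories can be estimated from these quantities. The formula below is an assumption of this sketch and not part of the scripts: it takes the days to be covered, the average hours per work unit, and the number of concurrent clients, and rounds up. The function name dirs_needed is hypothetical.

```sh
#!/bin/sh
# Sketch only: dirs_needed and its formula are assumptions.

# $1 = days to be safe, $2 = average hours per work unit,
# $3 = number of concurrent clients (minSetiNum).
# Prints ceil($1 * 24 * $3 / $2) using integer arithmetic.
dirs_needed() {
    total=`expr "$1" \* 24 \* "$3"`
    expr \( "$total" + "$2" - 1 \) / "$2"
}

# Possible use: 7 days, 12 hours per unit, 1 client
#   dirs_needed 7 12 1
```

For 7 days, 12 hours per work unit and one client this gives 14 directories, which also satisfies the rule that minSetiNum stays below the number of directories.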


 


remarks etc. to: stefan.urbat@apastron.lb.shuttle.de

(URL:  http://www.lb.shuttle.de/apastron/setiMgTr.htm)