

 

(major addition 28. Jun 2002 and 27. Apr 2002, additional scripts)

DHTML stands for dynamic HTML: a collective term for everything in a web page that is not static HTML. It covers the risky client-side scripting and programming languages JavaScript and Java, whose use I strongly discourage, and the CGI scripts for active responses and dynamically produced HTML pages. For this last purpose a whole range of languages is in use: sh (Bourne shell) scripts, Perl scripts (use at least 5.6 if you can!), PHP (version 4 recommended), and sometimes even classic programming languages like C or Java.

What fits your purpose depends on both your environment and your exact requirements. For somewhat more complicated actions, like analysing and reacting to complex multi-field forms, Perl is often best suited; for RDBMS related actions (those containing SQL queries) PHP is the best selection; for programmers and very special tasks, programming in C or Java may be the best choice. But for many (in fact most!) simple purposes the traditional shell script is often the best solution: generally speaking only the Bourne shell is available everywhere, but in exchange it is absolutely portable. I will demonstrate this with the following two simple examples.

Recently I added this script, which checks the browser used to visit a site (the environment variable in question is the same as in the Apache log: HTTP_USER_AGENT) and activates a CGI/SSI. I use it to propose a more adequate browser to the visitor as a substitute for the one he/she uses, at least in my view, whenever none that is acceptable in my eyes is recognized.

Sh is portable because every UNIX-type system interprets it the same way, nearly all Internet servers are UNIX based, and no true webmaster will use a system other than UNIX/Linux for such a server! And with the mighty awk program (a kind of parent of the Perl scripting language), which contains a C-like programming language, you can easily fulfil the most usual requirements. For UNIX experts like me this has the additional advantage that learning yet another of the far too many languages around may be superfluous. Besides, it is usually faster than Perl and PHP.

The first example redirects dynamically to another URL (in this case one chosen at random from a fixed list); the second is a threefold poll about a single topic with three facets. Both try to protect the privacy of the visitors! I hope you find them useful; comments are welcome (for the email address see below). The only further thing you have to take care of is to copy the script(s) into the appropriate directory and to make them executable. This depends a little on the CGI policy of root or your provider; generally you need a chmod u+x script.cgi or chmod 700 script.cgi or similar.

All echo output lines simply write the HTML page, except for the first two, which give the executing webserver the necessary hint about the content type.
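A minimal sketch of this pattern might look as follows. The page text is only a placeholder and the function name print_page is made up for illustration; it is not taken from the downloadable scripts.

```shell
#!/bin/sh
# Minimal CGI sketch: header line, empty line, then the page itself.
# The page content below is a placeholder, not one of the real scripts.

print_page() {
    echo "Content-type: text/html"   # tells the webserver what follows
    echo ""                          # the empty line ends the header block
    echo "<html><head><title>Hello</title></head>"
    echo "<body><p>Hello from a Bourne shell CGI script.</p></body></html>"
}

print_page
```

Without the empty line after the header a webserver will typically report an error instead of delivering the page.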

Redirecting a Visitor to an arbitrary URL

This script can be used quite simply: you just build a link to its URL and it already works. (Such scripts usually carry a .cgi extension; depending on the server a .sh extension may be recommended or even required instead.) Now a few hints about special constructs in it. Because there is a list of cases, one for each URL, you can determine the number of URLs simply by counting the occurrences of the case separator in the script itself! This time awk is used only to get a random number. For simplicity I feed it the script itself again, and because that input has several lines while only one value is needed, the awk program consists of an END section only. The random generator must be initialized on each run with an arbitrary value (srand() without parameter!), because otherwise it would always deliver the same value. Finally a minimal HTML 4 page is created around the randomly chosen URL, with the usual http refresh redirection entry and a hint, which is necessary for example for users of w3m (which doesn't redirect automatically). Please note that I'm not using any CGI specific means to gain information about the visitor! The script just serves him/her anonymously. It's quite easy to adapt to your own URL list: just replace my URLs with yours and delete the remaining ones or add further ones. Nothing else has to be done!
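The idea can be sketched roughly as below. This is not the original script: the three URLs are placeholders, the helper name pick_url is made up, and for brevity the number of cases is written down as a constant instead of being counted from the script text.

```shell
#!/bin/sh
# Sketch of a random redirect; URLs and names are placeholders.

# Map a number to a URL, one case branch per URL.
pick_url() {
    case "$1" in
        1) echo "http://www.example.org/" ;;
        2) echo "http://www.example.com/" ;;
        3) echo "http://www.example.net/" ;;
    esac
}

count=3   # number of case branches above (assumed constant here)

# awk gets no input lines (/dev/null), so only the END section runs.
# srand() without argument seeds the generator from the time of day.
n=$(awk -v max="$count" 'END { srand(); print int(rand() * max) + 1 }' /dev/null)
url=$(pick_url "$n")

echo "Content-type: text/html"
echo ""
echo "<html><head><title>Redirect</title>"
echo "<meta http-equiv=\"refresh\" content=\"0; url=$url\">"
echo "</head><body>"
echo "<p>If your browser does not redirect, follow <a href=\"$url\">$url</a>.</p>"
echo "</body></html>"
```

Since srand() without argument takes its seed from the clock, two calls within the same second yield the same number; for a link that a visitor clicks now and then this does no harm.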

A not too simple Poll

In the following I discuss a somewhat more complicated and more useful case. All needed ingredients can be downloaded as a GNU zipped tar file or as a zip file, because the poll comes with a total of five files: the static HTML page, from which the script is activated through a form (and its downsized version used to display the result without voting), the poll script, a slightly changed and reduced version of it for displaying only, a little image file (red.png) for a graphical bar, and finally the counter file. All five are tightly interdependent and should therefore also be considered logically as one unit or package.

The interesting part of the HTML 4 page (valid, which means it conforms to the W3C standard) is the <form> ... </form> section. There the script is set up for work: the action names the script with its server absolute path, and for simplicity I have chosen GET as the method, because this gives the script direct access to the variable QUERY_STRING (see below). It would not be too difficult to use POST as the method instead: in the case that QUERY_STRING is empty, the script would read the string manually into that environment variable. Then almost nothing else in the script would have to be changed, and it could serve a POST as well as a GET invocation! GET means URL encoding, so when submit is clicked the parts are delivered in the URL of the script call, separated by & signs, with the values following the = signs. With POST the submitted values do not appear in the URL; the server passes them to the script on standard input, so they have to be read manually (see above). Next come the three groups of radio button choices (you can even use checkboxes instead, everything will work fine nevertheless!), where name serves as the key and the value depends on the chosen case. As I will explain below, you could even use different numbers of cases in the three groups without any complication! Below the form a reduced and slightly modified version of the voting script is called by a simple link; it displays the result without changing the counter (or storing anything else).
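The POST fallback described above could be sketched like this. The function names load_query and query_value are hypothetical, and the sketch assumes plain values without %xx escapes, as radio button numbers are.

```shell
#!/bin/sh
# Sketch: accept GET and POST alike by filling QUERY_STRING from stdin.
# Function names are made up for illustration.

load_query() {
    if [ -z "$QUERY_STRING" ] && [ "$REQUEST_METHOD" = "POST" ]; then
        # With POST the server hands the form data over on standard
        # input; CONTENT_LENGTH tells how many bytes to read.
        QUERY_STRING=$(dd bs=1 count="${CONTENT_LENGTH:-0}" 2>/dev/null)
    fi
}

# Print the value belonging to one key of the URL encoded string.
query_value() {   # usage: query_value KEY
    echo "$QUERY_STRING" | tr '&' '\n' |
        awk -F= -v key="$1" '$1 == key { print $2 }'
}
```

A script would call load_query once at the start and then, for a hypothetical radio group named speed, fetch the chosen number with query_value speed.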

The next topic is the counter file: it simply contains one line for each radio button key, and in that line each counter number is followed by the corresponding evaluation key. (I recommend that you set all numbers back to 0 when you adapt the keys to your needs!) The script creates a valid XHTML 1.0 document. Don't be afraid, there are no problems with displaying it! If you even want to put in the XML header line, which is not needed for validity, you may copy the header line of this HTML page. You will now see the counter system put to good use in the script itself:

In line 15 of the voting script (pamPoll.cgi) the umask is set such that files created by the script can't be read by anybody else on the server (this serves as a privacy enhancement). In line 28 the script first checks whether the user voted at all; if not, a warning is given and nothing else happens subsequently, apart from displaying the unchanged result.

In the regular case of at least one vote given, the key of the last voting IP address (not the address itself!) is read into a variable and then compared with the current one. If they are equal, this is interpreted straightforwardly as cheating: I give the offender a message of disgust and suppress the counting of that unacceptable (double) vote. The tr command simply mangles the address so that it can't be read easily, even by yourself or by root on the server: I only want to ensure that the poll is not manipulated, and I don't want to store true address information. Against the most widespread cheating practice this measure is sufficient.
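A sketch of such a check, under stated assumptions: the tr mapping and the function names are invented here and are not the ones from pamPoll.cgi. In a real CGI the address would come from the environment variable REMOTE_ADDR.

```shell
#!/bin/sh
# Sketch of the double vote check; mapping and names are made up.

umask 077   # files created from here on are readable by the owner only

# Scramble an address so the stored key is not the plain address.
mangle() {
    echo "$1" | tr '0-9.' 'k-t_'
}

# Succeed (return 0) if ADDRESS matches the key stored in KEYFILE.
is_repeat() {   # usage: is_repeat ADDRESS KEYFILE
    current=$(mangle "$1")
    last=$(cat "$2" 2>/dev/null || true)
    [ "$current" = "$last" ]
}

# Remember the current caller for the next invocation.
remember() {    # usage: remember ADDRESS KEYFILE
    mangle "$1" > "$2"
}
```

Such a substitution only obscures the address from casual reading; it is no encryption, which matches the modest goal stated above.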

In line 38 the counting itself takes place. The QUERY_STRING keys are compared with the keys in the counter file; for a matching line the key value is extracted and used as an index into that counter line, where the key names have to be skipped (that's the factor 2). Finally the new counter file is moved over the old one and the current, mangled IP address is stored for the next invocation.
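To illustrate the factor 2, here is a sketch that assumes a counter line of the form "key count label count label ..."; the real counter file may be laid out differently, and the function name count_votes is hypothetical.

```shell
#!/bin/sh
# Sketch of the counting step. Assumed counter file layout:
#   key count label count label ...

# Print the counter file with the votes in QUERY applied.
count_votes() {   # usage: count_votes QUERY COUNTERFILE
    awk -v query="$1" '
        BEGIN {
            # Split "key=value&key=value" into vote[key] = value.
            n = split(query, pair, "&")
            for (i = 1; i <= n; i++) {
                split(pair[i], kv, "=")
                vote[kv[1]] = kv[2]
            }
        }
        ($1 in vote) {
            # Field 1 is the key and every choice takes two fields
            # (count, label), so choice v has its count in field 2*v.
            f = 2 * vote[$1]
            if (f >= 2 && f < NF) $f = $f + 1
        }
        { print }
    ' "$2"
}

# Typical use: write a new file, then move it over the old one.
#   count_votes "$QUERY_STRING" counter.txt > counter.new &&
#       mv counter.new counter.txt
```

Because every line is indexed on its own, the groups may have different numbers of choices.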

Starting with line 48 (table header) and line 51 (table contents) respectively, the current contents of the counter file are sent to output as dynamic HTML. Intense masking of some characters is required there to avoid misinterpretation by awk. Votes and percentages are given along with a graphical percentage bar for every single entry listed. For that the fifth file, red.png, is provided as a small, stretchable colour area. Its path in the img src attribute inside the awk code has to be adapted to yours anyway, of course!

From line 53 to the end only the table and the HTML page are finished. It should now be clear how you can easily adapt all this to your own needs! I have not put in a check whether the sum is bigger than 0. I propose instead that you take part in your own poll initially (why not?) as a usual visitor, both to test finally whether everything is right and to initialize it with some values that prevent a division by zero. Alternatively you could simply check whether the sum is bigger than 0; failing with a division by zero causes no harm, but produces an at least partly empty page.
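A sketch of the table rows including the check for a zero sum might look like this. It again assumes the layout "key count label count label ..."; the image path /img/red.png and the function name show_results are placeholders.

```shell
#!/bin/sh
# Sketch of the result rows. The counter file layout
# "key count label count label ..." and the image path are assumptions.

show_results() {   # usage: show_results COUNTERFILE
    awk '
        {
            sum = 0
            for (i = 2; i < NF; i += 2) sum += $i
            for (i = 2; i < NF; i += 2) {
                # Guard against a division by zero on an unused poll.
                pct = (sum > 0) ? int(100 * $i / sum + 0.5) : 0
                printf "<tr><td>%s: %s</td><td>%d</td><td>%d%%</td>", $1, $(i + 1), $i, pct
                # Double quotes inside awk strings must be masked as \".
                printf "<td><img src=\"/img/red.png\" height=\"10\" width=\"%d\" alt=\"\" /></td></tr>\n", pct
            }
        }
    ' "$1"
}
```

The bar is simply the small image stretched to a width equal to the percentage, computed here per group.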

The second script is simply a reduced and changed version of the first: pam8.cgi has everything of the poll script except the voting part, which manipulates the files. I prefer single standalone scripts, which can easily be changed independently later.

A simple, but useful Site internal Search

You can find an example HTML file, the usual dummy png file for the bars and the bourne shell script itself in this tarball.

Some explanations and hints about the script: wwwDir is the root directory containing your HTML pages. Be aware that the script also searches recursively in all sub directories. If you don't want this for one or more sub directories, you have to exclude them explicitly in both find statements with an additional expression like -a -not -path "$undesDir/*" or similar, where undesDir is a sub directory contained in wwwDir (use absolute paths only). As usual, on a GNU/Linux system with kernel 2.4 or higher and an enabled /dev/shm (shared memory) device, I recommend using that instead of /tmp for performance reasons: memory access is much faster than a disk bound /tmp file system, even with a good SCSI (RAID) system.
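Both hints can be sketched as follows. The function names and the file name pattern are assumptions, not taken from the search script; "! -path" is the portable spelling of GNU find's "-not -path".

```shell
#!/bin/sh
# Sketch: list HTML files below a root, skipping one sub directory,
# and choose a scratch directory. Names and pattern are made up.

list_pages() {   # usage: list_pages WWWDIR UNDESDIR
    find "$1" -type f -name '*.htm*' ! -path "$2/*" | sort
}

# Prefer the memory backed /dev/shm where it exists and is writable.
scratch_dir() {
    if [ -d /dev/shm ] && [ -w /dev/shm ]; then
        echo /dev/shm
    else
        echo /tmp
    fi
}

# A per-process scratch file name, using the process ID of the script:
#   tmpFile="$(scratch_dir)/search.$$"
```

To exclude several sub directories, append one further ! -path expression per directory.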

The entire script performs a live search, so no database is required, but this may impact system performance. (Indexing on a regular cron basis could help, but then you have to build and query a database, which makes the whole thing much more complicated and sophisticated.) The script uses its process ID to prevent interference between several instances invoked concurrently by Apache (or Netscape Enterprise or another true webserver). That makes it safe from confusion, but it doesn't mean that it is suitable for high-traffic sites with a high search rate.

The last and most important thing to explain, I think, is the algorithm: the query is separated into (white space delimited) words, and for every word the number of lines in which it occurs is counted. The "quality" of a match is finally determined by the product of these line counts over all words in the search query, so a page missing any one of the words gets a quality of zero: there is a simple AND relation between all given search words (like Google does). No more sophisticated ingredients are present, but on a reasonably built site this will give you pretty useful results anyway. I now use it not only on my own site here but also for my company (with a few extra parts, but essentially the same), and it seems to be okay in general.
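The scoring can be sketched in a few lines. The function name score_file is hypothetical, and matching case-insensitively and literally (grep -i -F) is an assumption of this sketch, not necessarily what the real script does.

```shell
#!/bin/sh
# Sketch of the ranking idea: score = product of per-word line counts.
# Case-insensitive, literal matching is an assumption of this sketch.

score_file() {   # usage: score_file FILE WORD...
    file=$1
    shift
    score=1
    for word in "$@"; do
        # grep -c prints the number of matching lines (0 if none).
        lines=$(grep -c -i -F -e "$word" "$file" || true)
        score=$((score * ${lines:-0}))
    done
    echo "$score"
}
```

For a file in which one word occurs on 2 lines and another on 1 line, the score is 2; as soon as one word is absent, the product is 0 and the file drops out of the result.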


 


remarks etc. to: stefan.urbat@apastron.lb.shuttle.de

(URL:  http://www.lb.shuttle.de/apastron/cgiEx.htm)