More LINUX links and related Stuff

(last updates 7 July and 1 November 2002, about Internet browsers, web page statistics analysis and the Google search (r)evolution and its "offspring")


 w3m --- a capable, but small and fast text browser (English)
 
Many of you will often need only a text browser, which saves time and money by avoiding the graphical browsers' habit of automatically loading every picture of any URL you choose. Because the well-known text browser Lynx can't cope with frames and multi-column tables, you can now forget Lynx and use this small but more capable text browser instead, which you can get either as a UNIX/LINUX C source package or as an MS Windows executable. Not only on SPARC/Solaris but also on Intel/LINUX, applying strip to the compiled executable brings it down to a file size of about 300 kBytes. Because w3m is developed by a Japanese author, there are also Japanese pages and/or source/executable versions (!). Remark: link targets containing spaces are not interpreted correctly yet, but this should be fixed in the final release.

 

Junkbuster - Advertisement Blocker

At this site you can get a tool that most of us Internet surfers will welcome: it finally rids you of (nearly) all annoying, money-wasting advertisement images and aggressive cookies. Only popup windows are not blocked; to avoid these as well, you have to disable JAVA(Script) yourself. For advertisements and cookies, however, you can configure which additional ones to exclude (rarely necessary) and which ones to accept nonetheless. Junkbuster works as a proxy on port 8000 of your localhost and runs as an ordinary background task (preferably launched as user nobody) in UNIX-type operating systems like LINUX. Executables are available for several platforms, but it's also simple to recompile the C source if you need to. The file sizes aren't very large. Be aware that although it's also available for MS systems (unlike Siemens' WebWasher, which exists only for these insane platforms!), it's primarily for us UNIX users. And it's totally free. Current remark: meanwhile the advanced privoxy 3.0 (the official successor of junkbuster) is taking over from junkbuster; it has HTTP 1.1 support, JavaScript blocking and browser configuration support. You should check it out and maybe replace junkbuster with it!
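In a browser you simply enter localhost and port 8000 as the HTTP proxy. Your own download scripts can use the filter the same way; here is a minimal Python sketch with the standard urllib.request module, assuming Junkbuster really listens on localhost:8000 as stated above (privoxy may use a different port, so check its configuration). The function name is my own invention for illustration.

```python
import urllib.request

# Assumption: Junkbuster runs on this machine and listens on port 8000,
# as described above; change the URL if your proxy is configured otherwise.
PROXY_URL = "http://localhost:8000"


def make_filtered_opener(proxy_url: str = PROXY_URL) -> urllib.request.OpenerDirector:
    """Build a urllib opener that sends all plain HTTP requests through the filtering proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler)
```

Calling make_filtered_opener().open("http://www.linux.org/") then fetches the page via Junkbuster (or fails with a connection error if no proxy is running). Because an explicit ProxyHandler is passed, any http_proxy environment variable is ignored, so the script cannot silently bypass the filter.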

 

Macromedia Download Center
 
More and more multimedia content is presented on the Internet. Some of the sites mentioned on my homepage offer content that can be viewed only with the so-called Shockwave Flash player plugin (astronomy: Windows to the Universe; Pamela Lee: official page and V.I.P. official page; People and the Internet and celebrity resources: Sampras official page) or related tools from Macromedia. At the above URL you can download the Flash plugin in particular for free. At present there are versions of this plugin for LINUX, Solaris and now also IRIX (about a 500 kByte archive and 900 kBytes decompressed for LINUX, with similar sizes for the other UNIX systems), Macintosh and the MS platforms, all in version 5 (only the IRIX, 680X0 Mac and OS/2 versions were last still at version 4). Simply follow the instructions on the download page. On sites with Shockwave content the right mouse button menu changes: although you get some control to play, stop, rewind and zoom the Shockwave parts, you can't save anything with it.

 
A few hints for searching the Internet on any subject

There are four main methods of searching (I will ignore newsgroups and personal mail here):

  • simple guessing of a URL: you can easily find a considerable number of homepages by typing a probable URL. For example, http://www.sun.com is indeed the official homepage of SUN Microsystems Inc., a commercial site, and http://www.linux.org is the official site of the non-profit organization for LINUX development.

  • if you already know one or more pages on the desired subject, you often have the chance to use the lists of links on these sites - this is a fast lane to recommended sites...

  • search the webrings, mainly  http://www.bomis.com  and  http://www.webring.org , and meanwhile especially the  Netring system  as a substitute for yahoo.webring, which has become nearly worthless since the Yahoo action against webring, to find a collection of sites about your topic. In BOMIS you can find the appropriate category more easily; in WEBRING and the similar NETRING you can navigate more easily through all member sites of a ring you have entered. Often you will also find such a ring entry on a page you have found by other means.

  • advanced search engines of a new type like Google --- these are virtually immune to spamming abuse, because they rank pages by the references that hyperlinks across the Internet make to them. Google already leads in content (both in relevance and in the number of web pages); its only real disadvantage is its relatively slow reaction to changes in the web, mainly due to its time-consuming way of ranking the pages (the page database is rebuilt completely roughly once a month, with no minor changes in between!). It's extremely reliable and fast, thanks also to its cluster hardware of 6,000 and more LINUX machines. But above all it delivers reliable, high-quality sites at the top, as you will witness by trying it:
    Google
    And there are a few other remarkable offers from Google: you can also search the biggest human-built web directory, dmoz.org, via Google, which gives you the benefit of a largely good sorting of the entries; you can search the newsgroups with Google (formerly the DejaNews database, with entries as fresh as one day old); and, newest of all, you can even search for images with Google, another useful specialized service. Meanwhile they have added simple text link tabs to all of these searches, so you can easily switch from web to groups search and so on without typing the search request again. Wow: they keep topping themselves! The special LINUX search was already mentioned; versions for BSD UNIX, Macintosh, the US government, universities and the military are also available. Lately Google has revealed another way to search, as a beta (not yet tabbed on the entry page): a news search service with headlines, a really "real time" way to search at Google (as mentioned, due to the build technique applied, the web database that visitors search tends to be four weeks old on average, ranging from about two to six weeks). Current hints: at the beginning of summer 2002 another similar search engine based on hyperlink rating of web pages was officially launched as a beta (although indexing had obviously been going on for a considerable time before), still sometimes behaving a little strangely. This engine, OpenFind, is at present your only alternative among these modern-style engines, together with WiseNut Search.
Be aware that the latter is much less elaborate than Google at present. It has not many fewer pages indexed than Google, of comparable quality, and some nice features such as suggested further search categories (WiseGuide), but it is technically not as strong as Google, because its servers for the visitors run on feeble, buggy and easy-to-attack M$ IIS systems --- under heavy load, once the site has become better known, they will without any doubt run into serious availability and reliability problems never encountered with Google. But maybe they will learn that any kind of UNIX (with LINUX as the best bet for a migration) together with Apache can help them out of these foreseeable problems. While WiseNut still has to prove its potential, OpenFind looks very promising and, like Google, features a proxy cache offering the crawled documents in case of removals or temporary downtimes.

    The newest addition to your search toolbox should be Teoma search, which has nowhere near as many pages in its database as Google, OpenFind or WiseNut and has technical drawbacks similar to WiseNut's, but delivers good overviews of closely related sites about, or links to, a topic --- it can especially help you find an entry point to more material on your search. For my more comprehensive overview of the search topic, see there.

  • the "ultima ratio", and mostly neither a very easy nor even a good choice, are the classical search engines. Whether you use AltaVista, All the Web FAST search or any other one depends mostly on your preferences; AltaVista is (was?) the fastest-reacting of these, and the very big All the Web has also improved greatly lately; but this approach generally poses serious problems. In most cases you will get either too many or sometimes too few or even no hits for a given search request, depending on whether it was relatively general or very specific. The so-called "spamming" of URLs is one of the problems involved: keywords that are misleading compared with the contents, or even lists of frequently searched words used as a sheer means of ranking high, are very nasty methods that reduce the value of the classical search engines. Therefore you should enter requests of the following type, for example:

    +"gabriela sabatini" -sex

    This will guide you at least partly to the goal: all major sites dealing with her will mention her complete name, and as a general rule lower-case letters are preferable in search requests; the quotation marks hold together words separated by spaces or other word-delimiting characters, and the + forces the immediately following string (sequence of characters) to be present in every listed site. The - works the other way, excluding undesired content; in this case you will avoid most nasty sex sites abusing her name as a means of getting visitors (in her case clearly pure abuse!). To do the latter even better, you can also put a - before the infamous "four-letter word". One final remark: you can try to widen the search by using "meta engines" like Dogpile or profusion, which use and combine the results of several stand-alone search engines. But in my view their benefit is limited: first of all you get even more spam, but hardly more useful results.
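To make the effect of +, - and the quotation marks concrete, here is a toy model in Python that parses a request like the one above and filters documents by these rules. It is a simplification made up for illustration only: real engines rank as well as filter, and their exact matching rules differ. Whole-word matching is my own assumption, chosen so that -sex does not also throw out pages mentioning Sussex.

```python
import re
from dataclasses import dataclass, field

# An optional + or - followed by either a "quoted phrase" or a single word.
TOKEN = re.compile(r'([+-]?)(?:"([^"]*)"|(\S+))')


@dataclass
class Query:
    required: list = field(default_factory=list)  # terms prefixed with +
    excluded: list = field(default_factory=list)  # terms prefixed with -
    optional: list = field(default_factory=list)  # plain terms (ranking only)


def parse_query(text: str) -> Query:
    """Split a request like '+"gabriela sabatini" -sex' into its parts."""
    query = Query()
    for op, phrase, word in TOKEN.findall(text):
        # Lower-case the term and collapse inner whitespace.
        term = " ".join((phrase or word).lower().split())
        if not term:
            continue
        if op == "+":
            query.required.append(term)
        elif op == "-":
            query.excluded.append(term)
        else:
            query.optional.append(term)
    return query


def contains(text: str, term: str) -> bool:
    """Whole-word (or whole-phrase) match, so '-sex' does not hit 'Sussex'."""
    return re.search(r"\b" + re.escape(term) + r"\b", text) is not None


def matches(document: str, query: Query) -> bool:
    """Apply the + and - rules to one document, ignoring case and spacing."""
    text = " ".join(document.lower().split())
    return all(contains(text, t) for t in query.required) and not any(
        contains(text, t) for t in query.excluded
    )


query = parse_query('+"gabriela sabatini" -sex')
print(matches("Gabriela Sabatini wins the US Open", query))
```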

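The hyperlink-based ranking that makes Google so resistant to spam can be illustrated with the classic PageRank power iteration described by Google's founders. What follows is a simplified textbook sketch on a made-up four-page web, not Google's actual production system; the damping factor 0.85 is the commonly cited value, and the iteration count is an arbitrary choice.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Return a PageRank score for every page via simple power iteration.

    links maps each page to the pages it links to. A page without outgoing
    links spreads its score evenly over all pages.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small base share (the surfer's "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            receivers = targets if targets else list(pages)
            # A page passes its score on, split evenly over its links.
            share = damping * rank[page] / len(receivers)
            for target in receivers:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: C is linked from A, B and D.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
for page, score in sorted(pagerank(web).items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this example C, linked from three pages, ends up on top, while D, which nobody links to, keeps only the base share. A page gains weight when well-ranked pages link to it, which is why stuffing keywords into your own page does not help here.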

 

A few remarks about publishing in the Internet

Because you first need some space for your material, I have recently added a page about free web space hosting. You may start there if you need it (for a small, text-dominated homepage, the space your ISP provides at no additional cost will often suffice).

If any of you plan to go online with your own homepage, be warned that it's no easy task to get even minor attention from visitors, and hardest of all to be noticed by a larger number of visitors. There are three principal ways to get visitors besides people you know personally:

  • the classical approach is submitting your pages to search engines: this is a kind of requirement, but you should not expect too much from it. The logic of the various engines differs considerably, and therefore it's virtually impossible to achieve the same result with all of them. But the worst thing is the huge number of pages on the net. Unless your subject is well suited, that is, very precise and impossible to confuse with others, or if there are many pages on the subject, visitors have nearly no chance of finding your page with a search engine. Nevertheless you should submit to the major engines like AltaVista, Yahoo! (a directory, to be correct, see below), All the Web FAST search and some others; the easiest way is probably through the Netscape Search Center, and there is also the possibility of joining most of these services with one submission. Also possible and recommendable is submitting to the Open Directory, the largest human-edited web directory today. Once more I underline the usefulness of submitting to Google for a much more comprehensive and nevertheless relatively complete way of publishing --- yours will not be obscured by spam pages... The same as for Google holds true for the already mentioned WiseNut Search, which is relatively new but promises and delivers a lot too. Recently another rather useful search tool has appeared, the engine/reference directory, but submissions to it have to be paid for... While Google is an automaton, Yahoo! is maintained by human reviewers (!), which poses difficulties with the ever faster growing web. For additional and current information about search engines, look at this important site: Search Engine Watch. And for those who want to know even more details, for example an independent technical evaluation of sizes and features, there is a comparison site too.

  • if you know somebody, or think someone would like to link to your page, then this most personal approach to Internet publishing can give you a considerable advantage, at least if the other page gets a lot of visitors and you get a link entry on it. Often this is practised as a "banner exchange", which simply means that you set a link back to the other page that mentions yours.

  • probably the best way is joining a web ring on the subject: at least three major systems are established:  BOMIS  ,  WEBRING and NetRings --- the last is quite similar to Webring and is therefore not treated separately here. There are only slight differences in the method of joining (see below). What they have in common is that they offer an entire system of rings on different subjects, so you can choose the most appropriate one. This is easier with BOMIS, because they arrange the subjects in a tree, and you can sometimes reach a given ring via different paths through the tree. In WEBRING you have to search a little more, because the categories are more general and list all their rings together. After choosing the most appropriate ring, you send your URL, subject/title and email address (for WEBRING also a password: use a moderately good one, it is not encrypted!) and a short description of the contents on the corresponding page to the service. Next you get a confirmation mail about your submission, containing your site ID and, if provided, a fragment of HTML code, which you have to insert manually (or by copying with the mouse) into the source of your submitted page. Then you send a mail to the ring's webmaster saying that you have done this and asking her or him to check whether the page is accepted for "her/his" ring. If you get an acceptance mail, you are done and can be sure that the visitor count will increase considerably. The code you get from WEBRING makes it easy for visitors to scan the entire ring you are a member of by clicking the corresponding buttons of the received HTML code. For BOMIS, too, you should at least include a text link to the BOMIS ring used (if you didn't get a code fragment).

  • the statistics of the rings can be found on the BOMIS report pages; there only the total hits (pageviews) to the rings per week are presented, but they want to refine this in the future, and the changes at BOMIS now seem to be completed. The formerly more elaborate webring statistics seem to have fallen victim to the Yahoo! takeover of webring.org --- at least I haven't managed so far to get them the way I could some time ago! Remark: RingSurf offers quite an easy way to get moderately detailed statistics --- more detailed than in BOMIS, but less than formerly in Webring. Just list the sites of a ring; you will then find a statistics link in the upper left corner!

  • If you use the webalizer web server statistics analysis tool, I can offer those of you with access to any UNIX platform a few shell scripts for making better use of the raw statistics. These ease the monthly statistics and now also support long-term analysis. If you have no UNIX system, you may run the scripts on your provider's server, which almost certainly runs UNIX (alternatively you can get the powerful and highly useful GNU tools for M$ Windows too, which are far superior to all the feeble commands produced by the Redmond gang). For even more detail about web server log analysis, look at the related LINUX package page.

  • If you are interested in CGI scripts in any way, you can find two simple examples of Bourne shell scripts on a special page. After minor adaptations these should fit many of the usual purposes!

  • one final remark: if, like me, you treat different subjects on your pages, you should apply the above methods to each considerable part of your site, so don't forget to let the search engines know about the relevant pages too (automatic recursion rarely works and is also not flexible), in contrast to your other (sub-)pages, which should forbid indexing!
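I don't reproduce my webalizer shell scripts here, but the basic idea of a long-term analysis can be sketched in Python: read a raw access log and sum up hits and transferred bytes per month. This sketch assumes the log is in the Common Log Format (the Combined format also works, because only the leading fields are read); the function name and the sample lines are made up for illustration, and webalizer itself counts several more figures (files, pages, visits, sites).

```python
import re
from collections import defaultdict
from datetime import datetime

# Common Log Format: host ident user [day/Mon/year:HH:MM:SS zone] "request" status bytes
CLF = re.compile(
    r'^\S+ \S+ \S+ \[(?P<time>[^\]]+)\] "[^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def monthly_totals(lines):
    """Return {'YYYY-MM': (hits, bytes)} in month order; unparsable lines are skipped."""
    hits = defaultdict(int)
    volume = defaultdict(int)
    for line in lines:
        m = CLF.match(line)
        if not m:
            continue
        stamp = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        key = stamp.strftime("%Y-%m")
        hits[key] += 1
        if m.group("bytes") != "-":  # "-" means no body was sent (e.g. 304)
            volume[key] += int(m.group("bytes"))
    return {key: (hits[key], volume[key]) for key in sorted(hits)}


# Hypothetical log excerpt; in practice pass e.g. open("access_log").
sample = [
    '127.0.0.1 - - [01/Nov/2002:10:00:00 +0100] "GET /crunch2.htm HTTP/1.0" 200 5120',
    '127.0.0.1 - - [15/Nov/2002:11:30:00 +0100] "GET /index.htm HTTP/1.0" 304 -',
    '10.0.0.2 - - [02/Dec/2002:09:15:00 +0100] "GET /linux.htm HTTP/1.1" 200 2048',
    'garbage line',
]
for month, (count, size) in monthly_totals(sample).items():
    print(f"{month}: {count} hits, {size} bytes")
```

Because the keys are sorted year-month strings, concatenating the logs of several months or years directly gives a long-term table.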

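A common way to keep search engines away from the (sub-)pages that should not be indexed is a robots.txt file in the root directory of the web server (the robots exclusion standard); for single pages the alternative is a <meta name="robots" content="noindex"> tag in the page head, which is often the only option on ISP homepages where you cannot write to the server root. The sketch below uses Python's standard urllib.robotparser to check a robots.txt before uploading it; the rules and URLs are hypothetical examples.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the root of a web server: the topic pages
# stay indexable, the sub-page directories are excluded for all robots.
ROBOTS_TXT = """\
User-agent: *
Disallow: /drafts/
Disallow: /subpages/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # can_fetch() answers False for every URL until the rules count as read;
    # marking them as freshly read is harmless if parse() already did so.
    parser.modified()
    return parser


rules = build_parser(ROBOTS_TXT)
for url in ["http://www.example.org/linux.htm", "http://www.example.org/subpages/part2.htm"]:
    print(url, "indexable" if rules.can_fetch("Googlebot", url) else "blocked")
```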
About the HTML Syntax: URL to get sp...

 
Some remarks about HTML and the XML language, which is meant to be used in the future, are appropriate, I guess: most pages for the Internet are not created by computer experts like programmers. The means used range from typing the HTML code directly to advanced tools like the Netscape Composer, which offer WYSIWYG abilities in Windows style. Either way, you will be surprised to witness the stream of errors that comes with most HTML pages! The reason is that Internet browsers are very sloppy in interpreting HTML code: they display everything to which they can assign any sense in the elements used, disregarding the strict rules of HTML. This means that HTML pages are often incorrect despite having the desired appearance, even in more than one browser type! Now you can check your pages, if you want to, with the package named sp (for SGML parser) included in SuSE. Although it is primarily designed to convert HTML into XML (we will all need such a tool one day - but beware, don't use XML yet, because there is currently virtually no browser capable of displaying such pages correctly!), it is very useful for detecting faults produced by human and/or automatic HTML encoders! And now the XML mode is useful for XHTML 1.0, which is recommended by the W3C and largely compatible with HTML 4.0. For example, the Netscape Composer (versions since 4.5) gets the DOCTYPE top line wrong (wrong upper/lower case usage, which matters because this is the only line that is not HTML code itself!), omits the quotation marks in relative font size tags (for example font size=+2 instead of the correct font size="+2") and by default places illegal constructs like NOSAVE and COLS=n into table headers!
The open source successor Mozilla, while still a little buggy and in the beta test phase, offers a valid HTML 4.01 page composer at least since version 0.8, as far as I can judge, and first measures to cope with XML as well, not only with XHTML; this is not yet complete, but already superior to the shitty M$ IE >=5.5. But even Netscape produces few faults compared to most hand-coded pages, this I can assure you after a number of tests! In the future I will try to eliminate as many faults as I can from my Composer-made pages... Meanwhile I have removed all faults and now write the pages myself with a common prototype file and pure manual editing with vim (no kidding!). This works great together with the SGML parser for the final check. (Only one page is still not valid HTML 4.0; can you tell which one it is? --- hint: it's not far away from this page... I couldn't resolve the problem yet because of included foreign code fragments, which often violate the W3C HTML rules.)
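sp validates a page completely against the DTD; the following Python sketch only reproduces one of the checks mentioned above, finding unquoted attribute values like font size=+2. HTML 4 allows an unquoted value only if it consists solely of letters, digits, hyphens, periods, underscores and colons. The sketch uses the standard html.parser module, which is itself tolerant and does not validate, so it is no substitute for sp; the class and function names are my own.

```python
import re
from html.parser import HTMLParser

# One attribute: a name, optionally followed by = and a double-quoted,
# single-quoted or unquoted value. Quoted values are consumed whole,
# so '=' or '+' inside quotes is never mistaken for a separate attribute.
ATTRIBUTE = re.compile(r"""([^\s=/>]+)(?:\s*=\s*("[^"]*"|'[^']*'|[^\s>]+))?""")

# Characters HTML 4 permits in an unquoted attribute value.
SAFE_UNQUOTED = re.compile(r"[A-Za-z0-9._:-]+")


class UnquotedValueChecker(HTMLParser):
    """Collects attribute values that HTML 4 requires to be quoted, e.g. size=+2."""

    def __init__(self):
        super().__init__()
        self.problems = []  # (line, tag, offending attribute text)

    def handle_starttag(self, tag, attrs):
        raw = self.get_starttag_text() or ""
        # Drop '<tagname' at the front and '>' or '/>' at the end.
        inner = raw[1 + len(tag):].rstrip(">").rstrip("/")
        line, _ = self.getpos()
        for name, value in ATTRIBUTE.findall(inner):
            if value and value[0] not in "\"'" and not SAFE_UNQUOTED.fullmatch(value):
                self.problems.append((line, tag, f"{name}={value}"))


def check_html(source: str):
    checker = UnquotedValueChecker()
    checker.feed(source)
    checker.close()
    return checker.problems


page = '<FONT size=+2>Welcome</FONT> <table border=1 cols=2></table>'
for line, tag, attribute in check_html(page):
    print(f"line {line}: <{tag}> needs quotes: {attribute}")
```

Here only size=+2 is reported; border=1 and cols=2 are legal unquoted values (COLS itself is still a non-standard attribute in a table header, which only a real validator like sp will tell you).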

 

W3C Standards Organization
 
At this homepage you can get all the material about the current web-oriented SGML (Standard Generalized Markup Language) formats (now HTML 4.0, with XHTML and XML upcoming) in different versions: as a plain text file, as a well-compressed GNU Zip file, as a less well-compressed MS Zip file (also feasible, but useless for us UNIX/LINUX users) and even as an Acrobat PDF file (GNU Zip file). I have to admit that you will have an advantage over me if you make appropriate use of style sheets (tag style) in your documents - I don't do this yet; also, the use of elements like <br> or &nbsp; is strongly discouraged by the organization - you will frequently find such elements in, for example, Netscape Composer-made HTML pages.

 
 
 
The following two hints (about German-language resources) are for all those who want to create their own homepage or improve an existing one. Since most providers now include a homepage of a few MB in their Internet package, this should also be more or less cost-neutral, as long as you don't want to host video animations or masses of graphics, because additional space has to be paid for dearly. But even so, creating a homepage generally increases your online time and costs, although copying the pages created on your own computer to the provider's server via ftp or similar is usually the smallest part of that, at least for ISDN users.

 

Gold HTML - golden rules for bad HTML (in German)
 
Truly brilliant: if anyone among my visitors is planning their own homepage, they absolutely must observe the currently 70 rules presented on this wonderfully ironic and spiteful page. At least the pharmaceutical industry benefits, because it can then sell more remedies against vomiting, depression etc... But seriously: however exaggerated the examples and allusions may sometimes seem, it is the sad truth that most of us (I certainly have!) have already seen examples of nearly every bad habit presented or mentioned there. To make it perfectly clear: anyone who plans or creates a homepage absolutely must disregard all of these rules! And thanks to its compactness, this guide is a quite useful manual on how NOT to do it - highly recommended and almost sufficient on its own, especially for users of the Netscape Composer, who are relieved of most of the technical detail work.

 

SelfHTML - HTML-Dateien selbst erstellen (creating HTML files yourself, in German)
 
This documentation is very comprehensive: it not only covers the elements of the HTML 4.0 format (and of course all older ones), but also offers its content as a ZIP file (several MB!) and is even available as a paperback book. So everyone can decide how to use this valuable source of information - for occasionally looking up technical details like escape sequences for German umlauts, French accents and the like, or meta tags (very well described), direct online use is perhaps best; if you read a lot and often, you should rather download it or, especially on more restrictive operating systems, even buy the book. For everyone who wants to know it exactly, I currently know of no better source.
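The escape sequences for umlauts and accents mentioned above can also be generated automatically instead of being typed by hand. A small Python sketch using the HTML 4 entity table from the standard html.entities module; the function name is my own, and characters without a named entity fall back to a numeric character reference.

```python
from html.entities import codepoint2name


def to_named_entities(text: str) -> str:
    """Replace non-ASCII characters by HTML named entities (e.g. ü -> &uuml;).

    Characters without a named entity become numeric references (&#NNN;).
    Note: &, < and > are left untouched; escape them separately if needed.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            out.append(ch)
        elif code in codepoint2name:
            out.append(f"&{codepoint2name[code]};")
        else:
            out.append(f"&#{code};")
    return "".join(out)


print(to_named_entities("Größe, café crème"))
```

For instance, to_named_entities("Grüße") yields Gr&uuml;&szlig;e.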

 
 


comments, suggestions, questions to:  stefan.urbat@apastron.lb.shuttle.de

(URL:  http://www.lb.shuttle.de/apastron/crunch2.htm)