Documentation for: webcheck.r
Created by: btiffin
on: 20-May-2007
Last updated by: btiffin on: 20-May-2007
Format: html
Downloaded on: 9-Oct-2024

Usage document for %webcheck.r

1. Using webcheck.r
1.1. Setup required
1.2. Standard REBOL network settings
1.3. Running %webcheck.r
1.4. Support file created
2. What you can learn
2.1. Reading web pages
2.2. Checksums
2.3. if with any
2.4. print with blocks
2.5. load and save
3. Some Definitions
4. Also worth a look
5. List of tutorial scripts in the web category
6. A script you have to check out
7. More in-depth scripts
7.1. Other web related scripts
8. Credits

1. Using webcheck.r

1.1. Setup required

%webcheck.r requires some setup before it is useable.

Copy the library script to a local directory with the command listed below.

Edit the local %webcheck.r. Change the web page you wish to check and the email address for sending results.

If you are using REBOL/View, just use the built-in editor function. For REBOL/Core, you will need to run an external editor, such as Notepad.

Getting a local copy using REBOL:

 >> write %webcheck.r read http://www.rebol.org/cgi-bin/cgiwrap/rebol/download-a-script.r?script-name=webcheck.r
 
Alternatively, use your browser and the rebol.org download script options.

1.2. Standard REBOL network settings

There is a standard utility that holds some basic REBOL configuration information. set-net takes a block of information so that REBOL knows how to route mail and how to make other internet connections. The REBOL/View Viewtop main User menu allows access to these settings, or you can edit the default %user.r file and change the set-net information.

 >> help set-net
 USAGE:
     SET-NET settings

 DESCRIPTION:
      Network setup.  All values after default are optional.
      Words OK for server names.
      SET-NET is a function value.

 ARGUMENTS:
      settings -- [
                        email-addr default-server pop-server
                        proxy-server proxy-port-id proxy-type
                        esmtp-user esmtp-pass
                      ] (Type: block)
 

The first two settings are for sending mail, the third is for reading mail, the next three are proxy connection settings, and the last two are the username and password for authenticated mail.
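As a minimal sketch of the settings block described above, the call below fills in only the first three values. The address and server names are placeholders (assumptions for illustration, not real hosts); the proxy and authentication values are left out, since the help text says everything after the default server is optional.

```rebol
; Placeholder values only: replace them with your own address and servers.
set-net [
    you@example.com     ; email-addr: your own email address
    smtp.example.com    ; default-server: host used for sending mail
    pop.example.com     ; pop-server: host used for reading mail
]
```

The same block can be placed in %user.r so that the settings are applied each time REBOL starts.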

1.3. Running %webcheck.r

After the configuration is all set up, using webcheck is simple. Just do it.

 >> do %webcheck.r
 

This will examine the web page, see if its checksum matches a previous check and, if a change is detected, mail the new page to the address specified after the send function.

This is the main reason that this script needs modification. Sending mail to luke@rebol.com is not much good to anybody.

1.4. Support file created

%webcheck.r creates a summary file %page-sum.r. This file is used to determine if the checked page has changed since the last time you ran %webcheck.r.

2. What you can learn

This script has a few learning goodies in it.

2.1. Reading web pages

First and foremost, the ease of reading a web page.
The expression page: read http://www.rebol.com places the HTML of the rebol.com home page into the variable page. You can change the site you check by changing the url!.
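A network read can fail (no connection, page moved), so a script that runs unattended may want to guard the read. The sketch below is one way to do that; fetch-page is a hypothetical helper name and is not part of %webcheck.r.

```rebol
; fetch-page is a hypothetical helper, not part of %webcheck.r.
; It returns the content that was read, or none if the read fails.
fetch-page: func [target [url! file!] /local page] [
    either error? try [page: read target] [none] [page]
]

; Typical use (needs a network connection):
;   page: fetch-page http://www.rebol.com
;   if page [print length? page]
```

Because read accepts a file! as well as a url!, the same helper can be tried out on a local file before pointing it at a web site.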

2.2. Checksums

REBOL comes with a built-in checksum function. A checksum is a small value computed from a piece of data. The same data always produces the same checksum, and a change to the data almost always produces a different one, so comparing two checksums is a cheap way to tell whether a page has changed without keeping a copy of the whole page.
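To illustrate the idea, the sketch below computes checksums for two made-up strings that stand in for an old and a new version of a page. The strings and variable names are assumptions for illustration only.

```rebol
; Two stand-in "pages"; a real script would use the result of read.
old-page: "<html>version one</html>"
new-page: "<html>version two</html>"

old-sum: checksum old-page
new-sum: checksum new-page

print ["old checksum:" old-sum]
print ["new checksum:" new-sum]

; The same content always gives the same checksum.
print ["unchanged page matches:" old-sum = checksum copy old-page]

; Different content is expected to give a different checksum.
print ["page changed?" old-sum <> new-sum]
```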

2.3. if with any

REBOL provides a complete set of looping and control structures, common to most programming environments. REBOL extends these normal control structures with a nifty little any sequence. The native any word evaluates a block of expressions and, working like a short-circuit logical or, returns the first value that is not FALSE or NONE. If every expression is FALSE or NONE, any returns NONE.

In the case of the %webcheck.r script, any is given two tests: whether %page-sum.r is missing, and whether the saved checksum differs from the checksum of the page just read.

The if statement then tests whether any returned a true value. If it did (no %page-sum.r exists, or the checksums were different), the block with the print, save, and send will execute. This also saves the checksum value the first time that you run %webcheck.r.

If the result of the any block was false (%page-sum.r exists and the checksums were the same), nothing else will happen. That means the page is the same and there is no sense informing anyone or saving the checksum value, since it is the same.
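The pattern described in this section can be sketched as follows. This is a reconstruction from the description, not a copy of %webcheck.r: the page is a fixed string instead of a web read, the file name %page-sum-demo.r is a made-up stand-in for %page-sum.r, and the send step is left out so that the sketch runs without mail settings.

```rebol
page: "<html>sample page</html>"   ; stand-in for: read http://www.rebol.com
sum-file: %page-sum-demo.r         ; hypothetical name, stands in for %page-sum.r

page-sum: checksum page

if any [
    not exists? sum-file           ; first run: nothing saved yet
    page-sum <> load sum-file      ; saved checksum differs from the new one
][
    print ["page changed" now]
    save sum-file page-sum         ; remember the checksum for the next run
]
```

Run it twice: the first run prints the message and creates the file, the second prints nothing because the checksums match. The order of the two tests matters, since any stops at the first true result and so never tries to load a file that does not exist.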

2.4. print with blocks

The REBOL print statement can take many different datatypes. In this case the data is in a block, and all values in the block are evaluated before being printed. In REBOL, this type of evaluation is termed reduce. The quoted string "page changed" evaluates to itself, a string!, and now evaluates to the current date and time.
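A few short lines show the effect. The variable count is made up for the example; reform is the standard function that reduces a block and joins the results with spaces, which gives the same text that print displays.

```rebol
count: 3                               ; made-up value for the example

print ["page changed" now]             ; a string, then the current date and time
print ["pages checked:" count]         ; the word count is replaced by its value
print ["1 + 2 =" 1 + 2]                ; prints: 1 + 2 = 3

line: reform ["1 + 2 =" 1 + 2]         ; same text, kept in a string instead of printed
```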

2.5. load and save

REBOL has powerful but easy to use save and load functionality. The save function produces output that REBOL knows how to load directly.
Loading scripts and Saving scripts are well worth a further read, as this is a very powerful feature of REBOL.
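As a small sketch of the round trip, the lines below save a block holding several datatypes and load it back. The file name %demo-data.r and the values are made up for the example.

```rebol
data: [42 "text" 20-May-2007 http://www.rebol.org]

save %demo-data.r data          ; written as plain text in REBOL's own syntax
loaded: load %demo-data.r       ; read back as REBOL values, not as one string

print equal? data loaded        ; the loaded block matches the saved one
print type? third loaded        ; the date comes back as a date!
print type? fourth loaded       ; the URL comes back as a url!
```

No parsing code is needed on the way back in, which is why %webcheck.r can store its checksum with a single save and read it with a single load.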

3. Some Definitions

REBOL Relative Expression Based Object Language, pronounced as rebel.
HTML HyperText Markup Language.
HTTP HyperText Transfer Protocol.
Web Common expression for the World Wide Web, proposed by Tim Berners-Lee in 1989.
www Abbreviation of World Wide Web.
URL Uniform Resource Locator, for naming things on the World Wide Web.
URI Uniform Resource Identifier.
URN Uniform Resource Name. In these terms the U is sometimes read as Universal.
url! REBOL's builtin URL datatype. REBOL just knows.

4. Also worth a look

There is a full suite of scripts that demonstrate how easy it is to use the HTTP url! (or "the web") features in REBOL.

These features are one of the central design goals of the REBOL scripting environment. These sample scripts highlight the ease of using internet resources with REBOL.

5. List of tutorial scripts in the web category

%webcheck.r  Determine if a web page has changed since it was last checked, and if it has, send the new page via email.
%weblinks.r  Display all of the web links found on a page.
%webprint.r  Fetch a web page and display its HTML code.
%websend.r  Fetch a web page and send it as email.
%webfind.r  Search a web page for a string, and save the page.
%webtitle.r  Find the title of a web page and display it.
%webget.r  Fetch a web page and save it as a file.
%webfinder.r  Search multiple web pages for a string, and print the URL of the ones where it was found.
%web-to-plain.r  Translate htmlized text into plain text in one pass.
%webbanner.r  Generate HTML code that displays a banner and links to its destination.
%websplit.r  Separate the HTML tags from the body text of a document.
%webcam.r  Style for webcam images.
%webgetter.r  Fetch several web pages and save them as local files.
%oneliner-save-web-page-text.r  This line reads a web page, strips all its tags (leaving just the text) and writes it to a file called page.txt.
%countweb.r  Count the number of times a string appears on each of a given set of web pages.
%findweb.r  Simple example of searching multiple web pages for a specified string.
%timewebs.r  Time how long it takes to get each of the web pages listed in a block.
%webloop.r  Send a set of pages via email every hour.
%oneliner-webserver.r  Webserver serving files from the current directory.
%oneliner-print-web-page.r  Prints to the console the HTML source for a web page.

6. A script you have to check out

%oneliner-webserver.r  Webserver serving files from the current directory. A single line of REBOL code.

7. More in-depth scripts

%webcrawler.r  To crawl the web starting from any site. Does not record duplicate visits. Saves all links found in 'newlinks.
%extract-web-links.r  A function which scans a string (normally a web page) and creates a block of URL/Text combinations for each HTML <a> tag in the string.
%webserver.r  Here is a web server that works quite well and can be run from just about any machine. It's not only fast, but it's also small, so it's easy to enhance.
%webserv.r  A Simple HTTP-Server that can run REBOL CGI scripts
%volkswebserv.r  HTTP-Server for running and debugging REBOL CGI scripts, modified %webserv.r
%webwidget.r  Generate HTML code quickly and easily for several form elements.

7.1. Other web related scripts

There are other scripts dealing with web-related REBOL programming that you can find in the rebol.org library. There are also complete suites for FTP, HTML, CGI, and many others.

8. Credits

%webcheck.r Original author: Unknown
%weblinks.r Original author: Unknown
%webprint.r Original author: Unknown
%websend.r Original author: Unknown
%webfind.r Original author: Unknown
%webtitle.r Original author: Unknown
%webget.r Original author: Unknown
%webcrawler.r Original author: Bohdan Lechnowsky
%webfinder.r Original author: Unknown
%webserver.r Original author: Unknown
%web-to-plain.r Original author: Tom Conlin
%webbanner.r Original author: Andrew Grossman
%websplit.r Original author: Unknown
%webwidget.r Original author: Andrew Grossman
%webcam.r Original author: Piotr Gapinski
%webgetter.r Original author: Unknown
%oneliner-save-web-page-text.r Original author: Carl Sassenrath
%webserv.r Original author: Cal Dixon
%countweb.r Original author: Unknown
%findweb.r Original author: Unknown
%timewebs.r Original author: Unknown
%webloop.r Original author: Unknown
%extract-web-links.r Original author: Peter WA Wood
%oneliner-webserver.r Original author: Cal Dixon
%volkswebserv.r Original author: Cal Dixon, mods by Volker Nitsch
%oneliner-print-web-page.r Original author: Carl Sassenrath
REBOL/Core Carl Sassenrath, REBOL Technologies
REBOL/View Carl Sassenrath, REBOL Technologies