Documentation for: webprint.r


Usage document for %webprint.r

1. Introduction to %webprint.r

webprint.r is an introduction to the ease and simplicity of accessing internet URLs and HTML.

HTML: Hyper Text Markup Language, usually pronounced H T M L
URL: Uniform Resource Locator, usually pronounced U R L

2. webprint At a Glance

No setup is required, just do it.

>> do %webprint.r
 

3. Using %webprint.r

Requires REBOL/Core or REBOL/View console mode.

3.1. Running %webprint.r

From the library with:

 >> do http://www.rebol.org/cgi-bin/cgiwrap/rebol/download-a-script.r?script-name=webprint.r
 
or locally with:
 >> do %webprint.r
 

4. See also

Another rebol.org script, %view-html.r, is very similar to this one. It does more than the sample D program listed below, in that it displays the result in a GUI window.

5. What you can learn

5.1. Powerful builtin Internet Access

REBOL has fantastically simple builtin procedures for accessing the internet.
read http://www.rebol.com accesses and returns the HTML as text. How cool is that?
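
As a minimal sketch of the same idea, the result of read can be kept in a word and inspected before printing. The word name page and the 200-character limit are illustrative choices, not part of %webprint.r, and the script needs a working internet connection.

```rebol
REBOL [Title: "read sketch (illustrative, not part of webprint.r)"]

; Fetch the page; read returns the HTML as a string! value.
page: read http://www.rebol.com

; Report how much text came back.
print ["Characters received:" length? page]

; Show only the first 200 characters instead of the whole page.
print copy/part page 200
```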

5.2. URLs

http://www.rebol.com is actually a value with a special datatype. In REBOL this is a url!. Very powerful. No quotes needed. REBOL just knows.
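
A short sketch of the difference between a url! value and a quoted string follows. The words site and text are names chosen for illustration only; no network access is needed to run it.

```rebol
REBOL [Title: "url! datatype sketch (illustrative)"]

site: http://www.rebol.com        ; no quotes: the value is a url!
text: "http://www.rebol.com"      ; with quotes: the value is a string!

probe type? site                  ; the datatype of the unquoted value
probe type? text                  ; the datatype of the quoted value
probe url? site                   ; true, site holds a url!
probe url? text                   ; false, text holds a string!
probe site = to-url text          ; true, the string converts to the same url!
```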

5.3. Web Server defaults

Web servers have default files that are returned when no file is named. A request for http://www.rebol.com actually returns http://www.rebol.com/index.html. This is not always the case. Some sites return default.htm, or index.php, or index.cgi. No need to worry, the REBOL read function and the web server will work that all out for you. If REBOL Technologies ever changes its web server setup, a different file may be returned and this script will still work.

5.4. Changing the page printed

Changing the website or page that is printed is as easy as changing the text after the http:// part following the read command.
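
One way to make that change without editing the url! literal itself is to build it from parts, sketched below. The host www.rebol.org and the words host and target are assumptions chosen for illustration; any reachable site would do, and the last line needs internet access.

```rebol
REBOL [Title: "changing the page sketch (illustrative)"]

host: "www.rebol.org"            ; hypothetical choice of site, change as needed
target: join http:// host        ; joining onto http:// yields a url! value

probe target                     ; show the URL that will be read
print read target                ; fetch and print that page instead
```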

5.5. Getting REBOL

Both REBOL/Core and REBOL/View are available free of charge from www.rebol.com

5.6. Compare the complexity to the simplicity

Please compare the 31-character sequence print read http://www.rebol.com to this D language program.

5.6.1. D Language sample for printing HTML as text

 /*
     HTMLget written by Christopher E. Miller
     This code is public domain.
     You may use it for any purpose.
     This code has no warranties and is provided 'as-is'.
 */

 //debug = HTMLGET;

 import std.string, std.conv, std.stream;
 import std.socket, std.socketstream;

 int main(char[][] args)
 {
     if(args.length < 2)
     {
         printf("Usage:\n   htmlget <web-page>\n");
         return 0;
     }
     char[] url = args[1];
     int i;

     i = std.string.find(url, "://");
     if(i != -1)
     {
         if(icmp(url[0 .. i], "http"))
             throw new Exception("http:// expected");
         url = url[i + 3 .. url.length]; // Strip the "http://" prefix before splitting.
     }

     i = std.string.find(url, '#');
     if(i != -1) // Remove anchor ref.
         url = url[0 .. i];

     i = std.string.find(url, '/');
     char[] domain;
     if(i == -1)
     {
         domain = url;
         url = "/";
     }
     else
     {
         domain = url[0 .. i];
         url = url[i .. url.length];
     }

     uint port;
     i = std.string.find(domain, ':');
     if(i == -1)
     {
         port = 80; // Default HTTP port.
     }
     else
     {
         port = std.conv.toUshort(domain[i + 1 .. domain.length]);
         domain = domain[0 .. i];
     }

     debug(HTMLGET)
         printf("Connecting to " ~ domain ~ " on port " ~ std.string.toString(port) ~ "...\n");

     auto Socket sock = new TcpSocket(new InternetAddress(domain, port));
     Stream ss = new SocketStream(sock);

     debug(HTMLGET)
         printf("Connected!\nRequesting URL \"" ~ url ~ "\"...\n");

     if(port != 80)
         domain = domain ~ ":" ~ std.string.toString(port);
     ss.writeString("GET " ~ url ~ " HTTP/1.1\r\n"
         "Host: " ~ domain ~ "\r\n"
         "\r\n");

     // Skip HTTP header.
     char[] line;
     for(;;)
     {
         line = ss.readLine();
         if(!line.length)
             break;

         const char[] CONTENT_TYPE_NAME = "Content-Type: ";
         if(line.length > CONTENT_TYPE_NAME.length &&
             !icmp(CONTENT_TYPE_NAME, line[0 .. CONTENT_TYPE_NAME.length]))
         {
             char[] type;
             type = line[CONTENT_TYPE_NAME.length .. line.length];
             if(type.length <= 5 || icmp("text/", type[0 .. 5]))
                 throw new Exception("URL is not text");
         }
     }

     print_lines:
     while(!ss.eof())
     {
         line = ss.readLine();
         printf("%.*s\n", line);

         //if(std.string.ifind(line, "</html>") != -1)
         //  break;
         size_t iw;
         for(iw = 0; iw != line.length; iw++)
         {
             if(!icmp("</html>", line[iw .. line.length]))
                 break print_lines;
         }
     }

     return 0;
 }
 
What would you rather type? The above or something like

 print read join http:// ask "Web site? "

What will be easier to remember 6 months from now?

6. What can break

You will need access to the internet and rebol.com will have to be up and running for this script to work. Don't worry, http://www.rebol.com is always up and running.
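
If the connection or the site is unavailable, read raises an error and the script stops. A hedged sketch of one way to guard against that uses try and error?; the words site and page and the message text are illustrative and not part of %webprint.r.

```rebol
REBOL [Title: "guarded read sketch (illustrative)"]

site: http://www.rebol.com

; try evaluates the block and hands back an error! value on failure,
; which error? detects, instead of halting the script.
either error? try [page: read site] [
    print ["Could not read" site]
][
    print page
]
```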

7. Credits

%webprint.r Author: Unknown
%html-view.r Author: Unknown
htmlget.d Author: Christopher E. Miller
D Programming Language: Walter Bright, Digital Mars
REBOL/Core: Carl Sassenrath, REBOL Technologies
  • The rebol.org Library Team
  • Usage document by Brian Tiffin, Library Team Apprentice, Last updated: 17-May-2007