Mailing List Archive: 49091 messages

does modified? GET or HEAD?

 [1/3] from: hallvard:ystad:helpinhand at: 14-Apr-2003 13:53


Hi

Does anyone know if 'modified? reads the whole URL with a http GET command, or does it simply use a HEAD command? Does anyone have some code for getting the date out of a port! object?

Say I'd like to fetch a URL only if it's been modified since the last time I was there. Is there a way to check without having to read the URL twice? E.g. this would read the URL twice, if the site has changed:

    if lastvisit < modified? some-url [ read some-url ]

Thanks,
~H

 [2/3] from: andreas:bolka:gmx at: 14-Apr-2003 18:42


Monday, April 14, 2003, 1:53:15 PM, Hallvard wrote:
> Does anyone know if 'modified? reads the whole URL with a http GET
> command, or does it simply use a HEAD command?

Short version: it uses HTTP GET. You can see that using REBOL's own 'trace function:

    >> trace/net on
    >> modified? http://www.rebol.com
    URL Parse: none none www.rebol.com none none none
    Net-log: ["Opening" "tcp" "for" "HTTP"]
    connecting to: www.rebol.com
    Net-log: {GET / HTTP/1.0
    Accept: */*
    Connection: close
    User-Agent: REBOL Core 2.5.5.3.1
    Host: www.rebol.com

    }
    Net-log: "HTTP/1.1 200 OK"
    == 7-Apr-2003/23:11:33

'modified? actually calls 'info? (try source modified?) which in turn calls 'query (try source info?).

> Does anyone have some code for getting the date out of a port!
> object?

    >> p: open http://rebol.com
    >> probe p/locals/headers
    make object! [
        Date: "Mon, 14 Apr 2003 15:58:05 GMT"
        Server: "Apache/1.3.26 (Unix) FrontPage/5.0.2.2623"
        Last-Modified: "Mon, 07/Apr/2003/23:11:33/+GMT"
        Accept-Ranges: "bytes"
        Content-Encoding: none
        Content-Type: "text/html"
        Content-Length: "11998"
        Location: none
        Expires: none
        Referer: none
        Connection: "close"
        Authorization: none
        ETag: {"1259b8-2ede-3e9205a5"}
        content: ""
    ]

So use 'open instead of 'read and you can access the Last-Modified date and the ETag, e.g. with p/locals/headers/Last-Modified.

> Say I'd like to fetch a URL only if it's been modified since the
> last time I was there. Is there a way to check without having to
> read the URL twice? E.g. this would read the URL twice, if the site
> has changed:

The best way to accomplish that is probably to use HTTP's own mechanism for it, referred to as "conditional GET". Preferably, conditional GETs are done using "ETags". An example:

    ; first we need to get the current ETag (it plays the role of "lastvisit")
    p: open http://www.rebol.com/
    etag: copy p/locals/headers/ETag
    ; == {"1259b8-2ede-3e9205a5"}
    close p

    ; then, subsequently, we do conditional GETs:
    p: open/custom http://www.rebol.com/ compose/deep [
        header [ If-None-Match: (etag) ]
    ]

The problem is that REBOL doesn't give you the HTTP response code (you would usually check for HTTP response code 304 to know whether the resource changed or not). A good workaround is to simply check the ETag value again:

    print either etag = p/locals/headers/ETag [ "not changed" ] [ "changed" ]
    ; not changed

A conditional GET can be done using the Last-Modified/If-Modified-Since headers as well. But as REBOL mangles the Last-Modified value, you'd have to clean the value first.

To see a bit more about what happens at HTTP level, try the above code with trace/net turned on, use an HTTP/TCP sniffer, or see the examples in Simon Fell's "BDG to Etags" - http://www.pocketsoap.com/weblog/stories/2002/05/0015.html

--
Best regards,
Andreas
mailto:[andreas--bolka--gmx--net]

 [3/3] from: hallvard::ystad::helpinhand::com at: 14-Apr-2003 20:43


Thanks, Andreas, your response was very clear. I wasn't aware of the If-None-Match header field. Cool. Guess it doesn't work with HTTP/1.0 servers, but who cares.

~H

Dixit Andreas Bolka (18.42 14.04.2003):