Need some URL help
[1/14] from: hallvard::ystad::gmail::com at: 23-Jan-2006 13:27
Hi list,
I have a problem with downloading this URL:
http://www.linuxtelephony.org/article.cgi?i=400&r=0
It seems linuxtelephony.org returns something in the HTTP headers that I
cannot read successfully with rebol. Here's my code (copy and paste into your
rebol shell, but watch out for line breaks):
REBOL []
; This is the URL: http://www.linuxtelephony.org/article.cgi?i=400&r=0
insert-this: "GET /article.cgi?i=400&r=0 HTTP/1.0^/Host: www.linuxtelephony.org^/"
port: open/lines [
    scheme: 'tcp
    host: "www.linuxtelephony.org"
    port-id: 80
]
print "[We send:]"
print form insert-this
insert port insert-this
header: make block! 10
; Read header lines until the blank line that ends the header section
while [ not empty? reply: first port ] [
    if none? reply [print "Break!" break]
    parse reply [
        [copy name [thru #":"] (name: load name) | copy name [to end] (print form name)]
        [copy value to newline | copy value to end] (if value [append header reduce [name value]])
    ]
]
; Note: the condition below is a literal block, which always counts as true,
; so the first branch (copy port) always runs
either [ not empty? content: first port ] [content: copy port] [print "EMPTY!"]
close port
print reduce [header content]
halt
===== End code
From other servers, this code is OK, but this linuxtelephony.org URL always
crashes on me. Does anyone have a clue?
Thanks,
HY
PS. The server seems to run Apache with PHP4:
Server: Apache/1.3.33 (Unix) PHP/4.3.11-dev
X-Accelerated-By: PHPA/1.3.3r2
X-Powered-By: PHP/4.3.11-dev
Could this be a PHP bug?
[2/14] from: SunandaDH:aol at: 23-Jan-2006 7:51
Hallvard:
> I have a problem with downloading this URL:
> http://www.linuxtelephony.org/article.cgi?i=400&r=0
Your code works fine for me -- I can read the page with it.
>> do read clipboard://
[We send:]
GET /article.cgi?i=400&r=0 HTTP/1.0
Host:www.linuxtelephony.org
HTTP/1.1 200 OK
Mon, 23 Jan 2006 12:48:45 GMT Apache/1.3.33 (Unix) PHP/4.3.11-dev
PHP/4.3.11-dev PHPA/1.3.3r2 close text/html <!doctype h
tml public "-//w3c//dtd html 4.0 transitional//en"> <head>
<title>Linuxtelephony.org - Linuxtelephony.org</title> <me
ta http-equiv="Content-Type" content="text/...........
The headers look okay to me:
HTTP/1.1 200 OK
Date: Mon, 23 Jan 2006 12:45:50 GMT
Server: Apache/1.3.33 (Unix) PHP/4.3.11-dev
X-Powered-By: PHP/4.3.11-dev
X-Accelerated-By: PHPA/1.3.3r2
Connection: close
Content-Type: tex
Could it be a firewall at your end?
Sunanda.
[3/14] from: hallvard:ystad:gma:il at: 23-Jan-2006 14:47
Argh! I just hate it when errors aren't repeatable.
But the behaviour is consistent on my side: I get a 302 redirect to some URL
with a session ID in it, which will (probably set a cookie and then)
redirect me back. This works with a browser, but not with rebol...
Darn.
Actually, I know that rebol halts at the line
while [ not empty? reply: first port ] [
and reports an out-of-range or past-end error. When I probe 'port, I
cannot see anything irregular at that point.
Thanks for trying anyway.
HY
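A minimal sketch of one way to keep the header loop from stopping with that script error, assuming the failure comes from `first` being asked for a line when none is left. In REBOL 2, `first` raises "Out of range or past end" at the tail of a series, so the sketch uses a plain block as a stand-in for the line-mode port. The block `source` and the helper `read-line` are hypothetical and not part of the original script.

```rebol
REBOL []

; A block stands in for the line-mode port (assumption for illustration):
; it yields one status line and then runs dry, as on the 302 responses.
source: ["HTTP/1.1 302 Moved Temporarily"]

; Hypothetical helper: return the next line and advance the stand-in source.
read-line: func [/local line] [
    line: first source      ; raises a script error when source is at its tail
    source: next source
    line
]

header-lines: make block! 10
while [
    all [
        not error? try [reply: read-line]   ; stop quietly if the read fails
        string? reply                       ; stop if the read gave none
        not empty? reply                    ; stop at the blank line
    ]
] [
    append header-lines reply
]
print ["Collected" length? header-lines "line(s) without a script error"]
```

In the original script the same guard would wrap `reply: first port`. It only hides the symptom; it does not explain why the port runs out of lines after the status line.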
[4/14] from: SunandaDH::aol::com at: 24-Jan-2006 6:16
Hallvard:
> But the behaviour is consistent on my side: I get a 302 redirect to some URL
> with a session ID in it, which will (probably set a cookie and then)
> redirect me back. This works with a browser, but not with rebol...
Weird.....I got a 302 redirect one time I tried, but not any of the other
times.
It's possible the site is sniffing user agents, and redirecting ones it
doesn't like to a different (simpler?) version of the site. So you could try
playing round with:
system/schemes/http/user-agent: "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
see if different user agent strings make a difference.
Another possibility to explain the randomness is that they are doing some
sort of load balancing/redirecting based on IP address. If so, you could try
coming at them via an anonymizing website.
Either way, good luck!
Sunanda.
[5/14] from: hallvard:ystad::gmail at: 24-Jan-2006 14:42
Thanks for the tip, but I have now tried with different headers (I didn't even
include the user-agent header on my first attempts). I have experimented with
both HTTP/1.1 and HTTP/1.0, keep-alive and closed connections, with and
without specifying the host, with different user-agents and even from
different IPs... And nothing seems to make any difference. I now get the 302
redirect on all attempts.
The rebol console looks like this:
[We send:]
GET /article.cgi?i=400&r=0 HTTP/1.1
Host: www.linuxtelephony.org
connection: close
user-agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)
HTTP/1.1 302 Moved Temporarily
** Script Error: Out of range or past end
** Near: not empty? reply: first port
>>
When I probe the port, I see this:
state: make object! [
flags: 791107
misc: [144 [] 0]
tail: 0
num: 1
with: "^M^/"
custom: none
index: 0
func: 3
fpos: 0
inBuffer: "^/"
...
Could it be that the server says its newlines are "^M^/", but the last
newline it actually sends is only "^/"?
I put everything in a file: http://babelserver.org/strange.txt
if anyone would like to have a look.
HY
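To make the line-ending hypothesis concrete, here is a small sketch of header parsing that accepts both "^M^/" and a bare "^/" as the end of a line. It works on a made-up response string, not on the server's real output: the text in `raw`, the example.com address, and the rule names `eol`, `not-eol` and `name-char` are assumptions for illustration only.

```rebol
REBOL []

; Hypothetical response text: CRLF on the first lines, bare LF at the end.
raw: "HTTP/1.1 302 Moved Temporarily^M^/Location: http://example.com/^M^/Connection: close^/^/"

eol: [opt #"^M" #"^/"]                  ; accept CRLF or a bare LF
not-eol: complement charset "^M^/"      ; any character that is not CR or LF
name-char: complement charset ":^M^/"   ; characters allowed in a header name

status: none
header: make block! 10

ok: parse/all raw [
    copy status some not-eol eol
    any [
        copy name some name-char #":" any #" "
        copy value any not-eol eol
        (append header reduce [name value])
    ]
    eol            ; the blank line that ends the header section
    to end
]

print ["Parsed cleanly:" ok]
print ["Status:" status]
foreach [name value] header [print [name "=>" value]]
```

Whether mixed line endings are what actually trips up the port is not established in this thread; the sketch only shows that a parser can be written so that it does not depend on the answer.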
[6/14] from: hallvard::ystad::gmail::com at: 26-Jan-2006 13:30
OK, I'm a little bit closer to the problem now.
I tried to download the same URL using 'read, and rebol hung. With trace
turned on, I saw an endless loop in the parsing of the HTTP headers, so there
definitely is something spooky going on there. I put the lot (trace produces
kilometres of output) at this URL: http://www.babelserver.org/strange.html
I believe this is a bug. I'd like to look closer into it myself, but since
'read is native, I cannot peek into it, can I? And from the 'port object
(open/lines), I don't know how to get more information than what I have at
the above-mentioned URL.
What should I do? Should I post it directly to Rambo?
HY
[7/14] from: compkarori::gmail at: 27-Jan-2006 20:37
read works for me on this page
http://www.linuxtelephony.org/article.cgi?i=400&r=0
On 1/24/06, Hallvard Ystad <hallvard.ystad-gmail.com> wrote:
> Hi list,
>
> I have a problem with downloading this URL:
> http://www.linuxtelephony.org/article.cgi?i=400&r=0
>
--
Graham Chiu
http://www.compkarori.com/emr/
[8/14] from: hallvard:ystad:gm:ail at: 27-Jan-2006 21:40
Sure. The server behaves as it pleases, it seems. The error appears when you
get a 302 redirect HTTP response header, which one occasionally does;
Sunanda seems to have gotten it on his third attempt. I get it *most* of the
time, and when I get a 200 OK header, everything works just fine for me too.
The server that responds to http://www.viahardware.com/ exhibits the same
behaviour. (Exhibits? Is that an English way of speaking? Or does it smell
like new-mowed grass?)
HY
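Since the failure only shows up on 302 responses, it can help to tell the two cases apart from the status line alone, before any further header lines are read. The following is a sketch under that assumption; the function names `status-code` and `redirect?` are hypothetical, and the list of redirect codes is a choice made for this example.

```rebol
REBOL []

digit: charset "0123456789"

; Return the numeric status code from an HTTP status line, or none.
status-code: func [line [string!] /local code] [
    either parse/all line ["HTTP/" to " " skip copy code 3 digit to end] [
        to-integer code
    ] [
        none
    ]
]

; Return true when the status line announces a redirect, otherwise none.
redirect?: func [line [string!] /local code] [
    code: status-code line
    all [code  find [301 302 303 307] code  true]
]

print status-code "HTTP/1.1 302 Moved Temporarily"
print redirect? "HTTP/1.1 302 Moved Temporarily"
print redirect? "HTTP/1.1 200 OK"
```

A script could use such a check to log every 302 it meets, which would make it easier to collect a repeatable test case.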
[9/14] from: compkarori::gmail::com at: 28-Jan-2006 0:52
If they redirect only sometimes, it's probably some load balancing they're
doing.
On 1/28/06, Hallvard Ystad <hallvard.ystad-gmail.com> wrote:
> Sure. The server behaves as it pleases, it seems. The error appears when
> you
<<quoted lines omitted: 35>>
--
Graham Chiu
http://www.compkarori.com/emr/
[10/14] from: hallvard:ystad:gmai:l at: 27-Jan-2006 21:58
Still, the problem isn't the forwarding, but that rebol isn't able to read
the response. It hangs, or actually, as the trace log shows, goes into an
infinite loop. With read/lines on an open 'port, it crashes.
HY
[11/14] from: anton::wilddsl::net::au at: 29-Jan-2006 1:38
I got it straight away:
>> trace/net on forever [print "trying..." read http://www.linuxtelephony.org/article.cgi?i=400&r=0]
trying...
URL Parse: none none www.linuxtelephony.org none none article.cgi?i=400&r=0
Net-log: ["Opening" "tcp" "for" "HTTP"]
connecting to: www.linuxtelephony.org
Net-log: {GET /article.cgi?i=400&r=0 HTTP/1.0
Accept: */*
Connection: close
User-Agent: REBOL View 1.3.2.3.1
Host: www.linuxtelephony.org
}
Net-log: "HTTP/1.1 302 Moved Temporarily"
Anton.
> Sure. The server behaves as it pleases, it seems. The error
> appears when you
<<quoted lines omitted: 7>>
> like new-mowed grass?)
> HY
That use of "exhibits" is fine, Hallvard. :)
[12/14] from: anton::wilddsl::net::au at: 29-Jan-2006 10:54
Hi Hallvard,
I forgot to mention, that after the final Net-log line below,
rebol hung for a long time, until this was printed:
** Script Error: Not enough memory
** Where: append
** Near: insert tail series :value
>>
Anton.
[13/14] from: hallvard::ystad::gmail::com at: 29-Jan-2006 20:41
On 29/01/06, Anton Rolls <anton-wilddsl.net.au> wrote:
> Hi Hallvard,
> I forgot to mention, that after the final Net-log line below,
<<quoted lines omitted: 3>>
> ** Near: insert tail series :value
> >>
I believe you must have gone through the same endless loop as I, except I
didn't have the patience to wait for the memory error.
I'll submit to rambo.
HY
[14/14] from: hallvard::ystad::gmail::com at: 30-Jan-2006 20:50
On 29/01/06, Hallvard Ystad <hallvard.ystad-gmail.com> wrote:
> I'll submit to rambo.
>
Done
HY
Notes
- Quoted lines have been omitted from some messages.