Problem with newsgroup message count
[1/4] from: youpi::technologies::wanadoo::fr at: 1-Sep-2000 19:34
Hello,
When requesting the number of messages for the newsgroup fr.misc.finance, we got
1000000, although there are fewer than 3000 messages. How can we cope with
that?
Thanks.
----- Original Message -----
From: <[hopeless--eircom--net]>
To: <[list--rebol--com]>
Sent: Friday, 1 September 2000 10:17
Subject: [REBOL] REBOL and Java integration Re:(2)
[2/4] from: kevin:sunshinecable at: 1-Sep-2000 13:56
> When requesting number of messages for newsgroup fr.misc.finance we got
> 1000000 ! although there are less than 3000 messages. How can we cope with
> that ?
The built-in nntp protocol (nntp://) appears to always return 1000000
as the length of the group.
The external news:// protocol (loaded with do %nntp.r) has a count command which
looks like it correctly obtains the article counters from the news
server. However, I can't figure out the proper way to issue the
count command via news://. I'm copying this to Jeff directly (as
the all-knowing master of the news:// protocol) to see if he can
explain it.
Cheers,
Kev
--------------------------------------------------------------------------
Kevin McKinnon, Sr. News Administrator [news--sunshinecable--com]
Sunshine Communications Cable Internet http://news.sunshinecable.com
[cronkite.sunshinecable.com INN2.3 Incoming Curr:14.0Mbps,Peak:22.7Mbps]
[3/4] from: jeff:rebol at: 1-Sep-2000 17:07
> > When requesting number of messages for newsgroup
> > fr.misc.finance we got 1000000 ! although there are less
<<quoted lines omitted: 9>>
> it.
> Cheers, Kev
yo:
The problem is that news servers lie. The built-in nntp
handler is supposed to provide a series metaphor for a
newsgroup, but we can't be assured of total ordering of the
messages there (many may be missing from the total that is
returned). That is simply how the NNTP spec works, since
articles may be expiring at the time you inquire. The count
data returned from NNTP servers isn't supposed to be
reliable, so the series metaphor is rather weak in this case
(especially compared to pop://, for instance). 1000000 was
just an arbitrary large number for the length of the port,
since the real length is indeterminable. Do you think it
would be better to report whatever the server said about the
message count there?
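For reference, the NNTP GROUP command (RFC 977, later RFC 3977) answers with a single line of the form "211 count low high group", and the spec itself calls the count an estimate. The sketch below only parses that line; the function name and the sample reply are hypothetical and do not come from nntp.r or from any real server.

```python
def parse_group_response(line: str) -> dict:
    """Split an NNTP GROUP reply of the form '211 count low high group'."""
    parts = line.split()
    if len(parts) < 5 or parts[0] != "211":
        raise ValueError(f"unexpected GROUP reply: {line!r}")
    # count is only the server's estimate; low/high are the first/last article numbers
    estimate, low, high = (int(p) for p in parts[1:4])
    return {"estimate": estimate, "low": low, "high": high, "group": parts[4]}


# Hypothetical reply line, for illustration only.
reply = parse_group_response("211 2874 10011 12884 fr.misc.finance")
print(reply["estimate"], reply["low"], reply["high"])  # 2874 10011 12884
```

Whatever a handler decides to report as the port length, the estimate is only a guess, and gaps inside the low..high range are normal.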
Anyhow, here's a post I made a while back that talks about
this issue and what someone might do to guarantee message
ordering, at least with nntp.r:
--------------------------------------------
NNTP servers lie. They tell you a range of article numbers
that MAY be there. The articles may well not be there,
though. It's a funny thing.
The only way to really get the true number of articles in a
newsgroup is to get the headers for each article (which over
a 22kb modem ain't always a good idea).
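The brute-force approach described above amounts to probing every number in the advertised range and counting the ones that actually exist. In this sketch, `article_exists` is a hypothetical callback standing in for one per-article request (HEAD, or the lighter STAT command); the "server" is simulated with a set of made-up article numbers.

```python
from typing import Callable


def count_existing_articles(low: int, high: int,
                            article_exists: Callable[[int], bool]) -> int:
    """Count the articles that really exist in the inclusive range [low, high].

    Each call to article_exists stands for one network round trip,
    which is why this is slow over a modem.
    """
    return sum(1 for n in range(low, high + 1) if article_exists(n))


# Simulated group: numbers 10..19 are advertised, but several have expired.
present = {10, 11, 14, 15, 19}
print(count_existing_articles(10, 19, lambda n: n in present))  # 5
```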
NNTP.r, as it is, also provides the optional NNTP "XHDR"
command, which lets you quickly download just the subject
lines (or any other given header field: From, To, Keywords,
etc.) of all the articles in the group. Having all the
subject lines, you can then know for sure (unless some of
those articles expire while you are reading) how many
articles are in the newsgroup.
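Per the XHDR description in RFC 2980, the reply is a multi-line response with one line per article ("number value"), terminated by a line holding a single ".". Counting articles therefore means counting body lines. This sketch assumes the lines have already been read off the socket with their CRLF stripped; the function names and sample subjects are hypothetical.

```python
def read_multiline(lines):
    """Yield the body lines of an NNTP multi-line response.

    Stops at the lone "." terminator and undoes dot-stuffing:
    a data line starting with "." was sent with an extra leading ".".
    """
    for line in lines:
        if line == ".":
            return
        yield line[1:] if line.startswith(".") else line


def count_from_xhdr(lines):
    """Count the articles listed in an XHDR reply body (one line per article)."""
    return sum(1 for _ in read_multiline(lines))


body = ["10011 Hello", "10014 Re: Hello", "10020 Question", "."]
print(count_from_xhdr(body))  # 3
```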
One of the things nntp.r does when it connects to a news
server is determine whether it can do XHDR. Interactively,
you can ask an open news port what it can do by inserting
[help] into the port. Here's how you can determine
non-interactively whether the server you are talking to has
XHDR:
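do %nntp.r    ; load the news:// handler (assumes nntp.r is in the current directory)
np: open news://news.somewhere
found? find np/handler/commands 'xhdr    ; true if the server offers XHDR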
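Using XHDR, you can do something like the following:
np: open news://news.somewhere
; COUNT sets total plus the start and end article numbers the server reports
set [total start end] insert np [count from "alt.test"]
; XHDR argument: the header name, then the article range start-end
x-mids: rejoin ["Message-ID " start "-" end]
message-ids: insert np [xhdr x-mids from "alt.test" please]
;- please is optional :)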
The XHDR command gives you back a big string in a block.
Yes, that is a little odd (XHDR was added at the last
minute just to help aspiring news bot writers, if you want
to know!). The string in the block has the number of each
article followed by its message ID. It's trivial to parse
the string, and it'll allow you to ask for individual
articles by their message IDs in order. There are examples
of getting articles by message ID in the NNTP.r howto.
Using XHDR, you'll have an efficient way of obtaining true
newsgroup ordering with no gaps (for news servers that
support the feature... if they don't, well, you probably
have to fall back on getting all the message headers in a
group if you want to ensure total ordering... that's what
Forté Free Agent does!!).
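Parsing that "number message-ID" string into an ordered list is indeed short. This Python sketch works at the protocol level rather than through nntp.r: it builds the raw XHDR command line for an inclusive range and turns the reply body into (number, message-ID) pairs sorted by article number. The function names and message IDs are hypothetical, and each reply line is assumed to follow the "number value" shape from RFC 2980.

```python
import re
from typing import List, Tuple

# Assumed reply line shape: "<article number> <message-id>"
_LINE = re.compile(r"^(\d+)\s+(<[^>]+>)\s*$")


def xhdr_command(header: str, start: int, end: int) -> str:
    """Build the raw NNTP command line for XHDR over an inclusive article range."""
    return f"XHDR {header} {start}-{end}\r\n"


def ordered_message_ids(body: str) -> List[Tuple[int, str]]:
    """Parse an XHDR Message-ID reply body into (number, message-id) pairs in article order."""
    pairs = []
    for line in body.splitlines():
        m = _LINE.match(line)
        if m:  # skips the "." terminator and any line that does not fit the pattern
            pairs.append((int(m.group(1)), m.group(2)))
    return sorted(pairs)


body = "10014 <b@example.invalid>\r\n10011 <a@example.invalid>\r\n.\r\n"
print(xhdr_command("Message-ID", 10011, 10014), end="")
print(ordered_message_ids(body))
```

The length of the returned list is the true article count at the moment of the request, and walking it in order gives gap-free access by message ID.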
Boy, nntp.r really is in need of an update. Looking at
NNTP.r, it's doing all sorts of dialecting things by hand
that would be a lot easier to do today in modern REBOL with
things like block parsing. The whole thing could be shrunk
by at least half. So many things rattling around on our
overly loaded wagon trains...
[still need to get to that... arg.. too much stuff.. head
caving in... must.. alert the others... desk jobs sometimes
aren't "cushy".. but can be fun anyhow]
[4/4] from: kevin:sunshinecable at: 1-Sep-2000 23:24
On Fri, 1 Sep 2000 [jeff--rebol--net] wrote:
> > > When requesting number of messages for newsgroup
> > > fr.misc.finance we got 1000000 ! although there are less
<<quoted lines omitted: 4>>
> >
> The problem is news servers lie.
<news server admin mode>
*My* news server NEVER lies... <wink>
</off>
> inquire. The count data returned from nntp servers isn't
> supposed to be reliable, so the series metaphor is kind of
<<quoted lines omitted: 3>>
> think it would be better to have whatever the server said
> about the message count there?
Well, from the news server's perspective, the high and low article numbers returned
(as demonstrated in the sample code Jeff attached) will be reasonably
accurate as of the time of the request. Some of the older articles may
have already vanished, particularly if CNFS storage is being used (cyclic
buffers, which is what I run on my INN server), and of course new ones
could arrive moments after the request.
If you're going to return an arbitrary number, though, I'd suggest taking
the (high - low + 1) count and returning that... at least it will be *closer*
to the actual number of articles on the server. There will never be
*more* than (high - low + 1), and once cancelled and expired messages are
factored in, there will usually be fewer. Having a smaller number than
1000000 should also improve things when reading messages in a loop.
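Kev's suggestion, sketched in Python: derive the port length from the GROUP reply instead of using a fixed 1000000. Because article numbers are unique within the inclusive low..high range, at most high - low + 1 articles can exist there. RFC 3977 allows an empty group to be reported as "0 0 0" or with high below low, so both cases are treated as empty. The function name is hypothetical and is not part of the built-in nntp handler.

```python
def port_length(estimate: int, low: int, high: int) -> int:
    """Upper bound on the number of articles, taken from the GROUP water marks.

    A zero estimate or high < low is treated as an empty group; otherwise at
    most high - low + 1 articles can exist (usually fewer, due to expiry and cancels).
    """
    if estimate == 0 or high < low:
        return 0
    return high - low + 1


print(port_length(2874, 10011, 12884))  # 2874
print(port_length(0, 0, 0))             # 0
```

A loop over a port of this length still has to tolerate missing numbers, but it stops near the real end of the group instead of running on toward 1000000.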
(FWIW, my news server kept about 1000000 new articles yesterday across
*all* groups I carry -- and threw away about 1000000 more from groups
that I don't.)
Cheers,
Kev
Notes
- Quoted lines have been omitted from some messages.