Problem with newsgroup message count
[1/4] from: youpi::technologies::wanadoo::fr at: 1-Sep-2000 19:34
Hello,

When requesting the number of messages for newsgroup fr.misc.finance we got 1000000, although there are fewer than 3000 messages. How can we cope with that?

Thanks.

----- Original Message -----
From: <[hopeless--eircom--net]>
To: <[list--rebol--com]>
Sent: Friday, 1 September 2000 10:17
Subject: [REBOL] REBOL and Java integration Re:(2)
[2/4] from: kevin:sunshinecable at: 1-Sep-2000 13:56
> When requesting number of messages for newsgroup fr.misc.finance we got > 1000000 ! although there are less than 3000 messages. How can we cope with > that ?
The built-in nntp protocol (nntp://) appears to always return 1000000 as the length of the group.

The external news:// protocol ('do %nntp.r) has a count command which looks like it correctly obtains article counters from the news server. However, I can't figure out the proper way to issue the count command via news:// ...

I'm copying this to Jeff directly (as the all-knowing master of the news:// protocol) to see if he can explain it.

Cheers,
Kev

--------------------------------------------------------------------------
Kevin McKinnon, Sr. News Administrator [news--sunshinecable--com]
Sunshine Communications Cable Internet http://news.sunshinecable.com
[cronkite.sunshinecable.com INN2.3 Incoming Curr:14.0Mbps,Peak:22.7Mbps]
[3/4] from: jeff:rebol at: 1-Sep-2000 17:07
> > When requesting number of messages for newsgroup > > fr.misc.finance we got 1000000 ! although there are less
<<quoted lines omitted: 9>>> it. > Cheers, Kev
yo:

The problem is news servers lie. One of the aspects of the built-in nntp handler is that it is supposed to provide a series metaphor for a newsgroup, but we can't be assured of total ordering of the messages there (many may be missing from the total that is returned). This is as the nntp spec goes, since articles may be expiring at the time you inquire. The count data returned from nntp servers isn't supposed to be reliable, so the series metaphor is kind of weak in this case (especially compared to pop:// for instance).

1000000 was just an arbitrary large number for the length of the port, since the real length is indeterminable. Do you think it would be better to have whatever the server said about the message count there?

Anyhow, here's a post I made a while back that talks about this issue and what someone might do to guarantee message ordering, at least with nntp.r:

--------------------------------------------

NNTP servers lie. They tell you a range of numbers of articles that MAY be there. They may well not be there, though. It's a funny thing. The only way to really get the true number of articles in a newsgroup is to get the headers for each article (which over a 22kb modem ain't always a good idea).

NNTP.r, as it is, also provides the optional NNTP "XHDR" command, which lets you quickly download just the subject lines (or any other given header field: from, to, keywords, etc.) of all the headers in the group. Having all the subject lines, you can then know for sure (unless some of those articles expire while you are reading) the count of articles in a newsgroup.

One of the things NNTP.r does when it connects to a news server is determine if it can do XHDR. Interactively you can ask an open news port what it can do by inserting [help] into the port. Here's how you can determine non-interactively if the server you are talking to has XHDR:

    np: open news://news.somewhere
    found? find np/handler/commands 'xhdr

Using XHDR you can do something like the following:

    np: open news://news.somewhere
    ; count gives back the total plus the first and last article numbers
    set [total start end] insert np [count from "alt.test"]
    ; ask for the Message-ID header of every article in that range
    x-mids: rejoin ["Message-ID " start "-" end]
    message-ids: insert np [xhdr x-mids from "alt.test" please] ;- please is optional :)

The XHDR command gives you back a big string in a block. Yes, that is a little odd (XHDR was added at the last minute just to help aspiring news bot writers, if you want to know!). The string you will find in the block has the number of each article followed by the message id. It's trivial to parse the string, and it'll allow you to ask for individual articles by their message-ids in order. There are examples of getting articles by message id in the NNTP.r howto.

Using XHDR, you'll have an efficient way of obtaining true newsgroup ordering with no gaps (for news servers that support the feature ... If they don't, well, you probably have to fall back on getting all the message headers in a group if you want to ensure total ordering... that's what Forté Free Agent does!!).

Boy, nntp.r really is in need of an update. Looking at NNTP.r, it's doing all sorts of dialecting things by hand that would be a lot easier to do today in modern REBOL with things like parse block. The whole thing could be shrunk by at least a half. So many things rattling around on our overly loaded wagon trains...

[still need to get to that... arg.. too much stuff.. head caving in... must.. alert the others... desk jobs sometimes aren't "cushy".. but can be fun anyhow]
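For anyone who wants to see the "parse the string" step spelled out, here is a rough sketch in Python rather than REBOL (it does not use nntp.r at all). It assumes the XHDR text is one article per line in the form "<number> <message-id>", possibly ending with the NNTP "." terminator; the function name and the sample data are made up for illustration.

```python
def parse_xhdr(raw):
    """Turn XHDR output into a sorted list of (article_number, message_id)."""
    pairs = []
    for line in raw.splitlines():
        line = line.strip()
        if not line or line == ".":
            continue  # skip blank lines and the NNTP end-of-data marker
        number, _, message_id = line.partition(" ")
        if not number.isdigit() or not message_id:
            continue  # ignore anything that is not "<number> <message-id>"
        pairs.append((int(number), message_id.strip()))
    pairs.sort()  # article-number order gives the group's ordering
    return pairs


# Hypothetical sample: three articles, with gaps where others have expired.
sample = (
    "3001 <a1@example.invalid>\r\n"
    "3004 <b2@example.invalid>\r\n"
    "3002 <c3@example.invalid>\r\n"
    ".\r\n"
)
articles = parse_xhdr(sample)
print(len(articles))               # articles actually listed by the server
print([n for n, _ in articles])    # their numbers, in order
```

The length of the parsed list is the article count as of the moment the server answered, and walking the list in order gives the message-ids to fetch one by one.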
[4/4] from: kevin:sunshinecable at: 1-Sep-2000 23:24
On Fri, 1 Sep 2000 [jeff--rebol--net] wrote:
> > > When requesting number of messages for newsgroup > > > fr.misc.finance we got 1000000 ! although there are less
<<quoted lines omitted: 4>>> > > The problem is news servers lie.
<news server admin mode> *My* news server NEVER lies... <wink> </off>
> inquire. The count data returned from nntp servers isn't > supposed to be reliable, so the series metaphor is kind of
<<quoted lines omitted: 3>>> think it would be better to have whatever the server said > about the message count there?
Well, from the news server's perspective, the high and low counts returned (as demonstrated in the sample code Jeff attached) will be reasonably accurate as of the time of the request. Some of the older articles may have already vanished, particularly if CNFS storage is being used (cyclic buffers, which is what I run on my INN server), and of course new ones could arrive moments after the request.

If you're going to return an arbitrary number though, I'd suggest taking the (high - low) count and returning that... at least it will be *closer* to the actual number of articles on the server. There will never be *more* than (high - low), and once you factor in cancelled and expired messages there will usually be fewer. Having a smaller number than 1000000 should also improve things if reading messages in a loop.

(FWIW, my news server kept about 1000000 new articles yesterday across *all* groups I carry -- and threw away about 1000000 more from groups that I don't.)

Cheers,
Kev
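A rough sketch of that suggestion, in Python and independent of nntp.r: it assumes the server's GROUP reply has the standard form "211 <estimate> <low> <high> <group>". The function name and the sample reply are hypothetical. Note that the sketch counts the range inclusively, so it returns high - low + 1, one more than the (high - low) figure mentioned above.

```python
def group_bounds(status_line):
    """Parse a GROUP reply of the form '211 <estimate> <low> <high> <name>'."""
    parts = status_line.split()
    if len(parts) < 5 or parts[0] != "211":
        raise ValueError("not a successful GROUP reply: %r" % status_line)
    estimate, low, high = (int(p) for p in parts[1:4])
    # Article numbers run low..high inclusive; clamp at zero in case an
    # empty group reports high < low.
    span = max(high - low + 1, 0)
    return {"estimate": estimate, "low": low, "high": high, "span": span}


# Hypothetical reply line, made up for illustration.
reply = "211 2874 51200 54150 fr.misc.finance"
info = group_bounds(reply)
print(info["span"])      # upper bound on articles that can be present
print(info["estimate"])  # the server's own (possibly inaccurate) estimate
```

Either value is a far better port length than a fixed 1000000: the span is a hard ceiling, and the estimate is whatever the server believes at the time of the request.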
- Quoted lines have been omitted from some messages.