UHURU/RSL/? summary

[1/7] from: chris::starforge::co::uk at: 25-May-2001 14:24

Hi,

My apologies for the length, but IMO it's important this is done. Before we can go into much more detail with this system, I think we need to clear up all the discussion so far and summarise the key ideas. Otherwise we may start to lose new contributors to the thread or confuse everyone. I'll attempt to distil the key points from the discussion so far; please make corrections and additions or raise objections where you see fit.

I will use "the system" here rather than one of the acronyms (mainly because I type it faster than UHURU ;)). I will try to describe things in a more "task based" way than the concepts discussed in Joel's introduction (reference 1 below), and I've fleshed some parts out with additional descriptions of my own. Feel free to rip them to pieces :)

Intro: Intent of the system

The aim of this system is to provide:

- A centralised, keyword-searchable repository of packages (from individual functions through to applications). See part 1.
- A REBOL script or package which can obtain specific packages from the repository, based on package name (or keywords?). This must handle dependencies automagically. See part 2 for more on this.
- A standard for package implementation, including a specification of required metadata and coding and documentation standards. This is discussed in part 3.

Part 1. The repository.

Some would call this a Library (which is the less "hackeresque" term). The core of this system is a hierarchical structure of packages which can either be browsed or searched via a keyword search engine on the web. In this respect the repository would be similar to the script libraries on rebol.com or rebol.org, with a search engine added for assisted lookups. A possible refinement would be that when selecting a package, a summary is shown rather than the code itself.
For example, selecting delim-ascii-html would bring up:

    Author:         Joel Neely
    Latest version: 2.1
    Last update:    13:32, 25 May 2001
    Synopsis:       Delimited ASCII to HTML conversions
    Keywords:       ASCII HTML comma tab convert text conversion
    Dependencies:   delimited-ascii

    Version  Date          Source          Documentation
    1.0      1 May 2001    15k (mirrors)   40k (mirrors)
    ...
    2.1      25 May 2001   28k (mirrors)   64k (mirrors)

In this scenario, clicking on the author name would bring up all the scripts the author had written (profile as well?), clicking on one of the keywords would search for that keyword, and clicking on the dependencies would show the summary for that file. The table at the end lists the available versions: clicking on a size would download the file, and clicking on the word "mirrors" would bring up a list of mirrors of that file.

A key requirement of the repository is that it must be easily mirrored to ensure fast access from any location in the world. The problem with this is ensuring coherency across the mirrors. Mirroring would be best achieved via a script which obtains all new or updated packages from a central repository on an automatic basis (cron-triggered, for example). This means that a coordinating repository with a recent-additions facility is required.

Something to consider: REBOL has built-in compression. If the repository is purely CGI-powered, then it would be possible to store the files on the server in compressed form. When a file is requested, the script could decompress it and send it to the user's browser. This does not gain anything for the web version, but for the remote access program it would be a big help. Note that I do not mean "store all the files for a package in one archive". Dynamically processing that would be cumbersome, and it would cause even more problems when obtaining packages with the remote access program. I mean that each file should be stored separately on the server, but in a compressed form.

Part 2. The remote access program.
Using this package, developers can obtain packages from the repository without a web browser, and scripts can automagically obtain sources from the repository when they are setting up or checking for updates.

What it needs to be able to do:

- Given a package name, it must check a designated repository (see below) for that package. If found, and the version on the local machine is not up to date, the package is downloaded to the local machine.
- (Possible, but this could be tricky.) Given keywords, it will search the repository for suitable files. This is more likely to be a front-end to the search engine on the repository, which processes the search result web page to produce a list of matching packages to display in the console.
- All dependencies must be resolved during the download operation. If delim-ascii-html depends on delimited-ascii, then the script must detect this, check whether a suitable version is available locally and, if not, automatically obtain the delimited-ascii package.
- By default the script only downloads the source file(s). By specifying an additional argument, the script will download both the sources and the associated documentation.

As a tie-in to the compression discussion in part 1: if files on the repository are compressed, then the remote access program would be able to work considerably more efficiently than a web browser for obtaining sources. The compressed file could be downloaded and unpacked locally (unlike the web situation, where it would have to be unpacked on the server). Note that this somewhat reduces the need for "low comment" code.

Specifying a repository could be tricky. The most efficient way to do it is to choose the fastest, closest mirror. How this could be done automatically I have no idea (I don't even know if it could be done automatically). I would suggest that the central repository holds a list of mirrors giving their geographical locations; when the script is run for the first time, the user is asked to choose a local mirror.
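The dependency rule above amounts to a depth-first walk of the dependency graph, fetching each dependency before the package that needs it. A minimal sketch, written in Python purely for illustration (the proposal is for a REBOL script), assuming a hypothetical in-memory index that maps each package name to its latest version and its dependency names, and simplifying "suitable version" to "the repository's latest version":

```python
from typing import Dict, List, Optional, Set, Tuple

Version = Tuple[int, ...]
# Hypothetical repository index: package name -> (latest version, dependency names).
# A real client would build this from the repository's package metadata.
RepoIndex = Dict[str, Tuple[Version, List[str]]]


def plan_downloads(name: str, repo: RepoIndex, local: Dict[str, Version]) -> List[str]:
    """Return the packages to fetch, dependencies before the packages using them.

    A package is skipped when the locally installed version is already at the
    repository's latest version. This is a simplifying assumption: a real
    client might accept any compatible revision within the same version.
    """
    order: List[str] = []
    in_progress: Set[str] = set()   # packages on the current depth-first path
    finished: Set[str] = set()      # packages already resolved

    def visit(pkg: str) -> None:
        if pkg in finished:
            return
        if pkg in in_progress:
            raise ValueError(f"circular dependency involving {pkg}")
        if pkg not in repo:
            raise KeyError(f"{pkg} not found in repository")
        in_progress.add(pkg)
        latest, deps = repo[pkg]
        for dep in deps:
            visit(dep)              # resolve dependencies first
        in_progress.discard(pkg)
        finished.add(pkg)
        installed: Optional[Version] = local.get(pkg)
        if installed is None or installed < latest:
            order.append(pkg)       # missing or out of date locally

    visit(name)
    return order
```

For example, if the index lists delim-ascii-html 2.1 depending on delimited-ascii 2.0 and nothing is installed locally, the plan is delimited-ascii followed by delim-ascii-html; if delimited-ascii 2.0 is already installed, only delim-ascii-html is fetched. Tracking the packages "in progress" is what lets the walk report a circular dependency instead of looping forever.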
Choosing a mirror is something we need to clear up.

Part 3.

This mail is already too big, so I'll do this in a second one. Coding standards are always a nasty area, as they need to be flexible enough to allow programmers to keep their personal style while ensuring code can be used and understood by all. There are some big questions still left in here, particularly the issues of documentation content, extensions to the REBOL style guide and the specification of metadata required to make the system possible.

A. References

These are not in chronological order; I have rearranged them on a subject basis. I have also removed mails not directly related to the specification of the system.

1. http://www.escribe.com/internet/rebol/m9432.html - Joel Neely
   Introduction to the concepts behind the system.
2. http://www.escribe.com/internet/rebol/m9437.html - Chris Page
   Discussion of REBOL header v XML meta information (and dissing Joel's acronym ;)). Ladislav Mecir backs up Chris' view in m9468.html.
3. http://www.escribe.com/internet/rebol/m9447.html - Joel Neely
   Explanation of rationale behind XML suggestion, start of comments discussion.
4. http://www.escribe.com/internet/rebol/m9458.html - Chris Page
   Comments on XML v REBOL, followup to Joel's comment statement.
5. http://www.escribe.com/internet/rebol/m9461.html - GS Jones
   Comments on obtaining commented or uncommented versions of packages.
6. http://www.escribe.com/internet/rebol/m9463.html - Chris Page
   Expansion on m9461 with discussion of possible mechanisms for comment/documentation selection.
7. http://www.escribe.com/internet/rebol/m9467.html - Joel Neely
   Further discussion of the commenting/documentation distinction.
8. http://www.escribe.com/internet/rebol/m9422.html - GS Jones
   Discussion of versioning, packages, namespace and operation.
9. http://www.escribe.com/internet/rebol/m9449.html - Chris Page
   Elaboration of m9422, discussion of merits of REBOL over CVS.
10. http://www.escribe.com/internet/rebol/m9479.html - Volker Nitsch
   Discussing methods. Partially the reason I started this, as there seems to have been confusion about the final system which Volker was trying to address here.

[2/7] from: joel:neely:fedex at: 25-May-2001 9:13

Hi, Chris,

I just wanted to express a high-decibel

[ASCII-art banner: THANKS]

for your work in summarizing and organizing the material on this topic!

-jn-

Chris wrote:

> Hi, > My apologies for the length, but IMO it's important

<<quoted lines omitted: 3>>

> far and summarise the key ideas otherwise we may start to > lose new contributors to the thread or confuse...

------------------------------------------------------------ Programming languages: compact, powerful, simple ... Pick any two! joel'dot'neely'at'fedex'dot'com

[3/7] from: chris:starforge at: 25-May-2001 17:47

#25-May-01# Message from *Joel Neely*: Hi Joel,

> Hi, Chris, > I just wanted to express a high-decibel <snip> > For your work in summarizing and organizing the material on > this topic!

No problem. I've worked on internet-based collaborative projects before (and I teach DL training courses), so I know from bitter experience how quickly confusion can grind a project to a halt if concise summaries aren't done at milestones. I will post discussion material for part 3 later. IMO this contains the Big Issues we need to address: the server and remote access scripts may not be trivial, but we know how to do them. All of it will be for nothing if we don't get the part 3 stuff right.

Chris
--
New sig in the works
Explorer 2260, Designer and Coder
http://www.starforge.co.uk
--
A baby is an alimentary canal with a loud voice at one end and no responsibility at the other.

[4/7] from: chris:starforge at: 25-May-2001 19:56

#25-May-01# Message from *Chris*: Hi Chris, Please refer to http://www.escribe.com/internet/rebol/m9502.html for a full copy of the summary and references.

> - A standard for package implementation, including a specification > of required metadata and coding and documentation standards. > This is discussed in part 3.

Part 3 - Package, coding and documentation standards.

As yet there has only really been a cursory discussion of some of the topics this covers; unfortunately, this is probably the most important part of the project.

Package issues - package hierarchy, contexts, versioning and dependencies.

As mentioned in part 1, packages will be stored in the repository in a hierarchical structure. The existing script libraries have single-level classifications for scripts. While this is sufficient for small collections, it is not suitable for very large script libraries. I don't suggest we attempt to determine the complete hierarchy now (attempting such a thing would be futile at best), but some form of structure is plainly needed before the project goes live. I would suggest using the classifications on rebol.com and rebol.org as starting points.

Please see Joel's example in reference 1 for the solution to the context issue.

Versioning is a problem. As has already been discussed (refs 8 and 9), revisions within versions must retain backwards compatibility, but versions do not. In theory this is not a problem, but consider the situation where two scripts are running, one using v1 of a package and the other using v2. As most filesystems do not provide a method to distinguish files purely on version information, one of them would overwrite the package the other had obtained. This is clearly a Bad Thing. The alternatives as I see them:

- Do not store such files locally. This is fine provided your client lives on the end of a 24/7 broadband connection. If this is not the case, then this is not really a viable solution.
- Try to encourage authors to use the latest version of a package. This is clearly impractical.
- Use a system like Linux libraries, where source names contain the version number, for example:

      delim-ascii-html-1_5.r
      delim-ascii-html-2_1.r

  Some may find this inelegant, but it is a viable alternative which avoids version clashes.
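Under the version-in-the-name scheme, both the file name and a versioned dependency path can be taken apart mechanically. A minimal sketch, in Python for illustration only, assuming hypothetical rules that are not an agreed standard: package names use lower-case letters, digits and hyphens; file names take the form name-major_minor.r; and dependency paths end in name-major:

```python
import re
from typing import Tuple

# Assumed naming rules (illustrative, not an agreed standard):
#   file name:       <name>-<major>_<minor>.r          e.g. delim-ascii-html-2_1.r
#   dependency path: /<category>/.../<name>-<major>    e.g. /text/conversion/delimited-ascii-2
FILE_RE = re.compile(r"^(?P<name>[a-z0-9-]+?)-(?P<major>\d+)_(?P<minor>\d+)\.r$")
DEP_RE = re.compile(r"^(?P<path>(?:/[a-z0-9-]+)*)/(?P<name>[a-z0-9-]+?)-(?P<major>\d+)$")


def versioned_filename(name: str, major: int, minor: int) -> str:
    """Build the local file name for one version of a package."""
    return f"{name}-{major}_{minor}.r"


def parse_filename(filename: str) -> Tuple[str, Tuple[int, int]]:
    """Split a versioned file name into (package name, (major, minor)).

    Anything that is not a plain numeric major_minor version is rejected.
    """
    m = FILE_RE.match(filename)
    if m is None:
        raise ValueError(f"not a versioned package file: {filename}")
    return m.group("name"), (int(m.group("major")), int(m.group("minor")))


def parse_dependency(dep: str) -> Tuple[str, str, int]:
    """Split a dependency path into (category path, package name, major version)."""
    m = DEP_RE.match(dep)
    if m is None:
        raise ValueError(f"not a versioned dependency: {dep}")
    return m.group("path"), m.group("name"), int(m.group("major"))
```

With these rules, parse_dependency on /text/conversion/delimited-ascii-2 yields the category path /text/conversion, the name delimited-ascii and major version 2. A name carrying something like 1_4_2_93_2Pre3B1 simply fails to parse, which is why the guidelines need to restrict version formats.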
Provided the coding guidelines prevent developers from using versions like 1_4_2_93_2Pre3B1 or something equally ridiculous, this scheme should work. A side effect of the latter is that dependency checks can obtain the correct version without the need for two dependency blocks:

    depends-on [/text/conversion/delimited-ascii-2 /foo/bar/thingy-1]

as opposed to

    depends-on [/text/conversion/delimited-ascii /foo/bar/thingy]
    depends-versions [2 1]

(You could argue that depends-on and depends-versions could be combined into one block, of course.)

Coding standards.

In case anyone reading this hasn't read the REBOL style guide, please consult page 4-13 (Scripts - Style Guide) in the /Core user manual before continuing (even if you have, you may want to recap).

The REBOL style guide is a good start, but it does not make clear enough distinctions about the standards which must be followed in production code. For the moment we should leave the issue of UHURU metadata to one side and concentrate on the parts of the style guide which must be extended or adhered to.

- All the guidelines for formatting and word names can be used as is; they do not directly affect the system. Developers should take note of the word naming rules to make code more readable, though.
- The script header section is far too minimal. I would suggest the following as a minimum for a source file in the system:

      REBOL [
          Title: "Delimited ASCII to HTML conversions"
          Date: 25-May-2001
          Version: 2.1
          File: %delim-ascii-html-2_1.r
          Author: "Joel Neely"
          Email: [joel--neely--fedex--com]
          Home: .. umm :) ..
          History: [
              1.0 [1-May-2001 {Initial version} "Joel Neely"]
              ....
          ]
          Purpose: {
              Bla bla
          }
      ]

  Feel free to argue why the fields I've left out should be in, and why the ones I've put in shouldn't be :))
- The function headers guide should be taken as-is; this is helpful when docs are auto-generated or the developer needs to use help.
Somewhat wasteful for user clients, but as I mentioned, if we compress files for download by the remote access script, this is not too much of a killer issue (IMO, YMMV).
- Embedded examples should not be included in release code. These should go in the documentation (see later).
- Debugging is an interesting problem. My gut reaction is to keep it even in release code, as other developers may want to use it while testing their scripts.

I realise I'm probably getting above myself messing around with Carl's guidelines here, but UHURU needs to tighten a few things up in here IMO.

Now for the contentious part: the metadata storage mechanism. Joel suggests using XML and the parse-xml code; I suggested adding header fields. The following are the summaries of the pros and cons for both as far as I can see. Joel makes some other statements in ref 3, but I object to his point 1 because there is no definitive document specifying what should be in current headers anyway, so adding a few extra fields shouldn't rock the boat too much.

1. XML metadata - pros:
   - Possible to strip metadata when installing scripts. The XML block could just be stripped during the download.
   - parse-xml is already available, so little extra work would be needed to implement the XML lookups.
   - Could be used to drive a keyword search.
   - Extensible.
2. XML metadata - cons:
   - Requires more work to parse (parse-xml needs to go in and do its bit).
   - Developers need to keep a couple of fields in the header and the XML section in step (version info etc.).
3. Header metadata - pros:
   - All the metadata fields are available to the script simply by doing

         script-code: load/header %delim-ascii-html-2_1.r

     followed by

         header: pick script-code 1

   - No additional parsing code is required.
   - No need to maintain two copies of some fields.
   - Could be used to drive a keyword search.
   - Extensible.
4. Header metadata - cons:
   - May conflict with RT's plans for the header in the future.
   - Increases the amount of redundant text in scripts (stripping header fields would be trickier than stripping an XML block).
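Both routes list "could be used to drive a keyword search" as a pro, and either way the search itself only needs the extracted fields. A minimal sketch, in Python for illustration only, assuming the metadata has already been pulled out of each script (via load/header or parse-xml) into one dictionary per package, and assuming a keywords field like the one on the part 1 summary page (it is not in the proposed minimum header):

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_keyword_index(packages: Dict[str, Dict[str, object]]) -> Dict[str, Set[str]]:
    """Map each lower-cased keyword to the set of packages declaring it.

    `packages` maps a package name to its extracted metadata; the
    'keywords' key is an assumed field, not part of any agreed header.
    """
    index: Dict[str, Set[str]] = defaultdict(set)
    for name, meta in packages.items():
        for kw in meta.get("keywords", []):
            index[str(kw).lower()].add(name)
    return index


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return the packages matching every word in the query, sorted by name."""
    words = query.lower().split()
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())   # keep only packages matching all words
    return sorted(result)
```

Matching every query word (rather than any) is one design choice; a search page could equally rank packages by how many of the keywords they match.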
Now I guess the two approaches are pretty evenly matched really: XML wins mainly on space saving, while the header route avoids the need for additional parsing work. I /personally/ prefer the header route, as it feels like a simpler solution. If people can expand on the pros and cons above, we may be able to decide one way or t'other.

Documentation conventions.

I'm too tired to have a good go at this one now; I'll try tomorrow unless someone else wants a bash at it :)

Chris
--
New sig in the works
Explorer 2260, Designer and Coder
http://www.starforge.co.uk
--
Grelb's Reminder: Eighty percent of all people consider themselves to be above average drivers.

[5/7] from: m:koopmans2:chello:nl at: 25-May-2001 21:03

Hi Chris,

I think I'm starting to believe in the RSL again ;-) Some suggestions from my side:

1) Make the repository database'd. We can use MySQL for that, as bindings for both Core and /Pro /Command exist. This should simplify the keyword search part.

2) Style guides shouldn't be that hard. Formatting is, most of the time, what makes a script readable, and that can be automated. Undocumented functions/objects can be detected, and that can be a reason to return the original script to the author with automatically inserted hints, such as <Insert documentation here>. My point with 2) is that if you automate, people can code in whatever style they wish, as long as they use REBOL's built-in doc facilities. This is a realistic approach (IMHO) to get a library/repository up and running.

3) Namespace placement (on submission) can be automated too, I think.

One other option is some distributed script network. This can be easily built with the next release of Rugby, as that will feature oneway, deferred and redirect messages. The latter allow you to do something like

    rexec/oneway/redirect [message] [destination-list]

which simplifies P2P messages flying around.

--Maarten

[6/7] from: gjones05:mail:orion at: 25-May-2001 16:29

From: "Chris"

<snip> > 1_4_2_93_2Pre3B1 or something equally ridiculous, ...

Darn! :-)

> A side effect of the latter is that dependency checks can obtain the correct
> version without the need for two dependency blocks:
>
> depends-on [/text/conversion/delimited-ascii-2 /foo/bar/thingy-1]
>
> as opposed to
>
> depends-on [/text/conversion/delimited-ascii /foo/bar/thingy]
> depends-versions [2 1]

Tcl, by way of example, seems to use both. A subcategory of a library might be tcllib. Folders holding separate versions are named differently, like:

    lib
      ...
      tcllib0.6.1
      tcllib0.8
      ...

However, within each of these versions the subfolders are named identically and without version. The actual package name is typically a single name without version, but the version of the package is embedded in the code. For example,

    C:\Program Files\TclPro1.4\lib\tcllib0.6.1\base64\base64.tcl

contains a line of code like:

    package provide base64 2.0

whereas

    C:\Program Files\TclPro1.4\lib\tcllib0.8\base64\base64.tcl

contains a line of code like:

    package provide base64 2.1

I highly suspect that there was an excellent rationale given (does anyone happen to know?), but I guess the system could have evolved. I do know that the interpreter can easily query and interact with the version numbers. In Tcl, because there is no standard header, an explicit line of code guarantees that code can interact with the version information.

However, since the proposed standard RSL header would contain the filename, and that filename would have the version number embedded, further explicit versioning commands/data would add redundant storage, which is a Bad Thing in my opinion (more opportunity for errors). Therefore, my intuitive guess is that the former method (e.g. the .../thingy-1 type) is probably the more robust one. Dissension is welcome, of course.

I've decided, Chris, that you could make a Magna Carta out of confetti. Great job.

--Scott Jones

[7/7] from: chris:starforge at: 26-May-2001 14:49

#25-May-01# Message from *Maarten Koopmans*: Hi Maarten,

> 1) Make the repository database'd. We can use MySQL for that, as bindings > for both Core and /Pro /Command exist. Should simplify the keyword search > part.

I agree this would help with the search, but I do have some objections:

- Anyone who wants to set up a mirror needs MySQL. Some hosting companies, especially in the UK, charge a lot for database access, and some do not use MySQL. This is a potential restriction on the mirroring process. YMMV on how important you think this is.
- This would either require the remote access script (RAS) to do database lookups to obtain the files (which in turn relies on your hoster not firewalling the MySQL port), or some hairy scripting would be needed for the RAS to invoke a script on the server, which then trundles away to get the file(s) and send them back to the RAS. This is as opposed to

      scriptname: to-url rejoin ["http://" repositryname path package]
      scriptdata: read scriptname

  in the RAS.

Just MO.

Chris
--
New sig in the works
Explorer 2260, Designer and Coder
http://www.starforge.co.uk
--
You can create your own opportunities this week. Blackmail a senior executive.
