REBOL.org Script Library

Documentation for: application-sizer.r
Created by: sunanda
on: 1-Dec-2008
Last updated by: sunanda on: 7-Dec-2008
Format: html
Downloaded on: 8-May-2024

     Author: Sunanda
    Version: 0.0.3
       Date: 6-dec-2008

1. Application sizer

Purpose: to generate some metrics on a REBOL application to give some indication of how large / complex the application is.

These metrics are very much open to interpretation, so I suspect the early versions of this application are more likely to generate discussion than facts!

2. Installation

Copy the script to a folder.

You need do nothing more. You could edit the configuration object to meet your precise circumstances, but it is probably easier to run the script with the /config refinement.

3. Running

3.1. Basic

    do %application-sizer.r
    results: app-sizer/run
    probe results

This runs with the default configuration, which counts all files:

- in the current folder
- that have a .r suffix
- and have a REBOL [...] header

It will not count any file called application-sizer.r.

All these options (and more) are configurable -- see Configuration below.
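
The default selection rules in 3.1 can be sketched in a few lines of REBOL 2. This is only an illustration of the rules as described; default-candidates is a hypothetical helper, not code taken from application-sizer.r, which may select files differently.

```rebol
; Hypothetical helper -- an assumption for illustration, not part of application-sizer.r
default-candidates: func [
    folder [file!] "Folder to scan, e.g. %./"
    /local found
][
    found: copy []
    foreach file read folder [
        if all [
            %.r = find/last file "."          ; has a .r suffix
            file <> %application-sizer.r      ; the default exclusion
            script? join folder file          ; has a REBOL [...] header
        ][append found file]
    ]
    found
]

probe default-candidates %./
```

The result is a block of the file names that pass all three tests, which is presumably close to the set the default run would count.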

3.2. Creating a CSV file

    do %application-sizer.r
    results: app-sizer/run/csv
    probe results

The default CSV file is called application-sizer.csv.

- If it does not exist, it is created with a header row and one data row.
- If it does exist, a new data row is added to the end. So you can gather stats on the same application on different days, or on different applications, into the same spreadsheet.
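
The create-or-append behaviour described here can be sketched as follows. append-csv-row, the file name %demo-sizer.csv, the three column names and the second data row are all assumptions made for the illustration; the real script builds its own header and rows.

```rebol
; Hypothetical helper -- not the code used by application-sizer.r
append-csv-row: func [
    csv-file [file!]
    header-row [string!]
    data-row [string!]
][
    ; First run: create the file with the header row
    if not exists? csv-file [
        write csv-file join header-row newline
    ]
    ; Every run: add one data row to the end
    write/append csv-file join data-row newline
]

; Illustrative columns and values only
append-csv-row %demo-sizer.csv "run-date,app-name,files" "4-Dec-2008,demo,257"
append-csv-row %demo-sizer.csv "run-date,app-name,files" "5-Dec-2008,demo,260"
```

Starting without the file, two calls leave one header row followed by two data rows.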

3.3. With an override configuration

You can change several settings by supplying a config object:

    my-config: make object! [
                  app-name: "website project"
                  app-folders: [%/c/mystuff/ %/c/morestuff/]
                 ]
    do %application-sizer.r
    results: app-sizer/run/config my-config
    probe results

In the example above, the config object changes:

- the default application name (useful if you are using the /csv refinement to gather stats on several applications); and
- the folders searched for scripts

3.4. /config and /csv

    do %application-sizer.r
    results: app-sizer/run/csv/config make object! [csv-file-name: %my-csv-file.txt]
    probe results

In the example above, a CSV file with the name my-csv-file.txt will be created (or extended if it already exists).

3.5. Preprocessing the files


    ppf: func [
        folder-name [file!]
        file-name [file!]
        file-contents [string!]
    ][
        if find file-name "-test-" [return none]
        return file-contents
    ]

    do %application-sizer.r
    results: app-sizer/run/preprocess :ppf
    probe results

The example uses a preprocessing function to perform manipulations on the script file before it is processed.

The example excludes some files (those whose name contains the string "-test-") by returning none. For files it wants processed, it simply returns the supplied file contents unchanged.

The preprocess function may also return a block of strings. If so, each string will be counted as a separate file. This is a way of expanding one file to many (perhaps you actually hold the sources in a Source Control Facility -- if so, you can use the preprocess function to return them all).
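
As a rough sketch of the block-of-strings case, the function below splits one bundle file into several sources. The marker text and the name ppf-split are assumptions for the example, and each resulting string presumably still needs its own REBOL [...] header to be counted.

```rebol
; Hypothetical marker line separating scripts inside one bundle file
marker: ";;; NEXT-SCRIPT"

ppf-split: func [
    folder-name [file!]
    file-name [file!]
    file-contents [string!]
    /local parts pos
][
    parts: copy []
    ; Cut the text at each marker and collect the pieces
    while [pos: find file-contents marker][
        append parts copy/part file-contents pos
        file-contents: skip pos length? marker
    ]
    append parts copy file-contents
    parts        ; a block of strings: each one is counted as a separate file
]
```

It would be supplied in the same way as ppf in the example above, that is with app-sizer/run/preprocess :ppf-split.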

Another use for preprocessing is to expand the source if you are using PREBOL or other include subsystems.

4. Interpreting the results

Application-sizer returns you an object that looks like this:

make object! [
     app-sizer-version: 0.0.2
     run-date: 4-Dec-2008
     app-name: "REBOL Library CGIs"
     app-version: "None"
     folders: 2
     files: 257
     raw-bytes: 2116471.0
     compressed-size: 440048
     raw-lines: 88589.0
     code-lines: 42621.0
     elements: make object! [
         string: [16005 227226.0]
         datatype: [2571 16878.0]
         number: [3609 5413.0]
         refinement: [1320 8672.0]
         function: [10814 64803.0]
         operator: [3257 3859.0]
         native: [15915 70030.0]
         action: [9914 54112.0]
         object: [170 3457.0]
         image: [0 0.0]
         comment: [14981 529475.0]
         body: [57781 614478.0]
         whitespace: [136337 637107.0]
     ]
     element-definitions: [
     "comment" [cmt]
     "datatype" [date! issue! money! pair! time! tuple!]
     "number" [decimal! integer!]
     "refinement" [refinement!]
     "string" [char! email! file! string! tag! url!]
     ]
 ]

The results object above, incidentally, comes from running the application against the cgi scripts that run the REBOL.org website. So the numbers are for a real application.

The various counts and numbers are:

folders:	count of the number of folders
files:	total unique script files counted. Unique script files are identified by a checksum/secure test. So a script is only counted once even if it is present in more than one of your folders
raw-bytes:	total length of all the scripts
compressed-size:	length of compress all-scripts -- gives an indication of the density
raw-lines:	total lines (as measured by length? read/lines)
code-lines:	total lines that are not:
    - blank (only spaces)
    - comments (first non-whitespace character is ;)
    - minimal (sole content is ], [, { or })
    - minimal code plus comment (eg ] ;end of func)
elements:	counts of various elements of the scripts. The elements identified and counted are: string, datatype, number, refinement, function, operator, native, action, object, image, comment, body and whitespace. Each element has a block with two numbers:
    1. the count -- how many of them were found
    2. the size -- total size of them
element-definitions:	the definitions used when identifying the above elements. For example, an element is a string if it is any of these datatypes: char! email! file! string! tag! url! If you do not like those definitions, you can change some of them by editing the colors block in the script
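
A few derived ratios can make the raw counts easier to compare between applications. The sketch below reuses the figures from the sample results object; the choice of ratios is only a suggestion and is not something application-sizer.r calculates itself.

```rebol
; Values copied from the sample results object above
results: make object! [
    files: 257
    raw-bytes: 2116471.0
    compressed-size: 440048
    raw-lines: 88589.0
    code-lines: 42621.0
]

; Illustrative ratios -- not produced by application-sizer.r
print ["compressed / raw bytes:" results/compressed-size / results/raw-bytes]
print ["code lines / raw lines:" results/code-lines / results/raw-lines]
print ["code lines per file:   " results/code-lines / results/files]
```

For the REBOL.org figures these work out at roughly 0.21, 0.48 and 166 respectively.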

5. CSV output file

If you have one of these:

- it is a comma delimited text file (not tab delimited)
- a row is appended each time you run the application (run date is a column in the data)
- a header row is added when the file is created
- you can specify your own CSV file by an entry in the configuration.

The columns are the same (and in the same order) as the results object described above. The only difference is that block elements are named by an -n suffix to the name.

It is not guaranteed that the columns will remain consistent between versions of application-sizer.r. (The version number of application-sizer.r is a column in the data). However, output from version 0.0.4 is compatible with output from 0.0.3.

6. Configuration

The full set of configuration options is:

Option	Default	Purpose
app-name:	"No name"	string: the name of your application
app-version:	"None"	string: the version of your application
app-folders:	[%./]	file or block of files: folder or folders to search. If more than one folder, supply as a block
exclude-files:	[%application-sizer.r]	block of files: files to ignore
source-suffixes:	[%.r]	block of files: the suffixes that identify a script
add-header?:	false	logic: if the script cannot be loaded, should a REBOL [] header be added to a copy of it?
minimal-chars:	"{}[]"	string: lines containing only these characters will not be counted as code lines
csv-file-name:	%application-sizer.csv	file: csv file name -- if you are using the /csv refinement
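
For reference, here is a config object that spells out every option with the default value from the table. The name my-config is arbitrary; in practice you would include only the entries you want to change.

```rebol
; Every option, set to the default listed in the table above
my-config: make object! [
    app-name: "No name"
    app-version: "None"
    app-folders: [%./]
    exclude-files: [%application-sizer.r]
    source-suffixes: [%.r]
    add-header?: false
    minimal-chars: "{}[]"
    csv-file-name: %application-sizer.csv
]
```

It is passed in the same way as in section 3.3, with app-sizer/run/config my-config.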

7. Limitations -- practical

application-sizer is an R2 product, not an R3 one:
- it uses load/header not transcode to interpret code
- each load/header may add words to system/words. In R2 this block has a finite size, so a large application may overflow it
conflicts in the needs: header may stop you sizing an entire application in one go. For example, in a client/server application, some scripts may have a need for View while others need Command

comment [....] blocks are counted as code. Comments in the counts refer only to partial lines that start with ";"
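
If you are worried about the system/words limit mentioned above, one way to keep an eye on it in R2 is to count the global words before and after a run. This is a sketch only; the variable names are made up for the example.

```rebol
; REBOL 2 only: FIRST on an object returns its list of words
words-before: length? first system/words

; ... do %application-sizer.r and run app-sizer here ...

words-after: length? first system/words
print ["global words added:" words-after - words-before]
```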

8. Limitations -- philosophical

8.1. What is a line of code?

All of the following count as 2 lines:

     if a = b [
         c: true
     ]

     if a = b
        [
         c: true
     ]

     if a = b
         [c: true]

But this is 1 line:

     if a = b [c: true]
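
The counting rule behind these examples can be sketched as below. This is an approximation written from the description of code-lines in section 4, not the script's own code; code-line? and count-code-lines are hypothetical names, and the comment handling is naive (a ";" inside a string is treated as the start of a comment).

```rebol
; Characters that make a line "minimal" (the minimal-chars default plus whitespace)
minimal: charset "{}[] ^-"

code-line?: func [line [string!] /local text pos][
    text: trim copy line
    if empty? text [return false]                     ; blank line
    if #";" = first text [return false]               ; comment line
    if pos: find text ";" [text: copy/part text pos]  ; naive: drops a trailing comment
    not parse/all text [some minimal]                 ; false when only minimal chars remain
]

count-code-lines: func [lines [block!] /local n][
    n: 0
    foreach line lines [if code-line? line [n: n + 1]]
    n
]

probe count-code-lines ["if a = b [" "    c: true" "]"]
probe count-code-lines ["if a = b" "    [c: true]"]
probe count-code-lines ["if a = b [c: true]"]
```

The three probes print 2, 2 and 1, matching the counts given above.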

8.2. What is an application?

An application may have JavaScript on the client-side and some C++ or other language on the server-side. Just counting the REBOL code may miss important elements.

8.3. What is the size of an application?

In the REBOL.org CGI scripts above, some things have been included when they should not, and perhaps vice versa:

8.4. Not counted

- many HTML templates. If they'd been coded in some sort of REBOL dialect, they would be counted
- some images. These are hardly crucial for REBOL.org, but could be a major part of the application in other cases.

8.5. Counted when it should not be

REBOL.org makes use of many "library functions" -- ie scripts written for a different application, and included in the REBOL.org code base. Examples include skimp.r and color-code.r. These are being counted towards the total application size, when they really should not be.

9. What does it all mean in practice?

I'm hoping a few people will publish their application sizes, and we'll get some discussion going. Watch this space!

10. Change log

Thanks to Dockimbel, Reichart, Robert Munch and Ashley Truter for suggestions that improve the metrics.

10.1. 0.0.0 1-dec-2008

First release.

10.2. 0.0.1 3-dec-2008

added version to results object
non-code lines counted differently.
initial value of folders to search is the current folder
added exclude-list for files that are not to be included
added operator, action, native, object, and image to the datatypes counted. (previously, these were all counted as function).

10.3. 0.0.2 4-dec-2008

changed Size counts to decimal
added /csv refinement
added /config refinement

10.4. 0.0.3 4-dec-2008

added /preprocess refinement
changed app-folder in config: may be a single file, or a block of files
