[REBOL] Re: Line reduction
From: aroberts:swri at: 5-Jul-2001 12:47
Thanks for the suggestions! You are correct, it does nix the first line. Which line
is nixed first is not important; what matters is the net effect of the removals. That's
a long way of saying: cut the amount of input data in half. If I choose 2, 3, etc., then
I'm only reducing it by a smaller amount (2 reduces the file by a third, 3 only reduces
it by a fourth, etc.).
Maybe I should have explained more background -
I have a very large dataset comprised of numbers. The values come in a set of three,
one set to a line. The ordering of the sets is arbitrary. I need a way of reducing
the data down, so I can get a 'snap shot' of the full data set. To do this, I decided
to remove every other line (reduce by 50%). If I run the program on the new data, I
can keep reducing by 50% each time. Since I can choose how many lines to skip, I can
set the amount of loss to 50% or less.
It takes quite a long time to run the program I wrote using a 2.5 Meg data set.
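
One possible reason for the long run time is that each REMOVE on a large block has to shift the lines that follow it. As a rough sketch of the same reduction without removing in place, the hypothetical helper below (THIN-LINES is not part of the original script) copies the lines to keep into a new block. It assumes, as in the original program, that the first line of every group of n + 1 lines is the one dropped.

```rebol
REBOL []

; Hypothetical helper, not from the original post: returns a new block
; without the first line of every group of (n + 1) lines.
thin-lines: func [lines [block!] n [integer!] /local out i] [
    out: make block! length? lines
    i: 0
    foreach line lines [
        ; i counts from 0, so a remainder of 0 marks the start of a group
        if 0 <> (i // (n + 1)) [append out line]
        i: i + 1
    ]
    out
]

probe thin-lines ["a" "b" "c" "d" "e" "f"] 1    ; keeps "b" "d" "f"
probe thin-lines ["a" "b" "c" "d" "e" "f"] 2    ; keeps "b" "c" "e" "f"
```

With n = 1 half of the lines survive, with n = 2 two thirds, which matches the fractions described above. Whether this is actually faster on a 2 to 3 Meg file would need to be measured.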
This brings me to my thoughts/questions -
1. The data file could have numbers like so: 34564 23512 18372. As a string, this takes
15 (17 with spaces) bytes of memory. The number representation would take 2 bytes per
value or 6 bytes total. It's much faster to search through 6 bytes than 15 (17). Since
I'm working with the file as a huge string, is there a faster way to do what I'm doing,
capitalizing on the fact that all the data is numerical?
I ran your version of the program on a 1.2 Meg file. Your program's time: 13s. The original:
32s. I did this after writing the above, so I figured I'd leave it in as food for thought.
I doubt there is much that could be done to speed up your solution. The files I would
normally work with are about 2 to 3 Meg, in the form of my original post.
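
As a small illustration of the string versus number point in question 1, the sketch below uses the sample line from above and LOAD to turn its text into a block of integers. The words LINE and VALUES are only illustrative names, and nothing here shows that working on the loaded values is faster; that would have to be timed on real data.

```rebol
REBOL []

line: "34564 23512 18372"

; Text form: one character per digit plus the two separating spaces
print length? line            ; prints 17

; LOAD parses the text into a block of integer! values
values: load line
print length? values          ; prints 3
print values/1 + values/2 + values/3
```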
2. Is there a way to not type cast all of my inputs?
3. What editor would one recommend for using rebol? I prefer to have stylized text if
> REBOL [
>     TITLE: "Remove lines"
>     DATE: 3-July-01
> ]
> input_file: ask "Enter the source file name: "
> output_file: ask "Enter the output file name: "
> skip_amount: ask "Remove lines every X line: "
> datafile: read/lines to-file input_file
> forskip datafile to-integer skip_amount [
>     remove datafile
> ]
> datafile: head datafile
> write/lines to-file output_file datafile
If you really want to remove the *last* of every group of lines
(SKIP_AMOUNT + 1 in all), you might consider this (note the
change in the prompt string):
input_file: ask "Enter the source file name: "
output_file: ask "Enter the output file name: "
skip_amount: ask "Remove line after every X lines: "
datafile: read/lines to-file input_file
use [len skp] [
    len: length? datafile
    skp: 1 + to-integer skip_amount
    for rem len - (len // skp) skp (- skp) [
        remove at datafile rem
    ]
]
write/lines to-file output_file datafile
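
To see which lines the backward loop removes, here is a minimal sketch that runs the same FOR loop on a small in-memory block instead of a file. The ten sample strings and the fixed skip amount of 2 are assumptions made for illustration, and NEGATE is used in place of the unary minus.

```rebol
REBOL []

; Sample data standing in for the lines of a file (illustrative only)
datafile: ["1" "2" "3" "4" "5" "6" "7" "8" "9" "10"]

skp: 1 + 2                     ; skip amount 2: groups of 3 lines
len: length? datafile

; Start at the last complete group and walk toward the head, so the
; positions still to be visited are not shifted by earlier removals.
for rem len - (len // skp) skp negate skp [
    remove at datafile rem     ; removes lines 9, 6 and 3 in that order
]

probe datafile                 ; ["1" "2" "4" "5" "7" "8" "10"]
```

The trailing line "10" is kept because it does not complete a group of three.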