Tip for splitting very long string ?

[1/8] from: richard::coffre::francetelecom::com at: 15-Apr-2002 11:24

Hi Rebol fellows,

Is there a tip to quickly split a very long string (more than 2000 characters) into substrings of n characters and to build a list from those substrings?

For instance:

    str: copy "azertyuiopqsdfghjklmwxcvbnazertyuiopqsdfghjklmwxcvbn"

I want to split this string into substrings of 5 characters to get:

    a-list: ["azert" "yuiop" "qsdfg" "hjklm" ...]

TIA

Richard Coffre
France Telecom Orbiscom
Tél. : 01 47 61 46 28

[2/8] from: brett:codeconscious at: 15-Apr-2002 20:04

A couple of possibilities to check out below.

    str: copy "azertyuiopqsdfghjklmwxcvbnazertyuiopqsdfghjklmwxcvbn"

    ; Parse-based version: copy 1 to n characters at a time.
    string-split-up1: function [
        string [string!] n [integer!]
    ][result subrule text][
        if lesser? n 1 [return none]
        result: make block! divide length? string n
        subrule: compose [1 (n) skip]
        parse/all string [any [copy text subrule (insert tail result text)]]
        result
    ]

    ; Loop-based version: step through the string n characters at a time.
    string-split-up2: function [string [string!] n [integer!]][result][
        if lesser? n 1 [return none]
        result: make block! divide length? string n
        forskip string n [insert tail result copy/part string n]
        result
    ]

    string-split-up1 str 5

If your string/list is very long you might want to avoid making the block (or list) in order not to waste time with memory allocations.

Regards,
Brett.
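One possible reading of Brett's closing remark is that the pieces can be handled one at a time instead of being collected into a block first. The following is only a minimal sketch of that idea, not something posted in the thread: the words `pos`, `chunk` and `total` are hypothetical, and counting characters stands in for whatever per-chunk work is actually needed.

```rebol
str: copy "azertyuiopqsdfghjklmwxcvbnazertyuiopqsdfghjklmwxcvbn"
n: 5
total: 0        ; hypothetical per-chunk work: count the characters seen
pos: str        ; second reference to the series; STR stays at its head

while [not tail? pos] [
    chunk: copy/part pos n          ; at most N characters; the last chunk may be shorter
    total: total + length? chunk    ; replace with the real per-chunk processing
    pos: skip pos n                 ; advance to the start of the next chunk
]

print total
```

Only one chunk exists at a time here, so no result block has to grow; whether that matters in practice depends on the size of the input.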

[3/8] from: joel:neely:fedex at: 15-Apr-2002 8:27

Hi, Richard,

COFFRE Richard FTO wrote:

> Hi Rebol fellows,
>
> Is there a tip to quickly split a very long string more than
> 2000 characters into n characters sub strings and to create
> a list with these subsets ?
> For instance :

...

> I want to split this string into substrings of 5 characters...

Here's a quick-and-dirty solution:

>> longstring: "abcdefghijklmnopqrstuvwxyz"
== "abcdefghijklmnopqrstuvwxyz"
>> i: 1
== 1
>> stringblock: []
== []
>> while [i <= length? longstring] [
[    append stringblock copy/part at longstring i 5
[    i: i + 5
[    ]
== 31
>> stringblock
== ["abcde" "fghij" "klmno" "pqrst" "uvwxy" "z"]

-jn-

--
; Joel Neely joeldotneelyatfedexdotcom
REBOL [] do [ do func [s] [ foreach [a b] s [prin b] ] sort/skip do function [s] [t] [ t: "" foreach [a b] s [repend t [b a]] t ] { | e s m!zauafBpcvekexEohthjJakwLrngohOqrlryRnsctdtiub} 2 ]

[4/8] from: pyxos:netcourrier at: 15-Apr-2002 16:29

That's just what I wanted to avoid, but thanks, because it's always useful to have several solutions.

----Original message----

>Date: Mon, 15 Apr 2002 08:27:36 -0500
>From: Joel Neely <joel.neely@fedex.com>

<<quoted lines omitted: 39>>

>rebol-request@rebol.com with "unsubscribe" in the
>subject, without the quotes.


[5/8] from: anton:lexicon at: 16-Apr-2002 1:24

Forgive me if I seem competitive, Joel, but I couldn't resist. I think this would be faster, since it uses the "insert tail instead of append" optimization (which I probably learned from you):

>> string: "abcdefghijklmnopqrstuvwxyz"
== "abcdefghijklmnopqrstuvwxyz"
>> block: copy []
== []
>> while [not tail? string][
    insert tail block copy/part string 5
    string: skip string 5
]
== ""
>> block
== ["abcde" "fghij" "klmno" "pqrst" "uvwxy" "z"]

Note that it does move the index of the string; however, you can set it back afterwards (for example with string: head string). (And it won't matter if you make a function of it and pass the string in.)

Anton.
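Anton's closing remark about wrapping the loop in a function could look roughly like the sketch below. The name `split-loop` and the guard against a chunk size below 1 are assumptions added for illustration; they do not appear in the thread.

```rebol
split-loop: func [
    "Hypothetical wrapper around the loop above."
    string [string!] "String to be split."
    n [integer!] "Length of sub-strings."
    /local block
][
    if n < 1 [return none]      ; assumed guard: SKIP would never advance otherwise
    block: copy []
    while [not tail? string] [
        insert tail block copy/part string n
        string: skip string n   ; moves only the function's own argument word
    ]
    block
]

s: "abcdefghijklmnopqrstuvwxyz"
probe split-loop s 5
probe head? s
```

Because `string` is the function's own argument word, re-assigning it inside the loop leaves the caller's word `s` where it was.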

[6/8] from: carl:cybercraft at: 16-Apr-2002 23:16

On 16-Apr-02, Richard Coffre wrote:

> That's just I want to avoid but thanks because it's always useful to
> have several solutions.

Indeed - so here's another one that uses parsing instead of loops...

    split: func [
        "Copy sub-strings of a set length from a string."
        str [string!] "String to be split."
        num [integer!] "Length of sub-strings."
        /local c-set blk s
    ][
        c-set: charset [#"^(00)" - #"^(ff)"]
        blk: copy []
        parse/all str [some [
            s: 1 num c-set (insert tail blk copy/part s num)
        ]]
        blk
    ]

>> split "abcdefghijklmnopqrstuvwxyz" 5
== ["abcde" "fghij" "klmno" "pqrst" "uvwxy" "z"]

I tried to cut out the need for the charset in the following function, but it ends up in an infinite loop when it starts comparing an empty string with an empty string. Can anyone think of a rule that would override that? It would be interesting to know if this would be faster than the above. (If it worked...)

    split2: func [
        "This don't work..."
        str [string!]
        num [integer!]
        /local blk s
    ][
        blk: copy []
        parse/all str [some[
            s: (s: copy/part s num) s (insert tail blk s)
        ]]
        blk
    ]

> ----Original message----
>> Date: Mon, 15 Apr 2002 08:27:36 -0500

<<quoted lines omitted: 36>>

>>
>> -jn-

--
Carl Read

[7/8] from: ingo:2b1 at: 16-Apr-2002 21:28

Hi Carl,

On Tue, 2002-04-16 at 13:16, Carl Read wrote:

<..>

> I tried to cut out the need for the charset in the following function
> but it ends up in an infinite loop when it starts comparing an empty

<<quoted lines omitted: 13>>

> blk
> ]

<..>

you could do it like this:

    split2: func [
        "This don't work... Now it does ;-)"
        str [string!]
        num [integer!]
        /local blk s
    ][
        blk: copy []
        parse/all str [some[
            s: (s: copy/part s num insert tail blk s) num skip
        ]]
        blk
    ]

just 'skip the right number of characters. The 'insert is in the first paren!, because otherwise the last characters would not be added if the string does not contain a multiple of num characters.

Kind regards,
Ingo

[8/8] from: carl:cybercraft at: 17-Apr-2002 19:36

On 17-Apr-02, Ingo Hohmann wrote:

> Hi Carl,
> On Tue, 2002-04-16 at 13:16, Carl Read wrote:

<<quoted lines omitted: 34>>

> characters would not be added if the string does not contain a multiple
> of num characters.

Very nice Ingo! And I now realise skip needs its numbers preceding it, not following it. No wonder I'd never been able to get it to work properly before. (:

So, here's the final version of the function (perhaps :)...

    split: func [
        "Copy sub-strings of a set length from a string."
        str [string!] "String to be split."
        num [integer!] "Length of sub-strings."
        /local blk s
    ][
        blk: copy []
        parse/all str [some[
            s: 1 num skip (insert tail blk copy/part s num)
        ]]
        blk
    ]

Note I moved the skip to before the paren and gave it a range with a low value of 1. This was because your version would add an empty string to the block when the string could be divided evenly, i.e.

>> split2 "abcde" 5
== ["abcde" ""]

I reduced the contents of the paren a bit too, there being some redundancy there. On my system this function is a bit faster than my charset version and about 30% faster than the looping version posted by Anton, so thumbs up for parsing in this case.

--
Carl Read
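For anyone wanting to repeat a speed comparison like the one Carl describes, a rough timing sketch might look like this. The helper `time-it`, the test string `data`, the chunk size of 5 and the repeat count of 100 are all assumptions chosen for illustration. No particular ratio is implied, since results will vary by machine and REBOL version.

```rebol
time-it: func [
    "Hypothetical helper: elapsed time for COUNT evaluations of BODY."
    count [integer!] body [block!] /local start
][
    start: now/precise
    loop count body
    difference now/precise start    ; time! value: end minus start
]

; Assumed test data: a 10,000-character string.
data: head insert/dup copy "" "abcdefghij" 1000

; Looping approach, along the lines of Anton's version.
print ["loop: " time-it 100 [
    blk: copy []
    pos: data
    while [not tail? pos] [
        insert tail blk copy/part pos 5
        pos: skip pos 5
    ]
]]

; Parse approach, along the lines of Carl's final version.
print ["parse:" time-it 100 [
    blk: copy []
    parse/all data [some [s: 1 5 skip (insert tail blk copy/part s 5)]]
]]
```

Running each measurement several times and raising the repeat count should make the comparison less sensitive to timer resolution.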

Notes

Quoted lines have been omitted from some messages.