SPSSX Discussion

How to split text properly into new variables?

88videos

How to split text properly into new variables?

Hello again :)

I have a dataset like this:

ID,var1
1,a) żaba żabka albo żabeczka b) łapka
2,a) ryba rybka rybeńka maleńka (np.sledzik) b) kotek c) piesek d) chomieczek
3,a) zenon b) marian i hela c) alekasadra(ola)

and I want to get this:

ID, var2, var3, var4, var5
1, a) żaba żabka albo żabeczka, b) łapka
2, a) ryba rybka rybeńka maleńka (np.sledzik), b) kotek, c) piesek, d) chomieczek
3, a) zenon, b) marian i hela, c) alekasadra(ola),

To do this I run CHAR.INDEX to find where "a)", "b)", "c)" and "d)" occur, and SUBSTR to split the text.
It works if the text doesn't contain Polish letters like "ż" or "ł".
The cause is that SUBSTR counts each of those letters as 2 characters.
An example is below.

Can you show me another method that does this and keeps the Polish letters?

**********************************
*without Polish letters
**********************************

data list list
/ID(f8.0) var1(a90).
begin data.
1 'a) zaba zabka albo zabeczka b) lapka'
2 'a) ryba rybka rybenka malenka (np.sledzik) b) kotek c) piesek d) chomieczek'
3 'a) zenon b) marian i hela c) alekasadra(ola)'
4
5
6
7
8
9
10
end data.
execute.
DATASET NAME base1.
DATASET ACTIVATE base1.

compute a=CHAR.INDEX(var1, 'a)').
compute b=CHAR.INDEX(var1, 'b)').
compute c=CHAR.INDEX(var1, 'c)').
compute d=CHAR.INDEX(var1, 'd)').
execute.

string var2 to var5(a60).

do if a<>0 and b<>0.
compute var2=SUBSTR(var1, a, b-a).
else if a<>0 and b=0.
compute var2=SUBSTR(var1, a, 90).
end if.
execute.

do if b<>0 and c<>0.
compute var3=SUBSTR(var1, b, c-b).
else if b<>0 and c=0.
compute var3=SUBSTR(var1, b, 90).
end if.
execute.

do if c<>0 and d<>0.
compute var4=SUBSTR(var1, c, d-c).
else if c<>0 and d=0.
compute var4=SUBSTR(var1, c, 90).
end if.
execute.

do if d<>0.
compute var5=SUBSTR(var1, d, 90).
end if.
execute.

**********************************
*with Polish letters
**********************************

data list list
/ID(f8.0) var1(a90).
begin data.
1 'a) żaba żabka albo żabeczka b) łapka'
2 'a) ryba rybka rybeńka maleńka (np.sledzik) b) kotek c) piesek d) chomieczek'
3 'a) zenon b) marian i hela c) alekasadra(ola)'
4
5
6
7
8
9
10
end data.
execute.
DATASET NAME base2.
DATASET ACTIVATE base2.

compute a=CHAR.INDEX(var1, 'a)').
compute b=CHAR.INDEX(var1, 'b)').
compute c=CHAR.INDEX(var1, 'c)').
compute d=CHAR.INDEX(var1, 'd)').
execute.

string var2 to var5(a60).

do if a<>0 and b<>0.
compute var2=SUBSTR(var1, a, b-a).
else if a<>0 and b=0.
compute var2=SUBSTR(var1, a, 90).
end if.
execute.

do if b<>0 and c<>0.
compute var3=SUBSTR(var1, b, c-b).
else if b<>0 and c=0.
compute var3=SUBSTR(var1, b, 90).
end if.
execute.

do if c<>0 and d<>0.
compute var4=SUBSTR(var1, c, d-c).
else if c<>0 and d=0.
compute var4=SUBSTR(var1, c, 90).
end if.
execute.

do if d<>0.
compute var5=SUBSTR(var1, d, 90).
end if.
execute.

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Jon Peck

Re: How to split text properly into new variables?

Use CHAR.SUBSTR. It works on characters regardless of how many bytes each one takes.

This could also be done with a regular expression and SPSSINC TRANS, with much less code.
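The mismatch can be illustrated outside Statistics. The following is a minimal Python 3 sketch, not SPSS syntax, and it assumes the text is stored as UTF-8: a position found by counting characters is applied once to the character string and once to the underlying bytes. The variable names are made up for the illustration.

```python
# Illustration only: plain Python 3, assuming UTF-8 storage.
text = "a) żaba żabka albo żabeczka b) łapka"
raw = text.encode("utf-8")

char_pos = text.index("b)")   # offset of the marker counted in characters
byte_pos = raw.index(b"b)")   # offset of the same marker counted in bytes

# ż and ł each take two bytes in UTF-8, so the two offsets drift apart.
print(len(text), len(raw))
print(char_pos, byte_pos)

# Character offset applied to characters: everything before the marker.
print(repr(text[:char_pos]))

# Character offset applied to bytes: the piece is cut short.
print(repr(raw[:char_pos].decode("utf-8", errors="replace")))
```

Mixing a character-based search with a byte-based substring gives the same kind of truncation, which is why the CHAR. versions of both functions should be used together.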

On Tue, Jul 4, 2017 at 6:14 PM 88Videoclips . <[hidden email]> wrote:

Bruce Weaver

Re: How to split text properly into new variables?

Administrator

In reply to this post by 88videos

Here is a shorter version of your syntax that appears to work.

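* Locate each marker (CHAR.INDEX returns 0 when the marker is absent).
DO REPEAT v = a b c d / s = 'a)' 'b)' 'c)' 'd)'.
- COMPUTE v=CHAR.INDEX(var1,s).
END REPEAT.

STRING var2 to var5(a60).

* Each piece runs from its own marker to the next one, or to the end of the string.
DO REPEAT start = a b c / next = b c d / v = var2 var3 var4.
- DO IF start NE 0.
-   IF next NE 0 v=CHAR.SUBSTR(var1, start, next-start).
-   IF next EQ 0 v=CHAR.SUBSTR(var1, start, 90).
- END IF.
END REPEAT.
IF d NE 0 var5=CHAR.SUBSTR(var1, d, 90).
FORMATS a to d (F5.0).
LIST var2 to var5.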

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Jon Peck

Re: How to split text properly into new variables?

These patterns are all at risk, though, from parentheses in the main text. Case 2 has k) and case 3 has a) inside parenthesized text. If a) through d) can appear in the text, a smarter algorithm that ignores matched parentheses would be needed.
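To make the risk concrete, here is a small Python 3 sketch that locates every a) to d) pattern in case 3 without paying attention to parentheses. The function name is hypothetical, and the pattern is just one plausible way to write the marker search.

```python
import re

MARKER = re.compile(r"[a-d]\)")

def naive_marker_positions(s):
    """Return the start offset of every a) ... d) match, ignoring parentheses."""
    return [m.start() for m in MARKER.finditer(s)]

case3 = "a) zenon b) marian i hela c) alekasadra(ola)"
positions = naive_marker_positions(case3)

# The last hit is not a marker: it is the "a)" that closes "(ola)".
print(positions)
print([case3[p:p + 2] for p in positions])
```

A search for the first 'a)' still lands on the real marker here, but any approach that collects all matches would treat the end of "(ola)" as a fourth block.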

On Wed, Jul 5, 2017 at 6:38 AM, Bruce Weaver <[hidden email]> wrote:

Jon K Peck
[hidden email]

David Marso

Re: How to split text properly into new variables?

Administrator

In reply to this post by 88videos

First of all, use CHAR.SUBSTR and CHAR.INDEX.
Second, the ) embedded in your parenthesized text is tripping you up.
I suggest searching for ( and then locating the matching ).
Replace these with [ and ] respectively before searching for the markers.
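One way to sketch this suggestion is with a regular expression in Python 3 rather than in SPSS syntax. The function name is made up for the illustration, and the pattern only handles pairs that contain no further parentheses, so nested groups would need more work.

```python
import re

def bracket_parentheses(s):
    """Rewrite each matched (...) pair as [...]; nested pairs are not handled."""
    return re.sub(r"\(([^()]*)\)", r"[\1]", s)

case3 = "a) zenon b) marian i hela c) alekasadra(ola)"
protected = bracket_parentheses(case3)
print(protected)

# With the pair rewritten, only the real markers still end in ")".
markers = [m.group() for m in re.finditer(r"[a-d]\)", protected)]
print(markers)
```

After the split, the brackets can be turned back into parentheses, provided [ and ] do not otherwise occur in the data.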

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

88videos

Re: How to split text properly into new variables?

In reply to this post by 88videos

As always, I got very useful advice here. Thanks!!

As Jon and David mentioned, the most problematic issue is that "a)", "b)", "c)" and so on can be part of the text and can appear several times in one case. It is quite possible that some of the work will have to be done under manual supervision.

2017-07-05 2:14 GMT+02:00 88Videoclips . <[hidden email]>:

Jon Peck

Re: How to split text properly into new variables?

Here is code that can handle grouped parentheses. It assumes that text inside a pair (...) should not be matched against markers such as a), even if it contains that string. First it finds the pairs and changes the parentheses to left and right chevrons (« and »), assuming that these do not occur in the text. Then it splits at a), b), ..., and finally it puts the original parentheses back.

The code follows, but if it gets mangled by the listserv, send me an email ([hidden email]), and I will send the code as a file.

Note that for test purposes I changed the input in case 2 to contain sledzia) instead of the original. The spssinc trans command creates string variables v1...v4 of length 50. Change the 50 as needed. This code can easily be tweaked for any number of blocks. If there are fewer than 4 blocks, the extra variables are blank. Statistics should be in Unicode mode for this.

* Encoding: UTF-8.
data list list
 /ID(f8.0) var1(a90).
begin data.
1 'a) żaba żabka albo żabeczka b) łapka'
2 'a) ryba rybka rybeńka maleńka (np.sledzia) b) kotek c) piesek d) chomieczek'
3 'a) zenon b) marian i hela c) alekasadra(ola)'
end data.
DATASET NAME base2.

begin program.
import re

# Placeholder characters (left and right chevrons), assumed not to occur in the text.
LEFT, RIGHT = u"\u00ab", u"\u00bb"

def splitter(s):
    # Hide parenthesized groups so that "a)" inside (...) is not taken for a marker.
    s = re.sub(r"\((.*?)\)", LEFT + r"\1" + RIGHT, s)
    # Start position of every marker a) ... d), including repeated ones.
    pos = [m.start() for m in re.finditer(r"[a-d]\)", s)]
    pos.append(len(s))
    parts = [s[pos[i]:pos[i+1]] for i in range(len(pos)-1)]
    # Put the original parentheses back.
    return [part.replace(LEFT, "(").replace(RIGHT, ")") for part in parts]
end program.

spssinc trans result=v1 v2 v3 v4 type=50
 /formula "splitter(var1)".
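The remark about tweaking the code for any number of blocks can be tried outside Statistics. Below is a minimal Python 3 sketch of the same idea, not the code from this post. The function name split_blocks and the n_blocks parameter are hypothetical, the padding with empty strings mimics the blank extra variables, and the strip() call is an added assumption that surrounding blanks are unwanted.

```python
import re

LEFT, RIGHT = "\u00ab", "\u00bb"  # placeholders, assumed absent from the data

def split_blocks(s, n_blocks=4):
    """Split s at markers a), b), ... and return exactly n_blocks strings."""
    last = chr(ord("a") + n_blocks - 1)          # 4 blocks -> markers a) to d)
    # Hide parenthesized groups so their closing ")" cannot look like a marker.
    hidden = re.sub(r"\((.*?)\)", LEFT + r"\1" + RIGHT, s)
    starts = [m.start() for m in re.finditer("[a-%s]\\)" % last, hidden)]
    starts.append(len(hidden))
    parts = [hidden[i:j].strip() for i, j in zip(starts, starts[1:])]
    parts = [p.replace(LEFT, "(").replace(RIGHT, ")") for p in parts]
    # Pad with empty strings, or drop surplus pieces, to get a fixed width.
    return (parts + [""] * n_blocks)[:n_blocks]

rows = [
    "a) żaba żabka albo żabeczka b) łapka",
    "a) ryba rybka rybeńka maleńka (np.sledzia) b) kotek c) piesek d) chomieczek",
    "a) zenon b) marian i hela c) alekasadra(ola)",
]
for row in rows:
    print(split_blocks(row))
```

The sketch assumes at most 26 blocks and silently drops pieces beyond n_blocks, so a real run would need a check for cases with more markers than variables.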

On Thu, Jul 6, 2017 at 5:50 AM, 88Videoclips . <[hidden email]> wrote:

Jon K Peck
[hidden email]