Issue with SPSSINC TRANS string.split

Issue with SPSSINC TRANS string.split

David Marso
Administrator
In an effort to make gradual concessions to the oft-posted claim that Python can sometimes result in more compact, easier-to-read code, I attempted to amend my profligate ways by replacing my tried-and-true loopy-loop string parsing technique with the newfangled "string.split()" function residing in the bowels of the SPSSINC TRANS extension. After searching for hours in this group I uncovered "string.split" (since I was unsuccessful in my attempts to locate documentation on the goodies offered by the elves within this mystery tour).

After applying it to the following very simple file, I was dismayed by the result shown below.
Using SPSS 22.0.1 on Vista 32-bit.
I do get the following brain-dead warning: "Only string variables are allowed."
OK, that makes a great deal of sense ;-) *NOT*
NOTE: the original code does an "inline" VARSTOCASES, which would have followed the string.split, so please ignore the obvious non-equivalence.
So, to python or not to python? That is the question!
My initial response is a resounding NAY!!!!!!!
I'll stick to my tried and true!


NEW FILE.
DATASET CLOSE ALL.
DATA LIST /Sentence (A30).
BEGIN DATA
It is a big Chair and table
It is a chair
It was a table
It is a Big Char
It is a Tabl and Char
END DATA.
DATASET NAME Source.
COMPUTE ID=$CASENUM.
DATASET COPY temp.
DATASET ACTIVATE temp.
SPSSINC TRANS RESULT = word1 TO word10  TYPE=5 /FORMULA "string.split(Sentence)".
LIST.
 
 
Sentence                             ID word1 word2 word3 word4 word5 word6 word7 word8 word9 word10
 
It is a big Chair and table        1.00 It    is    a     big   Chair and   table
It is a chair                      2.00
It was a table                     3.00
It is a Big Char                   4.00
It is a Tabl and Char              5.00
 
 
Number of cases read:  5    Number of cases listed:  5

Intended to replace the following.
SET MXLOOPS=1000.

STRING #Cpy (A30).
STRING Word (A8).
COMPUTE #Cpy= UPCASE(Sentence).
LOOP.
COMPUTE #=CHAR.INDEX(#Cpy," ").
DO IF # GT 0.
COMPUTE Word=CHAR.SUBSTR(#Cpy,1,#-1).
COMPUTE #Cpy=CHAR.SUBSTR(#Cpy,#+1).
ELSE.
COMPUTE Word=#Cpy.
END IF.
XSAVE OUTFILE "C:\Temp\Parsed" / KEEP ID Word .
END LOOP IF # EQ 0.
EXECUTE.
DELETE VARIABLES Word.
GET FILE  "C:\Temp\Parsed" .
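
For comparison, the loop above can be sketched in plain Python, outside SPSS. This is a rough equivalent, not a line-for-line translation: the function name `parse_words` is made up for illustration, `split()` collapses runs of blanks, and the A8 width of Word is not reproduced.

```python
def parse_words(rows):
    """Yield (case_id, word) pairs, one per blank-delimited word, upper-cased."""
    for case_id, sentence in rows:
        # upper() mirrors UPCASE(Sentence); split() breaks on runs of whitespace.
        for word in sentence.upper().split():
            yield case_id, word


sentences = [
    "It is a big Chair and table",
    "It is a chair",
    "It was a table",
    "It is a Big Char",
    "It is a Tabl and Char",
]

# The 1-based index stands in for COMPUTE ID=$CASENUM.
rows = list(enumerate(sentences, start=1))
parsed = list(parse_words(rows))

# Show the first few (ID, Word) pairs of the long-format result.
for case_id, word in parsed[:3]:
    print(case_id, word)
```

Each pair corresponds to one record that the XSAVE inside the loop would write.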
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: Issue with SPSSINC TRANS string.split

Andy W
I'm surprised it returned anything. In Python,

string.split()

calls split on the object named *string* and returns a list of its substrings. So in theory you want (assuming *Sentence* is the object of interest):

Sentence.split()

But the TRANS extension is not smart enough to figure this out (see here for a slightly different example: http://spssx-discussion.1045642.n5.nabble.com/How-to-transform-alpha-numerical-values-in-a-variable-in-lower-case-letters-so-that-the-first-letter-td5724418.html#a5724425).
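
A minimal sketch of the method-call form in plain Python, outside SPSS (the variable names are only for illustration). Note that Python 2 also has a standard-library module named `string` whose `split` function takes the text as its argument, which may be why the `string.split(Sentence)` formula runs at all. That function was removed in Python 3, so the method form is the portable one.

```python
sentence = "It is a big Chair and table"

# str.split() with no arguments splits on runs of whitespace
# and ignores leading and trailing blanks.
words = sentence.split()
print(words)

# A blank-padded fixed-width value, like an A30 string variable,
# splits the same way because the trailing blanks are dropped.
padded = "It is a chair".ljust(30)
print(padded.split())
```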

My SPSS is being consumed right now by other calculations, but what happens if you try to define your own function and then pass that to Trans. Something like:

**********************************************.
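BEGIN PROGRAM.
# Wrap the str.split method in a function that SPSSINC TRANS can call by name.
def Split(s):
  return s.split()

# Quick check outside of SPSSINC TRANS.
test = "It is a big Chair and table"
print(Split(test))

test2 = "It is a"
print(Split(test2))
END PROGRAM.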

SPSSINC TRANS RESULT = word1 TO word10  TYPE=5 5 5 5 5 5 5 5 5 5 /FORMULA "Split(Sentence)".
**********************************************.

I'm not 100% sure if you need to define the type for every variable.
Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/

Re: Issue with SPSSINC TRANS string.split

David Marso
Administrator
Hi Andy,
I'm using what I found here.
http://spssx-discussion.1045642.n5.nabble.com/Parsing-String-Name-Variable-td5714037.html#a5714041
Notice my usage works correctly for the first case but not the remaining 4.
Weird eh?


Re: Issue with SPSSINC TRANS string.split

Jon K Peck
In reply to this post by David Marso
It's nice to see that you are so open-minded toward not-so-new technology, David. SPSS has had Python integration for almost nine years now.

The SPSSINC TRANS command makes no attempt to document the thousands of Python standard library functions that you could use with it.

Sorry to say, since you always love to aim an arrow at my back, you found a bug. The particular API that SPSSINC TRANS uses to update the result dataset has apparently changed with regard to None values, so when the number of string results produced is less than the number of result variables specified, it could produce the error.
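
To illustrate the kind of mismatch described above, here is a sketch in plain Python. It is not the SPSSINC TRANS source or its actual fix. The helper name `pad_results` and the choice of an empty string as the filler are assumptions made for illustration.

```python
def pad_results(values, n_vars, filler=""):
    """Return exactly n_vars items: drop extras, fill missing slots with filler."""
    values = list(values)[:n_vars]
    return values + [filler] * (n_vars - len(values))


# One sentence yields ten words; the other yields four and needs six fillers.
for sentence in ["It is a big Chair and table eight nine ten", "It is a chair"]:
    words = pad_results(sentence.split(), 10)
    print(len(words), words)
```

With a filler of None instead of "", the short cases would carry None in the unused slots, which is the situation the paragraph above says the dataset API stopped accepting for string results.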

However, the benefits of the Python integration mean that many bugs can be quickly fixed. If you download and install the latest version of this extension via the Utilities menu (be sure to start Statistics using Run As Administrator), you will find that this code works fine.


DATASET CLOSE ALL.
DATA LIST /Sentence (A50).
BEGIN DATA
It is a big Chair and table eight nine ten
It is a chair
It was a table
It is a Big Char
It is a Tabl and Char
END DATA.
DATASET NAME Source.

SPSSINC TRANS RESULT = word1 TO word10  TYPE=5
/FORMULA "string.split(Sentence)".
LIST.


Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621




From:        David Marso <[hidden email]>
To:        [hidden email],
Date:        06/12/2014 11:40 AM
Subject:        [SPSSX-L] Issue with SPSSINC TRANS string.split
Sent by:        "SPSSX(r) Discussion" <[hidden email]>




--
View this message in context:
http://spssx-discussion.1045642.n5.nabble.com/Issue-with-SPSSINC-TRANS-string-split-tp5726445.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD



Re: Issue with SPSSINC TRANS string.split

David Marso
Administrator
Thanks Jon.
