split-string-variable-into-components (may be Python)

raw
Dear all, I have to split a string variable into components separated by commas.

I've found a Python tutorial, "split-string-variable-into-components", but it did not work on my dataset (maybe because that example has only one variable in the dataset, whereas mine has more).

Any tips?

Thanks.
Re: split-string-variable-into-components (may be Python)

David Marso
Administrator
http://spssx-discussion.1045642.n5.nabble.com/template/NamlServlet.jtp?macro=search_page&node=1068821&query=Parse&n=1068821
Re: split-string-variable-into-components (may be Python)

raw


Thanks. I think the most flexible solution is the Python one, because the normal SPSS syntax creates several variables named v1 to v_n, but without customized names.

By contrast, this solution does give them customized names:

http://www.spss-tutorials.com/split-string-variable-into-components/


*1. Create Test Data.

begin program.
import random,spss
random.seed(1)
data = ''
for case in range(10):
    val = '"'
    for novars in range(random.randrange(12)):
        for vallen in range(random.randrange(8)):
            val += chr(random.randrange(97,123))
        val += ';'
    val += '"'
    data += val + '\n'
spss.Submit('''data list list/s1(a%s).\nbegin data\n%send data.'''%(max(len(s) for s in data.split('"')),data))
end program.

*2. Define the function.

begin program.
def stringsplitter(variable,sep):
    import spss,spssaux
    # Pass 1: find the maximum stripped length of the component at each position.
    lens = []
    curs = spss.Cursor([spssaux.VariableDict().VariableIndex(variable)],\
accessType='w')
    for case in range(curs.GetCaseCount()):
        for cnt,val in enumerate(curs.fetchone()[0].split(sep)):
            if not len(lens)>cnt:
                lens.append(len(val.strip()))
            elif len(val.strip())>lens[cnt]:
                lens[cnt] = len(val.strip())
    curs.close()
    # Pass 2: create one string variable per position, then fill them case by case.
    curs=spss.Cursor([spssaux.VariableDict().VariableIndex(variable)],\
accessType='w')
    # New variables are named <variable>_s1, <variable>_s2, ...; a width of 0 is raised to 1.
    curs.SetVarNameAndType([variable + '_s' + str(cnt + 1) for cnt in range(len(lens))],[1 if leng==0 else leng for leng in lens])
    curs.CommitDictionary()
    for case in range(curs.GetCaseCount()):
        for cnt,val in enumerate(curs.fetchone()[0].split(sep)):
            curs.SetValueChar(variable+'_s'+str(cnt + 1),val)
        curs.CommitCase()
    curs.close()
end program.

*3. Apply the function.

begin program.
stringsplitter('s1',';') #Please specify string variable and separator.
end program.

But I am not able to adapt the syntax to my data. Here is an example:

data list list / id * city (A50) zone (A1) product (A50).
begin data
1 "berlin" "a" "stock1, stock2, stock3"
2 "paris" "a" "stock1, stock2, stock3"
3 "amsterdam" "b" "stock1, stock2, stock3, stock4"
4 "london" "b" "stock1, stock2, stock3, stock5"
end data.


I guess that I have to use my variable name "product" instead of s1, but I am missing something else.
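
As a rough illustration of what the first pass of stringsplitter computes, here is a plain-Python sketch that runs outside SPSS (no spss module needed). The products list is a hand-copied stand-in for the product column above, and component_widths is a hypothetical helper name, not part of the tutorial. It also shows that splitting on "," alone leaves a leading blank on every piece after the first, so the raw pieces are one character longer than the stripped widths.

```python
# Hypothetical stand-in for the "product" column of the sample data above.
products = [
    "stock1, stock2, stock3",
    "stock1, stock2, stock3",
    "stock1, stock2, stock3, stock4",
    "stock1, stock2, stock3, stock5",
]


def component_widths(values, sep):
    """Return the maximum stripped length found at each split position."""
    widths = []
    for value in values:
        for pos, piece in enumerate(value.split(sep)):
            piece_len = len(piece.strip())
            if pos >= len(widths):
                widths.append(piece_len)
            else:
                widths[pos] = max(widths[pos], piece_len)
    return widths


widths = component_widths(products, ",")
# Same naming scheme as the tutorial function: <variable>_s1, <variable>_s2, ...
names = ["product_s%d" % (i + 1) for i in range(len(widths))]
raw_pieces = products[0].split(",")

print(list(zip(names, widths)))
print(raw_pieces)  # ['stock1', ' stock2', ' stock3']
```

So with these data the separator would be ',' rather than ';', and the pieces probably need to be stripped before they are written back.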
Re: split-string-variable-into-components (may be Python)

Jon K Peck
Here is a simpler Python solution.

begin program.
def splitter(thestring):
     return thestring.split(";")
end program.

spssinc trans result=x1 to x20 type=10
/formula "splitter(s1)".

It takes s1 as the input and creates variables x1 to x20 with the split pieces.  Each output string is 10 bytes.  You can, of course, easily change those parameters.

Unused slots are set to blank.  If there are more split values than output variables, you will get an error message.
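
The blank-slot and too-many-values behavior described above can be mimicked in plain Python for experimenting outside SPSS. This is only a sketch under assumptions: the sep and slots parameters and the ValueError are hypothetical choices made for illustration, not how SPSSINC TRANS works internally. It uses a comma separator and strips blanks so that it fits data like "stock1, stock2, stock3".

```python
def splitter(thestring, sep=",", slots=20):
    """Split on sep, strip blanks, and pad with empty strings up to slots.

    sep and slots are illustrative parameters (assumptions), standing in
    for the separator and the number of result variables.
    """
    pieces = [piece.strip() for piece in thestring.split(sep)]
    if len(pieces) > slots:
        raise ValueError("%d pieces do not fit in %d slots" % (len(pieces), slots))
    return pieces + [""] * (slots - len(pieces))


print(splitter("stock1, stock2, stock3", slots=5))
# ['stock1', 'stock2', 'stock3', '', '']
```

Only the part that splits and strips would go into the function passed to the formula; the padding is there just to show what the unused slots look like.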

HTH,

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM




From:        raw <[hidden email]>
To:        [hidden email]
Date:        10/28/2015 03:36 PM
Subject:        Re: [SPSSX-L] split-string-variable-into-components (may be Python)
Sent by:        "SPSSX(r) Discussion" <[hidden email]>






