Changing string to numeric format across entire dataset

Carina Vogel
I have various datasets containing the same variables (with the same variable names) and I would like to add these datasets together (add cases). My problem is that the variables randomly come in numeric and string format (the order is different in each file). Is there a syntax that can be applied to all variables of a file that simply changes those variables that come in string format to numeric format (or vice versa)?

Thanks,
Carina

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Changing string to numeric format across entire dataset

Peck, Jon
SPSS 16 introduced the ALTER TYPE command.  Using it you can say things like make all strings numeric, or make all strings a certain fixed size, etc.

HTH,
Jon Peck
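
For readers on SPSS 16 or later, the ALTER TYPE route mentioned above might look like the sketch below. It only builds the command text. The helper name alter_type_syntax and the target format F8.2 are illustrative assumptions, not something given in this thread, and the resulting string would be passed to spss.Submit inside a BEGIN PROGRAM block.

```python
def alter_type_syntax(variables="ALL", from_format="A", to_format="F8.2"):
    """Build an ALTER TYPE command (SPSS 16+) as a string.

    variables   -- a variable list, or "ALL" for every variable (assumed default)
    from_format -- only variables matching this format are changed ("A" = any string)
    to_format   -- the new format; F8.2 is just an illustrative choice
    """
    if not isinstance(variables, str):
        variables = " ".join(variables)
    return "ALTER TYPE %s (%s=%s)." % (variables, from_format, to_format)


# Turn every string variable in the active file into a numeric F8.2 variable.
cmd = alter_type_syntax()
print(cmd)

# Inside SPSS one would then run:  import spss; spss.Submit(cmd)
```

Running the same command on each file before ADD FILES should give every variable a consistent type. String values that cannot be read as numbers would presumably end up missing, so it seems wise to try this on a copy of the data first.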

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Carina Vogel
Sent: Wednesday, August 20, 2008 3:41 AM
To: [hidden email]
Subject: [SPSSX-L] Changing string to numeric format across entire dataset

I have various datasets containing the same variables (with the same variable names) and I would like to add these datasets together (add cases). My problem is that the variables randomly come in numeric and string format (the order is different in each file). Is there a syntax that can be applied to all variables of a file that simply changes those variables that come in string format to numeric format (or vice versa)?

Thanks,
Carina


Re: Changing string to numeric format across entire dataset

Albert-Jan Roskam
In reply to this post by Carina Vogel
Hi,

The code below might do what you want. It uses the employee data set as an example. You only need to specify your own variables, in single quotes and comma-separated.

Other Pythonians: feel free to comment! The code seems overly complex to me, which is not in line with the Zen of Python ("simple is better than complex").

Cheers!!
Albert-Jan


get file='C:\Program Files\Spss\Employee data.sav'.

begin program.
"""
Convert target variables to string variables if they are numeric vars
"""
target_vars = ['id', 'bdate', 'jobcat', 'salary', 'prevexp'] # specify target vars here.

import spss, spssaux, random

all_vars = spssaux.getVariableNamesList()

# record each variable's format by name before the file is modified;
# index-based lookups become unreliable once variables are deleted.
formats = {}
for i in range(spss.GetVariableCount()):
        formats[spss.GetVariableName(i)] = spss.GetVariableFormat(i)

# convert target vars from numeric (F or N) to string.
converted = []
for varname in target_vars:
        if varname in formats and formats[varname][0] in ("F", "N"):
                tmpvar = "tmp" + str(random.random())
                spss.Submit(""" string %s (a8).
                compute %s = string (%s, f8).
                exe.
                delete variables %s.
                rename variables (%s = %s). """ % (tmpvar, tmpvar, varname, varname, tmpvar, varname))
                converted.append(varname)

print("**** The following vars were converted: " + ' '.join(converted))

# restore original variable order
spss.Submit("add files / file = * / keep = %s all." % ' '.join(all_vars))
spss.Submit("exe.")
end program.





Re: Changing string to numeric format across entire dataset

Carlos Renato (www.estatistico.org)
Dear friend

     It's an old problem for SPSS users. You can solve it
regardless of whether the variable labels are the same or not.

Use this sequence:

First, create a macro that stores the names of the variables you
want to convert.

DEFINE VariablesToConvert ()
 Variable_1
 Variable_2
 Variable_N
!ENDDEFINE.

Second, define and run this other macro:

DEFINE MacroStrNum (Variables=!CMDEND)
!DO !VarOriginal !IN (!Variables)
NUMERIC Temporaria(F20.4).
COMPUTE Temporaria=NUMBER(!VarOriginal,F20.0).
MATCH FILES FILE=* /DROP=!VarOriginal.
RENAME VARIABLE (Temporaria=!VarOriginal).
!DOEND.
!ENDDEFINE.

MacroStrNum Variables=VariablesToConvert.

This procedure works for any number of variables, whatever their
labels.

Thanks to all.

Carlos Renato
Statistician
Recife - PE - Brazil


Re: Changing string to numeric format across entire dataset

Albert-Jan Roskam
Hi,

I've been playing with this a bit more, after some advice I got offlist. The macro given by Carlos works, but messes up the variable order. Also, all vars will be in F20 format.

Still, one wonders whether there is an easier way to do this in pure Python, without all the SPSS code. ALTER TYPE really is a big improvement.

Cheers!!
Albert-Jan


get file='C:\Program Files\Spss\Employee data.sav'.

set mprint = on.

begin program.
"""
Convert target variables to string variables if they are numeric vars
"""
target_vars = ['id', 'bdate', 'jobcat', 'salary', 'prevexp'] # specify target vars here.

import spss, spssaux
import random, re

all_vars = spssaux.getVariableNamesList()
varlist = []

# convert target vars from numeric (F or N) to string (A).
for i in range(len(all_vars)):
        varname = all_vars[i]
        varlist.append(varname)
        # variables up to and including the current one, in original order
        addlist = ' '.join(varlist)
        fmt = spss.GetVariableFormat(i)
        if varname in target_vars and fmt[0] in ("F", "N"):
                tmpvar = "tmp" + str(random.random())
                print('********** varname: %s (%s)' % (varname, fmt))
                # width is the first run of digits in the format, e.g. '8' in 'F8.2'
                varlen = re.search(r"\d+", fmt).group()
                spss.Submit("string %s (a%s)." % (tmpvar, varlen))
                spss.Submit("compute %s = string (%s, f%s)." % (tmpvar, varname, varlen))
                # drop the original, then rename the temp var and move it back into place
                spss.Submit("""add files / file = * / drop = %s.
                add files / file = * / rename = (%s = %s) / keep = %s all.""" % (varname, tmpvar, varname, addlist))
spss.Submit("exe.")
end program.

dataset name result.
apply dictionary from 'C:\Program Files\Spss\Employee data.sav'.




--- On Thu, 8/21/08, Carlos Renato <[hidden email]> wrote:

> From: Carlos Renato <[hidden email]>
> Subject: Re: Changing string to numeric format across entire dataset
> To: [hidden email]
> Date: Thursday, August 21, 2008, 3:53 PM

Re: Changing string to numeric format across entire dataset

Peck, Jon
The code below defines a function that can be used to convert the variables to strings.  It uses a few Python techniques that are worth knowing about.

It doesn’t use any SPSS 16 features.  Of course, in SPSS 16, ALTER TYPE can do this directly.  However, the Dataset class, also new in 16, provides the ability to change the type directly just by assigning a new type in your Python code.
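
As a rough sketch of the Dataset-class route just mentioned: the helper names below are hypothetical and the default width of 8 is an assumption. The code relies on the Dataset convention that a variable's type is 0 for numeric and the defined length for a string. What happens to existing values when the type is reassigned is not covered in this thread, so it should be tried on a copy of the data first.

```python
def plan_string_widths(var_types, target_vars, width=8):
    """Return {name: new_type} for those targets that are currently numeric.

    var_types maps variable name -> type code as used by the Dataset class:
    0 for numeric, the defined length for strings.
    """
    return dict((name, width) for name in target_vars
                if var_types.get(name) == 0)


def convert_with_dataset_class(target_vars, width=8):
    """Apply the plan inside a data step (needs SPSS 16+ and the spss module)."""
    import spss  # imported lazily so the planning helper runs without SPSS
    spss.StartDataStep()
    try:
        dataset = spss.Dataset()
        var_types = dict((var.name, var.type) for var in dataset.varlist)
        plan = plan_string_widths(var_types, target_vars, width)
        for name, new_type in plan.items():
            # assigning a string length to .type turns the variable into a string
            dataset.varlist[name].type = new_type
    finally:
        spss.EndDataStep()


# The planning step can be checked without SPSS:
print(plan_string_widths({"id": 0, "gender": 1, "salary": 0},
                         ["id", "gender", "salary"]))
```

Keeping the decision about which variables to change separate from the data step makes the logic easy to check outside SPSS.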

As always, caveats about line wrapping in messages.

HTH,
Jon Peck


import spss, spssaux
import random, re

def convertToString(targetVars):
        """Convert all numeric variables in targetVars to strings"""

        vardict = spssaux.VariableDict(variableType="numeric")
        varsToChange = set(set(targetVars).intersection(set(vardict)))

        strings = []
        computes = []
        newnames = []

        for v in varsToChange:
                vformat = vardict[v].VariableFormat
                swidth = re.search(r"\d+", vformat).group()
                newv = v + "." + str(random.randint(100000,999999))
                newnames.append(newv)
                strings.append(newv + "(A"+ swidth + ")")
                computes.append("COMPUTE %(newv)s = STRING(%(v)s, %(vformat)s)." % locals())

        if newnames:
                spss.Submit("STRING " + "/".join(strings))
                spss.Submit(computes)
                spss.Submit("EXECUTE")
                spss.Submit("DELETE VARIABLES " + " ".join(varsToChange))
                spss.Submit("RENAME VARIABLES (" + " ".join(newnames) + "="  + " ".join(varsToChange) + ")")
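
The function above derives the string width from each variable's numeric format with a regular expression. As a standalone sketch of that step, the hypothetical helper split_format below separates a format such as F8.2 into its parts. It handles only the simple TYPEw.d pattern and is an illustration, not part of spssaux.

```python
import re

# format name, width, optional decimals: F8.2, N4, DOLLAR12, ADATE10 ...
_FORMAT_RE = re.compile(r"^([A-Za-z]+)(\d+)(?:\.(\d+))?$")


def split_format(fmt):
    """Split an SPSS format like 'F8.2' into (type, width, decimals)."""
    match = _FORMAT_RE.match(fmt.strip())
    if match is None:
        raise ValueError("unrecognised format: %r" % fmt)
    ftype, width, decimals = match.groups()
    return ftype.upper(), int(width), int(decimals or 0)


def string_spec(fmt):
    """String declaration as wide as the formatted value, e.g. 'A8' for 'F8.2'."""
    return "A%d" % split_format(fmt)[1]


print(split_format("F8.2"))
print(string_spec("DOLLAR12"))
```

Parsing the whole format rather than just the first run of digits also makes the decimals available, should the string conversion need to preserve them.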

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Albert-jan Roskam
Sent: Thursday, August 21, 2008 9:56 AM
To: [hidden email]
Subject: Re: [SPSSX-L] Changing string to numeric format across entire dataset
