Reading numeric values from text when importing into SPSS?

classic Classic list List threaded Threaded
5 messages Options
pbl
Reply | Threaded
Open this post in threaded view
|

Reading numeric values from text when importing into SPSS?

pbl
I have a large dataset of administrative data that is currently in Excel, and all fields have both a number and the corresponding value. For example, one column would contain the following options:
(0) None
(1) One
(2) Two
(3) Three or more
There are about 300 variables like this (although what 0, 1, 2, etc mean differs for each of them), so I would like to avoid having to recode all of them. Is there a way to get SPSS to just read the numeric value from the parentheses? Is there another trick I could use for this?

Thanks!
Reply | Threaded
Open this post in threaded view
|

Re: Reading numeric values from text when importing into SPSS?

Maguin, Eugene
Maybe I'm wrong but from your example, it looks like the numeric value is always in position 2 of the string value that is read in from excel. If so, then it's basically a) do repeat for v1 to v300, b) use Char.Substr to extract the character in position 2, c) use Number to convert the extracted character to a number.

Do repeat x=v1 to v300/y=y1 to y300.
Compute y=number(char.substr(x,2,1),f1.0).
End repeat.

I hope it's understood that the conversion takes place after the data have been imported from the excel file.

You may also have another problem that may not have occurred to you yet. The excel file data give both the numeric value and value label. Perhaps you will want value labels. Perhaps in python there is a way to stick the extracted labels into the value labels section of the data file. That is not possible using syntax, so far as I know.

Gene Maguin



-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of pbl
Sent: Sunday, March 10, 2013 4:27 PM
To: [hidden email]
Subject: Reading numeric values from text when importing into SPSS?

I have a large dataset of administrative data that is currently in Excel, and all fields have both a number and the corresponding value. For example, one column would contain the following options:
(0) None
(1) One
(2) Two
(3) Three or more
There are about 300 variables like this (although what 0, 1, 2, etc mean differs for each of them), so I would like to avoid having to recode all of them. Is there a way to get SPSS to just read the numeric value from the parentheses? Is there another trick I could use for this?

Thanks!



--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Reading-numeric-values-from-text-when-importing-into-SPSS-tp5718529.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Reading numeric values from text when importing into SPSS?

Art Kendall
In reply to this post by pbl
Try the example syntax below.

You most likely will want to do some autorecode.

new file.
data list list/ invar1 to invar3 (3a30).
begin data
"(0) None" "(13) orange" "(5) John"
"(1) One" " (23) apple" "(7) Mary"
"(2) Two" "(44) pineapple" "(111) Sue"
"(3) Three or more" " (-1) n/a" "(22) Bill"
end data.
numeric numvar1 to numvar3 (f4).
string strvar1 to strvar3(a20).
string numstr(a6).
do repeat
  inv  = invar1  to invar3
 /num = numvar1 to numvar3
 /str = strvar1 to strvar3.
compute EndNumPart = index(inv,")").
compute numstr = char.substr(inv,1,EndNumPart+1).
compute numstr = replace(numstr,'(','').
compute numstr = replace(numstr,')','').
compute numstr = replace(numstr,' ','').
compute num = number(numstr,f6).
compute str = char.substr(inv,(EndNumPart+2)).
end repeat.

list.
delete vars numstr.
alter type strvar1 to strvar3 (a=amin) .



Art Kendall
Social Research Consultants
On 3/10/2013 4:27 PM, pbl wrote:
I have a large dataset of administrative data that is currently in Excel, and
all fields have both a number and the corresponding value. For example, one
column would contain the following options:
(0) None
(1) One
(2) Two
(3) Three or more
There are about 300 variables like this (although what 0, 1, 2, etc mean
differs for each of them), so I would like to avoid having to recode all of
them. Is there a way to get SPSS to just read the numeric value from the
parentheses? Is there another trick I could use for this?

Thanks!



--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Reading-numeric-values-from-text-when-importing-into-SPSS-tp5718529.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD


===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Art Kendall
Social Research Consultants
Reply | Threaded
Open this post in threaded view
|

Re: Reading numeric values from text when importing into SPSS?

Albert-Jan Roskam
In reply to this post by Maguin, Eugene
Hi,

Getting the values is easy, but as Gene said, you (me at least) would probably like to have the value labels.

do repeat #in = v1 to v300 / #out = num1 to num300.
+compute #out = number(substr(#in, index(#in, "(") + 1, index(#in, ")") -1), f5).
end repeat.

* getting the value labels: entirely untested.
BEGIN PROGRAM.
import re
import spss
prefix = "blah_" # assumes that target vars start with this
vnames = spssaux.GetVariableNamesList()
varPos = [v.index() for v in vnames if v.startswith(prefix)]
for i in varPos:
    dataCursor = spss.Cursor([i])
    col = dataCursor.fetchall()
    dataCursor.close()
    valueLabels = dict([re.findall("\((\d+)\) (\w+)", v[0], re.I)) for v in col])
    syntax = ["value labels %s" % vnames[i]]
    syntax.extend(["  %s '%s' " % value, label for value, label in valueLabels.items()])
    spss.Submit("\n".join(syntax))
END PROGRAM.


Regards,
Albert-Jan


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a
fresh water system, and public health, what have the Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


----- Original Message -----

> From: "Maguin, Eugene" <[hidden email]>
> To: [hidden email]
> Cc:
> Sent: Sunday, March 10, 2013 9:41 PM
> Subject: Re: [SPSSX-L] Reading numeric values from text when importing              into SPSS?
>
> Maybe I'm wrong but from your example, it looks like the numeric value is
> always in position 2 of the string value that is read in from excel. If so, then
> it's basically a) do repeat for v1 to v300, b) use Char.Substr to extract
> the character in position 2, c) use Number to convert the extracted character to
> a number.
>
> Do repeat x=v1 to v300/y=y1 to y300.
> Compute y=number(char.substr(x,2,1),f1.0).
> End repeat.
>
> I hope it's understood that the conversion takes place after the data have
> been imported from the excel file.
>
> You may also have another problem that may not have occurred to you yet. The
> excel file data give both the numeric value and value label. Perhaps you will
> want value labels. Perhaps in python there is a way to stick the extracted
> labels into the value labels section of the data file. That is not possible
> using syntax, so far as I know.
>
> Gene Maguin
>
>
>
> -----Original Message-----
> From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of pbl
> Sent: Sunday, March 10, 2013 4:27 PM
> To: [hidden email]
> Subject: Reading numeric values from text when importing into SPSS?
>
> I have a large dataset of administrative data that is currently in Excel, and
> all fields have both a number and the corresponding value. For example, one
> column would contain the following options:
> (0) None
> (1) One
> (2) Two
> (3) Three or more
> There are about 300 variables like this (although what 0, 1, 2, etc mean differs
> for each of them), so I would like to avoid having to recode all of them. Is
> there a way to get SPSS to just read the numeric value from the parentheses? Is
> there another trick I could use for this?
>
> Thanks!
>
>
>
> --
> View this message in context:
> http://spssx-discussion.1045642.n5.nabble.com/Reading-numeric-values-from-text-when-importing-into-SPSS-tp5718529.html
> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email] (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command SIGNOFF SPSSX-L For a list of
> commands to manage subscriptions, send the command INFO REFCARD
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email] (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD
>

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Reading numeric values from text when importing into SPSS?

Rich Ulrich
In reply to this post by pbl
The easiest way to get value labels might be to use AUTORECODE
on the original strings -- assuming that they were computer-generated
and so there are not any variations in spelling, spacing, or capital letters.

If *all*  the values are observed as zero to <something>, no gaps, then you
could recover the original scores by subtracting 1 in a DO REPEAT.

--
Rich Ulrich

> Date: Sun, 10 Mar 2013 13:27:00 -0700

> From: [hidden email]
> Subject: Reading numeric values from text when importing into SPSS?
> To: [hidden email]
>
> I have a large dataset of administrative data that is currently in Excel, and
> all fields have both a number and the corresponding value. For example, one
> column would contain the following options:
> (0) None
> (1) One
> (2) Two
> (3) Three or more
> There are about 300 variables like this (although what 0, 1, 2, etc mean
> differs for each of them), so I would like to avoid having to recode all of
> them. Is there a way to get SPSS to just read the numeric value from the
> parentheses? Is there another trick I could use for this?
>