Restructuring an SPSS data file


Restructuring an SPSS data file

Vincent LOUIS-2
Hi Everyone,
I am trying to write SPSS syntax to restructure a file that
contains the following information:

Year    GeoUnit    DataValue    VariableName

2000    08002           7.50    NumArtsJobs2000
2000    08002       11404.00    NumInfoTechJobs2000
2000    08002         672.00    NumManuJobs2000
2000    08003          42.00    NumArtsJobs2000
2000    08003        3342.00    NumInfoTechJobs2000
2000    08003        1279.00    NumManuJobs2000
2000    08004            .00    NumArtsJobs2000
2000    08004         183.00    NumInfoTechJobs2000
2000    08004          66.00    NumManuJobs2000

The file that I ultimately want should contain the same information,
but in the following structure:

GeoUnit    NumArtsJobs2000    NumInfoTechJobs2000    NumManuJobs2000

08002                 7.50               11404.00             672.00
08003                42.00                3342.00            1279.00
08004                  .00                 183.00              66.00

I would appreciate any help you may be able to provide to assist me in
reconfiguring the data.

Thank you in advance,
Vincent

Re: Restructuring an SPSS data file

la volta statistics
Hi Vincent

Try the following syntax. Hope that helps.

Christian

DATA LIST FREE / Year(F8.0)    GeoUnit(A5)   DataValue(F8.2)
VariableName(A19).
BEGIN DATA
2000    08002                       7.50        NumArtsJobs2000
2000    08002                   11404.00        NumInfoTechJobs2000
2000    08002                     672.00        NumManuJobs2000
2000    08003                      42.00        NumArtsJobs2000
2000    08003                    3342.00        NumInfoTechJobs2000
2000    08003                    1279.00        NumManuJobs2000
2000    08004                        .00        NumArtsJobs2000
2000    08004                     183.00        NumInfoTechJobs2000
2000    08004                      66.00        NumManuJobs2000
END DATA.


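* Sort so that the rows of each GeoUnit are adjacent and in VariableName order.
SORT CASES BY Year GeoUnit VariableName .
* One case per GeoUnit: DataValue becomes DataValue1 to DataValue3.
CASESTOVARS
 /ID = GeoUnit
 /DROP = VariableName Year
 /SEPARATOR="".

* The new columns follow the alphabetical VariableName order, so rename in that order.
RENAME VARIABLES (DataValue1 DataValue2 DataValue3 = NumArtsJobs2000
NumInfoTechJobs2000 NumManuJobs2000).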



*******************************
la volta statistics
Christian Schmidhauser, Dr.phil.II
Weinbergstrasse 108
Ch-8006 Zürich
Tel: +41 (043) 233 98 01
Fax: +41 (043) 233 98 02
email: mailto:[hidden email]
internet: http://www.lavolta.ch/


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Vincent Louis
Sent: Wednesday, 28 March 2007 22:56
To: [hidden email]
Subject: Restructuring an SPSS data file



Deleting calculated values

Gary Oliver
Colleagues

I would like to keep the variables and their details (e.g. format), but
remove the values for selected variables from my file. I now have
command syntax that will derive all the required values. The reason I
want a delete-values command is that my syntax does not always overwrite
the existing values in cells. I cannot find a delete-values command, so
how do you suggest I go about this?

Warm regards/gary

Re: Deleting calculated values

Spousta Jan
Hi Gary,

If I understand you correctly, you need something like

Recode /*here the list of numerical variables to be deleted*/ (else =
sysmis).
Execute.

For example

GET FILE='C:\Program Files\SPSS\Employee data.sav'.
Recode bdate jobtime to minority (else = sysmis).
Execute.

Best regards,

Jan
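Jan's RECODE handles numeric variables only; string variables have no system-missing value. A minimal sketch for a file containing both kinds, assuming hypothetical variable names (numeric var1 TO var3, which must be adjacent in the file for TO to work, and a string variable sVar):

```spss
* Hypothetical names: var1 TO var3 are numeric and adjacent in the file.
* sVar is a string variable.
* Numeric variables: replace every value with system-missing.
* Formats, labels and missing-value definitions are kept.
RECODE var1 TO var3 (ELSE = SYSMIS).
* String variables have no system-missing value, so blank them instead.
RECODE sVar (ELSE = ' ').
* EXECUTE forces the data pass so the Data Editor shows the result.
EXECUTE.
```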


Re: Deleting calculated values

Richard Ristow
In reply to this post by Gary Oliver
At 03:43 AM 3/29/2007, Gary Oliver wrote:

>I would like to remove the values for selected variables from my file.

First, the method Jan Spousta recommended,

Recode /* list of numerical variables to be deleted*/
   (else = sysmis).
Execute.

is the neatest way I know of. (But, Jan? "Execute"?)

Second, though,

>The reason I want a delete values command is that my syntax [to derive
>all the required values] does not always overwrite the existing values.

For me, that raises a strong question whether you have logic problems.

First, an SPSS statement that assigns a new value to a variable does
so regardless of the old value, whether that's valid or missing. That
leaves instances where you >think< SPSS is going to assign a new value,
but it doesn't. Look for:

- IF statements where the test fails. By design, this leaves the
previous value of the target variable unaltered.

- RECODEing a variable into itself leaves the value unchanged if no
RECODE condition is satisfied. (That's one reason to be careful of
RECODEing a variable into itself, or to resolve never to do it at
all. You can, by the way, ensure that some condition is satisfied by
including an ELSE condition.)

- Assignment by (usually) COMPUTEs in a DO IF construct, if no clause
containing a COMPUTE is executed. A well-known, and insidious, instance
is if a test on the DO IF or an ELSE IF returns 'missing'. In this
case, the remainder of the construct, tests and all, is skipped >WITH
NO WARNING<.

If you expect values to be changed, and they aren't, check whether
problems like these are getting by you.
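A small sketch of the DO IF case, using hypothetical test data in which x is system-missing for the second case. In the first construct flag stays system-missing for that case even though there is an ELSE branch; testing MISSING(x) first, as in the second construct, ensures that some clause always runs. The code 9 for missing input is an arbitrary, hypothetical choice.

```spss
* Hypothetical test data: x is system-missing in the second case.
DATA LIST FREE / x.
BEGIN DATA
1 . 3
END DATA.

* Unguarded version. For the missing x, (x > 2) evaluates to missing.
* SPSS then skips the whole DO IF structure, ELSE branch included.
DO IF (x > 2).
COMPUTE flag = 1.
ELSE.
COMPUTE flag = 0.
END IF.

* Guarded version: the missing case is handled before any comparison.
DO IF MISSING(x).
COMPUTE flag2 = 9.
ELSE IF (x > 2).
COMPUTE flag2 = 1.
ELSE.
COMPUTE flag2 = 0.
END IF.
MISSING VALUES flag2 (9).
LIST.
```

LIST should show flag as 0, missing, 1 and flag2 as 0, 9, 1.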

Second, it sounds like you're reading a file with variables that
already have values, and calculating new values for them. That's a
well-known recipe for confusion: there are versions of what looks like
the same file, with variables that may be computed according to one of
two or more logic rules.

Generally, maybe 99% of the time, don't do it. Often enough your raw
data changes, notably by adding more, and you need to rerun your
transformations. Do this by erasing the file with the transformed
variables, then reading the data file and generating the transformed
variables anew.
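The workflow described above can be sketched as follows. The file paths and variable names are hypothetical; the point is only that the derived file is rebuilt from the raw file on every run, so no value from an earlier run can survive.

```spss
* Hypothetical paths and variable names.
* Always start from the raw data file, never from the derived one.
GET FILE='C:\data\raw.sav'.
* All derivations go here and are recomputed from scratch on each run.
COMPUTE totalJobs = SUM(artsJobs, itJobs, manuJobs).
* SAVE replaces any earlier version of the derived file.
SAVE OUTFILE='C:\data\derived.sav'.
```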

-Good luck to you,
  Richard

Re: Restructuring an SPSS data file

Hal 9000
In reply to this post by la volta statistics
...just be SURE that each unique value for GeoUnit has all 3 values for
VariableName, or you might wrongly assign values to columns. Generally, I find
that assuming my data is complete and accurate is a very bad idea!
-Gary
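One way to act on this warning before running the positional CASESTOVARS is to count the rows per GeoUnit and list any unit that does not have exactly three. A minimal sketch, assuming a release whose AGGREGATE supports MODE=ADDVARIABLES; nRows is a hypothetical name. A count of three does not rule out a duplicated name standing in for a missing one, so this is a first check, not a proof. Add nRows to the /DROP list of CASESTOVARS afterwards.

```spss
* Attach the number of rows for each GeoUnit to every case.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=GeoUnit
  /nRows=N.
* Temporarily keep only the GeoUnits that do not have exactly three rows.
TEMPORARY.
SELECT IF (nRows NE 3).
LIST VARIABLES=GeoUnit VariableName nRows.
```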



Re: Restructuring an SPSS data file

Richard Ristow
At 06:05 PM 3/29/2007, Hal 9000 wrote:

>Generally, I find that assuming my data is complete and accurate is a
>very bad idea!

Awwwww - YOU'RE no fun.

"Thou shalt not seek for bugs in thy programs, nor yet for errors in
thy data; for if thou dost, lo, thou shalt surely find them."

Re: Re: Restructuring an SPSS data file

Richard Ristow
In reply to this post by la volta statistics
I'd recommend a variation. At 03:11 AM 3/29/2007, la volta statistics
wrote:

>SORT CASES BY Year GeoUnit VariableName .
>CASESTOVARS
>  /ID = GeoUnit
>  /drop = VariableName Year
>  /seperator="".
>
>Rename Variable (DataValue1 DataValue2 DataValue3
>                = NumArtsJobs2000
>                  NumInfoTechJobs2000
>                  NumManuJobs2000).

At 06:05 PM 3/29/2007, Hal 9000 warned, quite properly,

>...just be SURE that each unique value for GeoUnit has all 3 values
>for VariableName or you might wrongly assign values to columns.

But it is easier to write (no RENAME VARIABLES needed), and more
reliable, to let CASESTOVARS take the variable names from the data.
SPSS 15 draft output <WRR-not saved separately>:

List
|-----------------------------|---------------------------|
|Output Created               |29-MAR-2007 19:19:04       |
|-----------------------------|---------------------------|
     Year GeoUnit DataValue VariableName

     2000 08002        7.50 NumArtsJobs2000
     2000 08002    11404.00 NumInfoTechJobs2000
     2000 08002      672.00 NumManuJobs2000
     2000 08003       42.00 NumArtsJobs2000
     2000 08003     3342.00 NumInfoTechJobs2000
     2000 08003     1279.00 NumManuJobs2000
     2000 08004         .00 NumArtsJobs2000
     2000 08004      183.00 NumInfoTechJobs2000
     2000 08004       66.00 NumManuJobs2000

Number of cases read:  9    Number of cases listed:  9


SORT CASES BY GeoUnit VariableName .
CASESTOVARS
  /ID      = GeoUnit
  /INDEX   = VariableName
  /GROUPBY = VARIABLE
  /DROP    = YEAR.


Cases to Variables
|----------------------------|---------------------------|
|Output Created              |29-MAR-2007 19:19:04       |
|----------------------------|---------------------------|

Generated Variables
|---------|---------------|---------------|
|Original |VariableName   |Result         |
|Variable |               |---------------|
|         |               |Name           |
|---------|---------------|---------------|
|DataValue|NumArtsJobs2000|NumArtsJobs2000|
|         |---------------|---------------|
|         |NumInfoTechJobs|NumInfoTechJobs|
|         |2000           |2000           |
|         |---------------|---------------|
|         |NumManuJobs2000|NumManuJobs2000|
|---------|---------------|---------------|

Processing Statistics
|---------------|---|
|Cases In       |9  |
|Cases Out      |3  |
|---------------|---|
|Cases In/Cases |3.0|
|Out            |   |
|---------------|---|
|Variables In   |4  |
|Variables Out  |4  |
|---------------|---|
|Index Values   |3  |
|---------------|---|


LIST.

List
|-----------------------------|---------------------------|
|Output Created               |29-MAR-2007 19:19:04       |
|-----------------------------|---------------------------|
GeoUnit NumArtsJobs2000 NumInfoTechJobs2000 NumManuJobs2000

08002           7.50          11404.00            672.00
08003          42.00           3342.00           1279.00
08004            .00            183.00             66.00

Number of cases read:  3    Number of cases listed:  3
===================
APPENDIX: Test data
===================
DATA LIST FREE / Year(F8.0)    GeoUnit(A5)   DataValue(F8.2)
VariableName(A19).
BEGIN DATA
2000    08002                       7.50        NumArtsJobs2000
2000    08002                   11404.00        NumInfoTechJobs2000
2000    08002                     672.00        NumManuJobs2000
2000    08003                      42.00        NumArtsJobs2000
2000    08003                    3342.00        NumInfoTechJobs2000
2000    08003                    1279.00        NumManuJobs2000
2000    08004                        .00        NumArtsJobs2000
2000    08004                     183.00        NumInfoTechJobs2000
2000    08004                      66.00        NumManuJobs2000
END DATA.

LIST.
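To see why the /INDEX version is more reliable, here is a sketch of the same run on hypothetical test data in which GeoUnit 08003 has no NumArtsJobs2000 row. With /INDEX = VariableName, that cell should simply come out system-missing. The positional version (no /INDEX, then RENAME VARIABLES) would instead put 3342.00 and 1279.00 into the first two columns for 08003 and label them NumArtsJobs2000 and NumInfoTechJobs2000.

```spss
* Hypothetical test data: the NumArtsJobs2000 row for 08003 is missing.
DATA LIST FREE / Year(F8.0) GeoUnit(A5) DataValue(F8.2)
VariableName(A19).
BEGIN DATA
2000 08002 7.50 NumArtsJobs2000
2000 08002 11404.00 NumInfoTechJobs2000
2000 08002 672.00 NumManuJobs2000
2000 08003 3342.00 NumInfoTechJobs2000
2000 08003 1279.00 NumManuJobs2000
2000 08004 .00 NumArtsJobs2000
2000 08004 183.00 NumInfoTechJobs2000
2000 08004 66.00 NumManuJobs2000
END DATA.

SORT CASES BY GeoUnit VariableName.
CASESTOVARS
  /ID      = GeoUnit
  /INDEX   = VariableName
  /GROUPBY = VARIABLE
  /DROP    = Year.
* NumArtsJobs2000 should be system-missing for 08003.
LIST.
```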