SPSSX Discussion

Restructuring an SPSS data file

Vincent LOUIS-2

Restructuring an SPSS data file

Hi Everyone,
I am trying to write an SPSS syntax to restructure a file that
contains the following information:

Year GeoUnit DataValue VariableName

2000 08002 7.50 NumArtsJobs2000
2000 08002 11404.00 NumInfoTechJobs2000
2000 08002 672.00 NumManuJobs2000
2000 08003 42.00 NumArtsJobs2000
2000 08003 3342.00 NumInfoTechJobs2000
2000 08003 1279.00 NumManuJobs2000
2000 08004 .00 NumArtsJobs2000
2000 08004 183.00 NumInfoTechJobs2000
2000 08004 66.00 NumManuJobs2000

The file that I ultimately want should consist of the above information,
but in the following structure:

GeoUnit NumArtsJobs2000 NumInfoTechJobs2000 NumManuJobs2000

08002 7.50 11404.00 672.00
08003 42.00 3342.00 1279.00
08004 .00 183.00 66.00

I would appreciate any help you may be able to provide to assist me in
reconfiguring the data.

Thank you in advance,
Vincent

la volta statistics

AW: Restructuring an SPSS data file

Hi Vincent

Try the following syntax. Hope that helps.

Christian

DATA LIST FREE / Year(F8.0) GeoUnit(A5) DataValue(F8.2)
VariableName(A19).
BEGIN DATA
2000 08002 7.50 NumArtsJobs2000
2000 08002 11404.00 NumInfoTechJobs2000
2000 08002 672.00 NumManuJobs2000
2000 08003 42.00 NumArtsJobs2000
2000 08003 3342.00 NumInfoTechJobs2000
2000 08003 1279.00 NumManuJobs2000
2000 08004 .00 NumArtsJobs2000
2000 08004 183.00 NumInfoTechJobs2000
2000 08004 66.00 NumManuJobs2000
END DATA.

SORT CASES BY Year GeoUnit VariableName .
CASESTOVARS
/ID = GeoUnit
/DROP = VariableName Year
/SEPARATOR="".

RENAME VARIABLES (DataValue1 DataValue2 DataValue3 = NumArtsJobs2000
NumInfoTechJobs2000 NumManuJobs2000).

*******************************
la volta statistics
Christian Schmidhauser, Dr.phil.II
Weinbergstrasse 108
Ch-8006 Zürich
Tel: +41 (043) 233 98 01
Fax: +41 (043) 233 98 02
email: mailto:[hidden email]
internet: http://www.lavolta.ch/

Gary Oliver

Deleting calculated values

Colleagues

I would like to keep the variables and their details, e.g. format, but
remove the values for selected variables from my file. I now have
command syntax that will derive all the required values. The reason I
want a delete values command is that my syntax does not always overwrite
the existing values in cells. I cannot find a delete values command so
how do you suggest I go about this?

Warm regards/gary

Spousta Jan

Re: Deleting calculated values

Hi Gary,

If I understand it right you need something like

Recode /*here the list of numerical variables to be deleted*/ (else =
sysmis).
Execute.

For example

GET FILE='C:\Program Files\SPSS\Employee data.sav'.
Recode bdate jobtime to minority (else = sysmis).
Execute.

Best regards,

Jan

Richard Ristow

Re: Deleting calculated values

In reply to this post by Gary Oliver

At 03:43 AM 3/29/2007, Gary Oliver wrote:

>I would like to remove the values for selected variables from my file.

First, the method that Jan Spousta recommends,

Recode /* list of numerical variables to be deleted*/
(else = sysmis).
Execute.

is the neatest way I know of. (But, Jan? "Execute"?)

Second, though,

>The reason I want a delete values command is that my syntax [to derive
>all the required values] does not always overwrite the existing values.

For me, that raises a strong question whether you have logic problems.

First, an SPSS statement that assigns a new value to a variable, does
so regardless of the old value, whether that's valid or missing. That
leaves instances where you >think< SPSS is going to assign a new value,
but it doesn't. Look for,

- IF statements where the test fails. By design, this leaves the
previous value of the target variable unaltered

- RECODEing a variable into itself leaves the value unchanged, if no
RECODE condition is satisfied. (That's one reason to be careful of
RECODEing a variable into itself - or, to resolve never to do it at
all. You can, by the way, ensure that some condition is satisfied by
including an ELSE condition.)

- Assignment by (usually) COMPUTEs in a DO IF construct, if no clause
containing a COMPUTE is executed. A well-known, and insidious, instance
is if a test on the DO IF or an ELSE IF returns 'missing'. In this
case, the remainder of the construct, tests and all, is skipped >WITH
NO WARNING<.

If you expect values to be changed, and they aren't, check whether
problems like these are getting by you.
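
A minimal sketch of the first and third cases above, using made-up test
data (the variables x, y, y_if and y_doif are illustrative only, and 9 is
declared as a user-missing code purely for the demonstration):

```spss
DATA LIST FREE / x (F4.0) y (F4.0).
BEGIN DATA
1 10
2 20
9 30
END DATA.
MISSING VALUES x (9).

* IF with a false or missing test leaves the target unchanged.
COMPUTE y_if = y.
IF (x = 1) y_if = 99.

* DO IF with a missing test skips the whole construct, ELSE included.
DO IF (x = 1).
COMPUTE y_doif = 1.
ELSE.
COMPUTE y_doif = 0.
END IF.

LIST.
```

In the listing, y_if should be changed only in the first case, and y_doif
should stay system-missing in the third case, because neither the DO IF
nor the ELSE clause runs when the test on x returns missing.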

Second, it sounds like you're reading a file with variables that
already have values, and calculating new values for them. That's a
well-known recipe for confusion: there are versions of what looks like
the same file, with variables that may be computed according to one of
two or more logic rules.

Generally, maybe 99% of the time, don't do it. Often enough your raw
data changes, notably by adding more, and you need to rerun your
transformations. Do this by erasing the file with the transformed
variables, then reading the data file and generating the transformed
variables anew.
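
As a rough sketch of that workflow (the file paths and the variables
numer, denom and ratio are hypothetical placeholders, not taken from
Gary's data):

```spss
* Hypothetical file names, replace with your own paths.
GET FILE='C:\data\raw_data.sav'.
* Derive every calculated variable from the raw inputs.
COMPUTE ratio = numer / denom.
* Save under a different name so the raw file is never overwritten.
SAVE OUTFILE='C:\data\derived_data.sav'.
```

Because the derived file is always rebuilt from the raw one, no stale
values from an earlier run can survive in it.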

-Good luck to you,
Richard

Hal 9000

Re: Restructuring an SPSS data file

In reply to this post by la volta statistics

...just be SURE that each unique value for GeoUnit has all 3 values for
VariableName or you might wrongly assign values to columns. Generally, I find
that assuming my data is complete and accurate is a very bad idea!
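
One possible way to run that check before restructuring, sketched under
the assumption that every GeoUnit should have exactly 3 rows (nRows is a
made-up variable name, and MODE=ADDVARIABLES needs a reasonably recent
SPSS release):

```spss
* Add a per-GeoUnit row count to every case in the active file.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=GeoUnit
  /nRows = N.

* List only the units that do not have exactly 3 rows.
TEMPORARY.
SELECT IF (nRows NE 3).
LIST GeoUnit VariableName nRows.
```

This only counts rows; it would not catch a unit that has 3 rows but
repeats one VariableName, so a second count broken down by GeoUnit and
VariableName may also be worth running.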
-Gary

Richard Ristow

Re: Restructuring an SPSS data file

At 06:05 PM 3/29/2007, Hal 9000 wrote:

>Generally, I find that assuming my data is complete and accurate is a
>very bad idea!

Awwwww - YOU'RE no fun.

"Thou shalt not seek for bugs in thy programs, nor yet for errors in
thy data; for if thou dost, lo, thou shalt surely find them."

Richard Ristow

Re: AW: Restructuring an SPSS data file

In reply to this post by la volta statistics

I'd recommend a variation. At 03:11 AM 3/29/2007, la volta statistics
wrote:

>SORT CASES BY Year GeoUnit VariableName .
>CASESTOVARS
> /ID = GeoUnit
> /DROP = VariableName Year
> /SEPARATOR="".
>
>RENAME VARIABLES (DataValue1 DataValue2 DataValue3
> = NumArtsJobs2000
> NumInfoTechJobs2000
> NumManuJobs2000).

At 06:05 PM 3/29/2007, Hal 9000 warned, quite properly,

>...just be SURE that each unique value for GeoUnit has all 3 values
>for VariableName or you might wrongly assign values to columns.

But it is easier to write (no "RENAME VARIABLES" needed) and more reliable
to let CASESTOVARS take the variable names from the data. SPSS 15 draft
output <WRR-not saved separately>:

List
|-----------------------------|---------------------------|
|Output Created |29-MAR-2007 19:19:04 |
|-----------------------------|---------------------------|
Year GeoUnit DataValue VariableName

2000 08002 7.50 NumArtsJobs2000
2000 08002 11404.00 NumInfoTechJobs2000
2000 08002 672.00 NumManuJobs2000
2000 08003 42.00 NumArtsJobs2000
2000 08003 3342.00 NumInfoTechJobs2000
2000 08003 1279.00 NumManuJobs2000
2000 08004 .00 NumArtsJobs2000
2000 08004 183.00 NumInfoTechJobs2000
2000 08004 66.00 NumManuJobs2000

Number of cases read: 9 Number of cases listed: 9

SORT CASES BY GeoUnit VariableName .
CASESTOVARS
/ID = GeoUnit
/INDEX = VariableName
/GROUPBY = VARIABLE
/DROP = YEAR.

Cases to Variables
|----------------------------|---------------------------|
|Output Created |29-MAR-2007 19:19:04 |
|----------------------------|---------------------------|

Generated Variables
|---------|-------------------|-------------------|
|Original |VariableName       |Result Name        |
|Variable |                   |                   |
|---------|-------------------|-------------------|
|DataValue|NumArtsJobs2000    |NumArtsJobs2000    |
|         |NumInfoTechJobs2000|NumInfoTechJobs2000|
|         |NumManuJobs2000    |NumManuJobs2000    |
|---------|-------------------|-------------------|

Processing Statistics
|------------------|---|
|Cases In          |9  |
|Cases Out         |3  |
|------------------|---|
|Cases In/Cases Out|3.0|
|------------------|---|
|Variables In      |4  |
|Variables Out     |4  |
|------------------|---|
|Index Values      |3  |
|------------------|---|

LIST.

List
|-----------------------------|---------------------------|
|Output Created |29-MAR-2007 19:19:04 |
|-----------------------------|---------------------------|
GeoUnit NumArtsJobs2000 NumInfoTechJobs2000 NumManuJobs2000

08002 7.50 11404.00 672.00
08003 42.00 3342.00 1279.00
08004 .00 183.00 66.00

Number of cases read: 3 Number of cases listed: 3
===================
APPENDIX: Test data
===================
DATA LIST FREE / Year(F8.0) GeoUnit(A5) DataValue(F8.2)
VariableName(A19).
BEGIN DATA
2000 08002 7.50 NumArtsJobs2000
2000 08002 11404.00 NumInfoTechJobs2000
2000 08002 672.00 NumManuJobs2000
2000 08003 42.00 NumArtsJobs2000
2000 08003 3342.00 NumInfoTechJobs2000
2000 08003 1279.00 NumManuJobs2000
2000 08004 .00 NumArtsJobs2000
2000 08004 183.00 NumInfoTechJobs2000
2000 08004 66.00 NumManuJobs2000
END DATA.

LIST.