SPSSX Discussion

how to restructure data file

Classic

List

Threaded

7 messages Options

msherman

how to restructure data file

Dear List: I have the following data set arranged something like this.

ID Session a1 a2 a3 b1 b2 b3 b4 b5 c1 c2 c3 c4 c5 c6.

01 4 1 3 2 4 4 1 5 2 1 1 4 2 2 4

01 8 3 4 2 4 4 3 3 4 3 4 4 4 2 3

02 3 1 1 2 1 2 1 1 2 3 1 1 2 1 1

03 5 2 2 2 1 2 1 1 3 2 1 2 2 2 1

03 25 1 2 3 4 3 2 1 2 3 4 2 3 1 1

04 2 3 3 3 3 3 4 4 4 4 2 2 3 3 3

04 6 4 4 4 4 4 3 4 4 3 3 3 4 4 4

04 10 4 4 4 4 4 4 4 4 4 4 4 4 4 4

Etc.

I want each subject to have one row of data values and I want variables for each session regardless of whether or not the subject had data for that session. Thus, I want session 1 with my variables, then session 2 with my variables, session 3 with my variables. Now it may be that I have no data at all for session 3 but I still want to have the variable Session 3 with all of my variables even though they may be missing.

Thus, what I want is the following.

ID Session_1 a1_1 a2_1 a3_1 b1_1 b2_1 b3_1 b4_1 b5_1 c1_1 c2_1 c3_1 c4_1 c5_1 c6_1 Session_2 a1_2 a2_2 a3_2 b1_2 b2_2 b3_2 b4_2 b5_2 c1_2 c2_2 c3_2 c4_2 c5_2 c6_2 etc session_25 a1_25 a2_25 a3_25 b1_25 b2_25 b3_25 b4_25 b5_25 c1_25 c2_25 c3_25 c4_25 c5_25 c6_25

I have tried using Restructure but have been unable to produce the data file. Does anyone have any suggestions. Thanks, martin sherman

Albert-Jan Roskam

Re: how to restructure data file

Hi Martin,

Did you already try using CASESTOVARS?

data list free / ID Session a1 a2 a3 b1 b2 b3 b4 b5 c1 c2 c3 c4 c5 c6.
begin data
01     4          1    3   2    4    4    1    5   2   1   1    4   2    2   4
01     8          3    4   2    4    4    3    3   4   3   4    4    4   2   3
02     3          1    1    2    1   2    1    1   2   3   1    1    2   1   1
03     5          2   2     2    1    2    1   1   3   2   1    2    2   2   1
03   25          1   2    3    4    3    2    1   2   3   4    2   3    1   1
04     2          3   3    3    3    3    4    4    4   4   2    2    3   3   3
04     6          4 4    4    4     4    3    4    4   3   3    3    4   4   4
04    10        4   4   4    4     4     4    4    4    4   4   4    4   4   4
end data.

sort cases by id session .
casestovars / id = id.

Cheers!!
Albert-Jan

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

--- On Wed, 7/7/10, Martin Sherman <[hidden email]> wrote:

From: Martin Sherman <[hidden email]>
Subject: [SPSSX-L] how to restructure data file
To: [hidden email]
Date: Wednesday, July 7, 2010, 3:45 PM

Dear List: I have the following data set arranged something like this.

ID Session a1 a2 a3 b1 b2 b3 b4 b5 c1 c2 c3 c4 c5 c6.
01     4          1    3   2    4    4    1    5   2   1   1    4   2    2 4
01     8          3    4   2    4    4    3    3   4   3   4    4    4   2   3
02     3          1    1   2    1   2    1    1   2   3   1    1    2   1   1
03     5          2   2     2    1    2    1   1   3   2   1    2    2   2   1
03   25         1   2    3    4    3    2    1   2   3   4    2   3    1   1
04     2          3   3    3    3    3    4    4    4   4   2    2    3   3   3
04     6          4 4    4    4     4    3    4    4   3   3    3    4   4   4
04    10        4   4   4    4    4     4    4    4    4   4   4    4   4   4
Etc.
I want each subject to have one row of data values and I want variables for each session regardless of whether or not the subject had data for that session. Thus, I want session 1 with my variables, then session 2 with my variables, session 3 with my variables. Now it may be that I have no data at all for session 3 but I still want to have the variable Session 3 with all of my variables even though they may be missing.
Thus, what I want is the following.
ID   Session_1 a1_1 a2_1 a3_1 b1_1 b2_1 b3_1 b4_1 b5_1 c1_1 c2_1 c3_1 c4_1 c5_1 c6_1 Session_2 a1_2 a2_2 a3_2 b1_2 b2_2 b3_2 b4_2 b5_2 c1_2 c2_2 c3_2 c4_2 c5_2 c6_2 etc session_25 a1_25 a2_25 a3_25 b1_25 b2_25 b3_25 b4_25 b5_25 c1_25 c2_25 c3_25 c4_25 c5_25 c6_25

I have tried using Restructure but have been unable to produce the data file.   Does anyone have any suggestions. Thanks, martin sherman

Maguin, Eugene

Re: how to restructure data file

In reply to this post by msherman

Martin,

I'm assuming you did this already, Yes?

casestovars id=id/index=session.

In your example data session ranges between 2 and 25. Not all possible
session values in that range are present in the data. Do you want all
possible session values represented in the resulting dataset or only the
ones actually present in the data?

If the latter, you're done; if the former, you'll need to add variables
using the numeric command.

Gene Maguin

>>>Dear List: I have the following data set arranged something like this.

ID Session a1 a2 a3 b1 b2 b3 b4 b5 c1 c2 c3 c4 c5 c6.
01 4 1 3 2 4 4 1 5 2 1 1 4 2 2 4
01 8 3 4 2 4 4 3 3 4 3 4 4 4 2 3
02 3 1 1 2 1 2 1 1 2 3 1 1 2 1 1
03 5 2 2 2 1 2 1 1 3 2 1 2 2 2 1
03 25 1 2 3 4 3 2 1 2 3 4 2 3 1 1
04 2 3 3 3 3 3 4 4 4 4 2 2 3 3 3
04 6 4 4 4 4 4 3 4 4 3 3 3 4 4 4
04 10 4 4 4 4 4 4 4 4 4 4 4 4 4 4
Etc.

I want each subject to have one row of data values and I want variables for
each session regardless of whether or not the subject had data for that
session. Thus, I want session 1 with my variables, then session 2 with my
variables, session 3 with my variables. Now it may be that I have no data
at all for session 3 but I still want to have the variable Session 3 with
all of my variables even though they may be missing.

Thus, what I want is the following.

ID Session_1 a1_1 a2_1 a3_1 b1_1 b2_1 b3_1 b4_1 b5_1 c1_1 c2_1
c3_1 c4_1 c5_1 c6_1 Session_2 a1_2 a2_2 a3_2 b1_2 b2_2 b3_2 b4_2
b5_2 c1_2 c2_2 c3_2 c4_2 c5_2 c6_2 etc session_25 a1_25 a2_25 a3_25
b1_25 b2_25 b3_25 b4_25 b5_25 c1_25 c2_25 c3_25 c4_25 c5_25 c6_25

I have tried using Restructure but have been unable to produce the data
file. Does anyone have any suggestions. Thanks, martin sherman

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Bruce Weaver

Re: how to restructure data file

Administrator

Gene & Albert-Jan, I don't think your solutions are creating all of the variables that Martin wants. I think you need to do something like the following--although there are no doubt more elegant ways to do it!

new file.
dataset close all.

* --- Read in the sample data --- .
data list list / ID(a2) Session(f2.0) a1 a2 a3 b1 b2 b3 b4 b5 c1 c2 c3 c4 c5 c6 (14f2.0).
begin data
01 4 1 3 2 4 4 1 5 2 1 1 4 2 2 4
01 8 3 4 2 4 4 3 3 4 3 4 4 4 2 3
02 3 1 1 2 1 2 1 1 2 3 1 1 2 1 1
03 5 2 2 2 1 2 1 1 3 2 1 2 2 2 1
03 25 1 2 3 4 3 2 1 2 3 4 2 3 1 1
04 2 3 3 3 3 3 4 4 4 4 2 2 3 3 3
04 6 4 4 4 4 4 3 4 4 3 3 3 4 4 4
04 10 4 4 4 4 4 4 4 4 4 4 4 4 4 4
end data.

dataset name original.
sort cases by id session .
casestovars / id = id /index=session /sep="_".

data list list / ID(a2).
begin data
00
end data.
dataset name allvars.

* Define macro to create the variables Martin wants .
define !newvars ()
!do !i = 1 !to 25
numeric
!concat("Session_",!i)
!concat("a1_",!i)
!concat("a2_",!i)
!concat("a3_",!i)
!concat("b1_",!i)
!concat("b2_",!i)
!concat("b3_",!i)
!concat("b4_",!i)
!concat("b5_",!i)
!concat("c1_",!i)
!concat("c2_",!i)
!concat("c3_",!i)
!concat("c4_",!i)
!concat("c5_",!i)
!concat("c6_",!i) (f2.0).
!doend
execute.
!enddefine.

!newvars.

* Now stack the two files, with ALLVARS first to control order of variables.

ADD FILES
file = * /
file = 'original'.
execute.

dataset close original.
select if (ID NE "00").
exe.

Gene Maguin wrote

Martin,

I'm assuming you did this already, Yes?

casestovars id=id/index=session.

In your example data session ranges between 2 and 25. Not all possible
session values in that range are present in the data. Do you want all
possible session values represented in the resulting dataset or only the
ones actually present in the data?

If the latter, you're done; if the former, you'll need to add variables
using the numeric command.

Gene Maguin

>>>Dear List: I have the following data set arranged something like this.

ID Session a1 a2 a3 b1 b2 b3 b4 b5 c1 c2 c3 c4 c5 c6.
01 4 1 3 2 4 4 1 5 2 1 1 4 2 2 4
01 8 3 4 2 4 4 3 3 4 3 4 4 4 2 3
02 3 1 1 2 1 2 1 1 2 3 1 1 2 1 1
03 5 2 2 2 1 2 1 1 3 2 1 2 2 2 1
03 25 1 2 3 4 3 2 1 2 3 4 2 3 1 1
04 2 3 3 3 3 3 4 4 4 4 2 2 3 3 3
04 6 4 4 4 4 4 3 4 4 3 3 3 4 4 4
04 10 4 4 4 4 4 4 4 4 4 4 4 4 4 4
Etc.

I want each subject to have one row of data values and I want variables for
each session regardless of whether or not the subject had data for that
session. Thus, I want session 1 with my variables, then session 2 with my
variables, session 3 with my variables. Now it may be that I have no data
at all for session 3 but I still want to have the variable Session 3 with
all of my variables even though they may be missing.

Thus, what I want is the following.

ID Session_1 a1_1 a2_1 a3_1 b1_1 b2_1 b3_1 b4_1 b5_1 c1_1 c2_1
c3_1 c4_1 c5_1 c6_1 Session_2 a1_2 a2_2 a3_2 b1_2 b2_2 b3_2 b4_2
b5_2 c1_2 c2_2 c3_2 c4_2 c5_2 c6_2 etc session_25 a1_25 a2_25 a3_25
b1_25 b2_25 b3_25 b4_25 b5_25 c1_25 c2_25 c3_25 c4_25 c5_25 c6_25

I have tried using Restructure but have been unable to produce the data
file. Does anyone have any suggestions. Thanks, martin sherman

=====================
To manage your subscription to SPSSX-L, send a message to
LISTSERV@LISTSERV.UGA.EDU (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Richard Ristow

Re: how to restructure data file

At 07:14 PM 7/7/2010, Bruce Weaver wrote:

Gene & Albert-Jan, I don't think your solutions are creating all of the variables that Martin wants. I think you need to do something like the following--although there are no doubt more elegant ways to do it!

Bruce notes that the problem is to create variables for sessions that are provided for in the study, but for which no subject in the file has any data. CASESTOVARS can roll up the file and create variables for all sessions that are there, but not for all that could be there.

Bruce's solution is, using a macro, to create a dummy record (for 'subject' 00) containing all the desired variables, and catenate this ahead of the file actually created.

An alternative (not tested) is to create and catenate the dummy subject before the CASESTOVARS, using LOOP and XSAVE. It's similar logic, but it takes less code. NOT TESTED:

* This requires that the original data be in dataset DATA, . * and a scratch file (not dataset) named DUMMY be available. * Initial dummy record, as Bruce had it. data list list / ID(a2). begin data 00 end data. * XSAVE loop to create 'records' for sessions 1 to 25 . * (These records don't need variables a1 to c6; those . * will be added when the files are catenated.) . LOOP Session = 1 TO 25. . XSAVE OUTFILE=Dummy. END LOOP. EXECUTE /* This one is needed */. * Catenate dummy and real data: . ADD FILES /FILE=Dummy /FILE=DATA. * Now CASESTOVARS should give what is wanted: . * (I've copied Gene's CASESTOVARS and Bruce's SELECT IF). casestovars id=id/index=session. select if (ID NE "00"). ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Bruce Weaver

Re: how to restructure data file

Administrator

Hi Richard. I couldn't get it to work with an XSAVE loop--I was getting a data set with only one row, and Session = 26. But your post prompted me to try an INPUT PROGRAM, which did work. Here's the whole thing again. Notice the /GROUPBY INDEX in the CasesToVars command.

new file.
dataset close all.

input program.
loop Session = 1 to 25.
end case.
end loop.
end file.
end input program.
exe.

dataset name dummy window = front.
string ID (a2).
compute ID = '00' .
exe.

* --- Read in the sample data --- .
data list list / ID(a2) Session(f2.0) a1 a2 a3 b1 b2 b3 b4 b5 c1 c2 c3 c4 c5 c6 (14f2.0).
begin data
01 4 1 3 2 4 4 1 5 2 1 1 4 2 2 4
01 8 3 4 2 4 4 3 3 4 3 4 4 4 2 3
02 3 1 1 2 1 2 1 1 2 3 1 1 2 1 1
03 5 2 2 2 1 2 1 1 3 2 1 2 2 2 1
03 25 1 2 3 4 3 2 1 2 3 4 2 3 1 1
04 2 3 3 3 3 3 4 4 4 4 2 2 3 3 3
04 6 4 4 4 4 4 3 4 4 3 3 3 4 4 4
04 10 4 4 4 4 4 4 4 4 4 4 4 4 4 4
end data.

dataset name original.

* Stack the two datasets .

ADD FILES
/FILE= 'dummy'
/FILE= 'original'.
EXE.

dataset name merged.
dataset close all.

* Now CASESTOVARS should give what is wanted.
* But note that one must specify GROUPBY index to
* get the variables in the desired order.

sort cases by id session .
casestovars / id = id /index=session /sep="_" /groupby = index.
select if (ID NE "00").
exe.

Cheers,
Bruce

Richard Ristow wrote

At 07:14 PM 7/7/2010, Bruce Weaver wrote:
Gene & Albert-Jan, I don't
think your solutions are creating all of the variables that Martin
wants. I think you need to do something like the
following--although there are no doubt more elegant ways to do
it!
Bruce notes that the problem is to create variables for sessions that are
provided for in the study, but for which no subject in the file has any
data. CASESTOVARS can roll up the file and create variables for all
sessions that are there, but not for all that could be
there.
Bruce's solution is, using a macro, to create a dummy record (for
'subject' 00) containing all the desired variables, and catenate this
ahead of the file actually created.
An alternative (not tested) is to create and catenate the dummy subject
before the CASESTOVARS, using
LOOP and XSAVE.
It's similar logic, but it takes less code. NOT TESTED:
* This requires that the original data be in
dataset DATA, .
* and a scratch file (not dataset) named DUMMY be
available.

* Initial dummy record, as Bruce had it.
data list list / ID(a2).
begin data
00
end data.
* XSAVE loop to create 'records' for sessions 1 to 25 .
* (These records don't need variables a1 to c6; those .
* will be added when the files are
catenated.)        .
LOOP Session = 1 TO 25.
. XSAVE OUTFILE=Dummy.
END LOOP.
EXECUTE /* This one is needed */.
* Catenate dummy and real
data:
.
ADD FILES
/FILE=Dummy
/FILE=DATA.
* Now CASESTOVARS should give what is
wanted:          .
* (I've copied Gene's CASESTOVARS and Bruce's SELECT IF).
casestovars id=id/index=session.
select if (ID NE "00").

=====================
To manage your subscription to SPSSX-L, send a message to
LISTSERV@LISTSERV.UGA.EDU (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Richard Ristow

Re: how to restructure data file

At 03:34 PM 7/12/2010, Bruce Weaver wrote:

>I couldn't get it to work with an XSAVE loop--I was getting a data
>set with only one row, and Session = 26. But your post prompted me
>to try an INPUT PROGRAM, which did work.

The XSAVE loop doesn't put its result in the active dataset; it
writes to a scratch disk file (it must be a disk file, not a
dataset), which can then be loaded.

But for the present purpose, your INPUT PROGRAM is better.

>Notice the /GROUPBY INDEX in the CasesToVars command.

Thanks! I think we'd all missed that one.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD