how to restructure data file

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

how to restructure data file

msherman

Dear List:  I have the following data set arranged something like this.

 

ID Session a1 a2 a3 b1 b2 b3 b4 b5 c1 c2 c3 c4 c5 c6.

01     4          1    3   2    4    4    1    5   2   1   1    4   2    2   4

01     8          3    4   2    4    4    3    3   4   3   4    4    4   2   3

02     3          1    1    2    1   2    1    1   2   3   1    1    2   1   1

03     5          2   2     2    1    2    1   1   3   2   1    2    2   2   1

03   25          1   2    3    4    3    2    1   2   3   4    2   3    1   1

04     2          3   3    3    3    3    4    4    4   4   2    2    3   3   3

04     6          4  4    4    4     4    3    4    4   3   3    3    4   4   4

04    10        4   4   4    4     4     4    4    4    4   4   4    4   4   4

Etc.

I want each subject to have one row of data values and I want variables for each session regardless of whether or not the subject had data for that session.  Thus, I want session 1 with my variables, then session 2 with my variables, session 3 with my variables.  Now it may be that I have no data at all for session 3 but I still want to have the variable Session 3 with all of my variables even though they may be missing.

Thus, what I want is the following.

ID   Session_1  a1_1  a2_1  a3_1  b1_1  b2_1  b3_1  b4_1  b5_1  c1_1  c2_1  c3_1  c4_1  c5_1  c6_1  Session_2  a1_2  a2_2  a3_2  b1_2  b2_2  b3_2  b4_2  b5_2  c1_2 c2_2  c3_2  c4_2 c5_2  c6_2  etc session_25 a1_25  a2_25 a3_25 b1_25 b2_25 b3_25 b4_25 b5_25 c1_25 c2_25 c3_25 c4_25 c5_25 c6_25

 

I have tried using Restructure but have been unable to produce the data file.   Does anyone have any suggestions.  Thanks,  martin sherman

Reply | Threaded
Open this post in threaded view
|

Re: how to restructure data file

Albert-Jan Roskam
Hi Martin,
 
Did you already try using CASESTOVARS?
 
data list free / ID Session a1 a2 a3 b1 b2 b3 b4 b5 c1 c2 c3 c4 c5 c6.
begin data
01     4          1    3   2    4    4    1    5   2   1   1    4   2    2   4
01     8          3    4   2    4    4    3    3   4   3   4    4    4   2   3
02     3          1    1    2    1   2    1    1   2   3   1    1    2   1   1
03     5          2   2     2    1    2    1   1   3   2   1    2    2   2   1
03   25          1   2    3    4    3    2    1   2   3   4    2   3    1   1
04     2          3   3    3    3    3    4    4    4   4   2    2    3   3   3
04     6          4  4    4    4     4    3    4    4   3   3    3    4   4   4
04    10        4   4   4    4     4     4    4    4    4   4   4    4   4   4
end data.
sort cases by id session .
casestovars / id = id.

Cheers!!
Albert-Jan

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

--- On Wed, 7/7/10, Martin Sherman <[hidden email]> wrote:

From: Martin Sherman <[hidden email]>
Subject: [SPSSX-L] how to restructure data file
To: [hidden email]
Date: Wednesday, July 7, 2010, 3:45 PM

Dear List:  I have the following data set arranged something like this.

 

ID Session a1 a2 a3 b1 b2 b3 b4 b5 c1 c2 c3 c4 c5 c6.

01     4          1    3   2    4    4    1    5   2   1   1    4   2    2   4

01     8          3    4   2    4    4    3    3   4   3   4    4    4   2   3

02     3          1    1    2    1   2    1    1   2   3   1    1    2   1   1

03     5          2   2     2    1    2    1   1   3   2   1    2    2   2   1

03   25          1   2    3    4    3    2    1   2   3   4    2   3    1   1

04     2          3   3    3    3    3    4    4    4   4   2    2    3   3   3

04     6          4  4    4    4     4    3    4    4   3   3    3    4   4   4

04    10        4   4   4    4     4     4    4    4    4   4   4    4   4   4

Etc.

I want each subject to have one row of data values and I want variables for each session regardless of whether or not the subject had data for that session.  Thus, I want session 1 with my variables, then session 2 with my variables, session 3 with my variables.  Now it may be that I have no data at all for session 3 but I still want to have the variable Session 3 with all of my variables even though they may be missing.

Thus, what I want is the following.

ID   Session_1  a1_1  a2_1  a3_1  b1_1  b2_1  b3_1  b4_1  b5_1  c1_1  c2_1  c3_1  c4_1  c5_1  c6_1  Session_2  a1_2  a2_2  a3_2  b1_2  b2_2  b3_2  b4_2  b5_2  c1_2 c2_2  c3_2  c4_2 c5_2  c6_2  etc session_25 a1_25  a2_25 a3_25 b1_25 b2_25 b3_25 b4_25 b5_25 c1_25 c2_25 c3_25 c4_25 c5_25 c6_25

 

I have tried using Restructure but have been unable to produce the data file.   Does anyone have any suggestions.  Thanks,  martin sherman


Reply | Threaded
Open this post in threaded view
|

Re: how to restructure data file

Maguin, Eugene
In reply to this post by msherman
Martin,

I'm assuming you did this already, Yes?

casestovars id=id/index=session.

In your example data session ranges between 2 and 25. Not all possible
session values in that range are present in the data. Do you want all
possible session values represented in the resulting dataset or only the
ones actually present in the data?

If the latter, you're done; if the former, you'll need to add variables
using the numeric command.

Gene Maguin


>>>Dear List:  I have the following data set arranged something like this.

ID Session a1 a2 a3 b1 b2 b3 b4 b5 c1 c2 c3 c4 c5 c6.
01     4    1  3  2  4  4  1  5  2  1  1  4  2  2  4
01     8    3  4  2  4  4  3  3  4  3  4  4  4  2  3
02     3    1  1  2  1  2  1  1  2  3  1  1  2  1  1
03     5    2  2  2  1  2  1  1  3  2  1  2  2  2  1
03    25    1  2  3  4  3  2  1  2  3  4  2  3  1  1
04     2    3  3  3  3  3  4  4  4  4  2  2  3  3  3
04     6    4  4  4  4  4  3  4  4  3  3  3  4  4  4
04    10    4  4  4  4  4  4  4  4  4  4  4  4  4  4
Etc.

I want each subject to have one row of data values and I want variables for
each session regardless of whether or not the subject had data for that
session.  Thus, I want session 1 with my variables, then session 2 with my
variables, session 3 with my variables.  Now it may be that I have no data
at all for session 3 but I still want to have the variable Session 3 with
all of my variables even though they may be missing.

Thus, what I want is the following.

ID   Session_1  a1_1  a2_1  a3_1  b1_1  b2_1  b3_1  b4_1  b5_1  c1_1  c2_1
c3_1  c4_1  c5_1  c6_1  Session_2  a1_2  a2_2  a3_2  b1_2  b2_2  b3_2  b4_2
b5_2  c1_2 c2_2  c3_2  c4_2 c5_2  c6_2  etc session_25 a1_25  a2_25 a3_25
b1_25 b2_25 b3_25 b4_25 b5_25 c1_25 c2_25 c3_25 c4_25 c5_25 c6_25



I have tried using Restructure but have been unable to produce the data
file.   Does anyone have any suggestions.  Thanks,  martin sherman

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: how to restructure data file

Bruce Weaver
Administrator
Gene & Albert-Jan, I don't think your solutions are creating all of the variables that Martin wants.  I think you need to do something like the following--although there are no doubt more elegant ways to do it!

new file.
dataset close all.

* --- Read in the sample data --- .
data list list / ID(a2) Session(f2.0) a1 a2 a3 b1 b2 b3 b4 b5 c1 c2 c3 c4 c5 c6 (14f2.0).
begin data
01     4          1    3   2    4    4    1    5   2   1   1    4   2    2   4
01     8          3    4   2    4    4    3    3   4   3   4    4    4   2   3
02     3          1    1    2    1   2    1    1   2   3   1    1    2   1   1
03     5          2   2     2    1    2    1   1   3   2   1    2    2   2   1
03   25          1   2    3    4    3    2    1   2   3   4    2   3    1   1
04     2          3   3    3    3    3    4    4    4   4   2    2    3   3   3
04     6          4  4    4    4     4    3    4    4   3   3    3    4   4   4
04    10        4   4   4    4     4     4    4    4    4   4   4    4   4   4
end data.

dataset name original.
sort cases by id session .
casestovars / id = id /index=session /sep="_".

data list list / ID(a2).
begin data
00
end data.
dataset name allvars.

* Define macro to create the variables Martin wants .
define !newvars ()
!do !i = 1 !to 25
numeric
 !concat("Session_",!i)
 !concat("a1_",!i)
 !concat("a2_",!i)
 !concat("a3_",!i)
 !concat("b1_",!i)
 !concat("b2_",!i)
 !concat("b3_",!i)
 !concat("b4_",!i)
 !concat("b5_",!i)
 !concat("c1_",!i)
 !concat("c2_",!i)
 !concat("c3_",!i)
 !concat("c4_",!i)
 !concat("c5_",!i)
 !concat("c6_",!i) (f2.0).
!doend
execute.
!enddefine.

!newvars.

* Now stack the two files, with ALLVARS first to control order of variables.

ADD FILES
 file = * /
 file = 'original'.
execute.

dataset close original.
select if (ID NE "00").
exe.




Gene Maguin wrote
Martin,

I'm assuming you did this already, Yes?

casestovars id=id/index=session.

In your example data session ranges between 2 and 25. Not all possible
session values in that range are present in the data. Do you want all
possible session values represented in the resulting dataset or only the
ones actually present in the data?

If the latter, you're done; if the former, you'll need to add variables
using the numeric command.

Gene Maguin


>>>Dear List:  I have the following data set arranged something like this.

ID Session a1 a2 a3 b1 b2 b3 b4 b5 c1 c2 c3 c4 c5 c6.
01     4    1  3  2  4  4  1  5  2  1  1  4  2  2  4
01     8    3  4  2  4  4  3  3  4  3  4  4  4  2  3
02     3    1  1  2  1  2  1  1  2  3  1  1  2  1  1
03     5    2  2  2  1  2  1  1  3  2  1  2  2  2  1
03    25    1  2  3  4  3  2  1  2  3  4  2  3  1  1
04     2    3  3  3  3  3  4  4  4  4  2  2  3  3  3
04     6    4  4  4  4  4  3  4  4  3  3  3  4  4  4
04    10    4  4  4  4  4  4  4  4  4  4  4  4  4  4
Etc.

I want each subject to have one row of data values and I want variables for
each session regardless of whether or not the subject had data for that
session.  Thus, I want session 1 with my variables, then session 2 with my
variables, session 3 with my variables.  Now it may be that I have no data
at all for session 3 but I still want to have the variable Session 3 with
all of my variables even though they may be missing.

Thus, what I want is the following.

ID   Session_1  a1_1  a2_1  a3_1  b1_1  b2_1  b3_1  b4_1  b5_1  c1_1  c2_1
c3_1  c4_1  c5_1  c6_1  Session_2  a1_2  a2_2  a3_2  b1_2  b2_2  b3_2  b4_2
b5_2  c1_2 c2_2  c3_2  c4_2 c5_2  c6_2  etc session_25 a1_25  a2_25 a3_25
b1_25 b2_25 b3_25 b4_25 b5_25 c1_25 c2_25 c3_25 c4_25 c5_25 c6_25



I have tried using Restructure but have been unable to produce the data
file.   Does anyone have any suggestions.  Thanks,  martin sherman

=====================
To manage your subscription to SPSSX-L, send a message to
LISTSERV@LISTSERV.UGA.EDU (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
Reply | Threaded
Open this post in threaded view
|

Re: how to restructure data file

Richard Ristow
At 07:14 PM 7/7/2010, Bruce Weaver wrote:

Gene & Albert-Jan, I don't think your solutions are creating all of the variables that Martin wants.  I think you need to do something like the following--although there are no doubt more elegant ways to do it!

Bruce notes that the problem is to create variables for sessions that are provided for in the study, but for which no subject in the file has any data. CASESTOVARS can roll up the file and create variables for all sessions that are there, but not for all that could be there.

Bruce's solution is, using a macro, to create a dummy record (for 'subject' 00) containing all the desired variables, and catenate this ahead of the file actually created.

An alternative (not tested) is to create and catenate the dummy subject before the CASESTOVARS, using LOOP and XSAVE. It's similar logic, but it takes less code. NOT TESTED:

*  This requires that the original data be in dataset DATA, .
*  and a scratch file (not dataset) named DUMMY be available.


*  Initial dummy record, as Bruce had it.
data list list / ID(a2).
begin data
00
end data.

*  XSAVE loop to create 'records' for sessions 1 to 25 .
*  (These records don't need variables a1 to c6; those .
*  will be added when the files are catenated.)        .

LOOP Session = 1 TO 25.
.  XSAVE OUTFILE=Dummy.
END LOOP.
EXECUTE /* This one is needed */.

*  Catenate dummy and real data:                        .
ADD FILES
  /FILE=Dummy
  /FILE=DATA.

*  Now CASESTOVARS should give what is wanted:          .
* (I've copied Gene's CASESTOVARS and Bruce's SELECT IF).

casestovars id=id/index=session.
select if (ID NE "00").

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: how to restructure data file

Bruce Weaver
Administrator
Hi Richard.  I couldn't get it to work with an XSAVE loop--I was getting a data set with only one row, and Session = 26.  But your post prompted me to try an INPUT PROGRAM, which did work.  Here's the whole thing again.  Notice the /GROUPBY INDEX in the CasesToVars command.


new file.
dataset close all.

input program.
loop Session = 1 to 25.
end case.
end loop.
end file.
end input program.
exe.

dataset name dummy window = front.
string ID (a2).
compute ID = '00' .
exe.

* --- Read in the sample data --- .
data list list / ID(a2) Session(f2.0) a1 a2 a3 b1 b2 b3 b4 b5 c1 c2 c3 c4 c5 c6 (14f2.0).
begin data
01     4          1    3   2    4    4    1    5   2   1   1    4   2    2   4
01     8          3    4   2    4    4    3    3   4   3   4    4    4   2   3
02     3          1    1    2    1   2    1    1   2   3   1    1    2   1   1
03     5          2   2     2    1    2    1   1   3   2   1    2    2   2   1
03   25          1   2    3    4    3    2    1   2   3   4    2   3    1   1
04     2          3   3    3    3    3    4    4    4   4   2    2    3   3   3
04     6          4  4    4    4     4    3    4    4   3   3    3    4   4   4
04    10        4   4   4    4     4     4    4    4    4   4   4    4   4   4
end data.

dataset name original.

* Stack the two datasets .

ADD FILES
  /FILE= 'dummy'
  /FILE= 'original'.
EXE.

dataset name merged.
dataset close all.

*  Now CASESTOVARS should give what is wanted.
* But note that one must specify GROUPBY index to
* get the variables in the desired order.

sort cases by id session .
casestovars / id = id /index=session /sep="_" /groupby = index.
select if (ID NE "00").
exe.


Cheers,
Bruce

Richard Ristow wrote

At 07:14 PM 7/7/2010, Bruce Weaver wrote:
Gene & Albert-Jan, I don't
think your solutions are creating all of the variables that Martin
wants.  I think you need to do something like the
following--although there are no doubt more elegant ways to do
it!
Bruce notes that the problem is to create variables for sessions that are
provided for in the study, but for which no subject in the file has any
data. CASESTOVARS can roll up the file and create variables for all
sessions that are there, but not for all that could be
there.
Bruce's solution is, using a macro, to create a dummy record (for
'subject' 00) containing all the desired variables, and catenate this
ahead of the file actually created.
An alternative (not tested) is to create and catenate the dummy subject
before the CASESTOVARS, using
LOOP and XSAVE.
It's similar logic, but it takes less code. NOT TESTED:
*  This requires that the original data be in
dataset DATA, .
*  and a scratch file (not dataset) named DUMMY be
available.

*  Initial dummy record, as Bruce had it.
data list list / ID(a2).
begin data
00
end data.
*  XSAVE loop to create 'records' for sessions 1 to 25 .
*  (These records don't need variables a1 to c6; those .
*  will be added when the files are
catenated.)        .
LOOP Session = 1 TO 25.
.  XSAVE OUTFILE=Dummy.
END LOOP.
EXECUTE /* This one is needed */.
*  Catenate dummy and real
data:                       
.
ADD FILES
  /FILE=Dummy
  /FILE=DATA.
*  Now CASESTOVARS should give what is
wanted:          .
* (I've copied Gene's CASESTOVARS and Bruce's SELECT IF).
casestovars id=id/index=session.
select if (ID NE "00").



=====================
To manage your subscription to SPSSX-L, send a message to
LISTSERV@LISTSERV.UGA.EDU (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
Reply | Threaded
Open this post in threaded view
|

Re: how to restructure data file

Richard Ristow
At 03:34 PM 7/12/2010, Bruce Weaver wrote:

>I couldn't get it to work with an XSAVE loop--I was getting a data
>set with only one row, and Session = 26.  But your post prompted me
>to try an INPUT PROGRAM, which did work.

The XSAVE loop doesn't put its result in the active dataset; it
writes to a scratch disk file (it must be a disk file, not a
dataset), which can then be loaded.

But for the present purpose, your INPUT PROGRAM is better.

>Notice the /GROUPBY INDEX in the CasesToVars command.

Thanks! I think we'd all missed that one.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD