Fwd: complex restructure

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Fwd: complex restructure

alex mcveigh

Dear all, this is my first post to the forum. I searched for a similar query but couldn’t find. I am using SPSS v20 on windows 7 32-bit.

I'm a postgrad medical researcher with some raw blood pressure (SBP) data in long format which needs reshaping to allow me to generate new variables to mixed-effects model.  Extract of the data is here. the full dataset has  >30000 rows. 

ID

Treat_date

BP

PrePost_T

365

09/04/2009

171

1

365

09/04/2009

162

2

365

09/06/2009

171

1

365

09/06/2009

174

2

365

09/08/2009

163

1

365

09/08/2009

196

2

365

09/11/2009

154

1

365

9/13/2009

158

1

365

9/13/2009

180

2

350

10/20/2009

163

1

350

10/20/2009

143

2

350

10/22/2009

136

1

350

10/22/2009

140

2

350

10/25/2009

189

1

350

10/25/2009

185

2

355

02/01/2011

152

1

355

02/01/2011

133

2

355

02/02/2011

144

1

355

02/02/2011

144

2

355

02/03/2011

137

1

355

02/03/2011

142

2

355

02/06/2011

145

1

355

02/06/2011

139

2

355

02/08/2011

133

1

355

02/08/2011

169

2

 

 

Subjects (ID) had SBP measured both before and after a treatment (PrePost_T,  1=Pre-treatment, 2=Post-treatment) on several dates (Treat_date) usually 2-3 days apart. Data are unbalanced with varying number of observations per ID. My main interest is in the change in SBP on the same Treat_date. Ideally I need the Pre-Treatment BP in a column adjacent to the Post-treatment BP column.  I tried using a variety of setups in CASESTOVARS using a combination of ID, Treat_date and PrePost_T to make a unique INDEX.

 

SORT CASES BY ID Treat_date PrePost_T.

CASESTOVARS

   /INDEX=Treat_date PrePost_T

  /GROUPBY=VARIABLE.


This led to either an error message saying my observations were not unique or generating 1 row per ID with all observations as new variables in wide format. My questions are:

1.            I suspect there is an alternative method/syntax to separate my pre and post-treatment BPs in column PrePost_T for each ID but seems beyond my lateral thinking or SPSS knowledge. Can anyone help?

2.            Assuming CASESTOVARS is the wrong approach, I would still find it useful in future to create a unique index from long format data using a combination of ID and 2 variables. Is this possible?

 

My thanks in advance

Alex McVeigh


===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Fwd: complex restructure

Andy W
You may be confusing the INDEX subcommand with the ID subcommand for CASESTOVARS. Index just tells SPSS what to put as the suffix for the new variables. The ID subcommand (which can take more than one variable) tells SPSS which rows in the old dataset to collapse into one row in the new dataset.

Try this:

*************************************.
DATA LIST FREE / ID (F3.0) Treat_date (ADATE10) BP PrePost_T (2F3.0).
BEGIN DATA
365 09/04/2009 171 1
365 09/04/2009 162 2
365 09/06/2009 171 1
365 09/06/2009 174 2
365 09/08/2009 163 1
365 09/08/2009 196 2
365 09/11/2009 154 1
365 9/13/2009 158 1
365 9/13/2009 180 2
350 10/20/2009 163 1
350 10/20/2009 143 2
350 10/22/2009 136 1
350 10/22/2009 140 2
350 10/25/2009 189 1
350 10/25/2009 185 2
355 02/01/2011 152 1
355 02/01/2011 133 2
355 02/02/2011 144 1
355 02/02/2011 144 2
355 02/03/2011 137 1
355 02/03/2011 142 2
355 02/06/2011 145 1
355 02/06/2011 139 2
355 02/08/2011 133 1
355 02/08/2011 169 2
END DATA.

SORT CASES BY ID Treat_Date PrePost_T.
CASESTOVARS /ID = ID Treat_Date /INDEX = PrePost_T.
*************************************.
Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/
Reply | Threaded
Open this post in threaded view
|

Re: Fwd: complex restructure

alex mcveigh
Thanks Andy for the explanation and syntax. I obviously didn't understand how CASESTOVARS worked.
Using your syntax in my full raw data the restructure is partly running but stopped by an error. i have reproduced the error below adding the variable casenum for illustrative purposes (which i don't have in the real data):

DATA LIST FREE / casenum (F1.0) ID (F3.0) Treat_date (ADATE10) BP PrePost_T (2F3.0).
BEGIN DATA
1 350 10/20/2009 163 1
2 350 10/20/2009 143 2
3 350 10/22/2009 136 1
4 350 10/22/2009 140 2
5 350 10/25/2009 189 1
6 350 10/25/2009 185 2
7 355 02/01/2011 152 1
8 355 02/01/2011 133 2
9 355 02/02/2011 144 1
10 355 02/02/2011 144 2
11 355 02/03/2011 137 1
12 355 02/03/2011 142 2
13 355 02/06/2011 145 1
14 355 02/06/2011 139 2
15 355 02/08/2011 133 1
16 355 02/08/2011 136 1
17 355 02/08/2011 169 2
18 365 09/04/2009 171 1
19 365 09/04/2009 162 2
20 365 09/06/2009 171 1
21 365 09/06/2009 174 2
22 365 09/06/2009 176 2
23 365 09/08/2009 163 1
24 365 09/08/2009 196 2
25 365 09/13/2009 158 1
26 365 09/13/2009 180 2
END DATA.

SORT CASES BY ID Treat_Date PrePost_T.
CASESTOVARS /ID = ID Treat_Date /INDEX = PrePost_T.

The INDEX values for case 16 have occurred before in the cases with the same ID values.
Execution of this command stops.

if i delete/filter case 16 it occurs for further cases (22 in this simulated example). This is obviously due to data entry errors in the raw data with 2 pre or post-treatment BPs when there should only be one. to save time manually filtering the large dataset and (improve reproducibility of syntax for similar data) i wonder how i can filter cases to accept only the first value of BP for PrePost_T 1 or 2? in other words how to filter out casenum 16 and 22 in the example data?

Reply | Threaded
Open this post in threaded view
|

Re: Fwd: complex restructure

David Marso
Administrator
Look up the LAG function! or use MATCH with FIRST function. Your call!

alex mcveigh wrote
Thanks Andy for the explanation and syntax. I obviously didn't understand how CASESTOVARS worked.
Using your syntax in my full raw data the restructure is partly running but stopped by an error. i have reproduced the error below adding the variable casenum for illustrative purposes (which i don't have in the real data):

DATA LIST FREE / casenum (F1.0) ID (F3.0) Treat_date (ADATE10) BP PrePost_T (2F3.0).
BEGIN DATA
1 350 10/20/2009 163 1
2 350 10/20/2009 143 2
3 350 10/22/2009 136 1
4 350 10/22/2009 140 2
5 350 10/25/2009 189 1
6 350 10/25/2009 185 2
7 355 02/01/2011 152 1
8 355 02/01/2011 133 2
9 355 02/02/2011 144 1
10 355 02/02/2011 144 2
11 355 02/03/2011 137 1
12 355 02/03/2011 142 2
13 355 02/06/2011 145 1
14 355 02/06/2011 139 2
15 355 02/08/2011 133 1
16 355 02/08/2011 136 1
17 355 02/08/2011 169 2
18 365 09/04/2009 171 1
19 365 09/04/2009 162 2
20 365 09/06/2009 171 1
21 365 09/06/2009 174 2
22 365 09/06/2009 176 2
23 365 09/08/2009 163 1
24 365 09/08/2009 196 2
25 365 09/13/2009 158 1
26 365 09/13/2009 180 2
END DATA.

SORT CASES BY ID Treat_Date PrePost_T.
CASESTOVARS /ID = ID Treat_Date /INDEX = PrePost_T.

The INDEX values for case 16 have occurred before in the cases with the same ID values.
Execution of this command stops.

if i delete/filter case 16 it occurs for further cases (22 in this simulated example). This is obviously due to data entry errors in the raw data with 2 pre or post-treatment BPs when there should only be one. to save time manually filtering the large dataset and (improve reproducibility of syntax for similar data) i wonder how i can filter cases to accept only the first value of BP for PrePost_T 1 or 2? in other words how to filter out casenum 16 and 22 in the example data?
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

Re: Fwd: complex restructure

Jon K Peck
In reply to this post by alex mcveigh
See Data > Identify Duplicate Cases


Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621




From:        alex mcveigh <[hidden email]>
To:        [hidden email]
Date:        08/13/2014 06:44 AM
Subject:        Re: [SPSSX-L] Fwd: complex restructure
Sent by:        "SPSSX(r) Discussion" <[hidden email]>




Thanks Andy for the explanation and syntax. I obviously didn't understand how
CASESTOVARS worked.
Using your syntax in my full raw data the restructure is partly running but
stopped by an error. i have reproduced the error below adding the variable
casenum for illustrative purposes (which i don't have in the real data):

DATA LIST FREE / casenum (F1.0) ID                 (F3.0) Treat_date (ADATE10) BP PrePost_T
(2F3.0).
BEGIN DATA
1                 350                 10/20/2009                 163                 1
2                 350                 10/20/2009                 143                 2
3                 350                 10/22/2009                 136                 1
4                 350                 10/22/2009                 140                 2
5                 350                 10/25/2009                 189                 1
6                 350                 10/25/2009                 185                 2
7                 355                 02/01/2011                 152                 1
8                 355                 02/01/2011                 133                 2
9                 355                 02/02/2011                 144                 1
10                 355                 02/02/2011                 144                 2
11                 355                 02/03/2011                 137                 1
12                 355                 02/03/2011                 142                 2
13                 355                 02/06/2011                 145                 1
14                 355                 02/06/2011                 139                 2
15                 355                 02/08/2011                 133                 1
16                 355                 02/08/2011                 136                 1
17                 355                 02/08/2011                 169                 2
18                 365                 09/04/2009                 171                 1
19                 365                 09/04/2009                 162                 2
20                 365                 09/06/2009                 171                 1
21                 365                 09/06/2009                 174                 2
22                 365                 09/06/2009                 176                 2
23                 365                 09/08/2009                 163                 1
24                 365                 09/08/2009                 196                 2
25                 365                 09/13/2009                 158                 1
26                 365                 09/13/2009                 180                 2
END DATA.

SORT CASES BY ID Treat_Date PrePost_T.
CASESTOVARS /ID = ID Treat_Date /INDEX = PrePost_T.

The INDEX values for case 16 have occurred before in the cases with the same
ID values.
Execution of this command stops.

if i delete/filter case 16 it occurs for further cases (22 in this simulated
example). This is obviously due to data entry errors in the raw data with 2
pre or post-treatment BPs when there should only be one. to save time
manually filtering the large dataset and (improve reproducibility of syntax
for similar data) i wonder how i can filter cases to accept only the first
value of BP for PrePost_T 1 or 2? in other words how to filter out casenum
16 and 22 in the example data?





--
View this message in context:
http://spssx-discussion.1045642.n5.nabble.com/Fwd-complex-restructure-tp5726939p5726943.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD


===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD