Dear all, this is my first post to the forum. I searched for a similar query but couldn’t find. I am using SPSS v20 on windows 7 32-bit. I'm a postgrad medical researcher with some raw blood pressure (SBP) data in long format which needs reshaping to allow me to generate new variables to mixed-effects model. Extract of the data is here. the full dataset has >30000 rows.
Subjects (ID) had SBP measured both before and after a treatment (PrePost_T, 1=Pre-treatment, 2=Post-treatment) on several dates (Treat_date) usually 2-3 days apart. Data are unbalanced with varying number of observations per ID. My main interest is in the change in SBP on the same Treat_date. Ideally I need the Pre-Treatment BP in a column adjacent to the Post-treatment BP column. I tried using a variety of setups in CASESTOVARS using a combination of ID, Treat_date and PrePost_T to make a unique INDEX.
SORT CASES BY ID Treat_date PrePost_T. CASESTOVARS /INDEX=Treat_date PrePost_T /GROUPBY=VARIABLE. This led to either an error message saying my observations were not unique or generating 1 row per ID with all observations as new variables in wide format. My questions are: 1. I suspect there is an alternative method/syntax to separate my pre and post-treatment BPs in column PrePost_T for each ID but seems beyond my lateral thinking or SPSS knowledge. Can anyone help? 2. Assuming CASESTOVARS is the wrong approach, I would still find it useful in future to create a unique index from long format data using a combination of ID and 2 variables. Is this possible?
My thanks in advance Alex McVeigh |
You may be confusing the INDEX subcommand with the ID subcommand for CASESTOVARS. Index just tells SPSS what to put as the suffix for the new variables. The ID subcommand (which can take more than one variable) tells SPSS which rows in the old dataset to collapse into one row in the new dataset.
Try this: *************************************. DATA LIST FREE / ID (F3.0) Treat_date (ADATE10) BP PrePost_T (2F3.0). BEGIN DATA 365 09/04/2009 171 1 365 09/04/2009 162 2 365 09/06/2009 171 1 365 09/06/2009 174 2 365 09/08/2009 163 1 365 09/08/2009 196 2 365 09/11/2009 154 1 365 9/13/2009 158 1 365 9/13/2009 180 2 350 10/20/2009 163 1 350 10/20/2009 143 2 350 10/22/2009 136 1 350 10/22/2009 140 2 350 10/25/2009 189 1 350 10/25/2009 185 2 355 02/01/2011 152 1 355 02/01/2011 133 2 355 02/02/2011 144 1 355 02/02/2011 144 2 355 02/03/2011 137 1 355 02/03/2011 142 2 355 02/06/2011 145 1 355 02/06/2011 139 2 355 02/08/2011 133 1 355 02/08/2011 169 2 END DATA. SORT CASES BY ID Treat_Date PrePost_T. CASESTOVARS /ID = ID Treat_Date /INDEX = PrePost_T. *************************************. |
Thanks Andy for the explanation and syntax. I obviously didn't understand how CASESTOVARS worked.
Using your syntax in my full raw data the restructure is partly running but stopped by an error. i have reproduced the error below adding the variable casenum for illustrative purposes (which i don't have in the real data): DATA LIST FREE / casenum (F1.0) ID (F3.0) Treat_date (ADATE10) BP PrePost_T (2F3.0). BEGIN DATA 1 350 10/20/2009 163 1 2 350 10/20/2009 143 2 3 350 10/22/2009 136 1 4 350 10/22/2009 140 2 5 350 10/25/2009 189 1 6 350 10/25/2009 185 2 7 355 02/01/2011 152 1 8 355 02/01/2011 133 2 9 355 02/02/2011 144 1 10 355 02/02/2011 144 2 11 355 02/03/2011 137 1 12 355 02/03/2011 142 2 13 355 02/06/2011 145 1 14 355 02/06/2011 139 2 15 355 02/08/2011 133 1 16 355 02/08/2011 136 1 17 355 02/08/2011 169 2 18 365 09/04/2009 171 1 19 365 09/04/2009 162 2 20 365 09/06/2009 171 1 21 365 09/06/2009 174 2 22 365 09/06/2009 176 2 23 365 09/08/2009 163 1 24 365 09/08/2009 196 2 25 365 09/13/2009 158 1 26 365 09/13/2009 180 2 END DATA. SORT CASES BY ID Treat_Date PrePost_T. CASESTOVARS /ID = ID Treat_Date /INDEX = PrePost_T. The INDEX values for case 16 have occurred before in the cases with the same ID values. Execution of this command stops. if i delete/filter case 16 it occurs for further cases (22 in this simulated example). This is obviously due to data entry errors in the raw data with 2 pre or post-treatment BPs when there should only be one. to save time manually filtering the large dataset and (improve reproducibility of syntax for similar data) i wonder how i can filter cases to accept only the first value of BP for PrePost_T 1 or 2? in other words how to filter out casenum 16 and 22 in the example data? |
Administrator
|
Look up the LAG function! or use MATCH with FIRST function. Your call!
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
In reply to this post by alex mcveigh
See Data > Identify Duplicate Cases
Jon Peck (no "h") aka Kim Senior Software Engineer, IBM [hidden email] phone: 720-342-5621 From: alex mcveigh <[hidden email]> To: [hidden email] Date: 08/13/2014 06:44 AM Subject: Re: [SPSSX-L] Fwd: complex restructure Sent by: "SPSSX(r) Discussion" <[hidden email]> Thanks Andy for the explanation and syntax. I obviously didn't understand how CASESTOVARS worked. Using your syntax in my full raw data the restructure is partly running but stopped by an error. i have reproduced the error below adding the variable casenum for illustrative purposes (which i don't have in the real data): DATA LIST FREE / casenum (F1.0) ID (F3.0) Treat_date (ADATE10) BP PrePost_T (2F3.0). BEGIN DATA 1 350 10/20/2009 163 1 2 350 10/20/2009 143 2 3 350 10/22/2009 136 1 4 350 10/22/2009 140 2 5 350 10/25/2009 189 1 6 350 10/25/2009 185 2 7 355 02/01/2011 152 1 8 355 02/01/2011 133 2 9 355 02/02/2011 144 1 10 355 02/02/2011 144 2 11 355 02/03/2011 137 1 12 355 02/03/2011 142 2 13 355 02/06/2011 145 1 14 355 02/06/2011 139 2 15 355 02/08/2011 133 1 16 355 02/08/2011 136 1 17 355 02/08/2011 169 2 18 365 09/04/2009 171 1 19 365 09/04/2009 162 2 20 365 09/06/2009 171 1 21 365 09/06/2009 174 2 22 365 09/06/2009 176 2 23 365 09/08/2009 163 1 24 365 09/08/2009 196 2 25 365 09/13/2009 158 1 26 365 09/13/2009 180 2 END DATA. SORT CASES BY ID Treat_Date PrePost_T. CASESTOVARS /ID = ID Treat_Date /INDEX = PrePost_T. The INDEX values for case 16 have occurred before in the cases with the same ID values. Execution of this command stops. if i delete/filter case 16 it occurs for further cases (22 in this simulated example). This is obviously due to data entry errors in the raw data with 2 pre or post-treatment BPs when there should only be one. to save time manually filtering the large dataset and (improve reproducibility of syntax for similar data) i wonder how i can filter cases to accept only the first value of BP for PrePost_T 1 or 2? in other words how to filter out casenum 16 and 22 in the example data? -- View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Fwd-complex-restructure-tp5726939p5726943.html Sent from the SPSSX Discussion mailing list archive at Nabble.com. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Free forum by Nabble | Edit this page |