Dear SPSS Literservers,
I am having a very difficult time transposing my dataset. The dataset is 100,000 rows in total. Each family has one or more epsiodes of treatment and in each episode are one or more data collection points made at different times. Therefore each row represents a specific time point at which data was collected for a family's episode of treatment.
I wish to transpose this data and have item_01 TO item_05 for every Time_point (upto 4 timepoints) within an episode on the same row. The goal is to have one row for each episode of treament keeping in mind that a family will still be able to have multiple rows if they have more than one episode. So at the end of the transposing the first row would look like:
1 4251302 9999 1 0 3 1 3 9999 9999 9999 4
In the dataset example below I have included only item_01 TO item_05 but my full datafile includes upto item43. Also note that I only with to transpose 4 timewaves and not all timewaves so for family 21 and 22 we would lose the 5th time wave in the tranposing process.
CASESTOVARS is far too slow and I can't get VECTORS to work without errors.
Ideas anyone?
DATA LIST LIST/ Family (F4.0) Episode (f10.0) Time_point (F4.0) item_01(F4.0) item_02(F4.0) item_03(F4.0) item_04(F4.0) item_05(F4.0). BEGIN DATA 1 4251302 8 9999 1 0 3 1 1 4251302 10 3 9999 9999 9999 4 3 4764380 45 9999 9999 9999 9999 9999
8 4298546 77 1 2 0 3 3 8 4298546 78 1 4 5 3 1 8 4298546 80 1 1 1 2 3 8 4298546 81 1 4 9999 2 3 10 4327688 90 1 9999 1 2 2 10 4367123 94 2 2 2 2 2 13 4359197 177 1 1 1 9999 1 13 4833983 205 0 1 0 1 1 17 4452881 234 0 0 0 0 0 18 4284998 236 1 1 1 1 1 18 4756727 246 1 1 1 1 1 20 4693406 263 2 1 1 1 9999 20 4693406 264 2 3 2 2 2 20 4693406 265 1 2 3 0 9999 21 5100011 1 9999 1 0 3 1 21 5100011 2 9999 9999 9999 9999 9999 21 5100011 3 9999 9999 9999 9999 9999 21 5100011 4 1 2 0 3 3 21 5100011 5 1 4 5 3 1 22 5100011 1 1 1 1 2 3 22 5100011 2 1 4 9999 2 3 22 5100011 3 1 9999 1 2 2 22 5100011 4 2 2 2 2 2 22 5100011 5 1 1 1 9999 1 END DATA. |
This should help get you started. (No tested code.) RANK time_point BY family episode /INTO time_seq. ** data reduction. COMPUTE first_four = time_seq LE 4. DO REPEAT t = item_01 TO item_05 /a = first_01 TO first_05 /b = second_01 TO second_05 /c = third_01 TO third_05 /d = fourth_01 TO fourth_05 . IF time_seq = 1 a = t. IF time_seq = 2 b = t. IF time_seq = 3 c = t. IF time_seq = 4 b = t. . END REPEAT. EXECUTE. FILTER BY first_four. DATASET DECLARE wide_file. AGGREGATE OUTFILE = wide_file /BREAK = family /first_01 TO first_05 = SUM(first_01 TO first_05) /second_01 TO second_05 = SUM(second_01 TO second_05) /third_01 TO third_05 = SUM(third_01 TO third_05) /fourth_01 TO fourth_05 = SUM(fourth_01 TO fourth_05) . DATASET ACTIVATE wide_file. Jim Marks Director, Market Research x1616 From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Snuffy Dog Dear SPSS Literservers, I am having a very difficult time transposing my dataset. The dataset is 100,000 rows in total. Each family has one or more epsiodes of treatment and in each episode are one or more data collection points made at different times. Therefore each row represents a specific time point at which data was collected for a family's episode of treatment. I wish to transpose this data and have item_01 TO item_05 for every Time_point (upto 4 timepoints) within an episode on the same row. The goal is to have one row for each episode of treament keeping in mind that a family will still be able to have multiple rows if they have more than one episode. So at the end of the transposing the first row would look like: 1 4251302 9999 1 0 3 1 3 9999 9999 9999 4 In the dataset example below I have included only item_01 TO item_05 but my full datafile includes upto item43. Also note that I only with to transpose 4 timewaves and not all timewaves so for family 21 and 22 we would lose the 5th time wave in the tranposing process. CASESTOVARS is far too slow and I can't get VECTORS to work without errors. Ideas anyone?
3 4764380 45 9999 9999 9999 9999 9999 |
Administrator
|
In reply to this post by Snuffy Dog
What syntax have you attempted and what error messages do you receive?
"I have included only item_01 TO item_05 but my full datafile includes up to item43." I have a suspicion that the resulting "wide" file with 170+ variables will be an utter pain in the ass to manage. What do you need to do with this file which can NOT be done in the long format? Why toss the 5th timewave? Why throw away information? The basic solution to this is some version of a counter followed by a vector/compute/aggregate sequence. I posted the same to your question last week so will not repost it here.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
Yeah sure David, the solution that you posted last week was a good one, and I fiddleled with this endlessly for my new and slightly different problem but I had great difficulty in running the compute statements after defining my vectors and I would encounter errors that I could not resolve. This aside Jim solution worked well, thank you again for your thoughts I've learnt a lot about transposing in the last week
Jack Thomas
On Sat, Dec 11, 2010 at 1:02 AM, David Marso <[hidden email]> wrote: What syntax have you attempted and what error messages do you receive? |
Free forum by Nabble | Edit this page |