Difficult time transposing a dataset using VECTORS

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Difficult time transposing a dataset using VECTORS

Snuffy Dog
Dear SPSS Literservers,
 
I am having a very difficult time transposing my dataset.  The dataset is 100,000 rows in total.  Each family has one or more epsiodes of treatment and in each episode are one or more data collection points made at different times.  Therefore each row represents a specific time point at which data was collected for a family's episode of treatment. 
I wish to transpose this data and have item_01 TO item_05 for every Time_point (upto 4 timepoints) within an episode on the same row. The goal is to have one row for each episode of treament keeping in mind that a family will still be able to have multiple rows if they have more than one episode.  So at the end of the transposing the first row would look like:
 
1 4251302  9999 1 0 3 1 3 9999 9999 9999 4
 
In the dataset example below I have included only item_01 TO item_05 but my full datafile includes upto item43. Also note that I only with to transpose 4 timewaves and not all timewaves so for family 21 and 22 we would lose the 5th time wave in the tranposing process.
 
CASESTOVARS is far too slow and I can't get VECTORS to work without errors.
 
Ideas anyone?
 
 

DATA LIST LIST/
Family (F4.0) Episode (f10.0) Time_point (F4.0) item_01(F4.0) item_02(F4.0) item_03(F4.0) item_04(F4.0)
item_05(F4.0).
BEGIN DATA
1 4251302 8 9999 1 0 3 1
1 4251302 10 3 9999 9999 9999 4
3 4764380 45 9999 9999 9999 9999 9999
8 4298546 77 1 2 0 3 3
8 4298546 78 1 4 5 3 1
8 4298546 80 1 1 1 2 3
8 4298546 81 1 4 9999 2 3
10 4327688 90 1 9999 1 2 2
10 4367123 94 2 2 2 2 2
13 4359197 177 1 1 1 9999 1
13 4833983 205 0 1 0 1 1
17 4452881 234 0 0 0 0 0
18 4284998 236 1 1 1 1 1
18 4756727 246 1 1 1 1 1
20 4693406 263 2 1 1 1 9999
20 4693406 264 2 3 2 2 2
20 4693406 265 1 2 3 0 9999
21 5100011 1 9999 1 0 3 1
21 5100011 2 9999 9999 9999 9999 9999
21 5100011 3 9999 9999 9999 9999 9999
21 5100011 4 1 2 0 3 3
21 5100011 5 1 4 5 3 1
22 5100011 1 1 1 1 2 3
22 5100011 2 1 4 9999 2 3
22 5100011 3 1 9999 1 2 2
22 5100011 4 2 2 2 2 2
22 5100011 5 1 1 1 9999 1
END DATA.
Reply | Threaded
Open this post in threaded view
|

Re: Difficult time transposing a dataset using VECTORS

Marks, Jim

 

 

This should help get you started. (No tested code.)

 

RANK time_point BY family episode /INTO time_seq.

 

** data reduction.

COMPUTE first_four = time_seq LE 4.

 

DO REPEAT t = item_01 TO item_05

/a = first_01 TO first_05

/b = second_01 TO second_05

/c = third_01 TO third_05

/d = fourth_01 TO fourth_05

.

IF time_seq = 1 a = t.

 

IF time_seq = 2 b = t.

 

IF time_seq = 3 c = t.

 

IF time_seq = 4 b = t.

.

END REPEAT.

EXECUTE.

 

FILTER BY first_four.

 

DATASET DECLARE wide_file.

AGGREGATE OUTFILE = wide_file

  /BREAK = family

/first_01 TO first_05 = SUM(first_01 TO first_05)

/second_01 TO second_05 = SUM(second_01 TO second_05)

/third_01 TO third_05 = SUM(third_01 TO third_05)

/fourth_01 TO fourth_05 = SUM(fourth_01 TO fourth_05)

.

DATASET ACTIVATE wide_file.

 

Jim Marks

Director, Market Research

x1616

 

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Snuffy Dog
Sent: Wednesday, December 08, 2010 6:31 PM
To: [hidden email]
Subject: Difficult time transposing a dataset using VECTORS

 

Dear SPSS Literservers,

 

I am having a very difficult time transposing my dataset.  The dataset is 100,000 rows in total.  Each family has one or more epsiodes of treatment and in each episode are one or more data collection points made at different times.  Therefore each row represents a specific time point at which data was collected for a family's episode of treatment. 

I wish to transpose this data and have item_01 TO item_05 for every Time_point (upto 4 timepoints) within an episode on the same row. The goal is to have one row for each episode of treament keeping in mind that a family will still be able to have multiple rows if they have more than one episode.  So at the end of the transposing the first row would look like:

 

1 4251302  9999 1 0 3 1 3 9999 9999 9999 4

 

In the dataset example below I have included only item_01 TO item_05 but my full datafile includes upto item43. Also note that I only with to transpose 4 timewaves and not all timewaves so for family 21 and 22 we would lose the 5th time wave in the tranposing process.

 

CASESTOVARS is far too slow and I can't get VECTORS to work without errors.

 

Ideas anyone?

 

 


DATA LIST LIST/
Family (F4.0) Episode (f10.0) Time_point (F4.0) item_01(F4.0) item_02(F4.0) item_03(F4.0) item_04(F4.0)
item_05(F4.0).
BEGIN DATA
1 4251302 8 9999 1 0 3 1
1 4251302 10 3 9999 9999 9999 4

3 4764380 45 9999 9999 9999 9999 9999
8 4298546 77 1 2 0 3 3
8 4298546 78 1 4 5 3 1
8 4298546 80 1 1 1 2 3
8 4298546 81 1 4 9999 2 3
10 4327688 90 1 9999 1 2 2
10 4367123 94 2 2 2 2 2
13 4359197 177 1 1 1 9999 1
13 4833983 205 0 1 0 1 1
17 4452881 234 0 0 0 0 0
18 4284998 236 1 1 1 1 1
18 4756727 246 1 1 1 1 1
20 4693406 263 2 1 1 1 9999
20 4693406 264 2 3 2 2 2
20 4693406 265 1 2 3 0 9999
21 5100011 1 9999 1 0 3 1
21 5100011 2 9999 9999 9999 9999 9999
21 5100011 3 9999 9999 9999 9999 9999
21 5100011 4 1 2 0 3 3
21 5100011 5 1 4 5 3 1
22 5100011 1 1 1 1 2 3
22 5100011 2 1 4 9999 2 3
22 5100011 3 1 9999 1 2 2
22 5100011 4 2 2 2 2 2
22 5100011 5 1 1 1 9999 1
END DATA.

Reply | Threaded
Open this post in threaded view
|

Re: Difficult time transposing a dataset using VECTORS

David Marso
Administrator
In reply to this post by Snuffy Dog
What syntax have you attempted and what error messages do you receive?
"I have included only item_01 TO item_05 but my full datafile includes up to item43."
I have a suspicion that the resulting "wide" file with 170+ variables will be an utter pain in the ass to manage.  What do you need to do with this file which can NOT be done in the long format?
Why toss the 5th timewave?  Why throw away information?
The basic solution to this is some version of a counter followed by a vector/compute/aggregate sequence.
I posted the same to your question last week so will not repost it here.

Snuffy Dog wrote
Dear SPSS Literservers,

I am having a very difficult time transposing my dataset.  The dataset is
100,000 rows in total.  Each family has one or more epsiodes of treatment
and in each episode are one or more data collection points made at different
times.  Therefore each row represents a specific time point at which data
was collected for a family's episode of treatment.
I wish to transpose this data and have item_01 TO item_05 for every
Time_point (upto 4 timepoints) within an episode on the same row. The goal
is to have one row for each episode of treament keeping in mind that a
family will still be able to have multiple rows if they have more than one
episode.  So at the end of the transposing the first row would look like:

1 4251302  9999 1 0 3 1 3 9999 9999 9999 4

In the dataset example below I have included only item_01 TO item_05 but my
full datafile includes upto item43. Also note that I only with to transpose
4 timewaves and not all timewaves so for family 21 and 22 we would lose the
5th time wave in the tranposing process.

CASESTOVARS is far too slow and I can't get VECTORS to work without errors.

Ideas anyone?



DATA LIST LIST/
Family (F4.0) Episode (f10.0) Time_point (F4.0) item_01(F4.0) item_02(F4.0)
item_03(F4.0) item_04(F4.0)
item_05(F4.0).
BEGIN DATA
1 4251302 8 9999 1 0 3 1
1 4251302 10 3 9999 9999 9999 4
3 4764380 45 9999 9999 9999 9999 9999
8 4298546 77 1 2 0 3 3
8 4298546 78 1 4 5 3 1
8 4298546 80 1 1 1 2 3
8 4298546 81 1 4 9999 2 3
10 4327688 90 1 9999 1 2 2
10 4367123 94 2 2 2 2 2
13 4359197 177 1 1 1 9999 1
13 4833983 205 0 1 0 1 1
17 4452881 234 0 0 0 0 0
18 4284998 236 1 1 1 1 1
18 4756727 246 1 1 1 1 1
20 4693406 263 2 1 1 1 9999
20 4693406 264 2 3 2 2 2
20 4693406 265 1 2 3 0 9999
21 5100011 1 9999 1 0 3 1
21 5100011 2 9999 9999 9999 9999 9999
21 5100011 3 9999 9999 9999 9999 9999
21 5100011 4 1 2 0 3 3
21 5100011 5 1 4 5 3 1
22 5100011 1 1 1 1 2 3
22 5100011 2 1 4 9999 2 3
22 5100011 3 1 9999 1 2 2
22 5100011 4 2 2 2 2 2
22 5100011 5 1 1 1 9999 1
END DATA.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

Re: Difficult time transposing a dataset using VECTORS

Snuffy Dog
Yeah sure David, the solution that you posted last week was a good one, and I fiddleled with this endlessly for my new and slightly different problem but I had great difficulty in running the compute statements after defining my vectors and I would encounter errors that I could not resolve.  This aside Jim solution worked well, thank you again for your thoughts I've learnt a lot about transposing in the last week
 
Jack Thomas

On Sat, Dec 11, 2010 at 1:02 AM, David Marso <[hidden email]> wrote:
What syntax have you attempted and what error messages do you receive?
"I have included only item_01 TO item_05 but my full datafile includes up to
item43."
I have a suspicion that the resulting "wide" file with 170+ variables will
be an utter pain in the ass to manage.  What do you need to do with this
file which can NOT be done in the long format?
Why toss the 5th timewave?  Why throw away information?
The basic solution to this is some version of a counter followed by a
vector/compute/aggregate sequence.
I posted the same to your question last week so will not repost it here.


Snuffy Dog wrote:
>
> Dear SPSS Literservers,
>
> I am having a very difficult time transposing my dataset.  The dataset is
> 100,000 rows in total.  Each family has one or more epsiodes of treatment
> and in each episode are one or more data collection points made at
> different
> times.  Therefore each row represents a specific time point at which data
> was collected for a family's episode of treatment.
> I wish to transpose this data and have item_01 TO item_05 for every
> Time_point (upto 4 timepoints) within an episode on the same row. The goal
> is to have one row for each episode of treament keeping in mind that a
> family will still be able to have multiple rows if they have more than one
> episode.  So at the end of the transposing the first row would look like:
>
> 1 4251302  9999 1 0 3 1 3 9999 9999 9999 4
>
> In the dataset example below I have included only item_01 TO item_05 but
> my
> full datafile includes upto item43. Also note that I only with to
> transpose
> 4 timewaves and not all timewaves so for family 21 and 22 we would lose
> the

> 5th time wave in the tranposing process.
>
> CASESTOVARS is far too slow and I can't get VECTORS to work without
> errors.
>
> Ideas anyone?
>
>
>
> DATA LIST LIST/
> Family (F4.0) Episode (f10.0) Time_point (F4.0) item_01(F4.0)
> item_02(F4.0)
> item_03(F4.0) item_04(F4.0)
> item_05(F4.0).
> BEGIN DATA
> 1 4251302 8 9999 1 0 3 1
> 1 4251302 10 3 9999 9999 9999 4
> 3 4764380 45 9999 9999 9999 9999 9999
> 8 4298546 77 1 2 0 3 3
> 8 4298546 78 1 4 5 3 1
> 8 4298546 80 1 1 1 2 3
> 8 4298546 81 1 4 9999 2 3
> 10 4327688 90 1 9999 1 2 2
> 10 4367123 94 2 2 2 2 2
> 13 4359197 177 1 1 1 9999 1
> 13 4833983 205 0 1 0 1 1
> 17 4452881 234 0 0 0 0 0
> 18 4284998 236 1 1 1 1 1
> 18 4756727 246 1 1 1 1 1
> 20 4693406 263 2 1 1 1 9999
> 20 4693406 264 2 3 2 2 2
> 20 4693406 265 1 2 3 0 9999
> 21 5100011 1 9999 1 0 3 1
> 21 5100011 2 9999 9999 9999 9999 9999
> 21 5100011 3 9999 9999 9999 9999 9999
> 21 5100011 4 1 2 0 3 3
> 21 5100011 5 1 4 5 3 1
> 22 5100011 1 1 1 1 2 3
> 22 5100011 2 1 4 9999 2 3
> 22 5100011 3 1 9999 1 2 2
> 22 5100011 4 2 2 2 2 2
> 22 5100011 5 1 1 1 9999 1
> END DATA.
>
>

--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Difficult-time-transposing-a-dataset-using-VECTORS-tp3298298p3300328.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD