problem with dis-aggregating data

problem with dis-aggregating data

Cleland, Patricia (EDU)

A colleague has received a data set that has been aggregated so that there is one record per client per agency for each distinct value of length_of_admission_in_Days. For example, if a client had admissions of 1, 3 and 5 days, there would be 3 records, and number_of_admissions records how many admissions of that length the client had. For her analysis, she needs to have the data dis-aggregated.

 

Getting the original data set is unlikely at this time.

 

Here’s what the aggregated data look like:

 

agency_id  client_id  number_of_admissions  length_of_admission_in_Days
        4          5                     3                            1
        4          5                     1                            2


What she wants is a data set with one record per client per agency per admission.

 

agency_id  client_id  number_of_admissions  length_of_admission_in_Days
        4          5                     1                            1
        4          5                     1                            1
        4          5                     1                            1
        4          5                     1                            2


The variable number_of_admissions is redundant in the dis-aggregated data set and can be deleted since it will be 1 for all cases.

 

This looks like it should be solvable by restructuring the data but I haven’t had any luck figuring out how to do it.   Any help would be appreciated.

Pat

 

Re: problem with dis-aggregating data

David Millar

Patricia

 

The following is some code that I wrote back in 1999 to do much the same.  It could form the basis of a solution for you.

 

Best wishes

 

David

 

*The files Jude1.sps and Jude2.sps look at a data set of schools.
*The intention was to create a list of unique school/pupil ids -
*one for every pupil listed as being in the school (sch_tota).
*i.e. for a data set with 30 schools, each with a value of 20 for
*sch_tota, Jude1 and 2 create a data set of 600 ids, one for each
*student (actually in this case there are an extra 5 ids created
*for each school - sch_tota+5 - in case extra pupils had come into
*the school).

*The purpose for creating these ids was to send schools id lists
*on disk to which they could add pupil info such as name.  Thus
*saving some transcription at this end.

get file='c:\mydocu~1\selected.sav'.

* Start the pupil counter at zero for each school record.
compute pupid=0.
* Allow 5 spare ids per school.
compute pupils=sch_tota + 5.
formats sch_id (f2.0)/pupid (f3.0).

* Write one line (school id, pupil id) per pupil.
loop #p=1 to pupils.
. compute pupid=pupid + 1.
. write outfile='c:\mydocu~1\jude1.txt'/ sch_id pupid.
end loop.

execute.
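
For anyone who wants to check the logic outside SPSS, the loop above can be sketched in Python. This is only an illustration under assumptions: the function name `make_pupil_ids`, the two sample schools and the `spare` parameter are made up here and are not part of Jude1.sps or Jude2.sps.

```python
# Illustrative sketch only: mirrors the LOOP/WRITE logic above in plain Python.
# `make_pupil_ids` and the sample data are hypothetical, not from the original files.

def make_pupil_ids(schools, spare=5):
    """Return (sch_id, pupid) pairs: one per pupil plus `spare` extra ids per school."""
    ids = []
    for sch_id, sch_tota in schools:
        # pupid restarts at 1 for every school, as in the SPSS loop
        for pupid in range(1, sch_tota + spare + 1):
            ids.append((sch_id, pupid))
    return ids


# Two made-up schools with 2 and 3 pupils on the roll
sample_schools = [(1, 2), (2, 3)]
pupil_ids = make_pupil_ids(sample_schools)
print(len(pupil_ids))  # 15 = (2 + 5) + (3 + 5)
```

Each school contributes sch_tota + 5 rows, which is the same count the SPSS loop writes to the text file.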

 

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Cleland, Patricia (EDU)
Sent: Thursday, October 21, 2010 2:51 PM
To: [hidden email]
Subject: problem with dis-aggregating data

 


Re: problem with dis-aggregating data

Bruce Weaver
Administrator
In reply to this post by Cleland, Patricia (EDU)
Cleland, Patricia (EDU) wrote:
> A colleague has received a data set which has been aggregated so that
> there is one record per client per agency by the
> length_of_admission_in_Days. So that if a client had admissions of 1, 3
> and 5 days, there would be 3 records.  For her analysis, she needs to
> have the data dis-aggregated.
>
> Getting the original data set is unlikely at this time.
>
> Here's what the aggregated data look:
>
> --- Tables snipped ---
>
> The variable number_of_admissions is redundant in the dis-aggregated
> data set and can be deleted since it will be 1 for all cases.
>
> This looks like it should be solvable by restructuring the data but I
> haven't had any luck figuring out how to do it.   Any help would be
> appreciated.
>
> Pat

loop # = 1 to number_of_admissions.
- xsave outfile = "C:\temp\newfile.sav" / drop = number_of_admissions .
end loop.
* XSAVE is a transformation, so force a data pass before reading the new file.
execute.

get file = "C:\temp\newfile.sav".
list.
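
For readers who want to verify the expected result outside SPSS, the same row replication can be sketched in Python. This is a minimal illustration, not the SPSS solution itself: the helper name `disaggregate` is hypothetical, and the two input rows are the example from the original question.

```python
# Illustrative sketch only: the same row replication as the LOOP/XSAVE code above.
# `disaggregate` is a hypothetical helper; the rows are the example from this thread.

def disaggregate(rows, count_key="number_of_admissions"):
    """Repeat each record `count_key` times and drop the count column."""
    expanded = []
    for row in rows:
        # copy every field except the count, like DROP on XSAVE
        record = {k: v for k, v in row.items() if k != count_key}
        for _ in range(row[count_key]):
            expanded.append(dict(record))
    return expanded


aggregated = [
    {"agency_id": 4, "client_id": 5, "number_of_admissions": 3,
     "length_of_admission_in_Days": 1},
    {"agency_id": 4, "client_id": 5, "number_of_admissions": 1,
     "length_of_admission_in_Days": 2},
]
admissions = disaggregate(aggregated)
for record in admissions:
    print(record)
```

The two aggregated records expand to four admission-level records: three with a 1-day stay and one with a 2-day stay.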
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."


Re: problem with dis-aggregating data

Cleland, Patricia (EDU)
Thanks, Bruce. I knew it was something straightforward.

I just never think of LOOP as being useful for data restructuring.

Pat

--------------------------------
Patricia Cleland, OCT
Senior Statistical and Research Analyst
Learning Environment Branch
Ministry of Education

15th Floor, Mowat Block
900 Bay Street
Toronto, Ontario
M7A 1L2

phone: 416-325-2697
fax:     416-325-4344

email: [hidden email]


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Bruce Weaver
Sent: October 21, 2010 10:21 AM
To: [hidden email]
Subject: Re: [SPSSX-L] problem with dis-aggregating data


--
View this message in context:
http://spssx-discussion.1045642.n5.nabble.com/problem-with-dis-aggregating-data-tp3230534p3230583.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
