Aggregating variables

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Aggregating variables

Scott Roesch
I have a very large dataset where I need to aggregate some variables. Here is specifically what I need help with. I have two id variables where staff are nested within agencies. The data looks like the table below. I would like to create a variable at the staff level that counts the number of agencies they worked out. So, for example, staff id 1 worked at 7 different agencies (she worked at agency 1 twice). The new variable would return a value of 7 for this specific staff id. Any help would be greatly appreciated.


Staff

Agency

1

1

1

1

1

2

1

3

1

4

1

5

1

6

1

7

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Aggregating variables

Maguin, Eugene

Why not run aggregate twice. First time, break by staff and agency for a count, which you throw away, because you need to collapse multiple records for the same agency to one record within staff. Second time, break by staff for another count, which is what you want.

Gene Maguin.

 

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Scott Roesch
Sent: Thursday, June 25, 2015 2:08 PM
To: [hidden email]
Subject: Aggregating variables

 

I have a very large dataset where I need to aggregate some variables. Here is specifically what I need help with. I have two id variables where staff are nested within agencies. The data looks like the table below. I would like to create a variable at the staff level that counts the number of agencies they worked out. So, for example, staff id 1 worked at 7 different agencies (she worked at agency 1 twice). The new variable would return a value of 7 for this specific staff id. Any help would be greatly appreciated.

 

 

Staff

Agency

1

1

1

1

1

2

1

3

1

4

1

5

1

6

1

7

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Aggregating variables

Bruce Weaver
Administrator
In reply to this post by Scott Roesch
Something like this?
* Flag first record for each ID*Agency combo.

SORT CASES by ID Agency.
ADD FILES 
 FILE = * / 
 BY ID AGENCY / 
 FIRST = FirstRec.
EXECUTE.

* Use AGGREGATE to get the number of agencies for each ID.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=ID
  /NumAgencies=SUM(FirstRec).
FORMATS NumAgencies (F5.0).

Scott Roesch wrote
I have a very large dataset where I need to aggregate some variables. Here
is specifically what I need help with. I have two id variables where staff
are nested within agencies. The data looks like the table below. I would
like to create a variable at the staff level that counts the number of
agencies they worked out. So, for example, staff id 1 worked at 7 different
agencies (she worked at agency 1 twice). The new variable would return a
value of 7 for this specific staff id. Any help would be greatly
appreciated.


                   Staff

Agency

1

1

1

1

1

2

1

3

1

4

1

5

1

6

1

7

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
Reply | Threaded
Open this post in threaded view
|

Re: Aggregating variables

Maguin, Eugene
Bruce,
I'm curious about what is happening to the rest of your posts today. With respect to this post, the whole text is Nabble but only what you see below is what was seen on the list.
Gene Maguin


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Bruce Weaver
Sent: Thursday, June 25, 2015 4:38 PM
To: [hidden email]
Subject: Re: Aggregating variables

Something like this?



Scott Roesch wrote

> I have a very large dataset where I need to aggregate some variables.
> Here is specifically what I need help with. I have two id variables
> where staff are nested within agencies. The data looks like the table
> below. I would like to create a variable at the staff level that
> counts the number of agencies they worked out. So, for example, staff
> id 1 worked at 7 different agencies (she worked at agency 1 twice).
> The new variable would return a value of 7 for this specific staff id.
> Any help would be greatly appreciated.
>
>
>                    Staff
>
> Agency
>
> 1
>
> 1
>
> 1
>
> 1
>
> 1
>
> 2
>
> 1
>
> 3
>
> 1
>
> 4
>
> 1
>
> 5
>
> 1
>
> 6
>
> 1
>
> 7
>
> =====================
> To manage your subscription to SPSSX-L, send a message to

> LISTSERV@.UGA

>  (not to SPSSX-L), with no body text except the command. To leave the
> list, send the command SIGNOFF SPSSX-L For a list of commands to
> manage subscriptions, send the command INFO REFCARD





-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Aggregating-variables-tp5729952p5729956.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Aggregating variables

Bruce Weaver
Administrator
Thanks for pointing that out, Gene.  I read & post via Nabble only, so was not aware part of my post was missing.  The bit that was missing was enclosed in html tags for raw text.  In another recent thread, Art K and some others were advocating for posting of syntax & listings in plain text, which is why I did that.  But if it's going to make things invisible to those who read via the UGA mailing list, I'll stop!  

Meanwhile, my complete earlier post in this thread can be seen here:

http://spssx-discussion.1045642.n5.nabble.com/Aggregating-variables-tp5729952p5729956.html

Cheers,
Bruce


Maguin, Eugene wrote
Bruce,
I'm curious about what is happening to the rest of your posts today. With respect to this post, the whole text is Nabble but only what you see below is what was seen on the list.
Gene Maguin


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Bruce Weaver
Sent: Thursday, June 25, 2015 4:38 PM
To: [hidden email]
Subject: Re: Aggregating variables

Something like this?



Scott Roesch wrote
> I have a very large dataset where I need to aggregate some variables.
> Here is specifically what I need help with. I have two id variables
> where staff are nested within agencies. The data looks like the table
> below. I would like to create a variable at the staff level that
> counts the number of agencies they worked out. So, for example, staff
> id 1 worked at 7 different agencies (she worked at agency 1 twice).
> The new variable would return a value of 7 for this specific staff id.
> Any help would be greatly appreciated.
>
>
>                    Staff
>
> Agency
>
> 1
>
> 1
>
> 1
>
> 1
>
> 1
>
> 2
>
> 1
>
> 3
>
> 1
>
> 4
>
> 1
>
> 5
>
> 1
>
> 6
>
> 1
>
> 7
>
> =====================
> To manage your subscription to SPSSX-L, send a message to

> LISTSERV@.UGA

>  (not to SPSSX-L), with no body text except the command. To leave the
> list, send the command SIGNOFF SPSSX-L For a list of commands to
> manage subscriptions, send the command INFO REFCARD





-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Aggregating-variables-tp5729952p5729956.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
Reply | Threaded
Open this post in threaded view
|

Re: Aggregating variables

David Marso
Administrator
In reply to this post by Bruce Weaver
Bruce,
With a "very large" file that SORT will screw you.
Furthermore you can speed up both ideas (Gene's and yours) by adding a PRESORTED to the AGGREGATE.
---

/* Simulate data */.
MATRIX.
SAVE MAKE(100000000,2,0) /OUTFILE* /VARIABLES Staff Agency .
END MATRIX.
COMPUTE Staff =TRUNC(UNIFORM(100000)).
COMPUTE Agency=TRUNC(UNIFORM(100)).
SAVE OUTFILE  'C:\TEMP\aggtest.sav' .

DEFINE !BenchMe (!POS !CMDEND)
ECHO !QUOTE(!1).
DATASET ACTIVATE benchmark.
PRINT / $TIME (DATETIME20).
EXECUTE.
!ENDDEFINE .


DATA LIST FREE / x .
BEGIN DATA
1
END DATA.

DATASET NAME benchmark.

PRESERVE.
SET ERRORS OFF.
/* Bruce */.

!BenchMe  Begin Bruce.
GET FILE  'C:\TEMP\aggtest.sav' .
DATASET NAME rawdata.
SORT CASES BY  Staff Agency .
ADD FILES / FILE * / BY  Staff Agency  / FIRST= Top.
AGGREGATE OUTFILE *  / BREAK Staff / N_Agency=SUM(Top).
DATASET NAME aggBruce.
!BenchMe End Bruce.



/* Gene */.
!BenchMe  Begin Gene.
GET FILE  'C:\TEMP\aggtest.sav' .
DATASET NAME rawdata.
AGGREGATE OUTFILE *  / BREAK Staff Agency / Throwaway=N.
AGGREGATE OUTFILE *  / BREAK Staff / N_Agency=N.
DATASET NAME aggGene.
!BenchMe End Gene.


/* David */.
!BenchMe Begin David.
GET FILE  'C:\TEMP\aggtest.sav' .
DATASET NAME rawdata.
AGGREGATE OUTFILE *  / BREAK Staff Agency / Throwaway=N.
AGGREGATE OUTFILE * /PRESORTED / BREAK Staff / N_Agency=N.
DATASET NAME aggDavid.
!BenchMe End David.

RESTORE.


Bruce Weaver wrote
Something like this?
* Flag first record for each ID*Agency combo.

SORT CASES by ID Agency.
ADD FILES 
 FILE = * / 
 BY ID AGENCY / 
 FIRST = FirstRec.
EXECUTE.

* Use AGGREGATE to get the number of agencies for each ID.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=ID
  /NumAgencies=SUM(FirstRec).
FORMATS NumAgencies (F5.0).

Scott Roesch wrote
I have a very large dataset where I need to aggregate some variables. Here
is specifically what I need help with. I have two id variables where staff
are nested within agencies. The data looks like the table below. I would
like to create a variable at the staff level that counts the number of
agencies they worked out. So, for example, staff id 1 worked at 7 different
agencies (she worked at agency 1 twice). The new variable would return a
value of 7 for this specific staff id. Any help would be greatly
appreciated.


                   Staff

Agency

1

1

1

1

1

2

1

3

1

4

1

5

1

6

1

7

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"