I have a very large dataset where I need to aggregate some variables. Here is specifically what I need help with. I have two id variables where staff are nested within agencies. The data looks like the table below. I would like to create a variable at the staff level that counts the number of agencies they worked out. So, for example, staff id 1 worked at 7 different agencies (she worked at agency 1 twice). The new variable would return a value of 7 for this specific staff id. Any help would be greatly appreciated.
|
Why not run aggregate twice. First time, break by staff and agency for a count, which you throw away, because you need to collapse multiple records for the same
agency to one record within staff. Second time, break by staff for another count, which is what you want. Gene Maguin.
From: SPSSX(r) Discussion [mailto:[hidden email]]
On Behalf Of Scott Roesch I have a very large dataset where I need to aggregate some variables. Here is specifically what I need help with. I have two id variables where staff are nested within agencies. The data looks like the table below. I would like to create
a variable at the staff level that counts the number of agencies they worked out. So, for example, staff id 1 worked at 7 different agencies (she worked at agency 1 twice). The new variable would return a value of 7 for this specific staff id. Any help would
be greatly appreciated.
===================== To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
|
Administrator
|
In reply to this post by Scott Roesch
Something like this?
* Flag first record for each ID*Agency combo. SORT CASES by ID Agency. ADD FILES FILE = * / BY ID AGENCY / FIRST = FirstRec. EXECUTE. * Use AGGREGATE to get the number of agencies for each ID. AGGREGATE /OUTFILE=* MODE=ADDVARIABLES /BREAK=ID /NumAgencies=SUM(FirstRec). FORMATS NumAgencies (F5.0).
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
Bruce,
I'm curious about what is happening to the rest of your posts today. With respect to this post, the whole text is Nabble but only what you see below is what was seen on the list. Gene Maguin -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Bruce Weaver Sent: Thursday, June 25, 2015 4:38 PM To: [hidden email] Subject: Re: Aggregating variables Something like this? Scott Roesch wrote > I have a very large dataset where I need to aggregate some variables. > Here is specifically what I need help with. I have two id variables > where staff are nested within agencies. The data looks like the table > below. I would like to create a variable at the staff level that > counts the number of agencies they worked out. So, for example, staff > id 1 worked at 7 different agencies (she worked at agency 1 twice). > The new variable would return a value of 7 for this specific staff id. > Any help would be greatly appreciated. > > > Staff > > Agency > > 1 > > 1 > > 1 > > 1 > > 1 > > 2 > > 1 > > 3 > > 1 > > 4 > > 1 > > 5 > > 1 > > 6 > > 1 > > 7 > > ===================== > To manage your subscription to SPSSX-L, send a message to > LISTSERV@.UGA > (not to SPSSX-L), with no body text except the command. To leave the > list, send the command SIGNOFF SPSSX-L For a list of commands to > manage subscriptions, send the command INFO REFCARD ----- -- Bruce Weaver [hidden email] http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." NOTE: My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. -- View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Aggregating-variables-tp5729952p5729956.html Sent from the SPSSX Discussion mailing list archive at Nabble.com. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Administrator
|
Thanks for pointing that out, Gene. I read & post via Nabble only, so was not aware part of my post was missing. The bit that was missing was enclosed in html tags for raw text. In another recent thread, Art K and some others were advocating for posting of syntax & listings in plain text, which is why I did that. But if it's going to make things invisible to those who read via the UGA mailing list, I'll stop!
Meanwhile, my complete earlier post in this thread can be seen here: http://spssx-discussion.1045642.n5.nabble.com/Aggregating-variables-tp5729952p5729956.html Cheers, Bruce
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
Administrator
|
In reply to this post by Bruce Weaver
Bruce,
With a "very large" file that SORT will screw you. Furthermore you can speed up both ideas (Gene's and yours) by adding a PRESORTED to the AGGREGATE. --- /* Simulate data */. MATRIX. SAVE MAKE(100000000,2,0) /OUTFILE* /VARIABLES Staff Agency . END MATRIX. COMPUTE Staff =TRUNC(UNIFORM(100000)). COMPUTE Agency=TRUNC(UNIFORM(100)). SAVE OUTFILE 'C:\TEMP\aggtest.sav' . DEFINE !BenchMe (!POS !CMDEND) ECHO !QUOTE(!1). DATASET ACTIVATE benchmark. PRINT / $TIME (DATETIME20). EXECUTE. !ENDDEFINE . DATA LIST FREE / x . BEGIN DATA 1 END DATA. DATASET NAME benchmark. PRESERVE. SET ERRORS OFF. /* Bruce */. !BenchMe Begin Bruce. GET FILE 'C:\TEMP\aggtest.sav' . DATASET NAME rawdata. SORT CASES BY Staff Agency . ADD FILES / FILE * / BY Staff Agency / FIRST= Top. AGGREGATE OUTFILE * / BREAK Staff / N_Agency=SUM(Top). DATASET NAME aggBruce. !BenchMe End Bruce. /* Gene */. !BenchMe Begin Gene. GET FILE 'C:\TEMP\aggtest.sav' . DATASET NAME rawdata. AGGREGATE OUTFILE * / BREAK Staff Agency / Throwaway=N. AGGREGATE OUTFILE * / BREAK Staff / N_Agency=N. DATASET NAME aggGene. !BenchMe End Gene. /* David */. !BenchMe Begin David. GET FILE 'C:\TEMP\aggtest.sav' . DATASET NAME rawdata. AGGREGATE OUTFILE * / BREAK Staff Agency / Throwaway=N. AGGREGATE OUTFILE * /PRESORTED / BREAK Staff / N_Agency=N. DATASET NAME aggDavid. !BenchMe End David. RESTORE.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
Free forum by Nabble | Edit this page |