I have a data set containing hospital data where I want to check how many different departments each patient has visited. A patient can have several cases (visits) in the data set an each case contains a department id that can be either the same as earlier or a new one.
I tried using the aggregate function with "id" as break variable, but there are no "distinct count" subfunction in the aggregate function and N(department) only return total number of cases for each patient. Here is some example data DATA LIST LIST / id department. BEGIN DATA. 1, 2 1, 3 1, 2 2, 1 2, 1 3, 3 4, 1 4, 2 4, 2 4, 5 END DATA. LIST. I want the result to look like this id dept dept_count 1.00 2.00 2.00 1.00 3.00 2.00 1.00 2.00 2.00 2.00 1.00 1.00 2.00 1.00 1.00 3.00 3.00 1.00 4.00 1.00 3.00 4.00 2.00 3.00 4.00 2.00 3.00 4.00 5.00 3.00 Can anyone please help me! |
If it's also possible to list all the distinct different departments as dept1, dept2, dept... and assign a number to how many time each distinct/unique dept has occurred (dept1_n) that would have been superb.
|
In reply to this post by nessie
I don't think your desired result data matches your description. I'd say the result should be this
Id, dept_id, count 1, 2,3 1, 3,1 2, 1,2 3, 3,1 4, 1,1 4, 2,2 4, 5,1 You get the count I show if you set 'break' to id, dept_id. Once you have the aggregated data, you could get what you posted in your second message (I think this is what you want) but this Do repeat x=dept1 to dept6/y=1 to 6. If (dept_id eq y) x=count. End repeat. Gene Maguin -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of nessie Sent: Friday, June 05, 2015 9:12 AM To: [hidden email] Subject: Count distinct using aggregate I have a data set containing hospital data where I want to check how many different departments each patient has visited. A patient can have several cases (visits) in the data set an each case contains a department id that can be either the same as earlier or a new one. I tried using the aggregate function with "id" as break variable, but there are no "distinct count" subfunction in the aggregate function and N(department) only return total number of cases for each patient. Here is some example data DATA LIST LIST / id department. BEGIN DATA. 1, 2 1, 3 1, 2 2, 1 2, 1 3, 3 4, 1 4, 2 4, 2 4, 5 END DATA. LIST. I want the result to look like this id dept dept_count 1.00 2.00 2.00 1.00 3.00 2.00 1.00 2.00 2.00 2.00 1.00 1.00 2.00 1.00 1.00 3.00 3.00 1.00 4.00 1.00 3.00 4.00 2.00 3.00 4.00 2.00 3.00 4.00 5.00 3.00 Can anyone please help me! -- View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Count-distinct-using-aggregate-tp5729756.html Sent from the SPSSX Discussion mailing list archive at Nabble.com. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
In reply to this post by nessie
This question comes up quite frequently and SPSS doesn't allow to extract this type of information quite as easily as it really should.
Another software I use has exactly as you elude to,a sub-function in their AGGREGATE command which can calculate DISTINCT values of a target variable within each BREAK. /* Count unique departments visited by each patient */. SORT CASES BY ID DEPARTMENT. MATCH FILES FILE=* /BY ID DEPARTMENT /FIRST=Pri_Dep_WI_Pat /* Primary Department Within Patient */ . CTABLES /VLABELS VARIABLES=Pri_Dep_WI_Pat DISPLAY=NONE /TABLE ID[C] BY Pri_Dep_WI_Pat[S][SUM 'Nb. of Unique Departments']. AGGREGATE OUTFILE=* MODE=ADDVARIABLES /BREAK=ID /DepCount=SUM(Pri_Dep_WI_Pat). /* Count unique patients visted at each department */. SORT CASES BY DEPARTMENT ID. MATCH FILES FILE=* /BY DEPARTMENT ID /FIRST=Pri_Pat_WI_Dep /*Primary Patient Within Department */. CTABLES /VLABELS VARIABLES=Pri_Pat_WI_Dep DISPLAY=NONE /TABLE DEPARTMENT[C] BY Pri_Pat_WI_Dep[S][SUM 'Nb. of Unique Patients']. AGGREGATE OUTFILE=* /BREAK=DEPARTMENT /PatCount=SUM(Pri_Pat_WI_Dep). |
I've logged this as a Request for Enhancement (RFE) with IBM. If anyone wishes to do so please up-vote, all my previous entries still show their initial status "Submitted" without any further update, so don't have much faith anything may come of it?!
-------------------------------------------------------------------------------------------------------------------------- Notification generated at: 05 Jun 2015, 09:45 AM Eastern Time (ET) ID: 72354 Headline: AGGREGATE DISTINCT FUNCTION Submitted on: 05 Jun 2015, 09:45 AM Eastern Time (ET) Brand: Business Analytics Product: SPSS Statistics Link: http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=72354 -------------------------------------------------------------------------------------------------------------------------- |
In reply to this post by Jignesh Sutar
Jignesh,
Would educate me about your posting, please. What does the CTABLES command do in relation to the preceding and following Aggregate commands? I don't now have CTables on my machine but when I did it simply provided a sophisticated data summarization display. Is that the purpose here? To provide a display of the result of the preceding Aggregate? Thanks, Gene Maguin -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Jignesh Sutar Sent: Friday, June 05, 2015 9:37 AM To: [hidden email] Subject: Re: Count distinct using aggregate This question comes up quite frequently and SPSS doesn't allow to extract this type of information quite as easily as it really should. Another software I use has exactly as you elude to,a sub-function in their AGGREGATE command which can calculate DISTINCT values of a target variable within each BREAK. /* Count unique departments visited by each patient */. SORT CASES BY ID DEPARTMENT. MATCH FILES FILE=* /BY ID DEPARTMENT /FIRST=Pri_Dep_WI_Pat /* Primary Department Within Patient */ . CTABLES /VLABELS VARIABLES=Pri_Dep_WI_Pat DISPLAY=NONE /TABLE ID[C] BY Pri_Dep_WI_Pat[S][SUM 'Nb. of Unique Departments']. AGGREGATE OUTFILE=* MODE=ADDVARIABLES /BREAK=ID /DepCount=SUM(Pri_Dep_WI_Pat). /* Count unique patients visted at each department */. SORT CASES BY DEPARTMENT ID. MATCH FILES FILE=* /BY DEPARTMENT ID /FIRST=Pri_Pat_WI_Dep /*Primary Patient Within Department */. CTABLES /VLABELS VARIABLES=Pri_Pat_WI_Dep DISPLAY=NONE /TABLE DEPARTMENT[C] BY Pri_Pat_WI_Dep[S][SUM 'Nb. of Unique Patients']. AGGREGATE OUTFILE=* /BREAK=DEPARTMENT /PatCount=SUM(Pri_Pat_WI_Dep). -- View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Count-distinct-using-aggregate-tp5729756p5729759.html Sent from the SPSSX Discussion mailing list archive at Nabble.com. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Administrator
|
In reply to this post by nessie
DATA LIST FREE / id department.
BEGIN DATA. 1 2 1 3 1 2 2 1 2 1 3 3 4 1 4 2 4 2 4 5 END DATA. SORT CASES BY ID department. AGGREGATE OUTFILE * MODE ADDVARIABLES /BREAK ID department /Count=N. * But IMNSHO it makes more sense to omit the SORT and just do. AGGREGATE OUTFILE * /BREAK ID department /Count=N.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
In reply to this post by Maguin, Eugene
Yes exactly, it may not be necessary that one needs to generate a variable containing those statistics so the CTABLES produce the figures as a table and then the following AGGREGATE adds to the dataset, if so you require. Just a sense check , if nothing else also. On 5 June 2015 at 14:50, Maguin, Eugene <[hidden email]> wrote: Jignesh, |
This post was updated on .
In reply to this post by David Marso
David, That looks like it would give you total number of VISITS per patient AND department, not the unique number of DEPARTMENTS visited per patient ONLY as OP desired?
|
In reply to this post by nessie
I didn't answer the second part of the post in complete, but you can do something like this:
DATA LIST LIST / id department. BEGIN DATA. 1, 2 1, 3 1, 2 2, 1 2, 1 3, 3 4, 1 4, 2 4, 2 4, 5 END DATA. DATASET NAME DS0. /* Count unique patients visited at each department / Number of total visits and average visits per patient */. SORT CASES BY DEPARTMENT ID. MATCH FILES FILE=* /BY DEPARTMENT ID /FIRST=Pri_Pat_WI_Dep /*Primary Patient Within Department */. DATASET DECLARE DSAGG. AGGREGATE OUTFILE=DSAGG /BREAK=DEPARTMENT /PatCount=SUM(Pri_Pat_WI_Dep) /TotalVisits=N. DATASET ACTIVATE DSAGG. COMPUTE VisitsPerPat=TotalVisits/PatCount. FORMATS ALL (F8.1). CTABLES /VLABELS VARIABLES=PatCount TotalVisits VisitsPerPat DISPLAY=NONE /TABLE DEPARTMENT[C] BY PatCount[S][SUM 'Nb. of Unique Patients'] + TotalVisits[S][SUM 'Total Visits'] + VisitsPerPat[S][MEAN 'Ave. Visits Per Patient']. |
Administrator
|
In reply to this post by Jignesh Sutar
Oops, misread the Q.
-- DATA LIST FREE / id department. BEGIN DATA. 1 2 1 3 1 2 2 1 2 1 3 3 4 1 4 2 4 2 4 5 END DATA. SORT CASES BY id. DATASET NAME raw. AGGREGATE OUTFILE * /BREAK ID department /Count=N. AGGREGATE OUTFILE * /BREAK ID /N_Dept=N. MATCH FILES / FILE raw / TABLE * / BY id. LIST.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
Administrator
|
In reply to this post by nessie
DATA LIST FREE / id department.
BEGIN DATA. 1 2 1 3 1 2 2 1 2 1 3 3 4 1 4 2 4 2 4 5 END DATA. SORT CASES BY id. DATASET NAME raw. FORMATS department (F1). AGGREGATE OUTFILE * /BREAK ID department /Count=N. CASESTOVARS /ID=id /INDEX=department. COMPUTE @=$SYSMIS. COMPUTE n_dept=NVALID(id TO @)-1. MATCH FILES / FILE * / DROP @. LIST. id Count.1 Count.2 Count.3 Count.5 n_dept 1.00 . 2 1 . 2.00 2.00 2 . . . 1.00 3.00 . . 1 . 1.00 4.00 1 2 . 1 3.00 Number of cases read: 4 Number of cases listed: 4
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
In reply to this post by David Marso
That's a neat way of doing it also. Avoids a multi variable SORT so perhaps preferable in terms of speed performance with large datasets (but you do then have to juggle/match with multiple datasets).
I do feel there still should be a more direct way of doing this in SPSS. A simple DISTINCT function in AGGREGATE would be very much apt for this. |
Administrator
|
Actually the SORT and DATASET were leftover from previous solution.
It could be as simple as follows: Note the raw data are replaced by the AGGREGATE. -- DATA LIST FREE / id department. BEGIN DATA. 1 2 1 3 1 2 2 1 2 1 3 3 4 1 4 2 4 2 4 5 END DATA. FORMATS department (F1). AGGREGATE OUTFILE * /BREAK ID department /Count=N. CASESTOVARS /ID=id /INDEX=department. COMPUTE @=$SYSMIS. COMPUTE n_dept=NVALID(id TO @)-1. MATCH FILES / FILE * / DROP @. LIST.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
Dear David and Jignesh
Thank’s a lot - I would go as far as to say you guys are geniuses. I still agree with Jignesh that this should be implemented as a “distinct count” (and maybe also a “list distinct”) under the “aggregate” function in SPSS, but thank’s to you I can do the same without it. I bet this could also be useful for a lot of other SPSS users. One more question: Is it possible to do this across multiple variables for each case? E.g.: A patient could have several variables describing diagnosis codes for each case. One for main diagnose and quite a few to describe possible bi diagnosis. Each code could only be present one time for each case, but the same code could be found in different places (main and different bi diagnosis variables) across cases with the same id. I want to count the number of distinct diagnosis codes for each id across cases and code-variables. Example data: DATA LIST LIST / id * di_m (A1) di_b1 (A1) di_b2 (A1) di_b3 (A1). BEGIN DATA. 1, S, G 1, D, S 1, B, X 2, A, B 2, A 3, C 4, A 4, A 4, H, J 4, A, Y, U, J END DATA. LIST. The output would look like this: id di_m di_b1 di_b2 di_b3 1.00 S G 1.00 D S 1.00 B X 2.00 A B 2.00 A 3.00 C 4.00 A 4.00 A 4.00 H J 4.00 A Y U J Desired outcome (the n_distinct_di is most important, the rest is a bonus): id A B C D G H J S U X Y n_di n_distinct_di 1 1 1 1 2 1 6 5 2 2 1 3 2 3 1 1 1 4 3 1 2 1 1 8 5 Even more desired outcome (there is almost 20 000 different diagnosis codes, so this would make it easier to interpret and analyse): id di_1 di_2 di_3 di_4 di_5 n_di n_distinct_di 1 S G D B X 6 5 2 A B 3 2 3 C 1 1 4 A H J Y U 8 5 Thanks a lot for all you help and effort to find smart and quick solutions. Regardless of if you can help me with this last question or not your help has been priceless. Best regards Lars N.
|
You can get a distinct number of values across variables with each case using VECTOR/LOOP, I have some code to do this but can't remember on the fly at the moment, neither on my laptop.
You cold always restructure the dataset so you have a long format PATIENT-DIAGNOSIS level dataset (probably preferable in most cases) and then apply same logic as previously to get distinct diagnosis codes within each patient, which is what I think you are trying to do? On Sat, 6 Jun 2015 at 22:53, nessie <[hidden email]> wrote:
=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Dear David and Jignesh |
Thanks a lot Jignesh, I will try to follow your tips and look into it.
If I restructure my data as you suggest I agree that the same logic should give me my desired distinct count of diagnosis. Best regards Lars N.
|
Administrator
|
In reply to this post by Jignesh Sutar
Definitely do the restructure using VARSTOCASES. Then the identical logic applies to get your desired result.
I suspect that will be much easier than doing the vector/loop with an indefinite number of possible targets.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
Administrator
|
DATA LIST LIST /
id * di_m (A1) di_b1 (A1) di_b2 (A1) di_b3 (A1). BEGIN DATA. 1, S, G 1, D, S 1, B, X 2, A, B 2, A 3, C 4, A 4, A 4, H, J 4, A, Y, U, J END DATA. VARSTOCASES /MAKE Di FROM di_m di_b1 di_b2 di_b3. AGGREGATE OUTFILE * / BREAK id Di/ Count=N. AGGREGATE OUTFILE * MODE ADDVARIABLES/ BREAK id/ Distinct=N. /* First version (probably unweildy with 20K possible D(i) /*CASESTOVARS ID=id /INDEX=Di. CASESTOVARS ID=id .
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
Thank’s again David
I made it using VARSTOCASES and your original genius syntax for aggregating departments, but I will try this first thing tomorrow. You are a true hero Best regards Lars N.
|
Free forum by Nabble | Edit this page |