Count distinct using aggregate

classic Classic list List threaded Threaded
21 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Count distinct using aggregate

nessie
I have a data set containing hospital data where I want to check how many different departments each patient has visited. A patient can have several cases (visits) in the data set an each case contains a department id that can be either the same as earlier or a new one.

I tried using the aggregate function with "id" as break variable, but there are no "distinct count" subfunction in the aggregate function and N(department) only return total number of cases for each patient.

Here is some example data

DATA LIST LIST /
id department.
BEGIN DATA.
1, 2
1, 3
1, 2
2, 1
2, 1
3, 3
4, 1
4, 2
4, 2
4, 5
END DATA.

LIST.

I want the result to look like this
id        dept     dept_count
1.00  2.00  2.00
1.00  3.00  2.00
1.00  2.00  2.00
2.00  1.00  1.00
2.00  1.00  1.00
3.00  3.00  1.00
4.00  1.00  3.00
4.00  2.00  3.00
4.00  2.00  3.00
4.00  5.00          3.00

Can anyone please help me!
Reply | Threaded
Open this post in threaded view
|

Re: Count distinct using aggregate

nessie
If it's also possible to list all the distinct different departments as dept1, dept2, dept... and assign a number to how many time each distinct/unique dept has occurred (dept1_n) that would have been superb.
Reply | Threaded
Open this post in threaded view
|

Re: Count distinct using aggregate

Maguin, Eugene
In reply to this post by nessie
I don't think your desired result data matches your description.  I'd say the result should be this
Id, dept_id, count
1, 2,3
1, 3,1
2, 1,2
3, 3,1
4, 1,1
4, 2,2
4, 5,1

You get the count I show if you set 'break' to id, dept_id.

Once you have the aggregated data, you could get what you posted in your second message (I think this is what you want) but this

Do repeat x=dept1 to dept6/y=1 to 6.
If (dept_id eq y) x=count.
End repeat.


Gene Maguin




-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of nessie
Sent: Friday, June 05, 2015 9:12 AM
To: [hidden email]
Subject: Count distinct using aggregate

I have a data set containing hospital data where I want to check how many
different departments each patient has visited. A patient can have several
cases (visits) in the data set an each case contains a department id that
can be either the same as earlier or a new one.

I tried using the aggregate function with "id" as break variable, but there
are no "distinct count" subfunction in the aggregate function and
N(department) only return total number of cases for each patient.

Here is some example data

DATA LIST LIST /
id department.
BEGIN DATA.
1, 2
1, 3
1, 2
2, 1
2, 1
3, 3
4, 1
4, 2
4, 2
4, 5
END DATA.

LIST.

I want the result to look like this
id        dept     dept_count
1.00  2.00  2.00
1.00  3.00  2.00
1.00  2.00  2.00
2.00  1.00  1.00
2.00  1.00  1.00
3.00  3.00  1.00
4.00  1.00  3.00
4.00  2.00  3.00
4.00  2.00  3.00
4.00  5.00          3.00

Can anyone please help me!




--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Count-distinct-using-aggregate-tp5729756.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Count distinct using aggregate

Jignesh Sutar
In reply to this post by nessie
This question comes up quite frequently and SPSS doesn't allow to extract this type of information quite as easily as it really should.

Another software I use has exactly as you elude to,a sub-function in their AGGREGATE command which can calculate DISTINCT values of a target variable within each BREAK.


/* Count unique departments visited by each patient */.
SORT CASES BY ID DEPARTMENT.
MATCH FILES FILE=* /BY ID DEPARTMENT /FIRST=Pri_Dep_WI_Pat /* Primary Department Within Patient */ .
CTABLES /VLABELS VARIABLES=Pri_Dep_WI_Pat DISPLAY=NONE /TABLE ID[C] BY Pri_Dep_WI_Pat[S][SUM 'Nb. of Unique Departments'].
AGGREGATE OUTFILE=* MODE=ADDVARIABLES /BREAK=ID /DepCount=SUM(Pri_Dep_WI_Pat).


/* Count unique patients visted at each department */.
SORT CASES BY DEPARTMENT ID.
MATCH FILES FILE=* /BY DEPARTMENT ID /FIRST=Pri_Pat_WI_Dep /*Primary Patient Within Department */.
CTABLES /VLABELS VARIABLES=Pri_Pat_WI_Dep DISPLAY=NONE /TABLE DEPARTMENT[C] BY Pri_Pat_WI_Dep[S][SUM 'Nb. of Unique Patients'].
AGGREGATE OUTFILE=* /BREAK=DEPARTMENT /PatCount=SUM(Pri_Pat_WI_Dep).
Reply | Threaded
Open this post in threaded view
|

Re: Count distinct using aggregate

Jignesh Sutar
I've logged this as a Request for Enhancement (RFE) with IBM. If anyone wishes to do so please up-vote, all my previous entries still show their initial status "Submitted" without any further update, so don't have much faith anything may come of it?!

--------------------------------------------------------------------------------------------------------------------------
Notification generated at: 05 Jun 2015, 09:45 AM Eastern Time (ET)

ID:                                                72354
Headline:                                    AGGREGATE DISTINCT FUNCTION
Submitted on:                            05 Jun 2015, 09:45 AM Eastern Time (ET)
Brand:                                          Business Analytics
Product:                                      SPSS Statistics

Link:                                            http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=72354
--------------------------------------------------------------------------------------------------------------------------
Reply | Threaded
Open this post in threaded view
|

Re: Count distinct using aggregate

Maguin, Eugene
In reply to this post by Jignesh Sutar
Jignesh,
Would educate me about your posting, please.
What does the CTABLES command do in relation to the preceding and following Aggregate commands? I don't now have CTables on my machine but when I did it simply provided a sophisticated data summarization display. Is that the purpose here? To provide a display of the result of the preceding Aggregate?
Thanks, Gene Maguin

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Jignesh Sutar
Sent: Friday, June 05, 2015 9:37 AM
To: [hidden email]
Subject: Re: Count distinct using aggregate

This question comes up quite frequently and SPSS doesn't allow to extract this type of information quite as easily as it really should.

Another software I use has exactly as you elude to,a sub-function in their AGGREGATE command which can calculate DISTINCT values of a target variable within each BREAK.


/* Count unique departments visited by each patient */.
SORT CASES BY ID DEPARTMENT.
MATCH FILES FILE=* /BY ID DEPARTMENT /FIRST=Pri_Dep_WI_Pat /* Primary Department Within Patient */ .
CTABLES /VLABELS VARIABLES=Pri_Dep_WI_Pat DISPLAY=NONE /TABLE ID[C] BY Pri_Dep_WI_Pat[S][SUM 'Nb. of Unique Departments'].
AGGREGATE OUTFILE=* MODE=ADDVARIABLES /BREAK=ID /DepCount=SUM(Pri_Dep_WI_Pat).


/* Count unique patients visted at each department */.
SORT CASES BY DEPARTMENT ID.
MATCH FILES FILE=* /BY DEPARTMENT ID /FIRST=Pri_Pat_WI_Dep /*Primary Patient Within Department */.
CTABLES /VLABELS VARIABLES=Pri_Pat_WI_Dep DISPLAY=NONE /TABLE DEPARTMENT[C] BY Pri_Pat_WI_Dep[S][SUM 'Nb. of Unique Patients'].
AGGREGATE OUTFILE=* /BREAK=DEPARTMENT /PatCount=SUM(Pri_Pat_WI_Dep).



--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Count-distinct-using-aggregate-tp5729756p5729759.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Count distinct using aggregate

David Marso
Administrator
In reply to this post by nessie
DATA LIST FREE / id department.
BEGIN DATA.
1 2 1 3 1 2   2 1 2 1   3 3   4 1 4 2 4 2 4 5
END DATA.
SORT CASES BY ID department.
AGGREGATE OUTFILE * MODE ADDVARIABLES /BREAK ID department /Count=N.

* But IMNSHO it makes more sense to omit the SORT and just do.
AGGREGATE OUTFILE *  /BREAK ID department /Count=N.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

Re: Count distinct using aggregate

Jignesh Sutar
In reply to this post by Maguin, Eugene
Yes exactly, it may not be necessary that one needs to generate a variable containing those statistics so the CTABLES produce the figures as a table and then the following AGGREGATE adds to the dataset, if so you require. Just a sense check , if nothing else also.

On 5 June 2015 at 14:50, Maguin, Eugene <[hidden email]> wrote:
Jignesh,
Would educate me about your posting, please.
What does the CTABLES command do in relation to the preceding and following Aggregate commands? I don't now have CTables on my machine but when I did it simply provided a sophisticated data summarization display. Is that the purpose here? To provide a display of the result of the preceding Aggregate?
Thanks, Gene Maguin

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Jignesh Sutar
Sent: Friday, June 05, 2015 9:37 AM
To: [hidden email]
Subject: Re: Count distinct using aggregate

This question comes up quite frequently and SPSS doesn't allow to extract this type of information quite as easily as it really should.

Another software I use has exactly as you elude to,a sub-function in their AGGREGATE command which can calculate DISTINCT values of a target variable within each BREAK.


/* Count unique departments visited by each patient */.
SORT CASES BY ID DEPARTMENT.
MATCH FILES FILE=* /BY ID DEPARTMENT /FIRST=Pri_Dep_WI_Pat /* Primary Department Within Patient */ .
CTABLES /VLABELS VARIABLES=Pri_Dep_WI_Pat DISPLAY=NONE /TABLE ID[C] BY Pri_Dep_WI_Pat[S][SUM 'Nb. of Unique Departments'].
AGGREGATE OUTFILE=* MODE=ADDVARIABLES /BREAK=ID /DepCount=SUM(Pri_Dep_WI_Pat).


/* Count unique patients visted at each department */.
SORT CASES BY DEPARTMENT ID.
MATCH FILES FILE=* /BY DEPARTMENT ID /FIRST=Pri_Pat_WI_Dep /*Primary Patient Within Department */.
CTABLES /VLABELS VARIABLES=Pri_Pat_WI_Dep DISPLAY=NONE /TABLE DEPARTMENT[C] BY Pri_Pat_WI_Dep[S][SUM 'Nb. of Unique Patients'].
AGGREGATE OUTFILE=* /BREAK=DEPARTMENT /PatCount=SUM(Pri_Pat_WI_Dep).



--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Count-distinct-using-aggregate-tp5729756p5729759.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Count distinct using aggregate

Jignesh Sutar
This post was updated on .
In reply to this post by David Marso
David, That looks like it would give you total number of VISITS per patient AND department, not the unique number of DEPARTMENTS visited per patient ONLY as OP desired?
Reply | Threaded
Open this post in threaded view
|

Re: Count distinct using aggregate

Jignesh Sutar
In reply to this post by nessie
I didn't answer the second part of the post in complete, but you can do something like this:


DATA LIST LIST /
id department.
BEGIN DATA.
1, 2
1, 3
1, 2
2, 1
2, 1
3, 3
4, 1
4, 2
4, 2
4, 5
END DATA.
DATASET NAME DS0.


/* Count unique patients visited at each department / Number of total visits and average visits per patient */.

SORT CASES BY DEPARTMENT ID.
MATCH FILES FILE=* /BY DEPARTMENT ID /FIRST=Pri_Pat_WI_Dep /*Primary Patient Within Department */.
DATASET DECLARE DSAGG.
AGGREGATE OUTFILE=DSAGG /BREAK=DEPARTMENT /PatCount=SUM(Pri_Pat_WI_Dep) /TotalVisits=N.
DATASET ACTIVATE DSAGG.
COMPUTE VisitsPerPat=TotalVisits/PatCount.
FORMATS ALL (F8.1).
CTABLES /VLABELS VARIABLES=PatCount TotalVisits VisitsPerPat DISPLAY=NONE
  /TABLE DEPARTMENT[C] BY PatCount[S][SUM 'Nb. of Unique Patients'] + TotalVisits[S][SUM 'Total Visits'] + VisitsPerPat[S][MEAN 'Ave. Visits Per Patient'].

Reply | Threaded
Open this post in threaded view
|

Re: Count distinct using aggregate

David Marso
Administrator
In reply to this post by Jignesh Sutar
Oops, misread the Q.
--
DATA LIST FREE / id department.
BEGIN DATA.
1 2 1 3 1 2   2 1 2 1   3 3   4 1 4 2 4 2 4 5
END DATA.
SORT CASES BY id.
DATASET NAME raw.
AGGREGATE OUTFILE *  /BREAK ID department /Count=N.
AGGREGATE OUTFILE *  /BREAK ID  /N_Dept=N.
MATCH FILES / FILE raw / TABLE * / BY id.
LIST.


Jignesh Sutar wrote
David, That looks like it would give you total number of VISITS per patient AND department, not the unique number of DEPARTMENTS visited per patient ONLY as OP desired?
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

Re: Count distinct using aggregate

David Marso
Administrator
In reply to this post by nessie
DATA LIST FREE / id department.
BEGIN DATA.
1 2 1 3 1 2   2 1 2 1   3 3   4 1 4 2 4 2 4 5
END DATA.
SORT CASES BY id.
DATASET NAME raw.
FORMATS department (F1).
AGGREGATE OUTFILE *  /BREAK ID department /Count=N.
CASESTOVARS /ID=id /INDEX=department.
COMPUTE @=$SYSMIS.
COMPUTE n_dept=NVALID(id TO @)-1.
MATCH FILES / FILE * / DROP @.
LIST.
 
      id Count.1 Count.2 Count.3 Count.5   n_dept
 
    1.00       .       2       1       .     2.00
    2.00       2       .       .       .     1.00
    3.00       .       .       1       .     1.00
    4.00       1       2       .       1     3.00

Number of cases read:  4    Number of cases listed:  4
nessie wrote
If it's also possible to list all the distinct different departments as dept1, dept2, dept... and assign a number to how many time each distinct/unique dept has occurred (dept1_n) that would have been superb.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

Re: Count distinct using aggregate

Jignesh Sutar
In reply to this post by David Marso
That's a neat way of doing it also. Avoids a multi variable SORT so perhaps preferable in terms of speed performance with large datasets (but you do then have to juggle/match with multiple datasets).

I do feel there still should be a more direct way of doing this in SPSS. A simple DISTINCT function in AGGREGATE would be very much apt for this.

Reply | Threaded
Open this post in threaded view
|

Re: Count distinct using aggregate

David Marso
Administrator
Actually the SORT and DATASET were leftover from previous solution.
It could be as simple as follows: Note the raw data are replaced by the AGGREGATE.
--
DATA LIST FREE / id department.
BEGIN DATA.
1 2 1 3 1 2   2 1 2 1   3 3   4 1 4 2 4 2 4 5
END DATA.

FORMATS department (F1).
AGGREGATE OUTFILE *  /BREAK ID department /Count=N.
CASESTOVARS /ID=id /INDEX=department.
COMPUTE @=$SYSMIS.
COMPUTE n_dept=NVALID(id TO @)-1.
MATCH FILES / FILE * / DROP @.
LIST.

Jignesh Sutar wrote
That's a neat way of doing it also. Avoids a multi variable SORT so perhaps preferable in terms of speed performance with large datasets (but you do then have to juggle/match with multiple datasets).

I do feel there still should be a more direct way of doing this in SPSS. A simple DISTINCT function in AGGREGATE would be very much apt for this.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

Re: Count distinct using aggregate

nessie
Dear David and Jignesh

Thank’s a lot - I would go as far as to say you guys are geniuses.
I still agree with Jignesh that this should be implemented as a “distinct count” (and maybe also a “list distinct”) under the “aggregate” function in SPSS, but thank’s to you I can do the same without it. I bet this could also be useful for a lot of other SPSS users.

One more question:
Is it possible to do this across multiple variables for each case?

E.g.:
A patient could have several variables describing diagnosis codes for each case. One for main diagnose and quite a few to describe possible bi diagnosis. Each code could only be present one time for each case, but the same code could be found in different places (main and different bi diagnosis variables) across cases with the same id.

I want to count the number of distinct diagnosis codes for each id across cases and code-variables.

Example data:

DATA LIST LIST / 
id * di_m (A1) di_b1 (A1) di_b2 (A1) di_b3 (A1). 
BEGIN DATA. 
1, S, G
1, D, S
1, B, X
2, A, B
2, A
3, C
4, A
4, A
4, H, J 
4, A, Y, U, J
END DATA. 

LIST. 

The output would look like this:

id di_m di_b1 di_b2 di_b3
1.00 S G
1.00 D S
1.00 B X
2.00 A B
2.00 A
3.00 C
4.00 A
4.00 A
4.00 H J
4.00 A Y U J

Desired outcome (the n_distinct_di is most important, the rest is a bonus):

id A B C D G H J S U X Y n_di n_distinct_di
1 1 1 1 2 1 6 5
2 2 1 3 2
3 1 1 1
4 3 1 2 1 1 8 5

Even more desired outcome (there is almost 20 000 different diagnosis codes, so this would make it easier to interpret and analyse):

id di_1 di_2 di_3 di_4 di_5 n_di n_distinct_di
1 S G D B X 6 5
2 A B 3 2
3 C 1 1
4 A H J Y U 8 5

Thanks a lot for all you help and effort to find smart and quick solutions. Regardless of if you can help me with this last question or not your help has been priceless.

Best regards
Lars N.

5. jun. 2015 kl. 17.39 skrev David Marso [via SPSSX Discussion] <[hidden email]>:

Actually the SORT and DATASET were leftover from previous solution.
It could be as simple as follows: Note the raw data are replaced by the AGGREGATE.
--
DATA LIST FREE / id department.
BEGIN DATA.
1 2 1 3 1 2   2 1 2 1   3 3   4 1 4 2 4 2 4 5
END DATA.

FORMATS department (F1).
AGGREGATE OUTFILE *  /BREAK ID department /Count=N.
CASESTOVARS /ID=id /INDEX=department.
COMPUTE @=$SYSMIS.
COMPUTE n_dept=NVALID(id TO @)-1.
MATCH FILES / FILE * / DROP @.
LIST.

Jignesh Sutar wrote
That's a neat way of doing it also. Avoids a multi variable SORT so perhaps preferable in terms of speed performance with large datasets (but you do then have to juggle/match with multiple datasets).

I do feel there still should be a more direct way of doing this in SPSS. A simple DISTINCT function in AGGREGATE would be very much apt for this.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"



If you reply to this email, your message will be added to the discussion below:
http://spssx-discussion.1045642.n5.nabble.com/Count-distinct-using-aggregate-tp5729756p5729769.html
To unsubscribe from Count distinct using aggregate, click here.
NAML

Reply | Threaded
Open this post in threaded view
|

Re: Count distinct using aggregate

Jignesh Sutar
You can get a distinct number of values across variables with each case using VECTOR/LOOP, I have some code to do this but can't remember on the fly at the moment, neither on my laptop.

You cold always restructure the dataset so you have a long format PATIENT-DIAGNOSIS level dataset (probably preferable in most cases) and then apply same logic as previously to get distinct diagnosis codes within each patient, which is what I think you are trying to do?
On Sat, 6 Jun 2015 at 22:53, nessie <[hidden email]> wrote:
Dear David and Jignesh

Thank’s a lot - I would go as far as to say you guys are geniuses.
I still agree with Jignesh that this should be implemented as a “distinct count” (and maybe also a “list distinct”) under the “aggregate” function in SPSS, but thank’s to you I can do the same without it. I bet this could also be useful for a lot of other SPSS users.

One more question:
Is it possible to do this across multiple variables for each case?

E.g.:
A patient could have several variables describing diagnosis codes for each case. One for main diagnose and quite a few to describe possible bi diagnosis. Each code could only be present one time for each case, but the same code could be found in different places (main and different bi diagnosis variables) across cases with the same id.

I want to count the number of distinct diagnosis codes for each id across cases and code-variables.

Example data:

DATA LIST LIST / 
id * di_m (A1) di_b1 (A1) di_b2 (A1) di_b3 (A1). 
BEGIN DATA. 
1, S, G
1, D, S
1, B, X
2, A, B
2, A
3, C
4, A
4, A
4, H, J 
4, A, Y, U, J
END DATA. 

LIST. 

The output would look like this:

id di_m di_b1 di_b2 di_b3
1.00 S G
1.00 D S
1.00 B X
2.00 A B
2.00 A
3.00 C
4.00 A
4.00 A
4.00 H J
4.00 A Y U J

Desired outcome (the n_distinct_di is most important, the rest is a bonus):

id A B C D G H J S U X Y n_di n_distinct_di
1 1 1 1 2 1 6 5
2 2 1 3 2
3 1 1 1
4 3 1 2 1 1 8 5

Even more desired outcome (there is almost 20 000 different diagnosis codes, so this would make it easier to interpret and analyse):

id di_1 di_2 di_3 di_4 di_5 n_di n_distinct_di
1 S G D B X 6 5
2 A B 3 2
3 C 1 1
4 A H J Y U 8 5

Thanks a lot for all you help and effort to find smart and quick solutions. Regardless of if you can help me with this last question or not your help has been priceless.

Best regards
Lars N.

5. jun. 2015 kl. 17.39 skrev David Marso [via SPSSX Discussion] <[hidden email]>:

Actually the SORT and DATASET were leftover from previous solution.
It could be as simple as follows: Note the raw data are replaced by the AGGREGATE.
--
DATA LIST FREE / id department.
BEGIN DATA.
1 2 1 3 1 2   2 1 2 1   3 3   4 1 4 2 4 2 4 5
END DATA.

FORMATS department (F1).
AGGREGATE OUTFILE *  /BREAK ID department /Count=N.
CASESTOVARS /ID=id /INDEX=department.
COMPUTE @=$SYSMIS.
COMPUTE n_dept=NVALID(id TO @)-1.
MATCH FILES / FILE * / DROP @.
LIST.

Jignesh Sutar wrote
That's a neat way of doing it also. Avoids a multi variable SORT so perhaps preferable in terms of speed performance with large datasets (but you do then have to juggle/match with multiple datasets).

I do feel there still should be a more direct way of doing this in SPSS. A simple DISTINCT function in AGGREGATE would be very much apt for this.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"



If you reply to this email, your message will be added to the discussion below:
http://spssx-discussion.1045642.n5.nabble.com/Count-distinct-using-aggregate-tp5729756p5729769.html
To unsubscribe from Count distinct using aggregate, click here.
NAML



View this message in context: Re: Count distinct using aggregate
Sent from the SPSSX Discussion mailing list archive at Nabble.com.
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Count distinct using aggregate

nessie
Thanks a lot Jignesh, I will try to follow your tips and look into it.
If I restructure my data as you suggest I agree that the same logic should give me my desired distinct count of diagnosis.

Best regards
Lars N.

7. jun. 2015 kl. 00.11 skrev Jignesh Sutar [via SPSSX Discussion] <[hidden email]>:

You can get a distinct number of values across variables with each case using VECTOR/LOOP, I have some code to do this but can't remember on the fly at the moment, neither on my laptop.

You cold always restructure the dataset so you have a long format PATIENT-DIAGNOSIS level dataset (probably preferable in most cases) and then apply same logic as previously to get distinct diagnosis codes within each patient, which is what I think you are trying to do?
On Sat, 6 Jun 2015 at 22:53, nessie <<a href="x-msg://17/user/SendEmail.jtp?type=node&amp;node=5729774&amp;i=0" target="_top" rel="nofollow" link="external" class="">[hidden email]> wrote:
Dear David and Jignesh

Thank’s a lot - I would go as far as to say you guys are geniuses.
I still agree with Jignesh that this should be implemented as a “distinct count” (and maybe also a “list distinct”) under the “aggregate” function in SPSS, but thank’s to you I can do the same without it. I bet this could also be useful for a lot of other SPSS users.

One more question:
Is it possible to do this across multiple variables for each case?

E.g.:
A patient could have several variables describing diagnosis codes for each case. One for main diagnose and quite a few to describe possible bi diagnosis. Each code could only be present one time for each case, but the same code could be found in different places (main and different bi diagnosis variables) across cases with the same id.

I want to count the number of distinct diagnosis codes for each id across cases and code-variables.

Example data:

DATA LIST LIST / 
id * di_m (A1) di_b1 (A1) di_b2 (A1) di_b3 (A1). 
BEGIN DATA. 
1, S, G
1, D, S
1, B, X
2, A, B
2, A
3, C
4, A
4, A
4, H, J 
4, A, Y, U, J
END DATA. 

LIST. 

The output would look like this:

id di_m di_b1 di_b2 di_b3
1.00 S G
1.00 D S
1.00 B X
2.00 A B
2.00 A
3.00 C
4.00 A
4.00 A
4.00 H J
4.00 A Y U J

Desired outcome (the n_distinct_di is most important, the rest is a bonus):

id A B C D G H J S U X Y n_di n_distinct_di
1 1 1 1 2 1 6 5
2 2 1 3 2
3 1 1 1
4 3 1 2 1 1 8 5

Even more desired outcome (there is almost 20 000 different diagnosis codes, so this would make it easier to interpret and analyse):

id di_1 di_2 di_3 di_4 di_5 n_di n_distinct_di
1 S G D B X 6 5
2 A B 3 2
3 C 1 1
4 A H J Y U 8 5

Thanks a lot for all you help and effort to find smart and quick solutions. Regardless of if you can help me with this last question or not your help has been priceless.

Best regards
Lars N.

5. jun. 2015 kl. 17.39 skrev David Marso [via SPSSX Discussion] <[hidden email]>:

Actually the SORT and DATASET were leftover from previous solution.
It could be as simple as follows: Note the raw data are replaced by the AGGREGATE.
--
DATA LIST FREE / id department.
BEGIN DATA.
1 2 1 3 1 2   2 1 2 1   3 3   4 1 4 2 4 2 4 5
END DATA.

FORMATS department (F1).
AGGREGATE OUTFILE *  /BREAK ID department /Count=N.
CASESTOVARS /ID=id /INDEX=department.
COMPUTE @=$SYSMIS.
COMPUTE n_dept=NVALID(id TO @)-1.
MATCH FILES / FILE * / DROP @.
LIST.

Jignesh Sutar wrote
That's a neat way of doing it also. Avoids a multi variable SORT so perhaps preferable in terms of speed performance with large datasets (but you do then have to juggle/match with multiple datasets).

I do feel there still should be a more direct way of doing this in SPSS. A simple DISTINCT function in AGGREGATE would be very much apt for this.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"



If you reply to this email, your message will be added to the discussion below:
http://spssx-discussion.1045642.n5.nabble.com/Count-distinct-using-aggregate-tp5729756p5729769.html
To unsubscribe from Count distinct using aggregate, click here.
NAML



View this message in context: Re: Count distinct using aggregate
Sent from the SPSSX Discussion mailing list archive at Nabble.com.
===================== To manage your subscription to SPSSX-L, send a message to <a href="x-msg://17/user/SendEmail.jtp?type=node&amp;node=5729774&amp;i=1" target="_top" rel="nofollow" link="external" class="">[hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to <a href="x-msg://17/user/SendEmail.jtp?type=node&amp;node=5729774&amp;i=2" target="_top" rel="nofollow" link="external" class="">[hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD


If you reply to this email, your message will be added to the discussion below:
http://spssx-discussion.1045642.n5.nabble.com/Count-distinct-using-aggregate-tp5729756p5729774.html
To unsubscribe from Count distinct using aggregate, click here.
NAML

Reply | Threaded
Open this post in threaded view
|

Re: Count distinct using aggregate

David Marso
Administrator
In reply to this post by Jignesh Sutar
Definitely do the restructure using VARSTOCASES.  Then the identical logic applies to get your desired result.
I suspect that will be much easier than doing the vector/loop with an indefinite number of possible targets.


Jignesh Sutar wrote
You can get a distinct number of values across variables with each case
using VECTOR/LOOP, I have some code to do this but can't remember on the
fly at the moment, neither on my laptop.

You cold always restructure the dataset so you have a long format
PATIENT-DIAGNOSIS level dataset (probably preferable in most cases) and
then apply same logic as previously to get distinct diagnosis codes within
each patient, which is what I think you are trying to do?
On Sat, 6 Jun 2015 at 22:53, nessie <[hidden email]> wrote:

> Dear David and Jignesh
>
> Thank’s a lot - I would go as far as to say you guys are geniuses.
> I still agree with Jignesh that this should be implemented as a “distinct
> count” (and maybe also a “list distinct”) under the “aggregate” function in
> SPSS, but thank’s to you I can do the same without it. I bet this could
> also be useful for a lot of other SPSS users.
>
> *One more question:*
> Is it possible to do this across multiple variables for each case?
>
> E.g.:
> A patient could have several variables describing diagnosis codes for each
> case. One for main diagnose and quite a few to describe possible bi
> diagnosis. Each code could only be present one time for each case, but the
> same code could be found in different places (main and different bi
> diagnosis variables) across cases with the same id.
>
> I want to count the number of distinct diagnosis codes for each id across
> cases and code-variables.
>
> *Example data:*
>
> DATA LIST LIST /
> id * di_m (A1) di_b1 (A1) di_b2 (A1) di_b3 (A1).
> BEGIN DATA.
> 1, S, G
> 1, D, S
> 1, B, X
> 2, A, B
> 2, A
> 3, C
> 4, A
> 4, A
> 4, H, J
> 4, A, Y, U, J
> END DATA.
>
> LIST.
>
> *The output would look like this:*
>
> id di_m di_b1 di_b2 di_b3
> 1.00 S G
> 1.00 D S
> 1.00 B X
> 2.00 A B
> 2.00 A
> 3.00 C
> 4.00 A
> 4.00 A
> 4.00 H J
> 4.00 A Y U J
>
> *Desired outcome *(the n_distinct_di is most important, the rest is a
> bonus)*:*
>
> id A B C D G H J S U X Y n_di *n_distinct_di*
> 1 1 1 1 2 1 6 *5*
> 2 2 1 3 *2*
> 3 1 1 *1*
> 4 3 1 2 1 1 8 *5*
>
> *Even more desired outcome (there is almost 20 000 different diagnosis
> codes, so this would make it easier to interpret and analyse):*
>
> id di_1 di_2 di_3 di_4 di_5 n_di *n_distinct_di*
> 1 S G D B X 6 *5*
> 2 A B 3 *2*
> 3 C 1 *1*
> 4 A H J Y U 8 *5*
>
> Thanks a lot for all you help and effort to find smart and quick
> solutions. Regardless of if you can help me with this last question or not
> your help has been priceless.
>
> Best regards
> Lars N.
>
> 5. jun. 2015 kl. 17.39 skrev David Marso [via SPSSX Discussion] <[hidden
> email] <http:///user/SendEmail.jtp?type=node&node=5729773&i=0>>:
>
> Actually the SORT and DATASET were leftover from previous solution.
> It could be as simple as follows: Note the raw data are replaced by the
> AGGREGATE.
> --
> DATA LIST FREE / id department.
> BEGIN DATA.
> 1 2 1 3 1 2   2 1 2 1   3 3   4 1 4 2 4 2 4 5
> END DATA.
>
> FORMATS department (F1).
> AGGREGATE OUTFILE *  /BREAK ID department /Count=N.
> CASESTOVARS /ID=id /INDEX=department.
> COMPUTE @=$SYSMIS.
> COMPUTE n_dept=NVALID(id TO @)-1.
> MATCH FILES / FILE * / DROP @.
> LIST.
>
> Jignesh Sutar wrote
> That's a neat way of doing it also. Avoids a multi variable SORT so
> perhaps preferable in terms of speed performance with large datasets (but
> you do then have to juggle/match with multiple datasets).
>
> I do feel there still should be a more direct way of doing this in SPSS. A
> simple DISTINCT function in AGGREGATE would be very much apt for this.
>
>  Please reply to the list and not to my personal email.
>
> Those desiring my consulting or training services please feel free to
> email me.
> ---
> "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos
> ne forte conculcent eas pedibus suis."
> Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in
> abyssum?"
>
>
>
> ------------------------------
>  If you reply to this email, your message will be added to the discussion
> below:
>
> http://spssx-discussion.1045642.n5.nabble.com/Count-distinct-using-aggregate-tp5729756p5729769.html
>  To unsubscribe from Count distinct using aggregate, click here.
> NAML
> <http://spssx-discussion.1045642.n5.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&id=instant_html%21nabble%3Aemail.naml&base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml>
>
>
>
> ------------------------------
> View this message in context: Re: Count distinct using aggregate
> <http://spssx-discussion.1045642.n5.nabble.com/Count-distinct-using-aggregate-tp5729756p5729773.html>
> Sent from the SPSSX Discussion mailing list archive
> <http://spssx-discussion.1045642.n5.nabble.com/> at Nabble.com.
> ===================== To manage your subscription to SPSSX-L, send a
> message to [hidden email] (not to SPSSX-L), with no body text
> except the command. To leave the list, send the command SIGNOFF SPSSX-L For
> a list of commands to manage subscriptions, send the command INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

Re: Count distinct using aggregate

David Marso
Administrator
DATA LIST LIST /
id * di_m (A1) di_b1 (A1) di_b2 (A1) di_b3 (A1).
BEGIN DATA.
1, S, G
1, D, S
1, B, X
2, A, B
2, A
3, C
4, A
4, A
4, H, J
4, A, Y, U, J
END DATA.
VARSTOCASES /MAKE Di FROM di_m  di_b1 di_b2 di_b3.
AGGREGATE OUTFILE * / BREAK id Di/ Count=N.
AGGREGATE OUTFILE * MODE ADDVARIABLES/ BREAK id/ Distinct=N.
/* First version (probably unweildy with 20K possible D(i)
/*CASESTOVARS ID=id /INDEX=Di.
CASESTOVARS ID=id .

David Marso wrote
Definitely do the restructure using VARSTOCASES.  Then the identical logic applies to get your desired result.
I suspect that will be much easier than doing the vector/loop with an indefinite number of possible targets.


Jignesh Sutar wrote
You can get a distinct number of values across variables with each case
using VECTOR/LOOP, I have some code to do this but can't remember on the
fly at the moment, neither on my laptop.

You cold always restructure the dataset so you have a long format
PATIENT-DIAGNOSIS level dataset (probably preferable in most cases) and
then apply same logic as previously to get distinct diagnosis codes within
each patient, which is what I think you are trying to do?
On Sat, 6 Jun 2015 at 22:53, nessie <[hidden email]> wrote:

> Dear David and Jignesh
>
> Thank’s a lot - I would go as far as to say you guys are geniuses.
> I still agree with Jignesh that this should be implemented as a “distinct
> count” (and maybe also a “list distinct”) under the “aggregate” function in
> SPSS, but thank’s to you I can do the same without it. I bet this could
> also be useful for a lot of other SPSS users.
>
> *One more question:*
> Is it possible to do this across multiple variables for each case?
>
> E.g.:
> A patient could have several variables describing diagnosis codes for each
> case. One for main diagnose and quite a few to describe possible bi
> diagnosis. Each code could only be present one time for each case, but the
> same code could be found in different places (main and different bi
> diagnosis variables) across cases with the same id.
>
> I want to count the number of distinct diagnosis codes for each id across
> cases and code-variables.
>
> *Example data:*
>
> DATA LIST LIST /
> id * di_m (A1) di_b1 (A1) di_b2 (A1) di_b3 (A1).
> BEGIN DATA.
> 1, S, G
> 1, D, S
> 1, B, X
> 2, A, B
> 2, A
> 3, C
> 4, A
> 4, A
> 4, H, J
> 4, A, Y, U, J
> END DATA.
>
> LIST.
>
> *The output would look like this:*
>
> id di_m di_b1 di_b2 di_b3
> 1.00 S G
> 1.00 D S
> 1.00 B X
> 2.00 A B
> 2.00 A
> 3.00 C
> 4.00 A
> 4.00 A
> 4.00 H J
> 4.00 A Y U J
>
> *Desired outcome *(the n_distinct_di is most important, the rest is a
> bonus)*:*
>
> id A B C D G H J S U X Y n_di *n_distinct_di*
> 1 1 1 1 2 1 6 *5*
> 2 2 1 3 *2*
> 3 1 1 *1*
> 4 3 1 2 1 1 8 *5*
>
> *Even more desired outcome (there is almost 20 000 different diagnosis
> codes, so this would make it easier to interpret and analyse):*
>
> id di_1 di_2 di_3 di_4 di_5 n_di *n_distinct_di*
> 1 S G D B X 6 *5*
> 2 A B 3 *2*
> 3 C 1 *1*
> 4 A H J Y U 8 *5*
>
> Thanks a lot for all you help and effort to find smart and quick
> solutions. Regardless of if you can help me with this last question or not
> your help has been priceless.
>
> Best regards
> Lars N.
>
> 5. jun. 2015 kl. 17.39 skrev David Marso [via SPSSX Discussion] <[hidden
> email] <http:///user/SendEmail.jtp?type=node&node=5729773&i=0>>:
>
> Actually the SORT and DATASET were leftover from previous solution.
> It could be as simple as follows: Note the raw data are replaced by the
> AGGREGATE.
> --
> DATA LIST FREE / id department.
> BEGIN DATA.
> 1 2 1 3 1 2   2 1 2 1   3 3   4 1 4 2 4 2 4 5
> END DATA.
>
> FORMATS department (F1).
> AGGREGATE OUTFILE *  /BREAK ID department /Count=N.
> CASESTOVARS /ID=id /INDEX=department.
> COMPUTE @=$SYSMIS.
> COMPUTE n_dept=NVALID(id TO @)-1.
> MATCH FILES / FILE * / DROP @.
> LIST.
>
> Jignesh Sutar wrote
> That's a neat way of doing it also. Avoids a multi variable SORT so
> perhaps preferable in terms of speed performance with large datasets (but
> you do then have to juggle/match with multiple datasets).
>
> I do feel there still should be a more direct way of doing this in SPSS. A
> simple DISTINCT function in AGGREGATE would be very much apt for this.
>
>  Please reply to the list and not to my personal email.
>
> Those desiring my consulting or training services please feel free to
> email me.
> ---
> "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos
> ne forte conculcent eas pedibus suis."
> Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in
> abyssum?"
>
>
>
> ------------------------------
>  If you reply to this email, your message will be added to the discussion
> below:
>
> http://spssx-discussion.1045642.n5.nabble.com/Count-distinct-using-aggregate-tp5729756p5729769.html
>  To unsubscribe from Count distinct using aggregate, click here.
> NAML
> <http://spssx-discussion.1045642.n5.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&id=instant_html%21nabble%3Aemail.naml&base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml>
>
>
>
> ------------------------------
> View this message in context: Re: Count distinct using aggregate
> <http://spssx-discussion.1045642.n5.nabble.com/Count-distinct-using-aggregate-tp5729756p5729773.html>
> Sent from the SPSSX Discussion mailing list archive
> <http://spssx-discussion.1045642.n5.nabble.com/> at Nabble.com.
> ===================== To manage your subscription to SPSSX-L, send a
> message to [hidden email] (not to SPSSX-L), with no body text
> except the command. To leave the list, send the command SIGNOFF SPSSX-L For
> a list of commands to manage subscriptions, send the command INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

Re: Count distinct using aggregate

nessie
Thank’s again David

I made it using VARSTOCASES and your original genius syntax for aggregating departments, but I will try this first thing tomorrow.

You are a true hero

Best regards
Lars N.


7. jun. 2015 kl. 18.49 skrev David Marso [via SPSSX Discussion] <[hidden email]>:

DATA LIST LIST / 
id * di_m (A1) di_b1 (A1) di_b2 (A1) di_b3 (A1). 
BEGIN DATA. 
1, S, G 
1, D, S 
1, B, X 
2, A, B 
2, A 
3, C 
4, A 
4, A 
4, H, J 
4, A, Y, U, J 
END DATA. 
VARSTOCASES /MAKE Di FROM di_m  di_b1 di_b2 di_b3. 
AGGREGATE OUTFILE * / BREAK id Di/ Count=N. 
AGGREGATE OUTFILE * MODE ADDVARIABLES/ BREAK id/ Distinct=N. 
/* First version (probably unweildy with 20K possible D(i) 
/*CASESTOVARS ID=id /INDEX=Di. 
CASESTOVARS ID=id . 

David Marso wrote
Definitely do the restructure using VARSTOCASES.  Then the identical logic applies to get your desired result. 
I suspect that will be much easier than doing the vector/loop with an indefinite number of possible targets. 


Jignesh Sutar wrote
You can get a distinct number of values across variables with each case 
using VECTOR/LOOP, I have some code to do this but can't remember on the 
fly at the moment, neither on my laptop. 

You cold always restructure the dataset so you have a long format 
PATIENT-DIAGNOSIS level dataset (probably preferable in most cases) and 
then apply same logic as previously to get distinct diagnosis codes within 
each patient, which is what I think you are trying to do? 
On Sat, 6 Jun 2015 at 22:53, nessie <<a href="x-msg://20/user/SendEmail.jtp?type=node&amp;node=5729778&amp;i=0" target="_top" rel="nofollow" link="external" class="">[hidden email]> wrote: 

> Dear David and Jignesh 
> 
> Thank’s a lot - I would go as far as to say you guys are geniuses. 
> I still agree with Jignesh that this should be implemented as a “distinct 
> count” (and maybe also a “list distinct”) under the “aggregate” function in 
> SPSS, but thank’s to you I can do the same without it. I bet this could 
> also be useful for a lot of other SPSS users. 
> 
> *One more question:* 
> Is it possible to do this across multiple variables for each case? 
> 
> E.g.: 
> A patient could have several variables describing diagnosis codes for each 
> case. One for main diagnose and quite a few to describe possible bi 
> diagnosis. Each code could only be present one time for each case, but the 
> same code could be found in different places (main and different bi 
> diagnosis variables) across cases with the same id. 
> 
> I want to count the number of distinct diagnosis codes for each id across 
> cases and code-variables. 
> 
> *Example data:* 
> 
> DATA LIST LIST / 
> id * di_m (A1) di_b1 (A1) di_b2 (A1) di_b3 (A1). 
> BEGIN DATA. 
> 1, S, G 
> 1, D, S 
> 1, B, X 
> 2, A, B 
> 2, A 
> 3, C 
> 4, A 
> 4, A 
> 4, H, J 
> 4, A, Y, U, J 
> END DATA. 
> 
> LIST. 
> 
> *The output would look like this:* 
> 
> id di_m di_b1 di_b2 di_b3 
> 1.00 S G 
> 1.00 D S 
> 1.00 B X 
> 2.00 A B 
> 2.00 A 
> 3.00 C 
> 4.00 A 
> 4.00 A 
> 4.00 H J 
> 4.00 A Y U J 
> 
> *Desired outcome *(the n_distinct_di is most important, the rest is a 
> bonus)*:* 
> 
> id A B C D G H J S U X Y n_di *n_distinct_di* 
> 1 1 1 1 2 1 6 *5* 
> 2 2 1 3 *2* 
> 3 1 1 *1* 
> 4 3 1 2 1 1 8 *5* 
> 
> *Even more desired outcome (there is almost 20 000 different diagnosis 
> codes, so this would make it easier to interpret and analyse):* 
> 
> id di_1 di_2 di_3 di_4 di_5 n_di *n_distinct_di* 
> 1 S G D B X 6 *5* 
> 2 A B 3 *2* 
> 3 C 1 *1* 
> 4 A H J Y U 8 *5* 
> 
> Thanks a lot for all you help and effort to find smart and quick 
> solutions. Regardless of if you can help me with this last question or not 
> your help has been priceless. 
> 
> Best regards 
> Lars N. 
> 
> 5. jun. 2015 kl. 17.39 skrev David Marso [via SPSSX Discussion] <[hidden 
> email] <http:///user/SendEmail.jtp?type=node&node=5729773&i=0>>: 
> 
> Actually the SORT and DATASET were leftover from previous solution. 
> It could be as simple as follows: Note the raw data are replaced by the 
> AGGREGATE. 
> -- 
> DATA LIST FREE / id department. 
> BEGIN DATA. 
> 1 2 1 3 1 2   2 1 2 1   3 3   4 1 4 2 4 2 4 5 
> END DATA. 
> 
> FORMATS department (F1). 
> AGGREGATE OUTFILE *  /BREAK ID department /Count=N. 
> CASESTOVARS /ID=id /INDEX=department. 
> COMPUTE @=$SYSMIS. 
> COMPUTE n_dept=NVALID(id TO @)-1. 
> MATCH FILES / FILE * / DROP @. 
> LIST. 
> 
> Jignesh Sutar wrote 
> That's a neat way of doing it also. Avoids a multi variable SORT so 
> perhaps preferable in terms of speed performance with large datasets (but 
> you do then have to juggle/match with multiple datasets). 
> 
> I do feel there still should be a more direct way of doing this in SPSS. A 
> simple DISTINCT function in AGGREGATE would be very much apt for this. 
> 
>  Please reply to the list and not to my personal email. 
> 
> Those desiring my consulting or training services please feel free to 
> email me. 
> --- 
> "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos 
> ne forte conculcent eas pedibus suis." 
> Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in 
> abyssum?" 
> 
> 
> 
> ------------------------------ 
>  If you reply to this email, your message will be added to the discussion 
> below: 
> 
> http://spssx-discussion.1045642.n5.nabble.com/Count-distinct-using-aggregate-tp5729756p5729769.html
>  To unsubscribe from Count distinct using aggregate, click here. 
> NAML 
> <http://spssx-discussion.1045642.n5.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&id=instant_html%21nabble%3Aemail.naml&base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml>
> 
> 
> 
> ------------------------------ 
> View this message in context: Re: Count distinct using aggregate 
> <http://spssx-discussion.1045642.n5.nabble.com/Count-distinct-using-aggregate-tp5729756p5729773.html>
> Sent from the SPSSX Discussion mailing list archive 
> <http://spssx-discussion.1045642.n5.nabble.com/> at Nabble.com. 
> ===================== To manage your subscription to SPSSX-L, send a 
> message to <a href="x-msg://20/user/SendEmail.jtp?type=node&amp;node=5729778&amp;i=1" target="_top" rel="nofollow" link="external" class="">[hidden email] (not to SPSSX-L), with no body text 
> except the command. To leave the list, send the command SIGNOFF SPSSX-L For 
> a list of commands to manage subscriptions, send the command INFO REFCARD 

===================== 
To manage your subscription to SPSSX-L, send a message to 
<a href="x-msg://20/user/SendEmail.jtp?type=node&amp;node=5729778&amp;i=2" target="_top" rel="nofollow" link="external" class="">[hidden email] (not to SPSSX-L), with no body text except the 
command. To leave the list, send the command 
SIGNOFF SPSSX-L 
For a list of commands to manage subscriptions, send the command 
INFO REFCARD
Please reply to the list and not to my personal email. 
Those desiring my consulting or training services please feel free to email me. 
--- 
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." 
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"



If you reply to this email, your message will be added to the discussion below:
http://spssx-discussion.1045642.n5.nabble.com/Count-distinct-using-aggregate-tp5729756p5729778.html
To unsubscribe from Count distinct using aggregate, click here.
NAML

12