Hi In order to explain my problem, here what’s the situation and what I have done so far: 1.
A dataset with about 30‘000 cases; a case is a course and a person ; when a person attended more than one course, she/he appears equal to the number of courses. 2.
In order to get a person just once I changed cases to variables.
3.
Now I got a person as a case with additional variables : courseID1 to courseID10; year.1 to year.10 (which is the year a course was attended) ; status.1 to status.10 (which is an additional information to the
course). 4.
I figured out with the list’s help to count how many courses are mentioned in the dataset per person, say in the year 2010 : COUNT NumberOfCourses10=year.1 .2 year.3 year.4 year.5 year.6 year.7 year.8 year.9
year.10 (10). But now I have to consider constraints : the variables status1 to status10 contains the information, whether a person really ‘attended’, stayed on the ‘waiting list’ , the course was ‘cancelled’ (and others, always as
strings). I’d like now to count the number of courses a year with the constraint, that this person also attended it, which means for the year 2010: year.1=10 AND status.1=’attended’. So it should count the value 10 in the variable
year.1 only if this course was attended. How can I put this constraint in the syntax ? Thanks ! Tom |
Administrator
|
Thomas,
Please describe your data with greater clarity! Preferably with an example of relevant inputs and desired outputs. For now I will assume that you have 10 contiguous year variables (1..10:Value of interest is say 3) and 5 associated contiguous numeric status variables (0,1 : 1 denotes attended). If you data are otherwise then recode or whatever to adapt your data to these 'constraints'. VECTOR yr=Year.1 TO Year.10 / status=status.1 TO status.5. LOOP #=1 TO 10. + COMPUTE yrstatus=SUM(yrstatus,yr(#) EQ 3 AND status(#)). END LOOP. HTH, David
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
In reply to this post by Tom
Thomas, Maybe I don’t understand something, but why can’t you just count status.x variables that have a specific value? Gene Maguin From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Balmer, Thomas Hi In order to explain my problem, here what’s the situation and what I have done so far: 1. A dataset with about 30‘000 cases; a case is a course and a person ; when a person attended more than one course, she/he appears equal to the number of courses. 2. In order to get a person just once I changed cases to variables. 3. Now I got a person as a case with additional variables : courseID1 to courseID10; year.1 to year.10 (which is the year a course was attended) ; status.1 to status.10 (which is an additional information to the course). 4. I figured out with the list’s help to count how many courses are mentioned in the dataset per person, say in the year 2010 : COUNT NumberOfCourses10=year.1 .2 year.3 year.4 year.5 year.6 year.7 year.8 year.9 year.10 (10). But now I have to consider constraints : the variables status1 to status10 contains the information, whether a person really ‘attended’, stayed on the ‘waiting list’ , the course was ‘cancelled’ (and others, always as strings). I’d like now to count the number of courses a year with the constraint, that this person also attended it, which means for the year 2010: year.1=10 AND status.1=’attended’. So it should count the value 10 in the variable year.1 only if this course was attended. How can I put this constraint in the syntax ? Thanks ! Tom |
In reply to this post by Tom
Thomas,
I re-read your message and I think I misunderstood what you are wanting. You want to count the number of courses a person has taken in a year provided that the person actually attended the course as opposed to just signed up for it. If so, theres two ways to do this and (wheres Richard long format Ristow when we need him?) the long format is easier. So, long format: If (status eq attended) attendyear=year. Aggregate outfile=*/break=personid attendyear/courses=nu. * if you want one record per person, then use casestovars here where attendyear is used to index courses. There may be additional complexity that I haven't accounted for. One that occurs to me is what to do with persons who sign up for courses but never take any. They will be sysmis for both attendyear and courses. I don't recall the specifics of how aggregate handles missing values on break variables but I think it keeps them so you may want to delete records with sysmis attendyear or recode attendyear to 0, say. Probably the same for courses. Wide format: Wide format is harder because you have to know the range of values for year unless there is something about the naming that defines the year. So, let's say you done a frequencies or multiple response on year.1 to year.10 and know that the year range is 2004 to 2010. I don't know a way around this but perhaps someone else does. Therefore. numeric attended04 to attended10. Do repeat a=attended04 to attended10. + compute a=0. End repeat. Vector x=year.1 to year.10/y=status.1 to status.10/z=attended04 to attended10. Loop #i=1 to 10. + if (y eq attended) z(x-2003)=z(x-2003)+1. End loop. I'm assuming that z(1) to z(7) and not z(4) to z(10) indexes attended04 to attended10. From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Balmer, Thomas Sent: Thursday, February 23, 2012 5:51 AM To: [hidden email] Subject: Counting and constraints Hi In order to explain my problem, here whats the situation and what I have done so far: 1. A dataset with about 30000 cases; a case is a course and a person ; when a person attended more than one course, she/he appears equal to the number of courses. 2. In order to get a person just once I changed cases to variables. 3. Now I got a person as a case with additional variables : courseID1 to courseID10; year.1 to year.10 (which is the year a course was attended) ; status.1 to status.10 (which is an additional information to the course). 4. I figured out with the lists help to count how many courses are mentioned in the dataset per person, say in the year 2010 : COUNT NumberOfCourses10=year.1 .2 year.3 year.4 year.5 year.6 year.7 year.8 year.9 year.10 (10). But now I have to consider constraints : the variables status1 to status10 contains the information, whether a person really attended, stayed on the waiting list , the course was cancelled (and others, always as strings). Id like now to count the number of courses a year with the constraint, that this person also attended it, which means for the year 2010: year.1=10 AND status.1=attended. So it should count the value 10 in the variable year.1 only if this course was attended. How can I put this constraint in the syntax ? Thanks ! Tom ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
In reply to this post by Tom
Tom, Are you counting courses or courses that were attended or people who attended courses? This seems easier with a tall/vertical file—before casestovars. Crosstab year by status and use the attended column Ns. Melissa From: SPSSX(r) Discussion [mailto:[hidden email]]
On Behalf Of Balmer, Thomas Hi In order to explain my problem, here what’s the situation and what I have done so far: 1.
A dataset with about 30‘000 cases; a case is a course and a person ; when a person attended more than one course, she/he appears equal to the number of courses. 2.
In order to get a person just once I changed cases to variables.
3.
Now I got a person as a case with additional variables : courseID1 to courseID10; year.1 to year.10 (which is the year a course was attended) ; status.1 to status.10 (which is an additional information to the
course). 4.
I figured out with the list’s help to count how many courses are mentioned in the dataset per person, say in the year 2010 : COUNT NumberOfCourses10=year.1 .2 year.3 year.4 year.5 year.6 year.7 year.8 year.9
year.10 (10). But now I have to consider constraints : the variables status1 to status10 contains the information, whether a person really ‘attended’, stayed on the ‘waiting list’ , the course was ‘cancelled’ (and others, always as
strings). I’d like now to count the number of courses a year with the constraint, that this person also attended it, which means for the year 2010: year.1=10 AND status.1=’attended’. So it should count the value 10 in the variable
year.1 only if this course was attended. How can I put this constraint in the syntax ? Thanks ! Tom PRIVILEGED AND CONFIDENTIAL INFORMATION This transmittal and any attachments may contain PRIVILEGED AND CONFIDENTIAL information and is intended only for the use of the addressee. If you are not the designated recipient, or an employee or agent authorized to deliver such transmittals to the designated recipient, you are hereby notified that any dissemination, copying or publication of this transmittal is strictly prohibited. If you have received this transmittal in error, please notify us immediately by replying to the sender and delete this copy from your system. You may also call us at (309) 827-6026 for assistance. |
In reply to this post by Tom
Yes Gene, thats what I’d like to answer to: How many courses a person attended a year, for instances, computing a new variable, which counts the attended courses for each year (Coursenumber2010, coursenumber2011). In order to explain it better, here an example with the orginal datastructure:
I run first the CASESTOVARS-Command with name and city, because there are persons with the same name, but actually beeing different persons. As far as I understand it, your proposed IF-Command gets me the year in the new variable – but what, if this person has attended more than one course in 2010 and also one in 2011 ? |
Administrator
|
Thomas,
This can be resolved by a straightforward use of AGGREGATE. AGGREGATE OUTFILE * / BREAK name city year status / N=N. I could not properly paste you data table for some odd reason it shows as a single column so I won't bother to run this. Follow the AGGREGATE with a CASESTOVARS and you'll be off and running. HTH, David
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
Administrator
|
FWIW: Almost EVERY case where one goes from long to wide and then does counting or other calculations across the resulting variables is a first step towards a giant painful process!
Learn your aggregate and what it can do for you! And it always pays to think ahead as to what you really require from the data before butchering it!
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
Free forum by Nabble | Edit this page |