Counting and constraints

classic Classic list List threaded Threaded
8 messages Options
Tom
Reply | Threaded
Open this post in threaded view
|

Counting and constraints

Tom

Hi

 

In order to explain my problem, here what’s the situation and what I have done so far:

1.       A dataset with about 30‘000 cases; a case is a course and a person ; when a person attended more than one course, she/he appears equal to the number of courses.

2.       In order to get a person just once I changed cases to variables.

3.       Now I got a person as a case with additional variables : courseID1 to courseID10; year.1 to year.10 (which is the year a course was attended) ; status.1 to status.10 (which is an additional information to the course).

4.       I figured out with the list’s help to count how many courses are mentioned in the dataset per person, say in the year 2010 : COUNT NumberOfCourses10=year.1 .2 year.3 year.4 year.5 year.6 year.7 year.8 year.9 year.10 (10).

 

But now I have to consider constraints : the variables status1 to status10 contains the information, whether a person really ‘attended’, stayed on the ‘waiting list’ , the course was ‘cancelled’ (and others, always as strings).

I’d like now to count the number of courses a year with the constraint, that this person also attended it, which means for the year 2010: year.1=10 AND status.1=’attended’. So it should count the value 10 in the variable year.1 only if this course was attended.

 

How can I put this constraint in the syntax ?

 

Thanks !

Tom

Reply | Threaded
Open this post in threaded view
|

Re: Counting and constraints

David Marso
Administrator
Thomas,
  Please describe your data with greater clarity!
Preferably with an example of relevant inputs and desired outputs.
For now I will assume that you have 10 contiguous year variables (1..10:Value of interest is say 3) and 5 associated contiguous numeric status variables (0,1 : 1 denotes attended).  If you data are otherwise then recode or whatever to adapt your data to these 'constraints'.

VECTOR yr=Year.1 TO Year.10 / status=status.1 TO status.5.
LOOP #=1 TO 10.
+  COMPUTE yrstatus=SUM(yrstatus,yr(#) EQ 3 AND status(#)).
END LOOP.

HTH, David

Balmer Thomas wrote
Hi

In order to explain my problem, here what's the situation and what I have done so far:

1.       A dataset with about 30'000 cases; a case is a course and a person ; when a person attended more than one course, she/he appears equal to the number of courses.

2.       In order to get a person just once I changed cases to variables.

3.       Now I got a person as a case with additional variables : courseID1 to courseID10; year.1 to year.10 (which is the year a course was attended) ; status.1 to status.10 (which is an additional information to the course).

4.       I figured out with the list's help to count how many courses are mentioned in the dataset per person, say in the year 2010 : COUNT NumberOfCourses10=year.1 .2 year.3 year.4 year.5 year.6 year.7 year.8 year.9 year.10 (10).

But now I have to consider constraints : the variables status1 to status10 contains the information, whether a person really 'attended', stayed on the 'waiting list' , the course was 'cancelled' (and others, always as strings).
I'd like now to count the number of courses a year with the constraint, that this person also attended it, which means for the year 2010: year.1=10 AND status.1='attended'. So it should count the value 10 in the variable year.1 only if this course was attended.

How can I put this constraint in the syntax ?

Thanks !
Tom
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

Re: Counting and constraints

Maguin, Eugene
In reply to this post by Tom

Thomas,

 

Maybe I don’t understand something, but why can’t you just count status.x variables that have a specific value?

 

Gene Maguin

 

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Balmer, Thomas
Sent: Thursday, February 23, 2012 5:51 AM
To: [hidden email]
Subject: Counting and constraints

 

Hi

 

In order to explain my problem, here what’s the situation and what I have done so far:

1.       A dataset with about 30‘000 cases; a case is a course and a person ; when a person attended more than one course, she/he appears equal to the number of courses.

2.       In order to get a person just once I changed cases to variables.

3.       Now I got a person as a case with additional variables : courseID1 to courseID10; year.1 to year.10 (which is the year a course was attended) ; status.1 to status.10 (which is an additional information to the course).

4.       I figured out with the list’s help to count how many courses are mentioned in the dataset per person, say in the year 2010 : COUNT NumberOfCourses10=year.1 .2 year.3 year.4 year.5 year.6 year.7 year.8 year.9 year.10 (10).

 

But now I have to consider constraints : the variables status1 to status10 contains the information, whether a person really ‘attended’, stayed on the ‘waiting list’ , the course was ‘cancelled’ (and others, always as strings).

I’d like now to count the number of courses a year with the constraint, that this person also attended it, which means for the year 2010: year.1=10 AND status.1=’attended’. So it should count the value 10 in the variable year.1 only if this course was attended.

 

How can I put this constraint in the syntax ?

 

Thanks !

Tom

Reply | Threaded
Open this post in threaded view
|

Re: Counting and constraints

Maguin, Eugene
In reply to this post by Tom
Thomas,

I re-read your message and I think I misunderstood what you are wanting. You
want to count the number of courses a person has taken in a year provided
that the person actually attended the course as opposed to just ‘signed up
for it’.

If so, there’s two ways to do this and (where’s Richard ‘long format’ Ristow
when we need him?) the long format is easier.
So, long format:

If (status eq ‘attended’) attendyear=year.
Aggregate outfile=*/break=personid attendyear/courses=nu.
*  if you want one record per person, then use casestovars here where
attendyear is used to index courses. There may be additional complexity that
I haven't accounted for. One that occurs to me is what to do with persons
who sign up for courses but never take any. They will be sysmis for both
attendyear and courses. I don't recall the specifics of how aggregate
handles missing values on break variables but I think it keeps them so you
may want to delete records with sysmis attendyear or recode attendyear to 0,
say. Probably the same for courses.

Wide format:
Wide format is harder because you have to know the range of values for year
unless there is something about the naming that defines the year. So, let's
say you done a frequencies or multiple response on year.1 to year.10 and
know that the year range is 2004 to 2010. I don't know a way around this but
perhaps someone else does. Therefore.

numeric attended04 to attended10.
Do repeat a=attended04 to attended10.
+  compute a=0.
End repeat.
Vector x=year.1 to year.10/y=status.1 to status.10/z=attended04 to
attended10.
Loop #i=1 to 10.
+  if (y eq ‘attended’) z(x-2003)=z(x-2003)+1.
End loop.

I'm assuming that z(1) to z(7) and not z(4) to z(10) indexes attended04 to
attended10.



From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Balmer, Thomas
Sent: Thursday, February 23, 2012 5:51 AM
To: [hidden email]
Subject: Counting and constraints

Hi

In order to explain my problem, here what’s the situation and what I have
done so far:
1. A dataset with about 30‘000 cases; a case is a course and a person ; when
a person attended more than one course, she/he appears equal to the number
of courses.
2. In order to get a person just once I changed cases to variables.
3. Now I got a person as a case with additional variables : courseID1 to
courseID10; year.1 to year.10 (which is the year a course was attended) ;
status.1 to status.10 (which is an additional information to the course).
4. I figured out with the list’s help to count how many courses are
mentioned in the dataset per person, say in the year 2010 : COUNT
NumberOfCourses10=year.1 .2 year.3 year.4 year.5 year.6 year.7 year.8 year.9
year.10 (10).

But now I have to consider constraints : the variables status1 to status10
contains the information, whether a person really ‘attended’, stayed on the
‘waiting list’ , the course was ‘cancelled’ (and others, always as strings).
I’d like now to count the number of courses a year with the constraint, that
this person also attended it, which means for the year 2010: year.1=10 AND
status.1=’attended’. So it should count the value 10 in the variable year.1
only if this course was attended.

How can I put this constraint in the syntax ?

Thanks !
Tom

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Counting and constraints

Melissa Ives
In reply to this post by Tom

Tom,

Are you counting courses or courses that were attended or people who attended courses?

 

This seems easier with a tall/vertical file—before casestovars.

 

Crosstab year by status and use the attended column Ns.

 

Melissa

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Balmer, Thomas
Sent: Thursday, February 23, 2012 4:51 AM
To: [hidden email]
Subject: [SPSSX-L] Counting and constraints

 

Hi

 

In order to explain my problem, here what’s the situation and what I have done so far:

1.       A dataset with about 30‘000 cases; a case is a course and a person ; when a person attended more than one course, she/he appears equal to the number of courses.

2.       In order to get a person just once I changed cases to variables.

3.       Now I got a person as a case with additional variables : courseID1 to courseID10; year.1 to year.10 (which is the year a course was attended) ; status.1 to status.10 (which is an additional information to the course).

4.       I figured out with the list’s help to count how many courses are mentioned in the dataset per person, say in the year 2010 : COUNT NumberOfCourses10=year.1 .2 year.3 year.4 year.5 year.6 year.7 year.8 year.9 year.10 (10).

 

But now I have to consider constraints : the variables status1 to status10 contains the information, whether a person really ‘attended’, stayed on the ‘waiting list’ , the course was ‘cancelled’ (and others, always as strings).

I’d like now to count the number of courses a year with the constraint, that this person also attended it, which means for the year 2010: year.1=10 AND status.1=’attended’. So it should count the value 10 in the variable year.1 only if this course was attended.

 

How can I put this constraint in the syntax ?

 

Thanks !

Tom



PRIVILEGED AND CONFIDENTIAL INFORMATION
This transmittal and any attachments may contain PRIVILEGED AND
CONFIDENTIAL information and is intended only for the use of the
addressee. If you are not the designated recipient, or an employee
or agent authorized to deliver such transmittals to the designated
recipient, you are hereby notified that any dissemination,
copying or publication of this transmittal is strictly prohibited. If
you have received this transmittal in error, please notify us
immediately by replying to the sender and delete this copy from your
system. You may also call us at (309) 827-6026 for assistance.
Tom
Reply | Threaded
Open this post in threaded view
|

Re: Counting and Constraints

Tom
In reply to this post by Tom

Yes Gene, thats what I’d like to answer to: How many courses a person attended a year, for instances, computing a new variable, which counts the attended courses for each year (Coursenumber2010, coursenumber2011).

 

In order to explain it better, here an example with the orginal datastructure:

 

ID-Course

Coursegroup

course

year

name

city

status

11

1

1

2010

A

Bern

attended

11

1

1

2010

B

Boll

attended

11

1

1

2010

C

Muensingen

attended

12

1

2

2010

D

Thun

attended

12

1

2

2010

E

Biel

attended

12

1

2

2010

F

Langenthal

attended

13

1

3

2010

A

Bern

resigned

13

1

3

2010

G

Ittigen

attended

21

2

1

2010

H

Bern

attended

21

2

1

2010

I

Bern

attended

21

2

1

2010

D

Thun

attended

31

3

1

2010

J

Thun

waitinglist

31

3

1

2010

K

Bern

attended

32

3

2

2010

L

Bern

attended

32

3

2

2010

B

Boll

attended

32

3

2

2010

M

Bern

attended

32

3

2

2010

A

Thun

attended

32

3

2

2010

B

Biel

attended

11

1

1

2011

D

Thun

attended

11

1

1

2011

A

Thun

attended

11

1

1

2011

L

Bern

attended

12

1

2

2011

E

Biel

attended

12

1

2

2011

D

Thun

attended

12

1

2

2011

H

Bern

attended

13

1

3

2011

I

Bern

attended

13

1

3

2011

K

Bern

attended

21

2

1

2011

F

Langenthal

attended

21

2

1

2011

M

Bern

attended

21

2

1

2011

G

Ittigen

attended

25

3

1

2011

A

Thun

canceled

25

3

1

2011

I

Bern

canceled

32

3

2

2011

D

Thun

resigned

32

3

2

2011

C

Muensingen

attended

32

3

2

2011

B

Boll

attended

32

3

2

2011

K

Bern

attended

32

3

2

2011

A

Bern

attended

 

I run first the CASESTOVARS-Command with name and city, because there are persons with the same name, but actually beeing different persons.

 

As far as I understand it, your proposed IF-Command gets me the year in the new variable – but what, if this person has attended more than one course in 2010 and also one in 2011 ?

 

 

Reply | Threaded
Open this post in threaded view
|

Re: Counting and Constraints

David Marso
Administrator
Thomas,
This can be resolved by a straightforward use of AGGREGATE.

AGGREGATE OUTFILE * / BREAK name city year status / N=N.

I could not properly paste you data table for some odd reason it shows as a single column so I won't bother to run this.
Follow the AGGREGATE with a CASESTOVARS and you'll be off and running.
HTH, David




Balmer Thomas wrote
Yes Gene, thats what I'd like to answer to: How many courses a person attended a year, for instances, computing a new variable, which counts the attended courses for each year (Coursenumber2010, coursenumber2011).

In order to explain it better, here an example with the orginal datastructure:

ID-Course

Coursegroup

course

year

name

city

status

11

1

1

2010

A

Bern

attended

11

1

1

2010

B

Boll

attended

11

1

1

2010

C

Muensingen

attended

12

1

2

2010

D

Thun

attended

12

1

2

2010

E

Biel

attended

12

1

2

2010

F

Langenthal

attended

13

1

3

2010

A

Bern

resigned

13

1

3

2010

G

Ittigen

attended

21

2

1

2010

H

Bern

attended

21

2

1

2010

I

Bern

attended

21

2

1

2010

D

Thun

attended

31

3

1

2010

J

Thun

waitinglist

31

3

1

2010

K

Bern

attended

32

3

2

2010

L

Bern

attended

32

3

2

2010

B

Boll

attended

32

3

2

2010

M

Bern

attended

32

3

2

2010

A

Thun

attended

32

3

2

2010

B

Biel

attended

11

1

1

2011

D

Thun

attended

11

1

1

2011

A

Thun

attended

11

1

1

2011

L

Bern

attended

12

1

2

2011

E

Biel

attended

12

1

2

2011

D

Thun

attended

12

1

2

2011

H

Bern

attended

13

1

3

2011

I

Bern

attended

13

1

3

2011

K

Bern

attended

21

2

1

2011

F

Langenthal

attended

21

2

1

2011

M

Bern

attended

21

2

1

2011

G

Ittigen

attended

25

3

1

2011

A

Thun

canceled

25

3

1

2011

I

Bern

canceled

32

3

2

2011

D

Thun

resigned

32

3

2

2011

C

Muensingen

attended

32

3

2

2011

B

Boll

attended

32

3

2

2011

K

Bern

attended

32

3

2

2011

A

Bern

attended


I run first the CASESTOVARS-Command with name and city, because there are persons with the same name, but actually beeing different persons.

As far as I understand it, your proposed IF-Command gets me the year in the new variable - but what, if this person has attended more than one course in 2010 and also one in 2011 ?
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

Re: Counting and Constraints

David Marso
Administrator
FWIW:  Almost EVERY case where one goes from long to wide and then  does counting or other calculations across the resulting variables is a first step towards a giant painful process!
Learn your aggregate and what it can do for you!
And it always pays to think ahead as to what you really require from the data before butchering it!

David Marso wrote
Thomas,
This can be resolved by a straightforward use of AGGREGATE.

AGGREGATE OUTFILE * / BREAK name city year status / N=N.

I could not properly paste you data table for some odd reason it shows as a single column so I won't bother to run this.
Follow the AGGREGATE with a CASESTOVARS and you'll be off and running.
HTH, David




Balmer Thomas wrote
Yes Gene, thats what I'd like to answer to: How many courses a person attended a year, for instances, computing a new variable, which counts the attended courses for each year (Coursenumber2010, coursenumber2011).
<SNIP>

As far as I understand it, your proposed IF-Command gets me the year in the new variable - but what, if this person has attended more than one course in 2010 and also one in 2011 ?
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"