Automating variable creation based on data values

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

Automating variable creation based on data values

Van Overschelde, Jim

Hi all,

 

I have a dataset with a student identifier and the subject/course/section in which the student was enrolled.  I want to create a series of dummy variables that indicates whether a student was enrolled in a particular section (subject/course/section combination).  For example, Student 1234 was enrolled in ENGL, course 1301, section 010.  I want to create a variable ENGL1301010 and code students enrolled in that section with 1 and enrolled in ENGL1301 but a different section with 0.  If the student was not enrolled in any section of ENGL 1301, then the value is sysmis. 

Because there are over 100 subject/course/section combinations, I don’t want to hard code all possibilities.  Can anyone help me with native SPSS code?

 

Thanks,

Jim

 

James (Jim) P. Van Overschelde, PhD

Co-Director, Office of Educator Preparation

College of Education

Texas State Univ.-San Marcos
601 University Dr
San Marcos, TX 78666

512-245-6382

 

 

Reply | Threaded
Open this post in threaded view
|

Re: Automating variable creation based on data values

Andy W
A general approach may be to;

1) concatenate the identifiers (class, course and section) into one string that uniquely identifies each combination
2) use auto-recode to turn that into a numeric variable
3) use vector + do repeat to make the set of dummy variables

I believe there are some python extensions to help make dummy variables (I presume Jon will pitch in to point to them and say whether they can do this in one fell swoop).
Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/
Reply | Threaded
Open this post in threaded view
|

Re: Automating variable creation based on data values

Jon K Peck
ok, ok.

First do step 1 below.
Then you can use the SPSSINC CREATE DUMMIES extension command (Transform > Create Dummy Variables) to generate a dummy variable for each distinct value.

However, the names of the dummies will all be taken from a specified root name.  They can, however, have variable labels based on the values of the input variables, which would have the requested form in this case using the concatenated variable as the input.  

The command has some other options that can be seen in the dialog box that might be useful.

This extension command requires the Python Essentials via the SPSS Community website (www.ibm.com/developerworks/spssdevcentral).  CREATE DUMMIES is included in the Essentials in recent versions, but it can also be downloaded from the Extension Commands collection on the site.

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621




From:        Andy W <[hidden email]>
To:        [hidden email],
Date:        08/07/2013 02:32 PM
Subject:        Re: [SPSSX-L] Automating variable creation based on data values
Sent by:        "SPSSX(r) Discussion" <[hidden email]>




A general approach may be to;

1) concatenate the identifiers (class, course and section) into one string
that uniquely identifies each combination
2) use auto-recode to turn that into a numeric variable
3) use vector + do repeat to make the set of dummy variables

I believe there are some python extensions to help make dummy variables (I
presume Jon will pitch in to point to them and say whether they can do this
in one fell swoop).



-----
Andy W
[hidden email]
http://andrewpwheeler.wordpress.com/
--
View this message in context:
http://spssx-discussion.1045642.n5.nabble.com/Automating-variable-creation-based-on-data-values-tp5721506p5721508.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD


Reply | Threaded
Open this post in threaded view
|

Re: Automating variable creation based on data values

David Marso
Administrator
In reply to this post by Van Overschelde, Jim
This will get you ALL but the non course takers being set sysmis.
You are on your own for that step.
Donning my teflon Kilt as this technique is frowned upon by the pythonistas.
Not that I give a hoot, but 'horrible hack' it is ;-)
--
DATA LIST LIST / SubID (F3) Subject (A4) Course (A3) Section (A2).
BEGIN DATA
123 ENGL 101 01
123 MATH 102 01
123 SCIE 100 01
234 ENGL 101 02
234 ENGL 200 01
END DATA.

STRING X (A9).
COMPUTE X=CONCAT(Subject ,Course,Section).
DATASET NAME raw.
DATASET DECLARE agg .
AGGREGATE OUTFILE agg / BREAK Subject Course Section /st=MAX(X)/ Ni=N.
DATASET ACTIVATE agg.

WRITE OUTFILE 'C:\Temp\HorribleHack101.sps'
      /"COMPUTE " st " = (Subject EQ '" Subject "' AND Course EQ '" Course "' AND Section EQ '" Section "').".
EXECUTE.
DATASET ACTIVATE raw.
INSERT FILE= 'C:\Temp\HorribleHack101.sps' .

LIST.

SubID Subject Course Section X         ENGL10101 ENGL10102 ENGL20001 MATH10201 SCIE10001

 123  ENGL    101    01      ENGL10101      1.00       .00       .00       .00       .00
 123  MATH    102    01      MATH10201       .00       .00       .00      1.00       .00
 123  SCIE    100    01      SCIE10001       .00       .00       .00       .00      1.00
 234  ENGL    101    02      ENGL10102       .00      1.00       .00       .00       .00
 234  ENGL    200    01      ENGL20001       .00       .00      1.00       .00       .00


Number of cases read:  5    Number of cases listed:  5


Van Overschelde, Jim wrote
Hi all,

I have a dataset with a student identifier and the subject/course/section in which the student was enrolled.  I want to create a series of dummy variables that indicates whether a student was enrolled in a particular section (subject/course/section combination).  For example, Student 1234 was enrolled in ENGL, course 1301, section 010.  I want to create a variable ENGL1301010 and code students enrolled in that section with 1 and enrolled in ENGL1301 but a different section with 0.  If the student was not enrolled in any section of ENGL 1301, then the value is sysmis.
Because there are over 100 subject/course/section combinations, I don't want to hard code all possibilities.  Can anyone help me with native SPSS code?

Thanks,
Jim

James (Jim) P. Van Overschelde, PhD
Co-Director, Office of Educator Preparation
College of Education
Texas State Univ.-San Marcos
601 University Dr
San Marcos, TX 78666
512-245-6382
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

Re: Automating variable creation based on data values

Jim Van Overschelde
In reply to this post by Jon K Peck

This is easy to use and functionally works to create all the different variables but whether I select Use Value Label or Use Values I get the same variable names.  I used a prefix of SEC and all I get is SEC_1, SEC_2, instead of SEC_ENGL1301001, SEC_ENGL1301002. 

 

Suggestions? 

 

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Jon K Peck
Sent: Wednesday, August 07, 2013 4:08 PM
To: [hidden email]
Subject: Re: Automating variable creation based on data values

 

ok, ok.

First do step 1 below.
Then you can use the SPSSINC CREATE DUMMIES extension command (Transform > Create Dummy Variables) to generate a dummy variable for each distinct value.

However, the names of the dummies will all be taken from a specified root name.  They can, however, have variable labels based on the values of the input variables, which would have the requested form in this case using the concatenated variable as the input.  

The command has some other options that can be seen in the dialog box that might be useful.

This extension command requires the Python Essentials via the SPSS Community website (www.ibm.com/developerworks/spssdevcentral).  CREATE DUMMIES is included in the Essentials in recent versions, but it can also be downloaded from the Extension Commands collection on the site.

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621




From:        Andy W <[hidden email]>
To:        [hidden email],
Date:        08/07/2013 02:32 PM
Subject:        Re: [SPSSX-L] Automating variable creation based on data values
Sent by:        "SPSSX(r) Discussion" <[hidden email]>





A general approach may be to;

1) concatenate the identifiers (class, course and section) into one string
that uniquely identifies each combination
2) use auto-recode to turn that into a numeric variable
3) use vector + do repeat to make the set of dummy variables

I believe there are some python extensions to help make dummy variables (I
presume Jon will pitch in to point to them and say whether they can do this
in one fell swoop).



-----
Andy W
[hidden email]
http://andrewpwheeler.wordpress.com/
--
View this message in context:
http://spssx-discussion.1045642.n5.nabble.com/Automating-variable-creation-based-on-data-values-tp5721506p5721508.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Reply | Threaded
Open this post in threaded view
|

Re: Automating variable creation based on data values

Jon K Peck
Since value label text generally does not conform to variable naming rules, we can't use those labels to make variable names.  The choice you are referring to is how to make the variable labels, not the variable names.  That's what I was trying to say in the note below.


Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621




From:        Jim Van Overschelde <[hidden email]>
To:        [hidden email],
Date:        08/08/2013 12:57 PM
Subject:        Re: [SPSSX-L] Automating variable creation based on data values
Sent by:        "SPSSX(r) Discussion" <[hidden email]>




This is easy to use and functionally works to create all the different variables but whether I select Use Value Label or Use Values I get the same variable names.  I used a prefix of SEC and all I get is SEC_1, SEC_2, instead of SEC_ENGL1301001, SEC_ENGL1301002.  
 
Suggestions?  
 
From: SPSSX(r) Discussion [[hidden email]] On Behalf Of Jon K Peck
Sent:
Wednesday, August 07, 2013 4:08 PM
To:
[hidden email]
Subject:
Re: Automating variable creation based on data values

 
ok, ok.

First do step 1 below.

Then you can use the SPSSINC CREATE DUMMIES extension command (Transform > Create Dummy Variables) to generate a dummy variable for each distinct value.


However, the names of the dummies will all be taken from a specified root name.  They can, however, have variable labels based on the values of the input variables, which would have the requested form in this case using the concatenated variable as the input.  


The command has some other options that can be seen in the dialog box that might be useful.


This extension command requires the Python Essentials via the SPSS Community website (
www.ibm.com/developerworks/spssdevcentral).  CREATE DUMMIES is included in the Essentials in recent versions, but it can also be downloaded from the Extension Commands collection on the site.

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM

peck@...
phone: 720-342-5621





From:        
Andy W <apwheele@...>
To:        
[hidden email],
Date:        
08/07/2013 02:32 PM
Subject:        
Re: [SPSSX-L] Automating variable creation based on data values
Sent by:        
"SPSSX(r) Discussion" <[hidden email]>





A general approach may be to;

1) concatenate the identifiers (class, course and section) into one string
that uniquely identifies each combination
2) use auto-recode to turn that into a numeric variable
3) use vector + do repeat to make the set of dummy variables

I believe there are some python extensions to help make dummy variables (I
presume Jon will pitch in to point to them and say whether they can do this
in one fell swoop).



-----
Andy W

apwheele@...
http://andrewpwheeler.wordpress.com/
--
View this message in context:
http://spssx-discussion.1045642.n5.nabble.com/Automating-variable-creation-based-on-data-values-tp5721506p5721508.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to

LISTSERV@... (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Reply | Threaded
Open this post in threaded view
|

Re: Automating variable creation based on data values

Richard Ristow
In reply to this post by Van Overschelde, Jim
At 10:23 PM 8/6/2013, Van Overschelde, Jim wrote:

>I have a dataset with a student identifier and the
>subject/course/section in which the student was enrolled.  I want to
>create a series of dummy variables that indicates whether a student
>was enrolled in a particular section (subject/course/section combination).

Am I missing something, or have people been making this harder than
it is? Working with David Marso's test data,
|-----------------------------|---------------------------|
|Output Created               |09-AUG-2013 17:00:14       |
|-----------------------------|---------------------------|
  [raw]

SubID Subject Course Section X

  123  ENGL    101    01      ENGL10101
  123  MATH    102    01      MATH10201
  123  SCIE    100    01      SCIE10001
  234  ENGL    101    02      ENGL10102
  234  ENGL    200    01      ENGL20001

Number of cases read:  5    Number of cases listed:  5


DATASET COPY     Restruct.
DATASET ACTIVATE Restruct WINDOW=FRONT.
NUMERIC IS (F2).
COMPUTE IS=1.

CASESTOVARS
  /ID      = SubID
  /INDEX   = X
  /DROP    = Subject Course Section
  /AUTOFIX = NO
  /GROUPBY = VARIABLE .


Cases to Variables
[lots of stuff deleted]

RECODE ALL (SYSMIS=0).
LIST.

List
|-----------------------------|---------------------------|
|Output Created               |09-AUG-2013 17:00:14       |
|-----------------------------|---------------------------|
  [Restruct]

SubID ENGL10101 ENGL10102 ENGL20001 MATH10201 SCIE10001

  123       1         0         0        1         1
  234       0         1         1        0         0

Number of cases read:  2    Number of cases listed:  2

=======================================================
APPENDIX: Test data and all code (not saved separately)
=======================================================
DATA LIST LIST / SubID (F3) Subject (A4) Course (A3) Section (A2).
BEGIN DATA
123 ENGL 101 01
123 MATH 102 01
123 SCIE 100 01
234 ENGL 101 02
234 ENGL 200 01
END DATA.

STRING X (A9).
COMPUTE X=CONCAT(Subject ,Course,Section).
DATASET NAME raw.
LIST.

DATASET COPY     Restruct.
DATASET ACTIVATE Restruct WINDOW=FRONT.

NUMERIC IS (F2).
COMPUTE IS=1.

CASESTOVARS
  /ID      = SubID
  /INDEX   = X
  /DROP    = Subject Course Section
  /AUTOFIX = NO
  /GROUPBY = VARIABLE .

RECODE ALL (SYSMIS=0).
LIST.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Automating variable creation based on data values

Andy W
Very nice! Even if you don't want to actually reshape the data (i.e. all SubID's in one row) you can just insert any arbitrary id (e.g. using $casenum), use that as the ID on CASESTOVARS command, and then specify the other variables you don't want to drop on the FIXED subcommand.
Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/