Hi all, I have a dataset with a student identifier and the subject/course/section in which the student was enrolled. I want to create a series of dummy variables that indicates whether a student was enrolled in a particular section (subject/course/section
combination). For example, Student 1234 was enrolled in ENGL, course 1301, section 010. I want to create a variable ENGL1301010 and code students enrolled in that section with 1 and enrolled in ENGL1301 but a different section with 0. If the student was
not enrolled in any section of ENGL 1301, then the value is sysmis. Because there are over 100 subject/course/section combinations, I don’t want to hard code all possibilities. Can anyone help me with native SPSS code? Thanks, Jim James (Jim) P. Van Overschelde, PhD Co-Director, Office of Educator Preparation College of Education Texas State Univ.-San Marcos 512-245-6382 |
A general approach may be to;
1) concatenate the identifiers (class, course and section) into one string that uniquely identifies each combination 2) use auto-recode to turn that into a numeric variable 3) use vector + do repeat to make the set of dummy variables I believe there are some python extensions to help make dummy variables (I presume Jon will pitch in to point to them and say whether they can do this in one fell swoop). |
ok, ok.
First do step 1 below. Then you can use the SPSSINC CREATE DUMMIES extension command (Transform > Create Dummy Variables) to generate a dummy variable for each distinct value. However, the names of the dummies will all be taken from a specified root name. They can, however, have variable labels based on the values of the input variables, which would have the requested form in this case using the concatenated variable as the input. The command has some other options that can be seen in the dialog box that might be useful. This extension command requires the Python Essentials via the SPSS Community website (www.ibm.com/developerworks/spssdevcentral). CREATE DUMMIES is included in the Essentials in recent versions, but it can also be downloaded from the Extension Commands collection on the site. Jon Peck (no "h") aka Kim Senior Software Engineer, IBM [hidden email] phone: 720-342-5621 From: Andy W <[hidden email]> To: [hidden email], Date: 08/07/2013 02:32 PM Subject: Re: [SPSSX-L] Automating variable creation based on data values Sent by: "SPSSX(r) Discussion" <[hidden email]> A general approach may be to; 1) concatenate the identifiers (class, course and section) into one string that uniquely identifies each combination 2) use auto-recode to turn that into a numeric variable 3) use vector + do repeat to make the set of dummy variables I believe there are some python extensions to help make dummy variables (I presume Jon will pitch in to point to them and say whether they can do this in one fell swoop). ----- Andy W [hidden email] http://andrewpwheeler.wordpress.com/ -- View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Automating-variable-creation-based-on-data-values-tp5721506p5721508.html Sent from the SPSSX Discussion mailing list archive at Nabble.com. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Administrator
|
In reply to this post by Van Overschelde, Jim
This will get you ALL but the non course takers being set sysmis.
You are on your own for that step. Donning my teflon Kilt as this technique is frowned upon by the pythonistas. Not that I give a hoot, but 'horrible hack' it is ;-) -- DATA LIST LIST / SubID (F3) Subject (A4) Course (A3) Section (A2). BEGIN DATA 123 ENGL 101 01 123 MATH 102 01 123 SCIE 100 01 234 ENGL 101 02 234 ENGL 200 01 END DATA. STRING X (A9). COMPUTE X=CONCAT(Subject ,Course,Section). DATASET NAME raw. DATASET DECLARE agg . AGGREGATE OUTFILE agg / BREAK Subject Course Section /st=MAX(X)/ Ni=N. DATASET ACTIVATE agg. WRITE OUTFILE 'C:\Temp\HorribleHack101.sps' /"COMPUTE " st " = (Subject EQ '" Subject "' AND Course EQ '" Course "' AND Section EQ '" Section "').". EXECUTE. DATASET ACTIVATE raw. INSERT FILE= 'C:\Temp\HorribleHack101.sps' . LIST. SubID Subject Course Section X ENGL10101 ENGL10102 ENGL20001 MATH10201 SCIE10001 123 ENGL 101 01 ENGL10101 1.00 .00 .00 .00 .00 123 MATH 102 01 MATH10201 .00 .00 .00 1.00 .00 123 SCIE 100 01 SCIE10001 .00 .00 .00 .00 1.00 234 ENGL 101 02 ENGL10102 .00 1.00 .00 .00 .00 234 ENGL 200 01 ENGL20001 .00 .00 1.00 .00 .00 Number of cases read: 5 Number of cases listed: 5
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
In reply to this post by Jon K Peck
This is easy to use and functionally works to create all the different variables but whether I select Use Value Label or Use Values I get the same variable names. I used a prefix of SEC and all I get is SEC_1, SEC_2, instead of SEC_ENGL1301001, SEC_ENGL1301002. Suggestions? From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Jon K Peck ok, ok.
|
Since value label text generally does not
conform to variable naming rules, we can't use those labels to make variable
names. The choice you are referring to is how to make the variable
labels, not the variable names. That's what I was trying to say in
the note below.
Jon Peck (no "h") aka Kim Senior Software Engineer, IBM [hidden email] phone: 720-342-5621 From: Jim Van Overschelde <[hidden email]> To: [hidden email], Date: 08/08/2013 12:57 PM Subject: Re: [SPSSX-L] Automating variable creation based on data values Sent by: "SPSSX(r) Discussion" <[hidden email]> This is easy to use and functionally works to create all the different variables but whether I select Use Value Label or Use Values I get the same variable names. I used a prefix of SEC and all I get is SEC_1, SEC_2, instead of SEC_ENGL1301001, SEC_ENGL1301002. Suggestions? From: SPSSX(r) Discussion [[hidden email]] On Behalf Of Jon K Peck Sent: Wednesday, August 07, 2013 4:08 PM To: [hidden email] Subject: Re: Automating variable creation based on data values ok, ok. First do step 1 below. Then you can use the SPSSINC CREATE DUMMIES extension command (Transform > Create Dummy Variables) to generate a dummy variable for each distinct value. However, the names of the dummies will all be taken from a specified root name. They can, however, have variable labels based on the values of the input variables, which would have the requested form in this case using the concatenated variable as the input. The command has some other options that can be seen in the dialog box that might be useful. This extension command requires the Python Essentials via the SPSS Community website (www.ibm.com/developerworks/spssdevcentral). CREATE DUMMIES is included in the Essentials in recent versions, but it can also be downloaded from the Extension Commands collection on the site. Jon Peck (no "h") aka Kim Senior Software Engineer, IBM peck@... phone: 720-342-5621 From: Andy W <apwheele@...> To: [hidden email], Date: 08/07/2013 02:32 PM Subject: Re: [SPSSX-L] Automating variable creation based on data values Sent by: "SPSSX(r) Discussion" <[hidden email]> A general approach may be to; 1) concatenate the identifiers (class, course and section) into one string that uniquely identifies each combination 2) use auto-recode to turn that into a numeric variable 3) use vector + do repeat to make the set of dummy variables I believe there are some python extensions to help make dummy variables (I presume Jon will pitch in to point to them and say whether they can do this in one fell swoop). ----- Andy W apwheele@... http://andrewpwheeler.wordpress.com/ -- View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Automating-variable-creation-based-on-data-values-tp5721506p5721508.html Sent from the SPSSX Discussion mailing list archive at Nabble.com. ===================== To manage your subscription to SPSSX-L, send a message to LISTSERV@... (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
In reply to this post by Van Overschelde, Jim
At 10:23 PM 8/6/2013, Van Overschelde, Jim wrote:
>I have a dataset with a student identifier and the >subject/course/section in which the student was enrolled. I want to >create a series of dummy variables that indicates whether a student >was enrolled in a particular section (subject/course/section combination). Am I missing something, or have people been making this harder than it is? Working with David Marso's test data, |-----------------------------|---------------------------| |Output Created |09-AUG-2013 17:00:14 | |-----------------------------|---------------------------| [raw] SubID Subject Course Section X 123 ENGL 101 01 ENGL10101 123 MATH 102 01 MATH10201 123 SCIE 100 01 SCIE10001 234 ENGL 101 02 ENGL10102 234 ENGL 200 01 ENGL20001 Number of cases read: 5 Number of cases listed: 5 DATASET COPY Restruct. DATASET ACTIVATE Restruct WINDOW=FRONT. NUMERIC IS (F2). COMPUTE IS=1. CASESTOVARS /ID = SubID /INDEX = X /DROP = Subject Course Section /AUTOFIX = NO /GROUPBY = VARIABLE . Cases to Variables [lots of stuff deleted] RECODE ALL (SYSMIS=0). LIST. List |-----------------------------|---------------------------| |Output Created |09-AUG-2013 17:00:14 | |-----------------------------|---------------------------| [Restruct] SubID ENGL10101 ENGL10102 ENGL20001 MATH10201 SCIE10001 123 1 0 0 1 1 234 0 1 1 0 0 Number of cases read: 2 Number of cases listed: 2 ======================================================= APPENDIX: Test data and all code (not saved separately) ======================================================= DATA LIST LIST / SubID (F3) Subject (A4) Course (A3) Section (A2). BEGIN DATA 123 ENGL 101 01 123 MATH 102 01 123 SCIE 100 01 234 ENGL 101 02 234 ENGL 200 01 END DATA. STRING X (A9). COMPUTE X=CONCAT(Subject ,Course,Section). DATASET NAME raw. LIST. DATASET COPY Restruct. DATASET ACTIVATE Restruct WINDOW=FRONT. NUMERIC IS (F2). COMPUTE IS=1. CASESTOVARS /ID = SubID /INDEX = X /DROP = Subject Course Section /AUTOFIX = NO /GROUPBY = VARIABLE . RECODE ALL (SYSMIS=0). LIST. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Very nice! Even if you don't want to actually reshape the data (i.e. all SubID's in one row) you can just insert any arbitrary id (e.g. using $casenum), use that as the ID on CASESTOVARS command, and then specify the other variables you don't want to drop on the FIXED subcommand.
|
Free forum by Nabble | Edit this page |