SPSSX Discussion

Automating variable creation based on data values

Classic

List

Threaded

8 messages Options

Van Overschelde, Jim

Automating variable creation based on data values

Hi all,

I have a dataset with a student identifier and the subject/course/section in which the student was enrolled. I want to create a series of dummy variables that indicates whether a student was enrolled in a particular section (subject/course/section combination). For example, Student 1234 was enrolled in ENGL, course 1301, section 010. I want to create a variable ENGL1301010 and code students enrolled in that section with 1 and enrolled in ENGL1301 but a different section with 0. If the student was not enrolled in any section of ENGL 1301, then the value is sysmis.

Because there are over 100 subject/course/section combinations, I don’t want to hard code all possibilities. Can anyone help me with native SPSS code?

Thanks,

Jim

James (Jim) P. Van Overschelde, PhD

Co-Director, Office of Educator Preparation

College of Education

Texas State Univ.-San Marcos
601 University Dr
San Marcos, TX 78666

512-245-6382

Andy W

Re: Automating variable creation based on data values

Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/

Jon K Peck

Re: Automating variable creation based on data values

ok, ok.

First do step 1 below.
Then you can use the SPSSINC CREATE DUMMIES extension command (Transform > Create Dummy Variables) to generate a dummy variable for each distinct value.

However, the names of the dummies will all be taken from a specified root name. They can, however, have variable labels based on the values of the input variables, which would have the requested form in this case using the concatenated variable as the input.

The command has some other options that can be seen in the dialog box that might be useful.

This extension command requires the Python Essentials via the SPSS Community website (www.ibm.com/developerworks/spssdevcentral). CREATE DUMMIES is included in the Essentials in recent versions, but it can also be downloaded from the Extension Commands collection on the site.

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621

From: Andy W <[hidden email]>
To: [hidden email],
Date: 08/07/2013 02:32 PM
Subject: Re: [SPSSX-L] Automating variable creation based on data values
Sent by: "SPSSX(r) Discussion" <[hidden email]>

A general approach may be to; 1) concatenate the identifiers (class, course and section) into one string that uniquely identifies each combination 2) use auto-recode to turn that into a numeric variable 3) use vector + do repeat to make the set of dummy variables I believe there are some python extensions to help make dummy variables (I presume Jon will pitch in to point to them and say whether they can do this in one fell swoop). ----- Andy W [hidden email]http://andrewpwheeler.wordpress.com/-- View this message in context:http://spssx-discussion.1045642.n5.nabble.com/Automating-variable-creation-based-on-data-values-tp5721506p5721508.htmlSent from the SPSSX Discussion mailing list archive at Nabble.com. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

David Marso

Re: Automating variable creation based on data values

Administrator

In reply to this post by Van Overschelde, Jim

This will get you ALL but the non course takers being set sysmis.
You are on your own for that step.
Donning my teflon Kilt as this technique is frowned upon by the pythonistas.
Not that I give a hoot, but 'horrible hack' it is ;-)
--
DATA LIST LIST / SubID (F3) Subject (A4) Course (A3) Section (A2).
BEGIN DATA
123 ENGL 101 01
123 MATH 102 01
123 SCIE 100 01
234 ENGL 101 02
234 ENGL 200 01
END DATA.

STRING X (A9).
COMPUTE X=CONCAT(Subject ,Course,Section).
DATASET NAME raw.
DATASET DECLARE agg .
AGGREGATE OUTFILE agg / BREAK Subject Course Section /st=MAX(X)/ Ni=N.
DATASET ACTIVATE agg.

WRITE OUTFILE 'C:\Temp\HorribleHack101.sps'
/"COMPUTE " st " = (Subject EQ '" Subject "' AND Course EQ '" Course "' AND Section EQ '" Section "').".
EXECUTE.
DATASET ACTIVATE raw.
INSERT FILE= 'C:\Temp\HorribleHack101.sps' .

LIST.

SubID Subject Course Section X ENGL10101 ENGL10102 ENGL20001 MATH10201 SCIE10001

123 ENGL 101 01 ENGL10101 1.00 .00 .00 .00 .00
123 MATH 102 01 MATH10201 .00 .00 .00 1.00 .00
123 SCIE 100 01 SCIE10001 .00 .00 .00 .00 1.00
234 ENGL 101 02 ENGL10102 .00 1.00 .00 .00 .00
234 ENGL 200 01 ENGL20001 .00 .00 1.00 .00 .00

Number of cases read: 5 Number of cases listed: 5

Van Overschelde, Jim wrote

Hi all,

I have a dataset with a student identifier and the subject/course/section in which the student was enrolled. I want to create a series of dummy variables that indicates whether a student was enrolled in a particular section (subject/course/section combination). For example, Student 1234 was enrolled in ENGL, course 1301, section 010. I want to create a variable ENGL1301010 and code students enrolled in that section with 1 and enrolled in ENGL1301 but a different section with 0. If the student was not enrolled in any section of ENGL 1301, then the value is sysmis.
Because there are over 100 subject/course/section combinations, I don't want to hard code all possibilities. Can anyone help me with native SPSS code?

Thanks,
Jim

James (Jim) P. Van Overschelde, PhD
Co-Director, Office of Educator Preparation
College of Education
Texas State Univ.-San Marcos
601 University Dr
San Marcos, TX 78666
512-245-6382

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Jim Van Overschelde

Re: Automating variable creation based on data values

In reply to this post by Jon K Peck

Suggestions?

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Jon K Peck
Sent: Wednesday, August 07, 2013 4:08 PM
To: [hidden email]
Subject: Re: Automating variable creation based on data values

A general approach may be to;

1) concatenate the identifiers (class, course and section) into one string
that uniquely identifies each combination
2) use auto-recode to turn that into a numeric variable
3) use vector + do repeat to make the set of dummy variables

I believe there are some python extensions to help make dummy variables (I
presume Jon will pitch in to point to them and say whether they can do this
in one fell swoop).

-----
Andy W
[hidden email]
http://andrewpwheeler.wordpress.com/
--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Automating-variable-creation-based-on-data-values-tp5721506p5721508.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Jon K Peck

Re: Automating variable creation based on data values

Since value label text generally does not conform to variable naming rules, we can't use those labels to make variable names. The choice you are referring to is how to make the variable labels, not the variable names. That's what I was trying to say in the note below.

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621

From: Jim Van Overschelde <[hidden email]>
To: [hidden email],
Date: 08/08/2013 12:57 PM
Subject: Re: [SPSSX-L] Automating variable creation based on data values
Sent by: "SPSSX(r) Discussion" <[hidden email]>

This is easy to use and functionally works to create all the different variables but whether I select Use Value Label or Use Values I get the same variable names. I used a prefix of SEC and all I get is SEC_1, SEC_2, instead of SEC_ENGL1301001, SEC_ENGL1301002.

Suggestions?

From: SPSSX(r) Discussion [[hidden email]] On Behalf Of Jon K Peck
Sent: Wednesday, August 07, 2013 4:08 PM
To: [hidden email]
Subject: Re: Automating variable creation based on data values

ok, ok.

First do step 1 below.
Then you can use the SPSSINC CREATE DUMMIES extension command (Transform > Create Dummy Variables) to generate a dummy variable for each distinct value.

However, the names of the dummies will all be taken from a specified root name. They can, however, have variable labels based on the values of the input variables, which would have the requested form in this case using the concatenated variable as the input.

The command has some other options that can be seen in the dialog box that might be useful.

This extension command requires the Python Essentials via the SPSS Community website (www.ibm.com/developerworks/spssdevcentral). CREATE DUMMIES is included in the Essentials in recent versions, but it can also be downloaded from the Extension Commands collection on the site.

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
peck@...
phone: 720-342-5621

From: Andy W <apwheele@...>
To: [hidden email],
Date: 08/07/2013 02:32 PM
Subject: Re: [SPSSX-L] Automating variable creation based on data values
Sent by: "SPSSX(r) Discussion" <[hidden email]>

A general approach may be to;

1) concatenate the identifiers (class, course and section) into one string
that uniquely identifies each combination
2) use auto-recode to turn that into a numeric variable
3) use vector + do repeat to make the set of dummy variables

I believe there are some python extensions to help make dummy variables (I
presume Jon will pitch in to point to them and say whether they can do this
in one fell swoop).

-----
Andy W
apwheele@...
http://andrewpwheeler.wordpress.com/
--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Automating-variable-creation-based-on-data-values-tp5721506p5721508.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
LISTSERV@... (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Richard Ristow

Re: Automating variable creation based on data values

In reply to this post by Van Overschelde, Jim

At 10:23 PM 8/6/2013, Van Overschelde, Jim wrote:

>I have a dataset with a student identifier and the
>subject/course/section in which the student was enrolled. I want to
>create a series of dummy variables that indicates whether a student
>was enrolled in a particular section (subject/course/section combination).

Am I missing something, or have people been making this harder than
it is? Working with David Marso's test data,
|-----------------------------|---------------------------|
|Output Created |09-AUG-2013 17:00:14 |
|-----------------------------|---------------------------|
[raw]

SubID Subject Course Section X

123 ENGL 101 01 ENGL10101
123 MATH 102 01 MATH10201
123 SCIE 100 01 SCIE10001
234 ENGL 101 02 ENGL10102
234 ENGL 200 01 ENGL20001

Number of cases read: 5 Number of cases listed: 5

DATASET COPY Restruct.
DATASET ACTIVATE Restruct WINDOW=FRONT.
NUMERIC IS (F2).
COMPUTE IS=1.

CASESTOVARS
/ID = SubID
/INDEX = X
/DROP = Subject Course Section
/AUTOFIX = NO
/GROUPBY = VARIABLE .

Cases to Variables
[lots of stuff deleted]

RECODE ALL (SYSMIS=0).
LIST.

List
|-----------------------------|---------------------------|
|Output Created |09-AUG-2013 17:00:14 |
|-----------------------------|---------------------------|
[Restruct]

SubID ENGL10101 ENGL10102 ENGL20001 MATH10201 SCIE10001

123 1 0 0 1 1
234 0 1 1 0 0

Number of cases read: 2 Number of cases listed: 2

=======================================================
APPENDIX: Test data and all code (not saved separately)
=======================================================
DATA LIST LIST / SubID (F3) Subject (A4) Course (A3) Section (A2).
BEGIN DATA
123 ENGL 101 01
123 MATH 102 01
123 SCIE 100 01
234 ENGL 101 02
234 ENGL 200 01
END DATA.

STRING X (A9).
COMPUTE X=CONCAT(Subject ,Course,Section).
DATASET NAME raw.
LIST.

DATASET COPY Restruct.
DATASET ACTIVATE Restruct WINDOW=FRONT.

NUMERIC IS (F2).
COMPUTE IS=1.

CASESTOVARS
/ID = SubID
/INDEX = X
/DROP = Subject Course Section
/AUTOFIX = NO
/GROUPBY = VARIABLE .

RECODE ALL (SYSMIS=0).
LIST.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Andy W

Re: Automating variable creation based on data values

Very nice! Even if you don't want to actually reshape the data (i.e. all SubID's in one row) you can just insert any arbitrary id (e.g. using $casenum), use that as the ID on CASESTOVARS command, and then specify the other variables you don't want to drop on the FIXED subcommand.

Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/