SPSSX Discussion

Identify if column set is less than x and create columns

Classic

List

Threaded

7 messages Options

wsu_wright

Sep 17, 2022; 7:07pm

Identify if column set is less than x and create columns

107 posts

Background:
I use SPSS (v28) to extract data on classes (some refer to these as class sections of a course) of courses offered by semester from our student information system. The SIS table is a non-repeating class section by semester by year oracle table with class information (type, method, size, number of students, etc.). I then pull data from another SIS table of instructors in which a class section may have several instructors; the instructor SIS table is a repeating instructor by class section by semester by year, the SIS allows up to 6 instructors for a class section. Once the instructor data are extracted I slot the instructor by class section by semester by year (CASESTOVARS), this give me a single record for a class section with all the instructors IDs list in columns. Finally, I join (match files) the instructor data to the class section data mentioned above, creating a single record for a class section within a given semester and year with class information and instructors in slotted columns. This table is uploaded daily using SPSS bulk insert into an Oracle database that our enterprising reporting tool pulls for reports. This is an automated spss syntax file that runs daily during the early morning hours.

The problem trying to solve:
While the SIS allows up to 6 instructors per class section, it’s possible that in any given daily upload there will be less than 6 max instructors across the class sections and thus less than 6 columns (via CASESTOVARS) which then causes an upload failure as the database is expecting 6 columns for instructors (whether they have data or nulls). So in the automated spss syntax file mentioned above, I need a syntax method to identify when I have less than 6 columns and if less than 6 columns exist to create the additional columns needed (these can via the String command and empty or null columns).

Listed below is the current logic (minimalized for display) that pulls the data from the two databases. Since this syntax is shared among other programmers whom I cannot assume they have python, python is not preferred.

dataset close all.
*obtain classection instructor data.
GET Capture ODBC
/CONNECT=########
/select instructorID"ID", term, classsection
from table X.
CACHE.
EXECUTE.
sort cases by term classsection.
CASESTOVARS /ID=term classsection /separator="" /GROUPBY=INDEX.
*prep for join.
sort cases by term classsection.
dataset name instructors.

*obtain class section data- non-repeating class section by term.
GET Capture ODBC
/CONNECT=#######
/select classsection, term, type, method, stu_count
from table Y.
CACHE.
EXECUTE.
*join instructor data.
sort cases by term classsection.
match files file=* /table=instructors /by term classsection.
dataset close instructors.

*identify column count and create missing nulled columns for upload.
???.

David, thanks for any suggestions.

jkpeck

Sep 18, 2022; 7:35pm

Re: Identify if column set is less than x and create columns

87 posts

The COUNT subcommand produces a variable of the number of rows in the input that were used to generate the new variables. So, if I understand your setup, you would need to generate new, empty variables to fill out that number to six. You could use a VECTOR command like
VECTOR V(n, A20).
to generate a set of instructor variables, where n is the number needed, which would be 6 minus the maximum of the COUNT variable or however many additional variable names you need.

But, an easy way to do that without Python doesn't come to mind. However, Python is always installed for Statistics in recent versions, so you should be able to rely on that. There has not been the option to omit it for several versions back.

If you want to pursue this approach, I can help, but I would need to see a small sample of the data after the CASESTOVARS step and the list of expected instructor variable names. (jkpeck@gmail.com). (The actual data doesn't matter).

If I am on the wrong track, a sample of the data would help with that, too.

Andy W

Sep 19, 2022; 12:24pm

Re: Identify if column set is less than x and create columns

732 posts

You can create a fake dummy dataset with all of your columns, and then use ADD FILES to add this to your actual data. In the real data these will then be missing.

*******************************************************.
* Making a fake dummy dataset with all of the columns.
* And no rows.
DATA LIST FREE / Term S1 TO S6.
DATASET NAME Dummy.
BEGIN DATA
END DATA.

* Example not full data.
DATA LIST FREE / Term S1 S2 S3.
BEGIN DATA
1 1 2 3
END DATA.
DATASET NAME RealData.
DATASET ACTIVATE RealData

* Union those two datasets together.
* Get rid of the fake row.
ADD FILES FILE=* /FILE = 'Dummy'.
EXECUTE.
*******************************************************.

Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/

Bruce Weaver

Sep 19, 2022; 1:16pm

Re: Identify if column set is less than x and create columns

Administrator

3512 posts

Great idea, Andy. However, I would suggest adding the RealData to the Dummy dataset so that the variables are in the desired order (in case that is important). ;-)

ADD FILES FILE='Dummy' /FILE = 'Realdata'.
EXECUTE.

Andy W wrote

You can create a fake dummy dataset with all of your columns, and then use ADD FILES to add this to your actual data. In the real data these will then be missing.

*******************************************************.
* Making a fake dummy dataset with all of the columns.
* And no rows.
DATA LIST FREE / Term S1 TO S6.
DATASET NAME Dummy.
BEGIN DATA
END DATA.

* Example not full data.
DATA LIST FREE / Term S1 S2 S3.
BEGIN DATA
1 1 2 3
END DATA.
DATASET NAME RealData.
DATASET ACTIVATE RealData

* Union those two datasets together.
* Get rid of the fake row.
ADD FILES FILE=* /FILE = 'Dummy'.
EXECUTE.
*******************************************************.
... [show rest of quote]

... [show rest of quote]

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Andy W

Sep 19, 2022; 1:29pm

Re: Identify if column set is less than x and create columns

732 posts

Yes that makes sense Bruce. If re-uploading to database hopefully SPSS takes care of that insert order under the hood, but another approach is to use a /KEEP subcommand to explicitly order the variables as well.

Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/

wsu_wright

Sep 21, 2022; 10:44am

Re: Identify if column set is less than x and create columns

107 posts

Jon, Andy, Bruce,

Thanks for your quick response. The dummy dataset works, I've had two occasions since Monday in which I had less than the 6 columns and the suggested fix returned all 6 columns for the upload. Jon, to your point on python, I'm getting close, I've got most of the programmer on a current version that has python (v28) but still have a small set to go. Thanks again fellows for your help, this list (& the previous spssx list serv) has never failed me yet!

jkpeck

Sep 21, 2022; 2:21pm

Re: Identify if column set is less than x and create columns

87 posts

IIRC, SPSS started including Python (Python 2) in the regular SPSS install back in V21 or 22 but allowed the user to refuse it. The default was to install it. It became a mandatory install somewhat later but still several releases ago. Of course, now it is only Python 3 that is installed as Python 2 has passed its end-of-life data for the Python Software Foundation.