Background:
I use SPSS (v28) to extract data on classes (some refer to these as class sections of a course) offered by semester from our student information system. The SIS table is a non-repeating class section by semester by year Oracle table with class information (type, method, size, number of students, etc.). I then pull data from another SIS table of instructors, in which a class section may have several instructors; the instructor SIS table is a repeating instructor by class section by semester by year table, and the SIS allows up to 6 instructors per class section. Once the instructor data are extracted, I slot the instructors by class section by semester by year (CASESTOVARS), which gives me a single record per class section with all the instructor IDs listed in columns. Finally, I join (MATCH FILES) the instructor data to the class section data mentioned above, creating a single record for a class section within a given semester and year, with class information and instructors in slotted columns. This table is uploaded daily using SPSS bulk insert into an Oracle database that our enterprise reporting tool pulls for reports. This is an automated SPSS syntax file that runs daily during the early morning hours.

The problem I'm trying to solve: While the SIS allows up to 6 instructors per class section, it's possible that in any given daily upload there will be fewer than 6 instructors (max) across the class sections, and thus fewer than 6 columns (via CASESTOVARS), which then causes an upload failure because the database expects 6 instructor columns (whether they hold data or nulls). So in the automated SPSS syntax file mentioned above, I need a syntax method to identify when I have fewer than 6 columns and, if fewer than 6 columns exist, to create the additional columns needed (these can be created via the STRING command as empty or null columns). Listed below is the current logic (minimized for display) that pulls the data from the two databases. Since this syntax is shared with other programmers who I cannot assume have Python, Python is not preferred.

dataset close all.

*obtain class section instructor data.
GET Capture ODBC /CONNECT=######## /select instructorID"ID", term, classsection from table X.
CACHE.
EXECUTE.
sort cases by term classsection.
CASESTOVARS /ID=term classsection /separator="" /GROUPBY=INDEX.

*prep for join.
sort cases by term classsection.
dataset name instructors.

*obtain class section data - non-repeating class section by term.
GET Capture ODBC /CONNECT=####### /select classsection, term, type, method, stu_count from table Y.
CACHE.
EXECUTE.

*join instructor data.
sort cases by term classsection.
match files file=* /table=instructors /by term classsection.
dataset close instructors.

*identify column count and create missing nulled columns for upload.
???.

David, thanks for any suggestions.
The COUNT subcommand on CASESTOVARS produces a variable holding the number of input rows that were used to generate the new variables. So, if I understand your setup, you would need to generate new, empty variables to fill that out to six. You could use a VECTOR command like

VECTOR V(n, A20).

to generate a set of instructor variables, where n is the number needed, which would be 6 minus the maximum of the COUNT variable, or however many additional variable names you need. But an easy way to do that without Python doesn't come to mind. However, Python is always installed with Statistics in recent versions, so you should be able to rely on it; the option to omit it hasn't existed for several releases now. If you want to pursue this approach, I can help, but I would need to see a small sample of the data after the CASESTOVARS step and the list of expected instructor variable names (jkpeck@gmail.com). (The actual data doesn't matter.) If I am on the wrong track, a sample of the data would help with that, too.
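For illustration, here is a minimal sketch of what that VECTOR declaration does once you know how many variables to add (the root name V, the count of 6, and the A20 width are assumptions for the example, not your actual upload column names):

* Sketch only: creates six new, empty A20 string variables V1 to V6.
* VECTOR will complain if any of those names already exist, which is why working out how many are missing is the hard part without Python or a macro.
VECTOR V(6, A20).
EXECUTE.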
You can create a fake dummy dataset with all of your columns and no rows, and then use ADD FILES to append it to your actual data. In the real cases, the added columns will then simply be missing.
*******************************************************.
* Making a fake dummy dataset with all of the columns.
* And no rows.
DATA LIST FREE / Term S1 TO S6.
BEGIN DATA
END DATA.
DATASET NAME Dummy.

* Example, not full data.
DATA LIST FREE / Term S1 S2 S3.
BEGIN DATA
1 1 2 3
END DATA.
DATASET NAME RealData.

DATASET ACTIVATE RealData.
* Union those two datasets together; the empty Dummy file contributes no rows, only the missing variables.
ADD FILES FILE=* /FILE = 'Dummy'.
EXECUTE.
*******************************************************.
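Applied to the original question, the dummy would be defined with whatever names and types CASESTOVARS actually produces. A sketch under those assumptions: ID1 TO ID6 assumes the slotted columns come out with root "ID" and the empty separator, and A20 is a placeholder width that would need to match the width of the real instructor columns, or ADD FILES will complain about mismatched variables. Only the columns that might be missing need to be in the dummy.

* Sketch with assumed names and width; adjust to the real CASESTOVARS output.
DATA LIST FREE / ID1 TO ID6 (A20).
BEGIN DATA
END DATA.
DATASET NAME Dummy.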
Great idea, Andy. However, I would suggest adding the RealData to the Dummy dataset so that the variables are in the desired order (in case that is important). ;-)
ADD FILES FILE='Dummy' /FILE = 'RealData'.
EXECUTE.
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/
"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
Yes, that makes sense, Bruce. If re-uploading to the database, hopefully SPSS takes care of the insert order under the hood, but another approach is to use a /KEEP subcommand to order the variables explicitly; see the sketch below.
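For example, a sketch of that /KEEP idea using the toy variable names from the earlier example (list the variables in whatever order the database table expects):

ADD FILES FILE=RealData /FILE=Dummy
  /KEEP=Term S1 S2 S3 S4 S5 S6.
EXECUTE.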
Jon, Andy, Bruce,
Thanks for your quick responses. The dummy dataset works; I've had two occasions since Monday in which I had fewer than the 6 columns, and the suggested fix returned all 6 columns for the upload. Jon, to your point on Python, I'm getting close: I've got most of the programmers on a current version that has Python (v28), but I still have a small set to go. Thanks again, fellows, for your help. This list (and the previous SPSSX listserv) has never failed me yet!
IIRC, SPSS started including Python (Python 2) in the regular SPSS install back in V21 or V22 but allowed the user to refuse it. The default was to install it. It became a mandatory install somewhat later, but still several releases ago. Of course, now only Python 3 is installed, as Python 2 has passed its end-of-life date set by the Python Software Foundation.