Clarification on file filter question

GREENE Teresa
Greetings--

Allow me to clarify this question.  I have a master file of student
enrollment data.  Every time a student moves from one school to another,
a new record is created that provides a date range for the time the
student was enrolled at a particular institution (enrollment
date/withdrawal date).  There are over 850,000 records for one school
year in this file.  I have another file that has expulsion and
suspension data in it, with a single date for the expulsion/suspension
incident.  Some kids have multiple instances in one school year, but in
different institutions.  Some of the enrollment information is missing
in the expulsion/suspension file.  I need to pull it out of the
enrollment file, but I need only
the enrollment record for the student that includes the date of the
expulsion/suspension incident.  I have some 30 students that are missing
this data, representing over 50 records.  I have tried matching the
files, but a merge just on ID does not function properly.  Some records
get matched incorrectly, and I suspect it is because I have variable
numbers of records per id in each file.  I have worked with other
programs where you can create a text file that contains just the id
numbers you want to isolate, and then you write a query that uses that
text file as a filter so that only the records for those ids are pulled
out.  My question is whether SPSS has this capability through the syntax
language.

Thanks for your thoughts--

Teresa :)

Re: Clarification on file filter question

Maguin, Eugene
Teresa,

Things are much clearer now than they were in your first message. One small
but important discrepancy is whether you do have a common id variable. This
morning it sounded like you didn't; now it sounds like you do. I'll assume
you do.

You're working on data from a school district, I'll bet. I've worked on data
like that. And, it's a real joy! Technically, you have 'many' records in
each file with the same id. A regular MATCH FILES won't work because it
assumes a one-to-one relationship. If you had a 'one-to-many' relationship
you might be able to use the TABLE subcommand. However, you really have the
problem of filling in the enrollment dates that bracket the suspension.
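
To make the 'bracketing' idea concrete, here is a minimal sketch of the logic in Python rather than SPSS syntax. The record layouts, ids, school names, and dates are all made up for illustration, and `bracketing_spells` is a hypothetical helper. The point is only that the pairing has to use the id and the date range together, not the id alone.

```python
from datetime import date

# Hypothetical enrollment spells: (student_id, school, enrolled, withdrew)
enrollments = [
    (101, "North HS", date(2005, 8, 29), date(2005, 11, 4)),
    (101, "South HS", date(2005, 11, 7), date(2006, 6, 9)),
    (102, "North HS", date(2005, 8, 29), date(2006, 6, 9)),
]

# Hypothetical incidents: (student_id, incident_date)
incidents = [
    (101, date(2005, 12, 1)),
    (101, date(2005, 10, 3)),
    (102, date(2006, 2, 14)),
]

def bracketing_spells(incident, spells):
    """Return every spell for the same student whose dates contain the incident."""
    sid, when = incident
    return [s for s in spells if s[0] == sid and s[2] <= when <= s[3]]

# Pair each incident with the spell(s) that bracket it.
matched = {inc: bracketing_spells(inc, enrollments) for inc in incidents}
for inc, spells in matched.items():
    print(inc[0], inc[1], [s[1] for s in spells])
```

Note that the two incidents for student 101 are listed in a different order than the enrollment spells, which is exactly the situation where pairing records by position within an id goes wrong.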

I think you will have to dig the records out of the enrollment file. I'd
start this way.

List out your suspension records that have the missing enrollment info. Use
that information to select out the enrollment record for that kid that
brackets the suspension date. However, I'd suggest that you use a print
statement to structure your output and print all the enrollment records for
that kid. I noticed that kids could be suspended and transferred to a new
school on the same day or, even more interesting, between the time they left
one school and entered another. Probably clerical error. District doesn't
have the money to groom the data. This also means that you will have to
construct a series of If statements to insert the correction back into the
suspension file.
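
The review step described above might look roughly like this, again sketched in Python with invented data rather than in SPSS syntax. `wanted_ids` stands in for the list of ids with missing enrollment info (it could just as well be read from a text file, one id per line), and `review` is a hypothetical helper. It lists every enrollment record for each of those kids, stars the one that brackets the incident, and flags any incident with zero or several bracketing records for a manual look.

```python
from datetime import date

# Hypothetical ids of students whose suspension records lack enrollment info.
wanted_ids = {201, 202}

# Hypothetical enrollment spells: (student_id, school, enrolled, withdrew)
enrollments = [
    (201, "East MS", date(2005, 8, 29), date(2005, 10, 14)),
    (201, "West MS", date(2005, 10, 18), date(2006, 6, 9)),
    (202, "East MS", date(2005, 8, 29), date(2006, 6, 9)),
    (203, "West MS", date(2005, 8, 29), date(2006, 6, 9)),
]

# Hypothetical incidents: (student_id, incident_date)
incidents = [
    (201, date(2005, 10, 17)),  # falls in the gap between two schools
    (202, date(2006, 1, 20)),
]

def review(incidents, enrollments, wanted_ids):
    """For each wanted incident, collect all spells and the bracketing ones."""
    report = []
    for sid, when in incidents:
        if sid not in wanted_ids:
            continue
        spells = [e for e in enrollments if e[0] == sid]
        hits = [e for e in spells if e[2] <= when <= e[3]]
        # Exactly one bracketing spell is the clean case; anything else
        # (a gap, or a same-day transfer giving two hits) needs a human.
        status = "ok" if len(hits) == 1 else "check by hand"
        report.append((sid, when, status, spells, hits))
    return report

report = review(incidents, enrollments, wanted_ids)
for sid, when, status, spells, hits in report:
    print(sid, when, status)
    for spell in spells:
        mark = "*" if spell in hits else " "
        print(f"  {mark} {spell[1]}: {spell[2]} to {spell[3]}")
```

Printing all of a student's spells, not just the matching one, is what lets you see the gap and same-day cases and decide what the correction should be.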

50 cases is a lot but you must have 70 or 80 thousand unique kids in that
file. So, not very many problems. Data is pretty clean.

Gene Maguin