I have a very large dataset with information from all over the country. I often want to tell SPSS that I want ALL available data from a specific area (let's say Alabama). I have thousands of different variables holding information from all over, and I don't have a way to know which variables are unique to Alabama; several variables may be unique to different areas.

When I filter out all cases except Alabama, all of the variables still appear as options to run frequencies on, and since I do not know which variables contain Alabama data, I have to run frequencies on all of the variables. When I filter to select all Alabama cases, I want to add a filter that tells SPSS to ignore (or also filter out) all variables that have no data in them, because there is nothing to run in those variables. If I run the filter to export into a new file, the new dataset would ideally NOT contain empty variables.

Because I have to run frequencies on all of the variables to find out which ones have data, the result is an unmanageably long frequency table output. In practice, though, since there are so many variables and cases, SPSS crashes and doesn't actually run all of the 15k+ variables.

I have run the syntax described in this technote http://www-01.ibm.com/support/docview.wss?uid=swg21481480, which addresses the issue, but it isn't efficient: it takes several hours to run on my dataset with 15k+ variables. If anyone has a more efficient or streamlined method, please let me know!
Here's a simple bit of Python code to delete empty variables. For numerics, that means all values are sysmis, and for strings, all values are blank.

begin program.
import spssaux2
spssaux2.delEmptyVars()
end program.

spssaux2 also has a function, FindEmptyVars, that offers some other options.
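To make the emptiness rule concrete outside SPSS, here is a minimal pure-Python sketch of the same test (a numeric variable is empty if every value is system-missing, a string variable if every value is blank), applied after selecting only the Alabama cases. It is not spssaux2's implementation: the dict-of-columns layout, `None` standing in for sysmis, and the names `select_cases`, `find_empty_vars`, `drop_empty_vars`, and `state` are all hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical layout: variable name -> list of case values.
# None stands in for SPSS system-missing (sysmis).
Dataset = Dict[str, List[Any]]


def select_cases(data: Dataset, var: str, value: Any) -> Dataset:
    """Keep only the cases (rows) where data[var] == value."""
    keep = [i for i, v in enumerate(data[var]) if v == value]
    return {name: [col[i] for i in keep] for name, col in data.items()}


def is_empty(values: List[Any]) -> bool:
    """True if every value is None (sysmis) or an all-blank string."""
    for v in values:
        if v is None:
            continue
        if isinstance(v, str) and v.strip() == "":
            continue
        return False  # first valid value found: stop scanning this variable
    return True


def find_empty_vars(data: Dataset) -> List[str]:
    """Names of variables with no valid data, in dataset order."""
    return [name for name, col in data.items() if is_empty(col)]


def drop_empty_vars(data: Dataset) -> Dataset:
    """Copy of the dataset without its empty variables."""
    empty = set(find_empty_vars(data))
    return {name: col for name, col in data.items() if name not in empty}


if __name__ == "__main__":
    data = {
        "state": ["Alabama", "Alaska", "Alabama"],
        "al_income": [52.1, None, 48.0],
        "ak_oil": [None, 3.2, None],
        "al_note": ["ok", "", "  "],
    }
    alabama = select_cases(data, "state", "Alabama")
    print(find_empty_vars(alabama))        # ['ak_oil']
    print(list(drop_empty_vars(alabama)))  # ['state', 'al_income', 'al_note']
```

Because `is_empty` returns at the first valid value, most non-empty variables are rejected after reading only a few cases, rather than being tabulated in full the way a frequency table would.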
In reply to this post by Melissa David
Jon shows an elegant solution to the problem as presented.
I think I would resist, as strongly as I could, any demand
to maintain one flat file with 15K variables, even if most of the
values were not Missing. 90% of the effort of ordinary "data analysis"
is devoted to cleaning and prepping -- and that is when it is easy to
spot the proper Missing, or occasional bad values. You are in worse shape when massive portions are Missing-by-definition.
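My own "multiplicity" examples had multiple dates: 50 variables measured at 5 dates can be stored as 250 variables (one per variable per date) or as 50 variables with one case per date. The 50-variable layout is far faster to edit-check and manipulate,
and it is in the appropriate form for aggregation, selection, etc.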
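The wide-versus-long point can be sketched in Python: the wide layout has one variable per measure per date, while the long layout has one row per case per date, holding only the measure columns plus a date column. This mirrors what SPSS's VARSTOCASES does, but it is only an illustration; the `<measure>_<date>` naming convention and the function name `wide_to_long` are assumptions.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def wide_to_long(rows: List[Row], id_var: str, sep: str = "_") -> List[Row]:
    """Turn one wide row per case into one long row per case per date.

    Assumes every variable other than id_var is named <measure><sep><date>,
    e.g. 'weight_d1'; rsplit keeps underscores inside measure names intact.
    """
    long_rows: List[Row] = []
    for row in rows:
        by_date: Dict[str, Row] = {}
        for name, value in row.items():
            if name == id_var:
                continue
            measure, date = name.rsplit(sep, 1)
            by_date.setdefault(date, {})[measure] = value
        for date in sorted(by_date):  # dates sorted as strings
            long_rows.append({id_var: row[id_var], "date": date, **by_date[date]})
    return long_rows


if __name__ == "__main__":
    wide = [{"id": 1, "weight_d1": 70, "bp_d1": 120, "weight_d2": 71, "bp_d2": 118}]
    for r in wide_to_long(wide, "id"):
        print(r)
    # {'id': 1, 'date': 'd1', 'weight': 70, 'bp': 120}
    # {'id': 1, 'date': 'd2', 'weight': 71, 'bp': 118}
```

With 50 measures at 5 dates, each wide case carries 250 measure columns, while each long row carries 50 measure columns plus `date`, so edit checks and selections are written once per measure instead of once per measure per date.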
And in yours, I bet it is only a few "States" that have most of the problems.
Argue for separate datasets, which will be joined at the time of analysis.
Maintain a standard set of syntax lines to do the joining.
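"Separate datasets, joined at the time of analysis" can be sketched in Python: each source keeps only its own variables, and a keyed merge (roughly what MATCH FILES with a /BY key variable does in SPSS) brings them together when needed. This is a hypothetical illustration, not SPSS code; the function name `match_by_key`, the `id` key, and the assumptions that keys are unique within each file and that non-key variable names do not collide are all mine.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def match_by_key(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Merge two datasets case-by-case on `key`.

    Cases found in only one file are kept, with the other file's
    variables set to None (standing in for sysmis). Assumes key values
    are unique within each file and non-key names do not collide.
    """
    # Ordered union of variable names, key first.
    names: Dict[str, None] = {key: None}
    for row in left + right:
        names.update(dict.fromkeys(row))

    merged: Dict[Any, Row] = {}
    for row in left + right:
        target = merged.setdefault(row[key], dict.fromkeys(names))
        target.update(row)
    return [merged[k] for k in sorted(merged)]  # output sorted by key


if __name__ == "__main__":
    demo = [{"id": 2, "state": "Alabama"}, {"id": 1, "state": "Alaska"}]
    al_only = [{"id": 2, "al_income": 48.0}]
    for r in match_by_key(demo, al_only, "id"):
        print(r)
    # {'id': 1, 'state': 'Alaska', 'al_income': None}
    # {'id': 2, 'state': 'Alabama', 'al_income': 48.0}
```

Keeping the join in one small, reusable function is the code analogue of maintaining a standard set of joining syntax: the Alabama-only variables live in their own file and only appear when an analysis asks for them.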
-- Rich Ulrich

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command INFO REFCARD
15K variables is pretty unwieldy. Splitting up the data into separate files can conveniently be done with the STATS SPLIT DATASET extension command (Data > Split into Files), and the STATS PROCESS FILES extension command can iterate a file of syntax over the separate files, producing one or multiple Viewer files. The advantage of this over SPLIT FILE is that you run a whole set of procedures and have the output grouped by file, whereas SPLIT FILE iterates within a single procedure.
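The difference described above (a whole set of procedures run per file with output grouped by file, versus SPLIT FILE repeating one procedure across groups) can be sketched in plain Python. This does not call STATS SPLIT DATASET or STATS PROCESS FILES; `split_by`, `process_groups`, and the toy "procedures" are hypothetical stand-ins.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Procedure = Callable[[List[Row]], Any]


def split_by(rows: List[Row], var: str) -> Dict[Any, List[Row]]:
    """One group ('file') per distinct value of `var`, in first-seen order."""
    groups: Dict[Any, List[Row]] = defaultdict(list)
    for row in rows:
        groups[row[var]].append(row)
    return dict(groups)


def process_groups(groups: Dict[Any, List[Row]],
                   procedures: Dict[str, Procedure]) -> Dict[Any, Dict[str, Any]]:
    """Run the whole set of procedures on each group; results are grouped by file."""
    return {
        value: {name: proc(rows) for name, proc in procedures.items()}
        for value, rows in groups.items()
    }


if __name__ == "__main__":
    rows = [
        {"state": "Alabama", "income": 52.1},
        {"state": "Alaska", "income": None},
        {"state": "Alabama", "income": None},
    ]
    procedures = {
        "n_cases": len,
        "valid_income": lambda rs: sum(r["income"] is not None for r in rs),
    }
    print(process_groups(split_by(rows, "state"), procedures))
    # {'Alabama': {'n_cases': 2, 'valid_income': 1}, 'Alaska': {'n_cases': 1, 'valid_income': 0}}
```

The outer level of the result is the file (state) and the inner level is the procedure; a SPLIT FILE-style loop would nest the other way round, with each procedure's output listing every group.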
In reply to this post by Jon Peck
This worked, and it only took an hour on my large dataset. Thank you so much, Jon!
In reply to this post by Melissa David
I looked at that IBM link and my only comment is OMG YUCK ICKY WTF....
I would likely go with Jon's Python solution. Perhaps you would like to discuss how you came to have 15,000 variables. There is likely a much more reasonable solution than maintaining them all in one file. Why do you think SPSS has the ADD FILES and MATCH FILES commands? I am not going to try to guess the origins of this mess.
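The ADD FILES half of that point (keep per-state files separate, stack their cases only when an analysis needs them) can be sketched in Python. It is a rough analogue, not SPSS's implementation: `add_files` is a hypothetical name, and `None` stands in for the system-missing value a case receives for a variable its own file does not contain.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def add_files(*files: List[Row]) -> List[Row]:
    """Stack the cases of several files into one.

    Every variable seen in any file appears in the result; a case from a
    file that lacks a variable gets None there (standing in for sysmis).
    """
    names: Dict[str, None] = {}  # ordered union of variable names
    for f in files:
        for row in f:
            names.update(dict.fromkeys(row))
    return [{**dict.fromkeys(names), **row} for f in files for row in f]


if __name__ == "__main__":
    alabama = [{"id": 1, "state": "Alabama", "al_income": 52.1}]
    alaska = [{"id": 2, "state": "Alaska", "ak_oil": 3.2}]
    for r in add_files(alabama, alaska):
        print(r)
    # {'id': 1, 'state': 'Alabama', 'al_income': 52.1, 'ak_oil': None}
    # {'id': 2, 'state': 'Alaska', 'al_income': None, 'ak_oil': 3.2}
```

Stacking is exactly what creates the mostly-empty columns seen in the 15K-variable file, which is why doing it only at analysis time, on the few files needed, keeps each stored file small and free of empty variables.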
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Do not give what is holy to the dogs, nor cast your pearls before swine, lest they trample them under their feet." "When the damned possessed the swine, did they go leaping off the bloody cliff into the abyss?"