Hi, how can I delete variables in an SPSS file that contain missing values? I want to do this with the syntax in SPSS. I am using version 20, with no Python or so.
Thanks a lot,
Christian

My SPSS file looks like this:

V1 V2 V3
1  .  4
2  .  .
3  .  3
4  .  2

I want to delete either the variables that contain only missing values (= V2) or the variables that contain any missing value (= V3). |
There are quite a few methods.
If you already know which variables are completely empty, use the DELETE VARIABLES command.

As for deleting variables with only some missing data: why? Most procedures can deal with missing data.

How elaborate a solution is needed depends on several things. How many cases and variables do you have? Is this a one-time thing, or do you need to do it for many data sets? Does it need to be fully automated?

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command INFO REFCARD
Art Kendall
Social Research Consultants |
Hi Art,
Thanks for your reply. I mainly need to delete variables that contain no data, only missing values. But I do not know in advance which variables these are, so it has to happen automatically. Something like this:

DELETE VARIABLES varX TO varY if variable value is missing

but I did not succeed in writing it in SPSS syntax. The file has about 600 variables, and it must be done automatically. |
In reply to this post by Memphisbelle
Dear Christian,
You have a lot of options for this.

1) I'd certainly do it with Python, but if you mean to say you're not willing to install Python, I won't bother to write the syntax for you.
2) A reasonably automated option is to write syntax-generating syntax: you basically [OMS] the descriptives to a new [DATASET], which you manipulate and save as plain text with the extension ".sps". You can then [INSERT] it into your master syntax file.
3) If you find this too complicated as well, you could manually copy-paste your descriptives table into Excel, copy-paste the desired syntax into it, and copy-paste the result back into a syntax window.

BTW, are there any string variables in your data and, if so, where?

If you need any further assistance, please let us know!

HTH,
Ruben

> Date: Wed, 6 Jun 2012 05:04:07 -0700
> From: [hidden email]
> Subject: Deleting Variables with Missings Values
> To: [hidden email]
>
> View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Deleting-Variables-with-Missings-Values-tp5713534.html
> Sent from the SPSSX Discussion mailing list archive at Nabble.com. |
In reply to this post by Memphisbelle
Here is one pre-python approach. YMMV if you have *MANY* cases.
If you have *MANY* cases, then use AGGREGATE to determine the number of missing values per variable and then use RG's approach of writing command syntax for DELETE VARIABLES. You could also use the MATRIX command and get clever.

data list free / a b c d.
begin data
6 3 6 5 6 5 6 3 1 6 3 5 6 1 5 3 6 7 1 5
3 6 1 5 3 6 7 1 5 3 7 1 5 3 6 7 1 5 6 7
3 1 6 3 5 1 2 6 5 3 6 1 5 6 4 5 6 2 5 4
6 1 2 5 6 4 1 2 5 4 6 1 5 6 4 5 6 6 5 2
6 4 5 6 3 6 3 5 3 1 5 2 3 6 7 5 5 5 3 2
1 6 3 5 6 1 2 5 3 1 6 3 4 5 6 1 1 6 3 6 1 1 1 6 4 5 2 3
end data.
* Make c entirely missing for the demonstration.
compute c=$SYSMIS.
* Transpose so that each original variable becomes a case.
FLIP.
* Count valid values per original variable and keep those with any data.
COMPUTE @=$SYSMIS.
COMPUTE @@=NVALID(var001 TO @).
SELECT IF @@ NE 0.
**MATCH FILES / FILE * / DROP @ @@ /* OLD SKOOL *.
DELETE VARIABLES @ @@.
* Transpose back and drop the label variable created by FLIP.
FLIP.
DELETE VARIABLES CASE_LBL.
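For readers who find the double FLIP hard to follow, the idea can be sketched outside SPSS. The following is only a plain-Python illustration, not SPSS code: `None` stands in for system-missing, and the function name `drop_all_missing_by_transpose` and the sample data are made up for this sketch.

```python
def drop_all_missing_by_transpose(names, rows):
    """Transpose, drop variables with no valid values, transpose back."""
    # First "FLIP": each variable becomes one record holding all its values.
    flipped = list(zip(*rows))
    # Equivalent of SELECT IF NVALID(...) NE 0: keep variables with data.
    kept = [(name, values) for name, values in zip(names, flipped)
            if any(v is not None for v in values)]
    kept_names = [name for name, _ in kept]
    # Second "FLIP": back to one record per case.
    kept_rows = [list(case) for case in zip(*(values for _, values in kept))]
    return kept_names, kept_rows


# Sample data modelled on the V1/V2/V3 example from the original question.
names = ["V1", "V2", "V3"]
rows = [[1, None, 4], [2, None, None], [3, None, 3], [4, None, 2]]
kept_names, kept_rows = drop_all_missing_by_transpose(names, rows)
print(kept_names)
```

Here V2 is dropped because it has no valid values, while V3 survives because it is only partly missing.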
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
In reply to this post by Memphisbelle
Ruben mentioned some options.
If this is a one-time thing, especially if you expect only a few variables to be missing for all cases, try something like this:

MEANS TABLES = v001 TO v600 /CELLS = COUNT.

Eyeball the output, then:

DELETE VARIABLES v015 v099 v211 v345.

However, it is still unclear why you would want to get rid of those variables, how often you need to do this (daily, weekly, once), etc.
Art Kendall
Social Research Consultants |
A fairly general solution requiring neither Python nor subject to reasonable sample size limitations also avoids a potentially FUGLY dual data transposition issue [ ie delete my previous post from memory ;-))) ] WTFWIT. OTOH: If you have string variables get rid of them from the active file and merge them back in later!
* Generate test data: 60000 cases by 600 variables.
INPUT PROGRAM.
+ LOOP CaseID=1 TO 60000.
+ DO REPEAT V=V1 TO V600.
+ COMPUTE V=NORMAL(1).
+ END REPEAT.
+ END CASE.
+ END LOOP.
+ END FILE.
END INPUT PROGRAM.
* Set a handful of variables to system-missing for all cases.
DO REPEAT V=v1 v12 v100 v30 v60 v200 v250 v260 v300 v500 v599.
+ COMPUTE V=$SYSMIS.
END REPEAT.
EXE.

* Read the data with system-missing recoded to -999, drop the columns
  whose minimum and maximum are both -999, and save the rest.
PRESERVE.
SET MXLOOPS 60000.
MATRIX.
+ GET DATA / FILE * / VAR ALL/MISSING=ACCEPT/SYSMIS = -999/NAMES=vnames.
+ COMPUTE DMin=CMIN(DATA).
+ COMPUTE DMax=CMAX(DATA).
+ COMPUTE Ncoldata=NCOL(DATA).
+ COMPUTE datavec=MAKE(1,Ncoldata,1).
+ LOOP #=1 TO Ncoldata.
+ DO IF DMin(#)=DMax(#) AND DMin(#)=-999.
+ COMPUTE datavec(#)=0.
+ END IF.
+ END LOOP.
+ COMPUTE NVars=RSUM(datavec).
+ COMPUTE Curvar=1.
+ LOOP #=1 TO Ncoldata.
+ DO IF datavec(#)=1.
+ COMPUTE data(:,Curvar)=Data(:,#).
+ COMPUTE vnames(Curvar)=vnames(#).
+ COMPUTE Curvar=Curvar+1.
+ END IF.
+ END LOOP.
+ COMPUTE vnames=vnames(1:NVars).
+ COMPUTE data=data(:,1:NVars).
SAVE data / OUTFILE * / NAMES=vnames.
END MATRIX.
SHOW WORKSPACE.
RESTORE.
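The MATRIX block above does two things: it flags columns whose minimum and maximum both equal the -999 placeholder, then shifts the surviving columns and names to the left. A minimal plain-Python sketch of that logic follows, for illustration only. The function name and sample values are invented here, and a real variable that is constantly -999 would be dropped too, just as in the MATRIX version.

```python
SENTINEL = -999  # stands in for system-missing, as in SYSMIS = -999


def compact_columns(names, data):
    """Drop columns whose min and max both equal SENTINEL."""
    ncols = len(names)
    # Column minima and maxima, like CMIN(DATA) and CMAX(DATA).
    col_min = [min(row[j] for row in data) for j in range(ncols)]
    col_max = [max(row[j] for row in data) for j in range(ncols)]
    # keep[j] plays the role of datavec: True means the column survives.
    keep = [not (col_min[j] == col_max[j] == SENTINEL) for j in range(ncols)]
    kept_names = [n for n, k in zip(names, keep) if k]
    kept_data = [[v for v, k in zip(row, keep) if k] for row in data]
    return kept_names, kept_data


# Hypothetical three-variable example; V2 is missing for every case.
names = ["V1", "V2", "V3"]
data = [[1, SENTINEL, 4], [2, SENTINEL, SENTINEL], [3, SENTINEL, 3]]
kept_names, kept_data = compact_columns(names, data)
print(kept_names)
```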
|
Dear David (and others),
I didn't get your first solution and it didn't work for me either. The one I meant was:

***Create fake data***.
dataset close all.
set seed 1.
INPUT PROGRAM.
LOOP ID=1 TO 10.
DO REPEAT V=V1 TO V10.
COMPUTE V=rnd(rv.uni(0,5)).
compute #=rv.ber(.1).
if #=1 v=$sysmis.
END REPEAT.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
EXE.
dataset name original.
***OMS descriptives to new dataset***.
DATASET DECLARE des.
OMS /SELECT TABLES
 /IF COMMANDS=['Descriptives'] SUBTYPES=['Descriptive Statistics']
 /DESTINATION FORMAT=SAV NUMBERED=TableNumber_ OUTFILE='des'
 /TAG='des'.
descriptives v1 to v10.
omsend tag=['des'].
dataset activate des.
***Manipulate temp dataset***.
select if N<10 and rtrim(var1)ne"Valid N (listwise)"./*OK, the "10" here must be inserted manually.*/
***Write syntax***.
write outfile'c:/delvars.sps'/'delete variables 'Var1'.'.
exe.
***Apply syntax to ori data***.
dataset close des.
DATASET activate original.
include 'c:/delvars.sps'.
***Delete syntax file***.
erase file 'c:/delvars.sps'.

But this is dumb syntax; IMNSHO this is better done with Python.

HTH,
Ruben
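The OMS approach above boils down to: get the valid N per variable, keep the names whose N falls below a threshold, and write out DELETE VARIABLES syntax to run. As a rough illustration of the selection step only, here is a plain-Python sketch. The function name `build_delete_syntax`, the dictionary of counts, and the `only_empty` switch are assumptions for this example and not part of the posted syntax, which writes one command per variable rather than a single combined one.

```python
def build_delete_syntax(valid_n, n_cases, only_empty=False):
    """Return a DELETE VARIABLES command, or '' if nothing qualifies.

    valid_n maps variable name -> number of valid (non-missing) cases.
    only_empty=True  selects variables with no valid cases at all.
    only_empty=False selects variables with at least one missing case.
    """
    if only_empty:
        targets = [name for name, n in valid_n.items() if n == 0]
    else:
        targets = [name for name, n in valid_n.items() if n < n_cases]
    if not targets:
        return ""
    return "DELETE VARIABLES " + " ".join(targets) + "."


# Hypothetical valid-N counts matching the V1/V2/V3 example (4 cases).
counts = {"V1": 4, "V2": 0, "V3": 3}
print(build_delete_syntax(counts, n_cases=4))
print(build_delete_syntax(counts, n_cases=4, only_empty=True))
```

The two calls correspond to the two variants asked about in the original question: deleting variables with any missing value, or only those that are entirely missing.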
|
I tested my MATRIX program against the fake data generated at the top of that post (N=60000, P=600) and it worked PERFECTLY.
My WORKSPACE was set at 500000 but it could probably be set much lower.
|
In reply to this post by Ruben Geert van den Berg
If you install the Python Essentials and
the spssaux2.py module from the SPSS Community website, this program will
delete all the variables where all the cases are missing (including strings
that are entirely blank).
begin program.
import spss, spssaux2
spssaux2.FindEmptyVars(delete=True)
end program.

I don't know how people come up with empty variables, but this functionality gets requested surprisingly often.

The materials can be found in the Downloads for SPSS Statistics section of the Community site (www.ibm.com/developerworks/spssdevcentral). The spssaux2.py module is in the Python Modules collection and should be copied to a place like the Python site-packages directory.

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621 |
In reply to this post by David Marso
Dear David,
I meant that the first solution (the one with the FLIP and all that) didn't delete the variable it was supposed to when I tried it.

Best,
Ruben |
Hi Ruben. You might have missed it, but David later retracted that solution. I quote:
"[ ie delete my previous post from memory ;-))) ] WTFWIT" ;-)
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
I tested that first solution and it works. Maybe we need to see any error message(s).
The reason for my retraction is that FLIP can be painful if N is large. The MATRIX solution avoids that issue at the cost of a bit of cognitive overhead.
|
It occurred to me that the MATRIX solution is weighed down by using the raw data and is not readily scalable. Also any meta data is wiped (NOT GOOD but could be patched with APPLY DICTIONARY).
Also changed this to eliminate any variables which are CONSTANT across the file. Wonder how this performs relative to Jon's Python function, say with 1000000 cases and 600 variables? It would basically be a showdown between native AGGREGATE and Python cursor.fetchone.
----
Hence:
1. AGGREGATE the data and locate the min/max for all variables.
2. Use the AGGREGATE output in the MATRIX code (blindingly fast).
3. Use ADD FILES to shuffle the dictionary.
4. Nuke the two Min/Max cases.
5. DELETE variables between two marker variables.
-------------
GET FILE 'C:\Temp\RawData.sav'.
COMPUTE @=1.
AGGREGATE OUTFILE 'C:\TEMP\MinMax.sav'
 / BREAK @NoBreak
 / CaseID V1 TO v600=MIN(CaseID V1 TO V600)
 / @=MAX(@)
 / XCaseID MX_V1 TO MX_V600=MAX(CaseID V1 TO V600) .
PRESERVE.
SET MXLOOPS 60000.
MATRIX.
+ GET DATA / FILE 'C:\TEMP\MinMax.sav' / VAR ALL/ MISSING=ACCEPT/ SYSMIS = -999 / NAMES=vnames.
+ COMPUTE Data=RESHAPE(Data,2,NCOL(DATA)/2).
+ COMPUTE DMinMax=CMAX(DATA)-CMIN(DATA).
+ COMPUTE Ncoldata=NCOL(DATA).
+ COMPUTE datavec=MAKE(1,Ncoldata,1).
+ LOOP #=1 TO Ncoldata.
+ DO IF DMinMax(#)=0.
+ COMPUTE datavec(#)=0.
+ END IF.
+ END LOOP.
+ COMPUTE NVars=RSUM(datavec).
+ COMPUTE Curvar=1.
+ LOOP #=1 TO Ncoldata.
+ DO IF datavec(#)=1.
+ COMPUTE data(:,Curvar)=Data(:,#).
+ COMPUTE vnames(Curvar)=vnames(#).
+ COMPUTE Curvar=Curvar+1.
+ END IF.
+ END LOOP.
+ COMPUTE vnames={vnames(1:NVars),'@@'}.
+ COMPUTE data={data(:,1:NVars),{0;0}}.
SAVE data / OUTFILE 'C:\TEMP\Reordered.sav' / NAMES=vnames.
END MATRIX.
RESTORE.
ADD FILES FILE 'C:\TEMP\Reordered.sav'/FILE * .
SELECT IF SYSMIS(@@).
MATCH FILES / FILE * /DROP @NoBreak @@ TO @.
EXE.
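The key idea in this version is to reduce the file to one minimum and one maximum per variable first, so that the constant check works on two summary rows instead of all cases. A hedged plain-Python sketch of that idea follows. `None` stands in for system-missing, and the function names are invented for the illustration; this is not a translation of the AGGREGATE and MATRIX code.

```python
def column_min_max(rows, ncols):
    """One pass over the cases; returns per-column (min, max) lists.

    Missing values (None) are skipped, so an all-missing column keeps
    (None, None), which the check below treats as constant.
    """
    mins = [None] * ncols
    maxs = [None] * ncols
    for row in rows:
        for j, value in enumerate(row):
            if value is None:
                continue
            if mins[j] is None or value < mins[j]:
                mins[j] = value
            if maxs[j] is None or value > maxs[j]:
                maxs[j] = value
    return mins, maxs


def non_constant(names, mins, maxs):
    """Names of variables whose minimum and maximum differ."""
    return [n for n, lo, hi in zip(names, mins, maxs) if lo != hi]


# Hypothetical data: V2 is entirely missing, V4 is constant.
names = ["V1", "V2", "V3", "V4"]
rows = [[1, None, 4, 7], [2, None, None, 7], [3, None, 3, 7]]
mins, maxs = column_min_max(rows, len(names))
print(non_constant(names, mins, maxs))
```

Because the first function only needs one pass and keeps two values per variable, its memory use does not grow with the number of cases, which is the point of aggregating before the check.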
|
In reply to this post by David Marso
Thanks to all for the help!
|
In reply to this post by Jon K Peck
Hi Jon,
Thank you so much for the code. It really helps me a lot! My data set has thousands of variables, and loading the data into a Python list took too long (more than 5 minutes). Doing it this way, it now takes only a few seconds.

Your support is greatly appreciated. |
Glad this helped.

Since my earlier message is very old, I should clarify that the spssaux2 module is now installed as part of the Python Essentials and does not need to be set up separately. It has a lot of other useful functions besides FindEmptyVars.

On Wed, Jul 25, 2018 at 9:31 PM Alice <[hidden email]> wrote:
> Hi Jon, |
Hi Jon,
Thank you for your clarification. I'm using SPSS 24, and the spssaux2 module is already in the Python site-packages directory. It also has a lot of useful functions, as you said.

However, when I run the FindEmptyVars function on a data set with more than 21000 variables, SPSS spends a lot of time (more than 15 minutes) processing the AGGREGATE command. So I tried another solution: I divide the data set into groups of variables (around 2000 variables each) before running the function, and that works really fast.

I have also tried the SPSSINC SELECT VARIABLES command below, but I am having trouble working out how to loop through each item in the resulting macro and CONCAT all their values. I could do this with a little bit of Python code as well, but I prefer SPSS syntax, since my team members without an IT background can understand it easily. Do you have any idea about this?

*SPSS:*

SPSSINC SELECT VARIABLES MACRONAME="!test"
 /PROPERTIES PATTERN = "Qad_\d{5}_Resp\d{2}"
 /OPTIONS ORDER=FILE REVERSE=NO PRINT=YES SEPARATOR=" ".
FREQUENCIES !test.

*Python:*

BEGIN PROGRAM.
import spss, re
varlist = []
for ind in range(spss.GetVariableCount()):
    varName = spss.GetVariableName(ind)
    if re.match(r"Qad_\d{5}_Resp\d{2}", varName):
        varlist.append(varName)
# Concat each value in the varlist here...
END PROGRAM. |
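The batching workaround described above, processing roughly 2000 variables at a time, can be sketched in plain Python. This only shows how a long list of names might be split. The helper name `chunked`, the generated variable names, and the idea of feeding each batch to a separate step are assumptions for illustration.

```python
def chunked(names, size):
    """Yield consecutive slices of names, each with at most size items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(names), size):
        yield names[start:start + size]


# Hypothetical variable names; a real list would come from the data set.
all_names = ["V%d" % i for i in range(1, 21001)]
batches = list(chunked(all_names, 2000))
print(len(batches), len(batches[0]), len(batches[-1]))
```

With 21000 names and a batch size of 2000, this gives ten full batches and one final batch of 1000.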
In the macro generated by SELECT VARIABLES, the variables are already concatenated with a blank separator (or another separator of your choice), as you can see from your FREQUENCIES command. So, if you specify a comma as the separator, you can just write a COMPUTE with CONCAT(!test), assuming that the variables are all strings and that you declare the result with the STRING command.

On Fri, Jul 27, 2018 at 9:55 PM Alice <[hidden email]> wrote:
> Hi Jon,
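To illustrate the suggestion, here is a small plain-Python sketch that filters names with the same pattern and builds the STRING and COMPUTE commands as text. The variable names, the target name `combined`, and the width of 200 are made up for the example. The generated text would still need to be run in SPSS, and it assumes all selected variables are strings.

```python
import re

# Same pattern as in the question; re.match anchors at the start only.
PATTERN = re.compile(r"Qad_\d{5}_Resp\d{2}")


def build_concat_syntax(names, target="combined", width=200):
    """Return STRING + COMPUTE syntax concatenating the matching names."""
    selected = [n for n in names if PATTERN.match(n)]
    if not selected:
        return ""
    return ("STRING %s (A%d).\n" % (target, width)
            + "COMPUTE %s = CONCAT(%s).\n" % (target, ", ".join(selected)))


# Hypothetical variable names for demonstration.
names = ["ID", "Qad_00001_Resp01", "Qad_00001_Resp02", "Qad_1_Resp1"]
print(build_concat_syntax(names))
```

The comma-joined list inside CONCAT(...) plays the same role as the macro expansion with a comma separator described in the reply.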