Dear all,
I merged 9 data files with ADD FILES. To make sure that the variable labels are identical across the 9 files, I made a table with one column of variable names and one column of variable labels for each of the 9 files (so 10 string variables in total). Since the original files had some 70 variables in common, my 'variable label table' has some 70 rows. Ideally, all variable labels should be identical, but on visual inspection I've already spotted some slight differences.

What I had in mind was to count the number of distinct values within each 'respondent' (that is, each row) across my 9 string variables, in order to identify the variables whose labels differ between files. I thought about FLIPping the data and using OMS and FREQUENCIES, but I think FLIP doesn't work with strings.

Does anybody have an idea whether/how this is possible? I have Python installed but virtually no experience with it.

Thanks a lot!
Ruben van den Berg
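Outside SPSS, the row-wise count Ruben describes is short. The sketch below is plain Python (not the SPSS plug-in API) and assumes the label table has already been read into a list of rows, each holding the variable name followed by its label in every file; `label_table` and the helper names are hypothetical, and the data are made up (3 files instead of 9).

```python
# Hypothetical "variable label table" (made-up labels, 3 files instead of 9):
# column 0 is the variable name, columns 1..n hold its label in each file.
label_table = [
    ("age", "Age in years", "Age in years", "Age in years"),
    ("gender", "Gender", "Gender ", "Sex"),
    ("income", "Net income", "Net income", "Net income"),
]


def count_distinct_labels(table):
    """Map each variable name to the number of distinct labels in its row."""
    return dict((row[0], len(set(row[1:]))) for row in table)


def inconsistent_variables(table):
    """Sorted names of variables whose labels are not identical in every file."""
    counts = count_distinct_labels(table)
    return sorted(name for name, n in counts.items() if n > 1)


if __name__ == "__main__":
    for name, n in sorted(count_distinct_labels(label_table).items()):
        print("%s: %d distinct label(s)" % (name, n))
    print("Labels differ for: %s" % ", ".join(inconsistent_variables(label_table)))
```

The comparison is exact, so "Gender" and "Gender " (trailing blank) count as two different labels; any row with a count above 1 needs a closer look.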
The easiest way to get a table of variable labels across files would be to use the GATHERMD extension command. You give it a file specification, and it reads all the matching files and collects the variable names and labels. (The original motivation was to catalog a large number of datasets.) From that, you could simply run FREQUENCIES on the label column after filtering on the set of variable names of interest.

This extension command will work with V17 or 18 and probably with V16, too. Of course, it requires the Python plug-in and the extension command itself, both of which can be downloaded from SPSS Developer Central, www.spss.com/DevCentral.

HTH,
Jon Peck
SPSS, an IBM Company
312-651-3435
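The "filter, then FREQUENCIES on the label column" idea can be illustrated in plain Python. This is a sketch under an assumption: that the collected metadata has been turned into one (file, variable name, label) record per variable per file. The actual layout and column names of GATHERMD's output may differ, and `records` and `label_frequencies` are hypothetical names.

```python
from collections import Counter, defaultdict

# Hypothetical long-format metadata: one (file, variable name, label) record
# per variable per file. The actual layout of GATHERMD's output may differ.
records = [
    ("file1.sav", "age", "Age in years"),
    ("file2.sav", "age", "Age in years"),
    ("file1.sav", "gender", "Gender"),
    ("file2.sav", "gender", "Sex"),
    ("file2.sav", "weight", "Weight (kg)"),
]


def label_frequencies(records, names_of_interest):
    """Per variable name of interest, count how often each label occurs."""
    wanted = set(names_of_interest)
    freqs = defaultdict(Counter)
    for _file, name, label in records:
        if name in wanted:  # the "filter by variable name" step
            freqs[name][label] += 1
    return freqs


if __name__ == "__main__":
    freqs = label_frequencies(records, ["age", "gender"])
    for name in sorted(freqs):
        flag = "" if len(freqs[name]) == 1 else "  <-- labels differ"
        print("%s: %s%s" % (name, dict(freqs[name]), flag))
```

A variable whose frequency table has more than one entry is one whose label differs between files.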
Dear Jon,
GATHERMD is lovely and I'll surely use it more often. The ability to get an overview of all SPSS files in a single folder is especially great! But honestly, it didn't really solve the problem I posted. The easiest (but inelegant) solution was to save the entire file as XLS, transpose it in Excel, and reopen it in SPSS (essentially FLIP with string variables). After that, I could use OMS -> FREQUENCIES -> AGGREGATE -> MATCH FILES to add the number of distinct string values to the original table. I realized that the structure of the original table (variable name and 9 labels on a single row) made comparing the labels much easier.

Kind regards!
Ruben van den Berg

Date: Thu, 29 Oct 2009 08:33:44 -0600
Subject: Re: Check whether 9 string variables are identical over some 70 "respondents"
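Transposing a string table is no problem outside SPSS. The following plain-Python sketch mirrors the workaround described above (flip, count distinct values per column, merge the counts back by variable name) on made-up data; `original`, `transpose`, `distinct_counts_via_flip` and `match_counts` are hypothetical names, not part of any SPSS API.

```python
# Hypothetical label table (made-up data, 3 files instead of 9):
# one row per variable, name first, then its label in each file.
original = [
    ["age", "Age in years", "Age in years", "Age in years"],
    ["gender", "Gender", "Sex", "Gender"],
]


def transpose(rows):
    """FLIP for strings: rows become columns. Assumes equal-length rows,
    because zip() silently truncates to the shortest row."""
    return [list(column) for column in zip(*rows)]


def distinct_counts_via_flip(table):
    """Flip the table, then count distinct values down each column
    (the FREQUENCIES/AGGREGATE step), keyed by variable name."""
    flipped = transpose(table)
    names, label_rows = flipped[0], flipped[1:]
    return dict((name, len(set(row[j] for row in label_rows)))
                for j, name in enumerate(names))


def match_counts(table, counts):
    """Attach each variable's count to its row (the MATCH FILES step)."""
    return [row + [counts[row[0]]] for row in table]


if __name__ == "__main__":
    for row in match_counts(original, distinct_counts_via_flip(original)):
        print(row)
```

The flip is only needed because FREQUENCIES works down columns; the same counts could be obtained by taking the distinct values of each original row directly.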
In reply to this post by Jon K Peck
Hi,
Just for fun, and in case you have a version < 16, I created the code below. It loops over all .sav files in a given directory and, for each specified variable, returns a list of its unique variable labels as well as the number of unique labels. It's case-sensitive, and it will even nag about differences in leading and trailing blanks.

* Sample code to generate some test files.
begin program.
import spss, random
for fileno in range(20):
    suffix1, suffix2 = random.randint(0, 20), random.randint(20, 40)
    spss.Submit("""
data list free / respondent (a5) somevar (a10).
begin data
'blah' 'qwerty'
end data.
variable label respondent 'mylabel %02d' / somevar 'somelabel %02d'.
save outfile = 'd:/temp2/file_%02d.sav'.
new file.
""" % (suffix1, suffix2, fileno))
end program.

* Actual code.
begin program.
import os, spss, spssaux

def func(var, path):
    # Collect the label of variable 'var' from every .sav file in 'path'.
    savs = [os.path.join(path, sav) for sav in os.listdir(path)
            if sav.lower().endswith(".sav")]
    labels = []
    varname = var.upper()  # fallback in case no file contains the variable
    for sav in sorted(savs):
        spssaux.OpenDataFile(sav)
        for v in spssaux.VariableDict(var):
            labels.append(v.VariableLabel)
            varname = v.VariableName.upper()
    # Exact comparison: case and leading/trailing blanks count as differences.
    print varname, "- there are", len(set(labels)), "unique variable labels out of a total of", len(labels), ":"
    for unique_label in sorted(set(labels)):
        print "\t" + unique_label

def checkvars(vars_to_be_checked, path="d:/temp"):
    for var in vars_to_be_checked:
        func(var, path)
        print "\n" + 70 * "*"

checkvars(vars_to_be_checked=["respondent", "somevar"], path="d:/temp2")
end program.

Cheers!!
Albert-Jan
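The comparison logic in the script above does not itself depend on SPSS. Here is a plain-Python sketch of that reporting step, assuming the labels for one variable have already been collected into a list. It also adds a normalized count (ignoring case and leading/trailing blanks), an extension not in the original script, to help separate trivial differences from real ones. The helper names and sample labels are hypothetical.

```python
def summarize_labels(varname, labels):
    """Report lines in the style of the script above: exact comparison,
    so case and leading/trailing blanks count as differences."""
    unique = sorted(set(labels))
    lines = ["%s - there are %d unique variable labels out of a total of %d:"
             % (varname.upper(), len(unique), len(labels))]
    lines.extend("\t" + label for label in unique)
    return lines


def normalized_distinct(labels):
    """Count distinct labels after ignoring case and leading/trailing blanks."""
    return len(set(label.strip().lower() for label in labels))


if __name__ == "__main__":
    # Hypothetical labels for one variable, collected from three files.
    sample = ["Age in years", "age in years ", "Age in years"]
    print("\n".join(summarize_labels("age", sample)))
    print("distinct after normalizing: %d" % normalized_distinct(sample))
```

If the exact count is above 1 but the normalized count is 1, the labels differ only in case or surrounding blanks.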
=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command INFO REFCARD
Could we use some data visualization technique, since there are only some 70 rows?
Hi,
Where do I get the patch for SPSS 17 (the one that fixed the recoding problem)?

Thanks,
Sandy

Sandra T. Sigmon, Ph.D.
Professor, Department of Psychology
Senior Scientist, Maine Institute of Human Genetics & Health
376 Little Hall, University of Maine, Orono, ME 04469
phone: 207-581-2049, fax: 207-581-6128
Hi Sandra,
The 17.0.2 patch resolved the problem with limits in the Recode into Same Variables dialog. You can find it on the Support web site at http://support.spss.com. Once you enter the site, click the Statistics link on the left side of the page, then click the Patches link that appears under Statistics. You will see links for the 17.0.2 and earlier patches as well as for the 17.0.3 patch.

The Resolution below my signature is also available on the Support web site. Click the Knowledgebase Search link on the main page of the site; you can find this resolution by entering "patch" and "recode" (quotes not necessary) in the search terms box.

David Matheson
Statistical Support
SPSS, an IBM company

********************
Resolution number: 81717
Created on: Jan 26 2009
Last Reviewed on: Aug 7 2009

Problem Subject: Problem with Transform and the Recode into Same Variables selection

Problem Description: In cleaning a data file, I wanted to recode values that had been partially entered in lower case into upper case only, but found that the procedure "Recode into Same Variables" has a bug: only 6 changed values can be added. The window should then show scroll bars on the right, but it doesn't. It works perfectly in V16, and also in V17 when using "Recode into Different Variables". How can I get around this?

Resolution Subject: Issue resolved in 17.0.2 patch.

Resolution Description: This problem has been corrected in the version 17.0.2 patch. Please visit support.spss.com, register, and log in to download patches. We apologize for any inconvenience.

-----Original Message-----
From: SPSSX(r) Discussion On Behalf Of Sandra Sigmon
Sent: Thursday, October 29, 2009 12:09 PM
Subject: need help with spss 17 for mac
In reply to this post by Ruben Geert van den Berg
Hi Ruben. If I understand correctly, something like this might work for you.

DATASET DECLARE vinfo.
OMS
  /SELECT TABLES
  /IF COMMANDS=['Sysfile Info'] SUBTYPES=['Variable Information']
  /DESTINATION FORMAT=SAV NUMBERED=FileNo OUTFILE='vinfo'.
SYSFILE INFO 'C:\MyFolder\file1.sav'.
SYSFILE INFO 'C:\MyFolder\file2.sav'.
* etc.
SYSFILE INFO 'C:\MyFolder\file9.sav'.
OMSEND.

dataset activate vinfo window = front.

* Keep only the needed variables in VINFO.
* For now, I assume those are FileNo, Var1 and Label.
match files file = * / keep = FileNo Var1 Label.
exe.

* Now restructure to move all variable labels onto a single row.
sort cases by Var1 FileNo.
CASESTOVARS
  /ID=Var1
  /INDEX=FileNo
  /GROUPBY=VARIABLE
  /AUTOFIX = NO.

If you set AUTOFIX to YES (the default) and every variable happens to have exactly the same label in all files, you'll end up with a single LABEL variable rather than LABEL.1, LABEL.2, ... LABEL.9. With AUTOFIX = NO you always get the separate columns for comparison.
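To see what the restructure does, here is a plain-Python sketch of the same long-to-wide pivot on made-up rows shaped like the (FileNo, Var1, Label) data kept above. It is an illustration, not SPSS's implementation; `vinfo`, `cases_to_vars` and `label_is_constant` are hypothetical names, and the AUTOFIX check reflects my reading of the documented behavior (a variable that never varies within an ID group is treated as fixed).

```python
# Hypothetical long-format rows as OMS might write them: (FileNo, Var1, Label).
vinfo = [
    (2, "gender", "Sex"),
    (1, "age", "Age in years"),
    (1, "gender", "Gender"),
    (2, "age", "Age in years"),
]


def cases_to_vars(rows):
    """One entry per Var1 with Label.1, Label.2, ... as separate fields,
    roughly what CASESTOVARS /ID=Var1 /INDEX=FileNo /AUTOFIX=NO produces."""
    wide = {}
    # Same order as SORT CASES BY Var1 FileNo.
    for file_no, var1, label in sorted(rows, key=lambda r: (r[1], r[0])):
        wide.setdefault(var1, {})["Label.%d" % file_no] = label
    return wide


def label_is_constant(wide):
    """True if Label never varies within a Var1 group, i.e. the situation
    in which AUTOFIX=YES would keep a single Label variable."""
    return all(len(set(fields.values())) == 1 for fields in wide.values())


if __name__ == "__main__":
    wide = cases_to_vars(vinfo)
    for var1 in sorted(wide):
        print("%s %s" % (var1, sorted(wide[var1].items())))
    print("single Label variable under AUTOFIX=YES: %s" % label_is_constant(wide))
```

A variable missing from one file simply has no Label.k entry here; in the SPSS result that cell would be blank.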
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/
"When all else fails, RTFM."
