We'd like to read data stored in an Access database where the
descriptive labels on categorical variables are stored in separate tables. The ODBC import will pull the fields containing the labels in as additional variables and we'd like to get them to be used as value labels instead. Is there a way to assign the values of one variable (a string variable) as the value labels for another variable, i.e., the equivalent of the labmask function available as part of the labutil package in Stata? In Stata: labmask targetvar, values(stringvar) Or otherwise get SPSS to recognize the fields as value labels on import? Any tips/tricks would be greatly appreciated! ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Administrator
|
If it is not important to retain the original numeric values of the numeric variable, then you could just use the string variable containing the labels in an AUTORECODE. E.g.,
AUTORECODE VARIABLES= LabelVar /INTO=NewVar . NewVar will be a numeric variable with value labels equal to the values of LabelVar. From the FM re the DESCENDING sub-command: By default, values for the source variable are recoded in ascending order (from lowest to highest). DESCENDING assigns the values to new variables in descending order (from highest to lowest). The largest value is assigned 1, the second-largest, 2, and so on. HTH.
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
Bruce's solution is very simple and efficient.
If you are going to draw the data repeatedly and the set of values
actually in the data may vary AND you need to keep the autorecode output
the same as far as preexisting values go, use the TEMPLATE feature of AUTORECODE
to maintain consistent recoding.
If you need to keep the original values, a little more work is required. If just a few variables are involved, you could do an aggregated query for each, producing a separate table of values and labels for each. Then a little bit of Python code would do the job. If there are a lot of variables, that would be clumsy, and the whole task could be better done in a somewhat more complicated Python job. Python, of course, would require that you install the (free) Python Essentials if you don't already have that. Jon Peck (no "h") aka Kim Senior Software Engineer, IBM [hidden email] phone: 720-342-5621 From: Bruce Weaver <[hidden email]> To: [hidden email], Date: 01/06/2014 04:05 PM Subject: Re: [SPSSX-L] value labels from string vars Sent by: "SPSSX(r) Discussion" <[hidden email]> If it is not important to retain the original numeric values of the numeric variable, then you could just use the string variable containing the labels in an AUTORECODE. E.g., AUTORECODE VARIABLES= LabelVar /INTO=NewVar . NewVar will be a numeric variable with value labels equal to the values of LabelVar. From the FM re the DESCENDING sub-command: By default, values for the source variable are recoded in ascending order (from lowest to highest). DESCENDING assigns the values to new variables in descending order (from highest to lowest). The largest value is assigned 1, the second-largest, 2, and so on. HTH. Elizabeth Tighe wrote > We'd like to read data stored in an Access database where the > descriptive labels on categorical variables are stored in separate > tables. The ODBC import will pull the fields containing the labels in > as additional variables and we'd like to get them to be used as value > labels instead. > > Is there a way to assign the values of one variable (a string variable) > as the value labels for another variable, i.e., the equivalent of the > labmask function available as part of the labutil package in Stata? > > In Stata: > > labmask targetvar, values(stringvar) > > Or otherwise get SPSS to recognize the fields as value labels on import? > > > Any tips/tricks would be greatly appreciated! > > ===================== > To manage your subscription to SPSSX-L, send a message to > LISTSERV@.UGA > (not to SPSSX-L), with no body text except the > command. To leave the list, send the command > SIGNOFF SPSSX-L > For a list of commands to manage subscriptions, send the command > INFO REFCARD ----- -- Bruce Weaver [hidden email] http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." NOTE: My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. -- View this message in context: http://spssx-discussion.1045642.n5.nabble.com/value-labels-from-string-vars-tp5723809p5723810.html Sent from the SPSSX Discussion mailing list archive at Nabble.com. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
We have many variables and we do need to maintain the original numeric
values. We'll just transfer to Stata and use the labmask function there ... was hoping to save this step and stick with SPSS if it was do-able in SPSS directly. many thanks, Liz On 1/6/2014 6:14 PM, Jon K Peck wrote: > Bruce's solution is very simple and efficient. If you are going to draw > the data repeatedly and the set of values actually in the data may vary > AND you need to keep the autorecode output the same as far as > preexisting values go, use the TEMPLATE feature of AUTORECODE to > maintain consistent recoding. > > If you need to keep the original values, a little more work is required. > If just a few variables are involved, you could do an aggregated query > for each, producing a separate table of values and labels for each. > Then a little bit of Python code would do the job. If there are a lot > of variables, that would be clumsy, and the whole task could be better > done in a somewhat more complicated Python job. Python, of > course, would require that you install the (free) Python Essentials if > you don't already have that. > > > Jon Peck (no "h") aka Kim > Senior Software Engineer, IBM > [hidden email] > phone: 720-342-5621 > > > > > From: Bruce Weaver <[hidden email]> > To: [hidden email], > Date: 01/06/2014 04:05 PM > Subject: Re: [SPSSX-L] value labels from string vars > Sent by: "SPSSX(r) Discussion" <[hidden email]> > ------------------------------------------------------------------------ > > > > If it is not important to retain the original numeric values of the numeric > variable, then you could just use the string variable containing the labels > in an AUTORECODE. E.g., > > AUTORECODE VARIABLES= LabelVar /INTO=NewVar . > > NewVar will be a numeric variable with value labels equal to the values of > LabelVar. > > From the FM re the DESCENDING sub-command: > > By default, values for the source variable are recoded in ascending order > (from lowest to highest). > DESCENDING assigns the values to new variables in descending order (from > highest to lowest). > The largest value is assigned 1, the second-largest, 2, and so on. > > HTH. > > > > Elizabeth Tighe wrote > > We'd like to read data stored in an Access database where the > > descriptive labels on categorical variables are stored in separate > > tables. The ODBC import will pull the fields containing the labels in > > as additional variables and we'd like to get them to be used as value > > labels instead. > > > > Is there a way to assign the values of one variable (a string variable) > > as the value labels for another variable, i.e., the equivalent of the > > labmask function available as part of the labutil package in Stata? > > > > In Stata: > > > > labmask targetvar, values(stringvar) > > > > Or otherwise get SPSS to recognize the fields as value labels on import? > > > > > > Any tips/tricks would be greatly appreciated! > > > > ===================== > > To manage your subscription to SPSSX-L, send a message to > > > LISTSERV@.UGA > > > (not to SPSSX-L), with no body text except the > > command. To leave the list, send the command > > SIGNOFF SPSSX-L > > For a list of commands to manage subscriptions, send the command > > INFO REFCARD > > > > > > ----- > -- > Bruce Weaver > [hidden email] > http://sites.google.com/a/lakeheadu.ca/bweaver/ > > "When all else fails, RTFM." > > NOTE: My Hotmail account is not monitored regularly. > To send me an e-mail, please use the address shown above. > > -- > View this message in context: > http://spssx-discussion.1045642.n5.nabble.com/value-labels-from-string-vars-tp5723809p5723810.html > Sent from the SPSSX Discussion mailing list archive at Nabble.com. > > ===================== > To manage your subscription to SPSSX-L, send a message to > [hidden email] (not to SPSSX-L), with no body text except the > command. To leave the list, send the command > SIGNOFF SPSSX-L > For a list of commands to manage subscriptions, send the command > INFO REFCARD > > ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
In reply to this post by Elizabeth Tighe
At 05:46 PM 1/6/2014, Elizabeth Tighe wrote:
>We'd like to read data stored in an Access database where the >descriptive labels on categorical variables are stored in separate >tables. The ODBC import will pull the fields containing the labels >in as additional variables and we'd like to get them to be used as >value labels instead. > >Is there a way to get SPSS to recognize the fields as value labels on import? Here's another approach, maybe practicable, maybe not. It depends on how much control you have over your ODBC import. It sounds like the SQL for the ODBC joins the separate tables with the descriptive labels to the data table, and then imports the result; and then, in effect, you want to UNDO the join, and have a lookup table of descriptive values again. (A set of value labels is, in effect, such a table.) I'd look into, . Importing the data *without* joining to the tables with labels. Then, you'll have the numeric values of the categorical variables; you won't have the extra variables; and you won't, at that point, have any labels for the categories. . Separately, importing each of the tables with the descriptive labels. I presume each of these has two fields: a numeric one with the numeric value, and a character one with the corresponding descriptive label. You'll get, then, for each a corresponding SPSS file with the same variables: numeric value, and string with the descriptive label. . Converting the imported tables to VALUE LABELS statements. That can be done a number of ways; my own preference would be to use native SPSS code to create the VALUE LABELS statements, write those to an external file, and INSERT them in a syntax program. Now, you later write, "We have many variables and we do need to maintain the original numeric values." The approach I'm outlining maintains the numeric variables; but it does take some work for each set of value labels. One question is how many *different* sets of value labels you have; basically, you create the VALUE LABELS statement once for each such set, and on it name all the variables to which it applies. You'll have to decide how this compares in difficulty to the SPSS -> STATA -> SPSS route. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
In reply to this post by Elizabeth Tighe
At 05:46 PM 1/6/2014, Elizabeth Tighe wrote:
>We'd like to read data [from] an Access database where the >descriptive labels on categorical variables are stored in separate >tables. The ODBC import will pull the fields containing the labels >in as additional variables and we'd like to get them to be used as >value labels instead. The thread and list have moved on, and presumably the original poster either does, or doesn't, like any of the solutions proposed. However, here is a solution in native SPSS that, I believe, does what was originally requested. Admittedly, it took some coding. To use: 1. Specify a file handle for file 'Code', usually a temporary location 2. On the VARSTOCASES statement, . Change "/KEEP = CaseID" to name whatever case identifier is in your data -- or, omit this clause altogether; the case identifier isn't actually used. . On "/MAKE Value", name the variables to which you want to assign value labels; on "/MAKE Label", name the corresponding variables containing the labels. 3. If desired, after running drop the label variables. |-----------------------------|---------------------------| |Output Created |16-JAN-2014 12:33:52 | |-----------------------------|---------------------------| [TestData] CaseID LatinVal LatinLbl GreekVal GreekLbl HebrwVal HebrwLbl 1001 1 A 1 Alpha 1 Aleph 1002 1 A 2 Beta 1 Aleph 1003 3 C 2 Beta 1 Aleph 1004 3 C 1 Alpha 1 Aleph 1005 3 C 1 Alpha 2 Beth 1006 1 A 1 Alpha 3 Gimel 1007 3 C 1 Alpha 2 Beth 1008 1 A 2 Beta 2 Beth 1009 3 C 1 Alpha 1 Aleph 1010 2 B 3 Gamma 3 Gimel Number of cases read: 10 Number of cases listed: 10 * I. Unroll, putting each code-label pair in a separate ... . * record ... . DATASET ACTIVATE TestData WINDOW=FRONT. DATASET COPY Summary. DATASET ACTIVATE Summary WINDOW=Front. * ........................................................... ... . * To use this for any other dataset, you should only have to ... . * change the lists of variables on the two MAKE lists, ... . * and change (or delete) the KEEP clause. ... . * ........................................................... ... . VARSTOCASES /MAKE Value FROM LatinVal GreekVal HebrwVal /MAKE Label FROM LatinLbl GreekLbl HebrwLbl /INDEX = Variable(Value) /KEEP = CaseID /NULL = DROP. Variables to Cases |-----------------------------|---------------------------| |Output Created |16-JAN-2014 12:33:53 | |-----------------------------|---------------------------| | (Notes omitted) | |-----------------------------|---------------------------| * II. Summarize, to a single record for each ... . * variable-value-label triplet found in the data ... . AGGREGATE OUTFILE=* /BREAK =Variable Value Label /N# 'Instances of this pair' = NU. LIST. List |-----------------------------|---------------------------| |Output Created |16-JAN-2014 12:33:53 | |-----------------------------|---------------------------| Variable Value Label N# GreekVal 1 Alpha 6 GreekVal 2 Beta 3 GreekVal 3 Gamma 1 HebrwVal 1 Aleph 5 HebrwVal 2 Beth 3 HebrwVal 3 Gimel 2 LatinVal 1 A 4 LatinVal 2 B 1 LatinVal 3 C 5 Number of cases read: 9 Number of cases listed: 9 * III. Generate VALUE LABEL statements ... . ADD FILES /FILE=* /BY Variable /FIRST=VarStart LAST =VarEnd. * III.A. Scratch variables: A literal quotation mark, and ... . * a buffer in which to build the labels ... . STRING #Qt (A1) #Label (A50). COMPUTE #Qt = '"'. * III.B. Write the statement beginnings, naming the variables ... . DO IF VarStart. . WRITE OUTFILE=Code /'VALUE LABELS ' Variable . END IF. * III.C. Write the label specifications ... . COMPUTE #Label = CONCAT(#Qt,RTRIM(Label),#Qt). IF Varend #Label = CONCAT(RTRIM(#Label),'.'). . WRITE OUTFILE=Code /' ' Value (F3) ' ' #Label *. * III.D. Force a data pass, to write out the code ... . EXECUTE. * IV. Apply the labels to the original file ... . DATASET ACTIVATE TestData WINDOW=FRONT. INSERT FILE=Code. 162 VALUE LABELS GreekVal 163 1 "Alpha" 164 2 "Beta" 165 3 "Gamma". 166 VALUE LABELS HebrwVal 167 1 "Aleph" 168 2 "Beth" 169 3 "Gimel". 170 VALUE LABELS LatinVal 171 1 "A" 172 2 "B" 173 3 "C". 174 175 * End of INSERT and INCLUDE nesting level 01. ============================= APPENDIX: Test data, and code ============================= * C:\Documents and Settings\Richard\My Documents . * \Technical\spssx-l\Z-2014\ . * 2014-01-06 Tighe-value labels from string vars.SPS . * In response to posting . * Date: Mon, 6 Jan 2014 17:46:20 -0500 . * From: Elizabeth Tighe <[hidden email]> . * Subject: value labels from string vars . * To: [hidden email] . * ................................................................. . * "We'd like to read data stored in an Access database where the . * descriptive labels on categorical variables are stored in . * separate tables. The ODBC import will pull the fields containing . * the labels in as additional variables and we'd like to get them . * to be used as value labels instead. . * . * "Is there a way to assign the values of one variable (a string . * variable) as the value labels for another variable, i.e., the . * equivalent of the labmask function available as part of the . * labutil package in Stata?" . * ................................................................. . * 0.a External file for generated code ... . FILE HANDLE Code /NAME='C:\Documents and Settings\Richard\My Documents' + '\Temporary\SPSS\' + '2014-01-06 Tighe-value labels from string vars' + ' - ' + 'CODE.SPS'. * 0.b Test data ... . SET RNG =MT. SET MTINDEX=4833. NEW FILE. INPUT PROGRAM. NUMERIC CaseID (F5). NUMERIC LatinVal (F3). STRING LatinLbl (A1). NUMERIC GreekVal (F3). STRING GreekLbl (A5). NUMERIC HebrwVal (F3). STRING HebrwLbl (A6). LOOP CaseID = 1001 TO 1010. . DO REPEAT Value = LatinVal GreekVal HebrwVal. . COMPUTE Value = RV.POISSON(1.0) + 1. . RECODE Value (3 THRU HI = 3). . END REPEAT. . RECODE LatinVal (1 = 'A') (2 = 'B') (3 = 'C') INTO LatinLbl. . RECODE GreekVal (1 = 'Alpha') (2 = 'Beta') (3 = 'Gamma') INTO GreekLbl. . RECODE HebrwVal (1 = 'Aleph') (2 = 'Beth') (3 = 'Gimel') INTO HebrwLbl. . END CASE. END LOOP. END FILE. END INPUT PROGRAM. DATASET NAME TestData WINDOW=FRONT. * ... Post, after this point ... . LIST. * I. Unroll, putting each code-label pair in a separate ... . * record ... . DATASET ACTIVATE TestData WINDOW=FRONT. DATASET COPY Summary. DATASET ACTIVATE Summary WINDOW=Front. * ........................................................... ... . * To use this for any other dataset, you should only have to ... . * change the lists of variables on the two MAKE lists, ... . * and change (or delete) the KEEP clause. ... . * ........................................................... ... . VARSTOCASES /MAKE Value FROM LatinVal GreekVal HebrwVal /MAKE Label FROM LatinLbl GreekLbl HebrwLbl /INDEX = Variable(Value) /KEEP = CaseID /NULL = DROP. LIST. * II. Summarize, to a single record for each ... . * variable-value-label triplet found in the data ... . AGGREGATE OUTFILE=* /BREAK =Variable Value Label /N# 'Instances of this pair' = NU. LIST. * III. Generate VALUE LABEL statements ... . ADD FILES /FILE=* /BY Variable /FIRST=VarStart LAST =VarEnd. * III.A. Scratch variables: A literal quotation mark, and ... . * a buffer in which to build the labels ... . STRING #Qt (A1) #Label (A50). COMPUTE #Qt = '"'. * III.B. Write the statement beginnings, naming the variables ... . DO IF VarStart. . WRITE OUTFILE=Code /'VALUE LABELS ' Variable . END IF. * III.C. Write the label specifications ... . COMPUTE #Label = CONCAT(#Qt,RTRIM(Label),#Qt). IF Varend #Label = CONCAT(RTRIM(#Label),'.'). . WRITE OUTFILE=Code /' ' Value (F3) ' ' #Label *. * III.D. Force a data pass, to write out the code ... . EXECUTE. * IV. Apply the labels to the original file ... . DATASET ACTIVATE TestData WINDOW=FRONT. INSERT FILE=Code. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
In reply to this post by Jon K Peck
At 06:14 PM 1/6/2014, Jon K Peck wrote:
>If you need to keep the original values, a little more work is >required. If just a few variables are involved, you could do an >aggregated query for each, producing a separate table of values and >labels for each. Then a little bit of Python code would do the >job. If there are a lot of variables, that would be clumsy, and the >whole task could be better done in a somewhat more complicated Python job. This is a case where it really seems Python might help -- notably, SPSS.SUBMIT from Python is a neater way of submitting generated code to SPSS than is writing to an external file and INSERTing. However, in other respects, the solution I've posted in native SPSS, based on VARSTOCASES and AGGREGATE, seems pretty straightforward. What would be the most natural approach in Python? Would it be neater? ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Here's an example of what I am working
on.
STATS VALLBLS FROMDATA VARIABLES=a b c to f LBLVARS="label.*" /OUTPUT SYNTAX="dolabels.sps". It will check for conflicts (two different labels for the same value of a variable) and can report duplicate value labels for a variable (two different values with the same label). It can optionally write a syntax file of VALUE LABEL statements and generate the labels. In most cases it will do everything on one data pass. Either variable list can be a regular variable list or a regular expression for selecting the variables. If many variables share the same set of labels, LBLVARS can be just one variable that is applied to all the VARIABLES variables. It will also have a dialog box interface. Details still subject to change. Jon Peck (no "h") aka Kim Senior Software Engineer, IBM [hidden email] phone: 720-342-5621 From: Richard Ristow <[hidden email]> To: Jon K Peck/Chicago/IBM@IBMUS, [hidden email], Date: 01/16/2014 11:19 AM Subject: Re: value labels from string vars At 06:14 PM 1/6/2014, Jon K Peck wrote: >If you need to keep the original values, a little more work is >required. If just a few variables are involved, you could do an >aggregated query for each, producing a separate table of values and >labels for each. Then a little bit of Python code would do the >job. If there are a lot of variables, that would be clumsy, and the >whole task could be better done in a somewhat more complicated Python job. This is a case where it really seems Python might help -- notably, SPSS.SUBMIT from Python is a neater way of submitting generated code to SPSS than is writing to an external file and INSERTing. However, in other respects, the solution I've posted in native SPSS, based on VARSTOCASES and AGGREGATE, seems pretty straightforward. What would be the most natural approach in Python? Would it be neater? |
In reply to this post by Richard Ristow
At 03:27 PM 1/16/2014, Jon K Peck wrote, off-list:
>One problem with [Richard's native-SPSS] code is that it presumes >that the variables being labelled are numeric. String variables can >also take labels. It also doesn't deal with the case where the >label text contains quotes Both points taken. The OP stated (in her second posting on the problem) that all her data values were numeric, so I felt OK accepting that limitation. It is a limitation that is difficult to surmount in native SPSS, probably much easier in Python. Some of the other features planned for the Python implementation, like . handling label text containing embedded quotes and (from an earlier posting) . check[ing] for conflicts (two different labels for the same value of a variable) . report[ing] duplicate value labels for a variable (two different values with the same label), could be added to the native-SPSS implementation: the latter two trivially easily, the former with some work. And others, like . It will have a dialog box interface . Either variable list can be a regular variable list or a regular expression for selecting the variables are essentially impossible in native SPSS; Python programmability really does buy some important new capabilities. One feature, . If many variables share the same set of labels, LBLVARS can be just one variable that is applied to all the VARIABLES variables is probably of little use in, as stated. By stipulation, the values of the label variables are the labels that should be applied to the corresponding values of the data variables. If two data variable share the same label variable, they necessarily have the same values unless labels are duplicated. On the other hand, if you can specify a set of data variables that share the same value labels (though different label variables), you could combine the checking of values against labels, and improve your chances of finding the correct labels for values that happen not to occur in one or more of the data variables. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Free forum by Nabble | Edit this page |