|
Dear all,
I have a question: I have 2 files, one has 10 variables, say: var1 var2 ....var10 another file has two variables, say var and coeff which looks like: var coeff var2 0.2 var8 0.3 var10 0.5 I want to now create a new variable in file1, say c. c should be var2*0.2 + var8*0.3 + var10*0.5. How can I do that automatically? That means, if in the file 2, the three values of var become var1, var10 and var5, c should be var1*0.2 + var10*0.3 + var5*0.5. #########################################. I want to create a new variable, say c in file one, so that the file looks like: var1 var2 ....var10 c 0.2 0.4 .....10 * 0.4 0.6 20 * The value of c (*) depends on var2, var8, var10 (if in file2 the value of var is var2, var8, var10). Does someone has any idea? Thanks for your help in advance!! I am using SPSS 15. Baiyun |
|
At 07:29 AM 9/20/2007, [hidden email] wrote:
>I have 2 files, one has 10 variables, say: var1 var2 ....var10 > >another file has two variables, say var and coeff which looks like: > >var coeff >var2 0.2 >var8 0.3 >var10 0.5 > >I want to now create a new variable in file1, say c. c should be >var2*0.2 + var8*0.3 + var10*0.5. I'm assuming you have a lot of records in the first file, all with those 10 variables. How many records are there in the second file? Is it just those three, and the same set of coefficients apply to every record in the first file? Or are there several groups of records in the second file, with different sets of coefficients, and there's a key variable connecting the files? This looks like CASESTOVARS followed by MATCH FILES followed by COMPUTE, likely in a LOOP, but we'd have to know more to be sure. -Best of luck, Richard |
|
Hello Richard
yes, I have a lot of records in the first file, all with those 10 variables. In the second file the variables vary, they are a subset of the 10 variables. I would like SPSS detect automatically which variables are in the file 2 . Is it possbile? thanks in advance! baiyun 2007/9/21, Richard Ristow <[hidden email]>: > > At 07:29 AM 9/20/2007, [hidden email] wrote: > > >I have 2 files, one has 10 variables, say: var1 var2 ....var10 > > > >another file has two variables, say var and coeff which looks like: > > > >var coeff > >var2 0.2 > >var8 0.3 > >var10 0.5 > > > >I want to now create a new variable in file1, say c. c should be > >var2*0.2 + var8*0.3 + var10*0.5. > > I'm assuming you have a lot of records in the first file, all with > those 10 variables. > > How many records are there in the second file? Is it just those three, > and the same set of coefficients apply to every record in the first > file? Or are there several groups of records in the second file, with > different sets of coefficients, and there's a key variable connecting > the files? > > This looks like CASESTOVARS followed by MATCH FILES followed by > COMPUTE, likely in a LOOP, but we'd have to know more to be sure. > > -Best of luck, > Richard > > |
|
This can be done easily with a little Pythonery.
Assume the formulas are in formulas.sav and the main dataset is main.sav. The program below does the work. First it reads the formula dataset and builds a little dictionary of the terms. Then it opens the other dataset, generates a compute command based on the terms it found in the formula dataset, and submits that compute. It also prints the formula. This example uses some supplementary modules from SPSS Developer Central (www.spss.com/devcentral) and will work on SPSS 14 and later. HTH, Jon Peck get file='c:/temp/formulas.sav'. begin program. import spss, spssdata formula={} curs = spssdata.Spssdata() for case in curs: formula[case.var] = case.coeff curs.CClose() spss.Submit("get file='c:/temp/main.sav'") cmd = "COMPUTE newvar = " + "+".join([n.strip() + "*" + \ str(c) for n,c in formula.items()]) print cmd spss.Submit(cmd) end program. exec. -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of [hidden email] Sent: Monday, September 24, 2007 9:02 AM To: [hidden email] Subject: Re: [SPSSX-L] can someone help me? Is it possible to use Python to solve this problem? Hello Richard yes, I have a lot of records in the first file, all with those 10 variables. In the second file the variables vary, they are a subset of the 10 variables. I would like SPSS detect automatically which variables are in the file 2 . Is it possbile? thanks in advance! baiyun 2007/9/21, Richard Ristow <[hidden email]>: > > At 07:29 AM 9/20/2007, [hidden email] wrote: > > >I have 2 files, one has 10 variables, say: var1 var2 ....var10 > > > >another file has two variables, say var and coeff which looks like: > > > >var coeff > >var2 0.2 > >var8 0.3 > >var10 0.5 > > > >I want to now create a new variable in file1, say c. c should be > >var2*0.2 + var8*0.3 + var10*0.5. > > I'm assuming you have a lot of records in the first file, all with > those 10 variables. > > How many records are there in the second file? Is it just those three, > and the same set of coefficients apply to every record in the first > file? Or are there several groups of records in the second file, with > different sets of coefficients, and there's a key variable connecting > the files? > > This looks like CASESTOVARS followed by MATCH FILES followed by > COMPUTE, likely in a LOOP, but we'd have to know more to be sure. > > -Best of luck, > Richard > > |
|
In reply to this post by xiaoqin.wan@gmail.com
At 10:02 AM 9/24/2007, [hidden email] wrote:
>yes, I have a lot of records in the first file, all with those 10 >variables. Now, you DON'T say whether there's some key variable telling which coefficients in file 2 go with which records in file 1. Really, that's important, OK? >In the second file the variables vary, they are a subset of the 10 >variables. I would like SPSS detect automatically which variables are >in the file 2. Of course they aren't a subset of the 10 variables. They're a set of *coefficients* to be applied to the 10 variables. I think Python is more fuss than necessary. (I do like the strengths of native SPSS.)) Your big need is that all 10 variables occur *somewhere* in file 2; there doesn't need to be one key for which they all occur. If file 2 is of any size, that's probably the case, but in any case you can ensure it by a dummy record key. Here's a demo, with 5 variables instead of 10, and a key variable named 'Key'. This code provides for one set of coefficients applying to more than one record; see Key=2. SPSS 14 draft output: * ..... Test data: ........................................ . |-----------------------------|---------------------------| |Output Created |24-SEP-2007 18:31:34 | |-----------------------------|---------------------------| [File1] Key Sub var1 var2 var3 var4 var5 001 A 1 2 3 4 5 002 A 2 3 4 5 6 002 B 3 4 5 6 7 003 A 4 5 6 7 8 Number of cases read: 4 Number of cases listed: 4 |-----------------------------|---------------------------| |Output Created |24-SEP-2007 18:31:34 | |-----------------------------|---------------------------| [File2] Key var coef 001 var4 .5 001 var5 .5 002 var1 .3 002 var2 .3 002 var4 .2 003 var1 .6 003 var3 .3 003 var4 .4 Number of cases read: 8 Number of cases listed: 8 * ..... Restructure File2 to wide organization: ............... . * Here's the kludgy step: Make sure that all variables are . * represented in File2, by creating a dummy set WITH all variables, . * to catenate to the real file. . INPUT PROGRAM. . NUMERIC Key (N3). . STRING var (A6). . LEAVE Key. . COMPUTE Key = 0. . LOOP #idx = 1 TO 5. . COMPUTE var = CONCAT('var',LTRIM(STRING(#idx,F3))). . END CASE. . END LOOP. END FILE. END INPUT PROGRAM. DATASET NAME Dummy WINDOW=FRONT. LIST. List |-----------------------------|---------------------------| |Output Created |24-SEP-2007 18:31:34 | |-----------------------------|---------------------------| [Dummy] Key var 000 var1 000 var2 000 var3 000 var4 000 var5 Number of cases read: 5 Number of cases listed: 5 * Catenate the dummy set and File2, for restructuring . ADD FILE /FILE=Dummy /FILE=File2 /BY Key. * Here's the real step: Restructure to wide organization. For . * this purpose, compute a numeric index from string variable 'var', . * to be the index value for CASESTOVARS. . DATASET NAME Wide WINDOW=FRONT. NUMERIC IDX (F3). COMPUTE IDX = NUMBER(SUBSTR(var,INDEX(var,'0123456789',1)),F3). . /**/ LIST /*-*/. List |-----------------------------|---------------------------| |Output Created |24-SEP-2007 18:31:35 | |-----------------------------|---------------------------| [Wide] Key var coef IDX 000 var1 . 1 000 var2 . 2 000 var3 . 3 000 var4 . 4 000 var5 . 5 001 var4 .5 4 001 var5 .5 5 002 var1 .3 1 002 var2 .3 2 002 var4 .2 4 003 var1 .6 1 003 var3 .3 3 003 var4 .4 4 Number of cases read: 13 Number of cases listed: 13 SORT CASES BY Key IDX . CASESTOVARS /ID = Key /INDEX = IDX /SEPARATOR = '' /DROP = var /GROUPBY = VARIABLE . Cases to Variables |-----------------------------|---------------------------| |Output Created |24-SEP-2007 18:31:35 | |-----------------------------|---------------------------| [Wide] Generated Variables |--------|---|------| |Original|IDX|Result| |Variable| |------| | | |Name | |--------|---|------| |coef |1 |coef1 | | |2 |coef2 | | |3 |coef3 | | |4 |coef4 | | |5 |coef5 | |--------|---|------| Processing Statistics |---------------|---| |Cases In |13 | |Cases Out |4 | |---------------|---| |Cases In/Cases |3.3| |Out | | |---------------|---| |Variables In |4 | |Variables Out |6 | |---------------|---| |Index Values |5 | |---------------|---| * And, drop the dummy record: . SELECT IF NVALID(coef1 TO coef5) GT 0. LIST. List |-----------------------------|---------------------------| |Output Created |24-SEP-2007 18:31:36 | |-----------------------------|---------------------------| [Wide] Key coef1 coef2 coef3 coef4 coef5 001 . . . .5 .5 002 .3 .3 . .2 . 003 .6 . .3 .4 . Number of cases read: 3 Number of cases listed: 3 * ..... Now, it's easy: Join File1 with the 'wide' file, ....... . * ..... and perform the requested computation. ....... . DATASET ACTIVATE File1 WINDOW=FRONT. MATCH FILES /FILE =* /TABLE=Wide /BY Key. NUMERIC c (F5.2). VECTOR var = var1 TO var5. VECTOR coef = coef1 TO coef5. LOOP #idx = 1 TO 5. * (The computation below uses 'SUM' rather than '+', so ....... . * missing values don't give a missing result.) ....... . . COMPUTE c = SUM(c, var(#idx)*coef(#idx)). END LOOP. LIST. List |-----------------------------|---------------------------| |Output Created |24-SEP-2007 18:31:37 | |-----------------------------|---------------------------| [File1] Key Sub var1 var2 var3 var4 var5 coef1 coef2 coef3 coef4 coef5 c 001 A 1 2 3 4 5 . . . .5 .5 4.50 002 A 2 3 4 5 6 .3 .3 . .2 . 2.50 002 B 3 4 5 6 7 .3 .3 . .2 . 3.30 003 A 4 5 6 7 8 .6 . .3 .4 . 7.00 Number of cases read: 4 Number of cases listed: 4 =================== APPENDIX: Test data =================== * ................................................................. . * ................. Test data ..................... . SET RNG = MT /* 'Mersenne twister' random number generator */ . SET MTINDEX = 8173 /* Providence, RI telephone book */ . * ................. 'File 1' ..................... . INPUT PROGRAM. . NUMERIC Key (N3). . STRING Sub (A1). . NUMERIC var1 TO var5 (F3). . LOOP #CaseNum = 1 TO 4. . RECODE #CaseNum (1 = 1) (2 = 2) (3 = 2) (4 = 3) INTO Key. . RECODE #CaseNum (1 ='A') (2 ='A') (3 ='B') (4 = 'A') INTO Sub. . VECTOR var = var1 TO var5. . LOOP #idx = 1 TO 5. . COMPUTE var(#idx) = #CaseNum + #idx - 1. . END LOOP. . END CASE. . END LOOP. END FILE. END INPUT PROGRAM. DATASET NAME File1 WINDOW=FRONT. . /*-- LIST /*-*/. * ................. 'File 2' ..................... . INPUT PROGRAM. . NUMERIC Key (N3). . STRING var (A6). . NUMERIC coef (F4.1). . LEAVE Key. . LOOP Key = 1 TO 3. . LOOP #idx = 1 TO 5. . DO IF RV.BERNOULLI(0.4). . COMPUTE var = CONCAT('var',LTRIM(STRING(#idx,F3))). . COMPUTE coef = TRUNC(RV.UNIFORM(2,7))/10. . END CASE. . END IF. . END LOOP. . END LOOP. END FILE. END INPUT PROGRAM. DATASET NAME File2 WINDOW=FRONT. . /**/ LIST /*-*/. * ................. Post after this point ..................... . * ................................................................. . |
|
In reply to this post by Peck, Jon
Hello Jon,
your python program works fine! Thanks so much! Only I don't understand every row of your codes, e.g. what formula[case.var] = case.coeff means, or what n.strip() means. Where can I find some explanations of these codes? Thank you again!! Regards Baiyun 2007/9/24, Peck, Jon <[hidden email]>: > > This can be done easily with a little Pythonery. > Assume the formulas are in formulas.sav and the main dataset is main.sav. The > program below does the work. First it reads the formula dataset and builds > a little dictionary of the terms. > Then it opens the other dataset, generates a compute command based on the > terms it found in the formula dataset, and submits that compute. It also > prints the formula. > > This example uses some supplementary modules from SPSS Developer Central ( > www.spss.com/devcentral) and will work on SPSS 14 and later. > > HTH, > Jon Peck > > > get file='c:/temp/formulas.sav'. > begin program. > import spss, spssdata > > formula={} > curs = spssdata.Spssdata() > for case in curs: > formula[case.var] = case.coeff > curs.CClose() > spss.Submit("get file='c:/temp/main.sav'") > > cmd = "COMPUTE newvar = " + "+".join([n.strip() + "*" + \ > str(c) for n,c in formula.items()]) > print cmd > spss.Submit(cmd) > end program. > exec. > > -----Original Message----- > From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of > [hidden email] > Sent: Monday, September 24, 2007 9:02 AM > To: [hidden email] > Subject: Re: [SPSSX-L] can someone help me? Is it possible to use Python > to solve this problem? > > Hello Richard > > yes, I have a lot of records in the first file, all with > those 10 variables. > > In the second file the variables vary, they are a subset of the 10 > variables. I would like SPSS detect automatically which variables are in > the > file 2 . > > Is it possbile? > > thanks in advance! > > baiyun > > > > 2007/9/21, Richard Ristow <[hidden email]>: > > > > At 07:29 AM 9/20/2007, [hidden email] wrote: > > > > >I have 2 files, one has 10 variables, say: var1 var2 ....var10 > > > > > >another file has two variables, say var and coeff which looks like: > > > > > >var coeff > > >var2 0.2 > > >var8 0.3 > > >var10 0.5 > > > > > >I want to now create a new variable in file1, say c. c should be > > >var2*0.2 + var8*0.3 + var10*0.5. > > > > I'm assuming you have a lot of records in the first file, all with > > those 10 variables. > > > > How many records are there in the second file? Is it just those three, > > and the same set of coefficients apply to every record in the first > > file? Or are there several groups of records in the second file, with > > different sets of coefficients, and there's a key variable connecting > > the files? > > > > This looks like CASESTOVARS followed by MATCH FILES followed by > > COMPUTE, likely in a LOOP, but we'd have to know more to be sure. > > > > -Best of luck, > > Richard > > > > > |
| Free forum by Nabble | Edit this page |
