can someone help me? Is it possible to use Python to solve this problem?

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

can someone help me? Is it possible to use Python to solve this problem?

xiaoqin.wan@gmail.com
Dear all,

I have a question:

I have 2 files, one has 10 variables, say: var1 var2 ....var10

another file has two variables, say var and coeff which looks like:

var   coeff
var2   0.2
var8   0.3
var10 0.5

I want to now create a new variable in file1, say c. c should be
var2*0.2 + var8*0.3 + var10*0.5.
How can I do that automatically? That means, if in the file 2, the
three values of var become var1, var10 and var5, c should be var1*0.2
+ var10*0.3 + var5*0.5.

#########################################.

I want to create a new variable, say c in file one, so that the file
looks like:

var1 var2 ....var10  c
0.2  0.4 .....10       *
0.4 0.6        20     *

The value of c (*) depends on var2, var8, var10 (if in file2 the value
of var is var2, var8, var10).

Does someone has any idea?
Thanks for your help in advance!!

I am using SPSS 15.

Baiyun
Reply | Threaded
Open this post in threaded view
|

Re: can someone help me? Is it possible to use Python to solve this problem?

Richard Ristow
At 07:29 AM 9/20/2007, [hidden email] wrote:

>I have 2 files, one has 10 variables, say: var1 var2 ....var10
>
>another file has two variables, say var and coeff which looks like:
>
>var   coeff
>var2   0.2
>var8   0.3
>var10  0.5
>
>I want to now create a new variable in file1, say c. c should be
>var2*0.2 + var8*0.3 + var10*0.5.

I'm assuming you have a lot of records in the first file, all with
those 10 variables.

How many records are there in the second file? Is it just those three,
and the same set of coefficients apply to every record in the first
file? Or are there several groups of records in the second file, with
different sets of coefficients, and there's a key variable connecting
the files?

This looks like CASESTOVARS followed by MATCH FILES followed by
COMPUTE, likely in a LOOP, but we'd have to know more to be sure.

-Best of luck,
  Richard
Reply | Threaded
Open this post in threaded view
|

Re: can someone help me? Is it possible to use Python to solve this problem?

xiaoqin.wan@gmail.com
Hello Richard

yes, I have a lot of records in the first file, all with
those 10 variables.

In the second file the variables vary, they are a subset of the 10
variables. I would like SPSS detect automatically which variables are in the
file 2 .

Is it possbile?

thanks in advance!

baiyun



2007/9/21, Richard Ristow <[hidden email]>:

>
> At 07:29 AM 9/20/2007, [hidden email] wrote:
>
> >I have 2 files, one has 10 variables, say: var1 var2 ....var10
> >
> >another file has two variables, say var and coeff which looks like:
> >
> >var   coeff
> >var2   0.2
> >var8   0.3
> >var10  0.5
> >
> >I want to now create a new variable in file1, say c. c should be
> >var2*0.2 + var8*0.3 + var10*0.5.
>
> I'm assuming you have a lot of records in the first file, all with
> those 10 variables.
>
> How many records are there in the second file? Is it just those three,
> and the same set of coefficients apply to every record in the first
> file? Or are there several groups of records in the second file, with
> different sets of coefficients, and there's a key variable connecting
> the files?
>
> This looks like CASESTOVARS followed by MATCH FILES followed by
> COMPUTE, likely in a LOOP, but we'd have to know more to be sure.
>
> -Best of luck,
> Richard
>
>
Reply | Threaded
Open this post in threaded view
|

Re: can someone help me? Is it possible to use Python to solve this problem?

Peck, Jon
This can be done easily with a little Pythonery.
Assume the formulas are in formulas.sav and the main dataset is main.sav.  The program below does the work.  First it reads the formula dataset and builds a little dictionary of the terms.
Then it opens the other dataset, generates a compute command based on the terms it found in the formula dataset, and submits that compute.  It also prints the formula.

This example uses some supplementary modules from SPSS Developer Central (www.spss.com/devcentral) and will work on SPSS 14 and later.

HTH,
Jon Peck


get file='c:/temp/formulas.sav'.
begin program.
import spss, spssdata

formula={}
curs = spssdata.Spssdata()
for case in curs:
  formula[case.var] = case.coeff
curs.CClose()
spss.Submit("get file='c:/temp/main.sav'")

cmd = "COMPUTE newvar = " + "+".join([n.strip() + "*" + \
str(c) for n,c in formula.items()])
print cmd
spss.Submit(cmd)
end program.
exec.

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of [hidden email]
Sent: Monday, September 24, 2007 9:02 AM
To: [hidden email]
Subject: Re: [SPSSX-L] can someone help me? Is it possible to use Python to solve this problem?

Hello Richard

yes, I have a lot of records in the first file, all with
those 10 variables.

In the second file the variables vary, they are a subset of the 10
variables. I would like SPSS detect automatically which variables are in the
file 2 .

Is it possbile?

thanks in advance!

baiyun



2007/9/21, Richard Ristow <[hidden email]>:

>
> At 07:29 AM 9/20/2007, [hidden email] wrote:
>
> >I have 2 files, one has 10 variables, say: var1 var2 ....var10
> >
> >another file has two variables, say var and coeff which looks like:
> >
> >var   coeff
> >var2   0.2
> >var8   0.3
> >var10  0.5
> >
> >I want to now create a new variable in file1, say c. c should be
> >var2*0.2 + var8*0.3 + var10*0.5.
>
> I'm assuming you have a lot of records in the first file, all with
> those 10 variables.
>
> How many records are there in the second file? Is it just those three,
> and the same set of coefficients apply to every record in the first
> file? Or are there several groups of records in the second file, with
> different sets of coefficients, and there's a key variable connecting
> the files?
>
> This looks like CASESTOVARS followed by MATCH FILES followed by
> COMPUTE, likely in a LOOP, but we'd have to know more to be sure.
>
> -Best of luck,
> Richard
>
>
Reply | Threaded
Open this post in threaded view
|

Re: Is it possible to use Python...

Richard Ristow
In reply to this post by xiaoqin.wan@gmail.com
At 10:02 AM 9/24/2007, [hidden email] wrote:

>yes, I have a lot of records in the first file, all with those 10
>variables.

Now, you DON'T say whether there's some key variable telling which
coefficients in file 2 go with which records in file 1. Really, that's
important, OK?

>In the second file the variables vary, they are a subset of the 10
>variables. I would like SPSS detect automatically which variables are
>in the file 2.

Of course they aren't a subset of the 10 variables. They're a set of
*coefficients* to be applied to the 10 variables.

I think Python is more fuss than necessary. (I do like the strengths of
native SPSS.)) Your big need is that all 10 variables occur *somewhere*
in file 2; there doesn't need to be one key for which they all occur.
If file 2 is of any size, that's probably the case, but in any case you
can ensure it by a dummy record key. Here's a demo, with 5 variables
instead of 10, and a key variable named 'Key'. This code provides for
one set of coefficients applying to more than one record; see Key=2.
SPSS 14 draft output:

*  .....   Test data:       ........................................ .
|-----------------------------|---------------------------|
|Output Created               |24-SEP-2007 18:31:34       |
|-----------------------------|---------------------------|
[File1]

Key Sub var1 var2 var3 var4 var5

001 A      1    2    3    4    5
002 A      2    3    4    5    6
002 B      3    4    5    6    7
003 A      4    5    6    7    8

Number of cases read:  4    Number of cases listed:  4


|-----------------------------|---------------------------|
|Output Created               |24-SEP-2007 18:31:34       |
|-----------------------------|---------------------------|
[File2]

Key var    coef

001 var4     .5
001 var5     .5
002 var1     .3
002 var2     .3
002 var4     .2
003 var1     .6
003 var3     .3
003 var4     .4

Number of cases read:  8    Number of cases listed:  8


*  .....   Restructure File2 to wide organization:   ............... .

*  Here's the kludgy step: Make sure that all variables are          .
*  represented in File2, by creating a dummy set WITH all variables, .
*  to catenate to the real file.                                     .

INPUT PROGRAM.
.  NUMERIC Key          (N3).
.  STRING  var          (A6).
.  LEAVE   Key.
.  COMPUTE Key  = 0.
.  LOOP #idx = 1 TO 5.
.     COMPUTE var  = CONCAT('var',LTRIM(STRING(#idx,F3))).
.     END CASE.
.  END  LOOP.
END FILE.
END INPUT PROGRAM.
DATASET NAME      Dummy WINDOW=FRONT.
LIST.

List
|-----------------------------|---------------------------|
|Output Created               |24-SEP-2007 18:31:34       |
|-----------------------------|---------------------------|
[Dummy]

Key var

000 var1
000 var2
000 var3
000 var4
000 var5

Number of cases read:  5    Number of cases listed:  5


*  Catenate the dummy set and File2, for restructuring               .
ADD FILE
   /FILE=Dummy
   /FILE=File2
   /BY   Key.


*  Here's the real step:   Restructure to wide organization. For     .
*  this purpose, compute a numeric index from string variable 'var', .
*  to be the index value for CASESTOVARS.                            .

DATASET NAME      Wide  WINDOW=FRONT.
NUMERIC IDX (F3).
COMPUTE IDX = NUMBER(SUBSTR(var,INDEX(var,'0123456789',1)),F3).

.  /**/ LIST /*-*/.

List
|-----------------------------|---------------------------|
|Output Created               |24-SEP-2007 18:31:35       |
|-----------------------------|---------------------------|
[Wide]

Key var    coef IDX

000 var1     .    1
000 var2     .    2
000 var3     .    3
000 var4     .    4
000 var5     .    5
001 var4     .5   4
001 var5     .5   5
002 var1     .3   1
002 var2     .3   2
002 var4     .2   4
003 var1     .6   1
003 var3     .3   3
003 var4     .4   4

Number of cases read:  13    Number of cases listed:  13


SORT CASES BY Key IDX .
CASESTOVARS
  /ID        = Key
  /INDEX     = IDX
  /SEPARATOR = ''
  /DROP      = var
  /GROUPBY   = VARIABLE .

Cases to Variables
|-----------------------------|---------------------------|
|Output Created               |24-SEP-2007 18:31:35       |
|-----------------------------|---------------------------|
[Wide]

Generated Variables
|--------|---|------|
|Original|IDX|Result|
|Variable|   |------|
|        |   |Name  |
|--------|---|------|
|coef    |1  |coef1 |
|        |2  |coef2 |
|        |3  |coef3 |
|        |4  |coef4 |
|        |5  |coef5 |
|--------|---|------|

Processing Statistics
|---------------|---|
|Cases In       |13 |
|Cases Out      |4  |
|---------------|---|
|Cases In/Cases |3.3|
|Out            |   |
|---------------|---|
|Variables In   |4  |
|Variables Out  |6  |
|---------------|---|
|Index Values   |5  |
|---------------|---|


*  And, drop the dummy record:                                       .

SELECT IF NVALID(coef1 TO coef5) GT 0.
LIST.

List
|-----------------------------|---------------------------|
|Output Created               |24-SEP-2007 18:31:36       |
|-----------------------------|---------------------------|
[Wide]

Key coef1 coef2 coef3 coef4 coef5

001    .     .     .     .5    .5
002    .3    .3    .     .2    .
003    .6    .     .3    .4    .

Number of cases read:  3    Number of cases listed:  3


*  .....   Now, it's easy:  Join File1 with the 'wide' file, ....... .
*  .....   and perform the requested computation.            ....... .

DATASET ACTIVATE File1 WINDOW=FRONT.
MATCH FILES
    /FILE =*
    /TABLE=Wide
    /BY    Key.

NUMERIC  c (F5.2).
VECTOR   var  = var1  TO var5.
VECTOR   coef = coef1 TO coef5.
LOOP #idx = 1 TO 5.
*  (The computation below uses 'SUM' rather than '+', so     ....... .
*  missing values don't give a missing result.)              ....... .
.  COMPUTE c = SUM(c,
                    var(#idx)*coef(#idx)).
END LOOP.

LIST.

List
|-----------------------------|---------------------------|
|Output Created               |24-SEP-2007 18:31:37       |
|-----------------------------|---------------------------|
[File1]

Key Sub var1 var2 var3 var4 var5 coef1 coef2 coef3 coef4 coef5     c

001 A      1    2    3    4    5    .     .     .     .5    .5  4.50
002 A      2    3    4    5    6    .3    .3    .     .2    .   2.50
002 B      3    4    5    6    7    .3    .3    .     .2    .   3.30
003 A      4    5    6    7    8    .6    .     .3    .4    .   7.00

Number of cases read:  4    Number of cases listed:  4


===================
APPENDIX: Test data
===================
*  ................................................................. .
*  .................   Test data               ..................... .
SET RNG = MT       /* 'Mersenne twister' random number generator  */ .
SET MTINDEX = 8173 /*  Providence, RI telephone book              */ .

*  .................   'File 1'                ..................... .
INPUT PROGRAM.
.  NUMERIC Key          (N3).
.  STRING  Sub          (A1).
.  NUMERIC var1 TO var5 (F3).
.  LOOP #CaseNum = 1 TO 4.
.     RECODE #CaseNum (1 = 1)  (2 = 2)  (3 = 2)  (4 = 3)   INTO Key.
.     RECODE #CaseNum (1 ='A') (2 ='A') (3 ='B') (4 = 'A') INTO Sub.
.     VECTOR var = var1 TO var5.
.     LOOP  #idx = 1    TO 5.
.        COMPUTE var(#idx) = #CaseNum + #idx - 1.
.     END LOOP.
.     END CASE.
.  END LOOP.
END FILE.
END INPUT PROGRAM.
DATASET NAME      File1 WINDOW=FRONT.

.  /*--  LIST  /*-*/.

*  .................   'File 2'                ..................... .
INPUT PROGRAM.
.  NUMERIC Key          (N3).
.  STRING  var          (A6).
.  NUMERIC coef         (F4.1).
.  LEAVE   Key.
.  LOOP    Key  = 1 TO 3.
.     LOOP #idx = 1 TO 5.
.     DO IF   RV.BERNOULLI(0.4).
.        COMPUTE var  = CONCAT('var',LTRIM(STRING(#idx,F3))).
.        COMPUTE coef = TRUNC(RV.UNIFORM(2,7))/10.
.        END CASE.
.     END IF.
.     END  LOOP.
.  END LOOP.
END FILE.
END INPUT PROGRAM.
DATASET NAME      File2 WINDOW=FRONT.

.  /**/  LIST  /*-*/.

*  .................   Post after this point   ..................... .
*  ................................................................. .
Reply | Threaded
Open this post in threaded view
|

Re: can someone help me? Is it possible to use Python to solve this problem?

xiaoqin.wan@gmail.com
In reply to this post by Peck, Jon
Hello Jon,

your python program works fine! Thanks so much!

Only I don't understand every row of your codes, e.g. what formula[case.var]
= case.coeff means, or what n.strip() means. Where can I find some
explanations of these codes?

Thank you again!!

Regards

Baiyun



2007/9/24, Peck, Jon <[hidden email]>:

>
> This can be done easily with a little Pythonery.
> Assume the formulas are in formulas.sav and the main dataset is main.sav.  The
> program below does the work.  First it reads the formula dataset and builds
> a little dictionary of the terms.
> Then it opens the other dataset, generates a compute command based on the
> terms it found in the formula dataset, and submits that compute.  It also
> prints the formula.
>
> This example uses some supplementary modules from SPSS Developer Central (
> www.spss.com/devcentral) and will work on SPSS 14 and later.
>
> HTH,
> Jon Peck
>
>
> get file='c:/temp/formulas.sav'.
> begin program.
> import spss, spssdata
>
> formula={}
> curs = spssdata.Spssdata()
> for case in curs:
> formula[case.var] = case.coeff
> curs.CClose()
> spss.Submit("get file='c:/temp/main.sav'")
>
> cmd = "COMPUTE newvar = " + "+".join([n.strip() + "*" + \
> str(c) for n,c in formula.items()])
> print cmd
> spss.Submit(cmd)
> end program.
> exec.
>
> -----Original Message-----
> From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
> [hidden email]
> Sent: Monday, September 24, 2007 9:02 AM
> To: [hidden email]
> Subject: Re: [SPSSX-L] can someone help me? Is it possible to use Python
> to solve this problem?
>
> Hello Richard
>
> yes, I have a lot of records in the first file, all with
> those 10 variables.
>
> In the second file the variables vary, they are a subset of the 10
> variables. I would like SPSS detect automatically which variables are in
> the
> file 2 .
>
> Is it possbile?
>
> thanks in advance!
>
> baiyun
>
>
>
> 2007/9/21, Richard Ristow <[hidden email]>:
> >
> > At 07:29 AM 9/20/2007, [hidden email] wrote:
> >
> > >I have 2 files, one has 10 variables, say: var1 var2 ....var10
> > >
> > >another file has two variables, say var and coeff which looks like:
> > >
> > >var   coeff
> > >var2   0.2
> > >var8   0.3
> > >var10  0.5
> > >
> > >I want to now create a new variable in file1, say c. c should be
> > >var2*0.2 + var8*0.3 + var10*0.5.
> >
> > I'm assuming you have a lot of records in the first file, all with
> > those 10 variables.
> >
> > How many records are there in the second file? Is it just those three,
> > and the same set of coefficients apply to every record in the first
> > file? Or are there several groups of records in the second file, with
> > different sets of coefficients, and there's a key variable connecting
> > the files?
> >
> > This looks like CASESTOVARS followed by MATCH FILES followed by
> > COMPUTE, likely in a LOOP, but we'd have to know more to be sure.
> >
> > -Best of luck,
> > Richard
> >
> >
>