Creating a new dataset with Python


Vicent Giner-Bosch
This is a question about using "programming" features of SPSS 15.0.1.

The following error arises when I try to run a "BEGIN PROGRAM"
block:


spss.spss150.errMsg.SpssError: [errLevel 63] Submit cannot be used from
within a user procedure or while there is a Cursor instance.


The procedure is just like this:


BEGIN PROGRAM .

import spss, spssdata

cursor1 = spssdata.Spssdata(accessType='r', indexes=( ... list of useful variables ... ))

cursor2 = spssdata.Spssdata(accessType='n')

cursor2.append(spssdata.vdef( ... variable definition ... ))
...
cursor2.append(spssdata.vdef( ... variable definition ... ))

cursor2.commitdict()

in_firsttime = 1
previous_id = -1

for fila in cursor1:
    if(fila.id != previous_id):
        if (in_firsttime !=  1):
            cursor2.CommitCase()
        else:
            in_firsttime = 0

        cursor2.appendvalue('id', fila.id)

    colname = ... build the name of the column, using the info. contained in "fila" ...
    cursor2.appendvalue(colname, ... a computed value ... )

    previous_id = fila.id

cursor2.CommitCase()
# finally:
cursor1.CClose()
cursor2.CClose()
del cursor1
del cursor2
del ...
END PROGRAM .


What is wrong?

Can't I use two cursors at once? Should I use  "fetchone()"  instead of
"for fila in cursor1:"?

I don't get the point...

Thank you in advance, and happy 2008!

--
Vicent Giner Bosch

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Creating a new dataset with Python

Peck, Jon
In SPSS 15 you are limited to one active cursor at a time.  SPSS 16 provides a Dataset class that allows an unlimited number of cursors (and is more flexible in other ways as well).

You are getting this particular error message because the implementation of accessType='n' requires that some syntax be run to define the dataset before it starts to add cases.

To make your logic work in SPSS 15, you need to add the new variable to the active dataset, and then if you want a subset of cases, select on an appropriate indicator variable after you close the cursor.

BTW, if you are not writing values,

        if (in_firsttime !=  1):
            cursor2.CommitCase()

there is no need to commit the case.  The next iteration of the for loop will move to the next case leaving the previous one unchanged.

HTH,
Jon Peck

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Vicent Giner Bosch
Sent: Monday, December 31, 2007 5:55 AM
To: [hidden email]
Subject: [SPSSX-L] Creating a new dataset with Python
Importance: High


Re: Creating a new dataset with Python

Vicent Giner-Bosch
Hello, Jon and everyone.


> In SPSS 15 you are limited to one active cursor at a time.
> SPSS 16 provides a Dataset class that allows an unlimited
> number of cursors (and is more flexible in other ways as well).


Do you mean that the following is wrong (in version 15)?



BEGIN PROGRAM .

import spss, spssdata

cursor1 = spssdata.Spssdata(accessType='r', indexes=( ... list of useful
variables ... ))

cursor2 = spssdata.Spssdata(accessType='n')

...



I do not have access to version 16 yet. Our University usually has the latest
available version, but not in this case. Maybe that's because we are in
Spain (?).



>
> You are getting this particular error message because the
> implementation of accessType='n' requires that some syntax be
> run to define the dataset before it starts to add cases.


Well, I do not understand what you mean, because before adding cases to
"cursor2", I do this (right after declaring/creating "cursor2"):



cursor2.append(spssdata.vdef( ... variable definition ... ))
...
cursor2.append(spssdata.vdef( ... variable definition ... ))

cursor2.commitdict()


I thought that was enough for defining the "columns" of the data set.



>
> To make your logic work in SPSS 15, you need to add the new
> variable to the active dataset, and then if you want a subset
> of cases, select on an appropriate indicator variable after
> you close the cursor.


What I need/want, and was trying to perform with that piece of code, is the
following:

- I have a "big" dataset with several observations (cases) for each
"person id". One person can appear several times in the big data set, each
time with a different "date" field.

This could be an example of what I mean:


ID     DATE         Other data
--     -------      ----------
1      2006-08      ......
1      2006-09      ......
1      2006-11      ......
1      2007-02      ......
2      2006-08      ......
2      2006-09      ......
2      2006-10      ......
2      2006-11      ......
3      2006-09      ......
3      2007-03      ......
..     .......      ......


Some people "appear" every month. For other people, I only have information
related to several months, but not all the months.

I want to build a new data set in which every person appears only once, and
each column tells whether or not we have information for that person in a
given month. That is, something like this:


ID    in_200608   in_200609   in_200610  ...
--    ---------   ---------   ---------
1         1           1           0      ...
2         1           1           1      ...
3         0           1           0      ...
..        ..          ..          ..     ...


The initial "big" data set is already sorted by ID and DATE, say.

I thought that it would be possible to create the "aggregated" data set by
reading the "big" data set row by row and (creating and) updating the
"aggregated" data set at the same time. Using cursors seemed the most
logical way to do it.
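
To make the target layout concrete, here is a minimal pure-Python sketch of the transformation described above, run on the sample rows from the example table. It does not touch SPSS at all. The names rows, months and indicator_rows are hypothetical, and DATE is assumed to be a 'YYYY-MM' string, which the thread does not state.

```python
from itertools import groupby
from operator import itemgetter

# Sample (ID, DATE) rows copied from the example table; already sorted by ID.
rows = [
    (1, "2006-08"), (1, "2006-09"), (1, "2006-11"), (1, "2007-02"),
    (2, "2006-08"), (2, "2006-09"), (2, "2006-10"), (2, "2006-11"),
    (3, "2006-09"), (3, "2007-03"),
]

# Months that should become in_YYYYMM columns (hypothetical selection).
months = ["2006-08", "2006-09", "2006-10"]


def indicator_rows(rows, months):
    """Yield one tuple (id, flag, flag, ...) per person.

    Each flag is 1 if the person has a row for that month, else 0.
    groupby only merges adjacent items, so rows must be sorted by id.
    """
    for person_id, group in groupby(rows, key=itemgetter(0)):
        seen = set(date for _, date in group)
        yield (person_id,) + tuple(int(m in seen) for m in months)


result = list(indicator_rows(rows, months))
```

For the sample rows this gives (1, 1, 1, 0), (2, 1, 1, 1) and (3, 0, 1, 0): person 3 has no 2006-08 record, so the first flag is 0.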


>
> BTW, if you are not writing values,
> if (in_firsttime !=  1):
>             cursor2.CommitCase()
>
> there is no need to commit the case.  The next iteration of
> the for loop will move to the next case leaving the previous
> one unchanged.


In fact, the program is only supposed to enter that branch when it has
already written something to  cursor2  by means of  cursor2.appendvalue .

The idea is that it enters there and performs the "commit" when we have
already read all the rows belonging to the same "ID" in the "big" dataset,
so it's time to "save" that row in the "aggregated" data set and move on to
another ID.

Maybe I did not get the logic of creating new rows from scratch... The "for
fila in cursor1:" is a loop for reading "cursor1", and at the same time,
while I am reading it, I am creating the "aggregated" data set. That was my
initial idea...

So, how can I do it in SPSS 15??

Thank you in advance for your time.


--
Vicent




Re: Creating a new dataset with Python

Vicent Giner-Bosch
In reply to this post by Peck, Jon
Hello again.

I managed to solve it following the suggestion from Jon:


> To make your logic work in SPSS 15, you need to add the new variable
> to the active dataset, and then if you want a subset of cases, select
> on an appropriate indicator variable after you close the cursor.


Well, in fact, it was more of an "inspiration": I did what I needed using
plain SPSS syntax, with neither Python nor cursors.

This is more or less what I've done:





* Using the "big" dataset as active dataset .

COMPUTE in_200606 = ... create/compute a variable that amounts to 1 if the
row belongs to 2006-06, and "blank/null" if not ... .

COMPUTE in_200607 = ... likewise for this other date ... .

... likewise for all the months to be considered ...

EXECUTE .


* So, now we have several  in_YYYYMM-like  variables in the big dataset,
* so that  in_YYYYMM  takes the value 1 for a given row if and only if that row
* has DATE = YYYY-MM .

* Now, we create a new dataset, as an aggregation of the previous one, by
* customer ID. For each customer, we compute the maximum of the variables
* in_YYYYMM .

DATASET DECLARE customers_months.
AGGREGATE
  /OUTFILE='customers_months'
  /PRESORTED
  /BREAK= ID
  /in_200606 = MAX(in_200606) /in_200607 = MAX(in_200607)
 ... the same for all the months to be considered... .






This worked for me. Now, I have to delete the auxiliary columns in the big
dataset.

This way of doing things is not very smart (you spend a lot of time and disk
space computing the auxiliary variables when the "big" dataset is
really big), but it worked in this case.
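
Writing one COMPUTE line and one MAX() term per month by hand gets tedious, so the syntax above could be generated instead. The following is only a sketch under assumptions: DATE is taken to be a string variable holding 'YYYY-MM' (the original COMPUTE expression is elided, so this is a guess), and the helper names indicator_name and build_syntax are hypothetical. It uses a logical expression in COMPUTE, which yields 1 or 0 rather than 1 or blank.

```python
def indicator_name(month):
    """Map '2006-06' to the variable name 'in_200606'."""
    return "in_" + month.replace("-", "")


def build_syntax(months, date_var="DATE", id_var="ID",
                 dataset="customers_months"):
    """Return SPSS syntax that flags each month and aggregates by id.

    Assumes date_var is a string variable whose values look like 'YYYY-MM'.
    """
    if not months:
        raise ValueError("at least one month is required")
    names = [indicator_name(m) for m in months]
    lines = []
    for month, name in zip(months, names):
        # A logical expression evaluates to 1 (true) or 0 (false).
        lines.append("COMPUTE %s = (%s = '%s')." % (name, date_var, month))
    lines.append("DATASET DECLARE %s." % dataset)
    lines.append("AGGREGATE")
    lines.append("  /OUTFILE='%s'" % dataset)
    lines.append("  /PRESORTED")
    lines.append("  /BREAK=%s" % id_var)
    aggregates = ["  /%s = MAX(%s)" % (n, n) for n in names]
    aggregates[-1] += "."  # the command terminator goes on the last line
    lines.extend(aggregates)
    return "\n".join(lines)


syntax = build_syntax(["2006-06", "2006-07"])
```

Inside a BEGIN PROGRAM block, with no cursor open, the resulting string could presumably be passed to spss.Submit; it can also simply be printed and pasted into a syntax window.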

Suggestions or comments are welcome, anyway.

Thank you.

--
Vicent


Re: [BULK] RE: [SPSSX-L] Creating a new dataset with Python

Peck, Jon
In reply to this post by Vicent Giner-Bosch
Just to follow up on this, answers are below.

-----Original Message-----
From: Vicent Giner Bosch [mailto:[hidden email]]
Sent: Wednesday, January 02, 2008 4:24 AM
To: Peck, Jon; [hidden email]
Subject: [BULK] RE: [SPSSX-L] Creating a new dataset with Python
Importance: Low

Hello, Jon and everyone.


> In SPSS 15 you are limited to one active cursor at a time.
> SPSS 16 provides a Dataset class that allows an unlimited
> number of cursors (and is more flexible in other ways as well).


Do you mean that the following is wrong (in version 15)?



BEGIN PROGRAM .

import spss, spssdata

cursor1 = spssdata.Spssdata(accessType='r', indexes=( ... list of useful variables ... ))

cursor2 = spssdata.Spssdata(accessType='n')

[>>>Peck, Jon] Yes, this is wrong.  Only one cursor can be active at a time in SPSS 15.

...



I do not have access to version 16 yet. Our University usually has the latest
available version, but not in this case. Maybe that's because we are in
Spain (?).

[>>>Peck, Jon] International distribution is usually a little slower, but the delay may have more to do with how the university times the deployment of the upgrade.


>
> You are getting this particular error message because the
> implementation of accessType='n' requires that some syntax be
> run to define the dataset before it starts to add cases.


Well, I do not understand what you mean, because before adding cases to
"cursor2", I do this (right after declaring/creating "cursor2"):



cursor2.append(spssdata.vdef( ... variable definition ... ))
...
cursor2.append(spssdata.vdef( ... variable definition ... ))

cursor2.commitdict()


I thought that was enough for defining the "columns" of the data set.
[>>>Peck, Jon] cursor1 is active.  While it is active you cannot run commands with Submit.  When you create a new dataset using the Spssdata class, what actually happens is that the code runs SPSS commands to define the new dataset and then switches into append mode to add the cases.  But you can't do that with another cursor already active.
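
Following that explanation, one possible restructuring for SPSS 15 is to finish reading and close the first cursor before the new dataset is created, so that only one cursor is ever open. The sketch below is untested against SPSS 15 and rests on assumptions: it uses only the spssdata calls that already appear in the original program, it assumes the active dataset has variables ID and DATE with DATE stored as a 'YYYY-MM' string, and the function name aggregate_in_two_phases is hypothetical. The spssdata module is passed in as an argument so the function can also be tried with a stand-in object outside SPSS.

```python
def aggregate_in_two_phases(spssdata, months):
    """Read the active dataset, close the cursor, then write a new dataset.

    spssdata: the module imported inside BEGIN PROGRAM.
    months:   list of 'YYYY-MM' strings, one output column per month.
    Returns the number of cases written.
    """
    # Phase 1: one read cursor. Remember which months each id has.
    seen = {}    # id -> set of month strings
    order = []   # ids in the order they were first read
    reader = spssdata.Spssdata(indexes=("ID", "DATE"), accessType="r")
    try:
        for row in reader:
            # Assumption: DATE is a (possibly blank-padded) string.
            person_id, month = row[0], row[1].strip()
            if person_id not in seen:
                seen[person_id] = set()
                order.append(person_id)
            seen[person_id].add(month)
    finally:
        reader.CClose()  # no cursor is open from here on

    # Phase 2: only now create the new dataset and append its cases.
    names = ["in_" + m.replace("-", "") for m in months]
    writer = spssdata.Spssdata(accessType="n")
    writer.append(spssdata.vdef("id"))
    for name in names:
        writer.append(spssdata.vdef(name))
    writer.commitdict()
    for person_id in order:
        writer.appendvalue("id", person_id)
        for month, name in zip(months, names):
            writer.appendvalue(name, int(month in seen[person_id]))
        writer.CommitCase()
    writer.CClose()
    return len(order)
```

Phase 1 keeps one small set per person in memory, so memory use grows with the number of persons rather than with the number of rows. The AGGREGATE command used elsewhere in this thread avoids the Python loop altogether.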



I thought that it would be possible to create the "aggregated" data set by
reading the "big" data set row by row and (creating and) updating the
"aggregated" data set at the same time. Using cursors seemed the most
logical way to do it.
[>>>Peck, Jon] As you have discovered, the Aggregate procedure is the way to go.



--
Vicent


