Summing Unknown Number of Observations after Cases to Variables Inside a Loop


Krinsky, Alan-2

I am looping my syntax within Python (thanks for previous assistance!), but now I find myself confronted by a new challenge. In one program, I need to transform cases into variables and then sum the values.

So, each person starts with a varying number of rows for the observation. When I transform from cases to variables, each person now has a single row, with all the observations in the row. Individuals have varying numbers of observations, but for any one group of individuals, there will be a maximum number of observations (based on who has the most). I get observation.1, observation.2, observation.3, etc. When I do this for a single group, I can run the program until this step, and then insert the appropriate number into the Compute command: COMPUTE Total=SUM(observation.1 TO observation.36). In this case, at least one person has 36 observations, but the summing works even for the individuals who do not.

What I want to do now is run the same code for a number of groups at once, using Python. But each group can have a different max for observation.x, so my problem is taking that x and inserting it in the COMPUTE...SUM operation; in each loop through the Python, this x will likely change. I tried setting x to a high number, say observation.80, knowing that no individual will ever have 80 observations, but that did not work.

Does anyone know of a way to identify the max observation.x value in each loop and then insert that in the COMPUTE...SUM operation? Or is there another way to get around it, perhaps setting a high max value and setting the missing values to zero to avoid an error? Or maybe running some additional loop that, for each individual, sums the first two observations, then adds the third, and stops if the total is unchanged, or adds the fourth value if it is larger, etc.? I guess this would be helpful even when running a single group without the Python, so that one did not need to stop and insert the number manually.

Thanks, and be well!

Alan


"Baseball is ninety percent mental. The other half is physical."
"Nobody goes there anymore because it's too crowded."
-Yogi Berra

Alan D. Krinsky  PhD, MPH
Medical Management Interventions Manager
UMass Memorial Health Care
Hahnemann Campus
281 Lincoln St.
Worcester, MA 01605
Phone: 508-334-5854
Fax: 508-793-6086
E-mail: [hidden email]

The information transmitted is intended only for the person or entity to which it is addressed and may contain confidential and/or privileged material. Any review, transmission, re-transmission, dissemination or other use of, or taking of any action in reliance upon this information by persons or entities other than the intended recipient is prohibited. If you received this in error, please contact the sender and delete the material from any computer.

Re: Summing Unknown Number of Observations after Cases to Variables Inside a Loop

Maguin, Eugene
Alan,

I don't use Python, so what I say may be worthless. Also, if you are doing
this problem to improve your Python abilities, then skip what follows
because it's not relevant. That said, I'm also not quite sure what you are
doing. You start off with multiple records per person and, it seems,
persons are in groups. What I don't clearly understand is how the summing is
supposed to work. Are you summing across observations and persons to get a
sum for the group? Or are you summing only across observations to get a sum
for each person in the group? Or is each group in a separate file?

I understand that you have restructured your data from long to wide, but I'd
like to suggest that you consider doing the summing within the long format
using AGGREGATE. I think it will be easier and simpler.
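
To make the long-format suggestion concrete, here is a minimal sketch. It assumes the long file has a person identifier named id and a value variable named observation; both names, and the helper functions, are hypothetical. The first function shows the per-person sum in plain Python, and the second builds an AGGREGATE command string that could be submitted from a Python program in SPSS.

```python
from collections import defaultdict


def sum_by_person(rows):
    """Sum values per person from long-format (person_id, value) rows.

    None stands in for a missing value and is skipped. A person whose
    values are all missing does not appear in the result.
    """
    totals = defaultdict(float)
    for person_id, value in rows:
        if value is not None:
            totals[person_id] += value
    return dict(totals)


def aggregate_syntax(id_var="id", obs_var="observation", total_var="Total"):
    """Build an SPSS AGGREGATE command as a string.

    OUTFILE=* replaces the active dataset with one row per person.
    The variable names are assumptions, not taken from the original data.
    """
    return "AGGREGATE /OUTFILE=* /BREAK=%s /%s=SUM(%s)." % (
        id_var, total_var, obs_var)


# Two people with different numbers of observations, no restructuring needed.
rows = [(1, 2.0), (1, 3.0), (2, 5.0), (2, None)]
print(sum_by_person(rows))
print(aggregate_syntax())
```

Because the sum is taken per person over however many rows that person has, the maximum number of observations in a group never has to be known.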

Given the restructured data and working only in syntax, I don't think there
is a method for doing this. Working in Python, I imagine (but don't know)
that there is a way to determine the maximum number of variables. Others
are experts with Python.

Gene Maguin



Re: Summing Unknown Number of Observations after Cases to Variables Inside a Loop

Jon K Peck
In reply to this post by Krinsky, Alan-2

If I understand what you are doing, after you restructure the data, there would be the same number of variables for each group, but values would be missing when there were fewer repeats than the maximum.
So if you just sum the maximum number of variables, you should get the number you want.  IOW, you shouldn't need to know how many repeats actually occurred for a particular group.

However, if you are working across separate CASESTOVARS runs, so that the number could vary, you could create a VariableDict object for each dataset and inspect the variable names to find the largest number affixed to a variable name; that would tell you the max.  Or, more directly, suppose the new variables are named varnn, where nn is the group number.  Then you could create a list of the variables to sum like this.

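import spss, spssaux

# Collect the variables whose names match "var" followed by one or more digits.
numberedvars = spssaux.VariableDict(pattern=r"var\d+")
# Name each matched variable explicitly in the SUM argument list.
spss.Submit("COMPUTE varsum=sum(%s)." % ",".join(numberedvars.variables))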

That would list each variable explicitly.  The pattern "var\d+" means variables whose names start with var and have one or more digits following.
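
For the observation.1, observation.2, ... names in the original question, the same idea can be sketched in plain Python so it can be tried outside SPSS. The function name build_sum_command and its parameters are hypothetical; it only builds the command string from a list of variable names.

```python
import re


def build_sum_command(var_names, stem="observation", target="Total"):
    """Return a COMPUTE command summing every variable named <stem>.<n>.

    The variables are listed explicitly, so the largest suffix does not
    need to be known in advance.
    """
    pattern = re.compile(r"^%s\.(\d+)$" % re.escape(stem), re.IGNORECASE)
    numbered = []
    for name in var_names:
        match = pattern.match(name)
        if match:
            numbered.append((int(match.group(1)), name))
    if not numbered:
        raise ValueError("no variables named %s.<n> found" % stem)
    numbered.sort()  # numeric order, so observation.10 follows observation.9
    return "COMPUTE %s=SUM(%s)." % (
        target, ",".join(name for _, name in numbered))


# Names as they might look after restructuring one group.
names = ["id", "observation.1", "observation.2", "observation.10", "observed"]
print(build_sum_command(names))
```

Inside a Python program in SPSS, the name list could presumably be gathered with spss.GetVariableName(i) for i in range(spss.GetVariableCount()), and the returned string passed to spss.Submit once per group.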

HTH,
Jon Peck
SPSS, an IBM Company
[hidden email]
312-651-3435



From: "Krinsky, Alan" <[hidden email]>
To: [hidden email]
Date: 03/23/2010 07:44 AM
Subject: [SPSSX-L] Summing Unknown Number of Observations after Cases to Variables Inside a Loop
Sent by: "SPSSX(r) Discussion" <[hidden email]>




