I am looping my syntax within Python (thanks for the previous assistance!), but now I face a new challenge. In one program, I need to transform cases into variables and then sum the values. Each person starts with a varying number of rows, one per observation. After I transform from cases to variables, each person has a single row containing all of that person's observations. Individuals have varying numbers of observations, but for any one group of individuals there is a maximum number of observations (set by whoever has the most). I get observation.1, observation.2, observation.3, etc.

When I do this for a single group, I can run the program up to this step and then insert the appropriate number into the COMPUTE command:

    COMPUTE Total=SUM(observation.1 TO observation.36).

In this case, at least one person has 36 observations, but the summing works even for the individuals who have fewer.

What I want to do now is run the same code for a number of groups at once, using Python. But each group can have a different maximum for observation.x, so my problem is taking that x and inserting it into the COMPUTE...SUM operation; on each pass through the Python loop, this x will likely change. I tried setting x to a high number, say observation.80, knowing that no individual will ever have 80 observations, but that did not work.

Does anyone know of a way to identify the maximum observation.x value in each loop and then insert it into the COMPUTE...SUM operation? Or is there another way around it, perhaps setting a high maximum and filling in the missing values with zero to avoid an error? Or maybe running an additional loop that sums the first two observations for each individual, then adds the third, and stops if the total is unchanged or goes on to the fourth value if it is larger, etc.? I guess this would be helpful even when running a single group without Python, so that one did not need to stop and insert the number manually.

Thanks, and be well!
Alan

"Baseball is ninety percent mental. The other half is physical."
"Nobody goes there anymore because it's too crowded."
-Yogi Berra

Alan D. Krinsky PhD, MPH
Medical Management Interventions Manager
UMass Memorial Health Care
|
|
Alan,

I don't use Python, so what I say may be worthless. Also, if you are doing this problem to improve your Python abilities, skip what follows, because it's not relevant.

That said, I'm not quite sure what you are doing. You start off with multiple records per person and, it seems, persons are in groups. What I don't clearly understand is how the summing is supposed to work. Are you summing across observations and persons to get a sum for the group? Or are you summing only across observations to get a sum for each person in the group? Or is each group in a separate file?

I understand that you have restructured your data from long to wide, but I'd like to suggest that you consider doing the summing within the long format using AGGREGATE. I think it will be easier and simpler. Given the restructured data and working only in syntax, I don't think there is a method for doing this. Working in Python, I imagine, but don't know, that there is a way to determine the maximum number of variables. Others are experts with Python.

Gene Maguin

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L. For a list of commands to manage subscriptions, send the command INFO REFCARD.
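For anyone who wants to try the long-format route Gene describes, here is a minimal sketch of how the AGGREGATE syntax could be built from Python before any restructuring. The variable names person_id and observation and the helper build_aggregate_syntax are assumptions for illustration, not names from Alan's data. Inside SPSS the resulting string would presumably be passed to spss.Submit; here it is only printed so the snippet runs anywhere.

```python
def build_aggregate_syntax(break_var, value_var, target="Total"):
    """Build AGGREGATE syntax that adds a per-break-group sum to every row.

    break_var and value_var are hypothetical variable names; substitute
    the identifier and observation variables from the actual dataset.
    """
    return (
        "AGGREGATE /OUTFILE=* MODE=ADDVARIABLES "
        "/BREAK=%s /%s=SUM(%s)." % (break_var, target, value_var)
    )


# Assumed names for illustration only.
syntax = build_aggregate_syntax("person_id", "observation")
print(syntax)
```

Because the sum is taken over rows rather than over numbered columns, the number of observations per person never has to be known, so the same string can be reused unchanged for every group in the loop.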
|
In reply to this post by Krinsky, Alan-2
If I understand what you are doing, after you restructure the data there would be the same number of variables for each group, but values would be missing when there were fewer repeats than the maximum. So if you just sum the maximum number of variables, you should get the number you want. In other words, you shouldn't need to know how many repeats actually occurred for a particular group.

However, if you are working across separate CASESTOVARS runs, so that the number of variables could vary, you could create a VariableDict object for each dataset and inspect the variable names to find the largest number affixed to a variable name; that would tell you the maximum.

Or, more directly, suppose the new variables are named varnn, where nn is the index number. Then you could create a list of the variables to sum like this:

    import spss, spssaux
    numberedvars = spssaux.VariableDict(pattern=r"var\d+")
    spss.Submit("COMPUTE varsum=sum(%s)." % ",".join(numberedvars.variables))

That would list each variable explicitly. The pattern "var\d+" means variables whose names start with var and have one or more digits following.

HTH,
Jon Peck
SPSS, an IBM Company
[hidden email]
312-651-3435
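As a rough illustration of the "inspect the variable names" idea applied to names like observation.1, the sketch below finds the largest numeric suffix and builds the COMPUTE command with every variable listed explicitly. It is plain Python so it can be tried outside SPSS. The helper names numbered_vars and build_sum_syntax and the sample name list are assumptions for illustration. Inside SPSS the names would come from the active dataset (for example via spssaux.VariableDict, as in Jon's snippet), and the resulting string would go to spss.Submit once per group.

```python
import re


def numbered_vars(names, stem="observation"):
    """Return (index, name) pairs for names like 'observation.12', sorted by index."""
    pattern = re.compile(r"^%s\.(\d+)$" % re.escape(stem), re.IGNORECASE)
    found = []
    for name in names:
        match = pattern.match(name)
        if match:
            found.append((int(match.group(1)), name))
    return sorted(found)


def build_sum_syntax(names, stem="observation", target="Total"):
    """Build COMPUTE syntax that sums every numbered variable explicitly."""
    pairs = numbered_vars(names, stem)
    if not pairs:
        raise ValueError("no variables named %s.<n> found" % stem)
    return "COMPUTE %s=SUM(%s)." % (target, ",".join(name for _, name in pairs))


# Hypothetical variable list, standing in for one group's restructured dataset.
names = ["id", "group", "observation.1", "observation.2",
         "observation.10", "observation.3"]

max_index = numbered_vars(names)[-1][0]   # largest suffix present in this group
syntax = build_sum_syntax(names)
print(max_index)
print(syntax)
```

Listing the variables explicitly avoids depending on TO, which refers to variables by their order in the file and needs both endpoints to exist. That is presumably why a fixed observation.80 failed when no such variable had been created.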
