|
I am working with data collected over a short period of about 300 milliseconds. Measures are recorded every millisecond on joint alignment during a movement. I'd like to use growth curve modeling to quantify the movements over time, and I have been using the SPSS MIXED procedure.

I first ran a quadratic model and found that the linear term was significant but not the quadratic. However, if I run a cubic model the quadratic term becomes significant, and if I run a quartic model the cubic becomes significant, and so on.

The individual plots indicate that some individuals do indeed have complex trajectories consistent with the higher-order model. However, I am puzzled by the fact that each time I increase the complexity (going to the next order), the highest term in the previous model becomes significant. Any thoughts on this?

William N. Dudley, PhD
|
At 05:30 PM 1/21/2010, William Dudley WNDUDLEY wrote:
>I am working with data collected over a short period of about 300
>milliseconds. Measures are recorded every millisecond on joint
>alignment during a movement. I first ran a quadratic model and found
>that the linear term was significant but not the quadratic. However
>if I run a cubic model the quadratic term becomes significant and if
>I run a quartic model the cubic becomes significant and so on.

To start with: how are you parameterizing your power terms? If you're numbering your milliseconds N=1 to 300, and using N, N**2, N**3, etc., as your terms, they'll be very highly correlated and you'll have all kinds of trouble in the estimating.

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L. For a list of commands to manage subscriptions, send the command INFO REFCARD.
|
At 09:54 AM 1/22/2010, William Dudley WNDUDLEY wrote, off-list:
I have not noticed a problem with high correlation among the time values in past models (but my experience is with models with fewer than ten time points).

Here are the Pearson correlations of N, N**2, N**3, and N**4, where N=1,300; they are very high:

          N      N_2    N_3    N_4
  N       1      .968   .917   .866
  N_2     .968   1      .986   .958
  N_3     .917   .986   1      .992
  N_4     .866   .958   .992   1

That can make it hard for a regression or ANCOVA to 'decide' what influence to ascribe to what power. Commonly, both of two highly correlated variables can test non-significant even though the F test for including at least one of them is very strong.

If you mean-center your N, the odd terms (N, N**3) are uncorrelated with the even ones (N**2, N**4). That leaves the odd and even terms, themselves, highly correlated, but it's an easy place to start. For NCtr = N-150, ..... (Correlations that should be zero aren't, quite, because subtracting 150 from N isn't quite mean-centering; the mean of 1 to 300 is 150.5.)

One more comment: is modeling by polynomials the best course? Is there theoretical justification for a polynomial model, and if so, of what order; or a theoretical way to interpret a conclusion that the curve is a polynomial of such-and-such order?

===================
APPENDIX: Test code
===================

NEW FILE.
INPUT PROGRAM.
.  NUMERIC N    (F5).
.  NUMERIC NCtr (F5).
.  LOOP N=1 TO 300.
.     COMPUTE NCtr = N-150.
.     END CASE.
.  END LOOP.
END FILE.
END INPUT PROGRAM.

NUMERIC N_2 (F5) N_3 (F9) N_4 (F10).
COMPUTE N_2 = N**2.
COMPUTE N_3 = N**3.
COMPUTE N_4 = N**4.

NUMERIC NC_2 (F5) NC_3 (F9) NC_4 (F10).
COMPUTE NC_2 = NCtr**2.
COMPUTE NC_3 = NCtr**3.
COMPUTE NC_4 = NCtr**4.

CORRELATIONS
  /VARIABLES=N N_2 N_3 N_4
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE .

CORRELATIONS
  /VARIABLES=NCtr NC_2 NC_3 NC_4
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE .

(Delete significance levels and cell counts by hand, or using OMS.)
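To see concretely how severe the collinearity is, here is a sketch of the same check outside SPSS, in Python/NumPy (my addition, not part of the original exchange):

```python
import numpy as np

# Raw powers of N = 1..300, as in the appendix code above.
n = np.arange(1, 301, dtype=float)
raw = np.column_stack([n, n**2, n**3, n**4])
print(np.round(np.corrcoef(raw, rowvar=False), 3))
# Every off-diagonal correlation exceeds .86, matching the table above.

# True mean-centering (the mean of 1..300 is 150.5, not 150).
nc = n - n.mean()
cent = np.column_stack([nc, nc**2, nc**3, nc**4])
print(np.round(np.corrcoef(cent, rowvar=False), 3))
# Odd-even correlations (N with N**2, N**2 with N**3, etc.) are now zero.
```

With exact centering the odd/even cross-correlations vanish by symmetry, which is why centering is the "easy place to start" described here.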
|
In reply to this post by William Dudley WNDUDLEY
William Dudley WNDUDLEY wrote:
> I am working with data collected over a short period of about 300
> milliseconds. Measures are recorded every millisecond on joint
> alignment during a movement. I'd like to use growth curve modeling
> to quantify the movements over time and have been using the SPSS
> Mixed procedure. I first ran a quadratic model and found that the
> linear term was significant but not the quadratic. However if I run
> a cubic model the quadratic term becomes significant, and if I run
> a quartic model the cubic becomes significant, and so on. The
> individual plots indicate that some individuals do indeed have
> complex trajectories consistent with the higher-order model.

Polynomials are not ideal for this type of model, as Richard Ristow writes. There are few, if any, mechanistic reasons to adopt such a model. The advantage of polynomial models is that they are extremely flexible. If you remember Taylor series approximations from calculus, that is a good illustration of how they work. Unfortunately, higher-order polynomials wiggle so much that they lead to bizarre predicted values in between observed data values. I show this in the December 2008 issue of The Monthly Mean:

* http://www.pmean.com/news/2008-12.html#3

and show even more extreme polynomials on my website:

* http://www.pmean.com/08/OverfittingExample.html

Two good alternatives available in R, but not SPSS, are nonlinear mixed regression models and cubic spline models. Your first choice should be a nonlinear mixed model, if you know enough about the physics of joint alignment to build an appropriate nonlinear model. I know, for example, a bit about the mechanics of drug absorption, and with a bit of differential equations modeling you can show that the concentration of a drug in the bloodstream can often be modeled as a sum or difference of exponential functions. See

* www.childrensmercy.org/stats/weblog2007/DifferenceInExponentials.asp

for a practical example of how to fit these models.

If you want to try a nonlinear model but don't have a good nonlinear model to start with, there's an excellent book, Handbook of Nonlinear Regression Models by David Ratkowsky, that can give you some simple nonlinear models to try. I'd recommend the nlme library in R, but there are other good packages, both in R and in other programs. SPSS has a nonlinear regression procedure and a mixed models procedure, but not the two in combination.

Another good alternative, if you can't specify a nonlinear function, is a cubic spline model. These models have built-in safeguards against overfitting. I can't recommend a mixed spline model off the top of my head, but I'm sure they are out there.

Good luck!
--
Steve Simon, Standard Disclaimer

"What do all these numbers mean? Sensitivity, specificity, and likelihood ratios" Wednesday, February 17, 11am-noon, CST. Free to all! www.pmean.com/webinars
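As an illustration of the "difference of exponentials" idea, here is a short sketch (mine, not from the post or the linked page; Python rather than R, and every parameter name and value is made up for illustration):

```python
import numpy as np

def concentration(t, a=10.0, ka=1.2, ke=0.15):
    """One-compartment oral-absorption curve: rises with absorption rate ka,
    falls with elimination rate ke. Purely illustrative parameter values."""
    return a * (np.exp(-ke * t) - np.exp(-ka * t))

t = np.linspace(0, 24, 97)        # hours, on a 15-minute grid
c = concentration(t)
t_peak = t[np.argmax(c)]          # analytically, ln(ka/ke)/(ka-ke) ~ 1.98 h
print(round(float(t_peak), 2), round(float(c.max()), 2))
```

A nonlinear mixed model would then let a, ka, and ke vary by subject; in R that is nlme's territory, as the post says.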
|
In reply to this post by Richard Ristow
This question came up on the multilevel listserv a few weeks ago and the use
of orthogonal polynomial coefficients was recommended.

Gene Maguin

________________________________
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Richard Ristow
Sent: Friday, January 22, 2010 4:56 PM
Subject: Re: Increasing order of polynomials in growth curve

[quoted text of Richard Ristow's message omitted; see above]
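For anyone curious what orthogonal polynomial coefficients look like in practice, here is a sketch (mine, not Gene's) of one standard way to construct them — a QR decomposition of the Vandermonde matrix, which is essentially what R's poly() does:

```python
import numpy as np

n = np.arange(1, 301, dtype=float)
vander = np.vander(n, 5, increasing=True)   # columns: 1, n, n**2, n**3, n**4
q, _ = np.linalg.qr(vander)
ortho = q[:, 1:]                            # orthogonal linear..quartic terms

# The cross-products form an identity matrix: the terms are exactly
# uncorrelated, so adding a higher-order term cannot disturb the lower ones.
print(np.round(ortho.T @ ortho, 10))
```

With these columns as the time predictors, the significance of the linear term no longer shifts when a quadratic, cubic, or quartic term is added — which is exactly the symptom described at the top of the thread.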
|
Dear Colleagues,
Apologies for this non-SPSS related question. I am researching my options for analyzing cross-sectional data. I have four years of survey data and would like to analyze it for trends over time. It is not repeated measures, since participants are surveyed only once. They are, arguably, from the same population of college freshmen attending a small public liberal arts college. I began looking into time-series analysis but quickly came to realize the number of time points is insufficient. Would a simple regression analysis be valid, regressing the DVs on survey wave?
Thanks, John |
|
In principle, nothing hinders pooling all the yearly datasets together and treating the whole as a single dataset. The year of observation (i.e., the college cohort) can be treated as another independent variable if variation by year is relevant. This may or may not be the case.

For instance, some time ago I had to analyze several population surveys in Brazil, and used the year to reflect the changing economic environment (growth, recession, etc.), which would have affected my dependent variables (mostly related to employment and poverty). For variables not affected by time, such as in a medical survey on the physiological effects of a treatment, the particular cohort may be irrelevant: you may ignore the year and treat the whole as a single sample.

Your case may go either way. You have only a few consecutive years observing successive college cohorts in a rather stable environment, such as the US college scene, so there is probably no significant difference between these cohorts, but better make sure. You may test the null hypothesis that all cohorts come from the same population (i.e., that their mutual differences in key variables are not significant), and if you fail to reject that null hypothesis you may treat all cohorts as a single group, thus enlarging your sample size at little cost.

Hector

From: SPSSX(r) Discussion, On Behalf Of J P
[quoted text of J P's original question omitted]
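Hector's suggested homogeneity check could look like the following sketch (mine, with fabricated data; a one-way ANOVA written out in NumPy rather than run through SPSS):

```python
import numpy as np

rng = np.random.default_rng(1)
# Four yearly cohorts drawn from the same population (the null hypothesis).
cohorts = [rng.normal(50.0, 10.0, 120) for _ in range(4)]

k = len(cohorts)
n = sum(c.size for c in cohorts)
grand_mean = np.concatenate(cohorts).mean()
ss_between = sum(c.size * (c.mean() - grand_mean) ** 2 for c in cohorts)
ss_within = sum(((c - c.mean()) ** 2).sum() for c in cohorts)
f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
# Small F: no evidence the cohorts differ; compare to an F(k-1, n-k)
# critical value before deciding.
print(round(f_stat, 3))
```

If the test does not reject, the cohorts can be pooled as Hector describes; if it rejects, keep year (cohort) in the model as an independent variable.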
|
Thank you for your thoughtful response, Hector. Your initial paragraph describes exactly my situation: pooling the data sets and treating year of observation as the IV. The survey pertains to drug & alcohol use, and the administration wants to track whether usage is increasing, decreasing, or not changing over the years.
From: Hector Maletta <[hidden email]>
Sent: Mon, January 25, 2010 10:37:16 AM
Subject: Re: alternative to time-series analysis

[quoted text of Hector Maletta's reply and J P's original question omitted; see above]
|
|
In reply to this post by J P-6
You have several options. Pooling the cross-sections, with a dummy for each time point but the first, will give you a "random intercepts" model; it allows the mean of the dependent variable to change with each wave. You can also estimate a multi-level model or a latent growth curve model. For a discussion of these options, see the paper that Julie A. Phillips and I published last year in the Journal of Quantitative Criminology, entitled "A Comparison of Methods for Analyzing Criminological Panel Data."

David Greenberg, Sociology Department, New York University
----- Original Message -----
From: J P <[hidden email]>
Date: Monday, January 25, 2010 9:51 am
Subject: alternative to time-series analysis

[quoted text of J P's original question omitted; see above]
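A sketch (mine, not from the Phillips & Greenberg paper) of the wave-dummy idea in Python, with fabricated data and made-up variable names:

```python
import numpy as np

rng = np.random.default_rng(2)
wave = np.repeat([1, 2, 3, 4], 100)                   # four survey years
# Fake DV: usage drifts up by 0.5 per wave, plus noise.
usage = 5.0 + 0.5 * (wave - 1) + rng.normal(0.0, 1.0, wave.size)

# Design matrix: intercept plus dummies for waves 2-4 (wave 1 = reference).
dummies = [(wave == w).astype(float) for w in (2, 3, 4)]
x = np.column_stack([np.ones(wave.size)] + dummies)
beta, *_ = np.linalg.lstsq(x, usage, rcond=None)
print(np.round(beta, 2))   # beta[1:] are each wave's shift from wave 1
```

Each dummy coefficient estimates that wave's mean shift relative to the first wave, which is exactly the "is usage changing over the years" question in this thread.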
|
Art Kendall
Social Research Consultants

On 2/9/2010 2:54 PM, Jill Stoltzfus wrote:
[quoted message omitted]
|
|
In reply to this post by Jill Stoltzfus
I'll not be surprised if someone offers a more elegant solution, but I think this works.

new file.
dataset close all.
data list list / timestr (a8).
begin data
"9:00 AM"
"11:25 AM"
"12:20 PM"
"5:00 PM"
"5:01 PM"
"8:00 PM"
"8:01 PM"
"8:59 AM"
end data.

* Recode string time variable to military time
* using some code from Raynald's website:
* www.spsstools.net/Syntax/DatesTime/FromAM_PMtoMilitaryTime.txt .

COMPUTE time=NUM(SUBSTR(timestr,1,5),time).
DO IF (INDEX(timestr,'PM')>0 AND NOT (SUBSTR(timestr,1,2)='12')).
- COMPUTE TIME=TIME + 43200.
ELSE IF ((SUBSTR(timestr,1,2)='12') AND INDEX(timestr,'AM')>0).
- COMPUTE TIME=TIME - 43200.
END IF.
FORMATS time (time).
EXE.

* In the DO-IF structure below, the time values
* are in seconds since midnight.

do if range(time,32400,61200).
- compute timecat = 1.
else if (time GT 61200) and (time LE 72000).
- compute timecat = 2.
else.
- compute timecat = 3.
end if.
exe.

format timecat(f1.0).
var lab timecat "Time category".
val lab timecat
 1 "9:00 AM - 5:00 PM"
 2 "5:01 PM - 8:00 PM"
 3 "8:01 PM - 8:59 AM" .
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/
"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
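For comparison, here is the same recode sketched in Python (my addition, not Bruce's): parse the AM/PM strings to seconds since midnight, then bin into his three categories.

```python
from datetime import datetime

def to_seconds(timestr):
    """Parse e.g. '5:01 PM' to seconds since midnight."""
    t = datetime.strptime(timestr, "%I:%M %p")
    return t.hour * 3600 + t.minute * 60

def timecat(timestr):
    s = to_seconds(timestr)
    if 9 * 3600 <= s <= 17 * 3600:      # 9:00 AM - 5:00 PM
        return 1
    if 17 * 3600 < s <= 20 * 3600:      # 5:01 PM - 8:00 PM
        return 2
    return 3                            # 8:01 PM - 8:59 AM

for ts in ["9:00 AM", "11:25 AM", "12:20 PM", "5:00 PM",
           "5:01 PM", "8:00 PM", "8:01 PM", "8:59 AM"]:
    print(ts, timecat(ts))
```

Letting strptime handle the AM/PM and noon/midnight cases replaces the hand-rolled +/- 43200 adjustments in the SPSS version.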
|
|
--- snip ---
* In the DO-IF structure below, the time values
* are in seconds since midnight.

do if range(time,32400,61200).
- compute timecat = 1.
else if (time GT 61200) and (time LE 72000).
- compute timecat = 2.
else.
- compute timecat = 3.
end if.
--- snip ---

I guess I could have used RANGE in the second part of that DO-IF too. I.e.,

do if range(time,32400,61200).
- compute timecat = 1.
else if range(time,61201,72000).
- compute timecat = 2.
else.
- compute timecat = 3.
end if.
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/
|
In reply to this post by Bruce Weaver
At 03:41 PM 2/9/2010, Bruce Weaver suggested the code:
do if range(time,32400,61200).
...

Code like this is much clearer if you use SPSS functions to specify time values, rather than 'magic numbers' like 61201 that you have calculated to be the correct time values but which are hard to understand, or check for accuracy, when you read the code. Like this (and removing the two EXE statements, which are neither necessary nor helpful):

do if range(time,TIME.HMS(9,0,0),TIME.HMS(17,0,0)).
- compute timecat = 1.
else if range(time,TIME.HMS(17,0,0),TIME.HMS(20,0,0)).
- compute timecat = 2.
else.
- compute timecat = 3.
end if.
format timecat(f1.0).
var lab timecat "Time category".
val lab timecat
 1 "9:00 AM - 5:00 PM"
 2 "5:01 PM - 8:00 PM"
 3 "8:01 PM - 8:59 AM" .

Frequencies timecat.

Output Created 22-FEB-2010 21:31:53

timecat  Time category
                        Frequency  Percent  Valid Percent  Cumulative Percent
 1  9:00 AM - 5:00 PM   4          50.0     50.0            50.0
 2  5:01 PM - 8:00 PM   2          25.0     25.0            75.0
 3  8:01 PM - 8:59 AM   2          25.0     25.0           100.0
    Total               8         100.0    100.0

LIST.

Output Created 22-FEB-2010 21:31:53

timestr      time     timecat
9:00 AM      9:00:00     1
11:25 AM    11:25:00     1
12:20 PM    12:20:00     1
5:00 PM     17:00:00     1
5:01 PM     17:01:00     2
8:00 PM     20:00:00     2
8:01 PM     20:01:00     3
8:59 AM      8:59:00     3

Number of cases read: 8    Number of cases listed: 8

=============================
APPENDIX: Test data, and code
=============================

data list list / timestr (a8).
begin data
"9:00 AM"
"11:25 AM"
"12:20 PM"
"5:00 PM"
"5:01 PM"
"8:00 PM"
"8:01 PM"
"8:59 AM"
end data.

* Recode string time variable to military time
* using some code from Raynald's website:
* www.spsstools.net/Syntax/DatesTime/FromAM_PMtoMilitaryTime.txt .

COMPUTE time=NUM(SUBSTR(timestr,1,5),time).
DO IF (INDEX(timestr,'PM')>0 AND NOT (SUBSTR(timestr,1,2)='12')).
- COMPUTE TIME=TIME + 43200.
ELSE IF ((SUBSTR(timestr,1,2)='12') AND INDEX(timestr,'AM')>0).
- COMPUTE TIME=TIME - 43200.
END IF.
FORMATS time (time).

do if range(time,TIME.HMS(9,0,0),TIME.HMS(17,0,0)).
- compute timecat = 1.
else if range(time,TIME.HMS(17,0,0),TIME.HMS(20,0,0)).
- compute timecat = 2.
else.
- compute timecat = 3.
end if.
format timecat(f1.0).
var lab timecat "Time category".
val lab timecat
 1 "9:00 AM - 5:00 PM"
 2 "5:01 PM - 8:00 PM"
 3 "8:01 PM - 8:59 AM" .

Frequencies timecat.
LIST.
|
|
Jill,
I think a better way is to build two contrast variables. The first variable contrasts ASC+VMP=1 vs. control=-1, and the second contrasts ASC=1 versus VMP=-1 (or vice versa), with control=0. These are orthogonal contrasts. If I remember correctly, Cohen calls this scheme 'effects' coding, and, as I read the description of available contrasts, it is not available in SPSS LOGISTIC. BUT, note that the computation of the odds ratios is not what it would be with a 0-1 coding scheme, because you have to take into account the -1 coefficient.

Gene Maguin

>>> Hello everyone. I have a contrast coding question that I'd appreciate help with, to make sure I'm choosing the most appropriate option. Briefly, I want to compare risk factors for mesh extrusion (binary outcome) in women who've undergone one of two surgical techniques (ASC versus VMP). For my logistic regression, I need to compare (and report odds ratios for) the following: 1) ASC and VMP together ("GroupALL" versus controls who didn't need mesh extrusion); and 2) the separate effects of ASC versus VMP. Would something like Helmert coding allow me to accomplish my goal, since this takes into account the average effect of all subsequent categories? Or do I need to create a "GroupALL" code for the combined effect within a single "Group" predictor that already includes "ASC" and "VMP"? Thanks very much for your help. Jill
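A sketch (mine, not Gene's SPSS setup) of those two contrast columns in Python, with made-up group data, checking that they are orthogonal when the groups are equal-sized:

```python
import numpy as np

# One label per subject; equal group sizes keep the contrasts orthogonal.
labels = np.array(["ASC", "VMP", "control"] * 20)

# Contrast 1: both mesh groups (ASC, VMP) = 1 vs. control = -1.
c1 = np.where(labels == "control", -1, 1)
# Contrast 2: ASC = 1 vs. VMP = -1, with control = 0.
c2 = np.select([labels == "ASC", labels == "VMP"], [1, -1], default=0)

print(int(c1 @ c2))   # 0: orthogonal, so each tests a separate question
```

These columns would then enter the logistic regression as two predictors. As Gene notes, with this +/-1 coding a one-unit change is not a group-to-group switch, so the exponentiated coefficients are not the usual group odds ratios and need care in interpretation.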
|
