Hi,
This is my first time using this site, so please excuse the long question. To get to the question quickly, please go to (4).
The question is about understanding the benefits of looping, using the following problem. Unfortunately, I know nothing about coding in SPSS, other than using the PASTE function to ease some of the analyses.
1- I have two variables, 'A' and 'B'.
2- 'A' is the engine RPM, and 'B' is emissions of, say, carbon monoxide.
3- Hence I am trying to understand the relationship between engine power and emissions.
4- These two variables have a value for each second, but their timelines are not synchronized (THIS IS THE PROBLEM).
5- 'A' has the correct timeline, but 'B' comes 240 or 300 seconds behind 'A'. In other words, in the first 240 or 300 s, 'B' contains irrelevant cases, though they are not zero.
6- Since there are other emission types (that is, I have 'C', 'D' and several more variables that need to be synchronized with 'A' at different lags), I would like to use looping syntax for an accurate solution instead of manually checking every one of them.
7- The objective is to find the correct number of cases to remove from the beginning of 'B' to synchronize it with 'A'.
8- One method that comes to mind is using correlation or regression. If an automated regression analysis can be performed, then I can find the highest R or R2 by removing the cases of 'B' one by one, and in the end find the correct number of cases to remove from the beginning.
9- And the question is: how can I achieve this?
10- I know you may suggest doing this manually (scrolling down 'B' and marking the variations). But the data include a number of different operation stages, and there is a lot of data, so doing this by eye is difficult and, to me, not very reliable either. Also, the primary aim is to learn how to use looping.
Thank you for your time.
Administrator
What you want to do would be much clearer if you provided a small example of what the data file looks like currently, and what you want it to look like before running your regression model.
Re "automated" regression, see these comments on "stepwise" and other automatic selection methods: http://www.stata.com/support/faqs/statistics/stepwise-regression-problems/ And see Mike Babyak's nice article on over-fitting. http://people.duke.edu/~mababyak/papers/babyakregression.pdf HTH.
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
[I have not yet seen the OP's original post.]
For a lot of information on using various lags, check out a thread in Nabble from June 6, "Problem running lag syntax".
As Bruce's references point out, over-fitting is a problem when starting with multiple choices. In addition, it seems to me that the problem as stated in the original post requires a different approach, based on the physics.
As I read it, this is something like looking at RPM for an engine and measuring emissions at the tailpipe, where there will be a time lag that depends on the velocity of the exhaust... which will therefore depend on the preceding RPM. Higher RPM produces more gas; therefore, more velocity or greater density (or both). Since exhaust gas is compressible, the lag at the start is apt to differ from the lag at the end, and those lags will vary with the acceleration of the RPM.
- The way to associate simple RPM with pollutants would be to examine the stable numbers available for a fixed period after deleting a safely-large number of measures from the start.
- The way to measure the effect of a style of acceleration would be to integrate (in some fashion) the values between the end of one stabilized speed and the next stabilized speed.
- RPM does not measure "engine power," which I think would also need to incorporate "load". Rapid acceleration under heavy load, as I understand it, produces the greatest amount of unburnt hydrocarbons, which the computerized fuel-feed system attempts to minimize.
-- Rich Ulrich
> Date: Sun, 18 Aug 2013 08:20:21 -0700
> From: [hidden email]
> Subject: Re: looping-find the best relationship between two variables
> To: [hidden email]
Administrator
In reply to this post by ex902
Take a look at the CCF function.
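To make the idea concrete, here is a minimal sketch of the loop described in point (8) of the question, which is also the kind of quantity a cross-correlation function reports: drop the first k cases of 'B', correlate what is left with 'A', and keep the k that gives the highest r. It is written in Python with NumPy purely for illustration and is not SPSS syntax. The function name best_lag, the variable names rpm and co, and the synthetic data are all assumptions made for the example; the caveats about over-fitting and a non-constant physical lag raised earlier in the thread still apply.

```python
import numpy as np


def best_lag(a, b, max_lag):
    """Return (lag, r): the number of leading cases to drop from b
    that maximizes the Pearson correlation with a (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = min(len(a), len(b))
    if max_lag >= n - 1:
        raise ValueError("max_lag must leave at least two paired cases")

    rs = np.empty(max_lag + 1)
    for lag in range(max_lag + 1):
        # Pair a[t] with b[t + lag]: drop the first `lag` cases of b
        # and the last `lag` cases of a so both pieces have equal length.
        x = a[: n - lag]
        y = b[lag:n]
        rs[lag] = np.corrcoef(x, y)[0, 1]

    best = int(np.nanargmax(rs))  # ignore lags where r is undefined
    return best, float(rs[best])


# Synthetic illustration (assumed data, not the poster's measurements):
# RPM is held at a random level for 10 s at a time, and the emission
# series is a scaled copy of RPM that arrives 240 s late.
rng = np.random.default_rng(0)
rpm = np.repeat(rng.uniform(1000, 4000, size=200), 10)  # 2000 seconds
n = rpm.size
true_lag = 240

co = np.empty(n)
co[true_lag:] = 0.01 * rpm[: n - true_lag]          # delayed response
co[:true_lag] = rng.uniform(10, 40, size=true_lag)  # irrelevant, non-zero cases
co += rng.normal(0, 0.2, size=n)                    # measurement noise

lag, r = best_lag(rpm, co, max_lag=400)
print(f"best lag = {lag} s, r = {r:.3f}")
```

For the other emission variables ('C', 'D', and so on) the same function would simply be called once per variable, each with its own search range. The sketch assumes a single fixed lag for the whole series; if the lag drifts with engine speed, as suggested above, one peak correlation will only be an average answer.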
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |