looping-find the best relationship between two variables


ex902
Hi,
This is my first time using this site, so please excuse the long question. To get to the question quickly, go to (4).

The question is about understanding what looping can do, using the following problem. Unfortunately, I know almost nothing about coding in SPSS, other than using the PASTE function to ease some of the analyses.

1- I have two variables, 'A' and 'B'.

2- 'A' is the engine RPM, and 'B' is the emission of, say, carbon monoxide.

3- Hence I am trying to understand the relationship between engine power and emissions.

4- Both variables have one value per second, but their timelines are not synchronized (THIS IS THE PROBLEM).

5- 'A' has the correct timeline, but 'B' lags 240 or 300 seconds behind 'A'. In other words, in the first 240 or 300 s, 'B' contains irrelevant (but non-zero) cases.

6- Since there are other emission types (I have 'C', 'D' and several more variables that need to be synchronized with 'A' at different lags), I would like to use looping syntax for an accurate solution instead of checking each of them manually.

7- The objective is to find the correct number of cases to remove from the beginning of 'B' to synchronize it with 'A'.

8- One method that comes to mind is correlation or regression. If an automated regression analysis can be performed, then I can remove cases from the start of 'B' one by one, look for the highest R or R2, and so find the correct number of cases to remove.

9- The question is how I can achieve this.

10- I know you may suggest doing this manually (scrolling down 'B' and marking the variations). But the data cover a number of different operating stages and there is a lot of data, so doing this by eye is difficult and, to me, not very reliable. Also, the primary aim is to learn how to use looping.

Thank you for your time.
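
A minimal sketch of the search described in (8), written in Python with NumPy rather than SPSS syntax so that it can be run anywhere. The function name `best_lag`, the synthetic `rpm` and `co` series, and the 400 s search limit are assumptions made for illustration; none of them come from the original data. For each candidate lag it drops that many cases from the start of 'B', trims 'A' to the same length, and keeps the lag with the highest Pearson r.

```python
import numpy as np

def best_lag(a, b, max_lag):
    """Return (lag, r) with the highest Pearson r between a[t] and b[t + lag]."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = min(len(a), len(b))
    best = (0, -np.inf)
    for lag in range(max_lag + 1):
        x = a[: n - lag]   # 'A' keeps its own (correct) timeline
        y = b[lag:n]       # 'B' with its first `lag` cases removed
        r = np.corrcoef(x, y)[0, 1]
        if r > best[1]:
            best = (lag, r)
    return best

# Synthetic demonstration (assumed data, not the poster's measurements):
# 'co' is 'rpm' delayed by 240 s, preceded by 240 irrelevant non-zero values.
rng = np.random.default_rng(0)
n_cases, true_lag = 2000, 240
rpm = rng.normal(size=n_cases).cumsum()
co = np.concatenate([rng.normal(size=true_lag), rpm[:-true_lag]])
co = co + 0.1 * rng.normal(size=n_cases)   # small measurement noise

lag, r = best_lag(rpm, co, max_lag=400)
print("cases to remove from the start of B:", lag)
```

Picking the maximum over many lags is itself a form of selection, so the winning r will tend to be optimistic; it is safer to treat the lag as the result and re-estimate the relationship afterwards.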


Re: looping-find the best relationship between two variables

Bruce Weaver
Administrator
What you want to do would be much clearer if you provided a small example of what the data file looks like currently, and what you want it to look like before running your regression model.  

Re "automated" regression, see these comments on "stepwise" and other automatic selection methods:

  http://www.stata.com/support/faqs/statistics/stepwise-regression-problems/

And see Mike Babyak's nice article on over-fitting.

  http://people.duke.edu/~mababyak/papers/babyakregression.pdf

HTH.


--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."


Re: looping-find the best relationship between two variables

Rich Ulrich
[I have not yet seen the OP's original post.]

For a lot of information on using various lags,  check out a
thread in Nabble from June 6, "Problem running lag syntax".

As Bruce's references point out, over-fitting is a problem
when you start with multiple choices.  In addition, it seems to
me that the problem as stated in the original post requires
a different approach, based on the physics.

As I read it, this is something like looking at RPM for an
engine and measuring emissions at the tailpipe, where
there will be a time lag that depends on the velocity of
the exhaust... which will therefore depend on the preceding
RPM.  Higher RPM produces more gas; therefore, more velocity
or greater density (or both).

Since exhaust gas is compressible, the lag at the start is apt to
differ from the lag at the end, and those lags will vary with the
acceleration of the RPM.
 - The way to associate simple RPM with pollutants would be
to examine the stable numbers available for a fixed period after
deleting a safely large number of measures from the start.
 - The way to measure the effect of a style of acceleration would
be to integrate (in some fashion) the values between the end of one
stabilized speed and the next stabilized speed.
 - RPM does not measure "engine power," which I think would
also need to incorporate "load".  Rapid acceleration under
heavy load, as I understand it, produces the greatest amount of
unburnt hydrocarbons, which the computerized fuel-feed system
attempts to minimize.
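
The first suggestion in the list above can be sketched as follows, in Python with NumPy. This is only an illustration under stated assumptions: the function name `steady_state_means`, the `tol` threshold for detecting a speed change, the `settle` margin, and the synthetic three-stage test run are all hypothetical. The idea is that if every constant-speed stage is longer than the largest plausible lag, skipping that many samples at the start of each stage makes the exact lag irrelevant.

```python
import numpy as np

def steady_state_means(rpm, emission, settle, tol):
    """Mean rpm and emission per constant-speed stage,
    ignoring the first `settle` samples of each stage."""
    rpm = np.asarray(rpm, dtype=float)
    emission = np.asarray(emission, dtype=float)
    # A new stage starts wherever rpm changes by more than `tol` in one step.
    # During a ramp every step is a break, so ramps become 1-sample stages
    # and are discarded by the length check below.
    breaks = np.flatnonzero(np.abs(np.diff(rpm)) > tol) + 1
    starts = np.concatenate(([0], breaks))
    ends = np.concatenate((breaks, [len(rpm)]))
    rows = []
    for s, e in zip(starts, ends):
        if e - s > settle:  # stage is long enough to have settled
            rows.append((rpm[s + settle:e].mean(),
                         emission[s + settle:e].mean()))
    return rows

# Synthetic run (assumed): three 600 s stages, emission delayed by 240 s.
rpm = np.repeat([1000.0, 2000.0, 3000.0], 600)
lag = 240
emission = np.concatenate([np.full(lag, 5.0), 0.01 * rpm[:-lag]])

rows = steady_state_means(rpm, emission, settle=300, tol=50.0)
for mean_rpm, mean_em in rows:
    print(mean_rpm, mean_em)
```

Here `settle=300` is chosen to exceed the largest lag mentioned in the original post; the per-stage means can then be related to each other without ever estimating the lag.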

--
Rich Ulrich


Re: looping-find the best relationship between two variables

David Marso
Administrator
In reply to this post by ex902
Take a look at the CCF (cross-correlation function) command.
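
For readers unfamiliar with it, a cross-correlation function reports the correlation between one series and lagged copies of another, so the lag with the largest value estimates the offset. The following is a hedged NumPy sketch of that calculation, not SPSS output and not SPSS syntax: the function names `ccf` and `delayed`, the channel names, and the synthetic data are assumptions for illustration. The series are first-differenced before correlating, a common step that keeps slow trends from smearing the peak.

```python
import numpy as np

def ccf(x, y, max_lag):
    """Sample cross-correlation r(k) between x[t] and y[t + k], k = 0..max_lag,
    using the overall means and standard deviations of both series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    xd = x - x.mean()
    yd = y - y.mean()
    denom = n * x.std() * y.std()
    return np.array([np.dot(xd[: n - k], yd[k:]) / denom
                     for k in range(max_lag + 1)])

def delayed(x, lag, rng):
    """Hypothetical emission channel: irrelevant values for `lag` s, then x delayed."""
    return np.concatenate([rng.normal(size=lag), x[:-lag]])

# Synthetic data (assumed): two emission channels with different lags.
rng = np.random.default_rng(1)
rpm = rng.normal(size=3000).cumsum()
channels = {"B": delayed(rpm, 240, rng), "C": delayed(rpm, 300, rng)}

lags = {}
for name, series in channels.items():
    # Correlate the second-to-second changes rather than the raw levels.
    r = ccf(np.diff(rpm), np.diff(series), max_lag=400)
    lags[name] = int(np.argmax(r))

print(lags)
```

Looping over a dictionary of channels is one way to handle 'C', 'D' and the other variables, each with its own lag, in a single pass.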

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
"Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"