|
Hi all,

I’m trying to put together an example dataset for teaching purposes.

I’ve successfully written a few macros and additional syntax that can create example data with bivariate correlations between X1 and Y1 of various strengths with corresponding error terms where I can specify the mean and std dev of X1 and Y1 (with N=1,000), but I’m at a loss to figure out how to write some type of code that might create a distribution that might illustrate the concept of heteroscedasticity.

I’m assuming that the easiest way to do this is to create a variable (also with N=1000) that might represent error in Y1 where the variance of that error increases as the predicted value of Y increases, and then add that error to my original Y1 variable, but I’m unsure how to create that heteroscedastic error variable.

Any hints/ideas?

Best,
Jeff
|
The easy cure for heteroscedasticity, when there is a cure, is the appropriate power transformation of the mis-scaled variable.

Assuming that you want to show the cure, you should start by generating your ordinary relationship as you prefer, using non-negative numbers; then transform with the inverse of whichever "cure" you want. You get more effect with a larger multiplicative range of scores, but the square (for instance) is a mild power; you get more effect from exponentiating (cured by logging) or taking a reciprocal. ["Multiplicative range": (1,2) and (100,200) give the same range; (2,20) is a much larger range, being 10-fold instead of two-fold. I am talking about "power transformations", after all.]
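A minimal sketch of this "generate, then apply the inverse of the cure" idea in SPSS syntax. The variable names (x1, y_log, y_raw, y_cured), the coefficients, the error SD, and the seed are all illustrative assumptions, not values from the thread. It builds an ordinary constant-variance relationship on the log scale, exponentiates it, and then shows that logging brings the even scatter back.

```spss
* Sketch only, names and numbers are illustrative assumptions.
* Step 1 builds a constant-variance linear relationship on the log scale.
* Step 2 exponentiates it, which is the inverse of the log cure.
SET SEED 20180527.
NEW FILE.
INPUT PROGRAM.
LOOP #i = 1 TO 1000.
COMPUTE x1 = RV.UNIFORM(1, 10).
COMPUTE y_log = 0.5 + 0.3*x1 + RV.NORMAL(0, 0.4).
COMPUTE y_raw = EXP(y_log).
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
EXECUTE.

* Before the cure, residual spread should widen with the predicted value.
REGRESSION
  /DEPENDENT y_raw
  /METHOD=ENTER x1
  /SCATTERPLOT=(*ZRESID ,*ZPRED).

* After the cure, logging y_raw recovers the evenly scattered residuals.
COMPUTE y_cured = LN(y_raw).
EXECUTE.
REGRESSION
  /DEPENDENT y_cured
  /METHOD=ENTER x1
  /SCATTERPLOT=(*ZRESID ,*ZPRED).
```

Note that exponentiating also bends the relationship between x1 and y_raw, so the first residual plot will likely show some curvature along with the fan shape. With these assumed values y_raw spans roughly a 15-fold multiplicative range, which is what makes the effect visible.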
When Tukey talked about simulations for heteroscedasticity, another approach for generating wild variance was to use a mixture-of-populations. The simulation would assume that there was some regular relationship for most cases; and, then, some small fraction of cases, like 1% or 5% or 10%, came from a different population where the variance was (say) 10 times as great for the IV. I forget his other details, but I imagine situations where the small-but-variable population differs from the other; it has the only real effects, or it is pure noise, or it contradicts the other effects.
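One possible reading of the mixture idea, sketched in SPSS syntax. Everything specific here is an assumption for illustration: the 5% contamination rate, the variable names (x1, wild, err_sd, y1), the coefficients, and in particular the choice to put the tenfold variance on the error term of Y rather than on the IV. It is not a reconstruction of Tukey's actual simulations.

```spss
* Sketch only, names and numbers are illustrative assumptions.
* About 5 percent of cases are flagged as the wild population.
* Their error variance is 10 times larger, so their error SD is 5*SQRT(10).
SET SEED 20180528.
NEW FILE.
INPUT PROGRAM.
LOOP #i = 1 TO 1000.
COMPUTE x1 = RV.NORMAL(50, 10).
COMPUTE wild = RV.BERNOULLI(0.05).
COMPUTE err_sd = 5.
IF (wild = 1) err_sd = 5*SQRT(10).
COMPUTE y1 = 10 + 0.8*x1 + RV.NORMAL(0, err_sd).
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
EXECUTE.

* Scatterplot with markers set by population membership.
GRAPH /SCATTERPLOT(BIVAR)=x1 WITH y1 BY wild.
```

Changing the 0.05 to 0.01 or 0.10 gives the other contamination fractions mentioned above. Giving the wild cases a different slope, or no slope at all, would sketch the "contradicts the other effects" and "pure noise" variants.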
Hope this helps.
-- Rich Ulrich

From: SPSSX(r) Discussion <[hidden email]> on behalf of Jeff <[hidden email]>
Sent: Sunday, May 27, 2018 1:13:29 AM
To: [hidden email]
Subject: How to: Create a distribution to illustrate Heteroscedasticity?
|
|
In reply to this post by Jeff6610
compute heteros = rv.normal(z, k * x).

where z is the source of the mean, x is the source of the heteroscedasticity, and k scales it as desired.

You might also be interested in the Data with Cases custom dialog. It generates code for a dataset of any number of random variables from any of 23 distributions and 5 types of correlations (some of which mess up the chosen distribution). The dialog generates an input program that can in most cases be run without having the dialog installed.

This dialog is posted on the old SPSS Community website, but I can send it to anyone who is interested.
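For anyone who wants to try the one-line approach directly, here is a hedged sketch that wraps it in an input program. The range of x, the line used for z, the value 0.5 for k, and the seed are assumptions chosen for illustration. Because the second argument of RV.NORMAL is a standard deviation, k * x has to be positive, so x is drawn from positive values here.

```spss
* Sketch only, names and numbers are illustrative assumptions.
* z is the predicted value and the SD of heteros grows in proportion to x.
* x is kept positive because RV.NORMAL needs a positive standard deviation.
SET SEED 20180529.
NEW FILE.
INPUT PROGRAM.
LOOP #i = 1 TO 1000.
COMPUTE x = RV.UNIFORM(1, 10).
COMPUTE z = 2 + 3*x.
COMPUTE heteros = RV.NORMAL(z, 0.5*x).
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
EXECUTE.

* The scatter should fan out as x increases.
GRAPH /SCATTERPLOT(BIVAR)=x WITH heteros.
```

Since z increases with x in this sketch, the spread also increases with the predicted value, which matches what the original question asked for.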
|
|
Jon,
I'd be interested in this. Thanks.
Brian
From: SPSSX(r) Discussion <[hidden email]> on behalf of Jon Peck <[hidden email]>
Sent: Sunday, May 27, 2018 1:53:18 PM
To: [hidden email]
Subject: Re: How to: Create a distribution to illustrate Heteroscedasticity?
|
|
In reply to this post by Jeff6610
Also interested and thanks
Will
WMB & Associates Statistical Services
============
mailto: [hidden email]
http://home.earthlink.net/~z_statman
============
> On 5/27/2018 2:40:44 PM, Dates, Brian ([hidden email]) wrote:
|
In reply to this post by Jeff6610
Jeff,

Interesting question. Below my name you will find simulation code that should generate data with heteroskedastic errors from a simple linear regression model. Next, there is code that produces a scatterplot of standardized residual values against standardized predicted values from the simple linear regression model.

Hope this helps.

Ryan
--

*GENERATE DATA.
SET SEED 98768795.
NEW FILE.
INPUT PROGRAM.
LOOP i = 1 TO 100.
COMPUTE X = EXP(RV.NORMAL(0,1)).
COMPUTE B0 = -1.
COMPUTE B1 = 2.
COMPUTE ERROR = SQRT(.8*X)*RV.NORMAL(0,1).
COMPUTE Y = B0 + B1*X + ERROR.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
EXECUTE.

*FIT THE REGRESSION AND PLOT STANDARDIZED RESIDUALS AGAINST STANDARDIZED PREDICTED VALUES.
REGRESSION
  /DEPENDENT Y
  /METHOD=ENTER X
  /SCATTERPLOT=(*ZRESID ,*ZPRED).
|
|
In reply to this post by Jon Peck
Hi Jon,

I’ve searched quickly for the website you mentioned but didn’t have any luck, so I’m definitely interested in the dialog you’ve mentioned. I’m finding there are too many examples, videos, and other pieces of info online that are far too complex, too detailed or not detailed enough, or just plain wrong for teaching purposes, so I’ve decided to start making my own example data and output to use in class.

Jeff
|
In reply to this post by Ryan
Thanks. …much easier than the way I’ve been generating a few variables. …will have to explore this code more and what Jon has offered to send.

Jeff
