Re: Non-parametric regression

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

Re: Non-parametric regression

Marta Garcia-Granero
Hi Bruce:

This macro has been tested only with a couple of datasets, you should your results with other software before considering it "error free".

Anyway, I'm confident enough that the results are OK.

HTH,
Marta Garcia-Granero

******************************************************
*           KENDALL-THEIL ROBUST REGRESSION          *
******************************************************
* SPSS Macro: (C) February 2011 Marta Garcia-Granero *
* [hidden email]                           *
******************************************************
* Reference:                                         *
* http://water.usgs.gov/pubs/twri/twri4a3/           *
* Techniques of Water-Resources Investigations       *
* of the United States Geological Survey             *
* Book 4, Hydrologic Analysis and Interpretation     *
* Chapter A3: Statistical Methods in Water Resources *
* D.R. Helsel & R.M. Hirsch. Chapter 10 (pp 265-74)  *
******************************************************

* Sample dataset (replace by your own), data come from Helsel&Hirsch book, p 267 (example 1) *.
DATA LIST LIST/x y (2 F8.0).
BEGIN DATA
1 1
2 2
3 3
4 4
5 5
6 16
7 7
END DATA.

* MACRO DEFINITION *.
DEFINE KendallTheil(x=!TOKENS(1) /y=!TOKENS(1)).
NONPAR CORR
  /VARIABLES=!x !y
  /PRINT=KENDALL TWOTAIL NOSIG.
PRESERVE.
SET MXLOOPS=10000./* Set it to sample size if n>10000 *.
MATRIX.
* Variable names (IV goes first, DV last) *.
GET data /VAR=!x !y/NAMES=vnames /MISSING=OMIT.
COMPUTE n=NROW(data).
COMPUTE X=data(:,1).
COMPUTE Y=data(:,2).
* Xmed & Ymed *.
COMPUTE SortedX=X.
COMPUTE SortedX(GRADE(X))=X.
COMPUTE SortedY=Y.
COMPUTE SortedY(GRADE(Y))=Y.
DO IF TRUNC(n/2) EQ n/2).         /* Median formula for even samples *.
. COMPUTE lx=SortedX(n/2).
. COMPUTE ux=SortedX(1+n/2).
. COMPUTE ly=SortedY(n/2).
. COMPUTE uy=SortedY(1+n/2).
. COMPUTE MedX=(lx+ux)/2.         /* Median is the average of two values *.
. COMPUTE MedY=(ly+uy)/2.
ELSE.                             /* Median formula for odd samples *.
. COMPUTE MedX=SortedX((n+1)/2).  /* Median is one value of the sample *.
. COMPUTE MedY=SortedY((n+1)/2).
END IF.
PRINT {MedX,MedY}
 /FORMAT='F8.1'
 /CNAMES=vnames
 /TITLE='Sample medians (X&Y)'.
* Computing all the slopes *.
COMPUTE k=0.
COMPUTE Pos=0.
COMPUTE Neg=0.
COMPUTE Ties=0.
LOOP i=1 to n-1.
. LOOP j=i+1 TO n.
.  DO IF data(i,1) NE data(j,1).
.   COMPUTE slope=(data(j,2)-data(i,2))/(data(j,1)-data(i,1)).
.    DO IF slope GT 0.
.     COMPUTE Pos=Pos+1.
.    ELSE IF Slope LT 0.
.      COMPUTE Neg=Neg+1.
.    ELSE.
.     COMPUTE Ties=Ties+1.
.    END IF.
.   COMPUTE k=k+1.
.  END IF.
.  DO IF k=1.
.   COMPUTE AllSlope=slope.
.  ELSE.
.   COMPUTE AllSlope={AllSlope;slope}.
.  END IF.
. END LOOP.
END LOOP.
COMPUTE Tau=(Pos-Neg)/(Pos+Neg).
PRINT Tau
 /FORMAT='F8.3'
 /TITLE="Kendall's Tau".
* Sort slopes (method by Ristow&Peck)*.
COMPUTE StdSlope=AllSlope.
COMPUTE StdSlope(GRADE(AllSlope))=AllSlope.
* Median slope (B1 hat) *.
DO IF TRUNC(k/2) EQ k/2).       /* Median formula for even samples *.
. COMPUTE lb=StdSlope(k/2).
. COMPUTE ub=StdSlope(1+k/2).   /* Median is the average of two values *.
. COMPUTE b=(lb+ub)/2.
ELSE.                           /* Median formula for odd samples *.
. COMPUTE b=StdSlope((k+1)/2).  /* Median is one value of the sample *.
END IF.
COMPUTE a=MedY-b*MedX.
* Approx. 95%CI for b *.
DO IF n LE 10.              /* Low sample size (n<11) values *.
. DO IF n EQ 5.
.  COMPUTE Xu=10.
. ELSE IF n EQ 6.
.  COMPUTE Xu=11.
. ELSE IF n EQ 7.
.  COMPUTE Xu=15.
. ELSE IF n EQ 8.
.  COMPUTE Xu=18.
. ELSE IF n EQ 9.
.  COMPUTE Xu=20.
. ELSE.
.  COMPUTE Xu=23.
. END IF.
. COMPUTE Ru=(k+Xu)/2.
. COMPUTE Rl=1+(k-Xu)/2.
ELSE.
. COMPUTE ZValue=1.95996398454005.
. COMPUTE Ru=rnd(1+(K+ZValue*SQRT(n*(n-1)*(2*n+5)/18))/2).
. COMPUTE Rl=rnd((K-ZValue*SQRT(n*(n-1)*(2*n+5)/18))/2).
END IF.
COMPUTE LowB=StdSlope(Rl).
COMPUTE UppB=StdSlope(Ru).
PRINT {a,b}
 /FORMAT='F8.2'
 /CLABELS='B0','B1'
 /TITLE='Intercept & Slope, Kendall-Theil estimates'.
PRINT {LowB,UppB}
 /FORMAT='F8.2'
 /CLABEL='Lower','Upper'
 /TITLE='95%CI for B1'.
END MATRIX.
RESTORE.
!ENDDEFINE.

* MACRO CALL *.
KendallTheil x=x y=y.

El 04/02/2011 18:35, Bruce Anderson escribió:
Hi Marta-
I see your work often on Raynald's site, and have enjoyed using some of your syntax over the years.  I know the basics, but am not very fluent writing SPSS syntax.  Twenty years ago I was pretty good with SAS code....but it has been a long time!

I was wondering if this problem might interest you.  It is the Sen Slope estimate with Mann-Kendall statistic.  It is a little esoteric, mainly used by water quality specialists like me, but there isn't any code out there to implement it in SPSS.  I don't think I have the talent (or time, as a single parent) to crack this one myself. 

Thanks for all your contributions to the SPSS community!

Bruce

Here is a link to a good description of the method, embedded in an email exchange with Richard Ristow:

------------------------------------------------------------- 

Hi Richard-

I have another statistical procedure that I need to implement in SPSS.  It involves the calculation of a non-parametric Sen Slope for trend analysis of water quality data.  Are you familiar with this?  It involves calculating individual slopes for all pairs of data points in in time series, and taking the median of these individual slope values as an alternative to OLS regression.  Here's a link:
 
http://www.cee.vt.edu/ewr/environmental/teach/smprimer/sen/sen.html
 
Any chance you would be interested in helping me to work out an SPSS script for this? Perhaps one already exists?  If so, I haven't found it...
Bruce

From the site you sent, it looks like Sen's slope estimate is the median value of the pairwise slopes, Slope(i,j):

If you have values X1,...,Xn observed at times T1,...,Tn, then

delX(i,j)==Xj-Xi, j>i
delT(i,j)==Tj-Ti, j>i
Slope(i,j)==delX(i,j)/delT(i,
j)

That requires a Cartesian product, or many-to-many merge, in SPSS. I'm happy to say that, after quite a lot of consideration on SPSSX-L and elsewhere, that appears to be a solved problem.

That done, SPSS has plenty of facilities for calculating the median.

I'm not familiar with Sen's method of testing for slopes, and can't comment on its validity. (I didn't find anything on a quick Net search.) I don't see that SPSS has a direct facility for calculating the Mann-Kendall confidence interval for the median, but it could probably be programmed reasonably easily. I'd probably post, since others on the list (Marta Garcia-Granero, for one important example) have done a lot with non-parametric testing.

-Further onward,
 Richard

<input onclick="jsCall();" id="jsProxy" type="hidden">


--
For miscellaneous SPSS related statistical stuff, visit:
http://gjyp.nl/marta/