SPSSX Discussion

breaking a variable's data into intervals

jimjohn

breaking a variable's data into intervals

Hi guys, I'm a beginner and just have an SPSS question.

I have a variable with values ranging from 0 to 200. I want to perform some analysis, but I want to separate the values into intervals (for example, 0-10, 10-30, 30-50, ...) and run the analysis separately for each interval.

For example, I want to see how two other variables in my data set correlate with each other when a third variable is between 0 and 10, how they correlate when that third variable is between 10 and 30, and so on.

Does anyone have an idea how I can use SPSS to do this? I can think of a long way where I choose Data > Select Cases, filter the variable for each interval, and then run my analysis each time. But I'm sure there must be a shorter way of doing this. Any ideas would be great. Thanks.

Oliver, Richard

Re: breaking a variable's data into intervals

Use Recode (or the Visual Binner dialog) to create the groups, and then use Split File to run the analysis separately for each group:

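* A value is recoded only once, so each later range starts just above the previous cutoff.
recode oldvar
(lo thru 10=1) (lo thru 30=2) (lo thru 50=3) [etc...]
into newvar.
* Split File expects the file to be sorted by the grouping variable.
sort cases by newvar.
split file by newvar.
[analysis commands]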

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of jimjohn
Sent: Wednesday, January 23, 2008 12:36 PM
To: [hidden email]
Subject: breaking a variable's data into intervals

--
View this message in context: http://www.nabble.com/breaking-a-variable%27s-data-into-intervals-tp15048598p15048598.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

ViAnn Beadle

Re: breaking a variable's data into intervals

In reply to this post by jimjohn

Create a new variable from the intervals using RECODE with the INTO keyword.
Then split the file on this variable by first sorting on it and then using
SPLIT FILE BY the new variable. Run your analysis, which will loop through
the splits.

RECODE var (0 thru 10=1)(11 thru 30=2)... INTO newvar.
SORT CASES by newvar.
SPLIT FILE by newvar.
CORRELATIONS anothervar1 anothervar2.
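
For readers who want to see what "looping through the splits" amounts to, here is a minimal sketch in Python rather than SPSS syntax. The rows are made-up data and the `pearson` helper is an illustrative assumption, not anything SPSS exposes; the point is only that the correlation is computed once per value of the grouping variable.

```python
from collections import defaultdict
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Made-up rows: (group from the recode, anothervar1, anothervar2)
rows = [(1, 1, 2), (1, 2, 4), (1, 3, 6), (2, 1, 3), (2, 2, 2), (2, 3, 1)]

# Collect the paired values for each group, as SPLIT FILE does with sorted cases.
by_group = defaultdict(list)
for group, v1, v2 in rows:
    by_group[group].append((v1, v2))

# One correlation per group.
correlations = {}
for group in sorted(by_group):
    v1s, v2s = zip(*by_group[group])
    correlations[group] = pearson(v1s, v2s)
print(correlations)
```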

Do you want 10 as the endpoint for the 1st interval or the start point for
the next interval? You've got to decide one way or the other and adjust your
RECODE accordingly.
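
To make the endpoint question concrete, here is a small Python sketch (not SPSS syntax) of one possible convention: each interval includes its upper endpoint, so 10 falls in the first group and anything just above 10 falls in the second. The cutpoints and the name `bin_index` are assumptions chosen for illustration.

```python
from bisect import bisect_left

# Upper endpoints of the intervals (assumed): 0-10, above 10 to 30, above 30 to 50, ...
CUTPOINTS = [10, 30, 50, 200]

def bin_index(value, cutpoints=CUTPOINTS):
    """Return the 1-based group whose upper endpoint is the first one >= value."""
    if value > cutpoints[-1]:
        raise ValueError(f"{value} is above the last cutpoint")
    return bisect_left(cutpoints, value) + 1

# Boundary values land in exactly one group under this convention.
groups = {v: bin_index(v) for v in (0, 10, 11, 30, 31, 200)}
print(groups)
```

If you would rather have 10 start the second interval, the equivalent change here is to use `bisect_right` instead of `bisect_left`; in the RECODE it means moving the cutoffs accordingly.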

Here are general instructions to do this via the menus and dialog box:

1. Go to Transform>Recode into Different Variables to do your recode.
2. Go to Data>Split File to sort and split the file.
3. Go to Analyze>Correlate>Bivariate to run the correlations.

SR Millis-3

Re: breaking a variable's data into intervals

In reply to this post by jimjohn

-- jimjohn <[hidden email]> wrote:

> if i have a variable with a bunch of numbers in it
> ranging from 0-200, and I
> want to perform some analysis but I want to separate
> the variables into
> intervals (for example, 0-10, 10-30, 30-50,...) and
> I want to run this
> analysis separately for each interval.

Although I would need to know more about this variable
and the aims of your analysis, it is generally a
really bad idea to categorize a continuous variable.
From Frank Harrell:

Problems Caused by Categorizing Continuous Variables :

1. Loss of power and loss of precision of estimated
means, odds, hazards, etc.

2. Categorization assumes that the relationship
between the predictor and the response is flat within
intervals; this assumption is far less reasonable than
a linearity assumption in most cases.

3. To make a continuous predictor be more accurately
modeled when categorization is used, multiple
intervals are required. The needed dummy variables
will spend more degrees of freedom than will fitting a
smooth relationship, hence power and precision will
suffer. And because of sample size limitations in the
very low and very high range of the variable, the
outer intervals (e.g., outer quintiles) will be wide,
resulting in significant heterogeneity of subjects
within those intervals, and residual confounding.

4. Categorization assumes that there is a
discontinuity in response as interval boundaries are
crossed.

5. Categorization only seems to yield interpretable
estimates such as odds ratios. For example, suppose
one computes the odds ratio for stroke for persons
with a systolic blood pressure > 160 mmHg compared to
persons with a blood pressure <= 160 mmHg. The
interpretation of the resulting odds ratio will depend
on the exact distribution of blood pressures in the
sample (the proportion of subjects > 170, > 180,
etc.). On the other hand, if blood pressure is modeled
as a continuous variable (e.g., using a regression
spline, quadratic, or linear effect) one can estimate
the ratio of odds for exact settings of the predictor,
e.g., the odds ratio for 200 mmHg compared to 120
mmHg.

6. When the risk of stroke is being assessed for a new
subject with a known blood pressure (say 162), the
subject does not report to her physician "my blood
pressure exceeds 160" but rather reports 162 mmHg. The
risk for this subject will be much lower than that of
a subject with a blood pressure of 200 mmHg.

7. If cutpoints are determined in a way that is not
blinded to the response variable, calculation of
P-values and confidence intervals requires special
simulation techniques; ordinary inferential methods
are completely invalid. For example, if cutpoints are
chosen by trial and error in a way that utilizes the
response, even informally, ordinary P-values will be
too small and confidence intervals will not have the
claimed coverage probabilities. The correct
Monte-Carlo simulations must take into account both
multiplicities and uncertainty in the choice of
cutpoints. For example, if a cutpoint is chosen that
minimizes the P-value and the resulting P-value is
0.05, the true type I error can easily be above 0.5
[2].

8. Likewise, categorization that is not blinded to the
response variable results in biased effect estimates
[3,4].

9. "Optimal" cutpoints do not replicate over studies.
Hollander, Sauerbrei, and Schumacher (2) state that
"... the optimal cutpoint approach has disadvantages.
One of these is that in almost every study where this
method is applied, another cutpoint will emerge. This
makes comparisons across studies extremely difficult
or even impossible. Altman et al. point out this
problem for studies of the prognostic relevance of the
S-phase fraction in breast cancer published in the
literature. They identified 19 different cutpoints
used in the literature; some of them were solely used
because they emerged as the `optimal' cutpoint in a
specific data set. In a meta-analysis on the
relationship between cathepsin-D content and
disease-free survival in node-negative breast cancer
patients, 12 studies were included with 12
different cutpoints ... Interestingly, neither
cathepsin-D nor the S-phase fraction are recommended
to be used as prognostic markers in breast cancer in
the recent update of the American Society of Clinical
Oncology."

10. Cutpoints are arbitrary and manipulatable;
cutpoints can be found that can result in both
positive and negative associations [5].

11. If a confounder is adjusted for by categorization,
there will be residual confounding that can be
explained away by inclusion of the continuous form of
the predictor in the model in addition to the
categories.

12. A better approach that maximizes power and that
only assumes a smooth relationship is to use a
restricted cubic spline (regression spline; piecewise
cubic polynomial) function for predictors that are not
known to predict linearly. Use of flexible parametric
approaches such as this allows standard inference
techniques (P-values, confidence limits) to be used.

[1] Royston P, Altman DG, Sauerbrei W. Dichotomizing
continuous predictors in multiple regression: a bad
idea. Stat Med 2006; 25:127-141.
[2] Hollander N, Sauerbrei W, Schumacher M. Confidence
intervals for the effect of a prognostic factor after
selection of an `optimal' cutpoint. Stat Med 2004;
23:1701-1713.
[3] Altman DG, Lausen B, Sauerbrei W, Schumacher M.
Dangers of using `optimal' cutpoints in the evaluation
of prognostic factors. J Nat Cancer Inst 1994;
86:829-835.
[4] Schulgen G, Lausen B, Olsen J, Schumacher M.
Outcome-oriented cutpoints in quantitative exposure.
Am J Epi 1994; 120:172-184.
[5] Wainer H. Finding what is not there through the
unfortunate binning of results: The Mendel effect.
Chance 2006; 19:49-56.

SR Millis

Scott R Millis, PhD, MEd, ABPP (CN,CL,RP), CStat
Professor & Director of Research
Dept of Physical Medicine & Rehabilitation
Wayne State University School of Medicine
261 Mack Blvd
Detroit, MI 48201
Email: [hidden email]
Tel: 313-993-8085
Fax: 313-966-7682

jimjohn

Re: breaking a variable's data into intervals

In reply to this post by jimjohn

Thanks so much, guys, for the help. My variable is discrete, so I don't think it should be a problem then.

ChrisTina Leimer

multiple comparison test for Kruskal-Wallis?

I need to determine which of 8 groups differ on my dependent variable. Can
anyone tell me how to run a multiple comparison test with the Kruskal-Wallis
using SPSS?

Thank you,
Tina

Dr. Jeffrey D. Leitzel-2

Re: multiple comparison test for Kruskal-Wallis?

Hi Tina,
Unless some new procedure has been built into SPSS within the last version
or two, there is no multiple comparison procedure for NPAR TESTS like
there is for ANOVA.
The multiple comparison test would be a series of pairwise group comparisons
using the Mann-Whitney U, with the significance level adjusted to reflect the
inflation of the type I error rate caused by multiple comparisons.
Using a p of .05/# of comparisons (the Bonferroni correction) is probably the
most conservative way to go.
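
As a quick check of the arithmetic for the 8 groups in the question, here is a minimal Python sketch (not SPSS syntax) that counts the pairwise comparisons and divides .05 by that count. The variable names are illustrative assumptions.

```python
from math import comb

n_groups = 8                        # number of groups in the question
n_comparisons = comb(n_groups, 2)   # every pair of groups gets one Mann-Whitney test
alpha = 0.05
per_test_alpha = alpha / n_comparisons  # Bonferroni-adjusted threshold per test

print(n_comparisons, per_test_alpha)
```

With 8 groups there are 28 pairs, so each individual test would need a p below roughly .0018, which shows how strict this adjustment becomes as the number of groups grows.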
HTH, Jeff
Jeffrey D. Leitzel, Ph.D.
Assistant Professor, Department of Psychology
Office: McCormick 2123
Bloomsburg University
400 East Second Street
Bloomsburg, PA 17815
Office Phone:570-389-4232,fax:570-389-2019
Off Hrs (Spring 08): MWF 10:20am-12 noon
Alt. Office (Tuesday): 570 348-6100 ext:3216

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
ChrisTina Leimer
Sent: Thursday, January 24, 2008 3:55 PM
To: [hidden email]
Subject: multiple comparison test for Kruskal-Wallis?

Marta Garcia-Granero

Re: multiple comparison test for Kruskal-Wallis?

In reply to this post by ChrisTina Leimer

ChrisTina Leimer wrote:
> I need to determine which of 8 groups differ on my dependent variable. Can
> anyone tell me how to run a multiple comparison test with the Kruskal-Wallis
> using SPSS?
>

Hi Christina

This MACRO is in Spanish, but it will give you the multiple Mann-Whitney
tests, their original p-values, plus the p-values adjusted with a
variety of methods (I recommend Holm's). The macro needs a folder called
Temp in drive C:. I used to have one in English (should still be
somewhere in one of my 2 hard disks 20+80 Gb...), but I didn't update it
to SPSS 14.

HTH,
Marta García-Granero

DEFINE MULTIMW (!POSITIONAL !TOKENS(1)/!POSITIONAL !CHAREND('(')/
!POSITIONAL !CHAREND(',')/!POSITIONAL !CHAREND(')') ).
SET OLANG=SPANISH.
DATASET NAME Datos.
TITLE'COMPARACIONES MÚLTIPLES BASADAS EN LA U DE MANN-WHITNEY'.
OMS /SELECT TABLES
/IF SUBTYPE='Notes'
/DESTINATION VIEWER=NO.
OMS /SELECT HEADING
/IF COMMANDS='NPar Tests' LABEL='Title'
/DESTINATION VIEWER=NO.
OMS /SELECT TABLES
/IF COMMANDS='NPar Tests' SUBTYPES='Mann Whitney Test Statistics'
/DESTINATION FORMAT=SAV
OUTFILE='C:\Temp\MultiMWU&P.sav'.
DO IF $CASENUM=1.
- WRITE OUTFILE 'C:\Temp\multiman.sps' /"NPAR TESTS".
- NUMERIC #I #J (F2.0).
- LOOP #I=!3 to !4-1.
- LOOP #J=#I+1 to !4.
- WRITE OUTFILE 'C:\Temp\multiman.sps'
/" /M-W= "!QUOTE(!1)" BY "!QUOTE(!2)" (" #I #J ")".
- END LOOP.
- END LOOP.
- WRITE OUTFILE 'C:\Temp\multiman.sps' /".".
END IF.
EXECUTE.
INCLUDE FILE='C:\Temp\multiman.sps'.
ERASE FILE='C:\Temp\multiman.sps'.
OMSEND.
OMS /SELECT TABLES
/IF COMMANDS='Summarize' SUBTYPES='Case Processing Summary'
/DESTINATION VIEWER=NO.
GET FILE='C:\Temp\MultiMWU&P.sav' /DROP=Command_ TO Label_.
DATASET NAME Significaciones.
SELECT IF Var1='Sig. exacta [2*(Sig. unilateral)]'.
EXECUTE.
DELETE VARIABLES Var1.
RENAME VARIABLES (ALL=pvalue).
COMPUTE id = $CASENUM.
FORMAT id (F2.0).
SORT CASES BY pvalue (A) .
COMPUTE pos = $CASENUM.
FORMAT pos (F2.0).
PRESERVE.
SET ERRORS=NONE RESULTS=NONE.
RANK pvalue /n into N /PRINT = NO.
RESTORE.
COMPUTE bonferr=MIN(pvalue*n,1).
COMPUTE sidak=1-(1-pvalue)**n.
COMPUTE holm = MIN(1,(n-pos+1)*pvalue).
IF (holm LT LAG(holm)) holm = LAG(holm).
COMPUTE downsidk = 1-(1-pvalue)**(n-pos+1).
IF (downsidk LT LAG(downsidk)) downsidk = LAG(downsidk).
COMPUTE finner = 1-(1-pvalue)**(n/pos).
IF (finner LT LAG(finner)) finner = LAG(finner).
COMPUTE cn = cn+1/pos.
LEAVE cn.
SORT CASES BY pos(D).
IF cn LT LAG(cn) cn = LAG(cn).
COMPUTE hommel = MIN(1,cn*n*pvalue/pos).
IF (hommel GT LAG(hommel)) hommel = LAG(hommel).
COMPUTE hochberg = (n-pos+1)*pvalue.
IF (hochberg GT LAG(hochberg)) hochberg = LAG(hochberg).
COMPUTE simes = n*pvalue/pos.
IF (simes GT LAG(simes)) simes = LAG(simes).
EXECUTE.
DELETE VARIABLES pos,n,cn.
FORMAT pvalue bonferr to simes (F9.4).
VARIABLE LABELS id 'Nr.' /pvalue 'Original p-value'
/bonferr 'One-step Bonferroni' /sidak 'One-step Sidak'
/holm 'Step-down Holm' /downsidk 'Step-down Dunn-Sidak'
/finner 'Step-down Finner' /hommel 'Step-up Hommel'
/hochberg 'Step-up Hochberg' /simes 'Step-up Simes'.
SORT CASES BY id (A).
SUMMARIZE
/TABLES = pvalue bonferr TO simes
/FORMAT = LIST NOCASENUM TOTAL
/TITLE = 'Valores de p exactos originales y ajustados'
/MISSING = VARIABLE
/CELLS = NONE.
OMSEND.
TITLE' '.
DATASET ACTIVATE Datos.
DATASET CLOSE Significaciones.
SET OLANG=ENGLISH.
!ENDDEFINE.
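
For anyone who wants to check the recommended Holm adjustment by hand, the following is a minimal Python sketch of the same logic as the macro's `holm` lines: sort the p-values ascending, multiply the p-value at position pos by (n-pos+1), cap at 1, and never let an adjusted value drop below the previous one. It is not part of the macro; the function name `holm_adjust` and the p-values are made up for illustration.

```python
def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in the original order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * n
    running_max = 0.0
    for pos, i in enumerate(order, start=1):
        candidate = min(1.0, (n - pos + 1) * pvalues[i])
        running_max = max(running_max, candidate)  # keep adjusted values non-decreasing
        adjusted[i] = running_max
    return adjusted

raw = [0.010, 0.040, 0.030, 0.005]  # made-up p-values for illustration
print(holm_adjust(raw))
```

In this example the largest raw p-value (0.040) would adjust to 0.040 on its own, but it inherits 0.060 from the step before it, which is the monotonicity correction the macro applies with LAG.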

LUCINDA M TEAR

Re: multiple comparison test for Kruskal-Wallis?

In reply to this post by Dr. Jeffrey D. Leitzel-2

Another possibility might be to transform your data into ranks or rankits and then run the parametric ANOVA and multiple comparison tests using the transformed data. See "Practical Nonparametric Statistics" by Conover for more discussion...
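
As a rough illustration of the rank transformation step only, here is a minimal Python sketch (not SPSS syntax) that replaces each value with its rank in the pooled data, giving tied values the mean of the ranks they span. The function name `average_ranks` and the scores are assumptions for illustration; the ANOVA and multiple comparisons would then be run on the ranks.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

scores = [12, 7, 7, 30, 15]  # made-up data pooled across all groups
print(average_ranks(scores))
```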

Lucinda Tear
----- Original Message -----
From: Dr. Jeffrey D. Leitzel<mailto:[hidden email]>
To: [hidden email]<mailto:[hidden email]>
Sent: Thursday, January 24, 2008 2:33 PM
Subject: Re: multiple comparison test for Kruskal-Wallis?
