|
Hi guys, I'm a beginner and just have an SPSS question.
I have a variable with a bunch of numbers in it ranging from 0-200. I want to perform some analysis, but I want to separate the variable's values into intervals (for example, 0-10, 10-30, 30-50, ...) and run the analysis separately for each interval. For example, I want to see how two other variables in my data set correlate with each other when a third variable is between 0-10, or how they correlate when that third variable is between 10-30, and so on. Does anyone have an idea how I can use SPSS to do this? I can think of a long way where I choose Data -> Select Cases, filter the variable for each interval, and then run my analysis each time. But I'm sure there must be a shorter way of doing this. Any ideas would be great. Thanks. |
|
Use Recode (or the Visual Binning dialog) to create the groups, then sort by the new variable and use Split File to run the analysis separately for each group. RECODE applies the first specification that matches a value, so a value of 10 or less gets 1 and is not touched by the later, wider ranges:
recode oldvar (lo thru 10=1) (lo thru 30=2) (lo thru 50=3) [etc...] into newvar.
sort cases by newvar.
split file by newvar.
[analysis commands]
-----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of jimjohn Sent: Wednesday, January 23, 2008 12:36 PM To: [hidden email] Subject: breaking a variable's data into intervals
View this message in context: http://www.nabble.com/breaking-a-variable%27s-data-into-intervals-tp15048598p15048598.html
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
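To check that the cut points landed where intended, one option is to tabulate the smallest and largest original value inside each new group. This is only a sketch: it reuses the placeholder names oldvar and newvar from the reply above and is best run while no split is active.

```spss
* Count, minimum and maximum of oldvar within each recoded group.
MEANS TABLES=oldvar BY newvar
  /CELLS=COUNT MIN MAX.
```

If a group's maximum is above its intended upper limit, or a group is missing from the table, the RECODE ranges probably need another look.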
|
In reply to this post by jimjohn
Create a new variable from the intervals using RECODE with the INTO keyword.
Split the file on this variable by sorting on the new variable and then using SPLIT FILE BY the new variable. Run your analysis, which will loop through the splits.
RECODE var (0 thru 10=1)(11 thru 30=2)... INTO newvar.
SORT CASES by newvar.
SPLIT FILE by newvar.
CORRELATIONS anothervar1 anothervar2.
Do you want 10 as the endpoint for the 1st interval or the start point for the next interval? You've got to decide one way or the other and adjust your RECODE accordingly.
Here are general instructions to do this via the menus and dialog boxes:
1. Go to Transform>Recode into Different Variables to do your recode.
2. Go to Data>Split File to sort and split the file.
3. Go to Analyze>Correlate>Bivariate to run the correlations. |
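The endpoint decision raised in this reply can be made explicit in the RECODE itself. The sketch below shows both conventions; the names thirdvar, grpA, grpB, anothervar1 and anothervar2 are placeholders, and everything above 50 is lumped into one group only to keep the example short. It relies on RECODE using the first specification that matches a value.

```spss
* Variant A, upper limits inclusive: 10 falls in the first interval.
RECODE thirdvar (LO THRU 10=1) (LO THRU 30=2) (LO THRU 50=3) (LO THRU HI=4) INTO grpA.
* Variant B, lower limits inclusive: 10 falls in the second interval.
RECODE thirdvar (50 THRU HI=4) (30 THRU HI=3) (10 THRU HI=2) (LO THRU HI=1) INTO grpB.
* Run the correlation once per group of variant A, then switch the split off.
SORT CASES BY grpA.
SPLIT FILE BY grpA.
CORRELATIONS /VARIABLES=anothervar1 anothervar2.
SPLIT FILE OFF.
```

Because every range is open on one side, non-integer values such as 10.5 cannot fall between two intervals, which can happen with specifications like (0 thru 10) followed by (11 thru 30).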
|
In reply to this post by jimjohn
-- jimjohn <[hidden email]> wrote:
> if i have a variable with a bunch of numbers in it ranging from 0-200, and I
> want to perform some analysis but I want to separate the variables into
> intervals (for example, 0-10, 10-30, 30-50,...) and I want to run this
> analysis separately for each interval.
Although I would need to know more about this variable and the aims of your analysis, it is generally a really bad idea to categorize a continuous variable.
From Frank Harrell: Problems Caused by Categorizing Continuous Variables:
1. Loss of power and loss of precision of estimated means, odds, hazards, etc.
2. Categorization assumes that the relationship between the predictor and the response is flat within intervals; this assumption is far less reasonable than a linearity assumption in most cases.
3. To make a continuous predictor be more accurately modeled when categorization is used, multiple intervals are required. The needed dummy variables will spend more degrees of freedom than will fitting a smooth relationship, hence power and precision will suffer. And because of sample size limitations in the very low and very high range of the variable, the outer intervals (e.g., outer quintiles) will be wide, resulting in significant heterogeneity of subjects within those intervals, and residual confounding.
4. Categorization assumes that there is a discontinuity in response as interval boundaries are crossed.
5. Categorization only seems to yield interpretable estimates such as odds ratios. For example, suppose one computes the odds ratio for stroke for persons with a systolic blood pressure > 160 mmHg compared to persons with a blood pressure <= 160 mmHg. The interpretation of the resulting odds ratio will depend on the exact distribution of blood pressures in the sample (the proportion of subjects > 170, > 180, etc.). On the other hand, if blood pressure is modeled as a continuous variable (e.g., using a regression spline, quadratic, or linear effect) one can estimate the ratio of odds for exact settings of the predictor, e.g., the odds ratio for 200 mmHg compared to 120 mmHg.
6. When the risk of stroke is being assessed for a new subject with a known blood pressure (say 162), the subject does not report to her physician "my blood pressure exceeds 160" but rather reports 162 mmHg. The risk for this subject will be much lower than that of a subject with a blood pressure of 200 mmHg.
7. If cutpoints are determined in a way that is not blinded to the response variable, calculation of P-values and confidence intervals requires special simulation techniques; ordinary inferential methods are completely invalid. For example, if cutpoints are chosen by trial and error in a way that utilizes the response, even informally, ordinary P-values will be too small and confidence intervals will not have the claimed coverage probabilities. The correct Monte-Carlo simulations must take into account both multiplicities and uncertainty in the choice of cutpoints. For example, if a cutpoint is chosen that minimizes the P-value and the resulting P-value is 0.05, the true type I error can easily be above 0.5 [2].
8. Likewise, categorization that is not blinded to the response variable results in biased effect estimates [3,4].
9. "Optimal" cutpoints do not replicate over studies. Hollander, Sauerbrei, and Schumacher [2] state that "... the optimal cutpoint approach has disadvantages. One of these is that in almost every study where this method is applied, another cutpoint will emerge. This makes comparisons across studies extremely difficult or even impossible. Altman et al. point out this problem for studies of the prognostic relevance of the S-phase fraction in breast cancer published in the literature. They identified 19 different cutpoints used in the literature; some of them were solely used because they emerged as the `optimal' cutpoint in a specific data set. In a meta-analysis on the relationship between cathepsin-D content and disease-free survival in node-negative breast cancer patients, 12 studies were included with 12 different cutpoints ... Interestingly, neither cathepsin-D nor the S-phase fraction are recommended to be used as prognostic markers in breast cancer in the recent update of the American Society of Clinical Oncology."
10. Cutpoints are arbitrary and manipulatable; cutpoints can be found that can result in both positive and negative associations [5].
11. If a confounder is adjusted for by categorization, there will be residual confounding that can be explained away by inclusion of the continuous form of the predictor in the model in addition to the categories.
12. A better approach that maximizes power and that only assumes a smooth relationship is to use a restricted cubic spline (regression spline; piecewise cubic polynomial) function for predictors that are not known to predict linearly. Use of flexible parametric approaches such as this allows standard inference techniques (P-values, confidence limits) to be used.
[1] Royston P, Altman DG, Sauerbrei W. Dichotomizing continuous predictors in multiple regression: a bad idea. Stat Med 2006; 25:127-141.
[2] Hollander N, Sauerbrei W, Schumacher M. Confidence intervals for the effect of a prognostic factor after selection of an `optimal' cutpoint. Stat Med 2004; 23:1701-1713.
[3] Altman DG, Lausen B, Sauerbrei W, Schumacher M. Dangers of using `optimal' cutpoints in the evaluation of prognostic factors. J Nat Cancer Inst 1994; 86:829-835.
[4] Schulgen G, Lausen B, Olsen J, Schumacher M. Outcome-oriented cutpoints in quantitative exposure. Am J Epi 1994; 140:172-184.
[5] Wainer H. Finding what is not there through the unfortunate binning of results: The Mendel effect. Chance 2006; 19:49-56.
SR Millis
Scott R Millis, PhD, MEd, ABPP (CN,CL,RP), CStat
Professor & Director of Research
Dept of Physical Medicine & Rehabilitation
Wayne State University School of Medicine
261 Mack Blvd
Detroit, MI 48201
Email: [hidden email] Tel: 313-993-8085 Fax: 313-966-7682 |
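Neither the post nor the quoted list gives syntax for a continuous alternative. As one possible illustration (a deliberately simpler technique than the restricted cubic spline recommended in item 12, and not something the poster proposed), a product term in a linear regression lets the relationship between two variables change smoothly with the third variable instead of jumping at cutpoints. The names y, x, z and xz are placeholders, and the sketch assumes the change is roughly linear in z.

```spss
* Placeholder names: y and x are the two variables of interest, z is the 0-200 variable.
* The product term lets the slope of y on x change with z.
COMPUTE xz = x * z.
REGRESSION
  /DEPENDENT y
  /METHOD=ENTER x z xz.
```

A coefficient on xz that is clearly different from zero would suggest that the x-y relationship depends on z, using all values of z rather than a handful of bins.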
|
In reply to this post by jimjohn
Thanks so much, guys, for the help. My variable is discrete, so I don't think it should be a problem then.
|
|
I need to determine which of 8 groups differ on my dependent variable. Can anyone tell me how to run a multiple comparison test with the Kruskal-Wallis using SPSS?
Thank you,
Tina |
|
Hi Tina,
Unless some new procedure has been built into SPSS within the last version or two, there is no multiple comparison procedure for NPAR TESTS like there is for ANOVA. The multiple comparison test would be a series of pairwise group comparisons using the Mann-Whitney U test, with the significance level adjusted to reflect the inflation of the Type I error rate caused by the multiple comparisons. Using a p of .05 divided by the number of comparisons (the Bonferroni correction) is probably the most conservative way to go.
HTH,
Jeff
Jeffrey D. Leitzel, Ph.D.
Assistant Professor, Department of Psychology
Office: McCormick 2123
Bloomsburg University
400 East Second Street
Bloomsburg, PA 17815
Office Phone: 570-389-4232, fax: 570-389-2019
Off Hrs (Spring 08): MWF 10:20am-12 noon
Alt. Office (Tuesday): 570 348-6100 ext: 3216
-----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of ChrisTina Leimer Sent: Thursday, January 24, 2008 3:55 PM To: [hidden email] Subject: multiple comparison test for Kruskal-Wallis? |
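With 8 groups there are 8 x 7 / 2 = 28 pairwise comparisons, so the threshold described above works out to .05 / 28, or about .0018 per test. A minimal sketch of the syntax follows, assuming a dependent variable called score and a grouping variable called group coded 1 to 8 (both names are placeholders).

```spss
* Omnibus Kruskal-Wallis test across groups coded 1 through 8.
NPAR TESTS /K-W=score BY group(1,8).
* Pairwise Mann-Whitney follow-ups, three of the 28 pairs shown.
NPAR TESTS
  /M-W=score BY group(1,2)
  /M-W=score BY group(1,3)
  /M-W=score BY group(2,3).
```

Each pairwise p-value would then be compared with .05 / 28 rather than with .05.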
|
In reply to this post by ChrisTina Leimer
ChrisTina Leimer wrote:
> I need to determine which of 8 groups differ on my dependent variable. Can
> anyone tell me how to run a multiple comparison test with the Kruskal-Wallis
> using SPSS?
Hi Christina
This MACRO is in Spanish, but it will give you the multiple Mann-Whitney tests, their original p-values, plus the p-values adjusted with a variety of methods (I recommend Holm's). The macro needs a folder called Temp in drive C:. I used to have one in English (should still be somewhere in one of my 2 hard disks 20+80 Gb...), but I didn't update it to SPSS 14.
HTH,
Marta García-Granero
DEFINE MULTIMW (!POSITIONAL !TOKENS(1)/!POSITIONAL !CHAREND('(')/
 !POSITIONAL !CHAREND(',')/!POSITIONAL !CHAREND(')') ).
SET OLANG=SPANISH.
DATASET NAME Datos.
TITLE'COMPARACIONES MÚLTIPLES BASADAS EN LA U DE MANN-WHITNEY'.
OMS /SELECT TABLES
 /IF SUBTYPE='Notes'
 /DESTINATION VIEWER=NO.
OMS /SELECT HEADING
 /IF COMMANDS='NPar Tests' LABEL='Title'
 /DESTINATION VIEWER=NO.
OMS /SELECT TABLES
 /IF COMMANDS='NPar Tests' SUBTYPES='Mann Whitney Test Statistics'
 /DESTINATION FORMAT=SAV OUTFILE='C:\Temp\MultiMWU&P.sav'.
DO IF $CASENUM=1.
- WRITE OUTFILE 'C:\Temp\multiman.sps' /"NPAR TESTS".
- NUMERIC #I #J (F2.0).
- LOOP #I=!3 to !4-1.
- LOOP #J=#I+1 to !4.
- WRITE OUTFILE 'C:\Temp\multiman.sps' /" /M-W= "!QUOTE(!1)" BY "!QUOTE(!2)" (" #I #J ")".
- END LOOP.
- END LOOP.
- WRITE OUTFILE 'C:\Temp\multiman.sps' /".".
END IF.
EXECUTE.
INCLUDE FILE='C:\Temp\multiman.sps'.
ERASE FILE='C:\Temp\multiman.sps'.
OMSEND.
OMS /SELECT TABLES
 /IF COMMANDS='Summarize' SUBTYPES='Case Processing Summary'
 /DESTINATION VIEWER=NO.
GET FILE='C:\Temp\MultiMWU&P.sav' /DROP=Command_ TO Label_.
DATASET NAME Significaciones.
SELECT IF Var1='Sig. exacta [2*(Sig. unilateral)]'.
EXECUTE.
DELETE VARIABLES Var1.
RENAME VARIABLES (ALL=pvalue).
COMPUTE id = $CASENUM.
FORMAT id (F2.0).
SORT CASES BY pvalue (A) .
COMPUTE pos = $CASENUM.
FORMAT pos (F2.0).
PRESERVE.
SET ERRORS=NONE RESULTS=NONE.
RANK pvalue /n into N /PRINT = NO.
RESTORE.
COMPUTE bonferr=MIN(pvalue*n,1).
COMPUTE sidak=1-(1-pvalue)**n.
COMPUTE holm = MIN(1,(n-pos+1)*pvalue).
IF (holm LT LAG(holm)) holm = LAG(holm).
COMPUTE downsidk = 1-(1-pvalue)**(n-pos+1).
IF (downsidk LT LAG(downsidk)) downsidk = LAG(downsidk).
COMPUTE finner = 1-(1-pvalue)**(n/pos).
IF (finner LT LAG(finner)) finner = LAG(finner).
COMPUTE cn = cn+1/pos.
LEAVE cn.
SORT CASES BY pos(D).
IF cn LT LAG(cn) cn = LAG(cn).
COMPUTE hommel = MIN(1,cn*n*pvalue/pos).
IF (hommel GT LAG(hommel)) hommel = LAG(hommel).
COMPUTE hochberg = (n-pos+1)*pvalue.
IF (hochberg GT LAG(hochberg)) hochberg = LAG(hochberg).
COMPUTE simes = n*pvalue/pos.
IF (simes GT LAG(simes)) simes = LAG(simes).
EXECUTE.
DELETE VARIABLES pos,n,cn.
FORMAT pvalue bonferr to simes (F9.4).
VARIABLE LABELS id 'Nr.'
 /pvalue 'Original p-value'
 /bonferr 'One-step Bonferroni'
 /sidak 'One-step Sidak'
 /holm 'Step-down Holm'
 /downsidk 'Step-down Dunn-Sidak'
 /finner 'Step-down Finner'
 /hommel 'Step-up Hommel'
 /hochberg 'Step-up Hochberg'
 /simes 'Step-up Simes'.
SORT CASES BY id (A).
SUMMARIZE
 /TABLES = pvalue bonferr TO simes
 /FORMAT = LIST NOCASENUM TOTAL
 /TITLE = 'Valores de p exactos originales y ajustados'
 /MISSING = VARIABLE
 /CELLS = NONE.
OMSEND.
TITLE' '.
DATASET ACTIVATE Datos.
DATASET CLOSE Significaciones.
SET OLANG=ENGLISH.
!ENDDEFINE.
> Thanks you,
> Tina |
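The post does not show how the macro is called. Reading the DEFINE line, the arguments appear to be the dependent variable, then the grouping variable followed by the first and last group codes in parentheses, separated by a comma. A sketch of a call under that reading, with score and group as placeholder names and groups coded 1 to 8:

```spss
* Run the macro definition once in the session, then issue the call.
* Placeholder names: score is the dependent variable, group is coded 1 to 8.
MULTIMW score group(1,8).
```

As posted, the macro loops over every pair of consecutive integer codes between the two limits, so it seems to assume the groups are numbered without gaps. The Holm column in its output follows the step-down rule visible in the code: the p-values are sorted in ascending order, the p-value in position pos is multiplied by (n - pos + 1) and capped at 1, and each adjusted value is raised to the previous one if it would otherwise be smaller.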
|
In reply to this post by Dr. Jeffrey D. Leitzel-2
Another possibility might be to transform your data into ranks or rankits and then run the parametric ANOVA and multiple comparison tests using the transformed data. See "Practical Nonparametric Statistics" by Conover for more discussion...
Lucinda Tear
----- Original Message ----- From: Dr. Jeffrey D. Leitzel To: [hidden email] Sent: Thursday, January 24, 2008 2:33 PM Subject: Re: multiple comparison test for Kruskal-Wallis? |
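The rank-transform suggestion could be sketched as follows. The variable names score, group, rscore and nscore are placeholders, and the choice of Tukey as the post hoc test is an assumption made only for illustration; the reply does not name a particular multiple comparison procedure.

```spss
* Placeholder names: score is the dependent variable, group the 8-level factor.
* Save ranks (ties get the mean rank) and rankit normal scores as new variables.
RANK VARIABLES=score
  /RANK INTO rscore
  /NORMAL INTO nscore
  /FRACTION=RANKIT
  /TIES=MEAN
  /PRINT=NO.
* Parametric one-way ANOVA on the ranks, with Tukey post hoc comparisons.
ONEWAY rscore BY group
  /POSTHOC=TUKEY ALPHA(0.05).
```

To use the rankits instead of the plain ranks, nscore would replace rscore in the ONEWAY command.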
