Hi my friends
As part of a never ending macro development for best subsets in multiple regression, I need a restricted list of a very long file (can have several thousand rows): the first 3 cases for every value another variable takes (see file below).

I have been using SPLIT FILE with LIST /CASES=FROM 1 TO 3, and it works, but I wanted a nicer looking table (easier to modify afterwards with Word), and I tried to use the same idea, but with REPORT instead of LIST... It doesn't work :(

Any ideas?

Thanks
Marta

* Small sample of cumbersome dataset *.
DATA LIST LIST/nvars x1 to x5 (6 F3) r2 rho2 res_sd cp aic apc sbc (7 F5.3).
BEGIN DATA
1 0 1 0 0 0 .599 .590 422.6 7.421 546.1 .438 549.8
1 0 0 0 0 1 .390 .376 521.5 32.74 565.1 .667 568.7
1 0 0 0 1 0 .074 .052 642.5 70.91 583.8 1.012 587.5
1 0 0 1 0 0 .042 .020 653.6 74.80 585.4 1.047 589.0
1 1 0 0 0 0 .011 -.012 664.1 78.57 586.8 1.081 590.4
2 0 1 0 1 0 .658 .642 394.9 2.282 541.0 .390 546.4
2 0 1 1 0 0 .648 .631 401.1 3.594 542.4 .403 547.8
2 0 1 0 0 1 .608 .590 422.8 8.343 547.1 .448 552.5
2 1 1 0 0 0 .603 .585 425.5 8.933 547.7 .453 553.1
2 1 0 0 0 1 .553 .531 451.9 15.07 553.1 .511 558.5
2 0 0 0 1 1 .430 .403 510.1 29.89 564.0 .651 569.4
2 0 0 1 0 1 .415 .387 516.7 31.68 565.2 .668 570.6
2 1 0 0 1 0 .078 .034 648.9 72.48 585.7 1.054 591.1
2 0 0 1 1 0 .074 .030 650.0 72.87 585.8 1.058 591.2
2 1 0 1 0 0 .053 .008 657.3 75.40 586.8 1.082 592.3
3 0 1 0 1 1 .662 .638 397.3 3.796 542.4 .403 549.7
3 0 1 1 1 0 .660 .636 398.5 4.049 542.7 .406 549.9
3 1 1 0 1 0 .659 .634 399.3 4.207 542.9 .407 550.1
3 1 1 1 0 0 .652 .627 403.3 5.037 543.8 .416 551.0
3 0 1 1 0 1 .652 .627 403.3 5.048 543.8 .416 551.0
3 1 1 0 0 1 .637 .610 412.2 6.915 545.7 .434 553.0
3 1 0 1 0 1 .576 .546 445.0 14.19 552.6 .506 559.9
3 1 0 0 1 1 .564 .533 451.3 15.64 553.9 .521 561.1
3 0 0 1 1 1 .430 .388 516.2 31.89 566.0 .681 573.2
3 1 0 1 1 0 .078 .010 656.7 74.48 587.7 1.102 594.9
4 1 1 1 0 1 .675 .642 394.8 4.296 542.7 .406 551.8
4 1 1 0 1 1 .672 .639 396.6 4.670 543.2 .410 552.2
4 0 1 1 1 1 .664 .631 401.2 5.588 544.2 .420 553.2
4 1 1 1 1 0 .662 .628 402.7 5.886 544.5 .423 553.6
4 1 0 1 1 1 .577 .535 450.1 16.09 554.6 .528 563.6
5 1 1 1 1 1 .677 .636 398.3 6.000 544.4 .422 555.2
END DATA.

* Listing 3 best models for every number of predictors *.

* This works (but it's ugly) *.
SORT CASES BY nvars(A) Cp(A) .
SPLIT FILE LAYERED BY nvars .
LIST /VARS=x1 TO sbc /FORMAT=SINGLE /CASES=FROM 1 TO 3.
SPLIT FILE OFF.

* This doesn't work *.
SORT CASES BY nvars(A) Cp(A) .
SPLIT FILE LAYERED BY nvars . /* I've also tried with SEPARATE *.
SUMMARIZE
 /TABLES=x1 TO sbc
 /FORMAT=LIST NOCASENUM NOTOTAL LIMIT=3
 /TITLE='Best subsets models'
 /MISSING=VARIABLE
 /CELLS=NONE.
SPLIT FILE OFF.
Hi Simon,
Simon (Freidin) says ;)

SF> Sort then add a cumulative counter of number of cases in each
SF> group. Filter by counter < 4, then split and summarize.

It worked goooody good! It looks like my MACRO is finished at last :)

RANK VARIABLES=Cp (A) BY nvars /RANK /PRINT=NO. /* See note *.
SORT CASES BY nvars(A) Cp(A) .
SPLIT FILE LAYERED BY nvars .
TEMPORARY.
SELECT IF (RCp LE 3).
SUMMARIZE
 /TABLES=x1 TO sbc
 /FORMAT=LIST NOCASENUM NOTOTAL
 /TITLE='Best subsets models'
 /MISSING=VARIABLE
 /CELLS=NONE.
SPLIT FILE OFF.

Note: There is no risk of tied ranks, because all Cp values have a lot of decimal places and there aren't two with the same value.

Thanks a lot and happy weekend!
Marta.
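Simon's suggestion, taken literally, is a running counter within each group rather than a rank. The following is only a sketch of that variant, assuming the variable names from the sample data above; the name counter is made up for illustration and does not appear in the thread. It relies on the common LAG idiom for numbering cases within sorted groups, with an EXECUTE added so the counter is fully computed before the temporary selection.

```spss
* Sketch only, counter is a hypothetical variable name.
SORT CASES BY nvars (A) cp (A).
* Every case starts at 1, then the count continues while nvars repeats.
COMPUTE counter = 1.
IF (nvars = LAG(nvars)) counter = LAG(counter) + 1.
EXECUTE.
SPLIT FILE LAYERED BY nvars.
TEMPORARY.
SELECT IF (counter < 4).
SUMMARIZE
  /TABLES=x1 TO sbc
  /FORMAT=LIST NOCASENUM NOTOTAL
  /TITLE='Best subsets models'
  /MISSING=VARIABLE
  /CELLS=NONE.
SPLIT FILE OFF.
```

Unlike a rank, such a counter does not depend on how ties in Cp are resolved: it keeps at most three rows per value of nvars, in whatever order the sort left them.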
In reply to this post by Marta García-Granero
Hi, Marta,
I think this works...

SORT CASES BY nvars(A) Cp(A) .
RANK VARIABLES=cp (A) BY nvars /RANK /PRINT=YES /TIES=CONDENSE .
SPLIT FILE LAYERED BY nvars .
TEMPORARY.
SELECT IF rcp LE 3.
SUMMARIZE
 /TABLES=x1 TO sbc
 /FORMAT=LIST NOCASENUM NOTOTAL
 /TITLE='Best subsets models'
 /MISSING=VARIABLE
 /CELLS=NONE.

Greetings
Frederic

%%%%%%%%%%%%%%%%%%%%%%%
Frederic Villamayor
Unitat de Bioestadística
Àrea de Desenvolupament Preclínic
CIDF Ferrer Grupo
Juan de Sada, 32
08028 - Barcelona
Espanya
E-mail: [hidden email]
Tel: +34 935093236
Fax: +34 934112764
WWW: www.ferrergrupo.com
%%%%%%%%%%%%%%%%%%%%%%%
"Sanity is not statistical" 1984 (George Orwell)
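The /TIES=CONDENSE choice in Frederic's syntax only matters when Cp values repeat within a group, which Marta says does not happen in her data. For anyone reusing the idea on data that can tie, here is a small sketch of how the tie rule changes which rows pass a "rank LE 3" filter. The toy data and the names grp, r_mean and r_cond are invented for illustration and are not from the thread.

```spss
* Hypothetical toy data with one tie in cp, for illustration only.
DATA LIST LIST /grp cp.
BEGIN DATA
1 2.0
1 3.0
1 3.0
1 5.0
END DATA.
* Default rule, tied cases share the mean of their ranks, giving 1, 2.5, 2.5, 4.
RANK VARIABLES=cp (A) BY grp /RANK INTO r_mean /PRINT=NO /TIES=MEAN.
* CONDENSE gives each distinct value the next consecutive rank, giving 1, 2, 2, 3.
RANK VARIABLES=cp (A) BY grp /RANK INTO r_cond /PRINT=NO /TIES=CONDENSE.
LIST.
```

With these four rows, filtering on r_mean LE 3 keeps three cases, while filtering on r_cond LE 3 keeps all four, so with CONDENSE the table can show more than three models per group when ties occur.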