|
Hello again.
I've succeeded in aggregating my long (or, complex) file of givers (mentioned here before) to include variables for gift year, mean gift, gift count, and gift sum for the break variables ID (MRSEQ#) and gift year (GFTYR), a small but significant victory for me. Just so you can see where I'm at, here's the syntax: * Date and Time Wizard: GFTYR. COMPUTE GFTYR = XDATE.YEAR(GFTDAT). VARIABLE LABEL GFTYR "Gift year". VARIABLE LEVEL GFTYR (SCALE). FORMATS GFTYR (F8.0). VARIABLE WIDTH GFTYR(8). EXECUTE. AGGREGATE /OUTFILE=* MODE=ADDVARIABLES /BREAK=MRSEQ# GFTYR /GFT_mean = MEAN(DCGAMT) /GFT_count = NU(DCGAMT) /GFT_sum = SUM(DCGAMT). Currently, each record in the file is a gift (i.e., individual IDs can show up hundreds of times); I'd like to end up with a file with each donor ID as a record, with variables for GFT_mean, GFT_count, and GFT_sum by year (i.e., GFT_mean77, GFT_count77, GFT_sum77, GFT_mean78, GFT_count78, GFT_sum78...). My earliest gift came in in 1950, so I'll end up with quite a few variables, but my eventual analysis will look at giving by individual over time, and this format seems much more approachable to me. I imagine there's a way to get this done with my data in its current long form, but I'm having a difficult time getting my mind to think that way (or at all, some days). Thanks so much. Mark ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
At 07:14 PM 3/13/2008, Mark Palmberg wrote:
>I've succeeded in aggregating my long (or, complex) file of givers >(mentioned here before) to include variables for gift year, mean >gift, gift count, and gift sum for the break variables ID (MRSEQ#) >and gift year (GFTYR), a small but significant victory for me. Hurray! "Take your victories in whatever size they come in." >Just so you can see where I'm at, here's the syntax: > >* Date and Time Wizard: GFTYR. >COMPUTE GFTYR = XDATE.YEAR(GFTDAT). >VARIABLE LABEL GFTYR "Gift year". >VARIABLE LEVEL GFTYR (SCALE). >FORMATS GFTYR (F8.0). >VARIABLE WIDTH GFTYR(8). >EXECUTE. >AGGREGATE > /OUTFILE=* MODE=ADDVARIABLES > /BREAK=MRSEQ# GFTYR > /GFT_mean = MEAN(DCGAMT) > /GFT_count = NU(DCGAMT) > /GFT_sum = SUM(DCGAMT). Looks good. Except for the "EXECUTE.", which slows processing and buys you nothing. >Currently, each record in the file is a gift (i.e., individual IDs >can show up hundreds of times); I'd like to end up with a file with >each donor ID as a record, with variables for GFT_mean, GFT_count, >and GFT_sum by year (i.e., GFT_mean77, GFT_count77, GFT_sum77, >GFT_mean78, GFT_count78, GFT_sum78...). You're 90% there. As you've noted, you've summarized to one record per donor per year, rather than one per gift, like this: |-----------------------------|---------------------------| |Output Created |13-MAR-2008 21:18:30 | |-----------------------------|---------------------------| MRSEQ# GFTYR GFT_Mean GFT_Count GFT_SUM 101 1981 $29 17 $500 101 1982 $33 12 $400 213 1983 $500 2 $1,000 213 1984 $2,500 1 $2,500 Number of cases read: 4 Number of cases listed: 4 Now, it's a simple CASESTOVARS. Note that I computed a new two-digit year to get your desired two-digit suffix. Beware: if you get gifts in year 2000+, their variables will come in the list *before* the variables for 19xx. NUMERIC YR2DIG (F2). COMPUTE YR2DIG= GFTYR-1900. EXECUTE. CASESTOVARS /ID = MRSEQ# /INDEX = YR2DIG /GROUPBY = INDEX /SEPARATOR = '' /* Line added by hand */ /DROP = GFTYR /* Line added by hand */. Cases to Variables |-----------------------------|---------------------------| |Output Created |13-MAR-2008 21:22:46 | |-----------------------------|---------------------------| Generated Variables |---------|------|-----------| |Original |YR2DIG|Result | |Variable | |-----------| | | |Name | |---------|------|-----------| |GFT_Mean |81 |GFT_Mean81 | | |82 |GFT_Mean82 | | |83 |GFT_Mean83 | | |84 |GFT_Mean84 | |---------|------|-----------| |GFT_Count|81 |GFT_Count81| | |82 |GFT_Count82| | |83 |GFT_Count83| | |84 |GFT_Count84| |---------|------|-----------| |GFT_SUM |81 |GFT_SUM81 | | |82 |GFT_SUM82 | | |83 |GFT_SUM83 | | |84 |GFT_SUM84 | |---------|------|-----------| Processing Statistics |---------------|---| |Cases In |4 | |Cases Out |2 | |---------------|---| |Cases In/Cases |2.0| |Out | | |---------------|---| |Variables In |6 | |Variables Out |13 | |---------------|---| |Index Values |4 | |---------------|---| LIST. List |-----------------------------|---------------------------| |Output Created |13-MAR-2008 21:22:46 | |-----------------------------|---------------------------| The variables are listed in the following order: LINE 1: MRSEQ# GFT_Mean81 GFT_Count81 GFT_SUM81 GFT_Mean82 GFT_Count82 LINE 2: GFT_SUM82 GFT_Mean83 GFT_Count83 GFT_SUM83 GFT_Mean84 GFT_Count84 LINE 3: GFT_SUM84 MRSEQ#: 101 $29 17 $500 $33 12 GFT_SUM82: $400 . . . . . GFT_SUM84: . MRSEQ#: 213 . . . . . GFT_SUM82: . $500 2 $1,000 $2,500 1 GFT_SUM84: $2,500 Number of cases read: 2 Number of cases listed: 2 >My earliest gift came in in 1950, so I'll end up with quite a few >variables, but my eventual analysis will look at giving by >individual over time, and this format seems much more approachable to me. You know that I'd say that about EXECUTE, right? The other thing you know I'll say is, think carefully before doing this. You can do a lot of over-time analysis with long-form data; in fact, it's generally easier. That's even more likely in your case, as organizing the file like MRSEQ# GFT_Mean81 GFT_Count81 GFT_SUM81 GFT_Mean82 GFT_Count82 GFT_SUM82 GFT_Mean83 GFT_Count83 GFT_SUM83 GFT_Mean84 GFT_Count84 GFT_SUM84 means you can't even define VECTORs for your means, counts and totals. Now go forward in peace, to find your own path. -Warm regards, Richard Ristow ================================ APPENDIX: Test data and all code ================================ DATA LIST LIST /MRSEQ# GFTYR GFT_Mean GFT_Count GFT_SUM (N4 F4 DOLLAR8 F4 DOLLAR10). BEGIN DATA 101 1981 29.4 17 500 101 1982 33.3 12 400 213 1983 500 2 1000 213 1984 2500 1 2500 END DATA. LIST. NUMERIC YR2DIG (F2). COMPUTE YR2DIG= GFTYR-1900. CASESTOVARS /ID = MRSEQ# /INDEX = YR2DIG /GROUPBY = INDEX /SEPARATOR = '' /* Line added by hand */ /DROP = GFTYR /* Line added by hand */. LIST. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
In reply to this post by Mark Palmberg
Thank you for this addendum, Richard (and thank you for your reply, ViAnn).
It helps clarify things for me re: AGGREGATE that I wasn't picking up from the CSR. Also, in my meager defense, the EXECUTE. included in my original quoted syntax came from the wizard. Mark On Thu, Mar 13, 2008 at 8:47 PM, Richard Ristow <[hidden email]> wrote: > Here's a point I missed, because I was so sure it *wasn't* that way: > > > >Just so you can see where I'm at, here's the syntax: > > > >* Date and Time Wizard: GFTYR. > >COMPUTE GFTYR = XDATE.YEAR(GFTDAT). > >VARIABLE LABEL GFTYR "Gift year". > >VARIABLE LEVEL GFTYR (SCALE). > >FORMATS GFTYR (F8.0). > >VARIABLE WIDTH GFTYR(8). > >EXECUTE. > >AGGREGATE > > /OUTFILE=* MODE=ADDVARIABLES > > /BREAK=MRSEQ# GFTYR > > /GFT_mean = MEAN(DCGAMT) > > /GFT_count = NU(DCGAMT) > > /GFT_sum = SUM(DCGAMT). > > There's what amounts to a mistake in the AGGREGATE: using > MODE=ADDVARIABLES. If you use it, you'll still have a record for > every gift, with the year-summary variables appended. If you don't > use MODE=ADDVARIABLES, you'll get the summary record by year that I > assumed, and worked with in my example syntax. > > ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
In reply to this post by Richard Ristow
On Thu, Mar 13, 2008 at 8:35 PM, Richard Ristow <[hidden email]>
wrote: > You can do a lot > of over-time analysis with long-form data; in fact, it's generally > easier. That's even more likely in your case, as organizing the file like > > MRSEQ# GFT_Mean81 GFT_Count81 GFT_SUM81 GFT_Mean82 GFT_Count82 GFT_SUM82 > GFT_Mean83 GFT_Count83 GFT_SUM83 GFT_Mean84 GFT_Count84 GFT_SUM84 > > means you can't even define VECTORs for your means, counts and totals. > Unless I reorder my variables to be consecutive within Mean, Count, and SUM, yes? The long-form alternative, then, if I wanted to examine the annual differences in giving by individual (including, but not limited to, things like percentage dollar increase [or decrease] year-to-year and lifetime and/or mean #days between gifts [by year and lifetime]) would be something like the syntax you offered in a previous post (11/24/07): NUMERIC #NewGuy (F2) /* Flag first record for an ID */. DO IF MISSING(LAG(ID)) OR LAG(ID) NE ID. . COMPUTE #NewGuy = 1. ELSE. . COMPUTE #NewGuy = 0. END IF. * Difference in days between successive gifts . * (Requires data be sorted by ID and gift date.) . NUMERIC BtwnGift (F4). VAR LABEL BtwnGift 'Time, days, since previous gift'. IF NOT #NewGuy BtwnGift = datediff(GFTDAT,LAG(GFTDAT),"days"). Which gives me a new variable (BtwnGift) that holds -- for each gift record AFTER the first one for each ID -- the number of days since the previous gift (the same syntax could be used for my GFTAMT variable). This still leaves me with the a single variable in long form (i.e., by gift, not ID) that I would otherwise analyze in wide form (i.e., by ID) with VECTOR and LOOP-END LOOP. So, then, my enduring question is: What is the key syntax I need to review to be able to -- from a long-form data file -- end up with a way to measure a donor's giving performance (measured by amount and frequency)? I'm looking at the "VECTOR outside a Loop Structure" example in the CSR (pg. 1891 -- v. 15), but it appears to only get me back to a wide format file anyway. I feel like I'm staring a solution in the face...or at least looking in its general direction. Thank you for your patient and gentle guidance. Mark ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
In reply to this post by Mark Palmberg
At 12:00 AM 3/14/2008, Mark Palmberg wrote:
>Thank you for this addendum, Richard (and thank you for your reply, >ViAnn). It helps clarify things for me re: AGGREGATE that I wasn't >picking up from the CSR. Good! And I've good hopes this will get you to what you want. >Also, in my meager defense, the EXECUTE. included in my original >quoted syntax came from the wizard. (Smile) It happens. I noticed that the output I posted (though not the syntax) has an EXECUTE, too. It, also, is unneeded if you're running the syntax; but I clicked "Run Pending Transforms" before clicking up the CASESTOVARS, so I could see what the data would look like to that point. (That, by the way, is a perfectly good reason for using EXECUTE, which is what "Run Pending Transforms" means, when you're working interactively. However, the resulting EXECUTEs are generally best removed if you save and run the syntax.) ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
In reply to this post by Mark Palmberg
At 11:54 AM 3/14/2008, Mark Palmberg wrote:
>On Thu, Mar 13, 2008 at 8:35 PM, Richard Ristow wrote: > >>organizing the file like >>MRSEQ# GFT_Mean81 GFT_Count81 GFT_SUM81 GFT_Mean82 GFT_Count82 GFT_SUM82 >> GFT_Mean83 GFT_Count83 GFT_SUM83 GFT_Mean84 GFT_Count84 GFT_SUM84 >>means you can't even define VECTORs for your means, counts and totals. > >Unless I reorder my variables to be consecutive within Mean, Count, >and SUM, yes? Exactly. But don't reorder them. On the CASESTOVARS, use "/GROUPBY=VARIABLE" instead of "/GROUPBY=INDEX", in the first place. (It's selectable in the wizard.) >The long-form alternative, then, if I wanted to examine the annual >differences in giving by individual (including, but not limited to, >things like percentage dollar increase [or decrease] year-to-year >and lifetime and/or mean #days between gifts [by year and lifetime]) >would be something like the syntax you offered in a previous post (11/24/07): Syntax not re-quoted; but, yes. >My enduring question is: What is the key syntax I need to review to >be able to -- from a long-form data file -- end up with a way to >measure a donor's giving performance (measured by amount and >frequency)? I'm looking at the "VECTOR outside a Loop Structure" >example in the CSR (pg. 1891 -- v. 15), but it appears to only get >me back to a wide format file anyway. Yes. A VECTOR exists only within each case, and LOOP runs during the processing of one case. VECTOR/LOOP logic will only work with a wide-format file. >to be able to -- from a long-form data file -- measure a donor's >giving performance (measured by amount and frequency)? Mostly, you'll compute some intermediate variables (often using LAG), and then use AGGREGATE. But I can't be more specific (except for guesses) without knowing what performance measures you want. >I feel like I'm staring a solution in the face...or at least looking in its >general direction. You certainly are. The AGGREGATE code from your first posting in this thread (dropping MODE=ADDVARIABLES) already gives some useful performance measures by year. From the original file with one record per gift, it's quite easy to get, . Dates of earliest and most recent gifts . Number of gifts ever received, and total amount. (From these, you can get mean size of gift. From SPSS 14 on, AGGREGATE can also give you the median size of gift -- probably important, as donors usually have a 'regular' and a 'special' gift size.) From the file you've created, with summary statistics by year, you can calculate, . Time trend in giving, by any of those measures (though I suppose total per year is the most important). You can do that by, say, regression; or by looking at the ratio of giving in the latest and earliest year, converted into percent growth per year; or, probably, most other measures you can think of.) Now, can I leave these for you, as exercises in using AGGREGATE? Post again, of course, if you have trouble; or if you want measures other than these, and don't know how to get them. -With best regards, Richard ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
In reply to this post by Mark Palmberg
The simple question of amount and frequency is easily done with reporting
procedures like SUMMARIZE. Just break on giver ID and ask for SUM and N of the Gift Amount. If you want a breakdown by year then break on ID and YEAR. -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Mark Palmberg Sent: Friday, March 14, 2008 9:55 AM To: [hidden email] Subject: Re: CASESTOVARS / LOOP-END LOOP(?) On Thu, Mar 13, 2008 at 8:35 PM, Richard Ristow <[hidden email]> wrote: > You can do a lot > of over-time analysis with long-form data; in fact, it's generally > easier. That's even more likely in your case, as organizing the file like > > MRSEQ# GFT_Mean81 GFT_Count81 GFT_SUM81 GFT_Mean82 GFT_Count82 GFT_SUM82 > GFT_Mean83 GFT_Count83 GFT_SUM83 GFT_Mean84 GFT_Count84 GFT_SUM84 > > means you can't even define VECTORs for your means, counts and totals. > Unless I reorder my variables to be consecutive within Mean, Count, and SUM, yes? The long-form alternative, then, if I wanted to examine the annual differences in giving by individual (including, but not limited to, things like percentage dollar increase [or decrease] year-to-year and lifetime and/or mean #days between gifts [by year and lifetime]) would be something like the syntax you offered in a previous post (11/24/07): NUMERIC #NewGuy (F2) /* Flag first record for an ID */. DO IF MISSING(LAG(ID)) OR LAG(ID) NE ID. . COMPUTE #NewGuy = 1. ELSE. . COMPUTE #NewGuy = 0. END IF. * Difference in days between successive gifts . * (Requires data be sorted by ID and gift date.) . NUMERIC BtwnGift (F4). VAR LABEL BtwnGift 'Time, days, since previous gift'. IF NOT #NewGuy BtwnGift = datediff(GFTDAT,LAG(GFTDAT),"days"). Which gives me a new variable (BtwnGift) that holds -- for each gift record AFTER the first one for each ID -- the number of days since the previous gift (the same syntax could be used for my GFTAMT variable). This still leaves me with the a single variable in long form (i.e., by gift, not ID) that I would otherwise analyze in wide form (i.e., by ID) with VECTOR and LOOP-END LOOP. So, then, my enduring question is: What is the key syntax I need to review to be able to -- from a long-form data file -- end up with a way to measure a donor's giving performance (measured by amount and frequency)? I'm looking at the "VECTOR outside a Loop Structure" example in the CSR (pg. 1891 -- v. 15), but it appears to only get me back to a wide format file anyway. I feel like I'm staring a solution in the face...or at least looking in its general direction. Thank you for your patient and gentle guidance. Mark ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
Ah ha! Yes. This is tremendously helpful, ViAnn. Thank you. Truly, I
find I frequently lose the forest for the trees in my explorations of SPSS' capabilities. I'm looking forward to participating in some training courses in Chicago this summer. Mark On Fri, Mar 14, 2008 at 3:21 PM, ViAnn Beadle <[hidden email]> wrote: > The simple question of amount and frequency is easily done with reporting > procedures like SUMMARIZE. Just break on giver ID and ask for SUM and N of > the Gift Amount. If you want a breakdown by year then break on ID and > YEAR. > > -----Original Message----- > From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of > Mark Palmberg > Sent: Friday, March 14, 2008 9:55 AM > To: [hidden email] > Subject: Re: CASESTOVARS / LOOP-END LOOP(?) > > On Thu, Mar 13, 2008 at 8:35 PM, Richard Ristow <[hidden email]> > wrote: > > > You can do a lot > > of over-time analysis with long-form data; in fact, it's generally > > easier. That's even more likely in your case, as organizing the file > like > > > > MRSEQ# GFT_Mean81 GFT_Count81 GFT_SUM81 GFT_Mean82 GFT_Count82 GFT_SUM82 > > GFT_Mean83 GFT_Count83 GFT_SUM83 GFT_Mean84 GFT_Count84 GFT_SUM84 > > > > means you can't even define VECTORs for your means, counts and totals. > > > > Unless I reorder my variables to be consecutive within Mean, Count, and > SUM, > yes? > > The long-form alternative, then, if I wanted to examine the annual > differences in giving by individual (including, but not limited to, things > like percentage dollar increase [or decrease] year-to-year and lifetime > and/or mean #days between gifts [by year and lifetime]) would be something > like the syntax you offered in a previous post (11/24/07): > > NUMERIC #NewGuy (F2) /* Flag first record for an ID */. > > DO IF MISSING(LAG(ID)) > OR LAG(ID) NE ID. > . COMPUTE #NewGuy = 1. > ELSE. > . COMPUTE #NewGuy = 0. > END IF. > > * Difference in days between successive gifts . > * (Requires data be sorted by ID and gift date.) . > > NUMERIC BtwnGift (F4). > VAR LABEL BtwnGift 'Time, days, since previous gift'. > > IF NOT #NewGuy BtwnGift = datediff(GFTDAT,LAG(GFTDAT),"days"). > > Which gives me a new variable (BtwnGift) that holds -- for each gift > record > AFTER the first one for each ID -- the number of days since the previous > gift (the same syntax could be used for my GFTAMT variable). This still > leaves me with the a single variable in long form (i.e., by gift, not ID) > that I would otherwise analyze in wide form (i.e., by ID) with VECTOR and > LOOP-END LOOP. > > So, then, my enduring question is: What is the key syntax I need to > review > to be able to -- from a long-form data file -- end up with a way to > measure > a donor's giving performance (measured by amount and frequency)? I'm > looking at the "VECTOR outside a Loop Structure" example in the CSR (pg. > 1891 -- v. 15), but it appears to only get me back to a wide format file > anyway. > > I feel like I'm staring a solution in the face...or at least looking in > its > general direction. Thank you for your patient and gentle guidance. > > Mark > > ===================== > To manage your subscription to SPSSX-L, send a message to > [hidden email] (not to SPSSX-L), with no body text except the > command. To leave the list, send the command > SIGNOFF SPSSX-L > For a list of commands to manage subscriptions, send the command > INFO REFCARD > > ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
| Free forum by Nabble | Edit this page |
