This post was updated on .
Hi guys,
In my dataset, I have a Price variable and there was a lot of missing value. I want to perform a "carry-over" interpolation, which means that I fill the gap with previous value. But the tricky part is to identify and only fill when is the first available quote after 9:30 and when is the last available quote before 16:00. Then, perform carry-over interpolation, or gap filling, only in between this interval. For example, in the attached sample excel file, 8_&_25.xlsx there was 2 specific dates: 8/3/2000 & 25/11/2011. The yellow-highlighted gap is the one I wanna do the filling. With 8/3/2000: the first quote is 9:33 and last quote is 14:43. With 25/11/2011: the first quote is 9:30 and last quote is 15:59. The filling should be done between these interval. I gave a try by creating a syntax with Recode and If functions to do the carry-over interpolation. It was only successful for 8/3/2000 but not for 25/11/2011 because of the different time interval. Also, I used the option in SPSS like Transform-Replace Missing Value & Analyse-Multiple Imputation-Impute Missing Value do not produce the result what I want as well. That's what I get stuck and struggled to proceed to the next step for calculation. Can someone spare me some time and help me out turn it around. I am very very deeply appreciated your time, concern and understanding! Cheers, Tom |
Administrator
|
Please see LAG function!
Best to post your defective syntax next time around! -------------
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
In reply to this post by Tom2601
The excel file isn't distributed on the listserv (it might be on nabble). Anyway post to the listserv a specific example along with the desired result.
Gene Maguin -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Tom2601 Sent: Thursday, July 03, 2014 12:30 AM To: [hidden email] Subject: Missing Data Hi guys, In my dataset, I have a Price variable and there was a lot of missing value. I want to perform a "carry-over" interpolation, which means that I fill the gap with previous value. But the tricky part is to identify and only fill when is the first available quote after 9:30 and when is the last available quote before 16:00. Then, perform carry-over interpolation, or gap filling, only in between this interval. For example, in the attached sample excel file, there was 2 specific dates: 8/3/2000 & 25/11/2011. The yellow-highlighted gap is the one I wanna do the filling. With 8/3/2000: the first quote is 9:33 and last quote is 14:43. With 25/11/2011: the first quote is 9:30 and last quote is 15:59. The filling should be done between these interval. I gave a try by creating a syntax with Recode and If functions to do the carry-over interpolation. It was only successful for 8/3/2000 but not for 25/11/2011 because of the different time interval. Also, I used the option in SPSS like Transform-Replace Missing Value & Analyse-Multiple Imputation-Impute Missing Value do not produce the result what I want as well. That's what I get stuck and struggled to proceed to the next step for calculation. Can someone spare me some time and help me out turn it around. I am very very deeply appreciated your time, concern and understanding! Cheers, Tom -- View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Missing-Data-tp5726642.html Sent from the SPSSX Discussion mailing list archive at Nabble.com. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Administrator
|
The Excel file can be downloaded here:
http://spssx-discussion.1045642.n5.nabble.com/file/n5726642/8_%26_25.xlsx HTH.
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
In reply to this post by Tom2601
At 12:29 AM 7/3/2014, Tom2601 wrote:
>In my dataset, I have a Price variable and there was a lot of >missing value. I want to perform a "carry-over" interpolation, which >means that I fill the gap with previous value. But the tricky part >is to identify and only fill when is the first available quote after >9:30 and when is the last available quote before 16:00. Then, >perform carry-over interpolation, or gap filling, only in between >this interval. I don't understand your rules, particularly the second one. And while it is very good to have test data, yours is in a form that's difficult to use, because, . There's a great deal of it, so it's hard to go through by hand and understand it . You use color-coding in Excel to mark rows that should have interpolated prices -- but the color-coding doesn't carry over when the data is read into SPSS, so there's no way of checking whatever algorithm we come up with, against whether you say interpolation should take place. My best guess is, a missing price should be replaced by the last previous price value, IF that value occurred after 9:30 am on the current date, and (here's what I'm not sure of) IF the series of consecutive missing prices, of which this missing price is a member, does not extend past 4:00 p.m. I don't see an elegant solution. Below is an inelegant one. It's tested; but this is just code, not SPSS output, because meaningful output is simply too long. The following runs on your test data, with the time as an SPSS time. * ....... Awkward part: identify runs of present and missing . * values, and the latest member of each run. . * (In the definition below, a run may not span from one day to . * another.) . NUMERIC RunNumb (F4). VAR LABEL RunNumb 'Number, this run of present or absent prices'. VAR WIDTH RunNumb (6). COMPUTE RunNumb = /* First, mark records that begin runs */ $CASENUM EQ 1 OR Date NE LAG(Date) OR MISSING(Last) NE MISSING(LAG(Last)). CREATE RunNumb /* Then, assign numbers to runs */ =CSUM(RunNumb). * Get the latest time (note that this must be within the same day) . * that occurs in each run. . AGGREGATE OUTFILE=* MODE=ADDVARIABLES /BREAK=RunNumb /RunEnd 'Latest time within this run' = MAX(Time). * That done, run through the file, deciding in every case whether . * to fill in an earlier value for a missing one: . NUMERIC FillStat (F2). VAR LABEL FillStat 'Status of filled-in price'. VAL LABEL FillStat 1 'Present' 2 'Filled in' 3 'NO-no pv value' 4 'NO-value <0930' 5 'NO-series >1600'. NUMERIC FilledIn (F7.3). VAR WIDTH FilledIn (10). VAR LABEL FilledIn 'Price, filled in from missing values'. NUMERIC #LatestValue (F7.3) #LatestTime (TIME8). DO IF $CASENUM EQ 1 OR Date NE LAG(Date). . COMPUTE #LatestValue = $SYSMIS. . COMPUTE #LatestTime = $SYSMIS. END IF. DO IF NOT MISSING (Last). . COMPUTE FilledIn = Last. . COMPUTE FillStat = 1. . COMPUTE #LatestValue = Last. . COMPUTE #LatestTime = Time. ELSE IF MISSING(#LatestValue). . COMPUTE FillStat = 3. ELSE IF #LatestTime LT TIME.HMS(9,30). . COMPUTE FillStat = 4. ELSE IF RunEnd GT TIME.HMS(16). . COMPUTE FillStat = 5. ELSE. . COMPUTE FilledIn = #LatestValue. . COMPUTE FillStat = 2. END IF. TEMPORARY. STRING Why (A16). COMPUTE Why = VALUELABEL(FillStat). SELECT IF Date EQ DATE.MDY(11,25,2011). LIST. ========================================== APPENDIX: All code (test data not included) ========================================== * C:\Documents and Settings\Richard\My Documents . * \Technical\spssx-l\Z-2014\ . * 2014-07-02 Tom2601-Missing Data.SPS . * In response to posting . * Date: Wed, 2 Jul 2014 21:29:38 -0700 . * From: Tom2601 <[hidden email]> . * Subject: Missing Data . * To: [hidden email] . * "I have a Price variable and there was a lot of missing value. I . * want to perform a "carry-over" interpolation, which means that I . * fill the gap with previous value. But the tricky part is to . * identify and only fill when is the first available quote after . * 9:30 and when is the last available quote before 16:00. Then, . * perform carry-over interpolation, or gap filling, only in between . * this interval." . PRESERVE. SET MXWARNS 0. DATA LIST LIST / Date Time AmPm Last (DATE9, TIME9, A2, F7.3). BEGIN DATA [Data omitted. It was the contents of http://spssx-discussion.1045642.n5.nabble.com/file/n5726642/8_%26_25.xlsx , copied as text and pasted here] END DATA. RESTORE. FORMATS Date (DATE11). NUMERIC LineNumb (F5). VAR LABEL LineNumb 'Mirroring the line number in the Excel spreadsheet'. COMPUTE LineNumb = $CASENUM + 1. NUMERIC Time24 (TIME8). DO IF AmPm EQ 'AM'. . DO IF TIME GE TIME.HMS(12). . COMPUTE Time24 = Time - TIME.HMS(12). . ELSE. . COMPUTE Time24 = Time. . END IF. ELSE IF AmPm EQ 'PM'. . DO IF TIME GE TIME.HMS(12). . COMPUTE Time24 = Time. . ELSE. . COMPUTE Time24 = Time + TIME.HMS(12). . END IF. END IF. VARIABLE WIDTH LineNumb (05). VARIABLE WIDTH Date (11). VARIABLE WIDTH Time Time24 (08). VARIABLE ALIGN AmPm (CENTER). VARIABLE WIDTH AmPm (05). VARIABLE WIDTH Last (08). ADD FILES /FILE=* /KEEP= LineNumb Date Time AmPm Time24 Last. DATASET NAME TestData WINDOW=FRONT. EXECUTE /* for testing; LIST is too voluminous */. * ....... Substantive code ........... . ADD FILES /FIlE =TestData /RENAME=(Time Time24=DropTime Time) /DROP = DropTime AmPm. DATASET NAME Manipulate WINDOW=FRONT. * I believe he means: . * a missing price should be replaced by the last previous price . * value, . * IF that value occurred after 9:30 am on the current date, and . * (here's what I'm not sure of) . * IF the series of consecutive missing prices, of which this . * missing price is a member, does not extend past 4:00 p.m. . * ....... Awkward part: identify runs of present and missing . * values, and the latest member of each run. . * (In the definition below, a run may not span from one day to . * another.) . NUMERIC RunNumb (F4). VAR LABEL RunNumb 'Number, this run of present or absent prices'. VAR WIDTH RunNumb (6). COMPUTE RunNumb = /* First, mark records that begin runs */ $CASENUM EQ 1 OR Date NE LAG(Date) OR MISSING(Last) NE MISSING(LAG(Last)). CREATE RunNumb /* Then, assign numbers to runs */ =CSUM(RunNumb). * Get the latest time (note that this must be within the same day) . * that occurs in each run. . AGGREGATE OUTFILE=* MODE=ADDVARIABLES /BREAK=RunNumb /RunEnd 'Latest time within this run' = MAX(Time). * That done, run through the file, deciding in every case whether . * to fill in an earlier value for a missing one: . NUMERIC FillStat (F2). VAR LABEL FillStat 'Status of filled-in price'. VAL LABEL FillStat 1 'Present' 2 'Filled in' 3 'NO-no prv value' 4 'NO-value <0930' 5 'NO-series >1600'. NUMERIC FilledIn (F7.3). VAR WIDTH FilledIn (10). VAR LABEL FilledIn 'Price, filled in from missing values'. NUMERIC #LatestValue (F7.3) #LatestTime (TIME8). DO IF $CASENUM EQ 1 OR Date NE LAG(Date). . COMPUTE #LatestValue = $SYSMIS. . COMPUTE #LatestTime = $SYSMIS. END IF. DO IF NOT MISSING (Last). . COMPUTE FilledIn = Last. . COMPUTE FillStat = 1. . COMPUTE #LatestValue = Last. . COMPUTE #LatestTime = Time. ELSE IF MISSING(#LatestValue). . COMPUTE FillStat = 3. ELSE IF #LatestTime LT TIME.HMS(9,30). . COMPUTE FillStat = 4. ELSE IF RunEnd GT TIME.HMS(16). . COMPUTE FillStat = 5. ELSE. . COMPUTE FilledIn = #LatestValue. . COMPUTE FillStat = 2. END IF. TEMPORARY. STRING Why (A16). COMPUTE Why = VALUELABEL(FillStat). SELECT IF Date EQ DATE.MDY(11,25,2011). LIST. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Free forum by Nabble | Edit this page |