Dear listserv, I have a dataset with 15, 000 cases (organizations) and 400 variables. The data were collected over a 10-year period. A single organization can appear in the dataset for 10 years or for fewer years, depending when it began to meet the criteria for inclusion. For a number of organizations (roughly 5% of total), I need to replace data on 150 variables that appear sequentially in the dataset. The variables are cities in which an organization has members or affiliates, named City1 to City150 in the dataset, where 1 = organization has members/affiliates in the city and 0= organization has no known members/affiliates in the city. The organizations for which the data need to be replaced with data from another organization are given a particular code on a variable “Replacecode”, depending on the method of replacement required. Data are only replaced if another variable, “TotalCities” is equal to 0. If TotalCities=0 and Replacecode=1, the data for City1-City150 will be replaced with data from another occurrence of this organization for the most proximate year for which TotalCities is not equal to 0. If there is data for the same number of years prior to the missing year and the same number of years after the missing year, data should always be replaced with prior year data. In the table below, Organization A, year 2008, is an example of an organization that requires this type of replacement. If Replacecode=2, it means that City data are missing for all occurrences of the organization in the dataset and that the data for City1 – City150 should be replaced with data from a similar organization. For example, in year 2010, organization B in the table below was 1 year old and based in Region 3. For all organizations like organization B (Age=1; Region=3) with missing data, data for City1-City150 should be replaced with data from another organization that is 1 year old and based in Region 3 (in this example, Organization H matches the criteria). Ideally, for all cases where Replacecode=2, the city from which data are copied to replace missing data will be randomly selected from all cities in the dataset that match the criteria we specify. What is the most efficient way to replace these data according to the specifications outlined above? To give you an idea what the data look like:
Thank you for your help. Senior Research Analyst Research Center - 7th Floor American College of Physicians 190 North Independence Mall West Philadelphia, PA 19106-1572 [hidden email] (215) 351-2561 |
Administrator
|
For part 1 consider using LAG within a DO REPEAT structure.
For part 2 here is some completely UNTESTED code which will most likely work like a charm (or NOT ;-). *NOTE* I am not doing *ANYTHING* to address the other 250 odd variables. You will want to segregate them and match them back later by Organization YEAR AGE etc so you don't cobber them with incoming data from the replacement cases. --- COMPUTE SCRAMBLE=UNIFORM(1). SORT CASES BY REPLACECODE AGE REGION SCRAMBLE. COMPUTE YOKED=SUM(1, (REPLACECODE EQ LAG(REPLACECODE)) * (AGE EQ LAG(AGE)) * (REGION EQ LAG(REGION)) * LAG(YOKED) ). DO IF REPLACECODE = 2. DO REPEAT COPY=@ORG @YEAR @AGE @REGION @REPLACECODE / SOURCE=ORGANIZATION YEAR AGE REGION REPLACECODE . COMPUTE COPY = SOURCE. END REPEAT. XSAVE OUTFILE "C:\TEMP\REPLACECODE2.SAV" / KEEP AGE REGION SCRAMBLE @ORG @YEAR @AGE @REGION @REPLACECODE. END IF. EXECUTE. SELECT IF REPLACECODE NE 2 AND TOTALCITIES GT 0. MATCH FILES / FILE * / DROP @ORG @YEAR @AGE @REGION @REPLACECODE. SAVE OUTFILE "C:\TEMP\ALLBUTREPLACECODE2.SAV". MATCH FILES / FILE * / FILE "C:\TEMP\REPLACECODE2.SAV" / BY AGE REGION SCRAMBLE . SELECT IF NOT(MISSING(@REPLACECODE)). COMPUTE DATAREPLACEDBY = ORGANIZATION. COMPUTE ORGANIZATION=@ORG. ADD FILES / FILE * / FILE="C:\TEMP\ALLBUTREPLACECODE2.SAV".
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
Free forum by Nabble | Edit this page |