|
Hi all,

I need to compare two datasets, D1 and D2. These sets are the result of two linkage strategies, so they're each comprised of two datasets, A and B. A and B each have two id variables.

I want to evaluate to what extent the two linkage strategies have led to different linkage pairs. In other words: what is the intersection between D1 and D2, what is the difference, and what do the differences look like.

I've been thinking about using either UPDATE or MATCH FILES for this, something like:

UPDATE FILE = d1 / RENAME (id_b = id_b_d1)
  / FILE = d2 / RENAME (id_b = id_b_d2)
  / BY = id_a .
COMPUTE intersection = ( CONCAT (id_a, id_b_d1) = CONCAT (id_a, id_b_d2) ).

Does this make sense? I'm getting kinda confused.

Oh, and unfortunately I can't use SPSSINC COMPAREDATASETS. :-( *grinding his teeth* ;-)

Cheers!!
Albert-Jan
|
Hi Albert-Jan,
Can you post small examples of both datasets, please, as I can't quite envisage what you describe? Meanwhile, have you considered using Access for this, using standard queries to select intersecting and non-intersecting records?

Thanks

Regards
Clive.

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command INFO REFCARD
|
Hi All,
My SPSS 16.0 site license is in the process of being renewed and I'm now getting the annoying pop-up message about the license expiring soon. Although the message indicates that running expoff.bat will stop the message, I cannot locate this file in the spss directory. Also, trying to search for this file in my whole system yields nothing even when I include hidden files and folders. Does anyone know how to turn this pop-up message off?

Thanks,
Rick
|
In reply to this post by Albert-Jan Roskam
Albert-jan,
I agree with Clive that some sample data from both D1 and D2 would be helpful. In addition, I'd like to see a definition of the two linkage strategies. I'm imagining that you are not doing a one-to-one match--as would be done with a match files command--and have some sort of probability match. Lastly, I'm guessing that you'd want a summary statistic that is the proportion of A records that link to the same B record in D1 as in D2. True?

If this is true, then (and without knowing anything else), I think I'd match (match files) D1 and D2 by A file ID and rename the B file ID in D2 so that one doesn't overwrite the other. I'm assuming that every A file record is in both D1 and D2. If so, then a match files will work. If not, then I think I'd match files D1 and D2 to file A, because A is the union of A records in D1 and D2. Then you could compare the values of the D1 B file ID and the D2 B file ID.

Gene Maguin
|
In reply to this post by Albert-Jan Roskam
One reasonably easy strategy would be to read the case ids of the data into Python, creating a Python set for each. Then you can use the standard set operators to compute all the differences. In particular:

a.intersection(b)
a - b
b - a
a.symmetric_difference(b)

Too bad you can't use SPSSINC COMPARE DATASETS. That means you can't use my fuzzy matching module, FUZZY, to do the linking either.
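A minimal sketch of the set approach in plain Python. The link pairs are made-up illustration data, and reading the ids out of SPSS into Python is not shown. Each link is stored as an (id_a, id_b) tuple, on the assumption that it is the whole pair that should be compared rather than a single case id.

```python
# Hypothetical link pairs (id_a, id_b) from two linkage strategies.
d1 = {("1", "23"), ("2", "45"), ("4", "22")}
d2 = {("1", "23"), ("2", "45"), ("4", "1"), ("6", "88")}

both = d1.intersection(d2)                      # pairs found by both strategies
only_d1 = d1 - d2                               # pairs found only by the first
only_d2 = d2 - d1                               # pairs found only by the second
either_not_both = d1.symmetric_difference(d2)   # all disagreements

print("in both:   ", sorted(both))
print("only in d1:", sorted(only_d1))
print("only in d2:", sorted(only_d2))
print("differ:    ", sorted(either_not_both))
```

Tuples are hashable, so a set of tuples compares complete pairs without concatenating the ids into one string first.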
|
In reply to this post by Handel, Richard W.
Hello Rick,
The expoff.bat suggestion is out of date for SPSS 16, as noted in the resolution below. SPSS 15 was the last version where that file was included and effective. Renewing the license is the only way to turn off the message in SPSS 16.

David Matheson
SPSS Statistical Support

*****************
Resolution number: 76257
Created on: Mar 10 2008
Last Reviewed on: Jan 12 2009

Problem Subject: Unable to turn off "Your license renewal date has passed" alert -- expoff.bat does not exist in SPSS 16.0.x

Problem Description: When launching SPSS 16.0.x I am getting a message, 'Your license renewal date has passed. SPSS will stop working if a new license is not installed soon. If you don't want to see this message again, run expoff.bat in the SPSS directory' I have looked in my SPSS program installation directory and I do not see expoff.bat. I am aware that if I click OK on this message that SPSS will continue to function properly until my license expires, however, I would prefer the convenience of turning off this alert. What can I do?

Resolution Subject: At present there is not a way to turn off this alert in SPSS 16.0.x other than updating your license.

Resolution Description: At present there is not a way to turn off this alert in SPSS 16.0.x other than updating your license. This issue has been filed with SPSS Development and we apologize for the inconvenience.

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Handel, Richard W.
Sent: Friday, February 06, 2009 7:53 AM
To: [hidden email]
Subject: expoff.bat

Hi All, My SPSS 16.0 site license is in the process of being renewed and I'm now getting the annoying pop-up message about the license expiring soon. [...]
|
In reply to this post by Maguin, Eugene
Hello all, thanks for responding,
Let me tell you something about the background of the question in my previous post. The two datasets to be compared are the result of two linkage projects. One project used a probabilistic linkage technique, while the other used an (n-1) deterministic technique. The resulting datasets differ both quantitatively and qualitatively. The former refers to the percentage of linked records (probably higher with the probabilistic technique), while the latter refers to the different linkages (pairs), depending on the technique used. Probabilistic linkage is fairly laborious and deterministic linkage is a routine task. We want to know if the latter is practically as good as the former, despite its slightly lower linkage percentage. One outcome measure we use to evaluate this is mortality. It would be a bad sign, for example, if the mortality in the non-linkages differed from one group to the other.

I hope the syntax below somewhat illustrates things. Basically, the question I would like to answer is: are the outcome measures, esp. the mortality rate, technique-independent?

Thanks in advance for your replies!

Best wishes,
Albert-Jan

* sample syntax.
data list free / id_a (a4) id_b (a4) mort_d1 (f1).
begin data
1 23 1
2 45 1
3 56 1
4 22 0
5 88 0
7 10 0
9 100 0
end data.
dataset name d1.

data list free / id_a (a4) id_b (a4) mort_d2 (f1).
begin data
1 23 1
2 45 1
3 56 1
4 1 1
5 99 0
6 88 1
7 10 0
end data.
dataset name d2.

UPDATE FILE = d1 / RENAME (id_b = id_b_d1) / IN = in_d1
  / FILE = d2 / RENAME (id_b = id_b_d2)
  / BY = id_a .
COMPUTE intersection = ( CONCAT (id_a, id_b_d1) = CONCAT (id_a, id_b_d2) ).
variable label mort_d1 'mortality (D1)' / mort_d2 'mortality (D2)'.
value labels intersection 0 'record pair not in both files' 1 'record pair in both files'
  / mort_d1 mort_d2 0 'alive' 1 'dead'
  / in_d1 1 'probabilistically linked data' 0 '(n-1)-deterministically linked data'.
crosstabs mort_d1 by intersection / cells = col.
crosstabs mort_d2 by intersection / cells = col.
dataset close all.

----- Original Message ----
From: Gene Maguin <[hidden email]>
To: [hidden email]
Sent: Friday, February 6, 2009 3:12:53 PM
Subject: Re: comparing two datasets

Albert-jan, I agree with Clive that some sample data from both D1 and D2 would be helpful. [...]
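For comparison, here is a rough Python sketch of the same idea as the CROSSTABS above, using the sample data from the post: the mortality rate within each file, split by whether the (id_a, id_b) pair also occurs in the other file. It assumes each id_a occurs at most once per file (so a dict keyed on id_a is safe), and the function name `mortality_by_overlap` is made up for this illustration. It is not a re-implementation of the UPDATE step, which also carries along cases that exist in only one file.

```python
# Sample data from the post: id_a -> (id_b, mortality flag).
d1 = {"1": ("23", 1), "2": ("45", 1), "3": ("56", 1), "4": ("22", 0),
      "5": ("88", 0), "7": ("10", 0), "9": ("100", 0)}
d2 = {"1": ("23", 1), "2": ("45", 1), "3": ("56", 1), "4": ("1", 1),
      "5": ("99", 0), "6": ("88", 1), "7": ("10", 0)}


def mortality_by_overlap(own, other):
    """Return mortality rates keyed by True (pair in both files) / False."""
    deaths = {True: 0, False: 0}
    counts = {True: 0, False: 0}
    for id_a, (id_b, dead) in own.items():
        # A pair is shared only if the other file links id_a to the same id_b.
        shared = id_a in other and other[id_a][0] == id_b
        counts[shared] += 1
        deaths[shared] += dead
    return {k: deaths[k] / counts[k] for k in counts if counts[k]}


rates_d1 = mortality_by_overlap(d1, d2)
rates_d2 = mortality_by_overlap(d2, d1)
print("D1 mortality, shared vs. unshared pairs:", rates_d1)
print("D2 mortality, shared vs. unshared pairs:", rates_d2)
```

If mortality among the unshared pairs differs markedly between the two files, that would be the "bad sign" described in the post; with seven records per file the numbers here are only illustrative.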
|
In reply to this post by Albert-Jan Roskam
Going back to the beginning -- at 06:49 AM 2/6/2009, Albert-jan Roskam wrote:

>I need to compare two datasets, D1 and D2. These sets are the result of two linkage strategies, so they're each comprised of two datasets, A and B. A and B each have two id variables.

Right. A and B are not datasets of linked elements; they are datasets of the links themselves.

>[I need to know] what is the intersection between D1 and D2, what is the difference, and what do the differences look like.

Suppose that in each file the two IDs are called ID_L and ID_R (for 'left' and 'right' member of the link). Then, what's wrong with (untested)

MATCH FILES
  /FILE=A /IN=IN_A
  /FILE=B /IN=IN_B
  /BY ID_L ID_R.

Complications that may arise include:
. A link can occur in both orders -- X as ID_L and Y as ID_R, or Y as ID_L and X as ID_R. The above won't detect that those are the same. If that's a problem, change all links to 'canonical' order, where ID_L < ID_R.
. Transitive closure: if X is linked to Y and Y is linked to Z, does this mean X is, by definition, linked to Z? If so, those implicit links need to be calculated and inserted as explicit links; that's a bit of a bother.

-Good luck to all,
Richard
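The two complications can be sketched in Python, purely as an illustration and not as part of the MATCH FILES approach. The function names and the X/Y/Z links are hypothetical. `canonical` sorts each pair so that both orders compare equal, and `transitive_closure` uses a small union-find to group linked ids and then emits every implied pair within each group.

```python
def canonical(links):
    """Put every link in canonical order so (X, Y) and (Y, X) compare equal."""
    return {tuple(sorted(pair)) for pair in links}


def transitive_closure(links):
    """Add implied links: if X-Y and Y-Z are linked, X-Z is linked too."""
    parent = {}

    def find(x):
        # Follow parent pointers to the group's representative.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    for left, right in links:
        parent[find(left)] = find(right)   # merge the two groups

    groups = {}
    for member in list(parent):
        groups.setdefault(find(member), []).append(member)

    closed = set()
    for members in groups.values():
        members.sort()
        for i, left in enumerate(members):
            for right in members[i + 1:]:
                closed.add((left, right))  # already in canonical order
    return closed


links_a = {("X", "Y"), ("Z", "Y")}
links_b = {("Y", "X")}

shared = canonical(links_a) & canonical(links_b)
closed_a = transitive_closure(links_a)
print(sorted(shared))
print(sorted(closed_a))
```

Whether closure is appropriate depends on what a link means in the data; for one-to-one record linkage it may not apply at all.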
|
Is one of these an accurate interpretation?
1) You have two case-by-case matrices (entities by entities). The entry in each cell is dichotomous: either the two cases are linked or they are not. You wish to compare and contrast the two matrices.
2) You have two case-by-case matrices (entities by entities). Each cell is the distance (# of links, Euclidean, etc.) between the cases. You wish to compare and contrast the two matrices.
3) You have two hierarchical trees and you wish to compare and contrast them.

Art Kendall
Social Research Consultants
|
In reply to this post by Albert-Jan Roskam
Hi
I, too, have two datasets that I want to compare to see if they are identical. While I can see a way to do this in SPSS syntax (by merging files, adding variables, and looping through each pair of variables, testing if they are the same), "SPSSINC COMPAREDATASETS" sounds as though it might be a smarter way to do this job. Is this procedure easily available, please?

Thanks
Clive.

On Fri, 6 Feb 2009 03:49:52 -0800, Albert-jan Roskam <[hidden email]> wrote:
>Hi all,
>
>I need to compare two datasets, D1 and D2. [...]
|
If you have Version 17, you can use the SPSSINC COMPARE DATASETS extension command. If you have Version 16, you can use the COMPDS extension command, which is similar but lacks a few features and does not have a dialog box interface.
These can be downloaded from SPSS Developer Central (www.spss.com/devcentral). They require the Python programmability plug-in. Installation instructions are in the download.

The commands can compare the variable dictionaries and/or the cases in two datasets (you must open the data files and name them in SPSS before calling either of these commands). I must warn you that these commands are slow with wide datasets, so be patient. We are working on improving performance in the underlying Dataset class used by these commands.

HTH,
Jon Peck

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Clive Downs
Sent: Wednesday, February 11, 2009 7:07 AM
To: [hidden email]
Subject: Re: [SPSSX-L] comparing two datasets

Hi I, too, have two datasets that I want to compare to see if they are identical. [...]
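To make concrete what "comparing the cases in two datasets" can mean, here is a small plain-Python sketch. It is not how SPSSINC COMPARE DATASETS or COMPDS is implemented, and it uses no SPSS API; the function name `compare_datasets`, the variable names, and the records are all invented for illustration. Cases are matched on a key variable and then compared variable by variable.

```python
def compare_datasets(rows1, rows2, key):
    """Match two lists of dict records on `key` and report the differences.

    Returns (keys only in rows1, keys only in rows2,
             list of (key, variable, value1, value2) mismatches).
    """
    by_key1 = {row[key]: row for row in rows1}
    by_key2 = {row[key]: row for row in rows2}
    only_1 = sorted(set(by_key1) - set(by_key2))
    only_2 = sorted(set(by_key2) - set(by_key1))
    diffs = []
    for k in sorted(set(by_key1) & set(by_key2)):
        r1, r2 = by_key1[k], by_key2[k]
        # A variable missing from one record shows up as None on that side.
        for var in sorted(set(r1) | set(r2)):
            if r1.get(var) != r2.get(var):
                diffs.append((k, var, r1.get(var), r2.get(var)))
    return only_1, only_2, diffs


# Hypothetical records standing in for two open datasets.
rows1 = [{"id": 1, "age": 30, "sex": "f"},
         {"id": 2, "age": 41, "sex": "m"},
         {"id": 3, "age": 25, "sex": "f"}]
rows2 = [{"id": 1, "age": 30, "sex": "f"},
         {"id": 2, "age": 44, "sex": "m"},
         {"id": 4, "age": 52, "sex": "m"}]

only_1, only_2, diffs = compare_datasets(rows1, rows2, key="id")
print("cases only in first: ", only_1)
print("cases only in second:", only_2)
print("value mismatches:    ", diffs)
```

The two datasets are identical, in this sketch's sense, when all three results are empty.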
