Matrix optimisation style syntax help

Re: Matrix optimisation style syntax help

San K
SSD (sum of squared differences) is calculated between the participant
and the control, so it belongs to the participant-control pair and I
can't separate the two.

Participants and controls don't have any duplicates on their own.
However, each participant is paired with every possible control, and
only then is the SSD calculated. That is why the same control appears
more than once.

Thank you for looking at this problem.

Regards,
San

On Wed, Sep 9, 2009 at 4:19 AM, Gene Maguin <[hidden email]> wrote:

> San,
>
> On one hand it seems like a solution to your question is pretty simple given
> a couple of assumptions. 1) Participants and controls can be separated into
> two datasets. 2) SSD is a property of controls only. Given this, sort the
> controls file by SSD (I assume that any duplicates have already been removed
> from the controls file.), select the first 4000 controls and match those
> back to the participants without using the BY subcommand.
>
> On the other hand, I'll bet this scheme isn't correct. Could that be true?
> If so, what's missing in the explanation of the problem?
>
> Gene Maguin
>
>
>>>I'm wondering if anyone can help me with an algorithm (in spss) used
> for matching participants to controls.
>
> I have about 4000 participants and 700,000 potential controls.
>
> Just to make it simple, say that we have a participant, potential
> control and Squared differences as:
>
> DATA LIST FREE / Participant (A10) pControl (A10) SSD.
> BEGIN DATA
> P1 C1 0.7
> P1 C2 0.4
> P1 C3 0.99
> P1 C4 0.29
> P2 C1 0.56
> P2 C2 0.39
> P2 C3 0.2
> P2 C4 0.32
> P3 C2 0.37
> P3 C4 0.27
> P3 C6 0.11
> P3 C7 0.03
> P4 C1 0.04
> P4 C2 0.04
> P4 C4 0.05
> P4 C6 0.89
> END DATA.
>
> Now, I need to find a good NON repeated control for each participant
> where the total of SSD is minimised. Key points here are 'non
> repeated' and 'minimised total SSD'.
>
> Currently I'm simply sorting by SSD. Then picking from the top and
> making sure I don't pick the same control again. This process doesn't
> necessarily minimise the total sum of squared differences.
>
> I'm reasonably comfortable with LOOP, DO REPEAT and macros.
>
> Any help would be greatly appreciated.
>
> Regards,
> San
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email] (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD
>


Re: Matrix optimisation style syntax help

Maguin, Eugene
San,

Ok. This is much more difficult. So, let's get some facts out on the table.
Tell me about the SSD computation. Not the formula, that's easy. But, how
many variables are included in the SSD computation? May I assume that
participants and controls have exactly the same variables that will be used
in the SSD computation? I know that's a stupid question but stranger things
have been done. And, are the variables all dichotomous or quasi-continuous
(e.g., Likert) or continuous (e.g., height or weight)? Last, and you stated
this earlier but I don't recall, how many cases and how many controls?


Gene Maguin


Re: Matrix optimisation style syntax help

San K
Thank you for looking at this problem.

You can assume the SSD is already calculated. SSD is calculated based
on the previous 12 months' consumption. I had only SSD in my sample data
list to make things simple. In reality I include correlation values
and a few other variables depending on the task. You could treat SSD as
a 'score' in this problem.
Basically,
1. Participants (4000) and potential controls (600,000) are in separate files.
2. Each participant is paired with each potential control after
splitting into subgroups (about 50 subgroups). That is about 50 million
records:  (4000 x 600000)/50
3. The scoring variable (e.g. SSD) is calculated. This is the sample data
list I provided with just pairs and SSD.
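
The three steps above can be sketched outside SPSS. This is only an illustration in Python, under assumptions the thread does not confirm: that SSD is the plain sum of squared month-by-month differences in consumption, and that pairs are formed only within a shared subgroup. The function names, subgroup labels and toy values (3 months instead of 12) are all hypothetical.

```python
def ssd(a, b):
    """Sum of squared differences between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def score_pairs(participants, controls):
    """Yield (participant_id, control_id, ssd) for pairs sharing a subgroup.

    Both arguments map an id to a (subgroup, monthly_values) tuple.
    """
    # Index the controls by subgroup so each participant only meets
    # the controls in its own subgroup.
    by_group = {}
    for cid, (group, values) in controls.items():
        by_group.setdefault(group, []).append((cid, values))
    for pid, (group, p_values) in participants.items():
        for cid, c_values in by_group.get(group, []):
            yield pid, cid, ssd(p_values, c_values)


# Hypothetical toy data: two subgroups, three months of consumption each.
participants = {"P1": ("north", [10, 12, 11]), "P2": ("south", [5, 5, 6])}
controls = {
    "C1": ("north", [10, 11, 11]),
    "C2": ("north", [8, 12, 14]),
    "C3": ("south", [5, 7, 6]),
}
pairs = list(score_pairs(participants, controls))

# Record count from step 2: 4000 x 600000 pairs spread over 50 subgroups.
records = 4000 * 600000 // 50
```

With the figures quoted in step 2, `records` works out to 48,000,000, which is the "about 50 million" mentioned there.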

Now a best match needs to be found for each participant under the
following conditions:
1. A particular control cannot be used more than once.
2. Total SSD should be minimised.

Until now, I haven't worried about minimising the total SSD.

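Minimisation of the total SSD is an integer programming problem,
formulated as follows.
For n participants and k potential controls (where k > n), let
a_{ij} = 1 if control j is assigned to participant i, and 0 otherwise.

Then minimise

\[ \sum_{i=1}^{n} \sum_{j=1}^{k} a_{ij}\,\mathrm{SSD}_{ij} \]

where each participant receives exactly one control,

\[ \sum_{j=1}^{k} a_{ij} = 1 \quad \text{for } i = 1, \dots, n, \]

and each control is used at most once,

\[ \sum_{i=1}^{n} a_{ij} \le 1 \quad \text{for } j = 1, \dots, k, \]

and

\[ a_{ij} \in \{0, 1\} \quad \text{for all } i, j. \]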

I have no idea how to implement this in SPSS.
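
For reference, the formulation can be checked on the small sample from the original question. The sketch below is Python, not SPSS syntax, and it uses exhaustive search, which is only workable for a handful of participants. It is a way to see what the minimum looks like, not a method for 4000 participants. The function name is hypothetical, and pairs missing from the sample (for example P3 with C1) are assumed to be not allowed.

```python
from itertools import permutations

# Pair scores copied from the sample DATA LIST in the original question.
SCORES = {
    ("P1", "C1"): 0.70, ("P1", "C2"): 0.40, ("P1", "C3"): 0.99, ("P1", "C4"): 0.29,
    ("P2", "C1"): 0.56, ("P2", "C2"): 0.39, ("P2", "C3"): 0.20, ("P2", "C4"): 0.32,
    ("P3", "C2"): 0.37, ("P3", "C4"): 0.27, ("P3", "C6"): 0.11, ("P3", "C7"): 0.03,
    ("P4", "C1"): 0.04, ("P4", "C2"): 0.04, ("P4", "C4"): 0.05, ("P4", "C6"): 0.89,
}


def best_assignment(scores):
    """Return (total, {participant: control}) with the smallest summed score.

    Tries every way of giving each participant a distinct control.
    Pairs that are missing from `scores` are treated as not allowed.
    """
    participants = sorted({p for p, _ in scores})
    controls = sorted({c for _, c in scores})
    best_total, best_match = float("inf"), None
    for chosen in permutations(controls, len(participants)):
        pairs = list(zip(participants, chosen))
        if any(pair not in scores for pair in pairs):
            continue  # this combination uses a pair that was never scored
        total = sum(scores[pair] for pair in pairs)
        if total < best_total:
            best_total, best_match = total, dict(pairs)
    return best_total, best_match


total, match = best_assignment(SCORES)
```

On this sample every participant's lowest-scoring control happens to be a different control, so the minimum total is about 0.56 (P1 with C4, P2 with C3, P3 with C7, and P4 with either C1 or C2). At realistic sizes a dedicated assignment solver would be needed instead, for example SciPy's `scipy.optimize.linear_sum_assignment`, which accepts a rectangular cost matrix; that again sits outside SPSS.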

Regards,
San

On Fri, Sep 11, 2009 at 11:30 PM, Gene Maguin <[hidden email]> wrote:

> San,
>
> Ok. This is much more difficult. So, let's get some facts out on the table.
> Tell me about the SSD computation. Not the formula, that's easy. But, how
> many variables are included in the SSD computation? May I assume that
> participants and controls have exactly the same variables that will be used
> in the SSD computation? I know that's a stupid question but stranger things
> have been done. And, are the variables all dichotomous or quasi-continuous
> (e.g., Likert) or continuous (e.g., height or weight)? Last, and you stated
> this earlier but I don't recall, how many cases and how many controls?
>
>
> Gene Maguin

Re: Matrix optimisation style syntax help

Peck, Jon
SPSS is not really set up to handle integer programming problems, but from a practical point of view, is this really necessary?  You have 4000 participants to match and 600,000 possibilities.  Have you tried to see whether you can get an exact match for each control using a simpler approach?  With that many controls, choosing any control that exactly matches the case (without replacement) would have pretty good behavior, I'd expect, if the two samples have a reasonably similar distribution.


Re: Matrix optimisation style syntax help

Johnny Amora
(I apologize for cross-posting)
 
Do you know of a university that offers an online program, particularly a PhD degree, in Measurement, Evaluation and Research Methodology?  My colleague is looking for one.
 
Johnny



Measurement, Evaluation and Research Methodology

John F Hall
Johnny
 
Not sure about on-line teaching, but Malcolm Williams at Plymouth University (UK) is interested in this area and has published a book or two.  He's one of my old students and we have a standing joke about measurement: he sent me a copy of his first book and had scribbled a note about it inside the cover.  There's also an e-journal that you may find of interest.
 
Check out:
 
 
John Hall

 


Re: Matrix optimisation style syntax help

San K
In reply to this post by Peck, Jon
You are right, Jon. It is not really necessary in this case.
Currently I'm doing as you said, "choosing any control that exactly
matches the case (without replacement)", and it does produce pretty
good behavior.

I'm just preparing for the next evaluation, where I may not have the
luxury of such large numbers of participants and possible controls.
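
With a small pool, the sort-and-pick procedure from the original question (sort by SSD, take pairs from the top, never reuse a control) can miss the minimum total. A minimal Python sketch of that procedure follows. The function name is hypothetical and the four scores are invented purely to show the effect; they are not from any real data.

```python
def greedy_match(scored_pairs):
    """Sort pairs by score and keep each pair whose participant and
    control are both still unused.

    Returns {participant: (control, score)}.
    """
    matched, used_controls = {}, set()
    for participant, control, score in sorted(scored_pairs, key=lambda r: r[2]):
        if participant in matched or control in used_controls:
            continue  # participant already matched, or control already taken
        matched[participant] = (control, score)
        used_controls.add(control)
    return matched


# Hypothetical scores, chosen so that the greedy pick is not optimal.
toy = [
    ("P1", "C1", 0.10), ("P1", "C2", 0.20),
    ("P2", "C1", 0.15), ("P2", "C2", 0.90),
]
result = greedy_match(toy)
greedy_total = sum(score for _, score in result.values())
```

Here the greedy pass takes P1 with C1 first, which leaves P2 with only C2, for a total of about 1.0. Swapping the controls (P1 with C2, P2 with C1) would total about 0.35, so the gap matters more as the number of spare controls shrinks.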

Regards,
San


On Mon, Sep 14, 2009 at 11:58 AM, Peck, Jon <[hidden email]> wrote:

> SPSS is not really set up to handle integer programming problems, but from a practical point of view, is this really necessary?  You have 4000 participants to match and 600,000 possibilities.  Have you tried to see whether you can get an exact match for each control using a simpler approach?  With that many controls, choosing any control that exactly matches the case (without replacement) would have pretty good behavior, I'd expect, if the two samples have a reasonably similar distribution.