I have a large number of football matches where a home team plays against an away team (see the following example).

Home / Away
Bayern München - Hertha BSC
Augsburg - Hamburger SV
1. FC Köln - Bremen
Hertha - Augsburg
Bremen - Bayern München
...

Now I am trying to find a way of 'clustering' those teams who ever played against each other. (Just using the league, as an example, is not possible, since most teams also play in different leagues, some are newcomers, ... .) Each team has a unique Team ID.

My solution with clustering / factor analysis is highly complex and needs Python. Isn't there an easy solution to the problem?
Dr. Frank Gaeth
|
Are there multiple matches between a given pair of teams?
Roughly how many teams are there altogether?
How is the input data now arranged? Is it in an SPSS system file, a spreadsheet, a text file, etc.?
Are there a number of seasons?
Why do you think you need Python?
Once you have a matrix with (all teams as home) by (all teams as away), what else do you want to know?
Are there occasions where two teams would play at a third location?
Are the character strings the unique IDs? Are you very sure that there are no variations like spacing, diacriticals, etc. in the ID for a specific team?

In reply to the post by drfg2008 of 5/3/2012 5:28 AM (http://spssx-discussion.1045642.n5.nabble.com/matching-problem-tp5682738.html)

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command.
To leave the list, send the command SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command INFO REFCARD
Art Kendall
Social Research Consultants |
Administrator
|
In reply to this post by drfg2008
Please define your question with greater specificity and try again Frank!
I have NO idea of what your problem is and I am NOT going to try to guess what you mean by 'clustering' in this case or try to make *ANY* sense of your 'solution with clustering / factor analysis is heavily complex and need python.' . Perhaps you need to outsource your ongoing convoluted project!
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
For starters, what is the unit of analysis: a team or a match?

If a team, then you want a matrix that is all teams by all teams to see how often a team has played another. I assume that such a matrix will be somewhat sparse, but you might be able to determine whether there is a tendency for certain teams to play other teams (that is the point of your analysis, right?). You might be able to "reduce" it if you want.

If a match, and the first entry is the home team, then you want to see how often a home team played an opponent. This seems to me to be an issue of nesting and whether the degree of nesting is the same for all home teams.

But, as David says, what exactly are you asking for? I have no idea what you could be clustering/factor analyzing.

-Mike Palij
New York University
[hidden email]

In reply to the post by David Marso of Thu, May 3, 2012 at 7:50 AM
|
In reply to this post by Art Kendall
> Are there multiple matches between a given pair of teams?
Yes

> Roughly how many teams are there altogether?
A few hundred. However, the number may vary.

> How is the input data now arranged? Are they in an SPSS system file, a spread sheet, a text file, etc.
SPSS sav

> Are there a number of seasons?
Yes, but this would not have to be considered.

> Why do you think you need PYTHON?
Because the number of teams varies, and I used part of my 'dummy-coding' program in Python to create a matrix.

> Once you have a matrix with (all teams as Home) by (all teams as away) what else do you want to know?
I do not want to differentiate between home and away. I just want to get some sort of group membership for each team. Within a group, the teams have played against each other (no matter whether home or away) at least once.

> Are there occasions where two teams would play at a third location?
Location does not matter. Usually the teams play either at home in their own stadium or away at the home stadium of the opposing team.

> Are the character strings the unique IDs? Are you very sure that there are no variations like spacing, diacriticals, etc. in the ID for a specific team?
There are unique IDs for each team. No variations.

Here is an example file (probably this file has no unique ID, but the normal files do have one):
www.frag-einen-statistiker.de/ibm_spss/Football_example.xls
Dr. Frank Gaeth
|
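One possible reading of the "group membership" asked for above is a connected component: two teams share a group when a chain of matches links them, regardless of who was at home. That reading is an assumption, since the post could also mean that every pair inside a group has met directly. A minimal Python sketch under that assumption, using a small union-find structure (the function name `group_teams`, the sample rows, and the spelling "Hertha BSC" for both Hertha entries are illustrative, not taken from the real file):

```python
from collections import defaultdict


def group_teams(matches):
    """Assign a group label to every team; teams share a label when a chain
    of matches links them (home/away is ignored)."""
    parent = {}

    def find(team):
        # Walk up to the root, shortening the path on the way.
        parent.setdefault(team, team)
        while parent[team] != team:
            parent[team] = parent[parent[team]]
            team = parent[team]
        return team

    for home, away in matches:
        root_home, root_away = find(home), find(away)
        if root_home != root_away:
            parent[root_away] = root_home  # merge the two groups

    groups = defaultdict(list)
    for team in parent:
        groups[find(team)].append(team)
    # Number the groups 1, 2, ... in order of first appearance.
    return {team: label
            for label, members in enumerate(groups.values(), start=1)
            for team in members}


# Hypothetical sample: the rows from the first post plus one unrelated pair.
matches = [
    ("Bayern München", "Hertha BSC"),
    ("Augsburg", "Hamburger SV"),
    ("1. FC Köln", "Bremen"),
    ("Hertha BSC", "Augsburg"),
    ("Bremen", "Bayern München"),
    ("Team X", "Team Y"),
]
membership = group_teams(matches)
```

With this sample, the six Bundesliga teams end up in one group and the two unrelated teams in another. The work grows roughly linearly with the number of matches, so the number of teams does not need to be known in advance and no teams-by-teams matrix is built.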
Administrator
|
Frank,
The 'example' file you referenced has *OVER* 5000 Team-ids!!!! Many teams occur only once, many up to 50 times. That is *MUCH* more than a few hundred. Still not sure what you are up to. You should look into the MATRIX language, maybe that will be useful! D
|
> The 'example' file you referenced has *OVER* 5000 Team-ids!!!!
Oh, that many. As I said, I'm afraid it's a little complex. (Usually I do not take more than a few hundred from European leagues.)
Dr. Frank Gaeth
|
Administrator
|
Well Frank,
Maybe you need to be a little more clear about the nature of this 'complexity' and how 'a few hundred' becomes over 5000? You expect people on this group to help you but you always seem to be playing games. Maybe as a start get RID of all the teams which have only 1,2 or 3 games. Really not sure what you expect people in this group to do for you. Maybe you need to do a little thinking for yourself for a change?
|
David: I am absolutely serious about that.
This could be the solution: a matrix that is all teams by all teams to see how often a team has played another. If the great number of teams should be the problem, it might be possible to reduce it according to the country code. But how to create such a Matrix? Frank
Dr. Frank Gaeth
|
Hello,
I'd like to suggest the following data restructure:

Original
ID1               ID2
Bayern München  - Hertha BSC
Augsburg        - Hamburger SV

Restructured
ID1               ID2               HOME
Bayern München  - Hertha BSC        1
Hertha BSC      - Bayern München    2
Augsburg        - Hamburger SV      1
Hamburger SV    - Augsburg          2

Every match is recorded twice; the indicator 1/2 says which team played at home. Change the results so that win = team ID1 won. Now AGGREGATE by ID1 ID2 (or CROSSTABS) shows all matches played between the ID1 and ID2 teams, and it is easy to find how many times they played, the number of wins/losses, results home/away, etc.

Regards
jindra

In reply to the message "Re: matching problem" by drfg2008 of 04.5.2012 10:52:59 (http://spssx-discussion.1045642.n5.nabble.com/matching-problem-tp5682738p5685166.html)
|
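Jindra's restructuring above can be sketched in a few lines of Python for readers who want to try the idea outside SPSS. This is only an illustration: the function name `mirror_matches` and the sample rows (including the return match) are hypothetical, and the `Counter` objects stand in for AGGREGATE by ID1 ID2.

```python
from collections import Counter


def mirror_matches(matches):
    """Return each match twice, once from each team's point of view.
    HOME is 1 when ID1 played at home and 2 when ID1 played away."""
    rows = []
    for home, away in matches:
        rows.append((home, away, 1))
        rows.append((away, home, 2))
    return rows


matches = [
    ("Bayern München", "Hertha BSC"),
    ("Augsburg", "Hamburger SV"),
    ("Hertha BSC", "Bayern München"),  # hypothetical return match
]
rows = mirror_matches(matches)

# Stand-in for AGGREGATE by ID1 ID2: number of meetings per (ID1, ID2).
meetings = Counter((id1, id2) for id1, id2, _ in rows)
# The same count restricted to games where ID1 was the home team.
home_games = Counter((id1, id2) for id1, id2, home in rows if home == 1)
```

Because every match appears under both orderings, looking up a pair in `meetings` gives the same total whichever team is named first, while `home_games` still keeps the venue information.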
Administrator
|
In reply to this post by drfg2008
".....But how to create such a Matrix?"
-----------
GET DATA /TYPE=XLS /FILE='C:\Users\David Marso\Desktop\Football_example.xls'.
AGGREGATE OUTFILE * / BREAK team1 team2 / N=N.
AUTORECODE team1 team2 /INTO t_id.1 t_id.2 /GROUP.
PRESERVE.
SET MXLOOPS=100000.
MATRIX.
GET DATA / FILE * / VAR n t_id.1 t_id.2 .
COMPUTE SIZEDAT=RMAX(CMAX(DATA(:,2:3))).
COMPUTE PAIRED=MAKE(SIZEDAT,SIZEDAT,0).
LOOP #=1 TO NROW(DATA).
+ COMPUTE PAIRS(DATA(#,2),DATA(#,3))=PAIRS(DATA(#,2),DATA(#,3))+DATA(#,1).
END LOOP.
COMPUTE PAIRS=PAIRS+T(PAIRS).
SAVE PAIRS / OUTFILE *.
END MATRIX.
RESTORE.
|
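For comparison, the same all-teams-by-all-teams count matrix can be sketched in plain Python. This is not a translation of the MATRIX job line by line, only an illustration of the idea: one shared numeric code per team (as AUTORECODE with /GROUP provides), then one count added in both cells of each pair, which matches the effect of PAIRS + T(PAIRS). The function name `pair_matrix` and the sample data are hypothetical.

```python
def pair_matrix(matches):
    """Build a symmetric teams-by-teams matrix of match counts."""
    # One shared 0-based code per team name, assigned in sorted order.
    teams = sorted({team for match in matches for team in match})
    code = {team: i for i, team in enumerate(teams)}
    size = len(teams)
    pairs = [[0] * size for _ in range(size)]
    for home, away in matches:
        i, j = code[home], code[away]
        # Count the match in both cells so home/away no longer matters.
        pairs[i][j] += 1
        pairs[j][i] += 1
    return teams, pairs


matches = [
    ("Bayern München", "Hertha BSC"),
    ("Augsburg", "Hamburger SV"),
    ("Hertha BSC", "Bayern München"),  # hypothetical return match
]
teams, pairs = pair_matrix(matches)
bayern, hertha = teams.index("Bayern München"), teams.index("Hertha BSC")
times_met = pairs[bayern][hertha]
```

A dense list of lists is fine for a few hundred teams. For several thousand teams a dictionary keyed by team pair would avoid storing the many empty cells.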
Administrator
|
AGGREGATE OUTFILE * / BREAK team1 team2 / N=SUM(N).
But don't get too excited Frank:

N
               Frequency   Percent   Valid Percent   Cumulative Percent
Valid  1.00        16562      91.9            91.9                 91.9
       2.00         1368       7.6             7.6                 99.5
       3.00           92        .5              .5                100.0
       4.00            5        .0              .0                100.0
       5.00            1        .0              .0                100.0
       6.00            1        .0              .0                100.0
       Total       18029     100.0           100.0

The resulting matrix is so sparse as to be virtually useless for any sort of analytical purpose. With the 5000 teams you have 25,000,000 entries.
|
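The sparsity argument above can be checked with a little arithmetic on the frequency table. The sketch below takes the table values as posted and the round figure of 5000 teams; the variable names are illustrative. The bound of two cells per pair is an assumption that holds whether the 18029 rows are ordered or unordered pairs, since each row fills at most its own cell and the mirrored one.

```python
# Frequency table from the post: N (matches per team1/team2 pair) -> count.
frequency = {1: 16562, 2: 1368, 3: 92, 4: 5, 5: 1, 6: 1}
n_teams = 5000  # the round figure used in the post

distinct_pairs = sum(frequency.values())
total_matches = sum(n * count for n, count in frequency.items())
cells = n_teams ** 2
# After PAIRS + T(PAIRS) each pair fills at most two cells.
max_nonzero = 2 * distinct_pairs
density = max_nonzero / cells

print(f"{distinct_pairs} pairs, {total_matches} matches, "
      f"at most {density:.2%} of {cells:,} cells non-zero")
```

Under these figures fewer than two cells in a thousand can be non-zero, which is the sense in which the full matrix is described as virtually useless.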
Administrator
|
In reply to this post by David Marso
Even simpler after a little bit of thought.
POST AGGREGATE/AUTORECODE...
--------------------------------------
MATRIX.
- GET DATA / FILE * / VAR n t_id.1 t_id.2 .
- COMPUTE PAIRED=MAKE(MMAX(DATA),MMAX(DATA),0).
- LOOP #=1 TO NROW(DATA).
+ COMPUTE PAIRS(DATA(#,2),DATA(#,3))= DATA(#,1).
- END LOOP.
- COMPUTE PAIRS=PAIRS+T(PAIRS).
- SAVE PAIRS / OUTFILE *.
END MATRIX.
RESTORE.
|
Thanks Jerabek Jindrich
Thanks David

Error Message:

Run MATRIX procedure:

>Error encountered in source line # 61
>Error # 12492
>An attempt has been made to use previously undefined matrix (or scalar).
>Execution of this command stops.
Matrix - 'PAIRS' is undefined

>Error encountered in source line # 61
>Error # 12331
>Left hand side is undefined for subscripted COMPUTE.

>Error encountered in source line # 63
>Error # 12492
>An attempt has been made to use previously undefined matrix (or scalar).
>Execution of this command stops.
Matrix - 'PAIRS' is undefined

>Error encountered in source line # 63
>Error # 12359
>Undefined operand for transpose.

>Error encountered in source line # 64
>Error # 12492
>An attempt has been made to use previously undefined matrix (or scalar).
>Execution of this command stops.
Matrix - 'PAIRS' is undefined

>Error encountered in source line # 64
>Error # 12459
>The matrix to save is undefined.

------ END MATRIX -----
Dr. Frank Gaeth
|
Administrator
|
Change PAIRED to PAIRS.
* I changed name after I pasted ;-( This is something I should think you could have easily fixed by simply reading the code? ---