Correlation matrix is not positive definite

viswa21
Dear all,
I am new to SPSS. I am trying to run a principal component analysis through the Factor Analysis procedure. The output warns that the correlation matrix is "not positive definite" and that its determinant is 0.

Please take a look at the xlsx file. The data are fluorescence emission spectra of bacteria. The variables are wavelengths of excitation.
http://ifile.it/9lvg18d/3%20bacteria.xlsx


The data contains more variables than cases. I don't know whether this is causing the problem. I cannot see the KMO and Bartlett's test output even though I selected it under Descriptives.

Would reducing the number of variables solve this problem? Any help would be invaluable to me. Thanks a million!!

Visu

Re: Correlation matrix is not positive definite

David Marso
Administrator
The warning is *CERTAINLY* related to the fact that you have more variables than cases!!!
KMO is NOT defined for singular matrices: it requires the anti-image correlation coefficients, which in turn require the inverse of the correlation matrix (undefined if the matrix is singular).  These issues are NOT unique to SPSS software.  You may wish to review your factor analysis notes!
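A minimal sketch of the point above, in Python with NumPy rather than SPSS. The 28 x 171 shape and the random data are assumptions for illustration only, not the poster's spectra. With fewer cases than variables, the correlation matrix of the variables has rank of at most (cases - 1), so its determinant is numerically zero and no usable inverse exists for the anti-image (KMO) calculation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed shape for illustration: fewer cases (rows) than variables (columns).
n_cases, n_vars = 28, 171
X = rng.normal(size=(n_cases, n_vars))

# Correlation matrix of the variables (columns): 171 x 171.
R = np.corrcoef(X, rowvar=False)

rank = np.linalg.matrix_rank(R)          # cannot exceed n_cases - 1
eigvals = np.linalg.eigvalsh(R)          # sorted in ascending order
sign, logdet = np.linalg.slogdet(R)      # log of |det|, safe against underflow

print("shape:", R.shape)
print("rank:", rank)
print("smallest eigenvalue:", eigvals[0])
print("log|det R|:", logdet)
```

The smallest eigenvalues come out at rounding-error level and the log-determinant is hugely negative, which matches the "determinant is 0" message in the warning.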
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: Correlation matrix is not positive definite

Art Kendall
In reply to this post by viswa21
That is definitely the situation. Whenever you have more variables than cases, at least some variables are perfectly predictable from the others.

Please describe your data in more detail.
How did you choose these cases?
Each is ___________.
What are the groups of cases?
It looks like some kind of repeated measures data. The measures appear to be iterated by wavelength, similar to the way time series data is iterated by time. Is this correct?
The wavelengths run from 250 to 430 somethings.  Somethings measure________.
Are all wavelengths 1 something apart?
Are these ipsative/compositional data? I.e., do they sum to a constant?
What is the purpose of your research?  What is the phenomenon you are interested in?  What do you want the data to tell you?

With this information list members are more likely to be able to make suggestions about approaches and/or places to post your query.


Art Kendall
Social Research Consultants

On 11/16/2011 11:25 PM, viswa21 wrote:
The data contains more variables than cases. I don't know whether this is
causing the problem. I cannot see the KMO and Bartlett's test output even
though I selected it under Descriptives.


--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Correlation-matrix-is-not-positive-definite-tp4999980p4999980.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD


Re: Correlation matrix is not positive definite

viswa21
Hi Art,
The data were obtained using a fluoro-spectrometer.

Our research purpose is to distinguish different bacteria by their fluorescence emission spectra. The emission spectrum is continuous from 260 to 430 nm in 1 nm increments. The independent variable is wavelength and the dependent variable is the intensity of light emitted by the bacteria.

So there might be many variables whose values are similar to each other, and I think that is why the correlation matrix is not positive definite.

Now my question is how to choose the important variables. In other words, which variables should I keep in the data matrix so that it is reduced (fewer variables), becomes positive definite, and lets us carry out a PCA that makes sense?

Please advise me further; I need to resolve this issue ASAP.
Thanks
Regards
Visu

Re: Correlation matrix is not positive definite

Art Kendall
"Our research purpose is to distinguish different bacteria from their fluorescence emission spectra."
Then shouldn't kinds of bacteria be a variable in the model?  Is that your independent variable?

I am trying to get a handle on your design (model).
You have 1 dependent variable -- intensity of fluorescence emission.  It is repeatedly measured along a spectrum of wavelengths.  The repeats are at 1 nm intervals. The repeats are an independent (i.e., design) variable.
This is the same as a time series except that the increments are of wavelength rather than time.

Is there only one value for the stimulus used to excite the fluorescence?

Many uses of spectra (repeated measures across a spectrum like time or wavelength) are to find or compare profiles.  Is this perhaps what you want to do?

Are you trying to find out if different bacteria have different profiles?  Do you have pre-identified groups of bacteria and want to know how their profiles can be discriminated from each other?
Or are you trying to find sets of profiles that are maximally similar within the set and maximally different between the sets?

I hope we are getting closer to understanding your underlying question.

Art Kendall
Social Research Consultants




On 11/28/2011 2:06 AM, viswa21 wrote:
Our research purpose is to distinguish different bacteria by their
fluorescence emission spectra. The emission spectrum is continuous from
260 to 430 nm in 1 nm increments. The independent variable is wavelength and
the dependent variable is the intensity of light emitted by the bacteria.


Re: Correlation matrix is not positive definite

David Marso
Administrator
In reply to this post by viswa21
"that is why the correlation matrix is not positive definite. "
NO NO NO!!!!!!
It is NPD because you have FEWER CASES THAN VARIABLES!!!!!!!!
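A small sketch of the distinction being made here, using NumPy and made-up data (all sizes and noise levels are assumptions). Very similar variables measured on plenty of cases still give a positive definite correlation matrix, while completely unrelated variables measured on too few cases do not.

```python
import numpy as np

rng = np.random.default_rng(1)

def min_eigenvalue(X):
    """Smallest eigenvalue of the correlation matrix of the columns of X."""
    R = np.corrcoef(X, rowvar=False)
    return np.linalg.eigvalsh(R)[0]

# (a) 10 very similar variables on 200 cases: one shared signal plus small noise.
shared = rng.normal(size=(200, 1))
similar = shared + 0.1 * rng.normal(size=(200, 10))

# (b) 10 unrelated variables on only 5 cases.
few_cases = rng.normal(size=(5, 10))

min_similar = min_eigenvalue(similar)
min_few = min_eigenvalue(few_cases)

print("similar variables, 200 cases:", min_similar)  # small but clearly positive
print("unrelated variables, 5 cases:", min_few)      # zero up to rounding error
```

Case (a) is ill-conditioned but still invertible; case (b) is singular no matter how different the variables are.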


Re: Correlation matrix is not positive definite

viswa21
Hi Art,
I am trying to find sets of profiles that are maximally similar within the set and maximally different between the sets.

All the bacteria were excited at a single wavelength, 225 nm. Because we cannot differentiate the spectra visually, we are doing PCA to look for clusters of the different bacteria at different places on a plot of PC1 vs. PC2.

To be more precise, I took emission spectra for four different kinds of bacteria.

For each bacterium I took 7 scans (repetitions), with slight changes in the spectrum produced by changing a physical parameter such as growth time, so there were 7*4=28 spectra. I then applied the data reduction steps. After I click OK in the Factor Analysis dialog, the output reports that the correlation matrix is not positive definite.

One of my colleagues, who has some idea about this, tells me that I need to carry out an F-test or power test to determine the number of variables that represent the spectrum, rather than simply using fewer variables. Please advise me whether any solution like that exists.
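For readers following along, here is a hedged sketch of how PC1 vs. PC2 scores for the 28 cases can be obtained without inverting anything, by taking the singular value decomposition of the column-centred data matrix. This is NumPy, not the SPSS FACTOR procedure, it works on the covariance scale rather than the correlation scale, and the spectra are synthetic stand-ins (Gaussian bumps with assumed peak positions and noise).

```python
import numpy as np

rng = np.random.default_rng(2)

n_groups, n_scans = 4, 7
labels = np.repeat(np.arange(n_groups), n_scans)      # 28 group labels
wavelengths = np.arange(260, 431)                     # 260..430 nm in 1 nm steps

# Synthetic 28 x 171 intensity matrix; peak centres and widths are invented.
centers = np.array([320.0, 330.0, 340.0, 350.0])
X = np.exp(-0.5 * ((wavelengths - centers[labels][:, None]) / 25.0) ** 2)
X += 0.02 * rng.normal(size=X.shape)

# PCA via SVD: centre each wavelength column, then decompose.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

scores = U * s                        # component scores for the 28 cases
explained = s**2 / np.sum(s**2)       # share of variance per component

for g in range(n_groups):
    pc1, pc2 = scores[labels == g, :2].mean(axis=0)
    print(f"group {g}: mean PC1 = {pc1:.3f}, mean PC2 = {pc2:.3f}")
print("variance explained by PC1 and PC2:", explained[:2])
```

With 28 cases there can be at most 27 components with non-zero variance, whatever the number of wavelengths; that limit is a property of the data, not of the software.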

Thanks

Regards
visu

Re: Correlation matrix is not positive definite

Art Kendall
We are closing in on this.
Is this correct? You have 3 independent variables:
A: type of bacterium, between cases, 4 levels, a nominal-level variable
B: growth time, repeated within cases, 7 levels, a ???-level variable
C: wavelength, repeated within cases, a ratio-level variable at 171 equal intervals

and one dependent variable, intensity, a ratio-level variable.
That is 4788 cells with 1 measurement of intensity in each.
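The cell count can be checked with a short sketch. The factor names are just the A, B, C labels used above, and the wavelength grid assumes the 260 to 430 nm range in 1 nm steps mentioned earlier in the thread.

```python
import numpy as np

n_bacteria, n_growth = 4, 7
wavelengths = np.arange(260, 431)          # 260, 261, ..., 430 nm
n_wavelengths = len(wavelengths)           # 171

# One row per (bacterium, growth time, wavelength) cell: the "long" layout.
a, b, c = np.meshgrid(
    np.arange(n_bacteria), np.arange(n_growth), wavelengths, indexing="ij"
)
long_index = np.column_stack([a.ravel(), b.ravel(), c.ravel()])
n_cells = long_index.shape[0]

# The same data in the "wide" layout used for factor analysis: cases x variables.
wide_shape = (n_bacteria * n_growth, n_wavelengths)

print("wavelength levels:", n_wavelengths)   # 171
print("cells:", n_cells)                     # 4788
print("wide layout:", wide_shape)            # (28, 171)
```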

Is factor B repeats of a single parameter, i.e., are the levels ordered? At equal intervals?  Or is it 7 distinct settings of different parameters, i.e., a nominal-level variable?

Does your discipline use Q Factor Analysis?

Art Kendall
Social Research Consultants



On 11/28/2011 8:39 PM, viswa21 wrote:
For each bacterium I took 7 scans (repetitions), with slight changes in the
spectrum produced by changing a physical parameter such as growth time, so
there were 7*4=28 spectra. I then applied the data reduction steps. After I
click OK in the Factor Analysis dialog, the output reports that the
correlation matrix is not positive definite.


Re: Correlation matrix is not positive definite

viswa21
Hi Art,
               
Question 1:
Is this correct? You have 3 independent variables:
A: type of bacterium, between cases, 4 levels, a nominal-level variable
B: growth time, repeated within cases, 7 levels, a ???-level variable
C: wavelength, repeated within cases, a ratio-level variable at 171 equal intervals

and one dependent variable, intensity, a ratio-level variable.
That is 4788 cells with 1 measurement of intensity in each.

Answer 1: yes, it is correct.


Question 2: Is factor B repeats of a single parameter? i.e., are the levels ordered? At equal intervals?  7 distinct settings of different parameters, i.e., a nominal level variable?

Answer 2: Growth time is varied in minutes, but not of equal intervals. So it seems to be a nominal-level variable.

Our research field does not use Q factor analysis.
Leblanc_Monitoring_the_identity_of_bacteria_using_their_intrinsic_fluorescence.pdf

Please take a look at the PDF file attached.

Thanks for your help.

Re: Correlation matrix is not positive definite

Art Kendall
"The fluorescence spectra have been normalised by reducing the area under each spectrum to a value of 1 according to Bertrand and Scotter [11]."
This results in compositional data (similar to making the data ipsative) so  the R matrix is not positive definite for this reason alone.  However, this is a step in doing a Q matrix or dual scaling (aka correspondence analysis).
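A hedged illustration of the closure effect, in NumPy with invented data. It assumes equally spaced wavelengths, so that "area of 1" can be approximated by making each row sum to 1, and it deliberately uses many more cases than variables so that the case count cannot be blamed.

```python
import numpy as np

rng = np.random.default_rng(3)

# Plenty of cases relative to variables (assumed sizes, synthetic intensities).
n_cases, n_vars = 500, 6
raw = rng.uniform(1.0, 2.0, size=(n_cases, n_vars))

# Normalise each row (spectrum) so that its values sum to 1.
closed = raw / raw.sum(axis=1, keepdims=True)

def min_eig(X):
    """Smallest eigenvalue of the correlation matrix of the columns of X."""
    return np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[0]

min_raw = min_eig(raw)
min_closed = min_eig(closed)

print("row sums after normalising:", closed.sum(axis=1)[:3])
print("min eigenvalue, raw data:   ", min_raw)      # clearly positive
print("min eigenvalue, closed data:", min_closed)   # zero up to rounding error
```

Because every row sums to the same constant, any one column is an exact linear function of the others, which is enough on its own to make the correlation matrix singular.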

Did you do this?  Do you have access to that article?
The words standardize and normalize have many different meanings and are frequently used interchangeably, even within disciplines. Please click <help>; type "Proximities" in the edit box; scroll down to and click <standardize> under related topics.  Is one of these what is done in your field? If not, what is done?

"Answer 2: Growth time is varied in minutes, but not of equal intervals."
Then it is a repeated factor with unequal intervals. You may be able to use the actual intervals in a model if you have them. B is at least ordinal.

In that article given that cases are plotted against the PCs it is possible that the authors did Q factor analysis (using a PC kind of factor analysis).
In that instance the matrix is transposed and variables become cases and cases become variables. 
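A minimal sketch of that transposition, using NumPy and random placeholder data of the assumed 28 x 171 shape. It only shows the change in the size and rank of the correlation matrix; it is not a reconstruction of what the article's authors did.

```python
import numpy as np

rng = np.random.default_rng(4)

n_cases, n_wavelengths = 28, 171
X = rng.normal(size=(n_cases, n_wavelengths))   # placeholder data, not real spectra

# R-type: correlate the 171 wavelengths across the 28 cases.
R_r = np.corrcoef(X, rowvar=False)

# Q-type: transpose first, so the 28 cases are correlated across 171 wavelengths.
R_q = np.corrcoef(X.T, rowvar=False)

min_eig_r = np.linalg.eigvalsh(R_r)[0]
min_eig_q = np.linalg.eigvalsh(R_q)[0]

print("R-type matrix:", R_r.shape, "min eigenvalue:", min_eig_r)
print("Q-type matrix:", R_q.shape, "min eigenvalue:", min_eig_q)
```

After transposing there are 171 "observations" for 28 "variables", so the case-count problem disappears for the 28 x 28 matrix, although its components then describe similarity among cases rather than among wavelengths.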

It may also be possible that they just input the correlation matrix. If I recall correctly, when a correlation matrix is used as input N is not specified.
Even so, there would be 2 reasons for the matrix to be not positive definite.  One is that the data within a case sum to a constant; the other is that there are more variables than cases.

If I am reading things correctly, you have 3 independent variables:
A: type of bacterium, between cases, 4 levels, a nominal-level variable
B: growth time, repeated within cases, 7 levels, an ordinal-level variable with unequal intervals of (un)known size
C: wavelength, repeated within cases, a ratio-level variable at 171 equal intervals
and one dependent variable, intensity, a ratio-level variable.
That is 4788 cells with 1 measurement of intensity in each.
The degrees of freedom are enhanced by the repeated nature of your data.

With 1 measurement per cell the ABC interaction cannot be distinguished from error (i.e., it must be pooled with error).
If your goal is to find points along the spectrum at which the types of bacteria can be discriminated, then your focus would be on the AC interaction.
AB and BC could then be pooled with error.
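The degrees-of-freedom bookkeeping behind this can be sketched as follows. The sketch treats A, B and C as a fully crossed 4 x 7 x 171 layout purely for counting; that is a simplifying assumption, and the appropriate error terms in a repeated-measures (MIXED or GLM) model would need to be worked out separately.

```python
from itertools import combinations
from math import prod

# Number of levels per factor, as described above.
levels = {"A": 4, "B": 7, "C": 171}

# df for each main effect and interaction: product of (levels - 1).
df = {}
for k in range(1, len(levels) + 1):
    for term in combinations(levels, k):
        df["".join(term)] = prod(levels[f] - 1 for f in term)

n_cells = prod(levels.values())
total_df = n_cells - 1                 # one observation per cell
residual_df = total_df - sum(df.values())

for term, value in df.items():
    print(f"{term:>3}: {value}")
print("total df:", total_df)
print("df left for pure error:", residual_df)
```

With one measurement per cell the effects use up all 4787 degrees of freedom, leaving none for a separate error term, which is why the ABC interaction (3060 df) has to serve as error.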


When there is an up spike or down spike in the spectrum does it occur at one measurement value on C or does it occur on a few?
Do you expect that different bacteria will have different sets of spikes? Or do you expect something else?


Is it at all practical to generate another set of 4788 cells? Is this a tedious or highly automated process?

There are still some exploratory possibilities.  I do not intend to frustrate you. Suggestions can be more focused when the list has a little better handle on what the situation is. Perhaps list members more experienced in specifying MIXED or GLM models could make suggestions.

Art Kendall
Social Research Consultants

On 11/29/2011 10:16 PM, viswa21 wrote:
Our research field do not use Q factor analysis.
http://spssx-discussion.1045642.n5.nabble.com/file/n5034443/Leblanc_Monitoring_the_identity_of_bacteria_using_their_intrinsic_fluorescence.pdf
Leblanc_Monitoring_the_identity_of_bacteria_using_their_intrinsic_fluorescence.pdf
