Loading data from Excel and putting it into a matrix.

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Loading data from Excel and putting it into a matrix.

Sean McKenzie
Hi, my name is Sean McKenzie, and I am up here in Anchorage Alaska.

I am a new user.  My previous experience is mainly using GAUSS to program, a
PASCAL like language especially suited to matrix operations, and EVIEWS and
its precursor TSP, pretty much Econometrics stuff.  I am by background, and
Economist (booo, hiss).

But now I am working on Child Care issues (hip, hip, hurray!), and we
finally bought SPSS, which I actually recommended despite my lack of
knowledge, because I know that it is commonly used in the Social Sciences
arena outside of Economics.

Most of my experience with lists etal are mainly from playing wargames.  (So
if any of you are up here in Alaska send me an email at:
[hidden email] and may be we can play.)

Given my backround, I believe for me it will be easier to work with matrix
operations, with which I am essentially thoroughly familiar.

I am just starting, and a routine for me in learning a new program is first
to read in data and then spit it back out and confirm that the spit out data
is exactly the same as the spit in.

I did that easily enough.

What I want to do in general is load data from an excel spreadsheet and put
it into a matrix.

So far to me it looks like the easiest way to do this is to upload the data,
save it as an spss.sav matrix file and then upload it again.  I am looking
for a more efficient way.  To directly upload data from excel as a matrix or
to convert excel data to a matrix without first having to save/re-upload.

My usual second experiment is to upload data, pull out the field names and
then spit those out.  Something I could not figure out how to do without
using matrices.

One  of the desirable things is for me not to have to explicitly know the
names of my fields or their order prior to uploading data/running a program,
and not to need to know the row or column length of my matrices, as even
when using the "same data" these typically change, even from day to day.

Typically, thinking in GAUSS, the commands are something like

LOADMATRIX M=<file>
Names=M[1,.]
Write Names

The above is pulling out the field names in row 1 of my Excel SS.

The above is just a test, but in practical terms I would then assign
variable names using the first row of the matrix...since I have perhaps a
100 or coupla hundred variables in my database, I load a matrix of variable
names into the program and using do loops I compare the list of variable
names to the names I uploaded, so when

Variablenames[i,1]=Names[1,i]  and suppose the string there was GDP

GDP=M[2:Countrows(M),i]


In the end I will want to load data directly from databse, which is an
access database, but it is just a file on my directory, no password et al.

For now I'll settle for downloading data into excel and then taking it into
SPSS.  I can also import my data from text files.

I know that was long, but can any body please help?

As an aside, one of the things I find somewhat difficult to deal with is:

In GAUSS the = sign is an "ASSIGNMENT" operator and works the same for
MATRICES and Scalars.  In general in GAUSS there was no special need to
declare something to be a matrix, in general it treated everthing like a
matrix, and a scalar is a [1,1] matrix.

Is there something similar in SPSS, an assignment operator?

Is it possible to operate in matrix style/matrix notation without going
through the whole matrix procedure?
Reply | Threaded
Open this post in threaded view
|

Re: Loading data from Excel and putting it into a matrix.

Richard Ristow
At 01:03 PM 9/8/2006, Sean McKenzie wrote:

>I am by background, and Economis (booo, hiss).

Hey, it happens to the best of us.


>But now I am working on Child Care issues (hip, hip, hurray!), and we
>finally bought SPSS, because I know that it is commonly used in the
>Social Sciences arena outside of Economics.
>
>Given my backround, I believe for me it will be easier to work with
>matrix operations, with which I am essentially thoroughly familiar.

Maybe, and by all means do. But, depending on what you need to do, look
at the native statistical and reporting procedures in SPSS, too. They
were put there for this, after all. By all means post, with what
statistical operations you want to do, and somebody likely can make
suggestions.

>My usual second experiment is to upload data, pull out the field names
>and then spit those out.  Something I could not figure out how to do
>without using matrices.

Well, if you just want to display the variable names, command DISPLAY
VARIABLES should do it for you. If you want to access the variable
names as string values, it's harder, and not particularly natural.
You're right, that they're there in a matrix data file; you can read in
your data, take the correlation matrix with CORRELATIONS or some such,
and read the matrix data file as if it were a regular file. Of course,
that loses the names of string variables.

Better: You probably have SPSS 14, so you can access the data
dictionary through a Python program. Or, see my post "Data dictionary
in SPSS", Wed, 6 Sep 2006 13:30:05 -0400.

>What I want to do in general is load data from an excel spreadsheet
>and put it into a matrix. So far to me it looks like the easiest way
>to do this is to upload the data, save it as an spss.sav matrix file
>and then upload it again.

There's your problem. An SPSS matrix, as in BEGIN MATRIX/ END MATRIX,
has nothing to do with matrix-format datasets.

I assume you want a matrix whose rows are the cases, or records, in
your file, and whose columns are the variables, or fields. For that,
you want a regular-format SPSS file, including the working file. See
(you probably have seen) the GET statement in the matrix language - see
section MATRIX in the Command Syntax reference. Load your data from the
Excel spreadsheet, issue a MATRIX command, and get the data.

(I'd save it to an SPSS format file, too; I'd do that for any data read
from an outside source.)

>One  of the desirable things is for me not to have to explicitly know
>the names of my fields or their order prior to uploading data/running
>a program, and not to need to know the row or column length of my
>matrices, as even when using the "same data" these typically change,
>even from day to day.

Well, there's GET TRANSLATE with the /FIELDNAMES subcommand, which is
probably what you want; or GET DATA.

SPSS is very flexible about the "column length", by which I take it you
mean the number of cases or records. It is flexible about "row length",
by which I take it you mean number of variables, when reading using one
of the above methods; but you'll usually have to know the variable
names in the file in order to do anything else.

>I would then assign variable names using the first row of the
>matrix...

Does GET TRANSLATE ... /FIELDNAMES do this for you?

>since I have perhaps a 100 or coupla hundred variables in my database,
>I load a matrix of variable names into the program and using do loops
>I compare the list of variable names to the names I uploaded, so when

I think, here, you're getting far away from SPSS 'idiom'. SPSS isn't
very good at meta-processing, dealing with its own data dictionary. It
shouldn't be needed for what looks like something relatively simple:
loading data from a spreadsheet.

So, don't ask how to you mimic what you'd do in GAUSS; it's probably
somewhere between awkward and impossible. Help us understand what
you're trying to accomplish.

I know. One of the most frustrating times in any of our lives is
changing from one system to another with very different style and
idiom. It can be maddeningly difficult to even phrase the question, in
the new system's terms, how to do something that was obvious in the old
system.

BE PATIENT: It probably isn't that SPSS can't do it; it's probably that
you're having trouble thinking in SPSS's idiom, when you're used to
something very different.

>As an aside, one of the things I find somewhat difficult to deal with
>is:
>
>In GAUSS the = sign is an "ASSIGNMENT" operator and works the same for
>MATRICES and Scalars.  In general in GAUSS there was no special need
>to declare something to be a matrix, in general it treated everthing
>like a matrix, and a scalar is a [1,1] matrix. Is there something
>similar in SPSS, an assignment operator?

Yes; the '=' sign. But it has to be on a COMPUTE command. On the
COMPUTE command, you can do assignment pretty much as you'd expect.

>Is it possible to operate in matrix style/matrix notation without
>going through the whole matrix procedure?

If you mean, without a MATRIX/ END MATRIX structure, no, it isn't.
That's what MATRIX is for; and the 'idiom' in MATRIX is very different
from that in the rest of SPSS. That shouldn't be a problem, if you want
to work that way: read your data, issue MATRIX, and then the GET
statement of the MATRIX language.

BUT, here's where you're hitting a difference in paradigm and idiom. It
sounds like you're used to having your data in an explicit matrix, that
you manipulate with matrix commands. The SPSS MATRIX language is pretty
much like that.

But in idiomatic standard SPSS, you're not looking at your data as an
object. Think, not of operating on all the data at once, but on one row
at a time. (Drive you crazy? But you can, in fact, do most of what you
want that way. Its advantage is being able to handle MUCH larger
files.)

When you read about the transformation language, remember that
everything you're writing is the interior of a loop, the implicit loop
through the cases in the SPSS file.

Procedures (like CORRELATION or REGRESSION) accept this kind of file
and process it sequentially.

Now, keep asking. You'll be frustrated for a little while longer, until
you get the SPSS idiom. Quickly,
- In the MATRIX language, you can work pretty much the way you used to
in Gauss, though printed output, say, is probably less flexible
- To work in general SPSS, outside MATRIX, you will need to think in a
different idiom
- To just read your data from day to day, I'm not sure where your
difficulties lie, but it doesn't sound like it should be giving you as
much trouble as it seems to be. Remember: in SPSS, you don't assign
names to 'columns' (variables). SPSS is more like a database system
than like a matrix system: variables exist ONLY with names.

If I weren't so tired, I'd have made this shorter. I hope it's some
help. Do post again.

Good luck,
Richard Ristow