Remove ID's with missing month end data

classic Classic list List threaded Threaded
15 messages Options
Reply | Threaded
Open this post in threaded view
|

Remove ID's with missing month end data

jimjohn
can someone plz help me with this:

one of my variables is ID, which is a unique id for an investment taken out by a customer. Each ID will have many different cases associated with it, for each month-end information on that customer's investment. I have another variable "Age" that is the age of the investment in days up to a maximum of 365. In some cases, an ID has some missing cases (ie a missing case for one particular month's end data). If I sort my cases by ID and then by Age, I should not see a gap of more than 30 from one age to the next for a particular ID. If I do, that means there is a missed month. What I want to do is I want to remove all cases associated with an ID that has any missing months. Does someone have an idea how I can go about this in SPSS, my data file is very big and doing it manually by sorting and looking at each ID will take way too long. Thanks!
Reply | Threaded
Open this post in threaded view
|

Re: Remove ID's with missing month end data

Maguin, Eugene
JimJohn,

After sorting the file, I'd use the lag function to calculate the intervals
between successive records on the Age variable. Call this new variable
AgeDiff. Then use the aggregate function with the add variables option and
the max function on AgeDiff. Then you can select those cases whose value of
the aggregated AgeDiff variable is less than whatever your selection
criteria is. This is the outline. Everything else that you need is in the
documentation.

Gene Maguin

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Remove ID's with missing month end data

jimjohn
Thanks a lot Gene! The only problem with this is that when I go from one ID to the next, my AgeDiff will be very high since the new ID will start at the first month. For example, if I sort by data by ID and Age, when my data goes from one ID to the next, the AgeDiff will be high. However, I only want AgeDiff to be high when within the same ID, the difference is high. Any ideas how I can work around this? Thanks!




Gene Maguin wrote
JimJohn,

After sorting the file, I'd use the lag function to calculate the intervals
between successive records on the Age variable. Call this new variable
AgeDiff. Then use the aggregate function with the add variables option and
the max function on AgeDiff. Then you can select those cases whose value of
the aggregated AgeDiff variable is less than whatever your selection
criteria is. This is the outline. Everything else that you need is in the
documentation.

Gene Maguin

=====================
To manage your subscription to SPSSX-L, send a message to
LISTSERV@LISTSERV.UGA.EDU (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Remove ID's with missing month end data

Maguin, Eugene
JimJohn,

Yes, that's right. The difference is only valid when it is computed on cases
having the same id. So, how to fix that up? Suppose you embedded the lag
computation in a Do if block that was executed only if the current record
and the immediately preceding record had the same id value. This maneuver is
pretty common so I'm sure you have seen it used in previous postings

Gene Maguin

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Remove ID's with missing month end data

jimjohn
Thanks. I hadn't heard of a Do if block before, but I will look it up in previous postings. I'm still confused how I would compare the current ID to the previuos ID though. I tried to lag the ID but since it is a string, the lag function doesn't work. Is there any way to lag a string variable? Then, I could compare the ID with the lagged ID and that would tell me if the current ID equals the previous.


Gene Maguin wrote
JimJohn,

Yes, that's right. The difference is only valid when it is computed on cases
having the same id. So, how to fix that up? Suppose you embedded the lag
computation in a Do if block that was executed only if the current record
and the immediately preceding record had the same id value. This maneuver is
pretty common so I'm sure you have seen it used in previous postings

Gene Maguin

=====================
To manage your subscription to SPSSX-L, send a message to
LISTSERV@LISTSERV.UGA.EDU (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Remove ID's with missing month end data

Maguin, Eugene
JimJohn,

Ok, so id is a string variable. I didn't anticipate that. But, wait a
minute. I thought you said you had previously gotten the lag function to
work. Are you now saying that it didn't previously work or that it did and
you tried to incorporate the Do if command and then had trouble? And, how
long is that string variable? A10? A1000?

It might be good to post back to the listserv the syntax you used that
worked and the syntax that didn't work and the exact text of the warning or
error message, if there was one.

Gene Maguin

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Remove ID's with missing month end data

jimjohn
Hi Gene. The id is a string variable. I exported it into SPSS as a string (with 19 characters), because if I exported it as a number, the leading zero's would not be included. I had previously gotten the lag function to work, but this was on the Age variable, which is a number. This is the message I get when I try to lag the ID variable:
"Invalid combination of data types in an assignment. Char and strings may only be assigned to string variables. Numerical and logical quantities may only be assigned to numeric variables. Consider using the STRING or NUMBER function."
So I guess theres no way to lag a string variable then without converting it to a number? Because I was going to lag the ID and then compare the difference between this ID's age and last ID's age as long as the two ID's are equal.
Thanks!



Gene Maguin wrote
JimJohn,

Ok, so id is a string variable. I didn't anticipate that. But, wait a
minute. I thought you said you had previously gotten the lag function to
work. Are you now saying that it didn't previously work or that it did and
you tried to incorporate the Do if command and then had trouble? And, how
long is that string variable? A10? A1000?

It might be good to post back to the listserv the syntax you used that
worked and the syntax that didn't work and the exact text of the warning or
error message, if there was one.

Gene Maguin

=====================
To manage your subscription to SPSSX-L, send a message to
LISTSERV@LISTSERV.UGA.EDU (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Remove ID's with missing month end data

ViAnn Beadle
This is no problem lagging string variables. Try this syntax.

data list list / name (a20).
begin data
jim
john
jim
jim
end data.
if (lag(name) eq name) newvar = 1.
list  variables newvar.

There is a problem with trying to compare a number to a string in a logical
expression or trying to assign a string value to a number. You are not
giving us the whole story here.

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
jimjohn
Sent: Wednesday, July 09, 2008 12:46 PM
To: [hidden email]
Subject: Re: Remove ID's with missing month end data

Hi Gene. The id is a string variable. I exported it into SPSS as a string
(with 19 characters), because if I exported it as a number, the leading
zero's would not be included. I had previously gotten the lag function to
work, but this was on the Age variable, which is a number. This is the
message I get when I try to lag the ID variable:
"Invalid combination of data types in an assignment. Char and strings may
only be assigned to string variables. Numerical and logical quantities may
only be assigned to numeric variables. Consider using the STRING or NUMBER
function."
So I guess theres no way to lag a string variable then without converting it
to a number? Because I was going to lag the ID and then compare the
difference between this ID's age and last ID's age as long as the two ID's
are equal.
Thanks!




Gene Maguin wrote:

>
> JimJohn,
>
> Ok, so id is a string variable. I didn't anticipate that. But, wait a
> minute. I thought you said you had previously gotten the lag function to
> work. Are you now saying that it didn't previously work or that it did and
> you tried to incorporate the Do if command and then had trouble? And, how
> long is that string variable? A10? A1000?
>
> It might be good to post back to the listserv the syntax you used that
> worked and the syntax that didn't work and the exact text of the warning
> or
> error message, if there was one.
>
> Gene Maguin
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email] (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD
>
>

--
View this message in context:
http://www.nabble.com/Remove-ID%27s-with-missing-month-end-data-tp18362499p1
8368210.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Remove ID's with missing month end data

Maguin, Eugene
In reply to this post by jimjohn
JimJohn,

Let me first acknowledge that I've never tried what I'm going to suggest.
So, I may learn something. Although I don't know for sure, I think there are
problems with your syntax but I'm not sure where because I need to see your
syntax to examine it. Anyway, I would try this first.

Do if (id eq lag(id)).
+  compute agediff=age-lag(age).
End if.

According to my reading of the 15 reference, long strings, of which your id
variable is an example, should work in a lag function. I assume that a long
string will also work in an equality test but I may be wrong. If it fails, I
want to know the error message text.

Gene Maguin

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Remove ID's with missing month end data

jimjohn
Thanks Gene. I tried your syntax. It ran without any error, but afterwards, I don't notice any change. I already had a variable called AgeDiff and my variable hasn't changed. It still takes the difference when I go from one ID to the next. And there isn't any new variable either.

Also, when I had mentioned that error message before that I received when trying to lag a string, I was using the commands instead of the syntax. I went to Transform -> Compute Variable. Then I created a new variable name and in the = section, I typed in lag(ID, 1). It just wouldn't run adn just gave me that error message.

Any ideas? Viann, thanks, I will try your syntax too just now.





Gene Maguin wrote
JimJohn,

Let me first acknowledge that I've never tried what I'm going to suggest.
So, I may learn something. Although I don't know for sure, I think there are
problems with your syntax but I'm not sure where because I need to see your
syntax to examine it. Anyway, I would try this first.

Do if (id eq lag(id)).
+  compute agediff=age-lag(age).
End if.

According to my reading of the 15 reference, long strings, of which your id
variable is an example, should work in a lag function. I assume that a long
string will also work in an equality test but I may be wrong. If it fails, I
want to know the error message text.

Gene Maguin

=====================
To manage your subscription to SPSSX-L, send a message to
LISTSERV@LISTSERV.UGA.EDU (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Remove ID's with missing month end data

jimjohn
In reply to this post by Maguin, Eugene
ok i just tried "compute laggegd = lag(ID).
and here is the error I got:

compute lagged = LAG(ID).

>Error # 4309 in column 1024.  Text: (End of Command)
>Invalid combination of data types in an assignment.  Character strings may
>only be assigned to string variables.  Numeric and logical quantities may only
>be assigned to numeric variables.  Consider using the STRING or NUMBER
>function.
>This command not executed.
EXECUTE.

thanks!
Gene Maguin wrote
JimJohn,

Let me first acknowledge that I've never tried what I'm going to suggest.
So, I may learn something. Although I don't know for sure, I think there are
problems with your syntax but I'm not sure where because I need to see your
syntax to examine it. Anyway, I would try this first.

Do if (id eq lag(id)).
+  compute agediff=age-lag(age).
End if.

According to my reading of the 15 reference, long strings, of which your id
variable is an example, should work in a lag function. I assume that a long
string will also work in an equality test but I may be wrong. If it fails, I
want to know the error message text.

Gene Maguin

=====================
To manage your subscription to SPSSX-L, send a message to
LISTSERV@LISTSERV.UGA.EDU (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Remove ID's with missing month end data

Maguin, Eugene
JimJohn,

Wait a minute. I don't quite believe what you are telling me. Do this.

In my reply back to you I said to run this:

Do if (id eq lag(id)).
+  compute agediff=age-lag(age).
End if.

You did and said it ran without error. I want you to run that bit of code
again BUT before you do, delete Agediff. It should run again without error
and what you should see is that the first record in every set of ids has a
missing value.

Gene Maguin

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Remove ID's with missing month end data

jimjohn
Gene, I did what you said and the command does work fine now. I first closed and reopened my data file and then I deleted agediff. Thanks! I'm thinking it should have still worked with agediff already being there? I have a huge data file (7 million cases), maybe SPSS has problems with such large files?


Gene Maguin wrote
JimJohn,

Wait a minute. I don't quite believe what you are telling me. Do this.

In my reply back to you I said to run this:

Do if (id eq lag(id)).
+  compute agediff=age-lag(age).
End if.

You did and said it ran without error. I want you to run that bit of code
again BUT before you do, delete Agediff. It should run again without error
and what you should see is that the first record in every set of ids has a
missing value.

Gene Maguin

=====================
To manage your subscription to SPSSX-L, send a message to
LISTSERV@LISTSERV.UGA.EDU (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Remove ID's with missing month end data

jimjohn
In reply to this post by Maguin, Eugene
Also, I tried first creating a new variable called "tester". I think the problem may be that SPSS initially defaults a new variable to a number. So when I created it and then changed it to String in the Variables section before running my command, then it worked fine.


Gene Maguin wrote
JimJohn,

Wait a minute. I don't quite believe what you are telling me. Do this.

In my reply back to you I said to run this:

Do if (id eq lag(id)).
+  compute agediff=age-lag(age).
End if.

You did and said it ran without error. I want you to run that bit of code
again BUT before you do, delete Agediff. It should run again without error
and what you should see is that the first record in every set of ids has a
missing value.

Gene Maguin

=====================
To manage your subscription to SPSSX-L, send a message to
LISTSERV@LISTSERV.UGA.EDU (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Remove ID's with missing month end data

Richard Ristow
In reply to this post by jimjohn
Oh, dear.

At 04:08 PM 7/9/2008, jimjohn wrote:

>ok i just tried "compute laggegd = lag(ID). and here is the error I got:
>
>compute lagged = LAG(ID).
>
>>Error # 4309 in column 1024.  Text: (End of Command)
>>Invalid combination of data types in an assignment.

Right. And back Wed, 9 Jul 2008 (10:55:10 -0700) you asked, "Is there
any way to lag a string variable?", i.e. your ID.

And yes, Lag works fine with string variables. I think both Gene and
ViAnn Beadle have told you that.

The problem is that variable 'lagged' is *NOT* a string variable; so,
you get the "Invalid combination of data types."  A new variable is a
string variable only if it's declared on a "STRING" statement.

I don't see your stating how long 'ID' is. If it's 11 characters,
this will work:

STRING  lagged(A11).
compute lagged = LAG(ID).

And none of this has anything to do with file size.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD