Model recommendation


Model recommendation

mawais31
Hi,

I have 3 years of data...

I want to quantify factors affecting Sales of business and sales unfortunately depends on accidents :(, my response variable will be

Log_sales = mileage(class variable) + temperature + precipitation + age_of_car(class_variable) + weekday + time etc

my data is on daily basis, i.e. as shown below

<Dataset>
accident_date  mileage  vagntyp  companycar  age_of_car  modelyear  partsales
2010-01-01     2        285      0           2           2006       590
2010-01-01     2        295      0           3           2001       5672
2010-01-01     1        645      0           4           1998       6074
2010-01-01     1        184      0           3           2000       6418


log_partsales  freq  log_partsales  type_damage  PRR  TTN
2,770852012    4781  3,679518744    M            0,7  -5,3
3,753736222    740   2,86923172     M            0,7  -5,6
3,783474788    1077  3,032215703    M            0,7  -5,6
3,807399713    1339  3,126780577    S            0,7  -5,6

Type_damage: S = single, M = multiple, D = animal accident
TTN = temperature
PRR = precipitation

</Dataset>
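
As a quick sanity check on the sample rows, the first log_partsales column looks like the base-10 logarithm of partsales, written with a decimal comma. A minimal sketch, assuming that reading; the helper names are made up for illustration:

```python
import math

def parse_decimal_comma(text):
    """Convert a number written with a decimal comma (e.g. '2,77') to float."""
    return float(text.replace(",", "."))

# (partsales, first log_partsales value) pairs copied from the four sample rows
rows = [
    (590, "2,770852012"),
    (5672, "3,753736222"),
    (6074, "3,783474788"),
    (6418, "3,807399713"),
]

def max_log10_gap(pairs):
    """Largest absolute gap between log10(partsales) and the listed value."""
    return max(abs(math.log10(sales) - parse_decimal_comma(logged))
               for sales, logged in pairs)

# A gap near zero supports the base-10 reading of the column.
print(max_log10_gap(rows))
```

Any program that reads this file would need the same decimal-comma conversion before the values can be used as numbers.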

I need a model recommendation for daily dataset, or shall I modify my dataset? as you will see if I plot the sales / time each day have multiple accidents i.e. multiple points on each day.

Please mention if I shall provide more detail

Thanks in Advance

Regards
Awais

Re: Model recommendation

Maguin, Eugene
Your message is confusing. In part, it is a language problem. I understand your equation (thank you for stating that equation!!) and that you have 3 years of data. I don't know exactly what you mean by 'my data is on daily basis'. Your model specifies 'Log_sales' as the dependent variable, but you don't have that variable in your data. You do have two variables with the same name, 'log_partsales'. I also don't understand your statement of the purpose of your analysis: 'I want to quantify factors affecting Sales of business and sales unfortunately depends on accidents :(, my response variable will be'. I don't understand 'Sales of business'. I think this is a language problem but I don't know what it means. I don't understand 'sales unfortunately depends on accidents'. And then you say, 'I need a model recommendation for daily dataset, or shall I modify my dataset? as you will see if I plot the sales / time each day have multiple accidents i.e. multiple points on each day.' You see this as a problem; I don't know what it means.

Tell us about the data. Where does it come from? What does 'daily' mean? It seems you can use the data in its present form. Why not?

Gene Maguin



-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of mawais31
Sent: Tuesday, October 30, 2012 4:10 AM
To: [hidden email]
Subject: Model recommendation

Re: Model recommendation

mawais31
Thanks Maguin, Eugene!

Thanks for your reply. Regarding the accident data: the sales depend heavily on accidents, i.e. the body and paint amount. When accidents decrease, fewer body parts for cars are sold. There were fewer accidents in 2011 than in 2010, and we don't know why there was a decrease, i.e.
1. whether people drove less (mileage)
2. whether 2010 had a harsher winter than 2011
3. the introduction of city safety cars
4. fewer cars sold in 2011

The column log_partsales in the data in my previous post is actually the response variable.

Problem:
1. Contracts data: i.e. how many cars are sold, grouped together using an SQL query, i.e.
SELECT mileage, companycar, modeltype, modelyear, SUM(contracts) AS CarSold FROM contracts2010 GROUP BY mileage, companycar, modeltype, modelyear
(here companycar is the company car / private car flag)

2. Accidents data: we have daily accidents, say 16 accidents (records) for 01-01-2010, and so on for 2011 and 2009.

The accidents data table gives car and user information, i.e. which mileage class the car belongs to (1, 2, 3, 4, 5; 5 means the car is driven more than 25000 kilometers/year), whether the car was a company car or a private car, modeltype, modelyear, amount spent on part sales, age of the car driver, and sex of the car driver.
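
To make the grouping in item 1 concrete, here is a minimal sketch that runs the same kind of GROUP BY query against an in-memory SQLite table. The rows are made up for illustration, and the column types and the single companycar flag are assumptions, not the real contracts schema:

```python
import sqlite3

# Hypothetical contract rows: (mileage, companycar, modeltype, modelyear, contracts)
sample_contracts = [
    (5, 1, "xc60", 2011, 40),
    (5, 1, "xc60", 2011, 20),
    (2, 0, "xc60", 2011, 15),
]

def grouped_contracts(rows):
    """Sum contracts per (mileage, companycar, modeltype, modelyear) group."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE contracts2010 "
        "(mileage INTEGER, companycar INTEGER, modeltype TEXT, "
        "modelyear INTEGER, contracts INTEGER)"
    )
    con.executemany("INSERT INTO contracts2010 VALUES (?, ?, ?, ?, ?)", rows)
    result = con.execute(
        "SELECT mileage, companycar, modeltype, modelyear, "
        "SUM(contracts) AS CarSold "
        "FROM contracts2010 "
        "GROUP BY mileage, companycar, modeltype, modelyear "
        "ORDER BY mileage"
    ).fetchall()
    con.close()
    return result

print(grouped_contracts(sample_contracts))
```

Each output row is one group of cars, so individual cars can no longer be told apart after this step.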

Problem: if I plot the sales data on a daily basis, then because I have multiple accidents on each day, the plot (model) is scattered through the whole year and doesn't fit well.

How shall I organize my data so that I can quantify the variables?
What model or regression technique shall I use to find the effect of the above 4 variables?

Regards
Awais khan

Re: Model recommendation

Maguin, Eugene
Awais,

Let's start over and go through this very slowly because there's things I just don't understand. I went back and looked at your original post and you said:

Log_sales = mileage_category + temperature + precipitation + age_of_car_category + weekday + time etc

(I'm using 'category' rather than 'class variable' because I assume the two words have the same meaning but category will be more familiar to people on this list.)
First, your dependent variable is the log of sales. Sales of what? Where does the data for this variable come from? What is the time period for one log_sales value? Is it one day or something else? If something else, what is the time period?

Now, let's talk about your predictor variables. Where do these data come from? Do they come from the same dataset as the log_sales data?

In your original posting, you refer to 'accidents', I assume you mean car accidents/car crashes/car wrecks. Is this true?

In your original posting you said: 'my data is on daily basis'. Does that mean that you have all variables on a daily basis, every day, 365 days a year? How many days of data do you have? However, in another place, you say '... each day have multiple accidents i.e. multiple points on each day.' These two statements conflict with each other because it seems that one dataset, the log_sales, might be the sales each day and the predictors are the records of accidents, and on some days there are 10 accidents and on other days there are 34 accidents.

Awais, Please give an answer to every question (even if you're sure you've explained it before).

Lastly, I want to understand the 'big picture' of your project, so say what you are trying to do in one short sentence. Something like, 'I am trying to predict the (pick one: hourly, daily, weekly, monthly) sales of ???? by using data on ?????'.

Gene Maguin





-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of mawais31
Sent: Thursday, November 01, 2012 4:27 AM
To: [hidden email]
Subject: Re: Model recommendation

Re: Model recommendation

David Marso
Administrator
Good luck with that Gene!
Realize that this inquiry has been going off and on (mostly off) since July?
http://spssx-discussion.1045642.n5.nabble.com/Isolate-independent-variables-td5714101.html
---
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: Model recommendation

mawais31
This post was updated.
In reply to this post by Maguin, Eugene
I will answer your questions one by one!

[First, your dependent variable is the log of sales. Sales of what? Where does the data for this variable come from? What is the time period for one log_sales value? Is it one day or something else? If something else, what is the time period?]

(Sales of body and paint parts that are used to repair car damage. The data were taken from an insurance company. Regarding the time period: each day we have, say, 16 accidents, so 16 records for that day. We have 3 years of accident data, almost 36000 accidents per year, so 36000 records per year.)

[Now, let's talk about your predictor variables. Where do these data come from? Do they come from the same dataset as the log_sales data?]

(mileage_category and age_of_car_category come from the accident table itself; however, temperature, precipitation and contracts come from the temperature and contracts tables respectively.)

[In your original posting, you refer to 'accidents', I assume you mean car accidents/car crashes/car wrecks. Is this true?]

(Yes, exactly.)

[In your original posting you said: 'my data is on daily basis'. Does that mean that you have all variables on a daily basis, every day, 365 days a year? How many days of data do you have? However, in another place, you say '... each day have multiple accidents i.e. multiple points on each day.']

(As explained earlier, each record represents an accident, so I have multiple accidents per day, and sales represents how much was spent on parts for a particular car. I have multiple points, i.e. accidents, on each day.)

I want to mention more about the accident table.
In the accident table we have car information, i.e. model type, modelyear, first registration date. However, in the contracts table the cars are grouped together, i.e.

mileage (5), xc60, modelyear(11), companycar has = 60 contracts
The contracts data table is on a monthly basis, i.e. how many contracts we have for this particular car in a month.

Hence, if you want to combine the contracts with the accidents table, the accident table should also be grouped by the same variables, i.e.

accident1, mileage(5), companycar, damage(back), car involved in accident(3), sexowner, ageowner, warranty/insurance car, modelyear, model type

has

60Contracts....

Re: Model recommendation

mawais31
In reply to this post by David Marso
The data we were working on changed, so that's why I deleted the post.

Re: Model recommendation

Maguin, Eugene
In reply to this post by mawais31
Helpful. Thanks. But more questions.
Summary thus far.
From the insurance company database you have a record for each accident, with each record listing the accident date, repair costs, mileage_category, age_of_car_category, car model type, model year, first registration date, etc.  A question: Does a multi-car accident generate one record for each car involved in the accident or one for the sum of cars in the accident? If a multi-car accident generates one record for each car, can you identify all the cars that were involved in a specific accident?

Do you have one precipitation+temperature record for each day?

Now, the contracts database. I assume that 'contracts database' is the database of persons/companies who have an insurance contract with the insurance company. In theory, a person/company could have multiple contracts, with each contract listing one vehicle OR a person/company could have exactly one contract, with the contract listing all vehicles that the owner insures with the company. From what you said in your reply, I think that a person/company has one contract listing one or more vehicles. Is this true?

Is it true that the reason you are interested in the (insurance) contracts database is because you want to bring in variables about the owner (e.g., sex, age, etc) and, possibly, variables about terms of the insurance contract?

You say, '... contracts table we have cars grouped togather i.e., mileage (5), xc60, modelyear(11), companycar has = 60 contracts contracts datatable is monthly basis i.e. how many contracts we have per month.'

Explain 'monthly basis', the '(5)' following 'mileage', the '(11)'  following 'model year', 'company car has  = 60 contracts'.

You say, '... and hence If you want to combine the contracts with accidents table, the accident table should also be grouped with same variables i.e. ...'.

Based on my present understanding of what you want to do, I disagree. Here's why. You want to predict repair costs for a given car. Therefore your analysis file would have one record for each car damaged. Each record consists of the repair costs, information about the accident itself (e.g., single car, multi-car) and the environmental conditions (e.g., temperature, time of day, precipitation),  information about the car, information about driver, and information about the owner's insurance contract. If you have 36,000 cars damaged in 27,000 accidents involving 36,000 drivers and 30,000 insurance contracts, your analysis database has 36,000 records.

Again, please give an answer to each question.

Thanks, Gene Maguin


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of mawais31
Sent: Wednesday, November 07, 2012 2:59 AM
To: [hidden email]
Subject: Re: Model recommendation

Re: Model recommendation

mawais31
[A question: Does a multi-car accident generate one record for each car involved in the accident or one for the sum of cars in the accident?]

Multi-car accidents generate multiple records, but they have an accident_type column = multiple; other types include single, animal accidents, etc.

[Do you have one precipitation+temperature record for each day?]

I have precipitation and temperature for each day.

[I think that a person/company has one contract listing one or more vehicles. Is this true?]

In the contracts table we don't have personal or company information; the data is grouped instead. Say, for 2009, month 1, we have
modelyear, model, company/privatecar, mileage, contracts
2000, modeltype, 0 = companycar, 5, 30 contracts

[Is it true that the reason you are interested in the (insurance) contracts database is because you want to bring in variables about the owner (e.g., sex, age, etc) and, possibly, variables about terms of the insurance contract?]

I want to see why the number of accidents decreased in 2011: is it because of a decrease in contracts, or because new cars with better technology, i.e. city safety cars, were introduced?
The accident table only gives us the cars involved in accidents and tells us nothing about the cars that were not involved in accidents.
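
The point about contracts can be illustrated with a small rate calculation: dividing accidents by the number of contracts separates "fewer insured cars" from "fewer accidents per insured car". A minimal sketch with made-up yearly totals (the contract counts are invented, and the function name is hypothetical):

```python
def accidents_per_1000_contracts(accidents, contracts):
    """Accident rate scaled per 1000 insured cars (contracts)."""
    if contracts <= 0:
        raise ValueError("contracts must be positive")
    return 1000.0 * accidents / contracts

# Made-up yearly totals, for illustration only (not from the real data)
totals = {
    2010: {"accidents": 36000, "contracts": 400000},
    2011: {"accidents": 33000, "contracts": 366000},
}

rates = {year: accidents_per_1000_contracts(t["accidents"], t["contracts"])
         for year, t in totals.items()}
for year, rate in sorted(rates.items()):
    print(year, round(rate, 1))
```

In this invented example the raw accident count drops in 2011 while the rate per contract does not, which is the kind of pattern the contracts table would let us check.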

[Explain 'monthly basis', the '(5)' following 'mileage', the '(11)' following 'model year', 'company car has = 60 contracts'.]

It means how many contracts there were in January 2009 for company cars with a particular mileage class (5) and modelyear 2011.

[Based on my present understanding of what you want to do, I disagree. Here's why. You want to predict repair costs for a given car. Therefore your analysis file would have one record for each car damaged. Each record consists of the repair costs, information about the accident itself (e.g., single car, multi-car) and the environmental conditions (e.g., temperature, time of day, precipitation), information about the car, information about driver, and information about the owner's insurance contract. If you have 36,000 cars damaged in 27,000 accidents involving 36,000 drivers and 30,000 insurance contracts, your analysis database has 36,000 records.]

True, so you mean that I shall not group the data on a daily basis, i.e.

how many cars were involved in accidents per day for mileage(5), whether the car was a company/private car, modelyear, modeltype, total sales, contracts, i.e. 60

Regards
Awais Khan


Re: Model recommendation

Maguin, Eugene
How you structure your data set depends on what you want to know. I'll try to illustrate. I think this might be a basic dataset but I probably left some things out.
Year, day, accidentID, <accident environment variables, e.g., time, location, temperature, rainfall, etc>, carID, <car descriptor variables, make, model, year, mileage, etc>, driverID, <age, sex, etc>, insurance_policyID, <policy related variables>, repair_orderid, <repairshopid, total repair costs, parts, labor>

Given this dataset, which, you say, has about 36,000 records per year, representing 36,000 cars/vehicles, you could analyze several different questions. I understand you to be interested in predicting vehicle repair costs and for this I'd use this dataset as it is, all 36,000 records per year. This would be an ordinary multiple regression analysis.
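
As a rough illustration of what the car-level multiple regression computes, here is a minimal pure-Python sketch that fits log repair cost on temperature and a dummy for multi-car damage. The six records are made up, the solver is a plain normal-equations implementation written for this example, and in practice the SPSS regression procedures would do this work:

```python
def fit_ols(design, response):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(design[0])
    # Augmented matrix [X'X | X'y]
    aug = [
        [sum(row[i] * row[j] for row in design) for j in range(k)]
        + [sum(row[i] * y for row, y in zip(design, response))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot_row = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        if abs(pivot) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]

# Made-up car-level records: (temperature, damage type 'S' or 'M', log repair cost)
records = [
    (-5.0, "S", 3.25), (-5.0, "M", 3.75), (0.0, "S", 3.0),
    (0.0, "M", 3.5), (10.0, "S", 2.5), (10.0, "M", 3.0),
]

# Columns: intercept, temperature, dummy (1 = multi-car damage, 0 = single)
X = [[1.0, temp, 1.0 if kind == "M" else 0.0] for temp, kind, _ in records]
y = [cost for _, _, cost in records]
intercept, b_temp, b_multi = fit_ols(X, y)
print(round(intercept, 3), round(b_temp, 3), round(b_multi, 3))
```

A category with more than two levels, such as the mileage class, would get one dummy column per level except a reference level.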

You also could analyze average accident repair costs. Some accidents involve one car; others involve 5 or 6 cars. To do this the simplest method would be to use the aggregate command to average the repair cost and your predictor variables over the cars involved in the same accident. Now the dataset might have 25,000 records per year.  A more sophisticated method that uses the 36,000 record file would be to go to a multilevel model using the Mixed command. The model would have two levels: car and accident.
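
The accident-level aggregation can be sketched outside SPSS as a simple group-and-average step. This is only an illustration with made-up records and hypothetical field names; the AGGREGATE command would be the tool in SPSS itself:

```python
from collections import defaultdict
from statistics import mean

# Made-up car-level records: (accident_id, repair_cost, car_age_in_years)
car_records = [
    ("A1", 1000.0, 2),
    ("A1", 3000.0, 4),
    ("A2", 500.0, 3),
]

def aggregate_by_accident(records):
    """Average repair cost and car age over the cars in each accident."""
    groups = defaultdict(list)
    for accident_id, cost, age in records:
        groups[accident_id].append((cost, age))
    return {
        accident_id: {
            "n_cars": len(cars),
            "mean_cost": mean(c for c, _ in cars),
            "mean_age": mean(a for _, a in cars),
        }
        for accident_id, cars in groups.items()
    }

accident_level = aggregate_by_accident(car_records)
print(accident_level)
```

Three car records become two accident records here, which mirrors the drop from 36,000 to roughly 25,000 records described above.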

You could analyze average daily car repair costs. Now, you aggregate over cars in an accident and accidents in a day. Now you have a 365 record dataset. Again, you could use a multilevel model, this time with three levels: Car, accident, day.
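
For the daily version, the aggregation runs in two stages: cars within an accident first, then accidents within a day. A minimal sketch with made-up records and hypothetical field names:

```python
from collections import defaultdict
from statistics import mean

# Made-up car-level records: (day, accident_id, repair_cost)
car_records = [
    ("2010-01-01", "A1", 1000.0),
    ("2010-01-01", "A1", 3000.0),
    ("2010-01-01", "A2", 500.0),
    ("2010-01-02", "A3", 800.0),
]

def daily_mean_cost(records):
    """Average cars within each accident, then accidents within each day."""
    per_accident = defaultdict(list)
    for day, accident_id, cost in records:
        per_accident[(day, accident_id)].append(cost)
    per_day = defaultdict(list)
    for (day, _), costs in per_accident.items():
        per_day[day].append(mean(costs))
    return {day: mean(accident_means) for day, accident_means in per_day.items()}

daily = daily_mean_cost(car_records)
print(daily)
```

The result has one record per day. Note that averaging accident means (1250 for the first day) is not the same as averaging all cars that day (1500), so the choice should follow from the question being asked.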

I get the impression that you are pulling data from an insurance company database that has multiple tables, maybe a vehicle table, an insurance contract table, an insured table, and a claims (car repair costs) table. And you have some information from other databases, a weather conditions database, for one, and maybe other databases as well. The insurance company database is relational, and when you build up your analysis datafile you have to build a 'flat' file so that if 10 cars were involved in a big wreck, the data common to the accident is repeated for each car involved in the accident.
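
A small sketch of building such a flat file from relational tables, using an in-memory SQLite database. The table and column names are hypothetical and the rows are sample values, so this only shows the shape of the join, not the insurance company's actual schema:

```python
import sqlite3

def build_flat_file():
    """Join made-up accident, car and weather tables into one row per car."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE accident (accident_id TEXT, accident_date TEXT, type_damage TEXT);
        CREATE TABLE damaged_car (car_id TEXT, accident_id TEXT, partsales REAL);
        CREATE TABLE weather (day TEXT, ttn REAL, prr REAL);
        INSERT INTO accident VALUES ('A1', '2010-01-01', 'M'), ('A2', '2010-01-01', 'S');
        INSERT INTO damaged_car VALUES ('C1', 'A1', 590), ('C2', 'A1', 5672), ('C3', 'A2', 6418);
        INSERT INTO weather VALUES ('2010-01-01', -5.3, 0.7);
        """
    )
    rows = con.execute(
        """
        SELECT c.car_id, a.accident_id, a.accident_date, a.type_damage,
               w.ttn, w.prr, c.partsales
        FROM damaged_car AS c
        JOIN accident AS a ON a.accident_id = c.accident_id
        JOIN weather AS w ON w.day = a.accident_date
        ORDER BY c.car_id
        """
    ).fetchall()
    con.close()
    return rows

flat = build_flat_file()
for row in flat:
    print(row)
```

The two cars in accident A1 each carry the same accident and weather values, which is the repetition a flat analysis file requires.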





-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of mawais31
Sent: Wednesday, November 07, 2012 5:15 PM
To: [hidden email]
Subject: Re: Model recommendation

/A question: Does a multi-car accident generate one record for each car involved in the accident or one for the sum of cars in the accident? /

Multi-car accidents generate multiple records but they have a accident_type column = multiple, other types includes single, animal accidents etc...

/(Do you have one precipitation+temperature record for each day?
/
I have precipitation, temperature for each day ...

/I think that a person/company has one contract listing one or more vehicles. Is this true? / In contract table we don't have personal or company information but rather grouped say In 2009 month 1 we have modelyear, model, company/privatecar, mileage, contracts
2000 , modeltype, 0=companycar, 5 have 30 contracts

/Is it true that the reason you are interested in the (insurance) contracts database is because you want to bring in variables about the owner (e.g., sex, age, etc) and, possibly, variables about terms of the insurance contract? / I want to see that why there is decrease in number of accidents in 2011, is it because the decrease in contracts, or is it because new cars with better technology i.e. city safety cars were introduced...
As because accident table will give us accidents with car involved and will not tell us any information about those cars which are not involved in accidents...

/Explain 'monthly basis', the '(5)' following 'mileage', the '(11)'
following 'model year', 'company car has  = 60 contracts'.
/
It means that how many cars with particular mileage, modelyear 2011, company car contracts were there in January Month 2009

/Based on my present understanding of what you want to do, I disagree.
Here's why. You want to predict repair costs for a given car. Therefore your analysis file would have one record for each car damaged. Each record consists of the repair costs, information about the accident itself (e.g., single car, multi-car) and the environmental conditions (e.g., temperature, time of day, precipitation), information about the car, information about driver, and information about the owner's insurance contract. If you have 36,000 cars damaged in 27,000 accidents involving 36,000 drivers and 30,000 insurance contracts, your analysis database has 36,000 records./

True, so you mean that I should not group the data on a daily basis, i.e.

How many cars were involved in accidents per day, by mileage (5), whether the car was companycar/private, modelyear, modeltype, total sales, contracts (i.e. 60)

Regards
Awais Khan





--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Model-recommendation-tp5715944p5716099.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD


Re: Model recommendation

mawais31
Thanks a lot for your help!

Just one question before I roll up my sleeves and start to work on this, if I use the following model:

You could analyze average daily car repair costs. Now, you aggregate over cars in an accident and accidents in a day. Now you have a 365 record dataset. Again, you could use a multilevel model, this time with three levels: Car, accident, day.
 


Regarding aggregating, what will my data look like? I.e.,

if the total number of accidents on 1:1:2009 is 34

1st record would be
car = 1 (mileage(1), model(124), modelyear(2000)) etc
has 12 accidents on day 1:1:2009

2nd record would be
car = 2 (mileage(2), model(124), modelyear(2000)) etc
has 5 accidents on day 1:1:2009

and so on ...

OR

out of 34 accidents, how many were car = 1, car = 2, car = 3 etc
on 1:1:2009

because it would become a freq table / contingency table, or some other structure that I don't know about.

And accidents become a count. Should I now use Poisson regression for this model?
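
The first layout described above (one row per day and car profile, with the number of accidents as the value) can be sketched as follows. This is only an illustration in Python, not SPSS syntax; the records and the tuple layout (day, mileage class, model, modelyear) are hypothetical and the values are made up.

```python
from collections import Counter

# Hypothetical accident records: (day, mileage_class, model, modelyear).
# Values are illustrative only, not taken from the real data.
accidents = [
    ("2009-01-01", 1, 124, 2000),
    ("2009-01-01", 1, 124, 2000),
    ("2009-01-01", 2, 124, 2000),
    ("2009-01-02", 1, 124, 2000),
]

# One cell per (day, car profile); the value is the number of accidents.
counts = Counter(accidents)

# Total accidents per day, regardless of car profile.
per_day = Counter(day for day, *_ in accidents)

# Print the long-format frequency table, one row per non-empty cell.
for (day, mileage, model, modelyear), n in sorted(counts.items()):
    print(day, mileage, model, modelyear, n)
```

Note that a combination with no accidents on a given day does not appear as a row in this table; if zero counts matter for the model, those rows would have to be added explicitly.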

Thanks

Regards
Awais Khan

Re: Model recommendation

Maguin, Eugene
Awais,

What I said in my last message was 'You could analyze average daily car repair costs. Now, you aggregate over cars in an accident and accidents in a day. Now you have a 365 record dataset. Again, you could use a multilevel model, this time with three levels: Car, accident, day.'

I want to emphasize that one method is an ordinary single-level model (regression command) in which the DV and all IVs are aggregated to the day level. An alternative method is a multilevel mixed model, which can be done using the mixed command.

It looks like you want to analyze daily car repair costs by means of a single-level regression model. Aggregating over repair costs is easy enough. Making the IVs is not. Although I don't have much experience in this type of analysis, I think I would make new variables that are percentages/proportions. For instance, your basic database would have information on whether the accident was a single vehicle, two vehicle or multi (two plus) vehicle accident. If you judge that information to be important, I suggest that you add two new variables to the basic dataset. One variable would be 'TwoVehicle' (0=no, 1=yes) and the other would be 'MultiVehicle' (0=no, 1=yes). When you aggregate over accidents, you get the proportion of accidents that were two vehicle and the proportion that were multivehicle. An alternative variable might be the number of vehicles involved in the accident. At the car level, you might (as I think you have) make new variables that represent categories of price ranges or vehicle type. Those new variables are coded 0/1 and when they are aggregated they become the proportion of cars/vehicles in accidents that day that fall in each category.
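
A minimal sketch of the aggregation described above, written in Python rather than SPSS syntax. The field names (`day`, `accident_id`, `repair_cost`) and all values are hypothetical. It collapses cars into accidents first and then accidents into days, so that the 0/1 accident indicators turn into daily proportions. Treating 'multi' as more than two vehicles is an assumption about the coding.

```python
from collections import defaultdict

# Hypothetical car-level records: one row per damaged car.
# Field names and values are illustrative only.
cars = [
    {"day": "2009-01-01", "accident_id": 1, "repair_cost": 590.0},
    {"day": "2009-01-01", "accident_id": 2, "repair_cost": 5672.0},
    {"day": "2009-01-01", "accident_id": 2, "repair_cost": 6074.0},
    {"day": "2009-01-02", "accident_id": 3, "repair_cost": 6418.0},
    {"day": "2009-01-02", "accident_id": 3, "repair_cost": 1200.0},
    {"day": "2009-01-02", "accident_id": 3, "repair_cost": 800.0},
]


def aggregate_to_days(car_rows):
    # Step 1: collapse cars into accidents (car count and total cost).
    accidents = defaultdict(lambda: {"day": None, "n_cars": 0, "cost": 0.0})
    for row in car_rows:
        acc = accidents[row["accident_id"]]
        acc["day"] = row["day"]
        acc["n_cars"] += 1
        acc["cost"] += row["repair_cost"]

    # Step 2: collapse accidents into days.
    by_day = defaultdict(list)
    for acc in accidents.values():
        by_day[acc["day"]].append(acc)

    daily = {}
    for day, accs in sorted(by_day.items()):
        n_acc = len(accs)
        n_cars = sum(a["n_cars"] for a in accs)
        daily[day] = {
            "n_accidents": n_acc,
            "mean_cost_per_car": sum(a["cost"] for a in accs) / n_cars,
            # Mean of a 0/1 indicator over accidents = proportion of accidents.
            "prop_two_vehicle": sum(a["n_cars"] == 2 for a in accs) / n_acc,
            # Assumption: 'multi' means more than two vehicles.
            "prop_multi_vehicle": sum(a["n_cars"] > 2 for a in accs) / n_acc,
        }
    return daily


daily = aggregate_to_days(cars)
for day, row in daily.items():
    print(day, row)
```

The result has one record per day, which is the shape the single-level regression would use. Car-level 0/1 variables (price range, vehicle type) could be averaged the same way, but over cars rather than over accidents.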

While I've suggested two different analytical methods (single level vs multilevel), I do not know how closely the results will converge or under which conditions the convergence will be lesser or greater. This problem must come up in economic analysis and perhaps somebody on the list will comment on that.

Gene Maguin


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of mawais31
Sent: Friday, November 09, 2012 9:16 AM
To: [hidden email]
Subject: Re: Model recommendation

Thanks a lot for your help!

Just one question before I roll up my sleeves and start to work on this, if I use the following model:

/You could analyze average daily car repair costs. Now, you aggregate over cars in an accident and accidents in a day. Now you have a 365 record dataset. Again, you could use a multilevel model, this time with three levels: Car, accident, day./

Regarding aggregating, what will my data look like? I.e.,

if the total number of accidents on 1:1:2009 is 34

*1st record would be*
/car = 1 (mileage(1), model(124), modelyear(2000)) etc has 12 accidents on day 1:1:2009/

*2nd record would be*
/car = 2 (mileage(2), model(124), modelyear(2000)) etc has 5 accidents on day 1:1:2009/

and so on ...

OR

out of 34 accidents, how many were car = 1, car = 2, car = 3 etc on 1:1:2009

because it would become a freq table / contingency table, or some other structure that I don't know about.

And accidents become a count. Should I now use Poisson regression for this model?

Thanks

Regards
Awais Khan



--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Model-recommendation-tp5715944p5716137.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.
