StudentShare

Scatterplot for Credit Balance and Size - Assignment Example

Summary
The paper "Scatterplot for Credit Balance and Size" highlights that discarding the variable "Years" does not significantly affect the regression model: without it, R-Sq is 80.3% and R-Sq(adj) is 79.5%. Mallows' Cp is another statistic for assessing how well the model fits the data…

Extract of sample "Scatterplot for Credit Balance and Size"

Statistical Assignment

1. Scatterplot for CREDIT BALANCE vs. SIZE
The relationship between the two variables, Credit Balance and Size (the number of people living in a household), is illustrated in the scatterplot.

[Figure: Scatterplot of Credit Balance ($) vs. Size]

The scatterplot shows that credit balance tends to increase as household size increases. The pattern appears approximately linear.

2. Equation of the "Best Fit" Line for the Relationship between CREDIT BALANCE and SIZE
Next, a regression model of Credit Balance on Size was fitted using Regression Analysis in Minitab. The result indicates that the relationship is well described by the following linear equation:

Regression Analysis: Credit Balance($) versus Size

The regression equation is
Credit Balance($) = 2591 + 403 Size

Predictor   Coef     SE Coef   T      P
Constant    2591.4   195.1     13.29  0.000
Size        403.22   50.95      7.91  0.000

S = 620.162   R-Sq = 56.6%   R-Sq(adj) = 55.7%

Analysis of Variance
Source          DF  SS        MS        F      P
Regression       1  24092210  24092210  62.64  0.000
Residual Error  48  18460853    384601
Total           49  42553062

Unusual Observations
Obs  Size  Credit Balance($)  Fit     SE Fit  Residual  St Resid
5    2.00  1864.0             3397.9  113.7   -1533.9   -2.52R

R denotes an observation with a large standardized residual.

The regression analysis above shows a positive linear relationship between the two variables. As the number of people in the household increases, the credit balance also tends to increase.

The Minitab output also reports DF, the degrees of freedom. In a regression ANOVA:
- The regression DF equals the number of predictors (here 1).
- The residual (error) DF equals the sample size minus the number of estimated coefficients (50 − 2 = 48).
- The total DF equals the sample size minus 1 (49).
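Minitab's "best fit" line is the ordinary least squares (OLS) line. The sketch below illustrates what it computes, assuming the standard closed-form OLS formulas for a single predictor. The helper `fit_simple_ols` is hypothetical and returns the intercept and slope. It is run on made-up toy points, because the assignment's 50 customer records are not reproduced in this document.

```python
# Minimal sketch: closed-form ordinary least squares for one predictor.
# slope = Sxy / Sxx and intercept = ybar - slope * xbar; these formulas produce
# a line such as Credit Balance = 2591 + 403 Size from the raw data.

def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Cross-deviation and squared-deviation sums around the means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x has no variation; slope is undefined")
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical toy data (NOT the assignment's customers): points on y = 1 + 2x.
b0, b1 = fit_simple_ols([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)  # 1.0 2.0
```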
The number of degrees of freedom is the number of values that are free to vary in the final statistical calculation. It is not essential for interpreting the regression ANOVA here, because Minitab reports the p-value directly.

SS stands for sum of squares, a measure of variation. The total sum of squares, Σ(y − ȳ)², splits into two parts:
- the regression sum of squares, Σ(ŷ − ȳ)², which is the variation explained by the model;
- the residual sum of squares, Σ(y − ŷ)², which is the unexplained variation.

MS is the mean square, computed as SS divided by its DF. The F statistic is the ratio of the regression mean square to the residual mean square, that is, explained variability relative to unexplained variability. A large F ratio gives a small p-value, indicating a statistically significant result. F is always non-negative. In our case F = 62.64, which is large, and the corresponding p-value (0.000) is below our significance level of 0.05. The p-value is the probability, assuming the null hypothesis is true, of obtaining an F value at least as large as the one observed. It is the area to the right of the observed F value under the F distribution.

The p-values of 0.000 in the Analysis of Variance and 0.026 in the Sequential Analysis of Variance (for a quadratic polynomial fit) are both less than our significance level of α = 0.05. Further, the R-Sq value of 56.6% suggests that the model fits the actual data reasonably well and that there is a relatively strong relationship between the two variables.

3. Coefficient of Correlation
The Pearson product-moment correlation coefficient measures the strength of the linear relationship between two variables. Its value is influenced by the distribution of the independent variable.
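The ANOVA arithmetic described in part 2, and the link between R-Sq and Pearson's r used in part 3, can be checked directly from the Minitab table. This sketch copies the sums of squares from the output. In simple linear regression, r equals the square root of R-Sq with the sign of the slope. The two SS values add to 42,553,063 rather than Minitab's 42,553,062 because of rounding in the printed output.

```python
import math

# Values copied from the Minitab ANOVA table for Credit Balance vs Size.
n = 50                  # customers in the sample
k = 1                   # number of predictors (Size)
ss_regression = 24_092_210
ss_error = 18_460_853
ss_total = ss_regression + ss_error   # explained + unexplained variation

df_regression = k             # 1
df_error = n - k - 1          # 48
df_total = n - 1              # 49

ms_regression = ss_regression / df_regression   # mean square = SS / DF
ms_error = ss_error / df_error                   # about 384,601
f_stat = ms_regression / ms_error                # about 62.64

r_squared = ss_regression / ss_total                                # about 0.566
r_squared_adj = 1 - (ss_error / df_error) / (ss_total / df_total)   # about 0.557
s = math.sqrt(ms_error)                                              # about 620.16

# In simple regression, Pearson's r carries the sign of the slope (+403.22 here).
slope = 403.22
r = math.copysign(math.sqrt(r_squared), slope)                       # about 0.752

print(f"F = {f_stat:.2f}, R-Sq = {r_squared:.3f}, "
      f"R-Sq(adj) = {r_squared_adj:.3f}, S = {s:.2f}, r = {r:.3f}")
```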
Next, the correlation between Credit Balance and Size was computed using Pearson's correlation coefficient:

Correlations: Credit Balance, Size
Pearson correlation of Credit Balance and Size = 0.752
P-Value = 0.000

The Pearson correlation of 0.752 indicates a strong relationship between the two variables, using the following guideline from Statisticshowto.com (2009):

High correlation: 0.5 to 1.0 or -0.5 to -1.0
Medium correlation: 0.3 to 0.5 or -0.3 to -0.5
Low correlation: 0.1 to 0.3 or -0.1 to -0.3

4. Coefficient of Determination
The coefficient of determination, R-Sq, measures how well the regression model approximates the real data points (Cameron and Windmeijer et al. 1992). From the regression model, the estimated standard deviation of the error terms (S) is 620.162. R-Sq is 56.6%. This means that 56.6% of the variation in the dependent variable (Credit Balance) is explained by its linear relationship with the independent variable (Size). The remaining 43.4% is unexplained.

From Minitab: R-Sq = 56.6%

The R-Sq value of 56.6%, together with a p-value of 0.000 (below the 0.05 significance level), suggests a good relationship between the two variables.

5. Utility of the Regression Model
To test the utility of the model, we test H0: β1 = 0 against H1: β1 ≠ 0. The t statistic for Size is 7.91. This large value corresponds to a p-value of 0.000, which is below the significance level α = 0.05. Thus we reject the null hypothesis and conclude that Size, i.e., the number of people in a household, plays a significant role in the regression model.

Predictor   Coef     SE Coef   T      P
Constant    2591.4   195.1     13.29  0.000
Size        403.22   50.95      7.91  0.000

6. Opinion about using SIZE to predict CREDIT BALANCE
The regression results and the Pearson correlation coefficient both show a relatively strong relationship between Credit Balance and Size. Hence, Size can be used as a good predictor of Credit Balance in the absence of any other relevant information.

7. Compute the 95% Confidence Interval for β1
The regression analysis shows that the 95% confidence interval for β1 is 300.79 to 505.66:

95% confidence interval for β1 = (300.79, 505.66)

We can therefore be 95% confident that the true slope coefficient lies within this interval. Since the interval does not contain zero, the p-value for the slope must be less than the significance level of 0.05, as it is in our case.

In Minitab, the test statistic is calculated under the null hypothesis that the slope is zero. By default, the p-value is computed for a two-tailed ("not equal to") alternative hypothesis. Here it is 0.000 (to three decimal places), which is less than our significance level of 0.05. We therefore reject the null hypothesis and conclude that β1 is not equal to zero. There is sufficient evidence at α = 0.05 to conclude that there is a linear relationship between Size and Credit Balance.
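The t statistic in part 5 and the interval in part 7 both follow from the slope and its standard error: t = b1 / SE(b1), and the interval is b1 ± t(0.975, 48) × SE(b1). This sketch hard-codes the critical value t(0.975, 48) ≈ 2.0106 from a t table. That is an assumption made to avoid a SciPy dependency. The results agree with Minitab's up to rounding.

```python
# t statistic and 95% CI for the slope, using Minitab's coefficient table.
b1 = 403.22        # estimated slope for Size
se_b1 = 50.946     # standard error of the slope
df_error = 48      # n - 2 for simple regression with n = 50

# 97.5th percentile of Student's t with 48 df, taken from a t table
# (scipy.stats.t.ppf(0.975, 48) returns the same value if SciPy is installed).
t_crit = 2.0106

t_stat = b1 / se_b1                 # test statistic for H0: beta1 = 0
margin = t_crit * se_b1             # half-width of the interval
ci_low, ci_high = b1 - margin, b1 + margin

# Reject H0 at alpha = 0.05 when |t| > t_crit, equivalently when the CI excludes 0.
reject_h0 = abs(t_stat) > t_crit

print(f"t = {t_stat:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f}), reject H0: {reject_h0}")
```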
General Regression Analysis: Credit Balance($) versus Size

Regression Equation
Credit Balance($) = 2591.44 + 403.221 Size

Coefficients
Term      Coef     SE Coef  T        P      95% CI
Constant  2591.44  195.064  13.2851  0.000  (2199.24, 2983.65)
Size       403.22   50.946   7.9147  0.000  ( 300.79,  505.66)

Summary of Model
S = 620.162   R-Sq = 56.62%   R-Sq(adj) = 55.71%
PRESS = 19992921   R-Sq(pred) = 53.02%

Analysis of Variance
Source         DF  Seq SS    Adj SS    Adj MS    F        P
Regression      1  24092210  24092210  24092210  62.6421  0.000000
  Size          1  24092210  24092210  24092210  62.6421  0.000000
Error          48  18460853  18460853    384601
  Lack-of-Fit   5   2499467   2499467    499893   1.3467  0.263274
  Pure Error   43  15961386  15961386    371195
Total          49  42553062

Fits and Diagnostics for Unusual Observations
Obs  Credit Balance($)  Fit      SE Fit   Residual  St Resid
5    1864               3397.89  113.691  -1533.89  -2.51600 R

R denotes an observation with a large standardized residual.

8. Using an interval, estimate the average credit balance for customers that have a household size of 5
From the regression analysis, the 95% intervals at Size = 5 were calculated as:

New Obs  Fit     SE Fit  95% CI            95% PI
1        4607.5  119.0   (4368.2, 4846.9)  (3337.9, 5877.2)

95% confidence interval for the average credit balance at household size 5 = (4368.2, 4846.9)

Thus we can be 95% confident that the average credit balance of customers with a household size of 5 lies between 4368.2 and 4846.9.
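Both intervals in the output above can be reproduced from the fitted line, S, and Minitab's SE Fit at Size = 5. The confidence interval for the mean uses SE Fit alone. The prediction interval for a single customer uses sqrt(S² + SE Fit²). Two inputs are assumptions rather than recomputed values. The critical value t(0.975, 48) ≈ 2.0106 is hard-coded from a t table to keep the sketch dependency-free. SE Fit is taken from the output, because the sample mean and spread of Size are not reported in this document.

```python
import math

b0, b1 = 2591.44, 403.221   # fitted intercept and slope (General Regression output)
s = 620.162                 # residual standard error S
se_fit = 119.0              # Minitab's "SE Fit" at Size = 5
t_crit = 2.0106             # t(0.975, 48) from a t table

size = 5
fit = b0 + b1 * size        # point estimate of the credit balance

# Interval for the MEAN credit balance of all households of size 5.
ci = (fit - t_crit * se_fit, fit + t_crit * se_fit)

# Interval for ONE new customer: adds the residual variance S^2, so it is wider.
se_pred = math.sqrt(s ** 2 + se_fit ** 2)
pi = (fit - t_crit * se_pred, fit + t_crit * se_pred)

print(f"fit = {fit:.1f}")
print(f"95% CI = ({ci[0]:.1f}, {ci[1]:.1f})")
print(f"95% PI = ({pi[0]:.1f}, {pi[1]:.1f})")
```

The printed intervals match Minitab's to within 0.1. The small gap comes from SE Fit being rounded to one decimal in the output.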
Values of Predictors for New Observations
New Obs  Size
1        5.00

9. Using an interval, predict the credit balance for a customer that has a household size of 5
The same Minitab output (Fit = 4607.5, SE Fit = 119.0) gives the 95% prediction interval (PI) for a single customer with household size 5:

95% prediction interval for the credit balance of a customer with household size 5 = (3337.9, 5877.2)

Thus we can be 95% confident that the credit balance of an individual customer with a household size of 5 will fall between 3337.9 and 5877.2. This interval is wider than the confidence interval in part 8. It accounts for the variation of individual customers around the mean, in addition to the uncertainty in the estimated mean.

10. Credit balance for a customer that has a household size of 10
In our sample data, the maximum household size is 7, so a household size of 10 lies outside the observed range. Using the regression equation to estimate its credit balance is an extrapolation, and the estimate should be treated with caution:

Credit Balance($) = 2591 + 403 Size = 2591 + (403 × 10) = 6621

11. Using the variables INCOME, SIZE and YEARS to predict CREDIT BALANCE
The regression analysis with the three variables gives a higher R-Sq value of 80.5%. This indicates that this model fits the data better than the simple model.
The new regression equation is given below:

Regression Analysis: Credit Balance($) versus Size, Income ($1000), Years

The regression equation is
Credit Balance($) = 1276 + 347 Size + 32.3 Income ($1000) + 7.9 Years

Predictor       Coef     SE Coef  T     P
Constant        1276.0   273.6    4.66  0.000
Size             346.85   36.03   9.63  0.000
Income ($1000)    32.272   4.348  7.42  0.000
Years              7.88   12.34   0.64  0.526

S = 424.715   R-Sq = 80.5%   R-Sq(adj) = 79.2%

Analysis of Variance
Source          DF  SS        MS        F      P
Regression       3  34255444  11418481  63.30  0.000
Residual Error  46   8297619    180383
Total           49  42553062

Source          DF  Seq SS
Size             1  24092210
Income ($1000)   1  10089614
Years            1     73620

Unusual Observations
Obs  Size  Credit Balance($)  Fit     SE Fit  Residual  St Resid
3    4.00  5100.0             3830.1   93.7    1269.9    3.07R
5    2.00  1864.0             3001.7  139.3   -1137.7   -2.84R
11   3.00  4208.0             3210.1  103.3     997.9    2.42R
17   6.00  4412.0             5250.3  116.3    -838.3   -2.05R

R denotes an observation with a large standardized residual.

12. Perform the Global Test for Utility (F-Test).
According to Minitab Help, the F value for the regression tests the null hypothesis that all the slope coefficients in the model are zero (H0: β1 = β2 = β3 = 0). The alternative is that at least one of them is nonzero. F is calculated as F = MS regression / MS error. Suppose we choose α = 0.05. If the null hypothesis is true, the F statistic exceeds the 95th percentile of the F distribution (with the regression and residual degrees of freedom) only 5% of the time. We therefore reject the null hypothesis if the calculated F value is greater than that percentile, or equivalently if its p-value is below 0.05. For the three-predictor model, F = 63.30 with 3 and 46 degrees of freedom and p = 0.000. We reject H0 and conclude that at least one of Size, Income and Years is useful for predicting Credit Balance.

We also carry out a lack-of-fit F-test for the simple linear model of Credit Balance on Size, following the standard hypothesis-test procedure. First, we specify the null and alternative hypotheses:
H0: There is no lack of linear fit.
H1: There is lack of linear fit.

Analysis of Variance
Source          DF  SS        MS        F      P
Regression       1  24092210  24092210  62.64  0.000
Residual Error  48  18460853    384601
  Lack of Fit    5   2499467    499893   1.35  0.263
  Pure Error    43  15961386    371195
Total           49  42553062

The F*-statistic is 1.35 and the P-value is 0.263.
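Both F statistics in part 12 are ratios of mean squares. This sketch recomputes two of them from the sums of squares and degrees of freedom in the Minitab tables: the global F for the three-predictor model and the lack-of-fit F* for the simple model. The p-values are taken from Minitab rather than recomputed here.

```python
# F statistics as ratios of mean squares (values copied from Minitab).

# Global F-test for the three-predictor model (Size, Income, Years).
ss_reg3, df_reg3 = 34_255_444, 3
ss_err3, df_err3 = 8_297_619, 46
f_global = (ss_reg3 / df_reg3) / (ss_err3 / df_err3)   # MS regression / MS error

# Lack-of-fit F-test for the simple model (Credit Balance vs Size):
# the residual SS is split into lack-of-fit and pure-error components.
ss_lof, df_lof = 2_499_467, 5
ss_pe, df_pe = 15_961_386, 43
f_lof = (ss_lof / df_lof) / (ss_pe / df_pe)             # MS lack-of-fit / MS pure error

print(f"global F = {f_global:.2f} on ({df_reg3}, {df_err3}) df")
print(f"lack-of-fit F* = {f_lof:.2f} on ({df_lof}, {df_pe}) df")
```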
The p-value is greater than the significance level α = 0.05, so we fail to reject the null hypothesis. There is not sufficient evidence at the 0.05 level of a lack of linear fit.

13. Perform the t-test on each independent variable. Explain your conclusions and clearly state how you should proceed. In particular, which independent variables should we keep and which should be discarded?
For each predictor we test H0: βi = 0 against H1: βi ≠ 0.

Size and Income have t statistics of 9.63 and 7.42 respectively, each with a p-value of 0.000, which is below the significance level α = 0.05. We reject the null hypothesis for both and conclude that Size and Income are each related to Credit Balance.

For Years, the t value of 0.64 gives a p-value of 0.526, much greater than 0.05. We fail to reject the null hypothesis and conclude that Years does not play a significant role in the regression model.

Therefore, we can discard the variable "Years" and keep the variables "Size" and "Income" in the regression model.

Predictor       Coef     SE Coef  T     P
Constant        1276.0   273.6    4.66  0.000
Size             346.85   36.03   9.63  0.000
Income ($1000)    32.272   4.348  7.42  0.000
Years              7.88   12.34   0.64  0.526

14. Is this multiple regression model better than the linear model that we generated in parts 1-10?
To see whether this multiple regression model is better than the linear model from parts 1-10, we use Best Subsets Regression to find the best subset of predictors for Credit Balance.
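Mallows' Cp in the Best Subsets output can be recovered from each model's S. This sketch assumes the standard definition Cp = SSE_p / MSE_full − (n − 2p). Here p counts the estimated coefficients including the constant, and SSE_p = S² × (n − p). The same S values also give R-Sq(adj) = 1 − S² / (SST / (n − 1)). The helper names `mallows_cp` and `adj_r_squared` are hypothetical.

```python
# Mallows' Cp and adjusted R-squared from S values in the Best Subsets output.
# Cp = SSE_p / MSE_full - (n - 2p), where p = coefficients including the constant.

n = 50
ss_total = 42_553_062
mse_full = 424.72 ** 2          # S of the full three-predictor model, squared


def mallows_cp(s_p, n_predictors):
    p = n_predictors + 1                   # add 1 for the constant
    sse_p = s_p ** 2 * (n - p)             # invert S = sqrt(SSE / (n - p))
    return sse_p / mse_full - (n - 2 * p)


def adj_r_squared(s_p):
    # R-Sq(adj) = 1 - MSE / (SST / (n - 1)), with MSE = S^2
    return 1 - s_p ** 2 / (ss_total / (n - 1))


models = {
    "Size": (620.16, 1),
    "Size, Income": (422.03, 2),
    "Size, Years, Income": (424.72, 3),
}
for name, (s_p, k) in models.items():
    print(f"{name:22s} Cp = {mallows_cp(s_p, k):5.1f}  R-Sq(adj) = {adj_r_squared(s_p):.3f}")
```

For the full model, Cp always equals p (here 4) by construction. That is why Cp is informative only for the smaller subsets.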
Thus from Minitab we have:

Best Subsets Regression: Credit Balan versus Size, Years, Income ($100
Response is Credit Balance($)

Vars  R-Sq  R-Sq(adj)  Mallows Cp  S       Predictors
1     56.6  55.7       56.3        620.16  Size
1     39.3  38.0       97.3        733.85  (one predictor other than Size)
2     80.3  79.5        2.4        422.03  Size, Income ($1000)
2     57.1  55.3       57.1        622.86  Size, Years
3     80.5  79.2        4.0        424.72  Size, Years, Income ($1000)

From the R-Sq values above, Size alone is a good predictor of Credit Balance in the absence of any other information. The highest R-Sq (80.5%) is obtained with all three variables, but R-Sq never decreases when a predictor is added. Discarding the variable "Years" does not significantly affect the regression model: R-Sq drops only to 80.3%, while R-Sq(adj) rises from 79.2% to 79.5%. The model with Size and Income is therefore clearly better than the simple linear model with Size alone (R-Sq(adj) = 55.7%).

Mallows' Cp is another statistic for assessing how well a model fits the data. Mallows' Cp should be close to the number of predictors in the model plus the constant. Using Mallows' Cp to compare regression models is only valid when you start with the same set of candidate variables. For the two-predictor model with Size and Income, Cp = 2.4, which is close to 3 (two predictors plus the constant). The full three-predictor model has Cp = 4.0, which equals 3 + 1 by construction, so Cp cannot be used to judge the full model itself. This supports keeping Size and Income and discarding Years.

References
Cameron, A., Windmeijer, F., Gramajo, H., Cane, D. and Khosla, C. (1992). An R-Squared Measure of Goodness of Fit for Some Common Nonlinear Regression Models. Journal of Econometrics, 72(2), pp. 1790-1792.
Statisticshowto.com (2009). What is the Pearson Correlation Coefficient? [online] Available at: http://www.statisticshowto.com/articles/what-is-the-pearson-correlation-coefficient [Accessed 17 Aug 2013].
Staistic assignment Example | Topics and Well Written Essays - 1000 words. Retrieved from https://studentshare.org/statistics/1484501-staistic-assignment

CHECK THESE SAMPLES OF Scatterplot for Credit Balance and Size

Excel Spreadsheet on Italian Government

The paper "Excel Spreadsheet on Italian Government" highlights that, generally, macroeconomic factors are very important in determining the level and performance of the economy.... To get the economy moving, the government must borrow and consume appropriately.... In recent times, it has become necessary to study the statistical correlation between government consumption and GDP, and between government stock yield and current GDP....
8 Pages (2000 words) Term Paper

Credit Rationing and Compensating Balance

The calculation of interest rates on a compensating balance and an installment loan helps individuals and businesses, as investors, understand the true cost of debt.... A compensating balance is the minimum balance that must be held in an account for an investor, whether an individual or a company, to qualify for a loan.... For example, if an investor borrows $100,000 and the bank requires a deposit of $10,000, then that is a compensating balance loan....
3 Pages (750 words) Essay

Financial Policies and the Value of the Firm: The Role of Corporate Governance

This dissertation "Financial Policies and the Value of the Firm: The Role of Corporate Governance" discusses how global trade, differences in values, motives, and political power have created a major need for financial strategies that allow firms to survive in the markets.... As technology advances exponentially, so does the telecommunications market....
28 Pages (7000 words) Dissertation

Balance of Payments Account

The main idea of this study is to give detailed information about the balance of payments account.... The author assesses the components of the balance of the payments, merchandise, invisible imports and exports, capital account, the unilateral transfers account.... The balance of payments accounting is normally recorded on the balance of payments account.... The balance of payments account also reflects whether a country is a debtor or creditor....
11 Pages (2750 words) Essay

Role of the Interest Rate in the Financial Markets

This assignment "Role of the Interest Rate in the Financial Markets" determines the impacts of policies on the interest rate, assessing investments, capital formation, and inflation in an economy.... Interest rates are determined as credible and predictable to determine the effectiveness of policies....
21 Pages (5250 words) Assignment

The Golf Stock Price Study

The paper contains descriptive statistics of the golf stock price study.... The eight variables taken from the Golf/Stock Price Study are Handicap 04, Rank in Industry 04, Market Value, Sales, profitability, netmargin2003, Recent share price, and estimated earning per share 2004.... The average recent share price of the companies was about 41 (SD = 21....
5 Pages (1250 words) Assignment

Analysis of Income, Credit Balance, Size, and Location: AJ Davis

"Analysis of Income, credit balance, Size, and Location: AJ Davis" paper argues that income and credit card balance has systematic distributions, and measures of central tendencies can inform decision making on variables.... he credit balance is another variable to the company as it identifies customers' ability to meet their credit obligations.... The following table summarizes some of the statistics for the credit balance of the company's customers....
8 Pages (2000 words) Statistics Project

Statistics Project Module Analysis

The assignment "Statistics Project Module Analysis" focuses on the analysis of the project module in statistics.... The overall goal of this learning activity is to visualize the relationship between two scale variables creating scatterplots and to quantify this relationship with the correlation coefficient....
5 Pages (1250 words) Assignment