Influence of Indeterminate Values on Predictive Ability of Scoring Models

Introduction

Commercial banks and other financial institutions receive thousands of credit applications every day; for consumer credit the number can reach tens or hundreds of thousands. Because processing this volume manually is impractical, these institutions widely use automatic systems to evaluate the creditworthiness of individuals who apply for credit (Müller & Rönz, 2000). These systems rely on statistical and operations research models that guide the decision whether to grant credit. Credit scoring thus helps establish the creditworthiness of a client before the institution responds to an application.

Credit scoring is predictive in nature. Using a set of predictive models, its techniques allow financial institutions to evaluate the creditworthiness of a client before granting credit. The models determine a customer's eligibility for credit by scoring the customer as either good or bad (Sobehart, Keenan & Stein, 2000). The scores, together with other business considerations such as expected approval rates, profit, churn and losses, are then used as a basis for decision making. The model also informs the amount of money the customer is entitled to by estimating the probability that the borrower will repay the whole amount within the set deadlines.

Several modeling approaches for credit scoring have been in use for the last six decades. However, the literature on credit scoring has developed slowly, and the field is still maturing. A decade ago the list of good books devoted to credit scoring was very small. The situation has improved since then, although there is still no comprehensive work devoted to assessing the quality of credit scoring models in its full complexity (Anderson, 2007; Crook et al. 2007; Siddiqi, 2006; Thomas et al. 2002).

This article contributes to the knowledge of techniques used in assessing the quality of credit scoring models. First, we discuss what a good or bad client is in relation to credit scoring. This definition is crucial, since the distinction between the two is the basis of every computation of model quality in this study. Second, we discuss indeterminate variables and the effect of their values on the categorization of bad and good clients in credit scoring models. Third, we review widely used quality indexes, their properties and their mutual relationships, and extend the review with some already established results connected to these indexes. Finally, we use real financial data in a case study to investigate the application of all listed quality indexes and the issues involved in computing them.

Credit scoring

Credit scoring is a set of predictive mathematical models, together with the corresponding techniques, that financial institutions use when granting credit to their clients. Central to these models is the definition of the target dependent variable, which stipulates whether a client is good or bad (Crook, J.N., Edelman, D.B., Thomas, L.C.67). However, the models often do not examine the range between bad and good clients, that is, an indeterminate range of values (Sobehart, Keenan & Stein, 2000).

The methodologies applied in developing credit scoring models, and some measures of their quality, have been discussed in various studies (Hand & Henley, 1997; Thomas, 2000; Crook et al., 2007). The most commonly used methodology is logistic regression. Classification trees and linear programming are also used, in combination with neural network methods (Thomas, 2000). Once the methodologies are applied, the resulting models are tested for how well they deliver the required results.
This is done in two steps: first, the best model is selected according to some measure of quality at the time of development; second, the quality of the model is monitored after deployment into real business.

The quality of scoring models is measured using quantitative indexes such as the Gini index, the KS statistic, Lift, the Mahalanobis distance and the information statistic (Müller & Rönz, 2000). These indexes are used both to compare several candidate models at the time of development and to monitor model quality after deployment. However, the use of these indexes varies by region, and each has its limits. The Gini index, commonly used in Europe, is a global measure, so it cannot be used to assess local quality. The KS statistic, used extensively in North America, is ideal when the expected cutoff value is near the score at which KS is attained. Although the information statistic is also a global measure of a model's quality, graphs of its components are well suited to examining the local properties of a model, in particular the region of scores where the cutoff is expected (Koláček & Řezáč, 2010). Overall, Lift seems to be the best choice for our purpose. Since we express Lift through the cumulative distribution functions of the scores of bad clients and of all clients, its value can be computed for any level of the score.

In credit scoring it is necessary to define good and bad clients precisely. Usually the definition is based on the client's number of days after the due date (days past due, DPD) and the amount past due (Koláček & Řezáč, 2010). Some tolerance level for the amount past due is needed to separate what counts as a debt from what does not. A client could fall into payment delay innocently, for example because of technical imperfections of the system, and small amounts past due (e.g. less than 4 €) should likewise not be regarded as debts (Sobehart, Keenan & Stein, 2000). It is also necessary to determine the time horizon over which these two characteristics are traced. For example, a client who has always had fewer than 90 DPD (with a tolerance of 4 €) is marked as good.

Accumulation of several agreements is another practical issue in the definition of a good client. A customer can be overdue on several contracts, each with a different number of days past due (Wilkie, 2004). In such cases, all amounts past due connected to the client are summed and the maximum days past due is taken. This approach can be applied only in some cases, especially where complete accounting data are available. The situation is considerably more complex for aggregated data (Thomas, 2000).

It is important to build a set of models with varying levels of these definitions. It can also be useful to develop a model with one good/bad definition and measure its quality with another (Witzany, 2009). Furthermore, a scoring model with higher performance should be one whose good or bad definition is set at a higher (stricter) level. The choice of these definitions depends greatly on the type of financial product. “Certainly there will be different definition for consumer loans of small amounts with original maturities around one year compared to those set for mortgages, which are typically connected to very large amounts with maturities up to several tens of years” (Thomas, Edelman & Crook, 2002).

Clients are commonly grouped into categories for various reasons: Good, Bad, Indeterminate, Insufficient, Excluded and Rejected. How indeterminate values are handled influences how effectively the dependent target variable (good or bad) can be predicted.
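The good/bad definition described above (a DPD threshold, an amount tolerance, and aggregation across a client's contracts) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the `Contract` type, the `label_clients` function and the field names are hypothetical, and the choice to apply the 4 € tolerance to the summed amount and to treat exactly 90 DPD as bad are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Contract:
    # Hypothetical record layout, for illustration only.
    client_id: str
    max_dpd: int            # worst days past due ever observed on the contract
    amount_past_due: float  # amount past due, in EUR


def label_clients(contracts, dpd_threshold=90, tolerance_eur=4.0):
    """Aggregate contracts per client, then label each client good or bad.

    Aggregation follows the text: amounts past due are summed and the
    maximum days past due is taken across a client's contracts.
    """
    totals = {}
    for c in contracts:
        amount, dpd = totals.get(c.client_id, (0.0, 0))
        totals[c.client_id] = (amount + c.amount_past_due, max(dpd, c.max_dpd))

    labels = {}
    for client_id, (amount, dpd) in totals.items():
        # Assumption: a client is bad only if both the DPD threshold is
        # reached and the summed amount past due is not below the tolerance.
        is_bad = dpd >= dpd_threshold and amount >= tolerance_eur
        labels[client_id] = "bad" if is_bad else "good"
    return labels


if __name__ == "__main__":
    sample = [
        Contract("A", 100, 50.0),
        Contract("A", 10, 5.0),
        Contract("B", 95, 1.0),   # long delay, but only a trivial amount
        Contract("B", 0, 2.0),
        Contract("C", 30, 200.0),
    ]
    print(label_clients(sample))
```

Varying `dpd_threshold` and `tolerance_eur` gives the "set of models with varying levels of these definitions" mentioned in the text.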
A closer examination of the tolerance level in the credit scoring model, most often expressed through days past due (DPD) and the time horizon, helps build well-informed decision variables (Thomas, Edelman & Crook, 2002). For example, if delinquent clients stay within the time horizon allowed for late payment, they are grouped as good. Modeling good clients against bad ones in this way appears to create a superior model with higher predictive power (Wand & Jones, 1995). The indeterminate values also define the risk involved in the scoring model: the more carelessly clients are grouped, the more risk the company incurs. The more indeterminate values there are, the riskier the model becomes (Coppock, D.S. 12). A wide range of indeterminate values therefore tends to lower the quality of the credit scoring model in question.

The indeterminate type of client lies on the border between good and bad clients and directly affects their definition, as shown in Figure 1. If only DPD is considered, clients with a high DPD (e.g. 90+) are typically identified as bad, and clients who are not delinquent (e.g. DPD less than 30, or alternatively equal to zero) are identified as good. Indeterminate clients are then delinquent customers who have not exceeded the given DPD threshold. Setting this type of client aside lets us model very good clients against very bad ones (Koláček & Řezáč, 2010). This practice yields a model with seemingly amazing predictive power. However, that power drops immediately once the model is assessed on the whole population, where indeterminate clients are considered good. The use of this type of client is therefore very disputable and usually does not improve the model's quality.

What are Indeterminate?

The next type (insufficient) is typically the case of clients with a very short history. This makes it impossible to determine the correct definition of the dependent variable (good / bad client).
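The DPD-based split into good, indeterminate and bad clients discussed above can be illustrated with a short sketch. The function names are hypothetical, and the band edges (good below 30 DPD, bad from 90 DPD) simply reuse the example values from the text; real definitions may differ.

```python
def dpd_category(max_dpd, good_below=30, bad_from=90):
    """Classify a client by worst days past due (example thresholds)."""
    if max_dpd >= bad_from:
        return "bad"
    if max_dpd < good_below:
        return "good"
    # Delinquent, but the bad threshold has not been reached.
    return "indeterminate"


def development_sample(max_dpds):
    """Drop indeterminate clients: very good is modeled against very bad."""
    categories = map(dpd_category, max_dpds)
    return [c for c in categories if c != "indeterminate"]


def whole_population(max_dpds):
    """Assess on everyone, with indeterminate clients counted as good."""
    categories = map(dpd_category, max_dpds)
    return ["good" if c == "indeterminate" else c for c in categories]


if __name__ == "__main__":
    dpds = [0, 12, 45, 89, 90, 150]
    print(development_sample(dpds))
    print(whole_population(dpds))
```

The gap between the two label sets is the point made in the text: a model developed on `development_sample` never sees the borderline clients that it is later judged on in `whole_population`.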
The excluded clients are typically clients whose data are so wrong as to be misleading (e.g. frauds, first payment defaulters). They are also marked as “hard bad”. A second group of excluded clients consists of applicants who belong to a category that will not be assessed by the model, e.g. VIPs or bank employees. The meaning of a rejected client is obvious (Anderson, 2007; Thomas et al. 2002; Thomas, 2009).

In this paper, only good and bad clients are used for further model building. We do not use the indeterminate category; instead, we set a tolerance level for the amount past due in order to handle simultaneous contracts (Crook & Edelman & Thomas, 2007). Two parameters therefore remain that affect the good/bad definition.

Case study

An international financial service organization has been in the consumer credit market for over three years. The company has collected enough data to build scorecards for its customers and to apply them to new customers. The development samples, each with a different good definition, were:

Models 1 and 2 (full sample): 18817 bad and 184230 good
Models 3 and 4 (full sample): 18817 bad and 276001 good
Models 5 and 6 (10% of full sample): 1859 bad and 18390 good
Models 7 and 8 (10% of full sample): 1859 bad and 27580 good

Bad is defined as having been 90 days past due at least once, and the rest are good, as given below:

Bad90 … >90 dpd
Good0 … = 0 dpd
Good30 …
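The quality indexes named earlier (KS, Gini and Lift) could be computed on a development sample of good and bad clients along the following lines. This is a hedged sketch rather than the computation used in the study: the function names are hypothetical, it assumes that a higher score means a better client, it obtains Gini as 2·AUC − 1, and it takes cumulative Lift at score a to be F_bad(a) / F_all(a), following the text's description of Lift through the distribution functions of bad and all clients.

```python
def ecdf(values, a):
    """Empirical distribution function: share of values <= a."""
    return sum(v <= a for v in values) / len(values)


def ks_statistic(good, bad):
    """Largest gap between the score distributions of bad and good clients."""
    points = sorted(set(good) | set(bad))
    return max(abs(ecdf(bad, a) - ecdf(good, a)) for a in points)


def gini_index(good, bad):
    """Gini = 2 * AUC - 1, where AUC is the chance that a random good
    client scores above a random bad client (ties count one half)."""
    wins = sum((g > b) + 0.5 * (g == b) for g in good for b in bad)
    auc = wins / (len(good) * len(bad))
    return 2 * auc - 1


def cumulative_lift(good, bad, a):
    """Share of bad clients scoring <= a, relative to the share of all
    clients scoring <= a. Undefined when no client scores <= a."""
    return ecdf(bad, a) / ecdf(list(good) + list(bad), a)


if __name__ == "__main__":
    # Toy scores, for illustration only (not the case-study data).
    good_scores = [0.9, 0.8, 0.6, 0.3]
    bad_scores = [0.7, 0.2]
    print(ks_statistic(good_scores, bad_scores))
    print(gini_index(good_scores, bad_scores))
    print(cumulative_lift(good_scores, bad_scores, 0.3))
```

Because `cumulative_lift` takes the score level `a` as an argument, it can be evaluated near an expected cutoff, which is the local property the text attributes to Lift; the pairwise loop in `gini_index` is quadratic and is meant for small illustrations only.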
