Geography, Area classification and methodology Coursework

This discussion concerns the methodology for classifying statistical wards, in which each ward is placed in a group of wards with similar census variables. Statistical wards resulted from a policy introduced by National Statistics to minimize the statistical impact of frequent changes to electoral ward boundaries in England. The policy ensured that each ward had a minimum population of one thousand; wards with fewer residents were combined to reach that minimum. The classification groups similar areas according to their combined characteristics.

Variable Selection

The objective is to choose the smallest number of variables that fully represents the major dimensions in the data. The variables used are demographic and socioeconomic, and cover six dimensions: household composition, demographic structure, socioeconomic character, housing, industry sector, and employment. The variables are selected in several steps. The first step considers variables from the Key Statistics tables. The second merges variables to create composite variables. The third examines the correlation matrix and removes strongly correlated variables, so that no single aspect of the census data has too much influence on the result. The last step excludes variables judged to be badly behaved or to have a high proportion of zeros. The Advisory Board was consulted and proposed a principal component analysis to aid variable selection. The aim throughout was to select the smallest number of variables that adequately represents the main dimensions of the 2001 Census data.

Setting of the Variables

From the census, five main domains were identified, chosen to represent fully the main dimensions within the classification.
The identified domains include demographic make-up, household composition, socio-economic character, and employment. The preliminary data set comprised the output-area-level Key Statistics table variables, which represent the most important variables in the published census data. After detailed assessment of each variable, this initial set was reduced so that the main dimensions of the census data were represented with the minimum number of variables. The process eliminated any variable that added nothing to the classification, and in some cases a composite variable replaced several variables. Variables representing very small sections of the population were removed. Migration indicators were omitted because the data were not available. Optional questions with varying response rates across different areas of the UK, such as religion, were omitted. Strongly correlated variables were identified and reduced in the dataset. The Pearson correlation coefficient was used to identify pairs of variables that would give too much weight to one characteristic if both were included in the classification (Wallace & Denham 1996). The ONS Project Board and the School of Geography discussed and decided the method for selecting variables. The guidelines set for including variables covered the following.

Highly Correlated Variables

Strong correlations in a dataset are undesirable for cluster analysis because they indicate redundancy. Correlated variables repeat most of the information contained in just one of them, which makes it hard to gauge the effect of any single variable on the clustering process.

Badly Behaved Variable Distribution

Clustering and standardization work reliably on normally distributed variables. Highly skewed distributions, however, can create difficulties for both. The problem is addressed by a logarithmic transformation, by ranking the data, or by taking square roots.
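Two of the checks described above, flagging strongly correlated pairs and taming a skewed variable, can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure the classification actually followed: the 0.9 threshold, the variable names, and the ward values are all invented, and the transform is taken as log(x + 1) so that zeros can be handled.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlated_pairs(table, threshold=0.9):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold.

    The threshold is an assumed value; the source does not state one.
    """
    names = list(table)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(table[a], table[b])
            if abs(r) >= threshold:
                pairs.append((a, b, r))
    return pairs

def skewness(x):
    """Population skewness (third standardized moment)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    return m3 / m2 ** 1.5

# Hypothetical ward-level percentages, invented for illustration only.
wards = {
    "pct_unemployed": [2.0, 3.0, 4.0, 5.0, 12.0],
    "pct_no_car": [10.0, 15.0, 20.0, 25.0, 60.0],
    "pct_aged_65_up": [22.0, 15.0, 19.0, 12.0, 17.0],
}
flagged = correlated_pairs(wards, threshold=0.9)

# A skewed variable and its log-transformed version (log(x + 1) is an assumption).
skewed = [1.0, 2.0, 2.0, 3.0, 50.0]
logged = [math.log(v + 1) for v in skewed]
```

Here `flagged` lists the pairs a reviewer might merge into a composite or thin out, and comparing `skewness(skewed)` with `skewness(logged)` shows how far the transform pulls in the long tail.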
Composite Variables

These are created from two related variables that show similar patterns and can share the same denominator. The technique can combine highly correlated variables or variables that each represent a small part of the population (Openshaw & Rao 1995).

Uncertain Variables

Some variables were categorized on the basis of the enumerator's judgment of what was observed. For example, the census count of second homes fell from 11,550 in 1991 to 10,500 in 2001, which seems improbable given the continuing trend of second-home purchases in the region over the period. Tax register figures suggested that the actual number is three times that given in the census (Kaufman & Rousseeuw 1990).

Data Standardization

Variables had to be standardized to the same range before clustering so that each carried similar weight in the classification. This is particularly important when the variables are of different types. An example is population density, which gives the number of people per unit area, and housing, which is a proportion of all households. Population density is limited only by the number of people who can fit in a given area and ranges from zero to 12,715 people per hectare, whereas the housing variable ranges from zero to 100%. The variables are not on the same scale, and if they were left unstandardized, population density would dominate the classification because of its larger range. The clustering techniques are based on the similarity or dissimilarity of the cases being grouped, measured with a distance matrix constructed from the variable values for each case. Variables with large dispersion have more influence on the final measure of similarity. To represent each variable equally, the data had to be standardized, and three methods were considered.

Z-score standardization is the most widely used form. It compares each value x_i of a variable with the variable's mean, and the difference is divided by the variable's standard deviation.
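As a rough sketch, the z-score just described can be placed beside the two alternatives the text goes on to discuss, range and inter-decile range standardization. The function names and the `density` values are hypothetical, the population standard deviation is assumed for the z-score, and the "inclusive" percentile rule is one of several possible conventions; the source does not say which were used.

```python
import statistics

def z_score(values):
    """(x - mean) / standard deviation; population SD is an assumption."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def range_standardize(values):
    """(x - min) / (max - min); results lie between 0 and 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def inter_decile(values):
    """(x - median) / (90th percentile - 10th percentile)."""
    median = statistics.median(values)
    # quantiles(n=10) returns the nine cut points between deciles.
    deciles = statistics.quantiles(values, n=10, method="inclusive")
    p10, p90 = deciles[0], deciles[-1]
    return [(v - median) / (p90 - p10) for v in values]

# Hypothetical population densities with one extreme ward (invented values).
density = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 1000.0]

z = z_score(density)
r = range_standardize(density)
d = inter_decile(density)
```

With these invented values the single extreme ward squeezes every other ward into the bottom tenth of the range-standardized scale, while the inter-decile version keeps the bulk of the wards spread out and leaves the extreme ward as a large value.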
Z-scores work well for normally distributed data, but data are not always normally distributed. Range standardization, applied in the 1991 classification, compares each value of a variable with the variable's minimum, and the difference is divided by the distance between the minimum and the maximum (Wallace & Denham 1996). However, the technique is ineffective for data containing outliers. The third method is inter-decile range standardization, which overcomes the problems related to outliers: it compares each value of a variable with the median, and the difference is divided by the distance between the 90th and 10th percentiles. Initial experiments with inter-decile range standardized data showed that variables with extremely skewed distributions were steering the classification, because these variables received more weight. Range standardization of the ward-level data was therefore used to remove this excess weight.

Clustering Technique

Hierarchical cluster analysis techniques fall into two major categories: agglomerative methods, which proceed by a series of fusions, and divisive methods, which separate groups into successively finer groupings. Ward's method was created as a technique “to cluster large numbers of objects, symbols or persons into smaller numbers of mutually exclusive groups, each having members that are as much alike as possible” (Ward 1963, p. 236). At each step of the hierarchy, objects are merged to yield fewer but larger clusters than at the preceding step. The procedure continues until all objects are grouped into a single cluster (Ward 1963). Everitt (1993) shows that agglomerative techniques are the most commonly used. They tend to produce roughly spherical clusters of similar size, uniting objects into progressively larger clusters on the basis of a distance measure. Working bottom-up, the method starts from groups that each contain a single object and combines them.
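The bottom-up merging can be sketched with the usual formulation of Ward's criterion, in which the pair of clusters merged at each step is the one whose fusion adds least to the within-cluster sum of squares. This is a small illustration under assumptions: the helper names and the six points are invented, and the brute-force search over all pairs is only practical for tiny inputs.

```python
def centroid(points):
    """Mean position of a list of equal-length coordinate tuples."""
    dims = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dims)]

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    dist_sq = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * dist_sq

def ward_cluster(points, k):
    """Merge single-object clusters bottom-up until k clusters remain.

    Returns the clusters and the list of merge costs (the agglomeration schedule).
    """
    clusters = [[p] for p in points]
    schedule = []
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost(clusters[i], clusters[j])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        cost, i, j = best
        schedule.append(cost)
        # Merged clusters are never split again; they can only be joined further.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters, schedule

# Hypothetical standardized cases forming two obvious groups (invented values).
cases = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
clusters, schedule = ward_cluster(cases, k=2)
```

The `schedule` list records the cost of each fusion in order; a sharp jump in it is the usual sign that two genuinely different groups have just been merged.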
At the next stage, either two other cases are combined to make a new cluster or a third case is added to the existing cluster. Once formed, clusters cannot be divided; they can only be joined with other clusters. The linkage method determines what is compared between groups containing more than one observation when measuring similarity or dissimilarity. Because Ward's method is agglomerative, cluster means change as new cases are added, and by the end some cases are in the wrong clusters; the solution can be refined with k-means. This technique minimizes the variability within clusters and maximizes the variability between clusters. It is an iterative relocation algorithm based on sums of squares and requires the number of clusters to be specified. The algorithm repeatedly moves a case to see whether the move improves the within-cluster sum of squares, and reassigns the case to the cluster where the move yields a significant improvement. A stable classification is reached when no case moves during a complete iteration.

Classification of Statistical Wards

The 1991 classification of statistical wards was based on a sample of wards, which was selected and classified into clusters. The remaining wards were then assigned to the most similar clusters, although there was a risk of bias from the areas missed by the sample. A different approach, recommended by the Advisory Board, was used for the 2001 classification (Charlton, Openshaw & Wymer 1985). The method has several steps. First, all wards are classified at random into 1,000 clusters. K-means is then run with starting cluster centres taken from the random classification to reach an optimal 1,000-cluster solution. Ward's method is applied to the 1,000 clusters obtained from k-means. Subgroups, groups, and supergroups are determined by examining the agglomeration schedule. Each ward is assigned to its correct subgroup after refinement with k-means.
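The k-means refinement can be sketched as follows. Note the substitution: this uses the common batch update (assign every case to its nearest centre, then recompute the centres) rather than the case-by-case relocation described above, although the stopping rule is the same, namely that no case changes cluster in a complete pass. The function names, starting centres, and points are hypothetical, and the scale is far smaller than the 1,000 clusters used for the wards.

```python
def nearest(point, centers):
    """Index of the centre with the smallest squared Euclidean distance."""
    dists = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centers]
    return dists.index(min(dists))

def k_means(points, start_centers, max_iter=100):
    """Refine a starting set of centres until no case changes cluster."""
    centers = [list(c) for c in start_centers]  # copy so the input is untouched
    labels = [None] * len(points)
    for _ in range(max_iter):
        new_labels = [nearest(p, centers) for p in points]
        if new_labels == labels:
            break  # stable: nothing moved during a complete pass
        labels = new_labels
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centre
                dims = len(members[0])
                centers[k] = [sum(m[d] for m in members) / len(members)
                              for d in range(dims)]
    return labels, centers

# Hypothetical standardized cases and deliberately poor starting centres.
cases = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8), (5, 4)]
labels, centers = k_means(cases, start_centers=[(1, 1), (2, 1)])
```

Starting from two centres that both sit inside the same group, the third case is first placed with the distant group and is reallocated on the next pass, which is the kind of correction the refinement step is meant to make.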
The other levels were obtained from the resulting hierarchy (Charlton, Openshaw & Wymer 1985). Because the method is agglomerative, subgroup centroids change as each new ward is added, but the process allows individual wards to be reallocated to the nearest subgroup. At the end of the procedure, certain wards were more similar to wards in other subgroups, and k-means reallocates them to their correct subgroups. The process is repeated until the results are stable. To retain the hierarchical structure, the reallocation is never carried out at group or supergroup level. This is because wards could be reallocated to new groups or supergroups if Ward's method had originally placed them in subgroups belonging to different groups or supergroups.

Conclusion

The decisions made during classification, and the reasons for them, have been discussed, together with the reasons variables were included in or excluded from the classification. The discussion describes how the classification database was built and the careful data checks performed on it. The essay also explains how the classification was created and the clustering process behind it.

Reference list

Charlton, M, Openshaw, S & Wymer, C 1985, Some new classifications of census Enumeration Districts in Britain: a poor man's ACORN, Journal of Economic and Social Measurement, vol. 13, pp. 69-96.
Everitt, BS 1993, Cluster Analysis, Edward Arnold, London.
Everitt, BS, Landau, S & Leese, M 2001, Cluster Analysis, Edward Arnold, London.
Kaufman, L & Rousseeuw, PJ 1990, Finding Groups in Data, John Wiley & Sons, New York.
Openshaw, S & Rao, L 1995, Algorithms for reengineering 1991 Census geography, Environment and Planning, vol. 27, pp. 425-446.
Wallace, M & Denham, C 1996, The ONS classification of local and health authorities of Great Britain, Studies on Medical and Population Subjects, ONS.
Ward, JH 1963, Hierarchical grouping to optimize an objective function, Journal of the American Statistical Association, vol. 58, pp. 236-244.
