The Classification of a Statistical Ward - Coursework Example

Summary
The paper "The Classification of a Statistical Ward" discusses the decisions made during classification and the reasons for them, including why variables were included. The discussion describes the building of the classification database…

Extract of sample "The Classification of a Statistical Ward"

Geography, Area Classification, and Methodology

This discussion concerns the methodology for classifying statistical wards, in which each ward is placed in a group of wards with similar census variables. Statistical wards resulted from a policy introduced in National Statistics to minimize the statistical impact of frequent changes to electoral ward boundaries in England. Each statistical ward was required to have a minimum population of one thousand, and wards with fewer residents were combined to reach that minimum. The classification groups similar areas according to their combined characteristics.

Variable Selection

The objective is to choose the smallest number of variables that fully represents the major dimensions in the data. The variables used are demographic and socio-economic and cover six dimensions: household composition, demographic structure, socio-economic character, housing, industry sector, and employment. The variables were selected in a procedure with several steps. The first step considers variables from the key statistics tables. The second step merges variables to create composite variables. The third step examines the correlation matrix and removes strongly correlated variables, so that no single aspect of the census data has too much influence on the result. The last step excludes variables regarded as badly behaved or with a high proportion of zeros. The Advisory Board was consulted and proposed conducting a principal component analysis to aid variable selection. The objective of sorting the variables was to select the smallest possible number of variables that adequately represents the major dimensions of the 2001 Census data.

Setting of the Variables

At the outset, five main domains were identified; between them, they were intended to represent fully the main dimensions of the census data within the classification.
The domains identified include demographic structure, household composition, socio-economic character, and employment. The preliminary dataset consisted of the output area level key statistics table variables, which represent the most important variables in the published census data. After a detailed assessment of each variable, this initial set was reduced so that the main dimensions of the census data were represented with the minimum number of variables. The process eliminated any variable that added nothing to the classification, and in some cases a composite variable was used to reduce the number of variables. Variables representing very small sectors of the population were removed. Migration indicators were omitted because the data were not available. Optional questions, such as religion, whose response rates varied between different areas of the UK, were also omitted. Strongly correlated variables were identified and their number reduced. The Pearson correlation coefficient was used to identify pairs of variables that would give too much weight to a single characteristic if both were included in the classification (Wallace & Denham 1996). The ONS Project Board and the School of Geography discussed and decided on the method for selecting variables. The guidelines set for the inclusion of variables include the following.

Highly Correlated Variables

Strong correlations in a dataset are undesirable for cluster analysis because they represent redundancy. A correlated variable repeats most of the information already contained in another variable, which makes it hard to gauge the effect of any single variable on the clustering process.

Badly Behaved Variable Distribution

Clustering and standardization work reliably for normally distributed variables. Highly skewed distributions, however, can create difficulties for both. This kind of problem can be addressed by a logarithmic transformation, by ranking the data, or by taking the square root of the data.
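The two screening guidelines above can be illustrated with a short sketch. It is not the procedure used for the 2001 classification: the variable names, the sample values, and the 0.9 correlation threshold are all assumptions made for illustration, and only the Python standard library is used.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(columns, threshold=0.9):
    """Keep a variable only if |r| with every already-kept variable is below threshold.

    `columns` maps a variable name to its values (one value per ward).
    The default threshold is an illustrative assumption, not a figure from the text.
    """
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


def log_transform(values):
    """log(1 + x) compresses the long right tail of a skewed, non-negative variable."""
    return [math.log1p(v) for v in values]


def sqrt_transform(values):
    """Square root is a milder alternative for non-negative skewed data."""
    return [math.sqrt(v) for v in values]


def rank_transform(values):
    """Replace each value by its rank (1 = smallest); ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


# Hypothetical ward-level percentages, invented for this example.
wards = {
    "pct_age_65_plus": [10.0, 20.0, 30.0, 40.0, 50.0],
    "pct_pensioner_households": [11.0, 19.0, 31.0, 42.0, 49.0],
    "pct_flats": [50.0, 10.0, 40.0, 20.0, 30.0],
}
kept = drop_correlated(wards)  # the second variable nearly duplicates the first
```

In this sketch the first variable of a highly correlated pair is retained simply because it comes first; in practice the choice of which variable to keep, or whether to merge the pair into a composite, is a judgement about which one better represents the dimension.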

Composite Variables

Composite variables are created from two related variables that show similar patterns and can share the same denominator. The technique can combine highly correlated variables or variables that each represent a small part of the population (Openshaw & Rao 1995).

Uncertain Variables

Some variables were recorded on the basis of the enumerator's judgement of what was observed. For example, the census count of second homes fell from 11,550 in 1991 to 10,500 in 2001, which seems improbable given the continuing trend of second-home purchases in the region throughout the period. Tax register figures suggested that the actual number was three times that given in the census (Kaufman & Rousseeuw 1990).

Data Standardization

Variables had to be standardized over the same range before clustering to ensure that each carried a similar weight in the classification. This is particularly important when the data are of dissimilar types. An example is population density, which gives the number of people per unit area, and a housing variable, which is expressed as a proportion of all households. Population density is limited by the number of people who can fit in a given area and ranges from zero to 12,715 people per hectare, whereas the housing variable ranges from zero to 100%. The variables are not on the same scale, and if they were left unstandardized, population density would dominate the classification because of its larger range.

The clustering techniques are based on the similarity or dissimilarity of the cases being grouped. A distance matrix, computed from the values of the variables for each case, was constructed to measure this. Variables with a large dispersion have more influence on the final measure of similarity. To give every variable equal weight, the data had to be standardized, and three methods were considered.

Z-score standardization is the most widely used form of standardization. It compares each value x_i of a variable with the variable's mean, and the difference is then divided by the standard deviation of that variable.
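As a hedged illustration, the sketch below implements z-score standardization together with the range and inter-decile range variants discussed in this section. The function names are hypothetical, the population standard deviation is an assumed choice (the text does not say which form was used), and the percentile interpolation rule is that of Python's `statistics.quantiles` with `method="inclusive"`, which may differ from the rule used in the original work.

```python
import statistics


def z_score(values):
    """(x - mean) / standard deviation; population SD is an assumption here."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def range_standardize(values):
    """(x - min) / (max - min), which maps every variable onto 0..1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def inter_decile_standardize(values):
    """(x - median) / (90th percentile - 10th percentile)."""
    median = statistics.median(values)
    deciles = statistics.quantiles(values, n=10, method="inclusive")
    p10, p90 = deciles[0], deciles[-1]
    return [(v - median) / (p90 - p10) for v in values]


# Invented values on the two scales mentioned in the text:
# people per hectare (0 to 12,715) and a percentage (0 to 100).
density = [0.0, 50.0, 12715.0]
housing_pct = [0.0, 50.0, 100.0]

# After range standardization both variables span 0..1, so neither
# dominates a distance calculation purely because of its units.
density_std = range_standardize(density)
housing_std = range_standardize(housing_pct)
```

Note how the single extreme density value pushes the other two standardized densities close to zero, which is the sensitivity of range standardization to outliers.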
Z-score standardization works well for normally distributed data, but data are not always normally distributed. Range standardization, which was applied in the 1991 classification, compares each value of a variable with the variable's minimum and divides the difference by the distance between the minimum and the maximum (Wallace & Denham 1996). However, the technique is sensitive to outliers. The third method is inter-decile range standardization, which overcomes the problems related to outliers. It compares each value of a variable with the median and divides the difference by the distance between the 90th percentile and the 10th percentile. Initial experiments with inter-decile range standardized data revealed that variables with extremely skewed distributions were steering the classification, because these variables received more weight. Range standardization was therefore used to standardize the ward-level data and avoid this excess weight.

Clustering Technique

Hierarchical cluster analysis techniques fall into two major categories: agglomerative methods, which proceed by a series of fusions, and divisive methods, which separate groups into successively finer groupings. Ward's method was created as a technique "to cluster large numbers of objects, symbols or persons into smaller numbers of mutually exclusive groups, each having members that are as much alike as possible" (Ward 1963, p. 236). At each step of the hierarchy the number of clusters is reduced, yielding fewer but larger clusters. The procedure continues until all objects are grouped into a single cluster (Ward 1963). Everitt (1993) notes that agglomerative techniques are the most widely used. They tend to produce roughly spherical clusters of similar size, uniting objects into progressively larger clusters on the basis of a distance measure. Clusters are formed bottom-up, starting from groups that each contain a single object.
At the next stage, either two other cases are combined to make a new cluster or a third case is added to an existing cluster. Once formed, clusters cannot be divided; they can only be joined with other clusters. The linkage method determines what is compared when measuring the similarity or dissimilarity between groups that contain more than one observation. Because Ward's method is agglomerative, cluster means change as new cases are added, so by the end some cases sit in the wrong cluster; the solution can then be refined with k-means. K-means minimizes the variability within clusters and maximizes the variability between clusters. It is an iterative relocation algorithm based on the sum of squares, and it requires the number of clusters to be specified in advance. The algorithm repeatedly moves a case to another cluster to see whether the move improves the within-cluster sum of squares, and reassigns the case to the cluster that yields the greatest improvement. A stable classification is reached when no case moves during a complete iteration.

Classification of Statistical Wards

The 1991 classification of statistical wards was based on a sample drawn from all wards, which was classified into clusters. The remaining wards were then assigned to the most similar clusters, although there was a risk of bias from the areas missed by the sample. A different approach, recommended by the Advisory Board, was used for the 2001 classification (Charlton, Openshaw & Wymer 1985). The method has several steps. First, a random classification of the wards into 1,000 clusters is produced. Second, k-means is run with starting cluster centres taken from the random classification to reach an optimized 1,000-cluster solution. Third, Ward's method is applied to the 1,000 clusters obtained from k-means. Fourth, subgroups, groups, and supergroups are determined by examining the agglomeration schedule. Finally, each ward is assigned to its correct subgroup after refinement with k-means.
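The k-means refinement used in these steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the software used for the 2001 classification: it uses the common batch update (every case is assigned to its nearest centre, then the centres are recomputed) rather than the case-by-case relocation described in the text, and the points and starting centres are invented, where the real procedure took its starting centres from a random classification.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, centers, max_iter=100):
    """Refine starting cluster centres by iterative reassignment.

    Returns (labels, centers). Stops when a full pass moves no case.
    """
    centers = [tuple(c) for c in centers]
    labels = [None] * len(points)
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda k: squared_distance(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # stable classification: nothing moved in a complete iteration
        labels = new_labels
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centre
                centers[k] = centroid(members)
    return labels, centers


def within_cluster_ss(points, labels, centers):
    """Total within-cluster sum of squares, the quantity k-means reduces."""
    return sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))


# Invented standardized values for six cases on two variables.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 2.0), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
# Deliberately poor starting centres: both lie in the lower-left group.
start = [(1.0, 1.0), (1.5, 2.0)]

labels, centers = k_means(points, start)
# labels == [0, 0, 0, 1, 1, 1]
```

Even from poor starting centres, the second centre is pulled toward the upper-right cases after the first pass, and the two remaining lower-left cases are then reassigned to the first cluster. With less clearly separated data the result can depend on the starting centres, which is one reason the starting classification matters.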
The other levels of the classification were obtained from the hierarchy produced by Ward's method (Charlton, Openshaw & Wymer 1985). Because Ward's method is agglomerative, subgroup centroids change as new wards are added, although the overall process permits individual wards to be reallocated to the nearest subgroup. At the end of the agglomeration, certain wards were more similar to wards in other subgroups, and k-means reallocates them to their correct subgroups. This process is iterative and continues until the results are stable. To retain the hierarchical structure, the reallocation is never carried out at group or supergroup level. Otherwise, wards could be reallocated to new groups or supergroups when their original subgroups had been placed in different groups or supergroups by Ward's method.

Conclusion

The decisions made during classification and the reasons for them have been discussed, together with the reasons why variables were included in or excluded from the classification. The discussion describes the building of the classification database and the careful data checks performed on it. The essay also explains how the classification was created and the clustering process behind it.

Reference list

Charlton, M, Openshaw, S & Wymer, C 1985, Some new classifications of census enumeration districts in Britain: a poor man's ACORN, Journal of Economic and Social Measurement, vol. 13, pp. 69-96.
Everitt, BS 1993, Cluster Analysis, Edward Arnold, London.
Everitt, BS, Landau, S & Leese, M 2001, Cluster Analysis, Edward Arnold, London.
Kaufman, L & Rousseeuw, PJ 1990, Finding Groups in Data, John Wiley & Sons, New York.
Openshaw, S & Rao, L 1995, Algorithms for reengineering 1991 Census geography, Environment and Planning, vol. 27, pp. 425-446.
Wallace, M & Denham, C 1996, The ONS classification of local and health authorities of Great Britain, Studies on Medical and Population Subjects, ONS.
Ward, JH 1963, Hierarchical grouping to optimize an objective function, Journal of the American Statistical Association, vol. 58, pp. 236-244.