Data Mining and Visualization

Given the rapid development of the internet, distributed databases have become a commonly used environment in many areas (Givano, 2003). Evidently, mining association rules in distributed databases has become a critical task. An algorithm is defined as a finite list of well-defined instructions for completing a certain task. Starting from an initial state, it proceeds through a clearly defined series of states and eventually terminates in an end state. The concept of the algorithm has its origin in the recording of procedures used for solving mathematical problems.

Name of Algorithm: Euclid’s Algorithm

Criteria of Euclid’s Algorithm

To measure was defined as placing a shorter measuring length S successively (q times) along a longer length L until the remaining portion r is less than the shorter length S. In other words, the remainder is r = L – q × S, where q is the quotient and r is the modulus, that is, the integer part left over after the division. For this method to work, the starting lengths must satisfy two requirements. First, the lengths must not be zero. Secondly, the subtraction must be proper: a test must guarantee that the smaller of the two lengths is subtracted from the larger one.

Description of Euclid’s Algorithm

This algorithm was formulated by Euclid, who posed the following mathematical problem: given two numbers that are not prime to each other, find their greatest common measure. Here a number is defined as a multitude composed of units, that is, a counting number or positive integer excluding zero. Euclid’s original proof adds a third requirement, namely that the two lengths are not prime to one another. Euclid stipulated this in order to construct a proof that the common measure found for the two numbers is in fact the greatest.
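
The measuring procedure described above can be sketched in Python as repeated subtraction. This is a minimal illustration, not Euclid's own formulation: the function name gcd_by_subtraction and the decision to raise ValueError for non-positive inputs are assumptions made for the example.

```python
def gcd_by_subtraction(a: int, b: int) -> int:
    """Greatest common measure of two positive integers by repeated subtraction."""
    # Requirement 1: neither length may be zero (negative values are rejected too).
    if a <= 0 or b <= 0:
        raise ValueError("both lengths must be positive integers")
    while a != b:
        # Requirement 2: always subtract the smaller length from the larger one.
        if a > b:
            a -= b
        else:
            b -= a
    # When the two lengths are equal, nothing more can be subtracted.
    return a


print(gcd_by_subtraction(49, 21))  # prints 7
```

The loop stops when both values are equal, which corresponds to the point where the remaining length can no longer be subtracted from itself.
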
Yanhong (2002) gives an example diagram of Euclid’s algorithm along with other details. Euclid does not extend beyond the third measuring and does not provide a numerical example. Nicomachus provides the example of 21 and 49. When the lesser is subtracted from the greater, 28 is left; the same 21 is then subtracted again and 7 is left. This 7 is subtracted from 21, leaving a remainder of 14, from which 7 is subtracted again, leaving 7. Seven cannot be subtracted from 7, so 7 is the greatest common measure of 21 and 49 (Yanhong, 2002).

[Diagram of Euclid’s algorithm]

Advantages of Euclid’s Algorithm

Euclid’s algorithm has various advantages. First, it is a step-by-step representation of the solution to a given problem with a definite procedure, so it is easy to understand. It is also easy to convert into a flowchart and then into a computer program. Additionally, the algorithm is independent of any programming language, and because every step follows a logical sequence it is easy to debug.

Disadvantages of Euclid’s Algorithm

Developing and using the algorithm is somewhat cumbersome and time consuming, because the algorithm has to be developed first, then converted into a flowchart, and finally turned into a computer program.

Name of Algorithm: Force-based

Criteria of Force-based Algorithm

Force-based algorithms achieve a layout by assigning forces among the set of edges and the set of nodes. The most straightforward method assigns forces as though the edges were springs and the nodes were electrically charged particles. The graph is then simulated as though it were a physical system, with the forces applied to the nodes pulling them closer together or pushing them further apart. The process is repeated iteratively until the system reaches an equilibrium state.
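
The spring-and-charge simulation described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not a reference implementation: the function name force_layout, all numeric constants (spring length, spring and repulsion strengths, step size, movement cap, tolerance), and the random starting positions are illustrative choices that do not come from the text.

```python
import math
import random


def force_layout(nodes, edges, iterations=500, spring_len=1.0, spring_k=0.1,
                 repulse_k=0.05, step=0.5, max_move=1.0, tol=1e-6, seed=0):
    """Return {node: (x, y)} for a list of nodes and a list of (u, v) edges."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        # Every pair of nodes repels, like electrically charged particles.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                f = repulse_k / dist ** 2
                force[a][0] += f * dx / dist
                force[a][1] += f * dy / dist
                force[b][0] -= f * dx / dist
                force[b][1] -= f * dy / dist
        # Every edge acts as a spring that pulls towards its rest length.
        for a, b in edges:
            dx, dy = pos[b][0] - pos[a][0], pos[b][1] - pos[a][1]
            dist = math.hypot(dx, dy) or 1e-9
            f = spring_k * (dist - spring_len)
            force[a][0] += f * dx / dist
            force[a][1] += f * dy / dist
            force[b][0] -= f * dx / dist
            force[b][1] -= f * dy / dist
        # Move each node along its net force, capping the distance per step.
        largest = 0.0
        for n in nodes:
            fx, fy = force[n]
            move = step * math.hypot(fx, fy)
            scale = step if move <= max_move else step * max_move / move
            pos[n][0] += scale * fx
            pos[n][1] += scale * fy
            largest = max(largest, min(move, max_move))
        # Equilibrium: no node moved by more than the tolerance.
        if largest < tol:
            break
    return {n: (p[0], p[1]) for n, p in pos.items()}


layout = force_layout(["a", "b", "c"], [("a", "b"), ("b", "c"), ("c", "a")])
for node, (x, y) in layout.items():
    print(node, round(x, 3), round(y, 3))
```

For a triangle the three edge lengths should come out roughly equal, since each spring settles where its pull balances the repulsion between its two endpoints. The nested loop over all node pairs is the main cost of each iteration.
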
At equilibrium, the relative positions of the nodes no longer change from one iteration to the next, and the graph is drawn at that moment. The physical interpretation of this state is that all the forces are in mechanical equilibrium. Force-directed algorithms have various properties (Ricardo & Baeza-Yates, 2000).

Description of Force-based Algorithm

Force-based algorithms belong to the class of algorithms for drawing graphs in an aesthetically pleasing way (Ricardo & Baeza-Yates, 2000). They position the nodes of a graph in two-dimensional or three-dimensional space (Ricardo & Baeza-Yates, 2000) so that all the edges are of more or less equal length and there are as few crossing edges as possible.

Advantages

The first advantage is good-quality results. For graphs of medium size, such as 50 to 100 vertices, the results obtained are usually very good with respect to uniform edge length, uniform vertex distribution, and symmetry. The last of these is a vital criterion that is hard to achieve with any other type of algorithm.

The second advantage is flexibility. Force-directed algorithms can easily be adapted and extended to satisfy additional aesthetic criteria (Shyamasundar, 1999), which makes them an extremely versatile class of algorithms. Existing extensions include those for 3D graph drawing, directed graphs, cluster graph drawing, and dynamic graph drawing.

The third advantage is intuitiveness. Because the algorithms are based on physical analogies with common objects such as springs, their behavior is easy to predict.

The fourth advantage is simplicity.
Typical force-directed algorithms are simple and can be implemented in a few lines of code.

The last advantage is interactivity. By drawing the intermediate stages, the user can follow how the graph evolves, watching it unfold from a tangled mess into a good-looking configuration. In some interactive graph drawing tools, the user can pull one or more nodes out of their equilibrium state and watch them return to their original position. This makes force-directed algorithms an appropriate choice for dynamic and online graph drawing systems (Shyamasundar, 1999).

Disadvantages

There are two main disadvantages of force-directed algorithms. First, they have a high running time. A typical force-directed algorithm has a running time of O(n^3), where n is the number of nodes of the input graph. Secondly, they are prone to poor local minima. A force-directed algorithm produces a layout with minimal energy, but that total energy may be only a local minimum rather than the global one.

Name of Algorithm: PageRank

Criteria of PageRank Algorithm

PageRank assigns a numerical weighting to each element of a hyperlinked set of documents, with the intention of measuring its relative importance within the set (Newman & Watts, 2006). The algorithm can be applied to any collection of entities with reciprocal references and quotations, as web pages have. The numerical weight assigned to an element E is called the PageRank of E and is denoted PR(E). The principle can be illustrated by a cartoon of faces (Shyamasundar, 1999) in which the size of each face is proportional to the total size of the other faces pointing to it.
In its original formulation, PageRank is a mathematical algorithm based on the webgraph, in which World Wide Web pages serve as nodes and hyperlinks as edges, taking into consideration authority hubs such as Usa.gov. The rank value denotes the importance of a particular page. A hyperlink to a page counts as a vote of support for it. The PageRank of a page is defined recursively and depends on the number of pages linking to it and on their own PageRank. A page that is linked to by many pages with high PageRank receives a high rank itself. Conversely, where there are no links to a web page, there is no support for that page.

Description of PageRank Algorithm

It is undisputed that this is the computer era. The internet is part and parcel of everyday life, and information is only a click away. All one has to do is open a favorite search engine such as Yahoo, Alta Vista, or Google and type a keyword, and the search engine displays the pages relevant to the search. The prime question is how the search engine operates. In the early 1990s, the first search engines used a text-based ranking system to decide which pages were most relevant to a given query. This approach had various shortcomings; in particular, a search on a common term such as "internet" was quite problematic. Modern search engines use more elaborate methods than plain text ranking to order results so that the best results come first. The PageRank algorithm is one of the most influential algorithms for computing the relevance of web pages, and it has been employed by the Google search engine.
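
The recursive definition given under the PageRank criteria, in which a page's rank depends on the ranks of the pages linking to it, can be approximated by repeated updating. The sketch below is a hedged illustration rather than Google's implementation: the function name pagerank, the damping factor of 0.85, the fixed number of iterations, and the rule that a page without outgoing links spreads its rank over all pages are assumptions that do not appear in the text.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    # Start with the same rank for every page.
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share (assumed damping model).
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # A link is a vote: the page splits its rank among its targets.
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Assumed rule: a page with no outgoing links votes for everyone.
                share = damping * rank[p] / n
                for t in pages:
                    new[t] += share
        rank = new
    return rank


web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

In this small example page c is linked to by three pages and ends up with the highest rank, page a follows because its single vote comes from the highly ranked c, and pages b and d, which nobody links to, keep only the baseline share.
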
The idea of PageRank was brought up by Sergey Brin and Larry Page and was made a Google trademark in 1998. The algorithm belongs to the family of link analysis algorithms (Newman & Watts, 2006).

Advantages of PageRank Algorithm

The PageRank algorithm has the advantage of being flexible. In any context, it is possible to compute a context-sensitive PageRank score with the use of a classifier: one computes the similarity of the context to each topic and weights the topic-sensitive PageRank vectors accordingly. This makes it possible to treat diverse sources of search context such as emails, browsing history, bookmarks, and query history uniformly.

PageRank is also transparent. The topic-based rank vectors possess an intuitive interpretation. If a system gives undue preference to a certain topic, one can tune the classifier applied to the search context, or adjust the topic weights manually. When user context is used, the user can be shown the topics that the system believes best represent their interests.

Privacy

Some forms of search context raise potential privacy concerns. It is not proper to send a user's browsing history or personal details to a search engine in order to construct a profile.

Disadvantage

The major disadvantage of the PageRank algorithm is that it favors older pages. A new page, however good, will not have many links to it unless it forms part of an existing site.

Name of Algorithm: Bitap

Criteria for Bitap Algorithm

The Bitap algorithm was proposed by Gaston Gonnet and Ricardo Baeza-Yates. The original version dealt only with letter substitution, that is, it computed the Hamming distance.
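
A substitution-only variant of this kind can be sketched with Python integers used as bit vectors. This is an illustrative sketch, not the authors' original code: the function name bitap_hamming, the "shift-and" bit convention (a set bit means a prefix still matches), and the returned list of start positions are assumptions made for the example. Python integers are unbounded, so the sketch does not show the machine-word limit of a real implementation.

```python
def bitap_hamming(text, pattern, k):
    """Start indices where pattern matches text with at most k substitutions."""
    m = len(pattern)
    if m == 0:
        return []
    # One bitmask per character: bit i is set if pattern[i] is that character.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    accept = 1 << (m - 1)      # bit that signals a complete match
    full = (1 << m) - 1        # keeps the state within m bits
    # state[d] bit i set: pattern[:i + 1] matches the text ending at the
    # current position with at most d mismatching characters.
    state = [0] * (k + 1)
    hits = []
    for pos, ch in enumerate(text):
        mask = masks.get(ch, 0)
        prev_old = 0
        for d in range(k + 1):
            old = state[d]
            # Extend every partial match whose next character agrees.
            new = (((old << 1) | 1) & full) & mask
            if d > 0:
                # Or extend a match with one fewer error by a substitution.
                new |= ((prev_old << 1) | 1) & full
            state[d] = new
            prev_old = old
        if state[k] & accept:
            hits.append(pos - m + 1)
    return hits


print(bitap_hamming("abracadabra", "abra", 0))       # prints [0, 7]
print(bitap_hamming("the quick brown fox", "quack", 1))  # prints [4]
```

Each text character costs k + 1 shift, AND, and OR operations regardless of what the text contains, which is why the running time is predictable.
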
The algorithm was later modified by Udi Manber and Sun Wu, who extended it to compute the Levenshtein distance. In this form, the algorithm tells whether a given text contains a substring that is approximately equal to a given pattern, where approximate equality is defined in terms of Levenshtein distance: a substring and the pattern are considered equal if they are within a certain distance k of each other. The Bitap algorithm starts by precomputing a set of bitmasks containing one bit for each element of the pattern (Udi, 2009). It can then do most of its work with bitwise operations, which are extremely fast. The algorithm is used in the Unix utility agrep, which performs fuzzy matching of regular expressions (Udi, 2009). Because of the data structures it requires, the algorithm works best on patterns no longer than a constant length (typically the machine word length), and it prefers inputs over a small alphabet. Once implemented for a given alphabet and word length, its running time is predictable: it runs in O(mn) operations regardless of the structure of the text or the pattern (Udi, 2009).

Description of the Bitap Algorithm

The Bitap algorithm, also known as the Shift-or or Baeza-Yates-Gonnet algorithm, is an approximate string matching algorithm (Yanhong, 2002). Bitap and its modifications are most often used for fuzzy search without indexing. For instance, a variation of it is used in the Unix utility agrep, which works like the standard grep but supports errors in the search query, while providing only limited support for regular expressions.

Advantages

First, this algorithm can easily and correctly isolate regions that have the same properties. Secondly, it yields clear segmentation results with clear edges for the original images.
Thirdly, the concept behind the algorithm is quite simple, needing only a limited number of points to represent the required property.

Disadvantages

First, the computation is time consuming. Additionally, variations in noise and intensity may lead to over-segmentation. Last but not least, the method may not clearly distinguish the real images.

Name of Algorithm: Phonetic

Criteria for Phonetic Algorithm

The best-known phonetic algorithms include Soundex, Caverphone, Daitch–Mokotoff Soundex, Metaphone and Double Metaphone, the Match Rating Approach, and Kölner Phonetik. Soundex was developed to encode surnames for use in censuses. Soundex codes are four-character strings consisting of a single letter followed by three digits. Daitch–Mokotoff Soundex is a refinement of Soundex designed to better match surnames of Germanic and Slavic origin. Its codes are strings of six numeric digits.

Description of the Phonetic Algorithm

These algorithms were mostly developed for use with the English language. Consequently, applying their rules to words in other languages may not give a meaningful result.

Advantages

Phonetic algorithms are used for indexing words (Balint, 2001).

Disadvantages of Phonetic Algorithm

Phonetic algorithms are quite complex, with many rules and exceptions, because English spelling and pronunciation are complicated by words and pronunciations borrowed from many other languages (Balint, 2001).

Conclusion

In conclusion, it is indisputable that no human can write fast enough, long enough, or small enough to list all the members of an enumerably infinite set by writing out their names, one after another, in some notation. This necessitates the use of algorithms.
As far as this paper is concerned, the algorithms discussed above are listed as data mining algorithms identified by the IEEE International Conference on Data Mining, and they are influential data mining algorithms in the research community. Force-based algorithms are essential because they draw graphs in an aesthetically pleasing way, positioning the nodes of a graph in two-dimensional or three-dimensional space. The Bitap algorithm, on the other hand, performs approximate string matching and is used in particular for fuzzy search. PageRank, also explored in this paper, belongs to link analysis and is important because it is used by the Google search engine. It assigns a numerical weighting to each element of a hyperlinked set of documents with the intention of measuring its relative importance within the set. Finally, phonetic algorithms are commonly used for indexing words.

References

Balint, D., 2001. The algorithm for syntactical analysis. Computational Linguistics, Hungarian Academy of Science, pp. 27–47.
Givano, N., 2003. The Social Network Analysis: Applications and Methods. Cambridge: Cambridge University Press.
Jiang, B., 2008. Self-organized natural roads for purposes of predicting traffic flows: a sensitivity study. Journal of Statistical Mechanics: Theory and Experiment, P07008 (8): 8.
Manber, S., 2005. Fast text searching with an error. New York: University of Arizona.
Myers, W., 2002. The fast bit-vector algorithm for approximating string matching basing on the dynamic programming. Journal of the ACM, 46 (3), 395–425.
Newman, M., & Watts, D., 2006. Structures and Dynamics of Networks. Princeton. London: Princeton University.
Ricardo, A., & Baeza-Yates, G., 2000. A New Approach towards Text Searching. Communications of the ACM, 35(
Wasserman, Stanley and Katherine Faust. 1994.
Shyamasundar, N., 1999. Precedence parsing using the Domolki's algorithm. The International Journal of Computer Mathematics, 6 (2), pp. 105–114.
Udi, W., 2009. Communications of the ACM. New York: Oxford publishers.
Yanhong, Li., 2002. Toward the qualitative search engine. Internet Computing, IEEE Computer Society, 3 (5): 24–28.
