The Deep Web, Dark Web - Coursework Example

Summary
The paper 'The Deep Web, Dark Web' is about crawling the Deep Web as the most important strategy for surfacing hidden Web content. The search engine is very important to the Deep Web, as its query interface allows users to locate sites or places of interest on the World Wide Web.

THE DEEP WEB (DARK WEB)

Introduction

The Web is too vast for the search engines to index everything it has in store. Efforts to identify the problem reveal the technicalities and challenges of using the current search engines. These technicalities are responsible for the division of the Web into two diverse sections: the surface Web and the Deep Web. The surface Web is the common section, whose sites the search engines are able to index easily, while the Deep Web is mostly inaccessible to regular searching because it is absent from the indexes the search engines build.

The exploration of the Deep Web in this research starts with an examination of the literature, which provides background information about the Deep Web and Web crawlers. Crawling the Deep Web is the most important strategy for surfacing hidden Web content. The search engine is very important to the Deep Web because its query interface allows users to locate sites or places of interest on the World Wide Web. When a user types keywords into the query interface, one gains access to the knowledge behind it. The search engine's index is what World Wide Web crawlers rely on when seeking access to the Deep Web. As such, this is prerequisite information about the Deep Web. First, this study will explore the surface Web and the weaknesses that prompt the need for the Deep Web, then how much overlap is possible from a search engine query, and lastly the Web crawler system.

The complications with the surface Web

As noted earlier, the Web is divided into two parts: the surface Web is the indexable, visible, and smaller section, while the Deep Web is the section rich in data that has been hidden from general searches. The results generated for the surface Web are those that search engines are capable of gathering and displaying to the public as query results. However, in the last decade, technical details were modified such that some content which had earlier been classified as Deep Web became available on the surface Web. These changes are attributed to search engines being developed further so that they can generate results from the Deep Web, thereby overcoming the indexing obstacle of Deep Web elements (He, Patel, et al., 2005).

Search Engine Overlapping

The scale of search engine overlap can be estimated by examining the indexes retrieved from three service providers: Google, Microsoft, and Yahoo. The same keyword submitted to the three providers retrieves different results. A user may then ask whether a single search engine generates all the useful information one requires, or whether one needs to conduct multiple searches to get it. This shows the weakness of the surface Web in generating the most effective search results from a keyword query. Since the surface Web is very vast, it has many pages which cannot be fully crawled by the search engines, and therefore they do not explore the surface Web completely. Even though the search engines have large indexes, it is no surprise that there is very minimal overlap between them. This is because the Web crawlers found in every search engine are governed by rules stating the addresses and the depth that the crawler is supposed to cover. As such, the Web crawlers go on diverse routes and gather sets of documents that differ from one another, since the servers differ among the different Web crawlers (Broder, 2001).
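This overlap can be made concrete as a set comparison. The short Python sketch below computes the fraction of URLs that two engines' top results share for the same keyword; the result lists are hypothetical placeholders, where in a real study they would be collected from each engine's results pages.

```python
# A minimal sketch of estimating search engine overlap as the Jaccard
# similarity of result-URL sets. The URLs below are hypothetical stand-ins
# for real results pages retrieved with the same keyword.

def jaccard_overlap(results_a, results_b):
    """Return the fraction of URLs shared by two result sets (Jaccard index)."""
    a, b = set(results_a), set(results_b)
    return len(a & b) / len(a | b) if (a or b) else 0.0

# Hypothetical top-5 results for one keyword from two different engines.
engine_one = ["site-a.org/1", "site-b.org/2", "site-c.org/3", "site-d.org/4", "site-e.org/5"]
engine_two = ["site-a.org/1", "site-f.org/6", "site-d.org/4", "site-g.org/7", "site-h.org/8"]

print(f"index overlap: {jaccard_overlap(engine_one, engine_two):.0%}")  # index overlap: 25%
```

Only two of the eight distinct URLs appear in both lists, illustrating how little the engines' coverage coincides even for an identical query.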
The Deep Web

There are different alternative names for the Deep Web, such as the 'Dark Web', 'Deep Net', 'Invisible Web', and the 'Hidden Web'. The term 'Deep Web' was popularized through Michael K. Bergman's paper "The Deep Web: Surfacing Hidden Value" (Bergman, 2001). As such, this paper will refer to the Deep Web as the part of the Web that is not well indexed by the search engines, or not indexed at all.

In the introduction of this paper, it was noted that the surface Web presents a challenge to engineers, since most content required for use is not indexed. This can be illustrated by comparing the document counts of the two sections. The statistics show that in 2000 the surface Web held about 2.5 billion documents, growing at a rate of 7.5 million documents daily, with an approximate size of nineteen terabytes. Google was ranked as having the largest surface Web index, with 1.35 billion documents, a representation of approximately 54% of the entire surface Web. In comparison, the Deep Web had 220 times more documents than the surface Web, with over 550 billion documents, and 400 times more data, with approximately 7,500 terabytes. According to Bergman (2001), the Deep Web sites were estimated at slightly more than 200,000.

What is the importance of Deep Web?

The simple answer to this question is that the Deep Web is very important because it contains most of the useful information; as the statistics above show, it makes up more than 99% of the total Web. Businesses that trade in or require information on a daily basis find the Deep Web very important for running effectively on reliable and timely information. Major large organizations that possess huge amounts of high-quality information place their content online, such as patents, trademarks, government-related knowledge bases, news broadcasts, medical research, flight schedules, shopping catalogs, and financial information. This type of information is located in Deep Web databases, as they offer such large organizations privacy and safety for their sensitive data (Wright, 2009). While some of the information is stored on the surface Web, where it can be indexed, most of the withheld information is kept on the Deep Web so that it remains hidden and inaccessible to generic keyword searches. As such, the information stored on the Deep Web requires additional querying interfaces to be retrieved, since it is stored deeper in the Web. Since the Deep Web is enriched with data and documents due to the structured nature of its data, users stand to benefit from reaching the Deep Web when searching for information: it will not only increase the number of documents available for perusal but also enhance the quality of the search engines. It also has implications for the way most organizations transact and do business online (Madhavan et al., 2008).

Content of the Deep Web

In order to understand the importance of the Deep Web, it is imperative to introduce the content, or the databases, that collectively generate the largest percentage of information on the Deep Web. The content is classified into five fields.

Databases of the Web

Most of the Deep Web is covered in databases, making this the largest segment of the Deep Web. These are tables containing rows, which represent the working units. Users searching the Deep Web are interested in exploring the rows, as they hold the information matched by custom queries. The rows are made of smaller divisions known as attributes, which contain the informational values to be retrieved during an information search. Such databases include PostgreSQL, MySQL, Microsoft SQL Server, Oracle, and Access. This row-and-attribute structure is illustrated in the sketch below.
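As a minimal illustration, the following Python sketch uses the built-in sqlite3 module as a stand-in for the larger engines named above; the 'patents' table and its contents are hypothetical.

```python
# A minimal sketch of the row-and-attribute structure described above, using
# Python's built-in sqlite3 module as a stand-in for engines such as
# PostgreSQL or MySQL. The 'patents' table and its contents are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE patents (id INTEGER, title TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO patents VALUES (?, ?, ?)",
    [(1, "Web crawler scheduling", 1999), (2, "Form-based query interface", 2001)],
)

# Each row is a working unit; id, title, and year are its attributes.
# A custom query retrieves only the rows whose attribute values match.
for row in conn.execute("SELECT title, year FROM patents WHERE year > 2000"):
    print(row)  # ('Form-based query interface', 2001)
```

The query's WHERE clause is exactly the kind of custom, attribute-level filter that a generic keyword crawler cannot express, which is why such rows stay hidden from surface Web indexes.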
Dynamic and scripted Web pages

Dynamic and scripted Web pages are generated from a database according to the user's choices, showing the information the user is interested in. This type of content is an improvement over the static HTML document. The Web page refreshes parts of itself without needing to reload the whole document from the servers or the remote system. The most notable difference with dynamic pages is that each visit to the site is a different experience, and the experiences are time-related, as new feedback is available to users every time they visit the site.

Private and limited content

One of the complications and challenges for Web crawlers is the limited and private content on the Web. The Deep Web allows users to maintain their online privacy in order to protect them from threats such as fraud, identity theft, patent theft, or hacking. The privilege of limited and private content allows users to store documents for certain actions, such as making information visible only for a specified amount of time, supplying personal details for purchases and using them online, and carrying out transactions involving currency. It is most unlikely that a Web crawler could gain access to such content by imitating the complex, site-specific actions required to complete the search.

Non-HTML content

During the last decade, the major search engines have been able to index non-HTML files in PPT (Microsoft PowerPoint), XLS (Microsoft Excel), DOC (Microsoft Word), and PDF (Adobe) formats by converting them to HTML, thus surfacing content that was previously part of the Deep Web. This also extends to multimedia search, as indexing now covers content such as video, audio, or image searches. Since searching through a media format's meta-information is complex on the Deep Web, the content is converted into a standard form. The Exchangeable Image File Format (EXIF) is one of the most popular standards; it stores multimedia metadata inside TIFF or JPEG files.

Unlinked content

A Web crawler on the surface Web cannot find unlinked data, as it travels according to a set of rules guided by links. As such, the Deep Web is known for retrieving documents that have been detached from the linked Web by applying a different indexing system.

Deep Web navigation process

The navigation process of the Deep Web involves getting to the Web interface and finally to the results page where the data of interest is located. The process takes three steps. First, the search begins from the interface with its form fields; then comes the intermediate page, which lies between the interface page and the results page and allows the user to choose where to go first, either the results page or the interface; and finally the results page, where the documents of interest are located. Navigation of the Deep Web starts with the identification and labeling of the significant form field input elements. Afterwards, the crawler employs the user-provided input values to fill in the form fields appropriately at runtime; the form fields are then submitted to check whether the results page is available (Wang and Hornung, 2008). One of the processes that can complete the search is the 'page-keyword-action paradigm'. This system fills out forms using the parameters for inputting the data and then submits the form in order to reach the search results, as sketched below.
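The following Python sketch shows the fill-and-submit step in its simplest form. The form URL, field name, and keyword are hypothetical, and the results-page check is deliberately naive; a real system in the spirit of Wang and Hornung (2008) would first discover and label the form's input elements before substituting values at runtime.

```python
# A minimal sketch of filling and submitting a Deep Web query form. The URL,
# the 'q' field, and the keyword are hypothetical placeholders; a production
# crawler would identify and label the form fields automatically first.
import requests

FORM_URL = "https://example.org/search"   # hypothetical query interface
form_fields = {"q": "flight schedules"}   # user-provided keyword for the form

# Submit the form as a browser would, then check whether a results page
# came back rather than an error or an empty intermediate page.
response = requests.post(FORM_URL, data=form_fields, timeout=10)

if response.ok and "results" in response.text.lower():
    print("Results page reached; documents of interest can now be extracted.")
else:
    print("No results page; retry with other keywords or another form.")
```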
It is imperative to note that the basic architecture of the Web crawler is simple and sufficient, but it becomes challenging and complicated for engineers because of the sheer scale of the surface Web. Such technical difficulties in retrieving Web databases are the major motivation for accessing the Deep Web. The surface Web has also been marked as prone to online evils such as identity theft and fraud. Storing content on the Deep Web has proven efficient for many organizations that have to store sensitive documents related to health issues and finance, and knowledge documents such as patents and trademarks.

REFERENCES

Álvarez, M., Raposo, J., Cacheda, F., & Pan, A. (2006). A task-specific approach for crawling the Deep Web. Engineering Letters.

Bergman, M. K. (2001). The Deep Web: Surfacing hidden value. BrightPlanet.

He, B., Patel, M., Zhang, Z., & Chang, K. C.-C. (2005). Accessing the Deep Web: A survey. Department of Computer Science, University of Illinois.

Madhavan, J., Ko, D., Kot, L., Ganapathy, V., Rasmussen, A., & Halevy, A. (2008). Google's Deep-Web crawl. PVLDB '08. Auckland: ACM, 1241–1252.

Raghavan, S., & Garcia-Molina, H. (2000). Crawling the Hidden Web. Department of Computer Science, Stanford University, Stanford.

Wang, Y., & Hornung, T. (2008). Deep Web navigation by example. Department of Computer Science, Albert-Ludwigs University, Freiburg, Germany.

Wright, A. (2009). Exploring a 'Deep Web' that Google can't grasp. The New York Times.