Web Spam Detection: Techniques and Approaches - Term Paper Example


Grade (24th, March.

Table of Contents
Web spamming technique
Link Based Technique
Cloaking/Hiding technique
Content-Based technique
Importance of detecting spam pages
Web Spam detection Techniques
Detection methods for Link based web spam techniques
Link-Based detection
Multi level Link Structure Analysis (MLSA)
Experiment
Link Farm Properties
Comparison between Link Farm Properties and (MLSA) Methods
Page rank
Comparison between Link Farm Properties and (MLSA) and Page Rank Methods
Detection methods for Hiding Techniques
Cloaking Detection Methods
Tagged-Based detection method
Comparison between Cloaking Detection Methods
Feature Decrement
Content Analysis
Comparison between Feature Decrement and Content Analysis Detection Methods
Conclusion
References

Abstract
Web spam is one of the major challenges facing search engines today. It refers to the illegitimate techniques that spammers use to subvert a search engine's ranking algorithm in order to raise a web page's position in the search results. Spammers use three types of spamming techniques to obtain a higher rank: link spam, content spam, and cloaking. Recently, the number of spam web pages has increased significantly, causing distrust of search results. In this paper we present some of the web spam detection methods that can be used to reduce the negative effects of spam pages. In addition, we present some experimental results illustrating that some detection methods detect spam pages more accurately than others.

Introduction
Web spam refers to the tactics used by web fraudsters to raise their search engine result ranking, granting their web pages an advantage over others in the frequency and cumulative number of visits they receive (Andert & Burleson, 2005).
Web spammers deliberately manipulate search engine indexes so that their web pages receive high traffic, a practice that arose in the 1990s and has been on the rise since then. Through web spam, a web page is made to rank higher in the search engine results than it otherwise would, which reduces the relevance and usefulness of the search engine, since the resources and information obtainable from it become duplicated and less diverse. Web spamming is therefore a major problem for both search engines and internet users: it reduces the usability of the internet as a resource by hindering the diversity and relevance of the information returned when an individual searches for something. Visitors to the search engine feel that they have lost valuable time, since the spam pages indicate that the information they seek is present, only for them to find that it is entirely missing. For the search engines, web spamming means a waste of valuable resources, since they host many pages that rank highly in their results yet are not genuinely helpful to visitors (Bolton & Hand, 2002). It has therefore become necessary for search engines to develop web spam detection techniques and approaches to address the negative effects of spam pages. This discussion seeks to identify various web spam detection methods, with a view to assessing how they help in addressing the negative implications of web spam.
Key words
Web spam: The unethical tactics applied to make websites rank higher in the search engine results
Search engine: The vehicle through which individuals access information from the internet
Spam detection methods: Techniques applied to detect the unethical tactics used to make websites rank higher in the search engine results

Related works
Internet penetration and usage in the world between 2002 and 2012
Access to the internet is currently at an all-time high, owing to recent improvements in technology and infrastructure that allow many individuals in different parts of the world to obtain devices that can access the internet. The relevance of the internet in people's daily lives has increased, owing to its ability to supply information of all kinds easily and conveniently (Pingdom, 2012). According to world internet statistics, there was immense growth in internet usage over the 10-year period between 2002 and 2012. Asia has the highest number of internet users, accounting for 44.8% of all internet users, followed by Europe at 21.5%, North America at 11.4%, Latin America at 10.4% and Africa at 7.0%. The smallest shares are found in the Middle East, at 3.7% of all internet users, and Australia, at only 1% (Internet World Stats, 2012). However, the most fundamental aspect of internet usage is the growth in accessibility and use that has been experienced since 2002. In the 10-year period between 2002 and 2012, internet penetration increased immensely across the world's populations. By 2012, internet penetration stood at 15.6% of the population in Africa and 27.5% in Asia (Internet World Stats, 2012).
This trend has been registered globally, with the highest internet penetration found in North America, at 78.6% of the population, followed by Oceania/Australia at 67.6%. Europe stands at 63.3%, Latin America and the Caribbean at 42.9%, and the Middle East at 40.2% (Internet World Stats, 2012). These statistics indicate that there has been immense growth in internet accessibility and usage in the last 10 years, which means that the number of people affected by unethical web spamming continues to grow. This is bound to create dissatisfaction and discontent among a large number of people, which calls for immediate action to address the vice. With the growth in the number of internet users, an opportunity is created for fraudsters to engage in more fraudulent activity over the internet, since the market for their vice has expanded. The number of internet users almost doubled in the five years from 2007 to 2012, rising from 1.15 billion users in 2007 to 2.27 billion in 2012 (Pingdom, 2012). Various factors have been credited for this growth, including technological advancement, which has produced many internet access devices and enabled people in different parts of the world to access and use the internet.
Increased literacy levels and improved infrastructure have also played a great role in enhancing internet accessibility, since many regions of the world have been reached by technology and internet infrastructure, such as fiber optic connections, which have made access to the internet easier and less costly for those regions (Andert & Burleson, 2005). Additionally, a larger share of the world's population is now able to access education, so literacy levels have greatly increased. With the rise in literacy levels, accessing and using the internet has become much easier.

Growth of Internet fraud and Web Spam
Nevertheless, the growth in internet accessibility and usage has prompted a growth in the levels of internet fraud and web spam. Visa International is one organization that has faced this growing problem. According to Visa International's statistics, internet transactions make up only 2% of Visa's overall business. However, Visa International has observed that 50% of all the transactional disputes it handles concern internet transactions (BBC News, 1999). This is a clear indication that the internet has become a major area of concern for many businesses, due to increased fraud. The case is not unique to Visa International. TPG Capital is a global private investment firm with over $48 billion of capital circulating in different markets of the world (BBC News, 1999). However, the company has been heavily hit by web spam over the last few years, threatening to bring its business down. Web spammers have published many fake websites, tactically placed to feature high in the search engine rankings, which are meant to attract existing and potential TPG Capital customers to the imposter sites in order to facilitate fraud (BBC News, 1999).
The company has realized that a host of spam sites has been created under domains that imitate the company's websites, which makes it difficult for its customers or any other interested parties to differentiate between the genuine and the spam websites. This has hurt the company's business, since communication between TPG Capital and its customers through digital media has been badly affected. These are just a few examples showing that the issue of web spam is real, and that it is affecting many internet users, search engines and businesses that operate online (Pingdom, 2012). This calls for immediate remedies to address the negative implications of web spam for search engines, internet users and businesses.

Goals of web spam
A web spam page is a page created to attract referrals from the search engine's results, for the sole purpose of increasing the traffic, frequency and cumulative number of visits to the site. The creators of web spam pages, also referred to as web spammers, have different reasons for creating such sites. First, they create web spam to attract more viewers to their sites, which increases the ranking and score of the page and consequently delivers financial benefits to its owners. The greater the number of visitors, and the more frequent and cumulative the viewing of a page, the higher the score for that page and the higher the financial rewards obtained by its owners (Bolton & Hand, 2002). Web spammers therefore create such pages merely to derive financial benefits from diverting internet visitors from their original purpose to the spammers' sites, which wastes the visitors' valuable time and resources. The second reason for which web spam exists is advertisement (Ghiam & Nemaney, 2012).
Web spammers create pages and links that divert internet visitors from their original purpose: the pages rank high in the search engine, only for the visitors to find that they contain no information relevant to them, but instead present information about products and services offered by a given company. Such pages carry information that persuades visitors to try and purchase those products and services. In this way, a visitor can eventually be enticed by the web spam to purchase products or services offered by a certain company, even though the visitor's original reason for visiting the site was to search for different information. Web spam is thus a way of promoting products and services offered by different companies, which takes advantage of diverting internet visitors from their original purpose in order to advertise to them and thereby improve sales and profitability, instead of using genuine and ethical advertising channels (Andert & Burleson, 2005). The third reason for the existence of web spam is to install malware on visitors' computers, with the malicious goal of damaging the computer or stealing confidential data from it, which can then be used for fraudulent purposes (Bolton & Hand, 2002). Web spammers have created different kinds of malware that are installed on the visitor's computer once an internet user visits their site. Such malware can be destructive to the computer, or it can simply monitor the user and capture confidential information, such as the IP address or transactional details entered during an online transaction (Ghiam & Nemaney, 2012). This information is then used by the fraudsters to obtain financial gain by defrauding the computer owner.
Web spamming technique
To accomplish their various goals, web spammers apply a range of techniques that trick both the search engines and the internet users. These techniques can be grouped into three categories: link-based techniques, hiding techniques and content-based techniques. Each is elaborated below.

Link Based Technique
The link-based technique, also referred to as link spam, involves web spammers creating numerous links that point to the target pages, so as to manipulate the search engine's ranking algorithm (Ghiam & Nemaney, 2012). The spammers create a network of pages whose content is the same or highly similar, increasing the chances of the pages being returned whenever a visitor searches with a keyword linked to the pages. This leads the internet user to the spam page, which is itself interconnected with numerous other pages, so the user may keep moving from one page to the next, all of which are target pages created by the spammer. This technique is mostly applied by web spammers who have an advertising goal, since it keeps internet users moving from one page to the next while viewing the products and services that the spammer is promoting (Bolton & Hand, 2002). The creation of an interconnected network of pages by link spammers results in the formation of densely connected link farms (Ghiam & Nemaney, 2012). The idea is to make the internet user visit the pages continuously while viewing the promotional information displayed on them. This is an advertising strategy meant to persuade internet users to purchase the designated products, and thus improve sales.
Cloaking/Hiding technique
This is another strategy that web spammers use to fool the search engines and the internet users, and thus achieve their unethical goals. The technique entails delivering two different versions of the content, one to internet users and one to the World Wide Web monitoring program, also referred to as the web crawler, which deceives the monitoring program and thus helps in manipulating the search engine's ranking algorithm (Andert & Burleson, 2005). By presenting a deceptive URL to the web crawler, the record of the requested pages is altered in favor of the web spam, allowing it to rank high in the search engine results. This technique is mostly applied by web spammers whose objective is to benefit financially from the traffic, frequency and cumulative viewing of the pages (Ghiam & Nemaney, 2012). When the web spam ranks high in the search engines, its owners stand to reap the financial benefits that come with a high number of visits to the site. Under this technique, web spammers can also apply redirection, where any visit to a certain site is immediately redirected to a different page (Bolton & Hand, 2002). Once the requested page is fully loaded, it immediately redirects the internet user to a different URL, which is a target page for the web spammers and may contain advertising or promotional information for certain products and services (Ghiam & Nemaney, 2012). This strategy is also mainly used to meet the advertising goal of the web spammers.

Content-Based technique
This technique entails web spammers modifying the content of the target page so that it reflects information highly favorable to common search queries, allowing the page to rank high whenever such a query is submitted to the search engine (Ghiam & Nemaney, 2012).
Web spammers can achieve this through the repeated use of the most popular words in the target page, which prompts the search engine to return the page whenever a query containing those words is submitted. The repetition of the words serves to raise the ranking of the target page, which may confer financial benefits on the owners of the page or help them promote certain products and services, since internet users will be prompted by the search engine results to visit the page (Bolton & Hand, 2002). In some cases, web spammers instead retouch the content of the target page so that it reflects the words and information of the most uncommon queries, making it appear as the only search engine result whenever such a query is submitted. This leaves the internet user with no option but to view the content of the displayed page, even when it is not relevant to the information being sought (Jones, 2005).

Importance of detecting spam pages
Various reasons make the detection of spam pages important. First, owing to the negative effects of spam pages on both internet users and search engines, it would benefit both if such pages were detected and avoided in advance. Spam pages make internet users waste valuable time and consume their resources on the internet without delivering the information that prompted them to go online (Ghiam & Nemaney, 2012). Web spam also harms the search engines, since it discredits their usefulness as vehicles through which information from the internet is obtained, while at the same time wasting their valuable resources on hosting such spam pages, causing them financial loss (Bolton & Hand, 2002). It is therefore essential for both internet users and search engines that spam pages are detected and avoided.
Another reason that makes it important to detect spam pages is that some web spam is very harmful to internet users, owing to the installation of malware on their computers, which may cause the computer to malfunction or be damaged (Jones, 2005). Additionally, the installed malware may cause the computer user to lose valuable information to the web spammers, which can then be used to defraud them financially. Detecting and avoiding web spam in advance is therefore significant in helping internet users avoid such harmful repercussions. This explains why it is vital to have web spam detection techniques, which help to avoid the spam pages in advance and thus enable both internet users and search engines to avoid the harmful effects associated with web spam.

Web Spam detection Techniques
Detection methods for Link based web spam techniques
Link-Based detection
This section discusses the detection techniques that are applicable to link-based web spam. The detection methods under this technique include Multi level Link Structure Analysis (MLSA), Page Rank and the Link Farm Properties detection method.

Multi level Link Structure Analysis (MLSA)
This link-based detection method was proposed by Tung and Adnan. It focuses on both the links between pages on the same domain and the links to pages on different domains (Ghiam & Nemaney, 2012). Since most link farms within a domain have at least one page with a link leading from that domain to a neighboring domain, the outgoing links on that page, both to pages in the same domain and to pages in different domains, are collected and analyzed to see what they lead to. The pages that these links point to are all noted, and the algorithm then counts the domain names of all the incoming and outgoing links (Bolton & Hand, 2002).
Where the number of domains pointed to by the links exceeds a predetermined threshold, the selected page is marked as a bad page.

Experiment
An experiment undertaken using the Yahoo search engine indicated that Multi level Link Structure Analysis (MLSA) is capable of detecting most spam pages. However, the method may also flag some pages as spam when they are not (Ghiam & Nemaney, 2012). The method therefore has a potential for false positives, simply because it uses only links to detect spam pages, without any consideration of the content of the pages (Ghiam & Nemaney, 2012).

Link Farm Properties
Due to the shortcomings of the MLSA detection method, an improved method was developed, referred to as the Link Farm Properties detection method. This method treats the World Wide Web as a graph in which each page is a node and each link is an edge (Bolton & Hand, 2002). The World Wide Web is expected to take the form of a scale-free network, in which not all nodes are connected to one another, since not all pages on the internet are interconnected. The method therefore analyzes the interconnection of pages using both their links and their content, to see whether the pages (nodes) emerge as densely interconnected or as a scale-free network (Jones, 2005). Where the nodes emerge as interconnected, the analysis indicates the presence of a link farm, meaning that the pages are part of a spam network (Ghiam & Nemaney, 2012).

Comparison between Link Farm Properties and (MLSA) Methods
The MLSA method has the disadvantage of a potential for false positives, where it may identify as spam some pages that are not in fact spam pages (Ghiam & Nemaney, 2012). This is because the method uses only links to detect spam pages.
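The link-only nature of the MLSA check described in this section can be made concrete with a minimal sketch. It is not the authors' implementation: the function names, the choice to count distinct host names, and the threshold values are assumptions made for illustration, and the source does not say how the threshold is chosen.

```python
from urllib.parse import urlparse


def linked_domains(links):
    """Return the distinct host names found in a list of absolute link URLs.

    Relative links (no host name) are skipped. Counting *distinct* hosts is
    an assumption; the paper only says the domain names are counted.
    """
    domains = set()
    for url in links:
        host = urlparse(url).netloc.lower()
        if host:
            domains.add(host)
    return domains


def is_bad_page(incoming_links, outgoing_links, threshold):
    """Mark a page as bad when its links touch more domains than the threshold.

    `threshold` is a hypothetical tuning parameter.
    """
    domains = linked_domains(incoming_links) | linked_domains(outgoing_links)
    return len(domains) > threshold


if __name__ == "__main__":
    outgoing = ["http://a.example/x", "http://b.example/y", "http://c.example/z"]
    incoming = ["http://a.example/p", "http://d.example/q"]
    print(is_bad_page(incoming, outgoing, threshold=3))
```

Because the check looks only at link targets and never at page content, a legitimate page that happens to link to many domains would also be flagged, which is the false-positive risk noted above.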
By contrast, the Link Farm Properties method is best suited to identifying spam pages created with link-based techniques, since it can use the whole World Wide Web structure to detect connections that defy its normal structure, based on both links and content, and thus determine the spam pages accurately (Liu, 2011).

Page rank
This is a method of detecting spam pages that operates on the premise that the importance of a web page is determined by the importance of the other web pages that link to it (Mintz, 2002). The observation behind this method is that pages with more links pointing to them are more important than pages with fewer. When a page is linked from an important page, its score and consequent ranking become higher. Therefore, if a page is linked only from pages that are not important, yet it has a high ranking, it is likely to be a spam page (Shabtai, Elovici & Rokach, 2012).

Comparison between Link Farm Properties and (MLSA) and Page Rank Methods
Compared with the two other methods, namely MLSA and Link Farm Properties, the Page Rank spam detection method appears to be fairer and more accurate in determining spam pages, because it uses a weighted method for comparing the importance of pages, based on their rankings and the source of those rankings, and thus gives a more relevant detection option (Ghiam & Nemaney, 2012).

Detection methods for Hiding Techniques
Cloaking Detection Methods
Cloaking entails sending different content to internet users, through their browsers, and to the crawlers (Mintz, 2002). The Extended Primary method is one of the methods used to detect cloaking spam pages. It assesses the similarity of the content of two copies of the page, one sent to the browser and one sent to the crawler.
Where the content of the two copies is found to be the same, the page is not categorized as spam, while if the content sent to the crawler differs from the content sent to the browser, the page is categorized as a spam page (Ghiam & Nemaney, 2012). The same can be done with the links and terms of the copy sent to the browser and those of the copy sent to the crawler, to assess their similarities or differences. If the links and the terms are similar, the page is not spam, while if a difference is noted, it is a spam page (Shabtai, Elovici & Rokach, 2012).

Tagged-Based detection method
This is a spam detection method that uses tags as the basis for detecting spam pages: the tags are retrieved from the copy of the page seen by the crawler and from the copy seen by the browser, and then compared for similarities and differences (Shabtai, Elovici & Rokach, 2012). The tags are retrieved in three ways. The first retrieval takes one copy of the tags from the browser and another copy from the crawler, which are then analyzed. The second retrieval obtains one copy from the browser, another from one crawler, and a third from a different browser. The three sets of tags are then analyzed for similarities or differences. If the tags from the three sources are similar, the page is not spam, but if they are different, it is a spam page (Ghiam & Nemaney, 2012). In the third retrieval, two copies of the tags are obtained from two different crawlers and two copies from two different browsers. The tags are then compared, and if they appear similar the page is not spam, while if they are different, the page is spam (Ghiam & Nemaney, 2012).
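The comparisons described above (terms, links and tags of the crawler copy versus the browser copy) can be sketched as follows. This is only an illustration under stated assumptions: the two HTML copies are assumed to have been fetched already, the class and function names are hypothetical, and strict equality is used as the similarity test, whereas a practical detector would need some tolerance for pages that legitimately change between requests.

```python
import re
from html.parser import HTMLParser


class PageFeatures(HTMLParser):
    """Collect the tag sequence, link targets and terms of one HTML copy."""

    def __init__(self):
        super().__init__()
        self.tags = []      # start tags in document order
        self.links = set()  # href values of <a> elements
        self.terms = set()  # lower-cased words found in text nodes

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.add(href)

    def handle_data(self, data):
        self.terms.update(re.findall(r"[a-z0-9]+", data.lower()))


def extract(html):
    """Parse one HTML copy and return its collected features."""
    parser = PageFeatures()
    parser.feed(html)
    parser.close()
    return parser


def looks_cloaked(crawler_html, browser_html):
    """Return True when the two copies differ in tags, links or terms."""
    c, b = extract(crawler_html), extract(browser_html)
    return c.tags != b.tags or c.links != b.links or c.terms != b.terms


if __name__ == "__main__":
    crawler_copy = "<html><body><p>Cheap flights</p><a href='/deals'>deals</a></body></html>"
    browser_copy = "<html><body><p>Buy pills now</p><a href='http://spam.example/'>x</a></body></html>"
    print(looks_cloaked(crawler_copy, crawler_copy))
    print(looks_cloaked(crawler_copy, browser_copy))
```

The multi-copy variants of the tag-based method could be approximated by calling `extract` on each additional crawler or browser copy and comparing the resulting `tags` lists pairwise.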
Comparison between Cloaking Detection Methods
A comparison between the primary method and the Tagged-Based detection method indicates that the tagged-based method works better and more accurately, since it compares the content delivered by the same page to different crawlers and browsers to determine whether or not it is spam. This increases the precision of the Tagged-Based detection method compared to the primary method, which compares the links (Bolton & Hand, 2002). However, since web spammers tend to change the content of their web pages, the primary method can give accurate detection in the long term, since it considers the URL links (Ghiam & Nemaney, 2012).

Detection Methods for Content-Based Techniques
Feature Decrement
This is a method of spam detection in which the content of the web page is analyzed based on discriminative content and link features (Ghiam & Nemaney, 2012). Here, the number of features used in the comparison is reduced, to enhance the performance of the classification while also improving its accuracy. The method takes the content of different linked pages and analyzes it based on the reduced set of content and link features, to assess the similarities and differences between the interlinked pages. Where the content of the different pages is found to be unrelated, non-existent or wholly duplicated, the pages are marked as spam pages (Shabtai, Elovici & Rokach, 2012).

Content Analysis
Under this method of spam detection, the content of a web page is analyzed to determine how similar or different it is to the other interlinked web pages, based on content features such as words, language and anchor texts (Jones, 2005).
This method analyzes how similar each of these aspects of a web page is to the other web pages connected to it. If the language is found to be wholly similar, all the words are found to match, and the same anchor texts leading to other web pages are found to match, the content is determined to be duplicated and the pages are identified as spam (Liu, 2011). Some pages may be found to consist entirely of anchor texts that lead to other pages, without having any content of their own. Such pages are also considered spam pages. Visible content is yet another aspect used under the content analysis method, to compare whether the visible texts of interconnected pages are similar. Where the visible content shrinks significantly from one page to the next, chances are that the content was meant to manipulate the search engine into giving the page a higher ranking, and the pages are considered spam pages (Ghiam & Nemaney, 2012).

Comparison between Feature Decrement and Content Analysis Detection Methods
While the Feature Decrement method of spam detection is easier and quicker, since it applies a reduced set of discriminative content and link features to analyze web pages for spam, Content Analysis is more comprehensive, since it applies a wider range of features in its analysis, and is thus fairer and has a high degree of precision in identifying spam pages (Shabtai, Elovici & Rokach, 2012).

Data Mining Techniques for Spam Detection
TrustRank
This is a data mining method for detecting spam pages that operates on the premise that honest pages point to honest pages, and rarely to spam pages (Pei, Zhou, Tang & Huang, 2008). The method uses a set of genuine web pages as the seed set, which is then compared with the rest of the web pages, including those that the seed set does not appear to reach through its incoming and outgoing links.
The genuine web page set is allocated high trust scores, and the reverse is done for the other pages that do not appear to link with the genuine pages. A page reached through out-links is allocated a low trust score if the pages linking to it are doubtful, and a high trust score if they are genuine. TrustRank eventually converges on the pages that accumulate high trust scores through the out-links, an indication that they are genuine pages (Pei, Zhou, Tang & Huang, 2008). The spam web pages are indicated by the lower scores.

Efficient Term Spam Detection
This is a data mining spam detection technique that applies a term spamicity threshold, allowing the threshold to be compared with the term spamicity obtained from the web page. If the term spamicity of the web page is found to be within the threshold, the web page is considered genuine, while if it surpasses the threshold, the web page is classified as spam (Pei, Zhou, Tang & Huang, 2008). The threshold is calculated based on the web page keyword parsing load and the search engine querying load, and is then compared with the term content of the web page to determine whether the page surpasses it (Pei, Zhou, Tang & Huang, 2008).

Utility-based Link Spamicity
Utility-based Link Spamicity refers to a spam detection technique in which the detection of spam pages operates on the basis of link farm structures. Every farm of linked web pages produces its local link structures, which are then interconnected with other web pages that share the same structures as far as possible (Pei, Zhou, Tang & Huang, 2008). Eventually, the interconnection produces a mass of linked web page farms that fall into the category of genuine pages, owing to the structures they share, while the isolated pages, which do not get linked with the link farm, emerge as the spam pages.
The structure of the link farm is then compared with that of the isolated pages, and the comparison yields the link spam (Pei, Zhou, Tang & Huang, 2008).

Conclusion

Web spam is a serious problem that has adversely affected both internet users and search engines. The major goals of web spam are to reap the financial benefits associated with high search engine ranking scores, to promote and advertise certain firms' products, or to install malware that can eventually facilitate fraud on the computers of internet users. To overcome these adverse effects, several spam detection techniques have been devised, including the Link-Based detection techniques, the detection methods for Hiding techniques, and the Content-Based detection methods. All these techniques play a great role in detecting, and so avoiding, web spam pages before they cause harm to internet users and search engines. Content-Based spam is known to have a highly damaging effect on search engines, while Link-Based spam is most harmful to internet users.

References

Andert, S., & Burleson, D. K. (2005). Web stalkers: Protect yourself from internet criminals & psychopaths. Kittrell, NC: Rampant TechPress.
BBC News. (1999). The growing threat of internet fraud. BBC News Online. http://news.bbc.co.uk/2/hi/business/526709.stm
Bolton, R., & Hand, D. (2002). Statistical fraud detection: A review. Statistical Science, 17(3), 235–255.
Ghiam, S., & Nemaney, A. (2012). A survey on web spam detection methods: taxonomy. International Journal of Network Security & Its Applications, 4(5), 119–134.
Internet World Stats. (2012). Internet users in the world: Distribution by world regions. http://www.internetworldstats.com/stats.htm
Jones, R. (2005). Internet forensics. Beijing: O'Reilly.
Liu, B. (2011). Web data mining: Exploring hyperlinks, contents, and usage data. Berlin: Springer.
Mintz, A. P. (2002). Web of deception: Misinformation on the Internet. Medford, NJ: CyberAge Books.
Pei, J., Zhou, B., Tang, Z., & Huang, D. (2008). Data mining techniques for spam detection.
Pingdom. (2012). World Internet population has doubled in the last 5 years. The Technology Blog. http://royal.pingdom.com/2012/04/19/world-internet-population-has-doubled-in-the-last-5-years/
Shabtai, A., Elovici, Y., & Rokach, L. (2012). A survey of data leakage detection and prevention solutions. New York: Springer.
(“Web Spam Detection: Techniques and Approaches Term Paper”, n.d.)
Retrieved from https://studentshare.org/information-technology/1403905-web-spam-detection-techniques-and-approaches
