Sunday, July 21, 2019

A Survey on Ranking in Information Retrieval System

Shikha Gupta

Abstract

Available information is expanding day by day, and this growth makes proper organization of, and access to, the archives critical for efficient use of information. People generally rely on an information retrieval (IR) system to get the desired results. It is therefore the duty of the service provider to return relevant, high-quality information in response to the query submitted to the IR system, which is a challenge. Over time, many old techniques have been modified and many new techniques are being developed for effective retrieval over large collections. This paper analyzes and compares the available page ranking algorithms on various parameters to identify their relative strengths and limitations in ranking pages. It also tries to identify the further scope of research in page ranking algorithms.

Keywords

Information Retrieval (IR) System, Ranking, Page Rank, HITS, WPR, WLR, Distance Rank, Time Rank, Query Dependent, Context.

1. INTRODUCTION

1.1 Information Retrieval System

An information retrieval system is a collection of components and processes that takes a query from the user as input, compares it with the information collected by the system, and produces as output a set of texts or information objects considered to be related to the query. Information retrieval is the activity of obtaining, from a collection of information resources, those resources that are relevant to an information need (the query). The data structure used by an IR system is the inverted index, which is an index of {term, doc IDs} entries.
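
To make the {term, doc IDs} structure concrete, the following is a minimal sketch of an inverted index in Python. It is an illustration under stated assumptions, not a description of any particular IR system: the function names `build_inverted_index` and `lookup`, the whitespace tokenization, and the three-document collection are all hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of IDs of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive tokenization (an assumption): lowercase, split on whitespace.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def lookup(index, query):
    """Return the IDs of documents that contain every term of the query."""
    postings = [set(index.get(term, ())) for term in query.lower().split()]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


# Hypothetical three-document collection used only for illustration.
docs = {
    1: "ranking of web pages",
    2: "information retrieval and ranking",
    3: "web information systems",
}
index = build_inverted_index(docs)
print(index["ranking"])
print(lookup(index, "web information"))
```

Storing document IDs per term means a query only touches the entries for its own terms instead of scanning every document, which is why this structure suits large collections.
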
An IR system consists of three main components: the user; the knowledge resource to which the user has access and with which he or she interacts; and a person and/or device that supports and mediates the user's interaction with the knowledge resource (the intermediary).

Fig: IR architecture (labels: User Query, Executable Query, Ranked Documents, User Feedback)

The important processes in an IR system are:
Representation of the user's information problem and of the texts in the knowledge resource, e.g. indexing;
Comparison of the representations of the texts and of the information problem, e.g. retrieval techniques;
Interaction between the user and the intermediary, e.g. human-computer interaction or a reference interview;
and, sometimes, judgment of the appropriateness of a text to the information problem submitted by the user, e.g. relevance judgments;
and modification of the representation of the information problem, e.g. query reformulation or relevance feedback.

1.2 Ranking

Ranking is the process of arranging the retrieved documents in order of their relevance. An information retrieval process begins when the user enters a query into the system. Queries are formal statements of information needs, for example the search strings in web search engines. In information retrieval a query does not uniquely identify a single object in the collection; rather, several objects may match the query with different degrees of relevance. Most IR systems compute a numeric score for each object in the database to determine how well it matches the query, and then rank the objects according to this score. The top-ranked objects are then shown to the user, who can iterate the process by refining the query if required.

Use of ranking
To improve search quality.
To do effective retrieval over large collections.
To provide relevant, fast and high-quality information in response to the user query.

2. RELATED WORK

This section reviews previous work on ranking. Many ranking algorithms and techniques have already been proposed, but each has limitations in how effectively it assigns ranks. The main algorithms are described below.

Page Rank Algorithm

Page Rank is one of the most common ranking algorithms. It is a link analysis algorithm that provides a way of measuring the importance of pages. It uses the number and quality of links to a page to make a rough estimate of the page's importance, on the assumption that more important pages will receive more links from other pages. The numerical weight that it assigns to any given element E is referred to as the PageRank of E and is denoted by PR(E).

HITS Algorithm

Hyperlink-Induced Topic Search (HITS; also known as hubs and authorities) is a link analysis algorithm that rates pages. It processes the in-links and out-links of web pages to rank them. A good hub is a page that points to many other pages, and a good authority is a page that is linked to by many different hubs. The scheme therefore assigns two scores to each page: its authority, which estimates the value of the content of the page, and its hub value, which estimates the value of its links to other pages. A limitation of HITS is that it can assign a high rank to some popular pages that are not highly relevant to the given query.

Fig: Hubs and Authorities

Weighted Page Rank Algorithm

The Weighted Page Rank (WPR) algorithm is an extension of the standard Page Rank algorithm. It takes into account the importance of both the in-links and the out-links of pages and distributes rank scores based on the popularity of the pages, where the popularity of a page is determined from its number of in-links and out-links.
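
The difference between the equal split of standard Page Rank and the popularity-based split of WPR can be sketched as follows. This is a minimal illustration, not the implementation of the original authors. The damping factor d = 0.85, the fixed number of iterations, the (1 - d)/N base term, the treatment of the case where none of the linked pages has out-links, and the toy link graph are all assumptions. The link weight used for WPR is the product of the target page's share of in-links and its share of out-links among all pages linked from the same source, following the formulation in the WPR paper [1].

```python
def pagerank(links, d=0.85, iterations=100, weighted=False):
    """Estimate ranks from a dict {page: [pages it links to]}."""
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    # De-duplicate out-links while keeping their order.
    out = {p: list(dict.fromkeys(links.get(p, []))) for p in pages}
    in_count = {p: 0 for p in pages}
    for p in pages:
        for q in out[p]:
            in_count[q] += 1
    out_count = {p: len(out[p]) for p in pages}

    def weight(v, u):
        """Share of v's rank that is passed along the link v -> u."""
        if not weighted:
            return 1.0 / out_count[v]  # standard Page Rank: equal split
        total_in = sum(in_count[p] for p in out[v])
        total_out = sum(out_count[p] for p in out[v])
        w_in = in_count[u] / total_in  # total_in >= 1 since v links to u
        # Assumption: if no linked page has out-links, pass no rank.
        w_out = out_count[u] / total_out if total_out else 0.0
        return w_in * w_out

    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1 - d) / n for p in pages}
        for v in pages:
            for u in out[v]:
                new_rank[u] += d * rank[v] * weight(v, u)
        rank = new_rank
    return rank


# Hypothetical three-page link graph used only for illustration.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
pr = pagerank(links)
wpr = pagerank(links, weighted=True)
print(sorted(pr, key=pr.get, reverse=True))
print(sorted(wpr, key=wpr.get, reverse=True))
```

In the unweighted case page A splits its rank equally between B and C, whereas in the weighted case C, which has more in-links than B, receives the larger share of A's rank. Note that in this sketch the weighted variant does not keep the ranks summing to one, because the weights on a page's out-links need not add up to one.
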
WPR performs better than the conventional Page Rank algorithm in terms of returning a larger number of relevant pages for a given query.

Weighted Links Rank Algorithm

The Weighted Links Rank (WLRank) algorithm is a variant of the Page Rank algorithm. It considers different page attributes in order to give more weight to some links and thereby improve the precision of the answers. The attributes considered when assigning the weight are the tag in which the link is contained, the length of the anchor text, and the relative position of the link in the page. The use of anchor text is the best attribute of this algorithm.

Distance Rank Algorithm

Distance Rank is an intelligent ranking algorithm based on learning. It calculates the distance between pages, where the distance is defined as the number of “average clicks” between two pages. It treats the distance between pages as a punishment and therefore aims to minimize it, so that a page with a smaller distance gets a higher rank. The advantage of this algorithm is that its distance-based solution finds high-quality pages more quickly. The complexity of Distance Rank is also low. Its limitation is that calculating the distance vector requires a large amount of computation.

Time Rank Algorithm

This algorithm uses the time factor to increase the accuracy of web page ranking: the rank score is improved by using the visit time of the page. The visit time of a page is measured after applying the original and improved web page rank methods, as an indication of the page's degree of importance to users. The algorithm combines content and link structure, and it provides satisfactory and more relevant results.

Query Dependent Ranking Algorithm

This algorithm is designed to handle a large variety of queries. The similarities between the queries are measured.
Documents are ranked using different models for queries with different properties, and the ranking model for a given query is a combination of the models of similar training queries.

Categorization by context

This approach proposes a ranking scheme in which ranking is done on the basis of the context of a document rather than on the basis of its terms. Its task is to extract contextual information about documents by analyzing the structure of the documents that refer to them. It uses context to describe collections, and it is used to overcome the disadvantages of the term-based approach.

3. CONCLUSION AND FUTURE SCOPE

A large number of algorithms are available today for ranking pages in information retrieval systems. There will always be scope for better ranking of pages, as each algorithm has its own advantages and disadvantages. The term-based approach suffers from the problems of synonymy (multiple words having the same meaning) and polysemy (a word having multiple meanings). In the context-based approach, on the other hand, the problem is that the pages which refer to a document must contain enough hints about its content to classify the document. The IR system should use an algorithm appropriate to the requirements of the user. Use of an efficient algorithm will provide a speedy response and accurate, relevant results.

REFERENCES

[1] Wenpu Xing and Ali Ghorbani, “Weighted PageRank Algorithm”, In Proceedings of the 2nd Annual Conference on Communication Networks and Services Research, pp. 305-314, 2004.
[2] Ricardo Baeza-Yates and Emilio Davis, “Web page ranking using link attributes”, In Proceedings of the 13th International World Wide Web Conference on Alternate Track Papers and Posters, pp. 328-329, 2004.
[3] H. Jiang et al., “TIMERANK: A Method of Improving Ranking Scores by Visited Time”, In Proceedings of the Seventh International Conference on Machine Learning and Cybernetics, Kunming, 12-15 July 2008.
[4] Jon Kleinberg, “Authoritative Sources in a Hyperlinked Environment”, In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, 1998.
[5] Ali Mohammad Zareh Bidoki and Nasser Yazdani, “DistanceRank: An Intelligent Ranking Algorithm for Web Pages”, Information Processing and Management, 2007.
[6] Dilip Kumar Sharma and A. K. Sharma, “A Comparative Analysis of Web Page Ranking Algorithms”, International Journal on Computer Science and Engineering, 2010.
[7] Giuseppe Attardi and Antonio Gullì, “Automatic Web Page Categorization by Link and Context Analysis”.
[8] Parul Gupta and A. K. Sharma, “Context based Indexing in Search Engines using Ontology”, International Journal of Computer Applications, 2010.
[9] Abdelkrim Bouramoul, Mohamed-Khireddine Kholladi and Bich-Lien Doan, “Using Context to Improve the Evaluation of Information Retrieval Systems”, International Journal of Database Management Systems, May 2011.
[10] Xiubo Geng, Tie-Yan Liu and Tao Qin, “Query Dependent Ranking Using K-Nearest Neighbor”, SIGIR’08, July 20–24, 2008, Singapore.
