Using Simulated Annealing to Prioritize Query Results:
Some Initial Findings


Major Bernard J. Jansen
Dept. Electrical Engineering and Computer Science
United States Military Academy
West Point, New York 10996
Telephone: (914) 938-5559
FAX: (914) 938-5956
Email: jjansen@acm.org

Please Cite: Jansen, B. J. 1997. Simulated annealing for query results ranking. ACM Computer Science Education Conference. San Jose, CA.


Problem and Motivation:

As the quantity of available electronic information increases, finding the information that one needs becomes more difficult. Commonly, a user will search using a couple of key terms, and she then spends hours manually browsing hundreds of documents to find the relevant information. This unfortunate situation occurs because search engines are primarily passive tools.

The overall goal of this research is to make search engines active personal assistants by using an autonomous agent. An autonomous agent monitors user actions and determines which of the remaining query results are relevant to the user. User actions can be saving, printing, or browsing a particular document. The agent then reorders the remaining query results using this new knowledge of user desires. The focus of this paper is on the first phase of the research, which explores a method for the agent to both prioritize query results and rapidly unlearn inaccurate knowledge concerning what information is relevant.

Background and Related Work:

A search engine should retrieve documents that are of interest to the user. To accomplish this goal, a search engine has an inference mechanism, which is the query and query algorithm that model the desires of the user (Croft and Thompson, 1986). To improve a search engine’s retrieval performance, one must enhance the inference mechanism of the engine. One method to accomplish this enhancement is for an agent to monitor user actions on a particular query result list and build a user profile that represents the user’s information desires. Based on the user profile, the agent decides the relevance of each of the remaining results and prioritizes them using this new knowledge. However, prioritizing information is a perplexing problem in information retrieval.

Given that there exists a set of documents and a user who has an interest in the information in some of them, optimal information retrieval is to find all of the relevant documents and none of the irrelevant ones (Mitchell et al., 1994). Recall and precision are two accepted standards of performance for comparing and evaluating retrieval systems (Mitchell et al., 1994; Salton, 1987; Salton, 1992). The definitions of these performance standards are:

  • Recall = Relevant Documents Retrieved / Total Number of Relevant Documents

  • Precision = Relevant Documents Retrieved / Total Number of Retrieved Documents
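
As a worked illustration of these measures, the short Python sketch below computes recall and precision for a single query; the document identifiers and counts are invented for the example.

    def recall_precision(retrieved, relevant):
        """Compute recall and precision for one query result list.

        retrieved -- set of document ids returned by the search engine
        relevant  -- set of document ids that are actually relevant
        """
        relevant_retrieved = len(retrieved & relevant)
        recall = relevant_retrieved / len(relevant) if relevant else 0.0
        precision = relevant_retrieved / len(retrieved) if retrieved else 0.0
        return recall, precision

    # Example: 10 relevant documents exist; the engine returns 8, 6 of them relevant.
    retrieved = {1, 2, 3, 4, 5, 6, 7, 8}
    relevant = {1, 2, 3, 4, 5, 6, 11, 12, 13, 14}
    print(recall_precision(retrieved, relevant))   # prints (0.6, 0.75)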

By teaching an agent via a user profile, this research aims to improve precision without seriously degrading recall. Relevance feedback is the mechanism used to build the user profiles, thereby teaching the agent. The assumption behind relevance feedback is that relevant documents resemble each other; therefore, the vectors of keywords and descriptors that represent relevant documents will also be similar (Salton, 1987). Relevance feedback is a proven method for generating improved queries (Salton, 1992; Salton and Buckley, 1988; Maes et al., 1996). However, relevance feedback also reduces recall.

To overcome this degradation, this research modifies the relevance feedback algorithm to generate a user profile based on user actions. The agent then uses the new knowledge contained in the user profile to determine the relevance of the remaining query results. The advantage of this agent-based, query-result prioritizing approach is that it places no additional demands on the user and causes little degradation of recall.
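
A minimal sketch of how such a profile might be built from user actions, assuming a simple additive (Rocchio-style) update and illustrative action weights; neither the weights nor the exact update rule is specified by the paper.

    # Illustrative weights for observed user actions (assumed values).
    ACTION_WEIGHTS = {"save": 1.0, "print": 0.8, "browse": 0.3}

    def update_profile(profile, document_terms, action, alpha=0.5):
        """Fold an acted-on document's term vector into the user profile.

        profile        -- dict mapping term -> weight (the evolving user profile)
        document_terms -- dict mapping term -> weight for the acted-on document
        action         -- "save", "print", or "browse"
        alpha          -- learning rate for the relevance feedback update
        """
        strength = ACTION_WEIGHTS.get(action, 0.0)
        for term, weight in document_terms.items():
            profile[term] = profile.get(term, 0.0) + alpha * strength * weight
        return profile

    def score(profile, document_terms):
        """Estimate the relevance of a remaining result as its overlap
        (dot product) with the user profile."""
        return sum(profile.get(t, 0.0) * w for t, w in document_terms.items())

Documents whose term vectors overlap strongly with the profile receive higher scores and can be moved toward the top of the remaining results.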

Genetic algorithms, ID3, and connectionist algorithms were rejected as teaching mechanisms because they need a large sample size to perform adequately (Learning, 1996; Sheth, 1994). Building a large sample size requires more time than is available during a single search session.

Along with relevance feedback, the autonomous agent is the concept that turns a passive search engine into an active personal assistant. This research defines an agent as a design model, similar to client-server computing, that manages complexity and implements delegation rather than being strictly a technology, program, or product (IBM, 1996). An information retrieval agent must accomplish two objectives: seek the best possible solution using existing knowledge, and unlearn previous knowledge when it becomes inaccurate (Sheth and Maes, 1993).

The most common approaches to these two objectives are derivations of simulated evolution, in which a population of agents mimics a biological process where each generation adapts to its environment. Unfortunately, this process is slow and requires more information than is available during a single search session.

Approach:

This research instead uses a simulated annealing algorithm to accomplish these two objectives. Simulated annealing was selected for its relatively quick solution time and for the analogous fit between inaccurate relevance rankings and the local-minima problem.

Simulated annealing has not been used previously in the information retrieval area, except for Boltzmann machines (Biron, 1995). Simulated annealing is a combination of deterministic and stochastic methods (Kirkpatrick, Gelatt, and Vecchi, 1983); it is neither a purely random nor a purely greedy algorithm. The algorithm uses a temperature and a cost function, and each execution attempts to reduce the cost of the end state as determined by the cost function. Usually, if the end state has a higher cost than the previous state, the algorithm rejects it; however, the algorithm accepts a higher-cost end state with a certain probability governed by the temperature. This temperature component permits the agent to jump out of a local minimum caused by an inaccurate user profile. For this research, the algorithm prioritizes the query result list so that the entire list has the lowest possible summed weights.
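
A minimal sketch of this procedure, assuming a rank-weighted cost function, a pairwise-swap neighbor move, and a geometric cooling schedule; the paper does not specify the prototype's exact cost function or parameters.

    import math
    import random

    def cost(ordering, weight):
        """Rank-weighted sum: placing a high-weight (poorly matching)
        document near the top of the list is expensive."""
        return sum((rank + 1) * weight[doc] for rank, doc in enumerate(ordering))

    def anneal(ordering, weight, temp=1.0, cooling=0.95, steps=2000):
        """Reorder query results by simulated annealing."""
        current = list(ordering)
        current_cost = cost(current, weight)
        for _ in range(steps):
            # Neighbor move: swap two randomly chosen result positions.
            i, j = random.sample(range(len(current)), 2)
            candidate = list(current)
            candidate[i], candidate[j] = candidate[j], candidate[i]
            candidate_cost = cost(candidate, weight)
            delta = candidate_cost - current_cost
            # Accept improvements outright; accept a worse ordering with a
            # probability set by the temperature, which lets the agent jump
            # out of a local minimum caused by an inaccurate user profile.
            if delta < 0 or random.random() < math.exp(-delta / temp):
                current, current_cost = candidate, candidate_cost
            temp *= cooling   # geometric cooling schedule
        return current

    # Example: assumed weights derived from the user profile (lower = better match).
    weight = {"doc_a": 0.9, "doc_b": 0.1, "doc_c": 0.5, "doc_d": 0.2}
    print(anneal(list(weight), weight))   # best-matching documents tend to come first

As the temperature falls, the algorithm behaves more deterministically; while the temperature is still high it can accept worse orderings, which is what allows it to unlearn an inaccurate ranking.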

Results and Contributions:

The results from the prototype search engine confirm that simulated annealing can efficiently prioritize query results. This finding is significant because simulated annealing has the potential to overcome initial, inaccurate knowledge in a relatively short amount of time, whereas deterministic algorithms can become stuck in local minima and simulated-evolution algorithms require an extended period to unlearn inaccurate knowledge. The deterministic capability of simulated annealing allows the agent to prioritize query results even during periods of user uncertainty, while its probabilistic aspect permits the agent to unlearn incorrect knowledge and "jump" to a more optimal solution. Exploiting this probabilistic aspect is the next phase of this research.

References:

Biron, Paul V. Boltzmann Machines: Learning and Stochastic Information Retrieval. 1 October 1996. [http://argo.gslis.ucla.edu/pbiron/pubs/boltz/boltz.html]

Croft, W. B. and R. H. Thompson. I3: A New Approach to the Design of Document Retrieval Systems. COINS Technical Report 87-58. July 23, 1986.

IBM Corporation. Intelligent Agent Strategy. [http://activist.gpl.ibm.com:81/WhitePaper/ptc2.htm]

Kirkpatrick, S., C. D. Gelatt Jr., and M. P. Vecchi. Optimization by Simulated Annealing. Science. Vol. 220, No. 4598. 13 May 1983.

Learning Systems: An Overview. 3 October 1996. [http://aruba.ccit.arizona.edu/]

Maes, Pattie, T. Darrell, B. Blumberg, and A. Pentland. The ALIVE System: Wireless, Full-Body Interaction with Autonomous Agents. To be published in a Special Issue on Multimedia and Multisensory Virtual Worlds, ACM Multimedia Systems, ACM Press, Spring 1996. [http://agents.www.media.mit.edu/groups/agents/papers.html]

Mitchell, Tom, Rich Caruana, Dayne Freitag, John McDermott, and David Zabowski. Experience with a Learning Personal Assistant. Communications of the ACM. 37(7): 80-91. July 1994.

Salton, Gerard. The State of Retrieval System Evaluation. Information Processing & Management. 28(4): 441-449. 1992.

Salton, Gerard. Historical Note: The Past Thirty Years in Information Retrieval. Journal of the American Society for Information Science. 38(5): 375-380. September 1987.

Salton, Gerard and Christopher Buckley. Term-Weighting Approaches in Automatic Text Retrieval. Information Processing & Management. 24(5): 513-523. 1988.

Sheth, Beerud and Pattie Maes. Evolving Agents for Personalized Information Filtering. Conference on Artificial Intelligence Applications, 1993.

Sheth, Beerud. A Learning Approach to Personalized Information Filtering. Learning and Common Sense Section Technical Report 94-01, MIT Media Laboratory, January 1994. [http://agents.www.media.mit.edu/groups/agents/papers.html]