About CiteSeerX

CiteSeerx is an evolving digital library and search engine for scientific literature, focused primarily on computer and information science. CiteSeerx aims to improve the dissemination of scientific literature and to improve the functionality, usability, availability, cost, comprehensiveness, efficiency, and timeliness of access to scientific and scholarly knowledge. Rather than creating just another digital library, CiteSeerx attempts to provide resources such as algorithms, data, metadata, services, techniques, and software that can be used to promote other digital libraries. CiteSeerx has developed new methods and algorithms to index PostScript and PDF research articles on the Web. CiteSeerx provides the following features:


  • Autonomous citation indexing (ACI): CiteSeer uses ACI to automatically extract citations and create a citation index that can be used for literature search and evaluation. Compared to traditional citation indices, ACI provides improvements in cost, availability, comprehensiveness, efficiency, and timeliness.
  • Automatic metadata extraction: CiteSeer automatically extracts author, title, and other related metadata for analysis and document search.
  • Citation statistics: CiteSeer computes citation statistics and related documents for all articles cited in the database, not just the indexed articles.
  • Reference linking: CiteSeer was the first to allow browsing documents using citation links that are automatically generated.
  • Author disambiguation: Using scalable methods, authors are automatically disambiguated from other authors.
  • Citation context: CiteSeer can show the context of citations to a given paper, allowing a researcher to quickly and easily see what other researchers have to say about an article of interest (no longer available).
  • Awareness and tracking: CiteSeer provides automatic notification of new citations to given papers, and of new papers matching a user profile.
  • Related documents: CiteSeer locates related documents using citation-based and word-based measures and displays an active and continuously updated bibliography for each document.
  • Full-text indexing: CiteSeer indexes the full text of entire articles and citations. Full Boolean, phrase, and proximity search is supported.
  • Query-sensitive summaries: CiteSeer provides the context of how query terms are used in articles instead of a generic summary, improving the efficiency of search.
  • Up-to-date: CiteSeer is regularly updated based on user submissions and regular crawls.
  • Powerful search: CiteSeer uses fielded search to allow complex queries over content, and allows the use of author initials to provide more flexible name search.
  • Harvesting of articles: CiteSeer automatically harvests research papers from the public Web but also accepts submissions through a submission system.
  • Metadata of articles: CiteSeer automatically extracts and provides metadata from all indexed articles.
  • Personal Content Portal: CiteSeer provides certain features such as personal collections, RSS-like notifications, social bookmarking, and social network facilities. Personalized search settings and institutional data tracking are possible. Users can submit documents through an easy-to-use document submission system.
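
To make the idea of a citation index more concrete, here is a minimal sketch in Python. It is not CiteSeer's actual implementation: the function names, the crude string normalization, and the sample references are all assumptions made up for illustration. It only shows the general shape of the technique: reference strings taken from each document are normalized so that differently formatted citations to the same work collide, and the resulting index maps each cited work to the documents that cite it. Cited works get counts even when they are not themselves in the collection, and documents sharing cited works can be treated as related.

```python
import re
from collections import defaultdict


def normalize(citation: str) -> str:
    """Hypothetical matching key: lowercase, drop punctuation, collapse spaces."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", citation.lower())
    return " ".join(cleaned.split())


def build_citation_index(documents):
    """Map each normalized cited work to the set of document ids citing it."""
    index = defaultdict(set)
    for doc_id, references in documents.items():
        for ref in references:
            index[normalize(ref)].add(doc_id)
    return dict(index)


def citation_counts(index):
    """Count citing documents per cited work, indexed or not."""
    return {work: len(citing) for work, citing in index.items()}


def related_by_shared_citations(documents, doc_id):
    """Rank other documents by how many cited works they share with doc_id."""
    own = {normalize(r) for r in documents[doc_id]}
    scores = {}
    for other, refs in documents.items():
        if other == doc_id:
            continue
        shared = own & {normalize(r) for r in refs}
        if shared:
            scores[other] = len(shared)
    # Highest overlap first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Made-up sample data: reference strings as they might appear in three papers.
documents = {
    "paper_a": ["Doe, J. An Example Paper. 2001.", "Roe, R. Another Example. 1999."],
    "paper_b": ["doe j.  an example paper (2001)"],
    "paper_c": ["Roe, R. Another Example. 1999."],
}

index = build_citation_index(documents)
counts = citation_counts(index)
related = related_by_shared_citations(documents, "paper_a")
```

In this toy data, the two differently formatted references to the same made-up paper end up under one key. Real reference strings vary far more than this, so a production system would need much more robust parsing and matching than a single regular expression.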


CiteSeer was the first digital library and search engine to provide automated citation indexing and citation linking by autonomous citation indexing.

CiteSeer was developed in 1997 at the NEC Research Institute, Princeton, New Jersey, by Steve Lawrence, Lee Giles and Kurt Bollacker. The service transitioned to the Pennsylvania State University's College of Information Sciences and Technology in 2003. Since then, the project has been led by Professor Lee Giles.

After serving as a public search engine for nearly ten years, CiteSeer, originally intended only as a prototype, began to scale beyond the capabilities of its original architecture. Since its inception, the original CiteSeer grew to index over 750,000 documents and served over 1.5 million requests daily, pushing the limits of the system's capabilities. Based on an analysis of problems encountered by the original system and the needs of the research community, a new architecture and data model were developed for the "Next Generation CiteSeer," or CiteSeerx, in order to continue the CiteSeer legacy into the foreseeable future.


  • We gratefully acknowledge current and past support from:
    • The National Science Foundation award CNS-0958143.
    • Microsoft Research
    • NASA
    • Qatar
  • The initial header parsing algorithm used by CiteSeerx was developed by Hui Han, C. Lee Giles, Eren Manavoglu, Hongyuan Zha, Zhenyue Zhang, and Edward A. Fox. The algorithm was further refined by Levent Bolelli and Isaac Councill.
  • Yang Song developed an initial MyCiteSeer prototype that guided later efforts.
  • Yang Sun contributed the venue analysis code for calculating impact factor statistics.

Open Source Acknowledgements

CiteSeerx is supported by numerous excellent open source applications and libraries. Specifically, we would like to thank all who participated in the development of the following projects:

We Also Recognize

  • Andrew Ng was the first to extract title and author information from the header of PostScript files.
  • The New Zealand Digital Library was the first to index the full text of PostScript research articles.
  • Dr. Eugene Garfield created the idea of citation indexing of the scientific literature.

Special Thanks

Many have contributed to CiteSeer and its continuing development. The list below is surely missing some names, but we would like to thank:

  • Anurag Acharya
  • Joshua Alspector
  • Esam Alwagait
  • Jose Nelson Amaral
  • Anders Ardo
  • Bill Arms
  • Shumeet Baluja
  • Arunava Banerjee
  • Eric Baum
  • Donna Bergmark
  • Levent Bolelli
  • Kurt Bollacker
  • Shannon Bradshaw
  • Vivek Bhatnagar
  • Jay Budzik
  • Robert Cameron
  • Jack Carroll
  • Rich Caruana
  • Ingemar Cox
  • Sandip Debnath
  • Seyda Ertekin
  • Scott Fahlman
  • Umer Farooq
  • Gary Flake
  • Ed Fox
  • Eugene Garfield
  • Susan Gauch
  • Bill Gear
  • Paul Ginsparg
  • Eric Glover
  • Abby Goodrum
  • Marco Gori
  • Allan Gottlieb
  • Jim Gray
  • Hui Han
  • Mike Halm
  • Steve Hanson
  • Stevan Harnad
  • Eric Hellman
  • Geoff Hinton
  • Haym Hirsh
  • Steve Hitchcock
  • Jian Huang
  • Kirby Huntsinger
  • Gerd Hoff
  • Ernesto Di Iorio
  • Jim Jansen
  • Shannon Johnson
  • Paul Kantor
  • Madian Khabsa
  • Jon Kleinberg
  • Thomas Krichel
  • Bob Krovetz
  • Carl Lagoze
  • Andrea LaPaugh
  • Steve Lawrence
  • Wang-Chien Lee
  • Jay Lepreau
  • Michael Lesk
  • Huajing Li
  • Marco Maggini
  • Eren Manavoglu
  • Andrew McCallum
  • Chris Milito
  • Steve Minton
  • Tom Mitchell
  • Finn Nielsen
  • Michael Nelson
  • Craig Nevill-Manning
  • Andrew Ng
  • Andrew Odlyzko
  • Frank Olken
  • David Pennock
  • Yves Petinot
  • Brian Pinkerton
  • Alexandrin Popescul
  • Augusto Pucci
  • Betsy Richmond
  • Ben Schafer
  • Bruce Schatz
  • Terrence Sejnowski
  • Anand Sivasubramaniam
  • Warren Smith
  • Yang Song
  • Amanda Spink
  • Yang Sun
  • Harold Stone
  • Pucktada Treeratpituk
  • Kostas Tsioutsiouliklis
  • Valerie Tucci
  • Lyle Ungar
  • Frits Vaandrager
  • Moshe Vardi
  • David Waltz
  • James Ze Wang
  • Simeon Warner
  • Ian Witten
  • John Yen
  • Maria Zemankova
  • Hongyuan Zha
  • Ding Zhou
  • Ziming Zhuang


We are very thankful for the generous support that our sponsors have provided. In particular, CiteSeerx would not exist without their support.
If there is any interest in sponsoring CiteSeerx, please contact Professor Giles.

Current Sponsors

Previous Sponsors