FORCE2019 has ended
Wednesday, October 16 • 10:30am - 11:00am
Perpetual access machines: archiving web-published scholarship at scale

In 2018, the Internet Archive undertook a large-scale project to build as complete a collection as possible of scholarly outputs published on the web, and to improve the discoverability and accessibility of scholarly works archived as part of these global web harvests. The project involved several areas of work: targeted archiving of known OA publications (especially at-risk “long tail” publications); extraction and augmentation of bibliographic metadata and full text; integration and preservation of related identifier, registry, and aggregation services and datastores; partnerships with affiliated initiatives and joint service developments; and creation of new tools and machine learning approaches for identifying archived scholarly work in existing born-digital and web collections. The project also identified and archived associated research outputs such as blogs, datasets, code repositories, and other secondary research objects. The beta API and public interface, code-named "fatcat", can be found at https://fatcat.wiki/.

Project leads will talk about the project’s current status and upcoming work, focusing on content acquisition, indexing, discoverability, the role of machine learning, service provisioning, and their collaborative work with libraries, publishers, and non-profits. Conceptually, the project demonstrates that the scalability and technologies of "archiving the web" can facilitate automated ingest, enrichment, and dissemination strategies for a variety of web-published primary and secondary scholarly record types that have traditionally been collected via more custom and manual workflows. The project’s strategic goal is to provide open infrastructure for the perpetual discoverability of, and access to, archived scholarship.

Jefferson Bailey

Director, Archiving & Data Services, Internet Archive
Jefferson Bailey is Director of Web Archiving & Data Services at Internet Archive. Jefferson joined Internet Archive in Summer 2014 and manages Internet Archive’s web archiving services including Archive-It, used by over 900 institutions to preserve the web. He also oversees web...

Wednesday October 16, 2019 10:30am - 11:00am BST
Plenary Room