WASHINGTON – U.S. intelligence researchers are asking industry to develop technologies that attribute human- and machine-generated text to the correct authors, and that protect author privacy, by creating explainable linguistic fingerprints.
Officials of the U.S. Intelligence Advanced Research Projects Activity (IARPA) in Washington released a broad agency announcement on Friday (IARPA-BAA-22-01) for the Human Interpretable Attribution of Text using Underlying Structure (HIATUS) program.
Authorship attribution is an important capability for the Intelligence Community (IC), supporting efforts to understand and combat increasingly sophisticated malicious information campaigns online, fight human trafficking, and identify counterintelligence risks. IARPA is the research arm of the Office of the Director of National Intelligence.
Authorship privacy, also known as authorship obfuscation, refers to ways of modifying text to remove features unique to a specific author. These technologies could protect individuals and groups whose writing, if attributed, could place them in personal danger.
The 3.5-year HIATUS program seeks to develop systems for attributing authorship and protecting author privacy by identifying and capitalizing on explainable linguistic fingerprints.
HIATUS will create technologies that automatically identify and generate verifiable linguistic fingerprints of individual authors, human and machine, to enable both authorship attribution and authorship privacy. Successful systems must perform well across diverse authors, topic domains, genres, and languages.
The program will develop ways to represent each author in a stable fashion across diverse text types, and will create algorithms not only for authorship attribution but also for author privacy. Approaches must scale to other languages.
HIATUS will develop a stylistic encoder that represents texts so that documents from the same author can be compared. Successful technical approaches will take both machine- and human-generated documents into account.
Authorship attribution will identify unique authors using the stylistic characteristics of their text. Software will capitalize on representations from the stylistic encoder to determine whether two or more documents in a collection were produced by the same author.
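To illustrate the general idea of comparing documents by style, the sketch below uses a classic stylometric baseline: each document is reduced to counts of overlapping character trigrams, and two documents are judged to share an author if the cosine similarity of those profiles clears a threshold. This is a minimal sketch under stated assumptions, not the HIATUS method, which calls for learned, explainable encoders. The function names (char_ngrams, cosine, same_author), the trigram features, and the 0.5 threshold are hypothetical choices for illustration only.

```python
from collections import Counter
import math


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in lowercased text (a simple style profile)."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors; 0.0 if either is empty."""
    # Counter returns 0 for missing keys, so n-grams absent from b contribute nothing.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def same_author(doc_a: str, doc_b: str, threshold: float = 0.5) -> bool:
    """Hypothetical verification rule: same author if style similarity >= threshold."""
    return cosine(char_ngrams(doc_a), char_ngrams(doc_b)) >= threshold
```

For example, same_author("hello world", "hello world") is True, while same_author("aaaa", "bbbb") is False because the two texts share no trigrams. Raw n-gram counts are easy to inspect, which echoes the program's emphasis on explainability, but they are also sensitive to topic, one reason the program seeks stable author representations across diverse text types.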
The authorship privacy effort will develop a system able to modify the aspects of a document that provide evidence of authorship without altering the document's fluency or meaning.
Interested companies should submit proposals no later than 18 April 2022 through the IARPA Distribution and Evaluation System (IDEAS) at https://iarpa-ideas.gov.
Email technical, contractual, or administrative questions to IARPA at [email protected]. More information is online at https://sam.gov/opp/ae4dfd2492104130b68eee99d8aed195/view.