UC Berkeley and Stanford join forces on groundbreaking database on police misconduct and use of force, combining human tenacity with AI to benefit the public

August 5, 2025


A few minutes before midnight on January 1, 2019, UC Berkeley Journalism student Susie Neilson (’19) stepped away from a New Year’s Eve gathering to click send on a mail merge that fired off more than 400 public records requests to law enforcement agencies across the state, marking the start of the first effort by journalists to gather 1.5 million pages of once-secret public records on police use of force and misconduct in California.

Neilson and fellow journalism students in UC Berkeley Journalism’s Investigative Reporting Program, working in a workshop led by journalist Thomas Peele, didn’t waste a day in requesting records that became available after the passage of California’s transparency law, S.B. 1421. And they haven’t wasted much time since.

Nearly seven years later, a database containing these documents, the result of more than 3,500 public records requests, is live and searchable on the news sites of KQED, CalMatters, The San Francisco Chronicle and the Los Angeles Times. Building it involved more than 100 journalists, journalism students, researchers, data scientists and lawyers at UC Berkeley, Stanford University and UC Irvine, along with legal and civil liberties organizations nationwide. Until now, only journalists, lawyers and others with the wherewithal to request records from some 700 law enforcement and oversight agencies in the state could obtain the information.

Neilson, now an award-winning investigative reporter for The San Francisco Chronicle, said sending off records requests at midnight seems slightly ridiculous now, but she appreciates the results. “I am so proud of all of the people involved in making this law function for Californians,” she said.

Former state Sen. Nancy Skinner, who served the East Bay in the legislature and now serves as a state energy commissioner, was instrumental in passing the laws, S.B. 1421 and S.B. 16, that would give Californians new access to once-restricted police records and also in helping the state fund the creation of the database to make information broadly accessible.

“For 40 years California hid police misconduct,” said Skinner in a media release about the launch of the Police Records Access Project this week. “We were able to open those records to the public when the legislature passed S.B. 1421 in 2019. Now with this new database, Californians will have even better access, making it easier to find out which law enforcement officers have a history of bad behavior and which of our police departments do the right thing to hold their officers accountable.”

This first-of-its-kind database was built by UC Berkeley Journalism’s Investigative Reporting Program (IRP), the Berkeley Institute for Data Science (BIDS), and Stanford University’s Big Local News. Support also came from the ACLU Foundation of Southern California, California innocence organizations, the National Association of Criminal Defense Lawyers, UC Irvine law school’s Press Freedom Project and UC Berkeley law school’s Criminal Law & Justice Center.

The painstaking work of collecting public records

With leadership from UC Berkeley Journalism’s Investigative Reporting Program, journalists at dozens of organizations teamed up to form the California Reporting Project in 2018. They set up shared Google Docs and folders for files, used DocumentCloud and custom-built a database. Though the reporters published more than 100 stories from the records they found, the work wasn’t easy.

First, they had to figure out which agencies might hold the records they needed, which required analyzing other datasets to identify qualifying agencies such as park districts and probation offices.

Katey Rusch (’20), the project’s records request manager, first worked on it as a student in 2019 and was hired in 2023 to take over records requests 40 hours per week. She sent some 700 requests per year and worked with others, including fellow alum Daniel Lempres (’21) and faculty such as Susan Seeger of UC Irvine’s law school, to organize all of the records into “incidents” and to manually verify some 2,000 misconduct cases.

The IRP’s faculty and staff from left to right: Kate Raphael, Lisa Pickoff-White, Sasha Schell, Garrett Therolf, Aysha Pettigrew, David Barstow, Yasmin Rafiei, Katey Rusch, Kathryn Hurd and Bernice Yeung. Not pictured: Christine Schiavo. Photo: Marlena Telvick

“Everything about this project was on hard mode,” Lisa Pickoff-White (’08) said with a laugh. Pickoff-White, the project’s research director, works out of Berkeley’s Investigative Reporting Program and has been a mainstay of the effort since its inception.

Once the requested files arrived, Pickoff-White said, reporters had to navigate every type of format to decipher them and extract text and information for a story. She likened it to doing a thousand puzzles at once, with people constantly mailing in new pieces to fit in. Agencies must hand over records, she explained, but are not required to do so in any specific or organized way. That’s why the involvement of students in such a labor-intensive effort has been invaluable.

Over the years, with few resources and through the COVID-19 pandemic, Berkeley Journalism alums like Sukey Lewis (’15), a KQED reporter, kept the project going while juggling demanding reporting jobs of their own. Alongside the California Reporting Project, a national group of journalists and lawyers also formed to collaborate on requesting and using the documents.

A major assist from AI

While the project was a labor of love, it was never-ending and nearly impossible for journalists to do alone.

Thankfully, journalists, lawyers and data scientists — from Berkeley, Stanford, UC Irvine and beyond — came together.

In 2019, Cheryl Phillips, founder of Stanford’s Big Local News, hosted a lunch at the university’s Faculty Club in Palo Alto for the first convening of the project’s many collaborators, inside and outside of journalism. The group stayed so long that staff had to usher them out.

“Folks walked across campus to keep talking, they were so engaged,” Phillips said, noting that’s when she first met David Barstow, chair of UC Berkeley Journalism’s Investigative Reporting Program.

In 2022, the group took another major step when data scientists at the Berkeley Institute for Data Science (BIDS), part of UC Berkeley’s College of Computing, Data Science, and Society, and data scientists and journalists at Stanford University’s Big Local News joined forces with the IRP in a state-funded effort to create an accessible database.

Aditya Parameswaran, an associate professor in UC Berkeley’s Department of Electrical Engineering and Computer Sciences, led work on the database’s backend, while Cheryl Phillips of Stanford’s Big Local News led work on the interface. Throughout, humans reviewed the output of the AI tools that vastly accelerated the organization of the data received through the records requests.

“Here we have an amazing example of how generative AI — with humans in the loop — can be used for good, at a scale that’s unprecedented, for a task that’s never been done before and for societal impact,” Parameswaran said.

Parameswaran also co-directs the EPIC Data Lab, which focuses on no-code data tooling powered by generative AI and whose research both draws inspiration from and contributes back to this project.

Tarak Shah, who has served as the project’s product manager at BIDS, says a confluence of factors gave rise to the database, including the passage of transparency laws and the ongoing development of generative AI, specifically large language models, which enabled the team to sift quickly and reliably through a massive collection of records. Generative AI was used, for example, to help the journalism team organize files into cases more quickly, extract details such as the location and date of each case, and improve search.

“The timing was right so that we could use this opportunity,” Shah said. “As the journalists expanded their efforts at requesting records under the newly passed laws, they found the manual annotation and extraction processes they had relied on could not cope with the volume of records they were getting. Traditional text processing and machine learning tools were not well suited to the heterogeneity of documents from hundreds of different jurisdictions and agency types.”

Without generative AI tools, he said, organizing a collection of this size and diversity would have required much more labor than was realistically available.

Shah also emphasized the database’s importance for accountability: survivors of police violence, for example, will be able to look up disciplinary information about the officers involved in their cases.

“The release of this database demonstrates BIDS’ commitment to interdisciplinary collaboration around the use of artificial intelligence to improve science and society,” said Kirstie Whitaker, BIDS executive director. “The project team leveraged AI tools to scale the expertise of journalists while maintaining their collective ethical responsibilities to civic society.”

‘Emblem of transparency and the power of journalism’

Rusch says she and others felt like this day would never come.

“It felt insurmountable to publish this scale of records — not only to get this many records, but to publish them all in one database,” she said. “To me, it feels really exciting and powerful to be able to provide this for the public. I have always felt that public records should not be reserved for people with some kind of expertise.”

Michael D. Bolden, dean of UC Berkeley Journalism, said he’s proud of the dozens of students, alumni and staff who have helped build this robust database over many years.

“It’s a project that represents the best of Berkeley — innovative collaboration across disciplines and institutions and a dedication to fact-finding and transparency for the public good,” Bolden said.

The IRP’s Barstow, the Reva and David Logan Professor of Investigative Reporting, agreed, saying that the database’s publication in California stands in contrast to recent actions by the federal government.

“At a time when the federal government is moving aggressively to remove access to records and data about police in general and police misconduct in particular, the years of hard work on this database will now shed some light on policing across the entire state of California,” said Barstow, a four-time Pulitzer Prize winner, whose IRP was named a finalist for the prize this year. “The database stands as a powerful emblem of transparency and the power of journalism at a time when both of these values face significant new threats.”