Data broker shared billions of phone location records with D.C. government as part of COVID-tracking effort
WASHINGTON — A data broker shared billions of "highly sensitive" phone-location records with the Washington D.C. government last year that revealed how people moved about the city, public records show.
The sharing of the raw phone location data was pitched as uniquely valuable for tracking the COVID pandemic, the records show. But the provision of the records for six months to the D.C. government's Department of Health also shows the potential for abuse of such data, which is generally collected without consumers' knowledge and then resold to both public and private buyers.
The company, Veraset, provided the data as part of a free trial, according to internal emails obtained via a Freedom of Information Act request by the Electronic Frontier Foundation, a digital-rights group. D.C. officials reviewed the data but ultimately declined to renew the partnership after the trial ended.
The emails show that the shared data was authorized for COVID-tracking purposes only and did not include people's names and personal details. EFF researchers said they found no evidence the data was misused.
But EFF technologist Bennett Cyphers said the emails showed how data brokers tried to "COVID-wash" their controversial work during the health crisis and forge new relationships with government authorities. He also questioned how anonymous the data truly is.
"A lot of these data brokers' existence depends on people not knowing too much about them because they're universally unpopular," Cyphers said. "Veraset refuses to reveal even how they get their data or which apps they purchase it from, and I think that's because if anyone realized the app you're using ... also opts you into having your location data sold on the open market, people would be angry and creeped out."
He noted that Veraset's location data includes sequences of code, known as "advertising identifiers," that can be used to pinpoint individual phones. Researchers have also shown that such data can be easily "de-anonymized" and linked to a specific person. Both Apple and Google announced changes earlier this year that would allow people to block their ID numbers from being used for tracking.
"If you look at a map of where a device spends its time, you can learn a lot: where you sleep at night, where you work, where you eat lunch, what bars and parks you go to," Cyphers said. Because of that, he added, it's extremely simple "to associate one of these location traces to a real person."
Veraset and other data brokers have worked to improve their public image and squash privacy concerns by sharing their records with public health agencies, researchers and news organizations, claiming the data could provide an indispensable way to monitor potentially risky crowd movements and public gatherings. The Washington Post, the New York Times and other news organizations also have made use of the data in reporting on potential health risks.
Veraset and other data brokers pay software developers to include snippets of code in their apps that then share a user's location data back to the company. Some companies have folded their code into games and weather apps, but Veraset does not say which apps it works with, and critics have questioned whether users are truly aware that their data is being shared in such a way.
The company is a spinoff of the location-data firm SafeGraph, which Google banned earlier this year as part of an effort to restrict covert location tracking.
Officials with Veraset and SafeGraph did not respond to requests for comment.
Sam Quinney, director of The Lab @ DC, a science and technology team in the D.C. government, said in a statement that District officials reviewed the data to determine if it could help with the local COVID response but "did not find suitable insights for our use cases" and declined to renew their access. The data, he said, was never shared with anyone other than authorized officials and is scheduled for deletion at the end of the year.
SafeGraph said last year it had shared data with the Centers for Disease Control and Prevention and state and city officials across the U.S., and its website says the company strives to "be the source of truth about the physical world."
The firm's investors include Peter Thiel, the billionaire co-founder of data-mining firm Palantir, and Prince Turki al-Faisal, a former Saudi ambassador to Washington who led Saudi Arabia's intelligence agency from 1979 to 2001.
The CDC used SafeGraph data as part of a one-year trial starting in the first weeks of the pandemic and, in April, awarded a contract to the company for another year of "social mobility" data, a spokeswoman told The Washington Post.
The data is used in the CDC's publicly viewable pandemic "Data Tracker" to estimate what percentage of the population is staying home. The CDC has also published at least two scientific reports using SafeGraph data, covering how stay-at-home orders and the timing of public-policy changes affected population movement and "community mobility."
Some public health groups and news organizations have argued that the data can offer important insights and should be handled carefully so as to limit risks to people's privacy. The Post last year used SafeGraph data to visualize changes in attendance and potential risk at bars, churches, workplaces and restaurants, and the New York Times used SafeGraph and Veraset data to illustrate the differences in safety between specific gyms, coffee shops and fast-food joints, based on how long people visited and how crowded they got.
A Post spokeswoman said in a statement that the aggregated data did not include any personally identifiable information and offered "an important way to give readers a sense of what was happening around the country in a time of so much uncertainty." A Times spokeswoman said in a statement that their reporting relied on aggregated location data that was securely stored and erased after publication.
Veraset required D.C. officials to sign a "data access agreement" prohibiting the use of the data for non-research purposes and allowing the company to "choose to remain anonymous as the source of the Data at Company's sole discretion."
That agreement, Cyphers said, could help Veraset ensure its work is cast in a positive light. The city's refusal to pay for the data, he added, suggested that raw location data may be less useful for public health than the company has claimed.
A D.C. government official said in the emails that the records included more than 12 billion data points. One phone can produce many data points, because its movements are tracked over time.
The D.C. emails have been redacted so as to not disclose how many people had their location data gathered, but a Veraset listing on the data marketplace Datarade said the company's records cover roughly 10 percent of the U.S. population, indicating that the D.C. data could have detailed the movements of hundreds of thousands of people.
The Datarade listing also advertised "billions of daily precise location data observations" taken from thousands of apps. Besides governments, the firm advertises its data to advertising, real estate and investment firms interested in tracking crowd size and movement at certain locations.
"Our core population human movement data set delivers the most granular and frequent GPS signals available in a third-party data set," the listing states.
The pandemic has fueled a nationwide debate over whether public health uses are valuable enough to justify an open market in data drawn from tracking people's movements without their knowledge.
Sens. Ron Wyden, D-Ore., and Rand Paul, R-Ky., introduced a bill this spring, the Fourth Amendment Is Not for Sale Act, that would ban government and law enforcement agencies from buying location data and other personal information without a warrant.
The bill would not prohibit the sale of location data to government agencies for public health purposes, but it would prevent such data from being shared by health agencies with law enforcement or intelligence officials.
Wyden's office attempted to contact SafeGraph multiple times last year but never received a response, an aide told The Post, adding that Wyden flagged the company to Google as a "data broker of concern" shortly before the tech giant banned SafeGraph's location-tracking code.
"It's no surprise that shady data brokers want to exploit the pandemic to put a positive spin on their sale of Americans' private information to the government," Wyden said in a statement Tuesday. "The unregulated trade in detailed location data creates serious safety risks for American families. The United States needs a comprehensive federal privacy law to stop these shady data sales."
In April 2020, a Veraset representative emailed a D.C. government official with an offer of "highly sensitive data ... [that] must be treated with extreme care," the public records show.
The offer included two data sets: "Movement," for GPS location coordinates tied to a phone's advertising ID number, timestamp and other information; and "Visits," showing when individual phones had visited stores or other "points of interest" over time. (In another Datarade listing, Veraset said the "Visits" data covers roughly 6 million places visited by 20 million people every day.)
A D.C. official responded that the data could help the Department of Health determine whether social distancing and stay-at-home orders had been effective. For the next six months, the emails show, Veraset officials routinely passed along new data of phone locations that had been recorded within the last 24 to 72 hours.
While the data did not include the movement of all residents, a Veraset official wrote, company tests had indicated that the data was representative enough that it could be used to "infer population movement."
The redacted emails said the data covered an unidentified portion of the D.C. metropolitan area that included both the District and nearby neighborhoods in Virginia and Maryland. City officials said they worked to safeguard the data, marking it for encryption and designating it as "classified" to block it from public view.
When the trial period ended in late September, a D.C. official wrote that the Veraset data had been "an excellent baptism by fire" for data scientists working to expand the city's centralized information database.
"Having such massive, massive regularly updating tables forced us to make leaps forward" and allowed officials to "learn about the strengths and weaknesses of using mobility data for DC's COVID response," the official said. But the government, he added, never found a use for the data due to "the limitations of app-based data and competing priorities within D.C. government."
The emails do not reveal a price that D.C. was expected to pay to extend its data access. In a separate 2019 agreement obtained by EFF, the state of Illinois paid SafeGraph $50,000 for access to two years of raw phone location data totaling roughly 50 million GPS "pings" a day.
A group of Democratic senators last year called for an investigation into U.S. Customs and Border Protection officials' use of location data sold by the data broker Venntel, which the agency had used to track people without a warrant.
The senators said the agency "should not be able to buy its way around the Fourth Amendment," which protects against unreasonable searches. CBP officials said they were allowed to "obtain access to commercially available information relevant to its border security mission."