A robust, timely census is vital to democracy. Censuses sketch the changing face of our nation by charting both political and demographic shifts, including changes in wealth and neighborhood transitions. Most crucially, they influence how resources and political power are doled out: Cities use census data to set budgets, and the Constitution mandates a national census to apportion congressional seats. But taking a census is both expensive and slow. The annual American Community Survey (ACS) represents data collected over a five-year period, creating one hell of a lag. A team of Stanford AI researchers believes computer vision can speed up the process, a radical approach to a centuries-old practice.
In a new paper, the researchers described how they fast-tracked census data collection by using object recognition on 50 million Google Street View images from 200 cities. They found that street vehicles told a sort of neighborhood biography: Just by counting cars they found in the images (22 million in total), they were able to extract data like income, race, education, and voting patterns of a given area.
The researchers began by sorting cars identified by deep learning-based computer vision into 2,657 categories based on make, model, body type and year. From there, they created a training set for a computer model with census data from 35 of the 200 cities. By looking at census data from the ACS, the US Census and the 2008 presidential election, the model surfaced associations between the cars it spotted and key data points, including the race, political patterns and income of neighborhoods. The associations were accurate enough that, when applied to the larger set of 165 cities, they mostly held up.
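To make the train-then-apply idea concrete, here is a minimal sketch in Python. It is not the researchers' model: it assumes a single made-up feature (the share of detected cars that are pickup trucks) and fits a plain least-squares line on a handful of "training" cities before estimating a census value for a held-out city. The function names and every number below are hypothetical and chosen only for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares on one feature; returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("feature has no variation in the training set")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new feature value."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training cities: pickup share from images, paired with a
# census figure for the same city. These values are invented, not real data.
train_pickup_share = [0.10, 0.20, 0.30, 0.40]
train_census_value = [20.0, 30.0, 40.0, 50.0]

model = fit_line(train_pickup_share, train_census_value)

# A held-out city the model never saw during fitting.
held_out_estimate = predict(model, 0.25)
```

The point of the split is the same as in the study: associations learned where census figures are available only matter if they still hold in places the model has not seen.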
For example, Hondas and Toyotas were associated with predominantly Asian neighborhoods; Chryslers, Buicks, and Oldsmobiles with black neighborhoods; and Aston Martins, Volkswagens and pickup trucks with white neighborhoods. The model most closely associated Republican-voting precincts with extended-cab pickup trucks, while Democratic-voting precincts were most strongly tied to sedans. When matched against the actual census data, the AI generally got it right and did it much faster: It classified all 50 million images in two weeks, while each ACS release draws on five years of data collection.
The researchers found the model worked best when associating cars with voting patterns, writing, “We found that by driving through a city for 15 minutes while counting sedans and pickup trucks, it is possible to reliably determine whether the city voted Democratic or Republican: if there are more sedans, it probably voted Democrat (88% chance) and if there are more pickup trucks, it probably voted Republican (82% chance).”
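The rule of thumb in that quote is simple enough to write down directly. The sketch below is an illustration of the quoted heuristic only, not the researchers' code: the function name is hypothetical, and a tie returns no guess because the quote does not cover that case.

```python
def guess_city_vote(sedans, pickups):
    """Guess a city's vote from counts of sedans and pickup trucks.

    Follows the heuristic quoted above: more sedans suggests Democratic,
    more pickups suggests Republican. Returns None on a tie, which the
    quote does not address.
    """
    if sedans < 0 or pickups < 0:
        raise ValueError("counts must be non-negative")
    if sedans > pickups:
        return "Democratic"
    if pickups > sedans:
        return "Republican"
    return None


# Example tallies from a hypothetical 15-minute drive.
print(guess_city_vote(sedans=140, pickups=90))
print(guess_city_vote(sedans=60, pickups=110))
```

The guess is probabilistic, not certain: by the researchers' own figures, the sedan call is right about 88% of the time and the pickup call about 82%.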
The accuracy is surprising, but certainly not astonishing: You’d expect poorer neighborhoods to have older cars, just as you’d expect residents of those same neighborhoods to have less formal education overall. Ultimately, the researchers say they’d like to strike a balance between integrating more advanced data collection techniques, like satellite imagery, into their work and considering ethical concerns about public data and personal privacy.
Real-time data collection could hugely improve the census process, but the public is increasingly being asked by various parties to forgo privacy in exchange for some larger good, as is already the case with policing and online security.