Better coding, not just laws and regulations, is the solution for tech’s failure to address the needs of actual humans
Calls for stronger government regulation of large technology companies have become increasingly urgent and ubiquitous. But many of the technology failures we hear about every day—including fake news, privacy violations, discrimination, and filter bubbles that amplify online isolation and confrontation—have algorithmic failures at their core.
For problems that are primarily algorithmic in nature, human oversight of outcomes is insufficient. We cannot expect, for example, armies of regulators to check for discriminatory online advertising in real time. Fortunately, there are algorithmic improvements that companies can and should adopt now, without waiting for regulation to catch up.
Given the frequent media portrayal of algorithms as mysterious black boxes, we might worry that rogue algorithms have escaped their creators' ability to understand and rein in their behavior. The reality is thankfully not so dire. In recent years hundreds of scientists in machine learning, artificial intelligence and related fields have been working hard at what we call socially aware algorithm design. Many of the most prominent and damaging algorithmic failures are well understood (at least in hindsight), and, furthermore, have algorithmic solutions.
Consider the use of algorithms in automating consumer lending decisions. If a bank discovers that its algorithm has a false rejection rate for creditworthy black applicants that is significantly higher than for creditworthy white applicants, the solution does not have to involve sacrificing automation. If the “standard” way of training a predictive model for consumer lending proves to be discriminatory, there are a variety of ways the model can be adjusted to reduce or eliminate this bias.
For instance, rather than training the model to simply minimize the overall predictive error rate, we can instead train it under the additional fairness condition that no racial group is treated substantially worse than any other. Doing so will generally cause the overall error rate to be higher, since the fairness condition only makes the problem harder. Or if the error disparity stems from a lack of data from the minority population, the disparity can be reduced by gathering more data. This, of course, requires the investment of money and other resources.
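One of the simplest such adjustments can be sketched in a few lines of code. The snippet below is an illustration on synthetic, invented data, not any bank's actual system, and it uses a post-processing variant of the idea: rather than retraining the model under a fairness constraint, it picks a group-specific score cutoff so that creditworthy applicants in both groups face the same false rejection rate. The group labels, score distributions and cutoff are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical credit scores for CREDITWORTHY applicants in two groups.
# Group B's scores are shifted down (say, because of thinner credit
# histories), so a single cutoff falsely rejects more of group B.
scores_a = rng.normal(650, 50, 5000)
scores_b = rng.normal(620, 50, 5000)

def false_rejection_rate(scores, cutoff):
    """Fraction of creditworthy applicants rejected at this cutoff."""
    return np.mean(scores < cutoff)

cutoff = 600
frr_a = false_rejection_rate(scores_a, cutoff)  # roughly 16%
frr_b = false_rejection_rate(scores_b, cutoff)  # roughly 34%

# Fairness repair: choose a separate cutoff for group B so that
# creditworthy applicants in both groups are rejected at the same rate.
cutoff_b = np.quantile(scores_b, frr_a)
frr_b_fair = false_rejection_rate(scores_b, cutoff_b)
```

Note that this tiny example already exhibits the trade-off discussed above: lowering group B's cutoff equalizes false rejections among the creditworthy, but in a real lending model it would also approve more applicants who go on to default, a cost in overall accuracy or profit.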
These trade-offs highlight a fundamental but often painful truth: algorithmically enforcing social norms like fairness, privacy and transparency will come at a cost—usually a cost to accuracy, profit or “utility” more broadly. While there are a variety of good ways of reducing algorithmic bias, all of them will present the designer, and thus their corporate employer and society at large, with inescapable trade-offs between fairness and utility. Deciding how to balance these trade-offs will require executives, companies and scientists to make difficult yet crucial choices with far-reaching consequences for society.
The scientific outlook for privacy is similar. For example, if we are worried that the release of data will reveal compromising information about someone, or that products using that data will do so (since machine learning methods can leak information about the individuals used to train them), we can use a powerful recent tool known as differential privacy that provably prevents such leaks. The idea is to add “noise” to computations in a way that preserves our broad algorithmic goals but obscures the data of any particular individual.
For instance, rather than the U.S. Census Bureau publishing the exact fraction of residents below the poverty line in a particular Philadelphia neighborhood, it might instead “corrupt” that fraction by adding a small random number. The corruption can be small enough that a statistician can be highly confident about the overall poverty level of the neighborhood, while still guaranteeing that nobody can be certain about the income level of any specific person. There is a real tension between overall accuracy and the degree of such privacy we can promise; but as with fairness, such trade-offs are inevitable.
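The noise-adding recipe behind this example, known as the Laplace mechanism, fits in a few lines. The sketch below is a minimal illustration on synthetic data, not the Census Bureau's actual implementation: it releases the fraction of a hypothetical neighborhood below the poverty line, adding Laplace noise scaled to 1/(n·ε), which is the most that any one person's data can change the fraction.

```python
import numpy as np

rng = np.random.default_rng(1)

def dp_fraction(values, epsilon):
    """Release the mean of 0/1 values with epsilon-differential privacy
    via the Laplace mechanism. Changing one person's entry moves the
    mean by at most 1/n, so the noise scale is 1/(n * epsilon)."""
    n = len(values)
    true_fraction = np.mean(values)
    noise = rng.laplace(loc=0.0, scale=1.0 / (n * epsilon))
    return true_fraction + noise

# Hypothetical neighborhood: 1 = resident below the poverty line.
residents = rng.binomial(1, 0.23, size=10_000)
released = dp_fraction(residents, epsilon=0.5)
```

With 10,000 residents and ε = 0.5 the noise has scale 0.0002, so the released statistic typically lands within a tenth of a percentage point of the truth; shrinking ε strengthens the privacy guarantee but widens the noise, which is exactly the accuracy–privacy tension described above.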
The census example is more than hypothetical: all statistics released from the 2020 U.S. Census will be computed in a differentially private manner. And the trade-off between privacy and accuracy has made this controversial among downstream consumers of census data, who naturally prefer more accurate statistics. This is just an early example of the kind of societal negotiation we will have to engage in more broadly, very soon.
Socially aware algorithm design is now scientific reality. We can indeed design better-behaved algorithms, and precisely quantify the costs and benefits of doing so. Even while waiting for regulation and law to wind their way through the system, algorithms can be improved now; it’s mainly a matter of technology companies having the will to do so. In the meantime, policy makers and regulators would do well to study the science. Requiring that algorithms be “private” and “fair” is subtle and easy to get wrong—but also possible to get right.