Covid-19 Ethics Hackathon

This hackathon focused on the ethical implications of a Covid-19 data dashboard. Starting in early August, this three-week effort used a dashboard created by a non-profit and explored the following topics:

Tasks

1. Demonstrate potential harms of data bias.

a. Broadly reflect on the short- and long-term ethical implications of unrepresentative location/mobility data and how the platform might contribute to inequity.

b. Build and describe an illustrative scenario demonstrating at least one specific ethical harm.

2. Demonstrate use of and propose actionable recommendations to address the following questions:

a. What methods should be deployed to improve data representativeness and to address inequity?

b. What additional data, features, or other ideas should be integrated into this tool to address data bias? (Optional)

Outcomes

At the end of the Ethics Hackathon, each of the three student groups presented its findings to peers and faculty advisors. While the groups’ emphases varied, the specific ethical concerns fell into three main categories:

  • Data representation concerns: how representative are the underlying data signals on the platform;
  • Data privacy concerns: how re-identifiable are users from their data on the platform; and
  • Data use concerns: how would the data be interpreted and subsequently used by policy-makers or others.

Figure: Sample exploration of COVID testing sites in the continental USA

Representation Concerns

Concerns about the underlying data centered on the platform’s use of geolocation data derived from cell phone pings, which may fail to represent some populations accurately. In particular, the geolocation data sources rely on hardware and network characteristics that are unequally distributed. For example, teams observed that lower-income and elderly individuals both show lower levels of smartphone usage and are thus underrepresented in the dataset. Teams also observed that rural populations’ phones generate fewer pings and data transactions, in part due to less reliable cell phone service. These elements of underrepresentation were not considered insurmountable, and groups presented various methods to analyze and try to compensate for gaps in the dataset, including comparisons against census tract data as a trustworthy baseline. Other suggested methods to compensate for areas with lower device penetration involved weighting regions to create a balanced ‘pings per capita.’
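The weighting idea can be illustrated with a minimal sketch. This is not the method any group actually implemented; the `Region` fields, the tract names, and all of the numbers are hypothetical, and the weight used here (population share divided by device share) is only one simple way to compare device coverage against a census baseline.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    population: int  # census baseline count (hypothetical figures below)
    devices: int     # distinct devices seen in the mobility data
    pings: int       # raw location pings recorded


def pings_per_capita(region: Region) -> float:
    """Raw pings normalized by census population rather than by device count."""
    return region.pings / region.population


def coverage_weights(regions: list[Region]) -> dict[str, float]:
    """Weight = population share / device share.

    A weight above 1 marks a region that is underrepresented in the
    device data relative to its census population.
    """
    total_pop = sum(r.population for r in regions)
    total_dev = sum(r.devices for r in regions)
    weights = {}
    for r in regions:
        if r.devices == 0:
            raise ValueError(f"{r.name}: no devices observed, cannot reweight")
        weights[r.name] = (r.population / total_pop) / (r.devices / total_dev)
    return weights


# Made-up numbers for illustration only.
regions = [
    Region("urban_tract", population=100_000, devices=60_000, pings=1_200_000),
    Region("rural_tract", population=20_000, devices=4_000, pings=40_000),
]
weights = coverage_weights(regions)
for r in regions:
    print(r.name, round(pings_per_capita(r), 2), round(weights[r.name], 2))
```

With these made-up inputs the rural tract receives a weight well above 1 and the urban tract a weight slightly below 1, which is the direction of correction the teams described. Reweighting of this kind cannot recover information about people who carry no device at all, so it reduces rather than removes the bias.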

Overall, the groups’ findings about the underlying data quality highlighted the importance of pairing technical questions about how the data might be used with skepticism about whom the data represents. The findings did not mean that the data could not be used to generate valuable insights, but they pointed out the need to ensure that the limitations and potential inequity associated with the data are addressed. This issue was considered especially important given the public health interventions potentially being informed by data pulled from the platform.

Privacy Concerns

Another concern with the platform was data privacy and the possible re-identification of individuals. These concerns are familiar wherever geolocation datasets are used, and groups highlighted the risk that the use of cell phone location data could put the identity of individuals at risk. Given this risk, it is important to verify that such sensitive data is used appropriately.

Use Concerns

In addition to concerns about the representativeness and privacy of the data itself, groups also highlighted how ethical concerns hinge on knowing who the users of the data are and what types of interventions they intend to implement. Teams noted that whether the data was sufficiently representative depended on what subset of the data was being considered, and for what ultimate purpose. For example, local decision-makers focused on smaller regions might need to be more wary of representation issues exacerbated by smaller sample sizes. There were also concerns over interpretation of the data by end-users, particularly around the correlation between data availability and factors such as race and wealth, especially if the tool is used for more coercive policies such as ordering quarantines.
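The point about smaller sample sizes can be made concrete with a rough sketch. It assumes, purely for illustration, that a decision-maker is estimating a proportion (for example, a hypothetical share of devices observed staying home) and that the devices behave like a simple random sample. The hackathon data does not satisfy that assumption, so the figures below should be read as an optimistic lower bound on the uncertainty.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion p estimated from n devices.

    Uses the normal approximation z * sqrt(p * (1 - p) / n), which assumes
    simple random sampling (an assumption, not a property of the platform data).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return z * math.sqrt(p * (1.0 - p) / n)


# Hypothetical sample sizes: a large metro area versus a small locality.
for n in (10_000, 100):
    print(n, round(margin_of_error(0.5, n), 4))
```

Because the margin scales with one over the square root of the sample size, cutting the number of observed devices by a factor of 100 widens the margin roughly tenfold, before any representation bias is taken into account.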

Overall, the ethics hackathon concept seems valuable and productive as a more holistic approach to tech red-teaming. There were initial interface and platform-access frictions as the hackathon effort started, but those proved surmountable. We will continue to pursue and refine the concept as needed!