Align or fail: How economics shape successful data sharing
Digest #78: Why choosing the right data sharing model for your project is essential.
Data plays a critical role in fostering innovation for good; after all, information is power. With more data being created every day, more information should mean more progress, yet much of this data is siloed, curtailing its impact. When we bring data together from different sources and actors, we often unlock the maximum impact for social good. For example, food poverty is a complex issue with many drivers. However, data owned by municipalities regarding social services is rarely combined with data on food aid distribution provided by non-profit organizations to inform strategies, which stifles their impact.
In recent years, a wide range of actors, including governments, have recognized the importance of data sharing. For example, the EU has recently created the recognized status of ‘Data Altruism Organization’, a designation that not-for-profit and independent organizations promoting the use of data for social purposes can obtain. The status is intended to encourage more organizations to do the same.
However, it is not easy to convince public and private actors to share data, and it is even harder to engage them in long-lasting data sharing efforts. Making data sharing easy and these efforts successful is crucial to ensuring that data’s potential does not remain untapped. Choosing the right data sharing model for the right project is vital to its success. Only by clarifying the distinctions between different data sharing models can we enable practitioners to choose the approach that suits their purposes and ultimately achieve impact.
Understanding the different models of data sharing
The conceptual distinctions between different data sharing models are mostly based on one fundamental element: the economic nature of data and its value.
Open data projects operate under the assumption that data is a non-rival asset (i.e. it can be used by multiple people at the same time) and a non-excludable one (i.e. anyone can use it, similar to a public good like roads or the air we breathe). This means that data can be shared with everyone, for any use, without losing its market and competitive value. The Humanitarian Data Exchange platform is a great example: it allows organizations to share over 19,000 open data sets on all aspects of humanitarian response with others.
Data collaboratives treat data as an excludable asset that some people may be excluded from accessing (i.e. a ‘club good’, like a movie theater) and therefore share it only among a restricted pool of actors. At the same time, they overcome the rival nature of this data by linking its use to a specific purpose. Data collaboratives work best when they give the actors a voice in choosing the purpose for which the data will be used, and when specific agreements and governance bodies ensure that those contributing data will not have their competitive position harmed, which gives them an incentive to engage. A good example of this is the California Data Collaborative, which uses data from different actors in the water sector to develop high-level analysis on water distribution to guide policy, planning, and operations for water districts in the state of California.
Data ecosystems work by activating market mechanisms around data exchange to overcome reluctance to share data, rather than relying solely on its purpose of use. This means that actors can choose to share their data in exchange for compensation, be it monetary or in alternate forms such as other data. In this way, the compensation balances the potential loss of competitive advantage created by the sharing of a rival asset, as well as the costs and risks of sharing. The Enershare initiative aims to establish a marketplace utilizing blockchain and smart contracts to facilitate data exchange in the energy sector. The platform is based on a compensation system, which can be non-monetary, for exchanging assets and resources related to data (such as datasets, algorithms, and models) with energy assets and services (like heating system maintenance or the transfer of surplus locally self-produced energy).
Using different operations for different purposes
These different models of data sharing have different operational implications. The first, and clearest, is the degree of openness of the sharing environment. Open data projects are, by definition, open to everyone, meaning that everyone can access the data without going through any selection process. Data collaboratives take the form of closed data partnerships, in which data is shared only among members of the collaborative. New actors who want to join the collaboration must normally go through a selection process and declare the purpose for which they want to use the data. Data ecosystems are often open to new actors, which can freely join the ecosystem but must go through evaluation or negotiation processes to access the data.
Second, the models differ in their approach to data use. Open data platforms and, to a certain extent, data ecosystems are based on the idea that data can, and should, be used for any purpose, and that actors are probably using the same data or each other’s data for different projects and purposes. In a data collaborative, data is shared with a clear link to a predetermined social purpose, so its purview is often narrower. Understanding these differences, and knowing from the outset how data should be used and by whom, is critical for selecting the appropriate model.
Making the right choice
When the differences in data sharing methods are considered, we see success and, most importantly, impact from data sharing. The ACTNOW Coalition was developed by a group of volunteers during the COVID pandemic to create a live and user-friendly map of the pandemic spread. They did it by using open data published by California’s government, and the platform allowed public bodies and citizens to make better-informed decisions on actions and behaviors to protect themselves and their families. This is a fantastic use of open data platforms, which are perfect for making aggregated social, economic, and health data more accessible and usable, thus creating transparency, supporting public decision-making, and fostering social mobilization and citizens' engagement.
Data collaboratives, by contrast, are most effective when promoted in relation to a specific societal or environmental challenge (e.g., migration, urbanization, or food security). By linking the use of data to a specific social purpose and recognizing the competitive value of data, they can more easily engage private actors, allowing the sharing of privately held, granular, and sensitive data that is not suitable for public use or access.
The Data for Children Collaborative pools together data from public and private sources to work on eight issues impacting children around the world, like climate change, nutrition, education, and poverty. The collaborative not only analyzes the data, but also ensures data privacy and confidentiality, constituting a safe environment for data sharing, and guarantees that data are only used for the purpose agreed.
Data ecosystems have been most effective in industrial settings (e.g., energy or mobility), in which exchanges are predominantly governed by market forces but may benefit from cooperation. The Mobility Data Space is a great example of this: it is an ecosystem in which actors from the mobility industry can exchange data, with the data provider having the right to choose whether to share their data and with whom. In this way, providers contribute to industry development by making data available for lobbying or public decision-making, while still preserving a good degree of control over the use of their data.
Unlocking data’s potential
Understanding the differences between different models of data sharing can assist practitioners in selecting the best configuration for their initiatives, and lead to higher engagement and ultimately an abundance of data shared.
Otherwise, the risk, as already seen in multiple cases, is building huge cathedrals (data sharing platforms) that no one will use, so that the benefits of data sharing are lost.
Data sharing, and the insights gained through collaboration or access to new data sets, may hold answers to many of the thorniest issues we face in our society. The economic nature of data impacts the decisions made around its sharing and must be taken into account if we are to unlock data’s full potential.