Open doesn’t mean equal: diversifying data in the Creative Commons

The Creative Commons (CC) was founded in 2001 to address the limitations of copyright in a digital world. It was meant as a symbol of sharing and freedom: a way to push back against a restrictive environment and to bring people together. »The release of the Creative Commons (CC) licenses ensured that, as the internet continued to evolve, individuals and institutions could freely share in ways that were simple, standardized, and allowed for legal reuse.« [2025–2028 CC Strategy Document]

I am an assistant professor at a design school in Bangalore. In one of my classes in 2024, I asked my students to raise their hands if they knew what CC meant. None of them had heard of the initiative. During their internships, they relied almost entirely on paid image repositories and generative AI to create prototypes, iterations and final outputs. The limitations of these repositories quietly seeped into their work, for example by reinforcing familiar stereotypes. One student recalled her frustration at finding few images of well-to-do Indians in the proprietary library that her internship studio subscribed to.

But why would people pay for image repositories when infrastructures such as the Creative Commons are available? Systems built in the spirit of collectivising knowledge depend on the collective to sustain them. Without a clear incentive, who will take responsibility for ensuring that open-source platforms remain relevant, diverse and unbiased, or at least debiased?

What began as a classroom observation soon revealed a deeper issue: the value of Creative Commons has not diminished, but collective engagement with it has. In the age of AI, in which datasets quietly shape everything from visual culture to everyday decision-making, neglecting open infrastructures has real consequences. If the commons is not actively enriched, it risks reflecting the same narrow, dominant perspectives that AI models already reproduce at scale.

For the Creative Commons to reflect the world it aspires to serve, it must include the people and things that have long been missing from public archives: not so much »new« stories as underrepresented and marginal ones that have historically been excluded.

Diversifying disabilities

The Inclusive AI Lab began working with the equity product team of a design company to explore how the Creative Commons could be diversified through community-led representation. This led us to develop a ground-up, co-created process for diversifying open-source datasets, and to collaborate with Enable India, one of the largest organisations working with persons with disabilities in India.

The company instructed us to capture »authenticity«. But how can one capture this visually, given that many disabilities are not visible?

Under the banner »Women at Work«, we talked with many women with disabilities about how they might like to be seen. Our engagements with Enable India reminded us that disability is not a single category. »There is already a dearth of data on people with visual impairment; to leave them out would mean giving in to the challenges we hope to address«, argued our collaborator at Enable India.

People perform their identities

We asked working women with disabilities to document their daily realities in response to the prompt »a day in your life«. They took photos showing how they work, who they work with and the tools they use. Turning an external gaze on one’s own life is not intuitive. It is a process of reflecting, of conversing, of seeing oneself from a distance. This process embodies the spirit of the collective: we want communities to lead the narrative, but such leadership still needs facilitation and scaffolding.

After collecting many images »in action«, we faced the challenge of curating them. Which visuals best represent the women’s daily lives? For Kritika, a 37-year-old woman who uses a wheelchair and works in a training department, being depicted as focused on her work mattered, but being happy at work mattered more. For Riya, a 21-year-old woman with a hearing impairment who is an intern in a design team, the more meaningful image showed two-way communication rather than someone talking at her.

Image 1: Kritika at work, prioritising an image that represents joy

Image 2: Riya, prioritising two-way communication

When we asked the women to translate their aspirations into prompts for AI image generators, they rarely used labels such as »visually impaired« or »hearing impaired« to describe themselves. Instead, they focused on getting their features right: their clothes, expressions and surroundings.

Representing disability proved complex. Assistive aids can signal identity, but they are also bound up with mixed lived experiences: some women relied on them, others no longer used them, and some had unpleasant memories associated with them. Yet all felt that including these aids was an act of activism, as they recognised the need to make women with disabilities more visible, especially in the world of work.

Limits of data capture

Representation can sometimes run counter to authenticity, and decisions about how disability appears in images must rest with people with disabilities themselves. Our work with people with disabilities is part of a larger project on Debiasing Data undertaken by the Inclusive AI Lab. This project is more than a creative intervention to diversify public data archives. It’s a reminder that even when open systems such as the Creative Commons invite participation, they can never fully represent diversity in all its depth and nuance. By diversifying open datasets and spotlighting disability groups, we invite policymakers and technology leaders to rethink what fair data really means, because a truly open commons isn’t just accessible: it’s representative. In the end, it’s reassuring to remember that we are, and always will be, far more than our data.

About the authors

The Inclusive AI Lab at Utrecht University is dedicated to incubating leaders and helping to build inclusive, responsible and ethical AI data, tools, services, policies and platforms, with a special focus on the Global South.

Siddhi Gupta is the India field lead at the Inclusive AI Lab, with a decade of experience leading projects in equity design and digital storytelling with global stakeholders. Her work has been published by Amsterdam University Press and in Teacher Plus Magazine, as well as in journals such as Learning, Media and Technology. Siddhi currently teaches creative education at Srishti Manipal Institute, Bangalore.

Prof. Dr. Payal Arora is Chair of Inclusive AI Cultures at Utrecht University and co-founder of the Inclusive AI Lab. She is the author of award-winning books including The Next Billion Users (Harvard University Press) and From Pessimism to Promise (MIT Press). Forbes called her the champion of the next billion and the »right kind of person to reform tech«.