
Does it Matter Whose Values We Encode in AI?
Consider for a moment the benefits of artificial intelligence, and what likely comes to mind are its contributions to areas like automation, data analysis, accelerating scientific and medical breakthroughs, or simply making daily life more convenient. As AI systems grow more capable of performing these tasks, one contentious subject has moved from academic corridors into public debate: AI alignment.
AI alignment has burgeoned into a subdiscipline of AI research and refers to the process of designing AI systems to align with human intentions and values, with the goal of making them safe and reliable. The increasing complexity of AI systems brings with it greater societal and ethical risks, so it is essential to align these systems by encoding ethical values into them to ensure they serve their intended purpose. A failure to do so could mean AI systems optimising their behaviour in pursuit of goals we never intended.
At first glance, “alignment” sounds comfortably technical. AI engineers describe it as making AI systems behave according to human intentions and societal norms. Policymakers invoke it as a safeguard against harmful or unpredictable behavior. Tech executives frame it as their top priority.
Scratch the surface, however, and the term reveals deep fault lines. “Human values” are not universal; they diverge across cultures, political ideologies, moral frameworks, and economic interests. The battle over alignment is less about technical safety, although that matters, and more about power. It is about who gets to enshrine their worldview inside the next generation of intelligent systems. In this article I discuss the plurality of values and its importance to building AI systems that truly represent the world. I also address the illusion of universal values that has propelled the current conversation on AI alignment, identify the structural barriers that exclude the values of the global majority, and offer a three-pronged framework for value alignment.
The illusion of universality
The field of AI alignment has traditionally proceeded from an unstated assumption that there exists a coherent set of "human values" to be encoded in AI systems. This assumption is not merely naive; it is actively harmful to the billions of people whose values remain systematically excluded from AI development.
Research around alignment often takes a technical yet universal outlook on the subject, giving an impression of consensus. However, we must ask “whose values should be encoded in AI systems?” To answer this question, it is crucial to understand that many major AI labs speak of building AI models to uphold “human values,” such as fairness, respect, safety, dignity, and non-discrimination. These are important aspirations, but they are far from universally interpreted.
The dominance of Western epistemological frameworks in AI development is not accidental but structural. Research has found that geographic areas such as Africa, South and Central America, and Central Asia are underrepresented in the AI ethics debate, showing that global regions do not participate in it equally and revealing a power imbalance in international discourse. This imbalance manifests not only in who participates in AI development but in the very conceptual frameworks used to think about alignment.
Consider the empirical evidence from cross-cultural studies. According to research employing moral foundations theory, cross-cultural variation in judging language as offensive is largely explained by differences in individual moral concerns, especially “Care” and “Purity”. Specifically, cultures that prioritise caring for others and avoiding impure thoughts demonstrate greater sensitivity to language deemed offensive. These are not superficial differences in preference. They are fundamental variations in moral reasoning that cannot be reconciled through simple averaging or majority-rule approaches.
For instance, Western countries tend to emphasise privacy, surveillance, and ethics in their AI concerns, while regions like Africa and Asia focus on technological dependency, state control, and socioeconomic issues such as job displacement. This means that although the broad goal of aligning these systems may be shared, what alignment means differs with users' contexts. When AI systems are trained predominantly on feedback from Western annotators addressing Western concerns, they inevitably encode these priorities as universal truths.
“When AI systems are trained predominantly on feedback from Western annotators addressing Western concerns, they inevitably encode these priorities as universal truths.”
The technical architecture of value imposition
Modern large language models achieve alignment primarily through Reinforcement Learning from Human Feedback (RLHF), a technique that trains AI systems to maximise preferences expressed by human evaluators. The process appears elegantly simple. Firstly, collect human judgments about which AI outputs are preferable; secondly, train a reward model to predict these preferences; and thirdly, use reinforcement learning to optimise the AI system toward higher-scoring responses. Yet this single reward function cannot adequately represent the diverse opinions of different groups of people, and even with representative sampling, conflicting views may result in the reward model favoring majority opinion and disadvantaging underrepresented groups.
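To make the second step concrete, the sketch below fits a toy reward model to pairwise human preferences with a Bradley-Terry style objective. The model, the embeddings, and the data are placeholders of my own, not any lab's actual pipeline.

```python
import torch
import torch.nn as nn

# Minimal sketch of RLHF step two: fit a reward model so that responses
# annotators preferred score higher than the ones they rejected.
class RewardModel(nn.Module):
    def __init__(self, dim: int = 16):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # maps a response embedding to one scalar reward

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.score(x).squeeze(-1)

def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry objective: maximise the probability that the preferred
    # response outscores the rejected one.
    return -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()

model = RewardModel()
optimiser = torch.optim.Adam(model.parameters(), lr=1e-2)

# Random embeddings standing in for (chosen, rejected) response pairs.
chosen, rejected = torch.randn(64, 16), torch.randn(64, 16)
for _ in range(100):
    loss = preference_loss(model(chosen), model(rejected))
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
```

Step three would then optimise the language model against this single learned score, which is exactly where the trouble described below begins.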
The technical problem runs deeper than simple underrepresentation. Standard alignment procedures might actually reduce distributional pluralism in models, with RLHF-trained models tending to concentrate on a less varied set of answers that deviate from the more distributed nature of human responses. This convergence toward uniformity is not a bug but a feature of optimisation processes that seek singular solutions. When we train AI systems to maximise a scalar reward signal, we mathematically encode the assumption that better and worse responses exist on a single axis, an assumption that crumbles under cross-cultural scrutiny.
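A toy calculation makes the collapse visible. Suppose two annotator groups disagree about which of two responses is better; a single scalar reward fitted to the pooled judgments behaves like an average, and the optimised policy simply sides with the majority. The 80/20 split below is invented purely for illustration.

```python
import numpy as np

# Two groups of annotators: +1 means "prefer response A", -1 means "prefer response B".
majority_votes = np.ones(80)     # assumed majority group
minority_votes = -np.ones(20)    # assumed minority group
pooled = np.concatenate([majority_votes, minority_votes])

# A single scalar reward gap learned from pooled preferences is, in effect, an average.
reward_gap = pooled.mean()                      # +0.60, so A looks "better" for everyone
policy_choice = "A" if reward_gap > 0 else "B"  # the minority preference disappears

print(f"learned reward gap: {reward_gap:+.2f}")
print(f"policy always answers: {policy_choice}")
```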
Paradigms for pluralistic alignment
Recent research has begun to formalise what pluralistic alignment might actually mean in practice. Three possible approaches have been identified: Overton pluralistic models that present a spectrum of reasonable responses; steerably pluralistic models that can adjust to reflect certain perspectives; and distributionally pluralistic models that are well-calibrated to a given population.
Each paradigm offers different affordances and challenges. Overton pluralism acknowledges that multiple responses may be legitimate but risks defining "reasonableness" according to dominant cultural norms. Steerable pluralism allows users to select value frameworks but places the burden of choice on individuals and may fragment shared public discourse. Distributional pluralism aims to reflect actual population diversity but requires representative data that currently does not exist at scale across global populations, especially in the global majority.
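One way to make the third paradigm measurable, offered here as my own rough sketch rather than a method from the literature, is to compare how often a model gives each answer to a value-laden question against how often a target population gives that answer, for instance with the Jensen-Shannon divergence.

```python
import numpy as np

def js_divergence(p: np.ndarray, q: np.ndarray) -> float:
    # Symmetric, bounded measure of how far apart two answer distributions sit.
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log(a / b)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical answer shares for the options ["agree", "neutral", "disagree"].
population    = np.array([0.45, 0.25, 0.30])  # e.g. drawn from a representative survey
model_samples = np.array([0.85, 0.10, 0.05])  # model mass piled onto a single answer

print(f"JS divergence: {js_divergence(population, model_samples):.3f}")
# A distributionally pluralistic model would drive this number toward zero.
```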
The technical implementation of these paradigms remains an open challenge. Recent findings show that model steering with sparse autoencoders offers consistent improvements over baselines with only 50 annotated samples, demonstrating that pluralistic alignment can be achieved in low-resource settings. This finding has profound implications. It suggests that we do not need massive datasets from every culture (a challenge for many in the global majority) to begin building meaningfully inclusive AI systems. We only need intentional effort to collect and utilise diverse feedback. In effect, this shifts the bottleneck from technical feasibility to organisational willingness to prioritise value diversity.
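For readers curious what such steering looks like mechanically, the sketch below is a heavily simplified, hypothetical illustration of my own. It assumes we already have encoder and decoder weights from a trained sparse autoencoder plus roughly fifty annotated examples of a target perspective, and it builds a steering vector from the difference in mean feature activations; the cited work's actual method will differ.

```python
import torch

torch.manual_seed(0)
d_model, d_features = 64, 512
W_enc = torch.randn(d_model, d_features) / d_model ** 0.5     # stand-in SAE encoder weights
W_dec = torch.randn(d_features, d_model) / d_features ** 0.5  # stand-in SAE decoder weights

def encode(acts: torch.Tensor) -> torch.Tensor:
    return torch.relu(acts @ W_enc)  # sparse feature activations

# ~50 annotated hidden states reflecting the target perspective, plus a neutral set.
target_acts, neutral_acts = torch.randn(50, d_model), torch.randn(50, d_model)

# Steering vector: difference of mean feature activations, mapped back to model space.
feature_shift = encode(target_acts).mean(0) - encode(neutral_acts).mean(0)
steering_vector = feature_shift @ W_dec

def steer(hidden_state: torch.Tensor, strength: float = 1.0) -> torch.Tensor:
    # Added to the model's hidden state at inference time to nudge outputs
    # toward the annotated perspective.
    return hidden_state + strength * steering_vector

print(steer(torch.randn(d_model)).shape)  # torch.Size([64])
```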
Structural barriers to Global South inclusion
The path toward genuinely pluralistic AI faces obstacles that are simultaneously technical, economic, and epistemological. African countries often lack the extensive resources needed to develop advanced AI systems and therefore rely significantly on AI software created by more technologically advanced countries in the Global North. This places African nations in the position of consumers of AI tools whose development contexts do not necessarily align with local cultural, ethical, and social traditions.
This technological dependency creates a vicious cycle. When global majority nations lack the infrastructure to participate in AI development, their values remain unrepresented or underrepresented. When AI systems trained on Western values are deployed globally, they further entrench existing power asymmetries. The result is a manifestation of what is often termed digital colonisation, the extension of historical colonial patterns through digital infrastructure and extraction.
The unavailability of resources like datasets and technical expertise from the global majority remains an obstacle to advancing a pluralistic alignment agenda, creating a stark disparity since technical talent and data are abundant in the Global North. Beyond this, AI systems are not autonomous artifacts with fixed properties. They are sociotechnical assemblages shaped by dynamic interactions and complex interplays of sociocultural, technical, and political economies. The values encoded in AI are not merely the explicit preferences of annotators but the implicit assumptions of development teams, funding structures, and market incentives.
Establishing a framework for value alignment and accountability
I propose a framework for value alignment and accountability that operates across three levels: data annotation, model architecture, and governance mechanisms.
Annotation-level transparency
Transparency should be built in at the point of annotation. Every dataset used for alignment should include detailed demographic metadata about annotators, making visible whose preferences shaped the model. This transparency mechanism allows downstream users and researchers to understand value biases rather than treating aligned models as value-neutral and their outputs as a true reflection of the world.
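As a concrete illustration, one possible shape for such metadata is sketched below. The schema is invented for this article, not an existing standard, and any real deployment would need careful privacy review so that profiles remain pseudonymous.

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class AnnotatorProfile:
    annotator_id: str         # pseudonymous identifier, never personally identifying
    country: str
    primary_language: str
    age_band: str             # e.g. "25-34"
    recruitment_channel: str  # e.g. "crowdsourcing platform", "partner university"

@dataclass
class PreferenceRecord:
    prompt: str
    chosen: str
    rejected: str
    annotator: AnnotatorProfile
    guidelines_version: str = "v1"

record = PreferenceRecord(
    prompt="Is it acceptable to interrupt an elder during a meeting?",
    chosen="It depends heavily on cultural context...",
    rejected="Yes, efficiency should always come first.",
    annotator=AnnotatorProfile("a-0193", "Kenya", "Swahili", "25-34", "partner university"),
)

print(json.dumps(asdict(record), indent=2))  # the metadata ships with the dataset
```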
Architectural pluralism
A more suitable technical approach involves moving beyond training single, monolithic models with one reward function. Instead, developing modular systems where different value modules can be applied selectively, depending on the specific context and user preference, may be the right path forward. This approach builds on steerable pluralism and goes a step further by making value frameworks decomposable and recombinable. It introduces personalised alignment, which acknowledges moral diversity while preserving user agency and avoids imposing one universal framework on a culturally heterogeneous world.
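A minimal sketch of what such modularity could look like follows. The value modules, their names, and the weights are invented for illustration; a production system would learn far richer scoring functions than these keyword checks.

```python
from typing import Callable, Dict

ValueModule = Callable[[str], float]  # scores a candidate response against one value

def communal_harmony(response: str) -> float:
    return 1.0 if "community" in response.lower() else 0.0

def individual_autonomy(response: str) -> float:
    return 1.0 if "your choice" in response.lower() else 0.0

MODULES: Dict[str, ValueModule] = {
    "communal_harmony": communal_harmony,
    "individual_autonomy": individual_autonomy,
}

def composite_score(response: str, weights: Dict[str, float]) -> float:
    # Each user or deploying community supplies its own weighting over value modules,
    # rather than inheriting one global reward function.
    return sum(w * MODULES[name](response) for name, w in weights.items())

user_profile = {"communal_harmony": 0.8, "individual_autonomy": 0.2}
print(composite_score("Consult your community before deciding.", user_profile))  # 0.8
```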
Governance-level inclusion
AI alignment involves various levels of values - individual, organisational, national, and global - with each influencing the others. It is therefore important to develop the governance mechanisms needed to aggregate diverse stakeholder input from all these levels. By doing so, we ensure that AI systems continue to align with the values we hold.
Two primary flows illustrate how we may aggregate stakeholder inputs: a bottom-up flow and a top-down flow. The bottom-up flow begins at the individual user level, where millions of users provide feedback, leading companies to adjust their models. These emerging norms then influence national regulators' definitions of "acceptable AI behavior," which, in turn, can contribute to the development of international standards.
The top-down flow originates with international bodies, such as the UN, proposing global AI safety principles. These principles are then adopted by nations and formalised into laws with which companies are required to comply. Consequently, individual users experience an AI system directly shaped by these high-level, distant policy decisions.
Conclusion
AI systems inevitably encode values, and values vary widely across cultures and experiences. Decisions about whose feedback, preferences, and safety concerns shape AI determine whose interests it serves. Achieving pluralistic alignment is technically challenging, but it primarily requires will from those who build AI systems. Building AI that serves all humanity demands acknowledging humanity's plurality and accepting that values cannot be reduced to a single reward function.
So, does it matter whose values we encode in AI? Yes, it does.
-------------------------
This article is part of the Africa and The Big Debates on AI analysis series, produced as an output of the African Observatory on Responsible AI, a flagship project of the Global Center on AI Governance, funded by Canada’s International Development Research Centre and the UK government’s Foreign, Commonwealth & Development Office.
