TL;DR
Portugal’s AMÁLIA, a €5.5M government-funded European Portuguese language model, is operational and performs strongly on European Portuguese benchmarks. However, critical questions about its openness, data sufficiency, and objectives are still unresolved, and the same questions apply to the broader European sovereign-LLM efforts.
Portugal’s €5.5 million investment in the AMÁLIA language model has produced a functioning base version that outperforms earlier open models on European Portuguese benchmarks. Despite this technical progress, fundamental questions about the model’s openness, data sufficiency, and strategic goals remain unanswered, and these issues are central to the broader European sovereign-LLM landscape.
AMÁLIA, developed by a consortium of approximately 60 researchers from Portugal’s top research institutions, was officially launched in October 2025. The model continues pre-training from the EuroLLM multilingual foundation on 107 billion tokens, of which only about 5.8 billion are clearly European Portuguese. It currently performs well on European Portuguese benchmarks, surpassing previous open models and beating Qwen 3-8B on most tasks, though it still trails on some benchmarks such as ALBA.
While the technical progress is notable, questions linger about how open AMÁLIA truly is, especially regarding access to its training data and model weights. Critics highlight that only a small fraction of the training tokens come from native Portuguese sources, raising doubts about the model’s native-language capabilities and potential biases. In addition, the strategic goals (what the model is optimized for and how it aligns with Portugal’s national AI policy) have not been fully clarified, according to recent analysis by Duarte O.Carmo.
AMÁLIA: The three hard questions.
Portugal spent €5.5M to build a European Portuguese LLM. The base version is operational, and it beats Qwen 3-8B on most pt-PT benchmark tasks. So why are the most important questions still unanswered?
Last month, Duarte O.Carmo published the sharpest public analysis of AMÁLIA — Portugal’s state-funded European Portuguese large language model. He prefaces his critique with the necessary diplomatic apparatus before doing what almost nobody else in the European-sovereign-LLM discourse has been willing to do publicly: asking hard questions about whether the work, as released, actually does what it set out to do. This piece is a structural extension of his analysis. The AMÁLIA case study exposes three hard questions every national LLM effort needs to answer publicly — and the broader European sovereign-LLM movement has been operating without explicit answers to any of them.
Three questions every national LLM effort needs to answer publicly.
Duarte O.Carmo’s framing maps cleanly onto the structural argument: each question applies concretely to AMÁLIA, and none has yet been answered explicitly by the wider European sovereign-LLM movement.
The three questions form a dependency chain: Q3 (the optimization target) determines Q2 (the data volume needed), which in turn conditions Q1 (whether the openness on offer is sufficient for community contribution). The European sovereign-LLM movement as a whole benefits when these questions become standard methodology disclosure rather than exceptional critique.
As an affiliate, we earn on qualifying purchases.
107 billion tokens. 5.8 billion clearly pt-PT.
This is the most tractable of the three questions, and it has a surprising answer. For a model whose stated purpose is to prioritize European Portuguese, the clearly pt-PT share of extended pre-training is roughly 5.4% (5.8 billion of 107 billion tokens). The implications cascade into every other question.
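The share follows directly from the two token counts in the heading above. The short sketch below only reproduces that arithmetic; the function name `native_share` and the constant names are illustrative and not part of any AMÁLIA tooling.

```python
def native_share(native_tokens: float, total_tokens: float) -> float:
    """Return the fraction of a training mix that is native-language text."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return native_tokens / total_tokens


# Token counts as quoted in this article (extended pre-training).
TOTAL_TOKENS = 107e9   # 107 billion tokens overall
PT_PT_TOKENS = 5.8e9   # 5.8 billion tokens that are clearly pt-PT

share = native_share(PT_PT_TOKENS, TOTAL_TOKENS)
print(f"clearly pt-PT share: {share:.1%}")      # about 5.4%
print(f"everything else:     {1 - share:.1%}")  # about 94.6%
```

Put the other way round, roughly 19 out of every 20 tokens in the extended pre-training mix are not clearly European Portuguese.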
The Olmo standard. AMÁLIA’s current state.
The Allen Institute for AI’s Olmo project defines what “fully open” requires in operational terms. Olmo doesn’t lead frontier benchmarks, and that’s not the point: the point is to be the structural reference for openness. AMÁLIA’s “fully open source” claim should be measured against that operational standard.
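One way to make “fully open” checkable is an explicit artifact checklist. The sketch below is an illustration under stated assumptions: the artifact list reflects what the Olmo project is generally known for publishing (weights, training data, training code, evaluation code, intermediate checkpoints, training logs), and the example values are hypothetical placeholders, not a statement about what AMÁLIA has or has not released.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class OpennessChecklist:
    """Artifacts assumed to make up an Olmo-style 'fully open' release."""
    weights: bool
    training_data: bool
    training_code: bool
    evaluation_code: bool
    intermediate_checkpoints: bool
    training_logs: bool

    def missing(self) -> list[str]:
        # Names of the artifacts that have not been published.
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def is_fully_open(self) -> bool:
        # Fully open only when no artifact is missing.
        return not self.missing()


# Hypothetical project, for illustration only (not AMÁLIA's actual status).
example = OpennessChecklist(
    weights=True,
    training_data=False,
    training_code=True,
    evaluation_code=True,
    intermediate_checkpoints=False,
    training_logs=False,
)

print(example.is_fully_open())
print(example.missing())
```

The design choice here is that openness is all-or-nothing against a named list: a project can publish weights and still fail the standard, and the `missing()` output says exactly what remains to be disclosed.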
Four strategic positions. AMÁLIA between two and three.
Roughly €100M or more in publicly disclosed European sovereign-LLM funding is spread across the major initiatives. The structural question every project faces: what competitive position are you actually staking? There are four options, not mutually exclusive, but each requires different commitments.
Three standards. For AMÁLIA and the movement.
The structural critique generalizes beyond AMÁLIA. Italy, France, Germany, Switzerland, the OpenEuroLLM consortium, and every subsequent national project benefit from public discourse holding national LLM efforts to operational standards on openness, data accounting, and strategic positioning.
The European sovereign-AI agenda is a serious strategic project that deserves serious public discourse. O.Carmo’s analysis is what serious public discourse looks like. Appropriately diplomatic. Structurally rigorous. Willing to ask the hard questions in public when the public investment justifies it. More of this is needed — across every European sovereign-LLM project, not just AMÁLIA.
Implications for European Sovereign-Language AI Initiatives
The development of AMÁLIA exemplifies Portugal’s commitment to building a sovereign-language LLM, reflecting a broader European effort to reduce dependency on Anglo-American models. However, the unanswered questions about openness and strategic focus highlight systemic challenges facing all national LLM projects, including issues of data transparency, model accessibility, and alignment with national policies. These issues are critical because they determine whether such models will truly serve their respective countries’ linguistic and cultural needs or become closed, proprietary tools.
European Sovereign LLM Projects and Structural Challenges
Across Europe, multiple countries—including Italy, Germany, France, and Norway—are investing in sovereign-language models, often with similar funding and strategic goals. These projects are at a similar structural crossroads, grappling with questions about data sufficiency, openness, and purpose. Portugal’s AMÁLIA is a case study illustrating these broader issues, especially given its public funding and national scope. The European sovereign-LLM movement is still in early stages, with many projects in progress and no clear consensus on best practices for openness and strategic alignment.
Recent critiques, such as those by Duarte O.Carmo, emphasize that the discourse tends to focus on individual model performance rather than the underlying structural questions that will determine long-term success and sovereignty in AI.
“AMÁLIA is an impressive piece of work, but the critical questions about openness and goals remain unanswered.”
— Duarte O.Carmo
Unresolved Questions About Openness and Strategic Goals
It remains unclear how open AMÁLIA will ultimately be in terms of access to its weights and training data. The extent to which it will serve as a public resource or remain proprietary is still to be determined. Additionally, the strategic objectives—whether the model is primarily for academic, governmental, or commercial use—are not yet explicitly defined, leaving questions about its long-term role and impact.
Next Steps and Expected Developments in AMÁLIA’s Evolution
The final version of AMÁLIA is scheduled for release in June 2026, and that release will likely resolve some of the current uncertainties. Over the next 12 to 24 months, researchers and policymakers will scrutinize the model’s openness, performance, and strategic alignment more closely. Portugal’s government and research institutions may also release more detailed policies on data sharing and model access, shaping the future of the project and of similar European efforts.
Key Questions
What are the main technical strengths of AMÁLIA?
AMÁLIA outperforms previous open models on European Portuguese benchmarks and beats Qwen 3-8B on most tasks. It was built by continuing pre-training from the EuroLLM multilingual foundation.
Why are the questions about openness and data important?
Openness affects transparency, reproducibility, and the potential for collaborative improvement. Data sufficiency impacts the model’s ability to accurately represent native language nuances and cultural context.
What are the broader implications for European AI sovereignty?
The questions raised by AMÁLIA reflect systemic challenges in developing truly sovereign-language models, crucial for reducing dependency on proprietary foreign models and ensuring linguistic and cultural representation.
When will the final version of AMÁLIA be available?
The final version is expected in June 2026, which should provide more clarity on many of the current uncertainties.
Source: ThorstenMeyerAI.com