This article is part 4 of the Business Decisions of LLM Model Selection whitepaper series. Other parts: The 2026 LLM market map, Task-based model selection and benchmarks, Cost optimization and routing strategy.
Security, Compliance and Data Residency
When deploying artificial intelligence in a business context, the third — and often most important — dimension alongside performance and cost is security and regulatory compliance. A poorly chosen deployment model not only poses data-protection risks but can also carry legal consequences. In this chapter we review how each provider handles data, what the EU AI Act means for model selection, and how to choose based on sector-specific requirements.
AI Model Data-Handling Models
The table below summarizes the six most common deployment models from the perspective of who processes the data, where the processing takes place, and whether the data is used for further model training.
The most important distinction is that business APIs do not use your data for further model training by default — but this was not always the case, and terms can change. Azure OpenAI and Google Vertex AI have the advantage of letting you choose the data-processing region, so data can be kept within the EU. Mistral is a natively European provider, which is advantageous from a GDPR perspective. A local model provides full control — but at the cost of hardware and maintenance.
EU AI Act Impact on Model Selection (2026)
The EU AI Act is actively in force in 2026 and has a direct impact on how large language models may be used in business environments. The regulation affects model selection in two main areas.
High-risk applications (healthcare, employment, credit scoring):
- Mandatory human oversight — The model may only provide suggestions; it may not make autonomous decisions. This means the AI system's output must always be reviewed by a human decision-maker who approves or overrides it.
- Transparency — The user must know they are communicating with an artificial intelligence. This is especially important in chatbot and customer-service applications where the user may assume a natural human interaction.
- Documentation — The AI decision chain must be auditable. All inputs, outputs and the logic applied by the model must be logged so that the reasoning behind any given suggestion can be traced after the fact.
- Model selection consequence — Documented, auditable models should be preferred. The OpenAI and Anthropic enterprise APIs come with audit logs that meet these requirements. For local models, logging must be implemented by the operator.
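The documentation requirement above can be made concrete with a minimal logging sketch. This is an illustrative record format, not a regulatory schema; all field names are assumptions.

```python
import json
import time
import uuid

def log_ai_decision(prompt, model_output, model_name, human_decision):
    """Build one auditable record of an AI-assisted decision.

    Captures input, output, model identity and the human decision-maker's
    verdict, so the reasoning behind a suggestion can be traced later.
    Field names are illustrative, not a regulatory schema.
    """
    return {
        "record_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "model": model_name,
        "input": prompt,
        "suggestion": model_output,
        "human_decision": human_decision,  # e.g. "approved" or "overridden"
    }

# Example: log a credit-scoring suggestion that a human approved.
record = log_ai_decision(
    "Loan application summary", "suggest: approve", "gpt-4o", "approved"
)
print(json.dumps(record, indent=2))
```

In practice such records would be written to append-only storage so the audit trail cannot be altered after the fact.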
General-Purpose AI Models (GPAI):
- Models trained with more than 10^25 FLOP of compute are classified as posing systemic risk, which places additional obligations on the model provider (not on the user).
- This is not directly the responsibility of the company using the model, but the provider's compliance affects the risk profile. If a provider fails to meet EU AI Act requirements, this also represents a risk for the companies using it.
Sector-Specific Data Protection Considerations
Data-protection risk is not equal across every industry. The table below summarizes the five most common sectors in terms of the type of sensitive data involved and the recommended deployment model.
Healthcare and the legal sector carry the highest risk, which is why local models or guaranteed EU-based processing are recommended in these industries. In finance, SOC 2 certification represents the security baseline. In the beauty/wellness sector — where personal and health data (e.g., allergies) are intermingled — EU-based processing is sufficient. In marketing the risk is lower, and any provider is suitable as long as it has a Data Processing Agreement (DPA).
The Decision Tree: Which Deployment Model to Choose?
The following decision tree helps navigate the most common scenarios:
Is the data sensitive (healthcare, legal)?
├── Yes → Can the data leave the organization?
│ ├── No → LOCAL MODEL (Llama, Mistral 7B)
│ └── Yes, but within the EU → AZURE OPENAI EU or MISTRAL EU
└── No → Is price the priority?
├── Yes → OPENAI API (GPT-4o-mini) or GEMINI FLASH
└── No, quality is → CLAUDE SONNET or GPT-4o
The first question in the decision tree is always data sensitivity. If the data is sensitive and cannot leave the organization, there is no alternative: a local model is required. If it can leave but must stay within the EU, Azure OpenAI EU or Mistral is the solution. If the data is not sensitive, the decision simplifies to a trade-off between price and quality.
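The decision tree above can be expressed as a small function, which is useful if the choice needs to be embedded in tooling. This is a direct transcription of the tree, nothing more; cases the tree does not cover (e.g. sensitive data that may leave the EU) fall through to the price/quality branch.

```python
def choose_deployment(sensitive: bool, can_leave_org: bool = True,
                      eu_only: bool = False, price_priority: bool = False) -> str:
    """Mirror of the decision tree: sensitivity first, then price vs quality."""
    if sensitive:
        if not can_leave_org:
            return "LOCAL MODEL (Llama, Mistral 7B)"
        if eu_only:
            return "AZURE OPENAI EU or MISTRAL EU"
    if price_priority:
        return "OPENAI API (GPT-4o-mini) or GEMINI FLASH"
    return "CLAUDE SONNET or GPT-4o"

# Example: sensitive legal documents that must stay in-house.
print(choose_deployment(sensitive=True, can_leave_org=False))
```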
Local Models — When Are They Worth It?
Local (on-premise or self-hosted) models are increasingly popular, especially since Meta's Llama and Mistral's open-weight models are approaching the performance of closed models. But running locally is not always the best choice. Let's look in detail at when it pays off — and when it doesn't.
Advantages
- Full data control — Data never leaves the server. There is no third party, no data transfer, no risk that the provider changes its data-handling terms. This is the single biggest advantage for sensitive sectors.
- Zero API cost — Beyond hardware amortization there is no per-token fee. Anyone running tens of thousands of queries per day will be significantly cheaper with a local model in the long run compared to a cloud API.
- Offline operation — No internet dependency. It works on production lines, ships, aircraft and remote sites. Where there is no stable internet, a cloud API is simply not an option.
- Latency control — Response time is determined by our own hardware, not the provider's load. During peak hours there is no slowdown, no queuing, no rate limiting. Latency is predictable and plannable.
- Unlimited usage — No rate limit, no token quota, no monthly spending cap. The model can be called as many times as needed, with any context size (within the hardware limits).
Disadvantages
- Hardware cost — A single NVIDIA A100 GPU costs approximately $15,000, with an annual cloud lease of roughly $12,000 per GPU. Running a 70B model requires at least 2 such GPUs, representing a serious upfront investment.
- Quality gap — Llama 3.3 70B is an excellent model, but it does not reach GPT-4o's level, especially in complex reasoning, nuanced non-English text and sophisticated tool-calling tasks. The gap is narrowing, but it still exists.
- Tool calling unreliability — The tool-calling capability of open models is significantly weaker than that of closed models. OpenAI GPT-4o's tool-calling reliability is above 95%, while Llama 3.3 70B's sits around 75–85%. For business-critical automations, this is unacceptable.
- Maintenance — Model updates, GPU monitoring, scaling, backups and troubleshooting are all our responsibility. This requires dedicated DevOps/MLOps capacity, which is a significant burden for smaller teams.
- Non-English languages — The non-English language capabilities of open models are weaker than those of closed models. The training data of Llama and Mistral models is overwhelmingly in English, meaning that fine-tuning is needed for adequate quality in other languages — and that represents extra work and expertise.
When Is a Local Model Worth It?
The table clearly shows that a local model is the unambiguous winner in two main cases: high volume and sensitive data. For small and medium-sized businesses with a few hundred daily interactions, a cloud API is significantly more economical. In the areas of tool calling and non-English language support, cloud models still have the advantage, although the latter can be improved with fine-tuning.
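The "high volume" threshold can be estimated with a rough break-even calculation, using the hardware figures above (2× A100 at ~$15,000 each). The cloud price per 1,000 interactions is an assumed placeholder, not a quoted rate, and the sketch deliberately ignores electricity, hosting and maintenance staff, which push the real threshold higher.

```python
def breakeven_daily_queries(gpu_cost_usd: float = 30_000,   # 2x A100 per the text
                            amortization_years: float = 3,
                            cloud_cost_per_1k: float = 0.50) -> float:
    """Daily query volume above which local hardware amortization
    undercuts an assumed blended cloud API price per 1,000 queries.
    Ignores power, hosting and ops staff, so it is a lower bound on hardware
    economics only.
    """
    daily_hw_cost = gpu_cost_usd / (amortization_years * 365)
    cost_per_query = cloud_cost_per_1k / 1000
    return daily_hw_cost / cost_per_query

print(f"break-even: ~{breakeven_daily_queries():,.0f} queries/day")
```

With these placeholder numbers the break-even lands in the tens of thousands of queries per day, which is consistent with the rule of thumb above: a few hundred daily interactions favor the cloud API.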
The Hybrid Approach
For most companies the hybrid solution is the most realistic. This is not an "either/or" decision — the two worlds can be combined, and typically should be.
The essence of the hybrid model is simple:
- Sensitive data (customer identification, health data, legal documents) → processed by a local model; the data never leaves the organization.
- General tasks (chatbot, email generation, CRM summaries, content creation) → handled by a cloud API, where quality is higher and cost is lower.
- The router decides — the data type determines where the request goes. If the incoming data contains sensitive fields (personal identifiers, health data, legal content), it routes to the local model. If not, the cloud API receives it.
This approach combines the best of both worlds: the quality of cloud models and the data security of local models. The router logic is relatively simple, but the categorization must be accurate — a misclassified request can become a data-protection incident.
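A minimal sketch of such a router follows. The patterns are illustrative placeholders; a production classifier would need far more robust detection (and, as noted above, a misclassification is itself a data-protection incident, so the sensitive patterns should err toward over-matching).

```python
import re

# Illustrative patterns only -- a real deployment needs a proper
# PII/PHI classifier, not three regexes.
SENSITIVE_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),               # ID-number-like token
    re.compile(r"diagnos|allerg|prescription", re.I),    # health-related terms
    re.compile(r"attorney|litigation|contract", re.I),   # legal content
]

def route(request_text: str) -> str:
    """Send anything matching a sensitive pattern to the local model;
    everything else goes to the cloud API."""
    if any(p.search(request_text) for p in SENSITIVE_PATTERNS):
        return "local"
    return "cloud"

print(route("patient allergy list and diagnosis"))   # sensitive -> local
print(route("draft a product announcement email"))   # general -> cloud
```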
The Decision Matrix — Summary
So far we have analyzed providers, task types, costs, security considerations and local models. Now we condense all of this into a single decision table that gives the CTO or technology leader an immediate action guide.
The One-Page Decision Table
This table does not say that a single model is the best — rather, it says the best model depends on the given priority. For most companies the solution is a combination: GPT-4o-mini for routine tasks, Claude Sonnet or GPT-4o for complex tasks, and where necessary a local Llama for sensitive data.
The CTO's 5-Step Action Plan
The following five steps guide the practical implementation of model selection — from assessment through continuous optimization.
Step 1: Audit — What tasks do we use (or will we use) AI for? Create a list by category: FAQ and customer service, tool calling and automation, data analysis and summarization, content creation, code generation. The list should be specific: the first line should not read "Introduce AI" but rather "Categorize 500 daily customer-service messages and generate draft replies."
Step 2: Classification — Classify every task along three dimensions. First by complexity: simple (template-based, short context), medium (multi-step logic, medium context) or complex (reasoning, tool calling, large context). Second by data-protection status: sensitive (personal data, healthcare, legal) or non-sensitive (general content, public information). Third by criticality: business-critical (an incorrect output causes direct business loss) or supplementary (the output goes through human review).
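The output of steps 1 and 2 can be captured in a simple structure so the audit survives beyond a spreadsheet. The field values below are illustrative examples, not prescriptions.

```python
from dataclasses import dataclass

@dataclass
class TaskProfile:
    """One audited AI task, classified along the three dimensions above."""
    name: str
    complexity: str        # "simple" | "medium" | "complex"
    sensitive: bool        # personal / healthcare / legal data involved?
    business_critical: bool  # does a wrong output cause direct loss?

tasks = [
    TaskProfile("Categorize daily customer-service messages", "simple", False, False),
    TaskProfile("Analyze legal contracts for risk clauses", "complex", True, True),
]
```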
Step 3: Model assignment — Using the decision table, assign a model to every task category. In most cases this means the following combination: GPT-4o-mini for routine tasks (customer service, categorization, simple summaries), Claude Sonnet or GPT-4o for complex tasks (reasoning, long document analysis, quality-critical content), and where necessary a local Llama 3.3 70B for sensitive data. Do not assign the most expensive model to every task — it is unnecessary cost with no quality gain.
Step 4: Provider-agnostic architecture — Do not hard-code any single model. Build an adapter layer where model selection happens at the configuration level. In practice this means the code does not contain direct OpenAI or Anthropic API calls but communicates through an intermediary layer where the model, provider and parameters are set via configuration (not code). This way, if a better or cheaper model appears tomorrow, you can switch to it with a single configuration change.
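A minimal sketch of such an adapter layer, assuming a config-driven dispatch. The class and config keys are hypothetical; real code would call the matching provider SDK where the stub returns a string.

```python
# Model choice lives in configuration, not in application code.
CONFIG = {
    "routine":   {"provider": "openai",    "model": "gpt-4o-mini"},
    "complex":   {"provider": "anthropic", "model": "claude-sonnet"},
    "sensitive": {"provider": "local",     "model": "llama-3.3-70b"},
}

class LLMAdapter:
    """Application code asks for a task category; the adapter resolves
    provider and model from config, so switching models is a config edit."""

    def __init__(self, config: dict):
        self.config = config

    def complete(self, task_category: str, prompt: str) -> str:
        entry = self.config[task_category]
        # Stub: a real adapter would dispatch to the matching provider SDK
        # here (OpenAI, Anthropic, or a local inference server).
        return f"[{entry['provider']}/{entry['model']}] handling: {prompt}"

adapter = LLMAdapter(CONFIG)
print(adapter.complete("routine", "categorize this support ticket"))
```

The point is that the application never names a provider directly: retargeting "routine" traffic to a cheaper model tomorrow touches only `CONFIG`.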
Step 5: Measurement and iteration — Measure three things: per-model cost (dollars per 1,000 interactions), quality by task type (human evaluation or automated metrics) and latency (response time from the user's perspective). Review these quarterly — because the market moves fast. A model that offers the best value today may be more expensive than a new competitor three months from now.
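The three metrics from step 5 can be aggregated per model with a few lines. The record format is an assumption for illustration; quality here is whatever score your human or automated evaluation produces.

```python
def model_metrics(records):
    """Aggregate cost per 1,000 interactions, average quality score and
    average latency per model. Each record is an assumed dict of the form:
    {"model": str, "cost_usd": float, "quality": float, "latency_s": float}.
    """
    totals = {}
    for r in records:
        t = totals.setdefault(r["model"],
                              {"n": 0, "cost": 0.0, "quality": 0.0, "latency": 0.0})
        t["n"] += 1
        t["cost"] += r["cost_usd"]
        t["quality"] += r["quality"]
        t["latency"] += r["latency_s"]
    return {
        model: {
            "cost_per_1k": 1000 * t["cost"] / t["n"],
            "avg_quality": t["quality"] / t["n"],
            "avg_latency_s": t["latency"] / t["n"],
        }
        for model, t in totals.items()
    }
```

Run over a quarter's worth of logged interactions, this is enough to spot the model whose price/performance has fallen behind a newer competitor.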
Closing Thought
Model selection is not a one-time decision — it is continuous optimization. Anyone who picks the best model today and hard-codes it into the system will have a suboptimal system six months later. The market moves so fast that the best solution from early 2025 is no longer the most competitive by early 2026.
Those who build a provider-agnostic architecture with task-based routing, however, always use the best price/performance combination — no matter what the market brings. The goal is not to choose well once. The goal is to build a system that is always capable of choosing well.
This series is based on the comprehensive Business Decisions of LLM Model Selection whitepaper. Want to find out which model combination best fits your company? Get in touch with us — we'll help you find the optimal balance between cost, performance and security.