Guozhen AIGlobal AI field notes and model intelligence

English translation

Conceptual example: Python code injecting malicious image samples

Published:

Category: AI Security & Privacy

Read time: 4 min

Reads: 0

Lesson #7Views are counted together with the original Chinese articleImages are preserved from the source page

Security Risk Assessment Framework

Map the attack surface of AI systems to visualize risk areas

The attack surface of AI applications extends from user inputs to knowledge bases, plugins, model providers, logs, and tool permissions. The OWASP LLM Top 10 2025 has identified prompt injection, sensitive information leakage, supply chain vulnerabilities, data and model poisoning, and over-privileged agent behavior as top-tier risks.

You can use the OWASP Top 10 for LLM Applications 2025 as a reference checklist for risk identification.

AI system attack surface: inspection checklist

I mark all external inputs in red: user text, uploaded files, web content, retrieved document snippets, and plugin return values. Any input whose origin cannot be fully controlled must never be treated as trusted system instructions.

3.1 Potential Attack Surfaces

In today’s digital era, artificial intelligence (AI) systems are widely deployed across domains such as healthcare, finance, and autonomous driving. While these systems significantly enhance efficiency and accuracy, they also introduce numerous security risks. To understand these risks, we must first recognize the potential attack surfaces within AI systems.

Security risk application checklist for AI systems

When revisiting “Security Risks in AI Systems”, avoid launching large-scale initiatives upfront. Instead, begin with a single, simple example to verify whether the core logic is clear and well-articulated.

Security risk application retrospective card for AI systems

If you haven’t yet fully internalized “Security Risks in AI Systems”, revisit this card and walk through its four actionable steps.

1. Components of an AI System and Their Attack Surfaces

A typical AI system generally comprises the following components:

  • Data Sources: Datasets used to train and test AI models.
  • Models: AI models built using machine learning algorithms, responsible for learning patterns from data and generating predictions.
  • Interfaces: Channels enabling user interaction with the AI system—commonly APIs or user interfaces.
  • Storage: Databases or cloud platforms storing models, data, and related metadata.

Each component represents a potential target for attackers, who may exploit these surfaces to achieve malicious objectives.

2. Common Attack Types

AI systems face several distinct categories of attacks:

2.1 Data Attacks

Data attacks involve tampering with the training or testing data used by AI models. Key risks include:

  • Malicious Data Injection: Attackers may inject biased or erroneous samples into training datasets, degrading model performance. For instance, inserting specially crafted images into an image recognition dataset could cause systematic misclassification of specific objects.
# Conceptual example: Python code injecting malicious image samples
import numpy as np

def inject_malicious_data(original_data, malicious_sample):
    return np.append(original_data, malicious_sample, axis=0)

# Hypothetical original dataset and malicious sample
original_data = np.array([[0, 1], [1, 0]])  # Original dataset
malicious_sample = np.array([[1, 1]])       # Malicious sample

# Inject malicious sample into original dataset
new_data = inject_malicious_data(original_data, malicious_sample)
print(new_data)  # Output: new dataset containing the malicious sample

2.2 Model Attacks

These attacks directly target the AI model itself—for example, via model hijacking or reverse engineering. Common techniques include:

  • Model Stealing: Attackers may reconstruct model architecture and parameters through repeated API queries and inference responses (e.g., via model extraction attacks).
  • Adversarial Attacks: Inputs are subtly perturbed to produce incorrect model outputs. In natural language processing, minor spelling changes can cause misinterpretation—for example, altering “weather” to “wheather” may confuse sentiment analysis.
# Adversarial attack example: simple text perturbation
original_text = "The weather is nice today."
adversarial_text = "Teh wheather is nice today."  # Intentional typos

print("Original text: ", original_text)
print("Adversarial text: ", adversarial_text)
# The model may misclassify or misinterpret this adversarial input

3. Exposure Risks and Consequences

Successful exploitation of AI system attack surfaces can lead to severe consequences, including:

AI system security risk assessment card

When auditing AI system security risks, prioritize checking: data provenance, prompt injection vectors, tool permissions, model output integrity, and log storage practices. Risks most frequently emerge at integration points—where components interface.

  • Data Leakage: Sensitive or personally identifiable information (PII) may be exposed, resulting in privacy violations and regulatory penalties.
  • Model Degradation: Compromised training data or adversarial inputs may severely impair model reliability and decision-making accuracy.
  • Financial Loss: In financial services, for example, manipulated models or fraudulent inputs could trigger erroneous trades, credit decisions, or fraud detection failures—leading to substantial monetary losses.

4. Preventive Measures

To mitigate security risks in AI systems, organizations should adopt the following proactive measures:

AI Security & Privacy Reading Map Card

Before diving into the main text of “Security Risks in AI Systems”, quickly scan the accompanying illustrations: What question does each image pose? Which conceptual distinctions need clarification? Which step invites hands-on experimentation? And—critically—what criteria define successful completion?

  • Data Validation and Cleansing: Rigorously validate all incoming data to detect and filter malicious or anomalous inputs before they reach the model.
  • Model Protection: Apply encryption, watermarking, or API-level access controls to deter unauthorized model extraction or tampering.
  • Adversarial Detection: Implement robust input sanitization, anomaly detection, and confidence-thresholding mechanisms to identify and reject adversarial inputs.

By systematically mapping and understanding AI system attack surfaces, organizations can better identify vulnerabilities and implement targeted, layered defenses—ensuring resilience, trustworthiness, and operational continuity.

Summary

As AI technologies become increasingly embedded in critical infrastructure and daily life, comprehending—and proactively addressing—their inherent security risks is not optional; it is essential. Sustainable AI innovation requires equal emphasis on technical advancement and security assurance.

Next, Chapter Three will delve deeper into two pivotal threats: data poisoning and model hijacking—both central challenges in modern AI security research and practice.

Continue

Keep reading from here

Browse English site

Reader Messages

Reader messages

Questions, corrections, extra sources, or hands-on results can be left here. No login is required.

Max 800 characters

To reduce spam, each message is checked for length, link count, and posting frequency.

0/800

Messages

0 messages
Loading messages...