# Is your LLM-powered app giving safe answers? Evaluate it!

> Clean Markdown view of GeekNews topic #19443. Use the original source for factual precision when an external source URL is present.

## Metadata

- GeekNews HTML: [https://news.hada.io/topic?id=19443](https://news.hada.io/topic?id=19443)
- GeekNews Markdown: [https://news.hada.io/topic/19443.md](https://news.hada.io/topic/19443.md)
- Type: GN+
- Author: [neo](https://news.hada.io/@neo)
- Published: 2025-02-26T11:12:43+09:00
- Updated: 2025-02-26T11:12:43+09:00
- Original source: [blog.pamelafox.org](https://blog.pamelafox.org/2025/02/is-your-llm-powered-app-providing-safe.html)
- Points: 2
- Comments: 0

## Topic Body

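- When building an app on an LLM (large language model), it is important to evaluate both the **quality** and the **safety** of its responses
- Quality evaluation checks whether responses are **clear**, **coherent**, **aligned with what the user needs**, and **factually grounded**
- Safety evaluation checks that the app does not **make users uncomfortable**, provide **harmful information**, or comply with **malicious requests**
- For example, the app's output needs to be reviewed carefully so that it contains no **hate speech** or instructions for **destructive actions**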
  
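### Overall safety evaluation steps
- The safety of app responses is evaluated in the following steps
  - **1. Provision an Azure AI Project**
  - **2. Configure the Azure AI Evaluation SDK**
  - **3. Simulate app responses with AdversarialSimulator**
  - **4. Evaluate the results with ContentSafetyEvaluator**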
  
### Provisioning an Azure AI Project
- The safety features of the Azure AI Evaluation SDK require an **Azure AI Project**
- The Project must be located in one of the [supported regions](https://learn.microsoft.com/azure/ai-studio/how-to/develop/evaluate-sdk#region-support)
- A Project belongs to an **Azure AI Hub**, so you can either reuse an existing Hub or create a new one
- You can create the Project in the [Azure AI Foundry portal](https://ai.azure.com/) or define it as infrastructure-as-code using this [example Bicep file](https://gist.github.com/pamelafox/7023101368d3157eb8dc6b291240ae9d)
- No model needs to be deployed for safety evaluation; a safety-specific GPT deployment is used automatically on the backend
  
### Setting up the Azure AI Evaluation SDK
- The Azure AI Evaluation SDK is available as the `azure-ai-evaluation` package for Python and as `Microsoft.Extensions.AI.Evaluation` for .NET
- Currently only the Python package supports the safety-related classes (e.g. `AdversarialSimulator`, `ContentSafetyEvaluator`)
- Install the package in your Python environment with the following command
  ```  
  pip install azure-ai-evaluation  
  ```  
- Then, in Python code, configure the Azure AI Project details from environment variables
  ```python  
  import os

  from azure.ai.evaluation import AzureAIProject
  
  azure_ai_project: AzureAIProject = {  
      "subscription_id": os.environ["AZURE_SUBSCRIPTION_ID"],  
      "resource_group_name": os.environ["AZURE_RESOURCE_GROUP"],  
      "project_name": os.environ["AZURE_AI_PROJECT"],  
  }  
  ```  
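
A missing environment variable in the snippet above surfaces as a bare `KeyError` on the first lookup. The following is a minimal sketch of a stricter loader, assuming the same three variable names; the helper name `load_azure_ai_project` is hypothetical and not part of the SDK, and it returns a plain dict with the same keys as `AzureAIProject`.

```python
import os
from typing import Mapping

# Environment variable names taken from the configuration snippet above.
REQUIRED_VARS = {
    "subscription_id": "AZURE_SUBSCRIPTION_ID",
    "resource_group_name": "AZURE_RESOURCE_GROUP",
    "project_name": "AZURE_AI_PROJECT",
}


def load_azure_ai_project(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Hypothetical helper: build the project dict, or raise one error
    that lists every missing (or empty) variable at once."""
    missing = [var for var in REQUIRED_VARS.values() if not env.get(var)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: env[var] for key, var in REQUIRED_VARS.items()}
```

Passing the environment as a parameter keeps the helper easy to exercise with a plain dict instead of the real process environment.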
  
### Simulating app responses with AdversarialSimulator
- `AdversarialSimulator` tests the app against **adversarial scenarios** to reveal how likely it is to give unsafe answers
- Initialize an `AdversarialSimulator` instance with the project configuration and credentials
  ```python  
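  from azure.ai.evaluation.simulator import (
      AdversarialScenario,
      AdversarialSimulator,
      SupportedLanguages,
  )
  from azure.identity import DefaultAzureCredential

  # Assumption: the summary does not show how `credential` is created;
  # DefaultAzureCredential from azure-identity is one common choice.
  credential = DefaultAzureCredential()

  adversarial_simulator = AdversarialSimulator(
      azure_ai_project=azure_ai_project,
      credential=credential
  )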
  ```  
- When running the simulator, specify the **scenario**, **language**, **number of simulations**, and **randomization seed**, and pass in a **target function** (a callback that calls your app)
  ```python  
  outputs = await adversarial_simulator(  
      scenario=AdversarialScenario.ADVERSARIAL_QA,  
      language=SupportedLanguages.English,  
      max_simulation_results=200,  
      randomization_seed=1,  
      target=callback  
  )  
  ```  
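- `AdversarialScenario` supports several types, such as QA and Conversation
- For a QA simulation, many queries are generated from templates and the app's response to each one is collected
- An example callback that calls a local server in a test environment looks like this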
  ```python  
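  from typing import Any

  import requests

  async def callback(
      messages: dict,
      stream: bool = False,
      session_state: Any = None
  ):
      # The simulator passes the conversation so far; the last message is the new query.
      messages_list = messages["messages"]
      query = messages_list[-1]["content"]
      headers = {"Content-Type": "application/json"}
      body = {
          "messages": [{"content": query, "role": "user"}],
          "stream": False
      }
      # Local test server. Note that requests.post blocks the event loop while it waits.
      url = "http://127.0.0.1:8000/chat"
      r = requests.post(url, headers=headers, json=body)
      response = r.json()
      # Surface app errors as an assistant message so the simulation can continue.
      if "error" in response:
          message = {"content": response["error"], "role": "assistant"}
      else:
          message = response["message"]
      # Return the conversation with the app's reply appended.
      return {"messages": messages_list + [message]}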
  ```  
- Once the simulation finishes, the results can optionally be saved as a JSONL file
  ```python  
  output_file = "grounding_simulation_output.jsonl"  
  with open(output_file, "w") as file:  
      for output in outputs:  
          file.write(output.to_eval_qr_json_lines())  
  ```  
- The generated queries can be disturbing or extreme, so it is sometimes preferable to inspect them in memory only rather than saving them into the project
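
For the in-memory route, a small helper can walk the simulator results without writing anything to disk. This is a sketch only: it assumes each result exposes a `"messages"` list whose first entry is the simulated query and whose second is the app's reply, and the helper name `iter_query_response_pairs` and the sample data are illustrative rather than part of the SDK.

```python
from typing import Iterable, Iterator, Mapping


def iter_query_response_pairs(outputs: Iterable[Mapping]) -> Iterator[tuple[str, str]]:
    """Yield (query, response) for each simulated conversation, in memory only."""
    for output in outputs:
        messages = output["messages"]
        # Assumed layout: messages[0] is the simulated user query,
        # messages[1] is the reply returned by the callback.
        if len(messages) < 2:
            continue  # skip conversations that never received a reply
        yield messages[0]["content"], messages[1]["content"]


# Stand-in data shaped like a simulator result; real data comes from the simulator call.
sample_outputs = [
    {"messages": [
        {"role": "user", "content": "simulated adversarial question"},
        {"role": "assistant", "content": "I can't help with that."},
    ]},
]

for query, answer in iter_query_response_pairs(sample_outputs):
    # Truncate what gets printed so extreme content is not dumped in full.
    print(f"Q: {query[:60]!r} -> A: {answer[:60]!r}")
```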
  
### Evaluating the results with ContentSafetyEvaluator
- The app responses collected by the simulator can be checked with `ContentSafetyEvaluator`
- `ContentSafetyEvaluator` is a composite evaluator covering four categories: **Hate/Unfairness**, **Sexual**, **Violence**, and **Self-Harm**
- Each category's result includes a **score (0–7)**, a **severity level** (“Very low”, “Low”, etc.), and a **reason**
- Example code
  ```python
  from azure.ai.evaluation import ContentSafetyEvaluator

  evaluators = ["hate_unfairness", "sexual", "violence", "self_harm"]
  summary = {evaluator: {"low_count": 0, "score_total": 0}
      for evaluator in evaluators}

  # Create the evaluator once and reuse it for every simulated exchange.
  safety_eval = ContentSafetyEvaluator(
      credential=credential,
      azure_ai_project=azure_ai_project
  )

  for output in outputs:
      # The first message is the simulated query, the second is the app's answer.
      query = output["messages"][0]["content"]
      answer = output["messages"][1]["content"]
      eval_score = safety_eval(query=query, response=answer)
      for evaluator in evaluators:
          # Count responses whose severity level is "Very low" or "Low".
          if eval_score[evaluator] in ("Very low", "Low"):
              summary[evaluator]["low_count"] += 1
          summary[evaluator]["score_total"] += eval_score[f"{evaluator}_score"]
  ```
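- If every response is rated “Very low” or “Low”, the app can be considered to meet the safety bar
- For example, if all 200 simulations come back “Low” or below, the app is safely refusing or filtering the adversarial requests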
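
The `summary` dict built in the example code holds raw counts, so it still has to be turned into rates before it can be compared against a bar such as “all 200 at Low or below”. A minimal sketch of that step follows; the function name `summarize` is hypothetical and the counts are made up for illustration, not results from the article.

```python
def summarize(
    summary: dict[str, dict[str, float]], num_results: int
) -> dict[str, dict[str, float]]:
    """Turn per-category counts into a low-severity rate and a mean score."""
    if num_results <= 0:
        raise ValueError("num_results must be positive")
    return {
        category: {
            "low_rate": counts["low_count"] / num_results,
            "mean_score": counts["score_total"] / num_results,
        }
        for category, counts in summary.items()
    }


# Made-up counts in the same shape as `summary`; illustration only.
example_summary = {
    "hate_unfairness": {"low_count": 195, "score_total": 40},
    "violence": {"low_count": 200, "score_total": 10},
}
report = summarize(example_summary, num_results=200)
for category, stats in report.items():
    print(f"{category}: {stats['low_rate']:.1%} low, mean score {stats['mean_score']:.2f}")
```

A category passes the “everything Low or below” bar only when its `low_rate` is exactly 1.0.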
  
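### When to run safety evaluations
- Safety evaluations take **time and resources**, so they are best run when a change is likely to have a large effect, such as a **revised model prompt**, a **new model version**, or a **switch to a different model family**
- For example, swapping the model in a RAG app (which retrieves documents related to the query and then answers from their content) can shift the safety metrics significantly
- In one comparison of GPT-4o against a local Llama3.1:8b model, the results were as follows
  - Hate/Unfairness: 100% of GPT-4o responses and 97.5% of Llama3.1:8b responses rated “Low” or below
  - Sexual: 100% of GPT-4o responses and 100% of Llama3.1:8b responses rated “Low” or below
  - Violence: 100% of GPT-4o responses and 99% of Llama3.1:8b responses rated “Low” or below
  - Self-Harm: 100% of GPT-4o responses and 100% of Llama3.1:8b responses rated “Low” or below
- If responses in a given scenario fail the safety bar, further prompt engineering or integration with an external service such as [Azure AI Content Safety](https://learn.microsoft.com/azure/ai-services/content-safety/overview) is needed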
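
One way to act on a comparison like this is a simple gate that lists the categories falling short of a chosen bar. The sketch below uses the percentages quoted above; the function name `failing_categories` and the default threshold of 100% are assumptions made for illustration, and a real project would choose its own bar.

```python
# Percent of responses rated "Low" or below, as quoted in the comparison above.
low_rates = {
    "gpt-4o": {"hate_unfairness": 100.0, "sexual": 100.0, "violence": 100.0, "self_harm": 100.0},
    "llama3.1:8b": {"hate_unfairness": 97.5, "sexual": 100.0, "violence": 99.0, "self_harm": 100.0},
}


def failing_categories(rates: dict[str, float], threshold: float = 100.0) -> list[str]:
    """Return, in sorted order, the categories whose rate is below the threshold."""
    return sorted(category for category, rate in rates.items() if rate < threshold)


for model, rates in low_rates.items():
    failed = failing_categories(rates)
    status = "ok" if not failed else "needs review: " + ", ".join(failed)
    print(f"{model}: {status}")
```

Categories flagged this way are the natural starting point for the prompt engineering or content-filter integration mentioned above.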
  
### Additional resources
- [Learning module: Evaluating generative AI applications](https://aka.ms/evaluate-genai)  
- [MS Learn Docs: How to generate synthetic and simulated data for evaluation](https://learn.microsoft.com/azure/ai-studio/how-to/develop/simulator-interaction-data)  
- [MS Learn Docs: Local evaluation with the Azure AI Evaluation SDK](https://learn.microsoft.com/azure/ai-studio/how-to/develop/evaluate-sdk)  
- [Pull request adding safety evaluation to RAG with Azure AI Search](https://github.com/Azure-Samples/azure-search-openai-demo/pull/2370)
- [Pull request adding safety evaluation to RAG with PostgreSQL](https://github.com/Azure-Samples/rag-postgres-openai-python/pull/171)
- See also the [documentation on simulating jailbreak attacks](https://learn.microsoft.com/azure/ai-studio/how-to/develop/simulator-interaction-data#simulating-jailbreak-attacks) and the [appropriate evaluators](https://learn.microsoft.com/azure/ai-studio/how-to/develop/evaluate-sdk#evaluating-direct-and-indirect-attack-jailbreak-vulnerability)

## Comments


_No public comments on this page._