

공나형·조태린, "A Study on Annotation Methods for Analyzing Inappropriateness in AI Training Corpora" (「인공 지능 학습용 말뭉치의 부적절성 분석을 위한 주석 방안 연구」), 『한국사전학』 42, 한국사전학회, Nov. 2023.

Posted: 2024.03.04

Modified: 2024.03.04

Posted by: 최옥정



Korean Abstract (translated)

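This paper aims to identify inappropriate expressions in large-scale corpora built for training conversational artificial intelligence and to develop an annotation scheme for analyzing them. To this end, it sets ‘bias’, which earlier discussions had excluded, as one of the constituent factors of inappropriateness alongside ‘unethical’, ‘aggression’, and ‘depreciation’, so that bias arising during machine learning (algorithmic bias) can be corrected. Beyond identifying inappropriateness, four categories are set as factors for analyzing it: ‘explicitness’, ‘context’, ‘domain’, and ‘intensity’. Judging the intensity of inappropriateness is especially important because industry demand for it and its practical utility are both high. To reduce the inter-annotator disagreement that can arise when judging intensity, the paper proposes considering ‘explicitness’ and ‘context’ together from multiple angles. Finally, by showing how examples extracted from the corpus are annotated according to the analysis factors, the paper seeks to verify the validity and practicality of its concept of inappropriateness, its analysis factors, and its annotation scheme. This work has academic and industrial significance in that it can help refine a general-purpose framework for verifying inappropriateness in AI training corpora.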

English Abstract

This study aims to select inappropriate expressions from large corpora built for training conversational artificial intelligence and to develop an annotation scheme for analyzing them. To this end, it sets ‘bias’, which was excluded from previous discussions, as one of the components of inappropriateness, along with ‘unethical’, ‘aggression’, and ‘depreciation’, so that algorithmic bias that may arise during machine learning can be corrected. Next, four categories are set as factors for analyzing the selected inappropriate expressions: ‘explicitness’ (explicit/implicit), ‘context’ (positive/negative), ‘domain’, and ‘intensity’ (strong/weak). Determining the intensity of inappropriateness is particularly important because industry has a strong need for it and makes heavy use of it. To reduce the inter-annotator disagreement that may occur when determining intensity, the paper proposes considering ‘explicitness’ and ‘context’ together in a multifaceted manner. Finally, the paper seeks to verify the validity and practicality of the proposed concept of inappropriateness, analysis factors, and annotation method by demonstrating how examples extracted from the corpus are annotated according to the analysis factors. This work has academic and industrial significance in that it can contribute to refining a general-purpose framework for verifying inappropriateness in AI training corpora.
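The categories named in the abstract can be pictured as a small annotation record. The following is only an illustrative sketch, not the authors' actual annotation format: the class and field names are hypothetical, the assumption that each record carries a single component is ours, and `domain` is left as free text because the abstract does not list its values. The `intensity_agreement` helper is a plain observed-agreement ratio chosen for illustration; the abstract does not say which agreement measure the paper uses.

```python
from dataclasses import dataclass
from enum import Enum


class Inappropriateness(Enum):
    """The four components of inappropriateness named in the abstract."""
    UNETHICAL = "unethical"
    AGGRESSION = "aggression"
    DEPRECIATION = "depreciation"
    BIAS = "bias"


class Explicitness(Enum):
    EXPLICIT = "explicit"
    IMPLICIT = "implicit"


class Context(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"


class Intensity(Enum):
    STRONG = "strong"
    WEAK = "weak"


@dataclass(frozen=True)
class Annotation:
    """One annotated utterance (hypothetical layout, one component per record)."""
    text: str
    component: Inappropriateness
    explicitness: Explicitness
    context: Context
    domain: str  # assumption: free text, since the abstract lists no domain values
    intensity: Intensity


def intensity_agreement(a: list[Annotation], b: list[Annotation]) -> float:
    """Share of items on which two annotators chose the same intensity label."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty, equal-length annotation lists")
    same = sum(x.intensity == y.intensity for x, y in zip(a, b))
    return same / len(a)
```

Passing two annotators' lists for the same utterances to `intensity_agreement` gives a rough view of how often their intensity judgments coincide, which is the kind of disagreement the abstract says the combined use of ‘explicitness’ and ‘context’ is meant to reduce.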

Keywords

conversational artificial intelligence, machine learning, training corpus, bias, inappropriateness, explicitness, context, domain, intensity
