arxiv:2001.04351

CLUENER2020: Fine-grained Named Entity Recognition Dataset and Benchmark for Chinese

Published on Jan 13, 2020

Abstract

CLUENER2020, a fine-grained Chinese NER dataset with 10 entity categories, is introduced along with released baselines and a leaderboard.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

In this paper, we introduce CLUENER2020, the NER dataset from the CLUE organization: a well-defined, fine-grained dataset for named entity recognition in Chinese. CLUENER2020 contains 10 categories. Apart from common labels such as person, organization, and location, it covers more diverse categories. It is more challenging than other existing Chinese NER datasets and better reflects real-world applications. For comparison, we implement several state-of-the-art baselines that treat the task as sequence labeling, and we report human performance together with an analysis of it. To facilitate future work on fine-grained NER for Chinese, we release our dataset, baselines, and leaderboard.
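The abstract describes the baselines as sequence labeling models. As a rough illustration of that framing only, the sketch below converts entity spans into one tag per character. The BIO tagging scheme, the `(start, end, label)` span format, the function name `spans_to_bio`, and the example sentence are all assumptions made for this sketch. They are not taken from the paper or the released dataset.

```python
from typing import List, Tuple

# Hypothetical span format: (start, end_exclusive, label).
# This is an assumption for illustration, not the dataset's actual schema.
Span = Tuple[int, int, str]


def spans_to_bio(text: str, spans: List[Span]) -> List[str]:
    """Convert character-level entity spans into one BIO tag per character."""
    tags = ["O"] * len(text)
    for start, end, label in spans:
        if not (0 <= start < end <= len(text)):
            raise ValueError(f"span out of range: {(start, end, label)}")
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"overlapping span: {(start, end, label)}")
        # First character of the entity gets B-, the rest get I-.
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


# Made-up example sentence; labels reuse category names mentioned in the abstract.
text = "小明在腾讯工作"
spans = [(0, 2, "person"), (3, 5, "organization")]
for char, tag in zip(text, spans_to_bio(text, spans)):
    print(char, tag)
```

A sequence labeling model would then be trained to predict one such tag per character. Tagging per character rather than per word is a common choice for Chinese because it avoids depending on a word segmenter, though the paper's baselines may be set up differently.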