KunquDB

About

KunquDB

- A large-scale, well-annotated audio-visual dataset
- Comprises 339 speakers and 128 hours of content
- Derived from the Kunqu Opera Art Canon (Kunqu yishu dadian)
- Structured by dialogue lines, with explicit annotations: character names, speaker names, gender, vocal manner classifications, and preliminary text transcriptions

Kunqu yishu dadian

The Kunqu Opera Art Canon encompasses the most significant literary, musical, and audiovisual materials spanning over 600 years, embodying the essence of Kunqu Opera art. Specifically, it contains over 22.3 million words of textual documentation, 396 sets of reprinted documents totaling over 70,000 pages, 127 hours of recorded audio, over 400 hours of video recordings, and over 6,000 images, all compiled into 149 volumes. For more details about this book, visit the official website of its publisher or read its introduction on Douban.

Note

After purchasing the book, we negotiated with the publisher and obtained authorization to use it in Kunqu Opera research. The publisher explicitly stated that the book's digital resources may be used solely for scholarly or research purposes, and only with the publisher's approval. They may not be illegally disseminated or used for commercial purposes.

Demo

Annotation Data Format

The data is structured by dialogue lines, and the metadata is stored in a CSV table that contains all labels and annotations for each utterance. The information for each utterance includes:
- video ID indicates the corresponding video;
- start and end timestamps specify the location of the utterance within the video;
- character name denotes the character portrayed in the video, often associated with the role type, for the corresponding utterance;
- performer name represents the individual performer portraying the character for the corresponding utterance;
- vocal manner type categorizes the utterance into “stage speech” or “singing”, depending on how it’s vocalized;
- preliminary content transcription corresponds to the transcription of the spoken content within the utterance.
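
The fields listed above can be read with a few lines of Python. This is only a minimal sketch: the column names (`video_id`, `start`, `end`, `character`, `performer`, `vocal_manner`, `transcription`), the `MM:SS` timestamp format, the lowercase vocal manner labels, and the video IDs in the sample are all assumptions made for illustration, and the actual CSV header may differ.

```python
import csv
import io
from dataclasses import dataclass

# Hypothetical header and video IDs; the real annotation file may use other names.
SAMPLE_CSV = """video_id,start,end,character,performer,vocal_manner,transcription
demo_001,00:11,00:47,杜丽娘,单雯,singing,原来姹紫嫣红开遍
demo_002,02:25,02:32,柳梦梅,施夏明,stage speech,姐姐你身子乏了
"""

# The two vocal manner types named in the format description.
VOCAL_MANNERS = {"singing", "stage speech"}


@dataclass
class Utterance:
    video_id: str
    start_s: int
    end_s: int
    character: str
    performer: str
    vocal_manner: str
    transcription: str

    @property
    def duration_s(self) -> int:
        return self.end_s - self.start_s


def to_seconds(stamp: str) -> int:
    """Convert 'MM:SS' or 'HH:MM:SS' into whole seconds."""
    seconds = 0
    for part in stamp.strip().split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def load_utterances(handle) -> list[Utterance]:
    """Read one Utterance per CSV row and reject obviously invalid rows."""
    utterances = []
    for row in csv.DictReader(handle):
        manner = row["vocal_manner"].strip().lower()
        if manner not in VOCAL_MANNERS:
            raise ValueError(f"unexpected vocal manner: {manner!r}")
        utt = Utterance(
            video_id=row["video_id"],
            start_s=to_seconds(row["start"]),
            end_s=to_seconds(row["end"]),
            character=row["character"],
            performer=row["performer"],
            vocal_manner=manner,
            transcription=row["transcription"],
        )
        if utt.end_s <= utt.start_s:
            raise ValueError(f"end must come after start: {row}")
        utterances.append(utt)
    return utterances


utterances = load_utterances(io.StringIO(SAMPLE_CSV))
for utt in utterances:
    print(utt.performer, utt.vocal_manner, utt.duration_s)
```

To read a real file instead of the inline sample, pass `open(path, newline="", encoding="utf-8")` to `load_utterances` and adjust the column names to match the file.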

Here’s a screenshot of the annotation:

Annotation Data Examples

Due to data restrictions, we choose not to publicly disclose the data from the KunquDB dataset as examples. Instead, we showcase similar annotations for online Kunqu Opera videos here.

| Video name | Start time | End time | Character name | Performer name | Vocal manner | Text transcription |
| --- | --- | --- | --- | --- | --- | --- |
| 牡丹亭 (Peony Pavilion) | 00:11 | 00:47 | 杜丽娘 (Liniang Du) | 单雯 (Wen Shan) | Singing | 原来姹紫嫣红开遍 (A riot of deep purple and bright red.) |
| 牡丹亭 (Peony Pavilion) | 01:12 | 01:30 | 杜丽娘 (Liniang Du) | 单雯 (Wen Shan) | Singing | 良辰美景奈何天 (Why does Heaven give us brilliant day and dazzling sight?) |
| 牡丹亭 (Peony Pavilion) | 00:30 | 00:50 | 柳梦梅 (Mengmei Liu) | 施夏明 (Xiaming Shi) | Singing | 则把云鬟点红松翠偏 (Let me rearrange your tresses in disarray.) |
| 牡丹亭 (Peony Pavilion) | 02:25 | 02:32 | 柳梦梅 (Mengmei Liu) | 施夏明 (Xiaming Shi) | Stage speech | 姐姐你身子乏了 (Fair maiden, you are tired.) |
| 牧羊记 (Shepherd's Notes) | 00:26 | 01:20 | 李陵 (Ling Li) | 施夏明 (Xiaming Shi) | Singing | 到虏庭与哥哥报冤 (To the enemy's court, I'll seek justice for my brother.) |
| 牧羊记 (Shepherd's Notes) | 05:35 | 05:38 | 苏武 (Wu Su) | 柯军 (Jun Ke) | Stage speech | 竟在此享荣华受富贵 (Here you unexpectedly enjoy wealth and honor.) |
| 牧羊记 (Shepherd's Notes) | 11:00 | 11:06 | 苏武 (Wu Su) | 柯军 (Jun Ke) | Singing | 我的忠心铁石样坚 (My loyalty is as firm as iron and stone.) |
| 牡丹亭 (Peony Pavilion) | 03:03 | 03:21 | 杜丽娘 (Liniang Du) | 单雯 (Wen Shan) | Singing | 乱煞年光遍 (Chaos reigns, time passes in chaos.) |
| 牡丹亭 (Peony Pavilion) | 04:42 | 04:55 | 杜丽娘 (Liniang Du) | 单雯 (Wen Shan) | Stage speech | 剪不断理还乱闷无端 (Endless twists and turns, cutting without end.) |
| 牡丹亭 (Peony Pavilion) | 03:57 | 04:16 | 春香 (Chunxiang) | 陶一春 (Yichun Tao) | Singing | 恁今春关情似去年 (The affairs of this spring seem similar to those of last year.) |
| 牡丹亭 (Peony Pavilion) | 04:36 | 04:41 | 春香 (Chunxiang) | 陶一春 (Yichun Tao) | Stage speech | 小姐你侧着宜春髻子恰凭栏 (Miss, leaning on the railing with your Yichun hairpin askew.) |
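
As an illustration of what these annotations support, the example rows above can be summarized by performer and vocal manner. The sketch below is not part of the KunquDB processing scripts. It copies the performer, vocal manner, and timestamps from the example rows and assumes the timestamps are `MM:SS` offsets within each video.

```python
from collections import defaultdict

# (performer, vocal manner, start, end) copied from the example rows.
ROWS = [
    ("Wen Shan", "Singing", "00:11", "00:47"),
    ("Wen Shan", "Singing", "01:12", "01:30"),
    ("Xiaming Shi", "Singing", "00:30", "00:50"),
    ("Xiaming Shi", "Stage speech", "02:25", "02:32"),
    ("Xiaming Shi", "Singing", "00:26", "01:20"),
    ("Jun Ke", "Stage speech", "05:35", "05:38"),
    ("Jun Ke", "Singing", "11:00", "11:06"),
    ("Wen Shan", "Singing", "03:03", "03:21"),
    ("Wen Shan", "Stage speech", "04:42", "04:55"),
    ("Yichun Tao", "Singing", "03:57", "04:16"),
    ("Yichun Tao", "Stage speech", "04:36", "04:41"),
]


def to_seconds(stamp: str) -> int:
    """Convert an 'MM:SS' timestamp into whole seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


# Sum utterance durations for each (performer, vocal manner) pair.
totals = defaultdict(int)
for performer, manner, start, end in ROWS:
    totals[(performer, manner)] += to_seconds(end) - to_seconds(start)

for (performer, manner), seconds in sorted(totals.items()):
    print(f"{performer:12s} {manner:12s} {seconds:4d} s")
```

Grouping by performer rather than by character reflects that the same performer can play different characters, as Xiaming Shi does in the example rows.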

Paper

KunquDB: An Attempt for Speaker Verification in the Chinese Opera Scenario

Huali Zhou, Yuke Lin, Dong Liu, Ming Li

Citation:

If our work is useful for your research, please consider citing:

@inproceedings{zhou2024kunqudb,
title={{KunquDB}: An Attempt for Speaker Verification in the {Chinese} Opera Scenario},
author={Zhou, Huali and Lin, Yuke and Liu, Dong and Li, Ming},
booktitle={International Conference on Pattern Recognition},
pages={233--249},
year={2024},
organization={Springer}
}

Download

Download

Researchers can access the source video data by purchasing Kunqu yishu dadian. It is the user's responsibility to obtain the publisher's approval to conduct research for non-commercial purposes. We provide only our annotation dataset and processing scripts. To obtain the annotation dataset, please email huali.zhou@dukekunshan.edu.cn or ming.li369@dukekunshan.edu.cn, and include your affiliation and the publisher's consent.

License

The dataset is licensed under the CC BY-NC-SA 4.0 license. This means that you can share and adapt the dataset for non-commercial purposes as long as you provide appropriate attribution and distribute your contributions under the same license. Detailed terms can be found in LICENSE.

Acknowledgment

This research is funded by the Kunshan Municipal Government Research Funding under the project "Deep Learning based Singing Voice Synthesis for Kun Opera". We thank the publisher for allowing us to conduct research on their data and the DKU library staff members for their coordination. Special thanks to Xiaoyi Qin for his assistance.
