Improving Music Mood Classification Using Lyrics, Audio and Social Tags

Bibliographic Record - Detail View

Author	Hu, Xiao
Title	Improving Music Mood Classification Using Lyrics, Audio and Social Tags
Source	(2010), (141 pages) Ph.D. Dissertation, University of Illinois at Urbana-Champaign
Language	English
Document type	print; online; monograph
ISBN	978-1-1246-3625-2
Keywords	University thesis; Dissertation; Expertise; Music; Linguistics; Measures (Individuals); Classification; Information Science; Psychology; Internet; Affective Behavior; Psychological Patterns; Content Analysis; Listening; Social Influences; Thesis; Dissertations; Academic thesis; Expert appraisal; Measurement data; Classification system; Affective disturbance; Active behaviour; Hearing process
Abstract	The affective aspect of music (popularly known as music mood) is a newly emerging metadata type and access point to music information, but it has not been well studied in information science. There has yet to be developed a suitable set of mood categories that can reflect the reality of music listening and can be well adopted in the Music Information Retrieval (MIR) community. As music repositories have grown to an unprecedentedly large scale, people call for automatic tools for music classification and recommendation. However, there have been only a few music mood classification systems, with suboptimal performance, and most of them are based solely on the audio content of the music. Lyric text and social tags are resources independent of and complementary to audio content but have yet to be fully exploited. This dissertation research takes up these problems and aims to 1) summarize fundamental insights in music psychology that can help information scientists interpret music mood; 2) identify mood categories that are frequently used by real-world music listeners, through an empirical investigation of real-life social tags applied to music; 3) advance the technology in automatic music mood classification by a thorough investigation of lyric text analysis and the combination of lyrics and audio. Using linguistic resources and human expertise, 36 mood categories were identified from the most popular social tags collected from last.fm, a major Western music tagging site. A ground truth dataset of 5,296 songs in 18 mood categories was built with mood labels given by a number of real-life users. Both commonly used text features and advanced linguistic features were investigated, as well as different feature representation models and feature combinations. The best performing lyric feature sets were then compared to a leading audio-based system.
In combining lyric and audio sources, both feature concatenation and late fusion (linear interpolation) of classifiers were examined and compared. Finally, system performances on various numbers of training examples and different audio lengths were compared. The results indicate: 1) social tags can help identify mood categories suitable for a real-world music listening environment; 2) the most useful lyric features are linguistic features combined with text stylistic features; 3) lyric features outperform audio features in terms of averaged accuracy across all considered mood categories; 4) systems combining lyrics and audio outperform audio-only and lyric-only systems; 5) combining lyrics and audio can reduce the requirement on training data size, both in number of examples and in audio length. Contributions of this research are threefold. On methodology, it improves the state of the art in music mood classification and text affect analysis in the music domain. The mood categories identified from empirical social tags can complement those in theoretical psychology models. In addition, many of the lyric text features examined in this study have never been formally studied in the context of music mood classification nor compared to each other using a common dataset. On evaluation, the ground truth dataset built in this research is large and unique, with three sources of information available: audio, lyrics and social tags. Part of the dataset has been made available to the MIR community through the Music Information Retrieval Evaluation eXchange (MIREX) 2009 and 2010, the community-based evaluation framework. The proposed method of deriving ground truth from social tags provides an effective alternative to the expensive human assessments on music and thus clears the way for large-scale experiments.
On application, findings of this research help build effective and efficient music mood classification and recommendation systems by optimizing the interaction of music audio and lyrics. A prototype of such systems can be accessed at http://moodydb.com. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone at 1-800-521-0600. Web page: http://www.proquest.com/en-US/products/dissertations/individuals.shtml.] (As Provided).
Notes	ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site: http://www.proquest.com/en-US/products/dissertations/individuals.shtml
Recorded by	ERIC (Education Resources Information Center), Washington, DC
Update	2017/4/10