General information
Master in Sound and Music Computing, DTIC, UPF
2010/2011. Second term.
Instructor: Emilia Gómez (emilia.gomez at upf dot edu)
Credits: 5 ECTS
Schedule: Wednesday 16:30-18:00 (lecture, room 52109) and Friday 15:00-16:30 (hands-on exercises, room 54009)
Course presentation
This course focuses on methodologies and techniques for the automatic characterization of musical audio in terms of different facets (e.g. melody, harmony, rhythm, timbre, and spatial location), temporal scopes and abstraction levels (from low-level to semantic descriptions). Special emphasis is given to state-of-the-art signal processing methods for audio content description, interdisciplinary research and music information retrieval.
Prerequisites
To take this course, an engineering background is desirable, together with undergraduate courses in mathematics such as Algebra and Calculus, and familiarity with basic signal processing concepts. Programming experience and/or formal music knowledge are also desirable.
Competences to be acquired
General: Mathematical and analytical skills; Ability to find and use information to solve a given problem; Ability to communicate in English; Ability to work in a team; Ability to work in an interdisciplinary field.
Specific: Develop methods for describing audio signals; Understand current state-of-the-art analysis techniques; Apply signal processing methodologies to solve practical problems related to audio and music analysis; Combine musical knowledge and audio signal processing to produce musically meaningful descriptions.
Content
- Levels and facets of music content.
- Temporal vs frequency representation of audio signals.
- Timbre and instrument description and classification.
- Pitch estimation and melodic description.
- Harmonic and tonal description.
- Computational methods for rhythmic analysis of audio.
- Music structural analysis.
- Bridging the semantic gap.
- Music classification and comparative analysis using machine learning techniques.
- Application contexts for music content analysis in Music Information Retrieval (MIR)
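As an illustration of the low-level descriptors and of the temporal vs frequency representations listed above, here is a minimal Python/NumPy sketch of two classic frame-level descriptors: the zero-crossing rate, computed directly on the waveform, and the spectral centroid, computed on the magnitude spectrum. The course labs use Matlab; the function names, the Hann window and the frame length in the usage line are assumptions for illustration, not taken from the lab material.

```python
import numpy as np


def zero_crossing_rate(frame: np.ndarray) -> float:
    """Time-domain descriptor: fraction of adjacent sample pairs whose sign differs."""
    signs = np.signbit(frame)
    return float(np.mean(signs[1:] != signs[:-1]))


def spectral_centroid(frame: np.ndarray, sample_rate: float) -> float:
    """Frequency-domain descriptor: magnitude-weighted mean frequency (Hz)
    of a Hann-windowed frame; returns 0.0 for a silent frame."""
    magnitudes = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    total = magnitudes.sum()
    return float(np.dot(freqs, magnitudes) / total) if total > 0 else 0.0


if __name__ == "__main__":
    sr = 44100
    # One 2048-sample frame of a 440 Hz sine (hypothetical test signal).
    tone = np.sin(2 * np.pi * 440.0 * np.arange(2048) / sr)
    print(zero_crossing_rate(tone), spectral_centroid(tone, sr))
```

For a pure 440 Hz tone, the zero-crossing rate is close to 2 × 440 / 44100 ≈ 0.02 crossings per sample and the centroid is close to 440 Hz; brighter or noisier sounds tend to raise both values.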
Class format and evaluation method
The course takes place in the second term of the year (10 weeks) and is organized into two types of session each week:
- Lectures: Wednesday from 16:30 to 18:00
- Hands-on exercises: Friday from 15:00 to 16:30
The first session consists mainly of lectures, although it will sometimes include seminars or presentations by students. The second focuses on practical work with computers.
Homework: each week, all students are expected to review the lecture material, to work on a set of questions posed in the lectures, and to complete small programming assignments set in the hands-on sessions.
The labs should be carried out in teams of at most two students. They consist of implementing a set of descriptors related to the different musical facets, in Matlab, sometimes building on existing software. Students will present their results during the lab sessions.
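The lab descriptors build on simple signal processing blocks. As a hedged illustration (Python/NumPy rather than the Matlab used in the course; the function name and the 60-1000 Hz search range are assumptions), the following sketch estimates the fundamental frequency of a monophonic frame by picking the autocorrelation peak, the classic approach analyzed by Rabiner. YIN, provided as Lab2 material, refines the same idea with a normalized difference function; this sketch is not YIN.

```python
import numpy as np


def autocorrelation_pitch(frame: np.ndarray, sample_rate: float,
                          fmin: float = 60.0, fmax: float = 1000.0) -> float:
    """Estimate the fundamental frequency (Hz) of a monophonic frame as the lag
    that maximizes the autocorrelation within [sample_rate/fmax, sample_rate/fmin].
    Returns 0.0 for a silent frame or an empty lag range."""
    frame = frame - np.mean(frame)  # remove any DC offset
    n = len(frame)
    # 'full' correlation has length 2n-1; index n-1 is lag 0, so keep lags >= 0.
    acf = np.correlate(frame, frame, mode="full")[n - 1:]
    min_lag = max(1, int(sample_rate / fmax))  # never allow lag 0
    max_lag = min(int(sample_rate / fmin), n - 1)
    if acf[0] <= 0 or min_lag >= max_lag:
        return 0.0
    best_lag = min_lag + int(np.argmax(acf[min_lag:max_lag + 1]))
    return sample_rate / best_lag
```

Because the lag is an integer number of samples, the estimate is quantized (a 220 Hz sine at 16 kHz is resolved to roughly 219-222 Hz); practical estimators interpolate around the peak and guard against octave errors.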
Final exam: students will be asked to discuss some of the course topics in a final exam.
Evaluation: the course grade is based on the following items:
- Exam (40%)
- Labs (40%)
- Participation in the discussions and lab presentation (20%)
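The weighting above amounts to a weighted average of the three components. A minimal sketch, assuming all three are marked on the same scale (for example 0-10, which the course description does not specify); the function name is hypothetical:

```python
def final_grade(exam: float, labs: float, participation: float) -> float:
    """Weighted course grade: exam 40%, labs 40%, participation and lab
    presentation 20%. Assumes all three scores use the same scale."""
    return 0.4 * exam + 0.4 * labs + 0.2 * participation


if __name__ == "__main__":
    # 0.4*8 + 0.4*7 + 0.2*9 = 3.2 + 2.8 + 1.8 = 7.8
    print(round(final_grade(exam=8, labs=7, participation=9), 2))
```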
Exercises
Lectures
Labs
Week | Date | Lecture | Lab
1 | 12th-14th January 2011 | Course introduction (initial evaluation). Content-based sound and music description. Low-level descriptors. | Presentation of Lab1: Low-level features and timbre.
2 | 19th-21st January 2011 | Timbre description and instrument classification. | Lab1
3 | 26th-28th January 2011 | NO CLASS | NO CLASS (to be scheduled later). Work on Lab1.
4 | 2nd-4th February 2011 | Pitch and melodic description. | Presentation of results for Lab1
5 | 9th-11th February 2011 | Harmony and tonality. | Presentation of Lab2: Melody. Lab2 additional material: YIN ("private" folder for YIN, including Mac MEX files); singing voice melodies; onset detection code. Lab2
6 | 16th-18th February 2011 | Rhythm. | Lab2
7 | 23rd-25th February 2011 | Presentation of results for Lab2 | Lab3: Chord and key (presentation, lab description). Lab4: Tempo induction (presentation, lab description).
8 | 2nd-4th March 2011 | Similarity computation for structural analysis and retrieval. | Lab3, Lab4
9 | 9th-11th March 2011 | Comparative analysis and automatic classification (mood classification). Comparative analysis and automatic classification (genre classification). | Presentation of results for Lab3
10 | 16th-18th March 2011 | Content-based transformations. | Presentation of results for Lab4. Course evaluation.
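Lab3 (chord and key) relies on matching pitch-class distributions against tonal templates, in the spirit of Krumhansl's key profiles and their adaptation to audio in Gómez (2006). The sketch below is a simplified, hypothetical version of that template-matching step, not the lab's reference implementation: it assumes a 12-bin chroma (or HPCP) vector has already been computed with bin 0 = C, and correlates it with the 24 rotated Krumhansl-Kessler major and minor profiles.

```python
from typing import Tuple

import numpy as np

# Krumhansl-Kessler probe-tone profiles; index 0 is the tonic.
MAJOR_PROFILE = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                          2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR_PROFILE = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53,
                          2.54, 4.75, 3.98, 2.69, 3.34, 3.17])
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F",
                 "F#", "G", "G#", "A", "A#", "B"]


def estimate_key(chroma: np.ndarray) -> Tuple[str, str, float]:
    """Return (tonic, mode, correlation) of the key whose rotated profile
    correlates best with a 12-bin chroma vector (bin 0 = C)."""
    best = ("C", "major", -np.inf)
    for tonic in range(12):
        for mode, profile in (("major", MAJOR_PROFILE), ("minor", MINOR_PROFILE)):
            # np.roll moves the profile's tonic weight to bin `tonic`.
            template = np.roll(profile, tonic)
            r = float(np.corrcoef(chroma, template)[0, 1])
            if r > best[2]:
                best = (PITCH_CLASSES[tonic], mode, r)
    return best
```

For example, passing `np.roll(MAJOR_PROFILE, 7)` as the chroma yields G major with a correlation of (numerically) 1. A flat chroma vector has zero variance, so its correlations are undefined (NaN) and the function falls back to its initial value.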
A Moodle page is available to students, with updated schedules, course slides and lab material.
Lab format: lab reports should be 4 to 6 pages long, in PDF format, and should follow the ISMIR paper templates (Word and LaTeX).
Core technical papers from the current literature.
- Aucouturier, J.-J., Pachet, F. and Sandler, M. The way it sounds: Timbre models for analysis and retrieval of polyphonic music signals. IEEE Transactions on Multimedia, 7(6):1028-1035, December 2005.
- Bello, J., Daudet, L., Abdallah, S., Duxbury, C., Davies, M. and Sandler, M.B. 2005. A Tutorial on Onset Detection in Music Signals. IEEE Transactions on Speech and Audio Processing, 13(5).
- Casey, M. A., Veltkamp, R., Goto, M., Leman, M., Rhodes, C. and Slaney, M. Content-Based Music Information Retrieval: Current Directions and Future Challenges, Proceedings of the IEEE 96(4), April 2008.
- Chai, W. Segmentation and Summarization of Music. IEEE Signal Processing Magazine, Special Issue on Semantic Retrieval of Multimedia, 2007.
- de Cheveigné, A., and Kawahara, H., YIN, a fundamental frequency estimator for speech and music, JASA, 2002.
- Dixon, S., Pampalk, E. and Widmer, G. Classification of Dance Music by Periodicity Patterns, ISMIR 2003.
- Gómez, E., Klapuri, A. and Meudic, B. 2003. Melody Description and Extraction in the Context of Music Content Processing. Journal of New Music Research, 32(1).
- Gómez, E., Haro, M., Herrera, P. (2009). Music and geography: content description of musical audio from different parts of the world. 10th International Society for Music Information Retrieval Conference.
- Gouyon, F. and Dixon, S. A review of automatic rhythm description systems. Computer Music Journal 29(1), pp. 34-54, 2005.
- Fabien Gouyon, Perfecto Herrera, Emilia Gómez, Pedro Cano, Jordi Bonada, Àlex Loscos, Xavier Amatriain, Xavier Serra, Content processing of music audio signals, in Pietro Polotti and Davide Rocchesso, eds., Sound to sense, sense to sound, a state of the art in sound and music computing, Logos Verlag, Berlin, 2008.
- Herrera-Boyer, P., Klapuri, A. and Davy, M. Automatic classification of pitched musical instrument sounds. In: Klapuri, A. and Davy, M. (eds.), Signal Processing Methods for Music Transcription, 2006.
- Perfecto Herrera, Joan Serrà, Cyril Laurier, Enric Guaus, Emilia Gómez and Xavier Serra, The discipline formerly known as MIR, fMIR workshop, ISMIR 2009.
- Klapuri, A. and Davy, M. (Editors), Signal Processing Methods for Music Transcription, Springer-Verlag, New York, 2006.
- Krumhansl, C. L. 2004. The cognition of tonality - as we know it today. Journal of New Music Research, 33(3):253–268.
- Laurier, C., Herrera, P. (2009). Automatic Detection of Emotion in Music: Interaction with Emotionally Sensitive Machines, in press.
- Leman, M., Clarisse, L., De Baets, B., De Meyer, H., Lesaffre, M., Martens, G., Martens, J., and Van Steelant, D. Tendencies, perspectives, and opportunities of musical audio-mining. In A. Calvo-Manzano, A. Pérez-López, & J. S. Santiago (Eds.), Forum Acusticum Sevilla 2002, 16-20 September, 2002. Madrid: Sociedad Española de Acustica -SEA. (Special issue of Journal Revista de Acustica Vol XXXIII, no. 3-4), 2002.
- Peeters, G., McAdams, S., Herrera, P., Instrument sound description in the context of MPEG-7, ICMC, 2000.
- Peeters, G., A large set of audio features for sound description (similarity and classification) in the CUIDADO project, 2004.
- Mel-frequency cepstral coefficients (MFCCs), Wikipedia.
- Peeters, G., Music pitch representation by periodicity measures based on combined temporal and spectral representations, ICASSP, 2006.
- Selfridge-Field, E. (1998). Conceptual and representational issues in melodic comparison. In Melodic Similarity – Concepts, Procedures, and Applications, Hewlett,W.B. & Selfridge-Field, E. editors, MIT Press, Cambridge, Massachusetts.
- Scaringella, N., Zoia, G. and Mlynek, D. (2005). Automatic genre classification of music content. IEEE Signal Processing Magazine.
- Zils, A., Pachet, F. Automatic extraction of music descriptors from audio signals, ISMIR, 2004.
Complementary References
- Cooper, M. and Foote, J. Summarizing Popular Music via Structural Similarity Analysis. Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2003.
- Foote, J., Visualizing music and audio using self-similarity, Proceedings of the seventh ACM international conference on Multimedia (Part 1), p.77-80, 1999, Orlando, USA
- Gómez, E. (2006) Tonal description of polyphonic audio for music content processing. INFORMS Journal on Computing, Special Cluster on Computation in Music, 18(3).
- Gómez, E. and Bonada, J. (2005). Tonality visualization of polyphonic audio. In International Computer Music Conference.
- Krumhansl, C. L. (1990). Cognitive foundations of musical pitch. Oxford University Press, New York.
- Liu, D., Lu, L., & Zhang, H. J. (2003). Automatic mood detection from acoustic music data.
- Luan, Z., Lu, L. and Zhang, C. Collective annotation of music from multiple semantic categories, ISMIR 2008
- McKinney, M. F. and Moelants, D. (2004). Extracting the perceptual tempo from music. ISMIR 2004.
- Rabiner, L. R. On the Use of Autocorrelation Analysis for Pitch Detection. IEEE Transactions on Acoustics, Speech, and Signal Processing, 25(1), pp. 24-33, 1977.
- Streich, S. (2007). Music Complexity: a multi-faceted description of audio content. PhD thesis, MTG-UPF.
- Turnbull, D., Barrington L., Torres, D., Lanckriet, G. (2008) Semantic Annotation and Retrieval of Music and Sound Effects, IEEE TASL, 16(2).
- Yang, Y. H., Lin, Y. C., Su, Y. F., & Chen, H. H. (2008). A regression approach to music emotion recognition. IEEE Transactions on Audio, Speech, and Language Processing , 16 (2), 448-457.
Relevant Software