Hammond, Tracy
Myna: Masking-Based Contrastive Learning of Musical Representations
Yonay, Ori, Hammond, Tracy, Yang, Tianbao
We present Myna, a simple yet effective approach for self-supervised musical representation learning. Built on a contrastive learning framework, Myna introduces two key innovations: (1) the use of a Vision Transformer (ViT) on mel-spectrograms as the backbone and (2) a novel data augmentation strategy, token masking, that masks 90 percent of spectrogram tokens. These innovations deliver both effectiveness and efficiency: (i) Token masking enables a significant increase in per-GPU batch size, from 48 or 120 in prior methods (CLMR, MULE) to 4096. (ii) By avoiding traditional augmentations, Myna retains pitch sensitivity, enhancing performance in tasks like key detection. (iii) The use of vertical patches allows the model to better capture critical features for key detection. Our hybrid model, Myna-22M-Hybrid, processes both 16x16 and 128x2 patches, achieving state-of-the-art results. Trained on a single GPU, it outperforms MULE (62M) on average and rivals MERT-95M, which were trained on 16 and 64 GPUs, respectively. Additionally, it surpasses MERT-95M-public, establishing itself as the best-performing model trained on publicly available data. We release our code and models to promote reproducibility and facilitate future research.
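To make the token-masking idea concrete, the following is a minimal NumPy sketch, not the authors' released code. It splits a mel-spectrogram into non-overlapping patches (square 16x16 or vertical 128x2, as named in the abstract) and keeps a random 10 percent of the resulting tokens. The function names `patchify` and `mask_tokens`, the 128x96 input size, and the uniform random sampling are all assumptions made for illustration.

```python
import numpy as np


def patchify(spec, patch_h, patch_w):
    """Split a (n_mels, n_frames) spectrogram into flattened, non-overlapping patches."""
    n_mels, n_frames = spec.shape
    if n_mels % patch_h or n_frames % patch_w:
        raise ValueError("spectrogram dimensions must be divisible by the patch size")
    rows, cols = n_mels // patch_h, n_frames // patch_w
    # (rows, patch_h, cols, patch_w) -> (rows, cols, patch_h, patch_w)
    patches = spec.reshape(rows, patch_h, cols, patch_w).transpose(0, 2, 1, 3)
    return patches.reshape(rows * cols, patch_h * patch_w)


def mask_tokens(tokens, mask_ratio, rng):
    """Drop a random `mask_ratio` fraction of tokens; return the kept tokens and their indices."""
    n_tokens = tokens.shape[0]
    n_keep = max(1, round(n_tokens * (1.0 - mask_ratio)))
    # Sorted indices preserve the original token order for positional embeddings.
    keep_idx = np.sort(rng.choice(n_tokens, size=n_keep, replace=False))
    return tokens[keep_idx], keep_idx


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    spec = rng.standard_normal((128, 96))   # hypothetical input: 128 mel bins x 96 frames

    square = patchify(spec, 16, 16)         # 8 x 6 = 48 square tokens
    vertical = patchify(spec, 128, 2)       # 1 x 48 = 48 full-height vertical tokens

    kept, idx = mask_tokens(square, mask_ratio=0.9, rng=rng)
    print(square.shape, vertical.shape, kept.shape)
```

Because only the kept tokens would be passed to the encoder, the sequence length per example shrinks by roughly a factor of ten, which is consistent with the much larger per-GPU batch size the abstract reports. In a contrastive setup, two independently masked views of the same clip could plausibly serve as a positive pair, but that pairing detail is an assumption here rather than something the abstract states.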
Mamba in Vision: A Comprehensive Survey of Techniques and Applications
Rahman, Md Maklachur, Tutul, Abdullah Aman, Nath, Ankur, Laishram, Lamyanba, Jung, Soon Ki, Hammond, Tracy
Mamba is emerging as a novel approach to overcome the challenges faced by Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) in computer vision. While CNNs excel at extracting local features, they often struggle to capture long-range dependencies without complex architectural modifications. In contrast, ViTs effectively model global relationships but suffer from high computational costs due to the quadratic complexity of their self-attention mechanisms. Mamba addresses these limitations by leveraging Selective Structured State Space Models to effectively capture long-range dependencies with linear computational complexity. This survey analyzes the unique contributions, computational benefits, and applications of Mamba models while also identifying challenges and potential future research directions. We provide a foundational resource for advancing the understanding and growth of Mamba models in computer vision. An overview of this work is available at https://github.com/maklachur/Mamba-in-Computer-Vision.
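The linear-complexity claim can be illustrated with a small NumPy sketch of a selective state space recurrence. This follows one commonly described formulation (input-dependent step size and projections, diagonal state matrix) and is not taken from the survey itself. The function name `selective_scan`, the weight names `W_B`, `W_C`, `W_delta`, and all shapes are assumptions chosen for illustration.

```python
import numpy as np


def selective_scan(x, A, W_B, W_C, W_delta):
    """Run a toy selective SSM over a sequence in a single left-to-right pass.

    x:       (seq_len, d) input sequence
    A:       (d, n) diagonal state matrix entries (negative values give a decaying state)
    W_B:     (d, n) projection producing the input-dependent B_t
    W_C:     (d, n) projection producing the input-dependent C_t
    W_delta: (d, d) projection producing the input-dependent step size
    """
    seq_len, d = x.shape
    n = A.shape[1]
    h = np.zeros((d, n))                           # hidden state, fixed size
    y = np.empty((seq_len, d))
    for t in range(seq_len):                       # one step per token: O(seq_len)
        x_t = x[t]
        delta = np.log1p(np.exp(x_t @ W_delta))    # softplus keeps the step size positive
        B_t = x_t @ W_B                            # (n,) depends on the current input
        C_t = x_t @ W_C                            # (n,) depends on the current input
        A_bar = np.exp(delta[:, None] * A)         # discretized state transition
        B_bar = delta[:, None] * B_t[None, :]      # simplified discretized input matrix
        h = A_bar * h + B_bar * x_t[:, None]       # state update
        y[t] = h @ C_t                             # readout
    return y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq_len, d, n = 6, 3, 4
    x = rng.standard_normal((seq_len, d))
    A = -np.abs(rng.standard_normal((d, n)))
    W_B = rng.standard_normal((d, n))
    W_C = rng.standard_normal((d, n))
    W_delta = rng.standard_normal((d, d))
    print(selective_scan(x, A, W_B, W_C, W_delta).shape)
```

Each token updates a fixed-size state once, so the cost grows linearly with sequence length, in contrast to the pairwise token comparisons of self-attention. Practical implementations use a parallel, hardware-aware scan rather than a Python loop; the loop here is only meant to show the recurrence.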
Mechanix: A Sketch-Based Tutoring and Grading System for Free-Body Diagrams
Valentine, Stephanie (Texas A&M University) | Vides, Francisco (Texas A&M University) | Lucchese, George (Texas A&M University) | Turner, David (Texas A&M University) | Kim, Hong-hoe (Texas A&M University) | Li, Wenzhe (Texas A&M University) | Linsey, Julie (Texas A&M University) | Hammond, Tracy (Texas A&M University)
Introductory engineering courses within large universities often have annual enrollments that can reach up to a thousand students. It is very challenging to achieve differentiated instruction in classrooms with class sizes and student diversity of such great magnitude. Professors can only assess whether students have mastered a concept by using multiple choice questions, while detailed homework assignments, such as planar truss diagrams, are rarely assigned because professors and teaching assistants would be too overburdened with grading to return assignments with valuable feedback in a timely manner. In this paper, we introduce Mechanix, a deployed sketch-based tutoring system for engineering students enrolled in statics courses. Our system not only allows students to enter planar truss and free body diagrams just as they would with pencil and paper, but also checks the student's work against a hand-drawn answer entered by the instructor and returns immediate and detailed feedback to the student. Students are allowed to correct any errors in their work and resubmit until the entire content is correct and thus all of the objectives are learned. Since Mechanix facilitates the grading and feedback processes, instructors are now able to assign free response questions, increasing teachers' knowledge of student comprehension. Furthermore, the iterative correction process allows students to learn during a test, rather than simply displaying memorized information.
Sketch Recognition Algorithms for Comparing Complex and Unpredictable Shapes
Field, Martin (Texas A&M University) | Valentine, Stephanie (Saint Mary's University of Minnesota) | Linsey, Julie (Texas A&M University) | Hammond, Tracy (Texas A&M University)
In an introductory engineering course with an annual enrollment of over 1000 students, a professor has little option but to rely on multiple choice exams for midterms and finals. Furthermore, the teaching assistants are too overloaded to give detailed feedback on submitted homework assignments. We introduce Mechanix, a computer-assisted tutoring system for engineering students. Mechanix uses recognition of freehand sketches to provide instant, detailed, and formative feedback as the student progresses through each homework assignment, quiz, or exam. Free sketch recognition techniques allow students to solve free-body diagram and static truss problems as if they were using a pen and paper. The same recognition algorithms enable professors to add new unique problems simply by sketching out the correct answer. Mechanix is able to ease the burden of grading so that instructors can assign more free response questions, which provide a better measure of student progress than multiple choice questions do.
Hashigo: A Next-Generation Sketch Interactive System for Japanese Kanji
Taele, Paul (Texas A&M University) | Hammond, Tracy (Texas A&M University)
Language students can increase their effectiveness in learning written Japanese by mastering the visual structure and written technique of Japanese kanji. Yet, existing kanji handwriting recognition systems do not assess the written technique sufficiently to discourage students from developing bad learning habits. In this paper, we describe our work on Hashigo, a kanji sketch interactive system which achieves human instructor-level critique and feedback on both the visual structure and written technique of students’ sketched kanji. This type of automated critique and feedback allows students to target and correct specific deficiencies in their sketches that, if left untreated, are detrimental to effective long-term kanji learning.