
Collaborating Authors

 sinir


Performance Comparison of Pre-trained Models for Speech-to-Text in Turkish: Whisper-Small and Wav2Vec2-XLS-R-300M

arXiv.org Artificial Intelligence

In this study, the performance of two pre-trained multilingual speech-to-text models, Whisper-Small and Wav2Vec2-XLS-R-300M, was examined for the Turkish language. The Turkish portion of Mozilla Common Voice version 11.0, an open-source dataset, was used in the study. Both multilingual models were fine-tuned on this dataset, which contains only a small amount of data, and their speech-to-text performance was compared. The word error rate (WER) was 0.28 for Wav2Vec2-XLS-R-300M and 0.16 for Whisper-Small. In addition, the performance of the models was examined on test data prepared from call center recordings that were not included in the training and validation datasets.
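WER counts the word-level substitutions (S), deletions (D), and insertions (I) needed to turn a model's transcript into the reference transcript of N words: WER = (S + D + I) / N. A minimal sketch of that calculation, assuming whitespace tokenization and no text normalization (the abstract does not say which WER implementation or normalization the study used; the transcripts below are hypothetical):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, computed with a word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # Before processing reference word i, prev[j] holds the edit distance
    # between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical Turkish transcripts, not taken from Common Voice.
reference = "bugün hava çok güzel"
hypothesis = "bugün hava güzel değil"
print(word_error_rate(reference, hypothesis))
```

Because insertions are counted, WER is not bounded by 1.0: a hypothesis much longer than its reference can score above 1.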


A Comparison of Time-based Models for Multimodal Emotion Recognition

arXiv.org Artificial Intelligence

Emotion recognition has become an important research topic in human-computer interaction. Studies that use audio and video to understand emotions have focused mainly on analyzing facial expressions and have classified six basic emotions. In this study, the performance of different sequence models in multimodal emotion recognition was compared. The audio and images were first processed by multi-layer CNN models, and the outputs of these models were fed into various sequence models: GRU, Transformer, LSTM, and Max Pooling. Accuracy, precision, and F1 score were calculated for all models. The multimodal CREMA-D dataset was used in the experiments. On CREMA-D, the GRU-based architecture achieved the best F1 score (0.640) and the LSTM-based architecture the best precision (0.699), while the Max Pooling-based architecture, which pools over time, achieved the best sensitivity (0.620). Overall, the sequence models performed close to one another.
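Precision, sensitivity (another name for recall), and F1 are computed per emotion class and then averaged. A minimal sketch, assuming macro averaging (the abstract does not state the averaging scheme) and using CREMA-D's six emotion codes (anger, disgust, fear, happy, neutral, sad) for a handful of invented predictions:

```python
from collections import Counter

# CREMA-D's six emotion codes: anger, disgust, fear, happy, neutral, sad.
LABELS = ["ANG", "DIS", "FEA", "HAP", "NEU", "SAD"]


def classification_scores(y_true, y_pred, labels):
    """Accuracy plus macro-averaged precision, recall (sensitivity) and F1."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p was wrong
            fn[t] += 1  # true class t was missed
    precision, recall, f1 = [], [], []
    for c in labels:
        p = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        r = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        precision.append(p)
        recall.append(r)
        f1.append(2 * p * r / (p + r) if p + r else 0.0)
    k = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precision) / k,
        "recall": sum(recall) / k,
        "f1": sum(f1) / k,
    }


# Invented predictions over three of the six classes, for illustration only.
y_true = ["ANG", "ANG", "SAD", "HAP"]
y_pred = ["ANG", "SAD", "SAD", "HAP"]
scores = classification_scores(y_true, y_pred, ["ANG", "SAD", "HAP"])
print(scores)
```

Under macro averaging, F1 is averaged per class, so it generally differs from the harmonic mean of the averaged precision and recall.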


An Estimation of Personnel Food Demand Quantity for Businesses by Using Artificial Neural Networks

arXiv.org Machine Learning

Today, many public and private institutions provide professional food service for the personnel working in their organizations. Planning this service is difficult because these institutions generally employ many people, and personnel are often away from the institution for personal or institutional reasons. As a result, the daily food demand is hard to determine, which causes losses of cost, time, and labor for the institutions. Statistical or heuristic methods are used to eliminate, or at least minimize, these losses. In this study, an artificial intelligence model that estimates the daily food demand quantity for businesses using artificial neural networks (ANNs) was proposed. The data were obtained from the refectory database of a private institution with a capacity of 110 people that serves daily meals at different levels, covering the last two years (2016-2018). The model was created using the MATLAB software package. Its performance was evaluated with the regression (R) values, the Mean Absolute Percentage Error (MAPE), and the Mean Squared Error (MSE). A feed-forward back-propagation network architecture was used to train the ANN model. The best model obtained in the experiments is a multi-layer (8-10-10-1) structure with a training R value of 0.9948, a testing R value of 0.9830, and an error rate of 0.003783. The experimental results show that the model has a low error rate and high performance, demonstrating the positive effect of using artificial neural networks for demand estimation.
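A minimal NumPy sketch of the three evaluation measures and an 8-10-10-1 feed-forward network trained by back-propagation, under these assumptions (none is stated in the abstract): 8-10-10-1 is read as 8 inputs, two hidden layers of 10 neurons, and 1 output; hidden layers use tanh (analogous to MATLAB's `tansig`) with a linear output; training is plain full-batch gradient descent on the MSE, whereas MATLAB's `feedforwardnet` defaults to Levenberg-Marquardt (`trainlm`); R is taken as the Pearson correlation between targets and outputs; and the data are synthetic placeholders, not the refectory records.

```python
import numpy as np


# --- Evaluation measures named in the abstract ---------------------------
def mse(y, y_hat):
    """Mean squared error."""
    return float(np.mean((np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)) ** 2))


def mape(y, y_hat):
    """Mean absolute percentage error in percent; assumes no target is zero."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs((y - y_hat) / y)) * 100)


def regression_r(y, y_hat):
    """Pearson correlation between targets and network outputs (flattened)."""
    return float(np.corrcoef(np.ravel(y), np.ravel(y_hat))[0, 1])


# --- 8-10-10-1 feed-forward network trained by back-propagation ----------
class FeedForwardNet:
    def __init__(self, sizes=(8, 10, 10, 1), seed=0):
        rng = np.random.default_rng(seed)
        # One weight matrix and bias vector per layer transition.
        self.W = [rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))
                  for m, n in zip(sizes[:-1], sizes[1:])]
        self.b = [np.zeros(n) for n in sizes[1:]]

    def forward(self, X):
        """Return every layer's activations: tanh hidden layers, linear output."""
        acts = [X]
        last = len(self.W) - 1
        for i, (W, b) in enumerate(zip(self.W, self.b)):
            z = acts[-1] @ W + b
            acts.append(z if i == last else np.tanh(z))
        return acts

    def predict(self, X):
        return self.forward(X)[-1]

    def train_step(self, X, y, lr=0.05):
        """One full-batch gradient-descent step on the MSE; returns the pre-step loss."""
        acts = self.forward(X)
        delta = 2.0 * (acts[-1] - y) / y.size  # dMSE / d(output)
        for i in reversed(range(len(self.W))):
            grad_W = acts[i].T @ delta
            grad_b = delta.sum(axis=0)
            if i > 0:
                # Propagate through W[i] (before updating it) and the tanh derivative.
                delta = (delta @ self.W[i].T) * (1.0 - acts[i] ** 2)
            self.W[i] -= lr * grad_W
            self.b[i] -= lr * grad_b
        return mse(y, acts[-1])


# --- Usage on synthetic data (NOT the study's refectory data) ------------
rng = np.random.default_rng(42)
X = rng.uniform(-1.0, 1.0, size=(200, 8))        # 8 hypothetical input features
y = (2.0 + 0.5 * X[:, :3].sum(axis=1))[:, None]  # strictly positive toy target

net = FeedForwardNet()
initial_loss = net.train_step(X, y)
for _ in range(3000):
    net.train_step(X, y)

y_hat = net.predict(X)
print(f"MSE={mse(y, y_hat):.4f}  MAPE={mape(y, y_hat):.2f}%  R={regression_r(y, y_hat):.4f}")
```

MAPE divides by each actual value, so it is undefined when any observed daily demand is zero; the toy target is kept strictly positive for that reason.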