Abstract:
This study focuses on the classification of British and American English accents
using machine learning, addressing the growing interest in accent-based
applications within speech processing systems. A dataset of 824 speech samples
(414 British and 410 American English recordings) was collected to represent
both accent groups. Key features were extracted from
the audio data, enabling machine learning models to differentiate between the
two accents effectively. Four machine learning models were evaluated: Naive
Bayes, K-Nearest Neighbors (KNN), Random Forest, and Decision Tree. Among
these, Naive Bayes demonstrated the highest accuracy at 84.24%, highlighting
its effectiveness in capturing the distinguishing features of British and American
English accents. KNN ranked second with an accuracy of 78.79%, benefiting
from its proximity-based classification mechanism. Random Forest achieved an
accuracy of 78.18%, leveraging ensemble learning to improve prediction stability.
The Decision Tree model, while functional, demonstrated the lowest performance
at 76.36%, indicating limitations of single-tree approaches in capturing nuanced
differences in speech patterns. The findings underscore the potential of machine
learning in accent classification and show a spread of nearly 8 percentage points
in accuracy across the four algorithms. These results contribute to advancing
research in automatic speech recognition, accent identification, and related
applications in natural language processing. Future work could explore larger
datasets, deep learning approaches, and feature optimization to further enhance
classification accuracy. By effectively distinguishing between British and
American accents, this research lays the foundation for improved speech-based
systems and broader linguistic studies.
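
As an illustration of the kind of evaluation summarized above, the following is a minimal sketch of a Gaussian Naive Bayes classifier scored on a held-out split. It is not the pipeline used in this study: the feature vectors are synthetic Gaussian clusters, and the 13-dimensional feature size, the 80/20 split, the class separation, and all function names are assumptions chosen only for illustration. Only the class counts (414 and 410) are taken from the text.

```python
import math
import random
from collections import defaultdict


def make_synthetic(counts, n_features, shift, seed):
    """Generate one Gaussian cluster per class as stand-in feature vectors.

    Hypothetical data: real accent features are not reproduced here.
    """
    rng = random.Random(seed)
    data = []
    for i, (label, n) in enumerate(counts.items()):
        mean = i * shift  # each class is offset by `shift` on every feature
        for _ in range(n):
            features = [rng.gauss(mean, 1.0) for _ in range(n_features)]
            data.append((features, label))
    rng.shuffle(data)
    return data


def fit_gaussian_nb(train):
    """Estimate the log prior and per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for x, y in train:
        by_class[y].append(x)
    model = {}
    for y, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small constant keeps the variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[y] = (math.log(n / len(train)), means, variances)
    return model


def predict(model, x):
    """Return the class with the highest log posterior under feature independence."""
    def log_posterior(params):
        log_prior, means, variances = params
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances)
        )
    return max(model, key=lambda y: log_posterior(model[y]))


def accuracy(model, test):
    """Fraction of held-out samples whose predicted label matches the true label."""
    correct = sum(predict(model, x) == y for x, y in test)
    return correct / len(test)


# Class counts follow the abstract; everything else is an assumption.
data = make_synthetic({"british": 414, "american": 410},
                      n_features=13, shift=0.6, seed=0)
split = int(0.8 * len(data))  # assumed 80/20 train/test split
train, test = data[:split], data[split:]
model = fit_gaussian_nb(train)
acc = accuracy(model, test)
print(f"held-out accuracy on synthetic data: {acc:.2%}")
```

The printed accuracy reflects only the synthetic clusters and says nothing about real accent data. The same `accuracy` helper could score any other classifier that exposes a comparable prediction function, which is how a side-by-side comparison of several models would typically be organized.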