Extractive text rank-based NLP news summarization for multiple domains

Mustofa, Md. Wazih Ullah

Project Report, Department of Computer Science and Engineering, Faculty of Science and Information Technology

URI: http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/13500

Date: 2024-01-26

Abstract:

This paper presents an analysis of extractive summarization of news articles, that is, the use of Natural Language Processing (NLP) techniques to build a summary by selecting the most important sentences of the original text. Approximately two thousand articles covering a wide range of topics, including business, entertainment, politics, sports, and technology, were collected from several online platforms, including the well-known newspaper "Prothom Alo". The preprocessing stage removed punctuation and special characters and applied spell correction with TextBlob. The primary focus of the study is the implementation of the TextRank algorithm, which adapts the PageRank algorithm to natural language text. In this approach, a text is represented as a graph whose vertices are its sentences and whose edges are weighted by the cosine similarity between pairs of sentences. I describe how the sentences are vectorized and how a similarity matrix is built by computing the cosine similarity for every sentence pair. The paper then examines how a customized sentence similarity function is used to rank sentences by their relevance and importance. To evaluate the summarization process, I compared the generated summaries with the original texts and computed similarity scores between them. The study aims to show the effectiveness of extractive summarization for processing large volumes of news data and to offer insights into the potential of NLP in media analytics. By comparing the actual summaries with those generated by my method, I draw conclusions about the precision and utility of extractive summarization for diverse news content. This research contributes to the field by demonstrating a practical application of NLP to the efficient processing and summarization of large-scale news data.
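The TextRank pipeline outlined in the abstract (preprocess, vectorize sentences, build a cosine similarity matrix, rank sentences on the resulting graph, pick the top ones, then score the summary against a reference) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the author's implementation. The abstract does not say how sentences were vectorized, so bag-of-words term counts are assumed. Sentences are split with a simple regular expression. TextBlob spell correction is left out. Ranking uses PageRank power iteration with an assumed damping factor of 0.85. The evaluation score is assumed to be the cosine similarity between the bag-of-words vectors of two texts. All function names (`split_sentences`, `preprocess`, `cosine`, `similarity_matrix`, `textrank_scores`, `summarize`, `summary_similarity`) and the sample article are hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Assumption: naive splitter that breaks after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def preprocess(sentence):
    # Lowercase, replace punctuation/special characters with spaces, tokenize on whitespace.
    # The ASCII-only pattern assumes English text; TextBlob spell correction is omitted.
    return re.sub(r"[^a-z0-9\s]", " ", sentence.lower()).split()


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters (0.0 if either is empty).
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def similarity_matrix(sentences):
    # Symmetric matrix of pairwise sentence similarities; no self-loops on the diagonal.
    vecs = [Counter(preprocess(s)) for s in sentences]
    n = len(vecs)
    return [[0.0 if i == j else cosine(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]


def textrank_scores(matrix, d=0.85, tol=1e-6, max_iter=100):
    # Weighted PageRank by power iteration: each sentence passes its score to its
    # neighbours in proportion to edge weight; (1 - d) / n is the uniform teleport term.
    n = len(matrix)
    if n == 0:
        return []
    out_weight = [sum(row) for row in matrix]
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        new = [
            (1 - d) / n
            + d * sum(matrix[j][i] * scores[j] / out_weight[j] for j in range(n) if out_weight[j] > 0)
            for i in range(n)
        ]
        if sum(abs(x - y) for x, y in zip(new, scores)) < tol:
            return new
        scores = new
    return scores


def summarize(text, k=2):
    # Rank sentences, keep the k highest-scoring ones, and restore their original order.
    sentences = split_sentences(text)
    scores = textrank_scores(similarity_matrix(sentences))
    top = sorted(sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k])
    return " ".join(sentences[i] for i in top)


def summary_similarity(summary, reference):
    # Assumed evaluation metric: cosine similarity of the two texts' bag-of-words vectors.
    return cosine(Counter(preprocess(summary)), Counter(preprocess(reference)))


# Usage on a hypothetical four-sentence article.
article = (
    "The central bank raised interest rates on Monday. "
    "Markets fell after the central bank announced the rate decision. "
    "A local team won the football match. "
    "Analysts expect the central bank to raise rates again."
)
summary = summarize(article, k=2)
print(summary)
print(round(summary_similarity(summary, article), 3))
```

Because cosine similarity is symmetric, the sentence graph here is undirected. In the sample, the football sentence shares only the word "the" with the other sentences, so it receives the lowest score and is left out of the two-sentence summary.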