Classification of cyber threats based on the analysis of textual descriptions using natural language processing methods

Shalamai, D.S.; Yevseiev, S.P.; Kushnerov, Oleksandr Serhiiovych

dc.contributor.author	Shalamai, D.S.
dc.contributor.author	Yevseiev, S.P.
dc.contributor.author	Kushnerov, Oleksandr Serhiiovych
dc.date.accessioned	2025-06-19T07:41:55Z
dc.date.available	2025-06-19T07:41:55Z
dc.date.issued	2025
dc.description.abstract	This research develops a system for automatically classifying cyber threats from free-form textual descriptions. Because traditional approaches to documenting threats are too slow, the authors built a system that uses natural language processing (NLP) and machine learning methods. An initial dataset of 220 banking-sector threats was expanded to 1078 descriptions using data augmentation techniques, in particular paraphrasing. The data processing pipeline cleans and lemmatizes the text with the Stanza library, converts it to TF-IDF vectors, and performs multi-output classification with RandomForestClassifier. The resulting system categorizes threats across eight parameters. It first checks the similarity of an input description against the descriptions already in the database using cosine similarity, before engaging the machine learning model.	en_US
dc.identifier.citation	Shalamai D. S., Yevseiev S. P., Kushnerov O. S. Classification of cyber threats based on the analysis of textual descriptions using natural language processing methods // Proceedings of the V International Scientific and Practical Conference "Information Security and Information Technologies" (Kharkiv, Odesa, Lutsk, June 9–11, 2025). – 2025. – P. 56–58.	en_US
dc.identifier.uri	https://essuir.sumdu.edu.ua/handle/123456789/99211
dc.language.iso	uk	en_US
dc.publisher	ПП «Новий Світ-2000»	en_US
dc.rights.uri	CC BY 4.0	en_US
dc.subject	cyber threats	en_US
dc.subject	classification	en_US
dc.subject	natural language processing	en_US
dc.subject	machine learning	en_US
dc.subject	data augmentation	en_US
dc.title	Classification of cyber threats based on the analysis of textual descriptions using natural language processing methods	en_US
dc.type	Theses	en_US
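
The abstract describes a two-stage lookup: an input description is first compared with the stored descriptions by cosine similarity over TF-IDF vectors, and the machine learning model is engaged afterwards. The sketch below illustrates one possible reading of that gate using only the Python standard library. It is not the authors' implementation: the regex tokenizer stands in for Stanza lemmatization, the `fallback` callable stands in for the trained RandomForestClassifier, and the `ThreatLookup` class, the 0.8 threshold, the sample descriptions, and the two label fields are all hypothetical (the record does not name the eight parameters or state a threshold).

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into word tokens (a stand-in for Stanza lemmatization)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(token_lists):
    """Smoothed inverse document frequency for every term in the corpus."""
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    return {term: math.log((1 + n_docs) / (1 + df)) + 1.0
            for term, df in doc_freq.items()}


def tfidf(tokens, idf):
    """Sparse TF-IDF vector as a dict; terms unseen during fitting are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    return {term: count * idf[term] for term, count in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


class ThreatLookup:
    """Hypothetical gate: reuse stored labels on a close match, else ask the model."""

    def __init__(self, descriptions, labels, fallback, threshold=0.8):
        token_lists = [tokenize(d) for d in descriptions]
        self.idf = fit_idf(token_lists)
        self.vectors = [tfidf(tokens, self.idf) for tokens in token_lists]
        self.labels = labels
        self.fallback = fallback      # placeholder for the trained classifier
        self.threshold = threshold    # assumed value, not taken from the source

    def classify(self, description):
        query = tfidf(tokenize(description), self.idf)
        scores = [cosine(query, vec) for vec in self.vectors]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= self.threshold:
            return self.labels[best], "database"
        return self.fallback(description), "model"


# Invented toy data, for illustration only.
known = [
    "Phishing email steals online banking credentials",
    "Malware on ATM dispenses cash without authorization",
]
labels = [
    {"channel": "email", "target": "customer"},
    {"channel": "malware", "target": "ATM"},
]
lookup = ThreatLookup(
    known, labels,
    fallback=lambda text: {"channel": "unknown", "target": "unknown"},
)
result, source = lookup.classify("phishing email steals online banking credentials")
```

Here a query that matches a stored description after tokenization is answered from the database, while a query sharing no vocabulary with the stored descriptions falls through to the placeholder model. In a fuller version the fallback would wrap the fitted multi-output classifier, and the threshold would need tuning on held-out descriptions.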

Files

Original bundle

Name: Kushnerov_.Klasifikazij.pdf
Size: 1.55 MB
Format: Adobe Portable Document Format

Collections

Scientific publications (ННІ БіЕМ)