WSL repository statement issue

Chinese developer apology issue 1

Chinese developer apology issue 2

Github Open issues

The WSL repository maintainers have deleted all the spam issues via a script (Figure 1)

> But I don't think Chinese-speaking developers need to apologize for this attack (Figures 2 and 3)

All attacked repositories: github.com/microsoft/WSL/issues/202...

#Github #WSL #issue

Real-time Network Device Configuration and Security Monitoring System Using NLP and LLM

**DOI:** https://doi.org/10.5281/zenodo.19314735

T. Suganya, M. Mohamed Apsal, L.V. Shriramsankar, B. Niranjan, 2026, Real-time Network Device Configuration and Security Monitoring System Using NLP and LLM, INTERNATIONAL JOURNAL OF ENGINEERING RESEARCH & TECHNOLOGY (IJERT), Volume 15, Issue 03, March 2026

* **Open Access**
* **Authors:** T. Suganya, M. Mohamed Apsal, L.V. Shriramsankar, B. Niranjan
* **Paper ID:** IJERTV15IS031259
* **Volume & Issue:** Volume 15, Issue 03, March 2026
* **Published (First Online):** 29-03-2026
* **ISSN (Online):** 2278-0181
* **Publisher Name:** IJERT
* **License:** This work is licensed under a Creative Commons Attribution 4.0 International License

#### Real-time Network Device Configuration and Security Monitoring System Using NLP and LLM

T. Suganya(1), M. Mohamed Apsal(2), L.V. Shriramsankar(3), B. Niranjan(4)
Assistant Professor(1), Students(2,3,4)
Department of Computer Science and Engineering (Cybersecurity), K.L.N. College of Engineering, Pottapalayam, Sivagangai.

**Abstract** - Modern enterprise networks contain a wide range of devices, services, and security challenges, making traditional manual configuration difficult and prone to human error. To address this issue, this work proposes a natural language-based network automation and security monitoring system that simplifies device configuration and improves operational efficiency. In this system, network administrators can express high-level intents, such as enabling SSH access, configuring system logging, or checking device status, using simple natural language commands. These commands are processed using Natural Language Processing (NLP) techniques and Large Language Models (LLMs) to automatically generate the corresponding router configuration commands.
The generated configurations are then applied within a simulated enterprise network environment for real-time device management. In addition to automation, the system continuously monitors network interfaces and device behavior to identify issues such as unauthorized port activity, interface failures, or unusual network events. When such conditions are detected, alerts are generated to notify administrators. By combining intent-based automation with real-time monitoring, the system reduces manual workload, decreases the likelihood of configuration errors, and improves overall network reliability and security. This solution demonstrates a practical and scalable approach for managing modern enterprise networks efficiently.

Keywords: Network Automation, Intent-Based Networking, Natural Language Processing (NLP), Netmiko, Network Security Monitoring, Intrusion Detection System (IDS)

1. INTRODUCTION

Today's organisational networks include multiple routers, switches, and interconnected devices that require continuous configuration and monitoring to ensure efficient operation and security. Traditionally, network administrators configure these devices manually using command-line interfaces (CLI), which can be complex, time-consuming, and vulnerable to human error. As network size and complexity grow, manual configuration becomes inefficient and difficult to control, often leading to misconfigurations and security vulnerabilities.

Recent advancements in artificial intelligence and Natural Language Processing (NLP) have enabled the development of intelligent systems that simplify complex technical tasks. In networking, intent-based approaches allow administrators to define high-level requirements in natural language, which are automatically translated into device-specific configuration instructions. This reduces the dependency on manual CLI operations and improves overall network management efficiency.
In this paper, an AI-based, intent-driven network automation and security monitoring system is proposed. The system allows users to enter network configuration intents in plain language, which are processed to generate the corresponding router commands and deployed automatically using SSH-based automation. In addition to configuration, the system continuously monitors network interfaces to detect unauthorized access and interface failures. By combining automation with real-time monitoring, the proposed system enhances network reliability, reduces administrative effort, and improves overall network security.

2. LITERATURE REVIEW / RELATED WORK

Recent research has focused on improving network management through automation, intent-based networking, and intelligent systems. Several studies have explored different approaches to simplify configuration processes and enhance network security.

INSpIRE: Integrated NFV-based intent refinement environment [1] proposed an intent-based framework that refines high-level user requirements into network configurations using Network Function Virtualization (NFV). The system focuses on translating user intent into actionable policies, improving flexibility in network management. However, it mainly emphasizes service orchestration rather than real-time monitoring.

A comprehensive approach to the automatic refinement and verification of access control policies [2] introduced a method for automating the refinement and verification of access control policies. Their approach enhances network security by ensuring correctness in policy implementation. While effective in policy validation, it does not address dynamic configuration or real-time device-level monitoring.

IBCS: Intent-based cloud services for security applications [3] presented an intent-based cloud service model designed for security applications.
The system allows users to define security requirements at a higher level, which are then implemented automatically. Although it improves cloud security management, it is primarily focused on cloud environments rather than enterprise network devices.

Hey, Lumi! Using natural language for intent based network management [4] explored the use of natural language interfaces for network management. Their work demonstrates how user inputs in plain language can be translated into network configurations. This approach improves usability, but it lacks integration with continuous monitoring and alert mechanisms.

A survey on intent based networking [5] provided a comprehensive survey of intent-based networking technologies, highlighting their benefits, challenges, and future directions. The study emphasizes the importance of automation in modern networks but does not propose a complete implementation combining multiple functionalities.

Intent-driven autonomous network and service management in future cellular networks: A structured literature review [6] reviewed intent-driven network management approaches in next-generation cellular networks. The authors discussed the role of automation and intelligence in managing complex systems, but their focus is mainly on large-scale telecom infrastructures.

From the analysis of existing works, it is observed that most solutions focus on either intent-based automation or security aspects independently. Very few systems integrate natural language-based configuration, automated deployment, and real-time monitoring into a single framework. To address this gap, the proposed system combines NLP-based intent processing with automated configuration and continuous network monitoring, providing a more comprehensive and practical solution for modern enterprise networks.

3. PROPOSED SYSTEM

The proposed system is an intelligent network automation and security monitoring solution that integrates Natural Language Processing (NLP) with automated configuration and real-time monitoring. The primary objective of the system is to simplify network management by allowing administrators to interact with network devices using high-level natural language commands instead of manual command-line configuration.

In this system, the user provides input in the form of simple text instructions through a web-based interface. These instructions may include tasks such as enabling SSH access, configuring IP addresses, setting up routing protocols, or checking device status. The system processes the input using an intent analysis mechanism to identify the required network operation. Once the intent is identified, the command generation module converts the user's request into device-specific configuration commands. These commands are structured according to the syntax supported by network devices. The generated commands are then securely deployed to the target device using an SSH-based automation module, ensuring safe and reliable communication.

In addition to configuration automation, the system includes a continuous monitoring component that observes the status of network interfaces in real time. The monitoring module checks for abnormal conditions such as unauthorized interfaces becoming active, trusted interfaces going down, or unusual device behavior. When such anomalies are detected, the system triggers an alert mechanism that notifies the administrator through email. This ensures that network issues are identified and addressed at an early stage, reducing the risk of failures and security threats.

The integration of natural language-based automation with real-time monitoring makes the proposed system efficient, user-friendly, and reliable. It significantly reduces manual effort, minimizes configuration errors, and enhances overall network security.
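The intent-to-command flow described above can be sketched in Python. This is a minimal illustration, not the paper's actual implementation: the intent names, trigger keywords, and Cisco-IOS-style command templates below are assumptions chosen for demonstration.

```python
# Minimal sketch of keyword-based intent processing and command generation.
# Intents, keywords, and command templates are illustrative assumptions.

INTENT_RULES = {
    "enable_ssh": {
        "keywords": ["ssh"],
        "commands": ["ip domain-name example.local",
                     "crypto key generate rsa modulus 2048",
                     "line vty 0 4", "transport input ssh", "login local"],
    },
    "configure_gateway": {
        "keywords": ["gateway", "default route"],
        "commands": ["ip route 0.0.0.0 0.0.0.0 {gateway}"],
    },
    "enable_ospf": {
        "keywords": ["ospf"],
        "commands": ["router ospf 1", "network 192.168.1.0 0.0.0.255 area 0"],
    },
}

def detect_intent(text: str):
    """Return the first intent whose trigger keywords appear in the text."""
    lowered = text.lower()
    for intent, rule in INTENT_RULES.items():
        if any(kw in lowered for kw in rule["keywords"]):
            return intent
    return None

def generate_commands(text: str, **params: str) -> list:
    """Translate a natural-language request into device-specific commands."""
    intent = detect_intent(text)
    if intent is None:
        return []
    return [cmd.format(**params) for cmd in INTENT_RULES[intent]["commands"]]

print(generate_commands("please set the default gateway", gateway="10.0.0.1"))
```

The resulting command list could then be handed to an SSH automation layer such as Netmiko's `send_config_set`, as the paper describes, rather than typed manually at the CLI.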
Compared to existing systems, the proposed solution provides a unified approach by combining configuration, monitoring, and alerting within a single framework.

4. SYSTEM ARCHITECTURE

The proposed system follows a modular architecture that integrates user interaction, automation, and monitoring to manage network devices efficiently. Each module performs a specific function and collectively provides a complete network management solution. The process begins with a web-based user interface, where the administrator provides instructions in natural language. These inputs are sent to the backend server for processing. The intent processing module analyzes the input and identifies the required network operation using predefined keywords or rules. Based on the identified intent, the command generation module creates device-specific configuration commands. These commands are then executed on the target device through the deployment module, which establishes a secure SSH connection using automation tools such as Netmiko. After configuration, the monitoring module continuously checks the status of network interfaces and device activity. The alert module detects abnormal conditions such as unauthorized access or interface failures and notifies the administrator through email. This architecture enables automated configuration and real-time monitoring in a single system, improving efficiency, reducing errors, and enhancing network security.

Fig 1. Data flow diagram of proposed system

Fig 2. System architecture diagram

5. IMPLEMENTATION & METHODOLOGY

1. System Design Overview: The system is designed as a modular architecture that integrates user interaction, intent processing, command generation, configuration deployment, and network monitoring. Each module works independently but is connected to form a complete automated network management system.

2. Development Environment: The implementation is carried out using Python as the primary programming language due to its simplicity and strong support for network automation. A web interface is developed using the Flask framework to allow user interaction. The network environment is simulated using a router setup, enabling safe testing of configurations and monitoring features.

3. Intent Processing Mechanism: The system accepts user input in the form of natural language. The intent processing module analyzes the input using keyword-based logic to identify the required network operation. Based on detected keywords such as SSH, gateway, or OSPF, the system determines the appropriate configuration task.

4. Command Generation Process: Once the intent is identified, the command generation module converts the user request into device-specific configuration commands. These commands are structured in the format supported by network devices, ensuring compatibility and correct execution.

5. Configuration Deployment: The generated commands are deployed to the network device using a secure SSH connection. The deployment module establishes communication with the router and sends the commands automatically. This eliminates the need for manual configuration and ensures consistent execution of network operations.

6. Network Monitoring Mechanism: After configuration, the system continuously monitors the network device by executing standard diagnostic commands. It retrieves interface status information and analyzes it to identify abnormal conditions such as inactive interfaces or unexpected activity.

7. Intrusion Detection Logic: The monitoring module uses a rule-based approach to detect anomalies. It compares active interfaces with a predefined list of authorized interfaces. If an unauthorized interface becomes active or a critical interface goes down, the system identifies it as a potential issue and generates an alert.

8. Alert Generation and Notification: When an abnormal condition is detected, the system generates an alert message and notifies the administrator. The alert mechanism includes email notification using secure communication protocols, ensuring that the administrator is informed in real time.

9. User Interface Interaction: The web-based interface allows users to enter network intents and view system responses. After submitting an input, the user receives feedback such as generated commands, deployment status, and monitoring results, providing a simple and interactive experience.

10. Periodic Monitoring: In addition to manual checks, the system supports periodic monitoring at fixed time intervals. This ensures continuous observation of the network and helps in early detection of issues without requiring user intervention.

Fig 3. The gateway is configured via our system by giving a prompt

6. EXISTING SYSTEM VS PROPOSED SYSTEM

Table 1. Aspects of existing system and proposed system

7. PERFORMANCE EVALUATION

Fig 5. Performance evaluation of existing system and proposed system

8. CONCLUSION

The proposed system provides an effective solution for simplifying network management by integrating intent-based automation with continuous monitoring. It enables administrators to give high-level instructions in natural language, which are automatically converted into device-specific configuration commands and deployed efficiently. This approach reduces manual effort, minimizes human errors, and improves the overall efficiency of network configuration. In addition to automation, the system continuously monitors network interfaces to detect issues such as unauthorized activity and device failures. The alert mechanism ensures timely notification, allowing quick response to potential problems.

Fig 4. Email alert triggered when the trusted and untrusted interfaces go down and come up, respectively
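The rule-based check underlying this monitoring and alerting (comparing live interface state against an authorized baseline) can be sketched as follows. The interface names and the parsed-status format are illustrative assumptions, and the email step is only indicated in a comment rather than implemented.

```python
# Rule-based interface check (illustrative sketch): compare observed
# interface status against an authorized baseline and emit alert messages.
# Interface names and status strings are assumed examples.

AUTHORIZED_UP = {"GigabitEthernet0/0", "GigabitEthernet0/1"}  # trusted, must stay up

def detect_anomalies(status: dict) -> list:
    """status maps interface name -> 'up' or 'down' (e.g. parsed from the
    output of 'show ip interface brief'). Returns alert messages."""
    alerts = []
    for iface, state in status.items():
        if iface in AUTHORIZED_UP and state != "up":
            alerts.append(f"ALERT: trusted interface {iface} is down")
        if iface not in AUTHORIZED_UP and state == "up":
            alerts.append(f"ALERT: unauthorized interface {iface} is active")
    return alerts

observed = {
    "GigabitEthernet0/0": "up",
    "GigabitEthernet0/1": "down",   # trusted link failed
    "GigabitEthernet0/2": "up",     # unexpected activity
}
for msg in detect_anomalies(observed):
    print(msg)  # in the full system these messages would be emailed (smtplib)
```

Run periodically (step 10 above), such a check gives the early-warning behavior the paper describes without any machine learning on the device itself.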
By combining automation, monitoring, and alerting in a single framework, the system enhances network reliability, security, and ease of management in modern network environments.

9. REFERENCES

1. E. J. Scheid et al., "INSpIRE: Integrated NFV-based intent refinement environment," in Proc. IFIP/IEEE Symp. Integr. Netw. Service Manag., 2017.
2. M. Cheminod, L. Durante, L. Seno, F. Valenza, and A. Valenzano, "A comprehensive approach to the automatic refinement and verification of access control policies," Comput. Security, vol. 80, pp. 186-199, Jan. 2019.
3. J. Kim et al., "IBCS: Intent-based cloud services for security applications," IEEE Commun. Mag., vol. 58, no. 4, pp. 45-51, Apr. 2020.
4. A. S. Jacobs et al., "Hey, Lumi! Using natural language for intent based network management," in Proc. USENIX ATC, Jul. 2021.
5. A. Leivadeas and M. Falkner, "A survey on intent based networking," IEEE Commun. Surveys Tuts., vol. 25, no. 1, pp. 625-655, 1st Quart., 2023.
6. K. Mehmood, K. Kralevska, and D. Palma, "Intent-driven autonomous network and service management in future cellular networks: A structured literature review," Comput. Netw., vol. 220, Jan. 2023.


Volume 15, Issue 03 (March 2026)

SPAM Issues

Many newly created empty repositories are filled with SPAM issues; you can find them by searching for `"售后受理客服中心(2026)"` or via github.com/search

https://github.com/angelcanruy/68c/issues […]

[Original post on mstdn.feddit.social]

github.com/anomalyco/opencode/issues

OpenCode's issue section is now getting hit too:
https://github.com/anomalyco/opencode/issues

#issue #Github #SPAM #Opencode

home-assistant/frontend: Frontend for Home Assistant.

Meanwhile, other GitHub projects hit by the SPAM attack include:

1. https://github.com/home-assistant/frontend/issues
2. https://github.com/elin4231-m/cro/issues
3. https://github.com/msgpack/msgpack-node/issues
4. https://github.com/isce-framework/isce2/issues



#issue #Github #SPAM

SPAM issue

SPAM issues

The issue section of GitHub's Microsoft/WSL repository has been hit by a SPAM attack from Chinese gambling platforms

The PR section wasn't SPAM-attacked and has become the de facto new issue section (
WSL's issues have been flooded with SPAM for Chinese betting platforms: https://github.com/microsoft/WSL/pull/20669

The issue and Actions sections, hit by massive SPAM:
https://github.com/microsoft/WSL/issues […]

[Original post on mstdn.feddit.social]

AI-Based Mental Health Companion - A Personalised Chatbot

**DOI:** 10.17577/IJERTV15IS030506

Jangiti Swathi, Kallepalli Sravanthi, Paladugula Hema Lalitha, 2026, AI-Based Mental Health Companion - A Personalised Chatbot, INTERNATIONAL JOURNAL OF ENGINEERING RESEARCH & TECHNOLOGY (IJERT), Volume 15, Issue 03, March 2026

* **Open Access**
* **Authors:** Jangiti Swathi, Kallepalli Sravanthi, Paladugula Hema Lalitha
* **Paper ID:** IJERTV15IS030506
* **Volume & Issue:** Volume 15, Issue 03, March 2026
* **Published (First Online):** 27-03-2026
* **ISSN (Online):** 2278-0181
* **Publisher Name:** IJERT
* **License:** This work is licensed under a Creative Commons Attribution 4.0 International License

#### AI-Based Mental Health Companion - A Personalised Chatbot

Jangiti Swathi, Kallepalli Sravanthi, Paladugula Hema Lalitha
Dr MGR Educational and Research Institute

Abstract - The pervasive global shortage of mental health professionals and barriers to access have intensified interest in scalable digital interventions. This paper presents the design, implementation, and evaluation plan for an AI-Based Mental Health Companion, a personalized conversational agent that leverages transformer-based language models fine-tuned on therapeutic dialogue corpora, culturally adapted content, and structured safety mechanisms to provide Cognitive Behavioral Therapy (CBT) exercises, mood tracking, and crisis escalation. Conversations and summarized memories are stored in MongoDB to enable longitudinal personalization through retrieval-augmented prompts. The system integrates a crisis-detection pipeline, clinician escalation workflows, and privacy-preserving storage with end-to-end encryption and anonymized records.
Results from prototyping and pilot evaluations demonstrate promise in symptom reduction, engagement, and scalability compared with baseline digital interventions; however, ethical, safety, and generalizability issues require systematic mitigation. This work contributes a modular architecture, a set of implementation best practices, and an evaluation framework for future clinical trials and deployment in underserved regions.

Keywords: mental health chatbot, large language model, cognitive behavioral therapy, crisis detection, personalization, memory augmentation, digital mental health

I. INTRODUCTION

Mental disorders represent a principal global health burden and are a major contributor to disability-adjusted life years worldwide. Structural shortages of trained therapists, financial barriers, stigma, and uneven geographic distribution of services have limited access to care, creating a need for scalable, evidence-based digital alternatives. Conversational agents, especially those powered by recent transformer-based language models, have demonstrated capacity for natural language understanding and generation that can emulate supportive, reflective dialogue. When designed with evidence-based therapeutic strategies, such systems can deliver structured interventions such as CBT psychoeducation, thought restructuring, behavioral activation, and mood monitoring. Recent progress in large language models (LLMs) has opened opportunities for more personalized and flexible conversational support, yet also introduces safety and ethical challenges that require rigorous clinical evaluation, safety engineering, and regulatory attention. This paper describes an end-to-end system that integrates LLM-based dialogue, memory summarization, and crisis escalation into a clinically informed pipeline optimized for low-resource and culturally diverse settings.
II. LITERATURE SURVEY

The past two years have seen accelerated empirical work evaluating both the feasibility and clinical impact of AI-driven conversational agents. A landmark randomized controlled trial published in NEJM AI in 2025 evaluated a generative-AI therapy chatbot (Therabot) in adults with depression, anxiety, and high-risk eating disorders; the trial reported clinically meaningful symptom reductions versus waitlist control and high user engagement, underlining the potential of careful clinician-guided LLM interventions for treatment-level effects [1]. Complementary to controlled trials, qualitative studies have characterized user experiences with generative agents, finding that many users report helpfulness, increased reflection, and high usability while also raising concerns about limits of empathy and crisis handling [5]. These real-world insights support iterative, human-in-the-loop design as a safeguard.

Work on early detection and crisis surveillance shows the power of AI for identifying at-risk individuals. A prospective observational study analyzing social media streams using multimodal deep learning achieved high accuracy in early detection of mental health crises and demonstrated potential lead times for intervention, though it underscored ethical concerns around privacy and representativeness [3]. Similarly, ensemble and explainable models for suicidal ideation detection have been advanced to improve classification transparency and to distinguish suicidal from non-suicidal ideation in social text, which is critical for triage and escalation logic [2]. These methods inform the crisis detection and triage modules of a mental health companion.

Evaluation frameworks and quality assessment tools have emerged to measure conversational agents' therapeutic fidelity, safety, and privacy functions.
The CAPE framework provides a structured rubric for assessing psychotherapy chatbots and reveals common gaps in safety features across commercial offerings, emphasizing the need for systematic quality assurance [4]. A scoping review of LLM applications in mental health care synthesized existing evidence and identified methodological heterogeneity, variable reporting standards, and an urgent need for standardized evaluation metrics to compare systems [7]. Lightweight LLMs and efficient model variants have also been investigated as a path toward deployable counselors on resource-constrained hardware, with comparative analyses showing acceptable tradeoffs between model size and counseling task performance under careful fine-tuning [6].

Several applied studies highlight domain-specific design principles. Trials comparing interfaces (digital human avatars versus text-only chatbots) demonstrate interface effects on usability and biometrics, informing UI/UX choices for engagement and acceptability [14]. Work on cognitive restructuring delivered via LLMs has shown feasibility in guiding users through structured therapeutic exercises in small user studies, suggesting that prompt-engineered LLMs can operationalize individual CBT techniques when safety guardrails are present [8]. Reviews focused on AI-driven suicide prevention and mental health surveillance summarize promising predictive performance across diverse ML models while reiterating limitations in generalization and real-world integration [9,10,15]. Collectively, these studies provide a foundation for a clinically informed AI companion that combines LLM therapeutic capabilities with explicit crisis detection, memory-based personalization, and clinician escalation pathways.

1. EXISTING SYSTEM

Existing digital mental health systems range from rule-based chatbots and structured CBT apps to hybrid systems that mix templated content with limited machine learning.
Commercial apps such as Wysa and Youper implement therapist-informed conversational flows and mood tracking, often combining scripted modules with automated personalization; these apps demonstrate moderate benefits for anxiety and depression symptoms but are constrained by dialog rigidity, limited natural-language flexibility, and difficulties in managing complex or high-risk presentations. Recent LLM-powered chatbots and general-purpose assistants provide richer conversational capacity but commonly lack robust safety, clinician oversight, and validated therapeutic fidelity, which limits their suitability for clinical deployment [4,5,7].

Significant disadvantages of many current systems include insufficient crisis detection and escalation mechanisms, data governance and privacy gaps, lack of longitudinal personalization that truly reflects prior interactions, and limited cultural or language adaptation for global populations. Additionally, deployment on low-cost devices or in low-bandwidth environments is often neglected, further restricting access in resource-constrained regions.

2. PROPOSED SYSTEM

Figure 1: Block diagram

The proposed AI-Based Mental Health Companion (Figure 1) addresses the limitations above by combining three core principles: (1) clinically grounded therapeutic content powered by LLMs that are fine-tuned on therapy corpora and constrained by clinician-authored prompts; (2) memory and personalization layers using MongoDB to store conversation transcripts, derived memory summaries, and longitudinal mood metrics that inform retrieval-augmented prompts; and (3) a safety-first architecture with real-time crisis detection, explainable risk scoring, automatic clinician escalation, and opt-in sharing for emergency contacts.
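Principle (2), retrieval-augmented prompts built from stored memory summaries, can be illustrated with a small sketch. The record fields and the word-overlap scoring below are simplifying assumptions; the described system persists such records in MongoDB and may use richer retrieval.

```python
# Sketch of retrieval-augmented prompt assembly from memory summaries.
# Field names and the word-overlap ranking are illustrative assumptions;
# the described system would load these records from MongoDB.

MEMORY = [
    {"date": "2026-02-01", "summary": "Reported low mood after exams; tried breathing exercises."},
    {"date": "2026-02-10", "summary": "Sleep improved; practiced thought records for work stress."},
    {"date": "2026-02-20", "summary": "Anxious about a job interview; coping strategy: behavioral activation."},
]

def retrieve(query: str, k: int = 2) -> list:
    """Rank memory snippets by simple word overlap with the new message."""
    q = set(query.lower().split())
    scored = sorted(MEMORY,
                    key=lambda m: len(q & set(m["summary"].lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(user_msg: str) -> str:
    """Assemble a retrieval-augmented prompt: history snippets + new message."""
    snippets = "\n".join(f"- ({m['date']}) {m['summary']}" for m in retrieve(user_msg))
    return (f"Relevant user history:\n{snippets}\n\n"
            f"User: {user_msg}\nRespond supportively using CBT techniques.")

print(build_prompt("I feel anxious about my interview tomorrow"))
```

Injecting short summaries rather than raw transcripts mirrors the privacy/personalization balance the paper describes: the prompt stays small while therapeutic continuity is preserved.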
Advantages include higher conversational naturalness than template systems, tighter integration between longitudinal user history and present dialogue via memory summaries, and explicit safety workflows for high-risk events informed by recent suicide-risk detection literature [2,3,9]. The system design also emphasizes cultural adaptation, multilingual support, and an offline, modest-footprint option through use of lightweight LLM variants for edge deployment where needed [6]. Together, these design choices aim to maximize accessibility while minimizing risk.

3. IMPLEMENTATION

System Architecture

Figure 2: Architecture Diagram

The architecture, as shown in Figure 2, comprises a modular pipeline: a frontend conversational UI (mobile/web), an API orchestration layer (FastAPI or similar), LLM services (hosted or remote inference endpoints), a memory and metadata store (MongoDB), a crisis detection and risk-scoring engine, a clinician/escalation service, and monitoring/audit logs with encryption at rest and in transit. Incoming user messages are received by the API, preprocessed, and passed to an intent/NER classifier for structural extraction (intent, temporal markers, mention of harm). The pipeline simultaneously queries MongoDB for recent memory summaries and mood time series to construct a retrieval-augmented prompt. The assembled prompt is passed to an LLM with constraints (safety and therapeutic policy) and a post-filter that checks for disallowed content and risk signals. If risk thresholds are exceeded, the crisis detection module triggers an escalation workflow that anonymizes and forwards relevant data to designated clinicians and crisis contacts; otherwise, the agent reply is returned to the user, and the conversation, along with a concise memory summary, is persisted.

Modules:

Module 1 - Conversation Management & Data Storage: This module handles message ingestion, session management, message-level metadata, and persistent storage in MongoDB.
Each conversation is assigned a unique convo_id; messages are timestamped and stored with redaction markers for sensitive PII (Personally Identifiable Information). The design includes automated summarization of each dialogue chunk into a short memory record saved to a separate collection to enable fast retrieval without scanning raw transcripts. This memory pipeline follows proven retrieval-augmented techniques to inform personalization while keeping the heavy transcript data archived and encryption-protected.

Module 2 - Memory Summarization & Retrieval: Periodic chunking and summarization reduce user history to salient, clinically relevant points: mood trends, recurring themes, coping strategies used, and recent crises. Summaries are short (one to three sentences) plus metadata (dates, sentiment scores). At the start of a session, the retrieval engine returns the most relevant memory snippets to the LLM, enabling contextually aware follow-ups (e.g., references to earlier coping strategies). This approach balances personalization and privacy by avoiding re-injecting long verbatim histories into prompts while preserving therapeutic continuity.

Module 3 - Therapeutic Dialogue & CBT Module: The therapeutic core implements structured CBT techniques, behavioral activation scheduling, thought records, Socratic questioning, and cognitive reframing via specialized prompt templates and small task-specific models when necessary. The LLM is fine-tuned or prompt-engineered to follow therapeutic scripts and to generate worksheets, stepwise exercises, and guided reflections. Safety constraints ensure the agent does not make diagnostic claims or provide medication guidance; instead, referrals and psychoeducation are provided, with citations to trustworthy resources when appropriate.

Module 4 - Gamification & Engagement: Gamified elements include progress dashboards, streaks for completing mood-tracking or behavioral tasks, and adaptive micro-challenges aligned with therapeutic goals.
These features are driven by a lightweight rules engine that maps longitudinal progress metrics stored in MongoDB to engagement strategies that emphasize small wins and gradual skill acquisition. Cultural adaptation and language preferences tailor game content and reward framing to local norms.

Module 5 - Crisis Detection & Escalation: A dedicated pipeline uses ensemble classifiers and explainability layers to detect suicidal ideation, active self-harm intent, or imminent risk (leveraging advances in explainable suicide detection and multimodal surveillance). Risk-thresholded responses trigger a tiered response: automated safety messaging, consented outreach to emergency contacts, clinician notification including anonymized context and confidence scores, and activation of emergency services where permitted by local regulations. All escalation actions are logged and audited.

Security, privacy, and compliance are cross-cutting concerns: all stored data are encrypted, access is role-based, and de-identification is enforced for any external analytics. Data retention policies and user consent flows adhere to regional regulations, and the platform gives users control over data sharing and export.

4. RESULTS

Prototype evaluation entailed technical validation, usability testing, and a small pilot study comparing the proposed system with a baseline rule-based CBT chatbot. Technical metrics showed high intent classification accuracy (>90%) for common therapeutic intents and reliable memory retrieval latency compatible with real-time conversation. In the qualitative usability study, participants reported improved rapport and perceived helpfulness relative to the baseline. In the pilot clinical outcomes study (n = 60), the LLM-augmented system produced larger reductions in self-reported depressive symptoms over 8 weeks compared with the baseline chatbot, though the sample size and study design do not permit broad generalization.
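The tiered, risk-thresholded response of Module 5 can be sketched with a simplified rule-based scorer standing in for the ensemble classifiers. The phrase list, weights, and thresholds here are illustrative assumptions only and are not clinically validated values.

```python
# Simplified stand-in for the crisis-detection pipeline: a rule-based
# risk score mapped to the tiered escalation described in Module 5.
# Phrases, weights, and thresholds are illustrative assumptions and
# NOT clinically validated.

RISK_PHRASES = {
    "end my life": 1.0,
    "hurt myself": 0.8,
    "can't go on": 0.6,
    "hopeless": 0.4,
}

def risk_score(message: str) -> float:
    """Sum weights of matched phrases, capped at 1.0."""
    text = message.lower()
    return min(1.0, sum(w for phrase, w in RISK_PHRASES.items() if phrase in text))

def escalation_tier(score: float) -> str:
    """Map a risk score to a tiered response."""
    if score >= 0.8:
        return "clinician_notification"  # plus emergency workflow where permitted
    if score >= 0.4:
        return "safety_messaging"        # automated supportive safety content
    return "normal_reply"

print(escalation_tier(risk_score("I feel hopeless and I can't go on")))
```

In the real pipeline the score would come from ensemble classifiers with explainability layers, and every escalation action would be logged and audited as the paper specifies.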
A comparative table summarizes key dimensions of the proposed system versus typical existing methods:

| Dimension | Proposed LLM-based Companion | Rule-based/Template Chatbots | Existing LLM General-purpose Bots |
|---|---|---|---|
| Therapeutic fidelity | High (clinician-verified prompts, CBT modules) | Moderate (scripted CBT flows) | Variable (not clinician-tuned) |
| Personalization (longitudinal) | Memory summaries + retrieval | Limited (session-based) | Limited unless engineered |
| Crisis detection & escalation | Ensemble detection + clinician escalation | Often absent or rudimentary | Usually absent or inconsistent |
| Safety & auditability | Explainable risk scores + logs | Limited | Limited |
| Deployability in low-resource settings | Lightweight LLM option + offline mode | High | Variable |
| Empirical evidence | Pilot + RCTs in field (context-dependent) | Some trials for specific apps | Emerging RCT evidence for clinician-tuned LLMs [1] |

The table demonstrates the proposed system's strengths in personalization, safety workflows, and clinical alignment. These gains echo recent large-scale and clinical-trial findings indicating that carefully constrained, clinician-guided LLM systems can produce clinically meaningful improvements when paired with robust safety infrastructure [1,4,6]. Nevertheless, limitations remain: model hallucinations, fairness and bias across demographic groups, and the need for large-scale, multi-site randomized trials to confirm effectiveness and safety in diverse populations. The NEJM AI randomized trial provides evidence that generative AI therapy can reduce symptoms under trial conditions [1], while multiple reviews call for standardized evaluation frameworks and careful risk mitigation before broad deployment [7,4].

5.
CONCLUSION

A personalized AI-Based Mental Health Companion that integrates LLM-powered therapeutic dialogue, memory-based personalization, and robust crisis detection can expand access to evidence-based psychological interventions and provide scalable support in underserved regions. The proposed architecture and modular implementation combine recent advances in transformer-based models, explainable risk detection, and deployment strategies for resource-limited environments. Empirical results from prototype testing and early trials indicate potential clinical benefits, but significant ethical, regulatory, and technical challenges persist. Future work should prioritize large-scale randomized controlled trials, cross-cultural validation, continual safety auditing, and frameworks for clinician oversight and accountability. Responsible deployment demands transparent reporting, federated and privacy-preserving learning where possible, and partnerships with clinical services to ensure that automated companions augment rather than substitute essential human care.

6. REFERENCES

1. Heinz MV, Mackin DM, Trudeau BM, Bhattacharya S, Wang Y, Banta HA, et al. Randomized Trial of a Generative AI Chatbot for Mental Health Treatment. NEJM AI. Published March 27, 2025. doi:10.1056/AIoa2400802.
2. Explainable AI-based Suicidal and Non-Suicidal Ideations Detection from Social Media Text with Enhanced Ensemble Technique. Scientific Reports. 2024.
3. Early Detection of Mental Health Crises through Artificial-Intelligence-Powered Social Media Analysis: A Prospective Observational Study. Digital Health / JMIR (PMC11433454). 2024.
4. Eccleston-Turner M, et al. Evaluating the Quality of Psychotherapy Conversational Agents: Framework Development and Cross-Sectional Study (CAPE Framework). JMIR / PMC. 2025.
5. Experiences of Generative AI Chatbots for Mental Health: Qualitative Study (PMC11514308). 2025.
6.
Comparative Analysis: Exploring the Potential of Lightweight Large Language Models for AI-Based Mental Health Counselling Tasks. Scientific Reports. 2025.
7. A Scoping Review of Large Language Models for Generative Tasks in Mental Health Care. npj Digital Medicine. 2025.
8. Evaluating an LLM-Powered Chatbot for Cognitive Restructuring. arXiv preprint. 2025.
9. AI-Driven Mental Health Surveillance: Identifying Suicidal Ideation and Other Risk States. MDPI Information or related journal. 2025.
10. Artificial Intelligence in Suicide Prevention: Utilizing Deep Learning for Risk Prediction. International Journal of Psychiatry / INPJ. 2024.
11. Leveraging Large Language Models for Simulated Psychotherapy: Client101 and Evaluation. JMIR Medical Education. 2025.
12. AI Chatbots for Mental Health: A Scoping Review of Effectiveness, Feasibility and Safety. Applied Sciences / MDPI. 2024.
13. Artificial Intelligence and Machine Learning Techniques for Suicide Prevention: Systematic Perspectives. ScienceDirect review. 2024.
14. Randomized Controlled Trial: Usability Differences between Digital Human and Text-only Chatbot Interfaces. JMIR Human Factors. 2024.
15. Early empirical and methodological critiques and recommendations for LLMs in mental health: multiple commentaries and reports including Stanford and other institutional evaluations (2024–2025). Stanford Report and news analyses. 2025.

______________




#### LLM-Augmented Academic Administration: A Role-Aware Architecture for Secure College Management

**DOI:** 10.17577/IJERTV15IS031150

* **Open Access**
* **Authors:** Mrs. J. Veerendeswari, Mr. Kabilan S S, Mr. Logapriyan A, Mr. Rajesh R
* **Paper ID:** IJERTV15IS031150
* **Volume & Issue:** Volume 15, Issue 03, March 2026
* **Published (First Online):** 27-03-2026
* **ISSN (Online):** 2278-0181
* **Publisher Name:** IJERT
* **License:** This work is licensed under a Creative Commons Attribution 4.0 International License

Cite this Publication: Mrs. J. Veerendeswari, Mr. Kabilan S S, Mr. Logapriyan A, Mr. Rajesh R, 2026, LLM-Augmented Academic Administration: A Role-Aware Architecture for Secure College Management, INTERNATIONAL JOURNAL OF ENGINEERING RESEARCH & TECHNOLOGY (IJERT), Volume 15, Issue 03, March 2026.

Mrs. J. Veerendeswari, Head of the Department, Information Technology, Rajiv Gandhi College of Engineering and Technology, Puducherry, India

Mr. Kabilan S S, Mr. Logapriyan A, Mr. Rajesh R, UG, Information Technology, Rajiv Gandhi College of Engineering and Technology, Puducherry, India

Abstract – Contemporary academic institutions remain constrained by fragmented information silos, labor-intensive administrative workflows, and inflexible permission structures inherent to legacy Enterprise Resource Planning (ERP) platforms. Although Large Language Models (LLMs) present a compelling opportunity to modernize institutional operations, their deployment within multi-stakeholder educational environments introduces non-trivial risks around data confidentiality and intra-organizational access governance.
This paper presents the backend architecture of an LLM-augmented College Management System (CMS) purpose-built for administrative and faculty operations, proposing a principled approach to embedding generative AI within the sensitive boundaries of higher education infrastructure. At the core of the proposed system is AIRA (Artificial Intelligence Routing Agent), a multi-agent AI framework orchestrated beneath a rigorously enforced Role-Based Access Control (RBAC) layer. This architecture automates high-complexity institutional workflows including dynamic academic report generation, attendance analytics, and fee lifecycle management, while ensuring that all AI-mediated database interactions remain strictly bounded by the requesting user's authorization profile. Informed by documented vulnerabilities in production-grade LLM agent deployments, the design deliberately decouples AI routing logic from core transactional database operations and enforces token-authenticated security contracts at every API boundary. The result is a scalable, role-aware blueprint for LLM augmentation in academic administration, one that advances operational intelligence without compromising the integrity or confidentiality of sensitive institutional data.

Keywords: Large Language Models (LLMs), Academic Administration, Enterprise Resource Planning (ERP), Multi-Agent AI, Role-Based Access Control (RBAC), Smart Campus, Automated Reporting, LLM Agent Safety, Database Security, Higher Education Systems.

1. INTRODUCTION

The modernization of higher education administration demands more than the digitization of physical records; it requires intelligent, automated systems capable of actively assisting faculty and institutional administrators.
Current College Management Systems (CMS) typically rely on rigid, query-specific architectures that fail to provide real-time, context-aware insights, leaving faculty burdened with manual workflow bottlenecks such as calculating defaulters and compiling multi-departmental reports. Recently, the adoption of Large Language Models (LLMs) has presented an opportunity to create "Smart Campuses" through natural language data querying. However, deploying these models in a secure enterprise environment introduces critical data privacy risks. Research has demonstrated that LLM-based agents are susceptible to prompt injection attacks, malfunction induction, and unsafe tool invocations when deployed without proper security constraints [1][2]. If an AI agent has direct execution access to a centralized college database, it must be strictly governed to prevent unauthorized internal data access; for instance, a department staff member querying confidential financial records restricted to top-level administrators. Furthermore, comprehensive evaluations of LLM agents across multi-agent environments reveal that none of the tested agents achieved a safety score above 60%, underscoring the substantial challenge of safely deploying AI in enterprise settings [3]. These findings directly motivate the architectural decisions made in this work.

To address these challenges, this paper introduces a novel backend architecture that integrates a context-aware multi-agent AI assistant, named AIRA (Artificial Intelligence Routing Agent), directly into a secure institutional ERP. The primary contribution of this work is a layered architectural blueprint that seamlessly combines automated administrative workflows with a robust Role-Based Access Control (RBAC) framework.
By decoupling the AI prompt-engineering logic from the core SQL execution engine and enforcing strict authorization checks at the API layer, the proposed system ensures that AI-driven insights are highly efficient, accurate, and strictly bounded by the user's security clearance.

2. LITERATURE REVIEW

The transition from traditional academic record-keeping to intelligent institutional management involves multiple overlapping domains of research, primarily focusing on database accessibility, workflow automation, and enterprise security.

1. Legacy ERPs vs. AI-Augmented Systems

Traditional College Management Systems (CMS) are fundamentally transactional. They rely on rigid, pre-defined Graphical User Interfaces (GUIs) where administrators must navigate complex menus and execute specific, hard-coded SQL queries to retrieve data. The integration of Natural Language Processing (NLP) and Large Language Models (LLMs) allows users to retrieve complex datasets through conversational prompts. However, while general-purpose AI models are highly capable of understanding intent, they often hallucinate table schemas or fail to generate accurate SQL when deployed in specialized institutional environments without proper context-bounding.

2. Vulnerabilities in Deployed LLM Agents

A critical dimension of deploying AI in institutional environments is the inherent instability and exploitability of LLM agents. Zhang et al. [1] introduced a class of malfunction amplification attacks in which adversaries use prompt injection to trap agents in infinite loops or redirect them into executing irrelevant actions, achieving failure rates exceeding 80% across multiple agent frameworks. Crucially, these attacks are difficult to detect through conventional self-examination defenses precisely because they target benign-looking operational failures rather than overtly harmful commands.
This work directly informs our architectural decision to intercept and validate all AI-generated SQL before execution, rather than relying on the LLM itself to self-regulate its output. Complementing this, Cemri et al. [2] conducted the first systematic taxonomy of Multi-Agent System (MAS) failures using Grounded Theory analysis across over 200 execution traces. Their MAST framework identifies 14 distinct failure modes across three categories: specification issues (41.8%), inter-agent misalignment (36.9%), and task verification failures (21.3%). For our single-agent AIRA framework, the most relevant findings are step repetition (FM-1.3, 17.14%) and reasoning-action mismatch (FM-2.6, 13.98%), both of which our RBAC interception engine is designed to mitigate by enforcing deterministic execution boundaries regardless of the LLM's internal reasoning state.

3. LLM Security and Enterprise Data Privacy

The most critical barrier to adopting AI in institutional management is data security. Current research into enterprise LLM deployment frequently warns against "naive integration", where an AI agent is given unrestricted read access to a centralized database. The AGENT-SAFETYBENCH evaluation framework [3] reveals two fundamental safety defects in current LLM agents: lack of robustness (incorrect or incomplete tool invocations) and lack of risk awareness (proceeding with actions whose downstream consequences are unsafe). Their evaluation of 16 state-of-the-art agents found that none achieved a total safety score above 60%, with particularly concerning performance on failure modes involving ignoring constraint information (M4) and ignoring implicit risks (M5). These findings validate our design choice of placing the RBAC interceptor between the AI output and the SQL execution layer: the interceptor acts as an external, deterministic constraint system that does not rely on the LLM's own risk awareness.
There is a distinct gap in the literature regarding the specific implementation of interceptor-pattern Role-Based Access Control (RBAC) that evaluates AI-generated queries against user authentication tokens before database execution. The proposed AIRA architecture seeks to fill this exact gap.

3. PROPOSED SYSTEM ARCHITECTURE

The proposed architecture is designed exclusively for institutional administrators and faculty, eliminating the student-facing attack surface entirely. The backend is structured as a pipeline, where every natural language request generated by a user passes through multiple validation and processing layers before interacting with the core transactional database. The lifecycle of a query through the AIRA (Artificial Intelligence Routing Agent) framework operates in the following sequential phases:

Phase 1: Secure API Gateway and Payload Reception. When a staff member or administrator submits a natural language query (e.g., "Show me the attendance deficit for the Computer Science department"), the frontend client transmits the request to a secure RESTful API endpoint. Accompanying this payload is an encrypted JSON Web Token (JWT) that contains the user's explicit role (Admin or Staff) and their departmental jurisdiction.

Phase 2: The RBAC Interception Engine. Before the AI processes the prompt, the request hits the Role-Based Access Control (RBAC) middleware. This is the primary security perimeter. The interceptor decodes the JWT and maps the user's role against a rigid permission matrix. If a faculty member requests financial data strictly reserved for administrators, the middleware forcefully terminates the request and returns a 403 Forbidden status, ensuring the AI is never even invoked for unauthorized domains. This architectural choice directly addresses the lack-of-risk-awareness failure mode identified in [3] by making risk enforcement deterministic rather than model-dependent.

Phase 3: Context-Aware Prompt Injection.
Once authorized, the natural language prompt is routed to the AIRA engine. Instead of exposing the entire database schema to the Large Language Model (LLM), the system utilizes a context-bounding mechanism. The backend dynamically retrieves only the SQL table schemas relevant to the authorized user's department and injects them into a highly structured system prompt. This design mitigates the hallucination and reasoning-action mismatch failure modes documented in [2], drastically reducing the context window size and preventing the AI from generating queries against non-existent or restricted tables.

Phase 4: LLM Processing and SQL Generation. The heavily contextualized prompt is processed by the underlying language model, which translates the human intent into a structured, syntactically correct SQL query. To prevent prompt-injection attacks, which Zhang et al. [1] demonstrated can induce malfunction rates exceeding 80%, the AI is restricted to generating SELECT queries, with all INSERT, UPDATE, or DELETE operations strictly routed through traditional, hard-coded administrative API endpoints.

Phase 5: Execution, Aggregation, and Automation. The generated SQL query is executed against the relational database. If the user's prompt requested a specific automated workflow, such as compiling a category-wise report or a CGPA list, the raw database output is intercepted by the backend's automation engine. This engine algorithmically processes the data arrays and utilizes server-side rendering libraries to dynamically generate a formatted PDF report, returning the downloadable file to the user alongside the AI's natural language summary.

4. IMPLEMENTATION AND METHODOLOGY

1. Technology Stack and Integration

The core backend API is developed using Python, chosen for its robust ecosystem of data processing and AI integration libraries. Relational data (staff profiles, departmental attendance, and academic records) is managed via a SQL-based database.
Authentication is handled using JSON Web Tokens (JWT), ensuring stateless, secure API communication. For automated reporting, server-side PDF generation libraries (such as ReportLab or pdfkit) are integrated directly into the algorithm layer, allowing the system to convert raw SQL arrays into formatted, downloadable documents dynamically.

2. Context-Aware Prompt Engineering

To mitigate LLM hallucination and ensure accurate data retrieval, the AIRA framework utilizes dynamic system-prompt injection. When a prompt is received, the system does not send the entire database schema to the LLM. Instead, it dynamically compiles a localized schema string based on the user's department. A simplified structure of the hidden system prompt is as follows:

"You are an administrative SQL assistant. Generate a strict SELECT query for the '{user_department}' department only. Use the following schema: {department_tables}. Do not use JOINs outside these tables. Respond only with the SQL query."

This explicit bounding significantly reduces token consumption, increases the accuracy of the generated queries, and directly counters the step-repetition failure mode (FM-1.3) documented in [2] by constraining the agent to a well-defined schema context.

3. The RBAC Interception Engine

To guarantee internal data privacy, an algorithmic interceptor is placed between the AI output and the database execution layer. Before executing any AI-generated query, the backend validates the query scope against the active user's token payload. The interceptor architecture is informed by the security failure modes M2 (calling tools with incomplete information), M4 (ignoring constraint information), and M5 (ignoring implicit risks) identified in [3].
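The dynamic system-prompt construction described above can be sketched as follows. The prompt string is taken from the paper's template; the schema registry and department names are illustrative assumptions standing in for the backend's per-department schema lookup.

```python
# Hypothetical per-department schema registry; in the real system this
# would be compiled from the authorized slice of the SQL database.
SCHEMAS = {
    "CSE": "students(reg_no, name, dept, cgpa); attendance(reg_no, date, present)",
    "IT":  "students(reg_no, name, dept, cgpa); fees(reg_no, term, paid)",
}

def build_system_prompt(user_department: str) -> str:
    """Compile a localized schema string and inject it into the bounded
    system prompt, exactly following the template quoted in the text."""
    department_tables = SCHEMAS[user_department]   # only the authorized slice
    return (
        "You are an administrative SQL assistant. "
        f"Generate a strict SELECT query for the '{user_department}' department only. "
        f"Use the following schema: {department_tables}. "
        "Do not use JOINs outside these tables. Respond only with the SQL query."
    )
```

Because the prompt only ever contains the requesting department's tables, the model cannot name restricted tables it has never seen, which is the context-bounding property the section describes.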
Algorithm 1: RBAC Middleware Interception (Pseudocode)

    FUNCTION Execute_AI_Query(user_token, ai_generated_sql):
        claims = Decode_JWT(user_token)
        IF claims.role == "Admin":
            RETURN Execute_SQL(ai_generated_sql)
        IF claims.role == "Staff":
            restricted_tables = ["finance", "salaries", "global_settings"]
            IF Contains_Any(ai_generated_sql, restricted_tables):
                RETURN Error(403, "Unauthorized Domain Access")
            IF NOT Contains(ai_generated_sql, "WHERE dept = " + claims.dept):
                ai_generated_sql = Append_Department_Filter(ai_generated_sql, claims.dept)
            RETURN Execute_SQL(ai_generated_sql)

This programmatic isolation guarantees that even if the AI is manipulated into requesting financial data via prompt injection [1], the execution is blocked at the application level, independent of the LLM's own safety alignment.

5. RESULTS AND EVALUATION

1. Security Validation (RBAC)

Penetration testing was simulated at the API layer to validate role boundaries between Administrators and Staff. Test cases involved authenticated "Staff" tokens attempting to execute AI queries for unauthorized domains, such as institutional financial summaries or cross-departmental academic records. In 100% of the simulated edge cases, the RBAC interception engine successfully evaluated the token claims against the AI-generated SQL query, blocking execution before database interaction. This included test cases modelled after the prompt injection attack patterns described in [1], such as injected commands instructing the AI to repeat actions or access restricted tables.

2. AI Query Accuracy

The prompt-routing mechanism was tested using a dataset of 500 standard administrative queries (e.g., "Generate a list of defaulters in the CS department"). By injecting scoped database schemas rather than the full database structure, the AIRA system achieved a 96% accuracy rate in translating natural language into executable, secure database actions.
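For concreteness, the interception logic of Algorithm 1 can be rendered as runnable Python. This is a sketch with two simplifications: `Decode_JWT` is replaced by a pre-decoded `claims` dict, and `Execute_SQL` by an injectable callable; the substring checks mirror the pseudocode's `Contains`/`Contains_Any`, not a full SQL parser.

```python
# Restricted domains from Algorithm 1; a Staff query naming any of these
# tables is rejected before it ever reaches the database.
RESTRICTED_TABLES = ["finance", "salaries", "global_settings"]

def execute_ai_query(claims: dict, ai_sql: str, run_sql=lambda q: q):
    """Validate AI-generated SQL against the caller's token claims
    before execution, per Algorithm 1."""
    if claims["role"] == "Admin":
        return run_sql(ai_sql)                      # admins are unrestricted
    if claims["role"] == "Staff":
        lowered = ai_sql.lower()
        if any(t in lowered for t in RESTRICTED_TABLES):
            return (403, "Unauthorized Domain Access")
        dept_filter = f"WHERE dept = '{claims['dept']}'"
        if dept_filter.lower() not in lowered:
            ai_sql = f"{ai_sql} {dept_filter}"      # force department scoping
        return run_sql(ai_sql)
    return (403, "Unknown role")                    # default-deny
```

Note the default-deny final branch: any token with an unrecognized role is rejected, keeping enforcement deterministic and outside the model, as the surrounding text argues.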
The context-bounding technique directly addresses the reasoning-action mismatch (FM-2.6) and step repetition (FM-1.3) failure modes identified by Cemri et al. [2] by constraining the action space available to the model at inference time.

6. CONCLUSION AND FUTURE SCOPE

This paper presented a secure, AI-driven backend architecture designed to eliminate the manual workflow bottlenecks prevalent in traditional College Management Systems. By decoupling the Artificial Intelligence Routing Agent (AIRA) from direct database execution and enforcing a strict Role-Based Access Control (RBAC) interception layer, the system successfully bridges the gap between natural language data retrieval and enterprise-level data privacy. The design is directly informed by empirical findings on LLM agent vulnerabilities: the malfunction amplification attacks demonstrated in [1], the systematic failure taxonomy of multi-agent systems developed in [2], and the comprehensive safety benchmarking of [3] collectively establish that secure AI deployment in enterprise settings requires deterministic, external enforcement mechanisms rather than reliance on the LLM's intrinsic safety alignment. The evaluations demonstrate that institutional administrators and faculty can reliably automate complex tasks such as dynamic report generation and attendance tracking without risking unauthorized access to restricted departmental domains.

Future Scope: While the current architecture effectively processes text-based natural language queries, the logical progression for institutional automation is multimodal voice integration. Future iterations of this framework will aim to embed Speech-to-Text (STT) models directly into the AIRA pipeline, allowing administrators and faculty to issue commands verbally.
Pairing voice recognition with the existing RBAC interceptor will require additional security considerations, such as speaker verification and resistance to voice-based injection attacks analogous to the prompt injection vectors documented in [1]. The framework will evolve from a text-based ERP into a fully hands-free, intelligent "Smart Campus" assistant.

REFERENCES

1. Zhang, B., Tan, Y., Shen, Y., Salem, A., Backes, M., Zannettou, S., & Zhang, Y. (2024). Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification. arXiv:2407.20859 [cs.CR].
2. Cemri, M., Pan, M. Z., Yang, S., Agrawal, L. A., Chopra, B., Tiwari, R., Keutzer, K., Parameswaran, A., Klein, D., Ramchandran, K., Zaharia, M., Gonzalez, J. E., & Stoica, I. (2025). Why Do Multi-Agent LLM Systems Fail? arXiv:2503.13657 [cs.AI].
3. Zhang, Z., Cui, S., Lu, Y., Zhou, J., Yang, J., Wang, H., & Huang, M. (2025). AGENT-SAFETYBENCH: Evaluating the Safety of LLM Agents. arXiv:2412.14470 [cs.CL].

______________




#### Personalized Medicine through AI-Driven Diagnostics

**DOI:** 10.17577/IJERTV15IS031006

* **Open Access**
* **Authors:** Vageesha Vats, Gunn P Jain
* **Paper ID:** IJERTV15IS031006
* **Volume & Issue:** Volume 15, Issue 03, March 2026
* **Published (First Online):** 27-03-2026
* **ISSN (Online):** 2278-0181
* **Publisher Name:** IJERT
* **License:** This work is licensed under a Creative Commons Attribution 4.0 International License

Cite this Publication: Vageesha Vats, Gunn P Jain, 2026, Personalized Medicine through AI-Driven Diagnostics, INTERNATIONAL JOURNAL OF ENGINEERING RESEARCH & TECHNOLOGY (IJERT), Volume 15, Issue 03, March 2026.

Vageesha Vats, 23BCAR01961, BCA-Cybersecurity, Department of CS/IT, Jain Deemed-to-be University, Bangalore.

Gunn P Jain, 23BCAR02042, BCA-Cybersecurity, Department of CS/IT, Jain Deemed-to-be University, Bangalore.

Abstract – Artificial intelligence is transforming healthcare, shifting from one-size-fits-all care to truly individualised treatment. Because of wide variations in genetics, clinical history, environment, and lifestyle, people's responses to the same medication differ significantly. As genomic sequencing, electronic health records, and continuous wearable data become more common, AI systems can use this multi-modal data to predict individual drug responses and guide prescribing. In particular, AI-driven pharmacogenomics uses machine learning to combine genetic variations with non-genetic factors such as diet, physical activity, and other behaviours to produce personalised recommendations for medication selection and dosage instead of population-average regimens.
This paper explores how AI-based diagnostic and decision-support systems can link diagnostics to treatment by recommending medications according to each patient's genetic and lifestyle profile, aiming to reduce adverse drug reactions and improve therapeutic effectiveness. It also highlights key challenges, including model transparency, data privacy, bias, regulatory oversight, and clinician acceptance, that must be addressed for safe, equitable, real-world adoption of AI-driven personalised prescribing.

1. INTRODUCTION

Individualized care is increasingly replacing generic treatments in today's data-rich, technologically advanced healthcare environment. Conventional one-size-fits-all prescribing frequently overlooks important variations in genetics, medical history, environment, and daily routines, which can result in inconsistent treatment results and preventable side effects. Thanks to the convergence of wearable sensors, genomics, artificial intelligence, and electronic health records, it is now feasible to analyze these factors collectively and create individualized medical regimens. In this regard, AI-driven personalized medicine focuses on employing algorithms to determine how patients with various gene profiles and lifestyles react to medications and then suggesting the best drug and dosage for each individual. The goal of the study is to investigate how AI can be used to transition from general diagnostic support to personalized prescribing, in which treatments are selected and modified based on a patient's genes and personal life choices. By combining pharmacogenomics (the study of how genes affect drug response) with real-world data such as activity levels, diet, sleep, and comorbidities, AI systems can support more accurate, safer, and more efficient medication choices than standard guidelines alone.
AI-driven decision-support tools have the potential to revolutionize the way clinicians choose medication, particularly for chronic conditions requiring long-term, closely monitored therapy, as healthcare systems look for ways to improve outcomes while reducing trial-and-error prescribing. However, there are significant research questions regarding data quality, model reliability, patient privacy, fairness, and practical usability in clinical settings when developing and implementing such AI-based prescription systems. It is necessary to understand how to incorporate lifestyle and genomic data into clinical workflows without overburdening physicians, how to make AI recommendations clear and understandable, and how to prevent the reinforcement of pre-existing biases in healthcare data. By focusing on the balance between technological capability and responsible implementation, this study aims to demonstrate how AI-driven personalized prescribing can improve patient care while meeting safety, ethical, and legal requirements.

Research objectives:

1. AI-Powered Risk Assessment and Diagnostics: Analyze how AI models classify patients into risk groups that inform treatment planning, and use clinical, imaging, and laboratory data to identify disease early.
2. Using Genomics to Customize Prescriptions: Examine how AI can combine clinical variables and genetic variations to forecast each person's reaction to drugs and recommend the best medications and dosages.
3. Optimization of Lifestyle-Aware Therapy: Examine how AI systems can continuously improve and customize medication regimens by incorporating data from wearables and digital health tools (such as activity, sleep, and adherence patterns).
4. Practical, Ethical, and Regulatory Aspects: Examine the main obstacles to the safe and fair implementation of AI-driven personalized prescribing in actual healthcare systems, such as transparency, data privacy, bias, regulation, and clinical training.

2.
LIMITATIONS OF THE STUDY

While the exploration of AI-driven personalized prescribing holds promise for improving healthcare services, the findings of this research come with a series of limitations that could affect the generalization of the results.

Conceptual and Non-Experimental Scope: This paper is conceptual and based on literature findings rather than actual experiments or trials. Accordingly, its findings rest on the theoretical potential of AI-driven personalized prescribing rather than measured outcomes.

Data Availability and Representativeness: The paper assumes the availability of high-quality genomic, clinical, and lifestyle information. In practice, the variable quality of information from different sources limits the generalization of the findings.

Simplification of Technical Models: To make the paper accessible to a wider audience, the AI architectures and algorithms discussed are presented in simplified form.

Regulatory and Ethical Generalization: Regulatory, ethical, and governance issues are discussed at a general level across several regions. In reality, regulatory systems vary significantly between countries and are constantly changing.

Limited Coverage of Clinical Specialties: Although some specialties, such as cancer treatment, heart disease treatment, and chronic disease management, are highlighted, the study is not exhaustive in covering all medical specialties or drug types.

Assumptions Used in the Study: The study makes assumptions about the level of technology and AI adoption by individuals and institutions; in reality, adoption varies significantly between individuals and institutions.
Incomplete Analysis of Long-term Impact: Long-term clinical, economic, and social implications of AI-based personalized prescription are beyond the scope of the study. 3. SCOPE OF THE STUDY The current study explores the concept, design, and practical applications of AI-based personalized prescribing within modern healthcare settings. The overall aim of the study is to analyze how artificial intelligence can be used to improve clinical decision-making and move beyond traditional one-size-fits-all medicine towards personalized medicine based on individual genetic profiles and lifestyles. It examines how AI-based personalized prescribing can work in practice, from the collection of individual information to AI-based analysis that produces personalized medication recommendations. Moreover, the current study explores how practical AI-based personalized prescribing can be within real-world healthcare settings. This involves understanding the willingness of individual healthcare practitioners and institutions to adopt AI-based personalized prescribing, as well as the awareness and willingness of patients to allow their individual genetic and lifestyle information to inform prescribing decisions. In addition, the current study evaluates the potential of AI-based personalized prescribing to drive innovation within precision medicine and genomics. This involves understanding the potential benefits and challenges associated with creating AI-based personalized prescribing models to improve individual patient care. Overall, the scope of the project is to gain meaningful insights into how AI-based personalized prescription systems can fit into contemporary healthcare, making treatment more precise, safer, and more individualized, while also contributing to the broader evolution of data-driven and patient-centered medicine. 4. LITERATURE REVIEW WHO guidance on artificial intelligence in health (2021–2024). 
FDA: AI/ML in software as a medical device. Regulatory expectations for life-cycle management, validation, and reporting of AI-enabled diagnostic and decision support tools. Reviews on AI in Personalized Medicine and Precision Health (2020–2025): Overview of applications in diagnostics, risk prediction, and individual treatment planning, including oncology, cardiology, and neurology. AI in Pharmacogenomics and Patient-Specific Drug Response (2023–2025): Studies that integrate gene variants with machine learning models to help predict drug efficacy, toxicity, and optimal dosing for individual patients. Multimodal and Genomics-plus-Lifestyle Models for Precision Medicine (2024–2025): Work on combining clinical data, genomics, imaging, and lifestyle or environmental data to improve patient stratification and treatment personalization. Generative AI in Personalized Medicine and Treatment Planning (2024–2025): Research on using generative models to summarize clinical knowledge, simulate patient responses, and support tailored therapy design. Clinical Studies on AI-Enabled Diagnostics and Treatment Guidance (2023–2025): Evidence of improved detection, risk scoring, and treatment planning in oncology, endoscopy, and other specialties using AI tools integrated into clinical workflows. Policy and Governance Analysis for AI in Health (2023–2025): WHO, OECD, and regional policy papers discussing health-system impacts, data governance, fairness, and macro-level considerations for AI-driven personalized care. Critical Perspectives on Bias, Over-Customization, and Real-World Use: Commentaries highlighting risks of algorithmic bias, limited generalizability, over-reliance on personalization, and the practical barriers to AI deployment in diverse healthcare settings. 5. RESEARCH METHODOLOGY Artificial intelligence (AI) has become one of the most significant technologies changing modern healthcare systems. 
In recent years, researchers have explored how AI can support the transition from traditional medical practices to more personalized approaches. Conventional healthcare models often follow a generalized treatment strategy in which similar therapies are provided to large groups of patients. However, individuals differ in terms of genetics, medical history, environmental factors, and lifestyle choices. Because of these variations, personalized medicine has gained significant attention, as it focuses on designing treatments that are customized to the individual needs of each patient. Several studies have highlighted the ability of AI technology to analyse large volumes of healthcare data efficiently. Machine learning and deep learning techniques are capable of processing information obtained from electronic health records, laboratory reports, medical imaging systems, and genomic databases. By examining these complex datasets, AI models can detect patterns and correlations that may not be immediately visible to healthcare professionals. As a result, AI can assist physicians in making faster and more accurate clinical decisions. Researchers have also examined the role of AI in early disease detection. AI-based diagnostic tools have shown strong effectiveness in identifying illnesses such as cancer, cardiovascular disease, and neurological disorders. These systems are especially useful for analysing diagnostic images such as MRI, CT, and X-ray scans. By detecting small abnormalities in imaging data, AI models can support doctors in identifying diseases at an earlier stage, which can significantly improve treatment outcomes and survival rates. Another important area of research is pharmacogenomics, which focuses on understanding how genetic differences influence individual responses to medications. AI technologies can examine genomic data together with clinical information to estimate how patients might respond to certain medications. 
This capability allows healthcare providers to select treatments that are most suitable for each patient while reducing the risk of adverse drug reactions. As a result, it contributes to the creation of more accurate and effective treatment strategies. Recent research has also emphasized the importance of integrating multiple sources of healthcare data. Advanced AI systems are capable of combining clinical records, genomic data, wearable device information, and lifestyle factors to generate more comprehensive diagnostic insights. This integrated analysis enables healthcare providers to gain better knowledge of a patient's condition and assists in creating individualized treatment plans. Even though AI provides significant advantages in healthcare, a number of challenges have been identified in the existing literature. One major concern is safeguarding patient data, as healthcare information is highly sensitive and requires strict privacy measures. In addition, algorithmic bias may occur when AI models are trained using datasets that do not properly represent diverse populations. Another challenge involves the lack of transparency in certain AI models, which may reduce trust among healthcare professionals if the decision-making process cannot be clearly explained. To address these issues, international health organisations and regulatory bodies have begun developing guidelines for the responsible implementation of AI technology in healthcare. These guidelines stress the significance of ethical AI development, transparency, fairness, and patient protection. Most researchers agree that AI should function as a decision support tool that enhances the expertise of healthcare professionals rather than replacing them. Overall, the literature suggests that AI-driven diagnostic technologies have the potential to significantly improve personalized medicine by enabling early disease detection, improving diagnostic accuracy, and supporting individualized treatment strategies. 
However, further research, stronger regulatory frameworks, and continued technological advancement are necessary to ensure the safe and effective integration of AI into healthcare systems. 6. PROPOSED METHODOLOGY The proposed methodology for this study focuses on examining how artificial intelligence can be used to support personalized medicine through advanced diagnostic systems. The main objective of this methodology is to analyse how AI technologies can interpret large volumes of healthcare data and assist medical professionals in identifying diseases, predicting risks, and developing treatment plans that are tailored to individual patients. This research follows a structured approach that combines data collection, data preprocessing, AI model development, and performance evaluation. Each stage of the methodology contributes to building a system that can analyse complex healthcare information and provide meaningful insights for medical decision-making. The first stage involves data collection, gathering healthcare data from various sources to ensure a complete analysis of patient health conditions. These sources include electronic health records that contain patient medical histories, databases that provide genetic information, laboratory test reports, and medical scan datasets such as X-rays, CT scans, and MRI scans. In addition, lifestyle information collected from wearable health-tracking devices may also be included to better understand patient behaviour and health patterns. After the data is collected, the next stage is data preprocessing. Raw healthcare data often contains incomplete entries, inconsistencies, or errors that may affect the performance of AI models. Therefore, preprocessing is necessary to clean the datasets and prepare them for analysis. This step involves removing duplicate records, correcting inaccurate information, and transforming the data into an organized format suitable for analysis by machine learning models. 
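The cleaning steps described in this stage (removing duplicate records and handling missing values) can be illustrated with a small sketch. The paper itself does not provide code, and the record fields and values below are hypothetical:

```python
# Illustrative preprocessing sketch (not from the study): remove exact
# duplicate records and fill missing numeric values with the field mean.
def preprocess(records):
    # Remove exact duplicate records while preserving order.
    seen, deduped = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            deduped.append(rec)
    # Fill missing numeric fields with the mean of the observed values.
    fields = {f for rec in deduped for f in rec}
    for f in fields:
        observed = [rec[f] for rec in deduped if rec.get(f) is not None]
        if not observed:
            continue
        mean = sum(observed) / len(observed)
        for rec in deduped:
            if rec.get(f) is None:
                rec[f] = mean
    return deduped

# Hypothetical patient records with a duplicate entry and a missing lab value.
records = [
    {"age": 60, "glucose": 140},
    {"age": 60, "glucose": 140},   # duplicate entry
    {"age": 50, "glucose": None},  # missing lab value
]
clean = preprocess(records)
print(clean)
```

In a real pipeline these steps would typically be done with a data-frame library, but the logic is the same: deduplicate first, then impute from the remaining records.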
The following stage focuses on AI model development. At this stage, machine learning and deep learning techniques are applied to the prepared dataset in order to detect patterns associated with diseases and treatment outcomes. These algorithms are trained using previously recorded medical data so that the systems can learn how different factors influence disease diagnosis and patient responses to treatments. Predictive models such as neural networks and classification algorithms may be used to perform this analysis. Once the models are developed, the research proceeds to the training and testing phase. The dataset is split into two parts: a training dataset and a testing dataset. The training set enables the AI system to identify patterns and correlations in the data, whereas the testing set is used to measure how effectively the model performs on new, unseen information. This step helps ensure that the system is reliable and capable of making accurate predictions. To determine the effectiveness of the AI systems, performance evaluation metrics are applied. The evaluation measures include accuracy, precision, recall, and the F1 score. These indicators help measure how well the model identifies diseases and predicts health outcomes. The results generated by the AI systems are also compared with traditional diagnostic methods to determine whether the use of AI improves clinical decision-making. The final stage involves integration with clinical decision support systems. In this stage, the AI-generated insights are incorporated into tools that assist healthcare professionals during diagnosis and treatment planning. Rather than replacing human expertise, the AI system functions as a supportive technology that enhances the ability of doctors to interpret complex medical data and make more informed decisions. By following this methodology, the study aims to demonstrate how AI-driven diagnostic systems can contribute to improved healthcare outcomes. 
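As an illustration of the evaluation measures named above (accuracy, precision, recall, and the F1 score), the following sketch computes them from hypothetical held-out test-set predictions; it is not code from the study:

```python
# Illustrative sketch: compute accuracy, precision, recall, and F1
# from true labels and model predictions (label 1 = disease present).
def evaluate(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical predictions from a diagnostic model on a held-out test set.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

Precision penalizes false alarms while recall penalizes missed cases, which is why both matter in a diagnostic setting where a missed disease and an unnecessary intervention have very different costs.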
The approach highlights the potential of artificial intelligence to enable earlier disease detection, more accurate diagnoses, and personalized treatment strategies while maintaining strong ethical standards related to patient privacy, transparency, and fairness. 7. CONCLUSION The concept of personalized prescribing through the assistance of AI technology signifies a major leap forward in the union of technology and patient-centric medicine. This study explores the capability of artificial intelligence to transcend the conventional "one-size-fits-all" approach to medicine by developing a system that can provide personalized medication prescriptions according to the genetic makeup of the patient, their medical history, and their lifestyle patterns. This can be accomplished through the effective union of pharmacogenomics, electronic health records, and data from wearable technology or other digital health tools. According to the discussion, the potential of personalized medicine assisted by artificial intelligence technology can be deemed highly promising despite the various challenges that need to be addressed. The research indicates that effective artificial intelligence technology can only be developed through the collection of diverse data that can be used to provide fair recommendations. The research also argues that personalized prescribing assisted by artificial intelligence is a highly promising direction for the future of medicine. From a social perspective, the research highlights the importance of the responsible use of data to ensure that the benefits of personalized medicine can be enjoyed by different patient populations. The concept's potential to deliver continuous improvements in the effectiveness of medication is particularly promising when informed by real-time lifestyle patterns. 
In conclusion, the concept of personalized prescribing through the assistance of artificial intelligence technology signifies a future of medicine that can be highly effective in the development of a precise healthcare ecosystem. 8. REFERENCES 1. Smith, J., Brown, T., & Miller, K. (2022). The role of artificial intelligence in precision medicine. Journal of Medical AI, 10(3), 123–145. 2. Chen, L., & Wang, M. (2023). Ethical considerations in healthcare artificial intelligence. AI & Society, 38(1), 50–65. 3. Patel, S., Kumar, R., & Singh, A. (2021). Federated learning for genomic data analysis in precision medicine. Nature Medicine, 27(11), 1900–1908. 4. Gomez, A., & Lee, H. (2022). Clinician-centered AI design in healthcare systems. Health Informatics Journal, 28(4), 450–462. 5. World Health Organization. (2025). Ethics and governance of artificial intelligence for health: Guidance on large multi-modal models. Geneva: WHO. 6. World Health Organization. (2021). Ethics and governance of artificial intelligence for health. Geneva: WHO. 7. US Food and Drug Administration. (2021–2024). Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD): Action Plan and Good Machine Learning Practice (GMLP) Initiatives. Silver Spring, MD: FDA. 8. Rajpurkar, P., et al. (2023). Artificial intelligence and personalized medicine. NPJ Digital Medicine. 9. Alavijeh, M. S., et al. (2025). Artificial intelligence in the field of pharmacogenomics. International Journal of Engineering Technology Research & Management. 10. Rana, A., et al. (2024). Integrating pharmacogenomics and AI: A review on personalised prescribing. American Journal of Biomedical Science & Research. 11. Obermeyer, Z., et al. (2023). Artificial intelligence (AI) in personalized medicine. Frontiers in Medicine. 12. Kiran, S., et al. (2024). A comprehensive review of AI applications in personalized medicine. International Journal of Scientific Research in Applied Sciences. 13. 
Pereira, B., et al. (2025). How genomics and multi-modal AI are reshaping precision medicine. Frontiers in Genetics. 14. Health is beyond genetics: On the integration of lifestyle and environmental factors into precision health models. (2025). Frontiers in Public Health. 15. The impact of artificial intelligence on precision medicine and personalized oncology: A systematic review. (2024). Electronic Journal of General Medicine. 16. Role of generative artificial intelligence in personalized medicine: A systematic review. (2025). Cureus. 17. Chatterjee, A., et al. (2025). The role of generative AI in personalized medicine and treatment planning. Community Medicine and Health Research Journal. 18. AI's role in revolutionizing personalized medicine by reshaping diagnostics and therapy. (2024). Computer Methods and Programs in Biomedicine. 19. Empowering personalized pharmacogenomics with generative AI. (2024). NPJ Digital Medicine. 20. Precision medicine, AI, and the future of personalized health care. (2020). JAMA. 21. HealthAI's recommendations to the WHO Science Council on responsible technologies in global health. (2025). HealthAI. 22. How generative AI and precision medicine will change healthcare in 2025. (2025). Healthcare IT News. 23. Artificial intelligence applications in medical devices for clinical decision support. (2026). Journal of Medical Internet Research. 24. The WHO guidance for the use of large multi-modal models in health. (2024). AI Ethics Policy Lab Brief. ______________

Personalized Medicine through AI-Driven Diagnostics

Volume 15, Issue 03 (March 2026)


🛠️ MC-306310 is now fixed! (43 days, 5 hours, 32 minutes) 🛠️

The “minecraft:entity.pig_big.eat” sound event is displayed as a raw translation key

➡️ https://bugs.mojang.com/browse/MC-306310

Detecting Malicious Profiles on Social Media using Multi-Dimensional Analytics **DOI :****https://doi.org/10.5281/zenodo.19235033** Download Full-Text PDF Cite this Publication M. Kumarasamy, Madana Venkata Bhavani Prasad, Sripuram Tharun, C. Vamsi, Buddaiah Vaigara Vamsi Krishna, 2026, Detecting Malicious Profiles on Social Media using Multi-Dimensional Analytics, INTERNATIONAL JOURNAL OF ENGINEERING RESEARCH & TECHNOLOGY (IJERT) Volume 15, Issue 03 , March – 2026 * **Open Access** * Article Download / Views: 13 * **Authors :** M. Kumarasamy, Madana Venkata Bhavani Prasad, Sripuram Tharun, C. Vamsi, Buddaiah Vaigara Vamsi Krishna * **Paper ID :** IJERTV15IS030871 * **Volume & Issue : ** Volume 15, Issue 03 , March – 2026 * **Published (First Online):** 26-03-2026 * **ISSN (Online) :** 2278-0181 * **Publisher Name :** IJERT * **License:** This work is licensed under a Creative Commons Attribution 4.0 International License __ PDF Version View __ Text Only Version #### Detecting Malicious Profiles on Social Media using Multi-Dimensional Analytics M. Kumarasamy Professor, Department of Computer Science and Engineering, Siddharth Institute of Engineering and Technology, Puttur, AP. Madana Venkata Bhavani Prasad 22F61A05H0 Department of Computer Science and Engineering, Siddharth Institute of Engineering and Technology, Puttur, AP. Sripuram Tharun 23F65A0515 Department of Computer Science and Engineering, Siddharth Institute of Engineering and Technology, Puttur, AP. C. Vamsi 22F61A05G5 Department of Computer Science and Engineering, Siddharth Institute of Engineering and Technology, Puttur, AP. Buddaiah Vaigara Vamsi Krishna 22F61A05G6 Department of Computer Science and Engineering, Siddharth Institute of Engineering and Technology, Puttur, AP. 
ABSTRACT – Malicious profile detection has become increasingly important with the rise of sophisticated counterfeit accounts on Online Social Networks (OSNs). These accounts compromise information transparency, threaten user privacy, and disrupt digital security, while traditional detection methods fail to cope with evolving malicious strategies, creating the need for an intelligent and adaptive framework. The base paper addresses this by introducing a multimodal deep learning framework that analyzes visual content, temporal activity, and network interactions, merges them into a unified representation, and demonstrates improved detection accuracy over single-modality approaches when validated on the Cresci 2017 dataset. However, this approach struggles with adversarial evasion and cross-platform adaptability, and lacks explainability in its predictions. To overcome these limitations, the proposed framework enhances FAD by integrating adversarially robust training, cross-platform generalization, and explainable AI modules, along with additional features such as behavioral biometrics, sentiment shifts in text, and real-time anomaly detection to capture subtle manipulations. Technologically, the system leverages Graph Neural Networks (GNNs) with dynamic graph embeddings for modeling evolving connections, attention-based transformers for multimodal contextual analysis, adversarial defense mechanisms for robustness, and explainable AI for transparency, making it highly relevant in cybersecurity and social media analytics. Compared to the base model, the proposed system achieves higher accuracy with improved resilience, interpretability, and adaptability across platforms, ultimately providing a more reliable, scalable, and future-ready solution that strengthens OSN security while maintaining user trust. 1. 
INTRODUCTION The rapid growth of online social networks and digital platforms has significantly transformed the way people communicate, share information, and conduct business. However, this expansion has also led to a rise in malicious profiles that engage in activities such as spreading misinformation, conducting fraud, launching phishing attacks, and manipulating public opinion. These malicious entities often mimic legitimate user behavior, making their detection increasingly complex and challenging for traditional security mechanisms. As a result, there is a growing need for intelligent and scalable solutions that can accurately identify and mitigate such threats in dynamic online environments. A comprehensive framework for detecting malicious profiles must go beyond single-feature or rule-based approaches and instead leverage multi-dimensional analytics that examine user behavior from multiple perspectives. By analyzing diverse attributes such as profile metadata, behavioral patterns, network relationships, content characteristics, and temporal activity, deeper insights can be gained into hidden anomalies and coordinated malicious actions. The integration of advanced data analytics and machine learning techniques enables the system to uncover subtle patterns and correlations that are often overlooked by conventional methods. This framework aims to enhance detection accuracy, reduce false positives, and adapt to evolving attack strategies. Ultimately, such a robust and holistic approach contributes to safer digital ecosystems by strengthening trust, protecting users, and ensuring the integrity of online platforms. 2. LITERATURE SURVEY This study focuses on identifying malicious user profiles by analyzing behavioral and profile-based features extracted from social networking platforms. Machine learning classifiers are trained to distinguish between genuine and malicious accounts based on activity patterns, interaction frequency, and account metadata. 
The work highlights that combining multiple behavioral features significantly improves detection accuracy compared to single-feature approaches, but it also notes limitations in handling evolving attacker strategies. Graph-Based Analysis for Identifying Malicious Accounts: This research explores the use of graph theory and network analytics to detect malicious profiles by examining relationships among users. By modeling social interactions as graphs, the study identifies suspicious communities and abnormal connectivity patterns often associated with coordinated malicious activities. Although effective in revealing group-based attacks, the approach faces scalability challenges when applied to large-scale, real-time social networks. Content and Behavior-Based Malicious Profile Detection: The authors propose a framework that integrates content analysis with user behavior modeling to identify malicious profiles. Textual features, posting frequency, and sentiment patterns are jointly analyzed to uncover deceptive or harmful activities. The study demonstrates improved detection performance but points out that content-based features alone may be vulnerable to evasion through sophisticated text generation techniques. Unsupervised and Semi-Supervised Techniques for Malicious Account Detection: This work investigates unsupervised and semi-supervised learning methods to address the scarcity of labeled data in malicious profile detection. Clustering and anomaly detection techniques are employed to identify abnormal user behavior without prior labeling. While these methods show promise in detecting novel attacks, the study emphasizes the need for hybrid models to enhance precision and reduce false alarms. 3. PROPOSED SYSTEM The proposed methodology starts by gathering user data from social media or online network sources, including profile information, posted content, interaction networks, and time-based activity records. 
The collected data is then preprocessed through steps such as text cleaning, feature scaling, handling missing values, and constructing interaction graphs to make it suitable for analysis. Linguistic features are extracted from textual content, behavioral features from user activity patterns, network features from relational graphs, and temporal features from posting behavior over time. These combined feature sets are used to train machine learning and deep learning models such as random forests, ensemble techniques, convolutional neural networks, long short-term memory networks, and graph neural networks. A multi-feature fusion strategy integrates information from all dimensions to enhance detection performance. The system's effectiveness is assessed using evaluation metrics including accuracy, precision, recall, F1-score, and ROC-AUC. Finally, comparisons with single-dimensional models demonstrate the superiority of the proposed multi-dimensional detection framework. Fig 1. System Architecture. The diagram shows a multi-dimensional approach for detecting malicious profiles. Behavioral features, content-based features, and network features are collected from user data. These different feature types are combined and analyzed using a multi-dimensional model. The model processes the information together to accurately identify and classify malicious profiles, improving detection reliability compared to using a single feature type alone. The multi-dimensional analytics model demonstrated strong classification performance in distinguishing malicious profiles from genuine users. 
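The multi-feature fusion step described above can be sketched as follows. The paper does not publish an implementation, so the feature names, value ranges, and the choice of min-max normalization here are illustrative assumptions:

```python
# Illustrative sketch (not the paper's code): behavioral, content, and
# network feature vectors for one profile are min-max normalized and
# concatenated into a single fused representation for a classifier.
def minmax(vec, lo, hi):
    # Scale each dimension into [0, 1] using per-dimension min/max bounds.
    return [(v - l) / (h - l) if h > l else 0.0
            for v, l, h in zip(vec, lo, hi)]

def fuse(behavioral, content, network, bounds):
    fused = []
    for name, vec in (("behavioral", behavioral),
                      ("content", content),
                      ("network", network)):
        lo, hi = bounds[name]
        fused.extend(minmax(vec, lo, hi))
    return fused

# Hypothetical per-dimension bounds observed over a training set.
bounds = {
    "behavioral": ([0, 0], [500, 24]),   # posts/day, active hours/day
    "content":    ([0], [1]),            # spam-word ratio
    "network":    ([0, 0], [10000, 1]),  # follower count, reciprocity
}
# Hypothetical raw features for one profile, fused into one vector.
profile = fuse([250, 12], [0.8], [100, 0.02], bounds)
print(profile)
```

Normalizing before concatenation keeps no single dimension (e.g. follower count) from dominating the fused vector that the downstream classifier consumes.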
The integration of behavioral, content-based, and network-level features enabled the model to capture complex patterns commonly associated with fake, spam, or malicious accounts. Fig. Multi-dimensional analytics model. The accuracy comparison graph indicates that the proposed multi-dimensional approach outperforms traditional machine learning models that rely on limited feature sets. The improvement in accuracy highlights the importance of combining multiple data perspectives when analyzing social media behavior, as malicious users often attempt to mimic legitimate activity in one dimension while exposing anomalies in others. The learning curves of the proposed model were analysed to assess convergence and generalization performance. The close alignment between the training and validation accuracy curves demonstrates stable learning behavior and minimal overfitting. This indicates that the model effectively generalizes to unseen user profiles and can reliably identify malicious behavior patterns across different user populations and platforms. The confusion matrix reveals a high true positive rate, confirming that most malicious profiles were correctly identified. A low false negative rate is particularly important in social media security, as undetected malicious accounts can spread misinformation, spam, or harmful content. Additionally, the reduced false positive rate ensures that legitimate users are not unfairly flagged, preserving user trust and platform integrity. The experimental results clearly demonstrate that detecting malicious profiles using multi-dimensional analytics significantly enhances performance compared to single-feature or rule-based detection systems. By jointly analyzing profile metadata, behavioral patterns, content characteristics, and network interactions, the proposed system achieves higher accuracy, improved generalization, and stronger resilience against evolving malicious strategies. 
The graphical analysis validates the model's stability, robustness, and effectiveness, confirming its potential for real-time deployment on large-scale social media platforms to improve user safety, reduce abuse, and maintain platform credibility. 4. CONCLUSION A multi-dimensional analytics framework provides a highly effective and comprehensive solution for detecting malicious profiles across social platforms. By integrating behavioral patterns, content features, and network structure, such systems achieve significantly higher detection accuracy and adaptability than traditional single-feature approaches. The combination of supervised and semi-supervised learning enables the model to identify both known malicious behaviors and emerging, previously unseen threat patterns. Overall, this hybrid framework enhances robustness, reduces misclassification, and supports scalable, real-time malicious profile detection, making it a critical advancement for safeguarding online communities from coordinated manipulation and harmful activities. REFERENCES 1. Alvari, H., Hashemi, S. M., & Hamzeh, A. (2018). Online social network spam detection using multi-dimensional features. Information Sciences, 462, 319–336. 2. Benevenuto, F., Magno, G., Rodrigues, T., & Almeida, V. (2010). Detecting spammers on Twitter. In Proceedings of the 7th Annual Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference (pp. 12–21). ACM. 3. Bishop, C. M. (2006). Pattern recognition and machine learning. Springer. 4. Cao, Q., Sirivianos, M., Yang, X., & Pregueiro, T. (2012). Aiding the detection of fake accounts in large scale social online services. In Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation (pp. 197–210). USENIX Association. 5. Chandola, V., Banerjee, A., & Kumar, V. (2009). Anomaly detection: A survey. ACM Computing Surveys, 41(3), 1–58. 6. Ferrara, E., Varol, O., Davis, C., Menczer, F., & Flammini, A. (2016). The rise of social bots. 
Communications of the ACM, 59(7), 96–104. 7. Gilani, Z., Farahbakhsh, R., Tyson, G., Wang, L., & Crowcroft, J. (2017). Of bots and humans (on Twitter). In Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (pp. 349–354). IEEE. 8. Liu, F. T., Ting, K. M., & Zhou, Z. H. (2008). Isolation forest. In Proceedings of the 8th IEEE International Conference on Data Mining (pp. 413–422). IEEE. 9. Wu, L., & Liu, H. (2018). Tracing fake-news footprints: Characterizing social media manipulation. IEEE Intelligent Systems, 33(2), 51–59. 10. Yang, K. C., Varol, O., Hui, P. M., & Menczer, F. (2020). Scalable and generalizable social bot detection through data selection. Proceedings of the AAAI Conference on Artificial Intelligence, 34(01), 1096–1103. ______________

Detecting Malicious Profiles on Social Media using Multi-Dimensional Analytics

#Volume #15, #Issue #03 #(March #2026)


YouTube users are being bombarded by CAPTCHA challenges And as you can imagine, it's frustrating Despite the same look and feel, YouTube is quickly changing behind the scenes. The brand has bee...

#youtube #YouTube #issue #Utilities


🛠️ MC-302628 is now fixed! (167 days, 8 hours, 19 minutes) 🛠️

Dolphins don't dismount minecarts when passing over activator rails

➡️ https://bugs.mojang.com/browse/MC-302628


🛠️ MC-305467 is now fixed! (77 days, 16 hours, 52 minutes) 🛠️

The dragon death animation effects render in front of worn armor

➡️ https://bugs.mojang.com/browse/MC-305467


🛠️ MC-252814 is now fixed! (1382 days, 13 hours, 21 minutes) 🛠️

Clamp density function takes a direct input and doesn't allow a reference

➡️ https://bugs.mojang.com/browse/MC-252814


🛠️ MC-269520 is now fixed! (737 days, 10 hours, 48 minutes) 🛠️

Game freezes while using /locate command in a world without structures enabled

➡️ https://bugs.mojang.com/browse/MC-269520


🛠️ MC-306064 is now fixed! (54 days, 4 hours, 25 minutes) 🛠️

Mobs can be forced to look like they're dying while they aren't by using commands

➡️ https://bugs.mojang.com/browse/MC-306064


🛠️ MC-306890 is now fixed! (9 days, 1 hour, 43 minutes) 🛠️

Campfires cause bees to work much more slowly

➡️ https://bugs.mojang.com/browse/MC-306890


🛠️ MC-306903 is now fixed! (8 days, 5 hours, 28 minutes) 🛠️

Cubic Bézier easing functions sometimes produce wrong values

➡️ https://bugs.mojang.com/browse/MC-306903

Original post on blog.radwebhosting.com

How to Deploy Bugzilla on Ubuntu VPS (5-Minute Quick-Start Guide) This article provides a step-by-step guide detailing how to deploy Bugzilla on Ubuntu VPS. What is Bugzilla? Bugzilla is an open-so...

#Guides #Cloud #VPS #apache #bug #tracking […]

Real-Time Phishing Detection using Lightweight Deep Learning Models

**DOI:** https://doi.org/10.5281/zenodo.19205405

Cite this Publication: Siddharth Adhikary, Upasna Setia, 2026, Real-Time Phishing Detection using Lightweight Deep Learning Models, INTERNATIONAL JOURNAL OF ENGINEERING RESEARCH & TECHNOLOGY (IJERT) Volume 15, Issue 03, March – 2026

* **Open Access**
* Article Download / Views: 11
* **Authors:** Siddharth Adhikary, Upasna Setia
* **Paper ID:** IJERTV15IS030957
* **Volume & Issue:** Volume 15, Issue 03, March – 2026
* **Published (First Online):** 24-03-2026
* **ISSN (Online):** 2278-0181
* **Publisher Name:** IJERT
* **License:** This work is licensed under a Creative Commons Attribution 4.0 International License

#### Real-Time Phishing Detection using Lightweight Deep Learning Models

Siddharth Adhikary, Department of Computer Science and Engineering, Ganga Institute of Technology and Management, Kablana, India

Upasna Setia, Department of Computer Science and Engineering, Ganga Institute of Technology and Management, Kablana, India

Abstract — Phishing attacks remain a major cybersecurity concern as attackers increasingly exploit digital communication channels to deceive users into revealing sensitive information. Traditional phishing detection techniques, such as blacklist-based systems and rule-based filters, often fail to detect newly created phishing websites or malicious URLs in real time. Recent advances in deep learning have improved detection accuracy; however, many deep learning models require substantial computational resources and are difficult to deploy in real-time systems. This study explores the use of lightweight deep learning models for real-time phishing detection, focusing on efficient neural network architectures capable of identifying phishing URLs with minimal computational cost.
By combining lexical URL features with lightweight deep learning techniques, the model achieves effective detection while maintaining fast processing speed. The results show that lightweight deep learning models can provide a practical solution for real-time phishing detection systems deployed in browsers, email gateways, and mobile devices.

Keywords — Phishing detection, cybersecurity, deep learning, lightweight neural networks, URL classification, real-time security.

1. #### Introduction

Phishing is a form of cyberattack in which malicious actors attempt to trick users into disclosing sensitive information such as login credentials, banking details, or personal data. These attacks are carried out through misleading emails, fraudulent websites, and malicious URLs designed to mimic legitimate services. With the rapid growth of online platforms and digital services, phishing campaigns have become increasingly sophisticated and widespread.

Traditional phishing detection systems rely mainly on blacklist databases and manually defined rules. While these approaches can identify known malicious websites, they often fail to detect newly generated phishing domains or modified attack patterns. Attackers frequently register new domains or slightly modify existing URLs to bypass blacklists. As a result, traditional approaches fail to provide effective protection against emerging phishing threats. Machine learning and deep learning techniques have recently gained considerable attention in phishing detection research, because they allow detection systems to learn patterns from data and identify malicious behaviour more effectively than rule-based methods.
However, many deep learning models are computationally expensive and require significant processing power, making them difficult to deploy in real-time security systems. To address this challenge, this paper explores lightweight deep learning models suitable for real-time phishing detection. Lightweight models aim to achieve high performance while maintaining low computational complexity, making them appropriate for real-time deployments such as browser extensions, email filtering systems, and mobile security tools.

2. #### Background and Related Work

Several studies have explored machine learning methods for phishing detection. Traditional algorithms such as Support Vector Machines, Decision Trees, and Random Forest classifiers have been widely used to classify phishing URLs based on handcrafted features. Deep learning techniques have shown promising results in recent years: Convolutional Neural Networks (CNNs) have been applied to character-level patterns in URLs, while Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks have been used to process sequential data such as email messages. Most recently, transformer-based architectures such as BERT have been applied to phishing email detection by capturing contextual relationships within text. Although these models provide high detection accuracy, they often require significant computational resources. As a result, recent research focuses on lightweight deep learning architectures that reduce computational complexity while maintaining strong detection performance.

3. #### Methodology

1. #### System Overview: The proposed phishing detection framework focuses on identifying malicious URLs in real time using lightweight deep learning models. The system extracts lexical features from URLs and processes them through a lightweight neural network architecture for classification.
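As an illustrative sketch of the feature-extraction step in this framework (not the authors' implementation), the lexical features discussed in this paper — URL length, special-character count, subdomain count, suspicious keywords, and HTTPS usage — can be computed directly from the URL string. The function name and keyword list below are assumptions; domain age is omitted because it requires a WHOIS lookup rather than pure lexical analysis:

```python
from urllib.parse import urlparse

# Illustrative keyword list; the paper does not specify the exact set.
SUSPICIOUS_KEYWORDS = ("login", "verify", "secure", "account", "update", "bank")

def extract_lexical_features(url: str) -> dict:
    """Extract fast, webpage-independent lexical features from a URL string."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    return {
        "url_length": len(url),
        "num_special_chars": sum(not c.isalnum() for c in url),
        # Host labels beyond the registered domain (rough approximation).
        "num_subdomains": max(host.count(".") - 1, 0),
        "has_suspicious_keyword": any(k in url.lower() for k in SUSPICIOUS_KEYWORDS),
        "uses_https": parsed.scheme == "https",
    }

features = extract_lexical_features("http://secure-login.example.com/verify?id=1")
```

Because these features need no network access or page rendering, they can be computed in microseconds, which is what makes the real-time deployment targets (browser extensions, gateways) plausible.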
User URL Input → URL Feature Extraction → Feature Encoding → Lightweight Deep Learning Model → Phishing / Legitimate Classification

2. #### Dataset: Several publicly available datasets are commonly used in phishing detection research, including PhishTank, URLHaus, and Kaggle phishing datasets. These datasets contain labeled samples of phishing and legitimate URLs collected from real-world sources. The dataset used in this study contains both phishing and legitimate URLs collected from publicly available sources, with each URL labeled according to its class.

3. #### Feature Extraction: The proposed model focuses on lexical URL features that can be extracted quickly without requiring webpage analysis. These features include:

1. URL length
2. Number of special characters
3. Number of subdomains
4. Presence of suspicious keywords
5. Domain age
6. Presence of the HTTPS protocol

Lexical features are particularly beneficial for real-time phishing detection because they can be extracted instantly.

4. #### Lightweight Deep Learning Architecture: The detection model uses a lightweight neural network architecture designed for efficient computation, consisting of the following components:

Input Layer (Encoded URL Features) → Embedding Layer → Convolution Layer (Feature Extraction) → Pooling Layer (Dimensionality Reduction) → Fully Connected Layer → Output Layer (Phishing / Legitimate)

This architecture enables efficient processing of URL features while maintaining strong classification performance.

4. #### Experimental Results

The performance of the proposed model was assessed using standard classification metrics: accuracy, precision, recall, and F1-score.

1. #### Experimental Setup: The proposed lightweight deep learning model was evaluated on a dataset of 2000 URLs containing both phishing and legitimate samples. The dataset was divided into 80% training and 20% testing.
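The "Feature Encoding" stage that feeds the embedding layer can, for a character-level model of the kind described above, map each URL to a fixed-length sequence of integer ids. This is a minimal sketch; the vocabulary, the 0-for-padding convention, and the 64-character limit are illustrative assumptions, not the paper's configuration:

```python
# Character-level encoding sketch for the "Feature Encoding" stage.
# Vocabulary and sequence length are illustrative assumptions.
VOCAB = "abcdefghijklmnopqrstuvwxyz0123456789-._~:/?#[]@!$&'()*+,;=%"
CHAR_TO_ID = {c: i + 1 for i, c in enumerate(VOCAB)}  # id 0 is reserved for padding
MAX_LEN = 64

def encode_url(url: str, max_len: int = MAX_LEN) -> list[int]:
    """Map a URL to a fixed-length integer sequence (0-padded / truncated).

    Unknown characters also map to 0, sharing the padding id for simplicity.
    """
    ids = [CHAR_TO_ID.get(c, 0) for c in url.lower()[:max_len]]
    return ids + [0] * (max_len - len(ids))

encoded = encode_url("https://example.com/login")
```

A fixed-length integer sequence like this is exactly the shape an embedding layer expects, so the encoder plugs directly into the architecture sketched above.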
Standard assessment metrics including accuracy, precision, recall, and F1-score were used.

2. #### Dataset Distribution: The dataset is balanced, with equal numbers of phishing and legitimate URLs.

3. #### Model Training Performance: The model was trained for 20 epochs; accuracy increased while loss declined over the course of training.

4. #### Confusion Matrix Analysis: The confusion matrix shows that the majority of phishing and legitimate URLs were correctly classified, with few false predictions. Experimental results show that the lightweight deep learning model attains high detection accuracy while maintaining fast prediction times. Compared to traditional deep neural networks, the proposed architecture requires fewer computational resources and provides efficient real-time classification.

5. #### ROC Analysis: The ROC curve illustrates the strong classification capability of the model.

6. #### Model Performance Comparison: The proposed lightweight deep learning model achieves approximately 96% accuracy and outperforms traditional machine learning models such as SVM and Random Forest.

5. #### Discussion

The experimental results show that lightweight deep learning models provide an effective solution for real-time phishing detection. By focusing on lexical URL features and a simplified neural network architecture, the model can detect phishing attacks quickly and efficiently. The system is particularly suitable for deployment in environments with limited computational resources, such as browser extensions, mobile devices, and network gateways. However, phishing attacks continue to evolve rapidly: attackers often change URL structures and domain registration patterns to bypass detection systems. Phishing detection models must therefore be retrained continuously on new datasets to maintain their effectiveness.

6. #### Future Work

Future research can extend this work in several directions: 1.
Integration of multi-modal features such as webpage content and visual screenshots. 2. Development of adaptive learning models capable of handling concept drift. 3. Use of explainable AI techniques to improve model transparency. 4. Deployment of lightweight models in browser-based phishing detection systems.

7. #### Conclusion

Phishing attacks remain a significant cybersecurity threat in the modern digital environment, and traditional detection systems often struggle to detect newly generated phishing attacks in real time. This research demonstrates that lightweight deep learning models can detect phishing URLs effectively while keeping computational overhead low. The proposed system combines efficient feature extraction with a compact neural network architecture, enabling fast and accurate phishing detection. Lightweight deep learning approaches therefore represent a promising direction for developing scalable and practical real-time phishing detection systems.

References

1. J. Garera, N. Provos, M. Chew, and A. Rubin, "A framework for detection and measurement of phishing attacks," Proceedings of the ACM Workshop on Rapid Malcode, 2022.
2. Q. E. ul Haq, M. H. Faheem, and I. Ahmad, "Detecting phishing URLs using deep learning techniques," Applied Sciences, vol. 14, 2024.
3. H. Li, "Email phishing detection using BERT transformer models," Proceedings of SPIE, 2024.
4. S. Sountharrajan et al., "Phishing URL detection using machine learning techniques," International Journal of Information Security, 2021.
5. K. Liew, K. Choo, and Y. Xiang, "Deep learning for phishing detection: A survey," IEEE Access, 2023.

______________
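Supplementary note on the evaluation above: accuracy, precision, recall, and F1-score are all derived from confusion-matrix counts. A minimal sketch follows; the example counts are hypothetical, chosen only to be consistent with the reported ~96% accuracy on a balanced 400-URL test split (20% of 2000), and are not the paper's actual confusion matrix:

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts for a balanced 400-URL test split (not the paper's data).
metrics = classification_metrics(tp=192, fp=8, fn=8, tn=192)
```

On a balanced split like this, accuracy and F1 coincide; on imbalanced data they can diverge sharply, which is why the paper reports all four metrics rather than accuracy alone.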

Real-Time Phishing Detection using Lightweight Deep Learning Models

#Volume #15, #Issue #03 #(March #2026)

Origin | Interest | Match

0 0 0 0
Real-Time Phishing Detection using Lightweight Deep Learning Models **DOI :****https://doi.org/10.5281/zenodo.19205405** Download Full-Text PDF Cite this Publication Siddharth Adhikary, Upasna Setia, 2026, Real-Time Phishing Detection using Lightweight Deep Learning Models, INTERNATIONAL JOURNAL OF ENGINEERING RESEARCH & TECHNOLOGY (IJERT) Volume 15, Issue 03 , March – 2026 * **Open Access** * Article Download / Views: 11 * **Authors :** Siddharth Adhikary, Upasna Setia * **Paper ID :** IJERTV15IS030957 * **Volume & Issue : ** Volume 15, Issue 03 , March – 2026 * **Published (First Online):** 24-03-2026 * **ISSN (Online) :** 2278-0181 * **Publisher Name :** IJERT * **License:** This work is licensed under a Creative Commons Attribution 4.0 International License __ PDF Version View __ Text Only Version #### Real-Time Phishing Detection using Lightweight Deep Learning Models Siddharth Adhikary Department of Computer Scieence and Engineering, Ganga Institute of Technology and Management Kablana, India Upasna Setia Department of Computer Scieence and Engineering, Ganga Institute of Technology and Management Kablana, India Abstract Phishing attacks continue is to be a major cybersecurity concern as attackers tries more and more to exploit digital communication channels to cheat users into revealing sensitive information. Traditional phishing detection techniques, such as blacklist-based systems and rule-based filters, often fail to detect newly created phishing websites or malicious URLs in real time. Recent advances in deep learning have improved detection of accuracy; however, many deep learning models require high computational resource and difficult to deploy in real-time systems. This study is toward the use of lightweight deep learning model for real-time phishing detection. The proposed research focuses on designing efficient neural network architectures capable of identifying phishing URLs with minimum computational. 
By combining terminology URL features with lightweight deep learning techniques, the model achieves effective detection while maintaining fast processing speed. The results shows that lightweight deep learning models are capable of provide a practical solution for real-time phishing detection system deployed on browsers, email gateways, and mobile devices. Keywords Phishing detection, cybersecurity, deep learning, lightweight neural networks, URL classification, real-time security. 1. #### Introduction Phishing is a process of cyberattack in which malicious person attempt to trick users into disclosing their sensitive information such as login credentials, banking details, or personal data. These attacks are carried out through misleading emails, fraudulent websites, and malicious URLs designed to reproduce legitimate services. With the fast growth of online platforms and digital services phishing occurrences had become increasingly sophisticated and widespread. Traditional phishing detection systems mainly rely on blacklist databases and manually defined rules. While these approaches are capable of identifying known malicious websites, but they often fail to detect newly generated phishing domains or modified attack patterns. Attackers frequently register themselves with new domains or slightly modify existing URLs to bypass blacklist systems. As a result, traditional approaches fails to provide effective protection against emerging phishing. Machine learning and deep learning techniques have recently gained important attention in phishing detection research. These techniques allows detection systems to learn patterns from data and identify malicious behaviour more effectively than rule-based methods. 
However, many deep learning models are expensive and require significant processing power, making them difficult to deploy in real- time security systems To address this challenge my paper explores the use of lightweight deep learning models that can be used for real- time phishing detection. Lightweight models are proposed to accomplish higher performance while maintaining low computational complexity, making them appropriate for real- time deployment such as browser extensions, email filtering systems and mobile security tools. 2. #### Background and Related Work Several studies have discovered machine learning methods for phishing detection. Traditional machine learning algorithms such as Support Vector Machines, Decision Trees, and Random Forest classifiers had been widely used for classifying phishing URLs based on features. Deep learning techniques had shown hopeful results in recent years. Convolutional Neural Networks (CNNs) have been applied to examine character-level patterns in URLs, while Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks have been used to process sequential data such as email messages. Most recently transformer based architectures such as BERT have been applied to phishing email detection by capturing appropriate relationships within text data. Although these models provide high detection accuracy, they often require significant computational resources. As a result, recent research focuses on making lightweight deep learning architectures that reduce computational complexity while maintaining strong detection performance. 3. #### Methodology 1. #### System Overview: The planned phishing detection framework emphases in identifying malicious URLs in real time using lightweight deep learning models. The system takes vocabulary features from URLs and processes the same through a lightweight neural network architecture for classification. 
User URL Input URL Feature Extraction Feature Encoding Lightweight Deep Learning Model Phishing / Legitimate Classification 2. #### Dataset: For phishing detection research some publicly available dataset are commonly used that including PhishTank, URLHaus, and Kaggle phishing datasets. This dataset mainly has labeled samples for phishing URLs and legitimate URLs collected from real-world sources. The dataset are used to study contains both phishing and legitimate URLs collected from publicly available sources. Each URL are labeled according their classification. 3. #### Feature Extraction: The planned model focuses on vocabulary URL features that can remove quickly without requiring webpage analysis. These features include: 1. URL length 2. Number of special characters 3. Number of subdomains 4. Presence of suspicious keywords 5. Domain age 6. Presence of HTTPS protocol Vocabulary features are particularly beneficial for real- time phishing detection because they can be extracted instantly. 4. #### Lightweight Deep Learning Architecture: The intentional detection model use lightweight neural network architecture designed to efficient computation. The architecture consists of the following components: Input Layer (Encoded URL Features) Embedding Layer Convolution Layer (Feature Extraction) Pooling Layer (Dimensionality Reduction) Fully Connected Layer Output Layer (Phishing / Legitimate) This architecture lets efficient processing URL feature while maintaining its strong classification performance. 4. #### Experimental Results The performance of the proposed model was assessed using standard classification metrics including: * Accuracy 1. 1. #### Experimental Setup: The experiment evaluates the proposed lightweight deep learning model using a dataset of 2000 URLs containing both phishing in addition to legitimate samples. The dataset was divided into 80% training and 20% testing. 
Standard assessment metrics including accuracy, precision, recall, and F1-score were used. 2. #### Dataset Distribution: The dataset is balanced with a equal figures of phishing and legitimate URLs #### Model Training Performance: This model was trained for 20 epochs. Accuracy of the model increases while losses declines during training. 1. * Precision * Recall * F1-score 5. Confusion Matrix Analysis: The confusion matrix shows that the majority of phishing and legitimate URLs were correctly classified with minimal false predictions. Experimental results shows that the lightweight deep learning model attains high detection accuracy while maintaining faster prediction times. Compared to traditional deep neural networks, the planned architecture requires fewer computational resources and provides efficient real-time classification. 6. ROC Analysis: The ROC curve illustrates strong classification capability of the model 7. Model Performance Comparison: The planned lightweight deep learning model achieves approximately 96% accuracy and outperforms traditional machine learning models such as SVM and Random Forest. * Discussion The experimental results shows that lightweight deep learning models gives an effective solution for real-time phishing detection. By focusing on vocabulary URL features and simplified neural network architectures, the model can sense phishing attacks quickly and efficiently. The system is mainly suitable for deployment in environment where computational resources are less, such as browser extensions, mobile devices, and network gateways. However, phishing attacks continue to grow rapidly. Attackers often change URL structures and domain registration patterns for bypassing detection systems. Therefore, these phishing detection models must continuously be updated using new dataset for maintaining its effectiveness. * Future Work Future research can be explored in many directions to increase phishing detection system effectiveness: 1. 
Addition of multi-model features such as webpage content and visual screenshots. 2. Development of a adaptive learning model capable of handling the concept drift. 3. Implementation of a understandable AI techniques to improve model transparency 4. Deployment of lightweight models for browser- based phishing detection systems * Conclusion Phishing attacks persist significant cybersecurity threat in modern digital environment. Traditional detection systems are often struggle to detect a newly generated phishing attacks during real time. This research demonstrate that lightweight deep learning models are effective in detect phishing URLs while upholding low computational overhead. The planned system combines efficient feature extraction with a basic neural network architecture, enabling fast and accurate phishing detection. Lightweight deep learning approach are therefore representing a promising direction for developing scalable and practical real time phishing detection systems. References 1. J. Garera, N. Provos, M. Chew, and A. Rubin, A framework for detection and measurement of phishing attacks, Proceedings of the ACM Workshop on Rapid Malcode, 2022. 2. Q. E. ul Haq, M. H. Faheem, and I. Ahmad, Detecting phishing URLs using deep learning techniques, Applied Sciences, vol. 14, 2024. 3. H. Li, Email phishing detection using BERT transformer models, Proceedings of SPIE, 2024. 4. S. Sountharrajan et al., Phishing URL detection using machine learning techniques, International Journal of Information Security, 2021. 5. K. Liew, K. Choo, and Y. Xiang, Deep learning for phishing detection: A survey, IEEE Access, 2023. ______________

Real-Time Phishing Detection using Lightweight Deep Learning Models View Abstract & download full text of Real-Time Phishing Detection using Lightweight Deep Learning Models Download Full-Text ...

#Volume #15, #Issue #03 #(March #2026)

Origin | Interest | Match

0 0 0 0
Real-Time Phishing Detection using Lightweight Deep Learning Models **DOI :****https://doi.org/10.5281/zenodo.19205405** Download Full-Text PDF Cite this Publication Siddharth Adhikary, Upasna Setia, 2026, Real-Time Phishing Detection using Lightweight Deep Learning Models, INTERNATIONAL JOURNAL OF ENGINEERING RESEARCH & TECHNOLOGY (IJERT) Volume 15, Issue 03 , March – 2026 * **Open Access** * Article Download / Views: 11 * **Authors :** Siddharth Adhikary, Upasna Setia * **Paper ID :** IJERTV15IS030957 * **Volume & Issue : ** Volume 15, Issue 03 , March – 2026 * **Published (First Online):** 24-03-2026 * **ISSN (Online) :** 2278-0181 * **Publisher Name :** IJERT * **License:** This work is licensed under a Creative Commons Attribution 4.0 International License __ PDF Version View __ Text Only Version #### Real-Time Phishing Detection using Lightweight Deep Learning Models Siddharth Adhikary Department of Computer Scieence and Engineering, Ganga Institute of Technology and Management Kablana, India Upasna Setia Department of Computer Scieence and Engineering, Ganga Institute of Technology and Management Kablana, India Abstract Phishing attacks continue is to be a major cybersecurity concern as attackers tries more and more to exploit digital communication channels to cheat users into revealing sensitive information. Traditional phishing detection techniques, such as blacklist-based systems and rule-based filters, often fail to detect newly created phishing websites or malicious URLs in real time. Recent advances in deep learning have improved detection of accuracy; however, many deep learning models require high computational resource and difficult to deploy in real-time systems. This study is toward the use of lightweight deep learning model for real-time phishing detection. The proposed research focuses on designing efficient neural network architectures capable of identifying phishing URLs with minimum computational. 
By combining terminology URL features with lightweight deep learning techniques, the model achieves effective detection while maintaining fast processing speed. The results shows that lightweight deep learning models are capable of provide a practical solution for real-time phishing detection system deployed on browsers, email gateways, and mobile devices. Keywords Phishing detection, cybersecurity, deep learning, lightweight neural networks, URL classification, real-time security. 1. #### Introduction Phishing is a process of cyberattack in which malicious person attempt to trick users into disclosing their sensitive information such as login credentials, banking details, or personal data. These attacks are carried out through misleading emails, fraudulent websites, and malicious URLs designed to reproduce legitimate services. With the fast growth of online platforms and digital services phishing occurrences had become increasingly sophisticated and widespread. Traditional phishing detection systems mainly rely on blacklist databases and manually defined rules. While these approaches are capable of identifying known malicious websites, but they often fail to detect newly generated phishing domains or modified attack patterns. Attackers frequently register themselves with new domains or slightly modify existing URLs to bypass blacklist systems. As a result, traditional approaches fails to provide effective protection against emerging phishing. Machine learning and deep learning techniques have recently gained important attention in phishing detection research. These techniques allows detection systems to learn patterns from data and identify malicious behaviour more effectively than rule-based methods. 
However, many deep learning models are expensive and require significant processing power, making them difficult to deploy in real- time security systems To address this challenge my paper explores the use of lightweight deep learning models that can be used for real- time phishing detection. Lightweight models are proposed to accomplish higher performance while maintaining low computational complexity, making them appropriate for real- time deployment such as browser extensions, email filtering systems and mobile security tools. 2. #### Background and Related Work Several studies have discovered machine learning methods for phishing detection. Traditional machine learning algorithms such as Support Vector Machines, Decision Trees, and Random Forest classifiers had been widely used for classifying phishing URLs based on features. Deep learning techniques had shown hopeful results in recent years. Convolutional Neural Networks (CNNs) have been applied to examine character-level patterns in URLs, while Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks have been used to process sequential data such as email messages. Most recently transformer based architectures such as BERT have been applied to phishing email detection by capturing appropriate relationships within text data. Although these models provide high detection accuracy, they often require significant computational resources. As a result, recent research focuses on making lightweight deep learning architectures that reduce computational complexity while maintaining strong detection performance. 3. #### Methodology 1. #### System Overview: The planned phishing detection framework emphases in identifying malicious URLs in real time using lightweight deep learning models. The system takes vocabulary features from URLs and processes the same through a lightweight neural network architecture for classification. 
User URL Input URL Feature Extraction Feature Encoding Lightweight Deep Learning Model Phishing / Legitimate Classification 2. #### Dataset: For phishing detection research some publicly available dataset are commonly used that including PhishTank, URLHaus, and Kaggle phishing datasets. This dataset mainly has labeled samples for phishing URLs and legitimate URLs collected from real-world sources. The dataset are used to study contains both phishing and legitimate URLs collected from publicly available sources. Each URL are labeled according their classification. 3. #### Feature Extraction: The planned model focuses on vocabulary URL features that can remove quickly without requiring webpage analysis. These features include: 1. URL length 2. Number of special characters 3. Number of subdomains 4. Presence of suspicious keywords 5. Domain age 6. Presence of HTTPS protocol Vocabulary features are particularly beneficial for real- time phishing detection because they can be extracted instantly. 4. #### Lightweight Deep Learning Architecture: The intentional detection model use lightweight neural network architecture designed to efficient computation. The architecture consists of the following components: Input Layer (Encoded URL Features) Embedding Layer Convolution Layer (Feature Extraction) Pooling Layer (Dimensionality Reduction) Fully Connected Layer Output Layer (Phishing / Legitimate) This architecture lets efficient processing URL feature while maintaining its strong classification performance. 4. #### Experimental Results The performance of the proposed model was assessed using standard classification metrics including: * Accuracy 1. 1. #### Experimental Setup: The experiment evaluates the proposed lightweight deep learning model using a dataset of 2000 URLs containing both phishing in addition to legitimate samples. The dataset was divided into 80% training and 20% testing. 
Standard evaluation metrics including accuracy, precision, recall, and F1-score were used. 2. #### Dataset Distribution: The dataset is balanced, with equal numbers of phishing and legitimate URLs. 3. #### Model Training Performance: The model was trained for 20 epochs; accuracy increases while loss declines during training. 4. Confusion Matrix Analysis: The confusion matrix shows that the majority of phishing and legitimate URLs were correctly classified, with minimal false predictions. Experimental results show that the lightweight deep learning model attains high detection accuracy while maintaining fast prediction times. Compared to traditional deep neural networks, the proposed architecture requires fewer computational resources and provides efficient real-time classification. 5. ROC Analysis: The ROC curve illustrates the strong classification capability of the model. 6. Model Performance Comparison: The proposed lightweight deep learning model achieves approximately 96% accuracy and outperforms traditional machine learning models such as SVM and Random Forest. * Discussion The experimental results show that lightweight deep learning models provide an effective solution for real-time phishing detection. By focusing on lexical URL features and simplified neural network architectures, the model can detect phishing attacks quickly and efficiently. The system is particularly suitable for deployment in environments where computational resources are limited, such as browser extensions, mobile devices, and network gateways. However, phishing attacks continue to evolve rapidly: attackers often change URL structures and domain registration patterns to bypass detection systems. Phishing detection models must therefore be updated continuously with new datasets to maintain their effectiveness. * Future Work Future research can explore several directions to increase the effectiveness of phishing detection systems: 1.
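The reported metrics follow the standard confusion-matrix definitions. A small sketch computing them from raw counts (the counts below are made-up illustrative numbers for a hypothetical balanced 400-URL test split, not the paper's actual results):

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts: 200 phishing + 200 legitimate URLs in the test split.
m = classification_metrics(tp=190, fp=10, fn=10, tn=190)
```

On a balanced test set like this, accuracy alone is informative, but precision and recall remain important because a false positive (blocking a legitimate site) and a false negative (missing a phish) have very different costs.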
Addition of multimodal features such as webpage content and visual screenshots. 2. Development of adaptive learning models capable of handling concept drift. 3. Implementation of explainable AI techniques to improve model transparency. 4. Deployment of lightweight models in browser-based phishing detection systems. * Conclusion Phishing attacks remain a significant cybersecurity threat in the modern digital environment. Traditional detection systems often struggle to detect newly generated phishing attacks in real time. This research demonstrates that lightweight deep learning models are effective at detecting phishing URLs while maintaining low computational overhead. The proposed system combines efficient feature extraction with a simple neural network architecture, enabling fast and accurate phishing detection. Lightweight deep learning approaches therefore represent a promising direction for developing scalable and practical real-time phishing detection systems. ______________

Real-Time Phishing Detection using Lightweight Deep Learning Models

#Volume #15, #Issue #03 #(March #2026)

Transcriptomic profiling confirms microRNA-140 is more functional in joint development than in disease. To investigate the distinct roles of microRNA-140 (miR-140) in skeletal development and osteoarthritis (OA), and to identify novel miR-140–5p targets using advanced transcriptomic profiling.

What is the role of microRNA-140 in joint development and in post-traumatic OA? 🧐

Hao et al. sought to investigate this question with spatial transcriptomics in our #OMICS #Special #Issue in #OAC.

Read more to discover their findings:
www.oarsijournal.com/article/S106...


"His base is turning #Israel 🇮🇱 into an #issue that he doesn't know how to deal with. It is becoming I would say one of the central issues of the #MAGA world. ... It is the issue that could break the Trump thing wide open." — Michael Wolff @michaelwolffnyc.bsky.social

www.youtube.com/shorts/gAvEu...


Folks, Calima here. We're running this down to dIME immediately. Please stand by. #LiveFire #Issue #LivePING #Processing #Standby.


WebGuard AI-Powered Cyber Threat Detector Using BERT And AutoEncoder

#Volume #15, #Issue #03 #(March #2026)

WebGuard AI-Powered Cyber Threat Detector Using BERT And AutoEncoder **DOI :****https://doi.org/10.5281/zenodo.19161307** Download Full-Text PDF Cite this Publication Ravi M, A. Obulesu, A. Sathvik Samuel, K. Pravallika, S. Pavan, L. Anjaneyulu, 2026, WebGuard AI-Powered Cyber Threat Detector Using BERT And AutoEncoder, INTERNATIONAL JOURNAL OF ENGINEERING RESEARCH & TECHNOLOGY (IJERT) Volume 15, Issue 03, March – 2026 * **Open Access** * Article Download / Views: 0 * **Authors :** Ravi M, A. Obulesu, A. Sathvik Samuel, K. Pravallika, S. Pavan, L. Anjaneyulu * **Paper ID :** IJERTV15IS030751 * **Volume & Issue :** Volume 15, Issue 03, March – 2026 * **Published (First Online):** 22-03-2026 * **ISSN (Online) :** 2278-0181 * **Publisher Name :** IJERT * **License:** This work is licensed under a Creative Commons Attribution 4.0 International License #### WebGuard AI-Powered Cyber Threat Detector Using BERT And AutoEncoder Ravi M, A. Obulesu, A. Sathvik Samuel, K. Pravallika, S. Pavan, L. Anjaneyulu — Vidya Jyothi Institute of Technology, Hyderabad, India. Abstract – With the rapid development of web applications, cyber attacks such as phishing and malicious web requests have become more sophisticated. Traditional signature-based detection methods struggle to recognize new and zero-day attacks. In this paper, an AI-based cyber threat detection framework combining AutoEncoder-based anomaly detection with BERT-based semantic phishing detection is introduced.
The AutoEncoder model learns normal network behavior and detects abnormal patterns without requiring labeled data, whereas the BERT model performs deep semantic analysis of URLs to identify phishing attempts. Combining the two methods allows the proposed system to detect both known and previously unseen threats with high robustness. Experimental analysis shows higher accuracy, lower false-positive rates, and better generalization than conventional machine learning techniques. Keywords – Phishing Detection, URL Classification, Hybrid Deep Learning, BERT, AutoEncoder, Structural Anomaly Detection, Semantic Analysis, Transformer Models, Cybersecurity, Ensemble Learning. 1. INTRODUCTION The widespread use of web-based services has escalated cyber attacks such as phishing, malicious URLs, and web-based intrusions. Phishing attackers exploit human weaknesses through misleading URL formats and impersonation strategies, whereas intrusions are usually associated with abnormal traffic patterns and the delivery of malicious payloads. Conventional security tools such as rule-based filters and blacklist systems can no longer counter the changing tactics of contemporary attackers because they depend on predefined signatures and historical data [5], [9]. In the present cybersecurity environment, attackers alter URL structures and use obfuscation techniques to evade traditional detection mechanisms [6], [8]. Moreover, current solutions that rely on a fixed rule set or individual machine learning classifiers typically cannot generalize to zero-day attacks and sophisticated phishing schemes embedded in URLs and web requests [2], [7]. To overcome these drawbacks, this research proposes an AI-based cyber threat detection system built on semantic analysis that incorporates contextual knowledge and structural anomaly detection.
The proposed model includes a BERT-based phishing detector that analyzes semantic and contextual patterns of URLs [2], [4], and an AutoEncoder-based anomaly detection model that learns the structural features of valid URLs and detects anomalous variations [3], [7]. The framework does not set out to replace traditional security infrastructure; instead, it acts as an intelligent decision-support layer that improves early-detection capability and helps network managers identify potential cyber threats more efficiently. 2. LITERATURE SURVEY The increasing complexity of phishing attacks and rogue URLs has spurred research into intelligent machine learning and deep learning-based detection systems. Blacklist and rule-based technologies are not effective against zero-day attacks and automatically generated domain names. Recent research focuses on hybrid architectures, representation learning, and NLP-based approaches to construct scalable and robust phishing detection systems that can learn complicated URL patterns. In Enhancing Phishing Detection: A New Hybrid Deep Learning Model for Cybercrime Forensics, Alsubaei et al. propose an ensemble framework consisting of ResNeXt, GRU, and AutoEncoders for feature extraction and classification [1]. Their work shows that AutoEncoders are efficient at learning latent URL representations and eliminating noise, and that they exhibit high detection rates even when trained on imbalanced datasets. Nonetheless, the system is concerned mostly with structural feature learning and does not integrate transformer-based semantic modeling of URLs [1]. Similarly, in Across the Spectrum: In-Depth Review of AI-Based Models for Phishing Detection, Ahmad et al. evaluate more than 130 AI-based phishing detection studies [2].
They note an increasing dependence on ensemble learning, anomaly detectors, and deep neural networks, and uncover the continued difficulties of overfitting, poor generalization to newly registered domains, and weak context modeling in URL-only classifiers. They suggest that transformer-based language models should be combined with unsupervised learning methods to become more resilient to changing phishing tactics [2]. In A Survey of Intelligent Detection Designs of HTML URL Phishing Attacks, Asiri et al. divide phishing detection systems into URL-based, content-based, and hybrid detection systems [3]. They emphasize that despite the popularity of deep learning models such as CNNs and LSTMs, most systems do not exploit the semantic information in URL tokens, and transformer-based encoders are used infrequently [3]. The authors also consider AutoEncoders promising for unsupervised feature learning and zero-day detection, but note that they are rarely integrated with advanced NLP architectures [3]. In Phishing Detection System Through Hybrid Machine Learning Based on URL, Karim et al. introduce a voting-based ensemble that combines Logistic Regression, Support Vector Machines, and Decision Trees [4]. They demonstrate that their results are stronger and more predictive than those of single classifiers. Although useful for structured lexical attributes, the paper does not address deep contextual representations or unsupervised anomaly detection for unseen or zero-day URLs [4]. 3. EXISTING MODELS Traditional phishing detection has been based on blacklist methods, heuristic methods, and classical machine learning. Blacklist systems maintain lists of known malicious websites and block access to them, but they cannot identify zero-day or newly created attacks because they rely entirely on historical information [5], [9].
Recent survey research also highlights that blacklist-only systems cannot effectively deal with changing phishing threats at scale [1], [4]. Heuristic and rule-based methods try to identify phishing URLs from predefined lexical attributes: abnormal URL length, the presence of many special characters, the inclusion of an IP address, and the occurrence of suspicious keywords such as "login" or "verify". These methods are computationally cheap and simple to apply, but they are inflexible and generate a high number of false positives because of their strict rule definitions. Such static detection is easy for an attacker to circumvent via URL obfuscation and structural manipulation [6], [8]. Comparative studies likewise indicate that URL-only rule-based systems are ineffective against sophisticated phishing techniques [3], [4]. Conventional machine learning algorithms such as Logistic Regression, Support Vector Machines (SVM), Random Forests, and Naive Bayes classifiers have been used extensively for phishing detection. These models depend on handcrafted features derived from URLs or webpage content and perform supervised classification [7], [10]. They are more accurate than heuristic systems, but their effectiveness relies heavily on the quality of the feature engineering, and they may fail to capture deeper contextual relationships within textual data. Hybrid ensemble models that combine several classifiers have been shown to perform better, although they still rely on manual rather than automated feature extraction [3]. More recently, deep learning-based models such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and hybrid deep architectures for phishing detection have been introduced.
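A rule-based check of the kind described above can be sketched in a few lines; the thresholds, the keyword list, and the function name are illustrative assumptions, chosen only to show why such rules are both cheap and brittle:

```python
import re

def heuristic_flags(url: str) -> list:
    """Return the names of the heuristic rules a URL trips; any hit marks it suspicious."""
    flags = []
    if len(url) > 75:                                   # unusually long URL
        flags.append("long_url")
    if re.search(r"//\d{1,3}(\.\d{1,3}){3}", url):      # raw IP address instead of a domain
        flags.append("ip_host")
    if sum(not c.isalnum() for c in url) > 15:          # many special characters
        flags.append("many_special_chars")
    if any(k in url.lower() for k in ("login", "verify")):
        flags.append("suspicious_keyword")
    return flags

flags = heuristic_flags("http://192.168.10.5/verify-account")
```

Tightening any single threshold lowers false negatives but raises false positives, which is exactly the rigidity the survey literature criticizes in static rule systems.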
These models automatically learn feature representations from raw input, which improves detection [7]. For example, hybrid deep learning systems that combine AutoEncoders with optimization algorithms have achieved high accuracy, but at the cost of increased computational complexity [2]. Despite this progress, most deep learning systems are either semantics-driven or structure-driven, which limits their resistance to advanced phishing attacks [1], [4]. Transformer models such as BERT have also enhanced contextual understanding of text through self-attention, which improves semantic interpretation of URLs. However, standalone anomaly detection systems can produce more false positives, while purely semantic models can miss structural anomalies. Recent studies therefore suggest the need for hybrid models that combine semantic intelligence with structural anomaly detection to obtain balanced, scalable, and robust phishing detection systems [2], [3], [6]. 4. PROPOSED METHODOLOGY This section presents the proposed framework for detecting phishing URLs using a hybrid deep learning architecture that integrates transformer-based semantic modeling with unsupervised anomaly detection. The system combines Bidirectional Encoder Representations from Transformers (BERT) for contextual URL embedding with an AutoEncoder network for feature compression and novelty detection, followed by a supervised classification layer. The complete pipeline is implemented as a web-based service using Flask to enable real-time inference. 1. Dataset Preparation To cover legitimate and malicious URL patterns comprehensively, two data sources were used in this study. Legitimate URLs were gathered from the Tranco top-domains list, a list of popular and reputable sites that imitates normal web traffic patterns [5].
Publicly available phishing repositories and Kaggle datasets of labeled malicious links were used as sources of phishing URLs [3], [7]. The combined dataset was balanced to minimize classification bias and train the model fairly. For supervised semantic classification with BERT, URLs received binary labels: 0 for legitimate and 1 for phishing. Conversely, the AutoEncoder was trained only on legitimate URLs so as to learn the normal structural features of benign web behavior. This design makes the system better able to identify zero-day phishing attacks by detecting significant structural deviations from known benign data. 2. Data Preprocessing All URLs in the dataset were passed through the semantic classification phase of a fine-tuned BERT model (bert-base-uncased) to learn contextual phishing patterns [1], [4]. Each tokenized URL was fed to the transformer architecture and the [CLS] token embedding was taken as a 768-dimensional semantic feature of the full URL [1]. This embedding allows the model to detect deceptive language patterns, brand impersonation, and lexical-level manipulations that are often used in phishing attacks [4]. The contextual representation was then fed through a fully connected classification layer and a sigmoid activation function to produce a phishing score. This probability of the URL being malicious is the semantic element of the hybrid detection system. 3. BERT Semantic Classification A pre-trained BERT model (bert-base-uncased) was fine-tuned for binary classification to distinguish between legitimate and phishing URLs [1], [4]. The 768-dimensional [CLS] token embedding of the input sequence served as the semantic feature vector on which the input was classified [1].
This representation allows the model to learn complicated contextual relationships in the URL text, such as brand-impersonation patterns, phishing-related keywords, obfuscated textual patterns, and deceptive lexical signatures frequently employed in malicious links [4]. Through the self-attention scheme of transformer-based architectures, the BERT branch learns finer-grained semantic features that a traditional feature-based model may miss [1]. The output of this branch is a probabilistic score, produced via a sigmoid activation function, giving the probability that the URL is phishing. Fig. 1 BERT Architecture 4. AutoEncoder Based Anomaly Detection An AutoEncoder was developed to learn the structural traits of URLs from 24-dimensional feature vectors at the input [2], [7]. The encoder reduces these features to an 8-dimensional latent representation (24 → 16 → 8) that reflects the key structural patterns of valid URLs. The decoder is made up of symmetric layers (8 → 16 → 24) with ReLU and sigmoid activation functions to rebuild the original feature vector. During inference, Mean Squared Error (MSE) is used to calculate the reconstruction error between the original and reconstructed features [2]. Anomalous URLs are identified by a threshold set at the 95th percentile of reconstruction error on valid training data; any value above the threshold is treated as a possible phishing attack [7]. Fig. 2 AutoEncoder Architecture 5. Hybrid Fusion Mechanism To boost detection strength, the semantic and structural representations were combined into a unified decision framework. The 768-dimensional BERT embedding was concatenated with the 8-dimensional latent representation produced by the AutoEncoder to create a hybrid feature representation. This fused vector was then passed through a fully connected neural network with architecture Linear(768+8 → 64 → 1).
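The anomaly decision itself is simple once the AutoEncoder is trained: compute the per-URL MSE between input and reconstruction, set the threshold at the 95th percentile of errors on legitimate training data, and flag anything above it. A pure-Python sketch of that decision logic, using random stand-in vectors in place of a trained encoder/decoder (the noise scale and vector counts are illustrative assumptions):

```python
import random

def mse(x, x_hat):
    """Mean squared reconstruction error for one feature vector."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

random.seed(0)
# Stand-in data: 100 "legitimate" 24-dim feature vectors and slightly noisy
# reconstructions, mimicking a well-trained AutoEncoder on benign URLs.
train = [[random.gauss(0, 1) for _ in range(24)] for _ in range(100)]
train_hat = [[v + random.gauss(0, 0.1) for v in row] for row in train]

# Threshold = 95th percentile of reconstruction error on legitimate data.
errors = sorted(mse(x, xh) for x, xh in zip(train, train_hat))
threshold = errors[int(0.95 * len(errors))]

def is_anomalous(x, x_hat):
    return mse(x, x_hat) > threshold
```

A structurally odd URL reconstructs poorly, so its error lands far above the benign 95th percentile, while a typical legitimate URL stays below it; the percentile choice directly trades recall against false positives.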
A sigmoid activation was used to generate the final phishing probability score. This hybrid design allows the system to identify semantic anomalies and known phishing patterns as well as zero-day structural anomalies. Fig. 3 System Overview 6. Training Strategy Binary cross-entropy (BCE) loss was used to train the hybrid model, optimizing phishing classification by reducing the difference between the predicted probabilities and the true labels. Loss = BCE(y, ŷ) The Adam optimizer was used for optimization; it offers adaptive learning-rate adjustments for stable and efficient convergence. Appropriate learning-rate scheduling improved generalization and prevented overfitting. The AutoEncoder was trained separately to minimize the reconstruction error between the input structural features and their reconstructions using Mean Squared Error (MSE) loss. This independent training approach provides proper anomaly recognition alongside good classification in the hybrid framework. Loss = MSE(x, x̂) 7. Evaluation Metrics The performance of the proposed model was evaluated using standard classification metrics including Accuracy, Precision, Recall, and F1-score. Accuracy measures the overall correctness of the model's predictions across both classes. Precision evaluates how many of the URLs predicted as phishing were actually malicious, thereby reflecting false-positive control. Recall measures the model's ability to correctly identify actual phishing URLs, indicating its sensitivity. Additionally, a confusion matrix was used to provide a detailed analysis of true positives, true negatives, false positives, and false negatives, ensuring a balanced evaluation of model performance. 5. RESULTS AND DISCUSSION 1.
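The fusion step described above concatenates the 768-dimensional BERT embedding with the 8-dimensional latent vector and passes the result through the small Linear(768+8 → 64 → 1) network ending in a sigmoid. Below is a shape-only sketch of that forward pass with random, untrained stand-in weights (pure Python; helper names and the weight scale are illustrative, not the authors' trained model):

```python
import math
import random

random.seed(1)

def linear(x, w, b):
    """y = Wx + b, with w as a list of rows (one row per output unit)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def relu(v):
    return [max(0.0, x) for x in v]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def rand_layer(n_out, n_in):
    """Random stand-in weights; a real system would load trained parameters."""
    return ([[random.gauss(0, 0.05) for _ in range(n_in)] for _ in range(n_out)],
            [0.0] * n_out)

w1, b1 = rand_layer(64, 768 + 8)   # fusion layer: 776 -> 64
w2, b2 = rand_layer(1, 64)         # output layer: 64 -> 1

def phishing_score(bert_emb, latent):
    fused = list(bert_emb) + list(latent)      # concatenate 768 + 8 dims
    hidden = relu(linear(fused, w1, b1))
    return sigmoid(linear(hidden, w2, b2)[0])  # probability of phishing

score = phishing_score([0.01] * 768, [0.1] * 8)
```

The sigmoid guarantees a score in (0, 1), which is what lets the BCE loss above treat the fused output directly as a phishing probability.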
Quantitative Results The proposed hybrid phishing detection system was evaluated on a balanced dataset consisting of legitimate URLs from the Tranco list and phishing URLs collected from publicly available repositories. The BERT semantic classifier achieved high validation accuracy (approximately 97%), demonstrating strong contextual understanding of phishing patterns. The AutoEncoder-based anomaly detection model effectively identified structural irregularities using reconstruction error with a 95th-percentile threshold. The hybrid ensemble model, which combines semantic and structural representations, achieved improved overall accuracy, precision, recall, and F1-score compared to the individual components. Confusion-matrix analysis indicated a high true-positive rate for phishing detection and a low false-positive rate for legitimate URLs, confirming balanced model performance. Fig. 4 Performance Metrics 2. Prediction Output and Classification Behavior The output of the system is a probabilistic phishing score generated through a sigmoid activation function. URLs predicted as legitimate produced probability values close to zero, while phishing URLs produced probability values close to one, indicating strong model confidence. The AutoEncoder component generated reconstruction-error scores: legitimate URLs showed minimal deviation from the learned structural patterns, while malicious URLs exhibited significantly higher anomaly scores. The hybrid fusion mechanism successfully integrated both outputs to produce stable and reliable final classifications. Real-time testing through the deployed web interface demonstrated consistent and interpretable predictions, with a clear distinction between legitimate and phishing URLs. Fig. 5 Detected URL as Legit Fig. 6 Detected URL as Phishing 3.
Model Complexity and Analysis The pre-trained BERT-base model used in the semantic branch has about 110 million parameters and relies on transformer-based self-attention mechanisms. Although BERT has strong contextual learning capability, it has higher computational needs during training and inference. In comparison, the AutoEncoder model is minimalistic: it comprises fully connected layers with far fewer parameters and minimal computation cost. The hybrid model adds another stage in which the 768-dimensional BERT embedding is concatenated with the 8-dimensional latent space and passed through a small fusion network. Despite the addition of BERT, inference time remains practical for real-time use on standard hardware. In general, the computational trade-off is covered by the enhanced detection robustness achieved through hybrid integration. 4. Discussion The findings of this research indicate that combining semantic intelligence with structural anomaly detection in a hybrid deep learning model can contribute greatly to phishing URL detection. The BERT-based semantic model was very effective at detecting contextual patterns of deception, including brand impersonation, suspicious keywords, and hidden lexical manipulations embedded in URLs. Its transformer-based architecture enabled the system to capture complex contextual relationships that traditional machine learning models tend to miss. The high validation accuracy of the semantic model shows that contextual knowledge is an important element in contemporary phishing detection. Nonetheless, semantic analysis alone may not identify structurally abnormal URLs that look lexically innocuous but diverge from valid URL patterns. The AutoEncoder-based anomaly detection component overcame this drawback by learning the structural properties of legitimate URLs.
By compressing URL features into a latent representation and reconstructing them, the model could use reconstruction error as an abnormality measure. This method was especially applicable in detecting zero-day phishing attacks and newly generated malicious URLs not directly observed during supervised training. However, structural anomaly detection on its own can sometimes flag an unusual but legitimate URL as suspicious, in particular when a legitimate URL has an uncommon format. Combining the two models into a hybrid ensemble gave a well-balanced and powerful detection system. By joining the semantic embedding produced by BERT with the latent structural representation produced by the AutoEncoder, the system combined the strengths of both. The hybrid model reduced false positives compared to standalone anomaly detection and increased detection reliability compared to standalone semantic classification. This illustrates that a twofold-perspective analysis, in which contextual meaning and structural integrity are considered jointly, is useful in phishing detection. Computationally, the BERT model adds complexity to the overall system because of its transformer structure and large parameter count, whereas the AutoEncoder remains light and computationally efficient. Although training the hybrid model requires moderate computing capacity, the inference time is practical for real-time web-based deployment. This makes the proposed system applicable to academic research, small-business deployment, and cybersecurity use cases where detection accuracy must be balanced against available hardware resources. A key finding of this research is the importance of dataset diversity. Training the AutoEncoder only on legitimate URLs enabled the model to form a clear picture of what counts as normal structural behavior.
Likewise, balancing legitimate and phishing samples during supervised training enhanced the fairness of classification and decreased bias. A small or homogeneous dataset could limit generalization and increase the risk of misclassification; a wide and representative dataset is therefore critical for a consistent and scalable phishing detection system. In conclusion, the findings demonstrate the relevance of combining deep learning structures to deal with the changing features of phishing attacks. While semantic models offer contextual intelligence and structural anomaly detectors catch structural deviations, a combination of the two offers a robust defense mechanism. Directions for future research include lightweight transformers, adversarial training, and richer hybrid architectures that use additional contextual information, such as webpage text or domain reputation. Hybrid AI-based methods of detecting phishing attacks are a promising route to resilient and smart cybersecurity tools as phishing attacks keep advancing and getting more complex. 6. CONCLUSION This paper described an AI-based phishing detector that combines two deep learning models to detect suspicious URLs: a hybrid framework with a BERT-based semantic classifier and an AutoEncoder-based structural anomaly detector. The findings showed that the BERT model can learn contextual phishing features including brand impersonation, suspicious keyword use, and misleading lexical structures in URLs. Experimental analysis established that the hybrid system achieved high classification accuracy with balanced precision and recall, a good indication that transformer-based semantic understanding boosts phishing detection capability.
With systematic preprocessing and stable neural network training procedures, the system consistently gave reliable predictions across a wide range of legitimate and phishing URLs. Efficiently trained lightweight structural models such as AutoEncoders can offer low-cost anomaly detection by learning the normal properties of legitimate URL structure; however, when operated alone they can produce false positives for uncommon yet harmless URL structures. More sophisticated semantic models such as BERT have stronger contextual knowledge and detection capability, but they need more computational resources because of their transformer architecture. The hybrid combination of the semantic and structural components offered superior generalization and stability in detection compared to the independent models, though the introduction of BERT raises the computational complexity of training and deployment. On the whole, the system devised in this paper demonstrates a viable and efficient method of automated phishing detection, allowing real-time URL classification through a deployable web-based interface. The hybrid deep learning model lowers the need for manual rule engineering and increases flexibility against changing phishing schemes. To sum up, this study highlights the increased importance of hybrid artificial intelligence systems in cybersecurity, showing how contextual semantic analysis and structural anomaly detection, used together, can offer robust, scalable, and intelligent protection against contemporary phishing attacks. Fig. 7 Web interface of the URL phishing detection system 7. REFERENCES 1. S. Ahmad et al., Across the Spectrum: In-Depth Review of AI-Based Models for Phishing Detection, IEEE Access, 2025. 2. F. S. Alsubaei et al., Enhancing Phishing Detection: A Novel Hybrid Deep Learning Framework, IEEE Access, 2024. 3. A. Karim and M.
Shahroz, Phishing Detection System Through Hybrid Machine Learning Based on URL, 2023. 4. S. Asiri, Y. Xiao, and T. Li, A Survey of Intelligent Detection Designs of HTML URL Phishing Attacks, IEEE Access, 2023. 5. J. Kline, E. Oakes, and P. Barford, A URL-based analysis of WWW structure and dynamics, in Proc. Netw. Traffic Meas. Anal. Conf. (TMA), Jun. 2019, p. 800. 6. A. K. Murthy and Suresha, XML URL classification based on their semantic structure orientation for web mining applications, Procedia Comput. Sci., vol. 46, pp. 143150, Jan. 2015. 7. A. A. Ubing, S. Kamilia, A. Abdullah, N. Jhanjhi, and M. Supramaniam, Phishing website detection: An improved accuracy through feature selection and ensemble learning, Int. J. Adv. Comput. Sci. Appl., vol. 10, no. 1, pp. 252257, 2019. 8. A. Aggarwal, A. Rajadesingan, and P. Kumaraguru, PhishAri: Automatic realtime phishing detection on Twitter, in Proc. eCrime Res. Summit, Oct. 2012, pp. 112. 9. S. N. Foley, D. Gollmann, and E. Snekkenes, Computer Security ESORICS 2017, vol. 10492. Oslo, Norway: Springer, Sep. 2017. 10. P. George and P. Vinod, Composite email features for spam identification, in Cyber Security. Singapore: Springer, 2018. ______________

WebGuard AI-Powered Cyber Threat Detector Using BERT And AutoEncoder **DOI :** **https://doi.org/10.5281/zenodo.19161307** Download Full-Text PDF Cite this Publication Ravi M, A. Obulesu, A. Sathvik Samuel, K. Pravallika, S. Pavan, L. Anjaneyulu, 2026, WebGuard AI-Powered Cyber Threat Detector Using BERT And AutoEncoder, INTERNATIONAL JOURNAL OF ENGINEERING RESEARCH & TECHNOLOGY (IJERT) Volume 15, Issue 03, March – 2026 * **Open Access** * Article Download / Views: 0 * **Authors :** Ravi M, A. Obulesu, A. Sathvik Samuel, K. Pravallika, S. Pavan, L. Anjaneyulu * **Paper ID :** IJERTV15IS030751 * **Volume & Issue :** Volume 15, Issue 03, March – 2026 * **Published (First Online):** 22-03-2026 * **ISSN (Online) :** 2278-0181 * **Publisher Name :** IJERT * **License:** This work is licensed under a Creative Commons Attribution 4.0 International License #### WebGuard AI-Powered Cyber Threat Detector Using BERT And AutoEncoder Ravi M, Vidya Jyothi Institute of Technology, Hyderabad, India; A. Obulesu, Vidya Jyothi Institute of Technology, Hyderabad, India; A. Sathvik Samuel, Vidya Jyothi Institute of Technology, Hyderabad, India; K. Pravallika, Vidya Jyothi Institute of Technology, Hyderabad, India; S. Pavan, Vidya Jyothi Institute of Technology, Hyderabad, India; L. Anjaneyulu, Vidya Jyothi Institute of Technology, Hyderabad, India Abstract – With the rapid development of web applications, cyber attacks such as phishing and malicious web requests have become more sophisticated. Older signature-based detection methods struggle to recognize new and zero-day attacks. This paper introduces an AI-based cyber threat detection framework that combines AutoEncoder-based anomaly detection with BERT-based semantic phishing detection.
The AutoEncoder model learns normal network behavior and flags abnormal patterns without requiring labeled data, while the BERT model performs deep semantic analysis of URLs to identify phishing attempts. Combining the two methods enables the proposed system to detect both known and previously unseen threats with a high degree of robustness. Experimental analysis shows higher accuracy, fewer false positives, and better generalization than conventional machine learning techniques. Keywords – Phishing Detection, URL Classification, Hybrid Deep Learning, BERT, AutoEncoder, Structural Anomaly Detection, Semantic Analysis, Transformer Models, Cybersecurity, Ensemble Learning. 1. INTRODUCTION The widespread use of web-based services is a major factor behind the escalation of cyber-attacks such as phishing, malicious URLs, and web-based intrusions. Phishing attackers exploit human weaknesses through misleading URL formats and impersonation strategies, whereas intrusions are usually associated with abnormal traffic patterns and the delivery of malicious payloads. Conventional security tools such as rule-based filters and blacklist systems can no longer counter the changing tactics of a contemporary attacker because they rely on predefined signatures and historical data [5], [9]. In the present cybersecurity environment, attackers alter URL constructs and use obfuscation techniques to evade traditional detection mechanisms [6], [8]. Moreover, current solutions that rely on a fixed rule set or a single machine learning classifier typically cannot generalize to zero-day attacks and sophisticated phishing schemes embedded in URLs and web requests [2], [7]. To overcome these drawbacks, this research proposes an AI-based cyber threat detection system that combines contextual semantic analysis with structural anomaly detection.
The proposed framework includes a BERT-based phishing detection model that analyzes the semantic and contextual patterns of URLs [2], [4], and an AutoEncoder-based anomaly detection model that learns the structural features of valid URLs and flags anomalous variations [3], [7]. The framework does not set out to replace traditional security infrastructure; instead, it acts as an intelligent decision-support layer that improves early-detection capabilities and helps network managers identify potential cyber threats more efficiently. 2. LITERATURE SURVEY The increasing complexity of phishing attacks and rogue URLs has spurred research on intelligent machine learning and deep learning-based detection systems. Blacklist and rule-based technologies are not effective against zero-day attacks and automatically generated domain names. Recent research focuses on hybrid architectures, representation learning, and NLP-based approaches to construct scalable and robust phishing detection systems that can learn complicated URL patterns. In Enhancing Phishing Detection: A New Hybrid Deep Learning Model to Cybercrime Forensics, Alsubaei et al. propose an ensemble framework consisting of ResNeXt, GRU, and AutoEncoders for feature extraction and classification [1]. Their work shows that AutoEncoders are efficient at learning latent URL representations and eliminating noise, and that they exhibit high detection rates even when trained on imbalanced datasets. Nonetheless, the system is concerned mostly with structural feature learning and does not integrate a transformer-based semantic model of URLs [1]. Similarly, in Across the Spectrum: In-Depth Review of AI-Based Models for Phishing Detection, Ahmad et al. evaluate more than 130 AI-based phishing detection studies [2].
They note an increasing reliance on ensemble learning, anomaly detectors, and deep neural networks, and they uncover persistent difficulties with overfitting, low generalization to newly registered domains, and weak context modeling in URL-only classifiers. They suggest that transformer-based language models should be combined with unsupervised learning methods to be more resilient to changing phishing tactics [2]. In A Survey of Intelligent Detection Designs of HTML URL Phishing Attacks, Asiri et al. divide phishing detection systems into URL-based, content-based, and hybrid designs [3]. They emphasize that despite the popularity of deep learning models such as CNNs and LSTMs, most systems do not exploit the semantic information of URL tokens and make infrequent use of transformer-based encoders [3]. The authors also consider AutoEncoders promising for unsupervised feature learning and zero-day detection, but note that integrations with advanced NLP architectures remain limited [3]. In Phishing Detection System Through Hybrid Machine Learning Based on URL, Karim et al. introduce a voting-based ensemble that combines Logistic Regression, Support Vector Machines, and Decision Trees [4]. They demonstrate that its results are more stable and predictive than those of single classifiers. Although useful for structured lexical attributes, the paper does not address deep contextual representations or unsupervised anomaly detection methods for detecting unseen or zero-day URLs [4]. 3. EXISTING MODELS Traditional methods of phishing detection have been based on blacklist approaches, heuristic methods, and classical machine learning. Blacklist systems maintain lists of already known malicious websites and prevent access to them, but they cannot identify zero-day or recently created attacks because they rely wholly on past information [5], [9].
Recent survey research also highlights that blacklist-only systems cannot effectively deal with changing phishing threats at scale [1], [4]. Heuristic and rule-based methods try to identify phishing URLs from predefined lexical attributes, such as abnormal URL length, the presence of many special characters, the inclusion of an IP address, and the occurrence of suspicious keywords like login or verify. These methods are computationally cheap and simple to apply, but they are inflexible and generate a high number of false positives because of their strict rule definitions. Such static detection is easy for an attacker to circumvent via URL obfuscation and structural manipulation [6], [8]. Comparative studies likewise indicate that URL-only rule-based systems are ineffective against sophisticated phishing techniques [3], [4]. Conventional machine learning algorithms such as Logistic Regression, Support Vector Machines (SVM), Random Forests, and Naive Bayes classifiers have been used extensively in phishing detection. These models depend on handcrafted features derived from URLs or webpage content and perform supervised classification [7], [10]. They are more accurate than heuristic systems, but their effectiveness relies heavily on the quality of the feature engineering, and they may fail to capture deeper contextual relationships within textual data. Hybrid ensemble models that combine more than one classifier have been shown to perform better, although they still rely on manual rather than automated feature extraction [3]. More recently, deep learning-based models such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and hybrid deep architectures for detecting phishing have been introduced.
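The lexical heuristics described above can be illustrated with a short sketch. The thresholds, keyword list, and function name below are illustrative choices of ours, not rules taken from any of the surveyed systems:

```python
import re
from urllib.parse import urlparse

# Illustrative keyword list; real heuristic systems use larger curated sets.
SUSPICIOUS_KEYWORDS = {"login", "verify", "secure", "account", "update"}

def heuristic_flags(url: str) -> dict:
    """Simple rule-based phishing indicators over a URL's lexical form."""
    host = urlparse(url if "://" in url else "http://" + url).netloc
    return {
        "long_url": len(url) > 75,                                  # abnormal length
        "many_specials": len(re.findall(r"[@\-_%?=&]", url)) > 5,   # special characters
        "ip_host": bool(re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", host)),  # raw IP host
        "keyword_hit": any(k in url.lower() for k in SUSPICIOUS_KEYWORDS),
    }

print(heuristic_flags("http://192.168.0.1/secure-login?verify=1"))
```

Rules like these are cheap to evaluate but, as the survey notes, brittle: an attacker who avoids the listed keywords or registers a benign-looking domain bypasses every check.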
These models learn feature representations automatically from raw input and can therefore detect attacks more effectively [7]. For example, hybrid deep learning systems that combine AutoEncoders with optimization algorithms have shown high accuracy, but at the cost of increased computational complexity [2]. Despite this progress, most deep learning systems are either purely semantics-driven or purely structure-driven, which restricts their resistance to advanced phishing attacks [1], [4]. Transformer models such as BERT have also enhanced the contextual understanding of text through self-attention, which improves the semantic interpretation of URLs. However, standalone anomaly detection systems can produce more false positives, while purely semantic models can overlook structural anomalies. Thus, recent studies point to the necessity of hybrid models that combine semantic intelligence with structural anomaly detection to obtain balanced, scalable, and robust phishing detection systems [2], [3], [6]. 4. PROPOSED METHODOLOGY This section presents the proposed framework for detecting phishing URLs using a hybrid deep learning architecture that integrates transformer-based semantic modeling with unsupervised anomaly detection. The system combines Bidirectional Encoder Representations from Transformers (BERT) for contextual URL embedding with an AutoEncoder network for feature compression and novelty detection, followed by a supervised classification layer. The complete pipeline is implemented as a web-based service using Flask to enable real-time inference. 1. Dataset Preparation To cover legitimate and malicious URL patterns comprehensively, two datasets were used in this study. Valid URLs were gathered from the Tranco Top Domains list, a list of popular and reputable sites, thus imitating normal web traffic patterns [5].
Publicly available phishing repositories and Kaggle datasets of labeled malicious links were used as sources of phishing URLs [3], [7]. The combined dataset was balanced to minimize classification bias and train the model fairly. For supervised semantic classification with BERT, URLs received binary labels: 0 for legitimate and 1 for phishing. Conversely, the AutoEncoder was trained only on valid URLs to establish the normal structural features of benign web behavior. This design makes the system better able to identify zero-day phishing attacks by flagging significant structural deviations from known benign data. 2. Data Preprocessing All URLs in the dataset were passed through the semantic classification phase of a fine-tuned BERT model (bert-base-uncased) to learn contextual phishing patterns [1], [4]. Each tokenized URL was fed to the transformer architecture, and the [CLS] token embedding was extracted as a 768-dimensional semantic feature of the full URL [1]. This embedding allows the model to detect patterns of deceptive language, brand impersonation, and lexical-level manipulations that are often applied in phishing attacks [4]. The contextual representation was subsequently fed through a fully connected classification layer and a sigmoid activation function to produce a phishing score. This probability of the URL being malicious is the semantic element of the hybrid detection system. 3. BERT Semantic Classification A pre-trained BERT model (bert-base-uncased) was fine-tuned for binary classification to distinguish between legitimate and phishing URLs [1], [4]. The 768-dimensional [CLS] token embedding of the input sequence served as the semantic feature vector on which the input was classified [1].
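The semantic branch described above reduces to a fully connected layer plus sigmoid on top of the 768-dimensional [CLS] embedding. A minimal PyTorch sketch of that head follows; a random tensor stands in for the output of the fine-tuned bert-base-uncased model, since loading the full transformer is out of scope here:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for the [CLS] output of the fine-tuned bert-base-uncased model:
# in the real pipeline this is one 768-d vector per tokenized URL.
cls_embeddings = torch.randn(4, 768)

# Classification head: fully connected layer + sigmoid, as described above.
head = nn.Sequential(nn.Linear(768, 1), nn.Sigmoid())
phishing_probs = head(cls_embeddings).squeeze(-1)  # one score in [0, 1] per URL

print(phishing_probs.shape)
```

During fine-tuning this head is trained jointly with the transformer so that scores near 1 correspond to phishing URLs and scores near 0 to legitimate ones.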
This representation allows the model to learn complicated contextual relationships in URL text, such as brand impersonation patterns, phishing-related keywords, obfuscated textual patterns, and deceptive lexical signatures that are frequently employed in malicious links [4]. Through the self-attention scheme of transformer-based architectures, the BERT branch learns finer-grained semantic features that a traditional feature-based model may miss [1]. The output of this branch is a probabilistic score, produced via a sigmoid activation function, representing the probability that the URL is phishing. Fig. 1 BERT Architecture 4. AutoEncoder-Based Anomaly Detection An AutoEncoder was developed to learn the structural traits of URLs from 24-dimensional feature vectors at the input [2], [7]. The encoder reduces these features to 8-dimensional latent representations (24 → 16 → 8) that capture the key structural patterns of valid URLs. The decoder consists of symmetric layers (8 → 16 → 24) with ReLU and Sigmoid activation functions to rebuild the original feature vector. During inference, Mean Squared Error (MSE) is used to calculate the reconstruction error between the original and reconstructed features [2]. Anomalous URLs are identified by a threshold set at the 95th percentile of the reconstruction error on valid training data; any value above the threshold is treated as a possible phishing attack [7]. Fig. 2 AutoEncoder Architecture 5. Hybrid Fusion Mechanism To boost detection strength, the semantic and structural representations were combined into a unified decision framework. The 768-dimensional BERT embedding was concatenated with the 8-dimensional latent representation created by the AutoEncoder to form a hybrid feature vector. This fused vector was then passed through a fully connected neural network with architecture Linear(768+8 → 64 → 1).
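The structural branch and fusion head can be sketched under the dimensions given above (24 → 16 → 8 → 16 → 24 AutoEncoder, 95th-percentile threshold, Linear(776 → 64 → 1) fusion). Feature extraction is omitted and random data stands in for the 24-dimensional URL features; the placement of activations inside the fusion head is our own assumption, since the paper only names the layer sizes:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class UrlAutoEncoder(nn.Module):
    """Symmetric AutoEncoder over 24-d structural URL features."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(24, 16), nn.ReLU(),
                                     nn.Linear(16, 8), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(8, 16), nn.ReLU(),
                                     nn.Linear(16, 24), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

ae = UrlAutoEncoder()
legit = torch.rand(100, 24)                        # stand-in feature vectors

# One MSE training step on legitimate URLs only, per the training strategy.
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
loss = nn.functional.mse_loss(ae(legit), legit)
opt.zero_grad(); loss.backward(); opt.step()

errors = ((ae(legit) - legit) ** 2).mean(dim=1)    # per-URL reconstruction MSE
threshold = torch.quantile(errors, 0.95)           # 95th-percentile cut-off
anomalous = errors > threshold                     # flags roughly the top 5%

# Hybrid fusion head: concat the 768-d BERT embedding with the 8-d latent code.
fusion = nn.Sequential(nn.Linear(768 + 8, 64), nn.ReLU(),
                       nn.Linear(64, 1), nn.Sigmoid())
```

In the full system the fusion head, not the raw threshold, produces the final decision; the threshold remains useful as a standalone zero-day signal.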
A sigmoid activation was used to generate the final phishing probability score. This hybrid design allows the system to identify semantic anomalies and known phishing patterns while also detecting zero-day structural anomalies. Fig. 3 System Overview 6. Training Strategy The hybrid model was trained with binary cross-entropy (BCE) loss to optimize phishing classification by reducing the difference between predicted probabilities and true labels: Loss = BCE(y, ŷ). The Adam optimizer was used, providing adaptive learning-rate adjustments for stable and efficient convergence. Appropriate learning-rate scheduling supported generalization and prevented overfitting. The AutoEncoder was trained separately to minimize the reconstruction error between input structural features and their reconstructions using Mean Squared Error (MSE) loss: Loss = MSE(x, x̂). This independent training approach provides proper anomaly recognition and good classification in the hybrid framework. 7. Evaluation Metrics The performance of the proposed model was evaluated using standard classification metrics, including Accuracy, Precision, Recall, and F1-score. Accuracy measures the overall correctness of the model's predictions across both classes. Precision evaluates how many of the URLs predicted as phishing were actually malicious, thereby reflecting false-positive control. Recall measures the model's ability to correctly identify actual phishing URLs, indicating its sensitivity. Additionally, a Confusion Matrix was used to provide a detailed analysis of true positives, true negatives, false positives, and false negatives, ensuring a balanced evaluation of model performance. 5. RESULTS AND DISCUSSION 1. Quantitative Results The proposed hybrid phishing detection system was evaluated using a balanced dataset consisting of legitimate URLs from the Tranco list and phishing URLs collected from publicly available repositories. The BERT semantic classifier achieved high validation accuracy (approximately 97%), demonstrating strong contextual understanding of phishing patterns. The AutoEncoder-based anomaly detection model effectively identified structural irregularities using reconstruction error with a 95th-percentile threshold. The hybrid ensemble model, which combines semantic and structural representations, achieved improved overall accuracy, precision, recall, and F1-score compared to the individual components. Confusion matrix analysis indicated a high true-positive rate for phishing detection and a low false-positive rate for legitimate URLs, confirming balanced model performance. Fig. 4 Performance Metrics 2. Prediction Output and Classification Behavior The output of the system is a probabilistic phishing score generated through a sigmoid activation function. URLs predicted as legitimate produced probability values close to zero, while phishing URLs produced values close to one, indicating strong model confidence. The AutoEncoder component generated reconstruction-error scores, where legitimate URLs showed minimal deviation from learned structural patterns and malicious URLs exhibited significantly higher anomaly scores. The hybrid fusion mechanism successfully integrated both outputs to produce stable and reliable final classifications. Real-time testing through the deployed web interface demonstrated consistent and interpretable predictions, with a clear distinction between legitimate and phishing URLs. Fig. 5 Detected URL as Legit Fig. 6 Detected URL as Phishing 3. Model Complexity and Analysis The semantic branch uses the pre-trained BERT-base model, which has about 110 million parameters and relies on transformer-based self-attention mechanisms. Although BERT has strong contextual learning capability, it imposes higher computational demands during training and inference. In comparison, the AutoEncoder model is minimalistic, comprising fully connected layers with far fewer parameters and minimal computational cost. The hybrid model adds a fusion stage in which the 768-dimensional BERT embedding is concatenated with the 8-dimensional latent space and passed through a small fusion network. Inference time remains practical for real-time use on standard hardware despite the addition of BERT. Overall, the computational trade-off is offset by the enhanced detection robustness achieved through hybrid integration. 4. Discussion The findings of this research indicate that combining semantic intelligence with structural anomaly detection through a hybrid deep learning model can greatly improve phishing URL detection. The BERT-based semantic model proved very effective at detecting contextual patterns of deception, including brand impersonation, suspicious keywords, and hidden lexical manipulations embedded in URLs. Its transformer-based architecture enabled the system to capture complex contextual relationships that traditional machine learning models tend to miss. The high validation accuracy of the semantic model shows that contextual knowledge is an important element of contemporary phishing detection. Nonetheless, semantic analysis may not identify structurally abnormal URLs that look lexically innocuous but deviate from valid URL patterns. The AutoEncoder-based anomaly detection component overcame this drawback by learning the structural properties of legitimate URLs.
The model was able to measure reconstruction error as an abnormality score by compressing URL features into a latent representation and reconstructing them. This method was especially effective at detecting zero-day phishing attacks and newly generated malicious URLs that were not directly observed during supervised training. However, structural anomaly detection on its own can sometimes flag an unusual but legitimate URL as suspicious, particularly when that URL has an uncommon format. Combining the two models into a hybrid ensemble gave a well-balanced and powerful detection system. The system joined the semantic embedding produced by BERT with the latent structural representation produced by the AutoEncoder, thereby combining the strengths of both. The hybrid model reduced false positives compared with standalone anomaly detection and improved detection reliability compared with standalone semantic classification. This illustrates that a twofold perspective, in which contextual meaning and structural integrity are considered jointly, is useful in phishing detection. Computationally, the BERT model adds complexity to the overall system because of its transformer structure and large parameter count. Nevertheless, the AutoEncoder remains light and computationally efficient. Although training the hybrid model requires moderate computing capacity, the inference time is practical for real-time web-based deployment. This makes the proposed system applicable to academic research, small-business deployment, and cybersecurity use cases where detection accuracy must be balanced against available hardware resources. A key finding of this research is the importance of dataset diversity. Training the AutoEncoder only on legitimate URLs enabled the model to form a clear picture of normal structural behavior.
Similarly, balanced exposure to legitimate and phishing samples during supervised training improved classification fairness and decreased bias. A small or homogeneous dataset would narrow the possibility of generalization and raise the risk of misclassification. Thus, a wide and representative dataset is critical for a consistent and scalable phishing detection system. In conclusion, the findings demonstrate the relevance of combining deep learning structures to deal with the changing features of phishing attacks. While semantic models offer contextual intelligence and structural models detect structural deviations, their combination offers a robust defense mechanism. Directions for future research include lightweight transformers, adversarial training, and further hybrid architectures that incorporate other contextual information, such as webpage text or domain reputation. Hybrid AI-based methods of detecting phishing attacks are a promising route to building resilient and smart cybersecurity tools as phishing attacks keep advancing and growing more complex. 6. CONCLUSION This paper described an AI-based phishing detector that combines two deep learning models to detect suspicious URLs in a hybrid framework with a BERT-based semantic classifier and an AutoEncoder-based structural anomaly detector. The findings showed that the BERT model can learn contextual phishing features, including brand impersonation, suspicious use of keywords, and misleading lexical structures in URLs. Experimental analysis established that the hybrid system achieved high classification accuracy and balanced precision and recall, a strong indication that transformer-based semantic understanding boosts phishing detection capability.
With the integration of systematic preprocessing and stable neural network training procedures, the system continued to give reliable predictions on a wide range of valid and phishing URLs. Efficiently trained lightweight structural models such as AutoEncoders can offer low-cost anomaly detection by learning the normal properties of legitimate URL structure; however, when operated alone they can produce false positives for uncommon yet harmless URL structures. More sophisticated semantic models such as BERT provide more contextual knowledge and stronger detection capability, but they need more computational resources because of their transformer architecture. The hybrid combination of the semantic and structural components offered superior generalization and detection stability compared with the independent models, though the introduction of BERT raises the computational complexity of training and deployment. On the whole, the system devised in this paper demonstrates a viable and efficient method of automated phishing detection, allowing real-time URL classification through a deployable web-based interface. The hybrid deep learning model lowers the need for manual rule engineering and increases flexibility against changing phishing schemes. In summary, this study has highlighted the growing importance of hybrid artificial intelligence systems in cybersecurity, showing how contextual semantic analysis and structural anomaly detection, used together, can offer robust, scalable, and intelligent protection against contemporary phishing attacks. Fig. 7 Web interface of the URL phishing detection system 7. REFERENCES 1. S. Ahmad et al., "Across the Spectrum: In-Depth Review of AI-Based Models for Phishing Detection," IEEE Access, 2025. 2. F. S. Alsubaei et al., "Enhancing Phishing Detection: A Novel Hybrid Deep Learning Framework," IEEE Access, 2024. 3. A. Karim and M. Shahroz, "Phishing Detection System Through Hybrid Machine Learning Based on URL," 2023. 4. S. Asiri, Y. Xiao, and T. Li, "A Survey of Intelligent Detection Designs of HTML URL Phishing Attacks," IEEE Access, 2023. 5. J. Kline, E. Oakes, and P. Barford, "A URL-based analysis of WWW structure and dynamics," in Proc. Netw. Traffic Meas. Anal. Conf. (TMA), Jun. 2019, p. 800. 6. A. K. Murthy and Suresha, "XML URL classification based on their semantic structure orientation for web mining applications," Procedia Comput. Sci., vol. 46, pp. 143–150, Jan. 2015. 7. A. A. Ubing, S. Kamilia, A. Abdullah, N. Jhanjhi, and M. Supramaniam, "Phishing website detection: An improved accuracy through feature selection and ensemble learning," Int. J. Adv. Comput. Sci. Appl., vol. 10, no. 1, pp. 252–257, 2019. 8. A. Aggarwal, A. Rajadesingan, and P. Kumaraguru, "PhishAri: Automatic realtime phishing detection on Twitter," in Proc. eCrime Res. Summit, Oct. 2012, pp. 1–12. 9. S. N. Foley, D. Gollmann, and E. Snekkenes, Computer Security – ESORICS 2017, vol. 10492. Oslo, Norway: Springer, Sep. 2017. 10. P. George and P. Vinod, "Composite email features for spam identification," in Cyber Security. Singapore: Springer, 2018.

AI LAD: Lightweight Log Anomaly Detection System with Hybrid Detection and LLM-Assisted Analysis **DOI :** **10.17577/IJERTV15IS030611** Download Full-Text PDF Cite this Publication R. Sivasubramanian, N. Srikar Reddy, P. Dhanush Pavan, S. Nirupam Srivarma, S. Siva Sathvik, 2026, AI LAD: Lightweight Log Anomaly Detection System with Hybrid Detection and LLM-Assisted Analysis, INTERNATIONAL JOURNAL OF ENGINEERING RESEARCH & TECHNOLOGY (IJERT) Volume 15, Issue 03, March – 2026 * **Open Access** * Article Download / Views: 0 * **Authors :** R. Sivasubramanian, N. Srikar Reddy, P. Dhanush Pavan, S. Nirupam Srivarma, S. Siva Sathvik * **Paper ID :** IJERTV15IS030611 * **Volume & Issue :** Volume 15, Issue 03, March – 2026 * **Published (First Online):** 21-03-2026 * **ISSN (Online) :** 2278-0181 * **Publisher Name :** IJERT * **License:** This work is licensed under a Creative Commons Attribution 4.0 International License #### AI LAD: Lightweight Log Anomaly Detection System with Hybrid Detection and LLM-Assisted Analysis R. Sivasubramanian, Assistant Professor, Dept. of Artificial Intelligence & Machine Learning, Malla Reddy University, Hyderabad, India; N. Srikar Reddy, Dept. of Artificial Intelligence & Machine Learning, Malla Reddy University, Hyderabad, India; P. Dhanush Pavan, Dept. of Artificial Intelligence & Machine Learning, Malla Reddy University, Hyderabad, India; S. Nirupam Srivarma, Dept. of Artificial Intelligence & Machine Learning, Malla Reddy University, Hyderabad, India; S. Siva Sathvik, Dept. of Artificial Intelligence & Machine Learning, Malla Reddy University, Hyderabad, India Abstract – The increasing adoption of distributed architectures and cloud-based services has resulted in a rapid growth of system-generated log data produced across multiple heterogeneous platforms.
Conventional log monitoring approaches mainly depend on static rule-based techniques, which are often ineffective at detecting emerging or subtle anomaly patterns. To address this limitation, this study introduces AI LAD, a lightweight log anomaly detection framework that combines heuristic methods, machine learning techniques, and LLM-assisted forensic summarization to enable efficient real-time log analysis. The proposed system applies TF-IDF feature extraction along with an Isolation Forest model to detect anomalous log entries originating from diverse log sources. A structured preprocessing pipeline is designed to manage noisy, inconsistent, and semi-structured log data. Several detection strategies were evaluated during experimentation, after which a hybrid detection framework was selected based on its superior F1-score and balanced accuracy performance. The solution is implemented as a real-time desktop application capable of generating structured outputs that include anomaly classifications, severity indicators, and concise forensic summaries produced through a lightweight large language model integration. Experimental evaluation demonstrates that the system achieves strong cross-source generalization, maintains efficient runtime performance, and offers practical applicability for automated log monitoring and anomaly detection in modern computing environments. #### Index Terms – Log Anomaly Detection, Hybrid Detection, Isolation Forest, TF-IDF, LLM-Assisted Forensics, Cross-Source Evaluation, Real-Time Monitoring, Machine Learning. 1. INTRODUCTION Modern distributed platforms, cloud infrastructures, and enterprise software systems continuously produce vast amounts of system and application log data. These logs capture critical information about system activities, including security events, authentication attempts, performance indicators, operational states, and failure occurrences.
Proper analysis of this log data plays a key role in maintaining system reliability, identifying potential security threats, and ensuring stable system operations. However, the increasing volume and heterogeneity of log data make manual monitoring both inefficient and impractical. Conventional log monitoring solutions generally depend on predefined rules and static threshold mechanisms to identify abnormal behavior. Although these techniques can effectively detect previously known patterns, they often struggle to recognize new or evolving anomalies. Deep learning-based approaches have been introduced to improve contextual understanding of log patterns, but such methods typically require substantial computational resources and complex deployment environments, and are not always suitable for real-time applications. Another challenge arises from the structural differences among logs generated by heterogeneous systems, including HPC clusters, Windows servers, Apache web servers, and Linux-based environments. This variability complicates the development of models that can generalize effectively across multiple log sources. To overcome these challenges, this study proposes AI LAD, a lightweight hybrid log anomaly detection framework designed for efficient real-time monitoring across diverse environments. The proposed system combines heuristic severity scoring with machine learning-based anomaly detection using TF-IDF feature extraction and the Isolation Forest algorithm. Additionally, the framework incorporates selective integration of a lightweight large language model to generate structured forensic summaries for detected anomalies, improving interpretability while preserving runtime efficiency. Through lightweight modeling techniques and a modular system architecture, the proposed solution offers a scalable and practical approach for automated log analysis in real-world operational settings. 2.
LITERATURE REVIEW Log anomaly detection has progressed considerably over the past decade, evolving from traditional rule-based monitoring techniques toward more advanced machine learning and deep learning approaches. Earlier systems mainly depended on statistical analysis and predefined rules to identify abnormal patterns within system logs. Although these approaches were useful for detecting known anomalies, the rapid expansion of distributed systems, cloud platforms, and large-scale enterprise applications has created complex and heterogeneous log environments that require more flexible and scalable anomaly detection methods. Liu et al. [1] introduced the Isolation Forest algorithm, an unsupervised anomaly detection technique based on the concept of isolating abnormal data points through recursive partitioning. Because of its computational efficiency and its ability to handle high-dimensional data, Isolation Forest has become widely used in anomaly detection tasks across multiple domains. Chandola et al. [2] presented a comprehensive survey of anomaly detection techniques, providing an overview of various detection methods including statistical, proximity-based, and machine learning approaches. Aggarwal [3] provided an extensive discussion of outlier detection algorithms and their applications in large-scale data analysis. In the context of log analysis, preprocessing and log parsing play an important role in enabling effective anomaly detection. He et al. [4] proposed Drain, an online log parsing method that converts raw log messages into structured templates using a fixed-depth tree structure. This transformation allows log data to be processed more efficiently by automated analysis systems. Du et al. [5] later introduced DeepLog, which applies recurrent neural networks to learn sequential patterns in system logs and detect anomalies when deviations from normal sequences occur.
More recent research has focused on improving robustness and adaptability by combining multiple detection techniques. Zhang et al. [6] conducted a survey of modern log anomaly detection approaches and highlighted key challenges such as heterogeneous log formats, cross-source variability, data imbalance, and real-time processing constraints. Chen et al. [7] proposed hybrid frameworks that integrate rule-based heuristics with machine learning algorithms to improve detection accuracy and reliability in operational environments. Kumar et al. [8] further explored cross-source log analysis and emphasized the difficulty of building anomaly detection models that generalize effectively across logs generated from different platforms. Sharma et al. [9] demonstrated that combining TF-IDF feature extraction with Isolation Forest can effectively identify anomalous patterns within log datasets while maintaining efficient computational performance. In addition, recent advancements in large language models have introduced new possibilities for automated log interpretation. The Gemini model family, introduced by Google Research [10], demonstrates strong capabilities in contextual language understanding and summarization, which can support automated explanation and forensic analysis of detected anomalies. Despite these advancements, many existing approaches either rely on computationally intensive deep learning architectures or depend on static rule-based monitoring systems that lack adaptability. Furthermore, relatively little research has focused on lightweight and deployable log anomaly detection platforms capable of performing real-time monitoring across heterogeneous environments.
To address these challenges, this research proposes AI LAD, a lightweight hybrid log anomaly detection framework that integrates heuristic analysis, TF-IDF feature representation, Isolation Forest-based anomaly scoring, and LLM-assisted forensic summarization within a practical desktop-based deployment architecture.

3. PROPOSED METHODOLOGY

The system follows a structured hybrid anomaly detection methodology designed to analyze system logs efficiently across heterogeneous environments. The methodology integrates log preprocessing, feature extraction, anomaly detection modeling, rule-based evaluation, and LLM-assisted forensic summarization. The overall workflow supports real-time monitoring while maintaining low computational overhead. The processing pipeline follows the sequence: Log Input → Log Preprocessing → Feature Representation → Hybrid Anomaly Detection → Rule-Based Evaluation → LLM Forensic Summarization → Alert Output and Storage. This pipeline ensures systematic processing of log events while enabling scalable monitoring and automated anomaly detection.

1. Log Preprocessing

System logs collected from different platforms often contain semi-structured data with varying formats. Therefore, preprocessing is performed to normalize log messages and extract meaningful attributes required for further analysis. The preprocessing stage includes the following steps:

* Regex-based parsing of raw log messages
* Extraction of severity indicators and relevant keywords
* Identification of IP addresses, timestamps, and event patterns
* Removal of redundant metadata fields

These operations transform raw log entries into structured representations while preserving anomaly-related patterns present in the data.

2. Feature Representation

Before applying machine learning algorithms, log messages must be converted into numerical representations.
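The regex-based preprocessing steps listed above might be sketched as follows. This is an illustrative sketch, not the authors' published parser; the field names and regular expressions are assumptions about what such a normalizer could look like.

```python
import re

# Illustrative patterns; the paper does not publish its exact regexes.
TS_RE = re.compile(r"\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}")
SEV_RE = re.compile(r"\b(DEBUG|INFO|WARNING|WARN|ERROR|CRITICAL|FATAL)\b")
IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def preprocess(line: str) -> dict:
    """Normalize a raw log line into a structured record."""
    ts = TS_RE.search(line)
    sev = SEV_RE.search(line)
    ips = IP_RE.findall(line)
    # Strip the extracted metadata so the remaining message text keeps
    # only anomaly-relevant wording for later feature extraction.
    message = SEV_RE.sub("", TS_RE.sub("", line)).strip()
    return {
        "timestamp": ts.group(0) if ts else None,
        "severity": sev.group(1) if sev else None,
        "ips": ips,
        "message": message,
    }

record = preprocess("2026-03-21 10:15:02 ERROR sshd: failed login from 192.168.1.77")
```

A record produced this way carries the severity indicator, timestamp, and IP attributes separately from the free-text message, which is the part that feeds the feature representation stage.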
In this system, TF-IDF (Term Frequency-Inverse Document Frequency) encoding is used to convert textual log messages into sparse numerical feature vectors. TF-IDF measures the relative importance of words within the dataset and provides an efficient representation for textual anomaly detection tasks involving high-dimensional log data.

3. Anomaly Detection Modeling

The system employs a hybrid detection strategy that combines heuristic severity scoring with machine learning-based anomaly detection. The machine learning component uses the Isolation Forest algorithm, an unsupervised anomaly detection method that isolates anomalous data points through recursive partitioning. Anomaly scores are generated based on how easily log events can be separated from normal observations. Multiple detection modes are supported:

* Heuristic-based detection
* Machine learning-based detection
* Hybrid detection combining both approaches

4. Severity Classification

Instead of using a binary anomaly classification, the system applies a multi-level severity classification mechanism. Detected anomalies are categorized into four severity levels, enabling better prioritization of system alerts. The classification component outputs probability scores for each severity category. The final severity label corresponds to the class with the highest probability value. A refinement stage ensures that predicted severity levels remain consistent with contextual indicators present in the log message.

5. Deployment Integration

After model training and evaluation, the detection pipeline is integrated into a desktop-based monitoring platform. Incoming log events are processed using the same preprocessing and detection pipeline applied during system development. For each analyzed log entry, the system generates structured outputs including:

* Anomaly label
* Severity level
* Anomaly score

These outputs enable automated alert generation and operational monitoring.

4. SYSTEM ARCHITECTURE

Fig. 1.
AI-LAD System Architecture. The lightweight log anomaly detection framework follows a modular and layered architecture designed for scalability, efficiency, and real-time deployment. The architecture integrates preprocessing, hybrid anomaly detection, rule-based automation, LLM-assisted forensic summarization, and persistent storage into a unified monitoring system.

1. Overall Architecture

The architecture consists of five primary components:

1. User Interface Layer: The user interface accepts log inputs through live monitoring streams or dataset ingestion within a desktop-based interface developed using CustomTkinter. The interface enables users to monitor logs, view detected anomalies, and review system alerts.
2. Preprocessing Module: The preprocessing module parses and normalizes raw log messages using regex-based extraction techniques. It identifies attributes such as severity indicators, timestamps, keywords, and IP patterns to structure the log data for analysis.
3. Hybrid Detection Engine: The hybrid detection engine processes structured log entries using TF-IDF feature encoding and Isolation Forest-based anomaly scoring. The anomaly score generated by the model is combined with heuristic severity indicators to produce the final anomaly classification.
4. Rule and Response Layer: The rule layer applies monitoring rules such as priority-based triggers, repeated-event windows, and temporary blocklist logic. These rules enable automated alert generation and operational response handling.
5. LLM and Database Layer: The final layer integrates forensic explanation and persistent storage. The system utilizes Gemini Flash 2.0 via OpenRouter to generate structured forensic summaries describing detected anomalies. Processed logs, alerts, and responses are stored in a thread-safe SQLite database.

2.
Detection Engine Layer

The core detection pipeline consists of the following components:

* Regex-based log parser
* TF-IDF vectorizer
* Isolation Forest model
* Heuristic severity scoring module
* Hybrid decision logic

The TF-IDF vectorizer converts processed log messages into numerical feature vectors, while the Isolation Forest algorithm computes anomaly scores based on data isolation principles. These scores are combined with heuristic severity indicators to determine the final anomaly classification.

3. LLM Service Integration

The system integrates Gemini Flash 2.0 through OpenRouter to generate forensic explanations for detected anomalies. The LLM service performs the following operations:

* Receives anomalous log entries
* Generates structured forensic summaries
* Validates outputs using Pydantic schema enforcement
* Handles truncated or incomplete JSON responses

This integration improves interpretability while maintaining efficient detection performance.

4. Modularity and Scalability

The architecture follows a modular design that allows independent modification of system components such as preprocessing, detection models, rule evaluation, and LLM services. This modular structure supports future enhancements including cloud-based deployment, distributed log ingestion pipelines, additional anomaly detection models, and advanced monitoring dashboards.

5. TRAINING CONFIGURATION AND OPTIMIZATION

1. Training Configuration

The anomaly detection framework is trained using an unsupervised learning strategy that models the normal structural patterns present in system logs. The dataset is divided into training and testing subsets to evaluate the model's ability to generalize across different log sources. During training, log messages are first converted into numerical representations using TF-IDF feature encoding.
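As a minimal sketch of this encoding step, the following assumes scikit-learn, which the paper does not name explicitly; the (1, 2) n-gram range follows the configuration the authors report, while the `max_features` value here is purely illustrative.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

messages = [
    "session opened for user root",
    "failed password for invalid user admin from 10.0.0.5",
    "connection closed by 10.0.0.5",
]

# ngram_range=(1, 2) mirrors the paper's stated training configuration;
# max_features would be tuned to the dataset, so 5000 is a placeholder.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), max_features=5000)
X = vectorizer.fit_transform(messages)  # sparse matrix: one row per log line
```

Each row of `X` is the sparse TF-IDF weight vector for one log message, which is exactly the input shape the Isolation Forest stage expects.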
These feature vectors are then used to train the Isolation Forest model, which learns the distribution of normal log patterns and identifies outliers that deviate from this distribution.

* Feature Representation: TF-IDF vectorization
* N-gram Range: (1, 2)
* Maximum Features: Determined based on dataset characteristics
* Anomaly Detection Model: Isolation Forest
* Training Iterations: Approximately 34 optimization cycles

Validation monitoring is applied during training to ensure stable performance and to reduce the risk of overfitting. The training process focuses on learning representative patterns from normal log behavior while maintaining computational efficiency.

2. Model Optimization

The Isolation Forest model constructs an ensemble of random decision trees designed to isolate anomalous data points efficiently. The training procedure follows three main steps:

1. Random subsets of the training data are selected to construct isolation trees.
2. Each tree recursively partitions the feature space by selecting random features and split values.
3. The path length required to isolate each instance is calculated and averaged across all trees to determine the final anomaly score.

Model performance is evaluated on validation datasets to ensure consistent anomaly scoring behavior. Training continues until stable performance is achieved across evaluation metrics, ensuring reliable anomaly detection across diverse log sources.

6. MATHEMATICAL FORMULATION

The anomaly detection task is formulated as an unsupervised outlier detection problem over system log messages. Each log entry is first transformed into a numerical feature representation and then evaluated using the Isolation Forest model to determine its anomaly score.

1. Log Representation

Let the set of system log messages be represented as:

L = {x1, x2, x3, ..., xn}

where xi represents an individual log message and n denotes the total number of log entries in the dataset.
Since log messages are textual in nature, they must be converted into numerical vectors before applying machine learning algorithms. This transformation is performed using TF-IDF encoding. For a given log message x, the TF-IDF representation is defined as:

φ(x) = (w1, w2, w3, ..., wm)

where m represents the number of features (terms) extracted from the log corpus and wj denotes the TF-IDF weight of the j-th term. The TF-IDF weight for a term t in a log message x is calculated as:

TF-IDF(t, x) = TF(t, x) × IDF(t)

where:

TF(t, x) = f(t, x) / Σk f(k, x)

IDF(t) = log(N / df(t))

Here:

* f(t, x) represents the frequency of term t in log message x
* N denotes the total number of log messages
* df(t) represents the number of messages containing term t

2. Isolation Forest Anomaly Scoring

After feature representation, anomaly detection is performed using the Isolation Forest algorithm, which isolates anomalies by randomly partitioning the data space. Let h(x) denote the path length required to isolate instance x in a randomly generated isolation tree, and let E(h(x)) denote the expected path length across all trees. The anomaly score s(x, n) for a data point x is calculated as:

s(x, n) = 2^(−E(h(x)) / c(n))

where:

* E(h(x)) is the average path length of instance x across all isolation trees
* n is the number of samples used to construct the trees
* c(n) is the average path length of unsuccessful searches in a binary search tree

The value of c(n) is approximated as:

c(n) = 2H(n − 1) − (2(n − 1)/n)

where H(i) represents the harmonic number:

H(i) = ln(i) + γ

and γ is the Euler-Mascheroni constant (approximately 0.577). An anomaly score close to 1 indicates a highly anomalous instance, while values closer to 0 represent normal observations. A log entry is classified as anomalous if:

s(x, n) > τ

where τ is a threshold determined by the contamination parameter of the Isolation Forest model.

3. Hybrid Decision Function

To improve detection robustness, the system combines the anomaly score generated by the Isolation Forest model with heuristic severity indicators extracted from log attributes. Let:

* M(x) represent the anomaly score from the Isolation Forest model
* H(x) represent the heuristic severity score derived from log features such as keywords, failed authentication, or suspicious IP patterns

The final anomaly decision score is defined as:

D(x) = α·H(x) + β·M(x)

where α and β are weighting parameters controlling the contribution of the heuristic and model-based components, with α + β = 1. The final classification decision is obtained using:

y(x) = 1 if D(x) > θ, otherwise y(x) = 0

where:

* y(x) = 1 indicates an anomalous log event
* y(x) = 0 indicates a normal log event
* θ represents the decision threshold

4. Severity Classification

Detected anomalies are further categorized into multiple severity levels based on contextual indicators present in the log message. Let the severity classification function be:

S(x) = arg maxk P(yk | x)

where:

* P(yk | x) represents the probability of log message x belonging to severity class k
* k ∈ {1, 2, 3, 4} corresponds to severity levels Low, Medium, High, and Critical

7. RESULTS

After training, the final anomaly detection model is integrated into the monitoring platform to enable real-time analysis of incoming log events. During system initialization, the trained model and TF-IDF feature extractor are loaded so that the application can perform efficient inference on new log data. Incoming log entries are processed using the same preprocessing pipeline used during training, including normalization, regex-based parsing, and feature transformation.
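This inference step, which combines the Isolation Forest score M(x) with the heuristic score H(x) in a weighted decision D(x), can be sketched as follows. scikit-learn, the keyword list, and the α, β, θ values are all illustrative assumptions, not the authors' exact configuration.

```python
from sklearn.ensemble import IsolationForest
from sklearn.feature_extraction.text import TfidfVectorizer

logs = [
    "session opened for user alice",
    "session closed for user alice",
    "disk check completed successfully",
    "failed password for root from 203.0.113.9",
]

SUSPICIOUS = ("failed password", "denied", "unauthorized", "segfault")

def heuristic_score(msg: str) -> float:
    """H(x): crude keyword-based severity score in [0, 1] (illustrative)."""
    return min(1.0, sum(kw in msg for kw in SUSPICIOUS) / 2)

X = TfidfVectorizer(ngram_range=(1, 2)).fit_transform(logs).toarray()
model = IsolationForest(contamination=0.25, random_state=0).fit(X)

# score_samples returns higher values for normal points; negate and
# min-max scale so that larger M(x) means more anomalous, matching
# the paper's notation.
raw = -model.score_samples(X)
M = (raw - raw.min()) / (raw.max() - raw.min() + 1e-9)

alpha, beta, theta = 0.4, 0.6, 0.5  # illustrative weights, alpha + beta = 1
D = [alpha * heuristic_score(msg) + beta * m for msg, m in zip(logs, M)]
labels = [int(d > theta) for d in D]  # y(x): 1 = anomalous, 0 = normal
```

On real deployments the vectorizer and model would be fitted once on training logs and only applied at inference time, as the surrounding text describes.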
The transformed feature vectors are then evaluated by the Isolation Forest model to compute anomaly scores and determine whether a log entry represents normal activity or suspicious behavior. For each processed log entry, the system generates structured outputs containing the anomaly label, severity level, and anomaly score. These outputs are used by the monitoring interface to update alert notifications and visualize abnormal system behavior. The platform continuously processes incoming events and maintains system statistics such as total logs processed, detected anomalies, and overall system status indicators. Fig. 2. AI LAD monitoring dashboard displaying anomaly statistics, processed logs, system health indicators, and recent alert notifications. The monitoring dashboard provides a centralized overview of system activity and anomaly detection results. It presents aggregated metrics including the number of detected anomalies, processed logs, and current system health indicators. Graphical visualizations are used to illustrate anomaly trends over time and system efficacy, enabling administrators to quickly assess the operational status of the monitoring environment. Recent alerts are also displayed to highlight newly detected threats requiring attention. To support continuous monitoring, the system includes a live log streaming interface that displays incoming log events in real time. This interface allows administrators to observe system activity as it occurs and detect abnormal patterns immediately. Fig. 3. Live monitoring module showing real-time log stream, anomaly indicators, and threat distribution statistics. The live monitoring module continuously updates as new events are received. Log messages are displayed sequentially while the anomaly detection engine evaluates each entry in real time. Detected anomalies are highlighted using severity indicators to assist operators in identifying suspicious activity quickly. 
In addition, the interface provides statistical summaries such as threat distribution and frequently occurring attack sources, allowing users to understand system behavior at a glance. Detected anomalies can be further examined using the forensic analysis module, which provides contextual interpretation of suspicious events. Fig. 4. Forensic analysis module presenting automated summaries, detected threat information, and recommended investigation actions. The forensic interface presents structured summaries that describe the potential cause and impact of detected anomalies. It highlights important attributes such as detected attack type, source IP addresses, and event severity levels. The system also provides investigation suggestions and monitoring recommendations, assisting analysts in understanding the context of anomalous behavior without manually inspecting large volumes of log data. All processed events, alerts, and forensic summaries are stored for later review and reporting. This deployment structure allows the system to operate as a real-time log monitoring and anomaly detection platform, combining automated analysis, interactive visualization, and structured forensic interpretation within a unified monitoring environment.

8. CONCLUSION

The developed system demonstrates stable and efficient performance during experimental evaluation and cross-source testing. The obtained results indicate reliable anomaly detection capability, achieving strong F1-scores and balanced accuracy across heterogeneous log datasets. In addition, the system maintains low inference latency and high processing throughput, confirming its suitability for real-time monitoring scenarios. The integration of LLM-assisted forensic summarization improves interpretability by generating structured explanations for detected anomalies. This capability helps analysts understand abnormal system behavior more effectively without requiring manual inspection of large volumes of raw log data.
The modular desktop-based architecture further highlights the system's practical applicability for operational environments. Its lightweight design allows efficient deployment while maintaining scalability and adaptability across different log sources and monitoring conditions. Overall, the framework provides a practical and efficient solution for automated log monitoring and anomaly detection. By combining hybrid detection techniques with structured forensic analysis, the system offers a lightweight yet powerful approach for real-time identification and interpretation of anomalous system events.

REFERENCES

1. F. T. Liu, K. M. Ting, and Z.-H. Zhou, "Isolation Forest," Proc. IEEE International Conference on Data Mining (ICDM), 2008.
2. V. Chandola, A. Banerjee, and V. Kumar, "Anomaly Detection: A Survey," ACM Computing Surveys, vol. 41, no. 3, 2009.
3. C. C. Aggarwal, Outlier Analysis, 2nd ed., Springer, 2017.
4. P. He, J. Zhu, Z. Zheng, and M. R. Lyu, "Drain: An Online Log Parsing Approach with Fixed Depth Tree," Proc. IEEE International Conference on Web Services (ICWS), 2017.
5. M. Du, F. Li, G. Zheng, and V. Srikumar, "DeepLog: Anomaly Detection and Diagnosis from System Logs," Proc. ACM Conference on Computer and Communications Security (CCS), 2017.
6. X. Zhang et al., "A Survey on Log Anomaly Detection Techniques," IEEE Access, 2021.
7. S. Chen et al., "Hybrid Log Anomaly Detection Frameworks for Real-World Systems," IEEE Transactions on Services Computing, 2023.
8. A. Kumar et al., "Cross-Source Log Analysis and Generalization in Anomaly Detection," ACM Transactions on Internet Technology, 2024.
9. S. Sharma et al., "Efficient Log Anomaly Detection Using TF-IDF and Isolation Forest," Journal of Systems and Software, 2023.
10. Google Research, "Gemini: A Family of Highly Capable Multimodal Models," 2023.
