Artificial Intelligence (AI) is increasingly used across a wide range of domains, including safety-critical functions where reliability, robustness, and trustworthiness are essential. Deploying AI in these areas introduces unique challenges, particularly in ensuring that a system performs consistently across scenarios, including unexpected or adverse conditions. In this project, we address these challenges by applying adversarial techniques during development: by intentionally introducing perturbations and edge cases, we identify vulnerabilities in AI models and strengthen their robustness against potential failures.
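To make the adversarial step concrete, the sketch below uses the fast gradient sign method (FGSM), one standard way to generate perturbed inputs. It is an illustrative example rather than the specific method used in this project, and the model, labels, and perturbation budget epsilon are placeholders.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, epsilon=0.03):
    """Return a copy of x perturbed by one FGSM step: a single gradient
    ascent step on the loss, bounded by epsilon in the L-infinity norm."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Move each input component in the direction that increases the loss,
    # then clamp back to the valid input range.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```

Inputs on which the model's prediction flips under such small perturbations point to vulnerabilities that can then be targeted during retraining or robustness evaluation.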
Furthermore, we are developing monitoring systems that continuously evaluate AI systems during operation. These monitors act as safeguards, ensuring that the AI operates strictly within its predefined capabilities and does not produce unsafe or unpredictable outputs when faced with inputs outside its training distribution. This dual approach, strengthening robustness through adversarial methods and ensuring safe operation through runtime monitoring, is central to advancing the reliability of AI systems in safety-critical applications.
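A minimal runtime monitor can be sketched as a confidence check on the model's output: if the maximum softmax probability falls below a threshold, the input is flagged as potentially out of distribution and handed to a safe fallback. This is only one simple monitoring strategy among many, and the threshold value and fallback behavior here are placeholder assumptions, not the project's actual design.

```python
import torch
import torch.nn.functional as F

def monitored_predict(model, x, confidence_threshold=0.9):
    """Return the prediction together with a flag indicating whether the
    model's confidence is high enough to trust the output."""
    with torch.no_grad():
        probs = F.softmax(model(x), dim=-1)
    confidence, prediction = probs.max(dim=-1)
    # Inputs below the threshold are treated as potentially out of
    # distribution and should be routed to a fallback or human review.
    in_distribution = confidence >= confidence_threshold
    return prediction, confidence, in_distribution
```

In practice such a monitor would be calibrated on held-out data and combined with stronger out-of-distribution detectors, but the principle is the same: the system refuses to act on inputs it was not designed to handle.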