Research Overview


My research is concerned with the development of Safe AI, i.e., developing methods to ensure that deployed artificial intelligence models do not pose a threat in high-stakes environments. To address this, my work has focused primarily on adversarial attacks, and how we can defend against them, in the Natural Language Processing (NLP) domain. My other work has explored further Safe AI topics, including uncertainty estimation for out-of-distribution handling, and biases and shortcut learning.

My research has been applied to a range of tasks: standard NLP classification tasks (e.g. entailment and sentiment classification), grammatical error correction, neural machine translation, spoken language assessment, tabular weather prediction, and standard image classification (object recognition).

Publications


Gender Bias and Universal Substitution Adversarial Attacks on Grammatical Error Correction Systems for Automated Assessment

Vyas Raina, Mark Gales

U.K. Speech 2022

Paper Poster Code
Abstract: Grammatical Error Correction (GEC) systems perform a sequence-to-sequence task, where an input word sequence containing grammatical errors is corrected by the GEC system to output a grammatically correct word sequence. With the advent of deep learning methods, automated GEC systems have become increasingly popular. For example, GEC systems are often used on speech transcriptions of English learners as a form of assessment and feedback - these powerful GEC systems can be used to automatically measure an aspect of a candidate's fluency. The count of edits from a candidate's input sentence (or essay) to a GEC system's grammatically corrected output sentence is indicative of a candidate's language ability, where fewer edits suggest better fluency. The count of edits can thus be viewed as a fluency score, with zero implying perfect fluency. However, although deep learning based GEC systems are extremely powerful and accurate, they are susceptible to adversarial attacks: an adversary can introduce a small, specific change at the input of a system that causes a large, undesired change at the output. When considering the application of GEC systems to automated language assessment, the aim of an adversary could be to cheat by making a small change to a grammatically incorrect input sentence that conceals the errors from a GEC system, such that no edits are found and the candidate is unjustly awarded a perfect fluency score. This work examines a simple universal substitution adversarial attack that non-native speakers of English could realistically employ to deceive GEC systems used for automated assessment.
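To make the edit-count fluency score concrete, the sketch below counts word-level edits between a candidate sentence and a GEC system's output; an attack succeeds if it drives this count to zero while errors remain. This is a minimal illustration using Python's standard library, not the evaluation code from the paper.

```python
from difflib import SequenceMatcher

def edit_count(source: str, corrected: str) -> int:
    """Count word-level edits between a candidate's sentence and the
    GEC system's corrected output (a proxy for the fluency score)."""
    src, cor = source.split(), corrected.split()
    ops = SequenceMatcher(None, src, cor).get_opcodes()
    return sum(max(i2 - i1, j2 - j1)
               for tag, i1, i2, j1, j2 in ops if tag != "equal")

# An adversary seeks a substitution that drives the edit count to zero
# even though the input still contains grammatical errors.
original = "he go to school yesterday"
corrected = "he went to school yesterday"
print(edit_count(original, corrected))  # 1 edit -> imperfect fluency
```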

Residue-Based Natural Language Adversarial Attack Detection

Vyas Raina, Mark Gales

North American Chapter of the Association for Computational Linguistics (NAACL) 2022

Paper Poster Code
Abstract: Deep learning based systems are susceptible to adversarial attacks, where a small, imperceptible change at the input alters the model prediction. However, to date the majority of approaches to detect these attacks have been designed for image processing systems. Many popular image adversarial detection approaches are able to identify adversarial examples from embedding feature spaces, whilst in the NLP domain existing state-of-the-art detection approaches solely focus on input text features, without consideration of model embedding spaces. This work examines what differences result when porting these image-designed strategies to Natural Language Processing (NLP) tasks - these detectors are found to not port over well. This is expected, as NLP systems have a very different form of input: discrete and sequential in nature, rather than the continuous and fixed-size inputs of images. As an equivalent model-focused NLP detection approach, this work proposes a simple sentence-embedding "residue" based detector to identify adversarial examples. On many tasks, it outperforms ported image-domain detectors and recent state-of-the-art NLP-specific detectors.
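As a rough illustration of the embedding-space idea, the sketch below computes a PCA "residue" for a sentence embedding: the component lying outside the principal subspace fitted on clean data. This is a hedged sketch of the general approach; the paper's exact residue construction and detector may differ.

```python
import numpy as np

def fit_principal_subspace(clean_embs: np.ndarray, k: int):
    """PCA on clean sentence embeddings; keep the top-k directions."""
    mean = clean_embs.mean(axis=0)
    centered = clean_embs - mean
    # Rows of vt are the principal directions (orthonormal).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]                # shapes: (d,), (k, d)

def residue_norm(emb: np.ndarray, mean: np.ndarray, basis: np.ndarray) -> float:
    """Norm of the embedding component outside the principal subspace."""
    x = emb - mean
    proj = basis.T @ (basis @ x)       # projection onto the top-k subspace
    return float(np.linalg.norm(x - proj))

# Flag an input as adversarial if its residue norm exceeds a threshold
# tuned on held-out clean data.
```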

Shifts: A dataset of real distributional shift across multiple large-scale tasks

Andrey Malinin, Neil Band, German Chesnokov, Yarin Gal, Mark JF Gales, Alexey Noskov, Andrey Ploskonosov, Liudmila Prokhorenkova, Ivan Provilkov, Vatsal Raina, Vyas Raina, Mariya Shmatova, Panos Tigas, Boris Yangel

Advances in Neural Information Processing Systems (NeurIPS), Datasets and Benchmarks Track 2021

Paper Presentation Code
Abstract: There has been significant research on developing methods for improving robustness to distributional shift and uncertainty estimation. In contrast, only limited work has examined developing standard datasets and benchmarks for assessing these approaches. Additionally, most work on uncertainty estimation and robustness has developed new techniques based on small-scale regression or image classification tasks. However, many tasks of practical interest have different modalities, such as tabular data, audio, text, or sensor data, which offer significant challenges involving regression and discrete or continuous structured prediction. Thus, given the current state of the field, a standardized large-scale dataset of tasks across a range of modalities affected by distributional shifts is necessary. This will enable researchers to meaningfully evaluate the plethora of recently developed uncertainty quantification methods, as well as assessment criteria and state-of-the-art baselines. In this work, we propose the Shifts Dataset for evaluation of uncertainty estimates and robustness to distributional shift. The dataset, which has been collected from industrial sources and services, is composed of three tasks, each corresponding to a particular data modality: tabular weather prediction, machine translation, and self-driving car (SDC) vehicle motion prediction. All of these data modalities and tasks are affected by real, 'in-the-wild' distributional shifts and pose interesting challenges with respect to uncertainty estimation. In this work we provide a description of the dataset and baseline results for all tasks.
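A common way to jointly assess robustness and uncertainty, in the spirit of retention-based evaluation, is to reject the most uncertain predictions and track how the mean error falls. The sketch below is a minimal, assumed implementation of such an error-retention curve, not the official Shifts evaluation code.

```python
import numpy as np

def error_retention_curve(errors: np.ndarray, uncertainty: np.ndarray):
    """Mean error over the retained fraction of data when the most
    uncertain predictions are rejected first. If uncertainty correlates
    with error, the curve (and its area) decreases."""
    order = np.argsort(uncertainty)           # most confident first
    err = errors[order]
    retained = np.arange(1, len(err) + 1)
    mean_err = np.cumsum(err) / retained      # error at each retention level
    return retained / len(err), mean_err

# Example with synthetic per-sample regression errors and uncertainties.
rng = np.random.default_rng(0)
errors = rng.gamma(2.0, 1.0, size=1000)
uncertainty = errors + rng.normal(0, 0.5, size=1000)  # informative estimate
fraction, curve = error_retention_curve(errors, uncertainty)
```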

Universal adversarial attacks on spoken language assessment systems

Vyas Raina, Mark Gales, Katherine Knill

INTERSPEECH 2020

Paper Presentation Code
Abstract: There is an increasing demand for automated spoken language assessment (SLA) systems, partly driven by the performance improvements that have come from deep learning based approaches. One aspect of deep learning systems is that they do not require expert-derived features, operating directly on the original signal, such as an automatic speech recognition (ASR) transcript. This, however, increases their potential susceptibility to adversarial attacks as a form of candidate malpractice. In this paper the sensitivity of SLA systems to a universal black-box attack on the ASR text output is explored. The aim is to obtain a single, universal phrase that maximally increases any candidate's score. Four approaches to detect such adversarial attacks are also described. All the systems, and associated detection approaches, are evaluated on a free (spontaneous) speaking section from a Business English test. It is shown that on deep learning based SLA systems the average candidate score can be increased by almost one grade level using a single six-word phrase appended to the end of the response hypothesis. Although these large gains can be obtained, they can be easily detected based on shifts from the scores of a “traditional” Gaussian Process based grader.
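The sketch below gives a hypothetical greedy black-box search for such a universal phrase. The grade(text) -> float scorer, the candidate vocabulary, and the greedy word-by-word procedure are illustrative assumptions; the paper's actual search and detection methods are not reproduced here.

```python
def find_universal_phrase(responses, vocab, grade, phrase_len=6):
    """Greedily build a phrase that, appended to every response,
    maximises the average score from a black-box grader."""
    phrase = []
    for _ in range(phrase_len):
        best_word, best_score = None, float("-inf")
        for word in vocab:
            trial = " ".join(phrase + [word])
            # Average grade over responses with the trial phrase appended.
            avg = sum(grade(r + " " + trial) for r in responses) / len(responses)
            if avg > best_score:
                best_word, best_score = word, avg
        phrase.append(best_word)
    return " ".join(phrase)

# Usage (with some scorer and word list):
# phrase = find_universal_phrase(responses, vocab, grade)
```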

Challenges


Shifts Challenge 1.0

NeurIPS 2021

The Shifts Challenge aims to provide a standardized set of benchmark datasets and baseline models across a range of modalities to assess the impact of distributional shift in the wild. Deployed systems often fail when used in domains where there exists a statistical shift from the source training domain. The aim of this challenge is two-fold: the development of models that are robust to real-life distributional shifts, and the ability of models to give a meaningful uncertainty measure for their predictions, i.e., the models should know when they are likely to be wrong, so that human intervention can be provided in such settings.

My work in this challenge focused on the development of the Weather track, where I designed the data splits, data augmentation, and baseline model training. Further, I worked on refining and assessing the uncertainty measures used for model evaluation. Finally, I actively helped with the organization, tutorial writing, and running of the challenge at NeurIPS 2021.

The Shifts Weather Prediction dataset contains both a scalar regression and a multi-class classification task. Specifically, at a particular latitude, longitude, and timestamp, one must predict either the air temperature at two meters above the ground or the precipitation class, given targets and features derived from weather station measurements and weather forecast models. This data is used by Yandex for real-time weather forecasts and represents a real industrial application.
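For the temperature regression task, one simple way to obtain both a prediction and an uncertainty estimate is an ensemble, with member disagreement serving as the uncertainty. The sketch below is an assumed minimal example, not the challenge's actual baselines; `models` stands for any fitted regressors exposing a predict method (e.g. scikit-learn estimators).

```python
import numpy as np

def ensemble_predict(models, features):
    """Predict 2m air temperature with an ensemble of regressors and use
    the variance across members as a simple uncertainty measure."""
    preds = np.stack([m.predict(features) for m in models])  # (M, N)
    mean = preds.mean(axis=0)          # point prediction per location/time
    uncertainty = preds.var(axis=0)    # member disagreement
    return mean, uncertainty
```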

Paper Challenge Talks Code