Understanding Backdoor Attacks through the Adaptability Hypothesis

AI Safety · Conference paper
Xun Xian, Ganghua Wang, Jayanth Srinivasa, Ashish Kundu, Xuan Bi, Mingyi Hong, Jie Ding
International Conference on Machine Learning (ICML)
Publication year: 2023

Abstract:

A poisoning backdoor attack is a rising security concern for deep learning. This type of attack can result in the backdoored model functioning normally most of the time but exhibiting abnormal behavior when presented with inputs containing the backdoor trigger, making it difficult to detect and prevent. In this work, we propose the adaptability hypothesis to understand when and why a backdoor attack works for general learning models, including deep neural networks, based on the theoretical investigation of classical kernel-based learning models. The adaptability hypothesis postulates that for an effective attack, the effect of incorporating a new dataset on the predictions of the original data points will be small, provided that the original data points are distant from the new dataset. Experiments on benchmark image datasets and state-of-the-art backdoor attacks for deep neural networks are conducted to corroborate the hypothesis. Our finding provides insight into the factors that affect the attack’s effectiveness and has implications for the design of future attacks and defenses.

Keywords:

Adversarial deep learning

Backdoor attack

Data poisoning

Towards Understanding Variation-Constrained Deep Neural Networks

AI Foundations · Journal paper
Gen Li, Jie Ding
IEEE Transactions on Signal Processing
Publication year: 2023

Abstract:

Multi-layer feedforward networks have been used to approximate a wide range of nonlinear functions. A fundamental problem is understanding the generalizability of a neural network model through its statistical risk, or the expected test error. In particular, it is important to understand the phenomenon that overparameterized deep neural networks may not suffer from overfitting even when the number of neurons and learning parameters grows rapidly with the sample size n or even surpasses it. In this paper, we show that a class of variation-constrained regression neural networks, with arbitrary width, can achieve a near-parametric rate n^{-1/2+δ} for an arbitrarily small positive constant δ, which is equivalent to n^{-1+2δ} under the mean squared error. This rate is also observed in numerical experiments. The result provides insight into the benign overparameterization phenomenon and indicates that the number of trainable parameters may not be a suitable complexity measure, as is often perceived for classical regression models. We also discuss the convergence rate with respect to other network parameters, including the input dimension, network depth, and coefficient norm.

Keywords:

Complexity theory

Deep learning theory

Statistical convergence analysis

Pruning Deep Neural Networks from a Sparsity Perspective

AI Foundations · AI Scalability · Conference paper
E. Diao, G. Wang, J. Zhang, Y. Yang, J. Ding, V. Tarokh
International Conference on Learning Representations (ICLR)
Publication year: 2023

Abstract:

In recent years, deep network pruning has attracted significant attention in order to enable the rapid deployment of AI on small devices with computation and memory constraints. Pruning is often achieved by dropping redundant weights, neurons, or layers of a deep network while attempting to retain comparable test performance. Many deep pruning algorithms have been proposed with impressive empirical success. However, existing approaches lack a quantifiable measure to estimate the compressibility of a sub-network during each pruning iteration and thus may under-prune or over-prune the model. In this work, we propose the PQ Index (PQI) to measure the potential compressibility of deep neural networks and use it to develop a Sparsity-informed Adaptive Pruning (SAP) algorithm. Our extensive experiments corroborate the hypothesis that, for a generic pruning procedure, PQI first decreases when a large model is being effectively regularized, then increases when its compressibility reaches a limit that appears to correspond to the onset of underfitting, and subsequently decreases again when model collapse begins and the model's performance deteriorates significantly. Additionally, our experiments demonstrate that, with a proper choice of hyperparameters, the proposed adaptive pruning algorithm is superior to iterative pruning algorithms such as lottery ticket-based pruning methods, in terms of both compression efficiency and robustness.
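The abstract does not state the PQ Index formula, so the sketch below uses a generic norm-ratio sparsity score in the same spirit (a weight vector scores higher when its mass concentrates on few entries), together with one-shot magnitude pruning; every name and constant here is illustrative, not the paper's algorithm.

```python
import numpy as np

def norm_ratio_sparsity(w, eps=1e-12):
    """Generic norm-ratio sparsity score in [0, 1):
    1 - ||w||_1 / (sqrt(d) * ||w||_2).
    Near 0 for a constant-magnitude vector; larger when
    the magnitude concentrates on a few coordinates."""
    w = np.abs(np.ravel(w))
    d = w.size
    return 1.0 - np.sum(w) / (np.sqrt(d) * np.linalg.norm(w) + eps)

def magnitude_prune(w, prune_frac):
    """Zero out the smallest-magnitude fraction of weights."""
    w = w.copy()
    k = int(prune_frac * w.size)
    if k > 0:
        idx = np.argsort(np.abs(w))[:k]
        w[idx] = 0.0
    return w

rng = np.random.default_rng(1)
dense = rng.normal(size=1000)          # stand-in for a layer's weights
pruned = magnitude_prune(dense, 0.9)   # keep the largest 10%
print(norm_ratio_sparsity(dense), norm_ratio_sparsity(pruned))
```

An adaptive scheme in the spirit of SAP would track such a score across pruning iterations and stop or adjust the pruning ratio when the score's trend reverses.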

Keywords:

Deep model pruning

Sparsity index

Provable Identifiability of Two-Layer ReLU Neural Networks via LASSO Regularization

AI Foundations · Journal paper
Gen Li, Ganghua Wang, Jie Ding
IEEE Transactions on Information Theory
Publication year: 2023

Abstract:

LASSO regularization is a popular regression tool that enhances the prediction accuracy of statistical models by performing variable selection through the l1 penalty; it was initially formulated for the linear model and its variants. In this paper, we extend the territory of LASSO to the neural network model, a popular and powerful nonlinear regression framework. Specifically, given a neural network whose output y depends only on a small subset S of the input x, we prove that the LASSO estimator can stably reconstruct the neural network and identify S when the number of samples scales logarithmically with the input dimension. This challenging regime is well understood for linear models but has barely been studied for neural networks. Our theory builds on an extended Restricted Isometry Property (RIP)-based analysis framework for two-layer ReLU neural networks, which may be of independent interest in other LASSO or neural network settings. Based on this result, we advocate a neural network-based variable selection method. Experiments on simulated and real-world datasets show the promising performance of the variable selection approach compared with existing techniques.
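As a toy illustration of the advocated idea, the sketch below trains a two-layer ReLU network with an l1 penalty by plain subgradient descent and reads off selected inputs from the norms of the input-layer weight rows; the data, hyperparameters, and threshold are illustrative assumptions, not the paper's estimator or settings.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 400, 30, 16        # samples, input dimension, hidden width
X = rng.normal(size=(n, d))
# The response depends only on the first three inputs.
y = np.maximum(X[:, 0] + X[:, 1], 0.0) - X[:, 2] + 0.1 * rng.normal(size=n)

W = 0.1 * rng.normal(size=(d, m))   # input-to-hidden weights
v = 0.1 * rng.normal(size=m)        # hidden-to-output weights
lam, lr = 2e-3, 1e-2                # l1 penalty and step size (illustrative)

def objective(W, v):
    H = np.maximum(X @ W, 0.0)
    return 0.5 * np.mean((H @ v - y) ** 2) + lam * (np.abs(W).sum() + np.abs(v).sum())

obj0 = objective(W, v)
for _ in range(3000):                # plain l1-subgradient descent
    H = np.maximum(X @ W, 0.0)
    r = H @ v - y
    v -= lr * (H.T @ r / n + lam * np.sign(v))
    W -= lr * (X.T @ ((r[:, None] * v) * (H > 0)) / n + lam * np.sign(W))
obj1 = objective(W, v)

# Select inputs whose rows of W retain a non-negligible l2 norm.
row_norms = np.linalg.norm(W, axis=1)
selected = np.where(row_norms > 0.1 * row_norms.max())[0]
print(sorted(selected.tolist()))
```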

Keywords:

Lasso

Identifiability

Neural network

Nonlinear regression

Variable selection

Personalized Federated Recommender Systems with Private and Partially Federated AutoEncoders

AI Safety · Conference paper
Qi Le, Enmao Diao, Xinran Wang, Ali Anwar, Vahid Tarokh, Jie Ding
Asilomar Conference on Signals, Systems, and Computers (Asilomar)
Publication year: 2023

Abstract:

Recommender Systems (RSs) have become increasingly important in many application domains, such as digital marketing. Conventional RSs often need to collect users' data, centralize them on the server side, and form a global model to generate reliable recommendations. However, they suffer from two critical limitations: a personalization problem, in that traditionally trained RSs may not be customized for individual users, and a privacy problem, in that directly sharing user data is discouraged. We propose Personalized Federated Recommender Systems (PersonalFR), which introduces a personalized autoencoder-based recommendation model with Federated Learning (FL) to address these challenges. PersonalFR guarantees that each user can learn a personal model from the local dataset and other participating users' data without sharing local data, data embeddings, or models. PersonalFR consists of three main components: AutoEncoder-based RSs (ARSs) that learn the user-item interactions, Partially Federated Learning (PFL) that updates the encoder locally and aggregates the decoder on the server side, and Partial Compression (PC) that only computes and transmits active model parameters. Extensive experiments on two real-world datasets demonstrate that PersonalFR can achieve private and personalized performance comparable to that of a model trained by centralizing all users' data. Moreover, PersonalFR requires significantly less computation and communication overhead than standard FL baselines.
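The partially federated design described above (encoders stay local, decoders are aggregated on the server) can be caricatured in a few lines; the dictionary layout and the `aggregate_decoders` helper are hypothetical names for illustration, not the paper's API.

```python
import numpy as np

rng = np.random.default_rng(0)
n_clients, d_in, d_latent = 4, 8, 3

# Each client holds a private encoder and a decoder with shared structure.
clients = [
    {
        "encoder": rng.normal(size=(d_in, d_latent)),
        "decoder": rng.normal(size=(d_latent, d_in)),
    }
    for _ in range(n_clients)
]

def aggregate_decoders(clients):
    """Server-side step: average only the decoder weights and
    broadcast the average back; encoders never leave the clients."""
    avg = np.mean([c["decoder"] for c in clients], axis=0)
    for c in clients:
        c["decoder"] = avg.copy()
    return avg

avg = aggregate_decoders(clients)
# After aggregation all decoders agree, while encoders stay personal.
```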

Keywords:

Recommender systems

Data and model privacy

Parallel Assisted Learning

Decentralized AI · Journal paper
Xinran Wang, Jiawei Zhang, Mingyi Hong, Yuhong Yang, Jie Ding
IEEE Transactions on Signal Processing
Publication year: 2023

Abstract:

In the era of big data, a population’s multimodal data are often collected and preserved by different business and government entities. These entities often have their local machine learning data, models, and tasks that they cannot share with others. Meanwhile, an entity often needs to seek assistance from others to enhance its learning quality without sharing proprietary information. How can an entity be assisted while it is assisting others? We develop a general method called parallel assisted learning (PAL) that applies to the context where entities perform supervised learning and can collate their data according to a common data identifier. Under the PAL mechanism, a learning entity that receives assistance is obligated to assist others without the need to reveal any entity’s local data, model, and learning objective. Consequently, each entity can significantly improve its particular task. The applicability of the proposed approach is demonstrated by data experiments.

Keywords:

Assisted learning

Incentive

Model Privacy: A Unified Framework to Understand Model Stealing Attack and Defense

AI Safety · Manuscript
Ganghua Wang, Yuhong Yang, Jie Ding
Manuscript under review
Publication year: 2023

Abstract:

The security of machine learning models against adversarial attacks has become an increasingly important problem in modern application scenarios such as machine-learning-as-a-service and collaborative learning. The model stealing attack is a particular threat that aims to reverse-engineer a general learned model (e.g., a server-based API, an information exchange protocol, an on-chip AI architecture) from only a tiny number of query-response interactions. Consequently, the attack may steal a proprietary model in a manner that is much more cost-effective than the model owner's original training cost. Many model-stealing attack and defense strategies have been proposed with good empirical success. However, most existing works are heuristic, limited in evaluation metrics, and imprecise in characterizing loss and gain. This work presents a unified conceptual framework, called model privacy, for understanding and quantifying model stealing attacks and defenses. Model privacy encapsulates the foundational tradeoffs regarding the usability and vulnerability of a learned model's functionality. Based on the developed concepts, we then develop fundamental limits on privacy-utility tradeoffs and their implications in various machine learning problems (e.g., those based on linear functions, polynomials, reproducing kernels, and neural networks). The studied problems are also interesting from a theoretical perspective, as a model owner may maneuver multiple query responses jointly to maximally enhance model privacy, violating the data independence assumption that plays a critical role in classical learning theory. For example, we show that by breaking independence, a model owner can simultaneously attain a slight utility loss and a much larger privacy gain, a desirable property not achievable in independent data regimes.

Keywords:

Model-stealing attack and defense
Privacy

Information Criteria for Model Selection

AI Foundations · Journal paper
Jiawei Zhang, Yuhong Yang, Jie Ding
WIREs Computational Statistics (invited article)
Publication year: 2023

Abstract:

The rapid development of modeling techniques has brought many opportunities for data-driven discovery and prediction. However, this also leads to the challenge of selecting the most appropriate model for any particular data task. Information criteria, such as the Akaike information criterion (AIC) and Bayesian information criterion (BIC), have been developed as a general class of model selection methods with profound connections with foundational thoughts in statistics and information theory. Many perspectives and theoretical justifications have been developed to understand when and how to use information criteria, which often depend on particular data circumstances. This review article will revisit information criteria by summarizing their key concepts, evaluation metrics, fundamental properties, interconnections, recent advancements, and common misconceptions to enrich the understanding of model selection in general.
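As a quick refresher on the two criteria discussed in the article, the sketch below computes AIC = 2k − 2 ln L̂ and BIC = k ln n − 2 ln L̂ for Gaussian linear regression on simulated data, then picks the candidate model minimizing each; the simulation itself is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X_full = rng.normal(size=(n, 5))
# The true model uses only the first two predictors.
y = X_full[:, 0] - 2 * X_full[:, 1] + rng.normal(scale=0.5, size=n)

def gaussian_ic(X, y):
    """Return (AIC, BIC) for OLS with Gaussian errors."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    # Maximized log-likelihood up to additive constants: -(n/2) log(rss/n)
    loglik = -0.5 * n * np.log(rss / n)
    return 2 * k - 2 * loglik, k * np.log(n) - 2 * loglik

# Nested candidates: use the first j predictors, j = 1..5.
scores = {j: gaussian_ic(X_full[:, :j], y) for j in range(1, 6)}
best_aic = min(scores, key=lambda j: scores[j][0])
best_bic = min(scores, key=lambda j: scores[j][1])
print(best_aic, best_bic)
```

Consistent with the review's discussion, BIC's heavier ln n penalty makes it lean toward the smaller true model, while AIC may admit an extra predictor.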

Keywords:

Modeling Methods

Model Selection

Information Theoretic Methods

Exploring Gradient Oscillation in Deep Neural Network Training

AI Foundations · Conference paper
Chedi Morchdi, Yi Zhou, Jie Ding and Bei Wang
59th Annual Allerton Conference on Communication, Control, and Computing (Allerton)
Publication year: 2023

Abstract:

Understanding optimization in deep learning is a fundamental problem. Many previous works argued that gradient descent stably trains deep networks, yet recent work empirically discovered that the training is actually at the edge of stability. In this work, we take one step further toward exploring the instability of gradient descent in training deep networks. Specifically, by training various modern deep networks with gradient descent, we empirically show that most of the optimization progress is achieved with oscillating gradients, i.e., gradients that are highly negatively correlated in adjacent iterations. Moreover, we observe that such gradient oscillation (GO) has several fundamental properties: (i) GO appears in different training stages for networks with different architectures; (ii) under a large learning rate, GO is consistently observed across all the layers of the networks; and (iii) under a small learning rate, GO is more substantial in the input layers than in the output layers. Our discoveries suggest that GO is an essential and invariant feature in training different types of neural networks, and they may inspire new optimizer designs.

Keywords:

Deep learning theory

Gradient descent

Learning rate

Explainable multi-task learning for multi-modality biological data analysis

AI Scalability · Journal paper
X. Tang, J. Zhang, Y. He, X. Zhang, Z. Lin, S. Partarrieu, E. Hanna, Z. Ren, H. Shen, Y. Yang, X. Wang, N. Li, J. Ding, J. Liu
Nature Communications (Editors’ Highlight)
Publication year: 2023

Abstract:

Current biotechnologies can simultaneously measure multiple high-dimensional modalities (e.g., RNA, DNA accessibility, and protein) from the same cells. A combination of different analytical tasks (e.g., multi-modal integration and cross-modal analysis) is required to comprehensively understand such data, inferring how gene regulation drives biological diversity and functions. However, current analytical methods are designed to perform a single task, only providing a partial picture of the multi-modal data. Here, we present UnitedNet, an explainable multi-task deep neural network capable of integrating different tasks to analyze single-cell multi-modality data. Applied to various multi-modality datasets (e.g., Patch-seq, multiome ATAC + gene expression, and spatial transcriptomics), UnitedNet demonstrates similar or better accuracy in multi-modal integration and cross-modal prediction compared with state-of-the-art methods. Moreover, by dissecting the trained UnitedNet with the explainable machine learning algorithm, we can directly quantify the relationship between gene expression and other modalities with cell-type specificity. UnitedNet is a comprehensive end-to-end framework that could be broadly applicable to single-cell multi-modality biology. This framework has the potential to facilitate the discovery of cell-type-specific regulation kinetics across transcriptomics and other modalities.

Keywords:

AI for healthcare

Deep learning

Single-cell biology

Demystifying Poisoning Backdoor Attacks from a Statistical Perspective

AI Safety · Manuscript
Xun Xian, Ganghua Wang, Jayanth Srinivasa, Ashish Kundu, Xuan Bi, Mingyi Hong, Jie Ding
Manuscript under review
Publication year: 2023

Abstract:

The growing dependence on machine learning in real-world applications emphasizes the importance of understanding and ensuring its safety. Backdoor attacks pose a significant security risk due to their stealthy nature and potentially serious consequences. Such attacks involve embedding triggers within a learning model with the intention of causing malicious behavior when an active trigger is present, while maintaining regular functionality without it. This paper evaluates the effectiveness of any backdoor attack incorporating a constant trigger by establishing tight lower and upper bounds on the performance of the compromised model on both clean and backdoor test data. The developed theory answers a series of fundamental but previously unsolved problems, including (1) what are the determining factors for a backdoor attack's success, (2) what is the most effective backdoor attack, and (3) when will a human-imperceptible trigger succeed. The experimental outcomes corroborate the established theory.

Keywords:

Adversarial learning

Backdoor attack

Statistical analysis

Assisted Learning for Organizations with Limited Imbalanced Data

AI Foundations · Decentralized AI · Journal paper
Cheng Chen, Jiaying Zhou, Jie Ding, Yi Zhou
Transactions on Machine Learning Research
Publication year: 2023

Abstract:

In the era of big data, many big organizations are integrating machine learning into their work pipelines to facilitate data analysis. However, the performance of their trained models is often restricted by the limited and imbalanced data available to them. In this work, we develop an assisted learning framework to help organizations improve their learning performance. The organizations have sufficient computation resources but are subject to stringent data-sharing and collaboration policies. Their limited imbalanced data often cause biased inference and sub-optimal decision-making. In assisted learning, an organizational learner purchases assistance service from an external service provider and aims to enhance its model performance within only a few assistance rounds. We develop effective stochastic training algorithms for both assisted deep learning and assisted reinforcement learning. Different from existing distributed algorithms that need to transmit gradients or models frequently, our framework allows the learner to only occasionally share information with the service provider, yet still obtain a model that achieves near-oracle performance as if all the data were centralized.

Keywords:

Assisted Learning

Imbalanced Data

Adaptive Continual Learning: Rapid Adaptation and Knowledge Refinement

AI Scalability · Manuscript
Jin Du, Yuhong Yang, Jie Ding
Manuscript under review
Publication year: 2023

Abstract:

Continual learning (CL) is an emerging research area aiming to emulate human learning throughout a lifetime. Most existing CL approaches primarily focus on mitigating catastrophic forgetting, a phenomenon where performance on old tasks declines while learning new ones. However, human learning involves not only re-learning knowledge but also quickly recognizing the current environment, recalling related knowledge, and refining it for improved performance. In this work, we introduce a new problem setting, Adaptive CL, which captures these aspects in an online, recurring task environment without explicit task boundaries or identities. We propose the LEARN algorithm to efficiently explore, recall, and refine knowledge in such environments. We provide theoretical guarantees from two perspectives: online prediction with tight regret bounds and asymptotic consistency of knowledge. Additionally, we present a scalable implementation that requires only first-order gradients for training deep learning models. Our experiments demonstrate that the LEARN algorithm is highly effective in exploring, recalling, and refining knowledge in adaptive CL environments, resulting in superior performance compared to competing methods.

Keywords:

Continual learning

Online streaming data


A Unified Framework for Inference-Stage Backdoor Defenses

AI Safety · Manuscript
Xun Xian, Ganghua Wang, Jayanth Srinivasa, Ashish Kundu, Xuan Bi, Mingyi Hong, Jie Ding
Manuscript under review
Publication year: 2023

Abstract:

Backdoor attacks involve inserting poisoned samples during training, resulting in a model containing a hidden backdoor that can trigger specific behaviors without impacting performance on normal samples. These attacks are challenging to detect, as the backdoored model appears normal until activated by the backdoor trigger, rendering them particularly stealthy. In this study, we devise a unified inference-stage detection framework to defend against backdoor attacks. We first rigorously formulate the inference-stage backdoor detection problem, encompassing various existing methods, and discuss several challenges and limitations. We then propose a framework with provable guarantees on the false positive rate, i.e., the probability of misclassifying a clean sample. Further, we derive the most powerful detection rule that maximizes the detection power, namely the rate of accurately identifying backdoor samples, for a given false positive rate under classical learning scenarios. Based on the theoretically optimal detection rule, we suggest a practical and effective approach for real-world applications based on the latent representations of backdoored deep nets. We extensively evaluate our method on 12 different backdoor attacks using Computer Vision (CV) and Natural Language Processing (NLP) benchmark datasets. The experimental findings align with our theoretical results. Our method significantly surpasses state-of-the-art defenses, e.g., achieving up to a 300% improvement in detection power, as evaluated by AUCROC, against advanced adaptive backdoor attacks.
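The false-positive-rate guarantee described above can be instantiated, in spirit, by calibrating a detection score on held-out clean samples and thresholding at its empirical (1 − α) quantile; the Gaussian scores below are synthetic placeholders, not the paper's latent-representation detector.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic detector scores: clean inputs score low, backdoored inputs
# score high (placeholder for a score computed from latent features).
clean_cal = rng.normal(0.0, 1.0, size=2000)      # clean calibration set
clean_test = rng.normal(0.0, 1.0, size=5000)
backdoor_test = rng.normal(4.0, 1.0, size=5000)

alpha = 0.05  # target false positive rate
# Threshold at the empirical (1 - alpha) quantile of clean scores, so the
# false positive rate on fresh clean data is approximately alpha.
threshold = np.quantile(clean_cal, 1 - alpha)

def detect(scores, threshold):
    return scores > threshold

fpr = detect(clean_test, threshold).mean()
power = detect(backdoor_test, threshold).mean()
print(fpr, power)
```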

Keywords:

Backdoor defense

Data poisoning

A Framework for Incentivized Collaborative Learning

Decentralized AI · Manuscript
Xinran Wang, Qi Le, Ahmad Faraz Khan, Jie Ding, Ali Anwar
Manuscript under review
Publication year: 2023

Abstract:

Collaborations among various entities, such as companies, research labs, AI agents, and edge devices, have become increasingly crucial for achieving machine learning tasks that cannot be accomplished by any single entity alone, owing to factors such as security constraints, privacy concerns, and limitations in computation resources. As a result, collaborative learning (CL) research has been gaining momentum. However, a significant challenge in practical applications of CL is how to effectively incentivize multiple entities to collaborate before any collaboration occurs. In this study, we propose ICL, a general framework for incentivized collaborative learning, and provide insights into the critical issue of when and why incentives can improve collaboration performance. Furthermore, we show the broad applicability of ICL to specific cases in federated learning, assisted learning, and multi-armed bandits, with both theory and experimental results.

Keywords:

Collaborative learning

Incentives

Understanding Model Extraction Games

AI Safety · Conference paper
Xun Xian, Mingyi Hong, Jie Ding
2022 IEEE 4th International Conference on Trust, Privacy and Security in Intelligent Systems, and Applications (TPS-ISA)
Publication year: 2022

Abstract:

The privacy of machine learning models has become a significant concern in many emerging Machine-Learning-as-a-Service applications, where prediction services based on well-trained models are offered to users via the pay-per-query scheme. However, the lack of a defense mechanism can impose a high risk on the privacy of the server's model, since an adversary could efficiently steal the model by querying only a few 'good' data points. The game between a server's defense and an adversary's attack inevitably leads to an arms race dilemma, as commonly seen in Adversarial Machine Learning. To study the fundamental tradeoffs between model utility from a benign user's view and privacy from an adversary's view, we develop new metrics to quantify such tradeoffs, analyze their theoretical properties, and develop an optimization problem to understand the optimal adversarial attack and defense strategies. The developed concepts and theory match the empirical findings on the 'equilibrium' between privacy and utility. In terms of optimization, the key ingredient that enables our results is a unified representation of the attack-defense problem as a min-max bi-level problem. The developed results are demonstrated by examples and empirical experiments.


Targeted Cross-Validation

AI Foundations · Journal paper
Jiawei Zhang, Jie Ding, Yuhong Yang
Bernoulli
Publication year: 2022

Abstract:

In many applications, we have access to the complete dataset but are only interested in prediction over a particular region of the predictor variables. A standard approach is to find the globally best modeling method from a set of candidate methods. However, it is perhaps rare in reality that one candidate method is uniformly better than the others. A natural approach for this scenario is to apply a weighted L2 loss in performance assessment to reflect the region-specific interest. We propose targeted cross-validation (TCV) to select models or procedures based on a general weighted L2 loss. We show that TCV is consistent in selecting the best-performing candidate under the weighted L2 loss. Experimental studies demonstrate the use of TCV and its potential advantage over global CV or the approach of using only local data for modeling a local region.

Previous investigations of CV have relied on the condition that, when the sample size is large enough, the ranking of two candidates stays the same. However, in many applications with changing data-generating processes or highly adaptive modeling methods, the relative performance of the methods is not static as the sample size varies. Even with a fixed data-generating process, it is possible that the ranking of two methods switches infinitely many times. In this work, we broaden the concept of selection consistency by allowing the best candidate to switch as the sample size varies, and we then establish the consistency of TCV. This flexible framework can be applied to high-dimensional and complex machine learning scenarios where the relative performances of modeling procedures are dynamic.
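A minimal sketch of selection under a weighted L2 loss, with a weight function concentrating on a region of interest near zero; the candidate models, weight function, and fold scheme are illustrative choices, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(-3, 3, size=n)
y = np.sin(x) + 0.3 * rng.normal(size=n)

def weight(x):
    """Region of interest: emphasize predictions near x = 0."""
    return np.exp(-x ** 2)

def fit_poly(deg):
    def model(x_tr, y_tr):
        coef = np.polyfit(x_tr, y_tr, deg)
        return lambda x_new: np.polyval(coef, x_new)
    return model

def targeted_cv(model, x, y, n_folds=5):
    """Average the weighted L2 loss over held-out folds."""
    idx = rng.permutation(len(x))
    folds = np.array_split(idx, n_folds)
    losses = []
    for te in folds:
        tr = np.setdiff1d(idx, te)
        pred = model(x[tr], y[tr])(x[te])
        w = weight(x[te])
        losses.append(np.sum(w * (y[te] - pred) ** 2) / np.sum(w))
    return np.mean(losses)

candidates = {deg: fit_poly(deg) for deg in (1, 3, 5)}
scores = {deg: targeted_cv(m, x, y) for deg, m in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```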

Keywords:

Consistency

Cross-validation

Model selection

Regression

SemiFL: Communication Efficient Semi-Supervised Federated Learning with Unlabeled Clients

AI Foundations · Conference paper · Decentralized AI
Enmao Diao, Jie Ding, Vahid Tarokh
Conference on Neural Information Processing Systems (NeurIPS)
Publication year: 2022

Abstract:

Federated Learning allows training machine learning models by using the computation and private data resources of many distributed clients such as smartphones and IoT devices. Most existing works on Federated Learning (FL) assume the clients have ground-truth labels. However, in many practical scenarios, clients may be unable to label task-specific data, e.g., due to a lack of expertise. This work considers a server that hosts a labeled dataset and wishes to leverage clients with unlabeled data for supervised learning. We propose a new Federated Learning framework, referred to as SemiFL, to address Semi-Supervised Federated Learning (SSFL). In SemiFL, clients have completely unlabeled data, while the server has a small amount of labeled data. SemiFL is communication efficient since it separates the training of server-side supervised data and client-side unsupervised data. We demonstrate several strategies of SemiFL that enhance efficiency and prediction and develop intuitions for why they work. In particular, we provide a theoretical understanding of the use of strong data augmentation for Semi-Supervised Learning (SSL), which can be interesting in its own right. Extensive empirical evaluations demonstrate that our communication-efficient method can significantly improve the performance of a labeled server with unlabeled clients. Moreover, we demonstrate that SemiFL can outperform many existing SSFL methods and perform competitively with state-of-the-art FL and centralized SSL results. For instance, in standard communication-efficient scenarios, our method can achieve 93% accuracy on the CIFAR10 dataset with only 4,000 labeled samples at the server. This accuracy is only 2% below the result trained with 50,000 fully labeled samples, and it improves by about 30% over existing SSFL methods in the communication-efficient setting.

Keywords:

Federated Learning

Semi-Supervised Learning

Data augmentation theory

Unlabeled data

Self-Aware Personalized Federated Learning

Conference paper · Decentralized AI
Huili Chen, Jie Ding, Eric Tramel, Shuang Wu, Anit Kumar Sahu, Salman Avestimehr, Tao Zhang
Conference on Neural Information Processing Systems (NeurIPS)
Publication year: 2022

Abstract:

In the context of personalized federated learning (FL), the critical challenge is to balance local model improvement and global model tuning when the personal and global objectives may not be exactly aligned. Inspired by Bayesian hierarchical models, we develop a self-aware personalized FL method where each client can automatically balance the training of its local personal model and the global model that implicitly contributes to other clients' training. Such a balance is derived from inter-client and intra-client uncertainty quantification: a larger inter-client variation implies that more personalization is needed. Correspondingly, our method uses uncertainty-driven local training steps and an uncertainty-driven aggregation rule instead of conventional local fine-tuning and sample-size-based aggregation. With experimental studies on synthetic data, Amazon Alexa audio data, and public datasets such as MNIST, FEMNIST, and Sent140, we show that our proposed method can achieve significantly improved personalization performance compared with the existing counterparts.

Keywords:

Bayesian hierarchical model

Personalized federated learning

Regression with Set-Valued Categorical Predictors

AI Safety · Journal paper
Ganghua Wang, Jie Ding and Yuhong Yang
Statistica Sinica
Publication year: 2022

Abstract:

We address the regression problem with a new form of data that arises from data privacy applications. Instead of point values, the observed explanatory variables are subsets containing each individual's original value. Classical regression analyses such as least squares are not applicable, since the set-valued predictors only carry partial information about the original values. We propose a computationally efficient subset least squares method to perform regression for such data. We establish upper bounds on the prediction loss and risk in terms of the subset structure, the model structure, and the data dimension. The error rates are shown to be optimal under some common situations. Furthermore, we develop a model selection method to identify the most appropriate model for prediction. Experimental results on both simulated and real-world datasets demonstrate the promising performance of the proposed method.

Keywords:

Model selection

Regression

Set-valued data

Mismatched Supervised Learning

AI Safety · Conference paper
Xun Xian, Mingyi Hong, Jie Ding
2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Publication year: 2022

Abstract:

Supervised learning scenarios where labels and features are possibly mismatched have been an emerging concern in machine learning applications. For example, in socioeconomic studies, researchers often need to align heterogeneous data from multiple sources to the same entities without a unique identifier. Such a mismatch problem can significantly affect learning performance if it is not appropriately addressed. Due to the combinatorial nature of the mismatch problem, existing methods are often designed for small datasets and simple linear models and are not scalable to large-scale datasets and complex models. In this paper, we first present a new formulation of the mismatch problem that admits continuous optimization and gradient-based methods. Moreover, we develop a computation- and memory-efficient method to process complex data and models. Empirical studies on synthetic and real-world data show significantly better performance of the proposed algorithms than state-of-the-art methods.

Keywords:

Label mismatch

Supervised learning

Meta Clustering for Collaborative Learning

Decentralized AIJournal paper
Chenglong Ye, Reza Ghanadan, Jie Ding
Journal of Computational and Graphical Statistics
Publication year: 2022

Abstract:

A growing number of learning scenarios involve a set of learners or analysts, each equipped with a unique dataset and algorithm, who may collaborate to enhance their learning performance. From a particular learner’s perspective, a careless collaboration with other, task-irrelevant learners is likely to incur modeling error. A crucial challenge is to search for the most appropriate collaborators so that their data and modeling resources can be effectively leveraged. Motivated by this, we propose to study the problem of ‘meta clustering,’ where the goal is to identify subsets of relevant learners whose collaboration will improve each learner’s performance. In particular, we study the scenario where each learner performs a supervised regression, and the meta clustering aims to categorize the underlying supervised relations (between responses and predictors) instead of the private raw data. We propose a general method named Select-Exchange-Cluster (SEC) for performing such a clustering. Our method is computationally efficient, as it does not require each learner to exchange their raw data. We prove that the SEC method can accurately cluster the learners into appropriate collaboration sets according to their underlying regression functions. Synthetic and real data examples show the desired performance and wide applicability of SEC across various learning tasks.

Keywords:

Distributed computing
Fairness
Meta clustering
Regression

Interval Privacy: A Framework for Privacy-Preserving Data Collection

AI FoundationsAI SafetyJournal paper
Jie Ding, Bangjun Ding
IEEE Transactions on Signal Processing
Publication year: 2022

Abstract:

The emerging public awareness and government regulations of data privacy motivate new paradigms of collecting and analyzing data that are transparent and acceptable to data owners. We present a new concept of privacy and the corresponding data formats, mechanisms, and theories for privatizing data during data collection. The new notion, named interval privacy, requires the conditional distribution of the raw data given the privatized data to be the same as its unconditional distribution over a nontrivial support set. Correspondingly, the proposed privacy mechanism records each data value as a random interval (or, more generally, a range) containing it. The proposed interval privacy mechanisms can be easily deployed through survey-based data collection interfaces, e.g., by asking a respondent whether its data value is within a randomly generated range. Another unique feature of interval mechanisms is that they obfuscate the truth but do not perturb it. Using a narrowed range to convey information is complementary to the popular paradigm of perturbing data. Also, the interval mechanisms can generate progressively refined information at the discretion of individuals, naturally leading to privacy-adaptive data collection. We develop different aspects of theory, such as composition, robustness, distribution estimation, and regression learning from interval-valued data. Interval privacy provides a new perspective on human-centric data privacy in which individuals have a perceptible, transparent, and simple way of sharing sensitive data.
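
The survey-style interval mechanism described above can be sketched as a short loop: repeatedly ask whether the value lies above a randomly generated cut point and narrow the reported range accordingly, so the output is a random interval that provably contains the truth without perturbing it. The function below is an illustrative sketch of that idea, not the paper's exact mechanism; all names are hypothetical.

```python
import random

def interval_mechanism(x, low, high, rounds=2, rng=random):
    """Report x as a random interval containing it.
    Each round asks: "is x above a uniformly drawn cut point?" and
    narrows the reported interval accordingly; more rounds yield
    progressively refined information at the respondent's discretion."""
    lo, hi = low, high
    for _ in range(rounds):
        cut = rng.uniform(lo, hi)   # randomly generated range endpoint
        if x > cut:
            lo = cut                # respondent answers "above"
        else:
            hi = cut                # respondent answers "not above"
    return lo, hi

random.seed(1)
lo, hi = interval_mechanism(37.2, 0.0, 100.0, rounds=3)
```

By construction the returned interval always contains the raw value, illustrating how the mechanism obfuscates the truth rather than perturbing it.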

Keywords:

Data collection
Data privacy
Interval mechanism
Local privacy

 

GAL: Gradient Assisted Learning for Decentralized Multi-Organization Collaborations

AI SafetyAI ScalabilityConference paperDecentralized AI
Enmao Diao, Jie Ding, Vahid Tarokh
36th Conference on Neural Information Processing Systems (NeurIPS 2022)
Publication year: 2022

Abstract:

Collaborations among multiple organizations, such as financial institutions, medical centers, and retail markets, in decentralized settings are crucial to providing improved service and performance. However, the underlying organizations may have little interest in sharing their local data, models, and objective functions. These requirements have created new challenges for multi-organization collaboration. In this work, we propose Gradient Assisted Learning (GAL), a new method for multiple organizations to assist each other in supervised learning tasks without sharing local data, models, and objective functions. In this framework, all participants collaboratively optimize the aggregate of local loss functions, and each participant autonomously builds its own model by iteratively fitting the gradients of the overarching objective function. We also provide asymptotic convergence analysis and practical case studies of GAL. Experimental studies demonstrate that GAL can achieve performance close to centralized learning when all data, models, and objective functions are fully disclosed.
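
The gradient-fitting loop can be illustrated for squared loss, where the negative gradient of the aggregate objective at the current predictions is simply the residual, and organizations take turns fitting it with their own local learners on their own private features. This is a hedged sketch under simplified assumptions (two organizations, one feature each, linear fits), not the paper's full algorithm.

```python
import random

def fit_linear(x, r):
    # Least-squares fit r ~ a + b*x on one organization's private feature.
    n = len(x)
    mx, mr = sum(x) / n, sum(r) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (ri - mr) for xi, ri in zip(x, r)) / sxx
    a = mr - b * mx
    return lambda xs: [a + b * xi for xi in xs]

random.seed(0)
n = 300
x1 = [random.gauss(0, 1) for _ in range(n)]   # features private to org 1
x2 = [random.gauss(0, 1) for _ in range(n)]   # features private to org 2
y = [2 * u + 3 * v + random.gauss(0, 0.1) for u, v in zip(x1, x2)]

pred = [0.0] * n
losses = []
for t in range(20):
    # With squared loss, the negative gradient of the aggregate objective at
    # the current predictions is the residual; organizations alternate in
    # fitting it on their own features and adding the fitted values.
    resid = [yi - pi for yi, pi in zip(y, pred)]
    feats = x1 if t % 2 == 0 else x2
    update = fit_linear(feats, resid)(feats)
    pred = [pi + ui for pi, ui in zip(pred, update)]
    losses.append(sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / n)
```

Only residuals (task-specific statistics) are exchanged, never raw features or models, and the training loss decreases toward the centralized fit.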

Keywords:

Assisted learning

Privacy

FedNAS: Federated Deep Learning via Neural Architecture Search

Conference paperDecentralized AI
Chaoyang He, Erum Mushtaq, Jie Ding, Salman Avestimehr
Manuscript
Publication year: 2022

Abstract:

Federated Learning (FL) is an effective learning framework used when data cannot be centralized due to privacy, communication costs, and regulatory restrictions. While there have been many algorithmic advances in FL, significantly less effort has been devoted to model development, and most works in FL employ predefined model architectures discovered in the centralized environment. However, these predefined architectures may not be the optimal choice for the FL setting, since the data at FL users are often not independent and identically distributed (non-IID). This well-known challenge in FL has often been studied at the optimization layer. Instead, we advocate for a different (and complementary) approach. We propose Federated Neural Architecture Search (FedNAS) for automating the model design process in FL. More specifically, FedNAS enables scattered workers to search for a better architecture in a collaborative fashion to achieve higher accuracy. Beyond automating and improving FL model design, FedNAS also provides a new paradigm for personalized FL by customizing not only the model weights but also the neural architecture of each user. As such, we also compare FedNAS with representative personalized FL methods, including perFedAvg (based on meta-learning), Ditto (bi-level optimization), and local fine-tuning. Our experiments on a non-IID dataset show that the architecture searched by FedNAS can outperform the manually predefined architecture as well as existing personalized FL methods. To facilitate further research and real-world deployment, we also build a realistic distributed training system for FedNAS, which will be publicly available and regularly maintained.

 

 

L1 Regularization in Two-Layer Neural Networks

AI FoundationsJournal paper
Gen Li, Yuantao Gu, Jie Ding
IEEE Signal Processing Letters
Publication year: 2021

Abstract:

A crucial problem of neural networks is to select an architecture that strikes appropriate tradeoffs between underfitting and overfitting. This work shows that L1 regularization for two-layer neural networks can control the generalization error and sparsify the input dimension. In particular, with an appropriate L1 regularization on the output layer, the network can produce a tight statistical risk. Moreover, an appropriate L1 regularization on the input layer leads to a risk bound that does not involve the input data dimension. The results also indicate that training a wide neural network with a suitable regularization provides an alternative bias-variance tradeoff to selecting from a candidate set of neural networks. Our analysis is based on a new integration of dimension-based and norm-based complexity analysis to bound the generalization error.
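
As a toy illustration of the two penalties discussed above, the sketch below evaluates a two-layer ReLU network's squared loss with one L1 penalty on the output-layer weights and another on the input-layer weights. All names and values are hypothetical, and this only computes the penalized objective; it is not the paper's analysis.

```python
def two_layer(x, W, v):
    # f(x) = sum_j v[j] * relu(<W[j], x>): a two-layer (one hidden layer) net.
    return sum(vj * max(0.0, sum(wji * xi for wji, xi in zip(Wj, x)))
               for Wj, vj in zip(W, v))

def l1_penalized_loss(data, W, v, lam_out, lam_in):
    # Mean squared error plus the two L1 terms discussed in the abstract:
    # lam_out * ||v||_1 (output layer, controls the statistical risk) and
    # lam_in * sum_j ||W[j]||_1 (input layer, sparsifies input dimensions).
    mse = sum((y - two_layer(x, W, v)) ** 2 for x, y in data) / len(data)
    pen_out = lam_out * sum(abs(vj) for vj in v)
    pen_in = lam_in * sum(abs(w) for Wj in W for w in Wj)
    return mse + pen_out + pen_in

W = [[0.5, -1.0], [0.0, 0.3]]            # hypothetical input-layer weights
v = [1.0, -0.5]                          # hypothetical output-layer weights
data = [([1.0, 2.0], 0.7), ([0.0, 1.0], -0.2)]
loss = l1_penalized_loss(data, W, v, lam_out=0.1, lam_in=0.01)
```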

Keywords:

Generalization error
Regularization
Statistical risk
Two-layer neural network

On Statistical Efficiency in Learning

AI FoundationsJournal paper
Jie Ding, Enmao Diao, Jiawei Zhou, Vahid Tarokh
IEEE Transactions on Information Theory, Volume 67, Issue 4, Pages 2488 - 2506
Publication year: 2021

Abstract:

A central issue of many statistical learning problems is to select an appropriate model from a set of candidate models. Large models tend to inflate the variance (e.g., overfitting), while small models tend to cause biases (e.g., underfitting) for a given fixed dataset. In this work, we address the critical challenge of model selection to strike a balance between model fitting and model complexity, thus gaining reliable predictive power. We consider the task of approaching the theoretical limit of statistical learning, meaning that the selected model has the predictive performance that is as good as the best possible model given a class of potentially misspecified candidate models.

We propose a generalized notion of Takeuchi’s information criterion and prove that the proposed method can asymptotically achieve the optimal out-sample prediction loss under reasonable assumptions. To the best of our knowledge, this is the first proof of the asymptotic property of Takeuchi’s information criterion. Our proof applies to a wide variety of nonlinear models, loss functions, and high dimensionality (in the sense that the models’ complexity can grow with the sample size). The proposed method can be used as a computationally efficient surrogate for leave-one-out cross-validation. Moreover, for modeling streaming data, we propose an online algorithm that sequentially expands the model complexity to enhance selection stability and reduce computation cost. Experimental studies show that the proposed method has desirable predictive power and a much lower computational cost than some popular methods.
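
For reference, the classical form of Takeuchi's information criterion that the paper generalizes can be written as follows (a standard textbook statement, not the paper's generalized version):

```latex
% Classical Takeuchi information criterion; \hat\theta is the MLE, \hat J the
% average negative Hessian of the log-likelihood, and \hat V the average outer
% product of per-sample score vectors:
\mathrm{TIC}
 = -2 \sum_{i=1}^{n} \log p\bigl(y_i \mid \hat\theta\bigr)
   + 2\,\mathrm{tr}\bigl(\hat J^{-1} \hat V\bigr),
\qquad
\hat J = -\frac{1}{n}\sum_{i=1}^{n} \nabla^2_\theta \log p\bigl(y_i \mid \hat\theta\bigr),
\quad
\hat V = \frac{1}{n}\sum_{i=1}^{n} \nabla_\theta \log p\bigl(y_i \mid \hat\theta\bigr)\,
         \nabla_\theta \log p\bigl(y_i \mid \hat\theta\bigr)^{\!\top}.
```

When the model is correctly specified, $\hat J^{-1}\hat V$ is close to the identity and TIC reduces to AIC's $2k$ penalty; the trace term is what makes TIC robust to misspecification.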

Keywords:

Cross-validation
Expert learning
Feature selection
Limit of learning
Model expansion

Model Linkage Selection for Cooperative Learning

Decentralized AIJournal paper
Jiaying Zhou, Jie Ding, Kean Ming Tan, Vahid Tarokh
Journal of Machine Learning Research, 2021
Publication year: 2021

Abstract:

Rapid developments in data collection devices and computation platforms produce a growing number of learners and data modalities in many scientific domains. We consider the setting in which each learner holds a parametric statistical model and a specific data source, with the goal of integrating information across a set of learners to enhance the prediction accuracy of a specific learner. One natural way to integrate information is to build a joint model across a set of learners that shares common parameters of interest. However, the parameter sharing patterns across a set of learners are not known a priori. In this paper, we propose a novel framework for integrating information across a set of learners that is robust against model misspecification and misspecified parameter sharing patterns. The main crux is to sequentially incorporate additional learners that can enhance the prediction accuracy of an existing joint model based on user-specified parameter sharing patterns across a set of learners.

Keywords:

Data integration
Distributed learning
Model linkage selection

Is a Classification Procedure Good Enough? A Goodness-of-Fit Assessment Tool for Classification Learning

AI FoundationsJournal paper
Jiawei Zhang, Jie Ding, Yuhong Yang
Journal of the American Statistical Association
Publication year: 2021

Abstract:

In recent years, many non-traditional classification methods, such as Random Forest, Boosting, and neural networks, have been widely used in applications. Their performance is typically measured in terms of classification accuracy. While the classification error rate and the like are important, they do not address a fundamental question: Is the classification method underfitted? To the best of our knowledge, there is no existing method that can assess the goodness of fit of a general classification procedure. Indeed, the lack of a parametric assumption makes it challenging to construct proper tests. To overcome this difficulty, we propose a methodology called BAGofT that splits the data into a training set and a validation set. First, the classification procedure to assess is applied to the training set, which is also used to adaptively find a data grouping that reveals the most severe regions of underfitting. Then, based on this grouping, we calculate a test statistic by comparing the estimated success probabilities and the actual observed responses from the validation set. The data splitting guarantees that the size of the test is controlled under the null hypothesis, and the power of the test goes to one as the sample size increases under the alternative hypothesis. For testing parametric classification models, the BAGofT has a broader scope than the existing methods since it is not restricted to specific parametric models (e.g., logistic regression). Extensive simulation studies show the utility of the BAGofT when assessing general classification procedures and its strengths over some existing methods when testing parametric classification models.

Keywords:

Goodness-of-fit test

Classification procedure

Adaptive partition

Information Laundering for Model Privacy

AI SafetyConference paper
Xinran Wang, Yu Xiang, Jun Gao, Jie Ding
International Conference on Learning Representations (ICLR), spotlight
Publication year: 2021

Abstract:

In this work, we propose information laundering, a novel framework for enhancing model privacy. Unlike data privacy that concerns the protection of raw data information, model privacy aims to protect an already-learned model that is to be deployed for public use. The private model can be obtained from general learning methods, and its deployment means that it will return a deterministic or random response for a given input query. An information laundered model consists of probabilistic components that deliberately maneuver the intended input and output for queries to the model, so the model’s adversarial acquisition is less likely. Under the proposed framework, we develop an information-theoretic principle to quantify the fundamental tradeoffs between model utility and privacy leakage and derive the optimal design.

Keywords:

Information theory
Model privacy
Optimal privacy-utility tradeoff

HeteroFL: Computation and Communication Efficient Federated Learning for Heterogeneous Clients

Conference paperDecentralized AI
Enmao Diao, Jie Ding, Vahid Tarokh
International Conference on Learning Representations (ICLR)
Publication year: 2021

Abstract:

Federated Learning (FL) is a method of training machine learning models on private data distributed over a large number of possibly heterogeneous clients such as mobile phones and IoT devices. In this work, we propose a new federated learning framework named HeteroFL to address heterogeneous clients equipped with very different computation and communication capabilities. Our solution can enable the training of heterogeneous local models with varying computation complexities and still produce a single global inference model. For the first time, our method challenges the underlying assumption of existing work that local models have to share the same architecture as the global model. We demonstrate several strategies to enhance FL training and conduct extensive empirical evaluations, including five computation complexity levels of three model architectures on three datasets. We show that adaptively distributing subnetworks according to clients’ capabilities is both computation and communication efficient.
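
The core idea of training heterogeneous local models can be sketched by slicing the global weight matrix: a weaker client trains only an upper-left sub-block, and the server averages each entry over the clients whose submatrix covers it. A minimal illustrative sketch with hypothetical names and sizes (the paper's exact slicing is per channel):

```python
def shrink(W, r):
    """Upper-left block of the global weight matrix, with both dimensions
    scaled by the client's capability ratio r: the subnetwork a weak
    client actually trains."""
    rows = max(1, int(len(W) * r))
    cols = max(1, int(len(W[0]) * r))
    return [row[:cols] for row in W[:rows]]

def aggregate(global_shape, updates):
    # Average entry-wise over the clients whose submatrix covers each entry,
    # so strong and weak clients jointly produce one global model.
    m, n = global_shape
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            vals = [u[i][j] for u in updates if i < len(u) and j < len(u[0])]
            if vals:
                out[i][j] = sum(vals) / len(vals)
    return out

W = [[1.0] * 4 for _ in range(4)]   # hypothetical global 4x4 weight matrix
small = shrink(W, 0.5)              # a weak client trains only a 2x2 block
```

Entries covered only by strong clients are updated by them alone, which is how a single global inference model emerges from heterogeneous local architectures.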

Keywords:

Federated learning
Heterogeneous clients

Fisher Auto-Encoders

AI FoundationsConference paper
Khalil Elkhalil, Ali Hasan, Jie Ding, Sina Farsiu, Vahid Tarokh
International Conference on Artificial Intelligence and Statistics (AISTATS)
Publication year: 2021

Abstract:

It has been conjectured that the Fisher divergence is more robust to model uncertainty than the conventional Kullback-Leibler (KL) divergence. This motivates the design of a new class of robust generative auto-encoders (AEs) referred to as Fisher auto-encoders. Our approach is to design Fisher AEs by minimizing the Fisher divergence between the intractable joint distribution of observed data and latent variables and the postulated/modeled joint distribution. In contrast to KL-based variational AEs (VAEs), the Fisher AE can exactly quantify the distance between the true and the model-based posterior distributions. Qualitative and quantitative results are provided on both the MNIST and CelebA datasets, demonstrating the competitive performance of Fisher AEs in terms of robustness compared to other AEs such as VAEs and Wasserstein AEs.

Speech Emotion Recognition with Dual-Sequence LSTM Architecture

Conference paperMiscellaneous
Jianyou Wang, Michael Xue, Ryan Culhane, Enmao Diao, Jie Ding, Vahid Tarokh
International Conference on Acoustics, Speech, & Signal Processing (ICASSP), 2020
Publication year: 2020

Abstract:

Speech Emotion Recognition (SER) has emerged as a critical component of the next generation of human-machine interfacing technologies. In this work, we propose a new dual-level model that predicts emotions based on both MFCC features and mel-spectrograms produced from raw audio signals. Each utterance is preprocessed into MFCC features and two mel-spectrograms at different time-frequency resolutions. A standard LSTM processes the MFCC features, while a novel LSTM architecture, denoted as Dual-Sequence LSTM (DSLSTM), processes the two mel-spectrograms simultaneously. The outputs are later averaged to produce a final classification of the utterance. Our proposed model achieves, on average, a weighted accuracy of 72.7% and an unweighted accuracy of 73.3%—a 6% improvement over current state-of-the-art unimodal models—and is comparable with multimodal models that leverage textual information as well as audio signals.

Keywords:

Dual-Sequence LSTM
MelSpectrogram
Speech emotion recognition
Time series

Large Deviation Principle for the Whittaker 2d Growth Model

ManuscriptMiscellaneous
Jun Gao, Jie Ding
arXiv preprint arXiv:2009.12907
Publication year: 2020

Abstract:

The Whittaker 2d growth model is a triangular continuous Markov diffusion process that appears in many scientific contexts. Establishing a large deviation principle for this 2d process with a scaling factor has been a theoretically intriguing problem. The main challenge lies in the spatiotemporal interactions and dynamics that may depend on potential sample-path intersections. We develop such a principle with a novel rate function. Our approach is based on Schilder’s theorem, the contraction principle, and a special treatment of intersecting sample paths.

Keywords:

Large deviation principle
Markov diffusion process

Dyson Brownian Motion as a Limit of the Whittaker 2d Growth Model

ManuscriptMiscellaneous
Jun Gao, Jie Ding
Manuscript
Publication year: 2020

Abstract:

This paper proves that a class of scaled Whittaker growth models will converge in distribution to the Dyson Brownian motion. A Whittaker 2d growth model is a continuous-time Markov diffusion process embedded on a spatial triangular array. Our result is interesting because each particle in a Whittaker 2d growth model only interacts with its neighboring particles. In contrast, each particle in the Dyson Brownian motion interacts with all the other particles. We provide two different proofs of the main result.

Keywords:

Stochastic differential equations
Dyson Brownian motion

DRASIC: Distributed Recurrent Autoencoder for Scalable Image Compression

AI ScalabilityConference paper
Diao, Enmao and Ding, Jie and Tarokh, Vahid
Data Compression Conference
Publication year: 2020

Abstract:

We propose a new architecture for distributed image compression from a group of distributed data sources. The work is motivated by practical needs of data-driven codec design, low power consumption, robustness, and data privacy. The proposed architecture, which we refer to as Distributed Recurrent Autoencoder for Scalable Image Compression (DRASIC), is able to train distributed encoders and one joint decoder on correlated data sources. Its compression performance is much better than that of training codecs separately. Meanwhile, the performance of our distributed system with 10 distributed sources is only within 2 dB peak signal-to-noise ratio (PSNR) of the performance of a single codec trained with all data sources. We experiment with distributed sources of different correlations and show how well our data-driven methodology matches the Slepian-Wolf Theorem in Distributed Source Coding (DSC). To the best of our knowledge, this is the first data-driven DSC framework for general distributed code design with deep learning.

Keywords:

Codecs

Data compression

Image coding

Recurrent neural nets

Deep Clustering of Compressed Variational Embeddings

AI ScalabilityConference paper
Suya Wu, Enmao Diao, Jie Ding, Vahid Tarokh
Data Compression Conference, 2020
Publication year: 2020

Abstract:

Motivated by the ever-increasing demands for limited communication bandwidth and low-power consumption, we propose a new methodology, named joint Variational Autoencoders with Bernoulli mixture models (VAB), for performing clustering in the compressed data domain. The idea is to reduce the data dimension by Variational Autoencoders (VAEs) and group data representations by Bernoulli mixture models. Once jointly trained for compression and clustering, the model can be decomposed into two parts: a data vendor that encodes the raw data into compressed data, and a data consumer that classifies the received (compressed) data. In this way, the data vendor benefits from data security and communication bandwidth, while the data consumer benefits from low computational complexity. To enable training using the gradient descent algorithm, we propose to use the Gumbel-Softmax distribution to resolve the infeasibility of the backpropagation algorithm when sampling from categorical distributions.
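
The Gumbel-Softmax trick mentioned above draws a differentiable relaxation of a categorical sample by adding Gumbel noise to the logits and applying a temperature-scaled softmax. A minimal stand-alone sketch (not the paper's code; names are illustrative):

```python
import math
import random

def gumbel_softmax(logits, tau=0.5, rng=random):
    """Sample a relaxed one-hot vector: softmax((logits + Gumbel noise) / tau).
    Small tau pushes the output toward a one-hot vector, while keeping the
    sampling step differentiable so backpropagation can pass through it."""
    g = [-math.log(-math.log(rng.random())) for _ in logits]  # Gumbel(0, 1)
    z = [(l + gi) / tau for l, gi in zip(logits, g)]
    m = max(z)                       # subtract max for numerical stability
    e = [math.exp(zi - m) for zi in z]
    s = sum(e)
    return [ei / s for ei in e]

random.seed(0)
y = gumbel_softmax([0.2, 1.5, -0.3], tau=0.5)   # relaxed categorical sample
```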

Keywords:

Unsupervised learning
Variational autoencoder
Bernoulli Mixture Model

Assisted Learning: A Framework for Multi-Organization Learning

Conference paperDecentralized AI
Xun Xian, Xinran Wang, Jie Ding, Reza Ghanadan
Conference on Neural Information Processing Systems (NeurIPS), Spotlight, 2020
Publication year: 2020

Abstract:

In an increasing number of AI scenarios, collaborations among different organizations or agents (e.g., humans, robots, and mobile units) are often essential to accomplish an organization-specific mission. However, to avoid leaking useful and possibly proprietary information, organizations typically enforce stringent security constraints on sharing modeling algorithms and data, which significantly limits collaborations. In this work, we introduce the Assisted Learning framework for organizations to assist each other in supervised learning tasks without revealing any organization’s algorithm, data, or even task.
An organization seeks assistance by broadcasting task-specific but nonsensitive statistics and incorporating others’ feedback in one or more iterations to eventually improve its predictive performance. Theoretical and experimental studies, including real-world medical benchmarks, show that Assisted Learning can often achieve near-oracle learning performance as if data and training processes were centralized.

Keywords:

Assisted AI
Autonomy
MLaaS
Organization’s learning
Privacy

Restricted Recurrent Neural Networks

AI ScalabilityConference paper
Enmao Diao, Jie Ding, Vahid Tarokh
IEEE International Conference on Big Data, 2019
Publication year: 2019

Abstract:

Recurrent Neural Network (RNN) and its variations, such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU), have become standard building blocks for learning online data of sequential nature in many research areas, including natural language processing and speech data analysis. In this paper, we present a new methodology to significantly reduce the number of parameters in RNNs while maintaining performance comparable to, or even better than, that of classical RNNs.

The new proposal, referred to as Restricted Recurrent Neural Network (RRNN), restricts the weight matrices corresponding to the input data and hidden states at each time step to share a large proportion of parameters. The new architecture can be regarded as a compression of its classical counterpart, but it does not require pre-training or sophisticated parameter fine-tuning, both of which are major issues in most existing compression techniques. Experiments on natural language modeling show that, compared with its classical counterpart, the restricted recurrent architecture generally produces comparable results at about a 50% compression rate. In particular, the Restricted LSTM can outperform the classical RNN with even fewer parameters.
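
The parameter savings from sharing entries between the input-to-hidden and hidden-to-hidden matrices can be illustrated with a simple count for a vanilla RNN cell. The sharing scheme below is a hypothetical simplification of the RRNN construction and assumes equal input and hidden dimensions; the paper's restriction is more general.

```python
def rnn_params(d, h):
    # Vanilla RNN cell h_t = tanh(W_x x_t + W_h h_{t-1} + b):
    # W_x is h-by-d, W_h is h-by-h, and the bias has h entries.
    return h * d + h * h + h

def restricted_rnn_params(d, h, share=0.5):
    """Hypothetical count when W_x and W_h share a fraction `share` of their
    entries, with shared entries stored only once. Assumes d == h so
    entry-wise sharing is well defined in this simple sketch."""
    assert d == h
    shared = int(share * h * h)
    return (h * d - shared) + h * h + h

full = rnn_params(128, 128)
restricted = restricted_rnn_params(128, 128, share=0.5)
```

With a 50% sharing fraction the cell stores roughly three-quarters of the original weights, consistent with the compression-rate flavor of the experiments.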

Keywords:

Recurrent neural network
Long short-term memory
Gated recurrent unit

Gradient Information for Representation and Modeling

AI ScalabilityConference paper
Jie Ding, Robert Calderbank, Vahid Tarokh
Conference on Neural Information Processing Systems (NeurIPS), 2019
Publication year: 2019

Abstract:

Motivated by Fisher divergence, we present a new set of information quantities, which we refer to as gradient information. These measures serve as surrogates for classical information measures such as those based on logarithmic loss, Kullback-Leibler divergence, directed Shannon information, etc. in many data-processing scenarios of interest and often provide a significant computational advantage, improved stability, and robustness. As an example, we apply these measures to the Chow-Liu tree algorithm and demonstrate its performance using both synthetic and real data.
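
For reference, the Fisher divergence that motivates these quantities is commonly written as:

```latex
% Fisher divergence between densities p and q; it compares score functions
% (gradients of log-densities) rather than the densities themselves, which
% is what yields the computational advantages described above:
D_{\mathrm F}(p \,\|\, q)
  = \mathbb{E}_{x \sim p}\,
    \bigl\| \nabla_x \log p(x) - \nabla_x \log q(x) \bigr\|^2 .
```

Because only gradients of log-densities appear, normalizing constants cancel, which is the source of the computational advantage over log-loss-based measures.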

Keywords:

Capacity
Fisher divergence
Information
Stability
Chow-Liu tree approximation

Evolutionary Spectra Based on the Multitaper Method with Application to Stationarity Test

AI FoundationsJournal paper
Yu Xiang, Jie Ding, Vahid Tarokh
IEEE Transactions on Signal Processing, 67(5): 1353—1365, 2019
Publication year: 2019

Abstract:

In this work, we propose a new inference procedure for understanding non-stationary processes under the framework of evolutionary spectra developed by Priestley. Among various frameworks of non-stationary modeling, the distinguishing feature of the evolutionary spectra is its focus on the physical meaning of frequency. The classical estimate of the evolutionary spectral density is based on a double-window technique consisting of a short-time Fourier transform and a smoothing. However, smoothing is known to suffer from the so-called bias leakage problem. By incorporating Thomson’s multitaper method that was originally designed for stationary processes, we propose an improved estimate of the evolutionary spectral density and analyze its bias/variance/resolution tradeoff. As an application of the new estimate, we further propose a non-parametric rank-based stationarity test and provide various experimental studies.

Keywords:

Non-stationary processes
Evolutionary spectra
Spectral analysis
Multitaper method
Stationarity test

Bayesian Model Comparison with the Hyvärinen Score: Computation and Consistency

AI ScalabilityJournal paper
Stephane Shao, Pierre E. Jacob, Jie Ding, Vahid Tarokh
Journal of the American Statistical Association, 114(528): 1826—1837, 2019
Publication year: 2019

Abstract:

The Bayes factor is a widely used criterion in model comparison, and its logarithm is a difference of out-of-sample predictive scores under the logarithmic scoring rule. However, when some of the candidate models involve vague priors on their parameters, the log-Bayes factor features an arbitrary additive constant that hinders its interpretation. As an alternative, we consider model comparison using the Hyvärinen score. We propose a method to consistently estimate this score for parametric models, using sequential Monte Carlo methods. We show that this score can be estimated for models with tractable likelihoods as well as nonlinear non-Gaussian state-space models with intractable likelihoods. We prove the asymptotic consistency of this new model selection criterion under strong regularity assumptions in the case of non-nested models, and we provide qualitative insights for the nested case. We also use existing characterizations of proper scoring rules on discrete spaces to extend the Hyvärinen score to discrete observations. Our numerical illustrations include Lévy-driven stochastic volatility models and diffusion models for population dynamics. Supplementary materials for this article are available online.
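
For reference, the Hyvärinen score of a density p at an observation y is commonly written as follows, with $\Delta_y$ the Laplacian. Because it depends on p only through derivatives of $\log p$, the arbitrary constants induced by vague priors cancel, which is precisely the issue with the log-Bayes factor described above:

```latex
% Hyvarinen score (lower is better); multiplying p by any constant leaves
% the score unchanged, unlike the logarithmic scoring rule:
H(y, p) = 2\,\Delta_y \log p(y) + \bigl\| \nabla_y \log p(y) \bigr\|^2 .
```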

Keywords:

Bayes factor
Noninformative prior
Model selection
Sequential Monte Carlo
State-space model

Asymptotically Optimal Prediction for Time-Varying Data Generating Processes

AI ScalabilityJournal paper
Jie Ding, Jiawei Zhou, Vahid Tarokh
IEEE Transactions on Information Theory, 65(5): 3034—3067, 2019
Publication year: 2019

Abstract:

We develop a methodology (referred to as kinetic prediction) for predicting time series undergoing unknown changes in their data generating distributions. Based on Kolmogorov-Tikhomirov’s ε-entropy, we propose a concept called ε-predictability that quantifies the size of a model class (which can be parametric or nonparametric) and the maximal number of abrupt structural changes that guarantee the achievability of asymptotically optimal prediction. Moreover, for parametric distribution families, we extend the aforementioned kinetic prediction with discretized function spaces to its counterpart with continuous function spaces and propose a sequential Monte Carlo based implementation.

We also extend our methodology for predicting smoothly varying data generating distributions. Under reasonable assumptions, we prove that the average predictive performance converges almost surely to the oracle bound, which corresponds to the case that the data generating distributions are known in advance. The results also shed some light on the so-called “prediction-inference dilemma.” Various examples and numerical results are provided to demonstrate the wide applicability of our methodology.

Keywords:

Change points
Kinetic prediction
ε-entropy
Optimal prediction
Sequential Monte-Carlo
Smooth variations
Online tracking

Online Learning for Multimodal Data Fusion with Application to Object Recognition

AI ScalabilityJournal paper
Shahin Shahrampour, Mohammad Noshad, Jie Ding, Vahid Tarokh
IEEE Transactions on Circuits and Systems II: Express Briefs, 65(9): 1259--1263
Publication year: 2018

Abstract:

We consider online multimodal data fusion, where the goal is to combine information from multiple modes to identify an element in a large dictionary. We address this problem in object recognition by focusing on tactile sensing as one of the modes. Using a tactile glove with seven sensors, various individuals grasp different objects to obtain 7-D time series, where each component represents the pressure sequence applied to one sensor. The pressure data of all objects is stored in a dictionary as a reference. The objective is to match a streaming vector time series from grasping an unknown object to a dictionary object. We propose an algorithm that may start with prior knowledge provided by other modes. Receiving pressure data sequentially, the algorithm uses a dissimilarity metric to modify the prior and form a probability distribution over the dictionary. When the dictionary objects are dissimilar in shape, we empirically show that our algorithm recognizes the unknown object even with a uniform prior. If the dictionary contains an object similar to the unknown one, our algorithm needs the prior from other modes to detect the unknown object. Notably, our algorithm maintains performance similar to standard offline classification techniques, such as the support vector machine, with a significantly lower computational time.
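
The prior-modification step can be sketched as an exponential reweighting of the belief over the dictionary by the incoming dissimilarities. The exponential form and all names here are illustrative assumptions, not the paper's exact update rule.

```python
import math

def update_belief(prior, dissimilarities, beta=1.0):
    # Reweight the belief over dictionary objects by exp(-beta * dissimilarity)
    # of the newly received pressure segment, then renormalize; a prior from
    # another mode simply enters as a non-uniform starting belief.
    w = [p * math.exp(-beta * d) for p, d in zip(prior, dissimilarities)]
    s = sum(w)
    return [wi / s for wi in w]

belief = [1 / 3] * 3                            # uniform prior, 3 objects
for d in ([0.1, 0.9, 1.2], [0.2, 1.1, 0.8]):    # streaming dissimilarities
    belief = update_belief(belief, d)           # object 0 matches best
```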

Keywords:

Object recognition
Online learning
Tactile sensing

Model Selection Techniques: An Overview

AI FoundationsJournal paper
Jie Ding, Vahid Tarokh, Yuhong Yang
IEEE Signal Processing Magazine (featured article), 35(6): 16—34, 2018
Publication year: 2018

Abstract:

In the era of big data, analysts usually explore various statistical models or machine-learning methods for observed data to facilitate scientific discoveries or gain predictive power. Whatever data and fitting procedures are employed, a crucial step is to select the most appropriate model or method from a set of candidates. Model selection is the central ingredient in data analysis for reliable and reproducible statistical inference or prediction, and thus it is central to scientific studies in such fields as ecology, economics, engineering, finance, political science, biology, and epidemiology.

Model selection techniques have a long history, arising from research in statistics, information theory, and signal processing. A considerable number of methods have been proposed, following different philosophies and exhibiting varying performances. The purpose of this article is to provide a comprehensive overview of them in terms of their motivation, large-sample performance, and applicability. We provide integrated and practically relevant discussions on the theoretical properties of state-of-the-art model selection approaches. We also share our thoughts on some controversial views on the practice of model selection.
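
For concreteness, the two most widely used criteria surveyed in the article can be computed for a Gaussian linear model in a few lines. This is a generic textbook sketch, not a method proposed in the article:

```python
import numpy as np

def aic_bic(y, X):
    """AIC and BIC for a Gaussian linear model fit by ordinary least squares.
    k counts the p regression coefficients plus the noise variance."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    k = p + 1
    # profiled Gaussian log-likelihood at the MLE sigma^2 = rss / n
    loglik = -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)
    return 2 * k - 2 * loglik, k * np.log(n) - 2 * loglik
```

Since log(n) > 2 once n exceeds about 7, BIC penalizes each extra parameter more heavily than AIC, which is the source of their different asymptotic behaviors (consistency versus efficiency) discussed in the article.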

Keywords:

Akaike information criterion
Bayesian information criterion
Bridge criterion
Cross-validation
Model preference

Bridging AIC and BIC: A New Criterion for Autoregression

AI FoundationsJournal paper
Jie Ding, Vahid Tarokh, Yuhong Yang
IEEE Transactions on Information Theory, 64(6): 4024—4043, 2018
Publication year: 2018

Abstract:

To address order selection for an autoregressive model fitted to time series data, we propose a new information criterion. It has the benefits of the two well-known model selection techniques, the Akaike information criterion and the Bayesian information criterion. When the data is generated from a finite-order autoregression, the Bayesian information criterion is known to be consistent, and so is the new criterion. When the true order is infinity or suitably high with respect to the sample size, the Akaike information criterion is known to be efficient in the sense that its predictive performance is asymptotically equivalent to the best offered by the candidate models; in this case, the new criterion behaves in a similar manner. Unlike the two classical criteria, the proposed criterion adaptively achieves either consistency or efficiency, depending on the underlying data generating process. In practice, where the observed time series is given without any prior information about the model specification, the proposed order selection criterion is more flexible and reliable compared with classical approaches. Numerical results are presented to demonstrate the adaptivity of the proposed technique when applied to various datasets.
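
A minimal sketch of criterion-based AR order selection, fitting each candidate order by conditional least squares on a common tail of the series. This illustrates the classical AIC/BIC rules that the new criterion bridges, not the bridge criterion itself:

```python
import numpy as np

def ar_order_select(x, max_order, criterion="bic"):
    """Select an AR order by minimizing m*log(sigma2) + penalty, where
    each AR(p) is fit by conditional least squares on the same m = n - max_order
    observations so that scores are comparable across orders."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    m = n - max_order
    y = x[max_order:]
    best_p, best_score = 1, np.inf
    for p in range(1, max_order + 1):
        # column j holds the lag-(j+1) values aligned with y
        X = np.column_stack([x[max_order - 1 - j : n - 1 - j] for j in range(p)])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        sigma2 = float(np.mean((y - X @ beta) ** 2))
        penalty = p * np.log(m) if criterion == "bic" else 2 * p
        score = m * np.log(sigma2) + penalty
        if score < best_score:
            best_p, best_score = p, score
    return best_p
```

On data simulated from a clearly identified finite-order autoregression, the BIC rule recovers the true order with high probability as the sample size grows, which is the consistency property the abstract refers to.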

Keywords:

Adaptivity
Akaike information criterion
Asymptotic efficiency
Bayesian information criterion
Bridge criterion
Consistency
Information criterion
Model selection

Analysis of Multistate Autoregressive Models

AI FoundationsJournal paper
Jie Ding, Shahin Shahrampour, Kathryn Heal, Vahid Tarokh
IEEE Transactions on Signal Processing, 66(9): 2429—2440, 2018
Publication year: 2018

Abstract:

In this work, we consider the inference problem for a wide class of time-series models, referred to as multistate autoregressive models. The time series that we consider are composed of multiple epochs, each modeled by an autoregressive process. The number of epochs is unknown, and the transitions of states follow a Markov process of unknown order. We propose an inference strategy that enables reliable and efficient offline analysis of this class of time series. The inference is carried out through a three-step approach: detecting the structural changes of the time series using a recently proposed multiwindow algorithm, identifying each segment as a state and selecting the most appropriate number of states, and estimating the Markov source based upon the symbolic sequence obtained from previous steps. We provide theoretical results and algorithms in order to facilitate the inference procedure described above. We demonstrate the accuracy, efficiency, and wide applicability of the proposed algorithms via an array of experiments using synthetic and real-world data.

Keywords:

Multi-regime models
Prediction
Recurring patterns
Time series

SLANTS: Sequential Adaptive Nonlinear Modeling of Time Series

AI ScalabilityJournal paper
Qiuyi Han, Jie Ding, Edoardo M. Airoldi, Vahid Tarokh
IEEE Transactions on Signal Processing, 65(19): 4994—5005, 2017
Publication year: 2017

Abstract:

We propose a method for adaptive nonlinear sequential modeling of time series data. Data are modeled as a nonlinear function of past values corrupted by noise, and the underlying nonlinear function is assumed to be approximately expandable on a spline basis. We cast the modeling of data as finding a good representation in the linear span of a multidimensional spline basis and use a variant of l1-penalty regularization in order to reduce the dimensionality of the representation. Using adaptive filtering techniques, we design our online algorithm to automatically tune the underlying parameters based on the minimization of the regularized sequential prediction error. We demonstrate the generality and flexibility of the proposed approach on both synthetic and real-world datasets. Moreover, we analytically investigate the performance of our algorithm by obtaining both bounds on prediction errors and consistency in variable selection.

Keywords:

Adaptive filtering
Data prediction
Nonlinearity
Sequential modeling
Spline
Time series

Multiple Change Point Analysis: Fast Implementation and Strong Consistency

AI ScalabilityJournal paper
Jie Ding, Yu Xiang, Lu Shen, Vahid Tarokh
IEEE Transactions on Signal Processing, 65(17): 4495—4510, 2017
Publication year: 2017

Abstract:

One of the main challenges in identifying structural changes in stochastic processes is to carry out the analysis of time series with dependency structure in a computationally tractable way. Another challenge is that the number of true change points is usually unknown, requiring a suitable model selection criterion to arrive at informative conclusions.

To address the first challenge, we model the data generating process as a segment-wise autoregression, which is composed of several segments (time epochs), each of which is modeled by an autoregressive model. We propose a multi-window method that is both effective and efficient for discovering the structural changes. The proposed approach was motivated by transforming a segment-wise autoregression into a multivariate time series that is asymptotically segment-wise independent and identically distributed. To address the second challenge, we derive theoretical guarantees for (almost surely) selecting the true number of change points of segment-wise independent multivariate time series. Specifically, under mild assumptions, we show that a criterion resembling the Bayesian information criterion gives a strongly consistent selection of the optimal number of change points, while one resembling the Akaike information criterion cannot.
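
The consistency result can be illustrated on the simplest change point model, a piecewise-constant mean: score each candidate number of change points by the best achievable residual sum of squares plus a BIC-like penalty. The exhaustive search and the (2k+1)*log(n) parameter count below are illustrative simplifications, not the paper's multi-window algorithm:

```python
import numpy as np
from itertools import combinations

def select_num_change_points(x, max_changes):
    """Choose the number k of change points for a piecewise-constant-mean
    signal by minimizing n*log(RSS_k/n) + (2k+1)*log(n), where 2k+1 counts
    k change locations, k+1 segment means (minus shared terms folded in).
    Exhaustive search over cut placements -- feasible only for short series."""
    x = np.asarray(x, dtype=float)
    n = len(x)

    def best_rss(k):
        # minimum RSS over all placements of k interior cut points
        return min(
            sum(np.sum((x[a:b] - x[a:b].mean()) ** 2)
                for a, b in zip((0, *cuts, n), (*cuts, n)))
            for cuts in combinations(range(1, n), k)
        )

    scores = [n * np.log(best_rss(k) / n) + (2 * k + 1) * np.log(n)
              for k in range(max_changes + 1)]
    return int(np.argmin(scores))
```

With a log(n)-scale penalty the spurious RSS gain from an extra cut is eventually outweighed, whereas a constant (AIC-like) penalty per parameter leaves a non-vanishing probability of overestimating k, in line with the dichotomy stated above.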

Finally, we demonstrate the theory and strength of the proposed algorithms by experiments on both synthetic and real-world data, including the Eastern U.S. temperature data and the El Niño data. The experiments lead to some interesting discoveries about the temporal variability of the summer-time temperature over the Eastern U.S. and about the most dominant factor of ocean influence on climate, which was also discovered by environmental scientists.

Keywords:

Change detection
Information criteria
Large deviation analysis
Strong consistency
Time series

Optimal Variable Selection in Regression Models

AI FoundationsManuscript
Jie Ding, Vahid Tarokh, Yuhong Yang
Manuscript, 2016
Publication year: 2016

Abstract:

We introduce a new criterion for variable selection in regression models and show its optimality in terms of both loss and risk under appropriate assumptions. The key idea is to impose a penalty that is nonlinear in model dimensions. In contrast to state-of-the-art model selection criteria such as the Cp method, delete-1 or delete-k cross-validation, the Akaike information criterion, and the Bayesian information criterion, the proposed method is able to achieve asymptotic loss and risk efficiency in both parametric and nonparametric regression settings, offering new insight into the reconciliation of two types of classical criteria with different asymptotic behaviors. Adaptivity and wide applicability of the new criterion are demonstrated by several numerical experiments. Unless the signal-to-noise ratio is very low, it performs better than some popular methods in our experimental study. An R package ‘bc’ is released that serves as a supplement to this work.

Keywords:

Regression
Subset selection
Feature selection

Complementary Lattice Arrays for Coded Aperture Imaging

Journal paperMiscellaneous
Jie Ding, Mohammad Noshad, Vahid Tarokh
Journal of the Optical Society of America, 33(5): 863—881, 2016
Publication year: 2016

Abstract:

In this work, we propose the concept of complementary lattice arrays in order to enable a broader range of designs for coded aperture imaging systems. We provide a general framework and methods that generate richer and more flexible designs compared to the existing techniques. In addition, we review and interpret the state-of-the-art uniformly redundant array designs, broaden the related concepts, and propose new design methods.

Keywords:

Combinatorial design
Complementary sequence
Imaging systems
X-ray coded apertures

Key Pre-distributions from Graph-Based Block Designs

Journal paperMiscellaneous
Jie Ding, Abdelmadjid Bouabdallah, Vahid Tarokh
IEEE Sensors Journal, 16(6): 1842—1850, 2015
Publication year: 2015

Abstract:

Advances in wireless communication technologies have driven the growth of wireless sensor networks (WSNs), spawning a wide range of WSN-based applications and a host of research activities in both academia and industry. Since most of the target WSN applications are very sensitive, security is one of the major challenges in the deployment of WSNs. One of the important building blocks in securing WSNs is key management. Traditional key management solutions developed for other networks are not suitable for WSNs, since WSNs are limited in resources (e.g., memory, computation, and energy). Key pre-distribution algorithms have recently evolved as efficient alternatives to key management in these networks. Secure communication is achieved between a pair of nodes either by a key allowing direct communication or by a chain of keys. This paper considers prior knowledge of network characteristics and application constraints in terms of communication needs between sensor nodes. We propose methods to design key pre-distribution schemes that provide better security and connectivity while requiring fewer resources. Our methods are based on casting the prior information as a graph. Motivated by this idea, we also propose a class of quasi-symmetric designs named g-designs. Our proposed key pre-distribution schemes significantly improve upon the existing constructions based on unital designs. We give some examples and point out open problems for future research.

Keywords:

Balanced incomplete block design
Graph
Key pre-distribution
Quasi-symmetric design
Sensor networks

Data-Driven Learning of the Number of States in Multi-State Autoregressive Models

AI FoundationsConference paper
Jie Ding, Mohammad Noshad, Vahid Tarokh
Allerton Conference on Communication, Control, and Computing, 2015
Publication year: 2015

Abstract:

In this work, we consider the class of multi-state autoregressive processes that can be used to model non-stationary time series of interest. In order to capture the different autoregressive (AR) states underlying an observed time series, it is crucial to select the appropriate number of states. We propose a new model selection technique based on the Gap statistics, which uses a null reference distribution on the stable AR filters to check whether adding a new AR state will significantly improve the performance of the model. To that end, we define a new distance measure between AR filters and propose an efficient method to generate random stable filters that are uniformly distributed in the coefficient space. Numerical results are provided to evaluate the performance of the proposed approach.
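
One classical way to generate random stable AR filters, consistent with the Levinson-Durbin keyword below, is to sample reflection (PARCOR) coefficients in (-1, 1) and map them to AR coefficients with the step-up recursion: the resulting filter is stable if and only if every reflection coefficient has magnitude below one. Note that uniform sampling of reflection coefficients is not uniform over the AR coefficient polytope, so this is a simpler illustrative construction, not the paper's method:

```python
import numpy as np

def random_stable_ar(order, rng):
    """Draw a random stable AR(order) filter: sample reflection coefficients
    uniformly in (-1, 1) and apply the step-up (Levinson) recursion to the
    prediction-error polynomial c(z) = 1 - a_1 z^{-1} - ... - a_p z^{-p}."""
    k = rng.uniform(-1.0, 1.0, size=order)
    c = np.array([1.0])
    for km in k:
        # step-up: append the reversed polynomial scaled by the new coefficient
        c = np.concatenate([c, [0.0]]) + km * np.concatenate([[0.0], c[::-1]])
    return -c[1:]  # AR coefficients a_1..a_p of x[t] = sum_j a_j x[t-j] + e[t]
```

Stability can be verified directly: all roots of the prediction-error polynomial lie strictly inside the unit circle for every draw.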

Keywords:

Levinson-Durbin recursion
Polyhedra
Uniform distribution over Autoregressions

Perturbation Analysis of Orthogonal Matching Pursuit

Journal paperMiscellaneous
Jie Ding, Laming Chen, Yuantao Gu
IEEE Transactions on Signal Processing, 61(2): 398—410, 2012
Publication year: 2012

Abstract:

Orthogonal Matching Pursuit (OMP) is a canonical greedy pursuit algorithm for sparse approximation. Previous studies of OMP have considered the recovery of a sparse signal x from measurements y = Φx + b, where Φ is a matrix with more columns than rows and b denotes the measurement noise. In this paper, based on the Restricted Isometry Property, the performance of OMP is analyzed under general perturbations, which means both y and Φ are perturbed. Though the exact recovery of an almost sparse signal x is no longer feasible, the main contribution reveals that the support set of the best k-term approximation of x can be recovered under reasonable conditions. The error bound between x and the estimation of OMP is also derived. By constructing an example, it is also demonstrated that the sufficient conditions for support recovery of the best k-term approximation of x are rather tight. When x is strong-decaying, it is proved that the sufficient conditions for support recovery of the best k-term approximation of x can be relaxed, and the support can even be recovered in the order of the entries’ magnitude. Our results are also compared in detail with some related previous ones.
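
For reference, the OMP iteration analyzed above is short: greedily pick the column of Φ most correlated with the current residual, then re-fit all selected coefficients by least squares. This is the textbook algorithm, not any perturbation-specific variant from the paper:

```python
import numpy as np

def omp(Phi, y, k):
    """Orthogonal Matching Pursuit: select k columns of Phi greedily,
    refitting the coefficients on the growing support by least squares."""
    residual = y.astype(float).copy()
    support = []
    for _ in range(k):
        # column most correlated with the current residual
        j = int(np.argmax(np.abs(Phi.T @ residual)))
        support.append(j)
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    x_hat = np.zeros(Phi.shape[1])
    x_hat[support] = coef
    return x_hat, sorted(support)
```

Because the residual is kept orthogonal to all previously selected columns, no column is ever chosen twice, and in the noiseless exactly-sparse case the residual vanishes after k steps whenever the correct support is identified.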

Keywords:

Compressed sensing
Orthogonal matching pursuit
Restricted isometry property
Strong-decaying signals