sanjitsbatra _at_ gmail _dot_ com

## Publications

The Zika outbreak, spread by the Aedes aegypti mosquito, highlights the need to create high-quality assemblies of large genomes in a rapid and cost-effective fashion. Here, we combine Hi-C data with existing draft assemblies to generate chromosome-length scaffolds. We validate this method by assembling a human genome, de novo, from short reads alone (67X coverage). We then combine our method with draft sequences to create genome assemblies of the mosquito disease vectors Aedes aegypti and Culex quinquefasciatus, each consisting of three scaffolds corresponding to the three chromosomes in each species. These assemblies indicate that virtually all genomic rearrangements among these species occur within, rather than between, chromosome arms. The genome assembly procedure we describe is fast, inexpensive, accurate, and can be applied to many species.

The proposed approach, termed as the Minimal Complexity
Machine (MCM) Regressor, involves solving a simple linear programming
problem. Experimental results show, that on a number of benchmark
datasets, the proposed approach yields regressors with error rates
much less than those obtained with conventional SVM regresssors, while
often using fewer support vectors. On some benchmark datasets, the
number of support vectors is less than one tenth the number used by
SVMs, indicating that the MCM does indeed learn simpler
representations.

We suggest theuse of a simple fuzzy membership in combination with the
MCM. Experimental results show, that ona number of benchmark
datasets, the the fuzzy MCM yields comparable results to
fuzzy SVMs, andlearns representations that require far fewer support
vectors. On several benchmark datasets, the fuzzyMCM classifier yields
excellent test set accuracies while using one-tenth the number of
support vectorsused by fuzzy SVMs.

This paper employs a recently proposed filterfeature-selection
algorithm based on minimizing a tight bound on the VC dimension. We
have successfully applied this algorithm
to determine a reasonable subset of bands without any user-defined
stopping criteria on widely used hyperspectral images and
demonstrate that this method outperforms state-of-the-art methods in
terms of both sparsity of feature set as well as accuracy of
classification.

Experimental results show that the MCM learns very sparse representations; on
many datasets, the kernel MCM yields comparable or better test set accuracies
while using less than one-tenth the number of support vectors. For a linear
hyperplane classifier in the input space, the VC dimension is upper bounded by
the number of features; hence, a linear classifier with a small VC dimension is
parsimonious in the set of features it employs. In this paper, we use the linear
MCM to learn a classifier in which a large number of weights are zero; features

Time series forecasting is of fundamental importance in big data analysis. The prediction of noisy, non-stationary, and chaotic time series demands good generalization from small amounts of data. Vapnik showed that the total risk is dependent on both, empirical error as well as model complexity, where the latter may be measured in terms of the Vapnikâ€“Chervonenkis (VC) dimension. In other words, good generalization requires minimizing model complexity. The recently proposed Minimal Complexity Machine (MCM) has been shown to minimize a tight bound on the VC dimension, and has further been extended to Minimal Complexity Machine Regression (MCMR). In this paper, we present an original approach based on the MCM regressor, which builds sparse and accurate models for short-term time series forecasting. Results on a number of datasets establish that the proposed approach is superior to a number of state-of-the-art methods, and yields sparse models. These sparse models are able to extract only the most important information present in the data sets, thereby achieving high accuracy. Sparsity in time series forecasting models is also important in reducing the evaluation time. This assumes importance when the models need to be evaluated in real time, such as when they are used as part of trading flows.

### An Optimization-Free Approach to Predictive Analytical Chromatography

**Manuscript in Preparation**

For a positive and relatively prime set A, let Gamma(A) denote the set of integers that are
formed by taking nonnegative integer linear combinations of integers in A. Then
there are finitely many positive integers that do not belong to Gamma(A). For A, let
g(A) and n(A) denote the largest integer and the number of integers that do not
belong to Gamma(A), respectively. We determine both g(A) and n(A) for two sets that
arise naturally from the Fibonacci sequence and the Lucas sequence

### Some problems concerning the Frobenius number for extensions of an arithmetic progression