Nicholas Sean Escanilla

"The world is one big data problem." -Andrew McAfee

About

I am a graduate of the University of Wisconsin-Madison, Department of Computer Science, where I obtained my M.S. in Computer Science in May 2018.

My previous engagements include serving as a Machine Learning Consultant at the University of Wisconsin-Madison, a Subject Matter Expert in machine learning at Accenture, and a Data Scientist at Slalom. I am currently a Sr. Data Scientist working on deep learning applications at Verizon.

My academic, research, and industry work has grounded me in core artificial intelligence and machine learning methods and best practices. I have had the opportunity to expand my skill set in the following areas: data science, computer vision, and health sciences. My overarching goal is to show business leaders and clients how to leverage their data for optimal growth.

Interests:

Deep Learning
Data Science
Computer Vision
Sports Statistics

Skills

Technical: AI, Data Science, Machine Learning, Deep Learning, Computer Vision

Languages: Python, R, MATLAB, Java

Personal: Learner, Analytical, Public Speaking

Education

M.S. Computer Science
UW-Madison

B.A. Mathematics
Lake Forest College

Work Experience

Sr. Data Scientist

Verizon - Denver, Colorado

Deep Learning, Machine Learning, Computer Vision, and Optimization
Technologies:

Python
AWS

Data Scientist

Slalom - Chicago, Illinois

Engaged with clients to brainstorm, gather data, and determine feasibility for potential data science projects.
Transformed ~16,000 survey data points to extract feature importance and applied NLP to identify key phrases and drivers of sentiment.
Designed a rule-based and optimization algorithm to build a portfolio recommendation system for a financial investment firm.
Developed a convolutional neural network architecture and data science pipeline to perform custom object detection and readiness based on location.
Technologies:

Python
Amazon SageMaker

Data Scientist

Accenture - Chicago, Illinois

Acted as the Subject Matter Expert in machine learning and data science.
Trained in ML as a Service and the tools offered as part of cloud computing services.
Web-scraped 3,500 help articles from disparate data sources using Python.
Built a content-based recommendation system for use in a contact flow proof of concept.
Implemented convolutional neural networks and recurrent neural networks on several datasets (e.g., MNIST, CIFAR-10) using PyTorch and Keras.
Technologies:

Python
AWS Machine Learning
Amazon SageMaker
AWS Lambda

Certifications:

ICAgile
Machine Learning on AWS - Technical (Digital) (Certificate of Completion)

Machine Learning Consultant

Department of Computer Sciences, UW-Madison

Collaborated with the Wisconsin Institutes for Medical Research on better understanding breast cancer risk by using novel machine learning methodologies.
Extracted, cleaned, transformed, and integrated genomic and environmental data into a final dataset of 2,000 patient records and 300 features/attributes.
Implemented recursive feature elimination by sensitivity testing (RFEST).
Showed that 30 of the 300 features were relevant to the task of predicting breast cancer risk, with an improvement in predictive performance (ROC AUC).
Technologies:

Python

Graduate Research Assistant

Department of Computer Sciences, UW-Madison

Generated synthetic data based on correlation immune functions of orders two, four, five, and six.
Empirically demonstrated the effectiveness of a novel feature selection algorithm on correlation immune functions.
Successfully completed Master's thesis (Summer 2017).
Authored a paper on a novel feature selection algorithm, accepted to the 17th IEEE International Conference on Machine Learning and Applications (ICMLA 2018).
Featured on the UW-Madison Graduate School website for work on predicting breast cancer risk.
Awards:

Advanced Opportunity Fellowship (2016-2017 academic year).
Computation and Informatics in Biology and Medicine (CIBM) Fellowship (2017-2018 academic year).

Technologies:

R
Python
MATLAB
OSX Terminal
Unix Shell

UW-Madison Summer Researcher

Department of Biostatistics and Medical Informatics, UW-Madison

Conducted rigorous independent research on general machine learning algorithms.
Designed a novel feature selection algorithm for use in bioinformatics.
Applied the novel algorithm to germline genomic data to improve breast cancer diagnoses.

Harvard University Summer Researcher

Department of Biostatistics, Harvard T.H. Chan School of Public Health

Successfully completed comprehensive introductory courses in biostatistics and epidemiology.
Implemented summary statistics and logistic regression models.
Completed a data analysis project entitled "Evaluation of Gene-Environment Interaction for Ovarian Cancer".

Student Athletic Trainer

Lake Forest College

Assisted the head athletic trainer in maintaining the athletic training room.
Supervised the care of student athletes competing for the college.

Projects

Publications

Recursive Feature Elimination by Sensitivity Testing

Authors: Nicholas Sean Escanilla, Lisa Hellerstein, Ross Kleiman, Zhaobin Kuang, James Shull, David Page

Abstract - There is great interest in methods to improve human insight into trained non-linear models. Leading approaches include producing a ranking of the most relevant features, a non-trivial task for non-linear models. We show theoretically and empirically the benefit of a novel version of recursive feature elimination (RFE) as often used with SVMs; the key idea is a simple twist on the kinds of sensitivity testing employed in computational learning theory with membership queries. With membership queries, one can check whether changing the value of a feature in an example changes the label. In the real world, we usually cannot get answers to such queries, so our approach instead makes these queries to a trained (imperfect) non-linear model. Because SVMs are widely used in bioinformatics, our empirical results use a real-world cancer genomics problem; because ground truth is not known for this task, we discuss the potential insights provided. We also evaluate on synthetic data where ground truth is known.
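
The sensitivity-testing idea described in the abstract can be illustrated with a minimal Python sketch. This is not the published RFEST algorithm: it only scores binary features by how often flipping them changes a model's predicted label, and it omits the recursive elimination and retraining steps. The names `sensitivity_scores` and `toy_model` are hypothetical, and `toy_model` (an XOR of the first two features) is an assumed stand-in for a trained non-linear model.

```python
from itertools import product
from typing import Callable, List, Sequence

# A "model" here is any function mapping a binary feature vector to a label.
Model = Callable[[Sequence[int]], int]


def sensitivity_scores(model: Model, examples: List[Sequence[int]]) -> List[float]:
    """For each binary feature, return the fraction of examples whose
    predicted label changes when that feature alone is flipped."""
    n_features = len(examples[0])
    scores = []
    for j in range(n_features):
        changed = 0
        for x in examples:
            flipped = list(x)
            flipped[j] = 1 - flipped[j]  # query the model on the modified example
            if model(flipped) != model(x):
                changed += 1
        scores.append(changed / len(examples))
    return scores


def toy_model(x: Sequence[int]) -> int:
    """Hypothetical stand-in for a trained model: XOR of features 0 and 1.
    Feature 2 is irrelevant by construction."""
    return x[0] ^ x[1]


# All 8 binary examples over 3 features.
examples = [list(bits) for bits in product([0, 1], repeat=3)]
scores = sensitivity_scores(toy_model, examples)

# Rank features from most to least sensitive.
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print(scores, ranking)
```

In this toy setting the two XOR inputs receive the highest score and the irrelevant feature scores zero, even though neither XOR input is individually correlated with the label. A recursive variant would presumably drop the lowest-ranked feature, retrain, and repeat.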

A Comparative Analysis of Feature Selection Techniques for a Family of Nonlinear Target Functions and Breast Cancer Diagnoses

Thesis Committee: David Page, Charles Dyer, James Shull

Abstract - Due to advances in high-throughput technologies, data mining techniques for decision making processes have grown increasingly popular in the past decades. As more domains utilize these methods, we are seeing a surge in data, so-called big data, that requires preprocessing techniques to combat the curse of dimensionality. To handle such problems, dimensionality reduction techniques have been well-studied. One such example is principal component analysis (PCA), a procedure that combines features to create new ones, thus reducing the dimensions in the dataset. This paper studies a different technique called feature selection. As opposed to using the original set of features in a dataset to build a predictive model, feature selection aims to find a subset of those features whose combination offers higher predictive power, higher quality, and better interpretability of the data. In addition to an overview of feature selection techniques previously published in the literature, we present a novel feature selection algorithm and evaluate it using synthetic data labeled by a challenging family of nonlinear target functions. The method is also assessed on germline genomic data, from breast cancer patients and controls, comprising single-nucleotide polymorphisms (SNPs) from a region of the human genome associated with breast cancer. Through these experiments, we show the advantages that our algorithm has over alternative methods when the task is to determine the subset of relevant features for a given input dataset.

Contact Me

Chicago, IL

Phone: (847) 668-5578

Email: escanillans@gmail.com