Many people use PittGrid for their research projects. Listed below are examples of researchers, faculty, and PhD students from various departments at the University of Pittsburgh who have made, or are making, use of PittGrid's resources to make their lives easier and their problems tractable.

In the classical model of primary hypertension, the brain suffers damage only after the occurrence of stroke, making the brain a late target of the disease. Recent evidence, however, suggests that primary hypertension accelerates brain aging early in its course by producing characteristic functional and cognitive deficits. Our study seeks to determine whether middle-aged prehypertensives undergo a faster course of brain aging compared to normotensives. The sample consists of approximately 100 subjects aged 35 to 60, of whom half are normotensive and half are prehypertensive (systolic 120-139 mmHg and/or diastolic 80-89 mmHg). Magnetic resonance images of subjects' brains will be obtained on a 3T Trio TIM whole-body scanner (Siemens Healthcare, Malvern, PA) employing fluid attenuated inversion recovery (FLAIR) and magnetization-prepared rapid acquisition with gradient echo (MPRAGE) sequences. Trained raters will use FLAIR brain images to grade the severity of white matter hyperintensities, ventricle size, and sulcal size using standards from the Cardiovascular Health Study. Cortical thicknesses will be reconstructed from T1-weighted MPRAGE images using the FreeSurfer software package (Martinos Center for Biomedical Imaging, Charlestown, MA). Regression analyses will be performed using age and blood pressure status as predictor variables of these brain aging indices. The results of this study will help determine whether mildly elevated blood pressure plays a significant role in brain aging. Evidence of accelerated brain aging in prehypertensives would suggest the maintenance of normotensive status as a possible strategy for slowing cognitive decline during middle age.

For my dissertation I will be testing new statistical methods on genome-wide association studies for a variety of cancers. Genome-wide association studies examine hundreds of thousands to millions of base-pair variants within the human genome. Differences in variant counts between cases and controls are assessed to see which variants may contribute to cancer susceptibility. Due to the large number of multiple comparisons being made in these studies, false positives are extremely common. I will be using known biological pathways to help reduce the number of false positives and to find those variants which truly have an effect. Because of the large number of variants, running these analyses can be computationally intensive.
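The multiple-comparisons problem described above is commonly handled with a false discovery rate procedure. As a minimal sketch (illustrative Python with made-up p-values, not the project's actual pathway-based method), the Benjamini-Hochberg step-up procedure looks like this:

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return (sorted) indices of tests declared significant at FDR level alpha."""
    m = len(pvalues)
    # Sort p-values while remembering their original positions.
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0  # largest rank whose p-value passes the step-up threshold
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            k = rank
    return sorted(order[:k])

# Toy p-values standing in for per-variant association tests:
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(pvals))  # → [0, 1]
```

With hundreds of thousands of variants, even this simple correction is far more powerful than a Bonferroni cutoff, and pathway information can then be layered on top to prioritize the surviving hits.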

Major depressive disorder (MDD) is a heterogeneous psychiatric illness with largely uncharacterized pathology; it contributes to death by suicide and is the fourth most common cause of disability according to the World Health Organization (WHO). To understand the genetics of MDD, gene expression analysis is one method of identifying biomarkers associated with the disorder. Because of the very weak expression signal of MDD, substantial clinical heterogeneity, and small sample sizes, it is hard to identify consistent and robust biomarkers in an individual study. To achieve a more accurate and stable list of the differentially expressed (DE) genes and pathways associated with MDD, we propose a random intercept model (RIM) to account for the case-control paired design and confounding clinical variables such as alcohol use, age, and antidepressant drug exposure. An optimal random intercept model (oRIM) with variable selection is used to accommodate the small sample size. Three popular meta-analysis methods (Fisher, inverse variance weighted, and maximum p-value) are then applied to detect biomarkers, and pathway analysis is performed. The results show increased statistical power from clinical variable adjustment, paired-design modelling, and meta-analysis in this genomic setting, as well as more profound biological findings in MDD neurobiology.

The purpose of this project is the development of theoretical tools for the analysis of data from the Large Hadron Collider (LHC). The LHC is a major step forward in experimental particle physics, and the problems of electroweak symmetry breaking and dark matter suggest that a large number of new particles could be discovered at this machine. However, the relation between observable data and underlying particle properties is very complex, since new heavy particles would decay into lighter particles, some of which are invisible to the detector. I want to explore the application of a technique called the Matrix Element Method. This method has been used successfully for top-quark measurements at the Tevatron collider, and it is also very promising for the investigation of new particles at the LHC, since it uses all available information in the measured events. For each event, a likelihood that the event agrees with a theoretically calculated matrix element is computed, taking the measured particle momenta as input and integrating over the momenta of the invisible particles. This likelihood can then be used to compare different theoretical models and constrain their parameters. Using simulated data, I want to apply and refine the Matrix Element Method for the analysis of typical new-physics signatures at the LHC. This method requires relatively large computing resources, since a large number of events is needed to even out statistical fluctuations, and a numerical integration needs to be performed for each event.
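The per-event likelihood integral described above can be sketched in miniature. In this toy Python illustration, the squared matrix element is a made-up Gaussian stand-in (a real analysis would evaluate the theoretical matrix element with detector transfer functions), and the integral over the unmeasured momentum is estimated by Monte Carlo:

```python
import math
import random

def toy_matrix_element_sq(p_visible, p_invisible, mass_param):
    # Hypothetical stand-in for |M|^2: peaks when the total momentum
    # matches the model's mass parameter.
    return math.exp(-((p_visible + p_invisible - mass_param) ** 2))

def event_likelihood(p_visible, mass_param, n_samples=50_000, p_max=10.0):
    """Monte Carlo estimate of the integral over the invisible momentum."""
    rng = random.Random(0)  # fixed seed for reproducibility
    total = 0.0
    for _ in range(n_samples):
        p_inv = rng.uniform(0.0, p_max)  # sample the unmeasured momentum
        total += toy_matrix_element_sq(p_visible, p_inv, mass_param)
    return (p_max / n_samples) * total

# A model whose mass parameter matches the event yields a larger likelihood:
print(event_likelihood(p_visible=2.0, mass_param=5.0) >
      event_likelihood(p_visible=2.0, mass_param=20.0))  # → True
```

Repeating this integration over many thousands of events, and over a grid of model parameters, is what makes the method computationally demanding.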

Entromic characterization of gene panels for SNP selection for a series of genetic epidemiological studies in different tumor types.

In our study, we construct ensembles of models that vary in parameter values and are ranked according to their likelihood of capturing existing data or clinically available observations. The results obtained from an ensemble model are probabilistic predictions of the dynamics that accurately represent the variability of responses across individuals.

Our main objective is to construct an ensemble model of the dynamics of influenza infection that makes probabilistic predictions about the course of disease, finds the likelihood of each model representing an individual immune response, and makes probabilistic estimates of the effectiveness of therapeutic interventions such as antiviral drug therapy. We develop methods to find the posterior distribution, which contains all the information about the parameter space. Our overall goal is to quantify the uncertainty in model predictions due to parameter heterogeneity.

Based on our published ODE model (Hancioglu et al., JTB, 2007), we develop an ensemble model of the human immune response to influenza infection consisting of multiple ODE models that are identical in form but differ in parameter values. A probabilistic measure of goodness of fit of the ODE model is used to derive the posterior probability density over the space of parameter values. This density is sampled using the Metropolis Monte Carlo method, and sampling efficiency is enhanced with a parallel tempering algorithm. The ensemble model is employed to compute probabilistic estimates of the trajectory of the immune response, the duration of disease, the maximum damage, the likelihood of a rebound in the disease, and the probability of occurrence of superspreaders. The effectiveness of using an antiviral drug to treat the infection is determined, and optimal treatment scenarios are discussed.
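The Metropolis sampling step at the heart of this approach can be illustrated with a minimal Python sketch. Here the Gaussian `log_posterior` is a toy stand-in for the goodness-of-fit measure of the actual ODE model, and the parallel tempering enhancement is omitted:

```python
import math
import random

def log_posterior(theta):
    # Toy unimodal stand-in; the real model scores goodness of fit of
    # simulated immune-response trajectories against data.
    return -0.5 * (theta - 1.0) ** 2

def metropolis(n_steps=20000, step=0.5, seed=0):
    rng = random.Random(seed)
    theta = 0.0
    samples = []
    for _ in range(n_steps):
        proposal = theta + rng.gauss(0.0, step)
        # Accept with probability min(1, posterior ratio).
        if math.log(rng.random()) < log_posterior(proposal) - log_posterior(theta):
            theta = proposal
        samples.append(theta)
    return samples

samples = metropolis()
mean_theta = sum(samples) / len(samples)  # posterior-mean estimate of theta
```

Each sample corresponds to one member of the ensemble; quantities such as disease duration or maximum damage are then computed as distributions over these samples rather than as single numbers.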

We apply GLM+Lasso to spike train data for detecting neural connectivity. The simulations can be significantly accelerated by PittGrid.

Simulation of the solvation behavior of several polymers in liquid CO2 using the Amber package.

We are doing modeling and simulation of solid mixers. Different configurations of rotating tumblers are being investigated to enhance the mixing of binary particles.

The project involves whole-genome pattern analysis of multiple organisms. The whole genome sequence of a human is about 3 gigabytes. We would like to carry out this analysis by analyzing each chromosome separately on the grid, where chromosomes are on the order of 50 to 300 MB. Pattern mining in data of this size can be resource intensive, so PittGrid serves as a more efficient and effective platform for carrying out our analyses.

The goal of this project is to develop a model of breast milk production in humans and then use the model to address a variety of specific questions. The initial model on which we are working is based on the major biological factors contributing to the milk production process after it has stabilized (e.g., 30 days after birth) but is, we hope, sufficiently abstract to allow for mechanistic insights. The model follows milk production from the binding of prolactin molecules to prolactin receptors until the milk is ejected from the breast during feeding (or simulated feeding, i.e., pumping). There are several basic processes involved in the model. First, the process of prolactin binding and unbinding is mediated by the act of suckling as well as by stretch-mediated receptor inhibition. The model includes prolactin receptor insertion and degradation, the latter of which depends on the amount of feedback inhibitor of lactation (FIL) present. Bound receptors produce the STAT molecule, which we make responsible for the generation of milk in the cuboidal cells lining the alveolus. At this point, we treat milk as a single substance and do not consider variations in its fat content or other aspects of its composition. After it is made, the milk is transferred to the lumen of the alveolus; this process is inhibited by the accumulation of FIL in the lumen. Finally, the milk is ejected in response to pulses of oxytocin controlled by suckling. The current goal is to find an adequate parameter set for the model that will fit the data we have, using fminsearch.
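The final fitting step follows the standard least-squares pattern that fminsearch automates. This toy Python sketch uses a made-up one-parameter saturating curve, synthetic data, and a simple grid refinement standing in for fminsearch's Nelder-Mead simplex search:

```python
import math

def model(t, rate):
    # Toy saturating "milk volume" curve; a hypothetical stand-in for the
    # full prolactin/FIL/oxytocin model described above.
    return 1.0 - math.exp(-rate * t)

data = [(1, 0.39), (2, 0.63), (4, 0.86), (8, 0.98)]  # synthetic (time, volume)

def sse(rate):
    """Sum of squared errors, the objective fminsearch would minimize."""
    return sum((model(t, rate) - v) ** 2 for t, v in data)

# Successive grid refinement stands in for the Nelder-Mead simplex.
lo, hi = 0.0, 2.0
for _ in range(6):
    step = (hi - lo) / 20
    best = min((lo + i * step for i in range(21)), key=sse)
    lo, hi = max(best - step, 0.0), best + step
```

The same structure — a simulate-then-score objective handed to a derivative-free optimizer — scales to the full model with its many parameters, which is where grid resources become useful.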

With Monte Carlo simulations, we are trying to characterize the response of stochastic neurons. Because of the random nature of the equations, many simulations must be averaged to obtain an accurate estimate of the expected result. The individual simulations by themselves take several hours. The complexity of the models and equations is handled with fast Matlab routines.

Our project is a mathematical model of vocal fold inflammation. We created an ordinary differential equation model consisting of 6 equations and 21 parameters. We are currently estimating these parameter values based on our experimental data.

The calculations I am running on PittGrid evaluate the stationary-state properties of random integrate-and-fire networks. These are model systems commonly used to simulate networks of neurons. We are interested in the statistical properties of these systems, specifically how the activity level and the relative timing of pre- and post-synaptic spikes depend on the connectivity and connection strengths of the networks. These properties are important in neural network learning based on spike-timing-dependent plasticity.
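A minimal Python sketch of a random integrate-and-fire network of this kind (toy sizes and constants, not the model actually run on PittGrid) shows the basic simulation loop:

```python
import random

# Toy random integrate-and-fire network: leaky units with a constant drive
# and random excitatory connections (all sizes and constants are made up).
def simulate(n=50, p_connect=0.1, weight=0.1, steps=500, seed=1):
    rng = random.Random(seed)
    # w[i][j]: synaptic strength of the connection j -> i.
    w = [[weight if rng.random() < p_connect else 0.0 for _ in range(n)]
         for _ in range(n)]
    v = [rng.random() for _ in range(n)]  # membrane potentials
    spikes = 0
    for _ in range(steps):
        fired = [i for i in range(n) if v[i] >= 1.0]  # threshold crossings
        spikes += len(fired)
        for i in fired:
            v[i] = 0.0  # reset to rest after a spike
        for i in range(n):
            drive = sum(w[i][j] for j in fired)
            v[i] = 0.95 * v[i] + 0.06 + drive  # leak + tonic input + synapses
    return spikes / (n * steps)  # mean firing rate (spikes/neuron/step)

rate = simulate()
```

Statistical quantities such as the mean rate above, or spike-time cross-correlations between connected pairs, are then averaged over many random network realizations, which is what makes the problem grid-friendly.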

The application I'm running consists of a main Matlab script that calls one other Matlab function and reads several different data files. The main script runs a parameter optimization based on the Matlab optimizer fminsearch. I created different starting parameter sets and saved them all in one large matrix.

The application then takes the *j*-th element out of the matrix and submits the optimization job to one computer. I'm currently running the application for j = 1..250.
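In outline, the per-job pattern amounts to the following (toy Python with hypothetical starting sets; the real code is Matlab submitted through the grid scheduler):

```python
import sys

# Each grid job receives an index j and optimizes from the j-th starting
# parameter set in the saved matrix.
def select_start(start_matrix, j):
    return start_matrix[j - 1]  # j is 1-based, matching the j = 1..250 sweep

starts = [[0.5 * j, 1.0 + j] for j in range(1, 251)]  # toy starting sets
j = int(sys.argv[1]) if len(sys.argv) > 1 and sys.argv[1].isdigit() else 1
params = select_start(starts, j)
# ... the optimizer would now be run from `params` ...
```

Because each starting point is independent, the 250 optimizations parallelize trivially across grid nodes.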

Via PittGrid I am running two systems of ODEs in Matlab. First, the files run a system of ODEs modeling lung volumes during infection. The output of this system is then used in the second system of ODEs, which is the discretization of a PDE system modeling gas exchange in the lung.

Due to the long running time, the total project was broken into smaller pieces and run in sequence in order to use PittGrid effectively. This sequence is run in parallel with various initial lung volumes.

The application I was running was part of a translational study to find the conditions under which it would be beneficial to supplement standard chemotherapy (in our case the drug Taxol) with antiangiogenic therapy (the drug SU5416). It is a longitudinal study measuring the volumes of ovarian tumors in a mouse model in which mice are treated with different combinations of drugs. My work developed a new way to fit a functional regression model to these data that does not assume a parametric form for any of the effects over time, along with a way to use Bayesian confidence intervals to perform inference. The bulk of my work on the grid was computing computationally intensive bootstrap confidence intervals in Matlab, which we then compared to the more computationally efficient Bayesian confidence intervals.
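The bootstrap confidence intervals mentioned above follow a standard pattern: resample the data with replacement, recompute the statistic, and take empirical percentiles. A minimal Python sketch on synthetic data (the percentile bootstrap for a simple mean, not the study's actual functional regression):

```python
import random

def bootstrap_ci(data, stat, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(data)."""
    rng = random.Random(seed)
    # Recompute the statistic on n_boot resamples drawn with replacement.
    reps = sorted(stat([rng.choice(data) for _ in data]) for _ in range(n_boot))
    lo = reps[int((alpha / 2) * n_boot)]
    hi = reps[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

volumes = [3.1, 2.8, 4.0, 3.5, 2.9, 3.7, 3.3, 3.0]  # hypothetical tumor volumes

def mean(xs):
    return sum(xs) / len(xs)

ci_lo, ci_hi = bootstrap_ci(volumes, mean)
```

The cost scales with the number of resamples times the cost of refitting the model, which is why bootstrapping a functional regression is expensive enough to warrant grid resources.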

As part of a research program on the inflammatory response in humans, we are performing Bayesian parameter estimation in equation-based models of inflammation. Analytical treatment of the posterior parameter distributions is not possible due to the nonlinear and high-dimensional nature of the models. Markov chain Monte Carlo (MCMC) methods provide a tractable, but computationally intense, approximation to the true distribution.

We are developing a coarse-grained parallel MCMC method with fast mixing times for multi-modal distributions. The algorithms are currently being implemented and tested on the PittGrid system. Our group does not have a dedicated compute cluster, so the PittGrid system has been a valuable resource for our work.

http://www.pitt.edu/~pep9/

A novel method of extracting intrinsic information from a gene sequence was developed (1-3). This method uses properties of gene sequences deposited in the sequence databases as information about the natural, evolution-optimized thermodynamic state of the genes. Graph-theory-based processing of this reference information is the engine of this novel methodology, which is termed Thermodynamic Tolerance (TT) analysis. The method provides a formalism that allows the equivalent treatment of both non-synonymous and synonymous mutations, and of both coding and non-coding regions of a gene.

The main applications of the results in the University of Pittsburgh context are anticipated in:

- Pharmacogenomics studies at the cutting edge of the transition from SNP to haplotype to functional SNP technology, and further improvement of the microarray assay development applications we have already implemented;
- Whole genome association studies (e.g. for patients with Crohn disease, obstructive pulmonary disease, idiopathic pulmonary fibrosis, chronic pancreatitis). Here the identified networks of TT-based similarities will serve as the weighting schemes for selecting the most promising targets from the experimental results;
- Experimental validation of the predictions made by the TT method as they pertain to drug-resistance-enabling mutations in viruses;
- Undertake computation of whole genome networks of similarities of TT-patterns using primary information from TT-matrices, TT-profiles and TT-distributions for reference consensus sequence (human, other species including mouse and primates). These databases will provide baseline characterization of TT-captured evolutionary, structural and functional weights of genome loci and their mutual relations for subsequent aims.
- Undertake detailed analysis of significant components in TT-pattern networks for the currently best-characterized ENCODE (Encyclopedia of DNA Elements; http://www.genome.gov/10005107) regions (both in human and in other species) in order to examine the relationship between TT-patterns and functional elements and annotations. In this way, we will validate the hypothesis that reference TT-based information is biologically relevant and can be used to identify disease-related loci.
- Use publicly available whole genome genotyping data to reconstruct individual genomes (individualize the consensus sequence by genotyping results) to obtain multiple TT-characteristics of genomes for individuals and process these individual TT-characteristics to obtain individual TT-patterns, their networks and quantified interindividual variations.
- Relate the "phenotype" or disease outcome data from existing (or simulated) whole genome association studies to the TT-descriptors of individuals and develop and validate disease specific Quantitative Genome Disease Relationships (QGDR).

- Pancoska, P. (2006) Application of graph-based analysis of genomic sequence context for characterization of drug targets. Curr Drug Discov Technol, 3, 175-188.
- Pancoska, P., Moravek, Z. and Moll, U.M. (2004) Rational design of DNA sequences for nanotechnology, microarrays and molecular computers using Eulerian graphs. Nucleic Acids Res, 32, 4630-4645.
- Pancoska, P., Moravek, Z. and Moll, U.M. (2004) Efficient RNA interference depends on global context of the target sequence: quantitative analysis of silencing efficiency using Eulerian graph representation of siRNA. Nucleic Acids Res, 32, 1469-1479.

I'm applying a simplex-method-based piecewise linearization algorithm to David's discrete model of DNA. The main purpose at the moment is to determine whether multiple solutions of the BVP exist for a few sequences. The advantage of this method is that it finds all the solutions of a system of nonlinear equations; the disadvantage is that it requires substantial computing capacity in higher dimensions.

Total CPU hrs used: 9,063

Statistical simulation using R software.

Total CPU hrs used: 1,081

Statistical simulation using R software.

Total CPU hrs used: 2,877

Simulation of medical patients' data using Matlab.

Total CPU hrs used: 2,037

Simulation of infectious disease using Matlab.

** Questions? Comments? **

Contact Senthil Natarajan:

senthil (AT) pitt (DOT) edu

Last updated: 09/11/14