
Outline:

Given a complex function, a binary matrix input and a real number vector as an
output, estimate the other vector and real number variables using optimization.

Solution Deadline:
Sep-1-2016

If you would like to join an online group who will be working on this problem,
please tell us why you are interested in this challenge, including any relevant
skills or experience that you will bring to the team. If you would like to work
alone, or as part of your own self-formed team, please tell us here.

We are seeking people who are able to commit to two online meetings per week
and contribute in a meaningful way to this project. Do not apply if you are not
100% sure that you are able to make this commitment.

Please indicate what days and times you would normally be available to meet
with your team to work on this challenge. Typically, teams meet online 1-2
times per week.

Challenge Summary
MATHEMATICS
LINEAR PROGRAMMING AND OPTIMIZATION
Given a complex function, a binary matrix input and a real number vector as an
output, estimate the other vector and real number variables using optimization.

Challenge Details

Overview

Specific genes can affect various characteristics of a plant. Additionally,
combinations of multiple genes together can also have an effect on these
characteristics. This is known as the epistatic effect. This relationship has
been represented by the formula in the problem statement below, which
describes how to determine a plant's height, given the genetic makeup of the
plant and the influence that specific genes and epistatic effects have on the
height.

The goal of this challenge is to determine the influence of specific genes and
the epistatic effects, given the genetic makeup of the plant and the plant
height. Simply put, the task is to solve the equation for the other variables.

Problem Statement

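Consider a function that maps the inputs below to an output vector h.

The input A is a binary matrix with m rows and n columns.
Two inputs are real number scalars: a baseline value and an epistatic effect.
One input is a real number vector with n entries, one additive effect per
column of A.
Two inputs are binary vectors with n entries each, subject to a constraint;
together they select the combination of columns that triggers the epistatic
effect.
The output h is a real number vector with m entries, determined as follows:
entry i of h is the baseline value, plus the sum of the additive effects of the
columns in which row i of A equals 1, plus the epistatic effect multiplied by
an indicator function, plus an error term.
Here, the indicator function is equal to 1 if the statement in the parentheses
is true and 0 otherwise.

Challenge: estimate the two scalars, the real number vector, and the two
binary vectors for given A and h.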
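
One way to write the model described in this problem statement is sketched below. Only A, h, m and n are names that appear in this challenge text; the symbols x_0 (baseline), x (per-gene effects), y (epistatic effect), u and v (the two binary vectors), the indicator symbol, and the error term epsilon_i are placeholder notation assumed for this sketch. The form of the indicator condition follows the verbal rule given in the Rationale.

```latex
h_i \;=\; x_0 \;+\; \sum_{j=1}^{n} A_{ij}\, x_j
      \;+\; y \cdot \mathbb{1}\!\left(
          A_{ij} = 1 \ \text{for all } j \text{ with } u_j = 1
          \ \text{ and } \
          A_{ij} = 0 \ \text{for all } j \text{ with } v_j = 1
      \right)
      \;+\; \varepsilon_i ,
\qquad i = 1, \dots, m,

\text{where } A \in \{0,1\}^{m \times n},\quad x_0, y \in \mathbb{R},\quad
x \in \mathbb{R}^{n},\quad u, v \in \{0,1\}^{n},\quad h \in \mathbb{R}^{m}.
```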

Rationale

Suppose vector h and matrix A represent the phenotype (such as plant height)
and genotype information of m plants, respectively.
Each row in A corresponds to a plant, and each column represents a gene that
could be responsible for the phenotype. Suppose each gene has two versions,
represented by 0 and 1.
The height of a plant with all genes being version 0 is estimated to be the
baseline value, one of the two real number scalars.
For each of the n genes, the additional effect of that gene being version 1
over version 0 is estimated by the corresponding entry of the real number
vector, which may be positive, negative, or zero.
The height could also be affected by an epistatic effect, which means that a
certain combination of genes contributes an additional effect, the second real
number scalar, to the height.
This combination is defined by the two binary vectors. In order for a plant to
receive the epistatic effect, its gene must be version 1 wherever the first
binary vector has a 1, and version 0 wherever the second binary vector has a 1.
For example, if the two vectors together select three genes, then only those
plants that simultaneously satisfy all three conditions will receive the
additional effect.
For simplicity, we assume for now that the epistatic effect is denoted by only
one combination.
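
As a minimal sketch of the rule just described, the Python below computes which plants receive the epistatic effect and the resulting noise-free heights. The names x0, x, y, u, v, epistatic_mask and predict_height are placeholders chosen for this illustration, not notation taken from the challenge, and the small matrix and effect sizes are made-up values.

```python
import numpy as np

def epistatic_mask(A, u, v):
    """True for plants that are 1 wherever u is 1 and 0 wherever v is 1."""
    A = np.asarray(A, dtype=bool)
    u = np.asarray(u, dtype=bool)
    v = np.asarray(v, dtype=bool)
    ones_ok = A[:, u].all(axis=1)       # every gene selected by u is version 1
    zeros_ok = (~A[:, v]).all(axis=1)   # every gene selected by v is version 0
    return ones_ok & zeros_ok

def predict_height(A, x0, x, y, u, v):
    """Baseline + additive gene effects + epistatic effect (no error term)."""
    A = np.asarray(A, dtype=float)
    return x0 + A @ np.asarray(x, dtype=float) + y * epistatic_mask(A, u, v)

# Hypothetical data: 4 plants, 4 genes.
A = [[1, 0, 1, 0],
     [1, 1, 0, 0],
     [1, 0, 0, 1],
     [0, 0, 1, 1]]
u = [1, 0, 0, 0]   # gene 0 must be version 1
v = [0, 1, 0, 0]   # gene 1 must be version 0

mask = epistatic_mask(A, u, v)
heights = predict_height(A, x0=100.0, x=[2.0, -1.0, 0.5, 0.0], y=5.0, u=u, v=v)
print(mask)      # plants 0 and 2 match the combination
print(heights)   # 107.5, 101.0, 107.0, 100.5
```

Note that when u selects no genes the first condition is vacuously true, and likewise for v, so an all-zero pair of vectors would give every plant the effect.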
A similar model could also be used for predicting consumers' preference of a
product from their previous purchasing records, or a stock's future trend based
on its historical performance.
Epistatic effects are common even outside the plant genetics field. For example,
the combination of low air pressure and higher-than-freezing temperature is a
good predictor of rain.
Bounding: m and n can theoretically be any size. From a practical standpoint,
however, m (the number of plants or observations) is likely to be very large
(possibly millions), while n will very likely be limited by processing time
(very large values of m could take years). Additionally, large values of n
would increase the number of plants (m) required to solve the problem. Most
likely the seeker will begin with n values of ~30.
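
To give a rough feel for why n is the limiting factor, the sketch below counts candidate gene combinations. It assumes that each gene is either required to be 1, required to be 0, or ignored (that is, the two binary vectors never select the same gene), which gives 3^n candidates. The restriction to combinations of at most k genes is an additional assumption made here for illustration, not something the challenge states.

```python
from math import comb

def count_all(n):
    """Candidates when each of n genes is 'must be 1', 'must be 0', or ignored."""
    return 3 ** n

def count_up_to_order(n, k):
    """Candidates that involve at most k genes: choose r genes, then a 0/1 pattern."""
    return sum(comb(n, r) * 2 ** r for r in range(k + 1))

print(count_all(30))             # 205891132094649
print(count_up_to_order(30, 3))  # 34281
```

Under these assumptions an exhaustive search at n = 30 would need about 2 x 10^14 model fits, each over all m plants, whereas a search limited to three-gene combinations needs only tens of thousands.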

Model

Examples

These examples are solely meant to help you visualize the types of
data/inputs. The data provided is an example only, and is not meant to be
used, as this is not a data manipulation problem.
Example 1

Example 2

In this example, each row is an individual plant. The binaries in A are genes
that are on (1) or off (0). The value in h is the height of that plant.
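
With A and h laid out this way, one possible baseline (a sketch, not the intended solution) is to fix a candidate gene combination, fit the remaining real-valued unknowns by ordinary least squares, and keep the candidate with the smallest residual. The function names, the synthetic data, and the restriction to combinations of two or three genes are all assumptions made for this illustration.

```python
import itertools
import numpy as np

def combination_mask(A, ones, zeros):
    """True for rows that are 1 in every column of `ones` and 0 in every column of `zeros`."""
    mask = np.ones(A.shape[0], dtype=bool)
    for j in ones:
        mask &= A[:, j] == 1
    for j in zeros:
        mask &= A[:, j] == 0
    return mask

def fit_given_combination(A, h, ones, zeros):
    """Least-squares fit of [baseline, gene effects..., epistatic effect] for one candidate."""
    design = np.column_stack([np.ones(A.shape[0]), A, combination_mask(A, ones, zeros)])
    coef, *_ = np.linalg.lstsq(design, h, rcond=None)
    rss = float(np.sum((design @ coef - h) ** 2))
    return coef, rss

def search_low_order(A, h, max_order=3):
    """Try every combination of 2..max_order genes and keep the best-fitting one."""
    best = None
    for order in range(2, max_order + 1):
        for genes in itertools.combinations(range(A.shape[1]), order):
            for pattern in itertools.product((1, 0), repeat=order):
                ones = [g for g, p in zip(genes, pattern) if p == 1]
                zeros = [g for g, p in zip(genes, pattern) if p == 0]
                coef, rss = fit_given_combination(A, h, ones, zeros)
                if best is None or rss < best[0]:
                    best = (rss, ones, zeros, coef)
    return best

# Synthetic data (made up): every genotype of 4 genes, repeated 4 times = 64 plants.
genotypes = np.array(list(itertools.product((0, 1), repeat=4)))
A = np.tile(genotypes, (4, 1))
rng = np.random.default_rng(0)
h = (100.0 + A @ np.array([2.0, -1.0, 0.5, 0.0])
     + 5.0 * combination_mask(A, ones=[0, 1], zeros=[2])
     + rng.normal(scale=0.01, size=A.shape[0]))

rss, ones, zeros, coef = search_low_order(A, h, max_order=3)
print(ones, zeros, np.round(coef, 2))
```

The search starts at two genes because a one-gene combination is collinear with that gene's own additive effect. For two-gene combinations the gene pair can be found this way but its 0/1 pattern cannot be told apart by fit quality alone, since for binary a and b, a(1 - b) = a - ab, so the difference is absorbed by the additive terms. This brute-force approach is only practical for small n or low-order combinations.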

Criteria
Deliverables:

A detailed and referenced document comparable to something that might be
published in a peer-reviewed journal, aimed at experts in linear
optimization/programming
A medium-length (approx. 1500 words) document that explains the solution,
aimed at people who understand mathematics, but are not experts in linear
optimization
A short (500 word) LinkedIn post type of document that explains the solution
in basic language, aimed at people with a technical background but no
expertise in mathematics
