
Summer Internship Program 2008

India Research Lab invites applications for its 2008 summer internship program. The
internship opportunities are available in various groups at both New Delhi and
Bangalore. The selected students will have the opportunity to work with a prominent
researcher in the group, and will be required to spend 10-12 weeks at the lab
during the summer of 2008.
Brief descriptions of potential projects and the
application process are provided in this document.

Contents:
Internship opportunities for the 2008 program

IRL Delhi
  Information Management & Text Analytics
  Telecom Research
  Analytics & Optimization
  High Performance Computing
  Distributed Systems Management
  Systems Management & Software Engineering
  Programming Models, Analysis & Tools

IRL Bangalore
  Knowledge Discovery and Data Mining
  Software Engineering
  Business Process Analytics and Optimization
  Distributed Systems Management

Application Format

IBM India Research Laboratory
Summer Internship Program 2008

Internship opportunities at IRL Delhi
Information Management & Text Analytics
Design and development of knowledge lifecycle for systems management
The project involves a survey of methods for extracting knowledge artifacts to be used for
systems management, and for the service desk in particular. These methods are to be combined
into a complete knowledge management life cycle, including steps for acquiring the knowledge,
storing it, maintaining it and expiring it. We will concentrate mainly on the first and last
components of the life cycle. The project involves developing policies for knowledge admission
control and expiry.
Skills: One or more of information retrieval, text mining, Java, XML
Level: PhD/M-tech/MS/B-tech
Project Code: D-001
Building usable interactive text classification systems
The text classification task is one of learning models for a given set of classes and applying these
models to new unseen documents for class assignment. Text classification has many important
real life applications. For example, categorizing news articles according to topics such as politics,
sports, or education; email categorization; building and maintaining web directories like Dmoz;
spam filters; automatic call and email routing in contact centers; and so on. A statistical text
classification system is trained using an ample number of pre-labeled examples. Creating such a
large number of labeled examples is one of the biggest challenges faced by real-life text
classification systems. The training examples of a class should also be unambiguous with respect
to other classes, and exhaustive enough to cover all relevant concepts of the corresponding class.
How can a large number of training samples be created? How can one decide whether the
gathered training samples are good enough?
This project aims to significantly extend the state of the art and build systems to overcome
operational challenges. Some initial theory and implementation exists that looks at the interactive
aspect of text classification systems. The first objective would be to develop new and extend
existing theories. Secondly, this project aims to build a system which can be placed in real life
settings for building operational classifiers. One starting point for this project will be IBM TICL
http://www.alphaworks.ibm.com/tech/ticl/
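As an illustration of the kind of statistical classifier described above, here is a minimal multinomial naive Bayes sketch. The training documents, labels and class names are invented for illustration; this is not TICL itself.

```python
import math
from collections import Counter, defaultdict

# Minimal multinomial naive Bayes text classifier: log prior plus
# log likelihood with add-one (Laplace) smoothing over a bag of words.
class NaiveBayes:
    def __init__(self):
        self.class_counts = Counter()            # documents per class
        self.word_counts = defaultdict(Counter)  # word counts per class
        self.vocab = set()

    def train(self, docs):
        # docs: list of (text, label) pairs of pre-labeled examples
        for text, label in docs:
            self.class_counts[label] += 1
            for w in text.lower().split():
                self.word_counts[label][w] += 1
                self.vocab.add(w)

    def classify(self, text):
        total_docs = sum(self.class_counts.values())
        best, best_score = None, float("-inf")
        for label in self.class_counts:
            score = math.log(self.class_counts[label] / total_docs)
            n_words = sum(self.word_counts[label].values())
            for w in text.lower().split():
                count = self.word_counts[label][w]
                score += math.log((count + 1) / (n_words + len(self.vocab)))
            if score > best_score:
                best, best_score = label, score
        return best

nb = NaiveBayes()
nb.train([("the match ended in a draw", "sports"),
          ("election results were announced", "politics"),
          ("the team won the cup final", "sports"),
          ("parliament passed the new bill", "politics")])
print(nb.classify("who won the cup"))   # -> sports
```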
Skills: Theory - Graduate course in Data Mining and/or Machine Learning. Working knowledge of
text classification techniques like naive Bayes, SVMs, Logistic regression.
Implementation - Java and Web-based programming.
Level: M-tech (preferred); exceptional B-tech students will also be considered.
Project Code: D-002

QA Knowledge Base Miner


Question answering (QA) is a type of information retrieval. QA is very dependent on a good
search corpus - for without documents containing the answer, there is little any QA system can
do. Most existing QA systems answer queries over a corpus of documents by using NLP
techniques to find the closest matching queries and answers. However, in scenarios such as a
help desk, where the queries and answers may be hidden inside call logs or email chains, it is
difficult to detect the answers because the documents are noisy. Moreover, there is often a
chain of queries and answers that could be reused in future conversations. This project aims to
build a QA Knowledge Base (KB) over the noisy documents for use in subsequent interactions.
The challenge is to identify, from these noisy documents, a set of frequent queries and their
answers, associate them with high-level topics for easy retrieval, and also to develop techniques
for updating the KB.
Skills: Must have good programming skills in Java and a strong algorithms background;
awareness of semantic web and web search techniques will be helpful. A good DBMS and/or
Information Retrieval background is essential.
Level: PhD/M-tech/MS/B-tech
Project Code: D-003

Spoken Language Skills Evaluation through Speech Processing


Automatic evaluation of spoken language skills is a very challenging and interesting area of
research. At IBM India Research Lab, we are working on a project to evaluate spoken English
skills such as pronunciation, grammar and syllable stress. This requires advanced speech
processing and pattern recognition techniques. In this project, one needs to work on improving
the accuracy for these spoken English parameters, either through new algorithms or by
modifying existing approaches.
Skills: Digital signal processing, probability and random processes
Preferred skills: Digital speech processing, pattern recognition or machine learning.
Level: PhD/M-tech/MS/B-tech
Project Code: D-004

Learning the Dialect of Noise in text


Noisy unstructured text data is found in informal settings such as online chat, SMS, emails,
message boards, newsgroups, blogs, wikis and web pages. Also, text produced by processing
spontaneous speech, printed text and handwritten text contains processing noise. In this project
we would like to explore unsupervised techniques to model the noise and learn its underlying
distribution. Our goal will be to enable text mining tasks on the noisy data.
Skills: Candidates should be good at Java programming; preference will be given to those with
knowledge of probability and random processes.
Level: PhD/M-tech/MS/B-tech
Project Code: D-005
Extracting knowledge from a few hundred streams of categorical/numerical
data
When one monitors an application, one does not monitor just one or a handful of sensors; one
monitors thousands of them over a long period of time (years). Getting the right information easily
has hence become a huge problem, not only in terms of visualization but also in selecting the
information to display. There is a need to present to the user no more than 3 or 4 pieces of
information per group of sensors; if needed, the amount of information can be increased in a
zoom-in/out fashion. Although the techniques to perform such tasks are well known (clustering of
categorical values, aggregation), such techniques have not been developed for a commercial
product.
In this 3-month project, the student will have to come up with viable options for
selecting/presenting such information using a limited amount of resources (memory & CPU),
considering that the data from the sensors is received as a stream. The project should deliver a
prototype at the end to demonstrate the feasibility of the approach.
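As a minimal sketch of the bounded-resource constraint above, Welford's online algorithm summarizes a sensor stream in O(1) memory per sensor, regardless of how many readings arrive. The readings below are invented.

```python
# A bounded-memory streaming summary per sensor: Welford's online
# algorithm keeps a running mean/variance without storing the readings.
class StreamSummary:
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0   # sum of squared deviations from the running mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        # population variance of the readings seen so far
        return self.m2 / self.n if self.n > 0 else 0.0

s = StreamSummary()
for reading in [10.0, 12.0, 11.0, 13.0]:
    s.add(reading)
print(round(s.mean, 2), round(s.variance(), 2))  # -> 11.5 1.25
```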
Skills: One or more of data mining, clustering, stream algorithmics; Java or C++ (UNIX)
for prototyping.
Level: PhD/M-tech/MS/B-tech
Project Code: D-006


Incorporating Annotator Disagreements into Classification Models for Text


Machine learning systems for classification require labeled data. In many operational
classification settings involving text, annotations are available from multiple manual
labelers/annotators. Examples are ratings for a movie provided by multiple reviewers, or the
classification of a document in Dmoz or Yahoo! by multiple volunteers. Sometimes the
inter-annotator mismatch is due to genuine disagreement. In other cases, however, it is due to
the myopic or incomplete view of any single annotator (that is, the instances are by nature
multi-labeled), or due to errors on the part of one or more annotators. Although there is some
work on learning ensembles of classifiers, it does not account for the nature of disagreement
within a single model. The problem is of practical importance in organizations for tasks such as
text classification, information extraction, sentiment analysis, etc. This project will explore new
methods to address this problem.
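As a small illustration of quantifying inter-annotator mismatch, here is a sketch of Cohen's kappa, a standard chance-corrected agreement measure between two annotators. The annotations below are invented.

```python
from collections import Counter

# Cohen's kappa: agreement between two annotators beyond what would
# be expected if both labeled independently at random.
def cohens_kappa(ann1, ann2):
    assert len(ann1) == len(ann2)
    n = len(ann1)
    observed = sum(a == b for a, b in zip(ann1, ann2)) / n
    # chance agreement from each annotator's label distribution
    c1, c2 = Counter(ann1), Counter(ann2)
    expected = sum(c1[l] * c2[l] for l in set(ann1) | set(ann2)) / (n * n)
    return (observed - expected) / (1 - expected)

a = ["pos", "pos", "neg", "neg", "pos", "neg"]
b = ["pos", "neg", "neg", "neg", "pos", "pos"]
print(round(cohens_kappa(a, b), 3))
```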
Skills: Exposure to machine learning and data mining.
Level: PhD/M-tech/MS/B-tech
Project Code: D-007
Improvements for IBM DB2 UDB
DB2 9.5 is a much-improved hybrid data server supporting relational and native XML
formats. IRL has in the past few years developed algorithms to help enable the
new technologies included in DB2. This project is a direct offshoot of this
engagement. The student will work as a member of a team composed of
people located in India and Canada. The definition of the project will be done in
conjunction with the student, although based on previous years' experience the
student should expect to work on either enabling current technology for end-users
or devising new algorithms inside the query engine (optimization, indexing, ...).
In this three-month project the student will have to come up with an innovative
solution to the problem considered. The project should deliver a prototype at the
end to demonstrate the feasibility of the approach. It is also expected that the
student take on a lead role in publishing the work at a renowned conference.
Skills: Relational databases, algorithms, statistics, data mining, data structures; Java
or C++ (UNIX) for prototyping.
Level: M-tech/MS
Project Code: D-008

Telecom Research
Integrating the Device in Telecom Applications
There is a plethora of end-user devices over which services are accessed and
executed in a Telecom environment. Such devices range from being very basic ones
(for example, offering only the capabilities of messaging and voice) to being
sophisticated, state-of- the-art devices like iPhone. To compose rich Telecom
applications, we need to utilize device functionality. For example, a Telecom
application can make use of calendar, user profile and location (e.g. cell site
information) available on the user mobile phone and feed it to third party services
(like Google Maps) accessible over the Web.


In this project, we propose to look at ways of incorporating device functionality in
Telecom applications. Towards this, we would study a host of mobile phones and
other end-user devices, and develop a generic model for encapsulating their
functionalities.
Skills: Candidate should be conversant with the Java programming language. Knowledge of J2ME concepts would be a nice-to-have.
Level: PhD/M-tech/MS/B-tech
Project Code: D-011

Managing Real-time Information in Next Generation Converged Networks


With the emergence of converged networks, i.e. the convergence of traditional telephony and IP-based networks, cellular operators are rapidly shaping themselves to become the enabler for
people to access several forms of (voice, data, video) Internet-based services. Operators have a
wealth of content associated with their network as well as core network enablers, e.g. call control,
presence and messaging, which could serve as potential new revenue streams in a converged
ecosystem. For example, there is plenty of real-time information about people available in the
network (i.e. location, who is online, what is someone doing currently etc) that can be used to
leverage a number of innovative value-added applications. This project focuses, at a broad level,
on the area of real-time information management in converged networks. The exact scope of
work is expected to include the development of interesting research prototypes, as well as the
design and analysis of efficient algorithms and/or protocols, for real-time data management in
converged networks.
Skills/Level: B.E(CS)/MS(CS)/ Ph.D. (preferred background in systems, mobile, pervasive
computing, data management)
Project Code: D-012
Analytics and Business Intelligence for Cellular Service Providers
We plan to investigate new frontiers of graph models and devise novel algorithms to improve the
current state of the art of the analytics and business intelligence portfolio for cellular service
providers. The project revolves around implementing new algorithms for churn
prediction/prevention, targeted advertising or campaign management, and visualization of these
graphs in innovative ways.
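As one toy example of a graph-based churn signal of the kind mentioned above, the fraction of a subscriber's call-graph neighbors who have already churned can be computed as follows. Subscriber IDs and edges are invented for illustration.

```python
from collections import defaultdict

# A simple graph-based churn signal: for each subscriber, the fraction
# of call-graph neighbors who have already churned.
def churn_risk(edges, churned):
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {u: sum(v in churned for v in nbrs) / len(nbrs)
            for u, nbrs in neighbors.items()}

edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
risk = churn_risk(edges, churned={"B", "D"})
print(risk["C"])   # 2 of C's 3 neighbors (B, D) have churned
```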
Skills: Graph theory, algorithms, data mining, Information visualization (One or more)
Level: PhD/M-tech/MS
Project Code: D-013
Pyr.mea.IT : Permeating IT towards the Base of the Pyramid
Until recently, all the attention of the IT industry was focused on the needs of the
people at the top of the pyramid (people with high earnings). The larger fraction of
the world (several billion people) is untouched by the advances in IT. We are
investigating novel technological approaches and solutions to empower and enable
the less-privileged of the world. We have developed novel voice-based technologies
as well as proposed pragmatic information and service delivery solutions based on
them. Details about the project can be obtained from:
http://domino.research.ibm.com/comm/research_projects.nsf/pages/pyrmeait.index.html
We welcome intern applications in two tracks:
(i) Technology Track: Exploring scalable low-cost infrastructures, multi-modal
interfaces, and the like. We expect the intern to have a good background in a
technological area such as (but not restricted to) systems, speech, user
interfaces or mobile-related technologies.
(ii) Design Track: Using social engineering and HCI design methodologies to
conduct user surveys and technology evaluations. We expect the intern to be
familiar with the standard practices and methods for conducting such studies, and to
be able to use the deployed technologies and to educate others in their use.
Level: PhD/M-tech/MS
Project Code: D-014

Analytics and Optimization


Fire Program Analysis: Multi-criteria Strategic Budgeting of Fire Mgmt Activities
Develop a multi-criteria optimization model and solution framework (based on Goal Programming
approaches) for addressing the national level budgeting problem for wildfire management
activities in the US for the USDA client. Leverage current research efforts underway to look into
the technical challenges in the areas of on-line resource allocation problems arising in the
deployment of constrained fire management resources in a fire season faced with competing and
long-burning large fires. The overall work has larger applicability in other areas of disaster
management (pandemics, homeland security) that are under the IBM business focus.
Skills: Knowledge of Mathematical Programming theory, Implementation Skills in AMPL, Java
API of CPLEX
Level: MS/PhD in Industrial Engg or Operations Research
Project Code: D-021

Workforce Management: Management of Resource Supply/Demand Risks in Global Service Delivery
Develop new models for characterizing and analyzing hidden attributes (in unstructured data) of
supply and demand data to aid in effective and efficient optimization of (i) Practitioner Matching
(Resource Identification and Assignment, and Rotation Management), and (ii) Risk Managed
Hiring.
Skills: Knowledge of Applied Probability, Risk Management theory, Stochastic Analysis,
Implementation Skills in SAS or R, MATLAB
Level: MS/PhD in Industrial Engg or Operations Research
Project Code: D-022

Algorithms for understanding/mining/visualizing time-varying interaction/social networks
Today's global service delivery organizations handle a large number of projects in diverse sectors.
A key question faced by such organizations is: how should they identify the people with the
appropriate skill sets, or with access to relevant skill sets, to start work in a new sector?
Professional contact networks provide links between people, and public databases contain the
outcomes of previous interactions between people. Can we identify the skills of individuals, and
the skills that they have access to by association? Similar questions are posed in predicting the
spreading patterns of a pandemic based on the interactions of people (and locations). This project
deals with developing new models and solution approaches for identifying (i) the acquisition of,
and (ii) transitions between, attributes of professionals based on their social interactions over a
period of time.
Skills for the project: Knowledge of C++/JAVA, basic Graph Theory and Data Mining
Level: MS/PhD in Computer Science
Project Code: D-023

High Performance Computing


Data Analytics for Efficient Service Request Management
Currently, service requests in many outsourcing engagements are managed in a somewhat
ad hoc manner, based on the (sometimes limited) understanding of service personnel about the
nature of a service request, the parameters to be tracked, or the complexity involved in meeting
the request. This can lead to sub-optimal dispatch of service requests, resulting in multiple
reassignments and inordinate delays. Metrics collected on such requests, e.g. mean time to
resolution per severity level, often have high variance, and do not provide deep insights about the
diversity of request types, or their complexity. To address the above challenges, the broad goals
of this project are (i) Discovering categories of requests in a service environment, and (ii)
Evaluating the complexity of service request categories. To meet these goals, we anticipate the
use of text analysis techniques and rule-based reasoning on service requests. The technical work
will involve algorithmic aspects for category discovery and complexity evaluation, as well as
hands-on implementation of an infrastructure that supports such techniques, invokes them at
runtime, and interfaces with an actual service request management system in use.
Skills: The candidate should have good implementation skills in general, and be hands-on with
Java in particular. Interest/skills in text analysis and data mining would be helpful.
Level: PhD/M-tech/MS/B-tech
Project Code: D-031
Scaling Scientific Code for HPC Platforms
Study of scaling practical scientific code on HPC platforms (e.g., weather modeling code, some
scientific simulation code, etc.). The goal will be to study the scientific application, identify the
scaling bottlenecks and propose alternate algorithmic methods that are more suitable on
massively parallel processor systems such as Blue Gene.
Skills: Architecture, HPC, algorithms, C programming
Level : M-tech/MS
Project Code: D-032
Scaling: Automatic Generation of Optimal Code for Multi-Core Architectures
Design and development of tools for generating optimal code for numerical/scientific routines on
multi-core platforms. The goal here is not merely to generate code that is better than what a
novice would produce (leading to better productivity), but to produce the most optimal machine
code by examining the architectural parameter space.
Skills: Architecture, C programming
Level : B-tech
Project Code: D-033


Using contention-aware adaptive co-scheduling to minimize OS jitter in
parallel applications on SMT architectures
The goal of this project is to minimize OS jitter in parallel applications executed on SMT
architectures by developing a scheduling algorithm that appropriately schedules daemon threads
with the application thread so as to cause minimum impact on the parallel application progress.
This project comprises the following sub-activities ('Essential' refers to the core activities;
'Optional' refers to activities that we will undertake if time permits):
1) Monitoring the resource usage of daemon processes as well as standard HPC benchmarks
(we might use NAS benchmarks or develop our own with varying resource usage characteristics
or do both). This involves instrumenting their cache miss rates, memory access densities, number
of integer and floating point operations etc. This activity annotates the runnable threads with their
respective resource usage, which can help the scheduler to pick the set of 'least-contending'
threads for concurrent execution. [Essential]
2) Based on the resource usage of the daemons and the parallel application benchmark from step 1,
develop an algorithm to identify the subset of daemon threads that can be concurrently executed
with the application thread with minimum performance degradation. In this way, some of the 'less'
harmful daemons can be co-scheduled with the application thread thereby improving utilization.
[Essential]
3) Investigate if we can identify a maximum stretch for each daemon, i.e. the maximum amount of
time we can delay the daemon execution while ensuring correct functioning of the system.
[Optional]
4) Adaptively tune the priorities of the processes such that the maximum allowable stretch for
each daemon is not violated. [Optional]
More details about the OS jitter project can be found at: http://www.research.ibm.com/osjitter
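As a rough sketch of sub-activity 2 above, a greedy selection of the 'least-contending' daemon subset might look like the following. Thread names, resource names and usage fractions are invented; a real implementation would use measured hardware counters.

```python
# Greedy co-scheduling sketch: pick the subset of daemon threads whose
# combined resource usage, together with the application thread, stays
# within the core's capacity on every resource.
def least_contending(app_usage, daemons, capacity):
    # app_usage / each daemon's usage: dict of resource -> usage fraction
    chosen, totals = [], dict(app_usage)
    # consider the least demanding daemons first
    for name, usage in sorted(daemons.items(),
                              key=lambda kv: sum(kv[1].values())):
        if all(totals.get(r, 0) + u <= capacity[r] for r, u in usage.items()):
            chosen.append(name)
            for r, u in usage.items():
                totals[r] = totals.get(r, 0) + u
    return chosen

app = {"cache": 0.6, "fpu": 0.7}
daemons = {
    "kswapd":  {"cache": 0.5, "fpu": 0.0},   # heavy cache user: conflicts
    "ntpd":    {"cache": 0.1, "fpu": 0.0},   # light: can be co-scheduled
    "syslogd": {"cache": 0.2, "fpu": 0.1},
}
capacity = {"cache": 1.0, "fpu": 1.0}
print(least_contending(app, daemons, capacity))  # -> ['ntpd', 'syslogd']
```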
Skills: C/C++ implementation experience required, knowledge of scheduling algorithms/OS
scheduling is a plus, Linux kernel experience is a BIG plus
Level: PhD/M-tech/MS/B-tech
Project Code: D-034
Parallel Application Mapping Tool
Multi-core architectures are increasingly used in high performance computing
applications as well as commercial applications. With improvements in semiconductor
technology, the increase in the number of transistors continues to follow Moore's Law,
enabling more cores on chip with complex interconnects. Moreover, these cores are
heterogeneous, for better performance across multiple workload types compared to
a single optimized core. The memory systems in future multi-core devices will be
more complex, and will resemble a distributed memory system with multiple coherent
domains instead of a single coherent domain. Future multi-core architectures will
have massive numbers of cores, with 512 cores expected in the next 4-5 years and
many more later. It will be very difficult for legacy software to benefit from these
architectural changes and obtain the appropriate speed-ups, and the manual effort
required to enable performance gains on next-generation multi-core architectures
will be very high. Hence, there is a serious need for automatic parallelization
techniques that automate the mapping process to a large extent and provide a
methodology by which the user can converge to a high-quality mapping.
The work involves developing the front-end of the mapping tool and using static
analysis to extract interesting program characteristics. Candidates are expected to
have prior exposure to compiler internals.
Skills: Exposure to compiler internals and graph algorithms.
Level: PhD

Project Code: D-035

Distributed Systems Management


Application profiling
Platform-independent modeling and profiling of applications, along with prediction of
application resource usage, is useful for various tasks in systems management, e.g.,
capacity planning and optimization, runtime management and performance
modeling. Current solutions address parts of this problem by focusing either on a
specific application, on a specific platform, or on a small subset of system resources
(e.g. only CPU usage). Applications often depend on various resources like
CPU, disk, network etc., and correlating these dependencies is a challenging task in
itself. In this project, we plan to investigate platform-independent profiling of batch
and transactional workloads for the purposes of:
1. Understanding the correlation of different parameters with application behavior
2. Creating multi-dimensional temporal profiles for the applications based on these
correlated parameters
3. Using these models for profiling and prediction of resource usage. The predicted
values provide valuable inputs to planners, schedulers or other resource
management tools.
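As a small sketch of purpose 1 above, the correlation between two monitored resource metrics can be measured with Pearson's coefficient. The traces below are invented samples.

```python
import math

# Pearson correlation: how strongly two sampled resource metrics
# (e.g. CPU utilization and disk throughput) move together.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

cpu  = [20, 35, 50, 65, 80]   # % utilization samples
disk = [10, 18, 26, 33, 41]   # MB/s samples
print(round(pearson(cpu, disk), 3))
```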
Skills: The candidate should be strong in algorithms and their implementation.
Level: PhD/M-tech/MS/B-tech
Project Code: D-041

Power management
The aim of this project is to design and implement a runtime controller for dynamic
power and performance trade-off management in clusters using server virtualization
and VM migration technologies. We will use standard enterprise benchmark
applications, build power models for the servers and the applications, and design
algorithms that can generate dynamic mappings of applications/VMs to physical
servers. The aim is to use the minimum number of servers in the cluster and run the
rest at optimum utilization levels so that the performance and power targets are met.
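As a toy sketch of the mapping step described above, first-fit-decreasing bin packing places VMs onto the fewest servers that can hold them. VM names, demands and the capacity figure are invented; real models would be multi-dimensional and migration-aware.

```python
# First-fit-decreasing placement: sort VMs by demand (largest first) and
# place each on the first server with enough remaining capacity,
# opening a new server only when none fits.
def place_vms(demands, server_capacity):
    servers = []   # each entry: [remaining_capacity, [vm names]]
    for name, demand in sorted(demands.items(), key=lambda kv: -kv[1]):
        for srv in servers:
            if srv[0] >= demand:
                srv[0] -= demand
                srv[1].append(name)
                break
        else:
            servers.append([server_capacity - demand, [name]])
    return [vms for _, vms in servers]

demands = {"web": 40, "db": 70, "cache": 30, "batch": 50}
print(place_vms(demands, server_capacity=100))
```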
Skills: The candidate should be strong in algorithms and their implementation in
Java, and should also be interested in getting hands-on with VMware, Xen or other
virtualization platforms.
Level: PhD/M-tech/MS/B-tech
Project Code: D-042


Dynamic Server Consolidation Planning
This project aims to develop a tool that provides quantitative means of estimating the
benefit of virtualization in a given data center. The tool will take as input a set of
applications, along with a resource usage trace, and show the impact of using
virtualization by static consolidation. Further, the tool will provide estimates on the
additional benefit if live VM migration is introduced.
Skills: The candidate should be strong in algorithms and their implementation and
have interest in server virtualization technologies.
Level: PhD/M-tech/MS/B-tech
Project Code: D-043

Systems management & Software Engineering


Model-Driven SOA Deployment Automation & Performance Testing
(a) Model Driven SOA Deployment:
A large proportion of the time of software development/maintenance teams is spent
in the deployment and setup of the OS, middleware, and application-level components
on the given hardware. Time is wasted because of the many errors that occur due to
mismatches of requirements and constraints between the various components of the
software. We will explore how models can be used to validate and drive the
deployment of composite solutions involving multiple components (servers, OS,
middleware, etc.). The research challenge is in modeling, and in creating the right
model transformations that finally result in scripts that can be used not only to
provision the composite solutions but also to set quality-of-service properties -- like
security, performance, etc.
Skills: Some knowledge of UML, Eclipse, and a desire for adventure
Level: PhD/M-tech/MS/B-tech
Project Code: D-051
(b) Performance Testing for Composite applications
Performance testing for composite applications is extremely time-consuming, simply
because setting up the test bed and configuring the workload is slow and painful. We
assume that the test cases that drive the workload already exist. In performance
testing, several configurations are tried in order to understand where the bottlenecks
are, what the relationships between the various configuration parameters are, and
how they affect the end-to-end response time and throughput. Our aim is to generate
these configurations (which could be large in number) in a systematic manner, use
them to drive the setup of the test bed and workload, and finally execute the tests.
Since there could be a large number of configurations, we want to explore how we
can intelligently schedule the testing of each configuration so as to minimize the time
taken to complete all the tests. Can we embed domain knowledge in such a way that
testing one configuration obviates the need to test other configurations? We will also
explore how virtualization techniques can be used, along with scheduling of the
configurations, to reduce the total time for testing all the configurations.
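As a minimal sketch of systematic configuration generation, the cross-product of configuration parameters can be enumerated so that each combination becomes one performance-test run. Parameter names and values here are invented for illustration.

```python
import itertools

# Enumerate every combination of configuration parameters; each dict is
# one candidate configuration for a performance-test run.
params = {
    "heap_mb":      [512, 1024],
    "thread_pool":  [8, 32],
    "db_conn_pool": [10, 50],
}
configs = [dict(zip(params, values))
           for values in itertools.product(*params.values())]
print(len(configs))   # 2 * 2 * 2 = 8 configurations
print(configs[0])
```

A scheduler could then order these runs, pruning configurations that domain knowledge shows are implied by earlier results.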



Skills: virtualization concepts (VMWare), combinatorial optimization, programming in
Java, penchant for adventure
Level: PhD/M-tech/MS/B-tech
Project Code: D-052
Expertise discovery in Jazz Software Development Environments
To develop successful complex software, developers and other members of the
software development team must collaborate with each other to solve issues. In large
software projects, people may not know other members of the team, and may also
not know who the right person to contact is when an issue arises. People's skills and
expertise depend on what they have worked on in the past. Our goal is to create an
expertise-extraction framework around Jazz (a governance framework for all
aspects of the software development lifecycle). We want to explore how the activity
of developers can be observed in IDEs (integrated development environments) like
Eclipse, and how this information can be used to answer who the right person to
collaborate with is for a given issue. For example, a person who is writing a class and
is unsuccessful in using the Thread API can quickly collaborate with a person (not
known a priori) who has recently been using the Thread API successfully. We shall
explore and formalize the various patterns of activity of software developers that can
be discerned from IDEs and used for collaboration. We shall also explore a scalable
architecture to achieve this kind of expertise gathering and collaboration.
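As a toy sketch of the expertise lookup described above, developers can be ranked by how often their observed activity touched a given API. The event data below is invented; a real system would harvest it from the IDE.

```python
from collections import defaultdict

# Rank developers by how often their observed activity used a given API.
def experts_for(api, events):
    # events: list of (developer, api_used) observations from the IDE
    counts = defaultdict(int)
    for dev, used in events:
        if used == api:
            counts[dev] += 1
    return sorted(counts, key=counts.get, reverse=True)

events = [("asha", "Thread"), ("ravi", "Thread"), ("asha", "Thread"),
          ("ravi", "Socket"), ("meena", "Thread")]
print(experts_for("Thread", events))   # asha (2 uses) ranked first
```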
Skills: Eclipse, software engineering development processes, some analysis
capability, programming in Java; machine learning may help but is not necessary;
penchant for adventure
Level: PhD/M-tech/MS
Project Code: D-053
Middleware and Tooling for Problem Determination for Large Enterprise Systems
In large and complex enterprise systems, when users face problems and issues, they
approach a help desk to get their problems resolved. Skilled helpdesk personnel are
not only costly to keep but also difficult to find. This makes the life of service desk
operators very difficult, as the turn-around time to resolve the users' problems is
large and the users are unhappy. Some of the handicaps service desk operators face
are the vast knowledge of tools and techniques needed to resolve problems, lack of
automation, and lack of integration between tools. Though there are stored
documents that describe problems that have occurred in the past and the
approaches taken to resolve them, the documentation is done in an ad hoc manner,
resulting in human error and wasted time. The purpose of this project is to explore
and develop ways in which input to and output from heterogeneous tools and
systems can be automated. This should allow the creation of diagnosis plans without
knowledge of the underlying tools, in such a way that even if the topology changes,
the diagnosis plans do not fail. We will explore tooling and middleware that allow the
creation of diagnosis plans and serve as glue between different tools. We shall also
explore how this can be applied to IBM's service desk software to automate ad hoc
activities and to support proactive and reactive resolution of issues occurring in the
managed enterprise system.
Skills: Knowledge of programming (java, etc), systems management, ITIL, and "killer
instinct"
Level: PhD/M-tech/MS/B-tech


Project Code: D-054

Programming Models, Analysis & Tools


A Basis for Static Characterization of Exceptions in OOPL
While there is an abundance of static-analysis techniques for OOPLs, one of the main issues
has been false positives, which deter users. The motivation of this project is to explore a
compositional proof theory that would use static analyses as oracles. Such a system will lead
to two-level proof systems, which in turn are expected to yield effective and scalable
exception analysis in OOPLs using SAT solvers and other model checkers.
Skills/Level: Masters/PhD. Exceptional B-tech with PL interest can also be considered.
Project Code: D-061
X10 and Fault Tolerance
X10 is a high-performance computing language designed with both productivity and performance
in view. There is a need to consider the fault-tolerance features required for robust
applications. The project envisages looking at possible orthogonal constructs that need to be
introduced to cater to the spectrum of transactions and computations X10 must address as it
covers various applications. The task is to arrive at constructs, implementation techniques,
and possible integration with the X10 implementation.
Skills/Level: Masters/PhD. Exceptional B-tech with PL interest can also be considered.
Project Code: D-062
Safe Memory Management for Multi-Core Systems
There is a strong need for a safe memory model with user-controlled memory management for
multi-core systems. Safety would be guaranteed dynamically (at a minimum) by an exception
mechanism for all invalid accesses. This would provide a fully user-controllable yet safe
middle option between the C-style unprotected memory model with user-controlled memory
management and the Java-style safe memory model with automatic memory management, and would
be interoperable with both. Performance of the system would be key, so unlike past work, our
work will also provably guarantee constant-time memory allocation, constant-time
deallocation, and constant memory-access overhead. References to deallocated objects would
result in an exception that can be handled at run time and would also be useful for program
debugging.
Skills/Level: Masters/PhD. Exceptional B-tech with PL interest can also be considered.
Project Code: D-063
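One way to picture the constant-time, exception-on-invalid-access behavior described above is a generation-tagged slot allocator. The sketch below is purely illustrative and assumes a simple fixed-capacity design; it is not the project's actual model.

```java
// Illustrative sketch, not the project's actual design: a fixed pool of slots
// with generation-tagged handles. Allocation, deallocation, and access are
// all O(1); any access through a freed handle throws instead of corrupting
// memory, mirroring the exception mechanism described above.
public class SafeHeap {
    private final Object[] slots;
    private final int[] gen;  // bumped on every free, invalidating old handles
    private final java.util.ArrayDeque<Integer> free = new java.util.ArrayDeque<>();

    public SafeHeap(int capacity) {
        slots = new Object[capacity];
        gen = new int[capacity];
        for (int i = 0; i < capacity; i++) free.push(i);
    }

    /** O(1): the returned handle packs the slot index and its generation. */
    public long alloc(Object v) {
        int i = free.pop();           // throws if the heap is exhausted
        slots[i] = v;
        return ((long) gen[i] << 32) | i;
    }

    /** O(1): bumping the generation invalidates every outstanding handle. */
    public void dealloc(long h) {
        int i = check(h);
        gen[i]++;
        slots[i] = null;
        free.push(i);
    }

    /** O(1) access; throws on use-after-free. */
    public Object get(long h) { return slots[check(h)]; }

    private int check(long h) {
        int i = (int) h;
        if (gen[i] != (int) (h >>> 32))
            throw new IllegalStateException("access to deallocated object");
        return i;
    }

    public static void main(String[] args) {
        SafeHeap heap = new SafeHeap(4);
        long h = heap.alloc("config");
        System.out.println(heap.get(h));      // prints config
        heap.dealloc(h);
        try {
            heap.get(h);
        } catch (IllegalStateException e) {
            System.out.println("stale handle caught");
        }
    }
}
```

The generation check is a single integer comparison on every access, which is what keeps the access overhead constant while still catching dangling references at run time.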
Harnessing Parallelism via Multi-Core Architectures
Traditionally, multiple processing units (CPUs/threads) are used to achieve performance in
compute-intensive programs. Recently, multi-core systems are being used increasingly and are
becoming the norm. For instance, the Cell processor from IBM has been widely deployed in
gaming consoles. A Cell processor consists of a 64-bit PowerPC core and eight specialized
SIMD co-processors called SPEs, all connected internally through a very high-bandwidth bus,
the EIB. High bandwidth to the main memory is available too. The SPEs are meant to handle
the bulk of the computational workload. An SPE can directly act only on its 256KB local
store; the SPEs have shared access to memory through DMA, with a maximum single-transfer
size of 16KB. An SPE can use its local registers (the Cell has 128 registers) along with
L2, L3, and system memory. Similarly, multi-core systems from Intel contain multiple cores
with local L1 caches and shared L2, L3, and system memory. Most of the
desktop Intel machines are dual-core these days. Even though these multi-core architectures
provide efficient hardware features, the onus also falls on the expressibility of the
programming language, the compiler infrastructure, and the programmer, who must program
these machines to achieve better performance. The aim of the proposal is to achieve one or
more of the following:
o Harnessing parallelism in applications and mapping the same to a multi-core
architecture
o Providing tools for harnessing performance of explicitly parallel programs on
multi-core architectures
o Providing tools to aid re-factoring of programs for better performance.
Skills/Level: Masters/PhD. Exceptional B-tech with PL interest can also be considered.
Project Code: D-064
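The partition-and-combine pattern behind harnessing such parallelism can be sketched briefly: split the data across one worker per core and merge the partial results. This Java sketch is illustrative only; the same shape applies whether the workers are threads, Intel cores, or SPEs acting on their local stores.

```java
// A minimal, hypothetical sketch of the partition-and-combine pattern: split
// an array across one worker per core and sum the partial results.
import java.util.*;
import java.util.concurrent.*;

public class ParallelSum {
    public static long sum(long[] a) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        int chunk = (a.length + cores - 1) / cores;   // ceiling division
        List<Future<Long>> parts = new ArrayList<>();
        for (int c = 0; c < cores; c++) {
            final int lo = c * chunk, hi = Math.min(a.length, lo + chunk);
            // Each task touches only its own slice, like an SPE acting on its
            // local store; partial results are combined on the main thread.
            parts.add(pool.submit(() -> {
                long s = 0;
                for (int i = lo; i < hi; i++) s += a[i];
                return s;
            }));
        }
        long total = 0;
        for (Future<Long> f : parts) total += f.get();
        pool.shutdown();
        return total;
    }

    public static void main(String[] args) throws Exception {
        long[] a = new long[1000];
        for (int i = 0; i < a.length; i++) a[i] = i;
        System.out.println(sum(a)); // prints 499500
    }
}
```

A re-factoring tool of the kind proposed above would, in effect, recognize such independent slices in sequential code and introduce this structure automatically.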
Software Transactional Memory
Concurrent programming is horrendously complex. Current lock-based abstractions are difficult to
use and make it hard to design computer systems that are reliable and scalable. Furthermore,
systems built using locks are difficult to compose without knowing their internals. A growing
consensus is that transactions, long the foundation of database concurrency control, are the
most promising near-term means to simplify the construction of multithreaded programs. A
software transactional memory (STM) can perform groups of memory operations atomically. It was
proposed as a light-weight mechanism to synchronize threads via optimistic, lock-free
transactions. It alleviates many of the problems associated with locking, offering the benefits
of transactions without incurring the overhead of a database. Of course, there are several
issues with its realization. One of the main issues is that transactional abstractions do not
compose well. Some of the reasons for this are:
o Transactions must still contain only revocable memory operations; they cannot
contain irrevocable input/output operations.
o Composing code blocks requires the client to know much about the internal
details of the transaction implementation.
o In Transactional Memories there is currently a trade-off between consistency and
performance. Several high-performance STM implementations use optimistic
reads in the sense that the objects read by the transaction might not be
consistent; consistency is only checked at commit time. However, an
inconsistent view of the state of objects during a transaction might prevent
applications from running properly. On the other hand, validating after every
access can be costly if it is performed for every object previously read,
i.e., the validation overhead grows linearly with the number of objects a
transaction has read so far.
o Nonblocking implementations of software transactional memories typically
impose an extra level of indirection when accessing an object. The indirection
assures that committing and aborting are both light-weight operations, and that
objects read during a transaction are immutable. However, it also increases both
capacity and coherence misses in the cache, by increasing the number of lines
required to represent an object and the number of lines that are modified when
changes to an object are committed.
In the proposed work, we plan to realize an implementation that would overcome some of the
above drawbacks.
Skills/Level: Masters/PhD. Exceptional B-tech with PL interest can also be considered.
Project Code: D-065
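The optimistic-read, commit-time-validation trade-off described above can be made concrete with a toy STM. The design below (buffered writes, per-variable version numbers, a single global commit lock) is purely illustrative and far simpler than any real STM implementation; all names are hypothetical.

```java
// A toy STM, illustrative only: buffered writes, optimistic reads, and
// commit-time validation under a single global lock (real STMs are far
// finer-grained and handle contention, retry, and nesting).
import java.util.*;

final class TVar {
    volatile int value;
    volatile int version;   // incremented on every committed write
    TVar(int v) { value = v; }
}

final class Txn {
    private final Map<TVar, Integer> readVersions = new HashMap<>();
    private final Map<TVar, Integer> writeBuf = new LinkedHashMap<>();
    private static final Object COMMIT_LOCK = new Object();

    int read(TVar t) {
        if (writeBuf.containsKey(t)) return writeBuf.get(t); // read-your-writes
        readVersions.putIfAbsent(t, t.version);  // remember version to validate
        return t.value;
    }

    void write(TVar t, int v) { writeBuf.put(t, v); } // buffered until commit

    /** Validate all optimistic reads, then publish writes atomically. */
    boolean commit() {
        synchronized (COMMIT_LOCK) {
            for (Map.Entry<TVar, Integer> e : readVersions.entrySet())
                if (e.getKey().version != e.getValue().intValue())
                    return false;   // another transaction committed in between
            for (Map.Entry<TVar, Integer> e : writeBuf.entrySet()) {
                e.getKey().value = e.getValue();
                e.getKey().version++;
            }
            return true;
        }
    }
}

public class StmDemo {
    public static void main(String[] args) {
        TVar x = new TVar(10);
        Txn t = new Txn();
        t.write(x, t.read(x) + 5);
        System.out.println(t.commit() + " " + x.value); // prints true 15
    }
}
```

Note how the sketch exhibits the cost trade-off discussed above: reads are cheap because nothing is validated until commit, but a transaction can run to completion on stale data and only then discover it must abort.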

Fault Analysis Based Regression Testing
Software testing is often limited by time and cost constraints. To improve the effectiveness of
testing and reduce wasted testing effort, while ensuring that time and cost constraints are not
exceeded, prioritization of test activities is important. For example, prioritization can focus the
testing effort on components that require more thorough verification, thereby saving unnecessary
effort on other, less-critical components. Prioritization can also order test cases (using some
objective function) to ensure that the important ones are run early. In situations where not all test
cases can be run, which is common in regression testing (during maintenance) because of the
accumulation of a large number of test cases, prioritization ensures that the important ones do
get executed. To perform effective prioritization, appropriate information, such as fault profiles,
must be available, which can be used to select the objective function and invest testing effort
proportionately.
The goal of this project is to investigate how regression testing and test-effort prioritization can be
made more effective by taking into account fault profiles of the components being tested.
Informally, a fault profile for modified components is an assignment of relative weights to
components based on the likelihood that they contain faults. Our main approach for computing fault
profiles is to analyze code components and their evolution history, to identify properties that may
be significant indicators of faults.
Skills: Experience in static analysis
Level: PhD (preferred), M.Tech/MS
Project Code: D-066
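A fault profile, once computed, feeds prioritization directly. The hypothetical sketch below orders tests by the total fault-profile weight of the components they cover; all names, coverage sets, and weights are illustrative inputs, not outputs of any actual analysis.

```java
// Hypothetical sketch: order test cases by the total fault-profile weight of
// the components they cover. Names, coverage, and weights are illustrative.
import java.util.*;

public class Prioritizer {
    /** Tests covering the most fault-prone components run first. */
    public static List<String> prioritize(Map<String, Set<String>> coverage,
                                          Map<String, Double> faultProfile) {
        List<String> tests = new ArrayList<>(coverage.keySet());
        tests.sort(Comparator.comparingDouble(
                (String t) -> coverage.get(t).stream()
                        .mapToDouble(c -> faultProfile.getOrDefault(c, 0.0))
                        .sum())
                .reversed());
        return tests;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> cov =
                Map.of("t1", Set.of("A"), "t2", Set.of("B", "C"));
        Map<String, Double> profile = Map.of("A", 0.9, "B", 0.2, "C", 0.3);
        System.out.println(prioritize(cov, profile)); // prints [t1, t2]
    }
}
```

The objective function here is a simple weighted sum; the project would investigate richer functions derived from code evolution history.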

Internship opportunities at IRL Bangalore


Knowledge Discovery and Data Mining
The KDDM group at IRL-Bangalore is involved in projects related to NLP, information retrieval,
information extraction, analytics, and machine translation. We deal with unstructured and
noisy multi-lingual information sources such as Web pages, documents, e-mail, ticketing systems,
on-line databases, team rooms, and IT monitoring systems, mining information related to
products, technologies, people, and competitors to improve the quality of products and IT
services delivery. Examples of problems that we are addressing include:
o Automatic extraction of knowledge from streaming categorical/numerical data from
application monitoring
o Machine-driven OLAP analysis to obtain useful insights into root causes of problems from
ticket data
o Analysis of sequence databases, specifically log analysis, to make knowledge
management portals more useful
o Representation (e.g., in terms of graphs) of incidents, problems, and changes in an IT
environment and their relationship to system components, to enable meaningful searches
o Handling uncertainty, de-duplication, and conflict resolution for information extracted from
multiple sources
o Automatically extracting detailed skills information from resumes and job descriptions
o Automatically evaluating call center calls/emails for compliance and quality
o Creating tools for assisting people to translate domain-specific documents, leveraging MT
output and domain-specific dictionaries
Skills: Knowledge of one or more of machine learning, text mining, natural language processing,
machine translation, information extraction/retrieval; Java/C++ for prototyping.
Level: PhD/M-tech/MS/B-tech
Project Code: B-001

Software Engineering
The Software Engineering group at IRL-Bangalore is involved in projects related to the design
and maintenance of software solutions built using service-oriented architectures (SOAs). In
particular, we are developing technologies for improving the productivity of SOA-based solution
development, and also improving the flexibility and adaptability of SOA-based solutions.
Examples of problems that we are addressing include:
o Change management in SOA-based solutions
o Variation-oriented modeling and engineering of SOA-based solutions for improving
reusability
o Modeling and implementing ontologies for improved software asset representation in
repositories
o Design and implementation of transactional properties of contextual Web services
o Frameworks for modeling and enacting B2B applications
o Technologies for enhancing reusability and flexibility of SOA-based solutions via
commitments
Skills: Good knowledge of SOA/Web services and software engineering principles; Java skills
for prototyping
Level: PhD/M-tech/MS/B-tech
Project Code: B-002

Business Process Analytics & Optimization


The Business Process Management group at IRL-Bangalore is exploring various aspects of
business processes including their discovery/mining, representation and modeling, analysis,
measurement and optimization. The selection of problems is aimed at improving the efficiency
and effectiveness of business processes, especially those used in the service delivery domain.
The group and its internship projects combine advancing the state of the art with solving
real-world problems. Examples of internship projects are:
o Mining business processes from semi-structured and unstructured information sources, using
domain-specific assists as necessary
o Exploring the creation, measurement, and effectiveness of metrics for business process
characteristics like quality, compliance, and cost
o Investigating continuous compliance of processes to policies at run-time, along with
implementing corrective measures in case of non-compliance
Skills: Skills in one or more of business process/workflow languages (BPMN, BPML, BPEL etc.),
LEAN/Six Sigma/TQM and other process optimization and quality control methodologies,
operational risk management, process mining. Students in industrial engineering and
management sciences with related background are also encouraged to apply.
Level: PhD/M-tech/MS/B-tech
Project Code: B-003

Distributed Systems Management
The Distributed Systems Management group at IRL-Bangalore is examining issues involved in
the management of large, distributed infrastructure of servers, storage and networks. The
research involves representing complex interrelationships between system components and
correlating them to real-time events from monitoring tools in order to assist with problem
determination and change management using advances in discrete optimization and domain
knowledge of modern systems. Examples of internship projects are:
o Structured knowledge management for problem determination
o Portfolio optimization for service accounts
Skills: Skills in one or more of discrete algorithms, discrete optimization coupled with knowledge
of modern applications/middleware/systems and basic prototyping skills
Level: PhD/M-tech/MS/B-tech
Project Code: B-004

Applications, along with your latest CV and the application form, can be sent to
urirl@in.ibm.com. Your application will be considered for the next year's program,
which runs between May and August. The deadline for submitting the application is
Jan 15, 2008.

Summer Training Program Application


Date:

Preferred Location:

Delhi / Bangalore

Student Details
Name:
Institute:
Department:
Student ID:
Contact No.
Email:
Ranking
GPA (Latest Semester):
Department Rank (Latest Semester):
Institute Rank (If any):
Area of Interest
List up to 3 IRL projects in your priority list (include the Project Code from the Project
Descriptions)
1.
2.
3.
(We will try to match you to the project area of your choice, but this may not always be possible)
Applicable Background
Explain why you feel that you are suited for the project area you have selected. Give details of the
relevant courses, software skills, project experience, papers etc.

** Fill in your preference for Delhi/Bangalore above