Conference Paper · April 2015


Analytics for Improving Talent Acquisition Processes

Rajiv Srivastava, Girish Keshav Palshikar, Sachin Pawar


Systems Research Lab, Tata Consultancy Services Limited,
54B Hadapsar Industrial Estate, Pune 411013, India.
Email: {rajiv.srivastava, gk.palshikar, sachin7.p}@tcs.com

Abstract. Talent Acquisition (TA) is an important function within HR, responsible for recruiting
high-quality people for given job positions through various sources under stringent deadlines and
cost constraints. Given the importance of TA in the overall successful operations and growth of
any organization, in this paper we identify specific “business questions” focused on analyzing
various aspects of the TA processes, and analyze past TA data using statistical analysis techniques
to discover novel patterns, insights and actionable knowledge that can help improve the cost,
efficiency and quality of recruitment. Our predictive analytics mainly concern various durations
and delays in TA, candidate selection or rejection, offer acceptance by selected candidates, and
root cause analysis for offer declines. We also use the data-mining technique of subgroup
discovery to identify interesting patterns (e.g., candidate subgroups having unusually high decline
ratios). We illustrate the approaches through a real-life dataset.

Keywords: Talent Acquisition, Workforce Analytics, Human Resources Management, Machine Learning, Text Mining, Data Mining.

1 INTRODUCTION
Quality and productivity of the workforce are very important for people-centric and knowledge-
intensive industries such as IT, BPO and services in general. It is the responsibility of the Talent
Acquisition (TA) function within HR to recruit the workforce of highest possible quality. The TA
function often works under highly variable (and often unclear) demand pipeline of future business
requirements. The TA function needs to attract the best possible talent from a complex supply
chain of educational institutes (if experience is not required), job portals, employment agencies,
recruitment consultants and direct sourcing through buddy, emails, advertisements, walk-ins and
web. The channels differ in terms of the number and quality of resumes sourced, time and cost for
sourcing, selection ratio and joining ratio for sourced candidates etc. The recruitments themselves
need to be done under stringent goals such as shortest possible times-frames, lowest possible
recruitment costs/efforts and working at many locations and dealing with diverse domains and
technical skills. Moreover, a variety of human and economic factors affect recruitments.

Standards such as People CMMI [4] have helped HR management practices to get aligned with
business objectives and to systematize programs of continuous workforce development. The TA
function typically follows a complex business process; Figure 1 shows the major steps in it in a
linear manner for simplicity. In reality, the business process has many variations and special cases
that need to be handled; e.g., candidates declined/delayed to join or found unsuitable during
induction etc. A separate business process is required for internal recruitment within the
organization.

Automation of workflow and data management functionality required within TA business


processes is supported by modern eHRM software systems [1], [20]. For example, TCS allows
prospective candidates to upload their resumes on our careers portal (www.careers.tcs.com).
Systems such as GRS and eRecruitment in TCS allow end-users (such as HR executives,
managers, project leaders and group leaders) to create and manage a lot of TA data. Typical TA
data consists of job requirements, sourcing interactions, resumes with annotations, interview
teams, locations and schedules, selections, offers, joining, placement and induction.

Given the importance of TA in the overall successful operations and growth of any organization,
it is clearly useful to analyze past TA data using suitable statistical analysis techniques to discover
novel patterns/insights and actionable knowledge which can help in improving the cost, efficiency
and quality of recruitments. Such work is part of the general task of workforce analytics (i.e.,
statistical analysis, modeling and mining of HR data), which is gaining importance [5], [12], [18],
[9], [7], [15], [13], [14]. In this paper, we discuss a system called iTAG that we are building for
using domain-driven analytics techniques to answer specific business questions in TA.
Figure 1. Major steps in a typical TA business process: demand planning → channel activation → resume sourcing → candidate short-listing → tests, interviews, selections → background check, medical examination, other validations → offer preparation, negotiation and dispatch → joining → placement → induction.


While a large number of data mining algorithms are readily available in BI tools, making effective
use of them to solve specific business problems is plagued with several challenges. First, there is
no unified representation for the many different types of knowledge extracted (decision rules,
clusters, associations etc.), making it difficult for end-users to relate extracted knowledge
elements. Next, even the volume of extracted knowledge tends to be large, making it difficult for
end-users to filter and use it. Lastly, there is a huge gap between the extracted knowledge and
practical business goals of the end-user (e.g., reducing TA costs, improving quality of recruited
people etc.) Extracted knowledge is not aligned to and hence not directly usable for meeting the
end-user’s business goals. It becomes the end-user’s responsibility to understand and make use of
the extracted knowledge to solve a business problem. This requires expertise in setting up
appropriate data-mining experiments, selecting and understanding the extracted knowledge and
applying it to solve the business problem. This demands data-mining expertise as well as domain
expertise - a rare combination. In this paper, we attempt to bridge this gap by adopting a domain-
driven data mining (DDDM) approach, which is getting increasing acceptance from end-users
[22], [23]. DDDM consists of either (i) designing special purpose algorithms for stated business
problems; or (ii) designing a “process” to use standard data-mining algorithms, with the right
domain knowledge, so that extracted insights lead to solutions for the business problems. The idea
is to move from domain-neutral data-mining towards problem-specific data-mining i.e., instead of
trying to fit a data-mining technique to a business problem, we adopt a top-down approach where
we begin with a business goal. As examples of DDDM, [17] and [6] survey techniques for fraud
detection and manufacturing respectively.

This paper is organized as follows. Section 2 outlines the business questions that are important in
analyzing TA processes. Section 3 describes a real-life TA dataset that we have used as a case-
study in this paper. Sections 4 to 6 discuss analytics for answering specific TA related business
questions outlined in Section 2. Section 7 presents conclusions and future work.

2 ANALYTICS FOR TALENT ACQUISITION


Given comprehensive data about the various stages in the TA processes, the TA function is analyzed
by means of a basic dashboard consisting of important key performance indicator (KPI) measures,
tracking their evolution over time. The KPI measures relate to: candidate arrivals, job
requirement arrivals, time to select, time to join, hit ratio (no. of offers/no. of interviewees), join
ratio (no. of joinees/no. of offers), costs of recruitment, channel-wise utility and cost, recruiter
efficiency, select/reject statistics, campus statistics, competitor statistics etc. Users apply slice-and-dice
facilities to create the dashboard for any subset of the TA data (e.g., location, ISU,
account, candidate skills/experience/role/domain).
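The hit and join ratios above can be computed directly from per-candidate records. A minimal sketch, with illustrative field names that are not from the paper's schema:

```python
# Sketch: computing two TA KPIs from per-candidate records.
# The boolean field names ("interviewed", "offered", "joined") are illustrative.

def ta_kpis(candidates):
    """Hit ratio (offers/interviewees) and join ratio (joinees/offers)."""
    interviewed = sum(1 for c in candidates if c["interviewed"])
    offered = sum(1 for c in candidates if c["offered"])
    joined = sum(1 for c in candidates if c["joined"])
    hit_ratio = offered / interviewed if interviewed else 0.0
    join_ratio = joined / offered if offered else 0.0
    return hit_ratio, join_ratio

records = [
    {"interviewed": True, "offered": True, "joined": True},
    {"interviewed": True, "offered": True, "joined": False},   # declined
    {"interviewed": True, "offered": False, "joined": False},  # rejected
    {"interviewed": True, "offered": False, "joined": False},
]
hit, join = ta_kpis(records)
print(hit, join)  # 0.5 0.5
```

Slice-and-dice then amounts to computing these ratios over filtered subsets (e.g., one location or skill).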

In this paper, we focus on the domain-driven data-mining of TA data, where the goal is to answer
a specific business question related to cost, efficiency and quality related issues in TA business
processes. Some examples are given below. In this paper, we outline analytics-based approaches
to answer some of these questions.
1. What are the most difficult (in terms of cost or time) job requirements to fulfill?
2. What is the most typical (frequent) candidate profile selected (or rejected) for a particular job
requirement?
3. What are the “green flags” for a given job requirement? For example, people with Sun Java
Certification and (Tier 1 college education or Tier 1 company experience) may have a much
higher chance of getting selected than other candidates for a particular job requirement. A
similar question can be asked about “red flags” for a given job requirement.
4. What are the major differences between selected and rejected candidates for a particular
(given) job requirement?
5. What are the major root causes for rejecting candidates for a given job requirement?
6. What are the major root causes for candidates declining the offer for a given job requirement?
7. What campuses are “good” i.e., provide high quality candidates in large numbers with high hit
ratios?
8. What supply channels are “good” i.e., provide high quality candidates in large numbers with
high hit ratios?
9. What is an optimal sourcing plan (in terms of cost or time to join) for a given set of job
requirements? The sourcing plan should partition the job requirements among various supply
channels.
10. What are the major cost heads for TA? How can the cost of recruitment be brought down by,
say, 5%?
11. What are the bottlenecks (in terms of time or quality) in the TA process  for sourcing? For
short-listing? For selection? For offers? For joining?
12. Can we predict whether a selected person will join or not?
13. Can we predict whether a candidate called for interview will be selected or not?

3 A REAL-LIFE DATASET
In this paper we present a real-life case-study using a real TA dataset. The dataset covers a 1-year
period (Jan. to Dec. 2010) in a BPO organization and contains 26574 records divided into two
tables. The SELECTED table consisted of 12185 candidates who were selected, out of which 1148
(9.4%) declined and 11037 (90.6%) actually joined. The REJECTED table consisted of 14385
candidates who were interviewed but rejected during the recruitment process. The two tables have
some common attributes (columns) (Table 1). The given data had some quality issues, which we
cleaned using some pre-processing steps.
Table 1. Some columns in SELECTED and REJECTED.
Column Name Example Values / Description
CID Candidate ID
GENDER M, F
AGE
TYPE frontline, support, trainee, leadership
DEGREE BA, BCom, BSc, MBA, BPharm, BE, BCA
SPECIALIZATION Electronics, Physics, Marketing, English
INDUSTRY market research, banking, pharma, IT
CLIENT SuperValue, Lufthansa, Honeywell, ABB
GRADE 1, 2, …, 8
TOTAL_EXP Total experience in years
CURRENT_SALARY
LAST_ORG Last employer
INTERVIEW_DATE
*OFFER_DATE
*JOINING_DATE
*DECLINE_DATE
+REJECT_REASON process,subject,communication,attitude
*: columns only in SELECTED; +: columns only in REJECTED; unmarked columns are present in both SELECTED and REJECTED.

4 ANALYSIS OF DURATIONS AND DELAYS


As the TA function plays the critical role of supplying high-quality human resources to fulfill
business requirements from ongoing as well as future projects, it is crucial to provide predictability
and control to the TA business processes in terms of efforts, time durations, delays and cost. Any
delay or uncertainty in converting a candidate to Full Time Employee (FTE) affects productivity,
business growth and project delivery quality. In this section, we focus on the problem of
discovering bottlenecks in the TA business process.

4.1 Joining Interval


Joining interval is defined as the number of days between the date at which a selected candidate
was made a job offer and the date on which that person actually joins. Joining interval is an
important parameter for assessing the efficiency of TA processes. Other intervals can be similarly
defined and analyzed (e.g., interview interval = date of interview – date of raising the resource
requirement). Joining interval depends on the personal situation of each candidate which can be
characterized in terms of many factors, such as experience (more experienced candidates usually
take longer to join) and the notice period in the current organization. Summary statistics for the
joining intervals (in days) of a subset of 3489 selected candidates are: Min:0 Max:253
Average:15.0 STDEV:17.65 Q1:2 Q2:8 Q3:22. Fig. 2(a) shows a histogram of the joining intervals.
Fig. 2(b) shows a histogram of the joining intervals for various subsets of candidates corresponding
to different values of the INDUSTRY attribute.

Figure 2. (a) Histogram of joining intervals (b) Joining intervals across values of INDUSTRY attribute.

We built a regression model for joining interval as the dependent variable. The regressor variables
were NOTICE_PERIOD and GRADE. We selected a subset of 767 selected and joined candidates to build
the model. The fitted model’s regression coefficients are:
β̂0 = 4.22787, β̂1 = 0.709673, β̂2 = 4.584716
For this model, R2 = 0.35, which is not very high. The global F-test indicates that at least one of
these regressors is significant and removing either one of them significantly reduces the fit (R2).
Adding other variables (e.g., AGE, TOTAL_EXP, CURRENT_SALARY etc.) does not improve fit of the
regression model (R2) and in fact, partial F-test indicates that the added variables are not significant
in the presence of the above two regressors.
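The regression step above can be sketched as follows; the model form (intercept plus NOTICE_PERIOD and GRADE) follows the text, but the data here is synthetic, standing in for the paper's 767 candidates:

```python
import numpy as np

# Sketch: fitting joining_interval ~ notice_period + grade by ordinary least
# squares. The coefficients used to generate the synthetic data roughly mirror
# the fitted values reported in the text; all data here is illustrative.
rng = np.random.default_rng(0)
n = 200
notice_period = rng.integers(0, 91, n)   # days of notice in current job
grade = rng.integers(1, 9, n)            # grade 1..8
noise = rng.normal(0, 10, n)
joining_interval = 4.2 + 0.7 * notice_period + 4.6 * grade + noise

# Design matrix with an intercept column; solve for beta_hat.
X = np.column_stack([np.ones(n), notice_period, grade])
beta, *_ = np.linalg.lstsq(X, joining_interval, rcond=None)

# Coefficient of determination R^2.
resid = joining_interval - X @ beta
r2 = 1 - resid.var() / joining_interval.var()
print(beta.round(2), round(r2, 2))
```

On real TA data the fit is weaker (R² = 0.35 in the text), since many personal factors behind joining intervals are not captured by these two attributes.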

The next important question is whether there are any “patterns” among the joining intervals. In
particular, are there any (sufficiently large) subsets of selected candidates (who share some
characteristics) who have suffered “unusually high” joining intervals? We use a data-mining
technique called subgroup discovery to identify such “interesting” subgroups. A subgroup is
characterized as a selector constructed as a conjunction (AND) of attribute-value pairs. Such a
subgroup is interesting if its joining interval values are “significantly higher” as compared to the
rest of the selected candidates, as determined by Student’s t-test. Our subgroup discovery
algorithm [11] systematically explores (using the beam search technique) the space of all possible
selectors and reports those which are “interesting”. Table 2 shows several such interesting subsets
which have unusually high joining intervals.

Table 2. Subgroups with unusually high joining intervals.


Subgroup (unusually high joining intervals) Size % of subgroup in top quartile of joining interval
INDUSTRY=POWER and GRADE=SCAT 73 97.26
SAL_RANGE='>25' 33 81.82
SPECIALIZATION=Accounts and EXP_BAND=Fresher 70 80.00
GRADE=BPO.6 106 58.49
DEGREE=MSc and GRADE=BPO.3 104 65.38

The use of systematic statistically rigorous subgroup discovery method facilitates deep exploration
of subgroups (e.g., using say 5 or more attributes), which is impossible to do manually due to the
exponentially large number of possibilities. Similar analysis for other intervals (and delays) also
discovers significant sized sub-groups with very high values. Such insights can help the user to
identify likely sources of long intervals and delays and plan strategies to deal with them
accordingly during recruitment. Efforts (e.g., root cause analysis and improvement plans) to reduce
occurrences of long intervals and high delays can now be focused on specific subgroups, which
act as bottlenecks in the TA process.
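One level of such a subgroup search can be sketched as follows. The paper's algorithm [11] beam-searches conjunctions of attribute-value pairs; this illustration only scores single-attribute selectors, using Welch's t statistic on toy data:

```python
import numpy as np

def welch_t(a, b):
    """Welch's t statistic for testing mean(a) > mean(b)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    return (a.mean() - b.mean()) / se

def subgroups(records, target, attrs, min_size=5, min_t=2.0):
    """Report single-attribute selectors whose target values are unusually
    high versus the rest. A one-level sketch of the search in [11]."""
    found = []
    for attr in attrs:
        for value in {r[attr] for r in records}:
            inside = [r[target] for r in records if r[attr] == value]
            outside = [r[target] for r in records if r[attr] != value]
            if len(inside) >= min_size and len(outside) >= min_size:
                t = welch_t(inside, outside)
                if t >= min_t:
                    found.append((f"{attr}={value}", len(inside), round(t, 2)))
    return sorted(found, key=lambda s: -s[2])

# Toy data: candidates in one industry take systematically longer to join.
rng = np.random.default_rng(1)
data = [{"INDUSTRY": ind,
         "interval": float(rng.normal(40 if ind == "POWER" else 12, 5))}
        for ind in ["POWER"] * 20 + ["IT"] * 40 + ["BANKING"] * 40]
print(subgroups(data, "interval", ["INDUSTRY"]))
```

The full algorithm extends each surviving selector with further conjuncts, keeping only the best candidates at each level (beam search), which is what makes exploration with 5 or more attributes tractable.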

4.2 Delays
Other than various intervals discussed earlier, the TA process can also be measured against various
delays that may occur. A delay happens when an event does not take place by the expected date.
An important delay (from the perspective of TA process efficiency) is the joining delay, defined
as the difference between the actual day of joining and the expected date of joining (as agreed by
the selected candidate). Note that the joining delay value can be negative, if a candidate joins
before the expected date. We are mainly interested in characterizing positive joining delays, when
the candidate joins later than the agreed date. Other kinds of delays that happen in other stages of
the TA business processes (e.g., for interview, offer roll-out, medical examination) can be similarly
defined and analyzed.

We selected a subset of 3498 selected candidates, out of which 192 (5.5%) had delayed their
joining (i.e., they joined after the expected joining date) and the remaining 3306 (94.5%) were not
delayed (i.e., they joined on or before the expected date). Summary statistics for the joining delays
(in days) of these candidates are: Min:−31 Max:98 Average:0.75 STDEV:5.6 Q1:0 Q2:0 Q3:0.
Clearly, most people join before or on the expected date of joining. Fig. 3(a) shows a histogram of
the joining delays. Figure 3(b) shows the variation of joining delays for different values of
INDUSTRY attribute.

Figure 3. (a) Histogram of joining delays (b) Joining delay across values of INDUSTRY attribute.

As a first analysis, we built a multiple regression model for predicting joining delays, with AGE,
CURRENT_SALARY, TOTAL_EXP, GRADE and INDUSTRY as regressors. The R2-value was extremely

low (0.04), and the partial F-tests indicated that regression coefficients for none of the regressors
were significant. Since regression models were poor, we tried classification-based predictive
models. We discretized the joining delay values into a binary class label: DELAYED (if joining
delay > 0) and NOT_DELAYED (otherwise). We then used the well-known WEKA tool
(http://www.cs.waikato.ac.nz/ml/weka/) to build predictive models for the joining-delay class
label, using standard classification techniques such as Decision Tree, Support Vector Machines
(SVM) and Naïve Bayes. Table 3 shows the results of 5-fold cross-validation (target class =
DELAYED); we do not report results for Decision Tree, whose accuracy was poor. Note the class
imbalance (very few examples of class DELAYED). The overall prediction accuracy is good, but
we are really interested in the accuracy for the target class DELAYED (Table 3), which is quite
poor, indicating that the data does not capture many of the attributes that truly determine why
people delay their joining date.

Table 3. Accuracy of predictive models for joining (for the target class DELAYED).
Classifier Precision Recall F-measure
Naïve-Bayes 0.279 0.328 0.301
SVM (RBF) with Normalized attributes, weight = 0.1, 1.0 0.277 0.677 0.393
Random Forest 0.244 0.161 0.194
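The target-class metrics in Table 3 can be computed as follows. This is a self-contained sketch with made-up labels, showing how decent overall accuracy can coexist with poor minority-class precision and recall:

```python
# Sketch: precision, recall and F-measure for the minority target class
# (DELAYED), computed from actual and predicted labels. Labels are made up.

def prf(actual, predicted, target="DELAYED"):
    tp = sum(a == target and p == target for a, p in zip(actual, predicted))
    fp = sum(a != target and p == target for a, p in zip(actual, predicted))
    fn = sum(a == target and p != target for a, p in zip(actual, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f

# 10 candidates, 3 actually delayed. Overall accuracy is 70%, yet the
# classifier recovers only one of the three DELAYED cases.
actual    = ["NOT"] * 7 + ["DELAYED"] * 3
predicted = ["NOT"] * 6 + ["DELAYED", "NOT", "NOT", "DELAYED"]
print(prf(actual, predicted))  # precision 0.5, recall ~0.33, F ~0.4
```

This is why Table 3 reports per-class precision/recall/F rather than overall accuracy.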

The next important question is whether there are any “patterns” among the delays. In particular,
are there any (sufficiently large) subsets of selected candidates (who share some characteristics)
who have suffered “unusually high” joining delays? Our subgroup discovery algorithm [11]
discovered various such interesting subsets (Table 4).

Table 4. Subgroups with unusually high joining delays.


Subgroup (unusually high joining delays) Size % with high joining delay
RESOURCE_TYPE=Frontline and GRADE=BA and INDUSTRY=HRO 186 33.87
RESOURCE_TYPE=Frontline and GRADE=BA and EXP_BAND=3-5 years 221 23.8
RESOURCE_TYPE=Frontline and DEGREE=BA and Grade=BA 187 21.39
RESOURCE_TYPE=Frontline and DEGREE=BA and Grade=BCom 311 20.26

5 ANALYSIS OF TA EFFICIENCY
Selection ratio and join ratio are two important KPI parameters to assess the efficiency of the TA
business process (apart from others). Both parameters directly affect total cost of TA: if the former
ratio is very low, TA has to call more candidates for interview than required and if the latter is
very low then all the steps in TA process have to be repeated. In addition to higher cost of
acquisition, low join ratio may result in direct loss of revenue because of placement delays in
important client assignments.

Considering the importance of selection ratio and join ratio, two natural questions are:
1) Can we predict, by looking at the historical data, what kinds of candidates are likely to be
selected?
2) Can we predict, by looking at the historical data, what kinds of candidates are likely to decline?

For example; knowing a candidate’s qualification, experience, current salary, salary offered, and
grade offered, can we predict if the selected candidate is likely to accept or decline the offer? Even
if we cannot make this prediction with 100% confidence, can we at least compute the probability
of how likely the candidate is going to decline the offer? Similar predictions are needed for
candidates who are likely to be selected.

5.1 Predictive Models


The dataset used for modeling SELECT/REJECT predictions is a modified version of the one
described earlier. Records containing “many” missing values were discarded, resulting in a set of
20563 candidates, out of which 7649 were selected and 12914 were rejected. We have also used
more attributes than shown in Table 1, such as:
1. HIGHEST_QLFN: This has 3 possible values, ‘Under Graduate’, ‘Graduate’ and ‘Post
Graduate’ determined using the attribute DEGREE.
2. STREAM: This is also based on the attribute DEGREE and has values such as ‘Commerce’,
‘Science’, ‘Engineering’, ‘Pharmacy’.

For ACCEPT/DECLINE, we use following derived attributes:


1. SAL_ASPER_GRADE: This attribute has one of three values ‘Average’, ‘Below Average’ and
‘Above Average’. It is calculated by using the attributes Grade and Salary. For each grade,
salary average (µ) and standard deviation (σ) are calculated. Then for any grade the salaries
within [µ - 0.25 * σ, µ + 0.25 * σ] are marked as ‘Average’ and salaries above and below are
marked as ‘Above Average’ and ‘Below Average’ respectively.
2. LOCATION_CONFLICT: This is a binary attribute indicating whether candidate location is same
as the joining location offered.
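The SAL_ASPER_GRADE derivation described above can be sketched as follows (field names are illustrative):

```python
from collections import defaultdict
from statistics import mean, pstdev

# Sketch of the SAL_ASPER_GRADE derivation: per grade, salaries within
# [mu - 0.25*sigma, mu + 0.25*sigma] are 'Average'; the rest are
# 'Above Average' or 'Below Average'. Record field names are illustrative.

def salary_bands(records):
    by_grade = defaultdict(list)
    for r in records:
        by_grade[r["grade"]].append(r["salary"])
    stats = {g: (mean(s), pstdev(s)) for g, s in by_grade.items()}
    bands = []
    for r in records:
        mu, sigma = stats[r["grade"]]
        if r["salary"] > mu + 0.25 * sigma:
            bands.append("Above Average")
        elif r["salary"] < mu - 0.25 * sigma:
            bands.append("Below Average")
        else:
            bands.append("Average")
    return bands

recs = [{"grade": 1, "salary": s} for s in (100, 200, 300)]
print(salary_bands(recs))  # ['Below Average', 'Average', 'Above Average']
```

Normalizing salary within grade this way lets a single attribute capture "paid above/below peers", which is more informative for the DECLINE decision than the raw salary figure.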

To illustrate the predictive models for ACCEPT/DECLINE and SELECT/REJECT, we use a


subset of candidates selected in the first and second quarters for one particular business unit.
This subset contains 1201 selected candidates, out of which 272 declined the offer (the remaining
929 accepted the offer and joined). We used WEKA to build predictive models with standard
classification techniques such as Decision Tree, Support Vector Machines (SVM) and Naïve
Bayes. Table 5 shows the results of 5-fold cross-validation on this subset. In this methodology, we
randomly split the data into 5 parts, train the classifier on 4 of them and test its predictions on the
remaining part, rotating over all 5 folds. The prediction accuracies are not very high, indicating
that the data does not capture all the factors that affect a candidate's DECLINE decision.
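The 5-fold split itself can be sketched as follows (a minimal illustration, independent of WEKA):

```python
import random

# Sketch of 5-fold cross-validation index generation: shuffle the record
# indices, split into 5 parts, and yield (train, test) index pairs so that
# each record is tested exactly once.

def five_fold_indices(n, seed=0):
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::5] for i in range(5)]
    for k in range(5):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test

splits = list(five_fold_indices(1201))
# Each of the 1201 candidates appears in exactly one test fold.
assert sorted(i for _, test in splits for i in test) == list(range(1201))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 5 960 241
```

Averaging the per-fold precision/recall then gives the cross-validated figures of the kind reported in Table 5.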

Table 5. Accuracy of predictive models.


Prediction Problem Classifier Precision Recall F-measure
Decision Tree 0.731 0.640 0.682
Naïve-Bayes 0.391 0.658 0.490
ACCEPT / DECLINE
SVM (RBF) 0.474 0.507 0.490
Random Forest 0.681 0.511 0.584
Decision Tree 0.774 0.580 0.663
Naïve-Bayes 0.703 0.556 0.621
SELECT / REJECT
SVM (RBF) 0.763 0.517 0.616
Random Forest 0.738 0.647 0.690

Among the four models used for predicting ACCEPT/DECLINE, the decision tree outperforms
the other classifiers: it achieves about 64% recall and 73% precision on the target class. An
additional advantage of decision trees is that the decision rules can be represented graphically and
hence are easy for end-users to understand. Some examples of decision rules extracted from the
discovered decision trees are shown in Figure 4. The Random Forest model is better for
SELECT/REJECT prediction.
IF SOURCE = Buddy
| AND STREAM = COMPUTER/IT
| | AND EXP_BAND = 3-5 years
| | | AND TYPE = Frontline
| | | | AND AGE <= 28
THEN class = SELECT (with 90% confidence)
IF DIVISION = 1.1
| AND HIRING_QTR = Q1
| | AND INDUSTRY = IT
| | | AND SAL_ASPER_GRADE = Above Average
| | | | AND GENDER = M
| | | | | AND HIGHEST_QLFN = Under Graduate
| | | | | | AND TCS_SAL <= 281129
THEN class = DECLINE (with 70% confidence)
Figure 4. Some examples of discovered classification rules.
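The first rule in Figure 4 reads directly as an executable predicate. A sketch, applied to a hypothetical candidate record:

```python
# The first rule in Figure 4 as an executable predicate: when all conditions
# hold, the tree predicts SELECT with 90% confidence. Attribute names follow
# the figure; the candidate record below is hypothetical.

def rule_select(c):
    return (c.get("SOURCE") == "Buddy"
            and c.get("STREAM") == "COMPUTER/IT"
            and c.get("EXP_BAND") == "3-5 years"
            and c.get("TYPE") == "Frontline"
            and c.get("AGE", 0) <= 28)

candidate = {"SOURCE": "Buddy", "STREAM": "COMPUTER/IT",
             "EXP_BAND": "3-5 years", "TYPE": "Frontline", "AGE": 26}
print(rule_select(candidate))  # True
```

This transparency is exactly the advantage noted above: an HR end-user can read each rule as a checklist, without understanding the underlying learner.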

5.2 Subgroup Discovery


We applied our subgroup discovery algorithm [11] to discover logically-related subgroups
(subsets) of candidates with unusually low accept ratio (i.e., high decline ratio) or unusually high
reject ratio (i.e., low select ratio). In our dataset 272/1201 (22.65%) candidates declined the offer.
Similarly, the average value of the selection ratio is 7649/20563 = 37.2%. Table 6 and Table 7
show some interesting subgroups discovered in these two datasets. Subgroups with unusually low
selection ratios and unusually high decline ratios (as compared to the global averages) indicate
bottleneck areas of inefficiency and can form a focus for TA business process improvement plans.
For example, some specializations clearly show a much lower percentage of people being selected;
the reasons for this need to be investigated.
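Whether a single subgroup's decline ratio is "unusually high" relative to the global ratio can be checked with a one-sample z-test on proportions. A sketch (the counts mirror the first row of Table 7 and the global 272/1201 ratio; the z-test here is an illustration and not necessarily the interestingness measure used in [11]):

```python
from math import sqrt

# Sketch: z statistic for a subgroup's decline ratio versus the global ratio.
# Large positive z flags the subgroup as having an unusually high ratio.

def decline_z(subgroup_declines, subgroup_size, global_ratio):
    p_hat = subgroup_declines / subgroup_size
    se = sqrt(global_ratio * (1 - global_ratio) / subgroup_size)
    return (p_hat - global_ratio) / se

g = 272 / 1201                # global decline ratio from the dataset
z = decline_z(51, 100, g)     # e.g., INDUSTRY=INSURANCE: 100 candidates, 51% declined
print(round(z, 2))
```

A subgroup of 100 with half its candidates declining sits many standard errors above the 22.65% baseline, so it clearly warrants investigation.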

Table 6. Subgroups with unusually high or low SELECT ratios.


Subgroup (unusually high or low SELECTED percentages) Size %SELECTED
SPECIALIZATION=marketing and EXP_BAND=1-3 years 76.37
SPECIALIZATION=computer science and STREAM=SCIENCE 71.01
SPECIALIZATION=economics and STREAM=ARTS 58.61
SPECIALIZATION=finance and STREAM=BUSINESS ADMINISTRATION 58.06
EXP_BAND=1-3 years and SPECIALIZATION=bio-technology 0.00
STREAM=MANAGEMENT and SPECIALIZATION=management studies 3.45
SPECIALIZATION=pharmacy and STREAM=MEDICAL/PHARMACY 4.96
EXP_BAND=Fresher and SPECIALIZATION=science and STREAM=SCIENCE 11.6
Table 7. Subgroups with unusually low JOIN ratios.
Subgroup (unusually low JOINED percentages) Size %DECLINED
INDUSTRY = INSURANCE 100 51.0
INDUSTRY = IT and GRADE=BPO.1 248 30.65
GRADE=BPO.1 and TCS_SAL=1-2 Lacs 230 30.43
DEGREE = BA 74 29.73

6 ROOT CAUSE ANALYSIS FOR DECLINE


We next consider the situation where a selected candidate declines the offer and does not join. It
is important to know the real reasons (root causes) for such decline decisions. Many candidates do
not give a clear reason, though some do. There is a set of fixed well-understood reasons why
selected candidate decline the offer, which are related to salary, designation/grade, location, role,
shift duties, better offer from a competitor, document verification issues, personal issues etc. It is
important to use the historical data to build a model to estimate the (missing or hidden) root cause
for the decline decision of a particular candidate who has not stated any specific reason. Since
decline decision is a costly one with severe business impact, such estimates (along with the
ACCPT/DECLINE predictive model) can then be used to improve the TA business process; e.g.,
to negotiate another offer or to preempt the decline decision by improving the offer to eliminate
the likely root-cause. Table 8 shows the major root causes for DECLINE.

Table 8. Major reasons why selected candidates decline to join.


Decline reason #candidates
Duplicate profile submitted 43
Issues with work type/location/shift/notice period 14
Joining competitor 37
Not interested 79
Other candidate related docs or issues 35
Personal 34
Salary issue 23

We build the DECLINE root cause estimation model in the form of a Bayesian Network (BN) [16].
We adopted a 2-step approach for root-cause analysis of DECLINE events.
1. Given a historical training dataset of past DECLINE decisions (along with the known root
cause for each, as given by the attribute DECLINE_REASON), a BN discovery algorithm [3]
(https://dslpitt.org/genie/) is used to automatically identify a BN from the data. During this
step, dependencies between data attributes are learned, along with their conditional probability
tables.
2. The discovered BN is used to estimate (using an inference algorithm) the likely root cause for
any candidate who has declined, by using his/her profile as evidence. Essentially, given
candidate profile and DECLINE = yes, the only unknown RV is DECLINE_REASON, for
which the most likely value is estimated using an inference algorithm.
Figure 5 shows the discovered Bayesian Net and an example of inference of the likely decline
reason for a particular candidate.

Figure 5. Discovered Bayesian Net and an example of inference of the likely decline reason for a candidate.
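Step 2 above can be sketched in simplified form. A real BN (e.g., one learned in GeNIe) encodes arbitrary learned dependencies; this illustration instead assumes the profile attributes are conditionally independent given the reason (a naive-Bayes shortcut) and uses made-up probability tables:

```python
# Simplified sketch of decline-reason inference: with DECLINE = yes fixed,
# pick the most likely DECLINE_REASON given the candidate's profile as
# evidence. Priors and conditional probabilities below are made up.

PRIOR = {"Salary issue": 0.2, "Joining competitor": 0.3, "Personal": 0.5}
# P(attribute value | reason); unlisted combinations default to 0.05.
CPT = {
    ("SAL_ASPER_GRADE", "Below Average"): {"Salary issue": 0.7,
                                           "Joining competitor": 0.4,
                                           "Personal": 0.2},
    ("LOCATION_CONFLICT", True): {"Salary issue": 0.1,
                                  "Joining competitor": 0.3,
                                  "Personal": 0.6},
}

def likely_reason(evidence):
    scores = {}
    for reason, prior in PRIOR.items():
        p = prior
        for key, value in evidence.items():
            p *= CPT.get((key, value), {}).get(reason, 0.05)
        scores[reason] = p          # unnormalized posterior P(reason | evidence)
    return max(scores, key=scores.get)

print(likely_reason({"SAL_ASPER_GRADE": "Below Average",
                     "LOCATION_CONFLICT": True}))
```

In the full BN the argmax is computed by a proper inference algorithm over the learned network structure, but the estimate-the-hidden-reason idea is the same.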

7 CONCLUSIONS AND FURTHER WORK


In this paper, we presented a domain-driven analytics-based system for answering specific
business questions pertaining to cost, efficiency and quality issues in TA business processes. In
particular, we focused on predictive models for candidate selection and offer acceptance by
selected candidates, root cause analysis for offer decline, as well as several innovative as-is state
descriptions (e.g., candidate subgroups having unusually high decline ratios). We illustrated the
approaches through a real-life dataset. We are working on further improving the accuracies of the
techniques presented. We are also working on devising analytics to answer more “business
problems” related to the TA processes.

Acknowledgements. We thank Dr. Harrick Vin, Dr. Ritu Anand, and various people from TCS
HR Department for their extensive help during the course of this work.
REFERENCES
[1] Bondarouk, T., Ruël, H., Guiderdoni-Jourdain, K. and Oiry, E. Handbook of Research on E-Transformation and
Human Resources Management Technologies—Organizational Outcomes and Challenges. IGI Global, 2009.
[2] Budhwar P.S., Varma A., Singh V. and Dhar R. HRM systems of Indian call centres: an exploratory study, Int.
Journal of Human Resource Management, 17(5), 2006.
[3] Cooper, G.F. and Herskovits E. A Bayesian method for the induction of probabilistic networks from data.
Machine Learning, 9, pp. 309--347, 1992.
[4] Curtis, W., Hefley, W.E. and Miller, S.A. The People CMM: A Framework for Human Capital Management.
2/e, Software Engineering Institute, 2009.
[5] Connors, D. and Mojsilovic, A. Workforce Analytics for the Enterprise: An IBM Approach. Chapter in Service
Science Handbook, Maglio, P.P., Kieliszewski, C.A. and Spohrer J.C. (ed.s), Springer, 2010.
[6] Harding, J.A., Shahbaz, M., Srinivas and Kusiak, A. Data Mining in Manufacturing: A Review. Journal of
Manufacturing Science and Engineering, 128, pp. 969–976, 2006.
[7] Hu, J., Lu, Y., Mojsilovic, A., Singh, M. and Squillante M. Next generation workforce management analytics
for the globally integrated enterprise. Proc. Institute for Operations Research and the Management Sciences
(INFORMS) Annual Meeting, Washington, DC, October 2008.
[8] Hülsheger, U.R., Maier, G.W. and Stumpp, T. Validity of general mental ability for the prediction of job
performance and training success in Germany: a meta-analysis. Int. Journal of Selection and Assessment, 15: 1,
3–18, 2007.
[9] Lu, Y., Radovanovic, A. and Squillante, M. Workforce Management in Service via Stochastic Network Models.
Proc. 2006 IEEE/INFORMS Int. Conf. on Service Operations, Logistics and Informatics (SOLI 2006), 2006.
[10] Murray, M., Young, J., 2008. Decision Model for Contracting Helpdesk Services. Journal of Service Science,
1(1), (www.cluteinstitute-onlinejournals.com ).
[11] Natu M. and Palshikar G.K. Interesting Subset Discovery and its Application on Service Processes, Workshop
on Data Mining for Services (DMS 2010), Int. Conference on Data Mining (ICDM 2010), Australia, 2010.
[12] Naveh, Y., Richter, Y., Altshuler, Y., Gresh, D. L. and Connors, D. P. Workforce optimization: identification
and assignment of professional workers using constraint programming. IBM Journal of Research and
Development, 51, 2007.
[13] Palshikar, G.K., Deshpande, S., Bhat, S. QUEST: Discovering Insights from Survey Responses. Proc. 8th
Australasian Data Mining Conf. (AusDM09), Dec. 1-4, 2009, Melbourne, Australia, P.J. Kennedy, K.-L. Ong,
P. Christen (Ed.s), CRPIT, vol. 101, published by Australian Computer Society, pp. 83 - 92, 2009.
[14] Palshikar, G.K., Vin H.M., Vijaya Saradhi V. and Mudassar M. Discovering Experts, Experienced Persons and
Specialists for IT Infrastructure Support, Service Science, Vol. 3, No. 1, pp. 1 - 21, Spring 2011.
[15] Patterson, B. Mining the gold: gain competitive advantage through HR data analysis. HR Magazine, 2003.
[16] Pearl, J. Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann,
1988.
[17] Phua C., Lee, V., Smith-Miles, K. and Gayler, R. A Comprehensive Survey of Data Mining-based Fraud
Detection Research. Artificial Intelligence Review, 2005.
[18] Richter, Y., Naveh, Y., Gresh, D.L. and Connors, D.P. Optimatch: Applying Constraint Programming to
Workforce Management of Highly-skilled Employees. Proc. 2007 IEEE/INFORMS Int. Conf. on Service
Operations, Logistics and Informatics (SOLI 2007), 2007.
[19] Rousseau, D.M. and Barends, E.G.R. Becoming an evidence-based HR practitioner. Human Resource
Management Journal, 21(3), 221–235, 2011.
[20] Strohmeier, S. Research in e-HRM: review and implications. Human Resource Management Review, 17(1), pp.
19-37, 2007.
[21] Van De Voorde, K., Paauwe, J., Van Veldhoven, M. Predicting business unit performance using employee
surveys: monitoring HRM-related changes. Human Resource Management Journal, 20(1), pp. 44–63, 2010.
[22] Yu, P. S. (ed.). Proc. 2007 Int. Workshop on Domain driven Data Mining. ACM Press, 2007.
[23] Yu, P. S. (ed.). Proc. 2008 Int. Workshop on Domain driven Data Mining. ACM Press, 2008.
