
Data Verification Case Study

Managing for Complexity in the Crowd


We delegated to the crowd an assignment that could have been completed easily by any data collection analyst. The problem with assigning this sort of work to an in-house analyst, however, is that the task is generally low value; a business would not want to spend its expert staff's time on it. Yet the job required some amount of cognitive ability and could not be handed off to a machine. This made it a perfect job for the crowd. We took the approach that most data managers would in engaging the crowd, and it didn't go well. Here's what we learned.

The Task
We asked the crowd to validate company websites. We provided a Company Name and a Company Description (in fact, the description of the SIC industry classification that had been assigned to the business) and asked the crowd workers to confirm whether the link we gave them was the correct website for the company. Contributors had to select one of the following answers:

1. Yes, the Name and SIC match the site
2. No, neither the Name nor the SIC match the site
3. Maybe, the Name matches but the SIC does not
4. Bad link

As the answer choices show, we were actually trying to get more than one answer from the workers. In addition to using the SIC code as a means to confirm that the website belonged to the business (matching not only the Company Name but also the Company Description to the website), we also hoped to validate the industry classification previously assigned to the business (answer choice Maybe).

TIP #1

Avoid overloading questions; instead, break complex questions into multiple steps.
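To illustrate TIP #1, here is a minimal sketch of how our overloaded question could have been split into two single-purpose steps, one confirming the website against the Company Name and one confirming the SIC description. The field names and structure are illustrative assumptions, not the actual job definition we used.

```python
# Illustrative only: a hypothetical split of the original overloaded task
# into two single-purpose questions, each with its own answer choices.

overloaded_task = {
    "question": "Is this the correct website for the company, and does the SIC match?",
    "answers": [
        "Yes, the Name and SIC match the site",
        "No, neither the Name nor the SIC match the site",
        "Maybe, the Name matches but the SIC does not",
        "Bad link",
    ],
}

split_tasks = [
    {
        "question": "Does this website belong to the company named below?",
        "answers": ["Yes", "No", "Bad link"],
    },
    {
        "question": "Does the company description (SIC) match what the website offers?",
        "answers": ["Yes", "No", "Unsure"],
        # Step 2 only makes sense once the site itself has been confirmed.
        "show_only_if": "previous answer was Yes",
    },
]
```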

Our instructions explained how to find the Company Name on the website and where workers might easily locate a description of services that could be compared against the Company Description. We distributed the work to U.S.-based crowdworkers and paid $0.04 per unit. The entire file included 391 records, with 13 test questions (or gold records) used to gauge workers' trust level on this task. Contributors were paid for every judgment (answer) provided and, apart from the test questions, which were included in each task and used to determine whether a worker could continue working on the job, the accuracy of the work was not checked until the entire file had been completed.
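The job's economics can be estimated directly from these figures. The quick sketch below assumes roughly three judgments per unit, consistent with the roughly 1,000 judgments K2 anticipated for the 391-unit file (discussed later); the exact redundancy setting is an assumption for illustration.

```python
# Back-of-the-envelope cost estimate for the job described above.
# The ~3 judgments per unit is an assumption based on the roughly
# 1,000 judgments anticipated for the 391-unit file.

units = 391
judgments_per_unit = 3            # assumed redundancy
price_per_judgment = 0.04         # USD, as paid in this job

expected_judgments = units * judgments_per_unit
expected_cost = expected_judgments * price_per_judgment

print(f"~{expected_judgments} judgments, ~${expected_cost:.2f} total")
```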

Findings

TIP #2 If there is a predominant correct answer to your question, select test questions where the answer is not the easiest guess.
Overall, the quality of the work we received was not great; however, this did not surprise us, as we knew there were flaws in the design of our task. We will discuss the mistakes we made and our takeaways in the next section. First, we share our analysis of the results file:

- The same universe of workers completed both the golds and the full file records, and had an average trust level of 89.9%.
- The quality-checked sample from the full file was not randomly selected. We included more units where the correct answer was not likely to be Yes in order to gauge the quality of judgments where the correct answer was No or Maybe, as these judgments had a higher error rate in the gold record results.
- Based on the golds and the sample checked from the full file, the correct answer was Yes more than half of the time.



Gold Records
(Units with known answers were used to test contributor quality as they worked.)

- There were 327 judgments for the 13 gold units (every worker is required to complete one gold record for every 5 units they attempt in the file; therefore, golds will always have a larger judgment-to-unit ratio).
- The overall accuracy on golds was 84%.
- Of the 13 gold questions, Yes was an acceptable answer for 11 of them; therefore, of the 327 judgments, 277 could have had a correct answer of Yes.
- Of the 327 judgments, 299 answered the question with Yes.
- For the two gold records where the correct answer was No, the accuracy of judgments was only 9% and 36%.
- Because of the high number of possible correct Yes answers, the crowd could have answered every unit with Yes (all judgments = Yes) and the accuracy on gold records would not have changed.

[Chart: Gold Judgments & Accuracy]
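The last bullet can be checked with a quick calculation: if every one of the 327 gold judgments had been Yes, the 277 judgments on Yes-acceptable golds would have been counted correct, which works out to roughly the same 84% accuracy we actually observed.

```python
# Quick check of the claim that an all-"Yes" strategy would not have
# changed the measured gold accuracy.

total_gold_judgments = 327
yes_acceptable_judgments = 277   # golds where "Yes" was an acceptable answer
observed_accuracy = 0.84         # reported overall gold accuracy

all_yes_accuracy = yes_acceptable_judgments / total_gold_judgments
print(f"All-Yes strategy accuracy on golds: {all_yes_accuracy:.1%}")   # ~84.7%
print(f"Observed gold accuracy:             {observed_accuracy:.1%}")  # 84.0%
```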

Full File Records

- There were a total of 1,135 judgments in response to the 391 units in the full file (approximately 3:1).
- The accuracy rate of the 33-unit sample checked from the full file (105 judgments) was only 55%.
- The number of responses per unit varied from 2 to 6 (it is unclear what led to more responses for some units, as there does not seem to be a trend in confidence).
- According to the results from the sample checked, the accuracy of the work did not appear to be correlated with the contributors' trust scores. (The trust score refers to an individual worker's accuracy level, based on prior jobs.)
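The varying number of judgments per unit raises the question of how redundant judgments are rolled up into one answer per unit. Below is a minimal sketch of one common approach, a trust-weighted vote that also yields a confidence score; this illustrates the general technique and is not the specific aggregation logic the platform used in this job.

```python
# Minimal sketch of trust-weighted aggregation of redundant judgments.
# Each judgment is (answer, worker_trust); the unit's answer is the choice
# with the highest total trust, and confidence is its share of all trust.

from collections import defaultdict

def aggregate(judgments):
    """judgments: list of (answer, trust) tuples for one unit."""
    weights = defaultdict(float)
    for answer, trust in judgments:
        weights[answer] += trust
    best = max(weights, key=weights.get)
    confidence = weights[best] / sum(weights.values())
    return best, confidence

# Example unit with three judgments from workers of differing trust.
unit_judgments = [("Yes", 0.92), ("Yes", 0.85), ("No", 0.70)]
print(aggregate(unit_judgments))   # ('Yes', ~0.72)
```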

TIP #3 While worker trust scores can be useful, the design of your job is what will determine the level of accuracy.


Takeaways & Recommendations


Crowdsourcing is indeed a powerful alternative to traditional means of collecting and validating data. Even working with an imperfect design, we were amazed by the turnaround we witnessed from the crowd. K2 had estimated that our job of 391 units would require approximately 1,000 judgments. To test the supply of labor, we launched the work to U.S.-based workers only, on a Tuesday at 9:15pm; by 9:30pm, the job was 90% complete, having captured 1,000 judgments in only 15 minutes. Speed and agility are clear benefits offered by the crowd. While we did not test language and specialized skill proficiency in this particular exercise, with examples like Wikipedia (the crowd-sourced encyclopedia) and uTest (crowd-conducted software testing), we are seeing more and more clear cases of individual competencies and collective knowledge being tapped via online collaboration platforms. Harnessing this ability and enabling the right workers to perform the right tasks in the right ways are skills that those designing jobs for the crowd must possess.

TIP #4 Design your tasks bearing in mind that the crowd does not know the context of the task (or your business).

Designing crowd-supported workflows and microtasks successfully requires not only an appreciation for the nuance of the processes being restructured but also a familiarity with the behavior of the crowd. Just as a manager building a new team in a new location would consider the economics and the culture of the newcomers, the same attention must be given to this on-demand, virtual workforce. If there is one golden rule for working with the crowd, it is to avoid the common trap of simply retooling and redeploying existing processes. Rather, crowdsourcing for data management thrives when the work is reimagined, keeping in mind the capacity, the constraints, and, equally important, the motivations of the crowd. We have discussed the capabilities that can be unleashed from the crowd; now we will highlight some of the constraints and motivations that were exemplified in this exercise.

Most data managers commit significant time and resources to ensuring that their production analysts understand the business demands driving their work. The best analysts and data specialists connect the dots and are able to see the big picture. A good analyst looking at our task would have understood that we could extend the job of confirming the company website to also confirm the industry classification that had been previously assigned (by noting, while on the website, the business in which the company is engaged). Workers in the crowd, however, do not have the same context. The task they have been assigned has necessarily been sliced down to a fractional view of the overall scope of work. It is for this reason that microtask designers must frame questions in a way that does not assume any background knowledge or perspective on the part of the crowdworkers that would allow them to add value above and beyond what the task explicitly requests. In-house analysts, and even outsourced teams, are typically motivated by quality and reputation, key elements for continued employment and career growth. The crowd is motivated by shorter-term incentives, typically money.
While quality and reputation matter to the crowd as well, maximizing compensation is the key driver: the more units workers can churn through, the more money they will earn. Understanding this important distinction is critical to maximizing the quality of the output from the crowd.

A contributor's quality score is determined by his or her judgments across the jobs completed historically. It includes answers to gold units that are hidden among the units of work in a job. As the name suggests, a data validation task asks workers to confirm a given value for an attribute. In cases like this, the most likely answer to the question "Is this the correct website?" is Yes. This obviously presents an opportunity for contributors to play the odds, making their way through a large number of units choosing the same answer (Yes) every time while maintaining a relatively high quality score. To protect against this behavior, job designers must include a sufficient number of gold questions where the correct answer is not the predominant result (in this case, we should have included more test questions where the answer was clearly No or Bad Link). This control mechanism allows the job designer to weight a contributor's quality score based on performance on a controlled sample of gold questions where the likelihood of a Yes answer is not proportional to the share of Yes answers across the entire job; a sketch of this idea appears at the end of this section.

Remember in school when you realized that a teacher had a bias toward putting the correct multiple-choice answer as letter "C"? If you didn't know the answer or were running short on time, you would default to "C" and almost guarantee yourself a 50-50 chance of getting the answer right. Same concept. While this may sound like a clever approach, the default answer will be wrong some of the time, reducing the quality of your overall work product.

These observations are not meant to suggest that organizations should not leverage the crowd. In fact, quality notwithstanding, we demonstrated in this case that the crowd truly does scale. And, if a piece of work does not build knowledge or specialized skills for your team, the crowd most likely offers a more attractive cost-benefit proposition. The key is to understand your workers and maximize the value of their contributions by designing jobs that take into account their motivations, behaviors, and constraints.
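As a concrete illustration of the control mechanism described above, here is a minimal sketch of checking a gold set for answer balance and scoring a contributor's trust from gold performance. The threshold and function names are illustrative assumptions, not the platform's actual implementation.

```python
# Illustrative sketch: balance-check a gold set and score a contributor on it.
# The max_share threshold and the scoring are assumptions for illustration only.

from collections import Counter

def gold_set_is_balanced(gold_answers, max_share=0.6):
    """Flag a gold set where one answer (e.g. "Yes") dominates, since a
    contributor could then pass the golds by always giving that answer."""
    counts = Counter(gold_answers)
    most_common_share = counts.most_common(1)[0][1] / len(gold_answers)
    return most_common_share <= max_share

def contributor_trust(gold_results):
    """gold_results: list of (given_answer, correct_answer) for golds seen."""
    correct = sum(1 for given, expected in gold_results if given == expected)
    return correct / len(gold_results)

# The gold set used in this job: 11 of the 13 golds accepted "Yes".
job_golds = ["Yes"] * 11 + ["No"] * 2
print(gold_set_is_balanced(job_golds))   # False: an all-Yes strategy passes

# A contributor who answered "Yes" on every gold still looks trustworthy.
all_yes_contributor = [("Yes", expected) for expected in job_golds]
print(f"{contributor_trust(all_yes_contributor):.0%}")   # ~85%
```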

TIP #5 As with any workforce, understand the behaviors and motivations of the crowd.

2013 K2 Case Studies

www.k2reimagine.com
