
Inaugural lecture by Professor Rasmus Pagh

UNDERSTANDING SOFTWARE, AND SOFTWARE FOR UNDERSTANDING (BIG DATA)


Even for highly skilled programmers, it is difficult to understand software behaviour. Bugs and performance problems are major challenges for software practitioners and researchers. A particular challenge is understanding software whose path of execution relies on random choices, so-called randomized algorithms. Yet randomized algorithms are essential in modern software, especially when dealing with large data sets. The device you are using to read this text is probably running hundreds of randomized algorithms right now. How can we know that the software will run correctly and smoothly?

While understanding and taming software itself is a challenge, software is also increasingly used to understand the world around us through data analysis. Many speak of an era of big data, where decisions are informed by models and insights obtained through extensive data analysis. Developing big data tools that are robust and scale to very large data sets is an emerging challenge.

In the lecture I will illustrate the above through two randomized algorithms that I have developed:

- Cuckoo hashing (2001) provides a highly efficient way of searching for a particular data item stored in memory. But in contrast to earlier good algorithms for organizing and searching data, it is not a priori clear that it works at all! Investigating cuckoo hashing and its variants has been an active area of research over the last decade, and the method is now in widespread use. (A short code sketch of the idea appears after the abstract.)

- Tensor sketching (2011) allows efficient detection of correlated variables in data, without explicitly considering all possible correlations. Instead, it efficiently computes special histograms whose columns summarize many potential correlations, and these can be used to infer the correlated variables. (A sketch of the core computation also appears below.)

Looking forward, I claim that understanding how software is influenced by randomness is important not only when an algorithm itself uses randomness. In order to deal with imperfect data sources, both developers and users of big data systems will need to embrace statistical concepts, e.g., that collected data itself contains randomness, to meaningfully extract and interpret patterns in data. Big data will require algorithm designers to reconsider the robustness of their algorithms to noise in data.
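To make the cuckoo hashing idea concrete, here is a minimal Python sketch, not Pagh's original implementation: every key has one candidate cell in each of two tables, so a lookup probes at most two cells, and an insertion evicts ("kicks out") current occupants back and forth between the tables until a free cell is found, choosing fresh hash functions and rebuilding if that takes too long. The class name and the use of Python's built-in `hash` with random seeds are illustrative choices.

```python
import random

class CuckooHashTable:
    """Minimal cuckoo hash table sketch: two tables, two hash functions.
    Lookups probe at most two cells; inserts evict occupants cuckoo-style."""

    def __init__(self, capacity=16):
        self.capacity = capacity
        self.tables = [[None] * capacity, [None] * capacity]
        self._new_seeds()

    def _new_seeds(self):
        # Fresh random seeds simulate picking new hash functions.
        self.seeds = (random.getrandbits(64), random.getrandbits(64))

    def _slot(self, key, i):
        return hash((self.seeds[i], key)) % self.capacity

    def lookup(self, key):
        # The hallmark of cuckoo hashing: at most two memory probes.
        for i in (0, 1):
            cell = self.tables[i][self._slot(key, i)]
            if cell is not None and cell[0] == key:
                return cell[1]
        return None

    def insert(self, key, value, max_kicks=32):
        for i in (0, 1):  # update in place if the key is already present
            s = self._slot(key, i)
            if self.tables[i][s] is not None and self.tables[i][s][0] == key:
                self.tables[i][s] = (key, value)
                return
        item, i = (key, value), 0
        for _ in range(max_kicks):
            s = self._slot(item[0], i)
            item, self.tables[i][s] = self.tables[i][s], item  # evict occupant
            if item is None:
                return
            i = 1 - i  # the evicted item goes to its other table
        # Too many evictions, likely a cycle: grow the table,
        # pick new hash functions, and reinsert everything.
        old = [c for t in self.tables for c in t if c is not None] + [item]
        self.capacity *= 2
        self.tables = [[None] * self.capacity, [None] * self.capacity]
        self._new_seeds()
        for k, v in old:
            self.insert(k, v)
```

The constant worst-case lookup cost is guaranteed by construction; the non-obvious part, and the subject of the analysis mentioned above, is why insertions succeed quickly with high probability.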
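For tensor sketching, here is a minimal numpy sketch of the core computation, under the assumption that the "special histograms" above are count sketches: the count sketch of the outer product x ⊗ y, which holds all pairwise products, can be obtained by circularly convolving the count sketches of x and y via the FFT, without ever materializing the outer product. The function names are illustrative.

```python
import numpy as np

def count_sketch(x, size, h, s):
    # Histogram-like summary: coordinate i of x lands in bucket h[i]
    # with a random sign s[i]; collisions cancel out in expectation.
    sketch = np.zeros(size)
    np.add.at(sketch, h, s * x)
    return sketch

def tensor_sketch(x, y, size, rng):
    """Count sketch of the outer product x ⊗ y, computed in
    O(n + size log size) time via FFT-based circular convolution."""
    hx = rng.integers(0, size, x.size)
    hy = rng.integers(0, size, y.size)
    sx = rng.choice((-1.0, 1.0), x.size)
    sy = rng.choice((-1.0, 1.0), y.size)
    cx = count_sketch(x, size, hx, sx)
    cy = count_sketch(y, size, hy, sy)
    # Circular convolution combines the sketches: the result is itself a
    # count sketch of x ⊗ y, with hash (hx + hy) mod size and sign sx * sy.
    return np.real(np.fft.ifft(np.fft.fft(cx) * np.fft.fft(cy)))

# Example: the sketch's squared norm approximates ||x||^2 * ||y||^2,
# i.e. the total weight of all pairwise products.
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
y = rng.standard_normal(1000)
ts = tensor_sketch(x, y, size=4096, rng=rng)
print(np.dot(ts, ts), np.dot(x, x) * np.dot(y, y))
```

Inner products between two such sketches built with the same hash functions likewise approximate inner products between the underlying outer products, which is what lets correlated variable pairs stand out without enumerating all pairs.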

Time and place: Friday, September 13, 2013, 14.15–15.15 (followed by a reception)
Auditorium 4, IT University of Copenhagen, Rued Langgaards Vej 7, 2300 København S

IT-UNIVERSITY OF COPENHAGEN
