Sie sind auf Seite 1von 287

Fridolin Wild

Learning Analytics in R

with SNA, LSA, and MPIA

Fridolin Wild Learning Analytics in R with SNA, LSA, and MPIA

Learning Analytics in R with SNA, LSA, and MPIA

ThiS is a FM Blank Page

Fridolin Wild

Learning Analytics in R with SNA, LSA, and MPIA

Fridolin Wild Learning Analytics in R with SNA, LSA, and MPIA

Fridolin Wild Performance Augmentation Lab Department of Computing and Communication Technologies Oxford Brookes University Oxford UK

The present work has been accepted as a doctoral thesis at the University of Regensburg, Faculty for Languages, Literature, and Cultures under the title “Learning from Meaningful, Purposive Interaction: Representing and Analysing Competence Development with Network Analysis and Natural Language Processing”.

Cover photo by Sonia Bernac (http://www.bernac.org)

ISBN 978-3-319-28789-8 DOI 10.1007/978-3-319-28791-1

ISBN 978-3-319-28791-1 (eBook)

Library of Congress Control Number: 2016935495

© Springer International Publishing Switzerland 2016 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.

Printed on acid-free paper

This Springer imprint is published by Springer Nature The registered company is Springer International Publishing AG Switzerland

To my father Kurt Wild (*1943– { 2010).

ThiS is a FM Blank Page

Acknowledgement

This work would not have been possible without the support of my family, my son Maximilian, my brothers Benjamin, Johannes, and Sebastian, and my parents Brunhilde and Kurt, only one of whom survived its finalisation (Dad, you are missed!). I am indebted to Prof. Dr. Peter Scott, at the time Director of the Knowledge Media Institute of The Open University and Chair for Knowledge Media, now with the University of Technology in Sydney as Assistant Deputy Vice-Chancellor, and Prof. Dr. Christian Wolff, Chair for Media Informatics of the University of Regens- burg (in alphabetical order), who both helped improve this book and saved me from my worst mistakes. Without you and our engaging discussions, this book would simply not have happened. There are many people who contributed to this work, some of them unknow- ingly. I d like to thank the coordinators of the EU research projects that led to this work, also in substitution for the countless researchers and developers with whom I had the pleasure to work in these projects over the years: Wolfgang Nejdl for Prolearn, Barbara Kieslinger for iCamp, Ralf Klamma and Martin Wolpers for ROLE, Peter van Rosmalen and Wolfgang Greller for LTfLL, Peter Scott for STELLAR, Vana Kamtsiou and Lampros Stergioulas for TELmap, and Paul Lefrere, Claudia Guglielmina, Eva Coscia, and Sergio Gusmeroli for TELLME. I d also like to thank Prof. Dr. Gustaf Neumann, Chair for Information Systems and New Media of the Vienna University of Economics and Business (WU), for whom I worked in the years 2005–2009, before I joined Peter Scott s team at the Knowledge Media Institute of The Open University. Special thanks go to Prof. Dr. Kurt Hornik of the Vienna University of Economics and Business, who first got me hooked on R (and always had an open ear for my tricky questions). I d also like to express my gratitude to Dr. Ambjoern Naeve of the KTH Royal Institute of Technology of Sweden for awakening my passion for mathemagics more than a decade ago and believing in me over years that I have it in me.

viii

Acknowledgement

I d like to particularly thank Debra Haley and Katja Buelow for the work in the LTfLL project, particularly the data collection for the experiment further analysed in Sects. 10.3.2 and 10.3.3 . I owe my thanks to my current and former colleagues at the Open University and at the Vienna University of Economics and Business, only some of which I can explicitly mention here: Thomas Ullmann, Giuseppe Scavo, Anna De Liddo, Debra Haley, Kevin Quick, Katja Buelow, Chris Valentine, Lucas Anastasiou, Simon Buckingham Shum, Rebecca Ferguson, Paul Hogan, David Meyer, Ingo Feinerer, Stefan Sobernig, Christina Stahl, Felix Moedritscher, Steinn Sigurdarson, and Robert Koblischke. Not least, I d like to stress that I am glad and proud to be a citizen in the European Union that has brought the inspiring research framework programmes and Horizon 2020 on the way and provided funding to those projects listed above.

Fridolin Wild

The original version of the book was revised: Acknowledgement text “Cover photo by Sonia Bernac ( http://www.bernac.org )” has been included in the copyright page. The Erratum to the book is available at DOI 10.1007/978-3-319-28791-1_12

ThiS is a FM Blank Page

Contents

1 Introduction

1

1.1 The Learning Software Market

 

2

1.2 Unresolved Challenges: Problem Statement

 

3

1.3 The Research Area of Technology-Enhanced Learning

 

5

1.4 Epistemic Roots

 

7

1.5 Algorithmic Roots

 

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

9

1.6 Application Area: Learning Analytics

 

10

1.7 Research Objectives and Organisation of This Book

 

14

References

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

17

2 Learning Theory and Algorithmic Quality Characteristics

 

23

2.1 Learning from Purposive, Meaningful Interaction

 

25

2.2 Introducing Information

 

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

27

2.3 Foundations of Culturalist Information Theory

 

28

2.4 Introducing Competence

 

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

30

2.5 On Competence and Performance Demonstrations

 

32

2.6 Competence Development as Information Purpose

 

of Learning .

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

32

2.7 Performance Collections as Purposive Sets

 

34

2.8 Filtering by Purpose .

 

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

35

2.9 Expertise Clusters

 

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

35

2.10 Assessment by Equivalence

 

36

2.11 About Disambiguation

 

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

37

2.12 A Note on Texts, Sets, and Vector Spaces

 

38

2.13 About Proximity as a Supplement for Equivalence

 

39

2.14 Feature Analysis as Introspection

 

39

2.15 Algorithmic Quality Characteristics

 

40

2.16 A Note on Educational Practice

 

41

2.17 Limitations .

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

41

2.18 Summary

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

42

References

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

43

xii

Contents

3 Representing and Analysing Purposiveness with SNA

 

45

3.1 A Brief History and Standard Use Cases

 

45

3.2 A Foundational Example

 

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

48

3.3 Extended Social Network Analysis Example

 

56

3.4 Limitations .

 

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

68

References

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

69

4 Representing and Analysing Meaning with LSA

 

71

4.1 Mathematical Foundations

 

73

4.2 Analysis Workflow with the R Package lsa

 

77

4.3 Foundational Example

 

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

80

4.4 State of the Art in the Application of LSA for TEL

 

91

4.5 Extended Application Example: Automated Essay Scoring

 

95

4.6 Limitations of Latent Semantic Analysis

 

101

References

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

101

5 Meaningful, Purposive Interaction Analysis

 

107

5.1 Fundamental Matrix Theorem on Orthogonality

 

108

5.2 Solving the Eigenvalue Problem

 

112

5.3 Example with Incidence Matrices

 

114

5.4 Singular Value Decomposition

 

117

5.5 Stretch-Dependent Truncation

 

122

5.6 Updating Using Ex Post Projection

 

125

5.7 Proximity and Identity .

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

126

5.8 A Note on Compositionality: The Difference of Point,

 

Centroid, and Pathway

 

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

128

5.9 Performance Collections and Expertise Clusters

 

129

5.10 Summary

 

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

130

References

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

131

6 Visual Analytics Using Vector Maps as Projection Surfaces

 

133

6.1 Proximity-Driven Link Erosion

 

135

6.2 Planar Projection With Monotonic Convergence

 

136

6.3 Kernel Smoothing

 

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

141

6.4 Spline Tile Colouring With Hypsometric Tints

 

142

6.5 Location, Position, and Pathway Revisited

 

145

6.6 Summary

 

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

146

References

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

147

7 Calibrating for Specific Domains

 

149

7.1 Sampling Model

 

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

150

7.2 Investigation .

 

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

152

7.3 Results .

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

155

7.4 Discussion .

 

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

158

7.5 Summary

 

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

162

References

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

163

Contents

xiii

8 Implementation: The MPIA Package

 

165

8.1 Use Cases for the Analyst .

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

165

8.2 Analysis Workflow

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

167

8.3 Implementation: Classes of the mpia Package

 

170

8.3.1

The DomainManager

 

174

8.3.2

The Domain

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

175

8.3.3 The Visualiser

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

175

8.3.4 The HumanResourceManager

 

176

8.3.5

The Person

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

176

. 8.3.7 The Generic Functions . .

.

8.3.6 The Performance

8.4

References

Summary

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

 

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

177

177

181

181

9 MPIA in Action: Example Learning Analytics

 

183

9.1 Brief Review of the State of the Art in Learning Analytics

 

184

9.2 The Foundational SNA and LSA Examples Revisited

 

185

9.3 Revisiting Automated Essay Scoring: Positioning

 

196

9.4 Learner Trajectories in an Essay Space

 

212

9.5 Summary

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

221

References

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

222

Evaluation

10 .

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

223

10.1 Insights from Earlier Prototypes

 

225

10.2 Verification

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

230

Validation .

10.3 .

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

231

10.3.1 Scoring Accuracy .

 

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

232

10.3.2 Structural Integrity of Spaces

 

234

10.3.3 Annotation Accuracy

 

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

238

10.3.4 Visual (in-)Accuracy

 

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

240

10.3.5 Performance Gains

 

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

241

10.4 Summary and Limitations

242

References

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

243

11 Conclusion and Outlook .

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

247

11.1 Achievement of Research Objectives

 

248

11.1.1 Objective 1: Represent Learning

 

249

11.1.2 Objective 2: Provide Instruments for Analysis

 

250

11.1.3 Objective 3: Re-represent to the User

 

252

11.1.4 Summary .

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

256

11.2 Open Points for Future Research

 

258

11.3 Connections to Other Areas in TEL Research

 

260

11.4 Concluding Remarks .

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

263

References

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

263

Erratum to: Learning Analytics in R with SNA, LSA, and MPIA

 

E1

ThiS is a FM Blank Page

Application examples included

Social Network Analysis

3.2 A foundational example

3.3 Extended social network analysis example

Latent Semantic Analysis

4.3

Foundational latent semantic analysis example

4.5

Extended application example: automated essay scoring

Meaningful, Purposive Interaction Analysis

(8.3 Implementation: classes of the mpia package)

9.2 The foundational SNA and LSA examples revisited

9.3 Revisiting Automated Essay Scoring: Positioning

9.4 Learner trajectories in an essay space

Chapter 1

Introduction

Learning can be seen as the co-construction of knowledge in isolated and collab- orative activity—and as such poses an eternal riddle to the investigation of human affairs. Scholars in multiple disciplines have been conducting research to under- stand and enrich learning with new techniques for centuries, if not millennia. With the advent of digital technologies, however, this quest has undoubtedly been accelerated and has brought forward a rich set of support technologies that can render learning more efficient and effective, when applied successfully. This work contributes to the advancement knowledge about such learning technology by first providing a sound theoretical foundation, to then develop a novel type of algorithm for representing and analysing semantic appropriation from meaningful, purposive interaction. Moreover, the algorithmic work will be chal- lenged in application and through evaluation with respect to its fitness to serve as analytics for studying the competence development of learners. This work is not in academic isolation. Over years, recurring interest in research and development lead to the formation of a new, interdisciplinary research field in between computer science and humanities, often titled technology-enhanced learn- ing (TEL). With respect to development, this repeated interest in TEL is reflected in the growth of educational software and learning tools for the web (Wild and Sobernig 2007 ). Research in this area has seen a steadily growing body of literature, as collected and built up by the European Networks of Excellence Prolearn (Wolpers and Grohmann 2005 ), Kaleidoscope (Balacheff et al. 2009 ), and Stellar (Gillet et al. 2009 ; Fisichella et al. 2010 ; Wild et al. 2009 ). Both research and development have created many new opportunities to improve learning in one way or the other. Only for some of them, however, research was able to provide evidence substantiating the claims on such improvement (for reviews see Schmoller and Noss 2010 , p. 6ff; Means et al. 2010 , pp. xiv ff. and pp. 17ff.; Dror 2008 ; Campuzano et al. 2009 , pp. xxiv).

2

1 Introduction

140

100

80

60

40

20

0

1.1 The Learning Software Market

Already early empirical studies in technology-enhanced learning report large vari- ety in web-based software used to support learning activity in Higher Education:

Paulsen and Keegan ( 2002 ) and Paulsen ( 2003 ), for example, reports 87 distinct learning content management systems in use in 167 installations from the inter- views with 113 European experts in 17 countries. Wild and Sobernig ( 2007 ) report on results from a survey on learning tools offered at 100 European universities in 27 countries. The survey finds 182 distinct learning tools in 290 installations at the universities investigated. Figure 1.1 gives an overview of the variety of these identified tools: block width indicates the number of distinct tools, whereas block height reveals the number of installations of these tools at the universities studied. The majority of tools reported are classified as all-rounders, i.e., multi-purpose learning management and learning content management systems with broad functionality. Special purpose tools such as those restricted to content management (CMS), registration and enrolment man- agement (MS), authoring (AUT), sharing learning objects (LOR), assessment (ASS), and collaboration (COLL) were found less frequent and with less variety in addition to these all-rounders. Among the 100 universities screened in Wild and Sobernig (2007 ), functionality of the tools offered covers a wide range of learning activities: text-based commu- nication and assessment is supported in almost all cases and quality assurance/ evaluation plus collaborative publishing still in the majority of cases. Additional functionalities offered still frequently at more than a quarter of the universities at that time were individual publishing tools, tools for social networking, authoring tools for learning designs, and audio-video conferencing and broadcasting tools. Today, the Centre for Learning and Performance Technologies ( 2014 ) lists over 2000 tools in the 13 categories of its directory of learning and performance tools, although little is known about how and which of these are promoted among the European universities.

which of these are promoted among the European universities. ALL CMS MS ALL = allrounders CMS
which of these are promoted among the European universities. ALL CMS MS ALL = allrounders CMS
which of these are promoted among the European universities. ALL CMS MS ALL = allrounders CMS
which of these are promoted among the European universities. ALL CMS MS ALL = allrounders CMS

ALL

CMS

MS

ALL = allrounders CMS = content management systems MS = management systems AUT = authoring tools LOR = learning object repositories ASS = assessment tools COLL = collaboration tools

ASS = assessment tools COLL = collaboration tools AUT LOR ASS COLL Fig. 1.1 Learning tools:

AUT

LOR ASS
LOR
ASS

COLL

Fig. 1.1 Learning tools: types and installations according to Wild and Sobernig ( 2007, new graphic from raw data)

1.2

Unresolved Challenges: Problem Statement

3

This variety of learning tools available online, however, is only in parts a blessing to learners. Choice can be a burden and when it involves processes of

negotiation, as often required in collaborative situations, it can actually get into the way of knowledge work with content. The plethora of tools definitely forms an opportunity, but at the same time it affords new learning competences and literacies in its utilisation in practice. On the other hand, this plethora of tools is evidence of the existence of a large and growing market of learning technology. And indeed, the market-research firm Ambient Insight assesses the global market for “self-paced eLearning products and services” to have reached a revenues volume of “$32.1 billion in 2010” and predicts

a further annual growth of 9.2 % over the coming years till 2015, with the big