
Towards Community-Centric Integrity Management in Crowd-Sourced Systems

Amin Ranjbar
School of Computer Science, McGill University, Montreal, QC H3A 2A7, Canada. Email: aranjb1@cs.mcgill.ca

Muthucumaru Maheswaran
School of Computer Science, McGill University, Montreal, QC H3A 2A7, Canada. Email: maheswar@cs.mcgill.ca

Abstract: Integrity is an important concern in any knowledge management system. This paper describes ongoing research toward a community-centric integrity management system for a large-scale knowledge management system that operates on the Internet.

I. INTRODUCTION

Crowd sourcing is a powerful approach for building information artifacts on the Internet. Wikipedia is an example of a highly successful system built using this approach; another is Linux. In Wikipedia, the end product is an online encyclopedia that evolves rapidly to keep up with the state of things. In Linux, the end product is an operating system kernel that runs on most modern computer architectures and supports high-level functions. While both systems allow an open participation model that encourages contributions from anyone on the Internet, they manage the integrity of the final product in drastically different ways.

In Wikipedia, the overriding objective is scale (i.e., to have the largest online encyclopedia, spanning as many topics as possible). To meet this objective, Wikipedia encourages contributions by making it easy for contributors to create and update articles. The integrity of the contributions is checked and flagged by subsequent readers. For highly trafficked articles, this model of integrity enforcement works very well.

In Linux, integrity is given very high priority. All updates submitted by the development community need the final approval of Linus Torvalds (the originator of the project) before they are included in the official software release. Community feedback and the importance of the contribution are some of the factors that can influence Linus Torvalds's decision to include or exclude a given update from the kernel.

Several Wikipedia-like projects that do not have the popularity of Wikipedia use a model similar to Linux's to manage the integrity of the articles they maintain. However, instead of relying on a single person for the whole project, these sites (e.g., Google Knol) decentralize the integrity management task such that an article's creator is responsible for accepting or rejecting community updates on topics within the scope of the article. While having a central figure per article simplifies integrity maintenance, it can create a heavy workload for the maintainer when there is a large number of small updates from the community.

So far, two extreme approaches are in play for maintaining integrity in crowd-sourced systems: Wikipedia style and Linux style. With Wikipedia, integrity emerges out of crowd activity and can be less effective on sections of the encyclopedia that do not gain wide exposure. Incorrect content or intentional bias can be introduced into Wikipedia articles and remain there until subsequent readers flag the problems. With Linux, integrity is tightly controlled by funnelling all changes through a single authority.

This paper proposes a new approach for managing the integrity of content created in crowd-sourced repositories. Our approach assumes that the users are interconnected by an online social network (OSN). Using estimated trust measures obtained from the OSN, we determine how updates should be processed on the network.

II. INTEGRITY IN CROWD-SOURCED REPOSITORIES

In a crowd-sourced environment where a large number of users are involved in updating a large number of documents, defining integrity is a tricky matter. For the purposes of this paper, a document is considered to have high integrity if a majority of the users consider it to be acceptable.

In most crowd-sourced repositories (e.g., Wikipedia), only one version of a document is retained. This means contention can arise from conflicting opinions. To reduce bias, systems such as Wikipedia often include different sections that explain the different viewpoints. If an article is owned by a single user (e.g., Google Knol), this problem does not arise. However, the article itself might be biased because the owner might be filtering out the opinions he disagrees with. This gives rise to the possibility of having multiple articles on a single topic, because users concerned about the bias of an existing article might start their own version to provide an alternative opinion.

In Table I, we use different attributes to characterize 12 community-centric online information creation sites. The first attribute is the type of identity required by the site. Except for Wikipedia, all sites require some form of login to create an article or edit content that is already present on the site. The identity requirements fall into two major categories: (i) real identities and (ii) pseudo identities. Sites such as Scholarpedia and Citizendium require contributors to register with physical identifying information (e.g., a curriculum vitae) that

identifies the person to the site. Once the physical identity and the accompanying information are verified, the user is given access to the site. The other, more commonly deployed form of identity is the pseudo identity. With a pseudo identity, users accumulate reputation on the pseudo identity, and the privileges associated with the user depend directly on the corresponding reputation.

A. Challenges

In this section, we discuss some of the challenges of creating crowd-sourced information repositories. We categorize these issues as follows:

1) Ownership of articles: Suppose Alice is creating a document X. In the simplest publishing model, she retains the ownership of document X for its lifetime. If anyone else wants to edit or modify it, they need Alice's permission: they submit the modifications to Alice, and she decides whether to accept them. Obviously, such a publishing model is not very suitable for community-centric or large-scale collaborative publishing. In this model, Alice's ownership becomes a bottleneck because she must validate each and every modification. This is why Wikipedia removes ownership from its core model: it attempts to provide integrity without ownership or the oversight of a designated owner.

2) Biased articles: The large number of writers on Wikipedia-style websites, often inexpert, results in unreliable and unprofessional documents and in contentions that need further resolution. The main problem here is the lack of authority and fact-checking. Someone has to report a problem; otherwise, inaccurate information that is not obviously false may persist in Wikipedia for a long time before it is challenged [1].

3) Duplication of effort: Google introduced Knol as an online community-based encyclopedia with multiple articles on the same topic, each written by a different author (owner), to solve the previous problem. On the other hand, in this model, a reader has difficulty searching for and discerning the relevant articles from the irrelevant ones.

The majority of online community-based websites adopt the ownership model to solve Wikipedia's problem of unprofessional and biased articles. If a user creates an article, she is the owner of that document and is responsible for its credibility and future modifications. This can improve the quality of articles significantly, because the articles represent the opinions of authors who put their reputations on the line. With the ownership model, various access control schemes have been proposed to preserve the integrity of shared data. In the next section, we address the necessary requirements for such access control models.

B. Requirements

Any access control scheme for preserving the integrity of digital data on community-based websites should satisfy the following requirements [2]:

1) Full control: The scheme should give the owner full control over how she grants modification access to other users.

2) Flexibility: A community-based website can host a variety of data, and the editing requirements can be diverse. Therefore, the access control scheme should be flexible and capable of operating at different data granularities. This flexibility requirement has various aspects, including: applying different editing criteria for different friends, changing editing conditions on a per-object basis (e.g., increasing the integrity level of an already published data object), and changing editing conditions on a per-friend basis (e.g., decreasing the trustworthiness of a friend and consequently limiting her access to modify the user's data). While flexibility is essential in protection schemes, it can add significant overhead in terms of the user effort required to set up and maintain a protection regimen.

3) User effort and collaborative environment: The user effort required by the access control scheme is certainly a major factor in its eventual acceptance by the user community. One way of retaining flexibility while reducing the user effort required by the scheme is to enable collaborative decision making. Collaborative decision making for Alice's access control can take different forms: learning from Alice's past access control actions, learning from the past actions of the community of users within a social neighborhood of Alice, learning from the past actions of a like-minded set of users within Alice's social neighborhood, or a combination of these.

4) Accessibility: Another important requirement for a community-based access control model is accessibility to data. For instance, Alice would like to know the friends or friends-of-friends who can modify a new article she is posting with a particular integrity setting. Such accessibility prediction can be used to interactively shape the integrity settings for important data.

III. COMMUNITY-CENTRIC INTEGRITY MANAGEMENT

We present the Community-Centric Integrity Management (CCIM) scheme for preventing unauthorized modifications in online collaborative social networking systems, inspired by the work done in [2]. In this scheme, users can view, vote, comment, edit, and write (own) information artifacts. We incorporate a number of social factors and ownership into our design in order to preserve the integrity of documents: CCIM utilizes user activities and the topological structure of the social graph to establish trust between users. In an editing situation, the owner of an article is more willing to accept modifications from an editor who is considered trustworthy. We apply these factors in the CCIM scheme to categorize the users around the owner of an article into three zones, as shown in Figure 1: the editing zone, the suggestion zone, and the commenting zone.

TABLE I. Characteristics summary of online community-based websites.

| Name          | Type of identity | Ownership | Edit                        | Approval                     | Rating | Scope of topics             | Articles per topic | Remuneration |
|---------------|------------------|-----------|-----------------------------|------------------------------|--------|-----------------------------|--------------------|--------------|
| Wikipedia     | Anonymous        | No        | Anyone                      | No                           | No     | All                         | 1                  | No           |
| Knol          | Google account   | Yes       | No, with owner's permission | No                           | No     | All                         | many               | No           |
| Scholarpedia  | Real id          | Yes       | Yes, approved by author     | Yes, expert                  | No     | Scientific                  | 1                  | No           |
| Citizendium   | Real id          | Yes       | Yes, before approval        | Yes                          | No     | All                         | 1                  | No           |
| Squidoo       | Pseudo id        | Yes       | No                          | No                           | No     | All                         | many               | Yes          |
| Hubpages      | Pseudo id        | Yes       | No                          | No                           | No     | All                         | many               | Yes          |
| Helium        | Real id          | Yes       | No, owner                   | No                           | Yes    | All                         | many               | Yes          |
| Examiner      | Real id          | Yes       | No                          | No                           | No     | All                         | many               | Yes          |
| Instructables | Pseudo id        | Yes       | No, contractor writers      | Yes                          | No     | Do-It-Yourself instructions | 1                  | Yes          |
| About         | Real id          | Yes       | No                          | Yes, guides should accept it | No     | All                         | 1                  | Yes          |
| DailyTech     | Pseudo id        | Yes       | No, owner                   | No                           | Yes    | Technology news             | many               | No           |
| Stackoverflow | Pseudo id        | Yes       | Yes                         | No                           | Yes    | Programming                 | many               | No           |

Fig. 1. Illustration of the different zones for one of publisher P's articles, with user mappings.

Users falling into the editing zone can edit the owner's articles without his or her permission; therefore, ownership is not a bottleneck in our system. Conversely, users falling into the commenting zone can only make comments about the owner's articles. Modifications from users falling into the suggestion zone must be approved by the owner.

Our approach to developing CCIM is built on the following assumptions:

1) All users are part of a centrally maintained social network.
2) Friendships among users on the social network are context independent (e.g., Alice's direct friends could include family members and university colleagues).
3) The social network follows best security practices in resisting whitewashing [3] and sybil attacks [4], [5].

A. Social Factors

The design of CCIM leverages the structural properties of the social network and the activities of its users. On social networks, the structure of the relationships can be used to infer the degree of trust between users [6], [7]. In such cases, trust is represented using the social distance between the users (hop distance in terms of user relationships) in the social network. For instance, direct friends (1-hop friends) are considered more trusted than friends-of-friends (2-hop friends).

In CCIM, user trust is an essential part of the scheme. To quantify the trust between two users, we propose a trusted distance measure that is similar to the hop distance between users on the social network. The trusted distance is a generalization of the hop distance: it takes into consideration the hop distance between the users as well as past user activities in the network. These activities represent the history of the different editing activities that take place between the users. We denote the trusted distance between two users x and y as $d_{\text{trust}}(x, y)$.

B. CCIM Scheme Overview

In this scheme, we use an integrated namespace in order to facilitate the discovery of articles by users. Users can search for articles about a specific topic without even logging into the system. Because articles are presented based on their popularity, a reader needs no extra or duplicated effort to discern the relevant articles from the irrelevant ones. If a user wants to vote, comment, edit, or publish an article, she must be logged into the system with her unique username. Unique identities help prevent multiple votes on one article from the same user; more votes on an article indicate a higher level of acceptance among the users. In addition, we incorporate ownership into our model to prevent the problem of biased articles in existing systems such as Wikipedia. There are no anonymous comments, edits, or writers in the system: if a user creates an article, she is the owner of that document.

While a user is logged into the system, she can make comments and vote on existing documents. In addition, she can send a publish request (to publish an article) or an edit request (to modify an existing document in the system). To publish an article $O_P$, the owner P sends a publish request $(P, O_P, E(O_P), C(O_P))$ to the editing limit computation module, as shown in Figure 2.

Fig. 2. The different modules of the CCIM framework.

The limit computation module utilizes the constraints $E(O_P)$ and $C(O_P)$ to determine two threshold values: the editing limit $l_e(O_P)$ and the commenting limit $l_c(O_P)$. These limits determine the different zones associated with the article. An edit request from a user R for an article owned by the publisher P is received and processed by the edit request evaluation component. The outcome of the edit request depends on the trusted distance from the requester to the publisher, $d_{\text{trust}}(P, R)$. The value of $d_{\text{trust}}(P, R)$ is supplied by the trusted distance computation module, which consults the social network topology and the requester's activity history.

We assume that trusted distance is measured as a real number greater than zero. For example, a friend is at trusted distance 1.0 and a friend-of-a-friend is at 2.0. Similarly, we can represent a good friend by a number less than 1.0 (e.g., 0.9), a very good friend by a number less than that used for a good friend (e.g., 0.8), a good friend-of-a-friend by a number less than 2.0 but greater than 1.0 (e.g., 1.8), and so on, as shown in Figure 3.

Fig. 3. Different levels of friendship.
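The paper does not specify how the activity history discounts the hop distance. As a minimal sketch in Python, assuming a hypothetical model in which each accepted edit between two users shrinks their hop distance by a small multiplicative factor, the trusted distance computation might look like the following; the names `trusted_distance`, `DECAY`, and `MIN_FACTOR` are our own illustrative choices, not part of CCIM's specification.

```python
import networkx as nx

# Decay factor per accepted edit: each accepted edit between two users
# pulls their trusted distance slightly below the raw hop distance,
# mirroring the paper's examples (friend = 1.0, "good" friend = 0.9, ...).
# Both the model and the constants are illustrative assumptions.
DECAY = 0.9
MIN_FACTOR = 0.5  # never shrink the hop distance by more than half

def trusted_distance(graph: nx.Graph, publisher: str, requester: str,
                     accepted_edits: dict[tuple[str, str], int]) -> float:
    """Hop distance, discounted by the history of accepted edits."""
    try:
        hops = nx.shortest_path_length(graph, publisher, requester)
    except nx.NetworkXNoPath:
        return float("inf")  # unreachable users are maximally distant
    n_accepted = accepted_edits.get((publisher, requester), 0)
    factor = max(MIN_FACTOR, DECAY ** n_accepted)
    return hops * factor

# Example: a friend-of-a-friend (2 hops) with one accepted edit sits at
# trusted distance 2 * 0.9 = 1.8, a "good" friend-of-a-friend.
```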

Accordingly, the edit request evaluation module determines the zone into which the requester falls based on the trusted distance $d_{\text{trust}}(P, R)$ and decides the outcome of the request. If the requester's trusted distance is at most the editing limit $l_e(O_P)$, the edit request is granted automatically; users whose edit requests are automatically honored are said to fall in the editing zone for that article. The editing limit $l_e(O_P)$ thus represents the largest trusted distance for which an edit request is granted unconditionally. The commenting limit $l_c(O_P)$ is the smallest trusted distance for which an edit request is rejected and the requester can only make comments on the article; users whose edit requests are rejected are said to fall in the commenting zone. Note that the editing constraints satisfy $0 \le l_e(O_P) \le l_c(O_P)$. If the requester's trusted distance lies strictly between $l_e(O_P)$ and $l_c(O_P)$, the edit request is granted conditionally: the requester has permission to edit the article, but the publisher P may accept or reject the modifications. If the publisher accepts the suggestions, the system applies the changes to the article. Users whose edit requests are accepted conditionally are said to be in the suggestion zone for that article. Figure 4 shows the steps involved in publishing an article and sending an editing request in the CCIM framework.
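The zone decision itself is a simple three-way threshold test. A minimal sketch, again in Python and with illustrative names (`Zone`, `evaluate_edit_request`) of our own choosing:

```python
from enum import Enum

class Zone(Enum):
    EDITING = "editing"        # edit applied automatically
    SUGGESTION = "suggestion"  # edit held for the publisher's approval
    COMMENTING = "commenting"  # edit rejected; voting/commenting only

def evaluate_edit_request(d_trust: float, l_e: float, l_c: float) -> Zone:
    """Map a requester's trusted distance to a zone for one article.

    Implements the three-way rule of Section III; the constraints
    must satisfy 0 <= l_e <= l_c.
    """
    assert 0 <= l_e <= l_c, "editing constraints must satisfy 0 <= l_e <= l_c"
    if d_trust <= l_e:
        return Zone.EDITING
    if d_trust < l_c:
        return Zone.SUGGESTION
    return Zone.COMMENTING
```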

Fig. 4. The steps involved in publishing an article and in requesting to edit an article using the CCIM scheme. To publish an article, the publisher sends a publishing request including the article and its editing constraints (1). The editing limits computation module stores the article in the data store (1-a) and determines the editing and commenting limits (1-b). A requester sends an editing request to the editing request evaluation module to get access to the article (2). This module gathers the necessary information from the access policy store (3-a) and the trusted distance computation module (3-b) in order to respond to the requester, who receives either an editing notification, a commenting notification, or a suggestion notification (3). The requester can then edit the article (4). If the requester receives suggestion access, CCIM sends a notification to the publisher in order to get approval from the owner of the article (5).
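Tying the two sketches above together, a hypothetical end-to-end flow mirroring the numbered steps of Figure 4 might read as follows. It reuses `trusted_distance`, `evaluate_edit_request`, and `Zone` from the earlier sketches; the in-memory stores and handler names are again our own illustration rather than a specified CCIM API.

```python
# Hypothetical in-memory stand-in for the access policy store of
# Figure 2; a real deployment would use persistent storage.
article_limits: dict[str, tuple[float, float]] = {}

def handle_publish_request(article_id: str, l_e: float, l_c: float) -> None:
    """Steps (1)-(1-b): record the article's editing and commenting limits."""
    assert 0 <= l_e <= l_c, "limits must satisfy 0 <= l_e <= l_c"
    article_limits[article_id] = (l_e, l_c)

def handle_edit_request(graph, publisher: str, requester: str,
                        accepted_edits: dict, article_id: str) -> Zone:
    """Steps (2)-(5): evaluate one edit request against the stored limits."""
    l_e, l_c = article_limits[article_id]                # policy store (3-a)
    d = trusted_distance(graph, publisher, requester,
                         accepted_edits)                 # module consult (3-b)
    zone = evaluate_edit_request(d, l_e, l_c)            # notification type (3)
    if zone is Zone.SUGGESTION:
        print(f"notify publisher {publisher}: approval required")  # step (5)
    return zone

# Example: with limits (1.0, 2.0), a friend-of-a-friend with one accepted
# edit is at trusted distance 1.8, so the request lands in the suggestion
# zone and is held for the publisher's approval.
```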

Consider the case where a requester R attempts to edit the publisher P's article $O_P$ by sending an editing request to CCIM. The value of $d_{\text{trust}}(P, R)$ determines which zone the requester R belongs to with respect to the constraints assigned to $O_P$ (the editing limit $l_e(O_P)$ and the commenting limit $l_c(O_P)$). The outcome of the editing request is summarized by the following rules:

1) $d_{\text{trust}}(P, R) \le l_e(O_P)$: the requester R is trusted by publisher P and is automatically given access to edit the article $O_P$.

2) $l_e(O_P) < d_{\text{trust}}(P, R) < l_c(O_P)$: the requester R can edit the article $O_P$; however, the modifications need to be approved by the publisher P.

3) $l_c(O_P) \le d_{\text{trust}}(P, R)$: the requester R is automatically denied editing access to the article $O_P$ and can only vote on or make comments about the article.

For Cases 1 and 3, no further processing is required. Case 2 triggers a notification to the owner of the article, P, who can then accept or reject the modifications. Note that because a user's activity history stores information about the outcomes of his editing requests, a user can conceivably move from one zone to another: a series of accepted editing requests can push a user from the suggestion zone into the editing zone, while a series of rejected requests can push a user into the commenting zone.

IV. RELATED WORK

A trust-based extension of role-based access control (RBAC) called TrustBAC is presented in [8]. TrustBAC assigns trust levels to users based on three factors (experience, knowledge, and recommendation); the system computes overall trust levels from these factors and maps them to roles. In addition to trust-based extensions of RBAC, there has been significant effort in developing trust-based access control mechanisms from scratch [9], [10]. In [9], a trust-based access control system for mobile ad-hoc collaborative environments is provided. The system combines node reputations and environment risk to make access decisions: a node's reputation is the perception that peers form about the node, and environment risk is a measure of how risky an action is likely to be given the current state of trust events in the environment. In [10], a trust-based access control framework for peer-to-peer (P2P) file sharing is presented. Two threshold values, one based on trust scores and another based on contribution scores, are associated with each file; a peer seeking access to a file must have trust and contribution values exceeding these thresholds.

Recently, Relationship-Based Access Control (ReBAC) was proposed in [11] to express access control policies in terms of interpersonal relationships between users. ReBAC captures the contextual nature of relationships in online social networks. A new method to find the best sharing set based on users' access control policies is presented in [12]. Their approach is to compute an equivalent sharing set that indicates the users who are likely to get the information according to the sharing patterns observed in the network. The authors of [13] studied access control policies for data co-owned by multiple parties in an online social network setting, where each co-owner may separately specify her own privacy preference for the shared data; a voting algorithm based on game theory was adopted to enable the collective enforcement of shared data policies.

Utilizing trust metrics for imposing access restrictions is similar to the multi-level security proposed in [14] in

order to preserve the trustworthiness of users' data in OSNs. Furthermore, [15] introduces a new discretionary access control model and a related enforcement mechanism for the controlled sharing of information in online social networks. The scheme adopts a rule-based approach for specifying access policies on information owned by network users; authorized users are denoted in terms of the type, depth, and trust level of the relationships existing between nodes in the network. In [16], the authors present a new OSN called Persona where users state who may have access to their information. This OSN uses attribute-based encryption to hide users' data and allows users to apply their own policies over who may view their data. The authors of [17] present a new trust management paradigm for securing both intra- and inter-organizational information flows against the threat of information disclosure. They propose an approach for assessing risks in terms of trustworthiness and improving risk estimations by incorporating estimates of trust. Their approach also provides a mechanism for handling risk transfer across organizations and forcing rational entities to be honest.

V. CONCLUSIONS AND FUTURE WORK

In this paper, we presented an early look at a community-centric integrity management framework we are developing as part of ongoing research. Many aspects of the framework are yet to be finalized. We will be evaluating the performance of key procedures using trace data from existing social networks.

REFERENCES
[1] "Reliability of Wikipedia," http://en.wikipedia.org/wiki/Reliability_of_Wikipedia.
[2] W. Villegas, B. Ali, and M. Maheswaran, "An access control scheme for protecting personal data," in PST '08: Proceedings of the 2008 Sixth Annual Conference on Privacy, Security and Trust. Washington, DC, USA: IEEE Computer Society, 2008, pp. 24–35.
[3] A. P. De, M. Schorlemmer, I. Csic, and S. Cranefield, "A social-network defence against whitewashing," in AAMAS 2010: Proceedings of the Ninth International Conference on Autonomous Agents and Multiagent Systems, Toronto, Canada, 2010, pp. 1563–1564.
[4] H. Yu, M. Kaminsky, P. B. Gibbons, and A. Flaxman, "SybilGuard: defending against sybil attacks via social networks," in SIGCOMM '06: Proceedings of the 2006 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications. New York, NY, USA: ACM, 2006, pp. 267–278.
[5] H. Yu, P. B. Gibbons, M. Kaminsky, and F. Xiao, "SybilLimit: a near-optimal social network defense against sybil attacks," in SP '08: Proceedings of the 2008 IEEE Symposium on Security and Privacy. Washington, DC, USA: IEEE Computer Society, 2008, pp. 3–17.
[6] J. Golbeck and J. Hendler, "Inferring binary trust relationships in web-based social networks," ACM Transactions on Internet Technology (TOIT), vol. 6, no. 4, p. 529, 2006.
[7] C. Binzel and D. Fehr, "Social relationships and trust," Discussion Papers of DIW Berlin, 2010.
[8] S. Chakraborty and I. Ray, "TrustBAC: integrating trust relationships into the RBAC model for access control in open systems," in SACMAT '06: Proceedings of the Eleventh ACM Symposium on Access Control Models and Technologies, New York, NY, USA, 2006, pp. 49–58.
[9] W. Adams and N. Davis, "TMS: a trust management system for access control in dynamic collaborative environments," IEEE International Performance Computing and Communications Conference, 2006.

[10] H. Tran, M. Hitchens, V. Varadharajan, and P. Watters, "A trust based access control framework for P2P file-sharing systems," in Proceedings of the 38th Annual Hawaii International Conference on System Sciences, Jan. 2005.
[11] P. W. L. Fong, "Relationship-based access control: protection model and policy language," in Proceedings of the First ACM Conference on Data and Application Security and Privacy (CODASPY), San Antonio, Texas, USA, 2011.
[12] A. Ranjbar and M. Maheswaran, "A case for community-centric controls for information sharing on online social networks," in Proceedings of the IEEE GLOBECOM Workshop on Complex and Communication Networks (CCNet), Miami, Florida, USA, 2010.
[13] A. C. Squicciarini, M. Shehab, and J. Wede, "Privacy policies for shared content in social network sites," The VLDB Journal, vol. 19, pp. 777–796, December 2010.
[14] B. Ali, W. Villegas, and M. Maheswaran, "A trust based approach for protecting user data in social networks," in Proceedings of the Conference of the Center for Advanced Studies on Collaborative Research (CASCON), Richmond Hill, Ontario, Canada, 2007, pp. 288–293.
[15] B. Carminati, E. Ferrari, and A. Perego, "Enforcing access control in web-based social networks," ACM Transactions on Information and System Security (TISSEC), vol. 13, November 2009.
[16] R. Baden, A. Bender, N. Spring, B. Bhattacharjee, and D. Starin, "Persona: an online social network with user-defined privacy," in Proceedings of the ACM SIGCOMM Conference on Data Communication (SIGCOMM), Barcelona, Spain, 2009, pp. 135–146.
[17] M. Srivatsa, S. Balfe, K. G. Paterson, and P. Rohatgi, "Trust management for secure information flows," in Proceedings of the 15th ACM Conference on Computer and Communications Security (CCS), Alexandria, Virginia, USA, 2008, pp. 175–188.
