Big Data With Well-Being

By (Name)
Course
Tutor
School
City and State of School
Date
Abstract
This study examines how the emerging field of Big Data analytics can be used to study the
well-being of a community. It looks at several indicators of psychological well-being in data
sets acquired from Statistics Canada, which were used mainly to test the capability of the
statistical software SPSS on an extensive data set with a population size of N = 681,578. The
results indicate that relationships, stress, diet, income, health, education, and alcohol abuse
can be used as predictors of the well-being of a community. The software was found to be
capable of analyzing data of this magnitude. However, the analysis of actual Big Data, such as
data sets measured in terabytes or petabytes, may be limited by performance issues. The paper
also discusses some of the traditional methods previously used to measure the well-being of
individuals, together with their limitations. It examines the subject in enough depth to allow
for actual figures, and its findings may also inform other concepts related to this research.

Table of Contents
Abstract
Introduction
Significance of study
Literature Review
The Use of GDP to measure well-being and issues as well as limitations
The Capability Approach
Capabilities for the Measurement of Well-being of Individuals
The Capability Approach, the Nussbaum Version
The Various Limitation of the Capability Approach
Use of Big Data in the Measurement of Well-being of Individuals and Communities
The Various Ethical Concerns for the Large Data Sets
Big Data and Psychology
The Psychological Well-being
The Current Study
Methods
The Exploration of the various Big Data Technologies
Participants
Materials
Procedures
Method of the Study
Materials for the Study
The Procedure for the Study
The Results of the Study
Table 2 The R and R square values for the best model fit
TABLE 3: The Anova Table for the Best Model
Results
The Research and Findings
Conclusion and Findings
References
Table of figures
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Table of tables
Table 1
Table 2
Table 3
Introduction
In the social sciences, there is growing interest in methods that go beyond the conventional
income-based approach to measuring the development and well-being of people or communities
(Stiglitz et al., 2009, p.7). GDP does not capture non-market interactions such as friendship,
moral values, happiness, or a sense of purpose in life. For this reason, a growing number of
scholars are turning to subjective, self-reported measures of individual well-being, which can
include variables such as satisfaction with life. It should also be noted that most economists
use quality of life and comfort as a direct measure of utility.
Various political leaders have supported this move by calling for surveys that measure
subjective well-being to be included in their policies and agendas. Despite this progress, the
measurement of subjective well-being has raised concerns among economists, especially regarding
interpretation and analysis (Kahneman and Deaton, 2010, p.490). In response to these concerns,
alternative methods such as Time Use Surveys and the Day Reconstruction Method have been
developed to address the limitations.
Against this background, the paper contributes to the new research agenda by showing how Big
Data can be used to study and understand the well-being of people. As social life becomes
increasingly digitized, variables such as feelings, social relations, and attitudes leave traces
that are deeply embedded in social networking sites such as Facebook, Twitter, and Instagram.
The magnitude of these traces should be of key concern to social scientists. However, although
the capacity to collect and analyze massive amounts of data is high in fields such as physics
and biology, progress in the social sciences has been relatively slow (Lazer et al., 1999, p.722).
One of the main advantages of Big Data is that it gives social scientists a view of how people
actually behave. Through searches on Google, for example, researchers can make inferences about
people's feelings and attitudes from what they do rather than from what they state about those
feelings and attitudes. Another advantage is that social scientists no longer have to rely on
answers to predefined questions, which allows them to listen to what people actually have to
say. This revealed-preference approach produces a reflexive picture of society as a whole,
because it allows the ranking of citizens' main concerns to emerge spontaneously and to be used
as complementary data to explain the main effects of GDP. Furthermore, the data are based on the
actual behaviour of people as they search for information on the internet, which is then taken
to reveal their primary concerns.
Another advantage of using Big Data to study subjective well-being is that the data are not
constrained to a particular time. They offer a real-time, immediate source of information to
policymakers, who are often confronted with data scarcity and short-term horizons when making
decisions for the community. The information is also available at the local level, provided that
internet access exists and internet use is sufficient to yield statistically meaningful data for
interpretation and analysis. Finally, Big Data is often free of charge, which makes access less
costly.
Despite these advantages, the use of Big Data has limitations. One is that the sheer volume of
data can constrain statistical interpretation, because the data are susceptible to noise, which
makes analysis difficult. This paper identifies the issues that can arise when using data from
Google searches and recommends solutions. One of the main issues discussed is the construction
of categories that reflect the different dimensions of life, for example through Bayesian
techniques, in order to identify the most important determinants of an individual's subjective
well-being. The methodology used in the paper allows for the construction of a model with four
critical qualities: the model should be testable, it should be grounded in existing theory and
literature, it should be transparent, and it should be usable to determine the well-being of the
community continuously over time.
The paper builds on the existing literature on the exploitation of search engine data. One of
the first uses of search engine data was to predict and forecast the unemployment rate in the
United States (Ettredge et al., 2005, p.88). Data from internet search engines have also been
used to forecast macroeconomic indicators such as automobile demand, vacation destinations,
unemployment rates, and consumer sentiment (Choi and Varian, 2009, p.56). This shows that Big
Data has enormous potential to determine or predict the well-being of people in a community.
Significance of study
The subjective well-being of a community or people, for example how happy or satisfied they are
with their lives, can be one of the best indicators of the social progress of a community. By
combining Big Data with measures of subjective well-being, the impact of policies and
regulations on the well-being of people in a community can also be measured. However, data on
subjective well-being can be difficult to obtain, especially when methods such as surveys are
used, and there may be uncertainty about the factors that lead people to give specific responses
to subjective questions. In this study, internet search volumes are used to build a model that
could accurately predict the subjective well-being of communities or people living in the United
States. The research found that searches related to financial security, employment, leisure, and
family life were among the main predictors of subjective well-being. Furthermore, the model
could be used to produce data at a much higher frequency and accuracy than surveys.
Literature Review
For many decades, the measurement of well-being has been a matter of key interest among
psychologists, policymakers, and social scientists. Personal well-being is mainly considered a
topic within the psychological sciences, because it concerns the subjective well-being and
feelings of individuals. Social well-being, on the other hand, is a collective dimension of the
community, and it is considered a significant indicator of how developed a place is and how
balanced and sustainable its socio-economic system is.
This section discusses the methodologies used to measure the well-being of individuals in its
social dimensions, together with their purpose and limitations. It also discusses the use of
internet searches and social media as sources of Big Data. The literature review is structured
as follows:
• The traditional measurement of well-being through GDP
• The capability approach
• The use of Big Data in the measurement of the well-being of a community

The Use of GDP to measure well-being and issues as well as limitations
For measuring the socio-economic progress of individuals or society, GDP has long been
considered one of the best methods, even though it has many shortcomings as an indicator of the
well-being of a community. One of the main reasons for its success is that it connects goods and
services with monetary valuations (Stiglitz et al., 2009, p.8). GDP also has the advantage of
being linear, clear, and objective, as can be seen in public debates and in its usefulness for
international comparisons.
GDP is an aggregate measure of production in a country or community. It mostly covers the final
goods and services that are supplied to units other than their producers in a given period.
Although GDP correlates highly with other aspects such as the standard of living, the
correlation is not universal (Stiglitz et al., 2009, p.9). Furthermore, differences in income
between individuals explain only a small proportion of the differences in happiness among people
in a community (Frey and Stutzer, 2002, p.403). For this reason, many social scientists,
scholars, and psychologists criticize GDP as a deficient indicator of the well-being of a
community, one that yields misleading information for the formation of policies and decisions
(Fleurbaey, 2009, p.1029). Economists, philosophers, psychologists, and others concerned with
the well-being of individuals are therefore increasingly interested in self-reported measures of
welfare and in their contribution to decision-making and policy formation (Deaton and Stone,
2013, p.2).
Further issues have been raised about the use of GDP as a measure of human development and the
well-being of the community. One of the main limits on its accuracy as a measure of welfare is
the lack of a clear one-to-one correlation between growth and quality of life. Too much emphasis
on GDP as a benchmark of well-being could therefore lead to misguided policies or poor
decisions. GDP is a measure of total market production, so it is more useful for measuring the
supply side of the economy than the standard of living of citizens. A good example is that,
since the end of World War II, GDP has risen rapidly in many countries, while self-reported
subjective well-being has neither risen nor fallen (Frey and Stutzer, 2002, p.404). Furthermore,
beyond a certain level of income, there is only a minimal relationship between happiness and
income (Choudhary et al., 2012, p.28). Thus, even though income helps give people some freedom
and well-being, it can only be used as a proxy for what people value (Robeyns, 2005, p.98).
Several other issues have been raised about the use of GDP to measure the well-being of people
in a community. First, the financial analysis of market transactions is usually taken as the
starting point for measuring the economic performance of a place and the prices of goods.
However, such data cannot be taken as a correct representation of real value in society. Only
goods and services that are actually exchanged in the market are counted in GDP, so various
factors that affect the well-being of people in the community are ignored. These include the
common goods provided by the government, such as security, democracy, and freedom.
Volunteering and social relations are also ignored in the calculation of GDP, yet they are
crucial to the well-being of individuals in the community. Another limitation is that GDP tracks
increases in the prices of primary commodities, while decreases in individual wealth are usually
overlooked. Furthermore, the measurement of aggregate production in a country does not take into
account pollution or the degradation of the environment and of the assets used in production.
Using GDP to measure the well-being of individuals can therefore lead to misleading observations
about the wealth of a society. Adverse events such as natural disasters reduce the wealth of
individuals but, on the other hand, lead to an increase in the GDP of the country or community.
As seen above, using GDP to measure well-being can produce misleading information, which in turn
can result in poor decisions about the policies that affect citizens. One of the most
influential alternative approaches to the measurement of human well-being is the capability
approach, which was developed in the 1980s and 1990s by the renowned economist Amartya Sen.
The Capability Approach
The approach was designed by the Nobel Prize winner Amartya Sen and has roots in the works of
Aristotle, Smith, and Marx. It is a broad normative framework used to evaluate societal
arrangements and the well-being of people in a community. As such, it provides a framework for
carrying out welfare comparisons. One of the main advantages of the capability approach is that
it measures nearly all the dimensions of the well-being of people in a given setting: it
addresses development, justice, and well-being, and pays particular attention to the links
between material wealth, social well-being, and psychological factors.
One of the main features of the approach is the distinction between the means and ends of human
action. This distinction introduces two concepts: functionings, which are realized, and
capabilities, which are effectively possible. Functionings represent the achievements of a
person, that is, the things a person manages to do or be in the course of living. They are
directly related to the living conditions the person is exposed to and are therefore more easily
measurable, because they concern different aspects of those living conditions.
Capabilities are the possibilities open to a person: the alternative combinations of
functionings that one can choose in order to lead one's life. A person's capability is thus a
set of vectors of functionings, reflecting the person's freedom to choose which kind of life to
lead.
The capability approach has had a long-standing influence in various parts of the world. In
1990, the UNDP adopted the approach as the basis for the Human Development Index (HDI) and the
annual Human Development Report (Robeyns, 2005, p.100). The theoretical purpose of the HDI is to
measure levels of human development. It uses four proxies: the first is the life expectancy of a
population, the second is the average years of schooling in the country, the third is the
expected years of schooling, and the fourth is national income per capita.
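As an illustration of how four proxies of this kind can be combined into a single index, the
following is a minimal sketch rather than the official calculation. The goalposts, the
logarithm applied to income, and the geometric mean are assumptions drawn from the UNDP
technical notes (the methodology used from 2010 onward) rather than from this paper, and the
function and variable names are hypothetical.

```python
import math

# Goalposts (minimum, maximum) for each indicator.
# ASSUMPTION: values follow the UNDP technical notes; they are not given in this paper.
GOALPOSTS = {
    "life_expectancy": (20.0, 85.0),       # years
    "mean_schooling": (0.0, 15.0),         # years
    "expected_schooling": (0.0, 18.0),     # years
    "income_per_capita": (100.0, 75000.0), # PPP dollars
}


def dimension_index(value, low, high):
    """Rescale a raw indicator to the 0-1 range, clamping values outside the goalposts."""
    return min(1.0, max(0.0, (value - low) / (high - low)))


def hdi(life_expectancy, mean_schooling, expected_schooling, income_per_capita):
    """Combine the four proxies into one index (income must be positive)."""
    health = dimension_index(life_expectancy, *GOALPOSTS["life_expectancy"])

    # Education is the arithmetic mean of the two schooling indices.
    education = (
        dimension_index(mean_schooling, *GOALPOSTS["mean_schooling"])
        + dimension_index(expected_schooling, *GOALPOSTS["expected_schooling"])
    ) / 2

    # Income enters in logarithms, so extra income matters less at higher levels.
    low, high = GOALPOSTS["income_per_capita"]
    income = dimension_index(math.log(income_per_capita), math.log(low), math.log(high))

    # Geometric mean of the three dimension indices.
    return (health * education * income) ** (1 / 3)


# Hypothetical country used only to show the call.
print(round(hdi(72.0, 8.5, 12.5, 11000.0), 3))
```

Because the dimensions are combined with a geometric mean, a very low score in one dimension
cannot be fully offset by a high score in another, which fits the concern of the capability
approach with deprivation in any single dimension.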
Capabilities for the Measurement of Well-being of Individuals
Functionings and capabilities form the main informational base of this approach. The approach
differs considerably from other theoretical frameworks that aim to determine or predict the
level of well-being of a community or of individuals. Unlike frameworks that focus on personal
utility (such as desire fulfilment, happiness, and pleasure) or on opulence, the capability
approach is concerned with the level of freedom of the person or community. The individual and
social levels are interdependent, even though the causal relations between them can be more or
less difficult to establish from a capability perspective. The capability perspective is thus
based on a combination of doings and beings, in which quality of life is assessed in terms of
the capability to achieve valuable functionings. However, material goods should also be
considered in the evaluation of the well-being of individuals, since focusing only on the
subjective perspective could introduce bias by failing to capture a person's deprivations and
lack of material goods.
As the name suggests, the approach takes a person's capabilities as its main source of
information. Focusing on capabilities when evaluating well-being does not lose any information
that is vital to the assessment. The approach considers not only a person's well-being
achievements but also the person's well-being freedom, that is, the freedom to choose which life
to lead. It contrasts with consumer theory, in which, given a variety of options, only the best
option is normally considered. Under the capability approach, the freedom to enjoy various
doings and beings may itself have important value for the well-being of the individual (Sen,
2008, p.280).
In doing so, the capability approach does not favour any particular notion of the good life. It
is therefore easy to see its influence on luck egalitarianism: the approach assumes that each
person should have the same capabilities or real opportunities, but that each individual should
be held responsible for the particular choices that they make (Robeyns, 2006, p.356). It should
be noted, however, that valuable functionings can differ greatly depending on the place and
conditions in which a person lives, such as a developing or developed area. They may range from
basic needs such as food, nourishment, and health to more complex matters such as self-esteem
and self-respect.
The Capability Approach, the Nussbaum Version
Martha Nussbaum is credited with the creation of this version of the capability approach.
According to Nussbaum, freedom of choice requires the formal defence of fundamental liberties as
well as the assurance of certain material conditions and circumstances. However, for
empowerment and human development to be effective, further factors need to be considered that
allow people to transform chances or possibilities into means, outcomes, and ends (Nussbaum,
2000).
From this point, it is apparent that Nussbaum and Sen use different notions of capability. In
Sen's work, the concept refers to effective or real opportunities, while Nussbaum pays more
attention to people's skills and personality traits (Robeyns, 2005, p.97). Nussbaum's
interpretation of capabilities as human attitudes recalls the notion of conversion factors,
introduced by Sen in 2003, which indicate the degree to which people are able to transform
commodities into functionings and capabilities, especially those identified as social and
personal. Nussbaum elaborates on three conversion factors that are considered in this
perspective. They are as follows:
1. Environmental: these concern the geographical location, the natural characteristics, and
the climate.
2. Social: these include the institutions and the values on which the society of a country or
community is built.
3. Personal: these are the features of a person that have a significant influence on the
conversion of commodities into functionings.
The Various Limitation of the Capability Approach
The capability approach has been implemented in many contexts and fields, mainly as a general
framework supported by other theories. According to Robeyns (2006, p.358), there are eight main
types of application of the capability approach, listed below:
1. General assessment of human development in a given country
2. Poverty and well-being assessment in developed economies around the world
3. Empirical and theoretical analysis of policies
4. Identifying the poor in developing nations
5. Criticizing social norms
6. Analyzing the deprivations experienced by people who are disabled
7. Assessing gender inequalities
8. Exploratory and descriptive research

As seen, although the approach has many uses and attracts much attention from scholars and
researchers, its applicability continues to elicit mixed reactions (Robeyns, 2006, p.370).
Furthermore, doubts continue to hamper the empirical use of the approach, which is seen as very
complicated (Sen, 2008, p.279). One of the biggest questions surrounding its application
concerns the selection of capabilities: which capabilities and functionings matter most to
people remains unsettled, and this has hindered the successful application of the method. The
practical application of the approach therefore continues to face procedural and operational
problems. These are attributed mainly to the indefinite boundaries of the concept of
capabilities, which make it hard for users to distinguish between capabilities and functionings.
Use of Big Data in the Measurement of Well-being of Individuals and Communities
Advances in technology are currently opening new avenues for the study of the well-being of
individuals in a community or a country. Big Data, together with the various data analytics
tools, could therefore be used in the measurement of the well-being of a community or of
individuals in a country. There is no standard definition of what constitutes Big Data; in
most cases, however, the term is used for data sets which, because of their size, cannot be
easily managed by standard data analysis and management tools or software (Tien, 2013, p.128).
Ward (2013) explained that some of the statistical programs used by researchers, such as
Stata and SPSS, cannot process large amounts of data in a timely manner, which could have
negative consequences for the researcher or the study itself, such as delays.
It is mainly due to such problems that newer technologies, such as Hadoop, have been
developed; they are seen as answers to the limitations of the traditional data management and
analytical tools (Deroos et al., 2012, p.15). Having tools that can analyse Big Data could
open up new opportunities in the study of the social sciences as well as psychology, which
could further help researchers in undertaking future projects (Ovadia, 2013, p.133).
However, before using Big Data, one should understand what it really is. The main difference
between Big Data and conventional data lies in volume, velocity and variety (Deroos et al.,
2012, p.13). Volume refers to the size of the Big Data, which is typically in Terabytes or
even larger. Furthermore, advances in technology are making Big Data bigger with each passing
day (Tien, 2013, p.130). Variety refers to the many forms or formats in which Big Data is
available. The main distinction here is between structured data, which is considered to be
organized, and unstructured data, which in many instances is considered to be messy or
unorganized (Deroos et al., 2012, p.17).
Finally, velocity refers to the way and the speed in which the data can be retrieved for
analysis, for example through the internet. According to these three criteria, Big Data is
considered to be diverse, large and constantly growing. However, a key thing to consider is
that Big Data will always vary in volume, velocity and variety, depending on where it is
retrieved. As mentioned previously, there are no formal rules that define what constitutes
Big Data; the three criteria only give a rough overview of what Big Data is like in any
instance, and are mainly used to differentiate it from traditional data (Deroos et al., 2012,
p.20).
The Various Ethical Concerns for the Large Data Sets
For many researchers and scholars, easy access to data is one of the main things that they
always wish for. However, the increase in the size, storage and accessibility of such data
comes with various ethical questions, mainly regarding privacy and consent. The main reason
for this is that as the amount of data on a certain individual grows, it becomes relatively
easy to ascertain the identity of the participant (Tien, 2013, p.032). According to Schadt
(2012), the growing amount of data available on the internet, left as traces in the digital
world, is likely to lead to privacy and ethical concerns in the near future. Information such
as DNA, GPS information, data from social networking sites and genomic databases poses very
high risks of identifying individuals. Due to this, the privacy of the participants in any
research that uses Big Data should be kept secure, to avert any identification of the
participants.
Privacy is not the only thing that poses ethical concerns in research that uses Big Data;
there is also the issue of consent. This is an especially serious matter where data is taken
from social networks that allow users to display a great deal of information on their
accounts (Oboler, Welsh, & Cruz, 2012, p.3). Furthermore, data can be freely available on
social networking sites such as Facebook and Twitter, which raises ethical questions about
what type of information is considered to be public. One of the most prevalent problems
concerning Big Data comes from the various private or governmental entities that can access
the data (Lesk, 2013, p.87).
Researchers therefore need to be aware of the various ethical concerns as they carry out
research using Big Data. Privacy and other ethical concerns can be mitigated by the use of
databases that collect only the data required for the research or study and that ensure
proper ethical guidelines are followed. As a result, having a data set in which steps are
taken to respect anonymity and consent will be crucial in ensuring that future research that
uses Big Data is not compromised or jeopardized.
Big Data and Psychology
As mentioned earlier, when it comes to social sciences, Big Data has not been explicitly
used, as compared to other fields such as physics and biology (Miller, 2012, par.3). Some
areas of the social sciences have utilized Big Data in their research, through organizational
and industrial uses (Davenport, Barth and Bean, 2012, p.45) and forensic applications
(Collins, 2013). Nevertheless, the use of Big Data in psychology and the social sciences
remains at a low level. In industrial applications, by contrast, Big Data is effectively
utilized to understand business environments, as well as customer demands and concerns and
consumer behaviour (Davenport, Barth, & Bean, 2012, p.48).
Big Data could be very useful in the corporate processes in which psychology plays a part.
Avenues such as marketing, decision-making, policy formulation and advertising can use Big
Data to ensure better operation and functioning of these processes. Corporations often have a
large amount of information at their fingertips. However, the main issue is that they are not
utilizing the information efficiently through proper analysis of the data (Koh, 2012, par.3).
As such, the lack of efficient technology for the analysis of Big Data could be one of the
main hindrances that prevent a company from making efficient use of the Big Data at its
disposal. Any type of data that cannot be utilized or analysed is of little use to the
company or the researcher. However, through Big Data analytics, companies, policymakers,
economists and psychologists are provided with the necessary tools to analyze the data and
extract information from it.
One of the main projects in which Big Data is used in the analysis of a community at the
general level is The Durkheim Project. The project is an ongoing program that utilizes Big
Data to come up with solutions to the increase in suicide rates among veterans (Patterns and
Prediction, 2014). By monitoring data from the social media networks and accounts of military
veterans who had served in the United States Army, the project uses Big Data to determine or
predict the susceptibility of a person to commit suicide. The main aim of the project is to
come up with interventions that reduce the risk of veterans committing suicide (Patterns and
Prediction, 2014, par.4).
Through the project, it is seen that Big Data can capture subjective well-being data of a
particular community in real time. Furthermore, such a program requires minimal effort from
the participants and, as such, could offer better insight into suicide rates. If the program
becomes increasingly successful, it could be used for the general population, showing that
researchers and policymakers can use Big Data for the analysis of the community and thus aid
in the implementation of policies and in decision-making processes.
A large number of individuals around the world have already incorporated social media and
network platforms in their daily lives. As long as researchers ensure that the data is
obtained in an ethical way, it can be very helpful in determining the well-being of a
community or of individuals at a certain geographical level (Oboler, Welsh, & Cruz, 2012,
p.14). Many projects in the social science field mostly incorporate data that is present in
the public domain; however, through the analysis of this information, new insights can be
gained, as predictions can be made from the existing data sets.
Through better analysis of the available sources of information, Big Data can be very useful
in a community or country, especially when it comes to determining the predictors of the
well-being of the people in the society. Many researchers are also analyzing data that has
already been gathered, as in The Durkheim Project. There are various projects that could
successfully use Big Data and the available methods and tools of analysis to better
understand the needs of society. Furthermore, by running analyses on Big Data, researchers
are given a chance to obtain diverse types of information, which helps them to look at
information in broader or more general terms.

The Psychological Well-being
The present study examines the application of Big Data to determine or predict the indicators
of well-being in a given community or society. The main aim of the study is to examine the
tools and big data technologies, their potential benefits and possible applications, and the
limitations that could be seen when using Big Data to determine the well-being of people in a
community. The issue of well-being in a community is very tricky, as it encompasses various
factors that may not be easily discerned or apparent. The well-being of people in a society
is a multifaceted concept, which covers aspects such as mental health, stress, happiness,
anxiety and self-esteem, among other variables (James, Bore, & Zito, 2012, p.430).
Ryff (1995) demonstrated that psychological well-being is a very complex issue, drawing on
various prominent theories while hypothesizing how concepts of psychology, such as
environmental mastery and autonomy, relate to the well-being of a person. Other researchers
have found positive correlations between age, education, extraversion and conscientiousness
on the one hand and psychological well-being on the other, and thus with the overall
well-being of a community or a society (Keyes, Shmotkin, & Ryff, 2002, p.1021). Other studies
have found positive correlations between factors such as the personality of a person and
their well-being (James, Bore, & Zito, 2012). Others have found a positive correlation
between the time spent in leisure activities and the well-being of the person (Trainor,
Delfabbro, Anderson, & Winefield, 2010, p.467). Others have found that income is sometimes
related to well-being (Morrison, Tay, & Diener, 2011, p.168). As such, it is seen that
psychological well-being can be affected by a variety of factors, which are not limited to
certain issues. It is therefore prudent to take a broader perspective when looking at the
variables in the life of an individual that may be used to determine the well-being of the
person. Furthermore, it is important to determine which variables are more significant among
the aspects that affect psychological well-being. This shows that an analysis of the
well-being of the people in a community, using Big Data, could be very useful in determining
the relationships that exist between the variables that affect the well-being of persons.
The Current Study
Real-world issues may appear simple, but the underlying factors that cause them usually have
deeper roots, and suitable analytical methods are needed if one desires to find them.
Schultes (2013) argued that variables that are assumed to be independent may not be so, and
that subtle correlations exist, even in large data sets. Moreover, the scholar suggests that
as the size of the data sets continues to increase, there is an even greater chance of
detecting the small correlations and the interconnections that exist in the data (Schultes,
2013, par.4). The tendency to move away from prominent hypotheses when using big data
analysis is also popular among biomedical researchers and in the biomedical literature.
Furthermore, Miller (2012), in a discussion of the uses of big data analytics in the medical
field, found that some researchers in the field conduct hypothesis-free research because they
are using big data analytics. This enables them to let the technology run and then view the
results at the end of the processing of the data. As a result, many researchers are beginning
to find information that could have been missed or ignored, and having discovered such facts,
they can delve deeper into the real issues without wasting much time (Miller, 2012, par.5).
Given the scope it offers researchers for finding and examining solutions, big data analytics
is proving to be one of the best methods available. As mentioned in the thesis, the current
study attempts to look into ways in which big data could be used to determine the predictors
of well-being among people in a society.
The study also looks at previous exploratory studies and aggregates the data used in the
study and determination of the well-being of Canadians, through the use of Big Data
analytics. In doing so, it can serve as a testament to the kind of basic research that can be
done with these tools and with big data, and to how social scientists and psychologists can
welcome the use of technology to better their research and gain better insights into issues
affecting the community. Due to various limitations, the study relies on data that was
previously collected concerning the well-being of individuals in the country. A good example
of a hypothetical measure of the well-being of people in a community could take the form of
questions about mental health and stress levels among certain groups of citizens.
The potential predictors of the well-being of individuals could take the form of personal
traits, substance abuse in the community and the amount of time spent in leisure, among other
factors that could not be easily accounted for in the small-scale research that used to be
done previously.
The primary point of the project is to focus on how big data analytics could be used in the
study of the psychological well-being of a community. As such, the paper tackles some of the
newest technologies that are used in the analysis of big data, such as the software used in
such analysis, and their potential benefits or limitations. Ward (2013) noted that there are
various problems with some of the traditional methods or programs for the analysis of big
data sets. The predictors of the well-being of the community are therefore examined, to
determine the type of information that could be extracted from such large data. The analysis
of big data depends largely on access to the data as well as the state the data is in at the
moment of retrieval. This reveals another focus of the research, which is to show the roles
that big data analytics could play in the field of psychology as well as the social sciences.
To determine the predictors of the well-being of the people in a community, the data were
collected, combined, organized and then analyzed, to see which predictors could be seen
affecting the well-being of the people in the community. Furthermore, attempts were made to
assess the usefulness of big data analysis programs, such as Hadoop, in this type of
research. The main hypothesis is that new insights could be found that were ignored or
unknown in previous research. Furthermore, attention is also given to the new big data
technologies, and to how they can be useful in determining the aspects that affect the
well-being of a community.
This aspect of the paper takes the form of an evaluation of the current technologies and of
how these technologies are being applied in big data analytics. Because of the dual nature of
the current project, two methods sections are included for a better explanation of the
methodologies utilized in the paper. The first methodology section deals with the exploration
of the technologies utilized in big data analytics. The other methodology section deals with
the predictors of the well-being of people in a community.

Methods
The Exploration of the various Big Data Technologies
Participants
No participants were required for the exploration of the big data analytics technologies.
Materials
The current study utilized a computer running Windows 7 as its operating system, with a
processing speed of about 3.50 GHz and a total RAM capacity of about 16 GB. About 763 GB of
additional space on the hard drive was required, mainly for the running of the various
virtual systems. Other materials utilized in the study included accessible software designed
for working with big data, a timing device such as a stopwatch, and the Statistics Canada
data files, which were obtained from a public library and accessed through the Equinox data
delivery system.
The Big Data technologies that were used for comparative purposes in the study included the
Cloudera distribution including Apache Hadoop 4.5.0; Hortonworks Sandbox 1.3, which runs on a
Linux-based operating system; and Microsoft’s Windows Azure, an online cloud computing
service that contains various technologies such as HDInsight, which is a Hadoop-based
service. The final technology that was looked at was IBM InfoSphere BigInsights QuickStart,
version 2.1, which also ran on the Linux operating system. All of the programs were free and
easy to access, except Microsoft’s Windows Azure, which required an academic pass for it to
be used in any research process. Furthermore, IBM SPSS 20 was used to determine the
capability of standard statistical software to handle big data.
Procedures
One of the main steps taken was to come up with a sample study that could be used to
facilitate the comparison between the big data analytics tools and the standard tools. The
sample study looked into the predictors of the psychological well-being of the community, by
studying the data provided by the Statistics Canada Canadian Community Health Survey (CCHS).
The data was treated as a large data set, owing to the fact that it ran from the year 2003 up
to the year 2012. As a result, six different data sets were utilized in the paper. They
included the CCHS 2004, the CCHS 2007-2008, the CCHS 2009-2010 and finally the CCHS for the
year 2011-2012.
The data sets were then amalgamated using the merge function of SPSS. It was observed that
about 141 variables were shared across the various iterations of the survey, and these were
amalgamated in the merged data set. The code names for the 141 variables are shown in table
1. The legend is also included in the table, and it corresponds to the Statistics Canada
documents (Statistics Canada, 2013). Furthermore, most of the variables had to be recoded for
proper analysis to be conducted, because some of them included non-answers, non-applicable
points and refusals among the responses from the participants. These were recoded as missing
values within the SPSS program. Furthermore, the yes and no responses were reversed, to
ensure that there was consistency in the data fed to the SPSS program. Appendix A shows the
full legend of the recoded variables in the study.
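The merging and recoding steps described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the study itself used the merge and recode functions of SPSS, and the variable names (STRESS, SMOKER, EXTRA_2004), the reserved codes 6 to 9 and the yes/no coding (1 = yes, 2 = no) are hypothetical and are not taken from the Statistics Canada documentation.

```python
# Hypothetical reserved codes for "not applicable", "don't know",
# "refusal" and "not stated"; the real CCHS codes vary by variable.
MISSING_CODES = {6, 7, 8, 9}


def shared_variables(cycles):
    """Return the variable names present in every survey cycle."""
    names = [set(rows[0]) for rows in cycles if rows]
    return sorted(set.intersection(*names)) if names else []


def merge_cycles(cycles):
    """Stack the cases of all cycles, keeping only the shared variables."""
    keep = shared_variables(cycles)
    return [{name: row[name] for name in keep} for rows in cycles for row in rows]


def recode(row, yes_no_vars):
    """Set reserved codes to None and recode yes/no items to 1 = yes, 0 = no."""
    out = {}
    for name, value in row.items():
        if value in MISSING_CODES:
            out[name] = None  # treated as a missing value
        elif name in yes_no_vars:
            out[name] = 1 if value == 1 else 0  # assumed input: 1 = yes, 2 = no
        else:
            out[name] = value
    return out


cycle_a = [{"STRESS": 3, "SMOKER": 2, "EXTRA_2004": 1}]
cycle_b = [{"STRESS": 9, "SMOKER": 1}]
merged = [recode(r, {"SMOKER"}) for r in merge_cycles([cycle_a, cycle_b])]
print(merged)
```

In this sketch the variable that appears in only one cycle is dropped, which mirrors the restriction of the merged file to the variables shared by all iterations of the survey.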
Figure 1
The methodology of the current study is listed in the Method section that is just after this
section. Attempts were made to perform statistical analyses on the data sets using the Big
Data technologies mentioned in the previous section, as well as through the use of SPSS.
Furthermore, SPSS was evaluated for its ability to handle big data sets, by timing a basic
multiple linear regression analysis as the size of the data increased exponentially. This was
achieved by taking a data file, merging its cases with themselves, saving the enlarged file
and then repeating the entire process with the saved file. Each repetition effectively
doubled the number of cases and the size of the data set.
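The doubling procedure can be illustrated with a minimal Python sketch. It is not the SPSS procedure used in the study; the function names and the five-case sample are hypothetical and only show how repeated self-merging makes the number of cases grow exponentially.

```python
def double_cases(cases):
    """Append a copy of the data set to itself, doubling the number of cases."""
    return cases + cases


def grow(cases, doublings):
    """Apply the doubling step repeatedly and record the case count each time."""
    counts = [len(cases)]
    for _ in range(doublings):
        cases = double_cases(cases)
        counts.append(len(cases))
    return counts


sample = [{"id": i} for i in range(5)]  # tiny stand-in for the survey file
print(grow(sample, 3))
```

After k doublings the file holds 2 to the power k times the original number of cases, which is why a small number of repetitions is enough to reach very large files.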
The other step that was taken was finding accessible big data analytics technology and
software, such as the programs or services that were based on Hadoop, which is one of the
industry standards when it comes to the analysis of big data. This would allow for the
analysis of the predictors of the well-being of the people in the society, as well as offer a
chance for a clear comparison with standard software such as SPSS. One of the main benefits
of Hadoop is that it breaks larger jobs down into smaller, manageable tasks. The program also
allocates the tasks and lets the various cores work in parallel. This makes it possible to
complete very large tasks in a shorter time, especially when compared to the traditional
programs for statistical analysis.
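The idea of breaking a large job into smaller tasks that are processed side by side and then combined can be sketched as follows. The sketch does not use Hadoop or any of its interfaces; it is a plain Python illustration of the split, map and reduce pattern, and the function names and the number of workers are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def split(values, n_chunks):
    """Cut a list into roughly equal chunks (the smaller tasks)."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def partial_sum(chunk):
    """Map step: each worker returns the sum and count of its own chunk."""
    return sum(chunk), len(chunk)


def parallel_mean(values, n_workers=4):
    """Reduce step: combine the partial results into one overall mean."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum, split(values, n_workers)))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


scores = list(range(1, 101))
print(parallel_mean(scores))
```

Threads are used here only to keep the example short and self-contained; a real cluster distributes the chunks over separate machines, which is where the reduction in run time comes from.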
These big data analytics programs were found through an online search, in which attention was
mainly turned to low-cost programs or to versions offered for evaluation by their developers.
During the research, several of the Hadoop programs were tested, using the big data obtained
from Statistics Canada. The programs tested included the Sandbox version 1.3, the IBM
BigInsights QuickStart 2.1 and Microsoft’s Windows Azure.
Method of the Study
The sample size for the study was N=681,578, taken from the Canadian Community Health Surveys
in the archival Statistics Canada data. Males made up about 45% of the sample, 310,711 in
total, while the rest were females. The age of the respondents ranged from 12 to over 80
years. Among the males, the average age was in the range of 40-44 years, while the average
age of the female respondents was in the range of 45-49 years. The participants were
contacted by Statistics Canada agents for a face-to-face interview, through visits to their
homes, mailed letters and phone calls requesting their participation.
Materials for the Study

In the current study, only statistical technologies and tools were utilized, for comparative
purposes.
The Procedure for the Study

The respondents were visited in their homes by interviewers who, upon agreement with the
participants, asked them questions and recorded the data on their laptops. After the
interview had ended, the participants were thanked for their participation and given a chance
to ask any question regarding the research. Furthermore, many of the participants did phone
interviews, which were recorded using a similar format: the participants were called and
interviewed, while the interviewer recorded the responses on a laptop or computer. The full
procedure for the collection of the data can be found in the Statistics Canada documents
(Statistics Canada, 2013).
The Results of the Study

Multiple linear regressions were run using the SPSS 20 software program, to determine some of
the possible predictors of the well-being of people in a community. Well-being was defined by
the variable GENDMHI. A stepwise regression was also run with PIN set at 0.05 and POUT at
0.10. Missing values were handled through pairwise deletion. From the analysis using the SPSS
software, some of the possible predictors of the well-being of the people in the community
were found, and the multiple regression was then run again on only the variables that were
found to be significant predictors in the previous multiple regressions conducted in the
study. The amount of variance accounted for by the best model is shown in table 2. The
regression equation for the best model of fit was found to be as follows;
Figure 2
Table 2 The R and R square values for the best model fit
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
19      .480s   .230       .230                .812
Table 1
The legend for each of the variables included in the best model can be found in Appendix B.
The results lead to the conclusion that some of the variables were very significant in the
prediction of well-being in the society.
The correlation tables, as well as the descriptive statistics for the best model, can be
found in Appendix C. Table 3 shows the Anova table for the best model.
TABLE 3: The Anova Table for the Best Model
Model            Sum of Squares   df      Mean Square   F         Sig.
19  Regression   7614.881         17      447.937       679.733   .000
    Residual     25462.523        38639   .659
    Total        33077.044        38639
Table 2
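The quantities in the Anova table are linked by simple arithmetic, which the short Python sketch below reproduces from the values reported in Table 3. The row labels (regression, residual, total) are an assumption based on the usual SPSS layout, and the variable names are chosen for this illustration only.

```python
ss_regression = 7614.881   # sum of squares, first row of Table 3
ss_residual = 25462.523    # sum of squares, second row
ss_total = 33077.044       # sum of squares, third row
df_regression = 17
df_residual = 38639

# Each mean square is a sum of squares divided by its degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F is the ratio of the two mean squares; R squared is the share of the
# total sum of squares that the regression accounts for.
f_statistic = ms_regression / ms_residual
r_squared = ss_regression / ss_total

print(round(ms_residual, 3), round(f_statistic, 1), round(r_squared, 3))
```

Under these assumptions the computed F ratio and R squared agree with the reported F of 679.733 and the R square of .230 in Table 2 to within rounding.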
Furthermore, the homogeneity of the variances of the criterion variable was also studied for
specific values of the predictors. This was done through the standardized residuals, to
determine whether the variances of the residuals differ and in what way they differ. Cook’s
distances were also used to determine the influence of outliers on the regression line.
Table 4 below shows the residual statistics.
Table 4 The residual statistics
                                    Minimum   Maximum   Mean     Std. Deviation   N
Predicted Value                     .95       4.29      2.71     .451             30390
Std. Predicted Value                -4.636    2.870     .678     1.016            30390
Standard Error of Predicted Value   .009      .058      .020     .005             30390
Adjusted Predicted Value            .95       4.29      2.71     .451             30368
Residuals                           -4.088    2.945     .114     .906             30368
Std. Residual                       -5.038    3.629     .140     1.117            30368
Stud. Residuals                     -5.040    3.629     .141     1.117            30368
Deleted Residuals                   -4.088    2.949     .114     .907             30368
Stud. Deleted Residuals             -5.042    3.632     .141     1.118            30368
Mahal. Distance                     4.072     194.635   24.424   14.844           30390
Cook’s Distance                     .000      .002      .000     .000             30368
Centered Leverage Value             .000      .005      .001     .000             3090
Table 3
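To make the Cook’s distance figures in the residual statistics easier to interpret, the following Python sketch computes the statistic for a small regression with one predictor. The data are invented for the illustration and the function name is hypothetical; SPSS computes the same quantity for the full multiple regression. For each case, the distance combines the squared residual with the leverage of the case.

```python
def cooks_distances(x, y):
    """Cook's distance for each case of a one-predictor least-squares fit."""
    n = len(x)
    p = 2  # estimated parameters: intercept and slope
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x

    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - p)
    # Leverage of each case in simple linear regression.
    leverage = [1 / n + (xi - mean_x) ** 2 / sxx for xi in x]

    return [
        (e ** 2 / (p * mse)) * (h / (1 - h) ** 2)
        for e, h in zip(residuals, leverage)
    ]


x = [1, 2, 3, 4, 5, 6]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 12.0]  # the last case is a deliberate outlier
distances = cooks_distances(x, y)
most_influential = max(range(len(x)), key=distances.__getitem__)
print(most_influential)
```

In the invented data the outlying last case receives the largest distance. A commonly cited rule of thumb treats values near or above 1 as influential, so a reported maximum of .002 would suggest that no single case dominated the fitted regression line.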
From the multiple regression analyses that were timed on SPSS 20, a distinct trend was
observed: the larger data sets took considerably longer to process. One of the data files
used in the study was about 0.21 GB in size, and the software took about 20 seconds to report
the output. The number of cases was then doubled up to 8 times, and the time SPSS 20 took to
produce each output was recorded; a file of 53.75 GB took slightly over an hour to be
analyzed (3852 seconds). The results from the sizing tests are shown in the figure below.

Figure 3
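The arithmetic behind the sizing tests can be checked with a few lines of Python. Only the two end points reported in the text are used (0.21 GB in about 20 seconds and 53.75 GB in 3852 seconds); the intermediate sizes are computed from the doubling rule and are not measured values.

```python
base_size_gb = 0.21      # size of the starting file reported in the text
base_seconds = 20        # reported run time for the starting file
final_size_gb = 53.75    # reported size after eight doublings
final_seconds = 3852     # reported run time for the largest file

# Eight doublings multiply the starting size by 2 ** 8 = 256.
sizes = [base_size_gb * 2 ** k for k in range(9)]
size_ratio = final_size_gb / base_size_gb
time_ratio = final_seconds / base_seconds

print(round(sizes[-1], 2), round(size_ratio), round(time_ratio, 1))
```

On these two data points the file grew by a factor of about 256 while the run time grew by a factor of about 193, which suggests that the processing time increased roughly in proportion to the size of the file rather than faster.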
Results
The exploration of the various technologies used in Big Data brought several issues to light.
None of the big data programs that were tested was able to deliver the level of analysis that
was required in the research.
It was noted that the Hortonworks Sandbox, version 1.3, had a relatively simple setup, which
did not require complicated hardware. Furthermore, the lessons and tutorials that were built
into the program were very useful in aiding its use. One of the main drawbacks of the
software is that it is mainly a learning tool and, as such, is not capable of undertaking any
statistical analysis. After checking several tutorials and running various commands, it was
determined that the program would not be useful for the study.
The other program that was tested was the Cloudera Distribution, which included Apache Hadoop
version 4.5.0. However, the one that was used in the study was a free version, which did not
have the full functionality of the full program. Various components of the software were
deactivated, resulting in error messages. The program also had some configuration issues,
which meant that its application in the study was limited. Various efforts were made to fix
the problems in the program, such as installing it afresh, but it still did not work as
expected.
On the other hand, the IBM InfoSphere BigInsights QuickStart 2.1 was the best option for
the analysis of Big Data, especially considering its usability and functionality. The program
allowed the data to be easily displayed and also appeared able to undertake some statistical
analysis. However, running the program placed a significant burden on the hardware of the
computer, which resulted in slow responses and difficulties in running commands. Activating
the different statistical analyses was hypothetically possible, but was not readily feasible.
Furthermore, on several occasions the program was seen to be dropping some cases from the
data sets, which was a significant concern. A data file was uploaded and some basic commands
were run in Pig, an application within Hadoop that issues commands and interacts with the
data. This enabled the researchers to run various commands, but the issues with speed and the
difficulties with statistical analysis made the program unsuitable for the current research.
The last Big Data technology that was utilized and analyzed was Microsoft's Windows Azure.
It was the only cloud-based Hadoop system that was used. However, despite being cloud-based,
and thus bypassing the various hardware requirements, the program also had its problems.
Azure is a collection of various services and products, and Big Data analytics is one of its
core parts. The website was not user-friendly and was more like a status window that showed
only the services that had been implemented and the types of files that were stored, and
allowed the user to check or create Hadoop clusters. Any functionality such as uploading
data, viewing the data, interacting with the data and analyzing it was detached from the
website.
Furthermore, the commands had to be run through a separate, downloadable command line
interface, which operated on the local machine. Unfortunately, the interface offered very
little help, especially to a user who wanted to learn how to interact with the data, which
made the program appear cruder than the other Big Data analytics programs. Moreover, the only
way in which the data could be statistically analyzed was by downloading the file, opening it
locally and analyzing it in Excel. Such a procedure defeated the purpose of cloud computing.
The data file was therefore uploaded to the website, and various attempts were made to
connect it with Excel so that the statistical analysis could be performed. However, these
attempts proved unsuccessful because of the many errors that appeared. Due to the lack of
cloud computation for statistical analysis, the crude interface and the various issues
present, the software failed to meet the requirements of the present study. Time constraints
prevented the exploration of other Big Data technologies that are currently in use.
The Research and Findings
While some of the strongest predictors of psychological well-being were introduced in the
literature review, using a large data set without preconceived opinions or hypotheses
resulted in the unveiling of new insights, which might otherwise have been missed or deemed
insignificant. For example, Gestel, Jansen and Theunissen (2011) found that binge drinking
was a predictor of a poor sense of well-being among Dutch teens aged 12-15, but not among
older teens aged 16-18. In the current study, binge drinking, which was coded as ALC_3, was
found to have a very small effect size (r = -0.07, p < .001). This suggests that the work by
Gestel, Jansen and Theunissen (2011) points to a small but real trend that may hold for a
broader group than the one originally studied.
A more complex analysis of the large data set of the Canadian Community Health Survey
could reveal whether such correlations hold for certain ages. If a data set is not large
enough, however, some of this information could be missed. Similarly, Cook and Benton (1993)
found a positive correlation between the intake of fruits and vegetables and mental health,
but only among females. The current study, however, found a general effect (r = 0.098,
p < .001). Further research on the two correlations within the data set could therefore
reveal whether the relationship changes when gender is factored into the analysis. Looking at
such relationships through a more extensive data set will allow different findings to be seen
and will open up areas of discovery that could be of further benefit to researchers.
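The follow-up analysis suggested here, checking whether a correlation changes once gender is factored in, amounts to computing the same correlation separately for each group. The sketch below shows the idea in Python under stated assumptions: the records are invented toy values and not CCHS data, the group labels are hypothetical, and pairing FVCDTOT with GENDMHI is an assumption made only for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_by_group(records, group_key, x_key, y_key):
    """Compute the x-y correlation separately within each group."""
    groups = {}
    for record in records:
        groups.setdefault(record[group_key], []).append(record)
    return {
        group: pearson_r([r[x_key] for r in rows], [r[y_key] for r in rows])
        for group, rows in groups.items()
    }


# Invented toy records (NOT survey data): variable names follow Appendix B,
# the values and the "female"/"male" labels are assumptions for illustration.
toy_records = [
    {"DHHSEX": "female", "FVCDTOT": 1, "GENDMHI": 1},
    {"DHHSEX": "female", "FVCDTOT": 2, "GENDMHI": 2},
    {"DHHSEX": "female", "FVCDTOT": 3, "GENDMHI": 3},
    {"DHHSEX": "female", "FVCDTOT": 4, "GENDMHI": 4},
    {"DHHSEX": "male", "FVCDTOT": 1, "GENDMHI": 2},
    {"DHHSEX": "male", "FVCDTOT": 2, "GENDMHI": 3},
    {"DHHSEX": "male", "FVCDTOT": 3, "GENDMHI": 3},
    {"DHHSEX": "male", "FVCDTOT": 4, "GENDMHI": 2},
]

by_gender = correlation_by_group(toy_records, "DHHSEX", "FVCDTOT", "GENDMHI")
print(by_gender)
```

In the toy data the relationship is present in one group and absent in the other, which is the kind of pattern that a single pooled correlation would hide.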
In the current study, some predictors of psychological well-being were found that had
previously been missed. The unique findings were the size of the household (r = 0.081,
p < .001), heart disease (r = -0.071, p < .001) and high blood pressure (r = -0.074,
p < .001). That these variables had not been studied before can be attributed to their small
effects, which would not show in a smaller data set, or to many other reasons. This shows one
of the benefits of working with larger data sets, as it allows the data to speak for itself.
However, it also raises questions about the way psychologists and social scientists determine
what makes a relationship significant.
When the sample is large enough, even small effects become statistically significant, and
this calls for care in determining whether a relationship is meaningful or not. With more
researchers moving to large data sets, a conversation will need to take place about what
really constitutes a real effect, since with a very large data set nearly every variable
could conceivably reach significance. However, it should be noted that even small effects can
matter at the scale of a population, such as the fact that drug interactions that affect only
a small part of the population could be lethal (Miller, 2012, par.4). Such issues may become
more pressing in the years to come, necessitating more research on the subject.
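The point that a large enough sample makes even small effects significant can be illustrated with a short calculation. The sketch below is written in Python, which was not used in the study, and relies on the standard Fisher z approximation for testing a correlation against zero. The sample sizes are illustrative assumptions and not the study's N; only r = -0.07 is taken from the text.

```python
import math


def fisher_z_p_value(r, n):
    """Two-sided p-value for H0: rho = 0, using the Fisher z approximation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    # For a standard normal Z, P(|Z| > z) equals erfc(z / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def variance_explained(r):
    """Share of variance accounted for by a correlation of size r."""
    return r ** 2


R_BINGE = -0.07  # correlation reported in the text for binge drinking

# Illustrative sample sizes (assumptions, not the study's N).
for n in (100, 1_000, 30_000):
    print(n, fisher_z_p_value(R_BINGE, n))

print("variance explained:", variance_explained(R_BINGE))
```

The same correlation moves from non-significant to highly significant as n grows, while the share of variance it explains stays fixed at under half a percent. This is why effect size, and not the p-value alone, needs to be part of the discussion when data sets are very large.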
It should be noted that even though psychologists and social scientists rarely use Big
Data, it may have many benefits. The technology has the potential to benefit social
scientists by expanding the population that can be examined. This can lead to knowledge of
the solutions that are needed for a given issue, based on the analysis of the data.
Furthermore, some of the current limits on research by social scientists and psychologists
are slowly being removed as the technology advances. It is therefore likely that many of the
problems currently in existence will be solved as the software spreads to broader areas and
the technology advances. The main issues holding back research with Big Data, as identified
by the current study, fall into four general areas.
The first issue relates to access and cost. The current project encountered various
problems when trying to access and use Big Data technologies, as only free or low-cost
programs were chosen and tested, and these programs had little functionality to aid the
study. Furthermore, it was discovered that one of the main ways of accessing the software was
to contact the developing companies and ask for a copy of the program. This was problematic,
as some of the companies did not consider it prudent to give the software to independent
researchers, or declined because it was a relatively small project. However, over the course
of the study the situation improved, as more avenues were discovered, such as academic passes
and application programs that were in the process of being developed, which allowed easier
access to Big Data technologies. Such programs required no additional costs, which would have
been very difficult for a lone researcher to bear.
The second issue was that most Big Data programs have no capability for conducting
statistical analysis; none of the programs tested in the study had such capabilities. This is
one of the main limitations of such programs, as the data must be exported to other programs
and, depending on the size or nature of the data set, this runs into the limits of current
statistical programs.
The third issue is that the programs require considerable knowledge to be used
effectively. The majority require at least a basic or moderate understanding of programming,
of running virtual machines and of navigating operating systems that are not familiar or
typical. All of the programs tested required commands to be written in a variety of
programming languages, such as Java, R, HiveQL and Pig Latin. Furthermore, some of the
programs needed to be run as a Linux virtual machine, which may pose many challenges to the
inexperienced user, especially when it comes to navigation.
Finally, another issue with the utilization of Big Data technologies was the hardware
needed to run the programs. Many of the programs require more powerful computers than those
that people can easily access. For example, the IBM InfoSphere BigInsights QuickStart,
version 2.1, required a computer with a minimum of 8 GB of RAM allocated to the virtual
machines in order to run. Despite this allocation, the program was observed to be slow, as
more RAM was needed. These and many other issues are some of the main reasons that Big Data
is not yet adequately used by researchers in the study of the well-being of a community.

Conclusion and Findings
From the above research, it is seen that the use of Big Data provides researchers of the
subjective well-being of a community with access to a second, large sample of data. The size
of the data depends primarily on the data source. As a result, Big Data may be particularly
useful in the study of rare events that may have an impact on the subjective well-being of a
person or community. These include rarely occurring events such as natural and human-made
disasters. Furthermore, Big Data could be used in the study of the subjective well-being of
people in a community, a region or even a country.
However, it is important to note that Big Data has some limitations, and researchers need
to be aware of them. Big Data should be used more to find the rare events that have a
significant impact on the well-being of a person. It should also be put on record that even
though Big Data may not replace the traditional self-report measures of subjective well-being
in the near future, it can serve as a very important additional source of data.

References
Choi, H., & Varian, H. (2009). Predicting initial claims for unemployment insurance using
Google Trends (Technical report). Google. Retrieved from
http://research.google.com/archive/papers/initialclaimsUS.pdf
Choudhary, M. A., Levine, P., McAdam, P., & Welz, P. (2012). The happiness puzzle: Analytical
aspects of the Easterlin paradox. Oxford Economic Papers, 64(1), 27-42.
doi:10.1093/oep/gpr006
Cook, R., & Benton, D. (1993). The relationship between diet and mental health. Personality and
Individual Differences, 14(3), 397-403. doi:10.1016/0191-8869(93)90308-P
Davenport, T. H., & Barth, P. (2012). How big data is different. MIT Sloan Management
Review, 54(1), 43-46.
Deaton, A., & Stone, A. A. (2013). Economic analysis of subjective well-being: Two happiness
puzzles.
Deroos, D., Deutsch, T., Eaton, C., Lapis, G., & Zikopoulos, P. (2012). Understanding big data:
analytics for enterprise class Hadoop and streaming data. New York: McGraw-Hill.
Ettredge, M., Gerdes, J., & Karuga, G. (2005). Using web-based search data to predict
macroeconomic statistics. Communications of the ACM, 48(11), 87-92. Retrieved from
http://portal.acm.org/citation.cfm?id=1096010
Fleurbaey, M. (2009). Beyond GDP: The quest for a measure of social welfare. Journal of
Economic Literature, 47(4), 1029-1075. doi:10.1257/jel.47.4.1029. Retrieved from
http://www.aeaweb.org/articles.php?doi=10.1257/jel.47.4.1029
Frey, B. S., & Stutzer, A. (2002). What can economists learn from happiness research? Journal
of Economic Literature, 40(2), 402-435. doi:10.1257/002205102320161320. Retrieved from
http://www.aeaweb.org/articles.php?doi=10.1257/002205102320161320
Gestel, A., Jansen, M., & Theunissen, M. (2011). Are mental health and binge drinking
associated in Dutch adolescents? Cross-sectional public health study. BMC Research
Notes, 4(1), 100-100. doi:10.1186/1756-0500-4-100
James, C., Bore, M., & Zito, S. (2012). Emotional intelligence and personality as predictors of
psychological well-being. Journal of Psychoeducational Assessment, 30(4), 425-438.
doi:10.1177/0734282912449448
Kahneman, D., & Deaton, A. (2010). High income improves evaluation of life but not emotional
well-being. Proceedings of the National Academy of Sciences, 107, 16489-16493.
Keyes, C. L. M., Shmotkin, D., & Ryff, C. D. (2002). Optimizing well-being: The empirical
encounter of two traditions. Journal of Personality and Social Psychology, 82(6),
1007-1022. doi:10.1037/0022-3514.82.6.1007
Koh, J. (2013, Jun 18). Understanding consumers with help from big data. The Business Times.
Retrieved from
https://www.lib.uwo.ca/cgibin/ezpauthn.cgi/docview/1368686527?accountid=15115
Lazer, D., Pentland, A., Adamic, L., Aral, S., Barabasi, A. L., Brewer, D., Christakis, N.,
Contractor, N., Fowler, J., Gutmann, M., Jebara, T., King, G., Macy, M., Roy, D., &
Van Alstyne, M. (2009). Computational social science. Science, 323(5915), 721-723.
Lesk, M. (2013). Big data, big brother, big money. IEEE Security & Privacy, 11(4), 85-89.
doi:10.1109/MSP.2013.81
Miller, K. (2012, January 2). Big data analytics in biomedical research. Biomedical Computation
Review. Retrieved from
http://biomedicalcomputationreview.org/content/big-data-analytics-biomedical-research
Morrison, M., Tay, L., & Diener, E. (2011). Subjective well-being and national satisfaction:
Findings from a worldwide survey. Psychological Science, 22(2), 166-171.
doi:10.1177/0956797610396224
Oboler, A., Welsh, K., & Cruz, L. (2012). The danger of big data: Social media as computational
social science. First Monday, 17(7). doi:10.5210/fm.v17i7.3993
Robeyns, I. (2005). The capability approach: A theoretical survey. Journal of Human
Development, 6(1), 93-117.
Robeyns, I. (2006). The capability approach in practice. Journal of Political Philosophy,
14(3), 351-376. doi:10.1111/j.1467-9760.2006.00263.x
Ryff, C. D. (1995). Psychological well-being in adult life. Current Directions in Psychological
Science, 4(4), 99-104. doi:10.1111/1467-8721.ep10772395
Schultes, E. (2013, June). Big data: The myth of independent variables. The 32nd Annual
Conference of the Society for Scientific Exploration. Lecture conducted from Dearborn,
MI.
Sen, A. (2008). Capability and well-being. In D. Hausman (Ed.), The philosophy of economics:
An anthology (pp. 270-294). New York: Cambridge University Press.
Statistics Canada (2013). Canadian Community Health Survey (CCHS) annual component: User
guide 2012 and 2011-2012 microdata files. Retrieved from:
http://equinox2.uwo.ca/docfiles/cchs/2012/cchs-escc2012_2011-2012gid-eng.pdf.
Stiglitz, J., Sen, A., & Fitoussi, J. P. (2009). Report by the Commission on the Measurement
of Economic Performance and Social Progress (Technical report). INSEE. Retrieved from
http://www.insee.fr/fr/publications-et-services/dossiers_web/stiglitz/doc-commission/RAPPORT_anglais.pdf
Tien, J. M. (2013). Big data: Unleashing information. Journal of Systems Science and Systems
Engineering, 22(2), 127-151. doi:10.1007/s11518-013-5219-4
Trainor, S., Delfabbro, P., Anderson, S., & Winefield, A. (2012). Leisure activities and
adolescent psychological well-being. Journal of Adolescence, 35(2), 467-467.
doi:10.1016/j.adolescence.2012.02.005
Ward, B. W. (2013). What’s Better—R, SAS®, SPSS®, or stata®? Thoughts for instructors of
statistics and research methods courses. Journal of Applied Social Science, 7(1), 115-120.
doi:10.1177/193672441
Appendix
Appendix A
RECODING
Recode: 1 = 2, 2 = 1, 6 7 8 9 = 999
Variables: ADM_PRX, ALC_1, CCC_071, CCC_101, CCC_121, CCC_131, CCC_141, CCC_171,
PAC_1A – PAC_1Z, PACFLEI, SDCFIMM, SMK_10, SMK_05D, SMK_06A, SMK_09A, SMK_10A
Recode: 96 97 98 99 = 999
Variables: CCCG102, DHHGLVG, SACDTOT, SMKDSTY
Recode: 99.9 = 999
Variables: PACDEE
Recode: 96 = 0, 97 98 99 = 999
Variables: ALC_2, SMK_05C
Recode: 96 = 1, 97 98 99 = 999
Variables: ALC_3, INCGPER
Recode: 996 = 0
Variables: PAC_2A – Z, SMK_204, SMK_05B
Recode: 6 7 8 9 = 999
Variables: ADM_N09, ADM_N10, DHH_OWN, DHHGHSZ, DHHGMS (also, recoded values into not
married [1] and married [2]), EDUDH04, EDUDR04, FVCGTOT, GEN_01, GEN_02B, GEN_07,
GEN_10 (also, values reversed for consistency [ex., 1 = Very Weak]), GENDHDI, GENDMHI,
HWTGISW, INCG2, INCGHH, PACDFR, PACDPAI, PACFD, SDCGRES, SMK_202, SMK_01A
Recode: 6 = 1, 7 8 9 = 999
Variables: ADM_N11
Recode: 6 = 0, 7 8 9 = 999
Variables: PAC_3A – Z
Recode: 1 = 2, 2 = 1, 6 = 1, 7 8 9 = 999
Variables: SDC_8
Recode: 990 – 1000 = SYSMIS
Variables: All variables
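The recoding rules in Appendix A follow a simple pattern: each rule maps certain response codes to new values, non-response codes are set to 999, and a final step turns everything from 990 to 1000 into system-missing. The study performed this recoding in SPSS. The sketch below only illustrates the same logic in Python, with hypothetical helper names and with None standing in for SYSMIS; just two of the rules are shown.

```python
MISSING = 999

# Rule "1 = 2, 2 = 1, 6 7 8 9 = 999" (applies to CCC_071, among others).
SWAP_AND_FLAG = {1: 2, 2: 1, 6: MISSING, 7: MISSING, 8: MISSING, 9: MISSING}

# Rule "96 = 0, 97 98 99 = 999" (applies to ALC_2 and SMK_05C).
NOT_APPLICABLE_TO_ZERO = {96: 0, 97: MISSING, 98: MISSING, 99: MISSING}

# Which rule applies to which variable (only two variables shown).
RULES = {"CCC_071": SWAP_AND_FLAG, "ALC_2": NOT_APPLICABLE_TO_ZERO}


def recode(value, mapping):
    """Apply one recoding rule, then the final 990-1000 -> missing step."""
    recoded = mapping.get(value, value)
    if 990 <= recoded <= 1000:
        return None  # stands in for SPSS system-missing (SYSMIS)
    return recoded


def recode_record(record, rules):
    """Recode every variable in a record; variables without a rule still
    go through the final 990-1000 step, which applies to all variables."""
    return {
        name: recode(value, rules.get(name, {}))
        for name, value in record.items()
    }


# Hypothetical respondent record, for illustration only.
example = {"CCC_071": 2, "ALC_2": 98, "DHHSEX": 1}
print(recode_record(example, RULES))
```

Keeping each rule as a separate mapping mirrors the layout of the appendix, so a rule can be checked against its line in the table without reading the rest of the code.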
Appendix B
ALC_3 (Binge Drinking): Frequency of having 5 or more drinks in the past year.
CCC_071 (High Blood Pressure): Yes or no response for the presence of the medical
condition.
CCC_121 (Heart Disease): Yes or no response for the presence of the medical condition.
CCC_131 (Cancer): Yes or no response for the presence of the medical condition.
CCCG102 (Onset of Diabetes): Indicates age participant was when diagnosed with
diabetes.
DHHGAGE (Age): Indicates the age bracket participant falls within.
DHHGHSZ (Household Size): Indicates how many people live with the participant.
DHHGMS (Marital Status): Indicates if the participant is married or single.
DHHSEX (Gender): Indicates if participant is male or female.
EDUDR04 (Education Completed): Indicates highest level of education completed.
FVCDTOT (Fruit & Vegetable Intake): Frequency of eating fruits and vegetables daily.
GEN_07 (Life Stress): Indicates perceptions of stress present in participant’s life.
GEN_10 (Community Belonging): Indicates participant’s sense of belonging to a
community.
GENDHDI (Self-Report Physical Health): General physical health from poor to
excellent.
GENDMHI (Self-Report Mental Health): General mental health from poor to excellent.
HWTGBMI (Body Mass Index): Self-reported BMI score.

INCGPER (Personal Income): Reported personal income from all sources.
PACDEE (Energy Expenditure): Energy spent on physical leisure activities daily.
Appendix C
Figure 4
Figure 5
