
Big Data with Wellbeing 1

Big Data with Wellbeing

By (Name)

Course

Tutor

School

City and State of School

Date

Abstract
The current study sought to analyze how the emerging field of Big Data analytics could be used to study the well-being of a community. The paper looks at various indicators of psychological well-being using data sets acquired from Statistics Canada, which were used mainly to test the capability of the statistical software SPSS on an extensive data set with a population size of N = 681,578. The results indicate that relationships, stress, diet, income, health, education and alcohol abuse could be used as predictors of the well-being of a community. The software was found to be capable of analyzing data of this magnitude. However, the analysis of actual Big Data, such as data sets measured in terabytes or petabytes, may be limited by various performance issues. The paper also discusses some of the traditional methods previously used to measure the well-being of individuals, as well as their limitations. It delves into the subject of the study in order to support the discussion with actual figures. The findings may also inform other concepts related to this research.



Table of Contents
Abstract
Introduction
Significance of study
Literature Review
The Use of GDP to measure well-being and issues as well as limitations
The Capability Approach
Capabilities for the Measurement of Well-being of Individuals
The Capability Approach, the Nussbaum Version
The Various Limitations of the Capability Approach
Use of Big Data in the Measurement of Well-being of Individuals and Communities
The Various Ethical Concerns for the Large Data Sets
Big Data and Psychology
The Psychological Well-being
The Current Study
Methods
The Exploration of the various Big Data Technologies
Participants
Materials
Procedures
Method of the Study
Materials for the Study
The Procedure for the Study
The Results of the Study
Table 2: The R and R square values for the best model fit
Table 3: The Anova Table for the Best Model
Results
The Research and Findings
Conclusion and Findings
References

Table of figures

Figure 1
Figure 2
Figure 3
Figure 4
Figure 5

Table of tables

Table 1
Table 2
Table 3

Introduction
In the social sciences, there is growing interest in using methods that go beyond the conventional income-based approach to measure the level of development and well-being of people or communities (Stiglitz et al., 2009, p.7). GDP does not capture non-market interactions such as friendship, moral values, happiness or a sense of purpose in life. For this reason, a growing number of scholars are turning to subjective, self-reported measures of individual well-being, which include variables such as life satisfaction. It should also be noted that most economists use quality of life and comfort as a direct measure of utility.

Various political leaders have supported this move by calling for surveys that measure subjective well-being to be included in their policies and agendas. However, despite these advances, the measurement of subjective well-being has raised multiple concerns among economists, especially regarding interpretation and analysis (Kahneman and Deaton, 2010, p.490). In response to these concerns, alternative methods such as Time Use Surveys and the Day Reconstruction Method have been developed to address the various limitations.

This paper contributes to this new research agenda by showing how Big Data could be utilized in the study and understanding of people's well-being. With the increasing digitization of social life, variables such as feelings, social relations and attitudes can now be traced, because they are deeply embedded in social networking sites such as Facebook, Twitter and Instagram. The magnitude of these traces should be of key concern to social scientists. However, although the capacity to collect and analyze massive amounts of data is high in fields such as physics and biology, progress in the social sciences has been relatively slow (Lazer et al., 1999, p.722).

One of the main advantages of Big Data is that it gives social scientists a view of people's actual behaviour. Through searches on Google, for example, researchers can make inferences about people's feelings and attitudes from what they do, rather than from their own statements about those feelings and attitudes. Another advantage is that social scientists no longer have to rely on answers to predefined questions and can instead listen to what people actually have to say. This revealed-preference approach unveils a reflexive picture of the whole society, because it allows the ranking of citizens' main concerns to emerge spontaneously, and it can be used as complementary data to help explain the main effects of GDP. Furthermore, the data are based on the actual behaviour of people as they search for information on the internet, which is then taken to elicit their primary concerns.

Another advantage of using Big Data to study subjective well-being is that the data are not constrained to a particular time. They offer a real-time, immediate source of information to policymakers, who are often confronted with data scarcity and short-term horizons when making decisions for the community. The information is also available at the local level, as long as there is internet access and internet use is sufficient to provide statistically meaningful data for interpretation and analysis. Finally, Big Data is often free of charge, which makes access less costly.

Despite these advantages, the use of Big Data has various limitations. One of them is that the sheer volume of data can constrain statistical interpretation, because the data are susceptible to noise, which makes analysis difficult. In this paper, the issues that can arise from the use of Google search data will be identified and solutions proposed. One of the main issues discussed is the construction of categories that reflect the different dimensions of life, for example through Bayesian techniques, in order to arrive at the most important determinants of an individual's subjective well-being. The methodology allows for the construction of a model with four critical qualities: it should be testable, it should be grounded in existing theory and literature, it should be transparent, and it should be usable to determine the well-being of the community continuously over time.

The paper builds on the existing literature on the exploitation of search engine data. One of the first uses of search engine data was to predict and forecast the unemployment rate in the United States (Ettredge et al., 2005, p.88). Data from internet search engines have also been used to forecast various macroeconomic indicators, such as automobile demand, vacation destinations, unemployment rates and consumer sentiment (Choi and Varian, 2009, p.56). This shows that Big Data has enormous potential to determine or predict the well-being of people in a community.

Significance of study

The subjective well-being of a community, for example how happy or satisfied people are with their lives, can be one of the best indicators of its social progress. Through the use of Big Data and subjective well-being measures, the impact of policies and regulations on the well-being of people in a community can also be measured. However, data on subjective well-being can be complicated to obtain, especially when methods such as surveys are used, and there may be uncertainty regarding the factors that drive people to give specific responses to subjective questions. In this study, internet search volumes will be used to build a model that could accurately predict the subjective well-being of communities or people living in the United States. The research found that searches relating to financial security, employment, leisure and family life were among the main predictors of subjective well-being. Furthermore, the model could be used to produce data at a much higher frequency and accuracy than surveys.

Literature Review

For many decades, the measurement of people's well-being has been a matter of key interest among psychologists, policymakers and social scientists. Personal well-being is mainly considered a topic of the psychological sciences, because it concerns the subjective well-being and feelings of individuals. Social well-being, on the other hand, is a collective dimension of the community, and it is considered a very significant indicator of the level of development of a place and of how balanced and sustainable its socio-economic system is.

This section discusses the methodologies used to measure the well-being of individuals in its social dimensions, along with their purpose and limitations. It also discusses the use of internet searches and social media as sources of Big Data. The literature review is structured as follows:

• The traditional measurement of well-being through the use of GDP

• The capability approach

• The use of Big Data in the measurement of the well-being of a community



The Use of GDP to measure well-being and issues as well as limitations

For measuring the socio-economic progress of individuals or society, GDP has long been considered one of the best methods, even though it has many shortcomings as an indicator of the well-being of a community. One of the main reasons for its success is that it connects goods and services with monetary valuations (Stiglitz et al., 2009, p.8). GDP also has the advantages of linearity, clarity and objectivity, as can be seen in its role in public debates and its usefulness for international comparisons.

GDP is an aggregate measure of production in a country or community: it covers the production of final goods and services that are supplied to units other than their producers in a given period. Although GDP correlates strongly with other aspects such as the standard of living, the correlation is not universal (Stiglitz et al., 2009, p.9). Furthermore, differences in income between individuals explain only a small proportion of the differences in happiness among people in a community (Frey and Stutzer, 2002, p.403). For this reason, many social scientists, scholars and psychologists criticize GDP as a deficient indicator of the well-being of a community, one that yields misleading information for the formation of policies and decisions (Fleurbaey, 2009, p.1029). Economists, philosophers and psychologists, among others concerned with determining the well-being of individuals in a community, are therefore increasingly interested in self-reported measures of welfare and in their contribution to decision-making and policy formation (Deaton and Stone, 2013, p.2).

More issues have been raised about the use of GDP as a measure of human development and community well-being. One of the main limits on its accuracy as a measure of welfare is the lack of a clear one-to-one correlation between growth and quality of life. Too much emphasis on GDP as a benchmark of well-being could therefore result in misguided policies or poor decisions. GDP is a measure of total market production and is consequently more useful for measuring the supply side of the economy than the living standards of citizens. A good example is that since the end of World War II the GDP of many countries has increased rapidly, whereas self-reported subjective well-being has neither risen nor fallen (Frey and Stutzer, 2002, p.404). Beyond a certain level of income, there is only a minimal relationship between happiness and income (Choudhary et al., 2012, p.28). Thus, even though income is beneficial in giving people some freedom and well-being, it can only be used as a proxy for what people value (Robeyns, 2005, p.98).

Many other issues have been raised about the use of GDP to measure the well-being of people in a community. First, the financial analysis of market transactions is usually taken as the starting point for measuring economic performance and the prices of goods. However, such data cannot be taken as a correct representation of real value in society. Only the goods and services actually exchanged in the market are counted in GDP, which ignores various factors that affect people's well-being. These include public goods provided by the government, such as security, democracy and freedom.

Volunteering and social relations are also ignored in the calculation of GDP, yet they are crucial to the well-being of individuals in a community. Another limitation is that GDP registers increases in the prices of primary commodities, while the accompanying decrease in individual wealth is overlooked. Furthermore, the measurement of aggregate production does not take into account the degradation of the environment, the depletion of the assets used in production, or pollution.

Using GDP to measure the well-being of individuals can result in misleading observations about the wealth of a society. Adverse events such as natural disasters reduce the wealth of individuals, but they can lead to an increase in the GDP of the country or community. As seen above, using GDP to measure well-being can produce misleading information, which can in turn result in poor decisions about the policies that affect citizens. One of the most influential alternative approaches to the measurement of human well-being is the capability approach, which was developed in the 1980s and 1990s by the renowned economist Amartya Sen.

The Capability Approach

The approach was designed by the Nobel Prize winner Amartya Sen and has roots in the works of Aristotle, Smith and Marx. It is primarily a broad normative framework for the evaluation of societal arrangements and of the well-being of people in a community. As such, it provides a framework for carrying out welfare comparisons. One of the main advantages of the capability approach is that it covers nearly all dimensions of people's well-being in a given setting: it targets development, justice and well-being, and it pays particular attention to the links between material wealth, social well-being and psychological issues.

One of the main features of this approach is the distinction between the ends and the means of human action. This distinction introduces two concepts: functionings and capabilities. The former represents a person's achievements, that is, the things a person manages to do or be in living their life. Functionings are also more directly related to the living conditions a person is exposed to and, as a result, are more easily measurable.

Capabilities are the alternative combinations of functionings that a person can choose from in order to lead their life. A person's capability is thus a set of vectors of functionings, reflecting the person's freedom to choose which kind of life to lead.

The capability approach has had a longstanding impact in various parts of the world. In 1990, the UNDP adopted the approach as the basis for the Human Development Index (HDI) and the annual Human Development Report (Robeyns, 2005, p.100). The theoretical background of the HDI lies in measuring levels of human development. Four proxies are used in this measurement: the first is life expectancy, the second is the average number of years of schooling, the third is the expected length of schooling, and the fourth is national income per capita.
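To make the role of the four proxies concrete, the following is a minimal sketch of how they could be combined into a single index. It is not taken from this paper: the goalpost values, the logarithm applied to income and the geometric-mean aggregation are assumptions based on the methodology UNDP has published in its technical notes for later Human Development Reports, and the function and variable names are hypothetical.

```python
import math

# Goalposts (minimum, maximum) for each proxy.
# ASSUMPTION: values recalled from UNDP technical notes, not from this paper.
GOALPOSTS = {
    "life_expectancy": (20.0, 85.0),            # years
    "mean_years_schooling": (0.0, 15.0),        # years
    "expected_years_schooling": (0.0, 18.0),    # years
    "income_per_capita": (100.0, 75000.0),      # PPP dollars
}


def dimension_index(value, lo, hi):
    """Rescale a value to the range 0..1 between its goalposts."""
    clamped = min(max(value, lo), hi)
    return (clamped - lo) / (hi - lo)


def hdi(life_expectancy, mean_years_schooling,
        expected_years_schooling, income_per_capita):
    """Hypothetical helper: combine the four proxies into one index."""
    health = dimension_index(life_expectancy, *GOALPOSTS["life_expectancy"])

    # Education is the average of the two schooling indices.
    mys = dimension_index(mean_years_schooling,
                          *GOALPOSTS["mean_years_schooling"])
    eys = dimension_index(expected_years_schooling,
                          *GOALPOSTS["expected_years_schooling"])
    education = (mys + eys) / 2.0

    # Income enters in logarithms, so extra income counts for less
    # at higher income levels. Income must be positive.
    lo, hi = GOALPOSTS["income_per_capita"]
    income = dimension_index(math.log(income_per_capita),
                             math.log(lo), math.log(hi))

    # Geometric mean of the three dimension indices.
    return (health * education * income) ** (1.0 / 3.0)
```

Because the aggregation is a geometric mean, a very low score in one dimension cannot be fully offset by high scores in the others, which fits the capability approach's emphasis on several dimensions of life rather than income alone.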

Capabilities for the Measurement of Well-being of Individuals

Functionings and capabilities represent the main informational base of this approach. The approach differs considerably from other theoretical frameworks aimed at determining or predicting the well-being of a community or of individuals. Key examples are frameworks based on personal utility, such as desire fulfilment, happiness and pleasure, or on opulence, whereas the capability approach centres on the level of freedom of the person or community. From the capability perspective, the individual and the social level are interdependent, regardless of whether the causal relations between them are more or less difficult to establish. The capability perspective is thus based on a mixture of doings and beings, in which the quality of life is measured by people's capability to achieve valuable functionings. However, material goods should also be considered in the evaluation of well-being, because focusing only on the subjective perspective could introduce bias: it may fail to depict a person's deprivations and lack of material goods.

As the name suggests, the approach relies on a person's capabilities as the main source of information. Focusing on capabilities when evaluating an individual's well-being does not lose any information that is vital to that evaluation. The approach considers not only a person's well-being achievements but also their well-being freedom, that is, the freedom to choose what life to lead. In this it contrasts with consumer theory, where only the best of the options available to a person is considered. Under the capability approach, the freedom to enjoy various doings and beings may itself have important value for the individual's well-being (Sen, 2008, p.280).

In doing so, the capability approach does not favour any particular notion of the good life. It is therefore easy to see its influence on luck egalitarianism. The approach assumes that each person should have the same capabilities or real opportunities, but that each individual should be held responsible for the particular choices they make (Robeyns, 2006, p.356). It should be noted, however, that valuable functionings can differ greatly depending on the places and conditions in which a person lives, such as developing or developed areas. They may include basic needs such as food, nourishment and health, as well as more complex matters such as self-esteem and self-respect.

The Capability Approach, the Nussbaum Version

Martha Nussbaum is credited with the creation of this version of the capability approach. According to Nussbaum, freedom of choice requires the formal defence of many fundamental liberties as well as the assurance of certain levels of material conditions and circumstances. However, for empowerment and human development to be effective, various aspects need to be considered that allow people to transform chances or possibilities into means, outcomes and ends (Nussbaum, 2000).

From this point, it is apparent that Nussbaum and Sen emphasize different aspects of capabilities. In Sen's work, the concept mainly refers to real or effective opportunities, while Nussbaum's work pays more attention to people's skills and personality traits (Robeyns, 2005, p.97). Nussbaum's interpretation of capabilities as human attitudes recalls the degree of conversion, introduced by Sen in 2003 to indicate the extent to which people are able to transform commodities into functionings and capabilities, especially those identified as social and personal. Nussbaum elaborates on three different conversion factors that are considered in her perspective. They are as follows:

1. Environmental: these concern the geographical location, the natural characteristics and the climate.

2. Social: these include the institutions and values on which the society of a country or community is built.

3. Personal: these are the features of a person that influence the conversion of commodities into functionings.

The Various Limitations of the Capability Approach

The capability approach has been implemented in many contexts and fields, mainly as a general framework supported by other theories. According to Robeyns (2006, p.358), there are mainly eight types of applications that make use of the capability approach, listed below:

1. The general assessment of the human development in a given country

2. The poverty and well-being assessment in the development economies around the world

3. Empirical and theoretical analysis of the various policies in the countries

4. Identifying the poor in the developing nations

5. Criticizing the different social norms

6. Analyzing the deprivations that occur to the people who are disabled

7. Gender inequalities assessment

8. In exploratory and descriptive research



As seen, although the approach has many uses and attracts much attention from scholars and researchers, its applicability continues to elicit mixed reactions (Robeyns, 2006, p.370).

The possibility of using the approach empirically also continues to be hampered by doubts, as it is seen as very complicated (Sen, 2008, p.279). One of the main questions surrounding its application concerns the selection of capabilities: which capabilities and functionings matter most to people? The practical application of the approach therefore continues to face procedural and operational problems. This is mainly attributed to the indefinite borders around the notion of capabilities, which make it hard for users to distinguish between capabilities and functionings.

Use of Big Data in the Measurement of Well-being of Individuals and Communities

Advances in technology are opening new avenues for the study of the well-being of individuals in
a community or a country. Big Data and the associated data analytic tools could therefore be used
in the measurement of the well-being of a community or of individuals in a country. There is no
standard definition of Big Data. In most cases, however, the term refers to data sets that,
because of their size, cannot easily be managed by standard data analysis and management tools or
software (Tien, 2013, p.128). Ward (2013) explained that some of the statistical programs used by
researchers, such as Stata and SPSS, cannot process large amounts of data in a timely manner,
which can affect the researcher or the study itself, for example through time delays.

It is mainly because of such problems that newer technologies such as Hadoop have been developed.
They are seen as the answer to the limitations of the traditional data management and analysis
tools (Deroos et al., 2012, p.15). Having tools that can analyse Big Data could open up new
opportunities in the social sciences and in psychology, which could further help researchers to
undertake future projects (Ovadia, 2013, p.133).

However, before using Big Data, one should understand what it really is. Big Data differs from
conventional data mainly in volume, velocity and variety (Deroos et al., 2012, p.13). Volume
refers to the size of the data, which is typically measured in terabytes or even larger units.
Furthermore, advances in technology are making Big Data larger with each passing day (Tien, 2013,
p.130). Variety refers to the many forms or formats in which Big Data is available. The main
distinction here is between structured data, which is considered to be organized, and
unstructured data, which in many instances is considered to be messy or unorganized (Deroos et
al., 2012, p.17).

Finally, velocity refers to the speed at which the data can be retrieved for analysis, for
example through the internet. According to these three criteria, Big Data is considered to be
diverse, large and constantly growing. A key point, however, is that Big Data will always vary in
volume, velocity and variety depending on where it is retrieved. As mentioned previously, there
are no formal rules that define what constitutes Big Data. The three criteria give only a rough
overview of what Big Data is like in any instance, and they are mainly used to differentiate it
from traditional data (Deroos et al., 2012, p.20).

The Various Ethical Concerns for the Large Data Sets

For many researchers and scholars, easy access to data is one of the main things they wish for.
However, the increase in the size, storage and accessibility of such data comes with ethical
questions, mainly regarding privacy and consent. The main reason is that as the amount of data on
a given individual grows, it becomes relatively easy to ascertain the identity of the participant
(Tien, 2013, p.032). According to Schadt (2012), the growing amount of data available on the
internet as traces in the digital world is likely to lead to privacy and ethical concerns in the
near future. Information such as DNA, GPS information, data from social networking sites and
genomic databases poses very high risks of identifying individuals. The privacy of the
participants in any research involving Big Data should therefore be kept secure to prevent their
identification.

Privacy is not the only ethical concern in research that uses Big Data. There is also the issue
of consent. This is especially serious where data is obtained from social networks that allow
users to display multiple pieces of information on their accounts (Oboler, Welsh, & Cruz, 2012,
p.3). Furthermore, data can be freely available on social networking sites such as Facebook and
Twitter, which raises ethical questions about what type of information is considered public. One
of the most prevalent problems with Big Data concerns the private or governmental entities that
can access the data (Lesk, 2013, p.87).

Researchers therefore need to be aware of these ethical concerns as they conduct research using
Big Data. Privacy and other ethical concerns can be mitigated by using databases that collect
only the data required for the study and that ensure proper ethical guidelines are followed.
Having a data set for which steps have been taken to respect anonymity and consent will be
crucial in ensuring that future research using Big Data is not compromised or jeopardized.

Big Data and Psychology

As mentioned earlier, Big Data has not been used as extensively in the social sciences as in
other fields such as physics and biology (Miller, 2012, par.3). Some areas of the social sciences
have nevertheless used Big Data in their research, for example in organizational and industrial
applications (Davenport, Barth, & Bean, 2012, p.45) and in forensic applications (Collins, 2013).
Even so, the use of Big Data in psychology and the social sciences remains at a low level. In
industrial applications, by contrast, Big Data is used effectively to understand business
environments as well as customer demands, customer concerns and consumer behaviour (Davenport,
Barth, & Bean, 2012, p.48).

Big Data could be very useful in corporate processes in which psychology plays a part. Areas such
as marketing, advertising, decision-making and policy formulation can use Big Data to operate and
function better. Corporations always have a large amount of information at their fingertips. The
main issue, however, is that they are not using the information efficiently through proper
analysis of the data (Koh, 2012, par.3).

The lack of efficient technology for the analysis of Big Data could therefore be one of the main
hindrances that prevent a company from making efficient use of the information at its disposal.
Data that cannot be used or analysed is of little use to the company or the researcher. With Big
Data tools, however, companies, policymakers, economists and psychologists are provided with the
means to analyze the data and extract information from it.

One of the main projects in which Big Data is used to analyze a community at a general level is
The Durkheim Project. It is an ongoing program that uses Big Data to develop solutions to the
increase in suicide rates among veterans (Patterns and Prediction, 2014). By monitoring data from
the social media networks and accounts of military veterans who served in the United States Army,
the project uses Big Data to determine or predict a person's susceptibility to suicide. The main
aim of the project is to develop interventions for veterans who are at risk of suicide (Patterns
and Prediction, 2014, par.4).

The project shows that Big Data can reveal subjective well-being data for a particular community
in real time. Furthermore, such a program requires minimal effort from the participants and could
therefore offer better insight into suicide rates. If the program becomes increasingly
successful, it could be used for the general population, showing that researchers and
policymakers can use Big Data to analyze a community and thus aid the implementation of policies
and decision-making processes.

The majority of individuals in the world have already incorporated social media and network
platforms into their daily lives. As long as researchers ensure that the data is obtained
ethically, it can be very helpful in determining the well-being of a community or of individuals
at a certain geographical level (Oboler, Welsh, & Cruz, 2012, p.14). Many projects in the social
sciences mostly use data that is already in the public domain. Analysis of that information
nevertheless yields new insights, as predictions can be made from the existing data sets.

Through better analysis of the available sources of information, Big Data can be very useful in a
community or country, especially in determining the predictors of the well-being of people in the
society. Many researchers, as in The Durkheim Project, are also analyzing data that has already
been gathered. Various projects could successfully use Big Data and the available analysis
methods and tools to better understand the needs of society. Furthermore, by running analyses on
Big Data, researchers have a chance of obtaining diverse information, which helps them to look at
an issue in broader or more general terms.


The Psychological Well-being

The present study examines the application of Big Data to determine or predict the indicators of
well-being in a given community or society. Its main aim is to study the tools, the big data
technologies, their potential benefits, their possible applications and the limitations that
could arise when using Big Data to determine the well-being of people in a community. The
well-being of a community is a difficult issue because it encompasses factors that may not be
easily discerned or apparent. Well-being is a multifaceted concept that covers aspects such as
mental health, stress, happiness, anxiety and self-esteem, among other variables (James, Bore, &
Zito, 2012, p.430).

Ryff (1995) demonstrated that psychological well-being is a complex issue that draws on several
prominent theories, and hypothesized how psychological concepts such as environmental mastery and
autonomy relate to the well-being of a person. Other researchers have found a positive
correlation between age, education, extraversion and conscientiousness on the one hand and
psychological well-being on the other (Keyes, Shmotkin, & Ryff, 2002, p.1021). Other studies have
found positive correlations between the personality of a person and their well-being (James,
Bore, & Zito, 2012) and between the time spent in leisure activities and well-being (Trainor,
Delfabbro, Anderson, & Winefield, 2010, p.467). Income has also been found to be related to
well-being in some cases (Morrison, Tay, & Diener, 2011, p.168). Psychological well-being can
thus be affected by a wide variety of factors. It is therefore prudent to take a broad
perspective on the variables in an individual's life that may be used to determine well-being,
and to establish which of those variables are the most significant. This suggests that analysing
the well-being of people in a community using Big Data could be very useful in determining the
relationships among the variables that affect well-being.

The Current Study

Real-world issues may appear simple, but the factors that cause them usually have deeper roots,
so suitable analytical methods are needed to uncover them. Schultes (2013) argued that variables
assumed to be independent may not be so, and that subtle correlations exist even in large data
sets. Moreover, the scholar suggests that as data sets grow, there is an even greater chance of
detecting the small correlations and interconnections that exist in the data (Schultes, 2013,
par.4). The tendency to move away from a prominent hypothesis when using big data analysis is
also found to be popular among biomedical researchers and in the biomedical literature.

Furthermore, Miller (2012), discussing the uses of big data analytics in the medical field, found
that some researchers have begun to conduct hypothesis-free research because they are using big
data analytics. This enables them to let the technology run and then view the results once the
data has been processed. As a result, many researchers are finding information that could
otherwise have been missed or ignored, and they can then delve deeper into the real issues
without wasting much time (Miller, 2012, par.5). Because it widens the scope within which
researchers can look for and examine solutions, big data analytics is proving to be one of the
best methods available. As stated in the thesis, the current study attempts to look into ways in
which big data could be used to determine the predictors of well-being among people in a society.

The study also looks at previous exploratory studies and aggregates existing data to study and
determine the well-being of Canadians through Big Data analytics. This can serve as a testament
to the kind of basic research that can be done with these tools and with big data, and to how
social scientists and psychologists can welcome the use of technology to improve their research
and gain better insight into issues affecting the community. Because of various limitations, the
study relies on previously collected data concerning the well-being of individuals in the
country. A good example of a hypothetical measure of the well-being of people in a community
could take the form of questions about mental health and stress levels among certain groups of
citizens.

The potential predictors of individual well-being could take the form of personal traits,
substance abuse in the community and the amount of time spent in leisure, among other factors
that could not easily be accounted for in the small-scale research that was done previously.

The primary aim of the project is to focus on how big data analytics could be used in the study
of the psychological well-being of a community. The paper therefore covers some of the newest
technologies used in the analysis of big data, such as the software used and its potential
benefits and limitations. Ward (2013) noted that there are various problems with some of the
traditional methods or programs when they are used to analyse big data sets. The predictors of
the well-being of the community are therefore examined to determine the type of information that
can be extracted from such large data. The analysis of big data depends largely on access to the
data and on the state the data is in at the moment of retrieval. This reveals another focus of
the research, which is to show the roles that big data analytics could play in psychology and the
social sciences.

To determine the predictors of the well-being of people in a community, the data were collected,
combined, organized and then analyzed to see which predictors affect well-being. Attempts were
also made to assess the usefulness of big data analysis programs such as Hadoop for this type of
research. The main hypothesis is that new insights could be found that were ignored or unknown in
previous research. Furthermore, the paper aims to raise awareness of the new big data
technologies and of how they can be useful in determining the aspects that affect the well-being
of a community.

The awareness aspect of the paper takes the form of an evaluation of the current technologies and
of how they are applied in big data analytics. Because of the dual nature of the current project,
there are two methods sections to better explain the methodologies used. The first deals with the
exploration of the technologies used in big data analytics. The second deals with the predictors
of the well-being of people in a community.


Methods

The Exploration of the various Big Data Technologies

Participants
No participants were required for the exploration of the technologies used in big data analytics.

Materials
The study used a computer running Windows 7, with a processor speed of about 3.50 GHz and about
16 GB of RAM. About 763 GB of additional hard drive space was required, mainly for running the
virtual systems. Other materials included accessible software designed for working with big data,
a timing device such as a stopwatch, and the Statistics Canada data files, which were obtained
from a public library through the Equinox data delivery system.

The Big Data technologies used for comparative purposes in the study included Hadoop 4.5.0 (the
Cloudera Distribution including Apache Hadoop) and the Hortonworks Sandbox 1.3, which run on a
Linux-based operating system, and Microsoft's Windows Azure, an online cloud computing service
that includes technologies such as HDInsight, a Hadoop-based service. The final technology
examined was IBM InfoSphere BigInsights QuickStart, version 2.1, which also ran on the Linux
operating system. All of the programs were free and easy to access except Windows Azure, which
required an academic pass before it could be downloaded or used in any research process.
Furthermore, IBM SPSS 20 was used to determine the capability of that software to handle big
data.

Procedures
One of the main steps was to devise a sample study that could be used to compare the big data
analytics tools with the standard tools. The study examined the predictors of the psychological
well-being of the community using data from the Statistics Canada Canadian Community Health
Survey (CCHS). The data were treated as big data because they covered the years 2003 to 2012 and
therefore formed a large data set. Six different data sets were used in the paper. They included
the CCHS 2004, the CCHS 2007-2008, the CCHS 2009-2010 and the CCHS 2011-2012.

The data sets were then combined using the merge function in SPSS. About 141 variables were
shared across the iterations of the survey, and these were retained in the merged data set. The
code names of the 141 variables are shown in Table 1. The legend included in the table follows
the Statistics Canada documents (Statistics Canada, 2013). Most of the variables had to be
recoded for proper analysis, because some of them included non-answers, not-applicable codes and
refusals from participants. These were recoded as missing values within SPSS. Furthermore, the
yes and no responses were reverse-coded to ensure consistency in the data fed to SPSS. Appendix A
shows the full legend of the recoded variables in the study.
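
The merging and recoding steps described above can be sketched in a few lines of Python. This is
a minimal illustration and not the SPSS procedure used in the study. The non-response labels, the
variable name SMOKES, the cycle-specific variables and the direction of the yes/no coding are all
assumptions made for the example. Only GENDMHI is a variable name that appears in the study.

```python
# Hypothetical non-response labels; the real CCHS codes differ by variable.
NON_RESPONSE = {"NOT APPLICABLE", "DON'T KNOW", "REFUSAL", "NOT STATED"}


def shared_variables(cycles):
    """Return the variable names that appear in every survey cycle."""
    names = [set(rows[0]) for rows in cycles if rows]
    return sorted(set.intersection(*names)) if names else []


def recode(value):
    """Map non-responses to None (missing) and yes/no answers to 1/0.

    Coding YES as 1 and NO as 0 is an assumption for this sketch.
    """
    if value in NON_RESPONSE:
        return None
    if value == "YES":
        return 1
    if value == "NO":
        return 0
    return value


def merge_cycles(cycles):
    """Stack all cycles, keeping only the shared variables, recoded."""
    keep = shared_variables(cycles)
    merged = []
    for rows in cycles:
        for row in rows:
            merged.append({name: recode(row[name]) for name in keep})
    return merged


# Two tiny made-up survey cycles, each with one cycle-specific variable.
cycle_a = [{"GENDMHI": 3, "SMOKES": "YES", "ONLY_IN_A": 1}]
cycle_b = [{"GENDMHI": 2, "SMOKES": "REFUSAL", "ONLY_IN_B": 7}]
merged = merge_cycles([cycle_a, cycle_b])
```

Variables that appear in only one cycle are dropped, which mirrors the restriction of the merged
file to the variables shared across the survey iterations.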

Figure 1

The methodology of the study on predictors is described in the Method of the Study section that
follows this one. Statistical analyses were attempted on the data sets using the Big Data
technologies mentioned in the previous section as well as SPSS.

Furthermore, the ability of SPSS to handle big data sets was evaluated by timing a basic multiple
linear regression as the size of the data increased exponentially. This was achieved by taking a
data file, merging its cases with themselves, saving the enlarged file and then repeating the
entire process with the saved file. Each repetition effectively doubled the number of cases and
the size of the data set.
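
The doubling procedure can be illustrated with a short Python sketch. It is only an analogy for
the SPSS steps. The stand-in analysis (a simple mean), the starting size of 1,000 cases and the
function names are assumptions for the example, whereas the study timed a multiple linear
regression in SPSS 20.

```python
import time


def double_cases(rows):
    """Append a copy of the data set to itself, doubling the case count."""
    return rows + rows


def time_analysis(analysis, rows):
    """Return the wall-clock seconds taken to run `analysis` on `rows`."""
    start = time.perf_counter()
    analysis(rows)
    return time.perf_counter() - start


def mean_of_first_column(rows):
    """Stand-in analysis; the study used a regression instead."""
    return sum(row[0] for row in rows) / len(rows)


rows = [(float(i),) for i in range(1000)]
case_counts = []
timings = []
for _ in range(4):
    case_counts.append(len(rows))
    timings.append(time_analysis(mean_of_first_column, rows))
    rows = double_cases(rows)  # the next round runs on twice as many cases
```

Each pass records the case count and the elapsed time before doubling, so the two lists can be
plotted against each other in the same way as the sizing tests.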

The next step was to find accessible big data analytics technologies and software, such as
programs or services based on Hadoop, which is one of the industry standards for the analysis of
big data. These would allow the predictors of the well-being of people in the society to be
analysed and would offer a clear comparison with standard software such as SPSS. One of the main
benefits of Hadoop is that it breaks larger jobs into smaller, manageable tasks. It also
allocates those tasks and allows multiple cores to work in parallel. In this way, very large
tasks can be completed in a shorter time than with traditional programs for statistical analysis.
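
The idea of breaking a large job into smaller tasks and combining the partial results can be
sketched in plain Python. This is not Hadoop code. It only imitates the split, map and reduce
structure on a single machine using threads, and the function names and the sample data are
assumptions for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def split(values, n_chunks):
    """Cut a non-empty list into at most n_chunks roughly equal pieces."""
    size = -(-len(values) // n_chunks)  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def map_chunk(chunk):
    """Small task: return the partial sum and count for one chunk."""
    return sum(chunk), len(chunk)


def reduce_partials(partials):
    """Combine the partial sums and counts into one overall mean."""
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


def parallel_mean(values, n_chunks=4):
    """Compute a mean by mapping over chunks and reducing the results."""
    chunks = split(values, n_chunks)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce_partials(partials)


scores = list(range(1, 101))  # made-up scores from 1 to 100
overall_mean = parallel_mean(scores)
```

In a real Hadoop cluster the chunks would be stored and processed on different machines. Here
the threads only show how the work is divided and recombined.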

These big data analytics programs were found mainly through an online search that focused on
low-cost programs or on versions offered for evaluation by their developers. During the research,
several Hadoop-based programs were tested using the data obtained from Statistics Canada. The
programs tested included the Hortonworks Sandbox version 1.3, IBM InfoSphere BigInsights
QuickStart 2.1 and Microsoft's Windows Azure.

Method of the Study

The sample size for the study was N=681,578, taken from the Canadian Community Health Surveys in
the archival Statistics Canada data. Males made up about 45% of the sample (310,711 in total),
while the rest were females. The age of the respondents ranged from 12 years to over 80 years.
The average age of the male respondents fell in the 40-44 range, while that of the female
respondents fell in the 45-49 range. Statistics Canada agents contacted the participants through
home visits, mailed letters and phone calls to request their participation in a face-to-face
interview.

Materials for the Study


In the current study, only statistical technologies and tools were used, for comparative
purposes.

The Procedure for the Study


Interviewers visited the respondents in their homes and, with the participants' agreement, asked
them questions and recorded the answers on their laptops. After the interview, the participants
were thanked for their participation and given a chance to ask any questions about the research.
Many of the participants were instead interviewed by phone, and their responses were recorded on
a laptop or computer in the same format. The full data collection procedure can be found in the
Statistics Canada documents (Statistics Canada, 2013).

The Results of the Study


Multiple linear regressions were run in SPSS 20 to determine possible predictors of the
well-being of people in a community, which was defined by the variable GENDMHI. A stepwise
regression was run with PIN set to 0.05 and POUT set to 0.10, with pairwise deletion of missing
values. The analysis identified several possible predictors of well-being, and the multiple
regression was then run again using only the variables that had been found to be significant
predictors in the previous regressions. The amount of variance explained by the best model is
shown in Table 2. The regression equation for the best-fitting model was found to be as follows:

Figure 2

Table 2: The R and R Square values for the best model fit

Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
19      .480   .230       .230                .812

The legend for each variable included in the best model can be found in Appendix B. The results
indicate that some of the variables were highly significant in predicting well-being in the
society.

The correlation tables and the descriptive statistics for the best model can be found in
Appendix C. Table 3 shows the ANOVA table for the best model.

Table 3: The ANOVA Table for the Best Model

Model            Sum of Squares   Degrees of Freedom (df)   Mean Square   F         Sig.
19  Regression   7614.881         17                        447.937       679.733   .000
    Residual     25462.523        38639                     .659
    Total        33077.044        38639
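
The figures in Tables 2 and 3 are linked by standard regression identities, which the short
Python check below reproduces. It uses the values exactly as reported in the tables. The variable
names are chosen for the example, and the identities themselves (mean square = sum of squares /
df, F = regression mean square / residual mean square, R Square = regression sum of squares /
total sum of squares) are the usual textbook definitions rather than anything specific to this
study.

```python
import math

# Values as reported in Table 3 for model 19.
ss_regression = 7614.881
ss_residual = 25462.523
ss_total = 33077.044
df_regression = 17
df_residual = 38639

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F is the ratio of the two mean squares.
f_statistic = ms_regression / ms_residual

# R Square is the share of the total variation explained by the model.
r_squared = ss_regression / ss_total
r = math.sqrt(r_squared)

print(f"Mean square (regression): {ms_regression:.3f}")
print(f"Mean square (residual):   {ms_residual:.3f}")
print(f"F: {f_statistic:.3f}")
print(f"R Square: {r_squared:.3f}, R: {r:.3f}")
```

Rounded to three decimals, the computed R and R Square agree with Table 2, and the residual mean
square and F agree closely with Table 3. The best model therefore explains roughly 23% of the
variance in the well-being variable.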

Furthermore, the homogeneity of variance of the criterion variable was examined across the values
of the predictors. This was done using the standardized residuals, to determine whether and how
the variances of the residuals differ. Cook's distances were also used to determine the influence
of outliers on the regression line. Table 4 below shows the residual statistics.

Table 4: Residual statistics

Statistic                             Minimum   Maximum   Mean     Std. Deviation   N
Predicted Value                       .95       4.29      2.71     .451             30390
Std. Predicted Value                  -4.636    2.870     .678     1.016            30390
Standard Error of Predicted Value     .009      .058      .020     .005             30390
Adjusted Predicted Value              .95       4.29      2.71     .451             30368
Residual                              -4.088    2.945     .114     .906             30368
Std. Residual                         -5.038    3.629     .140     1.117            30368
Stud. Residual                        -5.040    3.629     .141     1.117            30368
Deleted Residual                      -4.088    2.949     .114     .907             30368
Stud. Deleted Residual                -5.042    3.632     .141     1.118            30368
Mahal. Distance                       4.072     194.635   24.424   14.844           30390
Cook's Distance                       .000      .002      .000     .000             30368
Centered Leverage Value               .000      .005      .001     .000             3090
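To make the role of Cook's distance concrete, the following is a minimal sketch for the simplest case of a regression with a single predictor. It is not the procedure SPSS ran on the 17-predictor model in this study; the function name and the toy data are invented for illustration. For each case, the squared residual is scaled by the mean squared error and weighted by the leverage of the case.

```python
def cooks_distances(x, y):
    """Cook's distance for every case in a one-predictor least-squares fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    # Least-squares line.
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

    # Two estimated parameters: intercept and slope.
    p = 2
    mse = sum(e ** 2 for e in residuals) / (n - p)

    # Leverage of each case in a one-predictor regression.
    leverages = [1 / n + (xi - mean_x) ** 2 / sxx for xi in x]

    return [
        (e ** 2 / (p * mse)) * (h / (1 - h) ** 2)
        for e, h in zip(residuals, leverages)
    ]


# Invented toy data: the last case is a deliberate outlier.
toy_x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_y = [1.1, 2.0, 2.9, 4.2, 5.1, 5.9, 7.0, 8.1, 9.0, 20.0]

distances = cooks_distances(toy_x, toy_y)
most_influential = distances.index(max(distances))
print(f"Most influential case: index {most_influential}")
```

In the toy data the outlying last case receives the largest distance. In Table 4, by contrast, the maximum Cook's distance is only .002, which suggests that no single case had a strong influence on the regression line.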

From the multiple analyses that were timed in SPSS 20, it was observed that larger data sets take noticeably longer to process. The data file used in the study was about 0.21 GB in size, and the software took about 20 seconds to report the output. The number of cases was then doubled repeatedly, up to eight times, and the time SPSS 20 took to produce the output was recorded at each step. A file of 53.75 GB took slightly over an hour to analyze (3852 seconds). The results of the sizing tests are shown in the figure below.



Figure 3
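The figures reported for the sizing tests can be checked with a short calculation. The sketch below is an illustration under stated assumptions: it uses only the two data points given in the text (the original file and the largest file), and the function names are hypothetical.

```python
# Figures reported for the timing tests in SPSS 20.
BASE_SIZE_GB = 0.21
BASE_SECONDS = 20
LARGEST_SIZE_GB = 53.75
LARGEST_SECONDS = 3852
DOUBLINGS = 8


def size_after_doublings(base_size_gb, doublings):
    """File size after the number of cases has been doubled `doublings` times."""
    return base_size_gb * 2 ** doublings


def throughput_gb_per_second(size_gb, seconds):
    """Average amount of data processed per second."""
    return size_gb / seconds


expected_largest_gb = size_after_doublings(BASE_SIZE_GB, DOUBLINGS)
base_rate = throughput_gb_per_second(BASE_SIZE_GB, BASE_SECONDS)
largest_rate = throughput_gb_per_second(LARGEST_SIZE_GB, LARGEST_SECONDS)

print(f"Size after {DOUBLINGS} doublings: {expected_largest_gb:.2f} GB")
print(f"Largest run: {LARGEST_SECONDS / 60:.1f} minutes")
print(f"Throughput, original file: {base_rate:.4f} GB/s")
print(f"Throughput, largest file:  {largest_rate:.4f} GB/s")
```

Eight doublings of a 0.21 GB file give about 53.76 GB, in line with the 53.75 GB reported, and 3852 seconds is a little over 64 minutes. On these two data points alone, the average throughput did not fall as the file grew, although two points are too few to characterize the scaling behaviour.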

Results
An exploration of the technologies used in Big Data revealed various issues over the course of the study. None of the Big Data analysis programs that were tested could provide the level of analysis required by the research.

It was noted that the Hortonworks Sandbox, version 1.3, had a relatively simple setup that did not require complicated hardware. The lessons and tutorials built into the program were also very useful in learning to use it. The main drawback of the software is that it is primarily a learning tool and, as such, is not capable of undertaking any statistical analysis. After working through several tutorials and running various commands, it was determined that the program would not be useful for the study.

The other program tested was the Cloudera Distribution Including Apache Hadoop, version 4.5.0. The version used in the study was a free one, which did not have the full functionality of the complete program. Various components of the software were also deactivated, which resulted in error messages. In addition, the program had configuration issues that limited its application in the study. Several attempts were made to fix these problems, such as reinstalling the program, but it still did not work as expected.

On the other hand, IBM InfoSphere BigInsights QuickStart 2.1 was the best option for the analysis of Big Data, especially in terms of usability and functionality. The program allowed the data to be displayed easily and also appeared able to undertake some statistical analysis. However, running the program placed a significant burden on the computer's hardware, which resulted in slow response and difficulty in running commands. Activating the statistical analysis functions was possible in theory but not readily feasible in practice. On several occasions, with multiple data sets, the program was also seen to drop some cases, which was a significant concern. Nevertheless, a data file was uploaded and some basic commands were run in Pig, an application within Hadoop that issues commands and interacts with the data. This enabled the researchers to run various commands, but the problems with speed and the difficulties with statistical analysis made the program unsuitable for the current research.

The last Big Data technology that was examined was Microsoft's Windows Azure. It was the only cloud-based Hadoop system used. Although being cloud-based allowed it to bypass the hardware requirements, the program had problems of its own. Azure is a collection of services and products, of which Big Data analytics is one of the core parts. The website was not user-friendly and was more like a status window: it showed only which services were implemented and what types of files were stored, and allowed the user to check or create Hadoop clusters. Any further functionality, such as uploading, viewing, interacting with and analyzing the data, was detached from the website.

Furthermore, commands had to be run through a separate, downloadable command-line interface that operated on the local machine. Unfortunately, the interface offered very little help, especially to a user who wanted to learn how to interact with the data, which made the program appear cruder than the other Big Data analytics programs. The only way to analyze the data statistically was to download the file, open it locally and analyze it in Excel. Such a procedure defeats the purpose of cloud computing.

Accordingly, the data file was uploaded to the website, and several attempts were made to connect it with Excel in order to perform the statistical analysis. These attempts proved unsuccessful because of numerous errors. Given the lack of cloud computation for statistical analysis, the crude interface and the other issues encountered, the software failed to meet the requirements of the present study. Time constraints prevented the exploration of other Big Data technologies currently in use.

The Research and Findings

While some of the strongest predictors of psychological well-being were introduced in the literature review, using the large data set without preconceived opinions or hypotheses resulted in the unveiling of new insights that might otherwise have been missed or deemed insignificant. For example, Gestel, Jansen, and Theunissen (2011) found that binge drinking was a predictor of a poor sense of well-being among Dutch teens aged 12-15, but not among older teens aged 16-18. In the current study, binge drinking, coded as ALC_3, was found to have a very small effect size (r = -0.07, p < .001). This suggests that the finding of Gestel, Jansen, and Theunissen (2011) reflects a small but real trend that may hold for a broader group than the one originally studied.

More complex analysis of the large Canadian Community Health Survey data set could reveal whether such correlations hold only for certain ages. If a data set is not large enough, such information could be missed. Similarly, Cook and Benton (1993) found a positive correlation between the intake of fruits and vegetables and well-being, but only for females. The current study, however, found a general effect (r = 0.098, p < .001). Further research on these two correlations within the data set could reveal whether the relationship changes when gender is factored into the analysis. Examining such relationships in a more extensive data set allows different findings to emerge and can point to new areas of discovery that may benefit researchers.
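The follow-up analysis suggested above, checking whether a correlation changes once gender is factored in, amounts to computing the same correlation separately within each group. The sketch below shows one way this could be done. It is a hypothetical illustration: the records are invented toy values and not data from the Canadian Community Health Survey, and the field names are placeholders.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / math.sqrt(variation_x * variation_y)


def correlation_by_group(records, group_key, x_key, y_key):
    """Compute the x-y correlation separately for each value of group_key."""
    groups = {}
    for record in records:
        groups.setdefault(record[group_key], []).append(record)
    return {
        group: pearson_r([r[x_key] for r in rows], [r[y_key] for r in rows])
        for group, rows in groups.items()
    }


# Invented toy records, for illustration only.
records = [
    {"sex": "female", "fruit_veg": 1, "wellbeing": 2},
    {"sex": "female", "fruit_veg": 2, "wellbeing": 3},
    {"sex": "female", "fruit_veg": 3, "wellbeing": 4},
    {"sex": "female", "fruit_veg": 4, "wellbeing": 5},
    {"sex": "male", "fruit_veg": 1, "wellbeing": 3},
    {"sex": "male", "fruit_veg": 2, "wellbeing": 3},
    {"sex": "male", "fruit_veg": 3, "wellbeing": 4},
    {"sex": "male", "fruit_veg": 4, "wellbeing": 2},
]

by_sex = correlation_by_group(records, "sex", "fruit_veg", "wellbeing")
for group, r in by_sex.items():
    print(f"{group}: r = {r:.3f}")
```

A pooled correlation can hide the fact that the relationship differs between groups, which is why the split is worth making before a general effect is reported.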

In the current study, some predictors of psychological well-being were found that earlier research had missed. These unique findings include household size (r = 0.081, p < .001), heart disease (r = -0.071, p < .001) and high blood pressure (r = -0.074, p < .001). The variables may not have been studied previously because their small effects did not show in smaller data sets, or for other reasons. This illustrates one benefit of working with larger data sets: it allows the data to speak for itself. However, it also calls into question the way psychologists and social scientists determine what makes a relationship significant.

When the sample is large enough, even small effects become statistically significant, and this must be taken into account when determining whether a relationship is meaningful. As more researchers move to large data sets, a conversation will be needed about what really constitutes a real effect, since nearly every variable can conceivably reach significance when the data set is very large. However, even small effects can matter at the scale of a population, as in the case of a drug interaction that is lethal for only a small part of the population (Miller, 2012, par. 4). This issue may become more pressing in the years to come, and more research on the subject is needed.
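The dependence of significance on sample size can be made concrete with a short calculation. The sketch below is a minimal illustration: it takes the binge drinking correlation reported above (r = -0.07) and tests it at several sample sizes. The sample sizes are chosen for illustration and are not taken from the study, and the p-value uses a normal approximation to the t distribution, which is an assumption that is reasonable only for large samples.

```python
import math


def t_statistic(r, n):
    """t statistic for testing whether a correlation r from n cases is zero."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


def approx_two_sided_p(r, n):
    """Two-sided p-value via the normal approximation (large n assumed)."""
    return math.erfc(abs(t_statistic(r, n)) / math.sqrt(2))


R_BINGE_DRINKING = -0.07
variance_explained = R_BINGE_DRINKING ** 2

# Illustrative sample sizes, not taken from the study.
for n in (100, 1000, 30000):
    p_value = approx_two_sided_p(R_BINGE_DRINKING, n)
    print(f"n = {n:>6}: t = {t_statistic(R_BINGE_DRINKING, n):7.2f}, p = {p_value:.4g}")

print(f"Variance explained (r squared): {variance_explained:.4f}")
```

The same correlation is far from significant with 100 cases and well below .001 with 30,000 cases, yet in both situations it accounts for less than half a percent of the variance. Statistical significance and practical importance therefore have to be judged separately.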

Even though psychologists and social scientists rarely use Big Data, it may have many benefits. The technology has the potential to benefit social scientists by expanding the population that can be examined. Analysis of such data can, in turn, inform the solutions needed for the issue under study. Furthermore, some of the current limitations on research by social scientists and psychologists are slowly being resolved as the technology advances. It is therefore likely that many of the existing problems will be solved as the software spreads to broader areas and the technology matures. The main issues holding back research with Big Data, as identified by the current study, fall into four general areas.

The first issue relates to access and cost. The current project encountered various problems in accessing and using Big Data technologies, as only free or low-cost programs were chosen and tested, and these offered little functionality to aid the study. It was also discovered that one of the main ways of accessing the software was to contact the developing companies and ask for a copy of the program. This was problematic, as some companies did not consider it prudent to give the software to independent researchers, or declined because the project was relatively small. Over the course of the study the situation improved, as more avenues were discovered, such as academic passes and application programs under development, which allowed easier access to Big Data technologies. These programs required no additional costs, which would have been very difficult for a lone researcher to bear.

The second issue was that most of the programs have no capability for statistical analysis; indeed, none of the programs tested in the study had such capability. This is one of the main limitations of such programs, as the data must be exported to other programs and, depending on the size or nature of the data set, may then run into the limits of current statistical programs.

The third issue is that the programs require considerable knowledge to use effectively. Most of them require at least a basic to moderate understanding of programming, of running virtual machines and of navigating operating systems that are unfamiliar to the typical user. All of the programs tested required commands to be written in one of a variety of programming languages, such as Java, R, HiveQL and Pig Latin. Furthermore, some of the programs needed to be run as a Linux virtual machine, which may present many challenges to the inexperienced user, especially with navigation.

Finally, the fourth issue was the hardware needed to run the programs. Many of the programs require more powerful computers than those most people can easily access. For example, IBM InfoSphere BigInsights QuickStart, version 2.1, required a computer with a minimum of 8 GB of RAM allocated to the virtual machine in order to run. Even with this allocation, the program was observed to be slow, as more RAM was needed. These and other issues are among the main reasons why Big Data is not yet widely used by researchers in the study of community well-being.



Conclusion and Findings

From the above research, it is seen that Big Data gives researchers studying the subjective well-being of a community access to a second, large sample of data. The size of that sample depends primarily on the data source. As a result, Big Data may be particularly useful in the study of rare events that have an impact on the subjective well-being of a person or community. Such rarely occurring events include natural and human-made disasters. Furthermore, Big Data could be used in the study of the subjective well-being of people in a community, a region or even a country.

However, it is important to note that Big Data has limitations, and researchers need to be aware of them. Big Data should be used more to find the rare events that have a significant impact on the well-being of a person. It should also be noted that, even though Big Data may not replace the traditional self-report measures of subjective well-being in the near future, it can serve as a very important additional source of data.



References

Choudhary, M. A., Levine, P., McAdam, P., & Welz, P. (2012). The happiness puzzle: Analytical aspects of the Easterlin paradox. Oxford Economic Papers, 64(1), 27-42. doi:10.1093/oep/gpr006

Choi, H., & Varian, H. (2009). Predicting initial claims for unemployment insurance using Google Trends. Technical report, Google. Available from: http://research.google.com/archive/papers/initialclaimsUS.pdf

Cook, R., & Benton, D. (1993). The relationship between diet and mental health. Personality and Individual Differences, 14(3), 397-403. doi:10.1016/0191-8869(93)90308-P

Davenport, T. H., & Barth, P. (2012). How big data is different. MIT Sloan Management Review, 54(1), 43-46.

Deaton, A., & Stone, A. A. (2013). Economic analysis of subjective well-being - two happiness puzzles.

Deroos, D., Deutsch, T., Eaton, C., Lapis, G., & Zikopoulos, P. (2012). Understanding big data: Analytics for enterprise class Hadoop and streaming data. New York: McGraw-Hill.

Ettredge, M., Gerdes, J., & Karuga, G. (2005). Using Web-based search data to predict macroeconomic statistics. Communications of the ACM, 48(11), 87-92. Available from: http://portal.acm.org/citation.cfm?id=1096010

Fleurbaey, M. (2009). Beyond GDP: The quest for a measure of social welfare. Journal of Economic Literature, 47(4), 1029-75. doi:10.1257/jel.47.4.1029. URL http://www.aeaweb.org/articles.php?doi=10.1257/jel.47.4.1029

Frey, B. S., & Stutzer, A. (2002). What can economists learn from happiness research? Journal of Economic Literature, 40(2), 402-435. doi:10.1257/002205102320161320. URL http://www.aeaweb.org/articles.php?doi=10.1257/002205102320161320

Gestel, A., Jansen, M., & Theunissen, M. (2011). Are mental health and binge drinking associated in Dutch adolescents? Cross-sectional public health study. BMC Research Notes, 4(1), 100-100. doi:10.1186/1756-0500-4-100

James, C., Bore, M., & Zito, S. (2012). Emotional intelligence and personality as predictors of psychological well-being. Journal of Psychoeducational Assessment, 30(4), 425-438. doi:10.1177/0734282912449448

Kahneman, D., & Deaton, A. (2010). High income improves evaluation of life but not emotional well-being. Proceedings of the National Academy of Sciences, 107, 16489-16493.

Keyes, C. L. M., Shmotkin, D., & Ryff, C. D. (2002). Optimizing well-being: The empirical encounter of two traditions. Journal of Personality and Social Psychology, 82(6), 1007-1022. doi:10.1037/0022-3514.82.6.1007

Koh, J. (2013, Jun 18). Understanding consumers with help from big data. The Business Times. Retrieved from https://www.lib.uwo.ca/cgibin/ezpauthn.cgi/docview/1368686527?accountid=15115

Lazer, D., Pentland, A., Adamic, L., Aral, S., Barabasi, A. L., Brewer, D., Christakis, N., Contractor, N., Fowler, J., Gutmann, M., Jebara, T., King, G., Macy, M., Roy, D., & Van Alstyne, M. (2009). Computational social science. Science, 323(5915), 721-723.

Lesk, M. (2013). Big data, big brother, big money. IEEE Security & Privacy, 11(4), 85-89. doi:10.1109/MSP.2013.81

Miller, K. (2012, January 2). Big data analytics in biomedical research. Biomedical Computation Review. Retrieved from http://biomedicalcomputationreview.org/content/big-data-analytics-biomedical-research

Morrison, M., Tay, L., & Diener, E. (2011). Subjective well-being and national satisfaction: Findings from a worldwide survey. Psychological Science, 22(2), 166-171. doi:10.1177/0956797610396224

Oboler, A., Welsh, K., & Cruz, L. (2012). The danger of big data: Social media as computational social science. First Monday, 17(7). doi:10.5210/fm.v17i7.3993

Robeyns, I. (2005). The capability approach: A theoretical survey. Journal of Human Development, 6(1), 93-117.

Robeyns, I. (2006). The capability approach in practice. Journal of Political Philosophy, 14(3), 351-376. doi:10.1111/j.1467-9760.2006.00263.x. URL http://dx.doi.org/10.1111/j.1467-9760.2006.00263.x

Ryff, C. D. (1995). Psychological well-being in adult life. Current Directions in Psychological Science, 4(4), 99-104. doi:10.1111/1467-8721.ep10772395

Schultes, E. (2013, June). Big data: The myth of independent variables. The 32nd Annual Conference of the Society for Scientific Exploration. Lecture conducted from Dearborn, MI.

Sen, A. (2008). Capability and well-being. In D. Hausman (Ed.), The philosophy of economics: An anthology (pp. 270-294). New York: Cambridge University Press.

Statistics Canada (2013). Canadian Community Health Survey (CCHS) annual component: User guide 2012 and 2011-2012 microdata files. Retrieved from: http://equinox2.uwo.ca/docfiles/cchs/2012/cchs-escc2012_2011-2012gid-eng.pdf

Stiglitz, J., Sen, A., & Fitoussi, J. P. (2009). Report by the Commission on the Measurement of Economic Performance and Social Progress. Technical report, INSEE. URL http://www.insee.fr/fr/publications-et-services/dossiers_web/stiglitz/doc-commission/RAPPORT_anglais.pdf

Tien, J. M. (2013). Big data: Unleashing information. Journal of Systems Science and Systems Engineering, 22(2), 127-151. doi:10.1007/s11518-013-5219-4

Trainor, S., Delfabbro, P., Anderson, S., & Winefield, A. (2012). Leisure activities and adolescent psychological well-being. Journal of Adolescence, 35(2), 467-467. doi:10.1016/j.adolescence.2012.02.005

Ward, B. W. (2013). What’s Better—R, SAS®, SPSS®, or stata®? Thoughts for instructors of

statistics and research methods courses. Journal of Applied Social Science, 7(1), 115-120.

doi:10.1177/193672441

Appendix

Appendix A

Recoding: 1 = 2, 2 = 1, 6 7 8 9 = 999
Variables: ADM_PRX, ALC_1, CCC_071, CCC_101, CCC_121, CCC_131, CCC_141, CCC_171, PAC_1A – PAC_1Z, PACFLEI, SDCFIMM, SMK_10, SMK_05D, SMK_06A, SMK_09A, SMK_10A

Recoding: 96 97 98 99 = 999
Variables: CCCG102, DHHGLVG, SACDTOT, SMKDSTY

Recoding: 99.9 = 999
Variables: PACDEE

Recoding: 96 = 0, 97 98 99 = 999
Variables: ALC_2, SMK_05C

Recoding: 96 = 1, 97 98 99 = 999
Variables: ALC_3, INCGPER

Recoding: 996 = 0
Variables: PAC_2A – Z, SMK_204, SMK_05B

Recoding: 6 7 8 9 = 999
Variables: ADM_N09, ADM_N10, DHH_OWN, DHHGHSZ, DHHGMS (also, values recoded into not married [1] and married [2]), EDUDH04, EDUDR04, FVCGTOT, GEN_01, GEN_02B, GEN_07, GEN_10 (also, values reversed for consistency [ex., 1 = Very Weak]), GENDHDI, GENDMHI, HWTGISW, INCG2, INCGHH, PACDFR, PACDPAI, PACFD, SDCGRES, SMK_202, SMK_01A

Recoding: 6 = 1, 7 8 9 = 999
Variables: ADM_N11

Recoding: 6 = 0, 7 8 9 = 999
Variables: PAC_3A – Z

Recoding: 1 = 2, 2 = 1, 6 = 1, 7 8 9 = 999
Variables: SDC_8

Recoding: 990 – 1000 = SYSMIS
Variables: All variables
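The recoding rules in Appendix A can be read as a two-step procedure: a variable-specific mapping of raw codes, followed by a final rule that turns every value from 990 to 1000 into a system-missing value. The sketch below illustrates this reading. It is not the SPSS syntax used in the study; the function name is hypothetical, Python's None stands in for SPSS's SYSMIS, and applying the missing-value rule last is an assumption.

```python
SYSMIS = None  # stand-in for the SPSS system-missing value


def recode(value, rule):
    """Apply a variable-specific rule, then the 990-1000 = SYSMIS rule."""
    recoded = rule.get(value, value)  # codes not named in the rule are kept
    if recoded is not None and 990 <= recoded <= 1000:
        return SYSMIS
    return recoded


# Rule "1 = 2, 2 = 1, 6 7 8 9 = 999", as listed for variables such as ALC_1.
YES_NO_RULE = {1: 2, 2: 1, 6: 999, 7: 999, 8: 999, 9: 999}

raw_values = [1, 2, 6, 9]
recoded_values = [recode(value, YES_NO_RULE) for value in raw_values]
print(recoded_values)
```

Under this reading, non-response codes such as 6 to 9 first become 999 and are then treated as missing, so they do not enter the regression as if they were real answers.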

Appendix B

ALC_3 (Binge Drinking): Frequency of having 5 or more drinks in the past year.

CCC_071 (High Blood Pressure): Yes or no response for the presence of the medical condition.

CCC_121 (Heart Disease): Yes or no response for the presence of the medical condition.

CCC_131 (Cancer): Yes or no response for the presence of the medical condition.

CCCG102 (Onset of Diabetes): Indicates age participant was when diagnosed with diabetes.

DHHGAGE (Age): Indicates the age bracket participant falls within.

DHHGHSZ (Household Size): Indicates how many people live with the participant.

DHHGMS (Marital Status): Indicates if the participant is married or single.

DHHSEX (Gender): Indicates if participant is male or female.

EDUDR04 (Education Completed): Indicates highest level of education completed.

FVCDTOT (Fruit & Vegetable Intake): Frequency of eating fruits and vegetables daily.

GEN_07 (Life Stress): Indicates perceptions of stress present in participant's life.

GEN_10 (Community Belonging): Indicates participant's sense of belonging to a community.

GENDHDI (Self-Report Physical Health): General physical health from poor to excellent.

GENDMHI (Self-Report Mental Health): General mental health from poor to excellent.

HWTGBMI (Body Mass Index): Self-reported BMI score.

INCGPER (Personal Income): Reported personal income from all sources.

PACDEE (Energy Expenditure): Energy spent on physical leisure activities daily.

Appendix C

Figure 4

Figure 5