Shrinking The Uncertainty By Predictive Analysis In Production Environment

Abstract: In a production environment, the basic process is manufacturing products and receiving orders for them from customers. For a large company, receiving orders alone is not enough: it needs to know the demand for its products in order to compete in the market. To learn that demand, we have to track how each product is ordered. Social media is a platform that can give us a lot of information about a product. We not only track product orders but also predict the future response of customers from present and historical datasets.
Keywords: Time Series, Seasonal Variant, HDFS, Flume, HoltWinters, Predict.


Whenever we analyse a particular product, the primary need is to know how people currently respond to it. We can get this response from the service outlets where the product is sold, or from the web, where people comment on it. Nowadays social media is a huge platform where retailers and customers share their views, because anyone can sit at home and discuss personal experiences with products. They can also respond from devices such as mobile phones and tablets. From a research point of view, we can collect these responses, comments and likes as a dataset for analysing a particular product. Social media also works like a news channel, where incidents happening all over the world are published in real time. The concern here is to track customer behaviour for a product and, based on that, estimate the future response of customers. Because the future is unpredictable, deciding on the delivery of a product to market is a major challenge for business analysts. [6]


Suppose social media is a new platform to us. We have to put Facebook and Twitter under observation and find the correlation between the social network and its people, i.e. friends, followers and followees. One trend found there is that whether an individual joins a particular group depends less on how many of his friends are in it than on how those friends are connected to one another. On Twitter, people tweet daily and chat with each other about daily life, other people, events and so on. Researchers have found that the content discussed on social media can support research in areas such as sentiment analysis, predictive analysis and human-computer analysis. Even in politics, the outcome of an election can be estimated by analysing social media data. The response to a movie is likewise estimated from the comments on its trailer.


On Facebook there are two groups of users. The first group likes a product, and from the number of visits and the number of likes we predict the popularity of that product. The second group comments on a product; we extract the comments and count how many are given in a particular period after the product launch. The number of likes within a given time can be counted in the same way. These data help us predict the future response to the product. Some analysis can be done by tracking access patterns and identifying blogs by their ratio of posts to comments. Another researcher found that the number of comments and posts about a book within a particular period after its publication also predicts how well the book will sell. According to Szabo & Huberman, a YouTube video or a Digg story becomes popular only some time after its publication. Because traffic varies over time, they introduced a source-relative time to remove this effect, based on the total number of diggs or video views across the source divided by the total number of hours for which data is available.

There are two aspects of predictive modelling. The first is regression analysis, which predicts a response with a value, i.e. a magnitude; examples are a stock price or a return on investment. The second is classification, which predicts a categorical response; examples are which brand a customer will prefer, or whether he will buy or not. Overall, in prediction we observe how people respond to a particular product. Generally we deploy three modelling techniques in predictive modelling: traditional, data adaptive and model dependent.
Traditional approaches such as linear regression and logistic regression create a model that fits the data. After fitting the model to the dataset, we use a predict function for the analysis.
In the data adaptive approach we have to find the parameters that most affect the prediction. We therefore put the emphasis on hypotheses rather than on direct analysis. This is pure business analytics, and depending on the analysis we generate code for the data analysis.
The third approach is model dependent, which generates the data, the predictors and so on. Mathematical models, linear programming models and operations research are the primary tools of this approach.
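The traditional approach (fit a model, then call a predict function) can be sketched in R as follows. This is only an illustration: the built-in cars and mtcars datasets stand in for product data, and the 0.5 cut-off for the class label is an assumption.

```r
# Illustrative only: built-in datasets stand in for product data.

# Regression: predict a magnitude (stopping distance) from speed.
reg_fit  <- lm(dist ~ speed, data = cars)
reg_pred <- predict(reg_fit, newdata = data.frame(speed = c(10, 20)))

# Classification: predict a categorical response (manual transmission or not)
# from car weight with logistic regression.
cls_fit  <- glm(am ~ wt, data = mtcars, family = binomial)
cls_prob <- predict(cls_fit, newdata = data.frame(wt = c(2.5, 3.5)),
                    type = "response")
cls_label <- ifelse(cls_prob > 0.5, "yes", "no")  # assumed 0.5 cut-off

print(reg_pred)
print(cls_label)
```

The same two-step pattern (fit, then predict on new data) applies whether the response is a magnitude or a category.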


In traditional time series analysis we first decompose the time series Xt into a trend T, a seasonal component S and a remainder e. We then use filtering to extract the trend from the data.
The filtered value of a time series at a given period t is the average of the values x(t-a), ..., x(t+a). The filter coefficients are 1/(2a+1) in each case. For example, we take a=2 for weekly, a=12 for monthly and a=40 for quarterly data.
The filtering is done with the command filter().
We can also estimate the trend of a time series with non-parametric regression techniques. The function stl() performs a seasonal decomposition of the time series Xt.
It determines the trend T, the seasonal component S and the remainder e.
We have to transform the data into a ts object, because a time series contains more information than the values alone, e.g. the dates and the frequency at which the series was recorded. [15]
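The filtering and decomposition steps can be sketched as below. The series is synthetic (a hypothetical monthly series with a linear trend and a yearly cycle), and the start date is an assumption made only so that the example runs on its own.

```r
# Hypothetical monthly series: linear trend + yearly seasonality + noise.
set.seed(1)
n <- 72
values <- 0.05 * (1:n) + sin(2 * pi * (1:n) / 12) + rnorm(n, sd = 0.1)

# A ts object stores the start date and frequency along with the values.
x <- ts(values, start = c(2014, 1), frequency = 12)

# Moving-average filter with equal coefficients 1/(2a+1)
# over the window x[t-a], ..., x[t+a]; a = 12 as for monthly data.
a <- 12
trend_ma <- stats::filter(x, filter = rep(1 / (2 * a + 1), 2 * a + 1),
                          sides = 2)

# Seasonal decomposition into seasonal, trend and remainder components.
dec <- stl(x, s.window = "periodic")
print(head(dec$time.series))
```

The centred filter leaves the first and last a values undefined (NA), because the full window is not available there.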


To predict the next value of a given time series Xt at period t, a weighted sum of past observations is required. Recent observations should be weighted more heavily than older ones.

Simple exponential smoothing suits a time series with no systematic trend or seasonal component. For a series that has them, we use the Holt-Winters function, which requires three parameters: level (alpha), trend (beta) and seasonal variation (gamma).
HoltWinters(t, alpha, beta, gamma) performs the smoothing on the time series t. A component can also be excluded: in current versions of R, beta = FALSE drops the trend and gamma = FALSE drops the seasonal component.
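To make the idea of a weighted sum of past observations concrete, here is a hand-written sketch of simple exponential smoothing. The function name exp_smooth, the starting value and alpha = 0.3 are illustrative assumptions; this is not the internal code of HoltWinters().

```r
# Simple exponential smoothing: each level is a weighted sum of past
# observations, with weights alpha * (1 - alpha)^k that decay with age k.
exp_smooth <- function(x, alpha) {
  level <- numeric(length(x))
  level[1] <- x[1]  # assumption: start from the first observation
  for (t in seq_along(x)[-1]) {
    level[t] <- alpha * x[t] + (1 - alpha) * level[t - 1]
  }
  level
}

# First six sample order values, rounded to two decimals.
orders <- c(0.16, 0.04, 0.11, 0.15, 0.12, 0.10)
smoothed <- exp_smooth(orders, alpha = 0.3)

# The last level serves as the one-step-ahead forecast.
next_value <- smoothed[length(smoothed)]
print(next_value)
```

A larger alpha puts more weight on the most recent observations; with alpha = 1 the smoothed series simply repeats the data.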


The dataset is first loaded in CSV format from its location and transformed into a ts object.
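A minimal sketch of this loading step is given below. The file is a temporary CSV written by the example itself, and the names csv_path and data.ts, the start date and the frequency of 52 weeks per year are all assumptions.

```r
# Write a tiny hypothetical CSV so the sketch is self-contained;
# in practice csv_path would point to the real preprocessed file.
csv_path <- tempfile(fileext = ".csv")
sample_rows <- data.frame(
  year    = 2014,
  quarter = "Q1",
  week    = 1:6,
  orders  = c(0.1620370, 0.0350988, 0.1109399,
              0.1491900, 0.1209814, 0.1044528)
)
write.csv(sample_rows, csv_path, row.names = FALSE)

dataset <- read.csv(csv_path)

# Assumption: weekly observations, treated as 52 per year,
# starting in week 1 of 2014.
data.ts <- ts(dataset$orders, start = c(2014, 1), frequency = 52)
print(data.ts)
```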

Next, the Holt-Winters function is applied to the dataset, and the fitted parameters (level, trend and seasonal component) are displayed. For prediction we save the fitted model in an object named data.hw.



The predict function then gives the predicted values for the next 12 time units. We can also plot a graph of the forecast for the next 4 years.
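The fitting and prediction steps might look like the following sketch. Because the order data is not reproduced in full here, the built-in monthly co2 series is used as a stand-in, so 12 time units mean 12 months; the names pred12 and pred4y are illustrative.

```r
# Stand-in data: the built-in monthly 'co2' series, not the order data.
data.hw <- HoltWinters(co2)

# Estimated smoothing parameters for level, trend and seasonal component.
print(c(data.hw$alpha, data.hw$beta, data.hw$gamma))

# Predicted values for the next 12 time units (here: months).
pred12 <- predict(data.hw, n.ahead = 12)
print(pred12)

# Forecast for the next 4 years, plotted with the fitted series.
pred4y <- predict(data.hw, n.ahead = 4 * 12)
if (interactive()) {
  plot(data.hw, pred4y)  # drawn only in an interactive session
}
```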



Datasets are widely available on websites, social networking sites and so on. To collect data we need to write an API client that streams real-time data to our system. Because the incoming stream has no fixed structure, we need it in JSON format for analysis. The major issue with a traditional RDBMS is that it cannot handle unstructured data, so we need the Hadoop ecosystem, which includes Hive and HBase. Hive is a query interface for data residing on HDFS. Its query language is similar to SQL but not identical, so it is called HQL. We first have to get the data into HDFS; then we can query it with Hive. We have to write code that streams the data from a particular website using the access token and secret key provided by the server. We can also use Apache Flume to gather data. It works on a source-sink mechanism: each piece of data is an event, the source produces events and sends them through a channel to a sink, and the sink writes them to a predefined location, i.e. HDFS files. The data is then preprocessed into CSV format as a matrix in which the first column is the year, the second the quarter, then the week, and finally the orders scaled to [0, 1]. On this data frame we forecast the future orders with R.
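The text does not say how the orders are brought into the range [0, 1]. One common choice is min-max scaling, sketched here with hypothetical raw counts; the column name count and its values are invented for illustration only.

```r
# Hypothetical raw weekly order counts; the real counts are not given.
raw <- data.frame(
  year    = c(2014, 2014, 2014, 2014),
  quarter = c("Q1", "Q1", "Q1", "Q1"),
  week    = 1:4,
  count   = c(120, 45, 310, 87)
)

# Min-max scaling maps the smallest count to 0 and the largest to 1.
rng <- range(raw$count)
raw$orders <- (raw$count - rng[1]) / (rng[2] - rng[1])

# Keep the column layout described above: year, quarter, week, orders.
dataset <- raw[, c("year", "quarter", "week", "orders")]
print(dataset)
```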


  year quarter week    orders
1 2014      Q1    1 0.1620370
2 2014      Q1    2 0.0350988
3 2014      Q1    3 0.1109399
4 2014      Q1    4 0.1491900
5 2014      Q1    5 0.1209814
6 2014      Q1    6 0.1044528

We use data in this format for the analysis:

   year quarter week      orders
1  2014      Q1    1 0.162037017
2  2014      Q1    2 0.035098799
3  2014      Q1    3 0.110939920
4  2014      Q1    4 0.149190046
5  2014      Q1    5 0.120981436
6  2014      Q1    6 0.104452773
7  2014      Q1    7 0.207429668
8  2014      Q1    8 0.189101995
9  2014      Q1    9 0.134634659
10 2015      Q3   36 0.018114914
11 2015      Q3   37 0.098937180
12 2015      Q3   38 0.083456064
13 2015      Q3   39 0.636601105
14 2015      Q4   40 0.077512931
15 2015      Q4   41 0.043758032
16 2016      Q1    1 0.092038430
17 2016      Q1    2 0.110973004
18 2016      Q1    3 0.161409802
19 2016      Q1    4 0.113494006
20 2016      Q1    5 0.126793503
21 2016      Q1    6 0.055097068
22 2016      Q1   13 0.492528814

library(ggplot2)
# Weekly orders drawn as one line per year
qplot(week, orders, data = dataset, colour = as.factor(year), geom = "line")

The orders follow a similar pattern from quarter to quarter and from year to year. This similarity in the graph suggests a hidden structure in the data, which will be useful for order forecasting or prediction.

# Histogram of the recorded week numbers, one panel per quarter
qplot(week, data=dataset, colour=as.factor(year), binwidth=0.5) + facet_wrap(~ quarter)

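# Add a running week-within-quarter index (weekinq) to every row.
# Row 13 is dropped as in the original; the index column is first filled
# with placeholders (1:117 when the full dataset has 118 rows).
dataset2 <- cbind(dataset[-13,], weekinq=seq_len(nrow(dataset) - 1))
prev <- dataset2[1,]
runvar <- 1
for(i in 2:nrow(dataset2)){
  current <- dataset2[i,]
  # Continue counting within the same quarter, restart at 1 on a new quarter
  runvar <- ifelse(prev$quarter == current$quarter, runvar+1, 1)
  dataset2[i,"weekinq"] <- runvar
  prev <- current
}
rm(prev, current, runvar, i)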
Now we compare the orders quarter by quarter:
qplot(weekinq, orders, data = dataset2, colour = as.factor(year), geom = "line") + facet_wrap(~quarter)


The experiment above shows how orders for manufactured products can be forecast to support an efficient and cost-saving business model. R is not the only option for such analysis; it is a tool that offers thousands of models and presents results both graphically and numerically. To get this kind of result we have to collect the data, process it, transform it into a suitable object format, transform it again if necessary, and apply a model. This is the sequence for simple data analysis. For a more complex prediction model, the processing sequence may have to be lengthened.


We can also predict users' sentiments about a product from social networking sites. For that we have to stream the data from the site using the Facebook or Twitter streaming API, for which clients are written in various languages such as PHP, Java and Python. After getting the data we store it either in MySQL or in HDFS, using Hive and HBase. There we need to classify the comments, since people express many emotions such as happy, angry or depressed, often through emoticons, e.g. :P. People write in different languages, so the analysis cannot be restricted to English; all languages have to be considered. With these data we can analyse the present trend and, from the present trend and historical data, predict the future response. [10][13]

[1] Jure Leskovec, Lada A. Adamic and Bernardo A. Huberman. The dynamics of viral marketing. In Proceedings of the 7th ACM Conference on Electronic Commerce, 2006.
[2] Bernardo A. Huberman, Daniel M. Romero and Fang Wu. Social networks that matter: Twitter under the microscope. First Monday, 14(1), Jan 2009.
[3] B. Jansen, M. Zhang, K. Sobel and A. Chowdury. Twitter power: Tweets as electronic word of mouth. Journal of the American Society for Information Science and Technology, 2009.
[4] D. M. Pennock, S. Lawrence, C. L. Giles and F. Å. Nielsen. The real power of artificial markets. Science, 291(5506):987-988, Jan 2001.
[5] Kay-Yut Chen, Leslie R. Fine and Bernardo A. Huberman. Predicting the Future. Information Systems Frontiers, 5(1):47-61, 2003.
[6] W. Zhang and S. Skiena. Improving movie gross prediction through news analysis. In Web Intelligence, pages 301-304, 2009.
[7] Akshay Java, Xiaodan Song, Tim Finin and Belle Tseng. Why we twitter: understanding microblogging usage and communities. Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 workshop on Web mining and social network analysis, pages 56-65, 2007.
[8] Ramesh Sharda and Dursun Delen. Predicting box-office success of motion pictures with neural networks. Expert Systems with Applications, vol. 30, pp. 243-254, 2006.
[9] Daniel Gruhl, R. Guha, Ravi Kumar, Jasmine Novak and Andrew Tomkins. The predictive power of online chatter. SIGKDD Conference on Knowledge Discovery and Data Mining, 2005.
[10] Mahesh Joshi, Dipanjan Das, Kevin Gimpel and Noah A. Smith. Movie Reviews and Revenues: An Experiment in Text Regression. NAACL-HLT, 2010.
[11] Rion Snow, Brendan O'Connor, Daniel Jurafsky and Andrew Y. Ng. Cheap and Fast - But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks. Proceedings of EMNLP, 2008.
[12] Fang Wu, Dennis Wilkinson and Bernardo A. Huberman. Feedback Loops of Attention in Peer Production. Proceedings of SocialCom-09: The 2009 International Conference on Social Computing, 2009.
[13] Bo Pang and Lillian Lee. Opinion Mining and Sentiment Analysis. Foundations and Trends in Information Retrieval, 2(1-2), pp. 1-135, 2008.
[14] Namrata Godbole, Manjunath Srinivasaiah and Steven Skiena. Large-Scale Sentiment Analysis for News and Blogs. Proc. Int. Conf. Weblogs and Social Media (ICWSM), 2007.
[15] G. Mishne and N. Glance. Predicting movie sales from blogger sentiment. In AAAI 2006 Spring Symposium on Computational Approaches to Analysing Weblogs, 2006.

Source: Essay UK -
