
Posts

DOCKERIZING HADOOP, HIVE, SQOOP, SPARK, KAFKA, NIFI, ELASTICSEARCH

A working Docker setup that runs Hadoop and the other big data components is very useful for the development and testing of a big data project. When I was in need of one, I couldn't find a simple, working Docker setup, so I would like to bring one together for you here. Please download the files from here.
Recent posts

HOW SPARK SUPPORTS OTHER LANGUAGES (R, JAVA, PYTHON)

Spark, the general-purpose computing framework written in Scala, allows Java, Python, and R clients to interact with it. This has increased the acceptance of Spark among programmers and, as a side effect, enriched Spark with the programming libraries each of those languages has. But I was wondering how the architecture is designed to handle this. When we submit a Spark job in Python, how is the core functionality of Spark, such as SparkContext or RDD creation, taken care of? I have come across some points that describe how this works (a minimal Java sketch follows the list):
· Only Scala client code uses the Spark libraries directly.
· Other language clients do not interact with Spark's Scala code directly.
· If the Spark libraries are only available in Scala, then whatever programming language we use, the terminal operations can happen only in Scala.
· Ex: If we are using a third-party Java API from our Java c
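To make the wrapping idea concrete, here is a minimal sketch, assuming Spark's Java API (spark-core) on the classpath and a local master; the class name and numbers are just for illustration. JavaSparkContext is only a Java-friendly wrapper, and jsc.sc() exposes the underlying Scala SparkContext where the terminal operation is ultimately executed.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import java.util.Arrays;

public class JavaWrapsScala {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("java-wraps-scala").setMaster("local[*]");
        // JavaSparkContext is a thin Java-friendly wrapper around the Scala SparkContext
        JavaSparkContext jsc = new JavaSparkContext(conf);

        // The underlying Scala SparkContext is still the one doing the real work
        System.out.println("Underlying context: " + jsc.sc().getClass().getName());

        JavaRDD<Integer> numbers = jsc.parallelize(Arrays.asList(1, 2, 3, 4, 5));
        // The terminal operation (reduce) is executed by Spark's Scala core
        int sum = numbers.reduce(Integer::sum);
        System.out.println("sum = " + sum);

        jsc.stop();
    }
}
```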

UNITED STATES: GENERAL ANALYTICAL DATA POINTS

I searched a lot on Google and struggled quite a bit to bring together some data points in order to learn the basics of data analytics. That is why I decided to share them with you, so you don't have to waste as much of your time. These are some general analytical data points collected about the United States: Gasoline Price, Interest Rate, Unemployment Rate, Currency Strength, GDP, Inflation, House Price Index. I will try to enrich the data and will be adding more metrics to it. Anybody interested in contributing data is welcome (my official email: jobs.thomas1@yahoo.com). You can download the data from the GitHub repository: https://github.com/jobmthomas/DataForAnalytics.git. I hope this is really helpful for many.

EMOTION API OF MICROSOFT AZURE COGNITIVE SERVICES, A JAVA EXAMPLE

What is Microsoft Azure Cognitive Services: Microsoft Azure Cognitive Services provides a set of pre-built cloud services, exposed as APIs, for end users working in the machine learning space. As end users, we don't have to take on the pain of managing a full-stack machine learning infrastructure; we can use the services on a subscription basis. This helps developers add intelligent features to their products with ease. The Cognitive Services are classified into Vision, Speech, Language, Knowledge, and Search; the Emotion API is under Vision.
How to get trial keys: There are keys for each service they provide; here we need an Emotion API key, and that can be found here. We will get a pair of keys, one of which is a spare.
Java Example: Since all Azure services are cloud-based, the client program accesses them through internet URLs. For that, we have to use an HTTP client such as Apache HttpComponents, which you can get from here. Below is a sample.
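The following is a minimal sketch of such a call, assuming Apache HttpComponents (HttpClient 4.x) on the classpath; the endpoint region, image URL, and subscription key are placeholders to be replaced with the values from your own trial subscription.

```java
import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.ContentType;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class EmotionApiClient {
    // Placeholder endpoint and key: replace with the region and key from your subscription
    private static final String ENDPOINT =
            "https://westus.api.cognitive.microsoft.com/emotion/v1.0/recognize";
    private static final String SUBSCRIPTION_KEY = "<your-emotion-api-key>";

    public static void main(String[] args) throws Exception {
        try (CloseableHttpClient client = HttpClients.createDefault()) {
            HttpPost post = new HttpPost(ENDPOINT);
            // The key is passed in this header on every request
            post.setHeader("Ocp-Apim-Subscription-Key", SUBSCRIPTION_KEY);

            // The request body is JSON containing a publicly reachable image URL
            String body = "{\"url\":\"https://example.com/face.jpg\"}";
            post.setEntity(new StringEntity(body, ContentType.APPLICATION_JSON));

            try (CloseableHttpResponse response = client.execute(post)) {
                HttpEntity entity = response.getEntity();
                // The response is a JSON array of detected faces with per-emotion scores
                System.out.println(EntityUtils.toString(entity));
            }
        }
    }
}
```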

PROCESSING IMAGES IN HADOOP USING MAPREDUCE

HIPI: HIPI is the Hadoop Image Processing Interface. It provides a set of tools and input formats to process a bulk amount of images using the Hadoop Distributed File System (HDFS) and MapReduce.
STEPS INVOLVED: In HIPI, the entire process can be categorized into two parts: 1) converting all images into a bulk file (a HIPI Image Bundle, or HIB); 2) processing the created bulk file of images using HIPI's image input formats (a minimal job sketch is shown below). The culler class is used to filter out images with low clarity or defects.
ISSUES WITH HIPI: To simulate my bulk image processing scenario, I used a Java program to create multiple copies of the same image with different names in a single directory. Then, using HIPI's utility, I converted all the images into a bulk file (the HIB file). To check whether all the images exist in the bulk file, I did the reverse process (converted the HIB file back into multiple images); there is a HIPI utility to do the same. But I didn't get all the images back.
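For reference, here is a rough sketch of the second step (processing a HIB with MapReduce), assuming HIPI 2.x with its org.hipi packages and the Hadoop 2 MapReduce API; the job and class names are illustrative. It simply counts the images that HibInputFormat decodes and hands to the mapper.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;
import org.hipi.image.FloatImage;
import org.hipi.image.HipiImageHeader;
import org.hipi.imagebundle.mapreduce.HibInputFormat;

// Counts the images stored in a HIPI Image Bundle (HIB): the simplest possible HIPI job
public class HibImageCount {

  public static class ImageCountMapper
      extends Mapper<HipiImageHeader, FloatImage, Text, IntWritable> {
    @Override
    protected void map(HipiImageHeader header, FloatImage image, Context context)
        throws java.io.IOException, InterruptedException {
      // Each record handed to the mapper is one decoded image from the bundle
      context.write(new Text("images"), new IntWritable(1));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "hib-image-count");
    job.setJarByClass(HibImageCount.class);

    // HibInputFormat splits the bundle and decodes images before calling the mapper
    job.setInputFormatClass(HibInputFormat.class);
    job.setMapperClass(ImageCountMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    FileInputFormat.setInputPaths(job, new Path(args[0]));  // path to the .hib file on HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```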