A working Docker setup that runs Hadoop and other big data components is very useful for developing and testing a big data project. When I needed one, I couldn't find a simple, working Docker setup, so I would like to bring one together for you here. Please download the files from here .
Spark, the general-purpose computing framework written in Scala, also allows Java, Python and R clients to interact with it. This has increased Spark's acceptance among programmers, and as a side effect Spark has been enriched by the programming libraries each of those languages offers. But I was wondering how the architecture is designed to handle this. When we submit a Spark job written in Python, how is the core functionality of Spark, such as the SparkContext or RDD creation, taken care of? I have come across some points that describe how this works:
· Only Scala client code uses the Spark libraries directly.
· Clients in other languages do not interact with Spark's Scala code directly.
· If the Spark libraries are only available in Scala, then whatever programming language we use, the terminal operations can happen only in Scala.
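To make the idea concrete, here is a toy sketch of the proxy pattern that PySpark uses (in real PySpark, the bridge between Python and the JVM is the Py4J library). This is not actual PySpark code; the class names `Gateway`, `FakeJvmRdd`, `PythonRDD` and `PythonSparkContext` are invented for illustration. The Python-side objects hold only handles to the "JVM" objects and forward every call to them, so the real work, including terminal operations like `count`, happens on the Scala/JVM side:

```python
class FakeJvmRdd:
    """Stand-in for an RDD that really lives inside the JVM."""
    def __init__(self, data):
        self._data = list(data)

    def map(self, fn):
        return FakeJvmRdd(fn(x) for x in self._data)

    def count(self):
        return len(self._data)


class Gateway:
    """Stand-in for the Py4J gateway: routes Python calls into the 'JVM'."""
    def parallelize(self, data):
        return FakeJvmRdd(data)


class PythonRDD:
    """Python-side proxy: holds a handle to the JVM RDD and forwards calls."""
    def __init__(self, jrdd):
        self._jrdd = jrdd

    def map(self, fn):
        # Real PySpark ships the function to Python worker processes managed
        # by the JVM; in this sketch we simply forward the call.
        return PythonRDD(self._jrdd.map(fn))

    def count(self):
        # Terminal operation: executed on the "JVM" side, result sent back.
        return self._jrdd.count()


class PythonSparkContext:
    """Python-side proxy for the SparkContext living in the JVM."""
    def __init__(self):
        self._gateway = Gateway()

    def parallelize(self, data):
        return PythonRDD(self._gateway.parallelize(data))


sc = PythonSparkContext()
rdd = sc.parallelize([1, 2, 3]).map(lambda x: x * 2)
print(rdd.count())  # 3
```

The key design point the sketch illustrates: the Python client never reimplements Spark's core; it only wraps and delegates, which is why every language binding ultimately drives the same Scala code.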