Nanocubes implementation with an example

Nanocubes implementation:

This article introduces Nanocubes, explains why and when they are worth using, and walks through how to install and run them.

 

Nanocubes provide real-time visualization of large datasets; that is how the project officially describes itself. More precisely, they provide visualizations that can be used to explore datasets with billions of elements at interactive rates in a web browser. In practice, this means Nanocubes can show far more detail than a plain map: a visualization can be built from many more layers, and you can zoom in much closer to the points you care about.

 

So why would we use them? As mentioned above, we turn to Nanocubes when we need more detailed and better-defined visualizations of a point. Nanocubes also need relatively little memory, so they can run even on an ordinary modern laptop. Nanocubes provide three main ways to examine, filter, and segment large datasets: spatial (by location), categorical (by attributes that can be grouped into categories), and temporal (by time).

 

In this article, I describe the Nanocubes implementation in detail, but only for Ubuntu (Linux) machines.

Let us proceed step by step.

 

Step 1: Install the prerequisites

As always, there are prerequisites. The following apply to all systems:

  • We need a 64-bit operating system. Why? Because the nanocubes server is a 64-bit program and CANNOT run on a 32-bit operating system.
  • Since the nanocubes server is written in C++11, we need a recent version of gcc (>= 4.8).
  • We need Boost version 1.48 or later, which the nanocubes server uses.
  • Lastly, we need the GNU build system, which builds the nanocubes server for us.
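
The compiler requirement above can be checked with a short script. What follows is only a minimal Python sketch and not part of the nanocubes distribution: the helper names parse_version, gcc_is_recent_enough and check_gcc are my own, and it assumes Python 3.7 or later with gcc reachable on the PATH.

```python
import platform
import re
import shutil
import subprocess


def parse_version(text):
    """Return the first dotted version number found in text as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError("no version number found in %r" % text)
    return tuple(int(part) for part in match.group(0).split("."))


def gcc_is_recent_enough(version_text, minimum=(4, 8)):
    """True if the reported gcc version is at least the minimum (4.8 by default)."""
    return parse_version(version_text) >= minimum


def check_gcc():
    """Describe the gcc found on PATH (hypothetical helper for this article)."""
    gcc = shutil.which("gcc")
    if gcc is None:
        return "gcc not found on PATH"
    result = subprocess.run([gcc, "-dumpversion"], capture_output=True, text=True)
    version = result.stdout.strip()
    if not version:
        return "could not read the gcc version"
    status = "ok" if gcc_is_recent_enough(version) else "too old, need >= 4.8"
    return "gcc %s: %s" % (version, status)


if __name__ == "__main__":
    # A 64-bit Ubuntu machine normally reports x86_64 here.
    print("machine architecture:", platform.machine())
    print(check_gcc())
```

The version comparison relies on Python's tuple ordering, so "4.8.2" passes and "4.7.3" does not.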

Step 2: Linux (Ubuntu)

As mentioned earlier, we only look at the nanocubes implementation on an Ubuntu machine. On a freshly installed 64-bit Ubuntu 14.04 system, the version of gcc/g++ that comes with it is already 4.8.2. Nevertheless, you should install the following packages:

 

1. sudo apt-get install build-essential

           – A meta-package that pulls in the basic build tools (gcc, g++, make and related packages)

 

2. sudo apt-get install automake

           – Needed to create GNU standards-compliant Makefiles

 

3. sudo apt-get install libtool

            – Hides the platform-specific details of building and linking libraries

 

4. sudo apt-get install zlib1g-dev

           – Development files for the zlib compression library

 

5. sudo apt-get install libboost-all-dev

           – Installs the Boost development libraries that the nanocubes server is compiled against

 

Step 3: Download nanocubes in the Terminal

The source code has to be on your machine before it can be built.

 

Step 3.1: Download nanocubes archive file:

 


$ wget https://github.com/laurolins/nanocube/archive/3.2.zip

Step 3.2: Once the download has completed, unzip the file using the command below:

 


$ unzip 3.2.zip

 

Step 4: Build Nanocubes

 

Step 4.1: In the terminal, go to the folder where the nanocubes archive was unzipped, using the following command:

 


$ cd nanocube-3.2

 

Step 4.2: Next, store the path of the source folder in an environment variable, which the configure step uses later:

 


$ export NANOCUBE_SRC=`pwd`

 

Step 4.3: To bootstrap the build, run the following command. The bootstrap file is a short shell script that chains a few commands to set up the build system; in projects that use the GNU build system, this is typically the step that generates the configure script used in Step 4.6.

 

$ ./bootstrap

 

Step 4.4: Then, create a build directory:

 


$ mkdir build

 

Step 4.5: Go to the build folder:

 


$ cd build

 

Step 4.6: After this, configure, compile and install. There are no hassles; just run the following commands in order:

 


$ ../configure --prefix=$NANOCUBE_SRC CXXFLAGS="-O3"

$ make

$ make install

$ cd ..

 

Step 5: Keep the tools easily accessible.

 

After execution of these commands, you should now have a directory nanocube-3.2/bin with the nanocubes toolkit inside.

 

In order to make these tools more easily accessible in your account, add the nanocube-3.2/bin directory to your PATH environment variable.

 


$ export NANOCUBE_BIN=$NANOCUBE_SRC/bin

$ export PATH=$NANOCUBE_BIN:$PATH
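
To confirm that the PATH change took effect, we can look the tools up the same way the shell would. This is a minimal Python sketch under my own assumptions: the function name missing_tools is hypothetical, and the list contains only the two tool names that appear in this article (nanocube-leaf and ncwebviewer-config).

```python
import shutil

# Tool names taken from the commands used later in this article.
TOOLS = ["nanocube-leaf", "ncwebviewer-config"]


def missing_tools(names, which=shutil.which):
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(TOOLS)
    if missing:
        print("not on PATH:", ", ".join(missing))
    else:
        print("all nanocubes tools found on PATH")
```

Run it from the same terminal session in which the export commands were issued, since the exported variables only apply to that session.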

 

Keep two points in mind when running configure in Step 4.6:

  1. If the default g++ on your system is too old, run configure with a more recent version named explicitly: CXX=g++-4.8 ../configure --prefix=$NANOCUBE_SRC CXXFLAGS="-O3"
  2. Configuring nanocubes with the tcmalloc option gives better performance: ../configure --prefix=$NANOCUBE_SRC --with-tcmalloc CXXFLAGS="-O3"

Step 6: Running a nanocube

 

Now that we have installed the nanocube toolkit, we are ready to build a nanocube. Let us take the example dataset file included in the distribution, which holds about 50,000 records. Here is the command that builds a nanocube out of that example for us:

 


$ cat $NANOCUBE_SRC/data/sample.dmp | nanocube-leaf -q 29512 -f 10000

 

To describe it, the above command starts a nanocube backend process from the sample.dmp data file, tells it to answer queries on port 29512, and has it report statistics every 10,000 insertions. Sample output from this call is shown below. After inserting all 50,000 records, the nanocube uses roughly 26MB of memory in total (24MB in the sample run shown below, and probably closer to 20MB if you are using tcmalloc).

 

Output


VERSION: 3.2.1

query-port: 29512

(stdin     ) count:      10000 mem. res:          5MB. time(s):          0

(stdin:done) count:      50000 mem. res:         24MB. time(s):          0
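
If you want to track these statistics over a longer load, the progress lines can be parsed. The following Python sketch is an illustration only: the regular expression is derived solely from the two sample lines above, so the exact line format is an assumption, and parse_progress is a name I made up.

```python
import re

# Pattern inferred from the sample output above; other versions may print differently.
PROGRESS = re.compile(
    r"\((?P<source>[^)]*)\)\s+count:\s+(?P<count>\d+)\s+"
    r"mem\. res:\s+(?P<mem_mb>\d+)MB\.\s+time\(s\):\s+(?P<seconds>\d+)"
)


def parse_progress(line):
    """Return a dict for one progress line, or None if the line does not match."""
    match = PROGRESS.search(line)
    if match is None:
        return None
    return {
        "source": match.group("source").strip(),
        "count": int(match.group("count")),
        "mem_mb": int(match.group("mem_mb")),
        "seconds": int(match.group("seconds")),
    }


last = "(stdin:done) count:      50000 mem. res:         24MB. time(s):          0"
print(parse_progress(last))
```

Lines such as VERSION or query-port do not match the pattern and simply yield None.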

 

If port 29512 is already in use, select another port and use it consistently throughout the examples below.
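
One way to find out whether a port is taken before starting the nanocube is to try binding to it. This is a minimal Python sketch, with port_is_free being a hypothetical helper of mine; it only tests the local interface, which I assume is enough for the single-machine setup in this article.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Binding fails when another process already holds the port.
            return False
        return True


if __name__ == "__main__":
    port = 29512
    print("port", port, "is free" if port_is_free(port) else "is already in use")
```

The check is only a snapshot: another process could still claim the port between the check and the start of the nanocube.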

 

Lastly, here are some simple queries to test your installation:

Query 1: Total count of all records


http://localhost:29512/count

Output


{ "layers":[  ], "root":{ "val":50000 } }

Interpretation

Starting at the root of the nanocube, we have 50,000 records in total.
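
The same query can be issued from a script instead of a browser. Below is a minimal Python sketch; the function names total_count and fetch_total_count are my own, and the default address assumes the nanocube from Step 6 is still running on localhost port 29512.

```python
import json
from urllib.request import urlopen


def total_count(response_text):
    """Extract the record count from the JSON returned by the count query."""
    return json.loads(response_text)["root"]["val"]


def fetch_total_count(base_url="http://localhost:29512"):
    """Ask a running nanocube for its total count (needs the server to be up)."""
    with urlopen(base_url + "/count") as response:
        return total_count(response.read().decode("utf-8"))


# The response shown above, parsed without contacting a server.
sample = '{ "layers":[  ], "root":{ "val":50000 } }'
print(total_count(sample))  # prints 50000
```

Calling fetch_total_count() with no arguments should return the same number while the sample nanocube is running.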

 

Query 2: Schema of the nanocube


http://localhost:29512/schema

Output


{

 "fields": [

   {"name": "location","type": "nc_dim_quadtree_25","valnames": {}},

  {

     "name": "crime",

     "type": "nc_dim_cat_1",

     "valnames": {

       "OTHER_OFFENSE": 22,

       "NON-CRIMINAL_(SUBJECT_SPECIFIED)": 18,

       "NARCOTICS": 16,

       "GAMBLING": 9,

       "MOTOR_VEHICLE_THEFT": 15,

       "OTHER_NARCOTIC_VIOLATION": 21,

       "OBSCENITY": 19,

       "HOMICIDE": 10,

       "THEFT": 29,

       "DECEPTIVE_PRACTICE": 8,

       "CRIMINAL_DAMAGE": 5,

       "STALKING": 28,

       "BATTERY": 2,

       "PUBLIC_PEACE_VIOLATION": 25,

       "PUBLIC_INDECENCY": 24,

       "ASSAULT": 1,

       "BURGLARY": 3,

       "ROBBERY": 26,

       "LIQUOR_LAW_VIOLATION": 14,

       "INTERFERENCE_WITH_PUBLIC_OFFICER": 11,

       "NON-CRIMINAL": 17,

       "PROSTITUTION": 23,

       "ARSON": 0,

       "INTIMIDATION": 12,

       "SEX_OFFENSE": 27,

       "CONCEALED_CARRY_LICENSE_VIOLATION": 4,

       "OFFENSE_INVOLVING_CHILDREN": 20,

       "KIDNAPPING": 13,

       "CRIM_SEXUAL_ASSAULT": 7,

       "WEAPONS_VIOLATION": 30,

       "CRIMINAL_TRESPASS": 6      } },

   {"name": "time","type": "nc_dim_time_2","valnames": {}}  ],

 "metadata": [  {"key": "location__origin","value": "degrees_mercator_quadtree25"},

   {"key": "tbin","value": "2013-12-01_00:00:00_3600s"},

   {"key": "name","value": "crime50k.csv"}

 ]

}
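
Interpretation

The schema lists one spatial field (location), one categorical field (crime) with its value names, and one temporal field (time). To pull out just those parts, the response can be summarized with a few lines of Python. This is only a sketch: summarize_schema is a hypothetical helper, and the SAMPLE text is a shortened copy of the output above with three of the crime categories.

```python
import json


def summarize_schema(schema_text):
    """Return ({field name: type}, {field name: category names ordered by code})."""
    schema = json.loads(schema_text)
    dimensions = {}
    categories = {}
    for field in schema["fields"]:
        dimensions[field["name"]] = field["type"]
        valnames = field.get("valnames", {})
        if valnames:
            # Order the category labels by their numeric code.
            categories[field["name"]] = sorted(valnames, key=valnames.get)
    return dimensions, categories


# Shortened copy of the schema output shown above.
SAMPLE = """
{ "fields": [
    {"name": "location", "type": "nc_dim_quadtree_25", "valnames": {}},
    {"name": "crime", "type": "nc_dim_cat_1",
     "valnames": {"BATTERY": 2, "ASSAULT": 1, "ARSON": 0}},
    {"name": "time", "type": "nc_dim_time_2", "valnames": {}} ],
  "metadata": [ {"key": "name", "value": "crime50k.csv"} ] }
"""

dimensions, categories = summarize_schema(SAMPLE)
print(dimensions)
print(categories)
```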

 

Step 7: Simple web client

 

This step renders the results in a frontend. Note that the viewer used in our example works with a nanocube that has one spatial dimension, zero or more categorical dimensions, and one temporal dimension.

 

First, before starting the viewer, we need to specify where the nanocube process is hosted on our machine. To do this, we create an nc_web_viewer-specific .json configuration file and put it in the same directory as the viewer.

 

Step 7.1: Create an nc_web_viewer-specific .json configuration file:

 


$ ncwebviewer-config -s http://localhost:29512 -o $NANOCUBE_SRC/extra/nc_web_viewer/config_crime.json

 

In our specific case, the command above (a Python script found in $NANOCUBE_SRC/bin) generates a valid configuration file for the sample data. We pass it the machine and port of the nanocube, and the path of the configuration file to write.

 

Step 7.2: The last part of this step is to serve the viewer from a local web server:

 


$ cd $NANOCUBE_SRC/extra/nc_web_viewer

$ python -m SimpleHTTPServer 8000

 

SimpleHTTPServer is a Python 2 module; if your system only has Python 3, the equivalent command is python3 -m http.server 8000. With the server running, opening the following URL in your browser takes us to the nc_web_viewer and, with it, our first visualization of the sample data. Note that the name of the configuration file (without the file extension) is specified in the URL.

 


http://localhost:8000/#config_crime
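
If you generate several configuration files, the matching viewer URL follows the same pattern each time. Here is a minimal Python sketch of that naming rule; viewer_url is a hypothetical helper, and the host and port defaults simply mirror the values used in this article.

```python
import os


def viewer_url(config_path, port=8000, host="localhost"):
    """Build the viewer URL for a configuration file placed next to the viewer."""
    # The URL fragment is the file name without its directory or extension.
    name = os.path.splitext(os.path.basename(config_path))[0]
    return "http://%s:%d/#%s" % (host, port, name)


print(viewer_url("extra/nc_web_viewer/config_crime.json"))
# prints http://localhost:8000/#config_crime
```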

Conclusion

We have now successfully installed Nanocubes and worked through an example, albeit a simple one. To repeat, this walkthrough applies only to an Ubuntu machine; other operating systems need different approaches. Nanocubes are an interesting concept to play with and explore. As mentioned in the beginning, they are more than just another kind of map. Try it out, and happy coding!