Golang vs Ruby on Rails – Data Insertion Metrics – A Study

Overview:

With so many cost-effective hardware options available in cloud environments, storage has become cheap and easy nowadays. Once storage is cheap, the immediate requirement for any software solution is effective code to consume and expose the data. The Agira team met a similar situation in one of our assignments last week. The solution demanded uploading very complex CSV data into an ETL database. Improved performance was the goal, so we explored implementing the piece of code in both Ruby on Rails and Golang.

Approach:

On the Rails side, we wrote the code in Ruby and executed the upload action using Sidekiq jobs. The familiar concurrency bottlenecks of the Ruby language hurt performance, and the time taken to complete the job was not satisfactory. To handle the situation, we wrote a piece of code in Golang that performs the same task. Even on the developer's laptop we could see the better performance of the Golang code.
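Where Ruby's interpreter serializes threads within a single process, Go can fan work out across goroutines with almost no ceremony. As a rough illustration of the idea (this is not the code from the study; `processBatch` and `processAll` are hypothetical stand-ins for the real insert work), splitting rows into batches and handling each batch concurrently looks like this:

```go
package main

import (
	"fmt"
	"sync"
)

// processBatch is a hypothetical stand-in for the real per-batch work
// (building the INSERT statement and writing it to the database).
// Here it simply reports how many rows it was given.
func processBatch(batch []string) int {
	return len(batch)
}

// processAll splits rows into fixed-size batches and handles each batch
// in its own goroutine, then sums the per-batch results.
func processAll(rows []string, batchSize int) int {
	var wg sync.WaitGroup
	results := make(chan int)
	for start := 0; start < len(rows); start += batchSize {
		end := start + batchSize
		if end > len(rows) {
			end = len(rows)
		}
		wg.Add(1)
		go func(batch []string) {
			defer wg.Done()
			results <- processBatch(batch)
		}(rows[start:end])
	}
	// Close the results channel once every batch goroutine has finished.
	go func() {
		wg.Wait()
		close(results)
	}()
	total := 0
	for n := range results {
		total += n
	}
	return total
}

func main() {
	fmt.Println(processAll(make([]string, 1000), 200)) // prints 1000
}
```

Achieving the same parallelism in Ruby typically means multiple Sidekiq worker processes, which is heavier than spawning goroutines.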

Technology Bundle:

  • Ruby – 2.3
  • Rails – 4.2.0
  • Postgres gem (pg) – 0.18.4
  • PostgreSQL – 9.3.11
  • Golang – 1.5.3

Ruby Code Snippet:

CSV.read('./db/samples/myfile_sample.csv', col_sep: ',').each_slice(200).each_with_index do |rows, i|
  array_hash = []
  rows.each_with_index do |row, j|
    next if i == 0 && j == 0 # skip the CSV header row
    # array_header holds the column names, captured elsewhere
    array_hash << Hash[[array_header, row].transpose]
  end
  RubyCsv.create(array_hash)
end

GoLang Code Snippet:

// GenertaeString takes the next batch of csvDataContainer.Lenght records
// (for example, Lenght = 100) from "remaining", renders them as a multi-row
// VALUES clause, inserts them and recurses until no records remain.
func (csvDataContainer *VariableInit) GenertaeString() {
	csvDataContainer.stringVal = "" // the rows of data (as a string) to be inserted into the table

	// Slice off the next batch of up to Lenght records.
	if len(csvDataContainer.remaining) > csvDataContainer.Lenght {
		csvDataContainer.first_100 = csvDataContainer.remaining[:csvDataContainer.Lenght]
		csvDataContainer.remaining = csvDataContainer.remaining[csvDataContainer.Lenght:]
	} else {
		csvDataContainer.first_100 = csvDataContainer.remaining
		csvDataContainer.remaining = nil
	}

	// Compile the pattern once per batch, not once per field.
	checkAlphanumeric := regexp.MustCompile("[a-zA-Z]+")
	for parentIndex, recValues := range csvDataContainer.first_100 {
		csvDataContainer.stringVal += "("
		for childIndex, rec := range recValues {
			if rec == "" {
				rec = "NULL" // empty CSV fields become SQL NULL
			} else if checkAlphanumeric.MatchString(rec) {
				rec = "'" + rec + "'" // quote values containing letters
			}
			if childIndex > 0 {
				csvDataContainer.stringVal += ","
			}
			csvDataContainer.stringVal += rec
		}
		if len(csvDataContainer.first_100) == parentIndex+1 {
			csvDataContainer.stringVal += ");"
		} else {
			csvDataContainer.stringVal += "),"
		}
	}

	psqlConn.InsertRec(csvDataContainer.stringVal)
	csvDataContainer.first_100 = nil
	fmt.Printf("--------------> %d\n", len(csvDataContainer.remaining))
	if len(csvDataContainer.remaining) != 0 {
		csvDataContainer.GenertaeString()
	}
}
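The batching logic is easier to reason about and test when the string building is pulled out into a pure function. The following is a sketch, not the repository code: `buildValues` is a hypothetical helper that turns one batch of records into a multi-row VALUES clause, and a production version should prefer parameterized queries over string concatenation to avoid SQL injection:

```go
package main

import (
	"fmt"
	"strings"
)

// buildValues renders one batch of CSV records as a multi-row VALUES clause,
// mapping empty fields to SQL NULL and quoting fields that contain letters
// (the same conventions as the snippet above).
func buildValues(records [][]string) string {
	rows := make([]string, 0, len(records))
	for _, rec := range records {
		vals := make([]string, 0, len(rec))
		for _, v := range rec {
			switch {
			case v == "":
				vals = append(vals, "NULL")
			case strings.ContainsAny(v, "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"):
				vals = append(vals, "'"+v+"'")
			default:
				vals = append(vals, v)
			}
		}
		rows = append(rows, "("+strings.Join(vals, ",")+")")
	}
	return strings.Join(rows, ",") + ";"
}

func main() {
	batch := [][]string{
		{"119736", "FL", "CLAY COUNTY"},
		{"448094", "FL", ""},
	}
	fmt.Println(buildValues(batch))
	// (119736,'FL','CLAY COUNTY'),(448094,'FL',NULL);
}
```

Batching many rows into a single INSERT statement is the key optimization in both implementations: it amortizes the per-statement round trip to PostgreSQL over hundreds of rows.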

Cloud Setup:

We wanted to do extensive testing, so we set up a cloud environment on AWS to measure performance against large data volumes.

The following AWS instance configurations were used for the volume test:

Instance Type | vCPU | Memory (GiB) | Network Performance | Physical Processor    | Clock Speed (GHz)
c4.xlarge     | 4    | 7.5          | High                | Intel Xeon E5-2666 v3 | 2.9
c4.2xlarge    | 8    | 15           | High                | Intel Xeon E5-2666 v3 | 2.9

 

Record Set Sample:

Column             | Data 1     | Data 2     | Data 3     | Data 4     | Data 5
policyID 119736 448094 206893 333743 172534
statecode FL FL FL FL FL
county CLAY COUNTY CLAY COUNTY CLAY COUNTY CLAY COUNTY CLAY COUNTY
eq_site_limit 498960 1322376.3 190724.4 0 0
hu_site_limit 498960 1322376.3 190724.4 79520.76 254281.5
fl_site_limit 498960 1322376.3 190724.4 0 0
fr_site_limit 498960 1322376.3 190724.4 0 254281.5
tiv_2011 498960 1322376.3 190724.4 79520.76 254281.5
tiv_2012 792148.9 1438163.57 192476.78 86854.48 246144.49
eq_site_deductible 0 0 0 0 0
hu_site_deductible 9979.2 0 0 0 0
fl_site_deductible 0 0 0 0 0
fr_site_deductible 0 0 0 0 0
point_latitude 30.102261 30.063936 30.089579 30.063236 30.060614
point_longitude -81.711777 -81.707664 -81.700455 -81.707703 -81.702675
line Residential Residential Residential Residential Residential
construction Masonry Masonry Wood Wood Wood
point_granularity 1 3 1 3 1
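Assuming a CSV laid out like the sample above, the load step on the Go side can be sketched as follows (`loadCSV` is an illustrative helper, not code from the study's repository, and the study read `./db/samples/myfile_sample.csv` rather than an in-memory string):

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// loadCSV parses CSV text and splits it into a header row and data rows,
// which is the step both snippets perform before batching the inserts.
func loadCSV(data string) ([]string, [][]string, error) {
	records, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		return nil, nil, err
	}
	return records[0], records[1:], nil
}

func main() {
	sample := "policyID,statecode,county\n" +
		"119736,FL,CLAY COUNTY\n" +
		"448094,FL,CLAY COUNTY\n" +
		"206893,FL,CLAY COUNTY\n"
	header, rows, err := loadCSV(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(header, len(rows)) // [policyID statecode county] 3
}
```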

 

Observation:

The code was executed against record sets of 100k, 500k and 1 million rows. The time taken to insert the records into the database was captured from the log records. The chart below shows the time taken on AWS, for each record-set volume, by the Golang and Ruby code snippets:

[Figure: Performance metrics – insertion times for the Golang and Ruby snippets across the 100k, 500k and 1 million record sets on c4.xlarge and c4.2xlarge]

Code Reference:

The complete code scripts and the sample data used are available in the Git repository: https://github.com/agiratech/go_vs_ruby_metrics

Conclusion:

In the scenario considered for this study, uploading records into an ETL database, the Golang code performed better than the Ruby code.

We at Agira always believe in suggesting the best possible solution to our clients. Here too, our team applied its technical knowledge and recommended the technology stack the product needed for improved performance.