How To: Shove data into Postgres using Goroutines and GoLang

After watching Rob Pike’s wonderful golang talk, ‘Concurrency Is Not Parallelism’, in which he uses the analogy of having many Gophers running around and getting work done. I realized that I wanted to program a test so as better to solidify the concept.

Recently a pain point for me while using Play! and Scala was finding a fast way to insert data into a Postgres Database. My solution was fine but I felt it could be simpler, nay I would even say that ideally it SHOULD be simpler. I decided to see how golang would handle this type of problem.

Please install golang 1.2 and postgres 9.3.1 on a 10.9 OSX machine.

We will then create a database with one table.

$ initdb ~/Programming/postgres-E utf8
$ pg_ctl -D ~/Programming/postgres -l ~/logfile start
$ createdb testdb
$ psql testdb
testdb=# create table test ( gopher_id int, created timestamp );

Next, create a file called test_one.go with my provided code.. Our goal right now is to simply get data to insert into the database. I should mention that I planned to have many goroutines (or as Rob says, gophers) inserting data into my testdb so the column gopher_id was in preparation for this.

package main
 
import (
  //"os"
  "log"
  "time"
  "fmt"
  "database/sql"
  _ "github.com/lib/pq"
)
 
func main() {
  // create the statement string
  var sStmt string = "insert into test (gopher_id, created) values ($1, $2)"
 
  // lazily open db (doesn't truly open until first request)
  db, err := sql.Open("postgres","host=localhost dbname=testdb sslmode=disable")
  if err != nil {
    log.Fatal(err)
  }
 
  stmt, err := db.Prepare(sStmt)
  if err != nil {
    log.Fatal(err)
  }
 
  fmt.Printf("StartTime: %v\n", time.Now())
 
  res, err := stmt.Exec(1, time.Now())
  if err != nil || res == nil {
    log.Fatal(err)
  }
 
  // close statement
  stmt.Close()
 
  // close db
  db.Close()
 
  fmt.Printf("StopTime: %v\n", time.Now())
}

Run it and take a look at the results.

$ go build test_one.go 
$ ./test_one 
StartTime: 2013-10-28 19:37:11.014548615 -0700 PDT
StopTime: 2013-10-28 19:37:11.01744029 -0700 PDT
$

Yay! We have inserted exactly one row into the database.

Lets modify our program to insert a 100,000 rows using a simple for loop.

package main
 
import (
  "os"
  "log"
  "time"
  "fmt"
  "database/sql"
  _ "github.com/lib/pq"
)
 
func main() {
  // create the statement string
  var sStmt string = "insert into test (gopher_id, created) values ($1, $2)"
  var entries int = 100000
 
  // lazily open db (doesn't truly open until first request)
  db, err := sql.Open("postgres","host=localhost dbname=testdb sslmode=disable")
  if err != nil {
    log.Fatal(err)
  }
 
  fmt.Printf("StartTime: %v\n", time.Now())
  for i := ; i < entries; i++ {
 
    stmt, err := db.Prepare(sStmt)
    if err != nil {
      log.Fatal(err)
    }
 
    res, err := stmt.Exec(1, time.Now())
    if err != nil || res == nil {
      log.Fatal(err)
    }
 
    // close statement
    stmt.Close()
  }
 
  // close db
  db.Close()
 
  fmt.Printf("StopTime: %v\n", time.Now())
  os.Exit()
}

And the results after running it.

$ ./test_one 
StartTime: 2013-10-28 19:53:55.397246856 -0700 PDT
StopTime: 2013-10-28 19:54:30.471278797 -0700 PDT
$

Ok. It took 35 seconds to insert a simple timestamp 100,000 times.

Now here is where we want to mix things up a bit. Rob mentions this thing called a goroutine in his talk. What would happen if we used 10 goroutines to insert 10,000 records each. It should be faster right?

Here is my test_many.go application, copy it.

package main
 
// schema we can use along with some select statements
// create table test ( gopher_id int, created timestamp );
// select * from test order by created asc limit 1;
// select * from test order by created desc limit 1;
// select count(created) from test;
 
import (
  "log"
  "time"
  "fmt"
  "database/sql"
  _ "github.com/lib/pq"
)
 
const (
  gophers = 10
  entries = 10000
)
 
func main() {
 
  // create string to pass
  var sStmt string = "insert into test (gopher_id, created) values ($1, $2)"
 
  // run the insert function using 10 go routines
  for i := ; i < gophers; i++ {
    // spin up a gopher
    go gopher(i, sStmt)
  }
 
  // this is a simple way to keep a program open
  // the go program will close when a key is pressed
  var input string
  fmt.Scanln(&input)
}
 
func gopher(gopher_id int, sStmt string) {
 
  // lazily open db (doesn't truly open until first request)
  db, err := sql.Open("postgres","host=localhost dbname=testdb sslmode=disable")
  if err != nil {
    log.Fatal(err)
  }
 
  fmt.Printf("Gopher Id: %v || StartTime: %v\n",gopher_id, time.Now())
 
  for i := ; i < entries; i++ {
 
    stmt, err := db.Prepare(sStmt)
    if err != nil {
      log.Fatal(err)
    }
 
    res, err := stmt.Exec(gopher_id, time.Now())
    if err != nil || res == nil {
      log.Fatal(err)
    }
 
    stmt.Close()
 
  }
  // close db
  db.Close()
 
  fmt.Printf("Gopher Id: %v || StopTime: %v\n",gopher_id, time.Now())
 
}

Compile and run it. And here are the results.

  $ go build test_many.go
$ ./test_many 
Gopher Id:  || StartTime: 2013-10-28 20:33:22.756366002 -0700 PDT
Gopher Id: 1 || StartTime: 2013-10-28 20:33:22.756647643 -0700 PDT
Gopher Id: 2 || StartTime: 2013-10-28 20:33:22.756794146 -0700 PDT
Gopher Id: 3 || StartTime: 2013-10-28 20:33:22.756918631 -0700 PDT
Gopher Id: 4 || StartTime: 2013-10-28 20:33:22.757068214 -0700 PDT
Gopher Id: 5 || StartTime: 2013-10-28 20:33:22.757215735 -0700 PDT
Gopher Id: 6 || StartTime: 2013-10-28 20:33:22.757371258 -0700 PDT
Gopher Id: 7 || StartTime: 2013-10-28 20:33:22.757492378 -0700 PDT
Gopher Id: 8 || StartTime: 2013-10-28 20:33:22.757640434 -0700 PDT
Gopher Id: 9 || StartTime: 2013-10-28 20:33:22.758553529 -0700 PDT
Gopher Id: 5 || StopTime: 2013-10-28 20:33:31.466897188 -0700 PDT
Gopher Id: 7 || StopTime: 2013-10-28 20:33:31.473265074 -0700 PDT
Gopher Id:  || StopTime: 2013-10-28 20:33:31.475096903 -0700 PDT
Gopher Id: 4 || StopTime: 2013-10-28 20:33:31.485587389 -0700 PDT
Gopher Id: 6 || StopTime: 2013-10-28 20:33:31.500614304 -0700 PDT
Gopher Id: 3 || StopTime: 2013-10-28 20:33:31.5006651 -0700 PDT
Gopher Id: 8 || StopTime: 2013-10-28 20:33:31.501664917 -0700 PDT
Gopher Id: 1 || StopTime: 2013-10-28 20:33:31.510199788 -0700 PDT
Gopher Id: 2 || StopTime: 2013-10-28 20:33:31.522350833 -0700 PDT
Gopher Id: 9 || StopTime: 2013-10-28 20:33:31.523710669 -0700 PDT
$

Nice! Roughly 9 seconds for 100,000 records.

Lets change our test_many.go application to insert 1,000,000 records. Simply as a way to keep the application running longer.

...snip...
 
const (
  gophers = 10
  entries = 100000
)
 
...snip...

Now go ahead and build the application and run it.

Then in another terminal window, open up top and search for the test_many command.

PID    COMMAND      %CPU TIME     #TH   #WQ  #PORT #MREGS MEM    RPRVT  PURG   CMPRS  VPRVT  VSIZE  PGRP  PPID  STATE    UID       FAULTS    COW     MSGSENT    MSGRECV
89478  test_many    0.0  00:12.39 15        84    1448   140M   139M   0B     0B     358M   139G   89478 88263 sleeping 672251097 36630     95      110        51

_You will see that under the #TH column that our application, testmany, has 15 threads open. 10 of these, are the threads we created using our goroutine.

UPDATE: That statement was not true. I’ve learned after working with go that while using goroutines, threads are dynamically assigned. It is not a 1 to 1 correlation.

However, Rob does have a talk where he mentions some more details. At minute 8:56 he writes

“But if you think of it as a very cheap thread, you won’t be far off”

Anyway, this is the section of the test_many.go file where the magic happens.

...snip...
 
  for i := ; i < gophers; i++ {
    // spin up a gopher
    go gopher(i, sStmt)
  }
 
...snip...

You see, simply by placing the “go” call in front of our gopher function, we allow golang to do a bunch of behind the scenes awesomeness. A part of this gain, is an inexpensive and versatile thread and memory manager. This is pretty sweet.

As the last test, here is 20 gophers inserting a million rows into our database.

Change the needed constants.

...snip...
 
const (
  gophers = 20
  entries = 50000
)
 
...snip...

Then recompile and run our application.

$ ./test_many 
Gopher Id:  || StartTime: 2013-10-30 14:40:51.619452739 -0700 PDT
Gopher Id: 1 || StartTime: 2013-10-30 14:40:51.619757831 -0700 PDT
Gopher Id: 2 || StartTime: 2013-10-30 14:40:51.619896348 -0700 PDT
Gopher Id: 3 || StartTime: 2013-10-30 14:40:51.620059076 -0700 PDT
Gopher Id: 4 || StartTime: 2013-10-30 14:40:51.620283647 -0700 PDT
Gopher Id: 5 || StartTime: 2013-10-30 14:40:51.620410941 -0700 PDT
Gopher Id: 6 || StartTime: 2013-10-30 14:40:51.620550967 -0700 PDT
Gopher Id: 7 || StartTime: 2013-10-30 14:40:51.620693553 -0700 PDT
Gopher Id: 8 || StartTime: 2013-10-30 14:40:51.62085492 -0700 PDT
Gopher Id: 9 || StartTime: 2013-10-30 14:40:51.62177397 -0700 PDT
Gopher Id: 10 || StartTime: 2013-10-30 14:40:51.621900208 -0700 PDT
Gopher Id: 11 || StartTime: 2013-10-30 14:40:51.622068806 -0700 PDT
Gopher Id: 12 || StartTime: 2013-10-30 14:40:51.622233856 -0700 PDT
Gopher Id: 13 || StartTime: 2013-10-30 14:40:51.622374732 -0700 PDT
Gopher Id: 14 || StartTime: 2013-10-30 14:40:51.622520823 -0700 PDT
Gopher Id: 15 || StartTime: 2013-10-30 14:40:51.622663899 -0700 PDT
Gopher Id: 16 || StartTime: 2013-10-30 14:40:51.6228063 -0700 PDT
Gopher Id: 17 || StartTime: 2013-10-30 14:40:51.62294893 -0700 PDT
Gopher Id: 18 || StartTime: 2013-10-30 14:40:51.623085432 -0700 PDT
Gopher Id: 19 || StartTime: 2013-10-30 14:40:51.623231574 -0700 PDT
Gopher Id: 3 || StopTime: 2013-10-30 14:42:35.551828109 -0700 PDT
Gopher Id: 6 || StopTime: 2013-10-30 14:42:35.619763694 -0700 PDT
Gopher Id: 17 || StopTime: 2013-10-30 14:42:35.642630072 -0700 PDT
Gopher Id: 4 || StopTime: 2013-10-30 14:42:35.660139502 -0700 PDT
Gopher Id: 10 || StopTime: 2013-10-30 14:42:35.67527849 -0700 PDT
Gopher Id: 13 || StopTime: 2013-10-30 14:42:35.685175774 -0700 PDT
Gopher Id: 9 || StopTime: 2013-10-30 14:42:35.69144017 -0700 PDT
Gopher Id:  || StopTime: 2013-10-30 14:42:35.729137844 -0700 PDT
Gopher Id: 1 || StopTime: 2013-10-30 14:42:35.749108884 -0700 PDT
Gopher Id: 2 || StopTime: 2013-10-30 14:42:35.770752277 -0700 PDT
Gopher Id: 7 || StopTime: 2013-10-30 14:42:35.777471603 -0700 PDT
Gopher Id: 8 || StopTime: 2013-10-30 14:42:35.780713859 -0700 PDT
Gopher Id: 19 || StopTime: 2013-10-30 14:42:35.783338736 -0700 PDT
Gopher Id: 11 || StopTime: 2013-10-30 14:42:35.794922546 -0700 PDT
Gopher Id: 14 || StopTime: 2013-10-30 14:42:35.799124142 -0700 PDT
Gopher Id: 5 || StopTime: 2013-10-30 14:42:35.800538715 -0700 PDT
Gopher Id: 12 || StopTime: 2013-10-30 14:42:35.802006822 -0700 PDT
Gopher Id: 18 || StopTime: 2013-10-30 14:42:35.806211023 -0700 PDT
Gopher Id: 16 || StopTime: 2013-10-30 14:42:35.817042445 -0700 PDT
Gopher Id: 15 || StopTime: 2013-10-30 14:42:35.824304999 -0700 PDT

With a tiny golang program, we now have 1 million writes taking 1 minute and 44 seconds. Friggin sweet.

UPDATE2: Thanks to corvinusz for pointing this out. By simply moving our prepare request outside of our for loop. We can now insert 1M records in 44 seconds. Thanks dude!

...snip...
 
func gopher(gopher_id int, sStmt string) {
 
  // lazily open db (doesn't truly open until first request)
  db, err := sql.Open("postgres","host=localhost dbname=testdb sslmode=disable")
  if err != nil {
    log.Fatal(err)
  }
 
  stmt, err := db.Prepare(sStmt)
  if err != nil {
    log.Fatal(err)
  }
 
  fmt.Printf("Gopher Id: %v \t|| StartTime: %v\n",gopher_id, time.Now())
 
 
  for i := ; i < entries; i++ {
 
 
    res, err := stmt.Exec(gopher_id, time.Now())
    if err != nil || res == nil {
      log.Fatal(err)
    }
 
 
  }
 
  stmt.Close()
  // close db
  db.Close()
 
  fmt.Printf("Gopher Id: %v \t|| StopTime: %v\n",gopher_id, time.Now())
 
}
 
...snip...
$ ./test_many_pool 
Gopher Id: 8   || StartTime: 2013-10-31 08:52:45.56084829 -0700 PDT
Gopher Id: 6   || StartTime: 2013-10-31 08:52:45.561137992 -0700 PDT
Gopher Id: 12  || StartTime: 2013-10-31 08:52:45.561215818 -0700 PDT
Gopher Id: 4   || StartTime: 2013-10-31 08:52:45.561288387 -0700 PDT
Gopher Id: 7   || StartTime: 2013-10-31 08:52:45.56135501 -0700 PDT
Gopher Id: 15  || StartTime: 2013-10-31 08:52:45.56148435 -0700 PDT
Gopher Id: 2   || StartTime: 2013-10-31 08:52:45.562015828 -0700 PDT
Gopher Id: 17  || StartTime: 2013-10-31 08:52:45.562434107 -0700 PDT
Gopher Id: 13  || StartTime: 2013-10-31 08:52:45.562872507 -0700 PDT
Gopher Id: 19  || StartTime: 2013-10-31 08:52:45.564599961 -0700 PDT
Gopher Id:     || StartTime: 2013-10-31 08:52:45.567561345 -0700 PDT
Gopher Id: 1   || StartTime: 2013-10-31 08:52:45.568002729 -0700 PDT
Gopher Id: 3   || StartTime: 2013-10-31 08:52:45.568259325 -0700 PDT
Gopher Id: 10  || StartTime: 2013-10-31 08:52:45.569216209 -0700 PDT
Gopher Id: 9   || StartTime: 2013-10-31 08:52:45.569317516 -0700 PDT
Gopher Id: 11  || StartTime: 2013-10-31 08:52:45.569604124 -0700 PDT
Gopher Id: 14  || StartTime: 2013-10-31 08:52:45.572600335 -0700 PDT
Gopher Id: 16  || StartTime: 2013-10-31 08:52:45.574639425 -0700 PDT
Gopher Id: 18  || StartTime: 2013-10-31 08:52:45.576122518 -0700 PDT
Gopher Id: 5   || StartTime: 2013-10-31 08:52:45.577174648 -0700 PDT
Gopher Id: 5   || StopTime: 2013-10-31 08:53:28.156243751 -0700 PDT
Gopher Id: 14  || StopTime: 2013-10-31 08:53:28.240175919 -0700 PDT
Gopher Id: 9   || StopTime: 2013-10-31 08:53:28.247910193 -0700 PDT
Gopher Id: 1   || StopTime: 2013-10-31 08:53:28.313724365 -0700 PDT
Gopher Id: 10  || StopTime: 2013-10-31 08:53:28.372717602 -0700 PDT
Gopher Id: 16  || StopTime: 2013-10-31 08:53:28.389852733 -0700 PDT
Gopher Id: 15  || StopTime: 2013-10-31 08:53:28.434203629 -0700 PDT
Gopher Id: 19  || StopTime: 2013-10-31 08:53:28.470070067 -0700 PDT
Gopher Id: 18  || StopTime: 2013-10-31 08:53:28.486355611 -0700 PDT
Gopher Id: 4   || StopTime: 2013-10-31 08:53:28.505530435 -0700 PDT
Gopher Id: 8   || StopTime: 2013-10-31 08:53:28.507830911 -0700 PDT
Gopher Id: 3   || StopTime: 2013-10-31 08:53:28.520788665 -0700 PDT
Gopher Id: 11  || StopTime: 2013-10-31 08:53:28.545292303 -0700 PDT
Gopher Id: 13  || StopTime: 2013-10-31 08:53:28.585376452 -0700 PDT
Gopher Id:     || StopTime: 2013-10-31 08:53:28.587530913 -0700 PDT
Gopher Id: 7   || StopTime: 2013-10-31 08:53:28.598791481 -0700 PDT
Gopher Id: 2   || StopTime: 2013-10-31 08:53:28.605832957 -0700 PDT
Gopher Id: 12  || StopTime: 2013-10-31 08:53:28.61655187 -0700 PDT
Gopher Id: 17  || StopTime: 2013-10-31 08:53:28.616684023 -0700 PDT
Gopher Id: 6   || StopTime: 2013-10-31 08:53:28.618116313 -0700 PDT

Something to call out, is that we are shoving our data, basically, into a black hole. We have no idea if the inserts completed successfully. Yes there is some error logic, but not enough. We are using Go to have our application function in a parallel state, or as Rob might say.

“The concurrent composition of 10 gophers inserting data into a database”

 

Rob in his talk, goes on to mention channels. Which is the logical next step for message passing here. This would bring our application into a concurrent state and if people are interested, hit me up @jf0lkins and let me know I should post more concerning this.