Learn several techniques for quickly loading data into PostgreSQL from Ruby, and which of them hold up when you need to load terabytes of data.
Load data one row at a time using parameterized queries with the pg gem.
#!/usr/bin/env ruby
require 'pg'
require 'faker'
require 'date'
require 'csv'

# Return a random Time between `from` and `to`.
def time_rand(rt, from = 0.0, to = Time.now)
  Time.at(from + rt.rand * (to.to_f - from.to_f))
end

conn = PG.connect(
  dbname: "test",
  port: 5432,
  user: "creston",
  password: ENV["POSTGRES_DEV_PASSWORD"]
)

dir = "/tmp/"
counts = {}
starting_at = Time.now
puts "#{starting_at}: Starting"

table_name = "posts"
counts[table_name] = 100000

i = 1
rt = Random.new
puts "#{Time.now}: Starting #{table_name}"
while i <= counts[table_name]
  conn.exec_params(
    "insert into posts (id, title, content, published_at, type)
     values ($1, $2, $3, $4, $5);",
    [
      i,
      Faker::Book.title,
      Faker::Lorem.paragraphs(3).join(","),
      time_rand(rt, Time.local(2010, 1, 1)),
      Faker::Book.genre
    ])
  i += 1
  if i % 1000 == 0
    # Guard against dividing by zero during the first second of the run.
    elapsed = [Time.now.to_i - starting_at.to_i, 1].max
    puts "#{Time.now}: #{i} :: #{i / elapsed}"
  end
end
puts "#{Time.now}: Finished #{table_name}"
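A related lever we don't change in the script above is the commit rate: each single-row insert commits on its own, so grouping inserts into larger transactions usually speeds this pattern up. A minimal sketch, assuming the same `conn` as above; the `batches` helper and the batch size of 1,000 are our own choices, not from the video:

```ruby
# Group ids 1..total into commit batches of `size` (pure helper).
def batches(total, size)
  (1..total).each_slice(size).to_a
end

# Hypothetical usage with the `conn` from the script above -- each batch
# shares one transaction, so there is one commit per 1,000 rows instead
# of one per row:
#
# batches(counts[table_name], 1000).each do |batch|
#   conn.transaction do |c|
#     batch.each do |id|
#       c.exec_params(
#         "insert into posts (id, title, content, published_at, type)
#          values ($1, $2, $3, $4, $5);",
#         [id, Faker::Book.title, Faker::Lorem.paragraphs(3).join(","),
#          time_rand(rt, Time.local(2010, 1, 1)), Faker::Book.genre])
#     end
#   end
# end
```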
Load one row of data at a time without parameterized queries, interpolating the values directly into the SQL string. Only the lines that change from the script above are shown.
conn.exec "
  insert into posts (id, title, content, published_at, type)
  values (#{i},
          '#{Faker::Book.title.gsub("'", "''")}',
          '#{Faker::Lorem.paragraphs(3).join(",")}',
          '#{time_rand(rt, Time.local(2010, 1, 1))}',
          '#{Faker::Book.genre}');"
Load ten rows of data at a time with a nested loop; only the lines that change from the original script are shown. In the video, we tested up to 500 rows of data per insert statement.
i = 0
rt = Random.new
puts "#{Time.now}: Starting #{table_name}"
while i < counts[table_name]
  sql = "insert into posts (id, title, content, published_at, type) values "
  for j in (i + 1)..(i + 10)
    sql += "(#{j}, '#{Faker::Book.title.gsub("'", "''")}', '#{Faker::Lorem.paragraphs(3).join(",")}', '#{time_rand(rt, Time.local(2010, 1, 1))}', '#{Faker::Book.genre}'),"
  end
  conn.exec sql.chomp(",")
  i += 10
  if i % 1000 == 0
    elapsed = [Time.now.to_i - starting_at.to_i, 1].max
    puts "#{Time.now}: #{i} :: #{i / elapsed}"
  end
end
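The batched loop above gives up parameter binding in exchange for multi-row inserts. The two can be combined by generating numbered placeholders for each row of the VALUES clause; a sketch, where `placeholder_rows` is a hypothetical helper of our own, not something from the video:

```ruby
# Build a multi-row VALUES clause of numbered placeholders, e.g.
# placeholder_rows(2, 5) => "($1, $2, $3, $4, $5), ($6, $7, $8, $9, $10)"
def placeholder_rows(row_count, col_count)
  (0...row_count).map do |r|
    "(" + (1..col_count).map { |c| "$#{r * col_count + c}" }.join(", ") + ")"
  end.join(", ")
end

# Hypothetical usage with the `conn` from the script above, where `rows`
# is an array of ten five-element arrays:
#
# sql = "insert into posts (id, title, content, published_at, type) values " +
#       placeholder_rows(10, 5)
# conn.exec_params(sql, rows.flatten)
```

PostgreSQL caps a statement at 65,535 bind parameters, so even the 500-row batches tested in the video (2,500 parameters at five columns) stay well under the limit.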
Instead of loading data directly, generate a CSV file from Ruby; only the lines that change from the original script are shown. Then load that CSV file with PostgreSQL's COPY command.
CSV.open(dir + "#{table_name}.csv", "wb") do |csv|
  while i <= counts[table_name]
    csv << [i, Faker::Book.title, Faker::Lorem.paragraphs(3).join(","), time_rand(rt, Time.local(2010, 1, 1)), Faker::Book.genre]
    i += 1
    if i % 1000 == 0
      elapsed = [Time.now.to_i - starting_at.to_i, 1].max
      puts "#{Time.now}: #{i} :: #{i / elapsed}"
    end
  end
end
COPY posts (id, title, content, published_at, type)
FROM '/tmp/posts.csv'
CSV DELIMITER ',';
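One caveat with `COPY ... FROM '/tmp/posts.csv'` is that the server reads the file, so it must be on the database host and the role needs superuser (or `pg_read_server_files`) rights. COPY can also be driven from the client by streaming CSV lines over the wire with the pg gem's `copy_data`. A sketch, assuming the same `conn` and Faker-generated rows as in the scripts above; `copy_line` is just a thin wrapper we introduce around `CSV.generate_line`:

```ruby
require 'csv'

# Format one row as a CSV line suitable for COPY ... FROM STDIN CSV.
def copy_line(row)
  CSV.generate_line(row)
end

# Hypothetical usage with the `conn` from the scripts above -- the data
# never has to exist as a file on the database server:
#
# conn.copy_data("COPY posts (id, title, content, published_at, type) FROM STDIN CSV") do
#   rows.each { |row| conn.put_copy_data(copy_line(row)) }
# end
```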