Unique Marketing, Guaranteed Results.

Ruby file trimming app

July 17th, 2009 by hals

We recently had an interesting experience with very large files. These were comma delimited files (.csv) containing hundreds of thousands of records, each with a dozen or so fields.

e.g.

rec1,field2,,,,,,xxx,fieldn,,,1,2,3,,,fieldx

rec2,field22,,,a,s,d,fieldmore,,,,etc

.

.

.

recn,field2n,,,,ring,,,,ring,1,2,,,hello?,,etc

While testing the setup, we had smaller files to work with. The goal was to create a new file containing only the first field from each record.

e.g.

rec1

rec2

.

.

.

recn

During testing this was easily done by opening the file in a spreadsheet program (such as OpenOffice), which would split the records on the comma delimiter and place each field in a different column. Then, it was easy to select the first column and write it out to the new file.

On switching to production files, we discovered that OpenOffice has a limit of 65k rows – a fraction of what we needed. We then tried some other spreadsheet programs, which produced the same results. We knew there was at least one spreadsheet program that would work, but it was not open source.

At this point the comment was made: “well, we ARE ruby developers …”

And that lead to the following simple solution to the problem at hand.

With a few lines of ruby code, the source files could be read in, line by line, split on the comma delimiter, and the first entry written out to the destination file.

So, when the usual tools just don’t work – remember that a new ruby tool might be just around the corner.

#!/usr/bin/ruby

#

#  trimfile.rb

#

require “rubygems”

require “ruby-debug”

class Trimfile

attr_accessor :fileName, :newFile

def initialize(fileName, newFile)

puts “\nSplit off first comma delimited item of each line.”

@fnam = fileName

if @fnam == nil then @fnam = “trimin.txt” end

@newfnam = newFile

if @newfnam == nil then @newfnam = “trimout.txt” end

linecount = 0

puts “\nFilenames – input: #{@fnam}, output: #{@newfnam}”

aFile = File.new(@newfnam, “w”)

IO.foreach(@fnam) do |line|

aFile.puts line.split(‘,’)[0]

linecount += 1

end

aFile.close

puts “\nTotal lines: #{linecount}”

end

end

test = Trimfile.new(ARGV[0], ARGV[1])


Be Sociable, Share!
Filed under: Programming,Tools — Tags: — hals @ 3:26 pm on July 17, 2009

2 Comments

  1. Was the ‘cut’ command not able to handle the file?

    Comment by Martin — July 14, 2010 @ 7:13 pm

  2. Cut would’ve worked great, but we wanted a solution in Ruby, just because.. :)

    Comment by Narshlob — August 27, 2010 @ 9:12 am

RSS feed for comments on this post. TrackBack URL

Leave a comment

Copyright © 2005-2014 PMA Media Group. All Rights Reserved