
Making SUPERFAST THINGS in Ruby (Using C Extensions)

July 29th, 2010 by fugufish

I will address one of the primary uses for a C extension in Ruby: speed. By its very nature, Ruby is slow (compared to compiled languages like C). It gets the job done, but sometimes it takes its sweet time doing it. Sometimes it is necessary to speed things up a bit, and that is where C extensions come in. There are several ways of implementing extensions, from the generic C extension to ruby-inline. In this article I will focus on the generic C extension.

In this example, I am going to use a fairly inefficient piece of Ruby code I created a while ago for Project Euler (Problem 10) for finding the sum of all primes under 2000000:

class Integer
  
  def prime?
    return true if self == 2
    return false if (self & 1) == 0
    square = Math.sqrt(self).round + 1
    i = 1
    while i 
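
For reference, here is a minimal sketch of a trial-division prime? in the same spirit. The loop body, the guard for numbers below 2, and the sum_of_primes_below helper are assumptions made for illustration, not the original Project Euler code:

```ruby
class Integer
  # Trial division: test odd divisors up to the integer square root.
  def prime?
    return false if self < 2
    return true if self == 2
    return false if (self & 1) == 0
    limit = Integer.sqrt(self)
    i = 3
    while i <= limit
      return false if self % i == 0
      i += 2
    end
    true
  end
end

# Hypothetical helper: adds up every prime below the given limit.
def sum_of_primes_below(limit)
  (2...limit).select(&:prime?).sum
end

puts sum_of_primes_below(100)   # 1060
```

Calling sum_of_primes_below(2_000_000) gives the Problem 10 workload that the rest of this article tries to speed up.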

At the time I wrote this, I was relatively unaware of more efficient ways of finding prime numbers (such as a Euler sieve); however, the code still ran within the allotted two-minute window (52 seconds), so I went with it. Now to speed it up. To write a C extension you need, at a bare minimum, two things:

  1. an extconf.rb file - this file is used by Ruby to generate the Makefile that is used to compile the extension
  2. the source file for the extension (in this case primed.c)

Here is a look at these two files for my new version of problem 10:
primed.c

#include "ruby.h"
#include <math.h>
#include <stdint.h>
 
VALUE Primed;

VALUE method_prime(VALUE obj, VALUE args)
{
	register uint64_t n;
	n = NUM2INT(obj);
	if (n == 2)
		return Qtrue;
	if ((n & 1) == 0)
		return Qfalse;

	register uint64_t sqrt_n = ((uint64_t)sqrt(n)) + 1;
	register uint64_t i=3;
	for (i; i

extconf.rb

# Loads mkmf which is used to make makefiles for Ruby extensions
require 'mkmf'

# Give it a name
extension_name = 'primed'

# The destination
dir_config(extension_name)

# Do the work
create_makefile(extension_name)

First let me explain primed.c. The objective of this extension is to determine whether or not a number is prime, so that an integer can call x.prime? and get back true or false. It is essentially identical to the method used in the pure Ruby script above. One of the first things you may notice is this line:

VALUE Primed

VALUE is a data type defined by Ruby that represents a Ruby object in C. It is basically a pointer-sized handle that refers to the struct holding the data related to the object. In this case, the object will represent the "Primed" module in Ruby, so it will refer to data about the instance methods, variables, etc. for that module. All Ruby objects are represented in C by VALUE, regardless of their type within the Ruby VM; passing anything else will likely result in a segfault.

Next we define the actual method to calculate whether the value is prime. Note that because we need to return a Ruby object, we set the return type as VALUE as well. Qtrue and Qfalse directly represent true and false in Ruby, and they also behave correctly within C (Qtrue evaluates as true, Qfalse evaluates as false).

Finally we see the Init_primed function. When Ruby loads an extension (for example through require), it calls Init_name, where name is the name of the extension. It is here that we actually create the Primed module and bind the method_prime function to the Ruby method prime?. Both functions used are pretty self-explanatory, except for the last argument to rb_define_method, which is essentially the arity, or number of arguments to expect in the Ruby method. In this case, -2 makes Ruby pass self as the first argument to the method_prime function, and a Ruby array of any other arguments as the second.

Now we have all of our code. The last thing to put in place is extconf.rb:

# Loads mkmf which is used to make makefiles for Ruby extensions
require 'mkmf'

# Give it a name
extension_name = 'primed'

# The destination
dir_config(extension_name)

# Do the work
create_makefile(extension_name)

Pretty simple, right? Now when you run ruby extconf.rb it will generate a Makefile that you can use to build the extension. And the final result? Using the C extension, the code runs in just under 3 seconds. Still not really efficient, but it demonstrates the point. When Ruby's speed is the bottleneck, using C is a viable and easy option.

Don’t Call it “Case Equality”

July 30th, 2009 by Brett Rasmussen

I’ve recently learned to love Ruby’s “triple equals” operator, sometimes referred to as the “case equality operator”. But I stand with Hal Fulton, author of The Ruby Way, in disliking the latter term, since there’s no real equality going on with its usage. It’s also not really an operator–it’s a method–but I’m not going to complain too loudly about that one, considering that I prefer the term “relationship operator”. I’m also not opposed to “trequals”, which has a certain jeunesse doree about it. You could say “trequals” at a trendy restaurant with post-modern decor and everyone wearing black.

With one equals sign you assign a value to a variable:

composer = "Beethoven"

With two equals signs you see if two things are the same thing:

puts "9th Symphony" if melody == "Ode to Joy"

With three equals signs you get, well, essentially a placeholder: a way to define arbitrary relationships between objects. You will mostly never call it by hand yourself, but Ruby will call it for you when you run case statements:

class Composer
  attr_accessor :works
  def initialize(*works)
    @works = works
  end

  def ===(work)
    @works.include?(work)
  end
end

The trequals operator (ok, method) returns true or false depending on a condition I’ve defined. Now I can test a given work against a bunch of composer objects using a case statement:

beethoven = Composer.new("Fur Elise", "Missa Solemnis", "9th Symphony")
mozart = Composer.new("The Magic Flute", "C Minor Mass", "Requiem")
bach = Composer.new("St. Matthew Passion", "Jesu, Joy of Man's Desiring")

case "Requiem"
  when beethoven
    process_beethoven_work
  when mozart
    process_mozart_work
  when bach
    process_bach_work
end

The trequals is called behind the scenes by Ruby. Since I’ve defined it on the Composer class to look for a matching entry in that composer’s list of works, the case statement becomes a way of running different code based on which composer wrote the work in question.

This example is contrived, of course, because if the need were this simple you’d probably just check “some_composer.works.include?('Requiem')” by hand. But the example demonstrates the crucial point: there’s no equality being checked for. A work in no way is the composer. It’s a relationship that the case statement is checking for–the given work was written by the given composer–and it’s a relationship that I’ve defined explicitly for my own music-categorizing purposes.

That case statements work this way is yet another example of the magical and powerful stuff that characterizes Ruby. Instead of simply a strict equality match, we can now switch against multiple types, all with different definitions of what qualifies as a relationship:

class String
  def ===(other_str)
    self.strip[0, other_str.length].downcase == other_str.downcase
  end
end

class Array
  def ===(str)
    self.any? {|elem| elem.include?(str)}
  end
end

class Integer # this class was called Fixnum before Ruby 2.4
  def ===(str)
    self == str.to_i
  end
end

string_to_test = "99 Monkeys"
case string_to_test
  when "99 monkeys jumping on the bed"
    do_monkey_stuff
  when ["77 Rhinos Jumped", "88 Giraffes Danced", "99 Monkeys Sang"]
    do_animal_behavior_stuff
  when 99
    do_quantity_stuff
  when /^\d+\s+\w+/
     do_regex_stuff
end

Here, if the string to be tested is the first portion of the larger string (case-insensitively speaking), if it is part of any of the elements in the specified array, if it starts out with 99 (String#to_i reads only the leading integer), or if it matches the given regular expression, the respective code will be run. In this case, it matches all of them, so only the code for the first case–the string match–will be run (in Ruby, case statements automatically stop at the first match, so you don’t need to give each case its own “break” line).

Note that I didn’t need to define (actually, override) the trequals on the regular expression. The relationship operator is a method on Object, so all Ruby objects inherit it. If not overridden, it defaults to a simple double-equals equality check (thus contributing to the momentum of the misnomer “case equality”). But some standard Ruby classes already come with their own definition for trequals. Regexp and Range are the notable examples: Regexp defines it to mean a match on that regular expression, and Range defines it to mean a number that falls somewhere within that range, as such:

num = 77
case num
  when 1..50
    puts "found a lower number"
  when 51..100
    puts "found a higher number"
end
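
To see these built-in definitions directly, === can be called by hand. This is a small sketch with arbitrary sample values; the variable names are only for illustration:

```ruby
# Range#=== tests whether the value falls within the range.
in_low_range  = (1..50) === 77      # false
in_high_range = (51..100) === 77    # true

# Regexp#=== tests for a match and returns true or false.
looks_like_count = /^\d+\s+\w+/ === "99 Monkeys"   # true

# Class#=== (inherited from Module) tests whether the argument is an instance.
is_integer = Integer === 77         # true

# Integer#=== is plain numeric equality, so swapping the sides changes the answer.
swapped = 77 === Integer            # false

puts in_low_range, in_high_range, looks_like_count, is_integer, swapped
```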

Note that since === is really a method, it is not commutative, meaning you can’t swap sides on the call; “a === b” is not the same as “b === a”. If you think through it, it makes sense. You’re really calling “a.===(b)”. If a is an array, you’re calling a method on Array, which will be defined for Array’s own purposes. If b is a string, and you swapped the order, you’d be calling a String method, which would have a different purpose for its trequals operator, so “b.===(a)” would most likely be something quite different. This concept also means that the variable you’re testing in a case statement is being passed as a parameter to the trequals methods of the various case objects, not the other way around. These two snippets are equivalent:

case "St. Matthew Passion"
  when mozart
    process_mozart_work
end

process_mozart_work if mozart === "St. Matthew Passion"

Note that the second snippet was not

process_mozart_work if "St. Matthew Passion" === mozart

It’s also good (although I’m not sure how useful) to know that the relationship operator is used implicitly by Ruby when rescuing errors in a begin-rescue block.

begin
  do_some_stuff
rescue ArgumentError, SyntaxError
  handle_arg_or_syn_error
rescue IOError
  handle_io_error
rescue NoMemoryError
  handle_mem_error
end

In this example, Ruby runs ArgumentError.===, passing it the global variable $!, which holds the most recent error. If that returns false, it moves along, doing the same with SyntaxError, IOError, and NoMemoryError, each in turn. For exception classes, the trequals is the one every class inherits from Module: it returns true when the error that occurred is an instance of the candidate class (in this case, ArgumentError, etc.) or of one of its subclasses.
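
Because rescue only asks each listed class or module whether it === the error, a module with its own trequals can act as a custom matcher. Here is a minimal sketch; TimeoutLike, classify_failure, and the error messages are hypothetical names chosen for illustration:

```ruby
# Hypothetical matcher: "matches" any error whose message mentions a timeout.
module TimeoutLike
  def self.===(error)
    error.message.include?("timeout")
  end
end

# Runs the block and reports which rescue clause, if any, handled the error.
def classify_failure
  yield
  :ok
rescue TimeoutLike
  :retry_later
rescue StandardError
  :give_up
end

puts classify_failure { raise IOError, "connection timeout" }   # retry_later
puts classify_failure { raise IOError, "disk full" }            # give_up
```

Both errors are IOErrors, so it is the relationship defined in TimeoutLike.===, not the error's class, that decides which branch runs.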

It took me a long time before I cared about this little Ruby feature, which I think is sad. I think I just saw the phrase “case equality” and thought something like “Hmm, another subtle variation on what it means for two objects to be equal. I’m sure I’ll have occasion to use this someday. I’ll figure it out then.” But it’s more useful than that, and I think it would get better traction without the specious nomenclature.

Ruby file trimming app

July 17th, 2009 by hals

We recently had an interesting experience with very large files. These were comma delimited files (.csv) containing hundreds of thousands of records, each with a dozen or so fields.

e.g.

rec1,field2,,,,,,xxx,fieldn,,,1,2,3,,,fieldx
rec2,field22,,,a,s,d,fieldmore,,,,etc
.
.
.
recn,field2n,,,,ring,,,,ring,1,2,,,hello?,,etc

While testing the setup, we had smaller files to work with. The goal was to create a new file containing only the first field from each record.

e.g.

rec1
rec2
.
.
.
recn

During testing this was easily done by opening the file in a spreadsheet program (such as OpenOffice), which would split the records on the comma delimiter and place each field in a different column. Then, it was easy to select the first column and write it out to the new file.

On switching to production files, we discovered that OpenOffice has a limit of 65k rows – a fraction of what we needed. We then tried some other spreadsheet programs, which produced the same results. We knew there was at least one spreadsheet program that would work, but it was not open source.

At this point the comment was made: “well, we ARE ruby developers …”

And that led to the following simple solution to the problem at hand.

With a few lines of ruby code, the source files could be read in, line by line, split on the comma delimiter, and the first entry written out to the destination file.

So, when the usual tools just don’t work – remember that a new ruby tool might be just around the corner.

#!/usr/bin/ruby
#
#  trimfile.rb
#
# Writes the first comma-delimited field of each input line to a new file.

class Trimfile
  attr_accessor :fileName, :newFile

  def initialize(fileName, newFile)
    puts "\nSplit off first comma delimited item of each line."

    # Fall back to default file names when no arguments are given
    @fnam = fileName || "trimin.txt"
    @newfnam = newFile || "trimout.txt"
    linecount = 0

    puts "\nFilenames - input: #{@fnam}, output: #{@newfnam}"

    # The block form of File.open closes the output file automatically
    File.open(@newfnam, "w") do |aFile|
      # IO.foreach reads one line at a time, so large files are not loaded whole
      IO.foreach(@fnam) do |line|
        aFile.puts line.split(',')[0]
        linecount += 1
      end
    end

    puts "\nTotal lines: #{linecount}"
  end
end

test = Trimfile.new(ARGV[0], ARGV[1])


Asynchronous Processing with Workling and Starling

June 18th, 2009 by fugufish

When working with applications whose actions may take some time to complete, it may be better to handle the request asynchronously. A quick and easy way to do this is with Starling and Workling. Starling is a lightweight message queue based on the Memcache protocol, and Workling is a simple, lightweight consumer. Setup is dead simple:

First, install Starling:

 sudo gem install starling 

This will install Starling and its dependencies (memcache-client and eventmachine) if you don’t already have them.

Now install Workling. This doesn’t have a gemspec so we will install it as a plugin:

cd ~/path_to_your_project
script/plugin install git://github.com/purzelrakete/workling.git

Finally, tell Workling, which will want to use Spawn by default if it is installed on your machine, to use Starling by placing this in your environment.rb:

Workling::Remote.dispatcher = Workling::Remote::Runners::StarlingRunner.new

That is it for the installation process! Easy. Now for actually handling requests. Believe it or not, it is just as simple as the installation. Say you have a controller that has to do several long running tasks:

class SkinnyController 

Typically, you should avoid doing things in a request that take longer than a few seconds to complete, and most applications can manage that. In some cases, however, it is inevitable that a few tasks will take much longer, such as above. That is where Workling comes in. Simply refactor the code into a worker (conveniently located in app/workers):

# app/workers/fat_worker.rb
class FatWorker 

Now, in your controller, call the worker:

class SkinnyController 

Just start up Starling and Workling (starling start and script/workling_client start, respectively), and that is all. You can now handle large tasks asynchronously. Because the tasks are queued with Starling, the action can be called multiple times; each call queues up the worker, which is processed as soon as the previous tasks are complete.
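
To make the queueing behavior concrete without installing anything, here is a minimal in-process sketch. It is not the Workling or Starling API: InProcessWorker is a hypothetical class, and Ruby's built-in Queue and a background thread stand in for the message queue and the worker process:

```ruby
require 'thread' # a no-op on modern Ruby, where Queue is always available

# Hypothetical stand-in for a queue-backed worker; not the Workling API.
class InProcessWorker
  attr_reader :completed

  def initialize
    @jobs = Queue.new
    @completed = []
    @thread = Thread.new do
      # Jobs run one at a time, in the order they were queued.
      # A nil entry is the signal to stop.
      while (job = @jobs.pop)
        @completed << job.call
      end
    end
  end

  # Returns immediately; the work happens on the background thread.
  def enqueue(&job)
    @jobs << job
    self
  end

  # Lets the queued jobs finish, then stops the background thread.
  def shutdown
    @jobs << nil
    @thread.join
  end
end

worker = InProcessWorker.new
worker.enqueue { sleep 0.05; :first_task_done }
worker.enqueue { :second_task_done }
worker.shutdown
p worker.completed   # [:first_task_done, :second_task_done]
```

The caller gets control back as soon as enqueue returns, and the second task waits for the first to finish. That is the same shape as a controller action handing work to a worker, only within a single process.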

The Scan() Method for Regular Expressions

June 16th, 2009 by Chris Gunnels

As I was writing a simple script to show off what I learned from reading chapters 2 and 3 of The Ruby Way, 2nd Edition, I found a really neat method that helped me complete my task. If you didn’t read the title of this blog post then you’re out of luck, but if you did, you will know that I am talking about the scan() method.

Back to my script: since I wanted to find the number of times white space showed up in a given string, I had to come up with a way to count white space. My first thought was to do some regular expression matching. Well, after a little thought and a lot of reading, I found this:
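
A minimal sketch of that idea, assuming the goal is simply to count whitespace characters. count_whitespace is a hypothetical helper name, not code from the original script:

```ruby
# String#scan returns an array of every match, so its length is the count.
def count_whitespace(text)
  text.scan(/\s/).length
end

puts count_whitespace("The Ruby Way, 2nd Edition")   # 4
```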

Setting PowerDNS To Ignore Records For Downed Web Sites

June 10th, 2009 by Aaron Murphy

If you are using PowerDNS for round robin on multiple websites, you can set it up so that it will return only the records for sites that are up. I set up a Ruby daemon to monitor sites and connect to the PowerDNS MySQL table used by PowerDNS on Rails. You can use any language or system you like. It just needs to be able to access your PowerDNS database.

Regular Expression Lookahead

June 10th, 2009 by Aaron Murphy

Using standard regular expressions is pretty easy for most tasks. However, there is one task that requires lookaheads. I am referring to using negative lookahead to check for strings that do not follow a desired match. The syntax is

(?!someregexp)

where “someregexp” is a regular expression to match. The negative lookahead will reverse the logic for you.
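
As a quick illustration in Ruby, here is a sketch with made-up sample strings ("foo" and "bar" are just placeholders):

```ruby
# Matches "foo" only when it is NOT immediately followed by "bar".
not_followed_by_bar = /foo(?!bar)/

puts not_followed_by_bar.match?("foobaz")   # true
puts not_followed_by_bar.match?("foobar")   # false

# The lookahead consumes no characters: only "foo" is part of the match.
puts "foobaz"[not_followed_by_bar]          # foo
```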

Why Test?

June 10th, 2009 by fugufish

I come to PMA from a large corporation over 40,000 employees strong, including an entire army of QA Engineers testing every change and release we made, something we took full advantage of. At first, I was of the opinion (like many BDD converts) that the process of defining and testing the code before actually writing it would slow me down. As I moved more and more to BDD, however, I found that I was completing tasks faster. The time saved comes from the ability to define how you expect your application to work. By doing this you will find that your actual code requires much less debugging. Things just seem to work. It continually surprises me that, using BDD, things just work, so instead of spending hours looking for a mistyped association, I can spend those hours actually coding.

Even with an army of QA Engineers, some bugs will sneak through. QA time on untested code takes longer, and debugging the code even longer than that. The release process can go from a day to several days, or even weeks.

With BDD, when new features are added to the application, it is as easy as running your spec or test suite to ensure that the original functionality is undamaged. It of course seems like a no-brainer to me now. It’s like looking back and remembering when you thought the world was flat, and seeing how narrow-minded you were. The moral of the story? Test before you code! You will find yourself with more time and fewer headaches.

Smarter Sequencing in Factory Girl

May 29th, 2009 by Brett Rasmussen

Hal Shearer and I monkey-patched Factory Girl’s sequencing capabilities to allow for pre-defined enumerations to loop through, instead of just infinitely incrementing numbers.

So instead of doing this:

  Factory.sequence :email do |n|
    "[email protected]"
  end

you could do something like this:

  Factory.sequence(:email, ['angela', 'brett', 'alec']) do |name|
    "[email protected]"
  end

It will start over at the beginning when it’s gone through all of them:

>> Factory.next :email
=> "[email protected]"
>> Factory.next :email
=> "[email protected]"
>> Factory.next :email
=> "[email protected]"
>> Factory.next :email
=> "[email protected]"
>> Factory.next :email
=> "[email protected]"

You can also hand it a range (the internal implementation on this is none too efficient, so don’t give it billions at a time):

Factory.sequence(:email, 50..60) do |n|
  "[email protected]"
end

>> Factory.next :email
=> "[email protected]"
>> Factory.next :email
=> "[email protected]"
>> Factory.next :email
=> "[email protected]"

The infinitely incrementing counter is still available if you want it:

Factory.sequence(:email, %w[angela brett alec]) do |name,i|
  "[email protected]"
end

>> Factory.next :email
=> "[email protected]"
>> Factory.next :email
=> "[email protected]"
>> Factory.next :email
=> "[email protected]"
>> Factory.next :email
=> "[email protected]"
>> Factory.next :email
=> "[email protected]"

This sort of thing is useful when you want two different factories to use the same sequence and have some overlap between the two groups. For example, we need a bunch of email addresses to test on, many of which share the same domain:

Factory.sequence(:name, %w[angela brett alec hal debbie tracey jared]) do |name,i|
  "#{name}_#{i}"
end

Factory.sequence(:domain, %w[something.com example.com mydomain.com]) do |domain|
  domain
end

Factory.define(:email_address) do |f|
  f.address { "#{Factory.next(:name)}@#{Factory.next(:domain)}" }
end

>> 20.times { ea = Factory.build :email_address; puts ea.address }
angela_0@something.com
brett_1@example.com
alec_2@mydomain.com
hal_3@something.com
debbie_4@example.com
tracey_5@mydomain.com
jared_6@something.com
angela_7@example.com
brett_8@mydomain.com
alec_9@something.com
hal_10@example.com
debbie_11@mydomain.com
tracey_12@something.com
jared_13@example.com
angela_14@mydomain.com
brett_15@something.com
alec_16@example.com
hal_17@mydomain.com
debbie_18@something.com
tracey_19@example.com

For our last trick, the reset method returns both the looping index and the infinite counter back to zero:

>> Factory.reset :name
>> Factory.next :name
=> "angela_0"

Here’s the code to make it happen:

class Factory
  def self.sequence(sequence_name, enum = nil, &blk)
    @@sequences ||= {}

    enum = enum.to_a

    @@sequences[sequence_name] = {
      :enum => enum,
      :index => 0,
      :infinite_counter => 0,
      :template  => blk
    }
  end

  def self.next(sequence_name)
    seq = @@sequences[sequence_name]

    retval = case seq[:template].arity
      when 1
        seq[:template].call(seq[:enum][seq[:index]])
      when 2
        seq[:template].call(seq[:enum][seq[:index]], seq[:infinite_counter])
    end

    seq[:index] = (seq[:index]+1 == seq[:enum].size) ? 0 : seq[:index]+1
    seq[:infinite_counter] += 1
    @@sequences[sequence_name] = seq
    retval
  end

  def self.reset(sequence_name)
    @@sequences[sequence_name][:index] = 0
    @@sequences[sequence_name][:infinite_counter] = 0
  end
end

Just put that into some file–perhaps in your rails lib directory–and make sure that file gets required–probably in your rails config/environment.rb. When doing it by hand like this, you’ll want to make sure your library file is loaded after the factory_girl gem is loaded, or you’ll get weirdness like methods you’ve overridden acting in non-overridden ways and the like; config.after_initialize in your environment.rb’s Rails::Initializer block is your friend.

You can also now use the gem BrettRasmussen-factory_girl from gems.github.com. I mean to submit it as a patch back to the original factory_girl, which I’m sure I’ll have time to do Any Day Now.
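
If you just want to play with the looping behavior without loading factory_girl at all, here is a standalone sketch of the same idea. CyclingSequence is a hypothetical class name and is not part of the patch or the gem; it keeps a wrapping index and an ever-increasing counter, as described above:

```ruby
# Hypothetical standalone class; not part of factory_girl.
class CyclingSequence
  def initialize(values, &template)
    @values = values.to_a
    @template = template
    reset
  end

  # Returns the next generated value, wrapping around at the end of the list.
  def next
    value = @values[@index]
    result = @template.call(value, @counter)
    @index = (@index + 1) % @values.size
    @counter += 1
    result
  end

  # Puts both the looping index and the ever-increasing counter back to zero.
  def reset
    @index = 0
    @counter = 0
  end
end

names = CyclingSequence.new(%w[angela brett alec]) { |name, i| "#{name}_#{i}" }
4.times { puts names.next }   # angela_0, brett_1, alec_2, angela_3
names.reset
puts names.next               # angela_0
```

A block that only takes one parameter works too, since Ruby blocks quietly ignore extra arguments.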

Chuck’s Ruby Indexer

May 22nd, 2009 by Chuck Wood

I was messing around with ruby a little and decided to write a little indexer that would tell me what the most common words were in my files. It’s really kind of a dumb program, but it was interesting what it turned out when I ran it against my code. The most common word was frequently the word ‘the.’ ‘the’ is not a commonly used variable or function in Ruby. So I looked at my code and realized that it was heavily commented, yielding frequent ‘the’s.

That being said, I’m curious to see what other people find running the indexer against their code. You can get it at http://github.com/woody2shoes/indexer/tree/master. Please comment and let me know what your code looks like.
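
For a rough idea of what such an indexer does, here is a minimal sketch. It is not the code from the repository above; most_common_words and the sample text are made up for illustration:

```ruby
# Counts how often each word appears, ignoring case, most frequent first.
# Ties are broken alphabetically so the result is stable.
def most_common_words(text, limit = 3)
  counts = Hash.new(0)
  text.downcase.scan(/[a-z_]+/) { |word| counts[word] += 1 }
  counts.sort_by { |word, count| [-count, word] }.first(limit)
end

source = <<~SAMPLE
  # Open the file and read the lines
  # Split the line on the comma
  lines.each { |line| puts line.split(",")[0] }
SAMPLE

most_common_words(source).each { |word, count| puts "#{word}: #{count}" }
```

Even in this tiny sample the comments push 'the' to the top of the list.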

Copyright © 2005-2016 PMA Media Group. All Rights Reserved.