|
|
Using the Interpreter Pattern for Run-time QueriesBy Kyle Brown, Knowledge Systems Corporation |
|
It used to be that we had two ways to build systems: the quick-and-dirty way using something like Visual Basic and the long-and-hard way using a more robust tool like C++ or Smalltalk. The difference was that the departmental tools had programming tools like wizards and point-and-click GUI editors that made programming easier in the small scale. On the other hand, OO languages had the underlying support for inheritance, polymorphism and encapsulation that made programming in the large easier. With the introduction of powerful new programming tools like VisualAge for Smalltalk, this distinction becomes less clear. Now we have a tool that combines the point-and-click visual programming model with the full power and flexibility of Smalltalk. It gives programmers the productivity of the departmental-scale tools and the ability to model complex problem domains that makes Smalltalk such a wonderful tool. Even though VisualAge for Smalltalk is such a powerful tool, though, I've found that many programmers don't take full advantage of its capabilities. For instance, let's consider a simple class of applications that involve querying from a database and displaying the results on the screen. Many programmers will simply draw a screen using the VisualAge composition editor, drop a Database part on the canvas, connect the Database part to a Table part and declare the application done. While this approach is very quick and easy, it doesn't take full advantage of the modeling capabilities of Smalltalk. In a recent article in The Smalltalk Report (Brown (1996a)), I discussed how applications can be built more flexibly if they use a four-layer approach that separates the different concerns in an application. It advocated that you create distinct sets of objects representing the UI, the problem domain, and the database infrastructure. It also advocated that communication between these sets of objects be closely controlled to minimize the effect that changes in each layer have on the others. While this approach does result in more flexible applications, it does have a few ramifications that you have to consider. Several of these have been addressed in the "Crossing Chasms" pattern language described in Brown (1996b). One ramification that doesn't have an obvious solution is how to do user-defined queries on a collection of Smalltalk objects. When you are working directly from a database, a user-defined query is not very different from a predefined query. SQL is still SQL whether the user types it in, or if it is part of the code of an application. In Smalltalk, however, there is a big difference. Let's look at an example of what I mean. Let's say we're dealing with a health insurance application that maintains a list of patients. Our patient is defined in Smalltalk by the following class definition: Object subclass: #Patient
instanceVariableNames: 'name streetAddress city state
zip sex dateOfBirth policy '
classVariableNames: ''
poolDictionaries: ''
Oftentimes you have a situation where you need to get back a list of patients whose attributes match a specific value. For example, you may want to find all of the patients who live in Cleveland. This is easy to do in Smalltalk, with the select: method of Collection. aCollectionOfPatients select: [:each |
each city = 'Cleveland'].
Even more complex queries are easy for example, the following code gives us back all of the patients in the list who live in Ohio, but not in Cleveland. aCollectionOfPatients select:[:each | (each state = 'OH')
and:[each city ~= 'Cleveland']]
This is all well and good so long as you know the structure of your queries ahead of time. But then the question arises what if you DON'T know the particular query you want to run ahead of time? What if you won't know until runtime how the query will be constructed? Over the years I've been programming in Smalltalk, I've seen two general solutions to this problem. Neither of them is perfect, and they each have their advantages and disadvantages. Nonetheless, one particular solution had more in its favor than the other, as you will see. The first solution that I saw to this problem was more popular in the early years of Smalltalk than it is now. In the early years of Smalltalk everything was new and pretty much everything was fair game. Many of these "early adopters" of the technology were in corporate R&D labs or universities that didn't have the commercial focus that today's Smalltalk programmers have. A result of this attitude was that solutions didn't have to be economically viable to promote themselves. When early Smalltalk adopters faced this problem, their first response was: "Well, you have the Compiler written in Smalltalk already, right? So why not just generate the appropriate Smalltalk code to build your query and then execute it?" Actually, this is a very simple and easy to implement solution. It also has (not surprisingly) pretty good runtime performance. However, there is one major drawback to this solution. Compiling code at runtime requires that the Compiler classes be included with the file shipped to the customer. In VisualAge for Smalltalk, the packager will automatically strip out the compiler classes. To make this approach work you would have to purchase a full VisualAge license for each customer of your finished application. This is obviously not a very economical solution. Also, since your end-users would be, in effect, doing development, source-code management becomes a major issue. You have to decide if you allow the changes file to grow, or if you will use a trick like dynamic subclassing to keep that from happening. These two problems together make this first option useless for most applications. There were other solutions that I saw to this problem. Usually they would involve nested ifTrue:ifFalse: cases to try to anticipate all of the various possibilities of an end-user query. However, none of them really solved the problem in a clean and extensible fashion. Then, one day I had an epiphany. I was reading the book "Design Patterns" (Gamma (1995)) and came across the pattern Interpreter. Honestly, the write up of the pattern in Gamma (1995) didn't help me much the first time I read it. However, as I mulled it over in my mind it led me to a simple and elegant solution to my problem. In Gamma (1995), the intent of the Intepreter pattern is stated to be: "Given a language, define a representation for its grammar along with an interpreter that uses the representation to interpret sentences in that language." This is not very meaningful unless you're used to building compilers for a living. What my epiphany consisted of was the realization that what we wanted to give users was, in fact, a small query language. It's almost exactly like the WHERE clause of a SQL statement we needed a way to specify what objects to select from a collection. Once I saw the light of building a query language, implementing the code to execute the statements in that language was fairly simple it just required an understanding of how the Interpreter pattern worked. My epiphany took the form of two observations.
To address the first problem, we needed to be able to easily obtain the value of any attribute of an object. This is straightforward if that object's class implements getter methods for that particular attribute. If we know the name of the getter method, we can find the value of that attribute by using the method perform: to execute the method. For example: aPatient perform: #name will return the name of a patient. Since the argument to perform: is a Symbol we don't need to know it ahead of time we can simply provide it when it is needed. This brings us to the simplest part of our query language equality tests. Let's say in our language we want to interpret sentences of the form: (aspectName) = (someValue), i.e.
city = 'Cleveland'
Using the Intepreter pattern we can implement the following class hierarchy to deal with sentences like this.
Figure 1 The class AbstractNode defines a method called evaluateAgainst: that all of its concrete subclasses must override. This method is the key to the Interpreter pattern. The basic idea is that you will construct parse trees at runtime, from your query language. You will then send the root node of that parse tree the message evaluateAgainst:, which says: "Go figure out what your value is in this particular context." To understand this fully, let's look at the implementations of this method in the three classes listed above. ValueNode>>evaluateAgainst: aPatient
^value
AttributeNode>>evaluateAgainst: aPatient
^self perform: attributeName
EqualsNode>>evaluateAgainst: aPatient
^(leftNode evaluateAgainst: aPatient) =
(rightNode evaluateAgainst: aPatient)
When asked to evaluate itself, a ValueNode just returns the value that it holds. When an AttributeNode is asked to evaluate itself, it returns the result of performing its attributeName against the argument. Finally, an EqualsNode returns the result of comparing the equality of the result of evaluating its left child and its right child. To see how this works, let's take the example we had above, and see how we would construct a parse tree for that statement. | equals value attribute patients|
"Patient>>patientList returns a list of patients."
patientList := Patient examplePatients.
equals := EqualsNode new.
value := ValueNode new.
attribute := AttributeNode new.
attribute attributeName: #city.
value value: 'Cleveland'.
equals leftChild: attribute;
rightChild: value.
^patientList select:[:each |
equals evaluateAgainst: each].
Now some of you may be saying: "But that's no better than the previous solution! You still have to know the structure of your query ahead of time!" No, this is not really true. Another idea crucial to the Interpreter pattern is that once you have set up the classes for the nodes, the actual building of the parse tree can be deferred until run time. In actuality, you would have to build a parser that could read some user-provided representation of the query (say the text of the query) and generate the appropriate parse-tree nodes from that representation. If your language is sufficiently simple, then hand-coding such a parser is pretty straightforward. If it gets more complicated, you may want to look into using the free TGen parser generator program from the UIUC Smalltalk Archives. The above solution works fine for the simple queries of the type we defined earlier. If we wanted to handle more complex queries involving AND and OR, we would need to refactor and expand our design to the following one:
Figure 2 This design factors out the leftChild and rightChild instance variables of EqualsNode to an abstract class BinaryNode. Two new subclasses of BinaryNode are AndNode and OrNode. The implementation of evaluateAgainst: in each of them is similar to that in EqualsNode. For instance see the following implementation of AndNode. AndNode>>evaluateAgainst: aPatient
^(leftNode evaluateAgainst: aPatient) and:
[rightNode evaluateAgainst: aPatient]
You may have noted that in the above design the only comparison operator we have is equality. Adding more comparison operators is straightforward. LessThanNode, GreaterThanNode and NotEqualsNode are simple to implement, and very similar to EqualsNode. Note that if you introduce less than and greater than, you need to have some level of type checking to make sure that you don't get type conflicts. More information on how to do this can be found in the pattern Visitor in Gamma (1995). The above solution using the Interpreter pattern has several advantages. First and foremost, it doesn't require the Compiler classes be present in your image to work. It works just as well in a stripped runtime image as a development image. Secondly, the code is very clean and extensible. If you want to add new query nodes, you simply extend the class hierarchy and then add the new statements to the parser code. The major drawback of this implementation is that the runtime performance of the queries is less than that of a directly coded select: statement. It will work well for small collections, but shouldn't be used in high-volume transaction processing environments. BibliographyBrown (1996a), Kyle Brown, "Remembrance of things past: Layered architectures for Smalltalk applications." The Smalltalk Report. Jul./Aug. 1995 Brown (1996b), Kyle Brown and Bruce Whitenack, "A Pattern Language for Relational Databases and Smalltalk." Object Magazine. Oct. 1996 Gamma (1995), Erich Gamma, Richard Helm, Ralph Johnson and John Vlissides, Design Patterns: Reusable Elements of Object Oriented Software, Addison-Wesley Publishing Company, Reading MA. Enjoy the article? Subscribe to Eye on Objects! |
|
Kyle Brown is a Senior Member of Technical Staff at Knowledge Systems Corporation (KSC). Kyle has been building systems in Smalltalk, large and small, for over seven years. During this time he has taught the principles of Smalltalk programming and OO design through KSC's Smalltalk Apprentice program, as well as through conference presentations and magazine articles. He is the co-author (with Bobby Woolf and Sherman Alpert) of the upcoming Addison-Wesley book "Design Patterns: The Smalltalk Companion". |
|