Skip to content

Deciding once

In Fixing dispatch, we refactored some code that dispatched things from a switch-statement or cascading if-statements to dispatching by polymorphism.

This time, we'll refactor a different piece of dispatch in a completely different way, and cover another design principle.

First, the code we left off with has an obvious flaw. The state variable is essentially a global variable, but luckily, different sections of code don't communicate with each other through it, and clean it up after themselves. So, we'll remove that global and give an instance variable to each State class that needs it.

Second, we have another dispatch structure that we can refactor, where we are dispatching based on the type of character:

if (Character.isDigit(ch)) {
    type = ChType.DIGIT;
} else if (ch == '"') {
    type = ChType.QUOTE;
} else if (ch == ' ' || ch == '\n') {
    type = ChType.SPACE;
} else if (ch == '\\') {
    type = ChType.BSLASH;
} else {
    type = ChType.OTHER;
}

But we also dispatch on the type of character in several other places, like this:

class Initial extends State {
    State eat(char ch, ChType type) throws Unexpected {
        switch (type) {
        case DIGIT:
            return new InNumber(ch);
        case QUOTE:
            return new InString();
        case SPACE:
            return this;
        default:
            throw new Unexpected(ch, this);
        }
    }
}

In truth, this split dispatch existed in the original version. However, as we have refactored, the split has become wider and more obvious.

A good rule of thumb is to dispatch on a given piece of data as few times as possible—preferably just once.

In order to do that, we'll first move the call to .eat() into the dispatch structure in our main loop. The type variable is no longer needed and can go away:

if (Character.isDigit(ch)) {
    state = state.eat(ch, ChType.DIGIT);
} else if (ch == '"') {
    state = state.eat(ch, ChType.QUOTE);
} else if (ch == ' ' || ch == '\n') {
    state = state.eat(ch, ChType.SPACE);
} else if (ch == '\\') {
    state = state.eat(ch, ChType.BSLASH);
} else {
    state = state.eat(ch, ChType.OTHER);
}

Now that the second parameter in each of those calls is fixed (as in, not variable), each call site can be refactored in the following way: instead of calling a single, generic .eat() method, introduce a new method to eat each type of character.

Dealing with digits first, we'll refactor the Initial class to look like this (also adding the appropriate method to State):

class Initial extends State {

    State eat(char ch, ChType type) throws Unexpected {
        switch (type) {
        case DIGIT:
            return new InNumber(ch);
        case QUOTE:
            return new InString();
        case SPACE:
            return this;
        default:
            throw new Unexpected(ch, this);
        }
    }

    State eatDigit(char ch) throws Unexpected {
        ChType type = DIGIT;
        switch (type) {
        case DIGIT:
            return new InNumber(ch);
        case QUOTE:
            return new InString();
        case SPACE:
            return this;
        default:
            throw new Unexpected(ch, this);
        }
    }
}

Now, the cases that each method handles can be reduced. The .eat() method loses the case for DIGIT, and the .eatDigit() methods loses all cases except DIGIT:

class Initial extends State {

    State eat(char ch, ChType type) throws Unexpected {
        switch (type) {
        case QUOTE:
            return new InString();
        case SPACE:
            return this;
        default:
            throw new Unexpected(ch, this);
        }
    }

    State eatDigit(char ch) throws Unexpected {
        return new InNumber(ch);
    }
}

Continuing that for every State allows us to write the main dispatch structure like this:

if (Character.isDigit(ch)) {
    state = state.eatDigit(ch);
} else if (ch == '"') {
    state = state.eat(ch, ChType.QUOTE);
} else if (ch == ' ' || ch == '\n') {
    state = state.eat(ch, ChType.SPACE);
} else if (ch == '\\') {
    state = state.eat(ch, ChType.BSLASH);
} else {
    state = state.eat(ch, ChType.OTHER);
}

If we do that for every type of character, we'll end up with this:

if (Character.isDigit(ch)) {
    state = state.eatDigit(ch);
} else if (ch == '"') {
    state = state.eatQuote(ch);
} else if (ch == ' ' || ch == '\n') {
    state = state.eatSpace(ch);
} else if (ch == '\\') {
    state = state.eatBackslash(ch);
} else {
    state = state.eatOther(ch);
}

Keeping the original .eat() method around through these refactors allows us to use it a good default case when a specific state doesn't care to override methods for individual types of characters.

Our final tokenizer then looks like this:

import java.io.FileReader;

class tok {

    class Unexpected extends Exception {
        Unexpected(char ch, State state) {
            super("Unexpected: character '" + ch + "' in state " + state);
        }
    }

    class State {

        State eat(char ch) throws Unexpected {
            throw new Unexpected(ch, this);
        }

        State eatDigit(char ch) throws Unexpected {
            return eat(ch);
        }

        State eatQuote(char ch) throws Unexpected {
            return eat(ch);
        }

        State eatSpace(char ch) throws Unexpected {
            return eat(ch);
        }

        State eatBackslash(char ch) throws Unexpected {
            return eat(ch);
        }

        State eatOther(char ch) throws Unexpected {
            return eat(ch);
        }
    }

    class Initial extends State {

        State eatDigit(char ch) {
            return new InNumber(ch);
        }

        State eatQuote(char ch) {
            return new InString();
        }

        State eatSpace(char ch) {
            return this;
        }
    }

    class InNumber extends State {
        String image = "";

        InNumber(char ch) {
            image += ch;
        }

        State eatDigit(char ch) {
            image += ch;
            return this;
        }

        State eatSpace(char ch) {
            System.out.println("INT(" + image + ")");
            return new Initial();
        }
    }

    class InString extends State {
        String image;

        InString() {
            image = "";
        }

        InString(String image) {
            this.image = image;
        }

        State eat(char ch) {
            image += ch;
            return this;
        }

        State eatQuote(char ch) {
            System.out.println("STRING(" + image + ")");
            return new AfterString();
        }

        State eatBackslash(char ch) {
            return new Escaping(this.image);
        }
    }

    class AfterString extends State {

        State eatSpace(char ch) {
            return new Initial();
        }
    }

    class Escaping extends State {

        String image;

        Escaping(String image) {
            this.image = image;
        }

        State eat(char ch) {
            return new InString(image + ch);
        }
    }

    void tokenize() throws Exception {
        State state = new Initial();
        FileReader reader = new FileReader("demo.txt");

        while (true) {
            int cp = reader.read();
            if (cp == -1)
                break;
            char ch = (char) cp;

            if (Character.isDigit(ch)) {
                state = state.eatDigit(ch);
            } else if (ch == '"') {
                state = state.eatQuote(ch);
            } else if (ch == ' ' || ch == '\n') {
                state = state.eatSpace(ch);
            } else if (ch == '\\') {
                state = state.eatBackslash(ch);
            } else {
                state = state.eatOther(ch);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        new tok().tokenize();
    }
}

Trackbacks

No Trackbacks

Comments

Display comments as Linear | Threaded

No comments

The author does not allow comments to this entry

Add Comment

E-Mail addresses will not be displayed and will only be used for E-Mail notifications.
To leave a comment you must approve it via e-mail, which will be sent to your address after submission.
Enclosing asterisks marks text as bold (*word*), underscore are made via _word_.
Form options