Project: Huffman Coding Using Priority Queues

Huffman Coding Project

Review of char Primitive and Methods in Java

The following char and String methods and techniques will come in handy for Huffman Coding application.

Convert a String Into a char Array

Here is how to convert a String to a char array:

String str = "abcde";
char [ ] array = str.toCharArray();

Using a char as a Number

In java, a char can be used as a number. For example, if a char contains an ASCII character, we can use it as an index for an array. The following example will print 3 because the ASCII value for 'a' is decimal 97.


public static void main(String [ ] args) {
int [] frequencyTable = new int[256]; // ASCII table is 256 characters max (7 bits for the characters and 1 parity bit)
frequencyTable['a'] = 1; // set cell number 97 to 1
frequencyTable[97] += 1; // increment cell number 97
++frequencyTable[ 0x61 ]; // increment cell number 97
System.out.println( frequencyTable['a'] ); // should print a 3
}


Each reference to the frequencyTable array, above, points to the same cell. So after setting to a 1 and incrementing it twice, a 3 will be printed by the println statement.

Counting Letter Frequencies

Given a textfile, you will need to create a class and its associated methods to count the frequency of each letter in the file. You can ignore the special characters like End-of-File (EOF). But do not ignore the space " " character. All uppercase characters will be converted to lowercase for our counting purposes.
We will use a HashMap to count the letter frequencies as shown below:

HashMap<Character, Integer> map = new HashMap<Character, Integer>();

String s = "aasjjikkk";

for (int i = 0; i < s.length(); i++) {

    char c = s.charAt(i);

    Integer val = map.get(c);

    if (val != null) {

        map.put(c, new Integer(val + 1));

    }

    else {

       map.put(c, 1);

   }

}

An Alternate Algorithm for Counting Letter Frequencies

Instead of using a Map object, we can alternatively count frequencies by creating an integer array of size 256 (the number of characters in the ASCII set).
Then, as each character is encountered in the input file, we can increase its count in the corresponding cell. For example if a "g" is encountered, we can code:

++frequencyTable['g'];

HuffmanCode Class Architecture

We will structure our Huffman Code application such that the exterior clas, HuffmanCode, will be responsible for frequency counting of the ASCII text, File I/O and a main method which will be used for testing the app. Most of the work for Huffman Coding will be done by nested (inner) classes. These classes are described on this page. Starter code is provided at the bottom of this page.

Creating the Node Class for a Huffman Tree

Each letter that we wish to transmit will have a binary sequence associated with it. That is what the Huffman Coding app will produce. To create the binary sequence, we will start off by putting each character into a node object. These node objects will comprise the leaves of our Huffman Tree. Other nodes will be formed by combing the leaf nodes. These non-leaf nodes will not contain a character. Therefore, the character element of these nodes will be set to null. We will declare and define this class as a nested class inside our HuffmanCode class. By declaring the class to be private and the data members to be public, we can assure easy access to the elements while safeguarding encapsulation principles. 

private class Node implements Comparable<Node> {

        public final char CHARACTER;    // the character we want to transmit; if a non-leaf node, this element is set to null.
        public final boolean ISLEAF;
        public int frequency;           // how many times this character appeared our original sample 
        public String binarySequence;   // "01001 ..." for this character
        public final Node LEFTCHILD;        
        public final Node RIGHTCHILD;

        public Node(char character, int frequency, Node leftChild, Node rightChild) {
            this.CHARACTER = character;
            this.frequency = frequency;
            this.binarySequence = "";   // this will be updated later in the algorithm 
            this.LEFTCHILD = leftChild;
            this.RIGHTCHILD = rightChild;
            ISLEAF = (leftChild == null && rightChild == null) ? true : false;
        }

        // configure compareTo method to create MIN heap
        public int compareTo(Node other) {
            if (this.frequency != other.frequency) 
                return this.frequency - other.frequency;   // frequency is primary sorting criteria
            else
                return this.CHARACTER - other.CHARACTER;   // ASCII position is secondary critera
        }
    }


Building a Huffman Tree

As part of our HuffmanCode algorithm, we will create a Huffman Tree. This object will take a frequency table (1-dim array of integers) as an argument to its constructor and create a binary tree of Node objects.
We have left the class unfinished but pseudocode is provided in the comments hinting at what must be done.

private class HuffmanTree {

private PriorityQueue<Node> pq;

public Map encodeMap; // <char, String>
public Map decodeMap; // <String, char>

// constructor
public HuffmanTree(int [ ] frequencyTable) {
// allocate memory for the priority queue, encodeMap and decodeMap

// for each character in the frequency table that has a non-zero frequency, create a leaf Node 
// and add the Node to the priority queue

// now construct the binary tree that will be the Huffman tree
// repeat the following until the priority queue is down to one Node only (the root Node)
// take the next two elements out of the priority queue
// combine them to make a new non-leaf Node
//  add the node to the priority queue


// now construct the binary sequences for each character
buildBinarySequence( getRoot(), "" );
}

public Node getRoot() {
// return the last Node remaining in the priority queue. Do not remove this Node from the priority queue
}

private void buildBinarySequence( Node n, String binarySequence ) {
// see description, below
}

}


Building the Binary Sequence for Each Character


For each character stored at the leaf Node in the Huffman Tree, we need to generate a binary sequence that will represent the character. When the Nodes that hold each character will initially created, the binarySequence field was left as an empty String.

We will now describe the algorithm for the buildBinarySequence method in the HuffmanTree class.

private void buildBinarySequence( Node n, String binarySequence ) {
// set the given Node's (n's) binary sequence to the argument String (binarySequence).
// call the buildBinarySequence method recursively as follows
//  for each non-leaf node:
// call the buildBinarySequence on the left child
// set the left child's binary sequence String argument  to the argument binary sequence + "0"
// call the buildBinarySequence on the right child
//
set the left child's binary sequence String argument to the argument binary sequence + "1"
//
// the leaf nodes represent the base cases for the recursive call.
// when leaf nodes are encountered, do not make any additional, recursive calls.
// instead, add a new entry to the encodeMap of the form <char, String> for which char
// represents the character in the leaf node
// and String represents the terminal binary sequence String.
// also add an entry to the decodeMap of the form <String, char>.
//
// after this method concludes, each leaf Node in the Huffman tree should have a fully
// defined binary sequence for each character.
// encodeMap and decodeMap should also be fully populated with char-String relationships.

}


Transmitting the Information

The transmit method in the HuffmanCode class must:

  1. read the input file (input.txt) one character at a time
  2. convert the character to its binary sequence
  3. print the binary sequence to a file called binary.txt
  4. also print the binary sequence to the screen for debugging purposes
  5. close the binary.txt and input.txt files when done
Code for the transmit method is provided to you in the reference code at the bottom of this page.

Receiving and Decrypting the Information

The receive method in the HuffmanCode class must:

  1. read the binary.txt file one character at a time
  2. when a group of characters forms a valid keyword, translate that keyword to a letter
  3. print the letter to a file called output.txt
  4. also print the letter to the screen for debugging purposes
  5. close the binary.txt and output.txt files when done
Code for the receive method is provided to you in the reference code at the bottom of this page.

Main (Test) Method

In a real world application, the transmitting and receiving of the data would take place in different programs. However, for our testing purposes, we will call transmit and receive one after the other from a single testing method (main). The main method is provided for you in the reference code, below.

Starter Code with HuffmanCode Class Drivers and Main Method for Testing

import java.util.*;
import java.io.*;

public class HuffmanCode
{
    public static final String INPUTFILE = "input.txt";
    public static final String ENCODEDFILE = "binary.txt";
    public static final String OUTPUTFILE = "output.txt";
    public static final int ASCII_NEWLINE = 10;

    public static final int ASCII_SIZE = 256;
    private int [] frequencyTable;
    private HuffmanTree ht;
    // private String input;   // debug
    
    // insert code for HuffmanTree inner class here

   // insert code for Node inner class here

   // insert buildBinarySequence method here

  // 

    // debug constructor
    public HuffmanCode(String input) {
        //this.input = input; // debug
        this.frequencyTable = new int[ASCII_SIZE];
        this.frequencyCounter(input);     // debug
        this.ht = new HuffmanTree();

    }

    public HuffmanCode() {
        this.frequencyTable = new int[ASCII_SIZE];
        this.frequencyCounter();     
        this.ht = new HuffmanTree();
    }

    public void frequencyCounter(String input) {
        for( char character : input.toCharArray())
            ++frequencyTable[character];
    }

    public void frequencyCounter() {
        try {
            BufferedReader file = new BufferedReader(new FileReader(INPUTFILE));
            String line = "";

            while ((line = file.readLine()) != null) {
                ++frequencyTable[ASCII_NEWLINE]; // another newline character found! (NL = 10)
                for( char character : line.toCharArray())
                    if (character > 0 && character < ASCII_SIZE) // ignore illegal characters
                        ++frequencyTable[character];
            }
            file.close();
        } catch (Exception e) {
            System.out.println("Error handling file: " + INPUTFILE);
        }
    }

    public void transmit() {
        try {
            File binaryFile = new File(ENCODEDFILE);
            binaryFile.createNewFile();

            FileWriter writer = new FileWriter(binaryFile);
            BufferedReader file = new BufferedReader(new FileReader(INPUTFILE));
            String line = "";

            while ((line = file.readLine()) != null) {
                String transmitString = "";
                for (Character c : line.toCharArray()) {
                    String temp = this.ht.encodeMap.get(c).toString();
                    writer.write( temp  );
                    // System.out.println(this.ht.encodeMap.get(c));
                }
                String encodedNewLineCharacter = this.ht.encodeMap.get((char)ASCII_NEWLINE).toString();
                writer.write(encodedNewLineCharacter);
            }
            file.close();

            writer.flush();
            writer.close();
        } catch (Exception e) {
            System.out.println("File Error handling: " + ENCODEDFILE);
        }
    }

    public String receive(String encodedBinary) {
        String originalMessage = "";
        String focus = "";
        int i = 0;
        while (i < encodedBinary.length()) {
            focus += encodedBinary.substring(i,i+1);
            if (this.ht.decodeMap.containsKey(focus)) {
                originalMessage += this.ht.decodeMap.get(focus);
                focus = "";
            }
            ++i;
        }
        return originalMessage;
    }

    public void receive() {
        try {
            BufferedReader encodedFile = new BufferedReader(new FileReader(ENCODEDFILE));
            File outputFile = new File(OUTPUTFILE);
            outputFile.createNewFile();
            FileWriter writer = new FileWriter(outputFile);

            String line = "";
            while ((line = encodedFile.readLine()) != null) {
                String outputLine = receive(line);
                // replace '\n' characters with Windows/Mac Newline characters
                // otherwise, we will not get separate lines in output.txt
                outputLine = outputLine.replaceAll("\n", System.lineSeparator());
                writer.write(outputLine);

            }
            encodedFile.close();
            writer.flush();
            writer.close();
        } catch (Exception e) {
            System.out.println("Error handling file: " + INPUTFILE);
        }

    }

    public static void main(String [ ] args) {
        // String inputString = "jaaaaabcdefghi";   // debug version
        // System.out.println( inputString );       // debug version
        // HuffmanCode hc = new HuffmanCode(inputString);   // debug version
        HuffmanCode hc = new HuffmanCode();
        hc.transmit();
        hc.receive();

    }
}