Saturday, January 27, 2007

Consuming JSON Webservices with Spring MVC

JavaScript Object Notation (JSON) is a lightweight data interchange format that is easy for both humans and machines to parse and generate. It has become very popular among web front-end developers because JavaScript has built-in support for converting a JSON string into a JavaScript object, which the scripts on a web page can then pull apart and manipulate. Along with AJAX (Asynchronous JavaScript and XML), it can be used to create interactive client-driven websites, as opposed to the more traditional server-driven websites that we are all familiar with.

However, JSON can also be considered a simple text data interchange format (similar to XML-RPC and SOAP). Many organizations expose their functionality to partners and general users via REST (Representational State Transfer) APIs that are capable of emitting XML or JSON. Clients typically send an HTTP GET request to the API, which responds with a blob of XML or JSON text that the client can parse and display on its own co-branded web pages.

This article describes a simple framework, based on Spring MVC, that consumes JSON responses from such an external site, parses the JSON into a flat map, and dumps the output on a JSTL-based scaffold page. The scaffold page shows the keys and values of the map, which can then be used to build your own custom web pages. For my example, I chose Google's JSON API (searchmash.com) as my backend, and the custom page I built was a search page for my blogger site. The search page would (hypothetically) be generated when I typed a term into the search box on the upper left corner of the blogger page.

A picture is worth a thousand words, so here is one that describes this system. The red lines represent the request flow, and the blue lines represent the response. The main components of our framework are the HTTP Client, the JSON Flattener, the controller and the scaffold JSP page. Each of them is discussed below.

Spring/webapp configuration

Like most Spring applications, the web.xml is minimal, and the only mapping is for the Spring DispatcherServlet. The configuration for the DispatcherServlet is the spring-servlet.xml, and the beans themselves are configured in the applicationContext.xml.

<!-- ==== web.xml ==== -->
<?xml version="1.0" encoding="ISO-8859-1"?>
<web-app xmlns="http://java.sun.com/xml/ns/j2ee"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xsi:schemaLocation="http://java.sun.com/xml/ns/j2ee 
    http://java.sun.com/xml/ns/j2ee/web-app_2_4.xsd" version="2.4">

  <servlet>
    <servlet-name>spring</servlet-name>
    <servlet-class>org.springframework.web.servlet.DispatcherServlet</servlet-class>
    <load-on-startup>1</load-on-startup>
  </servlet>

  <servlet-mapping>
    <servlet-name>spring</servlet-name>
    <url-pattern>*.html</url-pattern>
  </servlet-mapping>

</web-app>
<!-- ==== spring-servlet.xml ==== -->
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE beans PUBLIC "-//SPRING//DTD BEAN//EN" "http://www.springframework.org/dtd/spring-beans.dtd">
<beans>

  <import resource="classpath:applicationContext.xml" />

  <bean id="urlMapping" class="org.springframework.web.servlet.handler.SimpleUrlHandlerMapping">
    <property name="mappings">
      <props>
        <prop key="blogSearch.html">apiProxyController</prop>
        ...
      </props>
    </property>
  </bean>

  <bean id="viewResolver" class="org.springframework.web.servlet.view.InternalResourceViewResolver">
    <property name="prefix" value="" />
    <property name="suffix" value=".jsp" />
    <property name="viewClass" value="org.springframework.web.servlet.view.JstlView" />
  </bean>

</beans>
<!-- ==== applicationContext.xml ==== -->
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:util="http://www.springframework.org/schema/util"
       xsi:schemaLocation="
       http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-2.0.xsd
       http://www.springframework.org/schema/util http://www.springframework.org/schema/util/spring-util-2.0.xsd">

  <bean id="apiProxyClient" class="com.mycompany.apiclient.services.ApiProxyClient">
    <property name="serviceUrl" value="http://api.somecompany.com/" />
  </bean>

  <bean id="apiProxyController" class="com.mycompany.apiclient.controllers.ApiProxyController">
    <property name="apiProxyClient" ref="apiProxyClient" />
    <property name="uriMethodMap">
      <map>
        <entry>
          <key><value>/blogSearch.html</value></key>
          <value>results/</value>
        </entry>
        ...
      </map>
    </property>
  </bean>

</beans>

The Controller (ApiProxyController)

The Controller is a simple Spring Controller implementation. All it does is map the incoming request URI to a backend API method and pass the rest of the parameters through as is. It is injected with a reference to the ApiProxyClient, on which it calls the doRemoteCall() method to retrieve the JSON object. The controller then calls the JSON Flattener to convert the JSON object to a flat map of key-value pairs and pushes it into the ModelAndView, where the JSP can pick it up. Finally, it forwards to a JSP which has the same name as the backend method. The code is shown below:

public class ApiProxyController implements Controller {

  private ApiProxyClient apiProxyClient;
  private String appName;
  private Map<String,String> uriMethodMap;

  public ApiProxyController() {
    super();
  }

  public void setApiProxyClient(ApiProxyClient apiProxyClient) {
    this.apiProxyClient = apiProxyClient;
  }

  public void setUriMethodMap(Map<String, String> uriMethodMap) {
    this.uriMethodMap = uriMethodMap;
  }

  @SuppressWarnings("unchecked")
  public ModelAndView handleRequest(HttpServletRequest request, HttpServletResponse response)
      throws Exception {
    ModelAndView mav = new ModelAndView();
    String requestUri = request.getRequestURI();
    String remoteCallMethod = uriMethodMap.get(requestUri);
    if (remoteCallMethod == null) {
      throw new Exception("The URI:" + requestUri + " is not mapped to a remote method");
    }
    Map<String,String[]> parameterMap = request.getParameterMap();
    List<NameValuePair> remoteCallParams = new ArrayList<NameValuePair>();
    for (String parameterName : parameterMap.keySet()) {
      String[] parameterValues = parameterMap.get(parameterName);
      if (parameterValues == null || parameterValues.length == 0) {
        continue;
      }
      // Pass the raw value through: HttpClient's setQueryString(NameValuePair[])
      // URL-encodes names and values itself, so encoding here would double-encode.
      remoteCallParams.add(new NameValuePair(parameterName, parameterValues[0]));
    }
    JSON jsonObj = apiProxyClient.doRemoteCall(remoteCallMethod, remoteCallParams);
    Map<String,String> jsonMap = JsonFlattener.flatten(jsonObj);
    mav.addObject("json", jsonMap);
    mav.setViewName(remoteCallMethod);
    return mav;
  }
}
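
The URI lookup at the start of handleRequest() can be sketched in isolation. This is a minimal sketch under one assumption: if the webapp is deployed under a context path such as /apiclient, getRequestURI() includes that prefix, so the sketch strips it before consulting the map. The class and method names are made up for illustration and are not part of the framework above.

```java
import java.util.Collections;
import java.util.Map;

/** Resolves an incoming request URI to a backend method name (illustrative names). */
final class UriMethodLookupSketch {

  static String resolve(Map<String, String> uriMethodMap, String contextPath, String requestUri) {
    // Drop the webapp's context path (for example "/apiclient") before the lookup.
    String path = requestUri.startsWith(contextPath)
        ? requestUri.substring(contextPath.length())
        : requestUri;
    String method = uriMethodMap.get(path);
    if (method == null) {
      throw new IllegalArgumentException("The URI:" + path + " is not mapped to a remote method");
    }
    return method;
  }

  public static void main(String[] args) {
    // Same mapping as the uriMethodMap entry in applicationContext.xml.
    Map<String, String> map = Collections.singletonMap("/blogSearch.html", "results/");
    System.out.println(resolve(map, "/apiclient", "/apiclient/blogSearch.html"));
  }
}
```

In a servlet container the context path would come from request.getContextPath(); here it is passed in so the sketch runs without the servlet API.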

The HTTP Client (ApiProxyClient)

The HTTP Client uses the Apache HTTP Client libraries to model a simple HTTP GET requester. It formats the method call into an HTTP GET request, sends it to the backend service, and parses the resulting response into a JSON Object and returns it to the controller. Here is the code for this.

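import java.util.List;

import net.sf.json.JSON;
import net.sf.json.JSONObject;

import org.apache.commons.httpclient.DefaultHttpMethodRetryHandler;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.commons.httpclient.NameValuePair;
import org.apache.commons.httpclient.methods.GetMethod;
import org.apache.commons.httpclient.params.HttpMethodParams;

public class ApiProxyClient {

  // base URL of the backend service, injected from applicationContext.xml
  private String serviceUrl;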

  public ApiProxyClient() {
    super();
  }

  public void setServiceUrl(String serviceUrl) {
    this.serviceUrl = serviceUrl;
  }

  public String getServiceUrl() {
    return serviceUrl;
  }

  public JSON doRemoteCall(String apiMethod, List<NameValuePair> parameters) throws Exception {
    HttpClient client = new HttpClient();
    GetMethod method = new GetMethod(serviceUrl + apiMethod);
    method.setQueryString(parameters.toArray(new NameValuePair[0]));
    method.getParams().setParameter(HttpMethodParams.RETRY_HANDLER,
      new DefaultHttpMethodRetryHandler(3, false));
    try {
      int rc = client.executeMethod(method);
      if (rc != HttpStatus.SC_OK) {
        throw new Exception("HTTP Status Code returned:" + rc);
      }
      String response = method.getResponseBodyAsString();
      JSONObject jsonResponse = new JSONObject(response);
      return jsonResponse;
    } finally {
      method.releaseConnection();
    }
  }
}
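
To make the shape of the outgoing request concrete, here is a rough sketch, using only the JDK rather than Commons HttpClient, of the URL that such a client ends up requesting. The class name and the parameter names (q, start) are hypothetical; the service URL and method name are the ones from the configuration above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Builds the GET URL for a backend call; names and values are encoded exactly once. */
final class GetUrlSketch {

  static String buildUrl(String serviceUrl, String apiMethod, Map<String, String> params) {
    StringBuilder url = new StringBuilder(serviceUrl).append(apiMethod);
    char separator = '?';
    for (Map.Entry<String, String> param : params.entrySet()) {
      url.append(separator)
          .append(encode(param.getKey()))
          .append('=')
          .append(encode(param.getValue()));
      separator = '&';
    }
    return url.toString();
  }

  private static String encode(String text) {
    try {
      return URLEncoder.encode(text, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException(e); // UTF-8 is always available
    }
  }

  public static void main(String[] args) {
    Map<String, String> params = new LinkedHashMap<>();
    params.put("q", "spring mvc"); // hypothetical parameter names
    params.put("start", "10");
    System.out.println(buildUrl("http://api.somecompany.com/", "results/", params));
  }
}
```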

The JSON Flattener

The JSON Flattener is a simple utility class which uses the json-lib library. This is the only JSON library, as far as I know, in which JSONArray and JSONObject both implement a common JSON interface. Since the flattener needs to handle JSON objects with possibly embedded JSONArrays, this was the most appropriate library for me to use. The code traverses the JSON object recursively to flatten it into a set of name-value pairs.

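import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

import net.sf.json.JSON;
import net.sf.json.JSONArray;
import net.sf.json.JSONNull;
import net.sf.json.JSONObject;

import org.apache.commons.lang.StringUtils;

public class JsonFlattener {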

  public static Map<String,String> flatten(JSON json) {
    Map<String,String> jsonMap = new LinkedHashMap<String,String>();
    flattenRecursive(json, "", jsonMap);
    return jsonMap;
  }

  private static void flattenRecursive(JSON json, String prefix, Map<String,String> jsonMap) {
    if (json instanceof JSONObject) {
      JSONObject jsonObject = (JSONObject) json;
      Iterator it = jsonObject.keys();
      while (it.hasNext()) {
        String key = (String) it.next();
        Object value = jsonObject.get(key);
        if (value instanceof JSON) {
          flattenRecursive((JSON) value, StringUtils.join(new String[] {prefix, key}, '.'), jsonMap);
        } else {
          jsonMap.put(StringUtils.join(new String[] {prefix, key}, '.'), String.valueOf(value));
        }
      }
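    } else if (json instanceof JSONArray) {
      JSONArray jsonArray = (JSONArray) json;
      for (int i = 0; i < jsonArray.length(); i++) {
        Object element = jsonArray.get(i);
        String elementKey = prefix + "[" + i + "]";
        if (element instanceof JSON) {
          flattenRecursive((JSON) element, elementKey, jsonMap);
        } else {
          // scalar array element (string, number, boolean): store it directly
          jsonMap.put(elementKey, String.valueOf(element));
        }
      }
    } else if (json instanceof JSONNull) {
      // a JSON null: keep the key, with a null value
      jsonMap.put(prefix, null);
    }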
  }
}
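
To see what keys the flattener produces without pulling in json-lib, here is a rough standalone sketch of the same traversal, with java.util.Map and java.util.List standing in for JSONObject and JSONArray. The class name and sample data are made up; the key format (a leading dot, dotted names, bracketed indexes) follows the flattener above.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Flattens nested Maps and Lists (stand-ins for JSONObject and JSONArray) into dotted keys. */
final class FlattenSketch {

  static Map<String, String> flatten(Object root) {
    Map<String, String> flat = new LinkedHashMap<>();
    walk(root, "", flat);
    return flat;
  }

  private static void walk(Object node, String prefix, Map<String, String> flat) {
    if (node instanceof Map) {
      // object: extend the key with ".name"
      for (Map.Entry<?, ?> entry : ((Map<?, ?>) node).entrySet()) {
        walk(entry.getValue(), prefix + "." + entry.getKey(), flat);
      }
    } else if (node instanceof List) {
      // array: extend the key with "[index]"
      List<?> list = (List<?>) node;
      for (int i = 0; i < list.size(); i++) {
        walk(list.get(i), prefix + "[" + i + "]", flat);
      }
    } else {
      // scalar or null leaf
      flat.put(prefix, node == null ? null : String.valueOf(node));
    }
  }

  public static void main(String[] args) {
    Map<String, Object> result = new LinkedHashMap<>();
    result.put("title", "Hello"); // made-up sample data
    Map<String, Object> root = new LinkedHashMap<>();
    root.put("estimatedCount", 42);
    root.put("results", Collections.singletonList(result));
    System.out.println(flatten(root)); // keys: .estimatedCount and .results[0].title
  }
}
```

The leading dot is why a JSP has to look values up with keys such as .estimatedCount rather than estimatedCount.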

The scaffold JSP

For each new service method to be exposed, we can copy our scaffold JSP into a new JSP named after the method. Calling the page with a URL of the form [/apiclient/method.html?query] will pull up the following debug page.

<table cellspacing="0" cellpadding="0" border="1" width="100%">
  <tr bgcolor="blue" fgcolor="white">
    <th>Name</th>
    <th>Value</th>
  </tr>
  <c:forEach var='jsonElement' items='${json}'>
  <tr>
    <td><c:out value="${jsonElement.key}" /></td>
    <td><c:out value="${jsonElement.value}" /></td>
  </tr>
  </c:forEach>
</table>
This is the data arranged as we would like it in our final web page, but still unadorned. The JSTL is just cut and pasted into the blogger template for my site below.
<p><b>Found ${json['.estimatedCount']} results for query: "${json['.query.terms']}"</b><br/>
<b>Top 5 search results</b></p>
<c:forEach var='index' items='0,1,2,3,4'>
  <c:set var="resultsUrlKey" value=".results[${index}].url"/>
  <c:set var="resultsTitleKey" value=".results[${index}].title"/>
  <c:set var="resultsCacheUrlKey" value=".results[${index}].cacheUrl"/>
  <c:set var="resultsSnippetKey" value=".results[${index}].snippet"/>
  <p><a href="${json[resultsUrlKey]}">${json[resultsTitleKey]}</a> 
  (<a href="${json[resultsCacheUrlKey]}">Cached</a>)<br/>
  <font size="-1">${json[resultsSnippetKey]}</font><br/></p>
</c:forEach>

So, in effect, all that needs to be done to expose a new JSON webservice is to build a new page, add the servlet URL mapping in spring-servlet.xml, and add a mapping from the incoming URL to the backend method name in the controller configuration. After that, the only remaining work is to reformat the JSP as needed. Such a client system can be built once with a single scaffold JSP page, and can be enhanced by web developers as new methods need to be exposed.

Saturday, January 20, 2007

Faceted Searching with Lucene

Last week, I pointed to an article by William Denton, "How to make a Faceted Collection and put it on the Web", where he describes what facets are and how to build up a faceted collection of data. The example he provides uses a relational database to store the information. For this article, I took the dataset that he used and built up a small web application that provides faceted search results using Lucene as the datastore. I continue to hold the facet metadata in a relational database, however. While this implementation is a first cut, and does not address issues of performance or maintainability (more on this later), I believe that this implementation will resonate better with web developers, given the popularity of Lucene to build search applications.

Tools/Framework used

One application that specifically addresses faceted searching with Lucene is Apache Solr, and I briefly considered using its classes to drive my application. However, the impression I got (and I could be wrong) was that Solr is very tightly integrated around the webservices architecture, leveraging it to provide facet metadata caching, etc. This would not work so well for me on my resource-constrained laptop, so I decided to start from scratch, using Lucene's BooleanQuery and QueryFilters for my implementation.

I did, however, want to use Spring MVC and dependency injection, so I used the Lucene module from the SpringModules project. I discovered that the current version (0.7) did not work with Lucene 2.0 (which I was using) because of some backward-incompatible changes made to Lucene between 1.4 and 2.0, so I fixed it locally and provided a patch so it can be integrated into future versions.

Screenshots

But first, some mandatory screenshots to grab your interest. As you can see, I am not much of a front-end web developer, but these should give you an idea of what the application does.

Shows the entire data set. As you can see, the URL contains the category=dish-soap parameter. In a "real" application, this could be used to isolate records in a specific category. Faceted search really comes into its own on category style pages, where all the records share a subset of facets. For example, the "agent" facet may not make much sense in a food category.
Shows all the dish soaps that have the brand "Palmolive". This is irrespective of its other facets.
Further constrains the brand=Palmolive facet by dish soaps that are used to wash dishes by hand.
Resets the brand facet so that all dish soaps that are used to wash dishes by hand are shown, irrespective of brand. Clicking the "Reset Search" link will reset all the facet constraints and show all the dishwashing soaps in the category (first screenshot).

The Indexer

To build the index, I first copied (by hand) the dish soaps data from William Denton's article into a semicolon-separated file. The first few lines of the file are shown below:

#name;agent;form;brand;scent;effect
Cascade Pure Rinse Formula;dishwasher;liquid;Cascade; ;antibacterial;
Elactrasol lemon gel;dishwasher;liquid;Electrasol;lemon; ;
...
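
As a rough illustration of how one line of this file breaks down into a name plus facet values, here is a small standalone sketch. It is not the indexer itself, which appears further down; the class and method names are invented, and it assumes, as the indexer does, that a facet column may carry several comma-separated values and that blank columns mean "no value".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Splits one semicolon-separated record into field name -> values (illustrative names). */
final class SoapLineSketch {

  static Map<String, List<String>> parse(String headerLine, String dataLine) {
    String[] names = headerLine.substring(1).split(";"); // drop the leading '#'
    String[] fields = dataLine.split(";");
    Map<String, List<String>> record = new LinkedHashMap<>();
    for (int i = 0; i < names.length; i++) {
      String raw = i < fields.length ? fields[i].trim() : "";
      List<String> values = new ArrayList<>();
      if (i == 0) {
        values.add(raw); // the name column is kept whole
      } else {
        // facet columns may hold several comma-separated values; blanks are skipped
        for (String value : raw.split("\\s*,\\s*")) {
          if (!value.isEmpty()) {
            values.add(value);
          }
        }
      }
      record.put(names[i], values);
    }
    return record;
  }

  public static void main(String[] args) {
    String header = "#name;agent;form;brand;scent;effect";
    String line = "Cascade Pure Rinse Formula;dishwasher;liquid;Cascade; ;antibacterial;";
    System.out.println(parse(header, line));
  }
}
```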

I then created a table to hold the facet metadata. The Spring configuration for the indexer and its associated Dao (to populate the facet metadata) is shown below. The dataSource is a reference to a Spring DriverManagerDataSource connecting to my local PostgreSQL database.

  <!-- Lucene index datasource configuration -->
  <bean id="fsDirectory" class="org.springmodules.lucene.index.support.FSDirectoryFactoryBean">
    <property name="location" value="file:/tmp/soapindex" />
    <property name="create" value="true" />
  </bean>

  <bean id="indexFactory" class="org.springmodules.lucene.index.support.SimpleIndexFactoryBean">
    <property name="directory" ref="fsDirectory" />
    <property name="analyzer">
      <bean class="org.apache.lucene.analysis.SimpleAnalyzer" />
    </property>
  </bean>

  <!-- IndexBuilder -->
  <bean id="facetsDao" class="net.soapmarket.db.FacetsDao">
    <property name="dataSource" ref="dataSource" />
  </bean>

  <bean id="soapIndexBuilder" class="net.soapmarket.index.SoapIndexBuilder">
    <property name="indexFactory" ref="indexFactory" />
    <property name="analyzer">
      <bean class="org.apache.lucene.analysis.SimpleAnalyzer" />
    </property>
    <property name="facetsDao" ref="facetsDao" />
  </bean>

And here is the code for the Indexer:

public class SoapIndexBuilder extends LuceneIndexSupport {

  private FacetsDao facetsDao;

  private String[] fieldsMeta;
  private Map<String,Set<String>> facets;

  public void setFacetsDao(FacetsDao facetsDao) {
    this.facetsDao = facetsDao;
  }

  public void buildIndex(String inputFileName) throws Exception {
    facets = new HashMap<String,Set<String>>();
    BufferedReader reader = new BufferedReader(new InputStreamReader(
      new FileInputStream(inputFileName)));
    String line = null;
    while ((line = reader.readLine()) != null) {
      if (line.startsWith("#")) {
        fieldsMeta = (line.substring(1)).split(";");
        continue;
      }
      addDocument(line);
    }
    reader.close();
    facetsDao.saveFacetMap(facets);
  }

  public void addDocument(final String text) {
    getTemplate().addDocument(new DocumentCreator() {
      public Document createDocument() throws Exception {
        Document doc = new Document();
        String[] fields = text.split(";");
        int fieldIndex = 0;
        for (String fieldMetadata : fieldsMeta) {
          if (fieldIndex == 0) {
            doc.add(new Field(fieldMetadata, fields[fieldIndex], Field.Store.YES, 
              Field.Index.TOKENIZED));
          } else {
            Set<String> facetValues = facets.get(fieldMetadata);
            if (facetValues == null) {
              facetValues = new HashSet<String>();
            }
            if (fields[fieldIndex].indexOf(',') > -1) {
              String[] multiValues = fields[fieldIndex].split("\\s*,\\s*");
              for (String multiValue : multiValues) {
                doc.add(new Field(fieldMetadata, multiValue, Field.Store.NO, 
                  Field.Index.UN_TOKENIZED));
                if (StringUtils.isNotBlank(multiValue)) {
                  facetValues.add(multiValue);
                }
              }
            } else {
              doc.add(new Field(fieldMetadata, fields[fieldIndex], Field.Store.NO,
                Field.Index.UN_TOKENIZED));
              if (StringUtils.isNotBlank(fields[fieldIndex])) {
                facetValues.add(fields[fieldIndex]);
              }
            }
            facets.put(fieldMetadata, facetValues);
          }
          fieldIndex++;
        }
        // finally add our hardcoded category (for testing)
        doc.add(new Field("category", "dish-soap", Field.Store.NO, Field.Index.UN_TOKENIZED));
        return doc;
      }
    });
  }
}

Facet metadata

The Facet metadata is dumped by the IndexBuilder into a single table. This works fine for a tiny dataset such as ours, but when our dataset becomes larger, it may be good to normalize the data into two separate tables. Here is a partial listing of our facets data.

postgresql=# select * from facets;
 facet_name |     facet_value
------------+---------------------
 brand      | Sunlight
 brand      | Generic
 brand      | Cascade
 brand      | President's Choice
 brand      | Electrasol
 brand      | Palmolive
 brand      | Ivory
 agent      | dishwasher
 agent      | hand
...

Here is the code for the FacetsDao, which reads from and writes to the facets table. Only the saveFacetMap() method is used by the Indexer; all the other methods are used by the Searcher.

public class FacetsDao extends JdbcDaoSupport {

  public void saveFacetMap(Map<String,Set<String>> facetMap) {
    getJdbcTemplate().update("delete from facets where 1=1");
    for (String facetName : facetMap.keySet()) {
      Set<String> facetValues = facetMap.get(facetName);
      for (String facetValue : facetValues) {
        getJdbcTemplate().update("insert into facets(facet_name, facet_value) values (?, ?)",
          new String[] {facetName, facetValue});
      }
    }
  }

  @SuppressWarnings("unchecked")
  public List<String> getAllFacetNames() {
    List<Map<String,String>> rows = getJdbcTemplate().queryForList(
      "select facet_name from facets group by facet_name");
    List<String> facetNames = new ArrayList<String>();
    for (Map<String,String> row : rows) {
      facetNames.add(row.get("FACET_NAME"));
    }
    return facetNames;
  }

  @SuppressWarnings("unchecked")
  public List<String> getFacetValues(String facetName) {
    List<Map<String,String>> rows = getJdbcTemplate().queryForList(
      "select facet_value from facets where facet_name = ?",
      new String[] {facetName});
    List<String> facetValues = new ArrayList<String>();
    for (Map<String,String> row : rows) {
      facetValues.add(row.get("FACET_VALUE"));
    }
    return facetValues;
  }
}

The Searcher

The Searcher is coupled with the controller via the request parameter map. Notice how the facets and their values (in the screenshots above) are really request parameter name-value pairs. The Searcher provides methods to convert the parameter values into corresponding Lucene queries. Notice also that each page is built from a single Lucene query to show the current dataset, plus a set of Lucene queries (one per candidate facet value) to build up the facet hit counts on the left navigation toolbar.
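
The counting idea can be illustrated without Lucene. The sketch below is an in-memory stand-in, not the Searcher: records are plain maps with one value per facet, the "base query" is the set of facet constraints already selected, and the hit count for a facet value is the number of records that satisfy the base constraints and carry that value. All names and sample records are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory illustration of facet hit counts under already-selected constraints. */
final class FacetCountSketch {

  /** Counts, per value of facetName, the records matching every selected constraint. */
  static Map<String, Integer> countFacet(List<Map<String, String>> records,
      Map<String, String> selected, String facetName) {
    Map<String, Integer> counts = new TreeMap<>();
    for (Map<String, String> record : records) {
      if (!matches(record, selected)) {
        continue; // outside the current result set
      }
      String value = record.get(facetName);
      if (value != null && !value.trim().isEmpty()) {
        counts.merge(value, 1, Integer::sum);
      }
    }
    return counts;
  }

  private static boolean matches(Map<String, String> record, Map<String, String> selected) {
    for (Map.Entry<String, String> constraint : selected.entrySet()) {
      if (!constraint.getValue().equals(record.get(constraint.getKey()))) {
        return false;
      }
    }
    return true;
  }

  private static Map<String, String> soap(String brand, String agent) {
    Map<String, String> record = new HashMap<>();
    record.put("brand", brand);
    record.put("agent", agent);
    return record;
  }

  public static void main(String[] args) {
    List<Map<String, String>> records = new ArrayList<>();
    records.add(soap("Palmolive", "hand"));
    records.add(soap("Palmolive", "dishwasher"));
    records.add(soap("Ivory", "hand"));
    Map<String, String> selected = new HashMap<>();
    selected.put("brand", "Palmolive");
    System.out.println(countFacet(records, selected, "agent"));
  }
}
```

Facets that already appear in the selected constraints are not counted again; they only get a reset link, which is the same split that getFacets() makes in the Searcher code further down.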

The Spring configuration for the Searcher is shown below. Notice that we reuse the FacetsDao and the fsDirectory has its create property commented out. The latter is because Spring will delete your index on startup if create=true is set. In the real world, the Indexer and Searcher applications are usually separate, so this is not an issue. But here we comment out the create property after we are done building our index.

  <!-- Lucene index datasource configuration -->
  <bean id="fsDirectory" class="org.springmodules.lucene.index.support.FSDirectoryFactoryBean">
    <property name="location" value="file:/tmp/soapindex" />
    <!--<property name="create" value="true" />-->
  </bean>

  <bean id="searcherFactory" class="org.springmodules.lucene.search.factory.SimpleSearcherFactory">
    <property name="directory" ref="fsDirectory" />
  </bean>

  <!-- IndexSearcher -->
  <bean id="facetedSoapSearcher" class="net.soapmarket.search.FacetedSoapSearcher">
    <property name="searcherFactory" ref="searcherFactory" />
    <property name="analyzer">
      <bean class="org.apache.lucene.analysis.SimpleAnalyzer" />
    </property>
    <property name="facetsDao" ref="facetsDao" />
  </bean>

Here is the source code for the Searcher.

public class FacetedSoapSearcher extends LuceneSearchSupport {

  private FacetsDao facetsDao;

  public void setFacetsDao(FacetsDao facetsDao) {
    this.facetsDao = facetsDao;
  }

  public Query getQueryFromParameterMap(Map<String,String[]> parameters) {
    if (parameters == null || parameters.size() == 0) {
      RangeQuery query = new RangeQuery(new Term("name", "a*"), new Term("name", "z*"), true);
      return query;
    } else {
      BooleanQuery query = new BooleanQuery();
      for (String parameter : parameters.keySet()) {
        String[] parameterValues = parameters.get(parameter);
        if (parameterValues.length > 0) {
          if (StringUtils.isNotBlank(parameterValues[0])) {
            TermQuery tQuery = new TermQuery(new Term(parameter, parameterValues[0]));
            query.add(tQuery, Occur.MUST);
          }
        }
      }
      return query;
    }
  }

  @SuppressWarnings("unchecked")
  public List<String> search(Query query) {
    List<String> results = getTemplate().search(query, new HitExtractor() {
      public Object mapHit(int id, Document doc, float score) {
        String name = doc.get("name");
        return name;
      }
    });
    return results;
  }

  @SuppressWarnings({ "unchecked", "deprecation" })
  public List<Facet> getFacets(final Query baseQuery, 
      final Map<String,String[]> baseRequestParameters) {
    List<Facet> facetCounts = new ArrayList<Facet>();
    for (String facetName : facetsDao.getAllFacetNames()) {
      Facet facet = new Facet();
      facet.setName(facetName);
      if (baseRequestParameters.get(facetName) != null) {
        // facet already exists in the request, this will only have reset option      
        facet.setAllQueryString(buildFacetResetQueryString(facetName, baseRequestParameters));
        facetCounts.add(facet);
      } else {
        List<String> facetValues = facetsDao.getFacetValues(facetName);
        List<NameValueUrlTriple> hitCounts = new ArrayList<NameValueUrlTriple>();
        for (String facetValue : facetValues) {
          final QueryFilter filter = new QueryFilter(
            new TermQuery(new Term(facetName, facetValue)));
          Integer numHits = (Integer) getTemplate().search(new SearcherCallback() {
            public Object doWithSearcher(Searcher searcher) throws IOException, ParseException {
              try {
                Hits hits = searcher.search(baseQuery, filter);
                return hits.length();
              } finally {
                searcher.close();
              }
            }
          });
          if (numHits > 0) {
            hitCounts.add(new NameValueUrlTriple(facetValue, String.valueOf(numHits),
                buildQueryString(baseRequestParameters, facetName, facetValue)));
          }
        }
        facet.setHitCounts(hitCounts);
        if (hitCounts.size() > 0) {
          facetCounts.add(facet);
        }
      }
    }
    return facetCounts;
  }

  /**
   * Builds up the url for the facet reset (remove it from the query).
   */
  @SuppressWarnings("deprecation")
  private String buildFacetResetQueryString(String facetName, 
      Map<String,String[]> baseRequestParameters) {
    StringBuilder facetResetQueryStringBuilder = new StringBuilder();
    int i = 0;
    for (String parameterName : baseRequestParameters.keySet()) {
      String parameterValue = baseRequestParameters.get(parameterName)[0];
      if (parameterName.equals(facetName)) {
        continue;
      }
      if (i > 0) {
        facetResetQueryStringBuilder.append("&");
      }
      facetResetQueryStringBuilder.append(parameterName).
        append("=").
        append(URLEncoder.encode(parameterValue));
      i++;
    }
    return facetResetQueryStringBuilder.toString();
  }

  /**
   * Builds up the query string for the faceted search for this facet.
   */
  @SuppressWarnings("deprecation")
  private String buildQueryString(Map<String,String[]> baseRequestParameters, 
      String facetName, String facetValue) {
    StringBuilder queryStringBuilder = new StringBuilder();
    int i = 0;
    for (String parameterName : baseRequestParameters.keySet()) {
      String[] parameterValues = baseRequestParameters.get(parameterName);
      if (i > 0) {
        queryStringBuilder.append("&");
      }
      queryStringBuilder.append(parameterName).
        append("=").
        append(URLEncoder.encode(parameterValues[0]));
      i++;
    }
    queryStringBuilder.append("&").
      append(facetName).append("=").append(URLEncoder.encode(facetValue));
    return queryStringBuilder.toString();
  }
}

And here is the (partial) source code for the Facet bean; all the member variables have associated getter and setter methods. The Facet bean is a convenient abstraction that simplifies our Searcher code as well as our JSP code (shown below).

public class Facet {

  private String name;
  private List<NameValueUrlTriple> hitCounts;
  private String allQueryString;

  // getters and setters (omitted for brevity)
}

The Controller and JSP

The Controller is really simple. It is built by Spring with a reference to the Searcher. The controller gets the incoming request and delegates most of the work to the Searcher. The Searcher builds the Lucene Query object from the parameters and passes it back to the Controller, which uses the Lucene Query to issue a search() and getFacets() call back to the Searcher, puts the results in the ModelAndView, and forwards to the search JSP. The Spring configuration is shown below:

  <!-- Controller -->
  <bean id="facetedSearchController" class="net.soapmarket.controller.FacetedSearchController">
    <property name="facetedSoapSearcher" ref="facetedSoapSearcher" />
  </bean>

And here is the code for the Controller:

public class FacetedSearchController implements Controller {

  private FacetedSoapSearcher facetedSoapSearcher;

  public void setFacetedSoapSearcher(FacetedSoapSearcher facetedSoapSearcher) {
    this.facetedSoapSearcher = facetedSoapSearcher;
  }

  @SuppressWarnings("unchecked")
  public ModelAndView handleRequest(HttpServletRequest request, HttpServletResponse response)
      throws Exception {
    ModelAndView mav = new ModelAndView();
    Map<String,String[]> parameters = request.getParameterMap();
    Query query = facetedSoapSearcher.getQueryFromParameterMap(parameters);
    mav.addObject("category", parameters.get("category")[0]);
    mav.addObject("results", facetedSoapSearcher.search(query));
    mav.addObject("facets", facetedSoapSearcher.getFacets(query, parameters));
    mav.addObject("categoryName", "Dishwashing Soaps"); // hardcoded for now
    mav.setViewName("search");
    return mav;
  }
}

And the code for the JSP is here:

<%@ page language="java" import="java.util.*" pageEncoding="UTF-8"%>
<%@ page session="false" %>
<%@ taglib prefix="c" uri="http://java.sun.com/jsp/jstl/core" %>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
  <body>
    <h2>${categoryName}</h2>
    <table cellspacing="0" cellpadding="0" border="1" width="100%">
      <tr valign="top">
        <td><font size="-1">
          <p><b><a href="/soapmarket/search.do?category=${category}">Reset search</a></b></p>
          <c:forEach var="facet" items="${facets}">
            <c:choose>
              <c:when test="${not empty facet.allQueryString}">
                <p><b><a href="/soapmarket/search.do?${facet.allQueryString}">See all ${facet.name}</a></b></p>
              </c:when>
              <c:otherwise>
                <b>Search by ${facet.name}</b><br>
                <ul>
                <c:forEach var="hitCount" items="${facet.hitCounts}">
                  <li><a href="/soapmarket/search.do?${hitCount.queryString}">${hitCount.name} : (${hitCount.value})</a></li>
                </c:forEach>
                </ul><br>
              </c:otherwise>
            </c:choose>
          </c:forEach>
        </font></td>
        <td>
          <ol>
          <c:forEach var="result" items="${results}">
            <li>${result}</li>
          </c:forEach>
          </ol>
        </td>
      </tr>
    </table>
  </body>
</html>

Scope for improvement

Two issues not addressed in this implementation are performance and maintainability. For this prototype, I am using a dataset of about 27 records which have about 6 facets. Performance can be improved on the relational database end by normalizing the facet information. From what I heard from search engineers at my previous job, and because Lucene depends on an inverted index, Lucene scales very well to large datasets, so that is probably not an issue. The other aspect is maintainability. We are using a new field for each facet, which would grow messy as more facets are added (even in a controlled vocabulary environment). It may be better to store all the facets in a single field. This will require modifications to both the indexer and searcher.
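
One possible reading of the single-field idea, sketched here only as an assumption and not something implemented in this article, is to encode each facet name-value pair as one untokenized term of the form name:value and index all of them under a single field. The field name "facet", the ':' separator and the class below are all hypothetical; a search for brand=Palmolive would then look for the term brand:Palmolive in that one field.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Encodes facet name-value pairs as single terms for one shared index field. */
final class FacetTokenSketch {

  static final String FACET_FIELD = "facet"; // hypothetical field name

  static String toToken(String facetName, String facetValue) {
    return facetName + ":" + facetValue;
  }

  /** One token per facet value; each would become a separate untokenized term. */
  static List<String> toTokens(Map<String, List<String>> facets) {
    List<String> tokens = new ArrayList<>();
    for (Map.Entry<String, List<String>> facet : facets.entrySet()) {
      for (String value : facet.getValue()) {
        tokens.add(toToken(facet.getKey(), value));
      }
    }
    return tokens;
  }

  /** Splits a token back into {name, value} at the first ':'. */
  static String[] fromToken(String token) {
    int colon = token.indexOf(':');
    return new String[] { token.substring(0, colon), token.substring(colon + 1) };
  }

  public static void main(String[] args) {
    Map<String, List<String>> facets = new LinkedHashMap<>();
    List<String> brands = new ArrayList<>();
    brands.add("Palmolive");
    facets.put("brand", brands);
    List<String> agents = new ArrayList<>();
    agents.add("hand");
    facets.put("agent", agents);
    System.out.println(FACET_FIELD + " -> " + toTokens(facets));
  }
}
```

With this encoding, adding a new facet would not add a new field to the index, although the facet names would then have to be recovered by splitting terms rather than by listing fields.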