Monday, May 31, 2010

Alfresco: Loading Tag Categories

In my previous post, I described my content model. In this model, each post can be manually classified against one or more tags (similar to how one would do it on Blogger). The tags are stored as part of Alfresco's taxonomy and shared between bloggers. They could have been private to each blogger, but since the idea of tagging is to build a shared folksonomy, I thought it better to share them.

I pulled three Atom feeds for my example: one from my own blog, and two from friends who also write on Blogger. Then I parsed the categories out of the feeds and wrote a de-duplicated set of tags from all three blogs into a flat file. To parse the feeds, I used StAX - here is the code.

// Source: src/java/com/mycompany/alfresco/extension/loaders/CategoryParser.java
package com.mycompany.alfresco.extension.loaders;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.PrintWriter;
import java.util.HashSet;
import java.util.Set;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

import org.junit.Test;

public class CategoryParser {

  public void parse(String author, Set<String> cats) throws Exception {
    XMLInputFactory factory = XMLInputFactory.newInstance();
    XMLStreamReader parser = factory.createXMLStreamReader(
      new FileInputStream("/Users/sujit/Projects/Alfresco/" + 
      author + "_atom.xml"));
    for (;;) {
      int evt = parser.next();
      if (evt == XMLStreamConstants.END_DOCUMENT) {
        break;
      }
      if (evt == XMLStreamConstants.START_ELEMENT) {
        String tag = parser.getName().getLocalPart();
        if ("category".equals(tag)) {
          int nattrs = parser.getAttributeCount();
          for (int i = 0; i < nattrs; i++) {
            String attrname = parser.getAttributeLocalName(i);
            if ("term".equals(attrname)) {
              cats.add(parser.getAttributeValue(i));
            }
          }
        }
      }
    }
    parser.close();
  }
  
  @Test
  public void testParse() throws Exception {
    PrintWriter writer = new PrintWriter(
      new FileWriter(new File("/tmp/cats.txt")));
    Set<String> cats = new HashSet<String>();
    parse("happy", cats);
    parse("grumpy", cats);
    parse("bashful", cats);
    for (String cat : cats) {
      writer.println(cat);
    }
    writer.flush();
    writer.close();
  }
}
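
For readers who want to try the StAX part without the feed files, here is a minimal sketch of the same idea that reads the Atom XML from a string instead of a hard-coded path. The class name AtomCategoryDemo, the method extractTerms and the inline sample feed are made up for illustration and are not part of the project code. It looks the term attribute up by name rather than looping over all attributes.

```java
import java.io.StringReader;
import java.util.Set;
import java.util.TreeSet;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class, not part of the original project.
class AtomCategoryDemo {

  // Collects the "term" attribute of every <category> element in the feed.
  // A TreeSet de-duplicates the terms and keeps them sorted.
  static Set<String> extractTerms(String atomXml) throws XMLStreamException {
    Set<String> terms = new TreeSet<String>();
    XMLStreamReader parser = XMLInputFactory.newInstance()
      .createXMLStreamReader(new StringReader(atomXml));
    try {
      while (parser.hasNext()) {
        if (parser.next() == XMLStreamConstants.START_ELEMENT
            && "category".equals(parser.getLocalName())) {
          // a null namespace URI means the attribute namespace is not checked
          String term = parser.getAttributeValue(null, "term");
          if (term != null) {
            terms.add(term);
          }
        }
      }
    } finally {
      parser.close();
    }
    return terms;
  }

  public static void main(String[] args) throws Exception {
    // Made-up sample feed with one duplicated term.
    String feed =
      "<feed xmlns='http://www.w3.org/2005/Atom'>"
      + "<entry><category term='alfresco'/><category term='java'/></entry>"
      + "<entry><category term='java'/></entry>"
      + "</feed>";
    System.out.println(extractTerms(feed)); // prints [alfresco, java]
  }
}
```

Closing the parser in a finally block means it is released even if the feed turns out to be malformed halfway through.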

One thing that I've started doing recently is embedding the @Test method in the main class itself, similar to how some people put in a main() method for testing. This is particularly useful if all you want the test to do is run your class. That way, you can use a single Ant target to run all your classes, instead of having a specific one for each class. Here is the unittest target.

  <target name="unittest" depends="setup,compile" description="run unit test">
    <junit printSummary="yes" haltonerror="true" haltonfailure="true" fork="true" dir=".">
      <test name="${test.class}" todir="./bin"/>
      <classpath refid="classpath.server"/>
      <classpath refid="classpath.build"/>
      <classpath path="${alfresco.web.dir}/WEB-INF/classes"/>
      <classpath path="bin"/>
      <sysproperty key="basedir" value="."/>
      <sysproperty key="dir.root" value="/Users/sujit/Library/Tomcat/alf_data"/>
      <formatter type="plain" usefile="false"/>
    </junit>
  </target>

Inserting the categories into Alfresco proved a bit trickier. I could not find a code example, either in Jeff Potts's book or on the web.

One way to do this is to manually add your categories to alfresco/bootstrap/categories.xml, reinitialize the database and data directories, then start up the Alfresco web application. I suppose I could have done this, but it seemed like overkill to me.

The next hint I found was in the Classification and Categories wiki page, which states:

To add categories to the cm:generalclassifiable classification, there first needs to be a node of type cm:category with a child association QName of cm:generalclassifiable and child association type cm:categories beneath a node of type cm:category_root. This node is the top of the classification.

Nodes can be created beneath this node of type cm:category and child association type cm:subcategories. These nodes defined the root categories for the classification. Further nodes of type cm:category and child association type cm:subcategories can be added beneath these nodes to define the category hierarchy. Secondary links can be used to include categories from one classification in another - these category nodes appear in both classifications. The category property and its defining aspect determines which classification applies.

Pretty simple, right? Yeah, I thought so too :-). But it does make sense if you read this really carefully, at the same time referring to the categories.xml file.

Towards the bottom of the categories.xml file is an empty subcategory of the root category, called "Tags". Presumably, this is the category that applications are meant to customize. So I decided to hang my "my:tag" category node off this, and put all my categories in as subcategories of that node. That way, I could add more application-specific categories as siblings of the my:tag category. Something like this:

cm:category_root
  |
  +-- ...
  |
  +-- Tags
  |    |
  |    +-- MyCompany Post Tags (my:tag)
  |    |    |
  |    |    +-- xmlrpc
  |    |    |
  |    |    +-- ...

The wiki page said to look at the unit tests to see how the above should be done - the one I found was ADMLuceneCategoryTest - it wasn't exactly what I was looking for, but it did give me some useful pointers on how to go about doing this. Based on the code in here, I decided to use the Alfresco Foundation API.

The one disadvantage of using the Foundation API is that it takes a while for the ApplicationContext to spin up. But an advantage is that you can do this without the web application running (in fact, with the application running, it complained about port 50501 being already in use - but I believe that is something Mac OS specific). It would probably have been quicker to use one of the remote APIs to do this. Since I plan on using that anyway once I build the client for the CMS users, I decided to use the Foundation API for now. Here is the code to load the categories from the flat file generated in the previous step.

// Source: src/java/com/mycompany/alfresco/extension/loaders/CategoryLoader.java
package com.mycompany.alfresco.extension.loaders;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.Serializable;
import java.util.Collection;
import java.util.List;
import java.util.Map;

import javax.transaction.UserTransaction;

import org.alfresco.model.ContentModel;
import org.alfresco.service.ServiceRegistry;
import org.alfresco.service.cmr.repository.ChildAssociationRef;
import org.alfresco.service.cmr.repository.NodeRef;
import org.alfresco.service.cmr.repository.NodeService;
import org.alfresco.service.cmr.repository.StoreRef;
import org.alfresco.service.cmr.search.CategoryService;
import org.alfresco.service.cmr.search.ResultSet;
import org.alfresco.service.cmr.search.SearchService;
import org.alfresco.service.cmr.security.AuthenticationService;
import org.alfresco.service.namespace.QName;
import org.alfresco.service.transaction.TransactionService;
import org.alfresco.util.ApplicationContextHelper;
import org.junit.Assert;
import org.junit.Test;
import org.springframework.context.ApplicationContext;

public class CategoryLoader {

  private static final String MYCOMPANY_POST_TAG_QUERY = 
    "PATH:\"cm:generalclassifiable/cm:Tags/" + 
    "cm:MyCompany_x0020_Post_x0020_Tags\"";

  private ApplicationContext ctx;
  
  public CategoryLoader() {
    this.ctx = ApplicationContextHelper.getApplicationContext();
  }
  
  public int loadCategories(String categoryFile) throws Exception {
    int numLoaded = 0;
    ServiceRegistry serviceRegistry = 
      (ServiceRegistry) ctx.getBean(ServiceRegistry.SERVICE_REGISTRY);
    CategoryService categoryService = serviceRegistry.getCategoryService();
    AuthenticationService authenticationService = 
      serviceRegistry.getAuthenticationService();
    authenticationService.authenticate("admin", "admin".toCharArray());
    TransactionService txService = serviceRegistry.getTransactionService();
    UserTransaction tx = txService.getUserTransaction();
    tx.begin();
    Collection<ChildAssociationRef> refs = categoryService.getRootCategories(
      StoreRef.STORE_REF_WORKSPACE_SPACESSTORE, 
      ContentModel.ASPECT_GEN_CLASSIFIABLE);
    NodeRef tagCategoryRef = null;
    for (ChildAssociationRef ref : refs) {
      if (ref.getQName().equals(ContentModel.PROP_TAGS)) {
        tagCategoryRef = ref.getChildRef();
        break;
      }
    }
    try {
      SearchService searchService = serviceRegistry.getSearchService();
      ResultSet resultSet = null;
      BufferedReader reader = null;
      try {
        resultSet = searchService.query(
          StoreRef.STORE_REF_WORKSPACE_SPACESSTORE, 
          SearchService.LANGUAGE_LUCENE, MYCOMPANY_POST_TAG_QUERY);
        NodeRef myPostTagsRef = null;
        if (resultSet.getChildAssocRefs().size() > 0) {
          myPostTagsRef = resultSet.getChildAssocRef(0).getChildRef();
        } else {
          myPostTagsRef = categoryService.createCategory(
            tagCategoryRef, "MyCompany Post Tags");
        }
        reader = new BufferedReader(new FileReader(categoryFile));
        String category = null;
        while ((category = reader.readLine()) != null) {
          System.out.println("Adding category: " + category);
          categoryService.createCategory(myPostTagsRef, category);
          numLoaded++;
        }
      } finally {
        if (resultSet != null) { resultSet.close(); }
        if (reader != null) { reader.close(); }
      }
      tx.commit();
    } catch (Exception e) {
      tx.rollback();
      throw e;
    }
    return numLoaded;
  }

  public void deleteMyCompanyTags() throws Exception {
    ServiceRegistry serviceRegistry = 
      (ServiceRegistry) ctx.getBean(ServiceRegistry.SERVICE_REGISTRY);
    AuthenticationService authenticationService = 
      serviceRegistry.getAuthenticationService();
    authenticationService.authenticate("admin", "admin".toCharArray());
    String ticket = authenticationService.getCurrentTicket();
    TransactionService txService = serviceRegistry.getTransactionService();
    UserTransaction tx = txService.getUserTransaction();
    tx.begin();
    try {
      SearchService searchService = serviceRegistry.getSearchService();
      ResultSet resultSet = null;
      try {
        resultSet = searchService.query(
          StoreRef.STORE_REF_WORKSPACE_SPACESSTORE, 
          SearchService.LANGUAGE_LUCENE, MYCOMPANY_POST_TAG_QUERY);
        // nothing to delete if the parent category was never created
        if (resultSet.getChildAssocRefs().size() > 0) {
          NodeRef myCompanyTagsRef = 
            resultSet.getChildAssocRef(0).getChildRef();
          NodeService nodeService = serviceRegistry.getNodeService();
          CategoryService categoryService = 
            serviceRegistry.getCategoryService();
          for (ChildAssociationRef caref : 
              nodeService.getChildAssocs(myCompanyTagsRef)) {
            categoryService.deleteCategory(caref.getChildRef());
          }
        }
      } finally {
        if (resultSet != null) { resultSet.close(); }
      }
      tx.commit();
    } catch (Exception e) {
      // undo partial deletes, as loadCategories() and verifyLoading() do
      tx.rollback();
      throw e;
    } finally {
      authenticationService.invalidateTicket(ticket);
      authenticationService.clearCurrentSecurityContext();
    }
  }
  
  public int verifyLoading() throws Exception {
    int numVerified = 0;
    ServiceRegistry serviceRegistry = (ServiceRegistry) ctx.getBean(
      ServiceRegistry.SERVICE_REGISTRY);
    AuthenticationService authenticationService = 
      serviceRegistry.getAuthenticationService();
    authenticationService.authenticate("admin", "admin".toCharArray());
    TransactionService txService = serviceRegistry.getTransactionService();
    UserTransaction tx = txService.getUserTransaction();
    tx.begin();
    try {
      NodeService nodeService = serviceRegistry.getNodeService();
      SearchService searchService = serviceRegistry.getSearchService();
      // find all nodes that are under our category folder
      ResultSet resultSet = null;
      try {
        resultSet = searchService.query(
          StoreRef.STORE_REF_WORKSPACE_SPACESSTORE, 
          SearchService.LANGUAGE_LUCENE, MYCOMPANY_POST_TAG_QUERY);
        NodeRef myCompanyTagsRef = resultSet.getChildAssocRef(0).getChildRef();
        List<ChildAssociationRef> carefs = 
          nodeService.getChildAssocs(myCompanyTagsRef);
        for (ChildAssociationRef caref : carefs) {
          NodeRef catRef = caref.getChildRef();
          Map<QName,Serializable> props = nodeService.getProperties(catRef);
          String name = (String) props.get(ContentModel.PROP_NAME);
          System.out.println("Verified: " + name);
          numVerified++;
        }
      } finally {
        if (resultSet != null) { resultSet.close(); }
      }
      tx.commit();
    } catch (Exception e) {
      tx.rollback();
      throw e;
    }
    return numVerified;
  }

  @Test
  public void testLoadCategories() throws Exception {
    CategoryLoader loader = new CategoryLoader();
    int loaded = loader.loadCategories(
      "/Users/sujit/Projects/Alfresco/cats.txt");
    int verified = loader.verifyLoading();
    Assert.assertEquals(loaded, verified);
  }
}
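
A note on the MYCOMPANY_POST_TAG_QUERY constant above: the category is named "MyCompany Post Tags", but in the Lucene PATH query each space shows up as _x0020_. The sketch below shows how such a query string could be built from the display names. The class and method names (PathQueryDemo, encodeName, tagPathQuery) are hypothetical, and the encoding is deliberately simplified: it only escapes characters that cannot appear in a path element name and ignores other ISO 9075 corner cases. As far as I know, Alfresco ships its own ISO9075 utility class for this, which would be the better choice in real code.

```java
// Hypothetical demo class, not part of the original project.
class PathQueryDemo {

  // Escapes every character that is not a letter or underscore (or, after
  // the first position, a digit, hyphen or period) as _xHHHH_.
  // Simplified: a literal "_x0020_" already present in the name is not escaped.
  static String encodeName(String name) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < name.length(); i++) {
      char c = name.charAt(i);
      boolean plain = Character.isLetter(c) || c == '_'
        || (i > 0 && (Character.isDigit(c) || c == '-' || c == '.'));
      if (plain) {
        sb.append(c);
      } else {
        sb.append(String.format("_x%04x_", (int) c));
      }
    }
    return sb.toString();
  }

  // Builds a PATH query below cm:generalclassifiable from category names.
  static String tagPathQuery(String... names) {
    StringBuilder sb = new StringBuilder("PATH:\"cm:generalclassifiable");
    for (String name : names) {
      sb.append("/cm:").append(encodeName(name));
    }
    return sb.append('"').toString();
  }

  public static void main(String[] args) {
    // Reproduces the query string used by CategoryLoader.
    System.out.println(tagPathQuery("Tags", "MyCompany Post Tags"));
  }
}
```

Building the query from the display names keeps the name passed to createCategory() and the name used in the query from drifting apart.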

After running this, I can verify in the Alfresco webapp's Admin console that the categories made it in. The left panel shows the category hierarchy, while the right panel shows an icon view of the categories that were just inserted. The breadcrumb above also shows the relative position of these category tag elements.

Categories are to Alfresco what Taxonomies are to Drupal. Once you understand how to load categories, they seem fairly simple to work with. You can nest categories to any depth as well (see categories.xml for how to do this).

For comparison, Drupal does this with 4 database tables - 3 to define the taxonomy and term itself, and the fourth one to map the node to the term. Not that there are no warts with Drupal's implementation (there are 3 different ways the node->taxonomy element can be structured, depending on the type of taxonomy being used), but Drupal's approach seems simpler and more intuitive to me.

That said, one thing I do like about Alfresco's approach is its unified approach to category and content - both are nodes.

Update - 2010-06-04

I had a bug in the loading code: I forgot to add the aspect to the category node as I was loading it. I also added a verification step in the loader that runs against Alfresco's Lucene index once the loading is complete, and verifies that the number of categories in the input file is the same as the number in Alfresco. The code has been updated in the main post.
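
A count comparison tells you that something is off, but not what. If that ever becomes a problem, one option is to compare by name. This is only a sketch and not what the loader does: the class LoadCheckDemo and the method missing are hypothetical, and it assumes you have already collected the names from the flat file and the cm:name values read back from Alfresco into two collections.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo class, not part of the original project.
class LoadCheckDemo {

  // Returns the expected names that do not appear among the loaded names.
  static Set<String> missing(Collection<String> expected,
      Collection<String> loaded) {
    Set<String> result = new TreeSet<String>(expected);
    result.removeAll(loaded);
    return result;
  }

  public static void main(String[] args) {
    // Made-up sample data standing in for the file and the repository.
    Collection<String> fromFile = Arrays.asList("alfresco", "java", "xmlrpc");
    Collection<String> fromRepo = Arrays.asList("java", "alfresco");
    System.out.println("Missing: " + missing(fromFile, fromRepo));
  }
}
```

An empty result means every name in the file was found. Swapping the arguments lists names that are in Alfresco but not in the file, for example leftovers from an earlier run.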

The second thing I noticed was that I was creating a new alf_data in my project - this was because the dir.root was set to ./alf_data. I guess the reason I could see it in Alfresco's web client was because it comes from the database. I updated the dir.root directly in repository.properties and added that as a system property in my Ant task. The XML for the Ant task has also been updated in the main post.

Update - 2010-06-12

When trying to link a post to a category, I found that I had marked the categories with the my:tagClassifiable aspect. No aspect should be applied to the category; the aspect should be applied to the my:post content instead. The CategoryLoader code has been updated accordingly. In addition, there is now code to delete all the categories (since I had to do this before I reran the load).
