Chapter 29. Exporting Comments

LiveJournal provides an interface for exporting comments in an XML format that makes it easy to write utilities that use the information. A user may download comments for any journal they administer. The interface returns an XML document you can parse; it is not an RPC interface.

Please read the bot rates & limits page, which has general rules on how to download information from the LiveJournal installation without getting yourself banned. Please also follow the directions in this chapter.

To use the comment exporter, you need a valid session cookie. This can be obtained with the sessiongenerate protocol mode or by posting login information to the login.bml page.
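
As a rough illustration, a request to the exporter might be assembled as follows in Python. The host name, the cookie name `ljsession`, and the helper name `build_export_request` are assumptions made for this sketch, not something this chapter specifies; substitute whatever your installation's session mechanism actually returns.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_export_request(base_url, session_cookie, mode, startid=0):
    """Build (but do not send) a GET request for export_comments.bml.

    base_url and session_cookie are supplied by the caller. The cookie
    name "ljsession" is an assumption of this sketch.
    """
    if mode not in ("comment_meta", "comment_body"):
        raise ValueError("mode must be 'comment_meta' or 'comment_body'")
    query = urlencode({"get": mode, "startid": startid})
    url = f"{base_url}/export_comments.bml?{query}"
    return Request(url, headers={"Cookie": f"ljsession={session_cookie}"})
```

The returned object can be passed to `urllib.request.urlopen` once you are sure you are within the bot rate limits.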

Comment Data Summary

Element Attribute Mode Mutable Description
maxid   meta yes This element gives you an integer value of the maximum comment id currently available in the user's journal. This is the endpoint, inclusive.
comment id meta, body no The id of this particular comment.
comment posterid meta, body yes The id of the poster of this comment. This can only change from 0 (anonymous) to some non-zero number. It will never go the other way, nor will it change from some non-zero number to another non-zero number. Anonymous (0) is the default if no posterid is supplied.
comment state meta, body yes S = screened comment, D = deleted comment, F = Frozen comment, A = active (visible) comment. If the state is not explicitly defined, it is assumed to be A.
comment jitemid body no Journal itemid this comment was posted in.
comment parentid body no 0 if this comment is top-level, else, it is the id of the comment this one was posted in response to. Top-level (0) is the default if no parentid is supplied.
usermap id meta no Poster id part of pair.
usermap user meta yes Username part of poster id + user pair. This can change if a user renames.
body   body no The text of the comment.
subject   body no The subject of the comment. This may not be present with every comment.
date   body no The time at which this comment was posted. This is in the W3C Date and Time format (for example, 2007-03-02T18:14:06Z).
property   body no The property tag has one attribute, name, that indicates the name of this property. The content of the tag is the value of that property.
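
The defaults and mutability rules in the table can be applied when reading a comment element. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the function names `normalize_comment` and `posterid_change_allowed` are hypothetical helpers, not part of LiveJournal.

```python
import xml.etree.ElementTree as ET


def normalize_comment(elem):
    """Return a dict for one <comment> element with the defaults filled in."""
    record = {
        "id": int(elem.get("id")),
        "posterid": int(elem.get("posterid", "0")),  # 0 = anonymous
        "state": elem.get("state", "A"),             # A = active (visible)
    }
    # jitemid and parentid are only present in comment_body mode.
    if elem.get("jitemid") is not None:
        record["jitemid"] = int(elem.get("jitemid"))
        record["parentid"] = int(elem.get("parentid", "0"))  # 0 = top-level
    return record


def posterid_change_allowed(old, new):
    """A posterid may stay the same or go from 0 (anonymous) to non-zero."""
    return old == new or old == 0
```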

Fetching Metadata

Please cache metadata, but note that it contains the attributes of a comment that can change. Follow the instructions below to refresh your cache periodically.

Comment metadata includes only information that is subject to change on a comment. It is a lightweight call that returns a small XML file that provides basic information on each comment posted in a journal. Step 1 of any export should look like this:

GET /export_comments.bml?get=comment_meta&startid=0

After you have made the above request, you will get back a response like this:

<?xml version="1.0" encoding='utf-8'?>
    <livejournal>
        <maxid>100</maxid>
        <comments>
            <comment id='71' posterid='3' state='D' />
            <comment id='70' state='D' />
            <comment id='99' />
            <comment id='100' posterid='3' />
            <comment id='92' state='D' />
            <comment id='69' posterid='3' state='S' />
            <comment id='98' posterid='3' />
            <comment id='73' state='D' />
            <comment id='86' state='S' />
        </comments>
        <usermaps>
            <usermap id='6' user='exampleusername2' />
            <usermap id='3' user='exampleusername' />
            <usermap id='2' user='bob' />
        </usermaps>
    </livejournal>

The first part is the actual comment metadata. Each row contains the mutable information about a single comment. After this data is the list of users and their ids. The id in each mapping never changes, so feel free to cache these, but remember that the username can change if a user renames.
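
A metadata response of this shape can be read with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`; `parse_metadata` is a hypothetical helper name, and the code assumes a well-formed response.

```python
import xml.etree.ElementTree as ET


def parse_metadata(xml_text):
    """Split a comment_meta response into (maxid, comments, usermap)."""
    root = ET.fromstring(xml_text)
    maxid = int(root.findtext("maxid"))
    comments = {}
    for c in root.iterfind("comments/comment"):
        comments[int(c.get("id"))] = {
            "posterid": int(c.get("posterid", "0")),  # default: anonymous
            "state": c.get("state", "A"),             # default: active
        }
    usermap = {
        int(u.get("id")): u.get("user")
        for u in root.iterfind("usermaps/usermap")
    }
    return maxid, comments, usermap
```

Keying both dictionaries by integer id makes it straightforward to merge a later response into an existing cache with `dict.update`.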

You should also notice the maxid line. This shows you the maximum comment id that is in this user's journal. You should use this number to determine if you are done downloading or not. So, in pseudocode, you should use something like this to get metadata:

    sub gather_metadata
        get largest comment id known about from my cache (call it lastid)
        GET /export_comments.bml?get=comment_meta&startid=lastid+1
        add results to metadata cache
        if maximum comment id returned is less than the maxid returned, call gather_metadata again
    end sub
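
The same loop might look like this in Python. This is a sketch under stated assumptions: `fetch_meta` is a hypothetical callable that performs the GET for a given startid and returns an already parsed `(maxid, comments, usermap)` tuple, and `cache` is a plain dictionary. Stopping on an empty batch is a safeguard added here to avoid looping forever, not a rule from this chapter.

```python
def gather_metadata(fetch_meta, cache):
    """Fetch comment metadata in batches until maxid has been reached.

    cache is a dict with "comments" (id -> info) and "usermap" (id -> user).
    """
    while True:
        known = cache["comments"]
        # Start from 0 on an empty cache, else just past the largest known id.
        startid = max(known) + 1 if known else 0
        maxid, comments, usermap = fetch_meta(startid)
        cache["comments"].update(comments)
        cache["usermap"].update(usermap)
        cache["maxid"] = maxid
        # Done when the batch reached maxid (or, as a safeguard, was empty).
        if not comments or max(comments) >= maxid:
            break
    return cache
```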

Downloading the Comments

Important

Cache comment body data heavily. None of this data is expected to change: once you have downloaded a comment, you do not need to download it again.

Once you have the entire list of metadata, you can begin downloading comments. The steps you will use are much the same as for getting metadata. Again, here is some pseudocode:

    sub download_comments
        get largest comment id we have fully downloaded (call it lastid)
        GET /export_comments.bml?get=comment_body&startid=lastid+1
        add results to comment cache
        if maximum id returned is less than maxid in metadata cache, call download_comments again
        if nothing was returned, and startid+1000 < maxid from metadata, call download_comments again
    end sub
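
A Python sketch of this loop follows. `fetch_body` is a hypothetical callable that performs the GET for a given startid and returns a dictionary of comment id to parsed comment. Advancing startid by 1000 after an empty response is this sketch's interpretation of the last rule in the pseudocode, so treat it as an assumption.

```python
def download_comments(fetch_body, comment_cache, meta_maxid, window=1000):
    """Download comment bodies in batches up to meta_maxid.

    comment_cache maps comment id -> comment and is updated in place.
    """
    startid = max(comment_cache, default=0) + 1
    while startid <= meta_maxid:
        batch = fetch_body(startid)
        if batch:
            comment_cache.update(batch)
            startid = max(batch) + 1
        elif startid + window < meta_maxid:
            # Nothing returned: skip ahead one window (assumed behaviour).
            startid += window
        else:
            break
    return comment_cache
```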

The resulting format each time you hit export_comments.bml will look like this:

    <?xml version="1.0" encoding='utf-8'?>
    <livejournal>
        <comments>
            <comment id='68' posterid='3' state='S' jitemid='34'>
                <body>we should all comment all day</body>
                <date>2007-03-02T18:14:06Z</date>
            </comment>
            <comment id='69' posterid='3' state='S' jitemid='34'>
                <body>commenting is fun</body>
                <date>2007-03-02T18:16:08Z</date>
            </comment>
            <comment id='99' jitemid='43' parentid='98'>
                <body>anonynote!</body>
                <date>2007-03-16T19:06:31Z</date>
                <property name='poster_ip'>127.0.0.1</property>
            </comment>
            <comment id='100' posterid='3' jitemid='43' parentid='98'>
                <subject>subject!#@?</subject>
                <body>&lt;b&gt;BOLD!&lt;/b&gt;</body>
                <date>2007-03-16T19:19:16Z</date>
            </comment>
        </comments>
    </livejournal>
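
A body response like the one above could be parsed along these lines. This is a sketch using Python's standard library; `parse_comment_bodies` is a hypothetical helper name, and the date format string assumes timestamps always end in `Z` as in the sample.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def parse_comment_bodies(xml_text):
    """Return {comment id: dict} for a comment_body response."""
    root = ET.fromstring(xml_text)
    result = {}
    for c in root.iterfind("comments/comment"):
        posted = datetime.strptime(c.findtext("date"), "%Y-%m-%dT%H:%M:%SZ")
        result[int(c.get("id"))] = {
            "posterid": int(c.get("posterid", "0")),   # default: anonymous
            "state": c.get("state", "A"),              # default: active
            "jitemid": int(c.get("jitemid")),
            "parentid": int(c.get("parentid", "0")),   # default: top-level
            "subject": c.findtext("subject"),          # None when absent
            "body": c.findtext("body", default=""),
            "date": posted.replace(tzinfo=timezone.utc),
            "props": {p.get("name"): p.text for p in c.iterfind("property")},
        }
    return result
```

The parser unescapes entities, so a body sent as `&lt;b&gt;BOLD!&lt;/b&gt;` comes back as the literal markup the commenter typed.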

Note

Users can now edit comments on-site, if they have the required usercap. This was introduced after the comment export facility was implemented. This means some comment data may change after it was originally posted.