Exporting Comments

LiveJournal provides an interface for exporting comments in an XML format that makes it easy to write utilities that use the information. A user may download comments for any journal they administer.

Please read the LiveJournal Bot Policy page, which covers the more general rules on how to download information from our servers without getting yourself banned. Please also follow the directions contained in this guide.

To use the comment exporter, you need a valid session cookie. You can obtain one with the sessiongenerate protocol mode or by posting login information to the login.bml page.
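
As a rough illustration of attaching the session cookie, here is a minimal Python sketch that builds (without sending) a request for the export_comments.bml page described below. The host name, the cookie name ljsession, and the helper build_export_request are assumptions made for illustration, not something this guide specifies; use whatever cookie your login step actually returned.

```python
import urllib.request

# Assumptions (not from this guide): the host name and the cookie name
# "ljsession" are placeholders for whatever your login step gave you.
BASE_URL = "https://www.livejournal.com"


def build_export_request(session_value, mode, startid=0, base_url=BASE_URL):
    """Build, but do not send, a GET request for export_comments.bml."""
    if mode not in ("comment_meta", "comment_body"):
        raise ValueError("mode must be 'comment_meta' or 'comment_body'")
    url = f"{base_url}/export_comments.bml?get={mode}&startid={int(startid)}"
    req = urllib.request.Request(url)
    # The session cookie travels in an ordinary Cookie header.
    req.add_header("Cookie", f"ljsession={session_value}")
    return req
```

The request object can then be passed to urllib.request.urlopen when you are ready to make the call.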

Comment Data Summary

Element | Attribute | Mode | Mutable | Description
maxid | - | meta | yes | The maximum comment id currently available in the user's journal, as an integer. This is the endpoint, inclusive.
comment | id | meta, body | no | The id of this particular comment.
comment | posterid | meta, body | yes | The id of the poster of this comment. This can only change from 0 (anonymous) to some non-zero number. It will never go the other way, nor will it change from one non-zero number to another. Anonymous (0) is the default if no posterid is supplied.
comment | state | meta, body | yes | S = screened comment, D = deleted comment, A = active (visible) comment. If the state is not explicitly defined, it is assumed to be A.
comment | jitemid | body | no | The journal itemid of the entry this comment was posted in.
comment | parentid | body | no | 0 if this comment is top-level; otherwise, the id of the comment this one was posted in response to. Top-level (0) is the default if no parentid is supplied.
usermap | id | meta | no | Poster id part of the poster id + user pair.
usermap | user | meta | yes | Username part of the poster id + user pair. This can change if a user renames.
body | - | body | no | The text of the comment.
subject | - | body | no | The subject of the comment. This may not be present with every comment.
date | - | body | no | The time this comment was posted, in the W3C Date and Time format.
property | - | body | no | The property tag has one attribute, name, that indicates the name of this property. The content of the tag is the value of that property.
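
The mutability rules in the summary can be checked whenever a cached record is refreshed. The following Python sketch shows one possible way to do that; the function name merge_meta and the dict layout are hypothetical choices for illustration, not part of the export interface.

```python
def merge_meta(cached, fresh):
    """Return a copy of `cached` updated from `fresh` for the same comment.

    Both arguments are dicts with an "id" key and optional "posterid" and
    "state" keys (a hypothetical layout chosen for this sketch).
    """
    if cached["id"] != fresh["id"]:
        raise ValueError("records describe different comments")

    old_poster = cached.get("posterid", 0)  # 0 = anonymous, the default
    new_poster = fresh.get("posterid", 0)
    # posterid may only go from 0 to a non-zero id, never any other way.
    if old_poster != 0 and new_poster != old_poster:
        raise ValueError("posterid changed in a way the format does not allow")

    merged = dict(cached)
    merged["posterid"] = new_poster
    merged["state"] = fresh.get("state", "A")  # A = active, the default
    return merged
```

Raising on an impossible transition is a design choice made here so that cache corruption is noticed early; a real tool might prefer to log and continue.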

Fetching Metadata

NOTE: Please cache metadata, but keep in mind that it contains the parts of a comment that can change. Follow these instructions again once in a while to update your cache.

Comment metadata includes only the information about a comment that is subject to change. Fetching it is a lightweight call that returns a small XML file with basic information on each comment posted in a journal. Step 1 of any export should look like this:

    GET /export_comments.bml?get=comment_meta&startid=0

The response to this request looks something like this:

    <?xml version="1.0" encoding='utf-8'?>
    <livejournal>
        <maxid>100</maxid>
        <comments>
            <comment id='71' posterid='3' state='D' />
            <comment id='70' state='D' />
            <comment id='99' />
            <comment id='100' posterid='3' />
            <comment id='92' state='D' />
            <comment id='69' posterid='3' state='S' />
            <comment id='98' posterid='3' />
            <comment id='73' state='D' />
            <comment id='86' state='S' />
        </comments>
        <usermaps>
            <usermap id='6' user='test2' />
            <usermap id='3' user='test' />
            <usermap id='2' user='xb95' />
        </usermaps>
    </livejournal>

The first part is the actual comment metadata. Each row contains the mutable information about a single comment. After this data comes the list of users and their ids. The id in each mapping never changes, so feel free to cache these mappings; the username attached to an id changes only if the user renames.
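
A response like this can be read with any XML parser. Below is a minimal Python sketch using the standard library; the function name parse_comment_meta and the returned layout are assumptions made for illustration. It looks elements up by tag name wherever they appear, so it does not depend on any particular wrapper element, and it applies the documented defaults (posterid 0, state A) when an attribute is absent.

```python
import xml.etree.ElementTree as ET


def parse_comment_meta(xml_text):
    """Parse a comment_meta response into (maxid, comments, usermap).

    comments maps comment id -> {"posterid": int, "state": str}
    usermap  maps poster id  -> username
    """
    root = ET.fromstring(xml_text)

    # maxid is None if the response carries no maxid element.
    maxid_el = next(root.iter("maxid"), None)
    maxid = int(maxid_el.text) if maxid_el is not None else None

    comments = {}
    for el in root.iter("comment"):
        comments[int(el.get("id"))] = {
            "posterid": int(el.get("posterid", 0)),  # default: anonymous
            "state": el.get("state", "A"),           # default: active
        }

    usermap = {int(el.get("id")): el.get("user") for el in root.iter("usermap")}
    return maxid, comments, usermap
```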

Note the maxid element as well. It gives the maximum comment id in this user's journal; use it to determine whether you have finished downloading. In pseudocode, gathering metadata looks something like this:

    sub gather_metadata
        startid = largest comment id in my cache + 1, or 0 if the cache is empty
        GET /export_comments.bml?get=comment_meta&startid=startid
        add results to metadata cache
        if the largest comment id returned is less than the maxid returned, call gather_metadata again
    end sub
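
The same loop might look like this in Python. This is a sketch under stated assumptions rather than a reference client: fetch is a hypothetical callable you supply that takes a startid and returns the XML text of a comment_meta response (this keeps the networking and the session cookie out of the example), and stopping on an empty batch is a safety measure added here, not something the guide prescribes.

```python
import xml.etree.ElementTree as ET


def gather_metadata(fetch, comments=None, users=None):
    """Fill and return (comments, users) by paging through comment_meta.

    fetch(startid) -> XML text; a caller-supplied, hypothetical callable.
    """
    comments = {} if comments is None else comments
    users = {} if users is None else users
    while True:
        # Resume just past the largest id we already know about.
        startid = max(comments) + 1 if comments else 0
        root = ET.fromstring(fetch(startid))

        maxid_el = next(root.iter("maxid"), None)
        if maxid_el is None:
            raise ValueError("response has no maxid element")
        maxid = int(maxid_el.text)

        batch = {
            int(el.get("id")): {
                "posterid": int(el.get("posterid", 0)),
                "state": el.get("state", "A"),
            }
            for el in root.iter("comment")
        }
        comments.update(batch)
        for el in root.iter("usermap"):
            users[int(el.get("id"))] = el.get("user")

        # Done once we have reached maxid; an empty batch also stops the
        # loop so that a gap in the ids cannot make it spin forever.
        if not batch or max(batch) >= maxid:
            return comments, users
```

Calling it with empty caches fetches everything from the beginning, which is also how a periodic refresh of the mutable fields could be done.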

Downloading the Comments

WARNING: Comment body data should be cached aggressively. None of this data can change, so once you have downloaded a comment, you do not need to download it again.

Once you have the entire list of metadata, you can begin downloading comments. The steps you will use are much the same as for getting metadata. Again, here is some pseudocode:

    sub download_comments
        startid = largest comment id we have fully downloaded + 1
        GET /export_comments.bml?get=comment_body&startid=startid
        add results to comment cache
        if the largest comment id returned is less than maxid in the metadata cache, call download_comments again
        if nothing was returned, and startid+1000 < maxid from metadata, call download_comments again
    end sub
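
A Python sketch of this loop follows. As before, fetch is a hypothetical caller-supplied callable that takes a startid and returns the XML text of a comment_body response. The pseudocode does not say where to restart after an empty response; this sketch assumes that an empty response means the next 1000 ids hold nothing and skips ahead by that amount, which is an interpretation rather than documented behavior.

```python
import xml.etree.ElementTree as ET

# Taken from the "startid+1000" in the pseudocode; treating it as the
# number of ids to skip after an empty response is an assumption.
WINDOW = 1000


def download_comments(fetch, maxid, bodies=None):
    """Download comment bodies up to maxid; returns {comment id: Element}.

    fetch(startid) -> XML text; a caller-supplied, hypothetical callable.
    maxid is the value obtained earlier from the metadata.
    """
    bodies = {} if bodies is None else bodies
    startid = max(bodies, default=0) + 1
    while startid <= maxid:
        root = ET.fromstring(fetch(startid))
        batch = {int(el.get("id")): el for el in root.iter("comment")}
        bodies.update(batch)
        if batch:
            # Continue just past the newest comment; never move backwards.
            startid = max(max(batch) + 1, startid + 1)
        else:
            # Nothing came back: assume the window was empty and skip it.
            startid += WINDOW
    return bodies
```

Because bodies never change, passing in a previously saved cache means only comments newer than the cached ones are requested.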

Each comment_body response from export_comments.bml looks like this:

    <?xml version="1.0" encoding='utf-8'?>
    <livejournal>
        <comments>
            <comment id='68' posterid='3' state='S' jitemid='34'>
                <body>we should all comment all day</body>
            </comment>
            <comment id='69' posterid='3' state='S' jitemid='34'>
                <body>commenting is fun</body>
            </comment>
            <comment id='99' jitemid='43' parentid='98'>
                <property name='poster_ip'></property>
            </comment>
            <comment id='100' posterid='3' jitemid='43' parentid='98'>
            </comment>
        </comments>
    </livejournal>
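
To turn such a response into records, something like the following Python sketch could be used. The function name parse_comment_bodies and the dict layout are assumptions for illustration; child elements that a comment lacks (subject, for example) come back as None, and the documented defaults are applied for missing attributes.

```python
import xml.etree.ElementTree as ET


def parse_comment_bodies(xml_text):
    """Parse a comment_body response into a list of dicts, one per comment."""
    root = ET.fromstring(xml_text)
    rows = []
    for el in root.iter("comment"):
        rows.append({
            "id": int(el.get("id")),
            "posterid": int(el.get("posterid", 0)),  # default: anonymous
            "state": el.get("state", "A"),           # default: active
            "jitemid": int(el.get("jitemid")),
            "parentid": int(el.get("parentid", 0)),  # default: top-level
            "body": el.findtext("body"),             # None when absent
            "subject": el.findtext("subject"),
            "date": el.findtext("date"),
            # Each property tag carries its name as an attribute and its
            # value as text; an empty tag becomes an empty string.
            "properties": {
                p.get("name"): p.text or "" for p in el.findall("property")
            },
        })
    return rows
```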

That concludes this brief tutorial on exporting comment data in an appropriate manner so as not to be overly hard on the LiveJournal servers. Thanks for your cooperation, and don't forget to read the Bot Policy page.