LiveJournal provides an interface for exporting comments in an XML format, which makes it easy to write utilities that use the information. A user may download comments for any journal they administer. The exporter returns an XML document you can parse; it is not an RPC interface.
Please read the bot rates & limits page, which has general rules on how to download information from the LiveJournal installation without getting yourself banned. Please also follow the directions in this chapter.
To use the comment exporter, you need a valid session cookie.
This can be obtained with the sessiongenerate protocol mode or by posting login
information to the
Comment Data Summary
|Element||Attribute||Mode||Mutable||Description|
|maxid|| ||meta||yes||This element gives you an integer value of the maximum comment id currently available in the user's journal. This is the endpoint, inclusive.|
|comment||id||meta, body||no||The id of this particular comment.|
|comment||posterid||meta, body||yes||The id of the poster of this comment. This can only change from 0 (anonymous) to some non-zero number. It will never go the other way, nor will it change from some non-zero number to another non-zero number. Anonymous (0) is the default if no posterid is supplied.|
|comment||state||meta, body||yes||S = screened comment, D = deleted comment, F = frozen comment, A = active (visible) comment. If the state is not explicitly defined, it is assumed to be A.|
|comment||jitemid||body||no||Journal itemid this comment was posted in.|
|comment||parentid||body||no||0 if this comment is top-level; otherwise, the id of the comment this one was posted in response to. Top-level (0) is the default if no parentid is supplied.|
|usermap||id||meta||no||Poster id part of the poster id + user pair.|
|usermap||user||meta||yes||Username part of the poster id + user pair. This can change if a user renames.|
|body|| ||body||no||The text of the comment.|
|subject|| ||body||no||The subject of the comment. This may not be present with every comment.|
|date|| ||body||no||The time at which this comment was posted. This is in the W3C Date and Time format.|
|property|| ||body||no||The property tag has one attribute, name, that indicates the name of this property. The content of the tag is the value of that property.|
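The defaults described in the table (posterid 0, state A, parentid 0) can be applied in one place when reading a comment's attributes. The following is a minimal sketch, not part of the exporter itself; the function name `normalize_comment` and the `STATES` lookup are hypothetical names chosen for illustration.

```python
# Human-readable names for the state codes listed in the table.
STATES = {"S": "screened", "D": "deleted", "F": "frozen", "A": "active"}


def normalize_comment(attrs):
    """Apply the documented defaults to a dict of raw <comment> attributes.

    `attrs` maps attribute names to strings, as an XML parser would give them.
    """
    return {
        "id": int(attrs["id"]),
        # Anonymous (0) is the default if no posterid is supplied.
        "posterid": int(attrs.get("posterid", 0)),
        # Active (A) is assumed if no state is supplied.
        "state": attrs.get("state", "A"),
        # jitemid only appears in body mode, so it may be absent.
        "jitemid": int(attrs["jitemid"]) if "jitemid" in attrs else None,
        # Top-level (0) is the default if no parentid is supplied.
        "parentid": int(attrs.get("parentid", 0)),
    }


anonymous = normalize_comment({"id": "99"})
reply = normalize_comment({"id": "100", "posterid": "3", "jitemid": "43", "parentid": "98"})
```

Keeping the defaults in a single helper means the metadata and body parsers cannot disagree about what a missing attribute means.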
Please cache metadata, but note that it does contain things that can change about a comment. You should follow these instructions to update your cache once in a while.
Comment metadata includes only information that is subject to change on a comment. It is a lightweight call that returns a small XML file that provides basic information on each comment posted in a journal. Step 1 of any export should be a metadata request: a GET of /export_comments.bml with get=comment_meta and a startid parameter.
After you have made this request, you will get back a response like this:
<?xml version="1.0" encoding='utf-8'?>
<livejournal>
  <maxid>100</maxid>
  <comments>
    <comment id='71' posterid='3' state='D' />
    <comment id='70' state='D' />
    <comment id='99' />
    <comment id='100' posterid='3' />
    <comment id='92' state='D' />
    <comment id='69' posterid='3' state='S' />
    <comment id='98' posterid='3' />
    <comment id='73' state='D' />
    <comment id='86' state='S' />
  </comments>
  <usermaps>
    <usermap id='6' user='exampleusername2' />
    <usermap id='3' user='exampleusername' />
    <usermap id='2' user='bob' />
  </usermaps>
</livejournal>
The first part is the actual comment metadata. Each row contains the mutable information about a single comment. After this data is the list of users and their ids. The id in each mapping never changes, so feel free to cache these; note, however, that the username can change if a user renames, as listed in the summary table.
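A metadata response of this shape can be read with Python's standard `xml.etree.ElementTree` module. This is a minimal sketch under stated assumptions: the function name `parse_comment_meta` is hypothetical, and the sample below is a shortened copy of the response shown above.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample metadata response.
SAMPLE_META = b"""<?xml version="1.0" encoding='utf-8'?>
<livejournal>
  <maxid>100</maxid>
  <comments>
    <comment id='71' posterid='3' state='D' />
    <comment id='99' />
    <comment id='69' posterid='3' state='S' />
  </comments>
  <usermaps>
    <usermap id='3' user='exampleusername' />
  </usermaps>
</livejournal>"""


def parse_comment_meta(xml_bytes):
    """Return (maxid, comments, usermap) from a comment_meta response."""
    root = ET.fromstring(xml_bytes)
    maxid = int(root.findtext("maxid"))
    comments = {}
    for node in root.iter("comment"):
        comments[int(node.get("id"))] = {
            # Missing attributes fall back to the documented defaults.
            "posterid": int(node.get("posterid", "0")),
            "state": node.get("state", "A"),
        }
    # Map poster id -> username.
    usermap = {int(n.get("id")): n.get("user") for n in root.iter("usermap")}
    return maxid, comments, usermap


maxid, comments, usermap = parse_comment_meta(SAMPLE_META)
```

Keying both dictionaries by integer id makes it straightforward to merge a later response into an existing cache.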
You should also notice the maxid line. This shows you the maximum comment id that is in this user's journal. You should use this number to determine if you are done downloading or not. So, in pseudocode, you should use something like this to get metadata:
sub gather_metadata
    get largest comment id known about from my cache
    GET /export_comments.bml?get=comment_meta&startid=maxid+1
    add results to metadata cache
    if maximum id returned is less than maxid returned, call gather_metadata again
end sub
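The pseudocode can be sketched in Python as a loop rather than a recursive call. Everything here is illustrative: `gather_metadata` takes a caller-supplied `fetch_meta(startid)` function (hypothetical) that performs the authenticated GET and returns the response bytes, and `fake_fetch_meta` stands in for the server so the example runs offline. Treating an empty cache as id 0, and stopping when a response contains no comments, are assumptions made to keep the loop finite.

```python
import xml.etree.ElementTree as ET


def gather_metadata(fetch_meta, cache):
    """Fill `cache` (comment id -> metadata dict) by repeated requests."""
    while True:
        # Largest comment id known about from the cache, plus one.
        startid = max(cache, default=0) + 1
        root = ET.fromstring(fetch_meta(startid))
        server_maxid = int(root.findtext("maxid"))
        returned = []
        for node in root.iter("comment"):
            cid = int(node.get("id"))
            cache[cid] = {
                "posterid": int(node.get("posterid", "0")),
                "state": node.get("state", "A"),
            }
            returned.append(cid)
        # Done once the highest id returned reaches the reported maxid.
        # An empty response also stops the loop (assumption, avoids spinning).
        if not returned or max(returned) >= server_maxid:
            return cache


def fake_fetch_meta(startid, all_ids=(1, 2, 3, 4, 5), page=2):
    """Offline stand-in for the server: returns at most `page` comments."""
    ids = [i for i in all_ids if i >= startid][:page]
    rows = "".join(f"<comment id='{i}' />" for i in ids)
    xml = (
        f"<livejournal><maxid>{max(all_ids)}</maxid>"
        f"<comments>{rows}</comments></livejournal>"
    )
    return xml.encode("utf-8")


meta_cache = gather_metadata(fake_fetch_meta, {})
```

Passing the fetch function in as a parameter keeps the loop testable without network access and leaves cookie handling and rate limiting to the caller.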
Downloading the Comments
Comment body data is to be heavily cached. None of this data can change. Once you have downloaded a comment, you do not need to do so again.
Once you have the entire list of metadata, you can begin downloading comments. The steps you will use are much the same as for getting metadata. Again, here is some pseudocode:
sub download_comments
    get largest comment id we have fully downloaded
    GET /export_comments.bml?get=comment_body&startid=maxid+1
    add results to comment cache
    if maximum id returned is less than maxid in metadata cache, call download_comments again
    if nothing was returned, and startid+1000 < maxid from metadata, call download_comments again
end sub
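A Python sketch of the same loop follows. It is one interpretation of the pseudocode, not a definitive implementation: `fetch_bodies(startid)` is a hypothetical caller-supplied function that returns the response bytes, and `fake_fetch_bodies` simulates a server that only looks at a window of 1000 ids per request. The pseudocode does not say where to restart after an empty response; advancing startid by 1000 is an assumption derived from the startid+1000 check.

```python
import xml.etree.ElementTree as ET

WINDOW = 1000  # taken from the "startid+1000" rule in the pseudocode


def download_comments(fetch_bodies, meta_maxid, bodies):
    """Fill `bodies` (comment id -> body text) up to `meta_maxid`."""
    startid = max(bodies, default=0) + 1
    while startid <= meta_maxid:
        root = ET.fromstring(fetch_bodies(startid))
        ids = []
        for node in root.iter("comment"):
            cid = int(node.get("id"))
            bodies[cid] = node.findtext("body", default="")
            ids.append(cid)
        if ids:
            # Continue after the highest id received; always move forward.
            startid = max(max(ids) + 1, startid + 1)
        elif startid + WINDOW < meta_maxid:
            # Nothing in this window: skip ahead (assumption, see lead-in).
            startid += WINDOW
        else:
            break
    return bodies


FAKE_STORE = {1: "first", 2: "second", 2500: "late"}
calls = []


def fake_fetch_bodies(startid):
    """Offline stand-in: serves comments with startid <= id < startid + WINDOW."""
    calls.append(startid)
    rows = "".join(
        f"<comment id='{i}'><body>{text}</body></comment>"
        for i, text in sorted(FAKE_STORE.items())
        if startid <= i < startid + WINDOW
    )
    return f"<livejournal><comments>{rows}</comments></livejournal>".encode("utf-8")


bodies = download_comments(fake_fetch_bodies, meta_maxid=2500, bodies={})
```

In the simulated run the gap between ids 2 and 2500 produces two empty responses before the last comment is found, which is the case the second condition in the pseudocode exists to handle.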
The response to each comment_body request will look like this:
<?xml version="1.0" encoding='utf-8'?>
<livejournal>
  <comments>
    <comment id='68' posterid='3' state='S' jitemid='34'>
      <body>we should all comment all day</body>
      <date>2007-03-02T18:14:06Z</date>
    </comment>
    <comment id='69' posterid='3' state='S' jitemid='34'>
      <body>commenting is fun</body>
      <date>2007-03-02T18:16:08Z</date>
    </comment>
    <comment id='99' jitemid='43' parentid='98'>
      <body>anonynote!</body>
      <date>2007-03-16T19:06:31Z</date>
      <property name='poster_ip'>127.0.0.1</property>
    </comment>
    <comment id='100' posterid='3' jitemid='43' parentid='98'>
      <subject>subject!#@?</subject>
      <body><b>BOLD!</b></body>
      <date>2007-03-16T19:19:16Z</date>
    </comment>
  </comments>
</livejournal>
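A body response like the one above can be turned into typed records with the standard library. This is a sketch under assumptions: the `Comment` dataclass and `parse_comment_bodies` are hypothetical names, and the sample assumes that markup inside a body arrives XML-escaped. The parser joins all text under `<body>`, so it does not fail if inline elements appear there instead, although the tags themselves would then be dropped.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Two comments from the sample response; the escaped <b> is an assumption.
SAMPLE_BODY = b"""<?xml version="1.0" encoding='utf-8'?>
<livejournal>
  <comments>
    <comment id='99' jitemid='43' parentid='98'>
      <body>anonynote!</body>
      <date>2007-03-16T19:06:31Z</date>
      <property name='poster_ip'>127.0.0.1</property>
    </comment>
    <comment id='100' posterid='3' jitemid='43' parentid='98'>
      <subject>subject!#@?</subject>
      <body>&lt;b&gt;BOLD!&lt;/b&gt;</body>
      <date>2007-03-16T19:19:16Z</date>
    </comment>
  </comments>
</livejournal>"""


@dataclass
class Comment:
    id: int
    jitemid: int
    posterid: int = 0          # 0 means anonymous
    parentid: int = 0          # 0 means top-level
    state: str = "A"           # A means active
    subject: Optional[str] = None
    body: str = ""
    date: Optional[datetime] = None
    properties: dict = field(default_factory=dict)


def parse_comment_bodies(xml_bytes):
    """Return a list of Comment records from a comment_body response."""
    comments = []
    for node in ET.fromstring(xml_bytes).iter("comment"):
        body = node.find("body")
        date_text = node.findtext("date")
        comments.append(Comment(
            id=int(node.get("id")),
            jitemid=int(node.get("jitemid")),
            posterid=int(node.get("posterid", "0")),
            parentid=int(node.get("parentid", "0")),
            state=node.get("state", "A"),
            subject=node.findtext("subject"),  # None when absent
            body="".join(body.itertext()) if body is not None else "",
            # Dates in the sample end in Z, so they are parsed as UTC.
            date=(datetime.strptime(date_text, "%Y-%m-%dT%H:%M:%SZ")
                  .replace(tzinfo=timezone.utc)) if date_text else None,
            properties={p.get("name"): p.text for p in node.findall("property")},
        ))
    return comments


parsed = parse_comment_bodies(SAMPLE_BODY)
```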
Users can now edit comments on-site, if they have the required usercap. This was introduced after the comment export facility was implemented. This means some comment data may change after it was originally posted.