LiveJournal provides an interface for exporting comments using an XML format that makes it easy for people to write utilities to use the information. A user is allowed to download comments for any journal they administrate.
Please read the LiveJournal Bot Policy page, which discusses more general rules on how to download information from our servers without getting yourself banned. Also please follow the directions contained in this guide.
In order to use the comment exporter, you will need to have a valid session cookie. This can be obtained with the sessiongenerate protocol mode or by posting login information to the login.bml page.
|Element||Attribute||Mode||Mutable||Description|
|maxid||(none)||meta||yes||This element gives you an integer value of the maximum comment id currently available in the user's journal. This is the endpoint, inclusive.|
|comment||id||meta, body||no||The id of this particular comment.|
|comment||posterid||meta, body||yes||The id of the poster of this comment. This can only change from 0 (anonymous) to some non-zero number. It will never go the other way, nor will it change from some non-zero number to another non-zero number. Anonymous (0) is the default if no posterid is supplied.|
|comment||state||meta, body||yes||S = screened comment, D = deleted comment, A = active (visible) comment. If the state is not explicitly defined, it is assumed to be A.|
|comment||jitemid||body||no||Journal itemid this comment was posted in.|
|comment||parentid||body||no||0 if this comment is top-level; otherwise, the id of the comment this one was posted in response to. Top-level (0) is the default if no parentid is supplied.|
|usermap||id||meta||no||Poster id part of the poster id + user pair.|
|usermap||user||meta||yes||Username part of the poster id + user pair. This can change if a user renames.|
|body||(none)||body||no||The text of the comment.|
|subject||(none)||body||no||The subject of the comment. This may not be present with every comment.|
|date||(none)||body||no||The time this comment was posted at. This is in the W3C Date and Time format.|
|property||name||body||no||The property tag has one attribute, name, that indicates the name of this property. The content of the tag is the value of that property.|
Comment metadata includes only the information about a comment that is subject to change. Fetching it is a lightweight call that returns a small XML file with basic information on each comment posted in a journal. Step 1 of any export should look like this:
GET /export_comments.bml?get=comment_meta&startid=0
After you have made the above request, you will get back a response that looks something like this:
<?xml version="1.0" encoding='utf-8'?>
<livejournal>
  <maxid>100</maxid>
  <comments>
    <comment id='71' posterid='3' state='D' />
    <comment id='70' state='D' />
    <comment id='99' />
    <comment id='100' posterid='3' />
    <comment id='92' state='D' />
    <comment id='69' posterid='3' state='S' />
    <comment id='98' posterid='3' />
    <comment id='73' state='D' />
    <comment id='86' state='S' />
  </comments>
  <usermaps>
    <usermap id='6' user='test2' />
    <usermap id='3' user='test' />
    <usermap id='2' user='xb95' />
  </usermaps>
</livejournal>
The first part is the actual comment metadata. Each row contains the mutable information about a single comment. After this data comes the list of users and their ids. A poster id always refers to the same account, so feel free to cache these mappings, but keep in mind that the username attached to an id can change if the user renames.
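As an illustration, the response above could be read with Python's standard xml.etree.ElementTree module. This is only a sketch: the function name parse_comment_meta and the shape of the returned dictionaries are made up for this example and are not part of the export interface. It applies the documented defaults (posterid 0, state A) when an attribute is absent.

```python
import xml.etree.ElementTree as ET


def parse_comment_meta(xml_data):
    """Parse a comment_meta response into (maxid, comments, usermap).

    Hypothetical helper: the name and return shape are assumptions.
    """
    root = ET.fromstring(xml_data)
    maxid = int(root.findtext("maxid"))

    comments = {}
    for c in root.iter("comment"):
        comments[int(c.get("id"))] = {
            "posterid": int(c.get("posterid", "0")),  # 0 = anonymous (default)
            "state": c.get("state", "A"),             # A = active (default)
        }

    # Poster id -> username; the username may change if the user renames.
    usermap = {int(u.get("id")): u.get("user") for u in root.iter("usermap")}
    return maxid, comments, usermap


sample = b"""<?xml version="1.0" encoding='utf-8'?>
<livejournal>
  <maxid>100</maxid>
  <comments>
    <comment id='71' posterid='3' state='D' />
    <comment id='99' />
  </comments>
  <usermaps>
    <usermap id='3' user='test' />
  </usermaps>
</livejournal>"""

maxid, comments, usermap = parse_comment_meta(sample)
print(maxid, comments[99], usermap[3])
```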
You should also notice the maxid line. This shows you the maximum comment id that is in this user's journal. You should use this number to determine if you are done downloading or not. So, in pseudocode, you should use something like this to get metadata:
sub gather_metadata
    get largest comment id known about from my cache
    GET /export_comments.bml?get=comment_meta&startid=maxid+1
    add results to metadata cache
    if maximum id returned is less than maxid returned, call gather_metadata again
end sub
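The same loop might be written in Python roughly as follows. This is a sketch under assumptions, not a reference client: fetch_meta is a hypothetical callable supplied by the caller that performs the authenticated GET for a given startid and returns the XML response, and the metadata cache is modelled as a plain dictionary keyed by comment id. The check for an empty batch is an extra safeguard added here so the loop cannot spin forever; it is not part of the pseudocode.

```python
import xml.etree.ElementTree as ET


def gather_metadata(fetch_meta, meta_cache):
    """Fill meta_cache (comment id -> metadata) and return the server's maxid.

    fetch_meta(startid) is a hypothetical callable that requests
    /export_comments.bml?get=comment_meta&startid=<startid>
    with a valid session cookie and returns the XML response.
    """
    while True:
        # Start just past the largest id already cached (0 on the first run).
        startid = max(meta_cache, default=-1) + 1
        root = ET.fromstring(fetch_meta(startid))
        server_maxid = int(root.findtext("maxid"))

        batch = list(root.iter("comment"))
        for c in batch:
            meta_cache[int(c.get("id"))] = {
                "posterid": int(c.get("posterid", "0")),
                "state": c.get("state", "A"),
            }

        # Done once the cache reaches maxid; also stop on an empty batch
        # (safeguard assumed for this sketch).
        if not batch or max(meta_cache) >= server_maxid:
            return server_maxid
```

Passing the fetch function in as a parameter keeps the HTTP and cookie handling out of the loop, which also makes the logic easy to exercise against canned responses.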
Once you have the entire list of metadata, you can begin downloading comments. The steps you will use are much the same as for getting metadata. Again, here is some pseudocode:
sub download_comments
    get largest comment id we have fully downloaded
    GET /export_comments.bml?get=comment_body&startid=maxid+1
    add results to comment cache
    if maximum id returned is less than maxid in metadata cache, call download_comments again
    if nothing was returned, and startid+1000 < maxid from metadata, call download_comments again
end sub
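One possible reading of this pseudocode in Python is sketched below. Several things here are assumptions: fetch_body is a hypothetical caller-supplied function that performs the authenticated request for a given startid, the comment cache is a plain dictionary, and an empty response is interpreted as "skip ahead by 1000 ids", which is how the startid+1000 check above is understood in this sketch.

```python
import xml.etree.ElementTree as ET

WINDOW = 1000  # taken from the "startid+1000" check in the pseudocode


def download_comments(fetch_body, meta_maxid, body_cache, startid=0):
    """Fill body_cache (comment id -> body text) up to meta_maxid.

    fetch_body(startid) is a hypothetical callable that requests
    /export_comments.bml?get=comment_body&startid=<startid>
    with a valid session cookie and returns the XML response.
    """
    while startid <= meta_maxid:
        root = ET.fromstring(fetch_body(startid))
        batch = list(root.iter("comment"))

        for c in batch:
            body_cache[int(c.get("id"))] = c.findtext("body", default="")

        if batch:
            # Continue just past the largest id fully downloaded.
            startid = max(int(c.get("id")) for c in batch) + 1
        else:
            # Nothing in this range: move on to the next window (assumed).
            startid += WINDOW
    return body_cache
```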
The resulting format each time you hit export_comments.bml will look like this:
<?xml version="1.0" encoding='utf-8'?>
<livejournal>
  <comments>
    <comment id='68' posterid='3' state='S' jitemid='34'>
      <body>we should all comment all day</body>
      <date>2004-03-02T18:14:06Z</date>
    </comment>
    <comment id='69' posterid='3' state='S' jitemid='34'>
      <body>commenting is fun</body>
      <date>2004-03-02T18:16:08Z</date>
    </comment>
    <comment id='99' jitemid='43' parentid='98'>
      <body>anonynote!</body>
      <date>2004-03-16T19:06:31Z</date>
      <property name='poster_ip'>127.0.0.1</property>
    </comment>
    <comment id='100' posterid='3' jitemid='43' parentid='98'>
      <subject>subject!#@?</subject>
      <body><b>BOLD!</b></body>
      <date>2004-03-16T19:19:16Z</date>
    </comment>
  </comments>
</livejournal>
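To show how the body-mode fields fit together, here is a small Python sketch that turns such a response into a list of dictionaries. The function name and the dictionary layout are invented for this example. It assumes dates always arrive in the UTC form shown above (ending in Z), and it flattens any markup nested inside a body element down to its text, which may not be what every tool wants.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def parse_comment_bodies(xml_data):
    """Parse a comment_body response into a list of dicts (hypothetical layout)."""
    parsed = []
    for c in ET.fromstring(xml_data).iter("comment"):
        date_text = c.findtext("date")
        posted = None
        if date_text:
            # Assumes the "YYYY-MM-DDThh:mm:ssZ" form seen in the sample.
            posted = datetime.strptime(date_text, "%Y-%m-%dT%H:%M:%SZ")
            posted = posted.replace(tzinfo=timezone.utc)

        body_el = c.find("body")
        parsed.append({
            "id": int(c.get("id")),
            "posterid": int(c.get("posterid", "0")),  # 0 = anonymous
            "state": c.get("state", "A"),             # A = active
            "jitemid": int(c.get("jitemid")),
            "parentid": int(c.get("parentid", "0")),  # 0 = top-level
            "subject": c.findtext("subject"),         # None when absent
            # itertext() keeps the text of nested tags but drops the tags.
            "body": "".join(body_el.itertext()) if body_el is not None else "",
            "date": posted,
            "properties": {p.get("name"): p.text for p in c.findall("property")},
        })
    return parsed


sample = b"""<?xml version="1.0" encoding='utf-8'?>
<livejournal>
  <comments>
    <comment id='99' jitemid='43' parentid='98'>
      <body>anonynote!</body>
      <date>2004-03-16T19:06:31Z</date>
      <property name='poster_ip'>127.0.0.1</property>
    </comment>
  </comments>
</livejournal>"""

for comment in parse_comment_bodies(sample):
    print(comment["id"], comment["date"].isoformat(), comment["properties"])
```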
That concludes this brief tutorial on exporting comment data in an appropriate manner so as not to be overly hard on the LiveJournal servers. Thanks for your cooperation, and don't forget to read the Bot Policy page.