lj_dev: RSS, XML, Encodings, fun

Brad Fitzpatrick (bradfitz) wrote in lj_dev,
@ 2001-11-03 00:03:00
RSS, XML, Encodings, fun
I added RSS support today (example) because somebody asked for it recently.

It was really easy, but right now we're spitting out bad XML on journals that aren't in UTF-8 or a subset.

I'm fixing this by doing everything properly with XML::DOM and Unicode::MapUTF8 ... both great modules.

We have a 'lang' field in the user table. We'll also need a default encoding userprop probably. We need to expose that and the language field, then.

And we should modify the protocol to let it take an encoding to convert from. Internally we'll store all data as utf-8.

And if we detect a charset encoding with an HTTP POST, we'll do the conversion automatically. Still have to look into how that works (which HTTP request headers are sent...).
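A minimal Python sketch of that automatic conversion, assuming the client sends a charset parameter in the Content-Type request header (clients may omit it, in which case a fallback such as the user's default encoding would be needed). The function names are hypothetical, and this is not LiveJournal's code, which is Perl:

```python
from email.message import Message


def charset_from_content_type(header_value, default="utf-8"):
    """Return the charset parameter of a Content-Type header, or a default."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns the lowercased charset, or None if absent.
    return msg.get_content_charset() or default


def to_utf8(body, content_type):
    """Decode a request body using its declared charset, re-encode as UTF-8.

    Raises LookupError for a charset Python does not know, and
    UnicodeDecodeError if the bytes do not match the declared charset.
    """
    charset = charset_from_content_type(content_type)
    return body.decode(charset).encode("utf-8")


if __name__ == "__main__":
    posted = "Привет".encode("cp1251")
    header = "application/x-www-form-urlencoded; charset=windows-1251"
    print(to_utf8(posted, header).decode("utf-8"))
```

Everything stored after this step is UTF-8, so only the boundary code needs to know about the source encoding.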

Just wanted to make it known that I really do want LiveJournal to be smart about encodings and non-English languages. It's just slow going doing so much at once.

Anybody interested in working on this? If you need help I can guide you.


(31 comments) - (Post a new comment)


gleepy
2001-11-03 12:23 am UTC (link)
RSS is good. (Think of products like AmphetaDesk benefiting from a quick look at LiveJournal entries.)

(Reply) (Thread)


gleepy
2001-11-03 12:30 am UTC (link)
Hey, I added your link to AmphetaDesk v0.91. Looks nice.

(Reply) (Parent)

RSS syndication
momokatte
2001-11-03 09:17 am UTC (link)
According to this support request, <URL> should be <link>.

<url> belongs in the <image> sub-element, and contains the URL of a GIF, JPEG or PNG image that represents the channel.
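To make the element placement concrete, here is a rough Python sketch that builds a small RSS 0.91-style document with <link> on the channel and item and <url> only inside <image>. The function name and the example.com addresses are made up for illustration, and the output is not checked against the full spec:

```python
import xml.etree.ElementTree as ET


def build_feed(title, link, description, image_url, items):
    """Build a small RSS 0.91-style tree; items is a list of (title, link)."""
    rss = ET.Element("rss", version="0.91")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link  # the channel's web page
    ET.SubElement(channel, "description").text = description

    # <url> appears only here: it is the address of the channel's image.
    image = ET.SubElement(channel, "image")
    ET.SubElement(image, "title").text = title
    ET.SubElement(image, "url").text = image_url
    ET.SubElement(image, "link").text = link

    for item_title, item_link in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        ET.SubElement(item, "link").text = item_link  # <link>, not <url>
    return rss


if __name__ == "__main__":
    feed = build_feed(
        "Example journal",
        "http://example.com/journal/",
        "A hypothetical journal feed",
        "http://example.com/userpic.png",
        [("First entry", "http://example.com/journal/1.html")],
    )
    print(ET.tostring(feed, encoding="unicode"))
```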

(Reply) (Thread)

Re: RSS syndication
momokatte
2001-11-03 09:23 am UTC (link)
Never mind, I just looked at the spec and your implementation appears to be correct.

(Reply) (Parent) (Thread)

Re: RSS syndication
momokatte
2001-11-03 09:25 am UTC (link)
Argh! My brain is fried!

Requester is correct. Check out this resource:
http://backend.userland.com/rss091#whatIsAnLtitemgt

(Reply) (Parent)


way2tired
2001-11-03 12:00 pm UTC (link)
Ok, so what do we use this for? Is it just to spit out a bunch of links, or is there more to it?

(Reply) (Thread)


opiummmm
2001-11-03 11:29 pm UTC (link)
RSS is basically a syndication protocol based on XML. It has its uses, plenty of which I'm sure I'm not thinking of, but it seems that it's most popular for news-ticker type programs.

(Reply) (Parent) (Thread)


opiummmm
2001-11-03 11:30 pm UTC (link)
Scratch that, not based on XML, it is XML. Hehe :-)

(Reply) (Parent)


twistah
2001-11-04 02:42 pm UTC (link)
RSS is a subset of RDF. Because XML is a hot buzzword, the Internet is full of information about both formats and their uses, so you'd probably get more information from a search than from a post here. You can start with the links I provided in this post.

(Reply) (Parent)


avva
2001-11-03 07:06 pm UTC (link)
Anybody interested in working on this?

I am. I used a hammer to free some reasonable chunks of time to do some lj_dev stuff, finally. Will you have me back? ;)

(Reply) (Thread)


bradfitz
2001-11-03 09:48 pm UTC (link)
please! :)

(Reply) (Parent)

RSS for friends pages?
markpasc
2001-11-04 12:00 pm UTC (link)
What about RSS versions of friends pages? All the journals I'd like to get in RSS are already aggregated into my friends page. Rather than make N HTTP requests for my N friends' RSS files, I could make one if my friends page were available in RSS.

(Reply) (Thread)

Re: RSS for friends pages?
bradfitz
2001-11-04 12:16 pm UTC (link)
True.

I'll try to get somebody to add this, or I'll do it myself when I get a minute.

(Reply) (Parent) (Thread)

sorry...
gomolyako
2004-09-01 08:42 am UTC (link)
Where can I read information about RSS versions of friends pages? Thanks.

(Reply) (Parent)

3+ years later
everdred
2005-01-10 10:01 pm UTC (link)
Found a minute yet? ;)

(Reply) (Parent)


insomnia
2001-11-04 12:09 pm UTC (link)
Just a check on this... the RSS feeds only list public posts, right? No private or friends-only?

(Reply) (Thread)


bradfitz
2001-11-04 12:14 pm UTC (link)
yup

(Reply) (Parent)


twistah
2001-11-04 04:18 pm UTC (link)
Just as a note about non-UTF8 stuff, I looked at avva's journal (via RSS) and Internet Explorer 5.5 spat out an XSL error, but when I plugged the URL into FeedReader (a Windows app which reads RSS feeds), everything "showed up" -- the characters were garbled, but that is probably because I don't have some Russian/Cyrillic support installed. But come to think of it, Windows 2000 comes with all languages supported by default (AFAIK) so my theory is probably wrong...

(Reply) (Thread)


bradfitz
2001-11-04 04:24 pm UTC (link)
IE 5.5 is correct. Your theory is wrong.

XML by default is interpreted as UTF-8. His posts are in Windows-1251 (Cyrillic). When the XML parser hits his bytes above 127, it tries to decode them as UTF-8 multibyte sequences and fails.

We need to convert his code page to UTF-8. There is a Perl module to do this (Unicode::MapUTF8) but first we need to tell it the source encoding.
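A small Python illustration of both points, offered as a sketch rather than the Perl code the site would use: a conforming parser rejects Windows-1251 bytes in a document with no encoding declaration, and accepts the same text once it has been transcoded. The helper names are hypothetical, and the source encoding has to be supplied by the caller, exactly as described above:

```python
import xml.etree.ElementTree as ET

# Cyrillic text stored as Windows-1251 bytes, with no encoding declaration,
# so the parser assumes UTF-8.
bad = b"<item><title>" + "Привет".encode("cp1251") + b"</title></item>"


def parses(data):
    """Return True if data is well-formed XML, False if the parser rejects it."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


def transcode(data, source_encoding):
    """Convert bytes from a known source encoding to UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


if __name__ == "__main__":
    print("raw cp1251 parses:", parses(bad))
    good = transcode(bad, "cp1251")
    print("after transcoding parses:", parses(good))
    print("title:", ET.fromstring(good).find("title").text)
```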

(Reply) (Parent)


bradfitz
2001-11-04 04:25 pm UTC (link)
Also, FeedReader is broken.

The XML spec says that you MUST barf hardcore on bad input. We are giving it bad input ... it shouldn't try to recover.

(Reply) (Parent)


mart
2001-11-05 06:31 am UTC (link)

We need a lang field in log and possibly talk too, since people sometimes write in languages other than their primary language. The interface to this is of course a pain, but at least if the field is there we can find some wonderful way of having the user specify it which is user-friendly.

Besides, it'd be cool if the HTML output on a journal view could do <something lang="es"> around the entries where they differ from that set in the user table...

(Reply) (Thread)


bradfitz
2001-11-05 07:02 am UTC (link)
We're 10 steps ahead of ya, yo. :)

If we convert everything to UTF-8 we can simply mix every encoding all on one page, as UTF-8 encompasses all code pages.

(Reply) (Parent)


bradfitz
2001-11-05 07:32 am UTC (link)
What I also meant to say is that encoding != language and language != encoding. There could be 10 encodings for one language and 10 languages for a particular encoding.
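A quick Python sketch of why the two are independent. The sample words and the choice of encodings are just for illustration:

```python
# One language, several encodings: the same Russian word becomes different
# bytes depending on which encoding is used.
word = "журнал"
russian_encodings = ["cp1251", "koi8_r", "iso8859_5", "utf-8"]
by_encoding = {name: word.encode(name) for name in russian_encodings}

# One encoding, several languages: Latin-1 can carry German, French and
# Spanish text alike, so the encoding alone does not reveal the language.
samples = {"de": "Grüße", "fr": "café", "es": "mañana"}
latin1 = {lang: text.encode("latin-1") for lang, text in samples.items()}

# Decoding with the wrong single-byte Cyrillic encoding raises no error;
# it silently yields different (garbled) text.
garbled = word.encode("cp1251").decode("koi8_r")

if __name__ == "__main__":
    for name, raw in by_encoding.items():
        print(name, raw)
    for lang, raw in latin1.items():
        print(lang, raw)
    print(garbled)
```

This is why the source encoding has to be stored or declared separately from the 'lang' field: it cannot be derived from the language.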

(Reply) (Parent)

Support [description]'s?
morbus
2001-11-05 07:44 pm UTC (link)
Hey there - I'd like to throw in a vote for [description] tags. I read about 10 to 15 LJ's a day, and it would be great to load them all up in AmphetaDesk (I'm the creator of AmphetaDesk - see it here: http://www.disobey.com/amphetadesk/) and read the entire post (with HTML) and then just click to comment on the ones that interest me. What are your thoughts?

(Reply) (Thread)

Re: Support [description]'s?
bradfitz
2001-11-21 10:04 am UTC (link)
I got a patch recently to allow that. I'll get it in soon.

(Reply) (Parent) (Thread)

Re: Support [description]'s?
voidstar
2001-11-25 07:08 am UTC (link)
Any news on getting this patch in? Anything I can do to help?

(Reply) (Parent)

Re: Support [description]'s?
voidstar
2001-12-16 01:52 am UTC (link)
Still waiting ;-(
This year? Next year? Sometime? Never?

(Reply) (Parent)

What about exporting your interests?
wkearney
2001-11-21 06:58 am UTC (link)
Hi,

I'm working with a bunch of folks over in the Syndic8 mailing list to develop an extension to RSS that supports categories. The nearest equivalent I can see in LJ is either a Topic or the interest keywords.

What are your thoughts on including that information with the feed and/or with each item?

You're all welcome to read/join the syndic8 list. We'd welcome the input.
http://groups.yahoo.com/group/syndic8

Thanks,
Bill Kearney

(Reply) (Thread)

Re: What about exporting your interests?
bradfitz
2001-11-21 10:05 am UTC (link)
That'd be cool.

Not sure how useful it'd be, though ... how would an RSS consumer present it?

(Reply) (Parent) (Thread)

Re: What about exporting your interests?
wkearney
2001-11-21 10:13 am UTC (link)
It's not so much about how existing RSS clients would display it. This is, to be sure, a chicken & egg situation.

The first instance of something that will use these categories is going to be the feed browser on a web page. Syndic8 is going to do it as will some others. The point is to allow someone finding a feed to traverse to other feeds based on possibly similar categories or just random wandering. Right now we're at the mercy of NOTHING other than the title and the description text.

Eventually it's likely that a client interface might begin to learn and correlate those categories. This is why we're planning on making the category structure capable of using an external namespace. DMOZ is just one of them; the LJ topics and keywords are possibly others.

There are things that are broken in some of the client interfaces. After we get a grip on the possible category structures we're going to evangelize the authors of the toolsets on how to use them.

The point? Finding others with like interests across all sorts of different spaces.

-Bill Kearney

(Reply) (Parent)

What's new with LJ and RSS ?
quercus
2002-06-04 06:15 am UTC (link)
Anything happening in the LJ / RSS world ? (I'm new to LJ)

Anyone interested in working on RSS 1.0, instead of 0.91 ?

(Reply)

