Paul Crowley ([info]ciphergoth) wrote in [info]trustmetrics,
@ 2003-08-13 19:09:00
Previous Entry  Add to memories!  Tell a Friend!  Next Entry
A quick explanation of TrustFlow
TrustFlow does not look at interests, who reads your journal, or any other such thing. It looks only at "friends" lists. It's trying to determine who is closest to who based on who lists who as a friend.

Imagine this. Everyone on LJ has a bucket that can hold 500 tokens. We start off with every bucket empty. The order in which TrustFlow lists people is the order in which each person manages to fill their own bucket.

You are the source of tokens; you get tokens one at a time. First, you put them into your own bucket, until that bucket is full. Bang, you are top of your own list. You then give more tokens to your friends, in strict rotation; imagine a parent handing out a bag of sweets to kids "one for you, and one for you and one for you, and a second for you and a second for you...". Each of your friends places the token in their own bucket. Finally, after you've given out 500 tokens for each of your friends, all their buckets fill one after another, and they all join the list.

Since all that is totally predictable, we don't bother to list you or your friends, because it's after that that things get interesting.

You get a token. Since your bucket is full, you pass it to a friend. But their bucket is full too, so they pass it to a friend in turn; like you, they issue the tokens they get to their friends in strict rotation. Each token ends up in the bucket of someone who isn't on the list. Keep going like this, and eventually someone's bucket will fill up. They become the next person to join the list. From then on they, like you and your friends, pass on any more tokens they get to their friends. Keep going until we have 50 full buckets, and list the order the buckets fill up.

That's basically it. There's a slight complication to do with dead ends but that doesn't matter much for the basic understanding.

Here's some of the consequences.
  • Supposing Alice and Bob are on your friends list. Alice has two friends; Bob has 20. All other things being equal, each of Alice's friends gets 10 tokens for every token one of Bob's friends get, because Alice and Bob are receiving just as many tokens but Alice is sharing them between fewer people.
  • If many of your friends list me, I will get more tokens.
  • If you list Alice and Bob, and Alice lists Bob, then in addition to the tokens he gets from you, Bob will sometimes get tokens from Alice. As a consequence Bob's friends get a few more tokens.
  • Who your friends are matter only once you get on the list. It doesn't matter if my friends list is just the same as yours, it doesn't affect my ranking at all; my friends list matters only once my ranking is decided.
  • The "Trust" in "TrustFlow" is there because it's really meant for a different situation - one where you list people to indicate that you trust them. That's emphatically not the case on LiveJournal - LJ was just a handy platform on which to try the ideas out. It doesn't measure trust here, more the sorts of qualities which lead to putting someone on your friends list, which is generally acquaintance and interestingness.
  • The "Flow" is there because the trust starts from you, and "flows out" down the links to your friends, and then onto their friends.
I hope that helps answer some of the questions people have had!



(Post a new comment)


[info]fool_in_spirit
2003-08-13 11:27 am UTC (link)
How big are the buckets? How many token can they contain?

(Reply to this)(Thread)

er,,,,
[info]gymli
2003-08-13 12:45 pm UTC (link)
Everyone on LJ has a bucket that can hold 500 tokens

^from the original post ^

(Reply to this)(Parent)(Thread)

Re: er,,,,
[info]ciphergoth
2003-08-13 01:21 pm UTC (link)
This number was chosen as a trade-off between performance and accuracy,

(Reply to this)(Parent)

Re: er,,,,
[info]fool_in_spirit
2003-08-14 01:39 am UTC (link)
got it, sorry. I thought everyone had 500 tokens to give.

Pietro

(Reply to this)(Parent)


[info]rampagingcarrot
2003-08-13 01:10 pm UTC (link)
One of the unintended consquences is that if one of your friends has only 3 or 4 people on his friends list, then those 3 or 4 people will end up near or at the top of the list (depending on how connected your friends are).

(Reply to this)(Thread)


[info]ciphergoth
2003-08-13 01:23 pm UTC (link)
Yes, this is probably the biggest problem with the metric that this trial has revealed. There has to come a point where having more arcs devalues each arc, but it's tempting to set a bottom line of, say, 20 friends, such that if you have fewer than 20 friends then some of the tokens you get are thrown away.

(Reply to this)(Parent)(Thread)


[info]rampagingcarrot
2003-08-13 04:23 pm UTC (link)
Wow, thanks for responding. I don't know very much about programming language at all, having not spent much time learning much in real applications. I only have the basis of a C++ education and the consequent understanding of the inrinsic role of logic in programming. Thus I was somewhat wary posting that fact, which I correctly guessed you already knew about, and I hope you didn't think that I thought I was pointing out something new to you (I thought of myself kind of acting as a lowly middleman of information).

Figuring out how your metric applied to my friends list was a very fun exercise in logic, and I'd like to help anyway I can, knowing full well that your mental capacity for understanding logical structures is easily sufficient enough for this enterprise.

The problem we have been talking about, is there currently a discussion on it somewhere with possible solutions? I'd be interested in looking at it, even if my lack of technical expertise renders me useless. Always eager to learn.

(Reply to this)(Parent)(Thread)


[info]ciphergoth
2003-08-14 12:32 am UTC (link)
Any discussion will either be in this community or linked to from here.

(Reply to this)(Parent)


[info]ruakh
2003-08-13 08:47 pm UTC (link)
Alternatively, if you're willing to abandon the nice water-flowing imagery, you could use some other rule to distribute tokens from a person to his friends. Currently, if a person's bucket is full and he's getting 1 token per minute, then each of his friends will get n-1 tokens per minute (assuming he has n friends); but you could use some other function - say, n-.5 - that offers less of a penalty for certing more people.

The major downsides that I can think of are (1) that this destroys the water-flowing imagery (since a person will "pass along" more tokens than they receive), and (2) that this adds an extra degree of arbitrariness (how do you decide the best function of n?).

(Reply to this)(Parent)(Thread)


[info]ciphergoth
2003-08-14 12:33 am UTC (link)
The trouble with this is that it breaks the proof of attack resistance...

(Reply to this)(Parent)(Thread)


[info]ruakh
2003-08-14 10:44 am UTC (link)
Ah.

I haven't actually seen your proof, but I take it one of the key features is that someone can't dispense trust faster than they obtain it?

(Reply to this)(Parent)(Thread)


[info]ciphergoth
2003-08-15 03:46 am UTC (link)
I haven't actually written up the proof :-) but yes, that's the key thing. It's all pretty obvious from there TBH.

(Reply to this)(Parent)(Thread)


[info]ruakh
2003-08-15 08:19 pm UTC (link)
It's all pretty obvious from there TBH.

Obvious to you, perhaps; but you flatter me unduly if you think it's obvious to me.

:-)

(My knowledge of trust metrics comes almost exclusively from your explanation of this one, and I have no idea what would even constitute a proof that a given trust metric is resistant to attack. For that matter, I have no idea how one would formally define resistance to attack. So all told, I'm quite clueless on the matter.)

(Reply to this)(Parent)(Thread)


[info]ciphergoth
2003-08-16 12:24 am UTC (link)
Look at the proof for Advogato's trust metric:

http://www.advogato.org/trust-metric.html

With my metric, you can say that there's a bound on the amount of water flowing into nodes controlled by bad people, and that means that there's a bound on the number of nodes they can get certification for.

(Reply to this)(Parent)(Thread)


[info]ruakh
2003-08-16 02:39 pm UTC (link)
With my metric, you can say that there's a bound on the amount of water flowing into nodes controlled by bad people, and that means that there's a bound on the number of nodes they can get certification for.

Except that in your metric, there's an arbitrary amount of water flowing into the seed, so over time all the bad people can get certification and start spilling over more water?

Actually, it seems that in your metric, everyone will eventually get certification, as long as there's any chain of trust from you to them? (Though if they're far away, and some of the intermediaries offer many certs, then it will take much longer.)

Suddenly I think I must be missing something fundamental with your metric. In your nifty LJ tool, you cut off the list after 50 friends; I took that to be just an aspect of the tool, but now I'm wondering - is that an aspect of your metric? That it trusts only n people? Because if so, then I think your system is automatically resistant to attack - because no matter how what volume of spam you have, there is a fixed upper bound on how many bad servers can gain certification.

(Reply to this)(Parent)(Thread)


[info]ciphergoth
2003-08-17 09:43 am UTC (link)
These are very good points.

I had originally thought the tool should map everyone onto a "trust ranking", the idea being that you would set a cutoff on how low a ranking you'd accept to trust someone with some privilege (eg commenting in your journal). I now think it should map each person onto a real number representing the amount of water it takes to get them trusted - I'll explain my reasons for that change in another post, but one reason is that with that change a person can much more easily present a subgraph of the graph as proof that they are sufficiently trusted.

However, you're right that in either case the proof of attack resistance is trivial: there is a strict bound on the absolute number of people who can pass the bar, so of course there's a bound on how many people an attacker can get past.

I'd be very interested in ideas on how to make a more strict definition of attack resistance that a bounded metric would fail: something about for every n there are circumstances under which an attacker can get no more than n people trusted, but stricter still because I can still design bad metrics which pass that definition.

Thanks for posting!

(Reply to this)(Parent)(Thread)


[info]ruakh
2003-08-17 11:16 am UTC (link)
Thoughts toward a stricter definition of attack resistance:

Advogato's definition can be rephrased approximately thus:

"Resistance to attack is the property that the number of trusted accounts that are untrustworthy is independent of the total number of accounts that are untrustworthy."

For a metric that can only trust a bounded number of accounts, I think you might adjust this definition just slightly:

"Resistance to attack is the property that the proportion of trusted accounts that are untrustworthy is independent of the overall proportion of accounts that are untrustworthy."

How would that be?

(Reply to this)(Parent)(Thread)


[info]ciphergoth
2003-08-18 02:24 am UTC (link)
"independent of" is statistical terminology; you'd have to have a statistical model which assigned each possible trust graph a probability before you could use that terminology. That's the wrong way to go, because a big part of the trust graph is not random - it's chosen by the attacker.

The definition needs to explicitly specify ideas about which nodes are attacker controlled, and state something like "For a given trust graph G, there is no attacker's supergraph G such that..." where "attacker's supergraph" is a supergraph that adds no arcs from good nodes to bad nodes.

When writing these definitions, it's good to have examples in mind of the sorts of metrics that should fail. Here's one that passes all those definitions: the top entries in the ranking are your direct friends, and the rest are everyone else ranked by number of incoming arcs. This is exactly the sort of non-attack-resistant trust metric people are given to proposing, but the proportion of badniks who can get trust is bounded: if you have 25 friends and you rank 50 people, only 25 badniks can get trust.

I can't think of a strict definition of attack resistance that this fails.

(Reply to this)(Parent)(Thread)


[info]ruakh
2003-08-18 10:33 am UTC (link)
Oh, I see.

Apparently, the link you sent me - Advogato's (informal?) proof of attack resistance - doesn't really bother with a formal definition, then?

Sorry about that!

(Reply to this)(Parent)(Thread)


[info]ciphergoth
2003-08-18 12:01 pm UTC (link)
Another dirty secret of trust metrics that I tried to shove under the carpet comes to light! No, Advogato doesn't actually formally define what attack resistance means, and any attempt I make at such a definition is either too inclusive or includes nothing. I'm working on it.

Meanwhile the best I can say is that the 1/n friends list falloff is necessary for the proofs of the properties that in turn give me the warm fuzzies about attack resistance.

(Reply to this)(Parent)


[info]vampwillow
2003-08-13 02:49 pm UTC (link)
as a metric I like it in many ways, but have a slight problem with the 'token' element. You aren't splitting the tokens so, it appears, that only whole units of tokens get redistributed. If you are allocating them to friends' lists (and onward repeated) until they are used up then surely there is a bias in favour of people listed earlier ineach newly-discovered/linked friends list that tokens reach, thus, as LJ resorts your friends list from the order of aquisition into alphabetical order then the 'A's are more likely to get those odd tokens than the 'Z's (or 'V's!). Am I correct in this reading?

(Reply to this)(Thread)


[info]ciphergoth
2003-08-13 03:02 pm UTC (link)
Yes. However the bigger the bucket the less difference it makes. With a 500 token bucket it should be OK.

The correct solution to this is the continuous version of the trust metric, in which "trust liquid" pours continuously into my bucket, then pours equally into those of all my friends, and so forth. However this involves solving 50 systems of linear equations each with up to 800 variables, and I don't know enough about the subject to do that efficiently.

Anyone who knows a good algorithm for finding the eigenvector with eigenvalue 1 of a somewhat sparse 800x800 matrix, given an OK approximation to that eigenvector, please do get in touch.

(Reply to this)(Parent)(Thread)


[info]vampwillow
2003-08-13 03:54 pm UTC (link)
well *that* was a trip down memory lane! (eigenvectors and eigenvalues, etc.) and I'm not sure how much of it is really accessible in the back of the brain there.

Is there anything specific about this 800x800 matrix? Presumably all numbers involved are real, and I'm guessing that it isn't symmetric?

(Reply to this)(Parent)(Thread)


[info]ciphergoth
2003-08-14 12:30 am UTC (link)
All values are real and non-negative. Many are zero. No, it's not symmetric

Here's the exact structure. The matrix has a column for every trusted person; column 1 represents the trust root. If a, b are two trusted people, and b is not the trust root, and there is an arc from a to b (ie a has friended b) then m_a,b = 1/n where n is the number of people on a's friends list. m_a,1 is whatever value will make the column sum to one.

So the number of rows is the number of trusted people, and the number of non-zero entries is roughly the number of rows plus the number of arcs between trusted people.

If you have over (I think) 750 people on your friends list, LiveJournal doesn't display it on your friends page directly; the current TF code gives up on fetching it and acts as if you list no friends. So the most people can be trusted at any one time during a trust metric calculation is 1 + 750 + 49 = 800 (you, your friends, and those found so far).

Finding the right eigenvector for this matrix tells you how much trust flows through each trusted person. From there finding the trust for all the untrusted people is just a question of adding it up.

(Reply to this)(Parent)(Thread)


[info]ciphergoth
2003-08-14 02:18 am UTC (link)
From there finding the trust for all the untrusted people is just a question of adding it up.

Actually the last step is more complex than that. Finding the rate at which trust liquid flows into the buckets of the untrusted people is just a question of adding it up, but from there you have to take into account the liquid already in the buckets to figure out whose bucket(s) will fill first. You then add the right amount of water to all untrusted buckets, add the newly full bucket(s) to the trusted list, and go around again, recalculating the eigenvector for the newly enhanced graph.

I talked about this a little in the first entry presenting the trust metric.

(Reply to this)(Parent)


[info]puzzledstars
2003-08-14 12:42 pm UTC (link)
ok. this makes no sense to me at all.
all i'm wondering is that when the list of names pops up:
is that people who read your journal?
or people's journal you read?

(Reply to this)(Thread)


(Anonymous)
2003-08-15 01:27 pm UTC (link)
it's a list of friends of friends.

(Reply to this)(Parent)(Thread)


[info]ciphergoth
2003-08-16 12:19 am UTC (link)
There's more to it than that.

(Reply to this)(Parent)


Create an Account
Forgot your login?
Login w/ OpenID
English • Español • Deutsch • Русский…