Connecting the Dots: Phone Logs, Relationships, and Sharing

JASON WIESE

Carnegie Mellon University

Years in Grad School: 5

Don't Overshare, Don't Undershare: a Goldilocks Solution for Inferring Sharing Preferences and Social Relationships from Phone Logs

Privacy controls are infrequently set or changed in online social sharing systems, resulting in static privacy settings that seldom match the user’s preferences. Both over- and under-sharing in these systems can have negative social outcomes. Furthermore, as social sharing becomes more passive and continuous (such as Facebook’s “frictionless sharing”), these negative outcomes are exacerbated. The goal of this work is to partially automate the process of specifying sharing preferences by inferring dimensions of the social relationships that the user has with each of her friends. In this work we show that (1) there is a connection between relationship metrics (i.e., tie strength and life facets) and sharing preferences, (2) there is some evidence that these metrics can be inferred from a user’s communication data, and (3) a computational understanding of social relationships might improve a user’s experience across a variety of applications.

Judges’ Queries and Presenter’s Replies

Julia Hirschberg

Judge
Faculty: Project PI

May 20, 2013 | 07:25 p.m.

What sort of information will you need to collect from people to automate their sharing preferences? How likely is it that people will be willing to make their call logs available, and will this actually save them time compared with setting those preferences manually?
Jason Wiese

Lead Presenter

May 21, 2013 | 03:47 p.m.

Hi Dr. Hirschberg,

Thanks for your interest. The data so far shows that call and SMS logs are an okay start, but that there is still some room for improvement. We can imagine a broad set of additional data sources including email communication, IM, and social network site communication. I’m not sure yet which one of these additional sources will be most useful, but I think that probably adding each one will continue to help. Another issue is that we don’t really have rich histories of relationships over time, and mostly the call logs only go back 6 months. So, longer periods of data will be more helpful as well.

For the next question, certainly call logs are sensitive personal data that need to be handled with care. Ultimately, the user has to trust somebody with their call log data; by default it exists at least on their phone. Today, users install many applications on their smartphones (primarily on Android) that gain access to their call logs. For example, the Facebook app (which is the most downloaded app in the Google Play Store) collects lots of personal data, including phone call logs. There is a question as to how well users understand what they are giving up, and we certainly do not want to trick users. So, one solution here is that whoever the user trusts with their data (in this case, call logs) can provide some sort of interface for querying sharing preferences.

Finally, to address your last question, studies have shown that the vast majority of users never bother to configure their privacy settings. While it would be interesting to see if our approach saves time compared to the standard approach, I think the much more exciting question is: does this approach move users towards sharing with the people that they intend to share with, while helping them avoid sharing with the people that they would not like to share with? I don’t have a data-backed answer to this question just yet, but I strongly suspect that this approach is better than not setting privacy preferences at all.
Mostafa Bassiouni

Judge
Faculty: Project Co-PI

May 21, 2013 | 06:29 p.m.

Have you assigned the same weight to calls and SMS in your model? For example, will making 3 calls and sending 2 messages be the same as sending 5 messages? Also your work helps users who have a long history of calls/SMS that can be analyzed. How can you help beginners (new users) who are just starting to accept new friends?
Jason Wiese

Lead Presenter

May 22, 2013 | 05:06 p.m.

Hi Dr. Bassiouni,

We calculated the features so that the model could weight calls and SMS independently as well as in combination. So, we calculated a set of features using call logs only, a set of features using SMS logs only, and a set of features that combined both calls and SMS. All three of these feature sets are used in the model. We computed information gain on these features, and the top five features were: 1. total count of calls and SMS, 2. total count of days of communication, 3. total number of calls, 4. total duration of all calls, and 5. percentage of calls and SMS that occurred within the last three months.
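To make the five features listed above concrete, here is a minimal Python sketch of how they might be computed per contact, along with a simple information gain calculation. This is an illustration only, not the code used in the study: the `LogEntry` record, the function names, the 90-day stand-in for "three months", and the single-threshold split used for information gain are all assumptions.

```python
import math
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class LogEntry:
    """Hypothetical record for one call or SMS (not the study's schema)."""
    contact: str
    kind: str            # "call" or "sms"
    timestamp: datetime
    duration_s: int = 0  # call duration in seconds; 0 for SMS


def contact_features(log, now, recent_days=90):
    """Compute the five listed features for each contact in a combined log.

    `recent_days=90` approximates "the last three months" (an assumption).
    """
    cutoff = now - timedelta(days=recent_days)
    by_contact = defaultdict(list)
    for entry in log:
        by_contact[entry.contact].append(entry)

    features = {}
    for contact, entries in by_contact.items():
        calls = [e for e in entries if e.kind == "call"]
        recent = [e for e in entries if e.timestamp >= cutoff]
        features[contact] = {
            "total_count": len(entries),
            "days_of_communication": len({e.timestamp.date() for e in entries}),
            "call_count": len(calls),
            "total_call_duration_s": sum(e.duration_s for e in calls),
            "recent_fraction": len(recent) / len(entries),
        }
    return features


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Information gain from splitting a numeric feature at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    remainder = sum(len(p) / n * entropy(p) for p in (left, right) if p)
    return entropy(labels) - remainder
```

Ranking features would then amount to computing `information_gain` for each feature column against the relationship labels and sorting by the result. How the features were actually discretized is not stated in this thread, so the threshold split is only one possible choice.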
To answer your second question, there are a couple of ways that we could help users who do not have long histories of calls and SMS logs. One additional analysis that I have conducted subsamples our participants’ data to 7 days, 30 days, and 60 days, with corresponding accuracies of 73.6%, 75.9%, and 76.3%, respectively. So you can see from this data that even with only a week’s worth of data, there is enough of a signal that we can begin to make some rough differentiation between friends. I can think of two additional strategies that are likely to help. First, we can incorporate more data sources, such as email logs, IM, and social network sites. Second, user interfaces could be designed so that as users add new friends they are encouraged to label the relationship or sharing preferences.
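The subsampling analysis described above could be sketched roughly as follows. This is a hedged illustration rather than the actual analysis code: the function name and the tuple layout are hypothetical, and it assumes each window covers the days leading up to the most recent logged event, which this thread does not specify.

```python
from datetime import datetime, timedelta


def most_recent_window(events, window_days):
    """Keep events from the `window_days` days leading up to the latest event.

    `events` is a list of (timestamp, contact, kind) tuples; this layout is
    an assumption made for illustration.
    """
    if not events:
        return []
    end = max(ts for ts, _, _ in events)
    start = end - timedelta(days=window_days)
    return [e for e in events if e[0] > start]


# Example: build 7-, 30-, and 60-day subsamples of one participant's log.
end = datetime(2013, 5, 1, 12, 0)
events = [
    (end - timedelta(days=d), "alice", "sms") for d in (0, 5, 20, 50, 100)
]
subsamples = {days: most_recent_window(events, days) for days in (7, 30, 60)}
```

Each subsample would then be passed through the same feature extraction and model as the full log, so that accuracy can be compared across window lengths.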
Mary Kathryn Cowles

Judge
Faculty: Project PI

May 21, 2013 | 08:46 p.m.

What information is available solely within a user’s Facebook account that could be used to determine the desired sharing level with different friends? I know private messaging is available, but that seems like a pretty limited data source. Could you make a useful automated sharing allocation system for Facebook without access to external data such as email accounts or phone logs?
Jason Wiese

Lead Presenter

May 22, 2013 | 05:08 p.m.

Hi Dr. Cowles,

Yes, I suspect that this would work on some level using only data that is available within Facebook, though it is hard for me to say how the accuracies will compare. Using the Facebook API, the relevant data that is available includes:
1. the wall posts a user makes (which may be on friends’ walls)
2. the posts that have been made on the user’s own wall
3. comments that the user has made or that others have made in reply to the user
4. tagged photos (and the other people tagged in the photo), including photos that the user posts and tags friends in, as well as photos and stories in which the user has been tagged
5. “likes”
6. direct messages
7. with whom previous posts have been shared
8. mutual friends
In addition, Facebook does not make profile viewing data available through the API, but internally Facebook could use this data as well. One reason that it might be better to use data from outside of Facebook is that the user’s behavior on Facebook may be affected by the nature of the site and one’s own sharing settings. For example, a user may be very close to a housemate, but never interact with that person on Facebook because they see each other all the time. Ultimately, I believe that the data sources are complementary. We can use one or the other to infer sharing preferences, but the combination of both data sources will provide a richer and more accurate view that is more likely to capture the user’s preferences.

Presentation Discussion

Maggi Sliwinski

Graduate Student

May 20, 2013 | 03:25 p.m.

This was a great video! Thanks for sharing your research. I would love to see something like this developed further so I don’t have to set my preferences!
Jason Wiese

Lead Presenter

May 21, 2013 | 03:49 p.m.

Thanks for checking it out, Maggi! We hope that some day you won’t have to bother specifying preferences either.
Jonah Bea-Taylor

Graduate Student

May 21, 2013 | 02:43 p.m.

Very useful research; I wonder if the prediction itself changes one’s sense of the relationship?
Jason Wiese

Lead Presenter

May 21, 2013 | 03:54 p.m.

That’s a great point Jonah, I’ve been thinking about that a lot too. I would imagine that the prediction would have some effect on one’s own perceptions of relationships, depending on how the information is presented.

I’m sure it also depends on how self-reflective the individual is, and even how much the individual already reflects on this data.
Kyana Young

Graduate Student

May 21, 2013 | 09:29 p.m.

Nice video and explanation.

Have you thought of any other applications for your sharing preferences and social relationships model?
Jason Wiese

Lead Presenter

May 22, 2013 | 05:15 p.m.

Thanks Kyana!
Yes, I’ve been very engaged in thinking about different ways that these models could be used. In particular, social relationships are associated with physical and mental wellbeing. I can imagine that a computational understanding of a user’s social relationships could be displayed to the user as a gauge of how they’re doing, used as diagnostic information that they could share with a health professional, or used as a trigger for technology-based behavioral nudges.
I think the other really big opportunity here is in personalizing applications. This might include something like being able to sort your friends list by how close you are to them, instead of alphabetically.
Debra Bernstein

Project Associate

May 22, 2013 | 11:12 a.m.

Great job on the video!
It sounds like your research goal is (partially) to set sharing preferences so users don’t have to do it themselves. But I wonder if there are hidden benefits to the act of setting those preferences that users would lose or miss if the process were automated? For example, maybe thinking through the members of your network makes you more likely to share with them?
Jason Wiese

Lead Presenter

May 22, 2013 | 05:23 p.m.

Hi Debra,

Thanks! I think it’s entirely possible that you’re correct. On the other hand, part of the motivation for this work was based on surveys that showed the majority of users never set their privacy preferences at all, so these people would not be missing out on anything.
I think there’s also the opportunity to use these models to help people reflect more on their own social relationships by showing them what the model predicts. It could be really interesting, for example, for a user to say “Hey, I’m close to this person, but it doesn’t think that we are close. Why is that?” I think there are a lot of exciting opportunities there.
Rachel Ferebee

Graduate Student

May 22, 2013 | 04:41 p.m.

Great video! I’m glad to see fellow CMU Trainees at this competition!
Jason Wiese

Lead Presenter

May 22, 2013 | 05:24 p.m.

Thanks Rachel, nice job on yours as well!
Kathryn Furby

Graduate Student

May 23, 2013 | 12:34 a.m.

Hi Jason! Interesting research, thanks for sharing.
Jason Wiese

Lead Presenter

May 23, 2013 | 04:19 p.m.

Thanks Kathryn! Your video was great too, I love the idea of using poetry to communicate research!
Philomena Chu

Graduate Student

May 23, 2013 | 04:00 p.m.

Great video and animations. I’m curious what the inaccuracies in the predictions can be attributed to?
Jason Wiese

Lead Presenter

May 23, 2013 | 04:13 p.m.

Hi Philomena,

Thanks! I think there are several major reasons that we see this level of inaccuracy. The first reason is that the call logs for most of our participants were no longer than 6 months, because Android only stores the most recent 500 calls. So, say that you were really close to somebody in high school, and still want to share things with that person now, but you don’t talk to that person on the phone very often anymore. If we had years of call logs, or some other longer-term view of that relationship, I suspect the model would be more accurate.
I think the second reason is related to the first: people don’t only interact with each other by calling and texting. If two friends exchange most of their communication over email, then this model would not have captured that relationship very well. So, together what I’m saying is that more data sources, and data logs that span a longer period of time, are both likely to help.
Finally, we built a single generalized model across all of our participants, but I suspect that there are individual differences in sharing preferences. So, it may be that one person is more willing to share everything than another person, or that for a particular sharing scenario some people are happy to share their location no matter what while others would rather never share it.