April 6, 2020
Sarit Kraus – Attention Based Fraud Detection of Online Banking Transactions

– Welcome to the ISTS
Distinguished Colloquium Talk, this evening, by Professor Sarit
Kraus, from Bar-Ilan University. It’s a very great pleasure
for me to host her, as one of the preeminent experts
in AI in the world today. Sarit has won virtually
every major AI award around. Going back to the 90s, when she won the IJCAI Computers and Thought Award, to the EMET Prize from
the president of Israel, for her work in artificial intelligence. She’s chairing the biggest AI conference of the year, later this year, I’m sure a matter of
both pride and regret, (chuckling) And so, she’s worked
extensively, over the years, on reasoning multi-agent
systems, autonomous systems where multiple agents cooperate to make intelligent decisions. She’s one of the founders of
the use of game theory in AI, for many, many real world problems, including patrolling problems, for which she’d received an
award from the city of LA. So, without further ado,
I’ll let Sarit tell us about her work today, on machine learning applied to financial fraud. (applause)
– Thank you, V.S., for your kind words, and thank you for coming. Online banking is increasing. I don’t know about here, but in Israel, banks are just closing branches, because everything goes online. For example, this app: we do all the transactions using this app, now, in Israel and everywhere. But as the amount of online banking is increasing, fraud is also increasing. And there are many challenges in finding fraud in online banking. The main problem is the imbalance. Only one in many thousands of transactions is a fraud, but this one can cost the bank a lot of money. So, this is one problem. Another problem is that the data is sparse, because every transaction has its own features, and this makes it very difficult. There are multiple types of transactions. There are transactions that involve more than one bank, and then you don’t have information about the other bank. There is a relatively small number of annotated examples, and there is not much data per user. So, that’s the difference compared, for example, to credit cards. In credit card fraud, there are many papers, a lot of models, and it has been working since the 90s. On the other hand, on fraud in online banking, it’s not common to find research papers. So, what do
banks do about this problem? Usually, they hire a third party that provides them a service of identifying fraud. What does this third party do? They have machine learning models with many, many parameters: the app, the IP, the type of machine, the location, many, many features and parameters of the transactions. And they don’t tell the bank whether it’s a fraud or not. They give the bank some risk score. And now, the bank can do several things. Given the risk score, they can decide to allow, letting the transaction go through as is. They can deny it. These are the two easy things. They can challenge the user, I don’t know, like asking his birth date or other authentication things, or calling the bank. You know, customers of banks don’t like these authentications. Or the transaction can be reviewed by a human expert, but this causes delay, and human experts are also very costly. So, given this risk score, the banks need to put thresholds that decide which of the four categories to apply, and the companies that provide the score tell them what percentage of transactions at each score will be fraud. So, they decide, maybe, how many people they want to hassle because they are worried. Or, if they make so much money on the transactions, then when a transaction is a fraud, they will just pay it back. So, that’s something
to decide for the bank. Now– Oops, now the battery went. Okay, what should we do? We do this one. (background noise drowns out other sounds) I will do that. Okay, now, one important thing to say is that, given the risk score, one of the things that the banks would like is to get explanations: why this risk score? So they can make better decisions, and also retroactively: if I deny the access, or the transaction, customers want to know, why did you do this to us? So, explanation plays an important role here. So, what does this company, the third party that provides the service, what do they do? They get data from the bank, data that is annotated. Some transactions are annotated as fraud, and others are not annotated at all. That means, usually, that they are okay, they are not fraud. In order to give the service, they need annotated examples, and then they train some machine-learning models, and then they have something that works. The problem with this is that the available data is limited. Why? Because the banks don’t really like to give data to somebody, even if they are on their side. So, that’s one problem. There are fixed procedures, so the data is very problematic, and as I said, there is a poor history per user. Think about it: how many credit card transactions do you have in a day, and how many bank transactions do you have? I saw the data. There are some customers that have many transactions, but most of us, how many transactions do we have in a bank in a day? I mean, in a week, in a month? Think about yourself, and you realize that this is not that common. You don’t have many transactions. So, in this talk, I would like to tell you about our adventures in helping such a company, a company that provides this service. The first part will be about Attention Based Neural
Network for Fraud Detection of Online Banking Transactions, and the second part of the talk will be about Transfer Learning, learning from one bank to another. And then, if I have time,
I’ll tell you about, really different works, that I’m doing about physical security. Okay, so, let’s start. So, one of the things, that when we came to this company, that we realized, that frauds are, usually,
is not one transaction. In order to perform a fraud in a bank, you need to do a few transactions. That is, you log in, you are looking at this things, you are, you are– Then, eventually, you move money, that’s your goal, and then you log out. So, essentially, the fraud
is based on a sequence. However, unfortunately, the majority of current approaches just look at one transaction at a time, not at sequences. The best algorithms, the best approaches, what they do is all sorts of averages. What is the average number of times that this user uses this IP? What’s the average number of times, in a month, that he moves money, pays money? So, they are looking at history, but they don’t look at sequences, and we decided that sequences should play an important role in finding fraud. However, managing sequences is not easy. So, there are two ideas,
that we had in mind. When thinking about machine learning and sequences, everybody told us, use LSTM networks. We have friends in natural language in Berlin. They say, LSTM, this is the way to solve sequence problems. On the other hand, there were other friends that said, no, no, no, you should use an attention mechanism. An attention mechanism allows you, in a sequence, to give different weights to elements in the sequence, and it is simple: easy to implement, easy to train. Use the easy thing. So, we had these two approaches, and we decided to look at them, and the idea was, why won’t we do a hierarchical
form of attentions? We have two issues here. Each transaction has
features, and we can think of different attention of
features in different situations. So, this is one, and
then, the other thing is, we have the entire
sequence, and we can give different weights to different
transactions in the sequence. So, why won’t we use
attention on both levels? So, this is the idea, and
the most important issue of the attention is, that it give weights, both to features and to
transaction in the sequence. Then we can explain, why did
you think that this is a fraud? Oh, because this feature
got higher attention, or this transaction got higher attention, and this will be useful for explaining, why this activity was,
the sequence was fraud. So, we got data, real data, from one of the industry-leading fraud detection companies. The data is sent to the vendor from the bank in a fixed, predefined way. So, we have transactions,
each of which is a set of features, and then we have a sequence, which is a set of transactions. So, that is what we have. And we had three main components of the system. One, we had to handle the values of the features. Why? Because these features were discrete and very sparse. So, we had to somehow make the representation more compact, and we moved them from a discrete representation to some dense feature representation. We had to learn how to do that. Second, we wanted to do attention over features, and we had to learn this as well. And, finally, we wanted to do attention over the transactions.
So, this is how the algorithm looks. If we look more locally, first we have to do some feature embedding. Intuitively, we are multiplying the features by a matrix, because the feature representation was sparse: for a discrete feature, you have a vector where each possible value appears as a bit. So, we are creating a feature embedding. Then, we compute a weight for each feature, using this function over the embedded features, and of course normalizing it, and then we eventually have a value associated with each feature. Once we have this for each transaction, we can move to the sequence. We compute the value of a given transaction: what do we think, locally, whether this is a fraud or not. Then, we need to give a weight to each of the transactions, and then we can do some sort of weighted averaging of the local decisions. So, these are the steps of the process, and my student decided
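[Editor's sketch] The two attention levels described in this passage can be illustrated roughly as follows. This is a minimal NumPy sketch with random, untrained parameters, not the speaker's model: the scoring functions, the names `E`, `w_feat`, `w_seq`, `v_out`, and all sizes are assumptions made only for illustration, since the talk does not spell out the actual functions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n_tx, n_feat, n_vals, dim = 4, 5, 10, 8   # toy sizes (hypothetical)

# A sequence of transactions; each holds one discrete value index per feature.
seq = rng.integers(0, n_vals, size=(n_tx, n_feat))

# 1) Feature embedding: a lookup table per feature, which is equivalent to
#    multiplying the sparse one-hot vector by a matrix.
E = rng.normal(size=(n_feat, n_vals, dim))
emb = E[np.arange(n_feat), seq]                 # shape (n_tx, n_feat, dim)

# 2) Feature attention: score every embedded feature, normalize per transaction.
w_feat = rng.normal(size=dim)
alpha = softmax(np.tanh(emb) @ w_feat, axis=1)  # shape (n_tx, n_feat)
tx_vec = (alpha[..., None] * emb).sum(axis=1)   # shape (n_tx, dim)

# 3) Local decision: a fraud score for each transaction on its own.
v_out = rng.normal(size=dim)
local = 1.0 / (1.0 + np.exp(-(tx_vec @ v_out)))  # shape (n_tx,)

# 4) Sequence attention: weight the transactions, then average the local decisions.
w_seq = rng.normal(size=dim)
beta = softmax(tx_vec @ w_seq, axis=0)           # shape (n_tx,)
risk = float(beta @ local)
```

In a trained model, `alpha` and `beta` would be the feature and transaction weights that the talk later uses for explanations.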
that there are many things to learn, there, the
function of the features. So, he decided that he put one network and learned everything, at once. When students succeed, I don’t complain. (chuckling) And so, he had, I think, enough data. Okay, so, how did we get the data? So, we have six months of data from 2017, belonging to some South American bank, with 80 million
transactions, out of which only 22 thousand were marked as fraud; the others weren’t marked. So, actually, you are not sure whether the others are fraud or not, but we assume they are not fraud. Most features were categorical by nature, like IP, like wizards, also male or female, all sorts of features like this, but non-categorical features, such as payments, which are very important features, were also discretized. Then, we did also use an identifier. We have missing values, a lot of missing values, in this data. It’s very noisy data, so we cleaned it. Not me, the student. Okay, then he undersampled the data, because there were so many transactions, and only 22 thousand frauds. So, we kept all the 22 thousand frauds and sampled the other data, because otherwise he couldn’t manage it. And we decided, after doing all sorts of analysis, that the sequence will be the seven days prior to a transaction. So, with a transaction, we take all the transactions in the previous seven days,
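[Editor's sketch] The seven-day window can be sketched as below. The field names (`user`, `time`, `action`) and the choice to keep the target transaction as the last element of its own sequence are assumptions for illustration; the talk does not describe the data schema.

```python
from datetime import datetime, timedelta

def build_sequence(transactions, target, window_days=7):
    """Return the target user's transactions from the window_days before
    the target transaction (target included), ordered by time."""
    start = target["time"] - timedelta(days=window_days)
    seq = [
        t for t in transactions
        if t["user"] == target["user"] and start <= t["time"] <= target["time"]
    ]
    return sorted(seq, key=lambda t: t["time"])

# Hypothetical toy data, loosely following the login/view/payment example in the talk.
txs = [
    {"user": "u1", "time": datetime(2017, 2, 20, 9), "action": "login"},
    {"user": "u1", "time": datetime(2017, 3, 1, 9), "action": "login"},
    {"user": "u1", "time": datetime(2017, 3, 6, 10), "action": "view_statement"},
    {"user": "u2", "time": datetime(2017, 3, 7, 8), "action": "login"},
    {"user": "u1", "time": datetime(2017, 3, 7, 11), "action": "login"},
    {"user": "u1", "time": datetime(2017, 3, 7, 12), "action": "payment"},
]

seq = build_sequence(txs, txs[-1])
actions = [t["action"] for t in seq]
```

Because users differ in activity, the resulting sequences have different lengths, which is the difficulty mentioned next.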
and this is a sequence. So, think about it, sequence
were of different lengths, because, there are some users,
that have many activities in the seven days and others that, not. So, this makes it more difficult. And, we also, at the end, we ended out with 57 features for each transaction. Eh, life is not easy, what can I say. And so, you can see, this is an example of fraud, in the sense that they, this a malicious person did the log in, he views a statement,
did the log in again, then he made, eventually, payments. This is their goal, to make a payment to some place and they run with the money. So, we decided to compare various methods. So, our method is here, a
feature and sequence attention. This is what we propose, that you have attention on both the features and the sequence. But we compared it with: attention on the features only; attention on the sequence only; decaying weights, that is, we have attention on the features, and then, on the elements of the sequence, we have weights that decay as you go earlier in time; what the company is doing now, only looking at the last transaction; and attention on the features combined with LSTMs. So, here are the most interesting results. We are looking at false positives and false negatives. You need to decide what to risk: you can hassle people, because you said about good transactions that they are fraud, or you can lose money, depending on how many fraud transactions you really found. So, this is a trade-off. Of course, if I always say that a transaction is a fraud, there will never be fraud transactions that will– But then I hassle all the, you know– The banks want the
transaction to go through, because they’re making
money on these transactions, and they’re losing money on fraud. So, they need to balance. So, you can see,
interestingly, that– The one with attention on both the features and the sequence got the highest AUC. And then you can see the ones that use a sequence, and then the LSTMs. The top three methods, the ones that have to do with sequences, did the best. And of course, ours was on top. That’s why I am here. But that was very interesting, because we wanted to convince the vendor: you need to look at sequences. Stop looking just at single transactions. So, this was a good proof of this, and, as we are noticing, it’s better to have attention on both the
features and the sequence, So, sequences are good, attention on both things are also good. Here, you can see how much
money we saved for the bank, using each of the– And here, this is where the bank puts the threshold. You know, as you are going up, you are saving more money: as you waste the time of more customers, you save more money. Usually, they put it here, and that way you can save 80 percent of the loss. So, you are looking at the money, how much money they will lose if they don’t say this is a fraud, and this method can– But you can also see that the LSTM, this is the yellow one, was doing quite well at the beginning, but as the bank allows hassling more customers, our method did better. But again, sequences are better
than non-sequences, methods. So, then we wanted to see
whether attention really helps, and whether it makes any difference, whether they have a different– So, we looked at which transaction in the sequence got the highest attention, the highest weight, and we normalized its position. So, if it was close to the last one, it is here, and if it was far away from the transaction that we want to look at, it is here. You can see that, while they are quite similar, for the genuine ones, the good transactions, the attention looks at the one that we are considering, but less so for the bad ones, because in the bad ones, usually, the activities that can be identified as leading to the fraud appear earlier. And to see this, we actually printed out the weights of the transactions in the sequences. So, you can see here– Do you see here, nah, wait, I can do this. Okay, okay. So, you can look here, these are the transactions. The one that we want to decide about, whether it’s a fraud or not, is here, and these are the earlier ones. So, we look here, at 10
transaction in a sequence, and this weight say, what are the transactions that got the highest weight. So, this is the same user. This one was a fraud sequence, and this one was a good transactions, and this one was a fraud,
this is a second user. So, interestingly, you can see– Can you see it? No? I thought you can see the
pen, you can’t see it? Okay, so I will point,
how should I do this? Okay, I’ll try. So, you see, these ones are for the sequences, and these are the feature. I told you, we have 54 features,
and this are the weights. So, this is a good transaction, and you see, the weights
that got the highest, was the ones, that we tried to figure out, whether it is a fraud,
or not, and also here. Here, the difficult transaction
wasn’t the last one, but, two in ad-valt and previous to that, and we looked at it and it’s turned out, that the one over there, it was, because it was very
close to the sixth one, and also, this one, missed answering some of the authentication questions, and this has to do something with log in, and here, it was enough to
look at the correct one, and also, you can see of the features that you get different
weights over features, for different, whether it’s true or not, and so, it’s very specialized, and you look here, you can
explain to the expert, or you can generate explanation, why is this kind of a transaction was classified as fraud,
or classified as not fraud. So, this is very useful,
to have these weights. This is an example, we
can look at other ones. Okay, so just to conclude
about this first thing, we saw that modeling the online banking fraud detection problem as a sequence classification problem is beneficial, and this is really useful. We developed online classifiers that apply attention over features and over sequences. This leads to the ability to provide explanations, and it’s also really easy to train. So, the model is simple and powerful, and we hope that we will be able to convince this vendor to use it, which is probably the most
difficult task in a project, because, they are very conservative and they take their time to adapt, and banks, also very conservative, but, we show them that
this is really working. Okay, but you notice that all this work depend on the bank, providing you with some information about
transactions that fraud. Without that, you can’t do much. How can I learn without data? And the problem is, that each time this company has new
bank, for three months they just need to wait,
that there will be experts, that will annotate data, that will say, this is fraud, this is
not fraud and et cetera. And, another problem is, that there are some banks, that are not co-operative. You know, they don’t like to spend their effort on annotating
data, et cetera, so, these banks make
our vendor very unhappy, because, what can they do, they don’t know if it is fraud or not. So, what we proposed to them,
is to do transfer learning, is to use data from few banks, that they have good data on them, as a way to do prediction
on banks, that are new, or don’t have a good annotation system. So, that was the second
task of this project, and the idea is, what we have a transfer, we have features, source
with fraud, or not fraud, the annotation, and then we
have features of the target. In transfer-learning, you
have a source, and a target. You want to learn from the
source, into the target domain, and here, we have, this is the
source, and this is a target. And the question is, can we do that? That was a question. So, our idea was– One interesting thing in
this domain, in this banks, so, the bank has many, many transactions, no problem about this. They are willing to give– Even new bank, can look
at the last three months, and give thousand, 100
thousand of transactions. That’s no problem. The only problem is, that
they are not annotated. So, you are taking the good guys, the data from the banks,
that are annotated, and you are taking feature,
many, many targets, transaction with features of the new bank, and you want to take these
transactions and annotate them. That was our idea, let’s
annotate these new transactions, based on the distribution of the features. How will you do that? So, once we’ll have this,
we’ll have annotate data, we’ll put it into the
machine-learning algorithm, and we’ll have a machine-learning model, and the idea is, that, as this
bank will collect more data, we will be able to improve their
machine learning algorithms So, our idea was as following,
giving a set of old banks, and one new bank, what should we do? We first need to compute the similarity between the new bank and
the old bank, based on what? Not on the annotated data,
but the feature distribution, how similar this two banks are. I don’t know if you know,
that banks are very different. There are bank for militaries, there are banks for governments, there are banks for
rural situa– you know, in agriculture areas. Different banks, provide different– But, so, we are looking
at the features, and not on the annotation, and look
whether they are similar. Then, for each old bank,
we compute a risk score for each transaction of the new bank, using the model of the old bank. So, you take the features of the new bank’s transaction, you put them into the model of the old bank, and you get a risk score value. Then, you compute a weighted sum: the similarity times the risk score, summed over the old banks. So, you have 10 banks,
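[Editor's sketch] The weighted combination of the old banks' risk scores could look roughly like this. Dividing by the total similarity (so the result stays on the same scale as a risk score) and the `threshold` used to turn the score into a pseudo-label are assumptions; the talk only says "weighted sum".

```python
def pseudo_risk(similarities, risk_scores):
    """Similarity-weighted average of the risk scores that the old banks'
    models assign to one transaction of the new bank."""
    if len(similarities) != len(risk_scores):
        raise ValueError("need one similarity and one risk score per old bank")
    total = sum(similarities)
    if total == 0:
        raise ValueError("at least one old bank must have non-zero similarity")
    return sum(s * r for s, r in zip(similarities, risk_scores)) / total

def pseudo_label(similarities, risk_scores, threshold=0.5):
    """Annotate the transaction as fraud (True) or not (False).
    The threshold value is hypothetical."""
    return pseudo_risk(similarities, risk_scores) >= threshold

# Three old banks: similarity to the new bank, and each model's risk score
# for the same new-bank transaction (made-up numbers).
sims = [0.9, 0.5, 0.1]
risks = [0.8, 0.2, 0.1]
score = pseudo_risk(sims, risks)   # (0.72 + 0.10 + 0.01) / 1.5
label = pseudo_label(sims, risks)
```

The most similar old bank dominates the result, which is the point of weighting by similarity.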
and you have a new bank, so, each of the 10 banks, you are computing about each transaction, what is the risk values
that each will compute, and then you are doing some weighted sum, and then you use the annotated data, to build the model of the new bank, and improve over the time. So, the only problem in this,
is how you compute similarity between banks. We are talking about discrete values, so you can’t use a continuous similarity. What we decided to do is to build a histogram for each feature and normalize it to sum to one. So, we take a feature; we have a hundred thousand transactions. For each possible value of the feature, we count how many times it appeared, and then, for each feature, we have a histogram over the possible values of this feature. Now, for each feature, we can look at the same feature in the new bank and in the old bank, and compare the histograms. All of them are normalized. We tried various things, and eventually we decided to use the intersection kernel to compute the similarity between the histograms. You can find it here,
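[Editor's sketch] For one discrete feature, the normalized histograms and the intersection kernel can be written as follows. The function names and the toy `channel` values are illustrative only.

```python
from collections import Counter

def normalized_histogram(values):
    """Relative frequency of each discrete value; frequencies sum to one."""
    counts = Counter(values)
    n = len(values)
    return {v: c / n for v, c in counts.items()}

def intersection_kernel(h1, h2):
    """Histogram intersection: sum over values of the smaller frequency.
    For normalized histograms the result lies between 0 and 1."""
    return sum(min(h1.get(v, 0.0), h2.get(v, 0.0)) for v in set(h1) | set(h2))

# Hypothetical values of one feature in an old bank and in the new bank.
old_channel = ["web", "web", "app", "atm"]
new_channel = ["web", "app", "app", "app"]

sim = intersection_kernel(
    normalized_histogram(old_channel),
    normalized_histogram(new_channel),
)
```

Here `sim` is 0.5: the histograms share 0.25 of mass on `web`, 0.25 on `app`, and nothing on `atm`. Identical distributions give 1, and distributions with no value in common give 0.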
and then we get a number, between zero and one, for
similarity over features, between the old bank and the new bank. Okay, but this is just for features. Now, I want to know about
the entire set of features. So, how do we combine the similarities of the individual features into a similarity between banks? There, we use neural networks to return the similarity between banks, using
the data that we have. So, this was the similarity computation. Once we have this, we
could run experiments, with three months of
transactions, of 11 banks, which are labeled, either
it’s fraud, or not fraud, and we always used one left out, so we assume, first bank is
new, and the other 10 are old, and for new bank, we left some
out, some validation data, that was marked as fraud, or
not, and we removed, of course, the labels of the training data, and we used all these procedures–
– [V.S.] So, the accuracy, what method were you using?
– Okay, so, this is a good point. What we compared, on the validation– the validation was marked, whether it is fraud or not, and the annotations, the new annotation, we’ll just– I’m just saying, the 90% accuracy is on the annotation of the data that we are going to use later for prediction. So, this was 90%, whether we are correct or not in the annotations that we gave, compared to the actual annotations that was– (mumbling) Yeah, but for the validation, we took a balanced set, because otherwise I would need to use AUC. For the validation, we took a balanced set, which is fair enough. Okay, so, again, we were
very happy with this results, and we hope that the vendor will start working with this method on the new banks. So, this was about
banks, and finding fraud, and in the next 10 minutes, I
think, I will use 15 minutes, I want to talk a little
bit about actually, physical security, and
some robots, you know. So, let me tell you about this product. So, you know, today there are
a lot autonomous vehicles, like UAV’s, like robots, going around. The problem is, that currently,
on a UAV, expensive UAV, there’s not only one
person, taking care of it, there are, sometime, two or
three people, and drones, maybe they put one person on two drones, and this is some robots, they usually put one person on one robot, and our goal was, this is a small robot, moving around. So, this are, as you can
see, autonomous robots. They can move around,
they are looking for– They cost 15 hundred
dollars, very cheap robots. They can move around, and find, whatever you want them to find. So, the question is,
if they are autonomous, why do I need people in the loop? Why all this autonomous robots, drones, et cetera, have people? So, there are three reasons, what the people are doing there. So, one, you know, these robots
are getting into trouble, and sometimes, they are broke, and if it’s really important
tasks, they want a person to fix things, when something
goes wrong with the robot. Okay, I can understand that. Then, there are tasks, that people are doing better, than
the robot, by themself. So, for example, this robots tend to get into the ladies’ room, in my lab, and they are getting into some places, and then, they are autonomous, eventually they will be able to go out, but if there is a person, coming, not me, but, you know, one of the students, coming and driving them, the
can driving out much faster, than the autonomous robot,
and also, like, pictures. So, eventually, the robot
can identify what’s going on, but people are, sometimes, much better. So, this is the second. And the most important
thing, why there are people in the loop, in all these robot, because, our customers don’t
like to give the autonomy of robot, to making all sort of decisions. Well, I’m not– One thing is shooting, no
one want the robot to shoot, without an approval of a person, but also, in he-rs, they want a person
to make sure, what’s going on. So, there is still not enough trust in the autonomy of the robot. Same in autonomous car. It’s not the ability, it’s
the trust in the system. Okay, so now, the problem, is, this guy has this 10 robots going around, 20 robots going around, and
he needs to get involved. So, people are overwhelmed
by this requests. So, there, all the time,
come request from the robots, and it turn out, that experiment shows that people can manage five robots. Of course, it depend how autonomous are the robot, et cetera, but– So, we go this project, and they told us, what can you do, that one person– very cheap robots, so, this
robots need help from people, and one person will be able
to manage more than 10 robots. Okay, I said, okay, let’s go for it, and what we decided, that
what this person really needs, the person needs a agent
that will help him, and the agent will help the team. So, I see it as a person,
the robot and the agent, and the agent help the team,
by, processing noisy signals, by prioritizing tasks for the person. The main things that
he’s telling the person, the agent, telling the person what to do, you know, now do this. So, there are all this requests coming, and he prioritize this request, and tell the person what to do. This really helpful for people, and how we did this prioritization, is based on data that we
collected and the data is, both, on the person, and the robots, and we both brought 10 people to the lab, (mumbling) 30 people to the lab, in order
to collect data on people, and we also went many, many
simulation with robots, this is much easier than
people, and base on this, we built a classifier, that
rank the task of the driving from the robot, and tell
the person what to do. Does this really help? Does the agent really help the people? And, the question is, yes. We run both experiment with
simulation, with people, but, also, we brought
12 people to the lab, with the real robots, and we
tell them, what the robot did, the robot look for green balls in the lab, and the question was,
in a given time period, how many balls the person with
the robot can find together, and the six people, first did the task, with the agent and the robot,
and then, without the agent, and six designers were around, so, it was really a controlled
system, and what we found out? So, without the agent, the team found seven, on average, seven green balls. At the same time, with agent,
they find 14, on average. But, it wasn’t just on average, every of the 12 people,
that was with the agent, the team found more
balls, than without it. And you can come and play with it, you’ll notice, it’s
really, really helpful. And we ran experiment with
this, and we ran experiments, also, in simulation of small storage, everywhere, the agent is
really helpful for the people, asset for the team, to reach its goal. So, based on this, my
student developed, re-aim. (upbeat music) (telephone ringing) (shwoosing) (alarms sounding) (upbeat music) – [Woman] Move agent three
away from agent four. Move agent four away from agent five. (beeping) Manually drive robot six to a better– (Sarit chuckling) (background noise drowns out other sounds) Your robots are too close together. Spread them around. Your robots are too close together. Spread them around. Manually drive robot six
to a better position. (alarm sounding) Move agent three, away from agent four. (intense music) (calm music) (beeping) (beeping) (beeping) (electronic beeping and screeching) (upbeat music) (electronic beeping and screeching) (upbeat music) (electronic beeping and screeching) (upbeat music) (sighing) (upbeat music)
– So, this video won an IJCAI competition for videos on robots,
and you can guess why? I must tell you, when the student asked me to make a video for the competition, I said, Eh, come on, why
are you wasting my time? But, they said yes, and, now, I’m using the video all the time. So, the rule is, do whatever your student ask you, you know, just do it. (laughing) (mumbling) Okay, so, now the question is, do you want to talk about the set, or
do we, about patrols, or– (mumbling) Okay, so– (mumbling) Okay, so, let me just tell
you about another project that has to do with this robot, and this is about Adversarial Patrolling, and you can think about various scenarios, where robots can be used, for patrol, around area, on a border, you can name it, and the question is,
given a team of robots, how should they plan the patrols paths, along time, to optimize
some objective function? The choice should depend on
the different robotic models, and the existence of adversary, and the environment constraints. So, we are thinking,
all this work has to do with limited resources, about to prevent– Why won’t I tell you a story? Explaining the, why I started
working about this project. So, I have a son, and he was walking on some patrols, around a village, and that what he was
doing, they was saving, doing security of some
village, and one day, he’s calling me, and telling me, Mommy, Mommy, they stole the horses. I say, Avi, what are you talking about? So, he said, you know, we
were patrolling around, and the thieves waited,
till we were some place, they knew we were busy, they enter in, took the horses, beautiful
horses, and ran away with them. So, I told him, Avi,
you need our algorithms, and the algorithm is,
especially to this case, about smart adversaries,
that learn your pattern, and then you use this information, to get in and to get out,
and to take what they want. So, how can you do this? What you need to do, is
to use randomization, so the adversary will not know– The best case, what Avi could do, is they could put guards
all over the village, but it’s too expensive. So, instead of putting guards everywhere, you need to randomize the
activity of the guards. In our case, for our robots that
are patrolling some place, you need to add randomization, and we are looking at
robots that are moving around some closed
polygon, which is easier, or along some fence, which is open, but
fences are more difficult. And the task is not to prevent penetration
100%, we can’t do that. It is just to increase, to maximize, the detection probability,
and that’s the idea. So, as you can see, there are some robots moving around, there are K robots. They are moving one
segment per time unit, and the most important question is, when a robot is here,
should it turn around, or should it continue
in the way it is going? And a probability is associated with whether they will continue
in their direction or turn around
and go to the other side, and this probability is
the characterization of the strategy. So, this is what I’m telling you. So, previous work was mainly focusing on a uniform probability,
the same everywhere, and if everything is symmetric,
that’s what you want to do. So, you compute one probability, and in every time unit, the
robot, with probability P, continues in one direction,
and with probability one minus P, goes in the other direction. But, it’s not always possible, and it’s not always the best thing to do, because, if there are places
where it’s not symmetric, where the fence is higher or lower, and the adversary can
get in faster in one place and slower in another place, you need to find different
probabilities for different– And this work was about this, how you find probabilities that are not equal, and this raises a lot of
complexity problems. And so, what we did in this work,
in asymmetric situations, we show, you can see here– In previous works, they
said you use a uniform P. We are suggesting to use a non-uniform P, and you can see, in all the cases that we looked at, with different times that the adversary needs to get in, different numbers of robots, the number– you can see that it is better to use non-uniform probabilities, and that’s, essentially, the
contribution of this work. So, first, remember, it’s good to have probabilistic movement, and it’s good to have a different
probability in each location. And actually, you can
think about it, on a fence, intuitively, the
probabilities are different, because when a robot gets to
the end of the fence, with probability one
it needs to turn around. So, again, you need to
think about all these problems with different
probabilities, and, again, we show that,
regardless of the problem, using a different probability
in each segment of the fence leads to a higher probability
of detection. So, this is another project
I wanted to tell you about. It’s getting more interesting when you have multiple robots, and the question is, should they overlap, and, again, we show that
overlapping is always good. You can see it over here, and
I want to just conclude and mention that this is another work about randomization in patrolling. It’s beneficial in complex,
asymmetric settings, and non-uniform strategies are necessary for preventing penetration.
So, I will take questions now. – So, please join me in thanking Sarit for a very interesting talk. (applause) And I’ll turn the
floor open to questions. – So, my question is
about password (mumbling) – [Sarit] Yeah, okay. – So, you mentioned that,
for a new bank, you compute the
similarity between the new bank and, let’s say, 10 old banks, right? So, if I understand correctly, you are choosing which old bank’s model is most
similar to the new bank, no? – No, I’m taking all the 10 banks– – Yeah.
– And computing the similarity of the new bank to all the 10– – Yeah.
– And then I’m computing what each of these banks would
say about this transaction. – Yeah.
– And then, I’m doing a weighted sum, weighted by the similarity. Think about it like
collaborative filtering. – Yeah.
– In collaborative filtering, what you are doing is computing a weight for several people, and then you are making a recommendation based on this weighted
sum, here of the old banks, so I’m not choosing just the bank
with the highest similarity, but I’m doing a weighted sum. – Yeah, so, when you have the weights, are you applying the
weights over the predictors of the old banks, or are you applying them over the features, or something? – I am computing the similarities– – Yeah.
– Based on the features. – Yeah.
– And then, I’m using these similarity measures, because, for the old banks and the new bank, the things that I have
are the feature values. For the new bank, I don’t
have any annotated data– – Yeah.
– So, I’m using the features to compute the similarity, and then I’m using the predictions of the old banks on the transactions of the new bank– – Yeah.
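[Editor’s sketch] The weighted sum described in this exchange can be illustrated roughly as follows. This is a minimal sketch and not the speaker’s implementation: the function name `weighted_fraud_score`, the normalization of the similarities into weights that sum to one, and the example numbers are all assumptions made for illustration. The talk only states that the old banks’ predictions are combined by a similarity-weighted sum, in the spirit of collaborative filtering.

```python
def weighted_fraud_score(similarities, old_bank_scores):
    """Combine old-bank predictions for one transaction of the new bank.

    similarities    -- similarity of the new bank to each old bank (non-negative)
    old_bank_scores -- fraud score each old bank's model gives the transaction
    Both names and the normalization step are illustrative assumptions.
    """
    if len(similarities) != len(old_bank_scores):
        raise ValueError("need one similarity per old bank")
    total = sum(similarities)
    if total <= 0:
        raise ValueError("at least one similarity must be positive")
    # Normalize so the weights sum to 1 (an assumption, not stated in the talk).
    weights = [s / total for s in similarities]
    # Weighted sum over all old banks, rather than picking the single most similar one.
    return sum(w * score for w, score in zip(weights, old_bank_scores))


# Hypothetical numbers: three old banks, one transaction from the new bank.
print(weighted_fraud_score([0.5, 0.3, 0.2], [1.0, 0.0, 0.5]))
```

In this setup the combined score could then serve as an annotation for the new bank’s unlabeled transaction, which matches the later remark that the method tells you “how to annotate new data” and leaves the choice of classifier open.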
– And then I’m doing a weighted sum of this. – Okay, okay, I got it, thanks. – Good question. – In that same vein, you
said you use a neural network to identify the similarity scores. I’m just curious, what data do you use? – I’m using this measure of
similarity, a kernel-based one. This is not a neural network,
this is just computation. I’m taking the– This is a well-known similarity
measure between histograms, so I’m generating histograms from the data of the old
banks and the new bank, and then I’m computing the
similarity between the histograms, but this similarity
is just on each feature. Now, I need to combine the
similarities of the features into a similarity between the banks. Here, I’m using neural networks. – [Man] Yeah, so, what would
be your ground truth, there? – Okay, so, I know that
I’m putting data aside, so I’m taking just the 10 banks. On the 10 banks that I already know, I know how similar they are
with respect to the prediction. So, I use this as a ground truth for– So, now, I have 10
banks, I’m leaving one bank out, and I can learn this similarity. (mumbling) (laughing) – Hi, professor, I’m
Chan, thanks for sharing. I have two questions.
– Sure. – The first is that we also know that there are lots of
third-party companies also doing this kind of
thing to prevent fraud, like PayPal or other companies. Do you know what models, or what algorithms,
or what technology these companies use, and have you ever compared yours with theirs? – I can compare only with people who let me know what’s out there. So, we could compare with the
vendors that gave us the data. They had their system, and, I
don’t know if you noticed, they actually have a lot
of hard-coded features that they built over the years, and these are really excellent features. So, they had a lot of features,
and then they used some relatively simple machine learning– – Okay.
– model. What we were saying, we are not– We didn’t go against
what they are doing, because they are experts
in their features. We said two things. One, however you set the features, your features are great. Use sequences, rather than focusing just on inventing features about the– Because they are doing
all sorts of averaging. So, fine, use these features, but then take sequences into consideration, and then, if you use sequences and you use attention
both on the sequence and on the features, you can have two things. One, you’ll improve your
prediction, that’s one thing, and the second thing, you’ll
be able to give explanations. So, this is a win-win for them. So, we are suggesting our method on top of the existing ones. We are not going
against it. – Yeah. – And then, on
transfer learning, of course, whatever method you are using,
we are not getting into it. We are just telling you how
to annotate new data, and then use whatever method you are happy to use. We are not criticizing anyone. We are just saying, on
top of what you are doing, you can use what we developed. So, this was the first question. What was the second–
– Yeah, the second is that I also know that another kind of fraud is cheque fraud. – Again?
– Cheque, the banking cheque. – Yeah. – I mean, the scammers, maybe they just forge the signature, and they forge the amount on the cheque, and then deposit it at the bank. Have you ever considered
solving this kind of– – We didn’t look at that, but notice that cheques are disappearing. I know it’s not so in the US, but in Israel, no one uses cheques anymore– – Okay.
– because they do online transactions. I know that in the States they are still around, but I’m not– I don’t think I use any,
because we have these apps, so, if I want to give money
to the private teacher of my kids, I just move it in the app. And they don’t usually want to pay taxes, they want cash, but
anyway, they won’t take– (laughing) – Thank you.
– They won’t take– Yeah, but this is a different question. – Thank you. – Thank you for that talk. – I’m very interested
in the semi-autonomous multi-robot project. I’m interested in the question of what the differences are
between a centralized agent, which controls all the different robots, and human-in-the-loop control. So, what are the differences between the two? – Yeah, okay. So, the agent is not
controlling the robots. The agent, currently,
is helping the person– – Ah.
– Okay? The robots, the idea is, even if the– I don’t care if the
robots are managed locally. Our robots were autonomous, in the sense that there
was no central control, but even if there is
a central controller of all the robots, there is
still a person in the loop, because of the three
reasons that I told you. So, the agent’s goal is to
help the person manage this, and the video demonstrated
why the agent is so helpful. – Yeah. – And you can come and try it. – Yeah, okay. – Okay? – And I also liked the
idea of explainable– You can explain the different weights of different transactions and features. Is there any other– Because you are using
attention, naturally we can see the weights of
the different components. Are there any other classifiers that have this advantage
of being explainable? – Well, we did explanation– I can say only what I did,
and tried, and it helped. We did a similar project
about a dating website, (groaning) where it was
recommending to people whom to approach, and we used collaborative filtering, it was two-sided collaborative filtering, so collaborative filtering is also a good source of explanation. You know, compared to a neural network, where you know that it’s very
difficult to explain anything. – Yeah. – Right, right.
– So, it depends on the domain. In our case, it works nicely, I mean– – [Chan] I see, thank you. – [V.S.] Let’s take one last question, and let Sarit (mumbling) – Yeah, so, you talked about
looking at the sequences in the bank fraud example, and so, what is the intuitive reason, I mean, what is the intuitive meaning
behind a fraudulent sequence and a non-fraudulent sequence? I think there was a slide where
you showed a fraudulent one, but I wasn’t able to pick
up why that was fraudulent. – Okay, so, I’m a fraudster,
I’m the bad guy, so my goal is to transfer money
from your account to some– it’s usually not to my
account, but to some third account, from which it will then be transferred
to my account, okay? So, it’s not just that I am moving the money. I need to log into the system. I need to find out what– At least, I need to log in
and then do the transaction. This is one. So, there are two actions, but
usually it’s more like this. Usually, what I’m doing, I’m
logging into the system, I’m changing the password while
changing the authentication. And this could
be a few days earlier. And then I log in, and then the bank asks me for authentication, but I changed it already, so I can do it, and then I’m moving the money. Another case, I can– A week in advance, I can
log in and view how much money you
have, in order to learn when you’ll have money in
the account. By the way, in most of the frauds, they are
moving small amounts of money. Then people usually don’t notice. Just that they– But that’s the way the sequence– You need to do various things in order to eventually steal the money. So, this is a sequence. – [V.S.] Alright, just
(mumbling) question. – This is a tiny question. (mumbling) because, (mumbling) You can hear me, right? – Yeah. – If you are an adversary in this case, and your approach
is to take a small amount of money (mumbling) check, most people probably have 20
dollars in their account, so I just take 20 dollars. – Yeah, okay, so, this
was an example of an action, but I saw in fraud transactions that they log in, they are– I don’t know why they are doing it. They are looking at
your statement, they may– One of the reasons, okay, I can also think about why they are doing it, looking at the accounts. One of the features is the IP from which you are doing the activities, and it raises suspicion when, usually, I’m using one IP, and then it’s– Or, usually, I use an Android, and then I log in with an iPhone. Okay, so, one of the
reasons they may do it is to get the system used to them being in the system
before doing anything. Because if you are logged in
and you are not moving money, usually you won’t be flagged as fraud. So, then, the algorithm that is looking for different kinds of
activity is getting used to you, because you have these features of the average number of times you used, I don’t know, Android
to log into the system. But, I don’t know, that’s what they are doing. – So, just to be sure,
I mean, it sounds like, (mumbling) questions. It sounds like this is
identifying fraud in situations where an account has
already been compromised, but you don’t know that
it’s been compromised– – Yeah. (mumbling) All the sequence gives you– just, the person logs in, and immediately moves. There are all sorts of strange
behaviors around fraud, otherwise we wouldn’t be able to– – And the other question’s the following. So, there are lots of banking apps on
various mobile phone devices. There are some which
are specific to a bank, but there are others that
actually act more like conduits, through multiple vendors,
through multiple banks. And so, when your code,
let’s say, is used, where does the code reside? Is it gonna be in-app, which will allow us to have greater control over what’s happening between multiple banks, or is it gonna reside on the servers? – Currently, each vendor is working
for a specific bank. I assume the apps, in
Israel, for example, each app is owned by a specific bank– (mumbling) – And then, they want to give services to all the other banks’ customers. That’s where they make the money. So, then, the security is that of
the bank that gives the service. But, as of now, banks
don’t like to cooperate. There is some cooperation
about IPs, suspicious IPs, and one of the things is that, if they put all their
data together, it’s better. That’s what we are saying. – Alright, well, I’ve already exceeded the number of questions
I promised would be asked (laughing) by a fair margin, so, please join me, again, in thanking Sarit. (applauding)
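
[Editor’s sketch] The randomized patrolling strategy from the adversarial patrolling part of the talk (in every time unit the robot continues with probability P or turns around with probability one minus P, possibly with a different P per segment) can be illustrated with a small simulation. This is a rough sketch under stated assumptions, not the algorithm from the work itself: it uses a single robot on a closed polygon, treats turning around as instantaneous, fixes the robot’s starting segment, and estimates detection by Monte Carlo sampling. The function name `detection_probability` and the example strategies are hypothetical.

```python
import random


def detection_probability(continue_prob, start, target, t, trials=5000, seed=0):
    """Estimate the chance that one robot reaches `target` within `t` time units.

    continue_prob[i] -- probability of continuing in the current direction when
                        the robot is at segment i (otherwise it turns around)
    start, target    -- segment indices on a closed polygon of len(continue_prob)
    t                -- time units the adversary needs to get in at `target`
    Assumptions: one robot, instantaneous turns, one segment moved per time unit.
    """
    n = len(continue_prob)
    rng = random.Random(seed)
    detected = 0
    for _ in range(trials):
        pos, direction = start, 1
        caught = pos == target
        for _ in range(t):
            if caught:
                break
            # Turn around with probability 1 - continue_prob[pos].
            if rng.random() >= continue_prob[pos]:
                direction = -direction
            pos = (pos + direction) % n
            caught = pos == target
        detected += caught
    return detected / trials


# Hypothetical strategies on an 8-segment polygon; the numbers are arbitrary.
uniform = [0.8] * 8
non_uniform = [0.9, 0.9, 0.6, 0.6, 0.9, 0.9, 0.6, 0.6]
for name, strategy in (("uniform", uniform), ("non-uniform", non_uniform)):
    # An adversary who has learned the strategy would pick the weakest segment,
    # so the quantity of interest is the minimum over possible targets.
    worst = min(detection_probability(strategy, 0, s, t=6) for s in range(1, 8))
    print(name, "worst-case detection estimate:", round(worst, 3))
```

The sketch only shows how a per-segment probability vector characterizes a strategy and how a worst-case detection probability could be estimated for it; finding the best probabilities, which is what the work described in the talk is about, is not attempted here.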
