\documentclass[conference]{IEEEtran}
\usepackage[english]{babel} % To obtain English text with the blindtext package
\usepackage{blindtext}
\begin{document}
\title{Analysis of bot detection and spam prevention systems}
% TODO get author names from Prof. Sibi
\author {
\IEEEauthorblockN{Aravinth Manivannan, Sibi Chakkaravarth S, TODO}
\IEEEauthorblockA{SENSE, VIT AP, AP, India - pincode\\
foo@example.com
}
}
\maketitle
\begin{abstract}
CAPTCHA systems were originally designed to protect against automated,
bot-based Denial of Service (DoS) attacks and spam. Over time, however, these
systems have become less effective because they focus on distinguishing humans
from bots rather than on combating DoS attacks and spam. As a result, they have
become privacy-invasive systems that pose accessibility challenges while
offering reduced effectiveness and accuracy. mCaptcha is a proof-of-work-based,
non-interactive DoS protection system designed to overcome the limitations
of traditional CAPTCHA systems while offering superior protection. The
mechanism is stateless, so it can accurately defend against attacks over
anonymous networks like Tor, and its non-interactive nature makes it ideal for
users with auditory, cognitive and visual disabilities.
\end{abstract}
\section{Introduction}
\label{sec:intro}
Denial of Service (DoS) attacks and spam campaigns reduce the quality of service
of internet services. Different types of rate-limiters have been employed to
combat such attacks. Today, rate-limiters on the web are synonymous with
CAPTCHAs. CAPTCHA systems work on the premise that an automated bot can inflict
more damage than a human user, and that attacks can be contained if a human can
be accurately differentiated from a bot. The rise of CAPTCHA farms powered by
cheap human labor in third-world countries has given attackers a way to bypass
CAPTCHA systems. To combat this new threat, CAPTCHA implementers are constantly
raising the difficulty of the challenges. This universal rise in difficulty
impacts bots and unsuspecting users alike. The web is becoming increasingly less
accessible to users with disabilities and non-English-speaking users. Some
CAPTCHA systems employ multiple methods in their decision process.
Privacy-invasive mechanisms like cookies and IP tracking are popular methods
that are used in conjunction with traditional CAPTCHA mechanisms; both are
ineffective against anonymous networks like Tor and pose serious privacy risks
to users.
The rest of this paper rates different CAPTCHA mechanisms and systems based on
the parameters described below and describes how mCaptcha overcomes some of
their limitations.
\subsection{CAPTCHA rating parameters}
CAPTCHA systems use a variety of methods in their decision process. Every method
has its own strengths and limitations, but the following parameters have been
chosen to rate CAPTCHA methods and systems uniformly so that they can be
compared.
\begin{description}[\IEEEsetlabelwidth{Effectiveness}]
\item[Privacy]
\begin{itemize}
\item Does the method use trackers or any other identifying method?
\item Does the method work in anonymous networks like Tor?
\end{itemize}
\item[Effectiveness]
\begin{itemize}
\item Is the method/system effective in containing DoS attacks?
\item Can the method be circumvented? If yes, how practical is the
attack?
\end{itemize}
\item[Accessibility]
\begin{itemize}
\item Does the method pose any challenges to users
with auditory, cognitive and visual disabilities?
\item How easy is it to use?
\item Does the method have a language dependency which poses a challenge to
non-English speakers?
\end{itemize}
\item[Accuracy]
\begin{itemize}
\item How accurate is the method in detecting potentially malicious
users?
\item Are there any factors that impact the method's accuracy?
\end{itemize}
\end{description}
\subsection{CAPTCHA methods analysed}
We analysed the following CAPTCHA methods using the above-mentioned
parameters. These are popular methods that are currently in deployment.
%TODO add images
\subsubsection{Align object}
Objects in various degrees of misalignment are displayed to the user, who is
asked to choose the one that is perfectly aligned.
% Example GitHub/Kik inverted Hipop
\subsubsection{Blurred Text}
A sequence of randomly generated letters and digits is
presented to the user with added noise, scattered distribution and
rotations. Sometimes, the characters are also presented in 3D form.
\subsubsection{Context based}
This method is personalised to the platform it is displayed on. It usually
poses challenges which can only be solved if the user is familiar with the
platform. Some examples are:
\begin{itemize}
\item What is the name of the website's mascot?
\item Who owns this website?
\item What are our members collectively called? (Example: Reddit users are
called Redditors.)
\end{itemize}
\subsubsection{Audio based}
An audio recording with added noise is presented to the user, who is asked to
transcribe the content of the recording.
\subsubsection{IP tracking}
The IP address is used to blacklist misbehaving users. Strictly speaking, this
isn't a CAPTCHA method, but it is frequently used in conjunction with other
methods.
\subsubsection{Image identification}
A blurred image with added noise or unusual cropping is presented to the user
who is requested to identify the object in it. Sometimes, the users are also
asked to pick images that match a certain description from a collection of
images.
\subsubsection{Proof of Work based}
This is an alternative to CAPTCHA methods that has been used for rate-limiting.
The user agent is presented with a challenge and is tasked with generating a
cryptographic proof which is computationally expensive to produce.
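
As a minimal illustration, assume a hashcash-style scheme; this is an
assumption made for concreteness, and the symbols below are illustrative rather
than taken from any particular system. Let $H$ be a cryptographic hash function
with $m$-bit output interpreted as an integer, $c$ the challenge string issued
by the server, and $k$ a difficulty parameter. The user agent searches for a
nonce $n$ such that
\begin{equation}
H(c \,\|\, n) < 2^{m-k},
\end{equation}
where $\|$ denotes concatenation. If $H$ is modelled as a random function, each
candidate nonce satisfies the condition with probability $2^{-k}$, so the user
agent needs about $2^{k}$ hash evaluations on average, while the server
verifies a submitted nonce with a single evaluation.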
\end{document}