How search bots impact event data

What are bots?

A search bot, sometimes called a spider, is an automated program that continuously browses the internet, usually to build a search index or to archive websites.

Bots can run on servers in a data center, such as those operated by Amazon Web Services, or on personal computers that have been infected with malware or a virus. A network of such infected computers is called a botnet.

Some bots, such as Googlebot, serve legitimate purposes like indexing the web.

Because bots can trigger events the same way human visitors do, they can artificially inflate your event data. That's why it's important to be aware of them.

Prepr and search bots

Prepr detects search bots that deliberately identify themselves, typically through their user-agent string. All traffic from known bots and spiders is automatically excluded, so your Prepr data contains as few events from known bots as possible. At this time, you can't disable this exclusion or see how much known bot traffic was excluded.
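The exclusion happens inside Prepr and can't be configured. To illustrate the general idea, here is a minimal Python sketch that drops events whose user agent contains a known bot token. It assumes case-insensitive substring matching against user-agent entries like those in the list of excluded bots in this article. `Event`, `KNOWN_BOT_TOKENS`, `is_known_bot`, and `exclude_known_bots` are hypothetical names, not part of any Prepr API, and Prepr's actual detection may work differently.

```python
from dataclasses import dataclass

# Hypothetical subset of the user-agent tokens from the list of excluded bots.
KNOWN_BOT_TOKENS = ("googlebot", "bingbot", "ahrefsbot", "curl", "python-requests")


@dataclass
class Event:
    name: str
    user_agent: str


def is_known_bot(user_agent: str) -> bool:
    """Assumption: a case-insensitive substring match against known tokens."""
    ua = user_agent.lower()
    return any(token in ua for token in KNOWN_BOT_TOKENS)


def exclude_known_bots(events):
    """Keep only events whose user agent does not match a known bot token."""
    return [e for e in events if not is_known_bot(e.user_agent)]


events = [
    Event("View", "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"),
    Event("View", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"),
    Event("Click", "curl/8.4.0"),
]

kept = exclude_known_bots(events)
print(len(kept))  # 1
```

Only the event from the regular Chrome browser survives; the Googlebot and curl events are dropped because their user agents contain a listed token.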

Known bot traffic is identified using a combination of our research and the International Spiders and Bots List.

List of excluded bots

Below is a list of the search bots that Prepr identifies, along with their user agents.

User agent | Search bot
200pleasebot | 200PleaseBot
360spider | 360Spider
abot | CrawlDaddy, abot
addthis | AddThis
adldxbot | Microsoft Bing Ads
admantx | ADmantX Platform Semantic Analyzer
adsbot-google | Google AdWords
advbot | AdvBot
ahrefsbot | Ahrefs backlinks research tool
alexa | Alexa Crawler
apache-httpclient | Java http library
apachebench | ApacheBench (ab)
apis-google | APIs-Google
appengine-google | Google App Engine
applebot | Apple Bot
archive.org_bot | Internet Archive (archive.org)
ask jeeves | Ask Jeeves
asynchttpclient | Java http and WebSocket client library
awe.sm | Awe.sm URL expander
baidu | Baidu
bdcbot | Big Data Corp
bingbot | Microsoft Bing
bingpreview | Microsoft Bing preview
bitlybot | bit.ly bot
blekkobot | Blekkobot
blexbot | BLEXBot (webmeup)
bot@linkfluence.net | Linkfluence bot
bufferbot | BufferBot
buibui-checkbot | buibui
butterfly | Topsy Labs
buzztalk | buzztalk
catchbot | CatchBot (catchbot.com)
check_http | Nagios monitor
cliqzbot | Cliqzbot
cmradar/0.1 | CMRadar/0.1
coldfusion | ColdFusion http library
commoncrawl | CCBot
comodo-webinspector-crawler | Comodo
crowsnest | Crowsnest
curabot | cura.yt
curl | curl unix CLI http client
dap/nethttp | DAP/NetHTTP
datagnionbot | datagnion.com/bot.html
daumoa | Korean portal and search engine indexing bot
developers.google.com/+/web/snippet | Google Plus
diffbot | Diffbot
digitalpersona fingerprint software | HP Fingerprint scanner
domain re-animator bot | Domain Re-Animator Bot
domainsbot | DomainsBot
domaintunocrawler | DomainTuno
dotbot | Dot Bot
duckduck | DuckDuckGo
elb-healthchecker | AWS ELB HealthChecker
embedly | Embedly
eoaagent | EOAAgent
eventmachine httpclient | Ruby http library
everyonesocialbot | EveryoneSocial
evrinid | Evri bot
exabot | Exalead's bot
exaleadcloudview | ExaleadCloudView
facebookexternalhit | Facebook Bot
facebot | Facebook Bot
feedburner | RSS bot
feedfetcher-google | Google Feedfetcher
findxbot | Findxbot
flipboardproxy | FlipboardProxy
friendfeedbot | FriendFeed
genieo | Genieo Web filter bot
getprismatic.com | getprismatic.com
gigabot | Gigabot spider
gimme60bot | Gimme60 (gimme60.com)
gimmeusabot | Gimme60 (gimme60.com)
go http package | Go http library
google page speed insights | Google Page Speed Insights
google Web Preview | Google Instant Previews crawler
google-structured-data-testing-tool | Google-StructuredDataTestingTool
google-structureddatatestingtool | Google-StructuredDataTestingTool
googlebot | Google Bot
googlestackdrivermonitoring-uptimechecks | GoogleStackdriverMonitoring-UptimeChecks
grapeshotcrawler | GrapeshotCrawler
gravitybot | Gravity Bot
hatena::bookmark | Hatena::Bookmark
heritrix | heritrix
htmlparser | HTMLParser
http_request2 | HTTP_Request2
httpclient | HTTPClient
https://developers.google.com/+/web/snippet | Google+ Snippet Fetcher
hubspot | HubSpot
ia_archiver | Internet Archive (Wayback Machine)
icoreservice | iCoreService
idmarch | idmarch.org/bot.html
inagist | URL resolver
insieve | Insieve Bot
insitesbot | Insitesbot
instapaper | Instapaper
istellabot | IstellaBot
jack | jack
jakarta commons | Jakarta Commons HttpClient
java | Generic Java http library
jetslide | Jetslide
js-kit | URL resolver
kemvibot | Kemvi
kimengi | Kimengi Bot
knows.is | knows.is
kojitsubot | Kojitsubot
komodiabot | KomodiaBot
kraken | kraken
laconica | Laconica
libwww-perl | Perl client-server library
lijit crawler | Lijit
linkdexbot | Linkdex Bot
linkedinbot | LinkedIn
linkscrawler | LinksCrawler
linode | Linode Longview
lipperhey | Lipperhey
livelapbot | Livelapbot
loadtimebot | Load Time Bot
longurl | URL expander service
ltx71 | ltx71.com
lumibot | Lumibot
lwp-trivial | Another Perl library
magpie-crawler | magpie-crawler
mail.ru_bot | Mail.ru Bot
meanpathbot | meanpath
mediapartners-google | Google AdSense bot
megaindex.ru | MegaIndex
memorybot | mignify.com/bot.html
metauri | MetaURI
mfe_expand | McAfee spider
mir web crawler | MIR web crawler
mj12bot | Majestic-12 spider
mojeekbot | Mojeek UK search crawler
mrchrome | MrChrome
ms search 6.0 robot | MS Search 6.0 Robot
msnbot-media | Microsoft media bot
msnbot | Microsoft bot
nerdybot | NerdyBot
netcraft | Netcraft
netstate | netEstate NE Crawler
netvibes | Personalized dashboard bot
netzcheckbot | netzcheck
newrelicmonitor | NewRelic monitor
newrelicpinger | NewRelicPinger
newsme | newsme
niki-bot | niki-bot
ning | NING
nutch | Apache search spider
openhosebot | OpenHoseBot
orangebot | OrangeBot
pagesinventory | pagesinventory.com
panopta | Monitoring service
paperlibot | PaperLi
peerindex | peerindex
percolatecrawler | PercolateCrawler
perfectmarketkwtbot | PerfectMarket
phantomjs | PhantomJS
pingdom | Pingdom monitoring
pinterest | Pinterest
plukkie | botje.com/plukkie.htm
privacyawarebot | PrivacyAwareBot
proximic | Proximic Spider
psbot-page | Picsearch
publiclibraryarchive.org | publiclibraryarchive.org
pycurl | Python http library
python-httplib2 | Python-httplib2
python-requests | Python http library
python-urllib | Python http library
queryseeker | QuerySeekerSpider
quicklook | QuickLook
re-animator | Domain Re-Animator Bot
readability | Readability
rebelmouse | RebelMouse
redditbot | Reddit Bot
relateiq | RelateIQ
riddler | Riddler Bot
rogerbot | SeoMoz spider
rssmicro | RSS/Atom Feed Robot (rssmicro.com)
ruby | Ruby
scrapy | Scrapy
screaming frog seo spider | Screaming Frog SEO Spider
searchmetricsbot | SearchmetricsBot
semrushbot | SEO analysis bot
seokicks | SEOKicks
seznambot | SeznamBot
shopwiki | ShopWiki
shortlinktranslate | Link shortener
showyoubot | Showyou iOS app spider
siege | Joe Dog Siege
sistrix | SISTRIX
siteuptime | Site monitoring services
slack | Slackbot-LinkExpanding
slackbot | Slack Bot
slurp | Yahoo spider
smtbot | SimilarTech
socialrank | SocialRankIOBot
sogou | Chinese search engine
spbot | OpenLinkProfiler
spider | generic web spider
spinn3r | Spinn3r aggregator
sputnikbot | SputnikBot
squider | Squider
statuscake | StatusCake
stripe | Stripe
test certificate info | C http library?
tineye | TinEye Bot
traackr | Traackr Bot
trendictionbot | Trendiction Search
turnitinbot | TurnitinBot
tweetedtimes | The Tweeted Times
tweetmemebot | TweetMeMe Crawler
twikle | Social web search bot
twitjobsearch | TwitJobSearch
twitmunin | Twitmunin
twitterbot | Twitter URL expander
twurly | Twurly
typhoeus | Typhoeus
umbot | uberMetrics
unwindfetch | Gnip
uptimerobot | Uptime Robot
vagabondo | Vagabondo
vb project | Visual Basic
vigil | Vigil
vkshare | VKontakte Sharer
voilabot | VoilaBot
vrcrawler | Venture Radar
wasalive-bot | Wasalive Bots
watchsumo | WatchSumo
wbsearchbot | Ware Bay Best Buys
webscout | Webscout
wesee | WeSEE
wget | wget unix CLI http client
wordpress | WordPress spider
wormly | WormlyBot
wotbox | Wotbox
xenu link sleuth | Xenu Link Sleuth
xing-contenttabreceiver | Xing bot
xovibot | XoviBot
yacybot | YaCy
yahoo-ad-monitoring | Yahoo Ad monitoring
yandex | Yandex
yeti | Naver Corp
yourls | YOURLS
zelist.ro | feed parser
zibb | ZIBB spider
zitebot | Zite
zyborg | Zyborg
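To show how an entry in this list could map a raw user-agent string to a bot name, here is a hypothetical Python sketch built on a small excerpt of the list. It assumes case-insensitive substring matching and checks longer entries first, so that a specific entry such as `msnbot-media` wins over the more general `msnbot`. `BOT_TABLE` and `identify_bot` are illustrative names only; Prepr does not document its matching rules, so treat this as an assumption rather than its actual implementation.

```python
from typing import Optional

# Hypothetical excerpt of the list above: user-agent entry -> search bot name.
BOT_TABLE = {
    "googlebot": "Google Bot",
    "bingbot": "Microsoft Bing",
    "bingpreview": "Microsoft Bing preview",
    "msnbot-media": "Microsoft media bot",
    "msnbot": "Microsoft bot",
    "curl": "curl unix CLI http client",
}


def identify_bot(user_agent: str) -> Optional[str]:
    """Return the bot name for the longest matching entry, or None."""
    ua = user_agent.lower()
    # Check longer entries first so "msnbot-media" wins over the shorter "msnbot".
    for token in sorted(BOT_TABLE, key=len, reverse=True):
        if token in ua:
            return BOT_TABLE[token]
    return None


print(identify_bot("msnbot-media/1.1 (+http://search.msn.com/msnbot.htm)"))  # Microsoft media bot
```

Checking longer entries first matters because several entries in the list are substrings of others, for example `slack` and `slackbot`, or `spider` and `360spider`.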
