How search bots impact event data
What are bots?
A search bot, sometimes called a spider, is an automated program that continuously browses the internet, usually to build a search index or to archive websites.
Bots can run on servers in a data center, such as one operated by Amazon Web Services, or on personal computers that have been infected with malware. A network of such infected computers is called a botnet.
Some bots, such as Googlebot, serve legitimate purposes such as indexing the web.
Bots can artificially inflate event data, so it’s important to be aware of their existence.
Prepr and search bots
Prepr can detect search bots that deliberately identify themselves, typically through their user-agent string. All traffic from known bots and spiders is automatically excluded. This ensures that your Prepr data, to the extent possible, does not include events from known bots. At this time, you cannot disable the exclusion of known bot data or see how much known bot data was excluded.
Known bot traffic is identified using a combination of our own research and the IAB/ABC International Spiders and Bots List.
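Prepr does not document how the match itself is performed. As a minimal sketch, assuming a case-insensitive substring match of the request's user-agent string against known tokens, excluding known bots from a batch of events might look like the following. The function names, the `user_agent` field, and the event structure are hypothetical; the five tokens are a small sample taken from the list in this article.

```python
# Hypothetical sample of known-bot tokens; the full list appears in this article.
KNOWN_BOT_TOKENS = ("googlebot", "bingbot", "ahrefsbot", "curl", "python-requests")


def is_known_bot(user_agent: str) -> bool:
    """Return True if the user agent contains any known bot token (case-insensitive)."""
    ua = user_agent.lower()
    return any(token in ua for token in KNOWN_BOT_TOKENS)


def exclude_known_bots(events: list) -> list:
    """Keep only events whose 'user_agent' field does not match a known bot."""
    return [e for e in events if not is_known_bot(e.get("user_agent", ""))]


# Illustrative events; the field names are assumptions, not Prepr's schema.
events = [
    {"event": "view", "user_agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"},
    {"event": "view", "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"},
    {"event": "click", "user_agent": "curl/8.4.0"},
]

human_events = exclude_known_bots(events)
print(len(human_events))  # only the browser event remains
```

Substring matching only catches bots that announce themselves. A bot that sends a browser-like user agent would pass this check, which is why the exclusion applies to known bots only.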
List of excluded bots
The following table lists the search bots that Prepr identifies and their user agents.
User agent | Search bot |
---|---|
200pleasebot | 200PleaseBot |
360spider | 360Spider |
abot | CrawlDaddy, abot |
addthis | AddThis |
adldxbot | Microsoft Bing Ads |
admantx | ADmantX Platform Semantic Analyzer |
adsbot-google | Google AdWords |
advbot | AdvBot |
ahrefsbot | Ahrefs backlinks research tool |
alexa | Alexa Crawler |
apache-httpclient | Java http library |
apachebench | ApacheBench (ab) |
apis-google | APIs-Google |
appengine-google | Google App Engine |
applebot | Apple Bot |
archive.org_bot | Internet Archive (archive.org) |
ask jeeves | Ask Jeeves |
asynchttpclient | Java http and WebSocket client library |
awe.sm | Awe.sm URL expander |
baidu | Baidu |
bdcbot | Big Data Corp |
bingbot | Microsoft Bing |
bingpreview | Microsoft Bing preview |
bitlybot | bit.ly bot |
blekkobot | Blekkobot |
blexbot | BLEXBot (webmeup) |
bot@linkfluence.net | Linkfluence bot |
bufferbot | BufferBot |
buibui-checkbot | buibui |
butterfly | Topsy Labs |
buzztalk | buzztalk |
catchbot | CatchBot (catchbot.com) |
check_http | Nagios monitor |
cliqzbot | Cliqzbot |
cmradar/0.1 | CMRadar/0.1 |
coldfusion | ColdFusion http library |
commoncrawl | CCBot |
comodo-webinspector-crawler | Comodo |
crowsnest | Crowsnest |
curabot | cura.yt |
curl | curl unix CLI http client |
dap/nethttp | DAP/NetHTTP |
datagnionbot | datagnion.com/bot.html |
daumoa | Korean portal and search engine indexing bot |
developers.google.com/+/web/snippet | Google Plus |
diffbot | Diffbot |
digitalpersona fingerprint software | HP Fingerprint scanner |
domain re-animator bot | Domain Re-Animator Bot |
domainsbot | DomainsBot |
domaintunocrawler | DomainTuno |
dotbot | Dot Bot |
duckduck | DuckDuckGo |
elb-healthchecker | AWS ELB HealthChecker |
embedly | Embedly |
eoaagent | EOAAgent |
eventmachine httpclient | Ruby http library |
everyonesocialbot | EveryoneSocial |
evrinid | Evri bot |
exabot | Exalead's bot |
exaleadcloudview | ExaleadCloudView |
facebookexternalhit | Facebook Bot |
facebot | Facebook Bot |
feedburner | RSS bot |
feedfetcher-google | Google Feedfetcher |
findxbot | Findxbot |
flipboardproxy | FlipboardProxy |
friendfeedbot | FriendFeed |
genieo | Genieo Web filter bot |
getprismatic.com | getprismatic.com |
gigabot | Gigabot spider |
gimme60bot | Gimme60 (gimme60.com) |
gimmeusabot | Gimme60 (gimme60.com) |
go http package | Go http library |
google page speed insights | Google Page Speed Insights |
google web preview | Google Instant Previews crawler |
google-structured-data-testing-tool | Google-StructuredDataTestingTool |
google-structureddatatestingtool | Google-StructuredDataTestingTool |
googlebot | Google Bot |
googlestackdrivermonitoring-uptimechecks | GoogleStackdriverMonitoring-UptimeChecks |
grapeshotcrawler | GrapeshotCrawler |
gravitybot | Gravity Bot |
hatena::bookmark | Hatena::Bookmark |
heritrix | heritrix |
htmlparser | HTMLParser |
http_request2 | HTTP_Request2 |
httpclient | HTTPClient |
https://developers.google.com/+/web/snippet | Google+ Snippet Fetcher |
hubspot | HubSpot |
ia_archiver | Internet Archive (Wayback Machine) |
icoreservice | iCoreService |
idmarch | idmarch.org/bot.html |
inagist | URL resolver |
insieve | Insieve Bot |
insitesbot | Insitesbot |
instapaper | Instapaper |
istellabot | IstellaBot |
jack | jack |
jakarta commons | Jakarta Commons HttpClient |
java | Generic Java http library |
jetslide | Jetslide |
js-kit | URL resolver |
kemvibot | Kemvi |
kimengi | Kimengi Bot |
knows.is | knows.is |
kojitsubot | Kojitsubot |
komodiabot | KomodiaBot |
kraken | kraken |
laconica | Laconica |
libwww-perl | Perl client-server library |
lijit crawler | Lijit |
linkdexbot | Linkdex Bot |
linkedinbot | LinkedIn Bot |
linkscrawler | LinksCrawler |
linode | Linode Longview |
lipperhey | Lipperhey |
livelapbot | Livelapbot |
loadtimebot | Load Time Bot |
longurl | URL expander service |
ltx71 | ltx71.com |
lumibot | Lumibot |
lwp-trivial | Another Perl library |
magpie-crawler | magpie-crawler |
mail.ru_bot | Mail.ru Bot |
meanpathbot | meanpath |
mediapartners-google | Google Adsense bot |
megaindex.ru | MegaIndex |
memorybot | mignify.com/bot.html |
metauri | MetaURI |
mfe_expand | McAfee spider |
mir web crawler | MIR web crawler |
mj12bot | Majestic-12 spider |
mojeekbot | Mojeek UK search crawler |
mrchrome | MrChrome |
ms search 6.0 robot | MS Search 6.0 Robot |
msnbot-media | Microsoft media bot |
msnbot | Microsoft bot |
nerdybot | NerdyBot |
netcraft | Netcraft |
netstate | netEstate NE Crawler |
netvibes | Personalized dashboard bot |
netzcheckbot | netzcheck |
newrelicmonitor | NewRelic monitor |
newrelicpinger | NewRelicPinger |
newsme | newsme |
niki-bot | niki-bot |
ning | NING |
nutch | Apache search spider |
openhosebot | OpenHoseBot |
orangebot | OrangeBot |
pagesinventory | pagesinventory.com |
panopta | Monitoring service |
paperlibot | PaperLi |
peerindex | peerindex |
percolatecrawler | PercolateCrawler |
perfectmarketkwtbot | PerfectMarket |
phantomjs | PhantomJS |
pingdom | Pingdom monitoring |
plukkie | botje.com/plukkie.htm |
privacyawarebot | PrivacyAwareBot |
proximic | Proximic Spider |
psbot-page | Picsearch |
publiclibraryarchive.org | publiclibraryarchive.org |
pycurl | Python http library |
python-httplib2 | Python-httplib2 |
python-requests | Python http library |
python-urllib | Python http library |
queryseeker | QuerySeekerSpider |
quicklook | QuickLook |
re-animator | Domain Re-Animator Bot |
readability | Readability |
rebelmouse | RebelMouse |
redditbot | Reddit Bot |
relateiq | RelateIQ |
riddler | Riddler Bot |
rogerbot | SEOmoz spider |
rssmicro | RSS/Atom Feed Robot (rssmicro.com) |
ruby | Ruby |
scrapy | Scrapy |
screaming frog seo spider | Screaming Frog SEO Spider |
searchmetricsbot | SearchmetricsBot |
semrushbot | SEO analysis bot |
seokicks | SEOKicks |
seznambot | SeznamBot |
shopwiki | ShopWiki |
shortlinktranslate | Link shortener |
showyoubot | Showyou iOS app spider |
siege | Joe Dog Siege |
sistrix | SISTRIX |
siteuptime | Site monitoring services |
slack | Slackbot-LinkExpanding |
slackbot | Slack Bot |
slurp | Yahoo spider |
smtbot | SimilarTech |
socialrank | SocialRankIOBot |
sogou | Chinese search engine |
spbot | OpenLinkProfiler |
spider | generic web spider |
spinn3r | Spinn3r aggregator |
sputnikbot | SputnikBot |
squider | Squider |
statuscake | StatusCake |
stripe | Stripe |
test certificate info | C http library? |
tineye | TinEye Bot |
traackr | Traackr Bot |
trendictionbot | Trendiction Search |
turnitinbot | TurnitinBot |
tweetedtimes | The Tweeted Times |
tweetmemebot | TweetMeMe Crawler |
twikle | Social web search bot |
twitjobsearch | TwitJobSearch |
twitmunin | Twitmunin |
twitterbot | Twitter URL expander |
twurly | Twurly |
typhoeus | Typhoeus |
umbot | uberMetrics |
unwindfetch | Gnip |
uptimerobot | Uptime Robot |
vagabondo | Vagabondo |
vb project | Visual Basic |
vigil | Vigil |
vkshare | VKontakte Sharer |
voilabot | VoilaBot |
vrcrawler | Venture Radar |
wasalive-bot | Wasalive Bots |
watchsumo | WatchSumo |
wbsearchbot | Ware Bay Best Buys |
webscout | Webscout |
wesee | WeSEE |
wget | wget unix CLI http client |
wordpress | WordPress spider |
wormly | WormlyBot |
wotbox | Wotbox |
xenu link sleuth | Xenu Link Sleuth |
xing-contenttabreceiver | Xing bot |
xovibot | XoviBot |
yacybot | YaCy |
yahoo-ad-monitoring | Yahoo Ad monitoring |
yandex | Yandex |
yeti | Naver Corp |
yourls | YOURLS |
zelist.ro | feed parser |
zibb | ZIBB spider |
zitebot | Zite |
zyborg | Zyborg |
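To check a user agent from your own logs against this list, the rows can be loaded into a lookup table. The sketch below assumes rows in the `user agent | search bot |` form used in the table and assumes case-insensitive substring matching, which may differ from what Prepr does internally. The helper names are hypothetical, and only three rows are included for brevity.

```python
from typing import Dict, Optional

# Three rows copied from the table; paste the full list here if needed.
ROWS = """\
googlebot | Google Bot |
msnbot-media | Microsoft media bot |
msnbot | Microsoft bot |
"""


def parse_rows(text: str) -> Dict[str, str]:
    """Map each lowercase user-agent token to its bot name."""
    table = {}
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # Skip blank lines and rows without a token in the first column.
        if len(cells) >= 2 and cells[0]:
            table[cells[0].lower()] = cells[1]
    return table


def identify_bot(user_agent: str, table: Dict[str, str]) -> Optional[str]:
    """Return the bot name for the longest matching token, or None if nothing matches."""
    ua = user_agent.lower()
    matches = [token for token in table if token in ua]
    if not matches:
        return None
    return table[max(matches, key=len)]


table = parse_rows(ROWS)
print(identify_bot("msnbot-media/1.1 (+http://search.msn.com/msnbot.htm)", table))
print(identify_bot("Mozilla/5.0 (X11; Linux x86_64) Firefox/120.0", table))
```

Preferring the longest matching token keeps more specific entries, such as `msnbot-media`, from being shadowed by shorter ones, such as `msnbot`.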