The big Rspamd topic

Monday 25 April 2016 10:52

Deleted user

Topic starter

Introduction

Rspamd is a new anti-spam filter for the Linux/BSD platform.

Rspamd is an advanced spam filtering system that allows evaluation of messages by a number of rules including regular expressions, statistical analysis and custom services such as URL black lists. Each message is analysed by rspamd and given a spam score.

According to this spam score and the user’s settings, rspamd recommends an action for the MTA to apply to the message, for example to pass, reject or add a header. Rspamd is designed to process hundreds of messages per second simultaneously and has a number of features available.
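As a rough illustration of the score-to-action step described above, the sketch below maps a total score onto one of the actions mentioned (pass, reject, add header). The threshold values and the name `recommend_action` are assumptions made for this example; they are not rspamd defaults.

```python
# Hypothetical thresholds, highest first; a real setup chooses its own values.
THRESHOLDS = [
    (15.0, "reject"),
    (6.0, "add header"),
]


def recommend_action(score):
    """Return the first action whose threshold the score reaches, else 'pass'."""
    for limit, action in THRESHOLDS:
        if score >= limit:
            return action
    return "pass"
```

The MTA would then apply whatever action comes back, for example `recommend_action(7.3)` falls in the "add header" band of this sketch.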

One addition (possibly relevant): unlike SpamAssassin, it is written entirely in C.
That at least benefits speed.

Features

The feature set is very extensive.

Spam filtering features
The rspamd distribution contains a number of mail processing features, including such techniques as:

Regular expressions filtering - allows basic processing of messages, their textual parts, MIME headers and SMTP data received by the MTA against a set of expressions that includes both normal regular expressions and message processing functions. Rspamd expressions are a powerful tool for filtering messages based on pre-defined rules. This feature is similar to regular expressions in the SpamAssassin spam filter. Moreover, rspamd supports SpamAssassin rules directly via a plugin.

The SPF module validates a message’s sender against the policy defined in the DNS record of the sender’s domain. You can read about SPF policies here. A number of mail systems include SPF support, such as Gmail or Yahoo Mail.

The DKIM module validates a message’s cryptographic signature against the public key placed in the DNS record of the sender’s domain. Like SPF, this technique is widespread and makes it possible to verify that a message was sent from that specific domain.

The DMARC module validates the joint SPF and DKIM policies for a sender and evaluates whether there are additional restrictions. Rspamd also supports storing report data in Redis.

DNS black lists make it possible to estimate the reputation of a sender’s IP address or network. Rspamd uses a number of DNS lists, including SORBS and Spamhaus. However, rspamd doesn’t trust any single DNS list and instead uses a combination of estimates, which helps to avoid mistakes and false positives. Rspamd also uses positive and grey DNS lists to check for trusted senders.
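To make the DNS list idea more concrete, here is a minimal sketch of two pieces: building the name that is queried for an IPv4 address (the octets are reversed and the list's zone is appended), and combining the answers of several lists into one score instead of trusting a single list. The zone `dnsbl.example`, the list names and the weights are placeholders invented for this example, and no DNS query is actually sent.

```python
import ipaddress


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up for an IPv4 address in a DNS list."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def combined_score(hits, weights):
    """Sum the weights of the lists that reported a hit.

    hits maps a list name to True/False; weights maps a list name to the
    (hypothetical) score contribution of that list.
    """
    return sum(weights[name] for name, hit in hits.items() if hit)
```

With this shape, a hit on one low-weight list only nudges the score, while hits on several lists add up.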

URL black lists are rather similar to DNS black lists but use the URLs in a message to estimate the sender’s reputation. This technique is very useful for finding malicious or phishing domains and filtering such mail.

Statistics - rspamd uses a Bayesian classifier based on five-grams of input. This means that the input is not estimated based on individual words; instead, all input is organized into chains that are then estimated by the Bayesian classifier. This approach achieves better results than the traditionally used unigrams (that is, single words), as described in detail in the following paper.
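The difference between single words and five-word chains can be shown with a simple sliding window. This is only a sketch of the general n-gram idea; the function name `word_ngrams` is made up here, and rspamd's real tokenizer may split and combine words differently.

```python
def word_ngrams(text, n=5):
    """Return the chains of n consecutive words found in text."""
    words = text.lower().split()
    if len(words) < n:
        # Too short for a full window: keep what there is as one chain.
        return [tuple(words)] if words else []
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
```

A classifier fed with these chains sees word order and context, which single-word tokens cannot express.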

Fuzzy hashes - to check for malicious mail patterns rspamd uses so-called fuzzy hashes. Unlike normal hashes, these structures are designed to hide small differences between text patterns, which makes it possible to find similar messages quickly. Rspamd has internal storage for such hashes and can block mass spam mailings quickly based on user feedback that specifies a message’s reputation. Moreover, this makes it possible to feed rspamd with data from honeypots without polluting the statistical module.
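One common way to get a hash-based comparison that tolerates small differences is to hash overlapping word windows ("shingles") and compare the resulting sets. The sketch below uses that technique as a stand-in; it is not rspamd's actual fuzzy hash algorithm, and the window width of 3 is an arbitrary assumption.

```python
import hashlib


def shingles(text, width=3):
    """Hash every window of `width` consecutive words into a set of digests."""
    words = text.lower().split()
    count = max(len(words) - width + 1, 1)
    return {
        hashlib.sha1(" ".join(words[i:i + width]).encode("utf-8")).hexdigest()
        for i in range(count)
    }


def similarity(a, b):
    """Jaccard similarity of the two shingle sets, between 0.0 and 1.0."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)
```

Two copies of a mass mailing that differ only in the recipient's name share most of their shingles, while an ordinary hash of the full text would differ completely.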

Rspamd uses a combination of different techniques to make the final decision about a message. This improves the overall quality of filtering and reduces the number of false positives (e.g. when an innocent message is wrongly classified as spam). I have tried to simplify rspamd usage by adding the following elements:

Web interface - rspamd ships with a fully functional Ajax-based web interface that lets you observe rspamd statistics; configure rules, weights and lists; scan and learn messages; and view the history of scans. The interface is self-hosted, requires zero configuration and follows recent web application standards. You don’t need a web server or application server to run the web UI - you just need to run rspamd itself and a web browser.

Integration with MTA - rspamd can work with the most popular mail transfer systems, such as Postfix, Exim or Sendmail. For Postfix and Sendmail there is the rmilter project, whilst for Exim there are several solutions for working with rspamd. Should you require MTA integration, please consult the integration guide.

Easy configuration - rspamd uses UCL language for configuration. UCL is a simple and intuitive language that is focused on easy to read configuration files. You have many choices to write your definitions, so use whatever you like (even a strict JSON would be OK).

Dynamic tables - rspamd lets you specify some data as dynamic maps that are checked at runtime and reloaded when they change. Rspamd supports file and HTTP maps.
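The file-map behaviour can be pictured as a set that re-reads its backing file whenever the file's modification time changes. The class below is a minimal sketch of that idea only; the name `FileMap`, the one-entry-per-line format and the `#` comment convention are assumptions for this example, not rspamd's implementation.

```python
import os


class FileMap:
    """Set of entries read from a file and re-read when its mtime changes."""

    def __init__(self, path):
        self.path = path
        self._mtime = None
        self._entries = set()

    def _refresh(self):
        mtime = os.stat(self.path).st_mtime_ns
        if mtime != self._mtime:
            # File is new or was modified: load it again.
            with open(self.path, encoding="utf-8") as handle:
                self._entries = {
                    line.strip() for line in handle
                    if line.strip() and not line.startswith("#")
                }
            self._mtime = mtime

    def __contains__(self, item):
        self._refresh()
        return item in self._entries
```

A check such as `"example.org" in FileMap("/path/to/list")` then always reflects the current file contents without a restart.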

Performance
Rspamd was designed to be fast. The core of rspamd is written in C and uses an event-driven model that makes it possible to process multiple messages simultaneously and without blocking. Moreover, a set of techniques is used in rspamd to process messages faster:

Finite state machines processing - rspamd uses specialized finite state machines for performance-critical tasks to process input faster than a set of regular expressions would. Of course, it is possible to implement these machines with ordinary Perl regular expressions, but then they won’t be compact or human-readable. In contrast, rspamd optimizes such actions as header processing, extraction of Received elements and protocol operations by building a concrete automaton for each task.

Expressions optimizer - optimizes expressions by evaluating the likely-false or likely-true branches first, so that the result is known as early as possible. This reduces the number of expensive expression calls when scanning a message.

Symbols optimizer - rspamd tries to check first the rules that are frequent or inexpensive in terms of time or CPU resources, which makes it possible to block spam before expensive rules are processed (rules with negative weights are always checked before other ones). You can view my presentation about it here.
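The ordering described above (negative-weight rules first, then frequent and cheap rules before rare and expensive ones) can be expressed as a sort key. This is a sketch under stated assumptions: the `Rule` fields and the cost-per-hit heuristic are invented for the example and do not reflect how rspamd actually measures its rules.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    weight: float    # negative weights lower the spam score
    cost: float      # hypothetical average run time
    hit_rate: float  # hypothetical fraction of messages that match


def check_order(rules):
    """Negative-weight rules first, then rules with the lowest cost per hit."""
    return sorted(
        rules,
        key=lambda r: (r.weight >= 0, r.cost / max(r.hit_rate, 1e-9)),
    )
```

Running the rules in this order means a scan can stop early once the score is decisive, before the expensive rules are reached.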

Event-driven model - rspamd is designed not to block anywhere in the code. Given that spam checks require a lot of network operations, rspamd can process many messages simultaneously, increasing the efficiency of shared DNS caches and other system resources. Moreover, an event-driven system normally scales automatically, and you won’t need to do any tuning in most cases.
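The benefit of not blocking on network operations can be sketched with an event loop: while one message waits for a lookup, the loop moves on to the next. In the example below `asyncio.sleep` stands in for a non-blocking DNS or HTTP request, and the function names are hypothetical; rspamd itself does this in C, not in Python.

```python
import asyncio


async def check_message(name, lookup_delay):
    # asyncio.sleep stands in for a non-blocking DNS or HTTP lookup.
    await asyncio.sleep(lookup_delay)
    return name


async def scan_all(messages):
    """Run the checks for all (name, delay) pairs concurrently on one loop."""
    tasks = [check_message(name, delay) for name, delay in messages]
    return await asyncio.gather(*tasks)
```

Called as `asyncio.run(scan_all(...))`, the total time is close to the slowest single lookup rather than the sum of all lookups.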

Threaded expressions and statistics - rspamd can run computationally expensive tasks, such as regular expressions or statistics, in separate thread pools, which lets it scale even further on modern multi-core systems.

Clever choice of data structures - rspamd tries to use the optimal data structure for each task. For example, it uses very efficient suffix tries for fast matching of a text against a set of multiple patterns, and a radix bit trie for storing IP address information, which gives O(1) access time with respect to the number of stored entries (a lookup is bounded by the address length).
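Why a bit trie gives lookups that do not depend on the number of stored networks can be seen in a small example. The class below is a plain, uncompressed binary trie for IPv4 (a real radix trie would also compress single-child paths); the name `IPTrie` and its methods are assumptions for this sketch, not rspamd's data structure.

```python
import ipaddress


class IPTrie:
    """Binary trie keyed on address bits; a lookup walks at most 32 nodes."""

    def __init__(self):
        self.root = {}

    def insert(self, cidr, value):
        net = ipaddress.IPv4Network(cidr)
        bits = int(net.network_address)
        node = self.root
        for i in range(net.prefixlen):
            bit = (bits >> (31 - i)) & 1
            node = node.setdefault(bit, {})
        node["value"] = value

    def lookup(self, ip):
        """Return the value of the longest matching prefix, or None."""
        bits = int(ipaddress.IPv4Address(ip))
        node = self.root
        best = node.get("value")
        for i in range(32):
            bit = (bits >> (31 - i)) & 1
            node = node.get(bit)
            if node is None:
                break
            best = node.get("value", best)
        return best
```

However many networks are inserted, `lookup` never inspects more than 32 bits of the address.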

You can also check a user’s report on rspamd performance on the Haraka GitHub.

Extensions
Besides the C core, rspamd provides a Lua API to access almost all the features available directly from C. Lua is an extremely easy-to-learn programming language, yet it is powerful enough to implement complex mail filters. In fact, rspamd has a significant amount of code written completely in Lua, such as DNS blacklist checks, user settings and the various map implementations. You can also write your own filters and rules in Lua, adapting rspamd functionality to your needs. Furthermore, Lua programs are very fast and their performance is rather close to pure C. However, note that for most performance-critical tasks you usually use the rspamd core functionality rather than Lua code. In any case, you can also use LuaJIT with rspamd if your goal is maximum performance. From the Lua API you can do the following tasks:

Reading the configuration parameters - Lua code has full access to the parsed configuration knobs, and you can easily modify your plugins’ behaviour by means of the main rspamd configuration.

Registering custom filters - it is more than simple to add your own filters to rspamd: just add a new index to the global variable rspamd_config:

rspamd_config.MYFILTER = function(task)
  -- Do something with the task object
end

Full access to the content of messages - you can access text parts, headers, SMTP data and so on by using the task object. The full list of methods can be found here.

Pre- and post- filters - you can register callbacks that are called before or after messages processing to make results more precise or to make some early decision, for example, to implement a rate limit.

Registering functions for rspamd - you can write your own functions in Lua to extend rspamd internal expression functions.

Managing statistics - Lua scripts can define a set of statistical files to be scanned or learned for a specific message, which makes it possible to create more complex statistical systems, e.g. based on the input language. Moreover, you can even train rspamd statistics from Lua scripts.

Standalone Lua applications - you can even write your own worker based on the rspamd core that performs some asynchronous logic in Lua. Of course, you can use all the features of the rspamd core, including non-blocking IO, HTTP client and server, a non-blocking Redis client, asynchronous DNS, UCL configuration and so on.

API documentation - the rspamd Lua API has detailed documentation where you can find examples, references and a guide on how to extend rspamd with Lua.

URL: https://rspamd.com

Friday 29 April 2016 17:36

InflatableMouse

Carina Nebula says hi!

I find this very interesting (I manage a lot of mail relays for work), so I'm going to read through the site, especially the integration part.

Friday 29 April 2016 17:50

H!GHGuY

Try and take over the world...

Deleted user wrote on Monday 25 April 2016 @ 10:52:
One addition (possibly relevant): unlike SpamAssassin, it is written entirely in C.
That at least benefits speed.

but not security

You don't want random spam carrying an exploit to take over your machine. You write something like this in a language that handles strings properly, at the very least C++, or better still a higher-level language.

ASSUME makes an ASS out of U and ME

Friday 29 April 2016 18:11

CyBeR

💩

H!GHGuY wrote on Friday 29 April 2016 @ 17:50:
[...]

but not security

You don't want random spam carrying an exploit to take over your machine. You write something like this in a language that handles strings properly, at the very least C++, or better still a higher-level language.

Right, because there are zero ways to get remote code execution in Perl.

All my posts are provided as-is. They come with NO WARRANTY at all.

Friday 29 April 2016 21:09

H!GHGuY

Try and take over the world...

CyBeR wrote on Friday 29 April 2016 @ 18:11:
[...]

Right, because there are zero ways to get remote code execution in Perl.

That's not what I'm saying, but C does make it very easy.

ASSUME makes an ASS out of U and ME

Monday 2 May 2016 15:21

Deleted user

Topic starter

If you read through the documentation, several things are in place:
Among other things, it uses PCRE.
It is possible to use SpamAssassin rules.
ClamAV integration in Rmilter.

Maybe I'm missing something, but I think that should be sufficient.

Tuesday 3 May 2016 15:24

InflatableMouse

Carina Nebula says hi!

@Typnix,

Do you already have it running yourself, as a test or perhaps even in production?

Do you have experience with other similar products?

Tuesday 3 May 2016 18:46

Wolfboy

ubi dubium ibi libertas

H!GHGuY wrote on Friday 29 April 2016 @ 17:50:
[...]

but not security

You don't want random spam carrying an exploit to take over your machine. You write something like this in a language that handles strings properly, at the very least C++, or better still a higher-level language.

And that is why you run a tool like this as a completely unprivileged user. I would recommend the same for SpamAssassin, by the way.

Blog [Stackoverflow] [LinkedIn]

Tuesday 3 May 2016 20:47

H!GHGuY

Try and take over the world...

Wolfboy wrote on Tuesday 03 May 2016 @ 18:46:
[...]

And that is why you run a tool like this as a completely unprivileged user. I would recommend the same for SpamAssassin, by the way.

Oh, right. I forgot again. Root exploits don't exist.
As far as I'm concerned, you run this in a VM that you throw away and rebuild every week, automated of course. And of course also in a fully isolated network segment where the ACLs allow only mail in and mail out. Anything less is simply asking for trouble.

ASSUME makes an ASS out of U and ME

Tuesday 3 May 2016 21:17

CyBeR

💩

Come on.

All my posts are provided as-is. They come with NO WARRANTY at all.

Wednesday 4 May 2016 02:33

Wolfboy

ubi dubium ibi libertas

H!GHGuY wrote on Tuesday 03 May 2016 @ 20:47:
[...]

Oh, right. I forgot again. Root exploits don't exist.
As far as I'm concerned, you run this in a VM that you throw away and rebuild every week, automated of course. And of course also in a fully isolated network segment where the ACLs allow only mail in and mail out. Anything less is simply asking for trouble.

What language is your mail server written in?
And your SSH server, web server, kernel, etc...

Not to forget the Perl interpreter itself, of course. If you really want it to be secure, you should just unplug the network connection.

Besides, root exploits are usually just as easy to carry out from Perl as from C.

[ 6% edited by Wolfboy on 04-05-2016 02:34 ]

Blog [Stackoverflow] [LinkedIn]

Wednesday 4 May 2016 12:21

H!GHGuY

Try and take over the world...

Wolfboy wrote on Wednesday 04 May 2016 @ 02:33:
[...]
What language is your mail server written in?
And your SSH server, web server, kernel, etc...

- OpenSSH and friends are C. I follow you that far. But the exposure to unauthenticated input is limited until authentication has been completed. In other words: the attack surface for unauthenticated and untrusted users is many times smaller than the total attack surface.
- web server: CGI has been reviled by everyone for many years, and again: the exposure that Apache or nginx get to untrusted/unauthenticated input is a fraction of what the underlying language (typically PHP these days) has to process.
- kernel: the kernel mainly handles the lower-level protocols. Many routers/firewalls/... already drop a lot of junk that doesn't look clean. At the same time, I want to point out that the kernel does few pattern-matching/regex/... operations. So the kind of operations is in many cases more controlled (although eBPF may change this somewhat, of course).
...

The point I want to make: the part of those components that is exposed to untrusted input is a small part of the total attack surface of the full component, and even there bugs and exploits occur.
Here you have a tool whose sole purpose is filtering slurry, and its entire attack surface is reachable by unauthenticated/untrusted users. If you count in bugs/LOC and extrapolate that ratio to the number of LOC that may handle unauthenticated/untrusted data, then this is simply a gigantic attack vector.

Not to forget the Perl interpreter itself, of course. If you really want it to be secure, you should just unplug the network connection.

Sure, the Perl interpreter can be vulnerable to this kind of issue. But I would rather patch one interpreter that covers hundreds of programs than fix umpteen programs one by one because they have once again fallen into the same strcpy/strcat/... trap.

Besides, root exploits are usually just as easy to carry out from Perl as from C.

You are mainly talking about implementing exploits (i.e. you write code in Perl to exploit a machine). What you want to look at is the susceptibility to remote code execution when handling external, untrusted, unauthenticated input. In that respect higher-level languages are many times more robust than plain old C. Even C++, with its std::string, is many times more robust against that nonsense.

Here you are racing the filthiest internet junk (spam, malware, ...) through a C program that is full of string handling, one of the very things for which C as a language really lacks the right tooling. That way you turn the spam filter itself into a target with an enormous attack surface, written in a language that is eminently NOT suited to this kind of thing. So: consider me sceptical.

ASSUME makes an ASS out of U and ME

Thursday 5 May 2016 11:19

Thralas

You seem to locate the big danger in string handling. That has not produced interesting bugs in C for a long time, provided things are compiled properly (SSP, FORTIFY_SOURCE). The go-to exploit primitives in 2016 are use-after-frees, and there C++ is just as big a problem as C.

Moreover, exploiting memory corruption (with modern mitigations) in a non-interactive daemon like rspamd quickly becomes very impractical. All the daemons you cite as examples are interactive, and only then does it get interesting (cf. Heartbleed) - not to mention software packages where the attacker also gets an interpreter thrown in for free.

Thursday 5 May 2016 15:04

CAPSLOCK2000

zie teletekst pagina 888

Does anyone dare to compare rspamd and amavis? At first glance they look very similar: both contain a whole arsenal of measures that you can unleash on your mail to reach a final verdict.

I must say the asynchronous processing and good caching do appeal to me. My current anti-spam solution does an enormous amount of duplicate work because everything is done from scratch for every mail. If you then receive a mass mailing in which only the recipient's name differs, that is hugely inefficient.

This post is warranted for the full amount you paid me for it.

Tuesday 10 May 2016 19:59

Deleted user

Topic starter

CAPSLOCK2000 wrote on Thursday 05 May 2016 @ 15:04:
Does anyone dare to compare rspamd and amavis? At first glance they look very similar: both contain a whole arsenal of measures that you can unleash on your mail to reach a final verdict.

I must say the asynchronous processing and good caching do appeal to me. My current anti-spam solution does an enormous amount of duplicate work because everything is done from scratch for every mail. If you then receive a mass mailing in which only the recipient's name differs, that is hugely inefficient.

On which points would you like to see comparisons?

Tuesday 10 May 2016 20:32

CAPSLOCK2000

zie teletekst pagina 888

features and performance

I want an answer to the question "Why would I switch from amavis to rspamd?"
Does rspamd have features that amavis lacks?
How fast is rspamd compared with amavis?

This post is warranted for the full amount you paid me for it.

Tuesday 10 May 2016 21:53

DiedX

I switched over from MailScanner the day before yesterday. It went much better than I expected; I especially like the adaptive greylisting.

I'm absolutely not in AMA mood yet, but perhaps I can test a few things for you...

DiedX supports the Roland™, Sound Blaster™ and Ad Lib™ sound cards

Wednesday 11 May 2016 10:51

Deleted user

Topic starter

CAPSLOCK2000 wrote on Tuesday 10 May 2016 @ 20:32:
features and performance

I want an answer to the question "Why would I switch from amavis to rspamd?"
Does rspamd have features that amavis lacks?
How fast is rspamd compared with amavis?

An indication of performance:
https://rspamd.com/misc/2016/03/03/rspamd-performance.html

- If you configure Rmilter (which has Rspamd as a dependency) together with Redis, it can also handle greylisting.
- Integration with ClamAV from Rmilter
- It has a simple web interface that lets you pull statistics and make small configuration changes: https://rspamd.com/webui/

For the rest, I would mainly study the website.