Machine Learning leverages a four-phase process: Collection, Extraction, Learning and Classification.
Collection
Like DNA analysis, file analysis starts with massive data quantities – specific types of files (executables, PDFs, Microsoft Word® documents, Java, etc.). Millions of files are collected from industry sources, proprietary repositories and inputs from active computers.
The goal is to ensure:
- statistically significant sample sizes
- sample files of the broadest type and authorship (author groups such as Microsoft, Adobe, etc.)
- an unbiased collection, not over-collecting specific file types.
Files are then reviewed and placed into three buckets: known and verified valid; known and verified malicious; and unknown. An accurate review is imperative: a malicious file placed in the valid bucket, or a valid file placed in the malicious bucket, would bias the models built from the collection.
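The collection goals above can be sketched as a simple bookkeeping check. This is a minimal illustration, not the pipeline the article describes: the `Bucket` names mirror the three buckets in the text, while `type_shares`, `over_collected` and the 50% cutoff are hypothetical choices made for the example.

```python
from collections import Counter
from enum import Enum


class Bucket(Enum):
    """The three review buckets named in the text."""
    VALID = "known and verified valid"
    MALICIOUS = "known and verified malicious"
    UNKNOWN = "unknown"


def type_shares(samples):
    """Return the fraction of the collection contributed by each file type."""
    counts = Counter(file_type for file_type, _bucket in samples)
    total = sum(counts.values())
    return {file_type: n / total for file_type, n in counts.items()}


def over_collected(samples, max_share=0.5):
    """List file types whose share exceeds max_share (a hypothetical cutoff)."""
    shares = type_shares(samples)
    return sorted(t for t, share in shares.items() if share > max_share)


# Toy collection: (file type, bucket) pairs invented for illustration.
samples = [
    ("exe", Bucket.VALID),
    ("exe", Bucket.MALICIOUS),
    ("exe", Bucket.UNKNOWN),
    ("pdf", Bucket.VALID),
]
print(type_shares(samples))
print(over_collected(samples))
```

A real collection would also need checks on authorship diversity and sample size; the same counting approach extends to those.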
Extraction
The extraction of attributes follows, which is substantively different from behavior identification or malware analysis historically conducted by threat researchers. Rather than seeking things analysts believe might be malicious, this approach leverages the compute capacity of machines and data-mining to identify the broadest possible set of file characteristics — some as basic as the file size and others as complex as the first logic leap in the binary.
The atomic characteristics are then extracted, depending on file type (.exe, .dll, .com, .pdf, .java, .doc, .ppt, etc.). By identifying the broadest possible set of attributes, manual classification bias is removed. Using millions of attributes also increases the cost an attacker incurs in creating a piece of malware that could go undetected. This attribute identification and extraction process creates a file genome comparable to the human genome. It can be used to mathematically determine the expected characteristics of files, just as human DNA analysis is used to determine the characteristics and behaviors of cells.
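To make the idea of attribute extraction concrete, here is a minimal sketch that pulls a handful of basic characteristics from a file's raw bytes. The four attributes and the function names are assumptions chosen for illustration; the article does not list which attributes its process uses, and a real extractor would gather far more, with type-specific parsing.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


def extract_attributes(data: bytes) -> dict:
    """Return a few illustrative attributes; a real system would collect many more."""
    printable = sum(32 <= b < 127 for b in data)
    return {
        "size": len(data),                       # as basic as the file size
        "entropy": byte_entropy(data),           # high values suggest packing or encryption
        "starts_with_mz": data[:2] == b"MZ",     # DOS/PE executable magic bytes
        "printable_ratio": printable / len(data) if data else 0.0,
    }


# Four bytes standing in for the start of an executable.
print(extract_attributes(b"MZ\x90\x00"))
```

Each attribute is computed without any judgment about whether it looks malicious, which is the point the text makes about avoiding manual classification bias.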
Learning
Once the attributes are collected, the output is normalized and converted to numerical values for use in statistical models. Vectorization and machine learning are then applied to remove human bias and to speed analytical processing. Using the attributes identified during extraction, mathematicians then develop statistical models that predict whether a file is benign or malicious. Dozens of models are created and evaluated against key measurements to ensure predictive accuracy. Ineffective models are scrapped. Effective models are subjected to multiple levels of testing.
The first level starts with a sample of known files. Later stages involve the entire file corpus (tens of millions of files). The final models are then loaded into a production environment for use in file classification.
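The learning steps described above (normalize, convert to numeric vectors, fit a model, then test in stages) can be sketched as follows. This is an assumption-laden toy: the article does not say which statistical models are used, so logistic regression is swapped in as a stand-in, and the six `[size, entropy]` rows and their labels are invented for the example.

```python
import math


def min_max_normalize(rows):
    """Rescale every column to [0, 1]; constant columns become 0.0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    return [[(v - lo) / span for v, lo, span in zip(row, lows, spans)] for row in rows]


def predict_proba(weights, bias, row):
    """Probability that a file is malicious under a logistic model."""
    z = bias + sum(w * x for w, x in zip(weights, row))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, epochs=500, lr=0.5):
    """Fit weights with plain stochastic gradient descent (stand-in model)."""
    weights, bias = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        for row, label in zip(rows, labels):
            error = predict_proba(weights, bias, row) - label
            weights = [w - lr * error * x for w, x in zip(weights, row)]
            bias -= lr * error
    return weights, bias


def accuracy(weights, bias, rows, labels):
    """Fraction of files whose predicted class matches the label."""
    hits = sum((predict_proba(weights, bias, r) >= 0.5) == bool(y)
               for r, y in zip(rows, labels))
    return hits / len(rows)


# Invented [size in bytes, byte entropy] vectors; 1 = malicious, 0 = benign.
raw = [[1200, 3.1], [5000, 4.0], [800, 7.6], [64000, 7.9], [3000, 3.5], [20000, 7.2]]
labels = [0, 0, 1, 1, 0, 1]

rows = min_max_normalize(raw)
train_rows, train_labels = rows[:4], labels[:4]
holdout_rows, holdout_labels = rows[4:], labels[4:]

weights, bias = train_logistic(train_rows, train_labels)
# Stage 1: a small sample of known files the model has not seen.
print("held-out sample accuracy:", accuracy(weights, bias, holdout_rows, holdout_labels))
# Stage 2: the entire (toy) corpus.
print("full corpus accuracy:", accuracy(weights, bias, rows, labels))
```

In practice many candidate models would be trained and compared this way, with the weaker ones discarded before the later, larger testing stages.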
It’s important to remember that for every file scrutinized, millions of attributes are analyzed to differentiate between legitimate files and malware. This is how machine learning identifies malware – whether known or unknown – and achieves unprecedented levels of accuracy. It divides a single file into an astronomical number of characteristics and analyzes each against hundreds of millions of other files to reach a decision about the health of each characteristic.
Classification
Once built, the statistical models can be used by math engines to classify files that are unknown (e.g., files never seen before). This analysis takes milliseconds and is extremely precise because of the breadth of the file characteristics analyzed […]
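As a rough illustration of the classification step, the sketch below scores a never-before-seen file against a fixed model and returns a verdict. Everything numeric here is hypothetical: `WEIGHTS`, `BIAS` and `THRESHOLD` stand in for parameters that the Learning phase would produce, and the two attributes are chosen only for the example.

```python
import math

# Hypothetical parameters standing in for a model produced during Learning.
WEIGHTS = {"entropy": 1.2, "printable_ratio": -2.0}
BIAS = -6.0
THRESHOLD = 0.5  # assumed decision cutoff


def score(attributes: dict) -> float:
    """Return the modeled probability that the file is malicious."""
    z = BIAS + sum(w * attributes.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def classify(attributes: dict) -> str:
    """Map the score to a verdict using the assumed threshold."""
    return "malicious" if score(attributes) >= THRESHOLD else "benign"


# Attributes of two invented, previously unseen files.
print(classify({"entropy": 7.8, "printable_ratio": 0.1}))
print(classify({"entropy": 4.0, "printable_ratio": 0.9}))
```

Because scoring is a single weighted sum over precomputed attributes, it needs no signature lookup, which is consistent with the millisecond timing the text describes.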
ROLE DESCRIPTION
We are looking for a Membership Manager to join the company and take on one of the most opportunity-rich roles the industry has to offer. This role allows you to create and develop relationships with leading solution providers in the enterprise technology space. Through extensive research and conversation, you will learn the goals and priorities of IT and IT security executives and collaborate with companies that have the solutions they are looking for. This role requires professionalism, drive, a desire to learn, enthusiasm, energy and positivity.
Role Requirements:
Role Responsibilities:
Apex offers our team:
Entry level salary with competitive Commission & Bonus opportunities
Ability to make a strong impact on our products and growing portfolio
Three months of hands-on training and a commitment to teach you the industry and develop invaluable sales and relationship skills
Opportunity to grow into a leadership role and build a team
Extra vacation day for your birthday when it falls on a weekday
All major American holidays off
10 paid vacation days after training period
5 paid sick days