About the Role:
CrowdStrike is looking for a Data Analyst to join our growing Generative AI Research Center. This is a junior/entry-level position with quick advancement opportunities. As a Data Analyst, you will focus on data and corpus labeling, as well as other data-related tasks critical to supporting our large language models (LLMs) and cybersecurity initiatives. This role is crucial to enhancing our products' capabilities by ensuring the accuracy and quality of the data used to train models and detect threats, thereby supporting the overall mission of the Generative AI Research Center.
CrowdStrike is a cybersecurity company, but we do not require candidates for this role to have prior security industry experience. We will provide mentoring and training in security topics as needed. We do expect a strong interest in CrowdStrike's mission and a willingness to engage with the needs of our product teams and customers.
If you are a hands-on engineer who loves technical challenges and wants to operate at scale, apply and let's talk!
Interviewing process: online and onsite where applicable
What You'll Do:
Label and annotate cybersecurity-related datasets to prepare them for analysis and machine learning tasks.
Ensure labeling accuracy and consistency across different datasets, including threat intelligence data, incident reports, network logs, etc.
Gather data from various cybersecurity sources, including threat intelligence feeds, logs, and internal reports.
Clean and preprocess data to make it suitable for analysis and modeling.
Perform exploratory data analysis to uncover patterns, trends, and insights related to cybersecurity threats and vulnerabilities.
Utilize statistical methods and tools to interpret data and identify potential security issues.
Create and maintain dashboards and reports to communicate findings to cybersecurity stakeholders.
Develop visualizations to present data in a clear and concise manner, highlighting key security metrics and trends.
Work closely with analysts, data scientists, engineers, and other team members to support their data needs.
Support the implementation and optimization of MLOps pipelines, leveraging data insights to deploy, monitor, and scale machine learning models across different solutions.
Participate in team meetings and contribute to project planning and discussions, providing data-driven insights.
Document processes, methodologies, and insights gained from data analysis and labeling activities.
Maintain clear records of data sources, cleaning steps, and labeling criteria to ensure reproducibility and auditability.
What You'll Need:
Bachelor's degree in Computer Science or related STEM field.
Proficiency in data manipulation and analysis tools (e.g., Python, SQL).
Familiarity with relevant libraries and frameworks (e.g., TensorFlow, PyTorch).
Experience with data labeling and annotation tools.
Strong analytical and problem-solving skills, with an understanding of cybersecurity concepts.
Excellent communication and collaboration abilities.
Attention to detail and a commitment to data accuracy.
Tech Stack (you don't need to know everything; a strong capacity to learn is essential):
Python
SQL
Data labeling and annotation tools (e.g., Labelbox, Prodigy)
Data analysis and visualization libraries (e.g., Pandas, NumPy, Matplotlib, Seaborn)
Docker
Kubernetes
AWS
Kafka
Git
Bonus Points:
Exposure to Go, AWS, Cassandra, Kafka, or Elasticsearch.
Experience with language models, data science, or data engineering.
Experience with data labeling and annotation tools, particularly in a cybersecurity context.