My internship experience at Semgrep

updated: 2025-11-17

In this post, "Semgrep" refers to the company and "semgrep" to the core program it develops. I worked on the team that builds semgrep.

What Semgrep does

semgrep is a static analysis tool that scans source code. It supports many languages, including Python, Java, Go, and Scala.

It achieves this with a mechanism similar to LLVM's: semgrep uses a single intermediate representation (IR) for all languages, so supporting a new language comes down to adding a conversion to this IR.
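
To make the shared-IR idea concrete, here is a minimal toy sketch in Python. It is not how semgrep's IR actually looks (semgrep is written in OCaml and its IR is far richer). The Call node, the two front ends (python_to_ir and lisp_to_ir), the one-form-per-line Lisp syntax, and the calls_to analysis are all names I made up for illustration. The point is only that the analysis is written once, against the IR, and each language just needs a converter.

```python
import ast
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """Hypothetical language-neutral IR node: a call with plain-name arguments."""
    func: str
    args: tuple


def python_to_ir(source):
    """Front end 1: Python source -> IR (only calls to plain names are kept)."""
    calls = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            args = tuple(a.id for a in node.args if isinstance(a, ast.Name))
            calls.append(Call(node.func.id, args))
    return calls


def lisp_to_ir(source):
    """Front end 2: a toy Lisp with one flat form per line, e.g. "(eval x)"."""
    calls = []
    for line in source.splitlines():
        tokens = line.strip().strip("()").split()
        if tokens:
            calls.append(Call(tokens[0], tuple(tokens[1:])))
    return calls


def calls_to(ir, func_name):
    """A single analysis over the IR, shared by every front end."""
    return [c for c in ir if c.func == func_name]
```

Under these assumptions, calls_to(python_to_ir("eval(x)"), "eval") and calls_to(lisp_to_ir("(eval x)"), "eval") return the same IR node, even though the inputs are in different languages.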

How Semgrep works

Why did I want to work at Semgrep?

semgrep is implemented in OCaml, and personally I find that language to be quite cozy.

I am also quite interested in programming languages and compilers in general, and the work at Semgrep had a similar flavor.

What was the main problem I solved?

I worked on resolving one scalability problem for semgrep.

It centered on taint analysis, which is how semgrep scans for vulnerabilities like SQL injection.

Here is an example of the kind of vulnerability it might be able to catch:

x = tainted_value  # x is tainted
dangerous_call(x)  # flagged: the tainted value x reaches a dangerous call
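
For intuition, here is a rough sketch of the idea behind taint tracking, written in Python with the standard ast module. This is not semgrep's implementation. The function find_tainted_calls is a name I made up, and it only handles straight-line, top-level statements: no branches, loops, or function bodies. It treats tainted_value as the taint source and dangerous_call as the sink, matching the example above.

```python
import ast


def find_tainted_calls(source,
                       taint_sources=frozenset({"tainted_value"}),
                       sinks=frozenset({"dangerous_call"})):
    """Return the line numbers where a tainted name is passed to a sink call.

    Toy model: only top-level statements are visited, in order.
    """
    tainted = set(taint_sources)
    findings = []
    for stmt in ast.parse(source).body:
        # 1. Check sink calls first, using the taint state before this statement.
        for node in ast.walk(stmt):
            if (isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Name)
                    and node.func.id in sinks):
                arg_names = {n.id for arg in node.args
                             for n in ast.walk(arg) if isinstance(n, ast.Name)}
                if arg_names & tainted:
                    findings.append(node.lineno)
        # 2. Propagate taint through assignments: if the right-hand side reads
        #    a tainted name, the target becomes tainted; otherwise it is cleared.
        if isinstance(stmt, ast.Assign):
            names_read = {n.id for n in ast.walk(stmt.value)
                          if isinstance(n, ast.Name)}
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    if names_read & tainted:
                        tainted.add(target.id)
                    else:
                        tainted.discard(target.id)
    return findings
```

Running it on the two-line example above reports line 2, the dangerous_call(x) line. A real analysis also has to follow taint through function calls, fields, and control flow, which is where most of the cost comes from.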