A recently published paper demonstrates how a contribution that looks valid can contain malicious code by exploiting Unicode control characters. Some of these attacks have been tested on Python, and they work. Should the Python and open-source communities be concerned?
Researchers at the University of Cambridge published a paper(https://trojansource.codes/trojan-source.pdf) describing an attack named Trojan Source, which exploits the fact that some interpreters, such as CPython, accept Unicode in source code. This has raised concerns in the open-source community about malicious contributions that look entirely legitimate to the human eye but contain invisible attacks. As members of the Python community, we should all be aware of this and understand how we can prevent such attacks.
About this talk:
In this talk, Cheuk will decode the findings of this paper at a level that everyone can understand. She will start with a joke example of how you can mess with someone by using Unicode. She will then explain what Unicode is and why it causes trouble. Afterwards, she will explain the Python examples(https://github.com/nickboucher/trojan-source/tree/main/Python) in the paper and why they can be dangerous. Lastly, she will open up a discussion on how we should defend ourselves against these attacks and what we can do as a community.
Outline (30 mins talk):
**5 mins - Introduction, the opening of the talk
In this session, Cheuk will ask the audience to debug a code snippet that looks absolutely fine but does not run. She will explain that Trojan Source relies on the same concept.
**10 mins - What is Unicode
In this session, Cheuk will introduce Unicode: what it is, what it means to a computer, and why computers need it. She will also explain how the benefits of Unicode can become a weakness that leaves us vulnerable to the Trojan Source attack.
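As a rough illustration of the kind of point this session covers (it is not taken from the talk itself), the sketch below shows what a character is to a computer: a code point with an official name and a byte encoding. The helper name `describe` is hypothetical. It also shows how two characters that look alike on screen can be entirely different to the interpreter.

```python
import unicodedata


def describe(text):
    """Return (code point, official Unicode name, UTF-8 bytes) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), ch.encode("utf-8"))
        for ch in text
    ]


# Latin "a" followed by Cyrillic "a" (written as an escape so it is visible here).
# They render almost identically but are different characters.
for row in describe("a\u0430"):
    print(row)
# ('U+0061', 'LATIN SMALL LETTER A', b'a')
# ('U+0430', 'CYRILLIC SMALL LETTER A', b'\xd0\xb0')
```

To a human reviewer the two letters are the same; to Python they are not, which is where the trouble starts.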
**10 mins - How Trojan Source works in Python
In this session, Cheuk will show a few examples of Trojan Source attacks in legitimate-looking Python code. She will point out where the attack hides in the source code and in which cases it can be dangerous.
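A minimal sketch of the idea, loosely modelled on the string-comparison example described in the paper rather than copied from it: bidirectional control characters placed inside a string literal can make an editor display the line as a plain comparison against "user" followed by a comment, while the interpreter sees a longer, different string. The variable names and the comment text are assumptions made for illustration, and the control characters are written as escapes so they stay visible.

```python
# Bidirectional control characters used by the attack.
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE

# A source line as an attacker might submit it. Depending on the renderer,
# it can be displayed as if the string ended right after "user" and the rest
# were a trailing comment. To Python, everything sits inside one string literal.
line = f'access_level != "user{RLO} {LRI}# check if admin{PDI} {LRI}"'


def evaluate(access_level):
    """Evaluate the crafted comparison the way the interpreter would."""
    return eval(line, {"access_level": access_level})


# A genuine "user" is reported as not being "user".
print(evaluate("user"))  # True
```

Printing `repr(line)` or `ascii(line)` exposes the hidden characters as escapes, which is one simple way to inspect a suspicious line.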
**5 mins - How to protect ourselves
In this session, Cheuk will open the discussion and make a few suggestions on how we can protect ourselves as a community. This will lead into the Q&A session, where the audience can weigh in with their own thoughts.
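One possible defence, sketched here as an assumption rather than as a suggestion from the talk, is to scan contributions for bidirectional control characters before review. The function name `find_bidi_controls` is hypothetical; the code points checked are the embedding, override, and isolate controls (U+202A to U+202E and U+2066 to U+2069).

```python
import unicodedata

# Bidirectional embedding/override controls and isolate controls.
BIDI_CONTROLS = {chr(cp) for cp in (*range(0x202A, 0x202F), *range(0x2066, 0x206A))}


def find_bidi_controls(source):
    """Return (line number, column, character name) for each control found."""
    hits = []
    for lineno, line in enumerate(source.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((lineno, col, unicodedata.name(ch)))
    return hits


sample = 'x = 1\ny = "a\u202eb"\n'
print(find_bidi_controls(sample))
# [(2, 7, 'RIGHT-TO-LEFT OVERRIDE')]
```

A check like this could run in a pre-commit hook or in CI. It only covers bidirectional controls, so lookalike characters would need a separate check.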
This talk is for everyone, from those who are simply curious to maintainers of open-source libraries. This is knowledge we should all have. Cheuk will explain it in a way that requires no prior knowledge.
What the audience will learn
How Trojan Source attacks work. They may also learn how interpreters, especially Python interpreters, work with Unicode. In addition, they may gain increased awareness of security in the open-source world.