Search this Site
Open LLVM Projects
This document is meant to be a sort of "big TODO list" for LLVM. Each project in this document is something that would be useful for LLVM to have, and would also be a great way to get familiar with the system. Some of these projects are small and self-contained, which may be implemented in a couple of days, others are larger. Several of these projects may lead to interesting research projects in their own right. In any case, we welcome all contributions.
If you are thinking about tackling one of these projects, please send a mail to the LLVM Developer's mailing list, so that we know the project is being worked on. Additionally this is a good way to get more information about a specific project or to suggest other projects to add to this page.
The projects in this page are open-ended. More specific projects are filed as unassigned enhancements in the LLVM bug tracker. See the list of currently outstanding issues if you wish to help improve LLVM.
Description of the project: The copy-paste is a common programming practice. Most of the programmers start from a code snippet, which already exists in the system and modify it to match their needs. Easily, some of the code snippets end up being copied dozens of times. This manual process is error prone, which leads to a seamless introduction of new hard-to-find bugs. Also, copy-paste usually means worse maintainability, understandability and logical design. Clang and clang's static analyzer provide all the building blocks to build a generic C/C++ copy-paste detecting infrastructure. The infrastructure should be evolved alongside with useful features such as bug checkers and compiler diagnostics.
Confirmed Mentor: Vassil Vassilev
How to contact the mentor: email@example.com or firstname.lastname@example.org
Desirable skills: Advanced C++, Basic knowledge of Clang/Clang Static Analyzer
What the student will learn: Static analyzer, Clang, etc
Further information: The developed copy-paste infrastructure could be used to build for building more advanced bug checkers. Some examples and possible applications are:
Description of the project: The idea of this project is to extend scan-build to make it more reliable for projects. Currently, scan-build results pages do not allow the tracking of a defect or the possibility to tag false positive defects.
The project will focus on:
During this GSoC, the student will have to keep in mind the tracking platform should not be tied to scan-build and should be usable by other clang analyzer clients (Xcode, Eclipse, etc) with different needs (different database, etc.).
Note that the creation of a Unique Defect ID can be an hard problem.
Confirmed Mentor: Sylvestre Ledru, Jean-Daniel Dupas
How to contact the mentor: email@example.com & firstname.lastname@example.org
Deliverables of the project: New version of scan-build
Requirements: Triage and/or fix scan-build bugs or implement a proof of concept
Desirable skills: C++, Database, scan-build
What the student will learn: Static analyzer, Clang, etc
In addition to hacking on the main LLVM project, LLVM has several subprojects, including Clang and VMKit. If you are interested in working on these, please see their "Open projects" page:
Improvements to the current infrastructure are always very welcome and tend to be fairly straight-forward to implement. Here are some of the key areas that can use improvement...
Currently, both Clang and LLVM have a separate target description infrastructure, with some features duplicated, others "shared" (in the sense that Clang has to create a full LLVM target description to query specific information).
This separation has grown in parallel, since in the beginning they were quite different and served disparate purposes. But as the compiler evolved, more and more features had to be shared between the two so that the compiler would behave properly. An example is when targets have default features on speficic configurations that don't have flags for. If the back-end has a different "default" behaviour than the front-end and the latter has no way of enforcing behaviour, it simply won't work.
Of course, an alternative would be to create flags for all little quirks, but first, Clang is not the only front-end or tool that uses LLVM's middle/back ends, and second, that's what "default behaviour" is there for, so we'd be missing the point.
Several ideas have been floating around to fix the Clang driver WRT recognizing architectures, features and so on (table-gen it, user-specific configuration files, etc) but none of them touch the critical issue: sharing that information with the back-end.
Recently, the idea to factor out the target description infrastructure from both Clang and LLVM into its own library that both use, has been floating around. This would make sure that all defaults, flags and behaviour are shared, but would also reduce the complexity (and thus the cost of maintenance) a lot. That would also allow all tools (lli, llc, lld, lldb, etc) to have the same behaviour across the board.
The main challenges are:
The LLVM bug tracker occasionally has "code-cleanup" bugs filed in it. Taking one of these and fixing it is a good way to get your feet wet in the LLVM code and discover how some of its components work. Some of these include some major IR redesign work, which is high-impact because it can simplify a lot of things in the optimizer.
Some specific ones that would be great to have:
Additionally, there are performance improvements in LLVM that need to get fixed. These are marked with the slow-compile keyword. Use this Bugzilla query to find them.
The llvm-test testsuite is a large collection of programs we use for nightly testing of generated code performance, compile times, correctness, etc. Having a large testsuite gives us a lot of coverage of programs and enables us to spot and improve any problem areas in the compiler.
One extremely useful task, which does not require in-depth knowledge of compilers, would be to extend our testsuite to include new programs and benchmarks. In particular, we are interested in cpu-intensive programs that have few library dependencies, produce some output that can be used for correctness testing, and that are redistributable in source form. Many different programs are suitable, for example, see this list for some potential candidates.
We are always looking for new testcases and benchmarks for use with LLVM. In particular, it is useful to try compiling your favorite C source code with LLVM. If it doesn't compile, try to figure out why or report it to the llvm-bugs list. If you get the program to compile, it would be extremely useful to convert the build system to be compatible with the LLVM Programs testsuite so that we can check it into SVN and the automated tester can use it to track progress of the compiler.
When testing a code, try running it with a variety of optimizations, and with all the back-ends: CBE, llc, and lli.
Find benchmarks either using our test results or on your own, where LLVM code generators do not produce optimal code or simply where another compiler produces better code. Try to minimize the test case that demonstrates the issue. Then, either submit a bug with your testcase and the code that LLVM produces vs. the code that it should produce, or even better, see if you can improve the code generator and submit a patch. The basic idea is that it's generally quite easy for us to fix performance problems if we know about them, but we generally don't have the resources to go finding out why performance is bad.
The LNT perf database has some nice features like detect moving average, standard deviations, variations, etc. But the report page give too much emphasis on the individual variation (where noise can be higher than signal), eg. this case.
The first part of the project would be to create an analysis tool that would track moving averages and report:
The second part would be to create a web page which would show all related benchmarks (possibly configurable, like a dashboard) and show the basic statistics with red/yellow/green colour codes to show status and links to more detailed analysis of each benchmark.
A possible third part would be to be able to automatically cross reference different builds, so that if you group them by architecture/compiler/number of CPUs, this automated tool would understand that the changes are more common to one particular group.
The LLVM Coverage Report has a nice interface to show what source lines are covered by the tests, but it doesn't mentions which tests, which revision and what architecture is covered.
A project to renovate LCOV would involve:
Another idea is to enable the test suite to run all built backends, not just the host architecture, so that coverage report can be built in a fast machine and have one report per commit without needing to update the buildbots.
Sometimes creating new things is more fun than improving existing things. These projects tend to be more involved and perhaps require more work, but can also be very rewarding.
Many proposed extensions and improvements to LLVM core are awaiting design and implementation.
We have a strong base for development of both pointer analysis based optimizations as well as pointer analyses themselves. It seems natural to want to take advantage of this:
We now have a unified infrastructure for writing profile-guided transformations, which will work either at offline-compile-time or in the JIT, but we don't have many transformations. We would welcome new profile-guided transformations as well as improvements to the current profiling system.
Ideas for profile-guided transformations:
Improvements to the existing support:
LLVM aggressively optimizes for performance, but does not yet optimize for code size. With a new ARM backend, there is increasing interest in using LLVM for embedded systems where code size is more of an issue.
Someone interested in working on implementing code compaction in LLVM might want to read this article, describing using link-time optimizations for code size optimization.
LLVM Compiler Infrastructure
Last modified: $Date: 2009/12/16 09:03:23 $