The Structure of the Proton
In current collider experiments, and in particular in upcoming ones such as the Electron Ion Collider at Brookhaven National Laboratory in New York, the structure of the constituents of nuclei, i.e., protons and neutrons, is (and will be) extensively studied. While we know that protons and neutrons are made of quarks and gluons, we know little about how these building blocks are arranged. And while protons and neutrons make up the bulk of everything we see in the universe, their constituent quarks account for only a small fraction of their mass. Although massless themselves, gluons are in fact responsible for more than 90 percent of the mass of visible matter in the universe. These gluons generate the so-called strong force, one of the four
SuperGLEBer - The first comprehensive German-language benchmark for LLMs
Large Language Models (LLMs) are continuously being developed and improved, and there is no shortage of benchmarks that quantify how well they work; LLM benchmarking is a long-standing practice, especially in the NLP research community. However, the majority of these benchmarks are not designed for German-language LLMs. We assembled a broad Natural Language Understanding benchmark suite for the German language and evaluated a wide array of existing German-capable models.
This allows us to comprehensively chart the landscape of German LLMs.
A deep unsupervised Model for Protein Design
The design of new functional proteins can tackle many of the problems humankind is facing today, but so far it has proven very challenging. Analogies between protein sequences and human languages have long been noted, and a summary of their most prominent similarities is described. Given the tremendous success of Natural Language Processing (NLP) methods in recent years, their application to protein research opens a fresh perspective, shifting from the current energy-function-centered paradigm to an unsupervised learning approach based entirely on sequences. To explore this opportunity further, we have pre-trained a generative language model on the entire protein sequence space. We find that our language model, ProtGPT2, effectively speaks the
Strong-field Response of complex Systems
The interaction of light with matter covers a large number of physical phenomena that we literally see in our everyday life. Early scientists mostly focused on investigations of electromagnetic radiation in the visible range and at low intensities, where material polarization responds linearly to incident electromagnetic fields. Utilizing the compute clusters at PC2, this project aims at simulating and interpreting the strong-field dynamics of real molecules and larger systems in a rigorous real-space, real-time approach. This includes non-linear strong-field effects such as photoionization and high-order harmonic generation, in systems ranging from small (chiral) molecules through nano-systems to the condensed phase.