Observatorio de I+D+i UPM

Memorias de investigación
A Generic Approach to Bytecode Analysis
Áreas de investigación
  • Lenguaje de programación
Analysis of the Java language (either in its source version or its compiled bytecode) using the framework of abstract interpretation has been the subject of significant research in the last decade. Most of this research concentrates on finding new abstract domains that better approximate a particular concrete property of the program analyzed in order to optimize compilation or statically verify certain properties about the run-time behavior of the code. In contrast to this concentration and progress on the development of new, refined domains, there has been comparatively little work in the development of extensible, generic frameworks that sustain the analysis. The first component in such a generic framework is a standard representation of the program that facilitates later analyses or optimizations. Although many times the description of that Control Flow Graph is omitted, we show that a uniform, compact representation is fundamental in order to manipulate similar constructions of the language in a uniform way. The Horn clause-based representation chosen is general enough to represent not only object-oriented programs, but also logic programming applications. In the context of the abstract interpretation framework, the fixpoint algorithm that lies at its very core has been demonstrated to have dramatic impact in the efficiency and precision of any analysis. A particularly optimal combination of the two attributes can be achieved by a flow-sensitive, context-sensitive fixpoint algorithm. We present a detailed description of such an algorithm, which contemplates mutually recursive nodes in the Control Flow Graph and uses memoization for further efficiency. Generic abstract interpretation frameworks work in conjunction with abstract domains. Each domain captures a particular property of the program. A very interesting characteristic to analyze is whether a set of variables share, i.e., they might reach the same memory location. The information gathered by a sharing domain is used for parallelization and/or for optimizing the compilation. What we present is a combination of domains (sharing, nullity and types) which can work together to refine their results, i.e, be more precise. The approach is shown to achieve better results than previous sharing analyses. The combinatorial nature of the set sharing domain has been the subject of intense debate within the analysis community. The exponential complexity of some of the operations (e.g., abstract unification in Prolog or field load/store in Java) seemed to be a big obstacle that prevented the domain from being widely used. In this thesis, we present the first scalable implementation of set sharing, which is based in a special type of Binary Decision Diagrams, called Zero-supressed Binary Decision Diagrams (ZBDDs). By representing sets of sets with this data structure, we not only improve the memory requirements of the analysis but also achieve better efficiency in terms of the overall running time.
Tipo de Tesis
Sobresaliente cum laude
Esta actividad pertenece a memorias de investigación
  • Participante: Mario Méndez Lojo (Universidad de Nuevo Mexico)
  • Director: Manuel de Hermenegildo Salinas (UPM)
Grupos de investigación, Departamentos, Centros e Institutos de I+D+i relacionados
  • Creador: Grupo de Investigación: Computación lógica, Lenguajes, Implementación y Paralelismo (CLIP)
  • Departamento: Inteligencia Artificial
S2i 2023 Observatorio de investigación @ UPM con la colaboración del Consejo Social UPM
Cofinanciación del MINECO en el marco del Programa INNCIDE 2011 (OTR-2011-0236)
Cofinanciación del MINECO en el marco del Programa INNPACTO (IPT-020000-2010-22)