Learning Component Behavior from Tests : Theory and Algorithms for Automata with Data

Sammanfattning: Formal models are often used to describe the behavior of a computer program or component. Behavioral models have many different usages, e.g., in model-based techniques for software development and verification,such as model checking and model based testing.The application of model-based techniques is hampered by the current lack of adequate models for most actual systems, largely due to the significant manual effort typically needed to construct them. To remedy this, automata learning techniques (whereby models can be inferred by performing tests on a component) have been developed for finite automata that capture control flow. However, many usages requiremodels also to capture data flow, i.e., how behavior is affected by data values in method invocations and commands. Unfortunately, techniques are less developed for models that combinecontrol flow with data.In this thesis, we extend automata learning to infer automata models that captureboth control flow and data flow. We base our technique on a corresponding extension of classical automata theoryto capture data.We define a formalism for register automata, a model that extends finite automata by adding registers that can store data values and be used in guards and assignments on transitions. Our formalism is parameterized on a theory, i.e., a set of relations on a data domain. We present a Nerode congruence for the languages that our register automata can recognize, and provide a Myhill-Nerode theorem for constructing canonical register automata, thereby extending classical automata theory to register automata.We also present a learning algorithm for register automata: the new SL' algorithm, which extends the well-known L' algorithm for finite automata. The SL' algorithm is based on our new Nerode congruence, and uses a novel technique to infer symbolic data constraints on parameters. We evaluated our algorithm in a black-box scenario, inferring, e.g., the connection establishment phase of TCP and a priority queue, in addition to a number of smaller models. The SL' algorithm is implemented in a tool, which allows for use in more realistic settings, e.g., where models have both input and output, and for directly inferring Java classes.

  KLICKA HÄR FÖR ATT SE AVHANDLINGEN I FULLTEXT. (PDF-format)