Refactoring

Introduction

The traditional waterfall conception of software development is strongly limited in its ability to cope with systems whose requirements change; in a sense, waterfall is only suited to immutable systems. Unfortunately, evolution seems to be inherent in the world, and software needs to be constantly adapted to changing requirements. So no matter how good the requirements analysis was at design time, during system development and maintenance the requirements will change and the design of the system will need to be adapted. For this reason, a lot of effort has been devoted to supporting software evolution and maintenance.

We believe that one of the most challenging issues in maintaining software is keeping the design of the application clean. No matter how hard one tries to deal with this issue, the so-called software entropy tends to reduce the quality of a design. The notion of design quality has been deeply discussed and no agreement on software quality metrics has yet been reached \cite{isoquality}; here we will refer to a simpler notion: given two software designs delivering the very same functionalities, a design is better if it is simpler and easier to understand. During a software system's lifetime it is very common that new features need to be added and, due to changes in the requirements, some of the existing features modified. Each modification tends to drift slightly from the original design: unexpected requirements and functionalities are dealt with locally, and the overall design is rarely re-evaluated. As a result, an initially careful design ends up as a messy, incomprehensible piece of code; this in turn produces, in the following iterations, even more hassle for the programmer who, instead of understanding the global rationale of the software, operates locally, "hacking" a solution for the current client request, typically under strict deadlines. The overall result of this process is a system that gets harder and harder to maintain, in which the cost of changes explodes exponentially, with a heavy dependence on the original developer (the only one able to modify and maintain the system) and a higher proneness to errors.

In this dreadful scenario, the need for techniques aimed at improving software design quality during the overall system lifetime is evident. The most general notion for this purpose is design restructuring, the activity of improving the design of an existing system; throughout this paper we will always refer to the notion of design refactoring, which is design restructuring applied to object-oriented (OO) programs. Refactoring is also the most commonly used term, and it is often improperly used when referring to non-OO programs as well. Refactoring is a disciplined way to restructure a system in order to rebuild a reasonable and comprehensible design from a working but messy body of code. Needless to say, refactoring is not a silver bullet, but a continuous activity that can effectively help developers cope with software aging.

The rest of this paper is organized as follows: the next section introduces the main refactoring principles and provides a general view of the problem; we then briefly discuss the relationship between refactoring and extreme programming, present the key refactoring techniques and, recalling the metaphor of "code smells", discuss the main symptoms of design deterioration. A further section is devoted to a more formal reference model, developed by M. Collins-Hope and H. Matthews, for carrying out refactoring activities within agile methodologies. The remaining sections are dedicated to refactoring tool support, to a brief discussion of the role of refactoring w.r.t. research activity, and to some concluding remarks.

Refactoring Principles

Refactoring techniques are commonly misunderstood and confused with bug fixing, performance improvement or evolutionary maintenance; the differences among these activities are actually wide and need to be discussed in depth.

The act of fixing a bug or adding a new feature implies a modification of the semantics of the code; refactoring must not modify the original system semantics \cite{opdyke}, but is about rewriting the code in a way that allows better understanding and easier maintainability (and, as a result, easier evolution).

Again, as Martin Fowler says in his book \cite{refactoring}, "refactoring is not a way to improve system performance". This, in accordance with the notion of quality introduced above, clearly states that the goal of refactoring is to improve the comprehensibility and simplicity of a system design, not its performance. However, a better design is in turn a good starting point for a profiling and tuning phase aimed, this time, at improving performance.

With these premises we can introduce the so-called two-hat principle, which states the distinction between phases in which the semantics of the code is modified and phases in which we refactor the code without changing its external behavior. The importance of keeping the two phases separate resides in the fact that it is quite hard to write, in a single pass, code that is both correct and easily understandable (well designed). The problem is particularly present when an XP-like software development process is used, due to the absence of a-priori design (see the section on Refactoring and Extreme Programming).

Refactoring has to be considered a phase of the development process and should benefit from the same "best practices" used in code development; an example is the concept of proceeding by goals, which means making simple, local and, as much as possible, self-contained changes to the code. The reason behind this is to avoid large and time-consuming changes that can introduce bugs or make the code unclear. After each modification, a complete run of the test cases is needed to make sure that no bug has been introduced and that the external behavior of the system is unchanged; this principle is called continuous testing and is shared by extreme programming methodologies and practices \cite{testxp}\cite{testxpexample}.

Another principle borrowed from extreme programmers is pair programming which, in our setting, will be called pair refactoring. The benefits of refactoring with a partner are almost the same as those of programming in pairs \cite{adam01experimental}; while one developer performs refactorings, the other checks that the modifications to the code neither change the original code semantics nor introduce new bugs.

A trickier principle is continuous attention, which means that after a refactoring, if the code is better (i.e., easier to understand, with less repetition, a better design, etc.) then it can be released (or committed); otherwise, a rollback is needed. This is related to the notion of safety: if a refactoring does not effectively improve the quality of the code, it is safer to return to the previous situation, which has been tested more deeply, to avoid unforeseen behaviors introduced by the refactoring.

Role of Testing

As said above, testing is one of the foundations of refactoring. Before starting to refactor a piece of software, solid test suites are needed and, due to principles such as continuous testing, it is strongly suggested that the test suites be fully automatic and self-checking.

There is, however, an additional reason to build or enrich test suites: refactoring is a risky process that can break working software by introducing new bugs, so one has to be sure that, after a refactoring, the code works as before. The process can be seen as an alternation of phases: small change, test, small change, test, and so on. Tests must be run frequently; the common rule of thumb is that all test suites must be run at least once a day.

One of the most interesting relationships between testing and refactoring is that test suites can benefit from refactoring. The rationale is that after refactoring the code is easier to understand, so identifying potential unforeseen bugs becomes more feasible; in such situations the developer is asked to immediately write and store a piece of code that exposes the bug, which then becomes part of the original test suite, enriching it.

Finding bugs is obviously a time-consuming process, so it is strongly suggested to use a semi-automatic support tool such as JUnit \cite{junit}, which is bundled with almost all Java IDEs such as Eclipse \cite{eclipsebook} and NetBeans.
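As a minimal, hypothetical sketch of such a self-testing suite (assuming JUnit 4 and an invented Invoice class; it is not taken from the cited works), a regression test written once a bug has been exposed might look as follows:

<pre>
import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class InvoiceTest {

    // Minimal hypothetical domain class, included only so the example compiles.
    static class Invoice {
        private final double amount;
        private double discount;   // fraction, e.g. 0.10 for 10%

        Invoice(double amount) { this.amount = amount; }

        void applyDiscount(double discount) { this.discount = discount; }

        double total() { return amount * (1.0 - discount); }
    }

    // Regression test added to the suite once a bug has been exposed:
    // it documents the expected behavior and protects later refactorings.
    @Test
    public void totalAppliesDiscountExactlyOnce() {
        Invoice invoice = new Invoice(100.0);
        invoice.applyDiscount(0.10);
        assertEquals(90.0, invoice.total(), 0.001);
    }
}
</pre>

Run together with the rest of the suite after every small change, such a test documents the expected behavior and protects subsequent refactoring steps.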

With these premises we can state that: a refactoring is safe if, applied to a working program, it does not break it.

Refactoring can be done, according to Martin Fowler, in different ways with increasing levels of trust:
\item Trust your coding abilities.
\item Trust that your compiler will catch errors that you miss.
\item Trust that your test suite will catch errors that you and your compiler miss.
\item Trust that code reviewers will catch errors that you, your compiler and your test suite miss.

It is often suggested to refactor using an appropriate refactoring tool (see the Tool Support section), to reduce the risk of manual errors and to increase the overall safety of refactoring.

When to Refactor

It is interesting to analyze when refactoring is needed; this is important mostly for project managers, who have to plan and evaluate the cost of refactoring phases during the development process. One typical situation that calls for refactoring is the addition of a new functionality: if the process of adding the feature is not absolutely seamless, a rule of thumb encourages a refactoring phase before the development of the new feature. This situation is typical in agile methodologies, where the design phase is carried out continuously during development, but it is quite common in all software projects where evolutionary maintenance is required.

Another situation in which refactoring is quite useful is before code reviews. It is common that a piece of code that is clear to its developer appears obscure to the rest of the team. Refactoring can improve the effectiveness of code reviews precisely because the code becomes easier to understand for many people and not only for its developer, so bugs can be found faster and further refactorings can be suggested by team members.

Some people consider the act of fixing a bug a warning sign that refactoring is needed: the presence of a bug in the code is sometimes related to tricky control flow or to a long series of changes made to a piece of code without a proper evolution of its structure. We consider this last rule of thumb (refactor once for every bug) too time- and money-consuming. We think that a better compromise could be the definition of a threshold <math>\textstyle \upsilon</math> which represents a certain amount of solved bugs in a module or in a subsystem; when the number of fixed bugs exceeds <math>\textstyle \upsilon</math> we should plan a refactoring process. Clearly <math>\textstyle \upsilon</math> should not be a magic fixed value but must be related to some code metrics (e.g., code complexity, number of developers, number of revisions, etc.); the proper setting of this value is extremely project specific.
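A minimal sketch of how such a threshold could be computed is given below; the class name, the metrics and the weighting formula are purely illustrative assumptions, since, as noted above, the proper calibration is project specific.

<pre>
// Hypothetical helper deciding when to schedule a refactoring of a module.
// The weighting of the metrics is purely illustrative and must be tuned per project.
public class RefactoringTrigger {

    // Threshold derived from simple code metrics (illustrative formula).
    static double threshold(double codeComplexity, int developers, int revisions) {
        return 5.0 + 0.5 * codeComplexity + 1.0 * developers + 0.1 * revisions;
    }

    // A refactoring phase is planned once the fixed-bug count exceeds the threshold.
    static boolean shouldRefactor(int fixedBugs, double codeComplexity,
                                  int developers, int revisions) {
        return fixedBugs > threshold(codeComplexity, developers, revisions);
    }

    public static void main(String[] args) {
        // Example: 12 bugs fixed in a module with complexity 8, 3 developers, 40 revisions.
        // threshold = 5 + 4 + 3 + 4 = 16, so no refactoring is planned yet.
        System.out.println(shouldRefactor(12, 8.0, 3, 40));  // prints "false"
    }
}
</pre>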

Refactoring and Extreme Programming

Before discussing some of the most important refactoring techniques in more detail, it is important to point out the relationship between refactoring and agile methods, in particular the more radical ones such as Extreme Programming. Agile methods were born as an answer to the limitations of the waterfall approach w.r.t. changing requirements and, in general, rapidly evolving software. The focus here is on "embracing the change", seen simply as an immutable state of the world: evolution is perceived as inherent in software and, as a result, upfront design is traded for more lightweight design activities. Without going into too much detail about these methodologies, it is relevant to notice that each of them considers of paramount importance the chance of evolving the system design during development; e.g., Extreme Programming basically considers design a capillary activity to be carried out continuously during the implementation of the system, which means being able to modify previous design decisions as the development goes on. In this scenario refactoring plays an important role: a reduced amount of upfront design must be balanced by the possibility of systematically restructuring the system. Refactoring here has to be considered an enabling technique, introducing a more formalized way of re-designing a system during development, also to guarantee the overall software quality, a delicate issue for agile methodologies \cite{softwarequality}.

Key Refactoring

This section shows some of the most common refactorings, which are the building blocks for more complex refactorings. The first definition of key refactorings is due to W. Wake \cite{workbook}, while a complete list of refactorings is available online at the Refactoring website \cite{refactoringweb} and a rich survey of refactoring approaches is presented in \cite{survey2004}. It is important to notice that automatic support for refactoring is only partially feasible. In fact, some of the modifications that have to be applied to the source code are not merely syntactic, and the developer's intervention is required to preserve the semantics of the original code. Here follows a list of key refactorings with a brief explanation:

\item \textbf{Change Bidirectional Association with Unidirectional:} This refactoring is applied when there is a two-way association between classes but one of them no longer needs features from the other. The proposed action in this case is simply to drop the unneeded end of the association; the problem is obviously to identify all methods or portions of code that use the removed reference. This refactoring can only be done in a semi-automatic way, because the machine cannot determine how the calls to the removed reference should be modified to preserve the semantics of the original code.

\item \textbf{Remove Middle Man:} This refactoring is aimed at removing unneeded or tricky delegations between classes; e.g., if a client class named <math>\textstyle A</math> needs features from a server class named <math>\textstyle B</math>, <math>\textstyle A</math> should access <math>\textstyle B</math>'s public methods directly without delegating this job to a middle class <math>\textstyle C</math>. There are some known exceptions in which this refactoring is not suitable, for example when the delegation is introduced for specific reasons (e.g., security, shared resources with concurrent access, etc.), as in the Proxy pattern \cite{patterns}. This refactoring has an anti-pattern \cite{antipatterns} flavor because it might lead to de-structuring the code. The difference resides in the fact that the purpose of this refactoring is to remove a pattern structure that is not needed, and it is the developer's responsibility to decide whether the pattern is necessary or can be removed safely.

\item \textbf{Extract Class:} Sometimes it happens that a class is doing the work that should be done by two different classes. Extract class addresses the problem by creating a new class and moving the relevant fields and methods from the old class into the new class. The new class must contain all methods and fields that are semantically bound to each other.

\item \textbf{Extract Method:} Similar to the previous one, this refactoring is aimed at extracting from a method a portion of code that can semantically exist as a separate method; these code fragments can sometimes be scattered in different places of the class, but they can be grouped together to do something useful (see the sketch after this list). Refactoring tools are, in general, able to automatically create a new method signature from a selected portion of code and to replace it with a call to the new method.

\item \textbf{Extract Interface:} Similar to the previous one, but applied to interfaces: sometimes it is useful to extract from a class an interface containing only a portion of the class's methods, in order to support further implementations of the same interface. An example of tool support for this refactoring is mentioned in the Refactoring Browser section.

\item \textbf{Move Field:} This simple refactoring can be applied when a field is used by another class more often than the class in which it is defined. This behavior may indicate that the field is semantically related to the other class more than the one hosting it (locality principle). The effect of this refactoring is the creation of a new field in the accessing class; the original references to the field can be automatically modified using the parse tree to comply with the new position.

\item \textbf{Move Method:} Like fields, methods can also be moved to achieve locality. This time, however, moving a method can be done either by removing the original method from the source class or by keeping in the source class a delegation to the target class that now owns the moved method. The choice depends on whether the original calls need to be preserved (e.g., for compatibility reasons).
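As an illustration, the following is a minimal before/after sketch of the Extract Method refactoring referenced above; the class and method names are hypothetical and loosely modelled on the classic examples in \cite{refactoring}.

<pre>
// Before: printing logic and amount calculation are tangled in one method.
class OrderReportBefore {
    void printOwing(double[] itemPrices, String customer) {
        System.out.println("*** Customer Owes ***");

        double outstanding = 0.0;
        for (double price : itemPrices) {
            outstanding += price;
        }

        System.out.println("customer: " + customer);
        System.out.println("amount: " + outstanding);
    }
}

// After Extract Method: each fragment lives in its own, well-named method.
class OrderReportAfter {
    void printOwing(double[] itemPrices, String customer) {
        printBanner();
        double outstanding = calculateOutstanding(itemPrices);
        printDetails(customer, outstanding);
    }

    private void printBanner() {
        System.out.println("*** Customer Owes ***");
    }

    private double calculateOutstanding(double[] itemPrices) {
        double outstanding = 0.0;
        for (double price : itemPrices) {
            outstanding += price;
        }
        return outstanding;
    }

    private void printDetails(String customer, double outstanding) {
        System.out.println("customer: " + customer);
        System.out.println("amount: " + outstanding);
    }
}
</pre>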

It is important to notice that for each refactoring there exists, in general, an inverse refactoring; e.g., the Collapse Hierarchy and Extract Hierarchy refactorings are one the inverse of the other. Refactorings are typically intended at the code level, but there are situations in which one can raise the level of abstraction and apply refactorings to other artifacts as well, such as architecture diagrams, behavioral models, database schemata or even the requirements analysis. Refactoring of non-code artifacts raises the expressive power of the modifications that can be made. On the other hand, applying refactorings to different types of artifacts increases issues such as the synchronization of modifications; e.g., a modification introduced in a sequence diagram must be kept in sync with the actual implementation.

Smells

When approaching the problem of refactoring a piece of software, one of the first issues to be addressed is where to refactor, i.e., which portions of the code are not well designed and need some dedicated activity. Fowler speaks in \cite{refactoring} about "smells", referring to symptoms in the code that can be used to point out potential problems in the design. In general, there exists no sound and complete approach to identifying smells; their identification can highly depend on the particular application domain. The most common classification of smells is presented in \cite{refactoringweb}, without any formal framework.

On the contrary, Tourwé and Mens face this issue formally by using a semi-automatic approach based on logic meta programming \cite{tourwemetaprogramming}. Their work makes it possible to formally define what a smell is and then to detect it automatically, also proposing a set of semantics-preserving refactorings.

A promising approach is that of Simon et al., who use object-oriented metrics \cite{simonetal} to identify smells and to propose a suitable refactoring; the proposed refactorings are also chosen by applying appropriate metrics. This approach is more effective if used in conjunction with a graphical support showing the portions of code involved in the refactorings.

In recent years a formal approach has been proposed by Kataoka et al. and implemented in the Daikon tool \cite{daikon} to automatically verify whether a refactoring can be applied while preserving all the program invariants\footnote{An invariant is a condition that is always true for all the executions of a program}. The main problem with this approach is that it requires dynamic analysis of the runtime behaviour: the application needs to be executed to infer the program invariants. To this extent, the tool uses a representative set of test suites. However, it is a known issue that it is impossible to build a test suite covering all possible runs of a program; therefore, the inferred invariants may not hold in general.
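To make the notion of invariant more concrete, the following is a small hypothetical example (not taken from the Daikon work): a tool performing dynamic analysis over many test runs could infer a property such as "the balance is never negative" and then check that a refactoring does not violate it.

<pre>
// Hypothetical account class with a simple invariant: the balance is never negative.
public class Account {
    private long balanceInCents;

    public void deposit(long cents) {
        if (cents < 0) throw new IllegalArgumentException("negative deposit");
        balanceInCents += cents;
        assert balanceInCents >= 0;   // the invariant, stated explicitly
    }

    public void withdraw(long cents) {
        if (cents < 0 || cents > balanceInCents) {
            throw new IllegalArgumentException("invalid withdrawal");
        }
        balanceInCents -= cents;
        assert balanceInCents >= 0;   // must still hold after any refactoring
    }

    public long balance() {
        return balanceInCents;
    }
}
</pre>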

The following subsection is dedicated to a more in-depth discussion of code smells, as presented by Fowler in his book \cite{refactoring}.

Code Smells

In this section we classify typical structures at the code level that are considered symptoms of bad design to be addressed by refactoring techniques. In our discussion we avoid situations that are commonly known to be sources of errors, such as dead code, duplication or misused language features, and focus on less trivial cases. In the literature it is common to group smells into two families: smells within classes and smells between classes; here we introduce another family, called non-standard smells, for situations where a smell mixes different levels of abstraction and thus cannot be classified in the previous families.

Smells Within Classes

The first family contains smells which are localized inside a single class. It is possible to identify some common patterns of smells that share the same symptoms of bad design.

\item \textbf{Long pieces of code:} It is known that long pieces of code are a source of errors. When a method or a class appears to contain too many lines of code, there is probably the need to introduce some procedural abstractions to structure the code in a better way. A special case is a long parameter list, which can be inspected looking for groups of parameters that can be naturally grouped together as fields of a new class representing the semantic binding between them. This family of smells also contains all the situations where tricky conditional logic (e.g., big switch structures, long boolean formulae) is in place; these situations can often be eliminated with a proper use of polymorphism (see the sketch after this list).
\item \textbf{Uncommunicative names:} This subset of smells includes all the "worst practices" in assigning names to code elements, e.g., Hungarian notation or names that do not reflect the meaning of the method or field. As an example, when designing and implementing a stack one should not call the methods insert and remove because, given the actual behavior of the methods, the semantics is much more explicit if they are named push and pop.
\item \textbf{Unnecessary complexity:} This family of smells is quite controversial; in fact, together with very reasonable simplifications of over-complex code structures, it supports one of the most extreme principles of the agile methodologies, the "You Are not Going to Need It" (YAGNI) principle (e.g., remove useless design patterns or unused abstract classes). In the context of Extreme Programming, properly used but currently not indispensable design patterns need to be removed to lower complexity. This aggressive approach may tend to oversimplify the code and remove known and effective patterns. In particular, if it is rather clear that the system evolution will require a given set of functionalities in the near future, it seems unreasonable not to anticipate these changes with a good design of the application. This class of refactorings must be handled carefully by the designer, and a proper trade-off must be found case by case.
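As a small illustration of the point above about conditional logic, the following hypothetical sketch shows a type-code switch being replaced by polymorphism; the shape classes are invented for the example.

<pre>
// Smell: a switch on a type code that must be updated for every new shape.
class AreaCalculatorBefore {
    static final int CIRCLE = 0, SQUARE = 1;

    double area(int shapeType, double size) {
        switch (shapeType) {
            case CIRCLE: return Math.PI * size * size;
            case SQUARE: return size * size;
            default: throw new IllegalArgumentException("unknown shape");
        }
    }
}

// After refactoring: each shape knows its own area, no central switch is needed.
interface Shape {
    double area();
}

class Circle implements Shape {
    private final double radius;
    Circle(double radius) { this.radius = radius; }
    public double area() { return Math.PI * radius * radius; }
}

class Square implements Shape {
    private final double side;
    Square(double side) { this.side = side; }
    public double area() { return side * side; }
}
</pre>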

Smells among Classes

Let us now introduce a family of smells that can be detected among classes. These refactorings are more difficult to implement for the obvious reason that a single developer might not have ownership of all the involved classes, so there might be the need to work as a team to apply them.

\item \textbf{Misused Hierarchy:} This set of smells includes all the situations where one feels the need to re-organize classes into a hierarchy, or to re-organize the hierarchy itself, in order to eliminate ad hoc solutions introduced to accommodate changes in classes. These smells range from tightly coupled classes that could be re-organized using delegation, to over-intimate classes, to the refused bequest smell, which refers to the case in which a subclass no longer needs some of the methods of its superclass. An example of the latter smell is the classic Penguins-don't-fly problem (see the sketch below).
\item \textbf{Sparse Responsibility:} In this class we group all the cases where a functionality is implemented by an object that was not supposed to do so, or where a functionality is spread over several classes but would be better located in one or a few classes. The aim of the refactorings addressing these smells is to relocate methods and fields in such a way that the responsibility for a functionality can be easily identified in a reduced number of classes. These refactorings tend to improve the readability and maintainability of the software.

FIGURE DELETED
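The refused bequest smell mentioned above can be sketched with hypothetical classes as follows: the subclass refuses part of the bequest of its superclass, suggesting that the hierarchy should be reorganized, for instance by moving the flying behavior into a separate abstraction.

<pre>
// Smell: Penguin inherits fly() but cannot honor it (refused bequest).
class Bird {
    void fly() { System.out.println("flying"); }
}

class Penguin extends Bird {
    @Override
    void fly() { throw new UnsupportedOperationException("penguins don't fly"); }
}

// One possible restructuring: only flying birds expose fly().
interface FlyingBird {
    void fly();
}

class Sparrow implements FlyingBird {
    public void fly() { System.out.println("flying"); }
}

class EmperorPenguin {
    void swim() { System.out.println("swimming"); }
}
</pre>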

Non-Standard Smells

In this section we introduce some smells that need to be addressed carefully, due to the impact that the corresponding refactorings have on the software structure or even on external applications interacting with the refactored classes.

\item \textbf{Absent MVC:} This smell is almost self-explanatory and refers to software that is supposed to have a three-tier structure (e.g., an application with a graphical interface) where the presentation layer (view) is supposed to be independent from the layer implementing the business logic (controller) and from the layer implementing the data abstractions (model). It is quite common that the model and the controller are too tightly coupled, making the maintenance and evolution of the software difficult. Refactoring this means introducing a proper MVC design pattern which, as one can easily see, has a quite heavy impact on the code structure.
\item \textbf{Changing Interfaces and Exceptions:} These smells refer to all those cases where it appears evident that, often for historical reasons, a set of classes which are supposed to share an interface actually implement different interfaces, or cases where an interface simply needs to be changed to reflect the evolution of the rest of the system. The solution for the first example can be the unification of the interfaces, but if the interfaces are public (or, even worse, published) this might affect uncontrollable third-party applications, and the refactoring must be done carefully, ensuring backward compatibility as much as possible. Exception smells are special cases of the previous ones. Clearly this family of refactorings is really tricky, since they cannot leverage encapsulation but need to operate on the external behavior of classes or subsystems.

Design Smells

As said before, not only code can be addressed with refactorings, but also other artifacts. Recent research trends have addressed design-level artifacts with refactorings (actually, restructurings) for UML models \cite{astels02}\cite{pollet01}, sometimes integrating a refactoring browser (see the Refactoring Browser section) with a UML modeling tool. Refactorings can be applied to class diagrams, statechart diagrams and activity diagrams; for each of them, the user can apply refactorings that cannot easily be expressed in other diagrams or in the source code. These approaches are desirable as a way to refactor design artifacts independently of the underlying programming language. To deal with the refactoring of software architectures, Philipps and Rumpe proposed a promising approach where refactoring rules operate directly on the graphical representation of a system architecture \cite{philipps97}; these rules preserve the behaviour specified by the causal relationships between the components. An example of this kind of methodology is discussed in the Supporting Design Refactoring section.

Requirements Smells

Refactoring or, more appropriately, restructuring can also be applied to requirements. This approach was first proposed by Russo et al. in \cite{russoetal}. They suggested restructuring requirements expressed in natural language by decomposing them into a structure of viewpoints; a viewpoint encapsulates the partial requirements of a set of system components, whose interactions are made explicit. This restructuring approach seems to increase requirements understanding and makes it easier to identify the portions of code to be refactored when requirements change.

Supporting Design Refactoring

In this section we present the work of M. Collins-Hope and H. Matthews, which defines an architectural reference model for refactoring software developed with an agile development process \cite{referencemodel}. We believe that this model is also relevant for refactoring software developed following other methodologies. In fact, this model is used to:
\item provide a framework for decision making during the design of components;
\item support and re-enforce the appropriate application of good OO design principles, in particular those concerning stability and dependency management;
\item provide an architectural framework to encourage re-use;
\item encourage re-use of business specific components (e.g., processes);
\item place components in the reference model as the unifying means to tie together different architectural views of a system;
\item improve the understanding of layering in a component context.

Reference Model

The overall architectural model is shown in the figure below and is divided into layers, each of which has a defined semantics; the layers are ordered according to how specific they are:

FIGURE DELETED

\item \textbf{Application Interface:} Being the most specific, this layer occupies the highest position in the layering. It is responsible for managing the interaction between the "outside world" and the lower layers within the application. It typically consists of components providing GUI functionalities for human interaction, or managing public interfaces if the system needs to interact with other automatic systems. This layer contains what Jacobson et al. call boundary classes \cite{jacobson}.
\item \textbf{Application Specific:} This layer is comprised of objects and components that encapsulate the great majority of the business processes and the associated business rules automated by the application. Typically it will contain many objects similar to Jacobson's control objects, and it often also acts as the "knowledge" layer in Fowler's operational/knowledge split. It may also contain specialised subclasses implementing interfaces left "open" (as in the Open-Closed Principle \cite{martin96}) by the more general purpose components in the layer below, and it typically does not contain persistent business classes. Most importantly, this layer contains the bindings that tie together components within the next layer.
\item \textbf{Business Domain Specific:} This layer is comprised of components which encapsulate the interface to one or more business classes; these are specific to the domain (area of business) of the application and are generally used from multiple places within the application. They might also be used by a family of related applications (e.g., a software product line). This layer typically contains the entity classes discussed by Jacobson.
\item \textbf{Technical Infrastructure:} This layer groups the components that are potentially re-usable across many domains, providing general purpose technical services such as persistence infrastructure and general programming infrastructure (e.g., lists, collections, etc.).
\item \textbf{Platform Software:} This is the last (and most re-usable) layer. It is comprised of standard or very common pieces of software that are brought in to underpin the application (e.g., operating systems, distribution infrastructure like CORBA, COM, etc.).

Layer Semantics

As said before, there is a clear semantics associated with the axes of the model.

The vertical axis indicates the specificity of a component in the application (how specific it is to a particular application/environment). The higher it appears in the layering of the reference model, the more specific it is; the lower it appears, the more general purpose it is. With this semantics it is possible to associate a "centre of gravity" with the application:
\item high: very application specific, difficult to extend without substantial modification to existing components;
\item low: good layering applied, likely to have hooks for extension without any modification to several existing components.

The layer ordering (high to low) is based on the compile-time dependencies between the components that reside within the layers. In the terms presented in this paper, the device driver interface of an operating system is an extension point that enables customisation of the operating system component to a particular piece of hardware: the operating system is more generic (general purpose) than the device drivers it uses (which are tied to particular hardware), and the device drivers depend on the operating system for their definition. Summarising, the layering semantics presented here ties together the concept of the specificity of a component with the notion of compile-time dependencies: the higher a component is in the model, the more specific it is likely to be and the more dependent it is likely to be on other components, and vice versa.
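A minimal, hypothetical sketch of this dependency direction is given below (the layer split is indicated only by comments, and all names are invented): more specific, higher-layer code depends on more general, lower-layer code, never the other way around.

<pre>
// Technical Infrastructure layer: general purpose, reusable across domains.
interface Repository<T> {
    void save(T entity);
}

// Business Domain Specific layer: depends only on the infrastructure below it.
class Customer {
    final String name;
    Customer(String name) { this.name = name; }
}

class CustomerRepository implements Repository<Customer> {
    public void save(Customer customer) {
        System.out.println("persisting " + customer.name);
    }
}

// Application Specific layer: binds domain components together; depends downwards only.
class RegistrationService {
    private final CustomerRepository customers = new CustomerRepository();

    void register(String name) {
        customers.save(new Customer(name));
    }
}

// Application Interface layer: the most specific; talks to the outside world.
class RegistrationCli {
    public static void main(String[] args) {
        new RegistrationService().register(args.length > 0 ? args[0] : "anonymous");
    }
}
</pre>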

Rules

There are some simple rules associated with this model:
\item there should be a clear and simple mapping between the component structure and the source code structure (the simplest being a 1-1 mapping), and between the component structure and any other analysis and design artefacts produced during the development process (e.g., the design view of a component);
\item the level of a component is the highest level of any of its constituent classes;
\item components should not (and, by the above definition, cannot) cross layers;
\item the compile-time dependencies of components within any particular layer should be towards components in either the same or a lower layer;
\item the application and domain layers should be technology independent w.r.t. the interface components towards the outside world.
An example of UML refactoring using this model is shown in the figure below.

FIGURE DELETED

Model Problems

The model presented is a theoretical support, but in real-world applications some shortcomings of this model have been pointed out. The most evident is that, sometimes, it is not feasible to respect rules like "a component can only depend on the lower layers". Another common problem is that there are clearly situations in which the proposed granularity is not enough, and there are sub-layerings within the presented layers that would be worth capturing in the model; however, the price would have been too high in terms of additional complexity. The last known issue is that lower layers are not always easy to extend (e.g., CORBA); if proprietary technologies are involved, this may not even be possible.

Tool Support

Although it is possible to refactor manually, tool support is considered a crucial issue for refactoring. Today, a wide range of tools is available to automate various aspects of refactoring. In this section we explore the characteristics known to affect the usability of a tool: the level of automation, reliability, configurability and scalability. Existing tools provide only partial support w.r.t. these desiderata, but they already turn out to be really effective in practice.

Automation

The degree of automation of a refactoring tool depends on which refactoring activities are supported and how many of these activities are automated. A semi-automatic approach can drastically increase productivity \cite{tokuda01} in terms of coding and debugging time. Another main advantage of refactoring tools, from the viewpoint of the developer, is that their behaviour-preserving nature significantly reduces the need for debugging and testing, two activities that are known to be very time consuming and labour intensive. As an alternative to this semi-automatic approach, some researchers have proved the feasibility of fully automated refactoring \cite{guru}, but this opportunity is restricted to certain strongly-typed languages and to a reduced number of refactorings. In many cases, automating refactoring activities gives rise to new activities or opportunities that were not possible without automation: for example, one benefit of automatic refactorings is that the process is reversible and any change can easily be undone, so that the software can be restored to its original state if it turns out that the refactorings did not succeed in their design-improving goal. Compared to partial automation, fully automated refactoring and restructuring tools exhibit the disadvantage of doing too much work, in the sense that some portions of the refactored software may become more difficult to understand than before. The main reason for this is that a significant part of the knowledge required to perform the refactoring cannot be extracted from the software itself but remains implicit in the developer's mind, and that a "weird" human design is often still easier to grasp than a "weird" automatic design.

Reliability

The reliability of a refactoring tool can be defined as the ability to guarantee that the provided refactoring transformations preserve the semantics of the original code. As said before it is possible to guarantee this behavior only in very specific cases. Because of these restrictions, most tools check the refactoring preconditions before applying it, and perform tests afterwards. In absence of a full guarantee of behaviour preservation, it is essential that a refactoring tool provides effective undo mechanisms to rollback the changes.

Configurability

There is a variety of ways in which a user (or a group of users) should be able to configure a refactoring tool for a particular usage, for example by adding, removing or modifying existing refactorings and smell specifications, or by defining composite refactorings by combining primitive ones. Having a configurable tool is a must to allow proper personalization, a key asset during development.

Scalability

To increase the scalability and performance of a refactoring tool, frequently used sequences of primitive refactorings should be combined into composite refactorings. The use of composite refactorings has several advantages. First, they better capture the specific intent of the software change induced by the refactoring; as such, it becomes easier to understand how the software has been refactored. Second, composite refactorings result in a performance gain because the tool needs to check the preconditions only once for the composite refactoring, rather than separately for each primitive refactoring in the sequence \cite{roberts99}\cite{mens99}. A third advantage of composite refactorings is that we can weaken the behaviour-preservation requirements of the primitive constituents, as long as the overall result is consistent: the primitive refactorings in a sequence do not have to be behaviour preserving as long as the net effect of their composition is, an approach reminiscent of how database transactions may temporarily violate integrity constraints as long as the final state is consistent. A final aspect of scalability has to do with change propagation. Because changes tend to propagate throughout the software, the application of a certain refactoring may suggest or even require other refactorings to be applied as well, in order to achieve the goal intended by the original refactoring; a tool, to be scalable, needs to efficiently manage this kind of "domino effect".
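A minimal sketch of how a tool might represent composite refactorings is given below; the interfaces are hypothetical, and real refactoring engines are considerably more sophisticated. The composite validates its precondition up front and then applies its primitive steps in sequence.

<pre>
import java.util.List;

// Hypothetical model of a refactoring as a precondition plus a transformation.
interface Refactoring {
    boolean checkPreconditions();
    void apply();
}

// A composite refactoring checks its precondition once and then applies the
// primitive steps; the primitives need not preserve behaviour individually
// as long as the net effect of the sequence does.
class CompositeRefactoring implements Refactoring {
    private final List<Refactoring> steps;

    CompositeRefactoring(List<Refactoring> steps) {
        this.steps = steps;
    }

    @Override
    public boolean checkPreconditions() {
        // In a real tool the composite precondition could be cheaper than the
        // conjunction of all primitive preconditions; here we simply conjoin them.
        return steps.stream().allMatch(Refactoring::checkPreconditions);
    }

    @Override
    public void apply() {
        if (!checkPreconditions()) {
            throw new IllegalStateException("composite precondition not satisfied");
        }
        steps.forEach(Refactoring::apply);
    }
}
</pre>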

FIGURE DELETED

Refactoring Browser

Nowadays almost every Integrated Development Environment (IDE), such as Eclipse \cite{eclipse} and NetBeans, includes a refactoring browser \cite{roberts97} to support semi-automatic refactorings, while identifying which portion of the software needs to be refactored and selecting the most appropriate refactoring to apply remain a developer's task. The first implementation of a refactoring browser appeared in 1998, supporting only the Smalltalk language. A refactoring browser is a tool that automatically checks whether the pre- and post-conditions of a syntactic refactoring are satisfied (i.e., the refactoring is safe), speeding up the process of applying refactorings. From a theoretical point of view, after an automatic refactoring we do not need to run our test suite, because safety is automatically checked by the browser and the semantics should not have changed. However, since it is not possible to always guarantee the preservation of the semantics of the original code, we cannot achieve fully automatic behavior for every refactoring, and tests cannot, in general, be eliminated. What a refactoring browser needs in order to work is access to the parse tree used by the IDE and a (program) database allowing a fast search of the various entities across the entire code (e.g., to find the calls to a method or the uses of a variable). An example is the Eclipse dialog supporting the Extract Interface refactoring (figure omitted).

Teaching and Research: the role of Refactoring

While studying refactoring, we realized that it is a common, although often unconscious, practice at universities. When portions of a system are developed by small independent teams, typically students working on course projects or theses, the need for refactoring and integrating them is evident. Involving students in research activities heavily increases the design and coding "power" of a research group and, from a didactical point of view, gives the students the chance to be involved early in real projects. Participating in real projects, they experience real design and development challenges and, if properly guided, they learn design best practices, appreciating at the same time the impact of their theoretical background on engineering practice. From our experience, the students are more motivated, learn more and are more inclined towards research activities. However, one can certainly argue about the quality of the produced solutions: the overall quality of design and code can decrease when inexperienced students independently design core components. Since supervising activities are strictly time-bounded, it is unreasonable to expect a tight enough check of the design by members of the research group. For this reason we advocate a continuous refactoring of the developed solutions, to be carried out at each stable point by experienced members of the team. The overall quality of the solutions will remain very high, and the refactoring overhead will definitely be less than the speed-up obtained by heavily involving students in the design and development process. As a result the overall development quality will remain high, fully exploiting the work done by the students together with the experience introduced, in terms of refactoring, by the teachers, while the students will have the chance to challenge themselves against difficult problems, considering new solutions and learning the critical approach typical of research itself.

Conclusions

Due to its volatile nature, software is always required to be flexible and ready to adapt to changing needs and fast-evolving requirements, and this calls for techniques to cope with change. Uncontrolled software evolution tends to degrade code quality, and well-designed systems tend to evolve into poorly structured, messy, unmaintainable software. Refactoring is a disciplined way to maintain software design and may effectively be applied to improve the design of existing code. Clearly refactoring does not solve these issues at no cost: it is an expensive, although often needed, phase of software maintenance. While refactoring enthusiasts claim it to be some sort of panacea for software evolution and design degradation, the problem must be seen from a broader viewpoint, considering refactoring just a useful technique. Most of the literature in this field seems to be quite informal, and appears to be more a set of "rules of thumb" plus general OO best practices than a general theoretical framework. Nonetheless, its practical impact is undeniable, and several successful applications of refactoring are reported in the literature. We consider refactoring very useful if approached as a constant activity to be carried out during development in order to maintain a good software design.

References
