User:Danakil/Programming language (reformatted)

From Wikipedia, the free encyclopedia
This is a candidate reformatting of Programming language. Feel free to modify it at will.

A programming language or computer language is a standardized communication technique for expressing instructions to a computer. It is a set of syntactic and semantic rules used to define computer programs. A language enables a programmer to precisely specify what data a computer will act upon, how these data will be stored/transmitted, and precisely what actions to take under various circumstances.

Introduction

A primary purpose of programming languages is to enable programmers to express their intent for a computation more easily than they could with a lower-level language or machine code. For this reason, programming languages are generally designed to use a higher-level syntax, which can be easily communicated and understood by human programmers. Programming languages are important tools for helping software engineers write better programs faster.

Understanding programming languages is crucial for those engaged in computer science because virtually all computation today is specified in some programming language.

Over the last few decades, a large number of computer languages have been introduced, have replaced one another, and have been modified or combined. Several attempts have been made to create a universal computer language that serves all purposes, but none has succeeded. A wide range of languages persists for several reasons: the purposes of programming vary from commercial software development to scientific work to hobby use; the gap in skill between novices and experts is large, and some languages are too difficult for beginners to understand and use; programmers have different preferences; and the acceptable runtime cost may differ greatly between a program running on a microcontroller and one running on a supercomputer.

There are many special purpose languages, for use in special situations: PHP is a scripting language that is especially suited for Web development; Perl is suitable for text manipulation; the C language has been widely used for development of operating systems and compilers (so-called system programming).

Programming languages make computer programs less dependent on certain machines or environments. This is because programming languages are converted into specific machine code for a given machine rather than being executed directly by the machine.

There are two mechanisms used to translate a program written in a programming language into the specific machine code of the computer being used.

If the translation mechanism used is one that translates the program text as a whole and then runs the internal format, this mechanism is spoken of as compilation. The compiler is therefore a program which takes the human-readable program text (called source code) as data input and supplies object code as output. The resulting object code may be machine code which will be executed directly by the computer's central processing unit (CPU), or it may be code matching the specification of a virtual machine.

If the program code is translated at runtime, with each translated step being executed immediately, the translation mechanism is spoken of as interpretation, and the program that performs it is an interpreter. Interpreted programs usually run more slowly than compiled programs, but have more flexibility because they can interact with the execution environment. See interpreted language for detail. Although the definitions are not identical, interpreted languages typically fall into the category of scripting languages.

Most languages can be either compiled or interpreted, but most are better suited for one than the other. In some programming systems, programs are compiled in multiple stages, into a variety of intermediate representations. Typically, later stages of compilation are closer to machine code than earlier stages. One common variant of this implementation strategy, first used by BCPL in the late 1960s, was to compile programs to an intermediate representation called "O-code" for a virtual machine, which was then compiled for the actual machine. This successful strategy was later used by Pascal with P-code and Smalltalk with byte code, although in many cases the intermediate code was interpreted rather than being compiled.
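As a hedged illustration of the multi-stage strategy, CPython (the reference implementation of Python) first compiles source text to byte code and then runs that byte code on a virtual machine. The sketch below uses the built-in compile() and exec() functions and the standard dis module; the source string and the names code_object and namespace are made up for the example.

```python
import dis

# Source text is just data until it is translated.
source = "result = (2 + 3) * 4"

# Stage 1: translate the whole text into a code object (byte code).
code_object = compile(source, "<sketch>", "exec")

# Stage 2: the virtual machine executes the intermediate form.
namespace = {}
exec(code_object, namespace)
print(namespace["result"])  # prints 20

# The intermediate representation can be inspected instruction by instruction.
instruction_names = [ins.opname for ins in dis.get_instructions(code_object)]
print(instruction_names)
```

The exact instruction names vary between Python versions, which is why the sketch prints them rather than assuming a particular list.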

For a detailed timeline, see History of programming languages

Features of a programming language

Each programming language can be thought of as a set of formal specifications concerning syntax, vocabulary, and meaning.

These specifications usually include:

  • Data and data structures
  • Instruction and control flow
  • Reference mechanisms and re-use
  • Design philosophy

Most languages that are widely used, or have been used for a considerable period of time, have standardization bodies that meet regularly to create and publish formal definitions of the language, and discuss extending or supplementing the already extant definitions.

Data types and data structures

Internally, all data in a modern digital computer are stored simply as zeros or ones (binary). The data typically represent information in the real world such as names, bank accounts and measurements and so the low-level binary data are organised by programming languages into these high-level concepts.

The system by which data are organized in a program is the type system of the programming language; the design and study of type systems is known as type theory. Languages can be classified as statically typed or dynamically typed. Statically-typed languages can be further subdivided into languages with manifest types, where each variable and function declaration has its type explicitly declared, and type-inferred languages. It is possible to perform type inference on programs written in a dynamically-typed language, but it is also possible to write programs in these languages that make type inference infeasible. Sometimes type-inferred and dynamically-typed languages are called latently typed.

With statically-typed languages, there usually are pre-defined types for individual pieces of data (such as numbers within a certain range, strings of letters, etc.), and programmatically named values (variables) can have only one fixed type, and allow only certain operations: numbers cannot change into names and vice versa. Examples of these languages are: C, C++ and Java.

Dynamically-typed languages treat all data locations interchangeably, so inappropriate operations (like adding names, or sorting numbers alphabetically) will not cause errors until run-time. Examples of these languages are: Objective-C, Lisp, JavaScript, Tcl and Prolog.
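A minimal sketch of this behaviour in Python, a dynamically-typed language: the function below contains an operation that is only valid for strings, yet nothing is reported until that line actually executes with an unsuitable value. The function name describe and its parameters are hypothetical.

```python
def describe(value, shout=False):
    # The same parameter can hold a number, a string, or anything else.
    if shout:
        # Only strings have .upper(); nothing checks this until the line runs.
        return value.upper()
    return f"value: {value}"

print(describe(42))            # works: the faulty branch is never reached
print(describe("abc", True))   # works: strings support .upper()

try:
    describe(42, shout=True)   # fails only now, at run time
except AttributeError as error:
    print("run-time error:", error)
```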

Type-inferred languages do not require the programmer to declare types, so superficially they appear to treat data as untyped; in fact they perform a sophisticated analysis of how the program uses the data to determine which elementary operations are performed on it, and from that deduce the types of the variables at compile-time. Type-inferred languages can be more flexible to use, while creating more efficient programs; however, this ability is difficult to include in a programming language implementation, so it is relatively rare. Examples of these languages are: Haskell and ML.

Strongly typed languages do not permit the usage of values as different types; they are rigorous about detecting incorrect type usage, either at runtime for dynamically typed languages, or at compile time for statically typed languages. Ada, Java, ML, and Python are examples of strongly typed languages.

Weakly typed languages do not strictly enforce type rules or have an explicit type-violation mechanism, often allowing for undefined behavior, segmentation violations, or other unsafe behavior if types are assigned incorrectly. Assembly language, C, C++, and Tcl are examples of weakly typed languages.

The typing traits strong to weak form a continuum. ML is strongly typed relative to Java, which is strongly typed relative to C, and the reverse, C is weakly typed relative to Java, which is weakly typed relative to ML. Use of these terms is often a matter of perspective, much in the way that an assembly language programmer would consider C to be a high-level language while a Java programmer would consider C to be a low-level language.

In contrast, the traits strong and static are orthogonal concepts. C is a weakly, statically typed language. Java is a strongly, statically typed language. Python is a strongly, dynamically typed language. Tcl is a weakly, dynamically typed language. Further complicating matters, some people incorrectly use the term strongly typed to mean strongly, statically typed, or, even more confusingly, to mean simply statically typed. In the latter usage, C would be called strongly typed, even though it catches relatively few type errors and its type system is trivial, and common, to defeat, even accidentally.
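The combination of strong and dynamic typing can be sketched in Python: no types are declared, yet the language refuses to silently reinterpret a string as a number. The variable names are illustrative only.

```python
count = 3          # an integer
label = "3"        # a string that happens to look like a number

try:
    total = count + label      # rejected: no implicit int/str conversion
except TypeError:
    total = None

explicit_sum = count + int(label)    # 6: conversion requested explicitly
explicit_text = str(count) + label   # "33": concatenation, also explicit
print(total, explicit_sum, explicit_text)
```

The check happens at run time (dynamic), but it is still enforced (strong); the programmer has to state which interpretation is intended.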

Most languages also provide means to assemble complex data structures from built-in types and to associate names with these new combined types (using arrays, lists, stacks, files).
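As one possible illustration in Python, the standard dataclasses module lets a program give a name to a combination of built-in types and then nest such combinations inside lists. The type names Measurement and Station are invented for this sketch.

```python
from dataclasses import dataclass, field

@dataclass
class Measurement:
    # A named combination of two built-in types.
    name: str
    value: float

@dataclass
class Station:
    # A larger structure assembled from a string and a list of Measurements.
    label: str
    readings: list = field(default_factory=list)

station = Station("roof")
station.readings.append(Measurement("temperature", 21.5))
station.readings.append(Measurement("humidity", 0.4))
print(len(station.readings), station.readings[0].name)  # prints 2 temperature
```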

Object-oriented programming languages allow a programmer to define data types called "objects" which have their own intrinsic functions and variables (called methods and attributes respectively). A program containing objects allows the objects to operate as independent but interacting sub-programs: this interaction can be designed at coding time to model or simulate real-life interacting objects. This is a very useful and intuitive facility. Languages such as Python and Ruby have developed as object-oriented (OO) languages. They are comparatively easy to learn and to use, and are gaining popularity in professional programming circles, as well as being accessible to non-professionals. These more intuitive languages have increased the public availability and power of customised computer applications.
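A small Python sketch of these ideas follows. The Account class, its attributes, and its methods are hypothetical; they simply model two real-world objects that interact.

```python
class Account:
    """A hypothetical bank account with two attributes and two methods."""

    def __init__(self, owner, balance=0):
        self.owner = owner        # attribute
        self.balance = balance    # attribute

    def deposit(self, amount):    # method
        self.balance += amount

    def transfer_to(self, other, amount):
        # Objects interact by calling each other's methods.
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount
        other.deposit(amount)

alice = Account("Alice", 100)
bob = Account("Bob")
alice.transfer_to(bob, 30)
print(alice.balance, bob.balance)  # prints 70 30
```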

Aside from when and how the correspondence between expressions and types is determined, there's also the crucial question of what types the language defines at all, and what types it allows as the values of expressions (expressed values) and as named values (denoted values). Low-level languages like C typically allow programs to name memory locations, regions of memory, and compile-time constants, while allowing expressions to return values that fit into machine registers; ANSI C extended this by allowing expressions to return struct values as well (see record). Functional languages often allow variables to name run-time computed values directly instead of naming memory locations where values may be stored. Languages that use garbage collection are free to allow arbitrarily complex data structures as both expressed and denoted values.

Finally, in some languages, procedures are allowed only as denoted values (they cannot be returned by expressions or bound to new names); in others, they can be passed as parameters to routines, but cannot otherwise be bound to new names; in others, they are as freely usable as any expressed value, but new ones cannot be created at run-time; and in still others, they are first-class values that can be created at run-time.
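The last case, procedures as first-class values created at run-time, can be sketched in Python. The names make_scaler and apply_twice are made up for the example.

```python
def make_scaler(factor):
    # A new procedure is created each time make_scaler runs.
    def scale(x):
        return x * factor
    return scale          # returned like any other value

double = make_scaler(2)   # bound to a new name
triple = make_scaler(3)

def apply_twice(procedure, x):
    # Procedures can also be passed as parameters.
    return procedure(procedure(x))

print(double(5))               # prints 10
print(apply_twice(triple, 2))  # prints 18
```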

Instruction and control flow

Once data has been specified, the machine must be instructed how to perform operations on the data. Elementary statements may be specified using keywords or may be indicated using some well-defined grammatical structure. Each language takes units of these well-behaved statements and combines them using some ordering system. Depending on the language, differing methods of grouping these elementary statements exist. This allows writing programs that can cover a variety of input, instead of being limited to a small number of cases. Furthermore, beyond the data manipulation instructions, other typical instructions in a language are those used for control flow (branches, definitions by cases, loops, backtracking, functional composition).
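Three of the control-flow mechanisms listed above (branching by cases, looping, and functional composition) might look as follows in Python. This is only a sketch; the helper names classify and compose are assumptions of the example.

```python
def classify(n):
    # Branching: choose one of several cases.
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    return "positive"

def compose(f, g):
    # Functional composition: build a new function from two others.
    return lambda x: f(g(x))

# Looping: the same statements handle any number of inputs.
labels = []
for number in [-2, 0, 7]:
    labels.append(classify(number))

shout = compose(str.upper, classify)
print(labels)      # prints ['negative', 'zero', 'positive']
print(shout(5))    # prints POSITIVE
```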

Reference mechanisms and re-use

The core of the idea of reference is that there must be a method of indirectly designating storage space. The most common method is through named variables. Depending on the language, further indirection may include references that are pointers to other storage space stored in such variables or groups of variables. Similar to this method of naming storage is the method of naming groups of instructions. Most programming languages use macro calls, procedure calls or function calls as the statements that use these names. Using symbolic names in this way allows a program to achieve significant flexibility, as well as a high measure of reusability. Indirect references to available programs or predefined data divisions allow many application-oriented languages to integrate typical operations as if the programming language included them as higher level instructions.
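A short Python sketch of both kinds of naming: two names that refer to the same storage, and a named group of instructions that can be reused with different data. All identifiers here are illustrative.

```python
readings = [3, 1, 2]     # the name `readings` designates a list in storage
alias = readings         # a second name referring to the same storage
alias.append(0)          # a change made through one name...
print(readings)          # ...is visible through the other: [3, 1, 2, 0]

def smallest(values):
    # A named group of instructions, reusable wherever it is called.
    return sorted(values)[0]

print(smallest(readings))     # prints 0
print(smallest("reuse"))      # prints e
```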

Design philosophies

For the above-mentioned purposes, each language has been developed using a special design or philosophy. Some aspect or another is especially stressed by the way the language uses data structures, or by which its special notation encourages certain ways to solve problems or express their structure.

Since programming languages are artificial languages, they require a high degree of discipline to accurately specify which operations are desired. Programming languages are not error tolerant; however, the burden of recognising and using the special vocabulary is reduced by help messages generated by the programming language implementation.

A few languages offer a high degree of freedom in allowing self-modification in which a program re-writes parts of itself to handle new cases. Typically, only machine language and members of the Lisp family (Common Lisp, Scheme) provide this ability. Some languages such as MUMPS and Perl allow modification of data structures that contain program fragments, and provide methods to transfer program control to those data structures. Languages that support dynamic linking and loading such as C, C++, and Java, can emulate self-modification by either embedding a small compiler or calling a full compiler and linking in the resulting object code. Interpreting code by recompiling it in real time is called dynamic recompilation. Emulators and other virtual machines exploit this technique for greater performance.
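The idea of keeping program fragments in a data structure and transferring control to them can be sketched in Python with the built-in eval(). This is an illustration under stated assumptions, not how MUMPS or Perl do it; the rules table and the run_rule helper are hypothetical.

```python
# Program fragments stored as ordinary data (hypothetical rules table).
rules = {
    "area": "width * height",
    "perimeter": "2 * (width + height)",
}

# The program edits its own table to handle a new case.
rules["diagonal"] = "(width ** 2 + height ** 2) ** 0.5"

def run_rule(name, **variables):
    # eval() hands the stored fragment to the interpreter at run time.
    # Builtins are withheld here, but eval is still unsafe for untrusted text.
    return eval(rules[name], {"__builtins__": {}}, variables)

print(run_rule("area", width=3, height=4))      # prints 12
print(run_rule("diagonal", width=3, height=4))  # prints 5.0
```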

The rigorous definition of the meaning of programming languages is the subject of formal semantics.

Classifications of programming languages

There are a variety of ways to classify programming languages. The distinctions are not clear-cut; a given language may fall into more than one category. For example, a language may have both compiled and interpreted implementations.

Also, most compiled languages contain some run-time interpreted features. The most notable example is the familiar I/O format string, which is written in a specialized, little language and which is used to describe how to convert program data to or from an external representation. This string is typically interpreted at run time by a specialized format-language interpreter program included in the run-time support libraries. Many programmers have found the flexibility of this arrangement to be very valuable.
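Python's str.format() offers a convenient way to see such a little language at work: the format specification is an ordinary string that is parsed each time format() is called, so it can even be assembled while the program runs. The variable names below are illustrative.

```python
price = 1234.5

# The format specification is ordinary data that is parsed when format() runs.
for spec in ["{:.2f}", "{:>10.1f}", "{:,.0f}"]:
    print(spec.format(price))

# Because it is interpreted at run time, it can be built at run time as well.
width = 8
spec = "{:0" + str(width) + ".1f}"
padded = spec.format(price)
print(padded)   # prints 001234.5
```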

See also

Lists of programming languages:

Related topics:
