Programming/C/Memory Model: Difference between revisions

Both [[ Programming/C | C]] and [[ Programming/C++ | C++]] force the programmer to manage pointers and memory.
 
 
This is a manual process. This is also both a blessing and a curse.
 
 
On one hand, the programmer can potentially manage references and the destruction of variables and objects (work that many languages hand off to "garbage collection") much more precisely than a language where garbage collection is built in and handled automatically.<br>
In other words, a given [[ Programming/C | C]]/[[ Programming/C++ | C++]] program can potentially be far more efficient.
 
On the other hand, a programmer who is not careful, or who doesn't know what they're doing, can introduce many bugs and unintended problems through incorrect manual memory management.<br>
A block of memory can be allocated that's too small, so writes run past its end and garble neighboring data. Or data can be forgotten about, lingering in memory and taking up space far longer than it would have with automatic garbage collection.
 
 
Pointers and manual memory management are among the biggest differences between C/C++ and most other languages.
 
 
== Computer Memory ==
 
A computer's memory is a series of locations, each identified by an '''address'''. Each memory location:
* Can contain a fixed amount of data (typically one byte, or 8 bits), represented as a sequence of 1's and 0's.
* Is identified by a unique number, its address.
 
{{ todo | Document binary, at least basics. }}
 
The 1's and 0's are called '''binary'''. The computer automatically converts data to binary to store it in memory, and then back to the original format, so that we as humans can understand it. Similarly, the computer automatically determines which address each piece of data is stored at.
 
So while it's potentially useful to understand these underlying concepts, we as programmers don't (usually) need to deal with them directly.
 
 
{{ Note | A piece of data too large for a single address is stored across several consecutive addresses. Thus, different types of data take up different amounts of memory. }}
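
As a rough illustration of the note above, the sketch below measures how many one-byte addresses separate two neighboring {{ ic |int}} values in an array. The function name {{ ic |bytes_between_neighbours()}} is a hypothetical helper made up for this example.

```cpp
#include <cstddef>

// Hypothetical helper: returns how many one-byte addresses lie between
// two neighbouring elements of an int array.
std::ptrdiff_t bytes_between_neighbours() {
    int values[2] = {1, 2};
    // View each element's address as a pointer to a single byte,
    // so that subtracting the two pointers counts bytes.
    const char* first = reinterpret_cast<const char*>(&values[0]);
    const char* second = reinterpret_cast<const char*>(&values[1]);
    return second - first;
}
```

The result equals {{ ic |sizeof(int)}}, which is commonly 4 but depends on the platform. In other words, a single {{ ic |int}} spans several consecutive addresses.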
 
 
== Stack Memory Vs Heap Memory ==
 
 
=== Stack ===
 
The '''stack''' is a contiguous section of memory that holds local variables. Every running program gets its own stack at runtime (more precisely, one per thread).
 
 
It is called "the stack" because data is added and removed in stack order (last in, first out).<br>
That is, when a program starts, the {{ ic |main()}} function (along with its local variables) is immediately placed on the stack.<br>
Any time another function is called, it is placed directly on top of the existing stack (along with that function's local variables).
 
In this way, a function that finishes executing is always on top of the stack, and is popped off because it is no longer required.
 
This process repeats until all functions for the program have completed, and the program's {{ ic |main()}} function finally terminates.
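
A minimal sketch of the stacking behavior described above, assuming a hypothetical recursive function {{ ic |count_distinct_frames()}}. Each nested call checks whether its own local variable lives at a different address than its caller's local variable, which it does, because every call gets its own space on the stack.

```cpp
// Hypothetical helper: counts how many nested calls found their own local
// variable at a different address than the caller's local variable.
int count_distinct_frames(int depth, const int* callers_local = nullptr) {
    int local = depth;  // lives in this call's own section of the stack
    // The caller's variable is still alive here, so the two addresses
    // must differ. The very first call has no caller (nullptr).
    int distinct = (callers_local != &local) ? 1 : 0;
    if (depth <= 1) {
        return distinct;
    }
    // The nested call is placed on top of the stack, runs, and is popped
    // off again before this addition happens.
    return distinct + count_distinct_frames(depth - 1, &local);
}
```

Calling {{ ic |count_distinct_frames(5)}} nests five calls deep and returns 5, one for each call that had its own local variable.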
 
 
Because of how this works, the stack space needed by any given function is generally known at compile time. The stack's memory itself is set aside at application startup.
 
A '''stack overflow''' error occurs when function calls nest too deeply (runaway recursion is a common cause) and the application runs out of the stack memory that was assigned at program startup.
 
 
=== Heap ===
 
The '''heap''' is a section of memory from which storage can be dynamically allocated at runtime. This is done with the {{ ic |new}} keyword (C++ only), {{ ic |malloc()}} calls, or other similar functions.
 
 
Note that, unlike the stack, everything in the heap is manually allocated by the programmer.
 
Similarly, everything in the heap should be manually deallocated when it is no longer used ({{ ic |delete}} for {{ ic |new}}, {{ ic |free()}} for {{ ic |malloc()}}). Any time a program does NOT release allocated heap memory, it is a memory leak.
 
 
As long as the OS has enough memory available, a block of almost any size can be allocated.
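
The allocate-then-release pattern described above might look like the following sketch. The function names {{ ic |sum_with_malloc()}} and {{ ic |sum_with_new()}} are hypothetical and exist only for illustration. Each one allocates an array on the heap, fills it with 1, 2, 3, and so on, adds the values up, and releases the memory before returning.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: heap allocation with malloc(), released with free().
int sum_with_malloc(std::size_t count) {
    if (count == 0) {
        return 0;
    }
    int* numbers = static_cast<int*>(std::malloc(count * sizeof(int)));
    if (numbers == nullptr) {
        return -1;  // malloc() reports failure by returning a null pointer
    }
    int total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        numbers[i] = static_cast<int>(i) + 1;
        total += numbers[i];
    }
    std::free(numbers);  // without this line, the memory would leak
    return total;
}

// Hypothetical helper: the same idea with new[] and delete[] (C++ only).
int sum_with_new(std::size_t count) {
    if (count == 0) {
        return 0;
    }
    int* numbers = new int[count];
    int total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        numbers[i] = static_cast<int>(i) + 1;
        total += numbers[i];
    }
    delete[] numbers;  // memory from new[] must be released with delete[]
    return total;
}
```

Note that the two families should not be mixed: memory from {{ ic |malloc()}} goes back through {{ ic |free()}}, and memory from {{ ic |new}} goes back through {{ ic |delete}}.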
 
 
== Pointers ==
 
A '''pointer''' is simply a variable that says "there is something here at this specific location in memory, and I expect it to be of x size."
 
In other words, pointers are references to specific locations in memory. Any value can have a pointer created for it, regardless of whether the value is on the stack or the heap.
 
 
The actual size of a pointer depends on the target platform. For example, in a typical 64-bit program, a pointer is stored as a 64-bit address (effectively an unsigned integer).
 
 
When possible, it's often better to pass pointers around, as it's faster to pass a pointer than to copy, say, a whole class object. That is, for large values, pointers are often preferable to copying.
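
To make the difference concrete, here is a small sketch. The {{ ic |Image}} struct and both functions are hypothetical names invented for this example. Passing by value copies the whole struct, while passing a pointer copies only an address and lets the function change the caller's object.

```cpp
// Hypothetical large type: 1024 bytes of data.
struct Image {
    unsigned char pixels[1024];
};

// Pass by value: the whole struct is copied, and only the copy is changed.
void clear_copy(Image image) {
    image.pixels[0] = 0;
}

// Pass by pointer: only the address is copied, and the caller's
// object is changed directly.
void clear_in_place(Image* image) {
    image->pixels[0] = 0;
}
```

On typical platforms {{ ic |sizeof(Image*)}} is 4 or 8 bytes, compared to 1024 bytes for {{ ic |sizeof(Image)}}, which is where the speed difference comes from.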
 
 
=== Implicit Vs Explicit References ===
 
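Pointers are declared by using the {{ ic |*}} character after the variable type. We can also get the memory location of a value by using the {{ ic |&}} character before a variable. In C++ (but not C), placing {{ ic |&}} after the variable type instead declares a '''reference'''.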
 
 
This is an '''implicit''' example, using a C++ reference. In this code, the reference looks like and is treated as if it were a standard variable.
// Create int with value of "5".
int my_int = 5;
// Create reference to above int (C++ only; C has no references).
int& my_reference = my_int;
// We can then use it as if it were any standard variable.
printf("%d", my_reference);
 
 
This is an '''explicit''' example. In this code, the variable looks like and is treated as a pointer.
// Create int with value of "5".
int my_int = 5;
// Create pointer to above int.
int* my_pointer = &my_int;
// We must then use pointer syntax to actually get our value.
printf("%d", *my_pointer);
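
The two styles can also be compared as function parameters. The sketch below uses two hypothetical helpers, {{ ic |add_one_by_reference()}} and {{ ic |add_one_by_pointer()}}, which both modify the caller's variable. The only difference is the syntax, plus the fact that a pointer may be null and so should be checked.

```cpp
// Implicit style: the reference is used like a standard variable.
void add_one_by_reference(int& value) {
    value += 1;
}

// Explicit style: the pointer must be dereferenced with * to reach the value.
void add_one_by_pointer(int* value) {
    if (value != nullptr) {  // unlike a reference, a pointer can be null
        *value += 1;
    }
}
```

Given {{ ic |1=int n = 5;}}, calling {{ ic |add_one_by_reference(n)}} and then {{ ic |add_one_by_pointer(&n)}} leaves {{ ic |n}} at 7.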
 
 
{{ warn |If you ever have a variable on the heap (created via {{ ic |new}}, {{ ic |malloc()}}, or similar) that no longer has a pointer referencing it, then that is a memory leak.}}
 
 
== Misc Advice ==
 
* Be careful when copying any non-trivial variables, such as class objects.
** Any member variables that are pointers will be copied as-is (a shallow copy), so the original and the copy point at the same memory. If the class frees that memory when an object is destroyed, then destroying one copy leaves the pointer in every other copy dangling.
** This would then potentially cause errors for any other copies (or the original) when trying to access the pointer, or when those objects are destroyed as well.
* With classes that manage memory, it's generally best to manually define the constructors (including the copy constructor), copy assignment, and the destructor. This can, for example, help avoid the above problem, as well as other potential problems.
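
One possible way to follow this advice is sketched below, using a hypothetical {{ ic |IntBox}} class that owns a single heap-allocated {{ ic |int}}. The copy constructor allocates fresh memory for each copy (a deep copy), so destroying one object never frees memory that another object still uses.

```cpp
// Hypothetical class that owns one int on the heap.
class IntBox {
public:
    explicit IntBox(int value) : data_(new int(value)) {}

    // Copy constructor: give the copy its own memory instead of
    // copying the pointer as-is.
    IntBox(const IntBox& other) : data_(new int(*other.data_)) {}

    // Copy assignment: copy the pointed-to value, not the pointer.
    IntBox& operator=(const IntBox& other) {
        if (this != &other) {
            *data_ = *other.data_;
        }
        return *this;
    }

    // Destructor: release the memory this object owns.
    ~IntBox() { delete data_; }

    int get() const { return *data_; }
    void set(int value) { *data_ = value; }

private:
    int* data_;
};
```

With these definitions, changing a copy leaves the original untouched, and each object frees only its own memory when it is destroyed.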

Latest revision as of 02:47, 23 April 2023
