2 min read
Array is a data structured that is ordered and indexed. Arrays are created in different manners in each language but they are all indexed in almost the same way. In this post I'm gonna be talking about the array data structure in couple of languages and how can we work with them.
How we define the arrays may vary and effect the performance. For examples in languages like Python, JavaScript and Java we don't need to define and initial size of the array. In Python and JavaScript we can keep different data types in the same array structure. But in Java we need to define the type of the values we are going to keep in the array.
Not defining an initial size for an array let's us increse the size of our array easily by just adding more items into it. That's because behind the scenes our language is hadnling the memory allocation itself. There are many methods to increasing sizes of the array in a fast way. The method I like working with languages in which we need to define a size is checking the size of the array and doubling the size if necessary. I have written the code that will mimic the behaviour of a dynmaic array in python. Even though it's basically built-in into python it's important to see how it actually works:
import ctypes
# python doesn't have a static array as in java
class DynamicArray(object):
# the constructor
def __init__(self):
self.n = 0 # Actual element count inside the array
self.capacity = 1 # initial capacity
self.A = self.make_array(self.capacity) # create the array with given capacity
# len(instance) returns the length of the instance
def __len__(self):
return self.n
def __getitem__(self, k):
# handle the if the k index exists in the array
if not 0 <= k < self.n:
return IndexError('Out of bounds! No insertion')
return self.A[k] # return if it exists of course
def append(self, ele):
# if the capacity is reached double the size
# don't forget in python and java it's "=="
if self.n == self.capacity:
self._resize(2*self.capacity)
self.A[self.n] = ele # append the element
self.n += 1 # increase the size of the array by one
def insertAt(self, item, index): # insert at a certain index
if index < 0 or index > self.n:
# if index is negative or larger than the array size
print("Invalid index")
return
# again if the size reaches the capacity
if self.n == self.capacity:
self._resize(2*self.capacity)
# range(start, stop, step), just a reminder
# moving all the items after the given index one point right
# in order to create the space, going right to left
for i in range(self.n-1, index-1, -1):
self.A[i+1] = self.A[i]
#finally inserting the item
self.A[index] = item
self.n+=1
# delete from the end of the array
def delete(self):
# handle an empty array
if self.n == 0:
print("Array is empty")
return
# give 0 to the last item in the array
self.A[self.n-1]
self.n -= 1
# remove at a certain index
def removeAt(self, index):
# similar to the insertAt method
if self.n == 0:
print("Array is empty")
if index < 0 or index >= self.n:
return IndexError("Index is out of bounds! No deletion")
# if it's the last item in the array
if index == self.n -1:
self.A[index] = 0
self.n -= 1
return
for i in range(index, self.n-1):
# moving the elements one index left
self.A[i] = self.A[i+1]
# took out an element so shorten the array
self.A[self.n-1] = 0
self.n -= 1
# Finally resize method for the capacity increases
# Doubling the capacity each time
def _resize(self, new_cap):
B = self.make_array(new_cap) # New bigger array
# move everything inside the new value
for k in range(self.n):
B[k] = self.A[k]
self.A = B
self.capacity = new_cap
def make_array(self, new_cap):
return (new_cap * ctypes.py_object)()
# Instantiate
arr = DynamicArray()
# Append new element
arr.append(1)
arr.append(2)
arr.append(3)
arr.append(4)
arr.append(5)
arr.append(6)
print(len(arr))
In C we define the arrays with and initial size and types. This comes with a lot of benefits in terms of speed and memory usage. This is one of the reasons that C is faster than Python and JavaScript in general.
In python also we can use the array method for creating fixed arrays (since 3.3 release):
import array
list = list(range(10))
arr = array.array('i', list)
arr # array('i', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
i here stands for "integer". This is a more performant option to the builtin lists in python. Maybe consider using it in the future. You need to also create arrays with initial sizes and data types when you are using popular frameworks in data science, like numpy. For example:
import numpy as np
np.ones((2, 5), dtype=float)
This would give us the following structure:
array([[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.]])
It's still possible to implement the same logic in Python and JavaScript. For example in if you are going to use numpy library in Python
JavaScript has an object that allows us to create a typed and sized array. TypedArray describes the structure that represents and array with the underlying binary data buffer. This is how we would create a typed array in javascript:
// We are creating a typed array by defining its size in bytes
// and the data type it's going to contain
const tp1 = new Int8Array(4); Int8Array [0, 0, 0, 0]
tp1[3] = 77;
console.log(tp1); Int8Array [0, 0, 0, 77]
If you wanna read more about the TypedArray constructor you can read this article. It's definitely worth reading I assure you. If you don't know much about data types you might wanna read this wikipedia article first.
Insertion and deletion opeartions in javascript arrays and python lists costs O(N) (linear) time. The reason behind this is that when we insert an item in them we move the items after the point of insertion. We first free the space that we wanna insert in by carrying all the items one index to the right. When this operation is done we insert the item into the given index. The same opration is performed in reverse when we want to do a deletion. First we empty the index and then we push everything after the index one index to the left.
The lookup operations takes only constant time as our arrays are indexed. In comparison with a linked list for example, this operation is significantly faster (O(N) vs O(1)).
When we start developping apps and skip the basics (because it's much more fun this way) this can cost us in terms of performance. It's never a bad idea to learn about the basics of computer science and programming in your free time.
Now I hope you have a better idea about the array data structure and I could inspire you to go deeper. Keep learning !
Ilker Akbiyik