In programming, we often perform the same tasks repeatedly. Functions and loops help us write cleaner, shorter, and more efficient code.
A function is a block of code designed to perform a specific task. Using functions helps us avoid redundant code.
This visual representation helps illustrate how functions work systematically. The label “Function Machine” on the machine reinforces that it applies a specific rule to transform the input into an output. The function in the image is:
\[f(x) = x + 3\]
This means that any number inputted into the machine will have 3 added to it before being output.
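As a minimal sketch, the function machine above can be written as a one-line Python function. The name f mirrors the formula, and the sample inputs are arbitrary values chosen for illustration.

```python
def f(x):
    """Function machine: add 3 to whatever number goes in."""
    return x + 3

# Feed a few inputs through the machine
for value in [0, 2, 10]:
    print(value, "->", f(value))
```

Each input goes in, the same rule is applied, and the transformed value comes out.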
The next function takes three numbers, a, x, and b, as inputs and returns the value of a * x + b.
# Function to multiply 'a' with 'x' and add 'b'
def function1(a, x, b):
    return a * x + b

# Example usage
print(function1(2, 3, 4))  # Output: (2 * 3) + 4 = 10
10
# Function to multiply 'a' with 'x' and add 'b'
function1 <- function(a, x, b) {
  return(a * x + b)
}

# Example usage
print(function1(2, 3, 4))  # Output: (2 * 3) + 4 = 10
[1] 10
The next function compares two datasets by calculating the mean, median, and sample standard deviation of each, a common first step in data analysis.
import statistics
from tabulate import tabulate

# Function to compare two datasets
def compare_data(group1, group2):
    return {
        "group1": {
            "mean": statistics.mean(group1),
            "median": statistics.median(group1),
            "std_dev": statistics.stdev(group1)
        },
        "group2": {
            "mean": statistics.mean(group2),
            "median": statistics.median(group2),
            "std_dev": statistics.stdev(group2)
        }
    }

# Sample datasets
data1 = [10, 20, 30, 40, 50]
data2 = [15, 25, 35, 45, 55]

# Get results
results = compare_data(data1, data2)

# Convert results to a table format
table = [
    ["Metric", "Group 1", "Group 2"],
    ["Mean", results["group1"]["mean"], results["group2"]["mean"]],
    ["Median", results["group1"]["median"], results["group2"]["median"]],
    ["Standard Deviation", results["group1"]["std_dev"], results["group2"]["std_dev"]]
]

# Print table
print(tabulate(table, headers="firstrow", tablefmt="grid"))
+--------------------+-----------+-----------+
| Metric | Group 1 | Group 2 |
+====================+===========+===========+
| Mean | 30 | 35 |
+--------------------+-----------+-----------+
| Median | 30 | 35 |
+--------------------+-----------+-----------+
| Standard Deviation | 15.8114 | 15.8114 |
+--------------------+-----------+-----------+
# Load library
library(knitr)

# Function to compare two datasets
compare_data <- function(group1, group2) {
  data.frame(
    Statistic = c("Mean", "Median", "Std Dev"),
    Group1 = round(c(mean(group1), median(group1), sd(group1)), 2),
    Group2 = round(c(mean(group2), median(group2), sd(group2)), 2)
  )
}

# Sample data
data1 <- c(10, 20, 30, 40, 50)
data2 <- c(15, 25, 35, 45, 55)

# Print as formatted table
kable(compare_data(data1, data2))
Statistic | Group1 | Group2 |
---|---|---|
Mean | 30.00 | 35.00 |
Median | 30.00 | 35.00 |
Std Dev | 15.81 | 15.81 |
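Both versions report the sample standard deviation (dividing by n - 1), which is what statistics.stdev in Python and sd() in R compute. A minimal sketch, using only Python's standard library and reusing the data1 values from the example, contrasts it with the population standard deviation:

```python
import statistics

data1 = [10, 20, 30, 40, 50]

sample_sd = statistics.stdev(data1)       # divides by n - 1
population_sd = statistics.pstdev(data1)  # divides by n

print(round(sample_sd, 4))      # 15.8114
print(round(population_sd, 4))  # 14.1421
```

The sample version is usually the appropriate choice when the data are a sample drawn from a larger population.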
Functions save time by allowing code reuse, improve program organization and readability, and make debugging and future development easier.
In the field of computational geometry, functions are essential for converting mathematical expressions into executable code. For example, the formulas for calculating the area and perimeter of various two-dimensional shapes can be implemented as separate functions. This approach makes the development process more efficient and easier to manage. The following sections explain in detail how these geometric formulas are coded, using Python and R as examples.
Shape | Area Formula (A) | Perimeter Formula (P) | Variables Description |
---|---|---|---|
Triangle | \(A = \frac{1}{2}(b \times h)\) | \(P = a + b + c\) | \(b\) = base, \(h\) = height, \(a\), \(b\), \(c\) = sides |
Rectangle | \(A = l \times b\) | \(P = 2(l+b)\) | \(l\) = length, \(b\) = breadth |
Square | \(A = s \times s\) | \(P = 4 \times s\) | \(s\) = side |
Circle | \(A = \pi r^2\) | \(P = 2\pi r\) | \(r\) = radius, \(\pi \approx 3.14\) or \(\frac{22}{7}\) |
Ellipse | \(A = \pi \times a \times b\) | \(P \approx \pi(a+b)\) (rough approximation) | \(a\) = semi-major axis, \(b\) = semi-minor axis |
Parallelogram | \(A = b \times h\) | \(P = 2(a+b)\) | \(b\) = base, \(h\) = height, \(a\) = side adjacent to the base |
Rhombus | \(A = \frac{1}{2}(d_1 \times d_2)\) | \(P = 4 \times a\) | \(d_1, d_2\) = diagonals, \(a\) = side |
Trapezium | \(A = \frac{1}{2}(a+b) \times h\) | Sum of all sides | \(a\), \(b\) = lengths of parallel sides, \(h\) = height |
With the formulas above, you can write functions that calculate the area and perimeter of different shapes. This makes your code modular and easier to maintain, and lets you test individual pieces of logic in isolation. The examples below, in Python and R, show one way to implement these calculations.
import math

# Function to calculate area and perimeter for multiple shapes
def calculate_area_perimeter(shape, **kwargs):
    if shape == "triangle":
        base = kwargs.get("base")
        height = kwargs.get("height")
        side_a = kwargs.get("side_a")
        side_b = kwargs.get("side_b")
        side_c = kwargs.get("side_c")
        area = 0.5 * base * height
        perimeter = side_a + side_b + side_c
    elif shape == "rectangle":
        length = kwargs.get("length")
        breadth = kwargs.get("breadth")
        area = length * breadth
        perimeter = 2 * (length + breadth)
    elif shape == "square":
        side = kwargs.get("side")
        area = side ** 2
        perimeter = 4 * side
    elif shape == "circle":
        radius = kwargs.get("radius")
        area = math.pi * radius ** 2
        perimeter = 2 * math.pi * radius
    elif shape == "ellipse":
        a = kwargs.get("a")
        b = kwargs.get("b")
        area = math.pi * a * b
        perimeter = math.pi * (a + b)  # rough approximation
    elif shape == "parallelogram":
        base = kwargs.get("base")
        height = kwargs.get("height")
        side_a = kwargs.get("side_a")
        side_b = kwargs.get("side_b")
        area = base * height
        perimeter = 2 * (side_a + side_b)
    elif shape == "rhombus":
        d1 = kwargs.get("d1")
        d2 = kwargs.get("d2")
        side = kwargs.get("side")
        area = 0.5 * d1 * d2
        perimeter = 4 * side
    elif shape == "trapezium":
        a = kwargs.get("a")
        b = kwargs.get("b")
        height = kwargs.get("height")
        side_a = kwargs.get("side_a")
        side_b = kwargs.get("side_b")
        area = 0.5 * (a + b) * height
        perimeter = a + b + side_a + side_b
    else:
        return "Invalid shape. Choose a valid 2D shape."
    return {"area": area, "perimeter": perimeter}

# Example usage
result = calculate_area_perimeter("triangle",
                                  base=6,
                                  height=4,
                                  side_a=5,
                                  side_b=6,
                                  side_c=7)
print("Triangle-Area & Perimeter:", result["area"], "and", result["perimeter"])
Triangle-Area & Perimeter: 12.0 and 18
# Function to calculate area and perimeter for multiple shapes
calculate_area_perimeter <- function(shape, ...) {
  args <- list(...)

  if (shape == "triangle") {
    base <- args$base
    height <- args$height
    side_a <- args$side_a
    side_b <- args$side_b
    side_c <- args$side_c
    area <- 0.5 * base * height
    perimeter <- side_a + side_b + side_c
  } else if (shape == "rectangle") {
    length <- args$length
    breadth <- args$breadth
    area <- length * breadth
    perimeter <- 2 * (length + breadth)
  } else if (shape == "square") {
    side <- args$side
    area <- side^2
    perimeter <- 4 * side
  } else if (shape == "circle") {
    radius <- args$radius
    area <- pi * radius^2
    perimeter <- 2 * pi * radius
  } else if (shape == "ellipse") {
    a <- args$a
    b <- args$b
    area <- pi * a * b
    perimeter <- pi * (a + b)  # rough approximation
  } else if (shape == "parallelogram") {
    base <- args$base
    height <- args$height
    side_a <- args$side_a
    side_b <- args$side_b
    area <- base * height
    perimeter <- 2 * (side_a + side_b)
  } else if (shape == "rhombus") {
    d1 <- args$d1
    d2 <- args$d2
    side <- args$side
    area <- 0.5 * d1 * d2
    perimeter <- 4 * side
  } else if (shape == "trapezium") {
    a <- args$a
    b <- args$b
    height <- args$height
    side_a <- args$side_a
    side_b <- args$side_b
    area <- 0.5 * (a + b) * height
    perimeter <- a + b + side_a + side_b
  } else {
    stop("Invalid shape. Choose a valid 2D shape.")
  }
  return(list(area = area, perimeter = perimeter))
}

# Example usage
result <- calculate_area_perimeter("triangle",
                                   base = 6,
                                   height = 4,
                                   side_a = 5,
                                   side_b = 6,
                                   side_c = 7)
cat("Triangle - Area & Perimeter:", result$area, "and", result$perimeter, "\n")
Triangle - Area & Perimeter: 12 and 18
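The text earlier mentions implementing each formula as a separate function, while the examples use a single dispatcher. As a hedged alternative sketch, here is what the separate-function style might look like for two of the shapes. The function names are illustrative and do not appear in the original examples.

```python
import math

def circle_area(radius):
    """Area of a circle: pi * r^2."""
    return math.pi * radius ** 2

def circle_perimeter(radius):
    """Circumference of a circle: 2 * pi * r."""
    return 2 * math.pi * radius

def rectangle_area(length, breadth):
    """Area of a rectangle: l * b."""
    return length * breadth

def rectangle_perimeter(length, breadth):
    """Perimeter of a rectangle: 2 * (l + b)."""
    return 2 * (length + breadth)

print(rectangle_area(4, 3), rectangle_perimeter(4, 3))          # 12 14
print(round(circle_area(1), 4), round(circle_perimeter(1), 4))  # 3.1416 6.2832
```

Small single-purpose functions like these have explicit parameters, so a missing argument fails immediately at the call rather than surfacing later as a None value inside a calculation.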
Loops let us execute the same code multiple times without rewriting it, which makes them well suited to the repetitive calculations found in mathematical analysis and data processing. The Fibonacci and sequence examples that follow use for loops.
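As a minimal illustration of the idea, the sketch below contrasts Python's two loop statements, for and while. The variable names and values are illustrative only.

```python
# for loop: repeat once per item in a known range
total = 0
for number in range(1, 6):  # 1, 2, 3, 4, 5
    total += number
print(total)  # 15

# while loop: repeat until a condition stops being true
value = 1
steps = 0
while value < 100:
    value *= 2
    steps += 1
print(value, steps)  # 128 7
```

A for loop fits when the number of repetitions is known in advance. A while loop fits when the stopping point depends on a condition that changes as the loop runs.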
The Fibonacci sequence is a series of numbers where each number is the sum of the two preceding ones:
\[F(n) = F(n-1) + F(n-2)\]
Example: \(0, 1, 1, 2, 3, 5, 8, 13, 21, \dots\)
def fibonacci(n):
    fib_series = [0, 1]
    for i in range(2, n):
        fib_series.append(fib_series[-1] + fib_series[-2])
    return fib_series[:n]  # slicing keeps the length correct when n < 2

print(fibonacci(10))  # Output: [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
fibonacci <- function(n) {
  fib_series <- c(0, 1)
  if (n > 2) {  # guard: 3:n would count downwards when n < 3
    for (i in 3:n) {
      fib_series <- c(fib_series, fib_series[i-1] + fib_series[i-2])
    }
  }
  return(head(fib_series, n))
}
print(fibonacci(10))  # Output: 0 1 1 2 3 5 8 13 21 34
[1] 0 1 1 2 3 5 8 13 21 34
This function generates a sequence based on the type specified: either an arithmetic sequence or a geometric sequence. For an arithmetic sequence, each term is obtained by adding a constant difference to the previous term. For a geometric sequence, each term is obtained by multiplying the previous term by a constant ratio.
def generate_sequence(seq_type, n, a, d=None, r=None):
    """
    Generate an arithmetic or geometric sequence.

    Parameters:
        seq_type (str): Type of sequence - "arithmetic" or "geometric".
        n (int): The number of terms in the sequence.
        a (numeric): The first term of the sequence.
        d (numeric, optional): The common difference (required for arithmetic).
        r (numeric, optional): The common ratio (required for geometric).

    Returns:
        list: A list containing the generated sequence.
    """
    sequence = []
    if seq_type.lower() == "arithmetic":
        if d is None:
            raise ValueError("'d' must be provided for an arithmetic sequence")
        for i in range(n):
            sequence.append(a + i * d)
    elif seq_type.lower() == "geometric":
        if r is None:
            raise ValueError("'r' must be provided for a geometric sequence")
        for i in range(n):
            sequence.append(a * (r ** i))
    else:
        raise ValueError("seq_type must be either 'arithmetic' or 'geometric'")
    return sequence

# Example usage:
print(generate_sequence("arithmetic", 10, 1, d=2))
[1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
print(generate_sequence("geometric", 10, 1, r=3))
[1, 3, 9, 27, 81, 243, 729, 2187, 6561, 19683]
#' Generate an arithmetic or geometric sequence.
#'
#' @param seq_type Type of sequence: "arithmetic" or "geometric".
#' @param n The number of terms in the sequence.
#' @param a The first term of the sequence.
#' @param d The common difference (required for arithmetic sequences).
#' @param r The common ratio (required for geometric sequences).
#'
#' @return A numeric vector containing the generated sequence.
generate_sequence <- function(seq_type, n, a, d = NULL, r = NULL) {
  sequence <- numeric(n)
  if (tolower(seq_type) == "arithmetic") {
    if (is.null(d)) stop("'d' must be provided for an arithmetic sequence.")
    for (i in 1:n) {
      sequence[i] <- a + (i - 1) * d
    }
  } else if (tolower(seq_type) == "geometric") {
    if (is.null(r)) stop("'r' must be provided for a geometric sequence.")
    for (i in 1:n) {
      sequence[i] <- a * (r^(i - 1))
    }
  } else {
    stop("seq_type must be either 'arithmetic' or 'geometric'")
  }
  return(sequence)
}

# Example usage:
print(generate_sequence("arithmetic", 10, 1, d = 2))
[1] 1 3 5 7 9 11 13 15 17 19
print(generate_sequence("geometric", 10, 1, r = 3))
[1] 1 3 9 27 81 243 729 2187 6561 19683
Linear regression is used to find the relationship between an independent variable \(X\) and a dependent variable \(Y\):
\[Y = aX + b\]
where \(a\) is the slope and \(b\) is the intercept of the fitted line, both estimated from the data.
import numpy as np

# Data (X: study hours, Y: exam scores)
X = np.array([1, 2, 3, 4, 5])
Y = np.array([2, 4, 5, 4, 5])

# Calculate slope (a) and intercept (b)
n = len(X)
sum_x, sum_y = sum(X), sum(Y)
sum_xy = sum(X * Y)
sum_x2 = sum(X ** 2)

a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b = (sum_y - a * sum_x) / n

print(f"Linear Regression: Y = {a:.2f}X + {b:.2f}")
Linear Regression: Y = 0.60X + 2.20
# Data
X <- c(1, 2, 3, 4, 5)
Y <- c(2, 4, 5, 4, 5)

# Calculate slope (a) and intercept (b)
n <- length(X)
sum_x <- sum(X)
sum_y <- sum(Y)
sum_xy <- sum(X * Y)
sum_x2 <- sum(X^2)

a <- (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x^2)
b <- (sum_y - a * sum_x) / n

print(paste("Linear Regression: Y =", round(a, 2), "X +", round(b, 2)))
[1] "Linear Regression: Y = 0.6 X + 2.2"
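The regression scripts above run once on fixed data. To connect them back to the theme of functions and loops, here is a minimal sketch that wraps the same least-squares formulas in a reusable function and accumulates the sums with an explicit loop. The name fit_line and its input check are assumptions added for illustration. It uses only the standard library.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    n = len(xs)
    sum_x = sum_y = sum_xy = sum_x2 = 0
    # Accumulate the four sums in a single pass over the data
    for x, y in zip(xs, ys):
        sum_x += x
        sum_y += y
        sum_xy += x * y
        sum_x2 += x * x
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept

slope, intercept = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"Y = {slope:.2f}X + {intercept:.2f}")  # Y = 0.60X + 2.20
```

Because the logic now lives in a function, the same calculation can be applied to any pair of equal-length lists. If every x value is identical the denominator is zero, so a production version would need to handle that case.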
Functions and loops help us create simpler and more efficient code. By understanding these two concepts, we can write better and more readable programs.
Let's apply functions and loops to real-world data science tasks:
import pandas as pd
import random

def create_employee_dataset(num_employees):
    # Each position maps to (min salary, max salary, min experience, max experience)
    positions = {
        "Staff": (3000, 5000, 1, 5),
        "Supervisor": (5000, 8000, 5, 10),
        "Manager": (8000, 12000, 10, 15),
        "Director": (12000, 15000, 15, 25)
    }
    departments = ["Finance", "HR", "IT", "Marketing", "Operations", "Sales"]
    locations = ["New York", "Los Angeles", "Chicago", "Houston", "Phoenix"]

    data = {
        "ID_Number": [],
        "Position": [],
        "Salary": [],
        "Age": [],
        "Experience": [],
        "Department": [],
        "Location": []
    }
    for _ in range(num_employees):
        id_number = random.randint(10000, 99999)
        position = random.choice(list(positions.keys()))
        salary = random.randint(positions[position][0],
                                positions[position][1])
        experience = random.randint(positions[position][2],
                                    positions[position][3])
        age = experience + random.randint(22, 35)  # aligns with experience
        department = random.choice(departments)
        location = random.choice(locations)

        data["ID_Number"].append(id_number)
        data["Position"].append(position)
        data["Salary"].append(salary)
        data["Age"].append(age)
        data["Experience"].append(experience)
        data["Department"].append(department)
        data["Location"].append(location)

    return pd.DataFrame(data)

# Create the employee dataset
df = create_employee_dataset(20)
print(df)
ID_Number Position Salary Age Experience Department Location
0 81341 Manager 11352 47 13 Sales New York
1 52591 Supervisor 5466 29 5 Sales Los Angeles
2 71794 Staff 4201 40 5 Sales Chicago
3 35098 Manager 11046 44 14 IT Phoenix
4 44337 Staff 4638 31 4 Finance Houston
5 40107 Manager 10637 47 12 Operations Chicago
6 21790 Supervisor 5746 36 7 IT Phoenix
7 17060 Supervisor 6078 33 6 IT Phoenix
8 67971 Staff 4672 31 5 Sales Houston
9 44172 Manager 11605 34 12 Finance New York
10 40220 Supervisor 6328 32 7 Sales Houston
11 60793 Supervisor 7153 28 5 Sales Los Angeles
12 30595 Supervisor 5051 30 8 IT Phoenix
13 45161 Supervisor 6827 39 6 Marketing Los Angeles
14 42268 Supervisor 6556 39 6 Sales Los Angeles
15 74667 Supervisor 7727 31 9 Finance Houston
16 94126 Staff 4255 30 2 Operations Los Angeles
17 81800 Staff 3543 37 2 Sales Phoenix
18 73586 Supervisor 7297 35 8 Operations Houston
19 17845 Director 14764 52 22 Marketing Los Angeles
create_employee_dataset <- function(num_employees) {
  # Define positions with corresponding salary and experience ranges
  positions <- list(
    "Staff" = c(3000, 5000, 1, 5),
    "Supervisor" = c(5000, 8000, 5, 10),
    "Manager" = c(8000, 12000, 10, 15),
    "Director" = c(12000, 15000, 15, 25)
  )
  # Define additional categorical data: departments and locations
  departments <- c("Finance", "HR", "IT", "Marketing", "Operations", "Sales")
  locations <- c("New York", "Los Angeles", "Chicago", "Houston", "Phoenix")

  # Initialize empty vectors for each column
  ID_Number <- integer(num_employees)
  Position <- character(num_employees)
  Salary <- integer(num_employees)
  Age <- integer(num_employees)
  Experience <- integer(num_employees)
  Department <- character(num_employees)
  Location <- character(num_employees)

  # Generate data for each employee
  for (i in 1:num_employees) {
    ID_Number[i] <- sample(10000:99999, 1)
    pos <- sample(names(positions), 1)
    Position[i] <- pos

    salary_range <- positions[[pos]][1:2]
    Salary[i] <- sample(salary_range[1]:salary_range[2], 1)

    exp_range <- positions[[pos]][3:4]
    Experience[i] <- sample(exp_range[1]:exp_range[2], 1)

    Age[i] <- Experience[i] + sample(22:35, 1)
    Department[i] <- sample(departments, 1)
    Location[i] <- sample(locations, 1)
  }
  # Combine the vectors into a data frame
  df <- data.frame(
    ID_Number = ID_Number,
    Position = Position,
    Salary = Salary,
    Age = Age,
    Experience = Experience,
    Department = Department,
    Location = Location,
    stringsAsFactors = FALSE
  )
  return(df)
}
# Example usage:
df <- create_employee_dataset(20)
print(df)
ID_Number Position Salary Age Experience Department Location
1 69234 Director 12640 49 22 Operations Houston
2 80527 Manager 9701 44 10 Finance Phoenix
3 97411 Supervisor 7418 43 8 Marketing Phoenix
4 27080 Director 12855 47 17 Sales Chicago
5 33747 Manager 9984 32 10 Finance New York
6 61960 Staff 4807 27 1 Marketing New York
7 10731 Manager 11119 40 11 HR New York
8 63971 Supervisor 5794 41 6 Marketing Houston
9 44077 Manager 11335 38 12 Marketing New York
10 39831 Director 14628 48 18 Operations Houston
11 16524 Staff 4641 31 5 Operations New York
12 66562 Manager 8777 46 14 IT Los Angeles
13 42653 Supervisor 7694 44 10 Finance Chicago
14 11767 Staff 3000 39 5 HR Chicago
15 95766 Director 14270 47 24 Marketing New York
16 48978 Staff 3196 38 5 Finance Chicago
17 92076 Staff 3872 39 4 Sales Chicago
18 67811 Staff 3064 29 3 Finance Chicago
19 31942 Director 13987 43 17 Finance Phoenix
20 22611 Supervisor 5493 33 9 Finance Los Angeles
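Because both versions draw random values, the tables above will differ on every run. If repeatable output is wanted, one option is to seed the random number generator. The sketch below is a simplified, hypothetical helper (draw_salaries and its seed parameter are not part of the original code) that shows the idea with the Staff salary range only.

```python
import random

def draw_salaries(num_employees, seed=None):
    """Draw random Staff-range salaries; passing a seed makes the draw repeatable."""
    rng = random.Random(seed)  # independent generator, leaves global state alone
    return [rng.randint(3000, 5000) for _ in range(num_employees)]

first = draw_salaries(5, seed=42)
second = draw_salaries(5, seed=42)
print(first == second)  # True: same seed, same values
```

In the full example, calling random.seed() once before create_employee_dataset() would have a similar effect, and set.seed() plays that role in R.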
import pandas as pd
import numpy as np

def manual_statistics(df, column=None):
    def stats_for_column(values):
        # Remove missing values for accurate computations
        values = values.dropna()
        if pd.api.types.is_numeric_dtype(values):
            count = len(values)
            mean_value = np.mean(values)
            median_value = np.median(values)
            variance_value = np.var(values, ddof=1) if count > 1 else 0
            std_dev_value = np.sqrt(variance_value)
            min_value = np.min(values)
            max_value = np.max(values)
            q1 = np.percentile(values, 25)
            q3 = np.percentile(values, 75)
            return {
                "count": count,
                "mean": mean_value,
                "median": median_value,
                "variance": variance_value,
                "std_dev": std_dev_value,
                "min": min_value,
                "q1": q1,
                "q3": q3,
                "max": max_value
            }
        else:
            count = len(values)
            unique_count = values.nunique()
            mode_series = values.mode()
            mode_value = mode_series.iloc[0] if not mode_series.empty else None
            frequency = values.value_counts().to_dict()
            return {
                "count": count,
                "unique": unique_count,
                "mode": mode_value,
                "frequency": frequency
            }

    if column is not None:
        return stats_for_column(df[column])
    else:
        summary = {}
        for col in df.columns:
            summary[col] = stats_for_column(df[col])
        return summary

# Get summary statistics for all columns
stats_all = manual_statistics(df)

# Display the results in attractive tables using pandas' to_markdown()
for col, stats in stats_all.items():
    print(f"\n### Summary Statistics for '{col}'\n")
    if pd.api.types.is_numeric_dtype(df[col]):
        # Create a DataFrame for numeric statistics with Statistic and Value
        stats_df = pd.DataFrame({
            "Statistic": list(stats.keys()),
            "Value": list(stats.values())
        })
        print(stats_df.to_markdown(index=False))
    else:
        # For categorical data, create summary table and frequency distribution
        summary_df = pd.DataFrame({
            "Statistic": ["count", "unique", "mode"],
            "Value": [stats["count"], stats["unique"], stats["mode"]]
        })
        freq_dict = stats["frequency"]
        freq_df = pd.DataFrame({
            "Category": list(freq_dict.keys()),
            "Frequency": list(freq_dict.values())
        })
        print(summary_df.to_markdown(index=False))
        print("\n")
        print(freq_df.to_markdown(index=False))
### Summary Statistics for 'ID_Number'
| Statistic | Value |
|:------------|----------------:|
| count | 20 |
| mean | 51866.1 |
| median | 44749 |
| variance | 5.16475e+08 |
| std_dev | 22726.1 |
| min | 17060 |
| q1 | 38854.8 |
| q3 | 72242 |
| max | 94126 |
### Summary Statistics for 'Position'
| Statistic | Value |
|:------------|:-----------|
| count | 20 |
| unique | 4 |
| mode | Supervisor |
| Category | Frequency |
|:-----------|------------:|
| Supervisor | 10 |
| Staff | 5 |
| Manager | 4 |
| Director | 1 |
### Summary Statistics for 'Salary'
| Statistic | Value |
|:------------|----------------:|
| count | 20 |
| mean | 7247.1 |
| median | 6442 |
| variance | 9.31655e+06 |
| std_dev | 3052.3 |
| min | 3543 |
| q1 | 4956.25 |
| q3 | 8454.5 |
| max | 14764 |
### Summary Statistics for 'Age'
| Statistic | Value |
|:------------|---------:|
| count | 20 |
| mean | 36.25 |
| median | 34.5 |
| variance | 46.6184 |
| std_dev | 6.82777 |
| min | 28 |
| q1 | 31 |
| q3 | 39.25 |
| max | 52 |
### Summary Statistics for 'Experience'
| Statistic | Value |
|:------------|---------:|
| count | 20 |
| mean | 7.9 |
| median | 6.5 |
| variance | 22.5158 |
| std_dev | 4.74508 |
| min | 2 |
| q1 | 5 |
| q3 | 9.75 |
| max | 22 |
### Summary Statistics for 'Department'
| Statistic | Value |
|:------------|:--------|
| count | 20 |
| unique | 5 |
| mode | Sales |
| Category | Frequency |
|:-----------|------------:|
| Sales | 8 |
| IT | 4 |
| Finance | 3 |
| Operations | 3 |
| Marketing | 2 |
### Summary Statistics for 'Location'
| Statistic | Value |
|:------------|:------------|
| count | 20 |
| unique | 5 |
| mode | Los Angeles |
| Category | Frequency |
|:------------|------------:|
| Los Angeles | 6 |
| Phoenix | 5 |
| Houston | 5 |
| New York | 2 |
| Chicago | 2 |
library(knitr)
library(kableExtra)

manual_statistics <- function(df, column = NULL) {
  # Helper function to compute statistics for a single column
  stats_for_column <- function(values) {
    # Remove NA values for accurate computations
    values <- values[!is.na(values)]

    if (is.numeric(values)) {
      count <- length(values)
      mean_value <- mean(values)
      median_value <- median(values)
      variance_value <- if (count > 1) var(values) else 0
      std_dev_value <- sqrt(variance_value)
      min_value <- min(values)
      max_value <- max(values)
      q1 <- as.numeric(quantile(values, 0.25))
      q3 <- as.numeric(quantile(values, 0.75))

      return(list(
        count = count,
        mean = mean_value,
        median = median_value,
        variance = variance_value,
        std_dev = std_dev_value,
        min = min_value,
        q1 = q1,
        q3 = q3,
        max = max_value
      ))
    } else {
      count <- length(values)
      unique_count <- length(unique(values))
      tab <- table(values)
      mode_value <- names(tab)[which.max(tab)]
      frequency <- as.list(tab)

      return(list(
        count = count,
        unique = unique_count,
        mode = mode_value,
        frequency = frequency
      ))
    }
  }

  # If a specific column is provided, compute statistics only for that column.
  if (!is.null(column)) {
    return(stats_for_column(df[[column]]))
  } else {
    # Otherwise, compute statistics for each column in the DataFrame.
    summary <- list()
    for (col in names(df)) {
      summary[[col]] <- stats_for_column(df[[col]])
    }
    return(summary)
  }
}

# Compute summary statistics for all columns
stats_all <- manual_statistics(df)

# Loop to display the results for each column with DT::datatable
for (col in names(stats_all)) {
  cat(paste0("<h3>Summary Statistics for '", col, "'</h3>"))
  col_stats <- stats_all[[col]]

  if (is.numeric(df[[col]])) {
    stats_df <- data.frame(
      Statistic = names(col_stats),
      Value = as.numeric(unlist(col_stats)),
      stringsAsFactors = FALSE
    )
    print(DT::datatable(stats_df,
                        caption = paste("Summary for", col),
                        options = list(pageLength = 5, autoWidth = TRUE)))
  } else {
    summary_df <- data.frame(
      Statistic = c("count", "unique", "mode"),
      Value = c(col_stats$count, col_stats$unique, col_stats$mode),
      stringsAsFactors = FALSE
    )
    freq_df <- as.data.frame(do.call(rbind, col_stats$frequency))
    freq_df <- cbind(Category = rownames(freq_df), freq_df)
    rownames(freq_df) <- NULL
    names(freq_df)[2] <- "Frequency"
    print(DT::datatable(summary_df,
                        caption = paste("Summary for", col),
                        options = list(pageLength = 5, autoWidth = TRUE)))
    cat("<br>")
    print(DT::datatable(freq_df,
                        caption = paste("Frequency Distribution for", col),
                        options = list(pageLength = 5, autoWidth = TRUE)))
  }
  cat("<br><br>")
}