In programming, we often perform the same tasks repeatedly. Functions and loops help us write cleaner, shorter, and more efficient code.
A function is a block of code designed to perform a specific task. Using functions helps us avoid redundant code.
This visual representation helps illustrate how functions work systematically. The label “Function Machine” on the machine reinforces that it applies a specific rule to transform the input into an output. The function in the image is:
\[f(x) = x + 3\]
This means that any number inputted into the machine will have 3 added to it before being output.
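The rule above can be sketched as a small Python function. The name `function_machine` is chosen here only for illustration and does not come from the original figure.

```python
def function_machine(x):
    """Apply the rule f(x) = x + 3 to a single input."""
    return x + 3

# Feed several inputs through the same rule
for value in [0, 2, 10]:
    print(value, "->", function_machine(value))
```

Each input goes through the same rule, which is the point of the machine picture: the rule is defined once and reused for every input.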
This function takes three numbers, \(a\), \(x\), and \(b\), and returns \(a \times x + b\).
# Function to multiply 'a' with 'x' and add 'b'
def function1(a, x, b):
    return a * x + b

# Example usage
print(function1(2, 3, 4))  # Output: (2 * 3) + 4 = 10
10
# Function to multiply 'a' with 'x' and add 'b'
function1 <- function(a, x, b) {
  return(a * x + b)
}
# Example usage
print(function1(2, 3, 4)) # Output: (2 * 3) + 4 = 10
[1] 10
This function compares two datasets by computing the mean, median, and sample standard deviation of each, a common first step in data analysis.
import statistics
from tabulate import tabulate
# Function to compare two datasets
def compare_data(group1, group2):
    return {
        "group1": {
            "mean": statistics.mean(group1),
            "median": statistics.median(group1),
            "std_dev": statistics.stdev(group1)
        },
        "group2": {
            "mean": statistics.mean(group2),
            "median": statistics.median(group2),
            "std_dev": statistics.stdev(group2)
        }
    }

# Sample datasets
data1 = [10, 20, 30, 40, 50]
data2 = [15, 25, 35, 45, 55]

# Get results
results = compare_data(data1, data2)

# Convert results to a table format
table = [
    ["Metric", "Group 1", "Group 2"],
    ["Mean", results["group1"]["mean"], results["group2"]["mean"]],
    ["Median", results["group1"]["median"], results["group2"]["median"]],
    ["Standard Deviation", results["group1"]["std_dev"], results["group2"]["std_dev"]]
]

# Print table
print(tabulate(table, headers="firstrow", tablefmt="grid"))
+--------------------+-----------+-----------+
| Metric | Group 1 | Group 2 |
+====================+===========+===========+
| Mean | 30 | 35 |
+--------------------+-----------+-----------+
| Median | 30 | 35 |
+--------------------+-----------+-----------+
| Standard Deviation | 15.8114 | 15.8114 |
+--------------------+-----------+-----------+
# Load library
library(knitr)
# Function to compare two datasets
compare_data <- function(group1, group2) {
  data.frame(
    Statistic = c("Mean", "Median", "Std Dev"),
    Group1 = round(c(mean(group1), median(group1), sd(group1)), 2),
    Group2 = round(c(mean(group2), median(group2), sd(group2)), 2)
  )
}
# Sample data
data1 <- c(10, 20, 30, 40, 50)
data2 <- c(15, 25, 35, 45, 55)
# Print as formatted table
kable(compare_data(data1, data2))
Statistic | Group1 | Group2 |
---|---|---|
Mean | 30.00 | 35.00 |
Median | 30.00 | 35.00 |
Std Dev | 15.81 | 15.81 |
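The standard deviation of 15.81 reported above is the sample standard deviation, which divides by \(n - 1\). Both `statistics.stdev` in Python and `sd` in R use this convention. As a minimal sketch, the snippet below contrasts it with the population version (`statistics.pstdev`, which divides by \(n\)) on the first sample dataset.

```python
import statistics

data1 = [10, 20, 30, 40, 50]

sample_sd = statistics.stdev(data1)       # divides by n - 1
population_sd = statistics.pstdev(data1)  # divides by n

print(round(sample_sd, 4))      # 15.8114
print(round(population_sd, 4))  # 14.1421
```

Which one is appropriate depends on whether the data are treated as a sample drawn from a larger population or as the whole population.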
Functions save time by allowing code reuse, improve program organization and readability, and make debugging and future development easier.
In the field of computational geometry, functions are essential for converting mathematical expressions into executable code. For example, the formulas for calculating the area and perimeter of various two-dimensional shapes can be implemented as separate functions. This approach makes the development process more efficient and easier to manage. The following sections explain in detail how these geometric formulas are coded, using Python and R as examples.
Shape | Area Formula (A) | Perimeter Formula (P) | Variables Description |
---|---|---|---|
Triangle | \(A = \frac{1}{2}(b \times h)\) | \(P = a + b + c\) | \(b\) = base, \(h\) = height, \(a\), \(b\), \(c\) = sides |
Rectangle | \(A = l \times b\) | \(P = 2(l+b)\) | \(l\) = length, \(b\) = breadth |
Square | \(A = s \times s\) | \(P = 4 \times s\) | \(s\) = side |
Circle | \(A = \pi r^2\) | \(P = 2\pi r\) | \(r\) = radius, \(\pi \approx 3.14\) or \(\frac{22}{7}\) |
Ellipse | \(A = \pi \times a \times b\) | \(P \approx \pi(a+b)\) (rough approximation, best when \(a \approx b\)) | \(a\) = semi-major axis, \(b\) = semi-minor axis |
Parallelogram | \(A = b \times h\) | \(P = 2(a+b)\) | \(b\) = base, \(h\) = height, \(a\), \(b\) = lengths of adjacent sides |
Rhombus | \(A = \frac{1}{2}(d_1 \times d_2)\) | \(P = 4 \times a\) | \(d_1, d_2\) = diagonals, \(a\) = side |
Trapezium | \(A = \frac{1}{2}(a+b) \times h\) | Sum of all sides | \(a\), \(b\) = lengths of parallel sides, \(h\) = height |
With the formulas provided above, you can create functions that calculate the area and perimeter of different shapes. This makes your code modular and easier to maintain, and it lets you test individual pieces of logic in isolation. The examples below, in Python and R, demonstrate how to implement functions for these calculations.
import math

# Function to calculate area and perimeter for multiple shapes
def calculate_area_perimeter(shape, **kwargs):
    if shape == "triangle":
        base = kwargs.get("base")
        height = kwargs.get("height")
        side_a = kwargs.get("side_a")
        side_b = kwargs.get("side_b")
        side_c = kwargs.get("side_c")
        area = 0.5 * base * height
        perimeter = side_a + side_b + side_c
    elif shape == "rectangle":
        length = kwargs.get("length")
        breadth = kwargs.get("breadth")
        area = length * breadth
        perimeter = 2 * (length + breadth)
    elif shape == "square":
        side = kwargs.get("side")
        area = side ** 2
        perimeter = 4 * side
    elif shape == "circle":
        radius = kwargs.get("radius")
        area = math.pi * radius ** 2
        perimeter = 2 * math.pi * radius
    elif shape == "ellipse":
        a = kwargs.get("a")
        b = kwargs.get("b")
        area = math.pi * a * b
        perimeter = math.pi * (a + b)
    elif shape == "parallelogram":
        base = kwargs.get("base")
        height = kwargs.get("height")
        side_a = kwargs.get("side_a")
        side_b = kwargs.get("side_b")
        area = base * height
        perimeter = 2 * (side_a + side_b)
    elif shape == "rhombus":
        d1 = kwargs.get("d1")
        d2 = kwargs.get("d2")
        side = kwargs.get("side")
        area = 0.5 * d1 * d2
        perimeter = 4 * side
    elif shape == "trapezium":
        a = kwargs.get("a")
        b = kwargs.get("b")
        height = kwargs.get("height")
        side_a = kwargs.get("side_a")
        side_b = kwargs.get("side_b")
        area = 0.5 * (a + b) * height
        perimeter = a + b + side_a + side_b
    else:
        return "Invalid shape. Choose a valid 2D shape."
    return {"area": area, "perimeter": perimeter}

# Example usage
result = calculate_area_perimeter("triangle",
                                  base=6,
                                  height=4,
                                  side_a=5,
                                  side_b=6,
                                  side_c=7)
print("Triangle-Area & Perimeter:", result["area"], "and", result["perimeter"])
Triangle-Area & Perimeter: 12.0 and 18
# Function to calculate area and perimeter for multiple shapes
calculate_area_perimeter <- function(shape, ...) {
  args <- list(...)

  if (shape == "triangle") {
    base <- args$base
    height <- args$height
    side_a <- args$side_a
    side_b <- args$side_b
    side_c <- args$side_c
    area <- 0.5 * base * height
    perimeter <- side_a + side_b + side_c
  } else if (shape == "rectangle") {
    length <- args$length
    breadth <- args$breadth
    area <- length * breadth
    perimeter <- 2 * (length + breadth)
  } else if (shape == "square") {
    side <- args$side
    area <- side^2
    perimeter <- 4 * side
  } else if (shape == "circle") {
    radius <- args$radius
    area <- pi * radius^2
    perimeter <- 2 * pi * radius
  } else if (shape == "ellipse") {
    a <- args$a
    b <- args$b
    area <- pi * a * b
    perimeter <- pi * (a + b)
  } else if (shape == "parallelogram") {
    base <- args$base
    height <- args$height
    side_a <- args$side_a
    side_b <- args$side_b
    area <- base * height
    perimeter <- 2 * (side_a + side_b)
  } else if (shape == "rhombus") {
    d1 <- args$d1
    d2 <- args$d2
    side <- args$side
    area <- 0.5 * d1 * d2
    perimeter <- 4 * side
  } else if (shape == "trapezium") {
    a <- args$a
    b <- args$b
    height <- args$height
    side_a <- args$side_a
    side_b <- args$side_b
    area <- 0.5 * (a + b) * height
    perimeter <- a + b + side_a + side_b
  } else {
    stop("Invalid shape. Choose a valid 2D shape.")
  }
  return(list(area = area, perimeter = perimeter))
}
# Example usage
result <- calculate_area_perimeter("triangle",
                                   base = 6,
                                   height = 4,
                                   side_a = 5,
                                   side_b = 6,
                                   side_c = 7)
cat("Triangle - Area & Perimeter:", result$area, "and", result$perimeter, "\n")
Triangle - Area & Perimeter: 12 and 18
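The text earlier notes that each formula can live in its own function. As one possible alternative to a single long `if`/`elif` chain, the sketch below gives three of the shapes a function each and looks them up in a dictionary. The names `SHAPES` and `area_perimeter` are hypothetical and used only for this illustration, and an unknown shape raises an error instead of returning a message string.

```python
import math

def circle(radius):
    """Return (area, perimeter) of a circle."""
    return math.pi * radius ** 2, 2 * math.pi * radius

def rectangle(length, breadth):
    """Return (area, perimeter) of a rectangle."""
    return length * breadth, 2 * (length + breadth)

def triangle(base, height, side_a, side_b, side_c):
    """Return (area, perimeter) of a triangle."""
    return 0.5 * base * height, side_a + side_b + side_c

# Hypothetical lookup table mapping shape names to their functions
SHAPES = {"circle": circle, "rectangle": rectangle, "triangle": triangle}

def area_perimeter(shape, **kwargs):
    """Dispatch to the function registered for `shape`."""
    if shape not in SHAPES:
        raise ValueError(f"Unknown shape: {shape!r}")
    area, perimeter = SHAPES[shape](**kwargs)
    return {"area": area, "perimeter": perimeter}

print(area_perimeter("rectangle", length=3, breadth=2))
# {'area': 6, 'perimeter': 10}
```

With this layout, supporting another shape means writing one small function and adding one dictionary entry, and each formula can be tested on its own.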
Loops let us execute the same code multiple times without rewriting it, which makes them well suited to repetitive calculations in mathematical analysis and data processing.
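As a brief illustration, two common loop forms in Python are `for`, which repeats once per item in a sequence, and `while`, which repeats for as long as a condition holds. The variable names below are chosen only for this sketch.

```python
# for loop: repeat once per item in a known range
total = 0
for k in range(1, 6):
    total += k
print(total)  # 15

# while loop: repeat until the condition stops holding
n = 100
halvings = 0
while n > 1:
    n //= 2
    halvings += 1
print(halvings)  # 6
```

A `for` loop fits when the number of repetitions is known in advance. A `while` loop fits when it depends on a condition that changes as the loop runs.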
The Fibonacci sequence is a series of numbers where each number is the sum of the two preceding ones:
\[F(n) = F(n-1) + F(n-2)\]
Example: \(0, 1, 1, 2, 3, 5, 8, 13, 21, \dots\)
def fibonacci(n):
    fib_series = [0, 1]
    for i in range(2, n):
        fib_series.append(fib_series[-1] + fib_series[-2])
    # Slice so that n = 0 or n = 1 returns exactly n terms
    return fib_series[:n]

print(fibonacci(10))  # Output: [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
fibonacci <- function(n) {
  fib_series <- c(0, 1)
  # Guard the loop: 3:n would count downwards when n < 3
  if (n > 2) {
    for (i in 3:n) {
      fib_series <- c(fib_series, fib_series[i - 1] + fib_series[i - 2])
    }
  }
  return(head(fib_series, n))
}
print(fibonacci(10)) # Output: 0 1 1 2 3 5 8 13 21 34
[1] 0 1 1 2 3 5 8 13 21 34
This function generates a sequence based on the type specified: either an arithmetic sequence or a geometric sequence. For an arithmetic sequence, each term is obtained by adding a constant difference to the previous term. For a geometric sequence, each term is obtained by multiplying the previous term by a constant ratio.
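In closed form, writing \(a\) for the first term, \(d\) for the common difference, and \(r\) for the common ratio, the \(k\)-th term of each sequence (counting from \(k = 1\)) is given below. The symbol \(T_k\) is introduced here only for illustration.

\[T_k = a + (k - 1)\,d \quad \text{(arithmetic)}\]

\[T_k = a \, r^{\,k-1} \quad \text{(geometric)}\]

For example, with \(a = 1\) and \(d = 2\) the tenth arithmetic term is \(1 + 9 \times 2 = 19\), and with \(a = 1\) and \(r = 3\) the tenth geometric term is \(3^9 = 19683\).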
def generate_sequence(seq_type, n, a, d=None, r=None):
    """
    Generate an arithmetic or geometric sequence.

    Parameters:
        seq_type (str): Type of sequence - "arithmetic" or "geometric".
        n (int): The number of terms in the sequence.
        a (numeric): The first term of the sequence.
        d (numeric, optional): The common difference (required for arithmetic).
        r (numeric, optional): The common ratio (required for geometric).

    Returns:
        list: A list containing the generated sequence.
    """
    sequence = []
    if seq_type.lower() == "arithmetic":
        if d is None:
            raise ValueError("'d' must be provided for an arithmetic sequence")
        for i in range(n):
            sequence.append(a + i * d)
    elif seq_type.lower() == "geometric":
        if r is None:
            raise ValueError("'r' must be provided for a geometric sequence")
        for i in range(n):
            sequence.append(a * (r ** i))
    else:
        raise ValueError("seq_type must be either 'arithmetic' or 'geometric'")
    return sequence
# Example usage:
print(generate_sequence("arithmetic", 10, 1, d=2))
[1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
print(generate_sequence("geometric", 10, 1, r=3))
[1, 3, 9, 27, 81, 243, 729, 2187, 6561, 19683]
generate_sequence <- function(seq_type, n, a, d = NULL, r = NULL) {
  #' Generate an arithmetic or geometric sequence.
  #'
  #' @param seq_type The type of sequence: "arithmetic" or "geometric".
  #' @param n The number of terms in the sequence.
  #' @param a The first term of the sequence.
  #' @param d The common difference (required for arithmetic sequences).
  #' @param r The common ratio (required for geometric sequences).
  #'
  #' @return A numeric vector containing the generated sequence.
  sequence <- numeric(n)
  if (tolower(seq_type) == "arithmetic") {
    if (is.null(d)) stop("'d' must be provided for an arithmetic sequence.")
    for (i in 1:n) {
      sequence[i] <- a + (i - 1) * d
    }
  } else if (tolower(seq_type) == "geometric") {
    if (is.null(r)) stop("'r' must be provided for a geometric sequence.")
    for (i in 1:n) {
      sequence[i] <- a * (r^(i - 1))
    }
  } else {
    stop("seq_type must be either 'arithmetic' or 'geometric'")
  }
  return(sequence)
}
# Example usage:
print(generate_sequence("arithmetic", 10, 1, d = 2))
[1] 1 3 5 7 9 11 13 15 17 19
print(generate_sequence("geometric", 10, 1, r = 3))
[1] 1 3 9 27 81 243 729 2187 6561 19683
Linear regression is used to find the relationship between an independent variable \(X\) and a dependent variable \(Y\):
\[Y = aX + b\]
where \(a\) is the slope and \(b\) is the intercept of the fitted line.
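One standard way to obtain \(a\) and \(b\) is ordinary least squares, which is what the sums in this section's code compute. For \(n\) data points:

\[a = \frac{n \sum XY - \sum X \sum Y}{n \sum X^2 - \left(\sum X\right)^2}, \qquad b = \frac{\sum Y - a \sum X}{n}\]

As a worked check with the section's data, \(X = (1, 2, 3, 4, 5)\) and \(Y = (2, 4, 5, 4, 5)\), we have \(n = 5\), \(\sum X = 15\), \(\sum Y = 20\), \(\sum XY = 66\), and \(\sum X^2 = 55\). This gives

\[a = \frac{5 \times 66 - 15 \times 20}{5 \times 55 - 15^2} = \frac{30}{50} = 0.6, \qquad b = \frac{20 - 0.6 \times 15}{5} = 2.2\]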
import numpy as np

# Data (X: study hours, Y: exam scores)
X = np.array([1, 2, 3, 4, 5])
Y = np.array([2, 4, 5, 4, 5])

# Calculate slope (a) and intercept (b)
n = len(X)
sum_x, sum_y = sum(X), sum(Y)
sum_xy = sum(X * Y)
sum_x2 = sum(X ** 2)

a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b = (sum_y - a * sum_x) / n

print(f"Linear Regression: Y = {a:.2f}X + {b:.2f}")
Linear Regression: Y = 0.60X + 2.20
# Data
X <- c(1, 2, 3, 4, 5)
Y <- c(2, 4, 5, 4, 5)

# Calculate slope (a) and intercept (b)
n <- length(X)
sum_x <- sum(X)
sum_y <- sum(Y)
sum_xy <- sum(X * Y)
sum_x2 <- sum(X^2)

a <- (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x^2)
b <- (sum_y - a * sum_x) / n

print(paste("Linear Regression: Y =", round(a, 2), "X +", round(b, 2)))
[1] "Linear Regression: Y = 0.6 X + 2.2"
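The regression scripts above run once on fixed data. To tie this back to the chapter's two themes, the same least-squares arithmetic can be wrapped in a reusable function that accumulates the sums in a loop. This is a minimal sketch using only the standard library. The name `fit_line` and its input checks are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys need the same length (at least 2)")
    n = len(xs)
    sum_x = sum_y = sum_xy = sum_x2 = 0
    # Accumulate the four sums in a single pass over the data
    for x, y in zip(xs, ys):
        sum_x += x
        sum_y += y
        sum_xy += x * y
        sum_x2 += x * x
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = (n * sum_xy - sum_x * sum_y) / denominator
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept

slope, intercept = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"Y = {slope:.2f}X + {intercept:.2f}")  # Y = 0.60X + 2.20
```

Once the calculation is a function, fitting a different dataset is a single call rather than a copy of the whole script.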
Functions and loops help us create simpler and more efficient code. By understanding these two concepts, we can write better and more readable programs.
Let’s apply functions and loops to real-world data science tasks:
import pandas as pd
import random

def create_employee_dataset(num_employees):
    positions = {
        "Staff": (3000, 5000, 1, 5),
        "Supervisor": (5000, 8000, 5, 10),
        "Manager": (8000, 12000, 10, 15),
        "Director": (12000, 15000, 15, 25)
    }
    departments = ["Finance", "HR", "IT", "Marketing", "Operations", "Sales"]
    locations = ["New York", "Los Angeles", "Chicago", "Houston", "Phoenix"]

    data = {
        "ID_Number": [],
        "Position": [],
        "Salary": [],
        "Age": [],
        "Experience": [],
        "Department": [],
        "Location": []
    }
    for _ in range(num_employees):
        id_number = random.randint(10000, 99999)
        position = random.choice(list(positions.keys()))
        salary = random.randint(positions[position][0],
                                positions[position][1])
        experience = random.randint(positions[position][2],
                                    positions[position][3])
        age = experience + random.randint(22, 35)  # aligns with experience
        department = random.choice(departments)
        location = random.choice(locations)

        data["ID_Number"].append(id_number)
        data["Position"].append(position)
        data["Salary"].append(salary)
        data["Age"].append(age)
        data["Experience"].append(experience)
        data["Department"].append(department)
        data["Location"].append(location)

    return pd.DataFrame(data)

# Create the employee dataset
df = create_employee_dataset(20)
print(df)
ID_Number Position Salary Age Experience Department Location
0 86126 Director 12765 47 17 IT Chicago
1 33564 Supervisor 6790 31 7 Operations Chicago
2 46445 Staff 4114 27 1 Finance Los Angeles
3 65745 Manager 9967 44 14 IT Houston
4 77151 Supervisor 6872 33 5 Marketing Los Angeles
5 29628 Manager 11568 47 15 Finance Phoenix
6 56593 Staff 3957 26 2 Sales Phoenix
7 39205 Director 12394 43 18 HR Phoenix
8 17440 Staff 4987 33 5 Finance Los Angeles
9 41580 Director 14198 43 16 Operations New York
10 68936 Supervisor 7145 27 5 Operations Phoenix
11 52580 Manager 11280 38 11 IT Houston
12 89750 Supervisor 5399 30 7 IT Los Angeles
13 58177 Supervisor 7466 40 7 Finance Los Angeles
14 16889 Manager 10153 41 14 Marketing Chicago
15 86019 Supervisor 5446 39 8 Sales Chicago
16 69877 Supervisor 6159 32 7 Finance Los Angeles
17 12394 Supervisor 5464 41 8 Marketing Phoenix
18 85458 Supervisor 5508 35 5 Finance Phoenix
19 34243 Director 13449 46 18 Marketing New York
create_employee_dataset <- function(num_employees) {
  # Define positions with corresponding salary and experience ranges
  positions <- list(
    "Staff" = c(3000, 5000, 1, 5),
    "Supervisor" = c(5000, 8000, 5, 10),
    "Manager" = c(8000, 12000, 10, 15),
    "Director" = c(12000, 15000, 15, 25)
  )
  # Define additional categorical data: departments and locations
  departments <- c("Finance", "HR", "IT", "Marketing", "Operations", "Sales")
  locations <- c("New York", "Los Angeles", "Chicago", "Houston", "Phoenix")

  # Initialize empty vectors for each column
  ID_Number <- integer(num_employees)
  Position <- character(num_employees)
  Salary <- integer(num_employees)
  Age <- integer(num_employees)
  Experience <- integer(num_employees)
  Department <- character(num_employees)
  Location <- character(num_employees)

  # Generate data for each employee
  for (i in 1:num_employees) {
    ID_Number[i] <- sample(10000:99999, 1)
    pos <- sample(names(positions), 1)
    Position[i] <- pos

    salary_range <- positions[[pos]][1:2]
    Salary[i] <- sample(salary_range[1]:salary_range[2], 1)

    exp_range <- positions[[pos]][3:4]
    Experience[i] <- sample(exp_range[1]:exp_range[2], 1)

    Age[i] <- Experience[i] + sample(22:35, 1)
    Department[i] <- sample(departments, 1)
    Location[i] <- sample(locations, 1)
  }
  # Combine the vectors into a data frame
  df <- data.frame(
    ID_Number = ID_Number,
    Position = Position,
    Salary = Salary,
    Age = Age,
    Experience = Experience,
    Department = Department,
    Location = Location,
    stringsAsFactors = FALSE
  )
  return(df)
}
# Example usage:
df <- create_employee_dataset(20)
print(df)
ID_Number Position Salary Age Experience Department Location
1 89448 Director 14019 54 22 IT Houston
2 91719 Staff 4485 29 3 Marketing Houston
3 61283 Director 12884 45 17 Finance Phoenix
4 63557 Staff 3641 32 5 IT Phoenix
5 84461 Manager 10035 40 10 Operations Chicago
6 59823 Director 13348 45 20 Finance New York
7 59041 Manager 9393 37 13 Finance Houston
8 69157 Staff 3980 26 2 Sales Houston
9 76890 Staff 4753 29 2 Operations Houston
10 55000 Manager 9375 39 13 Marketing Chicago
11 22192 Manager 11612 46 15 Marketing New York
12 97870 Director 14747 56 22 Marketing Chicago
13 78823 Supervisor 5781 29 5 IT Chicago
14 70550 Supervisor 6384 41 9 Sales Chicago
15 20601 Supervisor 7561 38 10 Sales Phoenix
16 68781 Director 14730 45 15 Marketing Houston
17 31999 Manager 11137 34 11 Marketing Phoenix
18 51925 Supervisor 6945 35 6 IT Houston
19 29311 Manager 8687 45 15 Marketing Los Angeles
20 64260 Staff 3320 33 4 Finance Houston
import pandas as pd
import numpy as np

def manual_statistics(df, column=None):
    def stats_for_column(values):
        # Remove missing values for accurate computations
        values = values.dropna()
        if pd.api.types.is_numeric_dtype(values):
            count = len(values)
            mean_value = np.mean(values)
            median_value = np.median(values)
            variance_value = np.var(values, ddof=1) if count > 1 else 0
            std_dev_value = np.sqrt(variance_value)
            min_value = np.min(values)
            max_value = np.max(values)
            q1 = np.percentile(values, 25)
            q3 = np.percentile(values, 75)
            return {
                "count": count,
                "mean": mean_value,
                "median": median_value,
                "variance": variance_value,
                "std_dev": std_dev_value,
                "min": min_value,
                "q1": q1,
                "q3": q3,
                "max": max_value
            }
        else:
            count = len(values)
            unique_count = values.nunique()
            mode_series = values.mode()
            mode_value = mode_series.iloc[0] if not mode_series.empty else None
            frequency = values.value_counts().to_dict()
            return {
                "count": count,
                "unique": unique_count,
                "mode": mode_value,
                "frequency": frequency
            }

    if column is not None:
        return stats_for_column(df[column])
    else:
        summary = {}
        for col in df.columns:
            summary[col] = stats_for_column(df[col])
        return summary

# Get summary statistics for all columns
stats_all = manual_statistics(df)

# Display the results in attractive tables using pandas' to_markdown()
for col, stats in stats_all.items():
    print(f"\n### Summary Statistics for '{col}'\n")
    if pd.api.types.is_numeric_dtype(df[col]):
        # Create a DataFrame for numeric statistics with Statistic and Value
        stats_df = pd.DataFrame({
            "Statistic": list(stats.keys()),
            "Value": list(stats.values())
        })
        print(stats_df.to_markdown(index=False))
    else:
        # For categorical data, create summary table and frequency distribution
        summary_df = pd.DataFrame({
            "Statistic": ["count", "unique", "mode"],
            "Value": [stats["count"], stats["unique"], stats["mode"]]
        })
        freq_dict = stats["frequency"]
        freq_df = pd.DataFrame({
            "Category": list(freq_dict.keys()),
            "Frequency": list(freq_dict.values())
        })
        print(summary_df.to_markdown(index=False))
        print("\n")
        print(freq_df.to_markdown(index=False))
### Summary Statistics for 'ID_Number'
| Statistic | Value |
|:------------|----------------:|
| count | 20 |
| mean | 53390 |
| median | 54586.5 |
| variance | 6.19468e+08 |
| std_dev | 24889.1 |
| min | 12394 |
| q1 | 34073.2 |
| q3 | 71695.5 |
| max | 89750 |
### Summary Statistics for 'Position'
| Statistic | Value |
|:------------|:-----------|
| count | 20 |
| unique | 4 |
| mode | Supervisor |
| Category | Frequency |
|:-----------|------------:|
| Supervisor | 9 |
| Director | 4 |
| Manager | 4 |
| Staff | 3 |
### Summary Statistics for 'Salary'
| Statistic | Value |
|:------------|----------------:|
| count | 20 |
| mean | 8254.05 |
| median | 7008.5 |
| variance | 1.12852e+07 |
| std_dev | 3359.34 |
| min | 3957 |
| q1 | 5459.5 |
| q3 | 11352 |
| max | 14198 |
### Summary Statistics for 'Age'
| Statistic | Value |
|:------------|---------:|
| count | 20 |
| mean | 37.15 |
| median | 38.5 |
| variance | 48.1342 |
| std_dev | 6.93788 |
| min | 26 |
| q1 | 31.75 |
| q3 | 43 |
| max | 47 |
### Summary Statistics for 'Experience'
| Statistic | Value |
|:------------|---------:|
| count | 20 |
| mean | 9.5 |
| median | 7.5 |
| variance | 29.2105 |
| std_dev | 5.40468 |
| min | 1 |
| q1 | 5 |
| q3 | 14.25 |
| max | 18 |
### Summary Statistics for 'Department'
| Statistic | Value |
|:------------|:--------|
| count | 20 |
| unique | 6 |
| mode | Finance |
| Category | Frequency |
|:-----------|------------:|
| Finance | 6 |
| IT | 4 |
| Marketing | 4 |
| Operations | 3 |
| Sales | 2 |
| HR | 1 |
### Summary Statistics for 'Location'
| Statistic | Value |
|:------------|:------------|
| count | 20 |
| unique | 5 |
| mode | Los Angeles |
| Category | Frequency |
|:------------|------------:|
| Los Angeles | 6 |
| Phoenix | 6 |
| Chicago | 4 |
| Houston | 2 |
| New York | 2 |
library(knitr)
library(kableExtra)
manual_statistics <- function(df, column = NULL) {
  # Helper function to compute statistics for a single column
  stats_for_column <- function(values) {
    # Remove NA values for accurate computations
    values <- values[!is.na(values)]

    if (is.numeric(values)) {
      count <- length(values)
      mean_value <- mean(values)
      median_value <- median(values)
      variance_value <- if (count > 1) var(values) else 0
      std_dev_value <- sqrt(variance_value)
      min_value <- min(values)
      max_value <- max(values)
      q1 <- as.numeric(quantile(values, 0.25))
      q3 <- as.numeric(quantile(values, 0.75))

      return(list(
        count = count,
        mean = mean_value,
        median = median_value,
        variance = variance_value,
        std_dev = std_dev_value,
        min = min_value,
        q1 = q1,
        q3 = q3,
        max = max_value
      ))
    } else {
      count <- length(values)
      unique_count <- length(unique(values))
      tab <- table(values)
      mode_value <- names(tab)[which.max(tab)]
      frequency <- as.list(tab)

      return(list(
        count = count,
        unique = unique_count,
        mode = mode_value,
        frequency = frequency
      ))
    }
  }
  # If a specific column is provided, compute statistics only for that column.
  if (!is.null(column)) {
    return(stats_for_column(df[[column]]))
  } else {
    # Otherwise, compute statistics for each column in the DataFrame.
    summary <- list()
    for (col in names(df)) {
      summary[[col]] <- stats_for_column(df[[col]])
    }
    return(summary)
  }
}
# Compute summary statistics for all columns
stats_all <- manual_statistics(df)

# Loop to display the results for each column with DT::datatable
for (col in names(stats_all)) {
  cat(paste0("<h3>Summary Statistics for '", col, "'</h3>"))
  col_stats <- stats_all[[col]]

  if (is.numeric(df[[col]])) {
    stats_df <- data.frame(
      Statistic = names(col_stats),
      Value = as.numeric(unlist(col_stats)),
      stringsAsFactors = FALSE
    )
    print(DT::datatable(stats_df,
                        caption = paste("Summary for", col),
                        options = list(pageLength = 5, autoWidth = TRUE)))
  } else {
    summary_df <- data.frame(
      Statistic = c("count", "unique", "mode"),
      Value = c(col_stats$count, col_stats$unique, col_stats$mode),
      stringsAsFactors = FALSE
    )
    freq_df <- as.data.frame(do.call(rbind, col_stats$frequency))
    freq_df <- cbind(Category = rownames(freq_df), freq_df)
    rownames(freq_df) <- NULL
    names(freq_df)[2] <- "Frequency"
    print(DT::datatable(summary_df,
                        caption = paste("Summary for", col),
                        options = list(pageLength = 5, autoWidth = TRUE)))
    cat("<br>")
    print(DT::datatable(freq_df,
                        caption = paste("Frequency Distribution for", col),
                        options = list(pageLength = 5, autoWidth = TRUE)))
  }
  cat("<br><br>")
}